# WebR in Quarto HTML Documents

Get started with building a model in this R Markdown document, which accompanies the *Preprocess your data with recipes* tidymodels start article.

If you ever get lost, you can visit the links provided next to section headers to see the accompanying section in the online article.

## Introduction

Load necessary packages:
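
A minimal sketch of this step, assuming the same package set as the accompanying article (`tidymodels`, plus `nycflights13` for the data and `skimr` for summaries):

```r
# Assumed package set, following the accompanying article
library(tidymodels)    # recipes, rsample, parsnip, workflows, yardstick, ...
library(nycflights13)  # flights and weather data
library(skimr)         # variable summaries
```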

Load and wrangle data:
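
A sketch of the wrangling step, assuming the approach of the accompanying article: the name `flight_data`, the 30-minute cutoff for a `late` arrival, and the selected columns are taken from that article rather than from this page.

```r
library(tidymodels)
library(nycflights13)

flight_data <-
  flights %>%
  mutate(
    # Arrivals 30 or more minutes behind schedule count as "late" (assumed cutoff)
    arr_delay = factor(ifelse(arr_delay >= 30, "late", "on_time")),
    # Keep the calendar date (not the date-time) for use in the recipe
    date = lubridate::as_date(time_hour)
  ) %>%
  # Keep only flights that have a matching weather record
  inner_join(weather, by = c("origin", "time_hour")) %>%
  # Retain only the columns used later on
  select(dep_time, flight, origin, dest, air_time, distance,
         carrier, date, arr_delay, time_hour) %>%
  # Drop rows with missing values
  na.omit() %>%
  # Encode character columns as factors for modeling
  mutate(across(where(is.character), as.factor))
```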

Before moving forward, let’s reduce the size of our data so we can run these analyses with the default computational resources on RStudio Cloud. This keeps our session from being aborted.

Let’s sample 20% of the rows and assign the result as our data:

Note that since we are using a subset of the original data set, the results you generate here will differ slightly from those in the *Preprocess your data with recipes* article.

Check the number of delayed flights:
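
The sampling and counting steps can be sketched together as follows. The chunk rebuilds `flight_data` so that it runs on its own. The seed value and the use of `slice_sample()` are assumptions; any reproducible 20% sample would do.

```r
library(tidymodels)
library(nycflights13)

# Compact rebuild of `flight_data` so this chunk runs on its own
flight_data <- flights %>%
  mutate(
    arr_delay = factor(ifelse(arr_delay >= 30, "late", "on_time")),
    date = lubridate::as_date(time_hour)
  ) %>%
  inner_join(weather, by = c("origin", "time_hour")) %>%
  select(dep_time, flight, origin, dest, air_time, distance,
         carrier, date, arr_delay, time_hour) %>%
  na.omit() %>%
  mutate(across(where(is.character), as.factor))

# Keep a random 20% of the rows (assumed seed)
set.seed(123)
flight_data <- flight_data %>% slice_sample(prop = 0.2)

# Count and proportion of each outcome class
delay_counts <- flight_data %>%
  count(arr_delay) %>%
  mutate(prop = n / sum(n))

delay_counts
```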

For example, the numbers of `late` and `on_time` flights you get here are lower than those you see in the article. The proportions are very close, though, which suggests that the random sampling did not over- or under-sample one category relative to the other.

Take a look at data types and data points:

Summarise the dataset:
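
Both inspection steps can be sketched in one chunk. It rebuilds the sampled `flight_data` so that it runs on its own; summarising only `dest` and `carrier` follows the accompanying article and is an assumption here.

```r
library(tidymodels)
library(nycflights13)
library(skimr)

# Compact rebuild of the sampled `flight_data` so this chunk runs on its own
set.seed(123)  # assumed seed for the 20% sample
flight_data <- flights %>%
  mutate(
    arr_delay = factor(ifelse(arr_delay >= 30, "late", "on_time")),
    date = lubridate::as_date(time_hour)
  ) %>%
  inner_join(weather, by = c("origin", "time_hour")) %>%
  select(dep_time, flight, origin, dest, air_time, distance,
         carrier, date, arr_delay, time_hour) %>%
  na.omit() %>%
  mutate(across(where(is.character), as.factor)) %>%
  slice_sample(prop = 0.2)

# Column types and the first few values of each column
glimpse(flight_data)

# Summary of two factor columns: missingness, number of levels, top counts
dest_carrier_summary <- skim(flight_data, dest, carrier)
dest_carrier_summary
```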

## Data splitting

Create training and test sets:
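
A sketch of the split, assuming a 3/4 training proportion and the names `data_split` and `train_data` from the accompanying article (`test_data` is used later on this page). The seed is likewise an assumption.

```r
library(tidymodels)
library(nycflights13)

# Compact rebuild of the sampled `flight_data` so this chunk runs on its own
set.seed(123)  # assumed seed for the 20% sample
flight_data <- flights %>%
  mutate(
    arr_delay = factor(ifelse(arr_delay >= 30, "late", "on_time")),
    date = lubridate::as_date(time_hour)
  ) %>%
  inner_join(weather, by = c("origin", "time_hour")) %>%
  select(dep_time, flight, origin, dest, air_time, distance,
         carrier, date, arr_delay, time_hour) %>%
  na.omit() %>%
  mutate(across(where(is.character), as.factor)) %>%
  slice_sample(prop = 0.2)

# Put 3/4 of the rows into the training set (assumed seed and proportion)
set.seed(222)
data_split <- initial_split(flight_data, prop = 3/4)

train_data <- training(data_split)
test_data  <- testing(data_split)
```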

Try typing `?initial_split` in the console to get more details about the splitting function from the `rsample` package.

## Create recipe and roles

Let’s initiate a new recipe:

You can see more details about how to create **recipes** by typing `?recipe` in the console.

Update the variable roles of a recipe with `update_role()`:

You can also read more about adding, updating, and removing roles with `?roles`.

To get the current set of variables and roles, use the `summary()` function:
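
The three steps in this section (create the recipe, update roles, inspect them) can be sketched together. Treating `flight` and `time_hour` as identification variables with the role `"ID"` follows the accompanying article and is an assumption here. The chunk rebuilds the training data so that it runs on its own.

```r
library(tidymodels)
library(nycflights13)

# Compact rebuild of the sampled data and the split (assumed seeds)
set.seed(123)
flight_data <- flights %>%
  mutate(
    arr_delay = factor(ifelse(arr_delay >= 30, "late", "on_time")),
    date = lubridate::as_date(time_hour)
  ) %>%
  inner_join(weather, by = c("origin", "time_hour")) %>%
  select(dep_time, flight, origin, dest, air_time, distance,
         carrier, date, arr_delay, time_hour) %>%
  na.omit() %>%
  mutate(across(where(is.character), as.factor)) %>%
  slice_sample(prop = 0.2)

set.seed(222)
train_data <- training(initial_split(flight_data, prop = 3/4))

flights_rec <-
  # `arr_delay` is the outcome; every other column starts out as a predictor
  recipe(arr_delay ~ ., data = train_data) %>%
  # Keep the identifiers in the data, but do not use them as predictors
  update_role(flight, time_hour, new_role = "ID")

# One row per variable, with its type, role, and source
rec_roles <- summary(flights_rec)
rec_roles
```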

## Create features

What happens if we transform the `date` column to `numeric`?
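
A small sketch that answers the question. To keep it short, the dates are built directly from the raw `flights` table rather than from the wrangled data, and the name `date_numbers` is made up for this example.

```r
library(tidymodels)
library(nycflights13)

date_numbers <- flights %>%
  mutate(date = lubridate::as_date(time_hour)) %>%
  distinct(date) %>%
  # A Date converts to the number of days since 1970-01-01
  mutate(numeric_date = as.numeric(date))

date_numbers
```

The numeric version is a plain day counter. A model can pick up a linear trend from it, but not weekday, month, or holiday effects.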

From `date` we can derive more meaningful features such as:

- the day of the week,
- the month, and
- whether or not the date corresponds to a holiday.

Add **steps** to your recipe to generate these features:
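
A sketch of the three date steps, assuming the US holiday list from `timeDate::listHolidays("US")` as in the accompanying article. The `prep()`/`bake()` calls at the end are an addition, there only to show which columns the steps produce.

```r
library(tidymodels)
library(nycflights13)

# Compact rebuild of the sampled data and the split (assumed seeds)
set.seed(123)
flight_data <- flights %>%
  mutate(
    arr_delay = factor(ifelse(arr_delay >= 30, "late", "on_time")),
    date = lubridate::as_date(time_hour)
  ) %>%
  inner_join(weather, by = c("origin", "time_hour")) %>%
  select(dep_time, flight, origin, dest, air_time, distance,
         carrier, date, arr_delay, time_hour) %>%
  na.omit() %>%
  mutate(across(where(is.character), as.factor)) %>%
  slice_sample(prop = 0.2)

set.seed(222)
train_data <- training(initial_split(flight_data, prop = 3/4))

flights_rec <-
  recipe(arr_delay ~ ., data = train_data) %>%
  update_role(flight, time_hour, new_role = "ID") %>%
  # Day of the week and month, as factor columns date_dow and date_month
  step_date(date, features = c("dow", "month")) %>%
  # One 0/1 indicator column per US holiday
  step_holiday(date, holidays = timeDate::listHolidays("US")) %>%
  # Drop the original date column once the features are derived
  step_rm(date)

# Estimate the steps on the training data, then return the processed training set
feature_names <- names(bake(prep(flights_rec), new_data = NULL))
feature_names
```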

Check out the help documents for these step functions with `?step_date`, `?step_holiday`, and `?step_rm`.

Create dummy variables using `step_dummy()`:

Check whether some destinations present in the test set are missing from the training set:

Remove variables that contain only a single value with `step_zv()`:
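
The destination check and the last two steps can be sketched together. The selectors `all_nominal_predictors()` and `all_predictors()` are one reasonable choice, not necessarily the one used in the original chunk, and `test_only_dest` is a made-up name.

```r
library(tidymodels)
library(nycflights13)

# Compact rebuild of the sampled data and the split (assumed seeds)
set.seed(123)
flight_data <- flights %>%
  mutate(
    arr_delay = factor(ifelse(arr_delay >= 30, "late", "on_time")),
    date = lubridate::as_date(time_hour)
  ) %>%
  inner_join(weather, by = c("origin", "time_hour")) %>%
  select(dep_time, flight, origin, dest, air_time, distance,
         carrier, date, arr_delay, time_hour) %>%
  na.omit() %>%
  mutate(across(where(is.character), as.factor)) %>%
  slice_sample(prop = 0.2)

set.seed(222)
data_split <- initial_split(flight_data, prop = 3/4)
train_data <- training(data_split)
test_data  <- testing(data_split)

# Destinations that occur in the test set but never in the training set
test_only_dest <- test_data %>%
  distinct(dest) %>%
  anti_join(train_data, by = "dest")
test_only_dest

flights_rec <-
  recipe(arr_delay ~ ., data = train_data) %>%
  update_role(flight, time_hour, new_role = "ID") %>%
  step_date(date, features = c("dow", "month")) %>%
  step_holiday(date, holidays = timeDate::listHolidays("US")) %>%
  step_rm(date) %>%
  # Turn every factor predictor into 0/1 indicator columns
  step_dummy(all_nominal_predictors()) %>%
  # Drop predictors with a single value in the training set, such as the
  # indicator of a destination that only occurs in the test set
  step_zv(all_predictors())
```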

## Fit a model with a recipe

Recall the *Build a model* article.

This time we build a model specification for logistic regression using the `glm` engine:
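
A minimal sketch of the specification, using the name `lr_mod` that appears later in this section:

```r
library(tidymodels)

# Logistic regression, fitted with stats::glm() behind the scenes
lr_mod <-
  logistic_reg() %>%
  set_engine("glm")

lr_mod
```

At this point nothing has been fitted. The specification only records the model type and the engine.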

For more details, try typing `?set_engine` and `?glm` in the console.

Bundle the model specification (`lr_mod`) with the recipe (`flights_rec`) to create a *model workflow*:

Prepare the recipe and train the model:

Be patient; this step will take a little time to compute.

Pull the fitted model object, then use the `broom::tidy()` function to get a tidy tibble of model coefficients:
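
The workflow, fit, and coefficient steps can be sketched end to end. The chunk rebuilds the data, split, and recipe so that it runs on its own. `flights_wflow`, `flights_fit`, and `flights_coefs` are assumed names. `extract_fit_parsnip()` is used to pull the fitted model; older versions of `workflows` used `pull_workflow_fit()` for the same purpose.

```r
library(tidymodels)
library(nycflights13)

# Compact rebuild of the sampled data and the split (assumed seeds)
set.seed(123)
flight_data <- flights %>%
  mutate(
    arr_delay = factor(ifelse(arr_delay >= 30, "late", "on_time")),
    date = lubridate::as_date(time_hour)
  ) %>%
  inner_join(weather, by = c("origin", "time_hour")) %>%
  select(dep_time, flight, origin, dest, air_time, distance,
         carrier, date, arr_delay, time_hour) %>%
  na.omit() %>%
  mutate(across(where(is.character), as.factor)) %>%
  slice_sample(prop = 0.2)

set.seed(222)
train_data <- training(initial_split(flight_data, prop = 3/4))

flights_rec <-
  recipe(arr_delay ~ ., data = train_data) %>%
  update_role(flight, time_hour, new_role = "ID") %>%
  step_date(date, features = c("dow", "month")) %>%
  step_holiday(date, holidays = timeDate::listHolidays("US")) %>%
  step_rm(date) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors())

lr_mod <- logistic_reg() %>% set_engine("glm")

# Bundle the model specification and the recipe
flights_wflow <-
  workflow() %>%
  add_model(lr_mod) %>%
  add_recipe(flights_rec)

# Prepare the recipe on the training data and fit the model in one call
flights_fit <- fit(flights_wflow, data = train_data)

# Pull the fitted parsnip model and tidy its coefficients
flights_coefs <-
  flights_fit %>%
  extract_fit_parsnip() %>%
  tidy()

flights_coefs
```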

## Use a trained workflow to predict

Apply the fitted workflow to `test_data` to predict outcomes.

Get predicted class probabilities and bind them with some variables from the test data:

Note that the results you get here will differ from those in the online article, since we fitted the model to only a subset of the full data set.

Let’s look at model performance with a ROC curve (`roc_curve()`) and plot it by piping the result to `autoplot()`.

Similarly, `roc_auc()` estimates the area under the curve:
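
The prediction and evaluation steps of this section can be sketched end to end. The chunk refits the workflow so that it runs on its own, which takes a little time. `flights_pred`, `roc_plot`, and `flights_auc` are assumed names. `.pred_late` is the probability column that `predict(type = "prob")` creates for the `late` level.

```r
library(tidymodels)
library(nycflights13)

# Compact rebuild of the sampled data and the split (assumed seeds)
set.seed(123)
flight_data <- flights %>%
  mutate(
    arr_delay = factor(ifelse(arr_delay >= 30, "late", "on_time")),
    date = lubridate::as_date(time_hour)
  ) %>%
  inner_join(weather, by = c("origin", "time_hour")) %>%
  select(dep_time, flight, origin, dest, air_time, distance,
         carrier, date, arr_delay, time_hour) %>%
  na.omit() %>%
  mutate(across(where(is.character), as.factor)) %>%
  slice_sample(prop = 0.2)

set.seed(222)
data_split <- initial_split(flight_data, prop = 3/4)
train_data <- training(data_split)
test_data  <- testing(data_split)

flights_rec <-
  recipe(arr_delay ~ ., data = train_data) %>%
  update_role(flight, time_hour, new_role = "ID") %>%
  step_date(date, features = c("dow", "month")) %>%
  step_holiday(date, holidays = timeDate::listHolidays("US")) %>%
  step_rm(date) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors())

flights_fit <-
  workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_recipe(flights_rec) %>%
  fit(data = train_data)

# Hard class predictions: the recipe is applied to test_data automatically
predict(flights_fit, test_data)

# Class probabilities, bound to the outcome and the identifier columns
flights_pred <-
  predict(flights_fit, test_data, type = "prob") %>%
  bind_cols(test_data %>% select(arr_delay, time_hour, flight))

# ROC curve, treating "late" (the first factor level) as the event
roc_plot <-
  flights_pred %>%
  roc_curve(truth = arr_delay, .pred_late) %>%
  autoplot()
roc_plot

# Area under the ROC curve
flights_auc <- flights_pred %>% roc_auc(truth = arr_delay, .pred_late)
flights_auc
```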

Good job!

Now it’s your turn to test out this workflow *without* this recipe!

In the Build a model article, we did not use a recipe but used a **formula** instead.

You can use `workflows::add_formula(arr_delay ~ .)` instead of `add_recipe()` (remember to remove the identification variables first!), and see whether our recipe improved our model’s ability to predict late arrivals.