All functions |
|
---|---|
Apply a basic recipe to a dataframe that includes a text column. The basic recipe includes tokenization (using bigrams), stop word removal, limiting the vocabulary to a maximum of 1,000 tokens, and normalization for document length using TF-IDF. |
|
Build a pipeline that creates tuning parameters, search spaces, workflows, and 10-fold cross-validation samples, finds the best model for each algorithm (lasso, random forest, and XGBoost), and fits each best model to the data. |
|
Evaluate a classification model output |
|
Create 10-fold cross-validation samples |
|
Create search spaces for the algorithms based on the hyperparameters |
|
Create tuning parameters for algorithms (i.e., lasso, random forest, and XGBoost). |
|
Create workflows for the algorithms based on the hyperparameters |
|
Find the best version of each algorithm based on the hyperparameters and 10-fold cross-validation. |
|
Fit the best model from each algorithm to the data. |
|
Create training and testing data using stratified random sampling (SRS) and preprocessing steps |
|
Visualize the importance of the top 20 features |
|
Visualize a classification model output |
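
The preprocessing and resampling steps described above can be sketched directly with tidymodels packages. This is a minimal illustration, not this package's own implementation: the toy data, the column names `text` and `category`, the 80/20 split, and the `n_min = 1` option (which keeps unigrams alongside bigrams so that stop word removal has single-word tokens to match) are all assumptions made for the example.

```r
library(recipes)
library(textrecipes) # step_stopwords() also needs the stopwords package installed
library(rsample)

set.seed(1234)

# Hypothetical toy data; `category` and `text` are illustrative column names
toy <- data.frame(
  category = factor(rep(c("a", "b"), each = 50)),
  text = rep(c("the quick brown fox jumps over the lazy dog",
               "a slow green turtle crawls under the old bridge"),
             each = 50),
  stringsAsFactors = FALSE
)

# Stratified train/test split (80/20 is an assumed proportion)
split <- initial_split(toy, prop = 0.8, strata = category)
train <- training(split)
test <- testing(split)

# Basic text recipe: unigrams and bigrams, stop word removal,
# a cap of 1,000 tokens, then TF-IDF weighting
rec <- recipe(category ~ text, data = train) |>
  step_tokenize(text, token = "ngrams", options = list(n = 2, n_min = 1)) |>
  step_stopwords(text) |>
  step_tokenfilter(text, max_tokens = 1000) |>
  step_tfidf(text)

# 10-fold cross-validation samples, stratified on the outcome
folds <- vfold_cv(train, v = 10, strata = category)
```

The recipe here is left unprepped so that it can be added to a workflow and estimated within each fold during tuning; call `prep()` and `bake()` on it to inspect the resulting TF-IDF features.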