R/classify_text.r
apply_basic_recipe.Rd
Apply a basic preprocessing recipe to a data frame that includes a text column. The basic recipe tokenizes the text (using bigrams), removes stop words, keeps at most token_threshold tokens (1,000 by default), and normalizes for document length using TF-IDF.
apply_basic_recipe(
input_data,
formula,
text,
token_threshold = 1000,
add_embedding = NULL,
embed_dims = 100
)
input_data: The input data frame.
formula: A formula that specifies the relationship between the outcome and predictor variables (e.g., category ~ text).
text: The name of the text column in the data.
token_threshold: The maximum number of tokens to use in the classification. Defaults to 1000.
add_embedding: Whether to add word embeddings for feature engineering. Defaults to NULL (no embeddings); set to TRUE to add them.
embed_dims: The number of word embedding dimensions. Defaults to 100.
A prep object.
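The steps named in the description can be sketched directly with the recipes and textrecipes packages. This is only an illustration under stated assumptions, not the function's actual source: the toy data frame, its column names (category, text), and the exact step arguments are hypothetical, and the stop word step needs the stopwords package installed.

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data; the column names are assumptions for illustration.
toy <- data.frame(
  category = factor(c("a", "b", "a", "b")),
  text = c(
    "the cat sat on the mat",
    "dogs chase the red ball",
    "a cat naps on the sofa",
    "the dog fetches a ball"
  ),
  stringsAsFactors = FALSE
)

rec <- recipe(category ~ text, data = toy) %>%
  # Tokenize the text column into bigrams.
  step_tokenize(text, token = "ngrams", options = list(n = 2)) %>%
  # Drop tokens that exactly match an entry in the stop word list.
  step_stopwords(text) %>%
  # Keep at most 1,000 of the most frequent tokens.
  step_tokenfilter(text, max_tokens = 1000) %>%
  # Weight the remaining tokens by TF-IDF.
  step_tfidf(text)

# Estimate the recipe on the data; this returns a prepped recipe.
prepped <- prep(rec, training = toy)

# Extract the processed training data (one row per document).
baked <- bake(prepped, new_data = NULL)
```

Under the same assumptions, the equivalent call to the documented function would be apply_basic_recipe(toy, category ~ text, text), with token_threshold left at its default of 1000.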