Apply a basic recipe to a data frame that includes a text column. The basic recipe consists of tokenization (using bigrams), stop word removal, filtering tokens to a maximum of 1,000 (the default), and normalization of document length using TF-IDF.

apply_basic_recipe(
  input_data,
  formula,
  text,
  token_threshold = 1000,
  add_embedding = NULL,
  embed_dims = 100
)

Arguments

input_data: The input data frame.
formula: A formula that specifies the relationship between the outcome and predictor variables (e.g., category ~ text).
text: The name of the text column in the data.
token_threshold: The maximum number of tokens to be used in the classification. The default value is 1000.
add_embedding: Whether to add word embeddings for feature engineering. The default value is NULL. Replace NULL with TRUE if you want to add word embeddings.
embed_dims: Word embedding dimensions. The default value is 100.

Value

A prep object.
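
Examples

The steps listed in the description can be sketched with the recipes and textrecipes packages. This is only an illustration under stated assumptions, not the function's actual implementation: the toy data frame, the choice of step functions, and their ordering (stop words removed before bigrams are formed) are all hypothetical, and the stopwords package is assumed to be installed.

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data: an outcome column and a text column
toy <- data.frame(
  category = factor(c("sports", "politics", "sports", "politics")),
  text = c(
    "the team won the final match",
    "the senate passed the new bill",
    "the coach praised the young team",
    "the president signed the new bill"
  ),
  stringsAsFactors = FALSE
)

# Assumed equivalent of the basic recipe (no word embedding)
rec <- recipe(category ~ text, data = toy) %>%
  step_tokenize(text) %>%                                  # split text into word tokens
  step_stopwords(text) %>%                                 # drop stop words
  step_ngram(text, num_tokens = 2L, min_num_tokens = 2L) %>% # form bigrams
  step_tokenfilter(text, max_tokens = 1000) %>%            # cap the vocabulary size
  step_tfidf(text)                                         # TF-IDF weighting

# Estimate the recipe on the data, then extract the processed features
prepped <- prep(rec, training = toy)
features <- bake(prepped, new_data = NULL)
```

Here max_tokens plays the role that token_threshold plays in apply_basic_recipe(), and prepped corresponds to the kind of prepared object described under Value.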