Steps for evaluating salsa success:
- Used an API to retrieve information on 500 recipes that contain the query string “salsa” and are classified as a “condiment or sauce”.
- Cleaned and stemmed the ingredients from these recipes as best I could (see util.py). Note that some ingredients were listed ambiguously, for example plain onion versus red, white, or green onion; I did not standardize these.
- This process produced a binary feature matrix with one row per recipe (500 rows by 228 ingredient columns). Each entry is a zero or one indicating the absence or presence of an ingredient. (Information about amounts was much trickier to obtain and standardize.)
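
The feature-matrix step above can be sketched roughly as follows. This is a minimal illustration on made-up toy data, not the code from util.py: the function name `build_feature_matrix` and the three sample recipes are assumptions introduced here, and the input is assumed to be ingredient lists that have already been cleaned and stemmed.

```python
def build_feature_matrix(recipes):
    """Build a 0/1 presence matrix from cleaned ingredient lists.

    `recipes` is a list of lists of ingredient names (assumed to be
    already cleaned and stemmed). Returns the sorted ingredient
    vocabulary and one row of zeros and ones per recipe.
    """
    # One column per distinct ingredient, in alphabetical order.
    vocabulary = sorted({ing for recipe in recipes for ing in recipe})
    column = {ing: j for j, ing in enumerate(vocabulary)}

    matrix = [[0] * len(vocabulary) for _ in recipes]
    for i, ingredients in enumerate(recipes):
        for ing in ingredients:
            matrix[i][column[ing]] = 1  # presence only, amounts are ignored
    return vocabulary, matrix


# Hypothetical toy data standing in for the 500 real recipes.
recipes = [
    ["tomato", "onion", "cilantro"],
    ["tomato", "red onion", "lime"],
    ["mango", "lime", "cilantro"],
]

vocabulary, matrix = build_feature_matrix(recipes)
for name, row in zip(["recipe 1", "recipe 2", "recipe 3"], matrix):
    print(name, row)
```

In this sketch "onion" and "red onion" remain separate columns, which mirrors the choice not to standardize ambiguous ingredient names.
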
- Ran an ordered logistic regression model to predict these recipes' ratings (on a scale of 1 to 5), using the 60 most common ingredients as covariates (these ingredients described 90% of all recipes). The significant variables (p < 0.05) were: