Machine Learning Repository for SEO
SEO is a data-rich field, yet many newer SEOs may not be equipped to learn the tools that will prepare them for the future. We want to support our community by using our expertise to provide access to more advanced tools, so that SEOs of all levels can experiment with the technologies that will shape the future of our work.
- Provide a repository that makes it possible to learn about ML, targeted specifically at those interested in SEO.
- Provide a repository that allows a novice user to run a simple model on something meaningful for SEO.
- Provide a repository that allows advanced users to save time on data collection, cleaning, preprocessing, and model selection.
- Allow users to showcase work and models developed.
- Have users get involved with the future development of the repo.
- List popular repos, articles, and papers especially applicable to ML for SEO.
- Model Zoo: Highlight models that SEOs have developed.
- Provide APIs to popular data providers for getting data (Moz, Ahrefs, SEMrush, GSC, GA, Algorithmia, Aylien, etc.)
- In addition, can we approach providers to supply toy datasets that are clean and easy to train on?
- Provide a crawler for getting website data (good example: https://github.com/clips/pattern)
- Provide preprocessing libraries especially suited for text and other common SEO data.
- Text to embeddings
- Pretrained language model states
- FB and Google Parsers
- Provide a library of popular ML models.
- Provide modern feature selection, optimizers, and parameter tuning.
- Provide easy-to-follow examples for getting data, preprocessing, and training.
- [Future] A React interface.
- Provide documentation.
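
As a rough illustration of the "text to embeddings" preprocessing item above, the sketch below turns a few page titles into sparse TF-IDF vectors and compares them with cosine similarity. It is only a minimal, standard-library stand-in: TF-IDF is used here as the simplest possible vectorization rather than a learned embedding, and the function names (`tokenize`, `tfidf_vectors`, `cosine`) and the sample titles are hypothetical, not part of any planned API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of token -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each token at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            tok: (count / total) * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
            for tok, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical page titles, made up for illustration only.
titles = [
    "Best running shoes for beginners",
    "Running shoes buying guide for beginners",
    "How to bake sourdough bread at home",
]
vectors = tfidf_vectors(titles)
sim_related = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

The two shoe-related titles share several tokens and so score higher than the pair with no vocabulary in common. A real pipeline in this repo would more likely rely on an established library or pretrained language model for this step.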
Goal Set 1:
- Create a new organization. What is this called?
- Review Objectives and Team Members.
- Decide on a framework (PyTorch, Keras, TensorFlow, Chainer, etc.)
- Decide if we start fresh or fork an existing repo.
- Create Slack group (or Channel) to discuss, as needed.
- Decide on initial goals and timing.