jroakes/seoml.md

## seoml.md

      
Machine Learning Repository for SEO

SEO is a data-rich field, yet many newer SEOs may not be equipped to learn the tools that will prepare them for the future. We want to support our community by using our expertise to provide access to more advanced tools, so that SEOs of all levels can experiment with the technologies that will shape the future of our work.
Objectives


Provide a repository that makes it possible to learn about ML, targeted specifically at those interested in SEO.
Provide a repository that allows a novice user to run a simple model on something meaningful for SEO.
Provide a repository that allows advanced users to save time on data collection, cleaning, preprocessing, and model selection.
Allow users to showcase work and models developed.
Have users get involved with the future development of the repo.

Rough Structure


List popular repos, articles, and papers especially applicable to ML for SEO.
Model Zoo: Highlight models that SEOs have developed.
Provide APIs to popular data providers for getting data (Moz, Ahrefs, SEMrush, GSC, GA, Algorithmia, Aylien, etc.).

In addition, can we approach providers to supply toy datasets that are clean and easy to train on?


Provide a crawler for getting website data (good example: https://github.com/clips/pattern)
Provide preprocessing libraries especially suited to text and other common SEO data.

Text to embeddings
Pretrained language model states
TFIDF
NLTK
FB and Google Parsers
spaCy


Provide a library of popular ML models.
Provide modern feature selection, optimizers, and parameter tuning.
Provide easy-to-follow examples for getting data, preprocessing, and training.
[Future] A React interface.
Provide documentation.
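
As a rough illustration of the kind of text preprocessing listed above, here is a minimal TF-IDF sketch using only the Python standard library. It is not a proposed implementation for the repo: the function names (`tokenize`, `tfidf`), the sample page titles, and the weighting variant (term count divided by document length, times log(N / document frequency)) are all assumptions chosen for illustration. A real version would more likely lean on an existing library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical page titles, made up for illustration.
titles = [
    "Beginner guide to keyword research",
    "Advanced keyword research with Python",
    "Guide to technical site audits",
]
scores = tfidf(titles)
for title, title_weights in zip(titles, scores):
    ranked = sorted(title_weights.items(), key=lambda kv: kv[1], reverse=True)
    print(title, "->", [term for term, _ in ranked[:3]])
```

Terms that appear in only one title receive the highest weights, while terms shared across titles are discounted, which is the property that makes TF-IDF a common first step before training on page text.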

References


Awesome Machine Learning
Machine Learning From Scratch
Auto ML
Awesome Pytorch
Allenai
Pytorch NLP

Goal Set 1:


Create a new organization. What should it be called?
Review Objectives and Team Members.
Decide on a framework (PyTorch, Keras, TensorFlow, Chainer, etc.).
Decide if we start fresh or fork an existing repo.
Create a Slack group (or channel) for discussion, as needed.
Decide on initial goals and timing.