
@voltek62
Forked from jroakes/seoml.md
Created January 28, 2019 22:30
ML Repository for SEO

Machine Learning Repository for SEO

SEO is a data-rich field, yet many newer SEOs lack access to the tools that will prepare them for its future. We want to support our community by using our expertise to give SEOs of all levels access to more advanced tools, so they can experiment with the technologies that will shape the future of our work.

Objectives

  • Provide a repository that makes it possible to learn about ML, specifically targeted at those interested in SEO.
  • Provide a repository that allows a novice user to run a simple model on something meaningful for SEO.
  • Provide a repository that allows advanced users to save time on data collection, cleaning, preprocessing, and model selection.
  • Allow users to showcase the work and models they develop.
  • Encourage users to get involved in the future development of the repo.

Rough Structure

  • List popular repos, articles, and papers especially applicable to ML for SEO.
  • Model Zoo: Highlight models that SEOs have developed.
  • Provide API wrappers for popular data providers for getting data (Moz, Ahrefs, SEMrush, GSC, GA, Algorithmia, Aylien, etc.).
    • In addition, could we approach providers to supply clean toy datasets that are easy to train on?
  • Provide a crawler for collecting website data (good example: https://github.com/clips/pattern).
  • Provide preprocessing libraries especially suited to text and other common SEO data.
    • Text to embeddings
    • Pretrained language model states
    • TF-IDF
    • NLTK
    • FB and Google parsers
    • spaCy
  • Provide a library of popular ML models.
  • Provide modern feature selection methods, optimizers, and parameter tuning.
  • Provide easy-to-follow examples for getting data, preprocessing, and training.
  • [Future] A React interface.
  • Provide documentation.
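As a hedged illustration of the kind of easy-to-follow preprocessing example the repository could include, here is a minimal pure-Python TF-IDF sketch. It assumes simple lowercase alphanumeric tokenization and weights each term by its raw count times a smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1 (the smoothing scikit-learn's `TfidfVectorizer` uses by default, here without its L2 normalization). The function names `tokenize` and `tf_idf` and the sample page texts are hypothetical, not part of any existing library.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    weight = raw term count * idf(t), where
    idf(t) = ln((1 + N) / (1 + df(t))) + 1 and N is the number of documents.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    df: Counter[str] = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed inverse document frequency for every term in the corpus.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({term: tf * idf[term] for term, tf in counts.items()})
    return weights


if __name__ == "__main__":
    # Hypothetical page texts standing in for crawled content.
    pages = [
        "Technical SEO audit checklist for site speed",
        "Keyword research guide for SEO beginners",
        "Site speed tips: compress images and cache assets",
    ]
    for page, scores in zip(pages, tf_idf(pages)):
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
        print(page, "->", top)
```

Terms that appear on many pages receive a lower IDF and therefore a lower weight, which is what makes TF-IDF a common first pass for surfacing the terms that distinguish one page from another.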

References

Goal Set 1:

  • Create a new organization. What should it be called?
  • Review objectives and team members.
  • Decide on a framework (PyTorch, Keras, TensorFlow, Chainer, etc.).
  • Decide whether to start fresh or fork an existing repo.
  • Create a Slack group (or channel) for discussion, as needed.
  • Decide on initial goals and timing.