@jroakes

jroakes/seoml.md

Last active Aug 31, 2020
ML Repository for SEO

Machine Learning Repository for SEO

SEO is a data-rich field, yet many SEOs early in their careers may not be equipped to learn the tools that will prepare them for the future. We want to support our community by using our expertise to provide access to more advanced tools, so that SEOs of all levels can experiment with the technologies that will shape the future of our work.

Objectives

  • Provide a repository that makes it possible to learn about ML, targeted specifically at those interested in SEO.
  • Provide a repository that allows a novice user to run a simple model on something meaningful for SEO.
  • Provide a repository that allows advanced users to save time on data collection, cleaning, preprocessing, and model selection.
  • Allow users to showcase work and models developed.
  • Have users get involved with the future development of the repo.

Rough Structure

  • List popular repos, articles, and papers especially applicable to ML for SEO.
  • Model Zoo: Highlight models that SEOs have developed.
  • Provide APIs to popular data providers for getting data (Moz, Ahrefs, SEMrush, GSC, GA, Algorithmia, Aylien, etc.)
    • In addition, can we approach providers to supply toy datasets that are clean and easy to train on?
  • Provide crawler for getting website data (good example: https://github.com/clips/pattern)
  • Provide preprocessing libraries especially suited for text and other common SEO data.
    • Text to embeddings
    • Pretrained language model states
    • TF-IDF
    • NLTK
    • FB and Google Parsers
    • spaCy
  • Provide library of popular ML models.
  • Provide modern feature selection, optimizers, and parameter tuning.
  • Provide easy-to-follow examples for getting data, preprocessing, and training.
  • [Future] A React interface.
  • Provide documentation.
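
As a rough illustration of the kind of preprocessing example the repository could ship, here is a minimal TF-IDF sketch in plain Python. The function names (`tokenize`, `tfidf`) and the sample page titles are hypothetical and chosen only for this example. It uses the basic unsmoothed weighting (term frequency times log of inverse document frequency), which differs from the smoothed variants that libraries such as scikit-learn apply by default.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document, using unsmoothed TF-IDF."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical page titles standing in for real SEO data.
titles = [
    "technical seo audit checklist",
    "seo keyword research guide",
    "machine learning for seo",
]
scores = tfidf(titles)
```

In this sample, "seo" appears in every title and so receives a weight of zero, while terms unique to one title (such as "audit") receive the highest weights. That is the property that makes TF-IDF useful for surfacing the distinguishing terms of a page.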

References

Goal Set 1:

  • Create a new organization. What should it be called?
  • Review Objectives and Team Members.
  • Decide on a framework (PyTorch, Keras, TensorFlow, Chainer, etc.)
  • Decide if we start fresh or fork an existing repo.
  • Create a Slack group (or channel) for discussion, as needed.
  • Decide on initial goals and timing.

@BritneyMuller commented Jul 16, 2018

This looks great, JR!!!

New Org ideas:

  • Technical SEOs
  • SEO Cyborgs
  • ML for SEO
  • Russ Jones is a dingus
  • Machine Learning for SEOs

Track Usage Idea:
Since GitHub does not allow JavaScript to run inside plain-text Gists, we can use the GA Beacon to log visits to Gists in real time.
![Analytics](https://ga-beacon.appspot.com/UA-XXXXX-X/gist-id?pixel)
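
If several Gists need a beacon, the Markdown line above could be generated rather than typed by hand. The helper below is a small sketch that follows the URL pattern shown in the snippet. The function name `beacon_markdown` and its parameters are hypothetical, and the tracking ID is the same placeholder as above.

```python
from urllib.parse import quote


def beacon_markdown(tracking_id, page_path, alt_text="Analytics"):
    """Build the Markdown image tag for a GA Beacon tracking pixel."""
    # Percent-encode the path so unusual characters do not break the URL.
    safe_path = quote(page_path)
    return (
        f"![{alt_text}]"
        f"(https://ga-beacon.appspot.com/{tracking_id}/{safe_path}?pixel)"
    )


# Placeholder tracking ID and Gist identifier, as in the snippet above.
snippet = beacon_markdown("UA-XXXXX-X", "gist-id")
```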
