reidsanders / code_scrape.py
Last active August 12, 2020 08:27
Basic script to create a code dataset by downloading repos from GitHub
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Basic script to create a code dataset by downloading code from GitHub.
Takes a GitHub search URL that specifies the language and the repository query, goes through the result pages, downloads each repository, and extracts its code files. The extracted files are then concatenated and split into train and validation sets (the split is not really randomized, since doing that properly is a bit complicated in this script).
Has several hard-coded assumptions.
"""