#!python3
# -*- coding: utf-8 -*-
"""
Basic script to create a code dataset by downloading code from GitHub.
Takes a URL representing a language and repository search, then goes through the result pages, downloading the repositories and extracting the code files. The extracted files are concatenated and split into train and val sets (the split is not really randomized, since that is a bit complicated here).
Has several hardcoded assumptions.
"""