Last active September 5, 2018 12:38
I put the shareable link to the CSV file at the beginning of the notebook, just in case: https://drive.google.com/open?id=1es02UUFUWlR9k4hswCSQCAsSOqjma06y
You can find the initial distribution compiled from the PGA CSV file in this gist: https://gist.github.com/warenlg/44bd576637ee161929a3f7e1a88554f5
However, you'll see that the numbers don't match. The reason: src-d/ml did not include `lang` in the output Parquet files, so I had to filter by file extension instead, and that approach missed a lot of files.
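To illustrate why filtering by extension undercounts, here is a minimal sketch. It assumes the file paths have already been read out of the Parquet files (for example with `pandas.read_parquet`). The extension map and the `guess_lang` / `lang_distribution` helpers are hypothetical and not part of src-d/ml. Files without a recognized extension (`Makefile`, `Dockerfile`, extensionless scripts) simply drop out of the count.

```python
import os
from collections import Counter

# Hypothetical, deliberately small extension-to-language map.
# Real language detectors use many more rules (filenames, shebangs, content heuristics).
EXT_TO_LANG = {
    ".py": "Python",
    ".go": "Go",
    ".js": "JavaScript",
    ".java": "Java",
    ".c": "C",
    ".h": "C",  # ambiguous in practice: could also be C++ or Objective-C
}


def guess_lang(path):
    """Return the language guessed from the file extension, or None if it is not mapped."""
    _, ext = os.path.splitext(path)
    return EXT_TO_LANG.get(ext.lower())


def lang_distribution(paths):
    """Count files per guessed language and tally the files that could not be classified."""
    counts = Counter()
    missed = 0
    for path in paths:
        lang = guess_lang(path)
        if lang is None:
            missed += 1  # no extension or unmapped extension: lost by this filter
        else:
            counts[lang] += 1
    return counts, missed


if __name__ == "__main__":
    sample = ["main.go", "lib/util.py", "Makefile", "Dockerfile", "include/api.h", "script"]
    counts, missed = lang_distribution(sample)
    print(dict(counts))
    print("missed:", missed)
```

In this toy sample, half of the files are never attributed to any language, which is the same kind of gap that makes the extension-based numbers diverge from the PGA CSV distribution.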