@warenlg
Last active September 5, 2018 12:38
@EgorBu

EgorBu commented Sep 5, 2018

Hi, thanks for the analysis!
Is it possible to add the distribution of repositories by JS file count?

@warenlg
Author

warenlg commented Sep 5, 2018

You can find the initial distribution, compiled from the PGA CSV file, in this gist: https://gist.github.com/warenlg/44bd576637ee161929a3f7e1a88554f5

However, you'll see that the numbers don't match, for two reasons:

  1. The step that preprocesses all of PGA into parquet files misses some entries.
  2. At the time it was run, the preprocess command from src-d/ml did not include lang in the output parquet files. So I had to filter by file extension, and that way I missed a lot of files.
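
To illustrate the second point, here is a minimal sketch of counting JS files per repository by file extension and turning the counts into a distribution. It is not the notebook's code: the `(repo, path)` row layout, the `JS_EXTENSIONS` set, and all function names are assumptions made up for the example, and it does not use the src-d/ml API.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumption: which extensions count as JavaScript. Hypothetical choice.
JS_EXTENSIONS = {".js", ".jsx"}


def is_js(path):
    """Guess the language from the file extension only (no lang field)."""
    return PurePosixPath(path).suffix.lower() in JS_EXTENSIONS


def js_files_per_repo(rows):
    """Count JS files per repository from (repo, path) pairs."""
    counts = Counter()
    for repo, path in rows:
        if is_js(path):
            counts[repo] += 1
    return counts


def distribution(counts):
    """Map 'number of JS files' -> 'number of repositories with that many'."""
    return Counter(counts.values())


# Hypothetical rows standing in for the preprocessed file list.
rows = [
    ("repo-a", "src/index.js"),
    ("repo-a", "src/App.jsx"),
    ("repo-a", "README.md"),
    ("repo-b", "lib/main.js"),
    ("repo-b", "bin/cli"),  # a JS script without an extension is missed
    ("repo-c", "setup.py"),
]

per_repo = js_files_per_repo(rows)
dist = distribution(per_repo)
```

The `bin/cli` row shows why an extension filter undercounts: any JS file without a recognised extension is silently dropped, which a per-file lang field would avoid.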

@warenlg
Author

warenlg commented Sep 5, 2018

I put the shareable link to the CSV file at the beginning of the notebook. Here it is as well, just in case: https://drive.google.com/open?id=1es02UUFUWlR9k4hswCSQCAsSOqjma06y
