Łukasz Gryglicki (lukaszgryglicki)

lukaszgryglicki / eclipse.csv
Created May 22, 2017 15:48
Eclipse's GitHub org repositories, sorted by number of commits (descending)
repo commits
eclipse/che 7089
eclipse/jetty.project 2274
eclipse/kura 1088
eclipse/kapua 1068
eclipse/omr 639
eclipse/xtext-core 603
eclipse/xtext-xtend 504
eclipse/californium 431
eclipse/xtext-eclipse 400
lukaszgryglicki / cloudfoundry.csv
Created May 22, 2017 13:14
Cloud Foundry repositories' commits in 201605-201704
repo commits
cloudfoundry/buildpacks-ci 18696
cloudfoundry/bosh 2332
cloudfoundry/runtime-ci 1804
cloudfoundry/cf-release 1740
cloudfoundry/cli 1590
cloudfoundry/diego-release 1484
cloudfoundry/uaa 1460
cloudfoundry/loggregator 1381
cloudfoundry/cf-deployment 1323
lukaszgryglicki / top50_big_query.sql
Last active April 26, 2017 07:06
Top 50 repos, excluding all authors whose name contains "bot" (case-insensitive), forks, etc.
SELECT
  org.login AS org,
  repo.name AS repo,
  COUNT(*) AS activity,
  SUM(IF(type = 'IssueCommentEvent', 1, 0)) AS comments,
  SUM(IF(type = 'PullRequestEvent', 1, 0)) AS prs,
  SUM(IF(type = 'PushEvent', 1, 0)) AS commits,
  SUM(IF(type = 'IssuesEvent', 1, 0)) AS issues,
  EXACT_COUNT_DISTINCT(JSON_EXTRACT(payload, '$.commits[0].author.email')) AS authors
FROM (
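The query preview above is truncated, but its SELECT list shows what is computed per repo. What follows is a minimal Ruby sketch of the same aggregation over in-memory events. The `Event` struct is hypothetical, and applying the bot filter to the actor login is an assumption, because the actual WHERE clause is not shown.

```ruby
require 'set'

# Hypothetical event shape; real rows come from the githubarchive tables.
Event = Struct.new(:org, :repo, :type, :actor, :author_email)

# Event types counted in separate columns, as in the query's SUM(IF(...)) expressions.
COUNTED_TYPES = {
  'IssueCommentEvent' => :comments,
  'PullRequestEvent'  => :prs,
  'PushEvent'         => :commits,
  'IssuesEvent'       => :issues
}.freeze

# Aggregates activity per [org, repo], skipping actors whose login
# contains "bot" (case-insensitive; the filtered field is an assumption).
def aggregate(events)
  stats = Hash.new do |h, key|
    h[key] = { activity: 0, comments: 0, prs: 0, commits: 0, issues: 0, emails: Set.new }
  end
  events.each do |e|
    next if e.actor.to_s.match?(/bot/i)
    s = stats[[e.org, e.repo]]
    s[:activity] += 1
    column = COUNTED_TYPES[e.type]
    s[column] += 1 if column
    # author_email is expected to hold the first commit's author email, as in the query.
    s[:emails] << e.author_email if e.author_email
  end
  # Replace the email set with its size (distinct authors).
  stats.transform_values do |s|
    s.reject { |k, _| k == :emails }.merge(authors: s[:emails].size)
  end
end

# Top n repos by total activity, as [[org, repo], stats] pairs.
def top(stats, n = 50)
  stats.sort_by { |_, s| -s[:activity] }.first(n)
end
```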
lukaszgryglicki / README.txt
Last active April 21, 2017 13:07
9 CNCF projects: grouped data for the last 12 months (201604-201703)
1. Use the cncf.io page and then all 9 projects' repos on GitHub (this is the base).
2. Use `finder.sql` with all 9 projects to find their repos. Some of them moved during the last year, so a complex condition is needed to get all CNCF projects' data (possibly some extra repos too, which the Ruby tool post-processes).
3. Get the final condition and save the results to a Google Sheet.
4. File `data.csv` is updated manually (a project value is added where it differs from the org).
5. File `data.csv` is processed by `analysis.rb` to produce `projects.csv`, which is used as the Google Sheet chart source.
Files:
`finder.sql` to look up GitHub repos
`final.sql` final BigQuery query that produces the input data for the Ruby tool
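Step 5 can be sketched as follows. This is not the actual `analysis.rb`. It assumes `data.csv` has hypothetical `org`, `repo`, `project` and numeric columns. An empty project falls back to the org, matching step 4.

```ruby
require 'csv'

# Sums a numeric column per project. In data.csv the project is only filled in
# where it differs from the org, so an empty value falls back to the org.
def project_totals(csv_text, column)
  totals = Hash.new(0)
  CSV.parse(csv_text, headers: true).each do |row|
    project = row['project'].to_s.strip
    project = row['org'] if project.empty?
    totals[project] += row[column].to_i
  end
  totals
end

# Renders the totals as CSV, largest first, e.g. as a chart source.
def to_projects_csv(totals, column)
  CSV.generate do |csv|
    csv << ['project', column]
    totals.sort_by { |_, v| -v }.each { |project, v| csv << [project, v] }
  end
end
```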
lukaszgryglicki / README.txt
Last active February 22, 2022 12:52
Top 30 projects from March 2017 (non-forked, no bot counting) with project hints from the top 30 for 201604-201703
These are the top 30 projects from March 2017.
It uses the same algorithm as the Top 30 for 201604-201703, but skips all activity of users with name LIKE '%bot%' and takes only 2017-03 data.
Non-forked repos, no bot activity; everything else is the same as in the previous top 30s.
`data.csv` is a file generated by BigQuery `query.sql`.
I've manually (iteratively) added projects to the repos there (the query doesn't return a "project" column, because there is no such data in BigQuery's githubarchive table).
Created the Ruby tool `analysis.rb`, which takes input from `data.csv` and `hint.csv` (`hint.csv` is the output of last year's top 30 BigQuery query, with manually added projects).
hint.csv is used there: https://docs.google.com/spreadsheets/d/1IDkNpQ1Xa_zIsf8askHswxwW1GceXNoStdtnRm_PJVs/edit#gid=1169691230
`analysis.rb` loads hint.csv and creates a repo --> project mapping.
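Here is a minimal sketch of the hint lookup. It is not taken from `analysis.rb`. It assumes `hint.csv` has `repo` and `project` columns and that data rows are hashes with `:repo` and `:project` keys, all of which are hypothetical names. Repos that no hint covers are returned so they can be assigned manually, as in the iterative process described above.

```ruby
require 'csv'

# Builds a repo => project lookup from hint.csv text, ignoring rows without a project.
def load_hints(hint_csv_text)
  CSV.parse(hint_csv_text, headers: true).each_with_object({}) do |row, map|
    project = row['project'].to_s.strip
    map[row['repo']] = project unless project.empty?
  end
end

# Fills in missing projects from the hints (mutating the rows) and returns
# the repos that still need a manual project assignment.
def apply_hints(rows, hints)
  unresolved = []
  rows.each do |row|
    next unless row[:project].to_s.empty?
    if hints.key?(row[:repo])
      row[:project] = hints[row[:repo]]
    else
      unresolved << row[:repo]
    end
  end
  unresolved
end
```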
lukaszgryglicki / README.txt
Created April 21, 2017 08:23
Check whether Angular has more issues than Kubernetes (Angular and Kubernetes repos combined, all time; statistics collected manually from each separate GitHub repo)
For each Angular and Kubernetes repo (among those that were in the Top 50), go to its GitHub page manually and:
Click Issues and get the open and closed counts
Click Pull Requests and get the open and closed counts
Click Code and get: commits, authors, branches, releases, watch, star, fork
Repeat for all kubernetes and angular repos.
Save this data manually in `all_time.csv`
Created the Ruby tool `manual.rb`, which:
- Sums data per org (kubernetes and angular)
- Groups by org; the summary row is named "repo1+repo2+...+repoN"
- Sums all values except "authors" (the same authors can appear in several repos); for authors it uses the max over all repos.
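The summing rules above can be sketched like this. This is not the original tool. It assumes each row of `all_time.csv` has been loaded into a hash with `:org`, `:repo` and numeric columns, and those names are hypothetical.

```ruby
# Combines per-repo rows into one summary row per org: numeric columns are
# summed, except :authors, where the max is taken because the same person
# may be counted in several repos.
def summarize_by_org(rows, numeric_cols)
  rows.group_by { |r| r[:org] }.map do |org, repos|
    summary = { org: org, repo: repos.map { |r| r[:repo] }.join('+') }
    numeric_cols.each do |col|
      values = repos.map { |r| r[col].to_i }
      summary[col] = col == :authors ? values.max : values.sum
    end
    summary
  end
end
```

Taking the max for authors gives a lower bound on the number of distinct authors per org. Summing would double-count people who contribute to several repos.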
lukaszgryglicki / README.txt
Last active April 20, 2017 13:22
Clearbit batch query
File `input.csv` is taken from cncf/gitdm's output on the kubernetes/kubernetes repo (a list of unknown emails).
File `input_enriched.csv` comes from Clearbit batch enrichment (it cost $45, so it is not attached here).
File `analysis.rb` is a Ruby tool that tries to extract employers from the enriched data. It first generates `companies.txt` for emails where an exact email -> employer mapping was found, then falls back to email -> full name in `names.txt` (these names need to be searched manually).
File `companies.txt` contains the exact mappings found. It must be adjusted manually, because the same company sometimes appears under slightly different names.
File `names.txt` is for manual work to find other missing employments.
Work is like this:
Run cncfdm.py locally and see the top missing developers. Search for them in `names.txt`, and finally try to locate them with a standard Google search.
Repeat until results are OK.
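Here is a sketch of the split between exact employer matches and name-only fallbacks. It assumes each enriched record has already been reduced to a hash with hypothetical `:email`, `:employer` and `:full_name` keys. The real Clearbit batch output format is not shown here.

```ruby
# Splits enriched records into exact email => employer matches (for companies.txt)
# and email => full name fallbacks (for names.txt). Records with neither
# are returned as still unknown.
def split_enriched(records)
  companies = {}
  names = {}
  unknown = []
  records.each do |rec|
    employer = rec[:employer].to_s.strip
    full_name = rec[:full_name].to_s.strip
    if !employer.empty?
      companies[rec[:email]] = employer
    elsif !full_name.empty?
      names[rec[:email]] = full_name
    else
      unknown << rec[:email]
    end
  end
  [companies, names, unknown]
end
```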
lukaszgryglicki / README.txt
Last active April 19, 2017 12:52
Kubernetes test-infra authors' activity
data.csv comes from the top 50 BigQuery query, manually edited (a "projects" column was added to group by project).
analysis.rb - Ruby tool for analysing data.csv
results.txt - output of analysis.rb
getdata.sql - to get data about kubernetes/test-infra from BigQuery
lukaszgryglicki / README.txt
Last active April 19, 2017 11:16
data.csv contains the top 500 repos (as defined in other gists); analysis.rb is a Ruby tool for analysing repos and orgs, and for merging repos from orgs into projects
We gather data from data.csv and group by project if one is defined, otherwise by org, and finally by repo.
Then stop at pry.debug and examine/update the data.
Finally save generated structures to:
projects.csv
repos.csv
combined.csv
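The grouping order described above (project if defined, otherwise org, then repo) can be sketched as a nested hash. This is not the original `analysis.rb`, and the row keys (`:org`, `:repo`, `:project`, `:commits`) are hypothetical.

```ruby
# Builds a group => org => repo => commits tree, where the group is the
# project when one is defined and the org otherwise.
def build_tree(rows)
  tree = Hash.new { |h, k| h[k] = Hash.new { |h2, k2| h2[k2] = {} } }
  rows.each do |r|
    group = r[:project].to_s.empty? ? r[:org] : r[:project]
    tree[group][r[:org]][r[:repo]] = r[:commits]
  end
  tree
end

# Total commits per top-level group (project or org).
def group_totals(tree)
  tree.transform_values { |orgs| orgs.values.sum { |repos| repos.values.sum } }
end
```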
Gmail (373):
aanm90@gmail.com aaronjlevy@gmail.com ablock84@gmail.com adi.ofry@gmail.com admpyle@gmail.com adnanh@gmail.com afe.young@gmail.com agonzalezro@gmail.com ahmetalpbalkan@gmail.com ainonic@gmail.com akram.benaissi@gmail.com alakriti@gmail.com alanwill81@gmail.com albatross0@gmail.com aledbf@gmail.com ales.nosek@gmail.com alex.chesser@gmail.com alexdwanerobinson@gmail.com alfredo.espejel.corvera@gmail.com allan.caffee@gmail.com anantharamu@gmail.com ancosen@gmail.com andrei.kopats@gmail.com andrew.stuart2@gmail.com andrewmoorewatson@gmail.com anshichao.cn@gmail.com ant.mironov@gmail.com anthony.elizondo@gmail.com antmanler@gmail.com apelisse@gmail.com aps.sids@gmail.com argregoryian@gmail.com arisu1000@gmail.com aronchick@gmail.com arun.gupta@gmail.com arve.knudsen@gmail.com ashw7n@gmail.com avesh.ncsu@gmail.com avinash.sridharan@gmail.com bastien974@gmail.com bayualdiyansyah@gmail.com bearnard@gmail.com beeradb@gmail.com ben.the.elder@gmail.com bjoern.erik.strand@gmail.com bobintornado@gmail.com b