Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save marnovo/c8839f71a8de59fd70afbc512bbdc385 to your computer and use it in GitHub Desktop.
Save marnovo/c8839f71a8de59fd70afbc512bbdc385 to your computer and use it in GitHub Desktop.
code snippets - Feb2019
SELECT
files.file_path, ref_commits.repository_id, files.blob_content
FROM
files
NATURAL JOIN
commit_files
NATURAL JOIN
ref_commits
WHERE
ref_commits.ref_name = 'HEAD'
AND ref_commits.history_index BETWEEN 0 AND 5
AND is_binary(blob_content) = false
AND files.file_path NOT REGEXP '^vendor.*'
AND (
blob_content REGEXP '(?i)facebook.*[\'\\"][0-9a-f]{32}[\'\\"]'
OR blob_content REGEXP '(?i)twitter.*[\'\\"][0-9a-zA-Z]{35,44}[\'\\"]'
OR blob_content REGEXP '(?i)github.*[\'\\"][0-9a-zA-Z]{35,40}[\'\\"]'
OR blob_content REGEXP 'AKIA[0-9A-Z]{16}'
OR blob_content REGEXP '(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}'
OR blob_content REGEXP '.*-----BEGIN PRIVATE KEY-----.*'
OR blob_content REGEXP '.*-----BEGIN RSA PRIVATE KEY-----.*'
OR blob_content REGEXP '.*-----BEGIN DSA PRIVATE KEY-----.*'
OR blob_content REGEXP '.*-----BEGIN OPENSSH PRIVATE KEY-----.*'
);
SELECT
language,
COUNT(repository_id) AS repository_count
FROM (
SELECT DISTINCT
language(t.tree_entry_name, b.blob_content) AS language,
r.repository_id
FROM
refs r
JOIN commits c ON r.commit_hash = c.commit_hash
JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
JOIN tree_entries t ON ct.tree_hash = t.tree_hash
JOIN blobs b ON t.blob_hash = b.blob_hash
WHERE
r.ref_name REGEXP 'refs/heads/HEAD/.*' OR r.ref_name = 'HEAD'
) AS q1
GROUP BY language
ORDER BY repository_count DESC;

Linux Kernel 3D visualization w/ Tensorboard and t-SNE

Steps:

  1. Go to Tensorflow projector with loaded Linux kernel data.
  2. Select 'T-SNE' on the left panel in the opened browser
  3. (Stop if running) Set 'Learning Rate' to 1 and (re)run
  4. Wait with your browser window open as similar files will be coupled together (see notes)
  5. A few thousand iterations already give good visibility of the clusters

Notes:

  1. This is a mere example, running over a free service and inside the browser and not meant to represent a production environment
  2. Please note some browsers may require the window to be visible or active to give proper processing power
SELECT repository_id, count(*) as contributor_count
FROM (
SELECT DISTINCT repository_id, commit_author_email
FROM commits
) AS q
GROUP BY repository_id
ORDER BY contributor_count DESC
LIMIT 10;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment