@rubyroobs
Last active April 8, 2024 11:11
a starting point for investigating anomalous contributions in git repositories
# ruby's git repo investigation zsh one-liner-ish thingy
# (a starting point for investigating anomalous contributions in git repositories)
echo "$(find . -type f ! -size 0 ! -path './.git*' -exec grep -IL . "{}" \;)" | \
sed -e "s/^\.\///g" | \
while IFS= read -r line; \
do \
echo ">>>>>>>>$line"; \
echo "$(git log --follow --find-renames=40% --pretty=format:"%ad%x0A%h%x0A%an%x20<%ae>%x0A%s" -- "$line" | head -n 4)"; \
commitdates="$(git log --follow --find-renames=40% --pretty=format:"%ae" -- "$line" | head -n 1 | xargs -I {} git log --author={} --pretty=format:"%ad")"; \
echo "$(echo -n "$commitdates" | grep -c '^') commits authored by email (first commit on $(echo -n "$commitdates" | tail -n 1))"; \
echo "========binwalk"; \
binwalk --disasm --signature --opcodes --extract --matryoshka --depth=32 --rm "$line"; \
echo "^^^^^^^^strings"; \
echo "$(strings "$line" | grep -E "^.*([a-z]{3,}|[A-Z]{3,}|[A-Z][a-z]{2,}).*$" | sort | uniq -c | sort -nr | awk '{$1="";print}' | sed 's/^.//' | head -n 25)"; \
echo "<<<<<<<<"; \
done | less # or save it to a file with ">> output.txt", up to you!
# before/after running you might want to clean up the repo to HEAD state
# if there are untracked files (especially matryoshka extractions etc) the commit data might just be completely wrong
# WARNING: THIS WILL DELETE ALL UNTRACKED FILES IN YOUR GIT REPO!!!!!
# git clean -fxd
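#
# if you'd rather see what would be deleted before running the command above, here is a minimal
# sketch using git clean's dry-run flag (-n). preview_clean is a made-up helper name for
# illustration, it is not part of the one-liner.

```shell
# preview_clean [REPO]: list what `git clean -fxd` would delete, without deleting anything.
# -n = dry run, -x = include ignored files, -d = include untracked directories.
preview_clean() {
  git -C "${1:-.}" clean -nxd
}
```

# run `preview_clean` from the repo root (or pass a path); each line it prints starts with "Would remove".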
#
# what it does:
# - for each non-empty binary file in the working tree (skipping anything matching ./.git*)
# - prints the filename
# - gets the last commit changing it and prints the date, short commit sha, author name/email and commit subject
# - does very crude stats about the author's contributions from that email (first commit date and number of commits - bear in mind this couuuuld be spoofed)
# - prints binwalk output running in matryoshka extraction mode (it'll keep extracted nested archives etc...)
# - prints the 25 most frequent strings that look of interest (ABC, abc, Abc or longer words)
# - pipes it to less or whatever you prefer
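#
# the least obvious step is probably the binary detection, so here it is pulled out on its own
# as a rough sketch. list_binary_files is a made-up name for illustration. it relies on grep's
# -I (treat binary files as non-matching) plus -L (list files with no match): a text file with
# any character in it matches "." and is skipped, a binary file never matches and gets listed.
# caveat: a text file containing only blank lines would also be listed.

```shell
# list_binary_files: print non-empty binary files under the current directory,
# skipping ./.git*, with the leading "./" stripped (same filters as the one-liner).
list_binary_files() {
  find . -type f ! -size 0 ! -path './.git*' -exec grep -IL . {} \; \
    | sed -e 's|^\./||'
}
```

# run it from the repo root, e.g. `list_binary_files | head`, to check what the loop would visit.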
#
# caveats/warnings:
# - only tested on macos zsh with my local env
# - i dont remember which grep/git/posix utils etc i have
# - incredibly unoptimized
# - you will need at least binwalk, git, strings for this?
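#
# a quick way to check for those tools up front, as a sketch only: missing_tools is a made-up
# helper name, and the list passed to it is just the commands the one-liner visibly calls.

```shell
# missing_tools NAME...: print each command that is not on PATH, one per line.
# Returns non-zero if at least one is missing.
missing_tools() {
  mt_missing=0
  for mt_tool in "$@"; do
    if ! command -v "$mt_tool" >/dev/null 2>&1; then
      echo "$mt_tool"
      mt_missing=1
    fi
  done
  return "$mt_missing"
}

# Check the commands used by the one-liner before running it.
missing_tools git binwalk strings find grep sed awk sort uniq head tail xargs less \
  || echo "the commands listed above are missing" >&2
```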
#
# revisions:
# - better instructions, clean up slightly and expand it over multiple lines (thank you Raven!)
# - add strings and committer stats
# - add sample output for easier sharing
# - added another sample output showing some of the benign files in the project containing an actual compressed ELF binary
#
# future ideas:
# - structured outputting (🥲)
# - automate running this on repositories that security critical software is dependent on (like sshd -> systemd -> xz)
# - use git(hub|lab) APIs to enhance the contributor information shown
# - cluster results by contributor
# - time bound investigation (get recently changed files from git) and include non-binary but anomalous extensions
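#
# for the time bound idea, one possible starting point (a sketch, not tested against the xz repo):
# ask git log for the paths touched since some date. recently_changed is a made-up helper name
# and the 90 day default is an arbitrary assumption.

```shell
# recently_changed [SINCE] [REPO]: unique paths touched by commits newer than SINCE.
# --name-only lists the paths per commit, the empty --pretty=format: hides the commit
# header, and sed drops the blank separator lines left between commits.
recently_changed() {
  git -C "${2:-.}" log --since="${1:-90 days ago}" --name-only --pretty=format: \
    | sed '/^$/d' | sort -u
}
```

# the list can include files that were deleted since, so check each path still exists before scanning it.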
#
# This is what the current revision would output for one of the commits adding the exploit-carrying test fixtures in CVE-2024-3094
# (see https://www.openwall.com/lists/oss-security/2024/03/29/4 for more info)
#
# Sadly this looks pretty similar to other random binary fixtures in the repository (example below), but may inspire your own tweaks for other things to add~
>>>>>>>>tests/files/good-2cat.xz
Fri Feb 23 23:09:59 2024 +0800
cf44e4b7
Jia Tan <redacted>
Tests: Add a few test files.
451 commits authored by email (first commit on Fri Jan 28 20:47:55 2022 +0800)
========binwalk
Scan Time: 2024-03-30 15:15:13
Target File: /fakepath/xz/tests/files/good-2cat.xz
MD5 Checksum: b5615ce7d8388853cdb843b86d92fd57
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
Scan Time: 2024-03-30 15:15:13
Target File: /fakepath/xz/tests/files/good-2cat.xz
MD5 Checksum: b5615ce7d8388853cdb843b86d92fd57
Signatures: 445
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             xz compressed data
68            0x44            xz compressed data
Scan Time: 2024-03-30 15:15:13
Target File: /fakepath/xz/tests/files/_good-2cat.xz.extracted/0
MD5 Checksum: 2d99a8021719ca8257afa1fde6c29902
Signatures: 445
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
Scan Time: 2024-03-30 15:15:13
Target File: /fakepath/xz/tests/files/_good-2cat.xz.extracted/44
MD5 Checksum: 30cfe120f91122e304a5db4e71d3c112
Signatures: 445
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
^^^^^^^^strings
__World__
__Hello__
<<<<<<<<
# Here is another benign test file - at the very least it's interesting to see some of the lengths the attacker went to so that their binary would blend in with the other test fixtures.
>>>>>>>>tests/files/bad-1-stream_flags-3.xz
Wed Nov 19 20:46:52 2008 +0200
e114502b
Lasse Collin <redacted>
Oh well, big messy commit again. Some highlights: - Updated to the latest, probably final file format version. - Command line tool reworked to not use threads anymore. Threading will probably go into liblzma anyway. - Memory usage limit is now about 30 % for uncompression and about 90 % for compression. - Progress indicator with --verbose - Simplified --help and full --long-help - Upgraded to the last LGPLv2.1+ getopt_long from gnulib. - Some bug fixes
1801 commits authored by email (first commit on Sun Dec 9 00:42:33 2007 +0200)
========binwalk
Scan Time: 2024-03-30 15:15:04
Target File: /fakepath/xz/tests/files/bad-1-stream_flags-3.xz
MD5 Checksum: c570f7455b47752caf9cad0aa46f2445
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
Scan Time: 2024-03-30 15:15:04
Target File: /fakepath/xz/tests/files/bad-1-stream_flags-3.xz
MD5 Checksum: c570f7455b47752caf9cad0aa46f2445
Signatures: 445
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             xz compressed data
Scan Time: 2024-03-30 15:15:04
Target File: /fakepath/xz/tests/files/_bad-1-stream_flags-3.xz.extracted/0
MD5 Checksum: fbf68a8e34b2ded53bba54e68794b4fe
Signatures: 445
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
^^^^^^^^strings
World!
Hello
<<<<<<<<
# Also included for comparison: an older benign test file from 2009 that is a compressed x86 ELF binary (the reason for it is explained in the commit message)
>>>>>>>>tests/files/good-1-x86-lzma2.xz
Fri Feb 6 09:13:15 2009 +0200
975d8fd7
Lasse Collin <redacted>
Recreated the BCJ test files for x86 and SPARC. The old files were linked with crt*.o, which are copyrighted, and thus the old test files were not in the public domain as a whole. They are freely distributable though, but it is better to be careful and avoid including any copyrighted pieces in the test files. The new files are just compiled and assembled object files, and thus don't contain any copyrighted code.
1801 commits authored by email (first commit on Sun Dec 9 00:42:33 2007 +0200)
========binwalk
Scan Time: 2024-03-30 15:15:35
Target File: /fakepath/xz/tests/files/good-1-x86-lzma2.xz
MD5 Checksum: cac74e22da04c4e067d21ae170863bf2
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
Scan Time: 2024-03-30 15:15:35
Target File: /fakepath/xz/tests/files/good-1-x86-lzma2.xz
MD5 Checksum: cac74e22da04c4e067d21ae170863bf2
Signatures: 445
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             xz compressed data
Scan Time: 2024-03-30 15:15:35
Target File: /fakepath/xz/tests/files/_good-1-x86-lzma2.xz.extracted/0
MD5 Checksum: ce212d6a1cfe73d8395a2b42f94c2419
Signatures: 445
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             ELF, 32-bit LSB relocatable, Intel 80386, version 1 (SYSV)
^^^^^^^^strings
Ebv )
<<<<<<<<