@pedramamini
Last active July 31, 2021 14:54
InQuest Labs Rule Generator
#!/opt/research/venv/bin/python
"""
IQ Auto DIFF leverages the InQuest Labs API to collate post-DFI string features from both malicious (bad) and
seemingly benign (non) files. After discarding gibberish, it identifies the string features exclusive to each set.
The idea is that the top 25 strings exclusive to the malicious (bad) corpus are candidates for inclusion in a YARA
rule, while the top 25 strings exclusive to the seemingly benign (non) corpus are candidates for exclusion.
This script can take upwards of 20 minutes to run; the sample run below took about 30 minutes.
[pedram@kb python-inquestlabs]$ ./iq_auto_diff.py
[2021-07-27T19:15:46] collecting bad-strings...
[2021-07-27T19:15:47] ( 0/ 625) added 4 of 4 bad-strings from b9d480b53939d96aea2eabb86bb5bed17f83cf5956ec11f3ca7c343dc7cb05cb
[2021-07-27T19:15:47] ( 1/ 625) added 6 of 8 bad-strings from ef800a80043aba4078438af92e7dc2b360757ae68877af2accc1d984a06e92e1
[2021-07-27T19:15:47] SKIPPING ee897f27958a508c860e008c586d7c335c7425945016cf400ff4819c5b895368 26 >= 25
[2021-07-27T19:15:48] ( 3/ 625) added 29 of 52 bad-strings from 991c1f9c4828b6de37f61e4373158ede52fc9ad86e1c54742fd83d55a12ec418
[2021-07-27T19:15:48] ( 4/ 625) added 2 of 3 bad-strings from 078bdfc35828fd0335ff587ade18f7335fbed0c83ec2d723ff0f58fd3ebb9b02
[2021-07-27T19:15:48] ( 5/ 625) added 6 of 8 bad-strings from 85b3925bdb3abcaa063a2abcc473dd70ee62e4366ed161b9bcf02f1d5160b88c
[2021-07-27T19:15:48] ( 6/ 625) added 4 of 4 bad-strings from 5db1dccf660a2d8dbda318ff242334f80067dc3c54dc1e599a494f76af68c9ea
[2021-07-27T19:15:48] SKIPPING 06307c675c735666313776cc2b216008adda1c3c35d44f70b7465235df9a3bf4 38 >= 25
[2021-07-27T19:15:48] ( 8/ 625) added 6 of 8 bad-strings from 8800a34984d142ce3e9209a127e0afeb012a417ee05216c29723613f5c963d06
[2021-07-27T19:15:48] SKIPPING d2bfd118b6032e930cd8d5b0a5896c54dd71af4d8e267c25f23afb03681d3513 33 >= 25
[2021-07-27T19:15:49] ( 10/ 625) added 56 of 92 bad-strings from 3f6ed78466474de7c20310e9c1b56717ae6cd2f3bec125999c4af9ac73603125
[2021-07-27T19:15:49] ( 11/ 625) added 0 of 1 bad-strings from 23ac10976b03d7b7dbcec9c3d4e0f5819879abae0190ecaf30c5dee43cd5f9b1
[2021-07-27T19:15:49] SKIPPING e953dc8cbf31c3db525a3567707b91d88772a3019beaa4d2e887cf685bc28747 32 >= 25
[2021-07-27T19:15:49] ( 13/ 625) added 61 of 112 bad-strings from 4b676017946838b213aac288bf990fd0ea9f3535aea517b6c75c9d332e0a8ad4
[2021-07-27T19:15:49] ( 14/ 625) added 4 of 4 bad-strings from bf378c1b74197db12ea5262c0588a0e2999bcfc04f090079664f53dfeebbfaa3
[2021-07-27T19:15:49] SKIPPING 866ae1b419e58577ae8cc037ca22809dda853dddddad04b4f02d92001fa9de2f 30 >= 25
[2021-07-27T19:15:50] ( 16/ 625) added 1131 of 1337 bad-strings from d609852ba9fe2252e601569e47efcc9fbda67ee8624e97c47d89df7e14b98dac
[2021-07-27T19:15:51] ( 17/ 625) added 63 of 77 bad-strings from f4ea5710b7134994881964444e6d43d3b815c7ed0d3583362c269dcbf84793a3
[2021-07-27T19:15:51] ( 18/ 625) added 1144 of 1337 bad-strings from 5500467e5648750a276c8075aa845cbb00662391242ec55387d18291e3628423
[2021-07-27T19:15:52] ( 19/ 625) added 51 of 107 bad-strings from e9f32af91c41a9835e638d962ff6e00c9dfc5129ca67fef205e8689329aff6b1
[2021-07-27T19:15:52] SKIPPING ba7731b6dc348e539c3e92a30f7811579de525f4346b223430b0b668e68e30c0 31 >= 25
...
[2021-07-27T19:18:16] ( 623/ 625) added 90 of 228 bad-strings from 0e1323f3722b60ff5d50ac58d0c7cc262d737cb52f23b27db40ee9f73df19e84
[2021-07-27T19:18:17] ( 624/ 625) added 95 of 114 bad-strings from bad87ffc75b6468db906719d5499da425a2add13fc301410fc4693e6b3575f7c
[2021-07-27T19:18:17] collecting non-strings...
[2021-07-27T19:18:18] ( 0/ 4375) added 202 of 260 non-strings from 52a8f92c18f08fa6680d46e88b4b805b83eb69ef70f41d9a21f3e085a6b226d7
[2021-07-27T19:18:18] ( 1/ 4375) added 5 of 7 non-strings from 7ea90c38c53641d6d81f5787aa0ed05036ce7e30cec0425ed31274159f22e919
[2021-07-27T19:18:19] ( 2/ 4375) added 154 of 942 non-strings from a52abc70f3e7b833ef758e0eb616ce0e3f1fe2878ae31a95c1628fbf5ef46900
[2021-07-27T19:18:19] ( 3/ 4375) added 396 of 437 non-strings from 94ec49e72a1a060919da040acfceb1d756846a8d9f9456f9a966732384680b69
[2021-07-27T19:18:19] ( 4/ 4375) added 147 of 165 non-strings from 1fb39be63e0b0b642c8c8e543d2aa0f09b564cba130d2a83dfc5478661f16034
[2021-07-27T19:18:20] ( 5/ 4375) added 127 of 155 non-strings from b5c023feaa8a49eb7d75d28fea63aebe6bd7e28b8d057f98c1910068884425ac
[2021-07-27T19:18:20] ( 6/ 4375) added 7 of 7 non-strings from 3e62a36bd359b57046d28db87e78349e0eb654ae78829bd2ab0a415bee42dec9
[2021-07-27T19:18:20] SKIPPING 4abb15fbbb55b3ff744cab305094af13f64ed3e882d0d09eb020607d40f284ff 3 >= 3
[2021-07-27T19:18:20] ( 8/ 4375) added 42 of 43 non-strings from 8a1a38f7666ef11f56446a44b40084ebdd0cd7f6d9c36a408e2df8eb28ccd66b
[2021-07-27T19:18:21] ( 9/ 4375) added 334 of 1337 non-strings from b47cbadc448f538e1432ea280a518165b356caf82003b4941044541a31209378
[2021-07-27T19:18:21] ( 10/ 4375) added 591 of 650 non-strings from e91a25a7ad8e526455dfe3a2eb32aa8637f19564fcbbf01ec57d5d893fc3425e
...
[2021-07-27T19:45:12] ( 4373/ 4375) added 0 of 0 non-strings from 5e871c6ecdcd59fbf82014457844127293ce87ee22d8a5c525276779348b1c8f
[2021-07-27T19:45:12] ( 4374/ 4375) added 145 of 149 non-strings from b227920b448569aad7557d09b848e2a2033875638a27ec4310b5162470130a3b
[2021-07-27T19:45:12] collected 80144 bads and 1003436 nons
bads...
94 Each th
37 Enable Content
28 This document is protected
28 000208206-
28 1SPecialiST RePackfalsefalsefalse16.0300
28 62144Y 2
28 6515072 D
28 all Err.
28 Decode64
28 Dim bOu
28 el as Bo
28 .exe /c H""p27he0
28 IfD4T8hen
28 ) Mod OI]
28 ng Ats C
28 not vali
28 np@Isc-is
28 onst clO@neMask
28 Raise(vb@Object
28 root\cim
28 rublic F
28 Sel`. CQ
28 Sheet0=0, 0, 0, 0, C
28 @Templat@eDeriv
28 wers6(63!B
nons...
263 closeExcel
235 BordersC"`
233 xlEdgeRight%v`
231 #,##0.00_
214 "\"#,##0.00;[Red]"\"\-#,##0.00
214 "\"#,##0;[Red]"\"\-#,##0
211 _ "\"* #,##0.00_ ;_ "\"* \-#,##0.00_ ;_ "\"* "-"??_ ;_ @_
211 "\"#,##0.00;"\"\-#,##0.00
210 _ "\"* #,##0_ ;_ "\"* \-#,##0_ ;_ "\"* "-"_ ;_ @_
210 "\"#,##0;"\"\-#,##0
204 StartRow
199 Enter_Values
199 SetRowsStyle
181 Sheet1 (2)$
180 Lsub_SheetAdd-
172 color2F`
165 ub model
159 [DEFECT-0001] OleFileError=OLE DirEntry index out of range
158 .ColorIndex = xlAutomatic
151 (ByVal L
147 Followed Hyperlink
147 .LineStyle = xlContinuous
144 ineNo As
141 With Selection.Borders(xlEdgeTop)
137 Lsub_SheetAdd
"""
import os
import time
import pickle
import datetime
# batteries not included.
import inquestlabs
go = time.time()
labs = inquestlabs.inquestlabs_api()
bads = []
nons = []
bcs = {}
ncs = {}
# vt_positives squelch thresholds: skip bads with >= VT_SQUELCH_HIGH positives, skip nons with >= VT_SQUELCH_LOW.
VT_SQUELCH_HIGH = 25
VT_SQUELCH_LOW = 3
IQ_SQUELCH_HIGH = 6
IQ_SQUELCH_LOW = 4
SKIP_GIBBERISH = True
# time stamp for printing.
def ts ():
    return datetime.datetime.now().isoformat().split(".")[0]
# is the string gibberish?
# NOTE: this is an InQuest internal model and not accessible to the public.
def is_gibberish (s):
    HOME = "/opt/research/machine-learning/vectorizer"
    # without the internal module, treat every string as non-gibberish.
    try:
        import gib_detect_train
    except ImportError:
        return False
    with open(os.path.join(HOME, 'gib_model.pki'), 'rb') as fh:
        model_data = pickle.load(fh)
    model_mat = model_data['mat']
    threshold = model_data['thresh']
    return gib_detect_train.avg_transition_prob(s, model_mat) <= threshold
# do it.
if __name__ == "__main__":
    collections = \
    [
        # is malicious? label, data store, VT squelch, InQuest squelch
        (True, "bad-strings", bads, VT_SQUELCH_HIGH, IQ_SQUELCH_HIGH),
        (False, "non-strings", nons, VT_SQUELCH_LOW, IQ_SQUELCH_LOW),
    ]
    # collect data.
    for is_malicious, label, datastore, vt_squelch, iq_squelch in collections:
        print("[%s] collecting %s..." % (ts(), label))
        listings = labs.dfi_list(malicious=is_malicious)
        for idx, listing in enumerate(listings):
            vt_positives = listing.get("vt_positives", 0) or 0
            # XXX - we don't have this information in context at the moment.
            # inquest_score = listing.get("")
            if vt_positives >= vt_squelch:
                print("[%s] SKIPPING %s %s >= %s" % (ts(), listing['sha256'], vt_positives, vt_squelch))
                continue
            try:
                details = labs.dfi_details(listing['sha256'])
            except Exception:
                print("[%s] FAILED getting %s" % (ts(), listing['sha256']))
                # skip this sample, otherwise 'details' is undefined or stale from the previous iteration.
                continue
            strings = details.get('string_features', [])
            # skip over gibberish.
            added = 0
            for s in strings:
                if not SKIP_GIBBERISH or not is_gibberish(s):
                    datastore.append(s)
                    added += 1
            # banner.
            banner = "[%s] (%5d/%5d) added %5d of %5d %s from %s"
            banner %= ts(), idx, len(listings), added, len(strings), label, listing['sha256']
            print(banner)
    print("[%s] collected %5d bads and %5d nons" % (ts(), len(bads), len(nons)))
    # determine unique sets of strings between bads and nons.
    set_bads = set(bads)
    set_nons = set(nons)
    onlybads = set_bads - set_nons
    onlynons = set_nons - set_bads
    # collect the string counts, skipping over strings that exist in the other set.
    for l in bads:
        if l in onlybads:
            bcs[l] = bcs.get(l, 0) + 1
    for l in nons:
        if l in onlynons:
            ncs[l] = ncs.get(l, 0) + 1
    # build a list of [(lineA, countA), (lineB, countB), ... ]
    bcss = list(bcs.items())
    ncss = list(ncs.items())
    # sort the list by count, highest first.
    bcss.sort(key=lambda x: x[1], reverse=True)
    ncss.sort(key=lambda x: x[1], reverse=True)
    # XXX - NOT Python 3x compatible
    # def comparator (a, b):
    #     return -cmp((a[1], a[0]), (b[1], b[0]))
    # bcss.sort(key=comparator)
    # ncss.sort(key=comparator)
    # output the lines.
    print("bads...")
    for line, count in bcss[:25]:
        print("%5d %s" % (count, line))
    print("nons...")
    for line, count in ncss[:25]:
        print("%5d %s" % (count, line))
    # QED.
    print("[%s] done in %d minutes." % (ts(), int((time.time() - go) / 60)))
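
The set-difference and counting steps at the end of the script can also be expressed with `collections.Counter` from the standard library. The following is only a minimal sketch of that idea, not the author's implementation: the helper name `exclusive_top_strings` and the toy `bads` / `nons` lists are hypothetical and made up for illustration, and no InQuest Labs API calls are involved.

```python
from collections import Counter


def exclusive_top_strings(bads, nons, top_n=25):
    """Return the top_n most frequent strings exclusive to each corpus.

    Hypothetical helper: 'bads' and 'nons' are plain lists of strings, with
    repeats, as accumulated by the collection loop in the script above.
    """
    only_bads = set(bads) - set(nons)
    only_nons = set(nons) - set(bads)

    # count occurrences, ignoring strings that also appear in the other corpus.
    bad_counts = Counter(s for s in bads if s in only_bads)
    non_counts = Counter(s for s in nons if s in only_nons)

    # most_common() returns [(string, count), ...] sorted by count, highest first.
    return bad_counts.most_common(top_n), non_counts.most_common(top_n)


# toy data, invented purely for illustration.
bads = ["Enable Content", "Decode64", "Enable Content", "Sheet1"]
nons = ["Sheet1", "StartRow", "StartRow", "closeExcel"]

top_bads, top_nons = exclusive_top_strings(bads, nons)

print("bads...")
for line, count in top_bads:
    print("%5d %s" % (count, line))

print("nons...")
for line, count in top_nons:
    print("%5d %s" % (count, line))
```

In this toy input "Sheet1" appears in both lists, so it is dropped from both results, which mirrors how the script excludes strings shared between the bad and non corpora before ranking.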