@Mec-iS
Created October 13, 2025 13:10
Loading CVE JSON files...
310841it [01:36, 3206.27it/s]
Loaded 310841 CVEs
Generating embeddings...
Model loaded from: ./domain_adapted_model
Batches: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9714/9714 [06:25<00:00, 25.22it/s]
Embeddings shape: (310841, 384), sample: [-0.05824163 0.1438971 0.05094874 0.04408188 -0.01066008]...
Building ArrowSpace...
[pyarrowspace] Convert pyarray2 and Vec<Vec>
[pyarrowspace] items shape: (310841, 384)
[pyarrowspace] items[0][:5]: [-0.5824162811040878, 1.4389710128307343, 0.5094874277710915, 0.44081881642341614, -0.10660084895789623]
[pyarrowspace] NaNs: 0, Infs: 0
[pyarrowspace] Building from rows
[pyarrowspace] built ArrowSpace: nitems=310841, nfeatures=384, lambdas_len=310841
Build time: 2514.30s
Searching 3 queries...
Model loaded from: ./domain_adapted_model
Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 162.29it/s]
Embeddings shape: (3, 384), sample: [-0.0122127 0.05706869 -0.01711867 -0.07662857 -0.11884194]...
======================================================================
Query 1: authenticated arbitrary file read path traversal
======================================================================
[pyarrowspace] search: qlen=384, lambda_q=0.024841
[pyarrowspace] search: qlen=384, lambda_q=0.024841
[pyarrowspace] search: qlen=384, lambda_q=0.024841
Results: cosine=10, hybrid=10, taumode=10, using min=10
Cosine (τ=1.0)
----------------------------------------------------------------------
1. CVE-2004-2290 (no title) [0.7329]
2. CVE-2002-1345 (no title) [0.7120]
3. CVE-2009-4053 (no title) [0.7115]
4. CVE-2009-4957 (no title) [0.7112]
5. CVE-2002-0347 (no title) [0.7082]
6. CVE-2024-22050 Iodine Static File Server Path Traversal Vulnerability [0.7005]
7. CVE-2009-1314 (no title) [0.6968]
8. CVE-2007-4842 (no title) [0.6966]
9. CVE-2008-6785 (no title) [0.6936]
10. CVE-2008-4319 (no title) [0.6896]
Hybrid (τ=0.8)
----------------------------------------------------------------------
1. CVE-2004-2290 (no title) [0.7861]
2. CVE-2009-4053 (no title) [0.7690]
3. CVE-2002-1345 (no title) [0.7689]
4. CVE-2009-4957 (no title) [0.7672]
5. CVE-2002-0347 (no title) [0.7664]
6. CVE-2024-22050 Iodine Static File Server Path Traversal Vulnerability [0.7569]
7. CVE-2007-4842 (no title) [0.7552]
8. CVE-2009-1314 (no title) [0.7543]
9. CVE-2008-6785 (no title) [0.7542]
10. CVE-2007-4546 (no title) [0.7505]
Taumode (τ=0.62)
----------------------------------------------------------------------
1. CVE-2004-2290 (no title) [0.8339]
2. CVE-2009-4053 (no title) [0.8207]
3. CVE-2002-1345 (no title) [0.8201]
4. CVE-2002-0347 (no title) [0.8188]
5. CVE-2009-4957 (no title) [0.8176]
6. CVE-2008-6785 (no title) [0.8088]
7. CVE-2007-4842 (no title) [0.8079]
8. CVE-2024-22050 Iodine Static File Server Path Traversal Vulnerability [0.8076]
9. CVE-2009-1314 (no title) [0.8060]
10. CVE-2015-2780 (no title) [0.8057]
Correlations:
Cosine vs Hybrid: ρ=0.967, τ=0.889
Cosine vs Taumode: ρ=0.817, τ=0.611
Hybrid vs Taumode: ρ=0.867, τ=0.722
NDCG@10:
Hybrid vs Cosine: 0.9950
Taumode vs Cosine: 0.9884
Taumode vs Hybrid: 0.9939
Tail Quality (Ranks 4-10):
Cosine (τ=1.0):
T/H ratio: 0.9731
CV: 0.0103
Hybrid (τ=0.8):
T/H ratio: 0.9783
CV: 0.0079
Taumode (τ=0.62):
T/H ratio: 0.9824
CV: 0.0063
======================================================================
Query 2: remote code execution in ERP web component
======================================================================
[pyarrowspace] search: qlen=384, lambda_q=0.072469
[pyarrowspace] search: qlen=384, lambda_q=0.072469
[pyarrowspace] search: qlen=384, lambda_q=0.072469
Results: cosine=10, hybrid=10, taumode=10, using min=10
Cosine (τ=1.0)
----------------------------------------------------------------------
1. CVE-2012-0980 (no title) [0.6630]
2. CVE-2008-6260 (no title) [0.6590]
3. CVE-2008-5223 (no title) [0.6548]
4. CVE-2002-0233 (no title) [0.6481]
5. CVE-2008-2792 (no title) [0.6392]
6. CVE-2006-0821 (no title) [0.6390]
7. CVE-2008-1407 (no title) [0.6379]
8. CVE-2025-26186 (no title) [0.6362]
9. CVE-2008-5269 (no title) [0.6358]
10. CVE-2009-1323 (no title) [0.6339]
Hybrid (τ=0.8)
----------------------------------------------------------------------
1. CVE-2012-0980 (no title) [0.7201]
2. CVE-2008-6260 (no title) [0.7169]
3. CVE-2008-5223 (no title) [0.7132]
4. CVE-2002-0233 (no title) [0.7116]
5. CVE-2009-1323 (no title) [0.7012]
6. CVE-2008-2792 (no title) [0.7010]
7. CVE-2006-0821 (no title) [0.7004]
8. CVE-2008-1407 (no title) [0.6998]
9. CVE-2021-31760 (no title) [0.6997]
10. CVE-2025-26186 (no title) [0.6986]
Taumode (τ=0.62)
----------------------------------------------------------------------
1. CVE-2012-0980 (no title) [0.7716]
2. CVE-2008-6260 (no title) [0.7690]
3. CVE-2002-0233 (no title) [0.7687]
4. CVE-2008-5223 (no title) [0.7658]
5. CVE-2009-1323 (no title) [0.7618]
6. CVE-2021-31760 (no title) [0.7617]
7. CVE-2014-8366 (no title) [0.7583]
8. CVE-2002-1636 (no title) [0.7576]
9. CVE-2008-2792 (no title) [0.7566]
10. CVE-2006-0821 (no title) [0.7557]
Correlations:
Cosine vs Hybrid: ρ=0.833, τ=0.778
Cosine vs Taumode: ρ=0.857, τ=0.714
Hybrid vs Taumode: ρ=0.905, τ=0.786
NDCG@10:
Hybrid vs Cosine: 0.9876
Taumode vs Cosine: 0.9681
Taumode vs Hybrid: 0.9847
Tail Quality (Ranks 4-10):
Cosine (τ=1.0):
T/H ratio: 0.9691
CV: 0.0067
Hybrid (τ=0.8):
T/H ratio: 0.9791
CV: 0.0058
Taumode (τ=0.62):
T/H ratio: 0.9869
CV: 0.0044
======================================================================
Query 3: SQL injection in login endpoint
======================================================================
[pyarrowspace] search: qlen=384, lambda_q=0.079769
[pyarrowspace] search: qlen=384, lambda_q=0.079769
[pyarrowspace] search: qlen=384, lambda_q=0.079769
Results: cosine=10, hybrid=10, taumode=10, using min=10
Cosine (τ=1.0)
----------------------------------------------------------------------
1. CVE-2007-5916 (no title) [0.7630]
2. CVE-2009-3430 (no title) [0.7353]
3. CVE-2014-1466 (no title) [0.7338]
4. CVE-2008-1631 (no title) [0.7311]
5. CVE-2008-5573 (no title) [0.7261]
6. CVE-2008-6582 (no title) [0.7244]
7. CVE-2008-6941 (no title) [0.7220]
8. CVE-2009-0738 (no title) [0.7220]
9. CVE-2006-0192 (no title) [0.7212]
10. CVE-2008-6332 (no title) [0.7211]
Hybrid (τ=0.8)
----------------------------------------------------------------------
1. CVE-2007-5916 (no title) [0.8024]
2. CVE-2014-1466 (no title) [0.7807]
3. CVE-2008-5573 (no title) [0.7761]
4. CVE-2009-3430 (no title) [0.7758]
5. CVE-2008-1631 (no title) [0.7734]
6. CVE-2008-6941 (no title) [0.7717]
7. CVE-2009-2354 (no title) [0.7691]
8. CVE-2006-0192 (no title) [0.7689]
9. CVE-2008-6332 (no title) [0.7677]
10. CVE-2008-6582 (no title) [0.7673]
Taumode (τ=0.62)
----------------------------------------------------------------------
1. CVE-2007-5916 (no title) [0.8379]
2. CVE-2014-1466 (no title) [0.8229]
3. CVE-2008-5573 (no title) [0.8212]
4. CVE-2008-6941 (no title) [0.8164]
5. CVE-2009-2354 (no title) [0.8124]
6. CVE-2009-3430 (no title) [0.8124]
7. CVE-2006-0192 (no title) [0.8117]
8. CVE-2008-1631 (no title) [0.8115]
9. CVE-2005-2012 (no title) [0.8109]
10. CVE-2008-6332 (no title) [0.8096]
Correlations:
Cosine vs Hybrid: ρ=0.817, τ=0.667
Cosine vs Taumode: ρ=0.667, τ=0.571
Hybrid vs Taumode: ρ=0.817, τ=0.722
NDCG@10:
Hybrid vs Cosine: 0.9784
Taumode vs Cosine: 0.9533
Taumode vs Hybrid: 0.9888
Tail Quality (Ranks 4-10):
Cosine (τ=1.0):
T/H ratio: 0.9731
CV: 0.0046
Hybrid (τ=0.8):
T/H ratio: 0.9798
CV: 0.0038
Taumode (τ=0.62):
T/H ratio: 0.9816
CV: 0.0024
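
The per-query "Correlations" figures above appear consistent with Spearman's ρ and Kendall's τ computed over only the CVEs that two rankings share. The sketch below is a reconstruction from the printed lists, not the code of test_2_CVE_db.py; the helper names (`shared_ranks`, `spearman_rho`, `kendall_tau`) are hypothetical. It re-derives the Query 1 "Cosine vs Hybrid" pair from the two top-10 lists in the log.

```python
from itertools import combinations

def shared_ranks(list_a, list_b):
    """Return (rank_in_a, rank_in_b) pairs for the items present in both lists.

    Ranks are recomputed among the shared items only (an assumption).
    """
    in_b = set(list_b)
    shared = [x for x in list_a if x in in_b]
    rank_a = {x: i for i, x in enumerate(shared, start=1)}
    kept_b = [y for y in list_b if y in rank_a]
    rank_b = {y: i for i, y in enumerate(kept_b, start=1)}
    return [(rank_a[x], rank_b[x]) for x in shared]

def spearman_rho(pairs):
    """Spearman's rho for tie-free ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(pairs)
    d2 = sum((a - b) ** 2 for a, b in pairs)
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(pairs):
    """Kendall's tau for tie-free ranks: (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        sign = (a1 - a2) * (b1 - b2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Query 1 top-10 lists, copied from the log above.
cosine_q1 = ["CVE-2004-2290", "CVE-2002-1345", "CVE-2009-4053", "CVE-2009-4957",
             "CVE-2002-0347", "CVE-2024-22050", "CVE-2009-1314", "CVE-2007-4842",
             "CVE-2008-6785", "CVE-2008-4319"]
hybrid_q1 = ["CVE-2004-2290", "CVE-2009-4053", "CVE-2002-1345", "CVE-2009-4957",
             "CVE-2002-0347", "CVE-2024-22050", "CVE-2007-4842", "CVE-2009-1314",
             "CVE-2008-6785", "CVE-2007-4546"]

pairs = shared_ranks(cosine_q1, hybrid_q1)  # 9 shared CVEs
print(f"rho={spearman_rho(pairs):.3f}, tau={kendall_tau(pairs):.3f}")
```

Under these assumptions the nine shared items give values that agree with the logged ρ=0.967 and τ=0.889 to three decimals. Because each list contributes one CVE the other lacks, the correlations describe reordering among common results and say nothing about which items entered or left the top 10.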
Top-10 plot saved to cve_top10_comparison.png
/datadisk/publish/pyarrowspace/tests/test_2_CVE_db.py:296: MatplotlibDeprecationWarning: The 'labels' parameter of boxplot() has been renamed 'tick_labels' since Matplotlib 3.9; support for the old name will be dropped in 3.11.
bp = ax3.boxplot(tail_data, labels=['Cosine', 'Hybrid', 'Taumode'],
/datadisk/publish/pyarrowspace/tests/test_2_CVE_db.py:296: MatplotlibDeprecationWarning: The 'labels' parameter of boxplot() has been renamed 'tick_labels' since Matplotlib 3.9; support for the old name will be dropped in 3.11.
bp = ax3.boxplot(tail_data, labels=['Cosine', 'Hybrid', 'Taumode'],
/datadisk/publish/pyarrowspace/tests/test_2_CVE_db.py:296: MatplotlibDeprecationWarning: The 'labels' parameter of boxplot() has been renamed 'tick_labels' since Matplotlib 3.9; support for the old name will be dropped in 3.11.
bp = ax3.boxplot(tail_data, labels=['Cosine', 'Hybrid', 'Taumode'],
Tail analysis plot saved to cve_tail_analysis.png
======================================================================
SUMMARY
======================================================================
Average NDCG@10:
Hybrid vs Cosine: 0.9870
Taumode vs Cosine: 0.9699
Average Tail/Head Ratios:
Cosine (τ=1.0): 0.9718 ± 0.0019
Hybrid (τ=0.8): 0.9790 ± 0.0006
Taumode (τ=0.62): 0.9836 ± 0.0023
→ Higher T/H ratio = Better long-tail quality
→ ArrowSpace (τ<1.0) maintains higher tail scores
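
The tail-quality numbers can be approximately reproduced from the printed scores if one assumes that "head" means ranks 1-3, "tail" means ranks 4-10, the T/H ratio is mean(tail) / mean(head), and CV is the population standard deviation of the tail divided by its mean. These definitions are inferred from the log, not taken from the test script, and `tail_metrics` is a hypothetical helper name.

```python
from statistics import fmean, pstdev

def tail_metrics(scores, head_size=3):
    """Return (tail/head ratio, coefficient of variation of the tail).

    Assumes `scores` is sorted best-first and that the head is the first
    `head_size` entries; both choices are inferred from the log output.
    """
    head, tail = scores[:head_size], scores[head_size:]
    tail_mean = fmean(tail)
    th_ratio = tail_mean / fmean(head)
    cv = pstdev(tail) / tail_mean  # population std, as numpy.std uses by default
    return th_ratio, cv

# Query 1, Cosine (τ=1.0) scores, copied from the log above.
cosine_q1_scores = [0.7329, 0.7120, 0.7115, 0.7112, 0.7082,
                    0.7005, 0.6968, 0.6966, 0.6936, 0.6896]

th_ratio, cv = tail_metrics(cosine_q1_scores)
print(f"T/H ratio: {th_ratio:.4f}, CV: {cv:.4f}")
```

With the four-decimal scores shown in the log, the result lands close to the reported T/H ratio of 0.9731 and CV of 0.0103; small differences in the last digit are expected because the logged scores are already rounded. A ratio nearer 1.0 with a lower CV means the scores at ranks 4-10 stay close to the top three and to each other, which is the sense in which the summary calls the tail "better".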