Mayya Sharipova mayya-sharipova

mayya-sharipova / concurrent_merges_benchmark_open_ai3.md
Last active April 19, 2024 20:02
Benchmarking concurrent merges - open_ai track.md

open_ai track

open_ai track, modified to: 1) exclude waiting for merges to finish, 2) include a force-merge

Track progression:

  • standalone indexing with 5 clients
  • standalone search with 1 client
  • standalone search with 8 clients
  • parallel indexing with 1 client (target throughput of 500 docs/s)
  • parallel search with 3 clients (target throughput of 100 op/s)
  • force-merge
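The two parallel phases are throttled to a target throughput instead of running flat out. Below is a minimal Java sketch of the pacing arithmetic. It assumes the total target is split evenly across clients, which is our understanding of how Rally's `target-throughput` behaves and is not stated in this gist. The class and method names are hypothetical, and the 500 docs/s target is treated as a plain rate of work units.

```java
/**
 * Hypothetical helper: how often each client must send a request so that all
 * clients together reach a target throughput. Assumes the target is split
 * evenly across clients.
 */
public class ThroughputPacing {

    /** Milliseconds between consecutive requests of a single client. */
    static double perClientIntervalMillis(double targetPerSecond, int clients) {
        if (targetPerSecond <= 0 || clients <= 0) {
            throw new IllegalArgumentException("target and clients must be positive");
        }
        // Each client carries targetPerSecond / clients of the load, so it waits
        // clients / targetPerSecond seconds between its own requests.
        return clients * 1000.0 / targetPerSecond;
    }

    public static void main(String[] args) {
        // Parallel search phase: 3 clients sharing 100 op/s.
        System.out.println(perClientIntervalMillis(100, 3)); // 30.0
        // Parallel indexing phase: 1 client at 500 units/s.
        System.out.println(perClientIntervalMillis(500, 1)); // 2.0
    }
}
```

Under that assumption, each of the 3 search clients would issue a request roughly every 30 ms.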
mayya-sharipova / concurrent_merges_benchmark_open_ai.md
Last active April 18, 2024 19:34
Benchmarking concurrent merges - open_ai track.md

open_ai track

Track progression:

  • standalone indexing with 5 clients
  • standalone search with 1 client
  • standalone search with 8 clients
  • parallel indexing with 1 client (target throughput of 500 docs/s)
  • parallel search with 3 clients (target throughput of 100 op/s)

Baseline vs. Candidate

  • baseline: lucene_snapshot branch of Elasticsearch (with MADV_RANDOM feature flag)
mayya-sharipova / benchmarks_merge2.md
Created March 27, 2024 17:30
Benchmarking multi-threaded merge - so_vector track
| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|---|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 30.6003 | 31.6822 | 1.08185 | min | +3.54% |
| Min cumulative indexing time across primary shard | | 15.2671 | 15.7348 | 0.46777 | min | +3.06% |
| Median cumulative indexing time across primary shard | | 15.3002 | 15.8411 | 0.54092 | min | +3.54% |
| Max cumulative indexing time across primary shard |

| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|---|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 9.5825 | 9.45558 | -0.12692 | min | -1.32% |
| Min cumulative indexing time across primary shard | | 4.77432 | 4.70707 | -0.06725 | min | -1.41% |
| Median cumulative indexing time across primary shard | | 4.79125 | 4.72779 | -0.06346 | min | -1.32% |
| Max cumulative indexing time across primary shard |
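The Diff and Diff % columns follow from the Baseline and Contender values: Diff is contender minus baseline, and Diff % is that difference relative to the baseline. A minimal Java sketch of the arithmetic (class and method names are hypothetical):

```java
import java.util.Locale;

/** Hypothetical helper reproducing the "Diff" and "Diff %" columns from baseline/contender values. */
public class CompareMetrics {

    /** Absolute difference, in the metric's own unit (contender minus baseline). */
    static double diff(double baseline, double contender) {
        return contender - baseline;
    }

    /** Difference relative to the baseline, in percent. */
    static double diffPercent(double baseline, double contender) {
        return (contender - baseline) / baseline * 100.0;
    }

    /** Signed, two-decimal percentage such as "+3.54%" or "-1.32%". */
    static String formatPercent(double percent) {
        return String.format(Locale.ROOT, "%+.2f%%", percent);
    }

    public static void main(String[] args) {
        // First row of the first table: cumulative indexing time of primary shards (min).
        double baseline = 30.6003;
        double contender = 31.6822;
        System.out.printf(Locale.ROOT, "%.5f min %s%n",
                diff(baseline, contender), formatPercent(diffPercent(baseline, contender)));
    }
}
```

Recomputing the first row from the displayed, rounded values gives a Diff of 1.08190 rather than the 1.08185 shown, presumably because the table was generated from unrounded measurements. The percentage still rounds to +3.54%.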
mayya-sharipova / lucene_vector_merge_policy3.md
Last active October 16, 2023 13:39
Investigation on merge policy for vectors

Merge policy has been modified for vectors:

  • Instead of merging segments of equal size, a merge combines the current largest segment with the smallest segments.
  • Instead of preferring the merge with the smallest skew (i.e., favoring merges of equally sized segments), the policy prefers merging the biggest segment with the smallest segments.

These modifications take advantage of a merge-time optimization that keeps the graph of the largest merging segment and adds the vectors from the other merging segments to it.
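The modified selection can be sketched as below. This is a hypothetical, simplified illustration on plain segment sizes; it is not the code of the modified policy or of Lucene's `TieredMergePolicy`, and `selectMerge` and the sizes are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of "largest segment + smallest segments" merge selection.
 * Segment sizes are plain numbers; this is not Lucene's MergePolicy API.
 */
public class LargestPlusSmallestSelector {

    /**
     * Returns the largest segment followed by up to (maxMergeAtOnce - 1) of the
     * smallest remaining segments, smallest first.
     */
    static List<Long> selectMerge(List<Long> segmentSizes, int maxMergeAtOnce) {
        if (segmentSizes.size() < 2 || maxMergeAtOnce < 2) {
            return List.of(); // nothing to merge
        }
        List<Long> sorted = new ArrayList<>(segmentSizes);
        sorted.sort(Comparator.reverseOrder()); // largest first

        List<Long> merge = new ArrayList<>();
        merge.add(sorted.get(0)); // largest segment: its graph is the one to keep
        // Walk from the smallest segment upwards until the merge is full.
        for (int i = sorted.size() - 1; i > 0 && merge.size() < maxMergeAtOnce; i--) {
            merge.add(sorted.get(i));
        }
        return merge;
    }

    public static void main(String[] args) {
        List<Long> sizesMb = List.of(100L, 5L, 1L, 3L, 50L, 2L);
        System.out.println(selectMerge(sizesMb, 3)); // [100, 1, 2]
    }
}
```

A real policy would also cap the merged size and skip segments that are already being merged; those checks are omitted here.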

Experiments

mayya-sharipova / lucene_vector_merge_policy2.md
Last active October 16, 2023 13:38
Vectors Merge Policy

Investigation on merge policy for vectors

Merge Policies:

  • baseline: current TieredMergePolicy on Lucene main (floor_segment_size=2MB, maxMergeAtOnce=10, segsPerTier=10)
  • candidate1: the same TieredMergePolicy with floor_segment_size=200MB
  • candidate2: the same TieredMergePolicy with floor_segment_size=200MB, maxMergeAtOnce=20, segsPerTier=20
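In Lucene these settings correspond to `TieredMergePolicy#setFloorSegmentMB`, `#setMaxMergeAtOnce` and `#setSegmentsPerTier`. The floor segment size matters because, for merge scoring, segments smaller than the floor are treated as if they were floor-sized. Below is a simplified Java illustration of that effect on skew, assuming skew is defined as the largest segment's share of the merge. It is not Lucene's actual scoring code, and the names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration of a floor segment size: for merge scoring, segments
 * smaller than the floor are treated as if they were exactly floor-sized.
 */
public class FloorSegmentSize {

    static final long MB = 1024L * 1024L;

    /** Size used for scoring: the actual size, but never less than the floor. */
    static long flooredSize(long bytes, long floorBytes) {
        return Math.max(bytes, floorBytes);
    }

    /** Simplified skew: largest floored size divided by the total floored size. */
    static double flooredSkew(long[] segmentBytes, long floorBytes) {
        long total = 0;
        long largest = 0;
        for (long bytes : segmentBytes) {
            long floored = flooredSize(bytes, floorBytes);
            total += floored;
            largest = Math.max(largest, floored);
        }
        return (double) largest / total;
    }

    public static void main(String[] args) {
        // One 150 MB segment merged with nine 10 MB segments.
        long[] merge = new long[10];
        Arrays.fill(merge, 10 * MB);
        merge[0] = 150 * MB;
        System.out.println(flooredSkew(merge, 2 * MB));   // 0.625 (2 MB floor)
        System.out.println(flooredSkew(merge, 200 * MB)); // 0.1 (200 MB floor)
    }
}
```

With a 2 MB floor this merge has a skew of 0.625. With a 200 MB floor every segment is scored as 200 MB, so the skew drops to 0.1 and the merge looks as balanced as a merge of ten equal-sized segments.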

Experiment 1

  • 10M vectors of 100 dims
  • an NRT reader is opened every second (to simulate searches running concurrently with indexing)
mayya-sharipova / LuceneAnnBenchmarks9_4.md
Last active September 12, 2022 17:38
Lucene ANN benchmarks: 9.3.1 vs. 9.4 comparison
  • 5 GB heap
  • 2 GB indexing buffer
  • glove-100-angular M:16 efConstruction:100

In 9.4, recall dropped by 3-8% and QPS dropped by 30-50% compared to 9.3.

| | 9.3 recall | 9.3 QPS | 9.4 recall | 9.4 QPS |
|---|---|---|---|---|
| n_cands=10 | 0.620 | 2745.933 | 0.565 | 1856.128 |
mayya-sharipova / TestVectorsAPI.java
Last active October 26, 2021 13:52
TestVectorsAPI.java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;
import java.util.Random;
import static jdk.incubator.vector.VectorOperators.ADD;
import static jdk.incubator.vector.FloatVector.SPECIES_256;
public class TestVectorsAPI {
    private static final int SIZE = 128;
    private static float[] v1 = new float[SIZE];
mayya-sharipova / wikimedium10m_profiler.md
Last active March 30, 2021 20:59
wikimedium10m profiler

baseline

Profiler for cpu:
PROFILE SUMMARY from 921352 events (total: 921352)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
12.85%        118371        java.io.BufferedOutputStream#write()
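The summary reports, for each stack, its share of all sampled events (for example, 118371 / 921352 ≈ 12.85%). Assuming `tests.profile.stacksize=1` means each sample is attributed to its top frame only and `tests.profile.count=30` limits the summary to the 30 most frequent entries, the aggregation can be sketched in Java as follows. This is hypothetical and modeled on the output format, not on the code that produced it; the sample frames in `main` are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: aggregate sampled top frames into a "PERCENT / SAMPLES / STACK" summary. */
public class ProfileSummary {

    /** One summary row: a stack frame, its sample count and its share of all samples. */
    static final class Row {
        final String frame;
        final long samples;
        final double percent;

        Row(String frame, long samples, double percent) {
            this.frame = frame;
            this.samples = samples;
            this.percent = percent;
        }
    }

    /** Counts samples per frame and returns the {@code count} most frequent frames. */
    static List<Row> summarize(List<String> sampledTopFrames, int count) {
        Map<String, Long> counts = new HashMap<>();
        for (String frame : sampledTopFrames) {
            counts.merge(frame, 1L, Long::sum); // one sample per occurrence
        }
        long total = sampledTopFrames.size();
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            rows.add(new Row(e.getKey(), e.getValue(), 100.0 * e.getValue() / total));
        }
        rows.sort((a, b) -> Long.compare(b.samples, a.samples)); // most samples first
        return rows.subList(0, Math.min(count, rows.size()));
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "java.io.BufferedOutputStream#write()",
                "java.io.BufferedOutputStream#write()",
                "java.util.HashMap#get()",
                "java.io.BufferedOutputStream#write()");
        for (Row row : summarize(samples, 30)) {
            System.out.printf(Locale.ROOT, "%.2f%%  %d  %s%n", row.percent, row.samples, row.frame);
        }
    }
}
```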
mayya-sharipova / wikimedium1m_profiler.md
Last active March 30, 2021 20:00
wikimedium1m profiler

baseline

Profiler for cpu:
PROFILE SUMMARY from 107744 events (total: 107744)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK