Mayya Sharipova mayya-sharipova

## concurrent_merges_benchmark_open_ai3.md

      
mayya-sharipova / concurrent_merges_benchmark_open_ai3.md, last active April 19, 2024 20:02

Benchmarking concurrent merges - open_ai track.md

open_ai track

The open_ai track was modified to: 1) exclude the wait for merges to finish; 2) include a force-merge.

Track progression:

1. standalone indexing with 5 clients
2. standalone search with 1 client
3. standalone search with 8 clients
4. parallel indexing with 1 client (target throughput of 500 docs/s)
5. parallel search with 3 clients (target throughput of 100 op/s)
6. force-merge


## concurrent_merges_benchmark_open_ai.md

      
mayya-sharipova / concurrent_merges_benchmark_open_ai.md, last active April 18, 2024 19:34

Benchmarking concurrent merges - open_ai track.md

open_ai track

Track progression:

1. standalone indexing with 5 clients
2. standalone search with 1 client
3. standalone search with 8 clients
4. parallel indexing with 1 client (target throughput of 500 docs/s)
5. parallel search with 3 clients (target throughput of 100 op/s)

Baseline VS Candidate


baseline: lucene_snapshot branch of Elasticsearch (with MADV_RANDOM feature flag)


## benchmarks_merge2.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                mayya-sharipova
                / benchmarks_merge2.md
            
            
              Created
              March 27, 2024 17:30
            
              
                Benchmarking multi threaded merge so_vecotr
              
          
| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
| --- | --- | --- | --- | --- | --- | --- |
| Cumulative indexing time of primary shards | | 30.6003 | 31.6822 | 1.08185 | min | +3.54% |
| Min cumulative indexing time across primary shard | | 15.2671 | 15.7348 | 0.46777 | min | +3.06% |
| Median cumulative indexing time across primary shard | | 15.3002 | 15.8411 | 0.54092 | min | +3.54% |
| Max cumulative indexing time across primary shard | | | | | | |
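
The Diff and Diff % columns can be recomputed from the Baseline and Contender values. The following is a minimal sketch, assuming Diff is contender minus baseline and Diff % is that difference relative to the baseline; the class and method names are made up for illustration and are not part of any benchmarking tool.

```java
import java.util.Locale;

/** Recomputes the Diff and Diff % columns of a baseline/contender comparison. */
final class BenchmarkDiff {

    /** Absolute difference: contender minus baseline, in the metric's unit. */
    static double diff(double baseline, double contender) {
        return contender - baseline;
    }

    /** Relative difference, as a percentage of the baseline value. */
    static double diffPercent(double baseline, double contender) {
        return (contender - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Values copied from the "Cumulative indexing time of primary shards" row.
        double baseline = 30.6003;
        double contender = 31.6822;
        System.out.printf(Locale.ROOT, "diff=%.5f min, diff%%=%+.2f%%%n",
                diff(baseline, contender), diffPercent(baseline, contender));
    }
}
```

Because the published values are rounded, the recomputed Diff can differ from the reported one in the last digit. A positive Diff % here means the contender spent more time indexing than the baseline.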


## benchmarks_merge.md

      
mayya-sharipova / benchmarks_merge.md, last active March 27, 2024 13:34

| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
| --- | --- | --- | --- | --- | --- | --- |
| Cumulative indexing time of primary shards | | 9.5825 | 9.45558 | -0.12692 | min | -1.32% |
| Min cumulative indexing time across primary shard | | 4.77432 | 4.70707 | -0.06725 | min | -1.41% |
| Median cumulative indexing time across primary shard | | 4.79125 | 4.72779 | -0.06346 | min | -1.32% |
| Max cumulative indexing time across primary shard | | | | | | |


## lucene_vector_merge_policy3.md

      
mayya-sharipova / lucene_vector_merge_policy3.md, last active October 16, 2023 13:39

Investigation on merge policy for vectors

The merge policy has been modified for vectors:

- Instead of merging segments of equal size, a merge combines the current largest segment with the smallest segments.
- Instead of choosing the merge with the smallest skew (which favored merges of same-size segments), preference is given to a merge of the biggest segment with the smallest segments.

These modifications take advantage of an optimization during merging that keeps the graph of the largest merging segment and appends the vectors from the other merging segments to it.
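
The selection rule described above can be sketched roughly as follows. This is an illustration only, not the modified Lucene policy: it assumes segments are represented just by their sizes, and `pickMerge` and the `maxMergeAtOnce` cap are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: picks the largest segment plus the smallest ones for a merge. */
final class LargestPlusSmallestMerge {

    /**
     * Returns the sizes of the segments to merge: the largest segment first,
     * followed by up to (maxMergeAtOnce - 1) of the smallest segments.
     */
    static List<Long> pickMerge(List<Long> segmentSizes, int maxMergeAtOnce) {
        if (segmentSizes.size() < 2 || maxMergeAtOnce < 2) {
            return List.of(); // a merge needs at least two segments
        }
        List<Long> bySize = new ArrayList<>(segmentSizes);
        Collections.sort(bySize); // ascending: smallest first, largest last

        List<Long> merge = new ArrayList<>();
        merge.add(bySize.get(bySize.size() - 1)); // the largest segment keeps its graph
        int smallCount = Math.min(maxMergeAtOnce - 1, bySize.size() - 1);
        merge.addAll(bySize.subList(0, smallCount)); // smallest segments are appended to it
        return merge;
    }

    public static void main(String[] args) {
        // Hypothetical segment sizes in MB.
        System.out.println(pickMerge(List.of(500L, 10L, 300L, 20L, 5L), 3));
    }
}
```

With the hypothetical sizes in `main`, the sketch selects the 500 MB segment together with the 5 MB and 10 MB segments, leaving the 300 MB segment out of the merge. A policy that favors low skew would instead tend to group segments of similar size.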
Experiments


## lucene_vector_merge_policy2.md

      
mayya-sharipova / lucene_vector_merge_policy2.md, last active October 16, 2023 13:38

Vectors Merge Policy

Investigation on merge policy for vectors

Merge Policies:

- baseline: current TieredMergePolicy on Lucene main (floor_segment_size=2MB, maxMergeAtOnce=10, segsPerTier=10)
- candidate1: the same TieredMergePolicy with floor_segment_size=200MB
- candidate2: the same TieredMergePolicy with floor_segment_size=200MB, maxMergeAtOnce=20, segsPerTier=20

Experiment 1

- 10M vectors of 100 dims
- An NRT reader is opened every second (to simulate searches running concurrently with indexing)


## LuceneAnnBenchmarks9_4.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                mayya-sharipova
                / LuceneAnnBenchmarks9_4.md
            
            
              Last active
              September 12, 2022 17:38
            
              
                Lucene ann benchmarks 9.3.1 and 9.4 comparison
              
          
- 5 GB heap
- 2 GB indexing buffer
- glove-100-angular, M: 16, efConstruction: 100

recall dropped by 3-8%; QPS dropped by 30-50%


| | 9.3 recall | 9.3 QPS | 9.4 recall | 9.4 QPS |
| --- | --- | --- | --- | --- |
| n_cands=10 | 0.620 | 2745.933 | 0.565 | 1856.128 |
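
A drop quoted in percent can mean either an absolute change or a change relative to the earlier value, and the two differ noticeably for recall. The sketch below computes both for the n_cands=10 row; the class and method names are made up for illustration, and which of the two readings the summary line uses is not stated in the source.

```java
import java.util.Locale;

/** Illustrative only: absolute vs. relative drop between two benchmark runs. */
final class AnnRegression {

    /** Absolute drop: earlier value minus later value, in the metric's own unit. */
    static double absoluteDrop(double before, double after) {
        return before - after;
    }

    /** Relative drop, as a percentage of the earlier value. */
    static double relativeDropPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Values copied from the n_cands=10 row (9.3 vs 9.4).
        System.out.printf(Locale.ROOT, "recall: -%.3f absolute, -%.1f%% relative%n",
                absoluteDrop(0.620, 0.565), relativeDropPercent(0.620, 0.565));
        System.out.printf(Locale.ROOT, "QPS: -%.1f%% relative%n",
                relativeDropPercent(2745.933, 1856.128));
    }
}
```

For this row, recall falls by 0.055 in absolute terms (about 8.9% relative to 0.620), and QPS falls by roughly 32%.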


## TestVectorsAPI.java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;
import java.util.Random;

import static jdk.incubator.vector.VectorOperators.ADD;
import static jdk.incubator.vector.FloatVector.SPECIES_256;

public class TestVectorsAPI {
  private static final int SIZE = 128;
  private static float[] v1 = new float[SIZE];

## wikimedium10m_profiler.md

      
mayya-sharipova / wikimedium10m_profiler.md, last active March 30, 2021 20:59

wikimedium10m profiler

baseline

Profiler for cpu:
PROFILE SUMMARY from 921352 events (total: 921352)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
12.85%        118371        java.io.BufferedOutputStream#write()

  
## wikimedium1m_profiler.md

      
mayya-sharipova / wikimedium1m_profiler.md, last active March 30, 2021 20:00

wikimedium1m profiler

baseline

Profiler for cpu:
PROFILE SUMMARY from 107744 events (total: 107744)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK