@dakrone

dakrone/wiki.md Secret

Created December 27, 2013 23:07

Numbers for wiki1000k-noop

Wikipedia text field, 1 million documents, no optimizations

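| totalTermBytes | terms.size() | estimation time (ms) |
|---|---|---|
| 4031222 | 509207 | 158 |
| 14426761 | 1665389 | 309 |
| 14773268 | 1709906 | 176 |
| 3700396 | 470154 | 28 |
| 3885451 | 488270 | 28 |
| 3833333 | 484665 | 30 |
| 1287291 | 167984 | 12 |
| 833056 | 111611 | 8 |
| 803728 | 106971 | 6 |
| 189648 | 26123 | 4 |
| 757215 | 100871 | 9 |
| 781669 | 104058 | 9 |
| 786072 | 104116 | 9 |
| 168621 | 23471 | 2 |
| 195466 | 26391 | 2 |
| 173945 | 24039 | 2 |
| 183806 | 25080 | 2 |
| 192048 | 26788 | 1 |
| 169506 | 23552 | 2 |
| 696865 | 93460 | 9 |
| 186574 | 25934 | 2 |
| 214869 | 29119 | 2 |
| 171981 | 24243 | 2 |
| 152400 | 21872 | 1 |
| 172544 | 24284 | 3 |
| 679587 | 91024 | 5 |
| 365390 | 49862 | 4 |
| 265914 | 36686 | 4 |
| 273236 | 37695 | 4 |
| 275688 | 37427 | 3 |
| 760938 | 102103 | 6 |
| 186239 | 25969 | 3 |
| 188341 | 25745 | 1 |
| 208884 | 29088 | 1 |
| 179485 | 25380 | 2 |
| 185630 | 26237 | 2 |
| 183757 | 25840 | 10 |
| 179777 | 24806 | 2 |
| 189111 | 26163 | 3 |
| 165141 | 23144 | 2 |
| 282407 | 39052 | 2 |
| 165932 | 22804 | 2 |
| 149672 | 20983 | 1 |
| 172189 | 23997 | 1 |
| 164196 | 22968 | 5 |
| 174007 | 24038 | 7 |
| 695856 | 94745 | 20 |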

dakrone commented Dec 27, 2013

The estimation times above sum to 906 ms; the index is 2.4 GB.
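The 906 ms figure can be reproduced by adding up the estimation-time column of the wiki1000k-noop table. A minimal sketch, with the values copied by hand from that table; the variable names are illustrative only:

```python
# Estimation times (ms), one per row of the wiki1000k-noop table, in table order.
estimation_ms = [
    158, 309, 176, 28, 28, 30, 12, 8, 6, 4,
    9, 9, 9, 2, 2, 2, 2, 1, 2, 9,
    2, 2, 2, 1, 3, 5, 4, 4, 4, 3,
    6, 3, 1, 1, 2, 2, 10, 2, 3, 2,
    2, 2, 1, 1, 5, 7, 20,
]

row_count = len(estimation_ms)   # number of rows in the table
total_ms = sum(estimation_ms)    # total estimation time across all rows

print(f"{row_count} rows, {total_ms} ms total")  # prints "47 rows, 906 ms total"
```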


dakrone commented Dec 27, 2013

Numbers for wiki1000k

Wikipedia text field, 1 million documents, fully optimized

| segment | totalTermBytes | terms.size() | estimation time (ms) |
|---|---|---|---|
| 1 | 24380545 | 2748911 | 935 |
| 2 | 24510473 | 2763384 | 956 |


dakrone commented Dec 27, 2013

I should mention both of these indices have 2 primary shards.


dakrone commented Dec 30, 2013

Some more numbers (total fielddata loading times):

Numbers for BlockTreeTermsReader estimations

Numbers for wiki1000k-noop

Wikipedia text field, 1 million documents, no optimizations

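| segment | totalTermBytes | terms.size() | estimation (ms) | total fd loading (ms) |
|---|---|---|---|---|
| 1 | 4031222 | 509207 | 314 | 2292 |
| 2 | 14426761 | 1665389 | 748 | 14404 |
| 3 | 14773268 | 1709906 | 717 | 15038 |
| 4 | 3700396 | 470154 | 123 | 1564 |
| 5 | 3885451 | 488270 | 111 | 1658 |
| 6 | 1287291 | 167984 | 56 | 340 |
| 7 | 3833333 | 484665 | 146 | 1709 |
| 8 | 833056 | 111611 | 24 | 189 |
| 9 | 803728 | 106971 | 31 | 175 |
| 10 | 189648 | 26123 | 9 | 82 |
| 11 | 757215 | 100871 | 25 | 178 |
| 12 | 781669 | 104058 | 28 | 170 |
| 13 | 786072 | 104116 | 32 | 189 |
| 14 | 168621 | 23471 | 8 | 27 |
| 15 | 195466 | 26391 | 7 | 27 |
| 16 | 173945 | 24039 | 5 | 18 |
| 17 | 183806 | 25080 | 7 | 28 |
| 18 | 192048 | 26788 | 23 | 47 |
| 19 | 169506 | 23552 | 6 | 23 |
| 20 | 186574 | 25934 | 4 | 27 |
| 21 | 214869 | 29119 | 7 | 23 |
| 22 | 171981 | 24243 | 7 | 27 |
| 23 | 152400 | 21872 | 6 | 20 |
| 24 | 172544 | 24284 | 5 | 51 |
| 25 | 365390 | 49862 | 21 | 75 |
| 26 | 696865 | 93460 | 25 | 131 |
| 27 | 679587 | 91024 | 25 | 108 |
| 28 | 265914 | 36686 | 10 | 49 |
| 29 | 273236 | 37695 | 6 | 43 |
| 30 | 275688 | 37427 | 12 | 50 |
| 31 | 760938 | 102103 | 30 | 119 |
| 32 | 186239 | 25969 | 8 | 38 |
| 33 | 188341 | 25745 | 7 | 24 |
| 34 | 208884 | 29088 | 7 | 27 |
| 35 | 179485 | 25380 | 9 | 40 |
| 36 | 185630 | 26237 | 9 | 51 |
| 37 | 183757 | 25840 | 9 | 32 |
| 38 | 179777 | 24806 | 8 | 33 |
| 39 | 189111 | 26163 | 6 | 42 |
| 40 | 165141 | 23144 | 9 | 28 |
| 41 | 282407 | 39052 | 10 | 47 |
| 42 | 165932 | 22804 | 6 | 25 |
| 43 | 149672 | 20983 | 5 | 19 |
| 44 | 172189 | 23997 | 8 | 26 |
| 45 | 164196 | 22968 | 4 | 21 |
| 46 | 174007 | 24038 | 6 | 25 |
| 47 | 695856 | 94745 | 28 | 171 |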

Numbers for wiki1000k

Wikipedia text field, 1 million documents, fully optimized

| segment | totalTermBytes | terms.size() | estimation (ms) | total fd loading (ms) |
|---|---|---|---|---|
| 1 | 24380545 | 2748911 | 878 | 39305 |
| 2 | 24510473 | 2763384 | 903 | 39516 |


dakrone commented Dec 30, 2013

See also these graphs:

[graph: noop]

[graph: op]

Estimation time as a percentage of total fielddata loading time goes down as the segment gets larger.
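To make that trend concrete, here is a rough sketch that computes the estimation share for a handful of rows taken from the tables above. It assumes the share is simply estimation (ms) divided by total fd loading (ms); the row labels and the `estimation_share` helper are illustrative names, not part of the original measurements.

```python
# Rows copied from the tables above:
# (label, totalTermBytes, estimation ms, total fielddata loading ms)
rows = [
    ("noop segment 14", 168621, 8, 27),
    ("noop segment 8", 833056, 24, 189),
    ("noop segment 1", 4031222, 314, 2292),
    ("noop segment 2", 14426761, 748, 14404),
    ("optimized segment 1", 24380545, 878, 39305),
]


def estimation_share(estimation_ms, loading_ms):
    """Estimation time as a percentage of total fielddata loading time (assumed definition)."""
    return 100.0 * estimation_ms / loading_ms


# Print the rows from smallest to largest segment (by totalTermBytes).
for label, term_bytes, est_ms, load_ms in sorted(rows, key=lambda r: r[1]):
    share = estimation_share(est_ms, load_ms)
    print(f"{label:<20} {term_bytes:>10} bytes  {share:5.1f}%")
```

On these five rows the share falls from roughly 30% for the smallest segment to roughly 2% for the optimized one. The small segments are noisy, though: segment 8 comes out slightly below segment 1 despite being smaller.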
