Created
June 15, 2017 17:31
[13:14] == SolrUser [6020d824@gateway/web/freenode/ip.96.32.216.36] has joined #solr
[13:14] == shellac [~textual@vpn-user-249-054.nomadic.bris.ac.uk] has joined #solr
[13:14] == clyon_ [~clyon@69-12-169-18.dedicated.static.sonic.net] has joined #solr
[13:15] <SolrUser> Hi, so I am trying to index documents using the ConcurrentUpdateSolrClient, but even with different thread counts I do not notice any change in indexing time.
[13:16] <SolrUser> is there something i am not doing right?
[13:20] <@hoss> SolrUser: it's possible that the code you have feeding docs to your SolrClient is itself the bottleneck
[13:20] <@hoss> in general, i'm not really a fan of ConcurrentUpdateSolrClient -- it doesn't do what most people expect, and its error handling mechanics are confusing
[13:20] <SolrUser> okay
[13:21] <@hoss> i haven't done extensive testing to prove it, but i'm pretty sure that with a well configured HttpClient using aggressive keep-alive options, you can get HttpSolrClient to be just as efficient
[13:21] <@hoss> where CUSC is *supposed* to be helpful is that you can "throw documents" at it w/o worrying about your client threads waiting for the response from solr
[13:22] == Guest4 [~textual@77.16.69.51.tmi.telenormobil.no] has joined #solr
[13:22] <@hoss> but as its javadocs note, it buffers docs and sends them in a single connection -- arguably to reduce network overhead of lots of connections
[13:22] <SolrUser> what would you recommend I use for threaded indexing?
[13:22] == Brig_ [4227a68a@gateway/web/freenode/ip.66.39.166.138] has joined #solr
[13:22] <@elyograg> CUSC is good for initial bulk loading where you don't care about knowing whether the indexing worked.
[13:23] <@elyograg> SolrUser: handling multiple threads yourself and using either HttpSolrClient or CloudSolrClient.
[13:23] <SolrUser> okay
[13:23] == bauruine [~bauruine@2a01:4f8:130:8285:fefe::36] has quit [Max SendQ exceeded]
[13:24] <@hoss> like i said: i suspect if you had an HttpSolrClient, and you made sure the underlying HttpClient had a high max threads per host (or whatever it's called these days) and long keep-alive options, you could probably ramp up the number of threads using a single instance and see faster indexing (until either your client CPUs or solr CPUs get saturated)
[13:24] <SolrUser> okay
[13:25] <@elyograg> Something I've been working on is a design with a central queue with multiple consumer threads pulling things off the queue, building document lists, and sending requests ... with one or more threads adding to the queue.
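[Editor's note] The queue-based design elyograg describes could look roughly like the sketch below. It is not his code, and it deliberately uses only the JDK: documents are plain strings, and `BatchSender` is a hypothetical stand-in for whatever actually talks to Solr (in real code it might wrap a shared SolrClient and send a list of SolrInputDocument objects). The names `QueueIndexer`, `indexAll`, the queue capacity of 1000, and the stop-marker shutdown scheme are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for "send this batch to Solr".
interface BatchSender {
    void send(List<String> batch) throws Exception;
}

class QueueIndexer {
    // Stop marker, one per consumer; compared by identity and never sent.
    private static final String STOP = new String("STOP");

    // Feeds docs through a bounded central queue to `consumers` worker
    // threads and returns the total number of documents handed to the sender.
    static int indexAll(List<String> docs, int consumers, int batchSize,
                        BatchSender sender) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        List<Future<Integer>> results = new ArrayList<>();
        try {
            Callable<Integer> worker = () -> consume(queue, batchSize, sender);
            for (int i = 0; i < consumers; i++) {
                results.add(pool.submit(worker));
            }
            // Producer side: here the calling thread is the only producer.
            for (String doc : docs) {
                queue.put(doc); // blocks while the queue is full
            }
            for (int i = 0; i < consumers; i++) {
                queue.put(STOP);
            }
            int sent = 0;
            for (Future<Integer> result : results) {
                sent += result.get(); // rethrows a consumer failure, if any
            }
            return sent;
        } finally {
            pool.shutdownNow();
        }
    }

    // Consumer loop: pull docs off the queue, build a list, send it when full.
    private static int consume(BlockingQueue<String> queue, int batchSize,
                               BatchSender sender) throws Exception {
        List<String> batch = new ArrayList<>(batchSize);
        int sent = 0;
        while (true) {
            String doc = queue.take();
            if (doc == STOP) {
                break;
            }
            batch.add(doc);
            if (batch.size() >= batchSize) {
                sender.send(batch);
                sent += batch.size();
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) { // flush the final partial batch
            sender.send(batch);
            sent += batch.size();
        }
        return sent;
    }
}
```

A call such as `QueueIndexer.indexAll(docs, 4, 100, batch -> { /* send to Solr */ })` would run four consumers with batches of up to 100 documents. Error handling here is minimal: a sender failure ends that consumer, so a real implementation would need to decide how to retry or report failed batches and how to keep the producer from blocking if every consumer has died.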
[13:25] <SolrUser> that would sure be great
[13:27] == Brig_ [4227a68a@gateway/web/freenode/ip.66.39.166.138] has quit [Ping timeout: 260 seconds]
[13:27] <@elyograg> like hoss says, if the HttpClient object is configured right, one SolrClient object can handle many threads.
[13:27] <SolrUser> One more question: should indexing time be constant or should it vary? I am observing huge variation in indexing times for the exact same data set on the same machine.
[13:28] <@elyograg> the defaults for HttpClient objects only allow two threads to run at the same time.
[13:29] <SolrUser> hmm, interesting, I didn't know that
[13:30] <@elyograg> if you're not committing as part of the update, times should be somewhat similar, but variation can happen. If you're committing, it could be extremely variable.
[13:30] == bauruine [~bauruine@2a01:4f8:130:8285:fefe::36] has joined #solr
[13:31] <SolrUser> Hmm.