Gist by Amit88k (acc541a068d4916d87e58b5f646a64c3), last active January 5, 2023 21:17
Indexing MongoDB data in Apache Solr
Indexing data for fast and efficient retrieval is an important feature that most applications require. I have used MongoDB to store the data and Apache Solr to index it. Although MongoDB provides built-in full-text search, it does not offer the advanced indexing and search features that Solr does.
I have used a Linux environment for this Gist. The following software is required:
- Java
- Python
- Apache Solr
- MongoDB
- Mongo Connector
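Install JDK 8:
# sudo yum install java-1.8.0-openjdk-devel
Set JAVA_HOME / PATH for a single user
Log in to your account and open the .bash_profile file:
# vi ~/.bash_profile
Set JAVA_HOME to the JDK installation directory (not to the java binary itself) using the syntax export JAVA_HOME=<path-to-jdk>. To find the directory, run # readlink -f $(which javac) and drop the trailing /bin/javac; with the yum package it is usually under /usr/lib/jvm/. For example, if the java binary is /usr/java/java-1.8.0-openjdk/bin/java, set:
export JAVA_HOME=/usr/java/java-1.8.0-openjdk
export PATH=$JAVA_HOME/bin:$PATH
Save the file and reload it:
# source ~/.bash_profile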
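Python is one of the prerequisites listed above, so the small checks in this Gist are written in Python. This one is my own hypothetical helper (not part of any tool used here) that tests whether JAVA_HOME looks like a JDK directory rather than the path of the java binary. It assumes the JDK layout has java and javac under bin/, which holds for the -devel package installed above.

```python
import os
import sys


def check_java_home(java_home):
    """Return a list of problems with a JAVA_HOME value; an empty list means it looks OK."""
    if not java_home:
        return ["JAVA_HOME is not set"]
    if os.path.isfile(java_home):
        # Common slip: pointing JAVA_HOME at the java binary instead of the JDK directory.
        return ["JAVA_HOME points to a file; it should be the JDK directory"]
    if not os.path.isdir(java_home):
        return ["JAVA_HOME directory does not exist: " + java_home]
    problems = []
    # A full JDK (the -devel package) provides both java and javac under bin/.
    for tool in ("java", "javac"):
        if not os.path.isfile(os.path.join(java_home, "bin", tool)):
            problems.append("missing bin/" + tool + " under " + java_home)
    return problems


if __name__ == "__main__":
    issues = check_java_home(os.environ.get("JAVA_HOME", ""))
    for issue in issues:
        print("problem:", issue, file=sys.stderr)
    if not issues:
        print("JAVA_HOME looks like a JDK directory")
```

Run it in a new shell after reloading .bash_profile; any reported problem points at the export line that needs fixing.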
Install Python on the Linux machine:
# yum install -y python36u
Get the Python path:
# which python3.6
To set up a global configuration for all users, edit /etc/profile or /etc/bash.bashrc:
# vi /etc/profile
Add Python's directory to PATH, just like the PATH / JAVA_HOME variables above. PATH entries must be directories, so use the directory part of the path returned by which (for example /usr/bin for /usr/bin/python3.6; /usr/bin is usually on PATH already, in which case this step is not needed):
export PATH=$PATH:/usr/bin
Save the changes and reload the profile:
# source /etc/profile
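PATH is a colon-separated list of directories; the shell looks for each command inside those directories, so an entry that points at a file such as /usr/bin/python3.6 never matches anything. A minimal sketch (the helper names are hypothetical) that flags such entries and shows which directory actually belongs on PATH:

```python
import os
import shutil
import sys


def file_entries_in_path(path_value):
    """Return PATH entries that point to regular files instead of directories."""
    entries = [p for p in path_value.split(os.pathsep) if p]
    return [p for p in entries if os.path.isfile(p)]


def dir_for_executable(name):
    """Return the directory holding `name` on the current PATH, or None if not found."""
    found = shutil.which(name)
    return os.path.dirname(found) if found else None


if __name__ == "__main__":
    for entry in file_entries_in_path(os.environ.get("PATH", "")):
        print("not a directory, ignored by the shell:", entry, file=sys.stderr)
    print("python3.6 lives in:", dir_for_executable("python3.6"))
```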
Create a directory /solr to install the software and make it writable by all users (777) using the following commands:
# mkdir /solr (I generally prefer to do this in /opt)
# chmod 777 /solr
# cd /solr
Download and extract the Apache Solr tgz file:
# wget https://archive.apache.org/dist/lucene/solr/5.3.1/solr-5.3.1.tgz
# tar -xvf solr-5.3.1.tgz
NOTE: You can download other versions of Solr from here: https://archive.apache.org/dist/lucene/solr/
Go to the solr-5.3.1 directory:
# cd solr-5.3.1
Start the Solr server:
# ./bin/solr start
Check the status of the Solr server:
# ./bin/solr status
Create a Solr core named wlslog:
# ./bin/solr create -c wlslog
Next, we need to configure Apache Solr. The fields of the MongoDB documents to be indexed are specified in the schema.xml configuration file. Open schema.xml in the vi editor:
# vi /solr/solr-5.3.1/server/solr/wlslog/conf/schema.xml
Add the fields time_stamp, category, type, servername, code, and msg. Mongo Connector also stores metadata associated with each MongoDB document it indexes in the fields ns and _ts, so add the ns and _ts fields to schema.xml as well.
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <field name="time_stamp" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="category" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="type" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="servername" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="code" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="msg" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="_ts" type="long" indexed="true" stored="true" />
  <field name="ns" type="string" indexed="true" stored="true" />
  <field name="_version_" type="long" indexed="true" stored="true" />
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <uniqueKey>id</uniqueKey>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" />
  <!-- _ts and _version_ use type "long", so it must be declared as well -->
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" />
</schema>
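Solr refuses to load a core whose schema refers to a field type that no `<fieldType>` declares (for example, fields of type long when only string and boolean are declared). A small, hypothetical pre-flight check using Python's standard xml.etree.ElementTree can catch this before restarting Solr; it only compares type names and does not validate anything else in the schema.

```python
import xml.etree.ElementTree as ET


def undefined_field_types(schema_xml):
    """Return sorted (field name, type) pairs whose type has no <fieldType> declaration."""
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so wrapper elements such as <fields>/<types> are handled too.
    declared = {ft.get("name") for ft in root.iter("fieldType")}
    return sorted(
        (field.get("name"), field.get("type"))
        for field in root.iter("field")
        if field.get("type") not in declared
    )
```

For example, pass it the contents of /solr/solr-5.3.1/server/solr/wlslog/conf/schema.xml; an empty list means every field type used by a field is declared.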
Fields not defined in schema.xml are not indexed. We also need to configure the org.apache.solr.handler.admin.LukeRequestHandler request handler in solrconfig.xml; Mongo Connector uses it to read the list of fields defined in the schema. Requests to the Solr server are routed through request handlers. Open solrconfig.xml in the vi editor:
# vi /solr/solr-5.3.1/server/solr/wlslog/conf/solrconfig.xml
Specify the request handler for Mongo Connector:
<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
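To see what Mongo Connector can read from this handler, the Luke endpoint can be queried directly once Solr has been restarted. The sketch below is an illustration, not mongo-connector's code: it builds an /admin/luke?show=schema&wt=json URL and reads field names from the "fields" map inside the "schema" object of the JSON response. That response shape is an assumption here; check it against your Solr version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def luke_schema_url(core_url):
    """Build the Luke URL that returns schema information as JSON."""
    return core_url.rstrip("/") + "/admin/luke?" + urlencode({"show": "schema", "wt": "json"})


def schema_field_names(luke_json):
    """Extract explicitly defined field names from a Luke show=schema response (assumed shape)."""
    return sorted(luke_json.get("schema", {}).get("fields", {}))


def fetch_schema_fields(core_url, timeout=10):
    """Network call: query a running Solr core and return its schema field names."""
    with urlopen(luke_schema_url(core_url), timeout=timeout) as resp:
        return schema_field_names(json.load(resp))
```

With the wlslog core running, fetch_schema_fields("http://localhost:8983/solr/wlslog") should list the fields declared in schema.xml.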
Also enable auto commit, so that Solr automatically commits the data coming from MongoDB after the configured interval. The <autoCommit> element goes inside <updateHandler>; maxTime is in milliseconds (15000 = 15 seconds), and openSearcher=true makes each automatic commit open a new searcher, so the committed documents become visible to queries.
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
After modifying schema.xml and solrconfig.xml, restart the Solr server:
# ./bin/solr restart
NOTE: I'll add the complete schema.xml and solrconfig.xml files in a separate section.
1. Configure the package management system (yum).
Create a /etc/yum.repos.d/mongodb-org-4.2.repo file so that you can install MongoDB directly using yum:
[mongodb-org-4.2]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/4.2/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-4.2.asc
2. Install the MongoDB packages:
# sudo yum install -y mongodb-org
3. Create a directory for the dbpath (-p also creates /data if it does not exist):
# sudo mkdir -p /data/db
4. Start MongoDB as a member of the replica set rs0. Mongo Connector reads changes from the oplog, which only exists on replica set members, so --replSet is required:
# mongod --port 27017 --dbpath /data/db --replSet rs0 --bind_ip localhost,BigData
Note: Here BigData is the host name.
5. Start the mongo shell:
# mongo
6. Once the MongoDB shell has started, initiate the replica set with the following command:
# rs.initiate()
In this section we shall create a MongoDB collection to store documents.
1. Switch to the MongoDB database “solr”. The database does not need to exist beforehand; MongoDB creates it implicitly when data is first stored in it.
# use solr
2. We store the MongoDB documents in a collection called “wlslog”. First, check whether the collection already exists and contains documents with the following command:
# db.wlslog.find()
3. If any documents are listed, the wlslog collection exists and has documents in it. Drop it with the following command:
# db.wlslog.drop()
4. Create the wlslog collection again with the following command:
# db.createCollection("wlslog")
In this section, we shall add some documents to the wlslog collection. Create JSON for three documents doc1, doc2, and doc3:
1. doc1 = {"time_stamp":"Apr-8-2014-7:06:16-PM-PDT","category": "Notice","type":"WebLogicServer",
   "servername": "AdminServer","code":"BEA-000365","msg": "Server state changed to STANDBY" }
2. doc2 = {"time_stamp":"Apr-8-2014-7:06:17-PM-PDT","category": "Notice","type":"WebLogicServer",
   "servername": "AdminServer","code":"BEA-000365","msg": "Server state changed to STARTING" }
3. doc3 = {"time_stamp":"Apr-8-2014-7:06:18-PM-PDT","category": "Notice","type":"WebLogicServer",
   "servername": "AdminServer","code":"BEA-000360","msg": "Server started in RUNNING mode" }
The three document variables are now defined in the shell.
4. Add the three documents to MongoDB with the following command:
# db.wlslog.insert([doc1,doc2,doc3])
The nInserted value of 3 in the result indicates that all three documents have been added.
5. Query the wlslog collection with the find() method:
# db.wlslog.find()
The three documents are listed.
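Once Mongo Connector (set up in the next section) replicates these documents, each one becomes a Solr document. The following is a simplified, hypothetical illustration of that mapping, not mongo-connector's actual implementation: MongoDB's _id becomes the unique key id, ns and _ts carry the connector metadata, and fields that schema.xml does not define are dropped. The _id and _ts values below are made-up placeholders; MongoDB generates the real _id, and the connector derives the real _ts from the oplog.

```python
# Simplified illustration only: NOT mongo-connector's code, just the idea of the mapping.
SCHEMA_FIELDS = {"id", "ns", "_ts", "time_stamp", "category", "type",
                 "servername", "code", "msg"}


def to_solr_doc(mongo_doc, namespace, ts, schema_fields=SCHEMA_FIELDS):
    """Map a MongoDB document to the flat dict that would be sent to Solr."""
    solr_doc = {"id": str(mongo_doc["_id"]), "ns": namespace, "_ts": ts}
    for key, value in mongo_doc.items():
        # Keep only fields the Solr schema defines; _id is already mapped to id.
        if key != "_id" and key in schema_fields:
            solr_doc[key] = value
    return solr_doc


# doc1 from above, with a hypothetical placeholder _id.
doc1 = {"_id": "hypothetical-object-id", "time_stamp": "Apr-8-2014-7:06:16-PM-PDT",
        "category": "Notice", "type": "WebLogicServer", "servername": "AdminServer",
        "code": "BEA-000365", "msg": "Server state changed to STANDBY"}

if __name__ == "__main__":
    print(to_solr_doc(doc1, "solr.wlslog", ts=0))
```

The namespace "solr.wlslog" is the database.collection pair used throughout this Gist.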
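1. To install Mongo Connector, run the following command:
# pip install mongo-connector
2. New in mongo-connector 2.5.0: to install mongo-connector together with the solr-doc-manager, run:
# pip install 'mongo-connector[solr]'
3. Run mongo-connector. Here -m is the MongoDB address, -t the URL of the Solr core, -n the namespace (database.collection) to replicate, -d the doc manager to use, and --unique-key=id the Solr unique key field that MongoDB's _id is mapped to:
# mongo-connector --unique-key=id -n solr.wlslog -m localhost:27017 -t http://localhost:8983/solr/wlslog -d solr_doc_manager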
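After the connector has run and the 15-second auto commit has fired, the documents should be searchable in the wlslog core. A minimal sketch using only Python's standard library to count matches through Solr's /select handler; the helper names are hypothetical, and rows=0 asks Solr for the match count without returning the documents themselves.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core_url, query="*:*", rows=10):
    """Build a /select URL that asks Solr for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(select_json):
    """Total number of matching documents reported in a /select response."""
    return select_json["response"]["numFound"]


def count_docs(core_url, query="*:*", timeout=10):
    """Network call: return how many documents in the core match `query`."""
    with urlopen(select_url(core_url, query, rows=0), timeout=timeout) as resp:
        return num_found(json.load(resp))
```

If nothing else was indexed into the core, count_docs("http://localhost:8983/solr/wlslog") should report the three wlslog documents; a field query such as code:"BEA-000365" narrows the count to the matching log entries.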