Workaround and starting guide with Elasticsearch | Bulk upload data to Elasticsearch (ES) | Fix issues & enable pagination | Add a datetime date field to an ES collection
In this gist I'm going to write up the personalized walk-through I went through with `ES` and the issues I hit while working with it.

I bought an `EC2` instance from AWS; the instance was a `t2.large`. Initially I had 8 GB of memory, which turned out to be too little, so I extended it with another 8 GB.
As a beginner it was quite tough for me to get familiar with it all.
I needed the ES stack to do fast full-text search over one of our existing, really big databases.
**Note:** Older ES setups required you to install Java and its dependencies for ES to run, but newer versions come completely packaged.
I mention this because I wasted a bunch of time reading outdated third-party tutorials on installing ES on my machine, when really it's just a matter of a few commands.
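For example, on a Debian/Ubuntu box the whole install is roughly the following (this mirrors the official 7.x apt instructions; swap in whatever version is current):

```sh
# Add Elastic's signing key and the 7.x apt repo, then install the package.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update && sudo apt-get install elasticsearch

# Java ships inside the package, so there is nothing else to install.
sudo systemctl start elasticsearch
```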
So, the first step was to get the data and then import it into ES. The problem was that I could only get the data as CSV, which I then tried to import
through Kibana. Kibana could import it, but things kept breaking and it wasn't good enough.
So what I did was convert it into a `JSON` structure and try to upload that JSON data through Kibana. Again it didn't work.
Then, after a whole lot of searching, I realized that I needed to convert it into `NDJSON` format, with additional `index` action lines for ES.
The steps involved for me are below.
# Convert data from CSV to JSON
This task was pretty easy. I used Node.js with the `csv-parser` library to read the CSV file. One more thing to remember:
if you ever want to load big files (CSV, JSON), load them as a stream; the `fs` lib has a built-in method for that. This way you'll
be able to load millions of rows in no time. So that's how I loaded my CSV and converted it to JSON. Or you can use any third-party
online service to convert CSV into JSON.
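A minimal sketch of that conversion (assuming `csv-parser` is installed via npm, and that the files are called `input.csv`/`output.json`; adjust to your setup):

```js
const fs = require('fs');
const csv = require('csv-parser');

const rows = [];
fs.createReadStream('input.csv')        // stream the file instead of loading it whole
  .pipe(csv())                          // csv-parser emits one object per CSV row
  .on('data', (row) => rows.push(row))
  .on('end', () => {
    // Write a plain JSON array; jq turns it into NDJSON in the next step.
    fs.writeFileSync('output.json', JSON.stringify(rows));
    console.log(`converted ${rows.length} rows`);
  });
```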
# Convert JSON to NDJSON and format the NDJSON file for ES
This is a super easy task, but if you're just starting out with the ES stack and these things, you may end up going in circles.
We used a tool called `jq`; it works on Windows and Linux. This tool is really powerful and intelligent at the same time.
With one line you'll get what you need; the command is:
`jq -c -r ".[]" input.json | while read line; do echo '{"index": {"_index":"index_name","_type":"document_name"}}'; echo $line; done > output.json`
- `-c`: compact instead of pretty-printed output
- `-r`: output raw strings, not JSON texts
- `.[]`: emit each element of the input array, one per line
The loop then reads the output line by line and prints the action line `{"index": {"_index":"index_name","_type":"document_name"}}` before each document,
which is the format ES's bulk API expects. ES requires the index name; you can add `_type` as well, but it's optional (and, as you'll see below, it caused me trouble).
You need to provide the locations and names of the input/output files where the modified results are stored.
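For illustration, the resulting file alternates action and document lines like this (the document fields here are made up):

```
{"index": {"_index":"index_name","_type":"document_name"}}
{"id":1,"title":"first doc","created_at":1630580340000}
{"index": {"_index":"index_name","_type":"document_name"}}
{"id":2,"title":"second doc","created_at":1630580400000}
```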
That's it~
# Upload NDJSON file to ES through HTTP request
Now it's time to upload the file through an HTTP request. Unfortunately the Kibana data uploader is still not mature enough, so we need to
do this through an HTTP request and upload the file as a binary body directly with `curl`.
Here's how:
`curl -s -H "Content-Type: application/x-ndjson" -XPOST server_ip:port/_bulk --data-binary @input_file.json`
This is the command to upload the binary file directly through `curl`. Once the upload starts you'll see logs; if there are no errors
in the log, you can assume the data was uploaded successfully.
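Rather than eyeballing the logs, you can also pipe the response through `jq` and check the bulk API's `errors` flag; `false` means every item was indexed (same `server_ip:port` placeholder as above):

```sh
curl -s -H "Content-Type: application/x-ndjson" -XPOST server_ip:port/_bulk --data-binary @input_file.json | jq '.errors'
```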
# Pagination through data
ES doesn't allow you to load more than 10k documents through a single request; in order to load more data you need to go through pagination.
For pagination, I found there's a trick you need to follow in order to get this working.
You will need a `date` field; otherwise you won't be able to do the pagination.
I found that getting a date field from an NDJSON file into ES correctly is a bit tricky,
and it took me hours. It felt weird.
**A few attempts and issues:**
1. I tried to provide a millisecond timestamp through a field and mapped the target field with the `date` type. When I imported the data
into ES, it showed the field type as `long`, which is weird.
2. I thought the issue could be that I hadn't provided a `format`, so I tried again and passed `epoch_millis`. It still didn't work;
I'm not sure what the error was, or whether the type was still `long`, but it didn't work.
3. I tried changing the date from a millisecond timestamp to a regular date string, and added a format that matched the NDJSON data.
Again all I got was an error, some `illegal_argument` thing about the date field, and I couldn't insert the data.
4. I tried a few more variations, and then someone gave me a hint not to put `_type` in the JSON file. So I removed the `_type`
field from the JSON data and kept only `_index`, set the mapping's date field to `"format": "strict_date_optional_time||epoch_millis"`,
and reverted my JSON field back to a millisecond timestamp. Then I did the bulk upload again and it worked! (See the mapping sketch after this list.)
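For reference, a sketch of what the working mapping looked like in my case (`index_name` and the `created_at` field name are placeholders for your own):

```sh
curl -s -H "Content-Type: application/json" -XPUT server_ip:port/index_name -d '
{
  "mappings": {
    "properties": {
      "created_at": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}'
```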
Again, having a datetime field is very important; without it you cannot do pagination, at least in my experience. So make sure you have
a date field in order to do that.
If you want to learn about ES data pagination, you can follow this: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after
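As a quick taste of the `search_after` approach from that page: sort by the date field, then feed the last hit's `sort` values into the next request (a sketch; `index_name`, `created_at`, and the timestamp are placeholders, and the very first page simply omits `search_after`):

```sh
curl -s -H "Content-Type: application/json" -XPOST server_ip:port/index_name/_search -d '
{
  "size": 1000,
  "sort": [{ "created_at": "asc" }],
  "search_after": [1630580340000],
  "query": { "match_all": {} }
}'
```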