@PaulMougel
Last active June 21, 2016 10:19
Attachment upload & indexation in Elasticsearch
# https://github.com/elasticsearch/elasticsearch-mapper-attachments
plugin install elasticsearch/elasticsearch-mapper-attachments/2.4.2
curl -X PUT http://localhost:9200/test
# Note that here we declare that the attachment is stored in the field "my_attachment"
curl -X PUT http://localhost:9200/test/pdf/_mapping -d '{"pdf": {"properties": {"my_attachment": {"type": "attachment"}}}}'
# The file has to fit in a JSON string: base64-encode the whole file as one line, with no line breaks.
# http://stackoverflow.com/a/20046414/2137601
coded=$(perl -MMIME::Base64 -0777 -ne 'print encode_base64($_, "")' my_file.pdf)
# As specified in the schema, we upload the file's content in the "my_attachment" field
# (we can also add other fields)
json="{\"my_attachment\":\"${coded}\"}"
rm -f json.file && echo "$json" > json.file
curl -X POST "localhost:9200/test/pdf" -d @json.file
curl -X GET "http://localhost:9200/test/pdf/_search"
curl -X GET "http://localhost:9200/test/pdf/_search?q=word"
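
The attachment mapper expects the field value to be one unbroken base64 string, so the encoding step is worth isolating. The following is a minimal sketch, not the gist author's method: it swaps the Perl one-liner for the `base64` command-line tool piped through `tr`, and the helper names `encode_attachment` and `build_payload` are hypothetical.

```shell
#!/bin/sh
# Sketch only: encode_attachment and build_payload are hypothetical helpers.

# Encode the whole file in one pass, then strip the line breaks that
# base64 inserts when it wraps its output.
encode_attachment() {
    base64 < "$1" | tr -d '\n'
}

# Wrap the encoded content in the "my_attachment" field declared in the mapping.
build_payload() {
    printf '{"my_attachment":"%s"}\n' "$(encode_attachment "$1")"
}

# Usage (assumes a local node and the "test" index created earlier):
#   build_payload my_file.pdf > json.file
#   curl -X POST "localhost:9200/test/pdf" -d @json.file
```

Encoding the file as a whole matters because encoding it line by line pads each chunk separately, and the concatenated result is no longer valid base64 for the original bytes.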
@Tamanna-Sharma

Hi,
I am trying to index a PDF into ES using the code above.
However, at line 8 of 1-file-upload.sh, I am getting the following error:

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal white space character (code 0x20) as character #4 of 4-char base64 unit: can only used between units\n at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput@30431766; line: 1, column: 23]"}},"status":400}

The logs show the following error:
[2016-06-21 10:13:41,176][INFO ][rest.suppressed ] /test/pdf Params: {index=test, type=pdf}
MapperParsingException[failed to parse]; nested: JsonParseException[Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal white space character (code 0x20) as character #4 of 4-char base64 unit: can only used between units at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput@30431766; line: 1, column: 23]];
    at org.elasticsearch.index.mapper.DocumentParser.innerParseDocument(DocumentParser.java:159)
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:79)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:304)
    at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:517)
    at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:508)
    at org.elasticsearch.action.support.replication.TransportReplicationAction.prepareIndexOperationOnPrimary(TransportReplicationAction.java:1053)
    at org.elasticsearch.action.support.replication.TransportReplicationAction.executeIndexRequestOnPrimary(TransportReplicationAction.java:1061)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:170)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:579)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.fasterxml.jackson.core.JsonParseException: Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal white space character (code 0x20) as character #4 of 4-char base64 unit: can only used between units at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput@30431766; line: 1, column: 23]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1581)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getBinaryValue(UTF8StreamJsonParser.java:486)
    at com.fasterxml.jackson.core.JsonParser.getBinaryValue(JsonParser.java:1225)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.binaryValue(JsonXContentParser.java:190)
    at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:441)
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:314)
    at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:441)
    at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:267)
    at org.elasticsearch.index.mapper.DocumentParser.innerParseDocument(DocumentParser.java:127)
    ... 13 more

I am using ES 2.1.2 and Mapper-Attachment 3.1.2.
Please let me know what can be done to resolve this error.

Thanks in advance.
