- A keyword query appears to behave as an OR search: every word is treated as a separate search term, with OR operators strung between them. For example, searching for 2015-Annual-Report actually searches for "2015 OR Annual OR Report", and the results contain anything named with "2015" or "Annual" or "Report".
- A phrase query gives you more boolean control: searching for 2015-Annual-Report without quotation marks is still treated as an OR search as above, while "2015-Annual-Report" (with quotation marks) is treated as an exact search. This works for parts of names as well as full names.
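To make the difference between the two modes concrete, here is a minimal Python sketch that models the behaviour observed above. It is not Archivematica's or Elasticsearch's actual implementation; the tokenizer (split on anything that is not a letter or digit) and the function names are assumptions chosen only to reproduce the observed results.

```python
import re

def tokenize(text):
    """Assumed tokenizer: split on non-alphanumerics and lowercase the parts."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text.lower()) if t]

def keyword_match(query, name):
    """OR semantics: match if the name shares at least one token with the query."""
    return bool(set(tokenize(query)) & set(tokenize(name)))

def phrase_match(query, name):
    """Phrase semantics: query tokens must appear in the name consecutively, in order."""
    q, n = tokenize(query), tokenize(name)
    if not q:
        return False
    return any(n[i:i + len(q)] == q for i in range(len(n) - len(q) + 1))

# Hypothetical item names, used only for illustration.
names = ["2015-Annual-Report", "2014-Annual-Budget", "Report-2015-draft", "Minutes"]
keyword_hits = [n for n in names if keyword_match("2015-Annual-Report", n)]
phrase_hits = [n for n in names if phrase_match('"2015-Annual-Report"', n)]
print(keyword_hits)
print(phrase_hits)
```

In this model the keyword query returns every name containing any of the three words, while the quoted phrase returns only the name containing all three in sequence. A partial phrase such as "Annual-Report" also matches 2015-Annual-Report, which is consistent with the note that phrase search works for parts of names.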
To review what should be indexed by Elasticsearch, you can click on "view raw" for any search result. If you conduct any search (including a wildcard search), the words "view raw" will appear in parentheses next to the AIP or item name. Clicking on this displays the JSON file that underlies the search. Everything in the JSON file should, I think, be searchable - but it's not.
Below there are two JSON files - one for an AIP (aip-search.json) and one for an object within the AIP (landing-zone-search.json; this is the landing-zone image). Searching for the term Peters in each file reveals that the term is present in the AIP-level JSON, but not the object-level JSON. The object-level JSON contains descriptive metadata from a different item in the AIP (the marbles image). As a result, searching for any of the descriptive metadata associated with landing-zone fails, and searching for descriptive metadata for marbles turns up landing-zone as a search result as well.
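The manual check described above (looking for a term in each "view raw" JSON file) can be scripted. The following is a rough sketch, assuming the two files have been saved locally under the names given above; the helper names `find_term` and `term_paths_in_file` are hypothetical.

```python
import json

def find_term(node, term, path="$"):
    """Yield the JSON path of every string value that contains term (case-insensitive)."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_term(value, term, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_term(value, term, f"{path}[{i}]")
    elif isinstance(node, str) and term.lower() in node.lower():
        yield path

def term_paths_in_file(filename, term):
    """Load one saved 'view raw' JSON file and list where the term occurs in it."""
    with open(filename, encoding="utf-8") as fh:
        return list(find_term(json.load(fh), term))

# Example usage, assuming both files are in the current directory:
# print(term_paths_in_file("aip-search.json", "Peters"))
# print(term_paths_in_file("landing-zone-search.json", "Peters"))
```

An empty list for the object-level file and a non-empty list for the AIP-level file would reproduce the mismatch described above, and the returned paths show which field holds the term.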
A further test seems to suggest that the descriptive metadata from the last row of the CSV is indexed for every item. After running the transfer SampleData/DemoTransfer, the JSON for each object contains the same dmdSec:
"dmdSec": {
    "ns0:xmlData_dict_list": [
        {
            "@xmlns:dc": "http://purl.org/dc/elements/1.1/",
            "@xmlns:ns0": "http://www.loc.gov/METS/",
            "@xmlns:ns1": "http://purl.org/dc/terms/",
            "@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
            "ns1:dublincore_dict_list": [
                {
                    "@xsi:schemaLocation": "http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd",
                    "dc:contributor": null,
                    "dc:coverage": null,
                    "dc:creator": "Tesseract",
                    "dc:date": null,
                    "dc:description": "This image was retrieved from the Tesseract wiki (https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage) to test optical character recognition in Archivematica.",
                    "dc:format": null,
                    "dc:identifier": null,
                    "dc:language_unicode_list": [
                        "en",
                        "de"
                    ],
                    "dc:publisher": null,
                    "dc:relation": null,
                    "dc:rights": null,
                    "dc:source": "Tesseract project",
                    "dc:subject_NoneType_list": [
                        null,
                        null,
                        null,
                        null,
                        null
                    ],
                    "dc:title": "OCR image",
                    "dc:type": null
                }
            ]
        }
    ]
}
This is the descriptive metadata for an OCR image provided by the Tesseract project. Checking the metadata.csv file for the transfer confirms that the OCR image is the last row.
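The same comparison could be automated across all objects in a transfer. The sketch below is only an illustration: it assumes the object-level JSON documents have already been loaded into Python, that each contains a "dmdSec" key somewhere in its structure with a "dc:title" inside it (as in the fragment above), and that metadata.csv has a title column named dc.title. The helper names `find_key` and `diagnose` are hypothetical.

```python
import csv
import io
import json

def find_key(node, key):
    """Return the first non-null value stored under key anywhere in nested JSON, else None."""
    if isinstance(node, dict):
        if node.get(key) is not None:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

def diagnose(docs, csv_text, title_column="dc.title"):
    """Check whether every object carries the same dmdSec and whether it is the last CSV row's."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    last_title = rows[-1][title_column] if rows else None
    dmdsecs = [find_key(d, "dmdSec") for d in docs]
    fingerprints = {json.dumps(d, sort_keys=True) for d in dmdsecs}
    titles = [find_key(d, "dc:title") for d in dmdsecs]
    return {
        # True when several objects share one identical dmdSec
        "all_identical": len(docs) > 1 and len(fingerprints) == 1,
        # True when every indexed title equals the title in the CSV's last row
        "matches_last_row": bool(titles) and all(t == last_title for t in titles),
    }
```

If both values come back true for a transfer whose CSV rows describe different items, that would support the reading that only the last row's metadata is being indexed for every object.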
- File UUID: searching for file UUID (with or without File UUID filter) works
- File path: searching for a file path works if the path is enclosed in quotes and phrase search is used (but only with "Any" selected)
- File extension: search works (e.g. .jpg) as keyword or phrase
- AIP UUID: searching for AIP UUID works
- AIP name: search works, but looks for an exact match only
- Identifiers:
- Part of AIC: works fine as long as you use "AIC#" before the number
- AIC identifier: works fine as long as you use "AIC#" before the number
- Transfer metadata: this is metadata from the disk image metadata form; works
- Transfer metadata (other): not sure what this means