@alanorth
Last active December 7, 2022 15:38
Example of using the DSpace 6.x REST API's find-by-metadata-field endpoint.

Step 1

POST a JSON object with your search terms:

$ curl -s -f -H "Content-Type: application/json" -X POST \
    "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field?expand=bitstreams" \
    -d '{"key":"dcterms.subject", "value":"climate variability","language": "en_US"}' | \
    python -m json.tool
...

A few notes:

  • You must know the exact name of the metadata field, but this should be easy to find.
  • The language is almost always en_US. This is an internal DSpace thing: in theory metadata can be translated, but we rarely do that.
  • The expand option is a feature of the REST API that lets you include more information about each result. In this case I expand only the bitstreams (the less you expand, the faster and smaller the response). To expand several things at once, separate them with commas: ?expand=metadata,bitstreams.
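
For scripting, the same request can be put together in Python. This is only a sketch using the standard library: build_find_request is a made-up helper name, and it builds the request without sending it, so you can inspect the URL and JSON body first.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_find_request(base_url, key, value, language="en_US", expand=("bitstreams",)):
    """Build (but do not send) a POST request for find-by-metadata-field.

    build_find_request is a hypothetical helper, not part of DSpace.
    """
    url = f"{base_url}/rest/items/find-by-metadata-field"
    if expand:
        # Keep the comma literal so the URL reads ?expand=metadata,bitstreams
        url += "?" + urlencode({"expand": ",".join(expand)}, safe=",")
    body = json.dumps({"key": key, "value": value, "language": language})
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_find_request(
    "https://dspacetest.cgiar.org",
    "dcterms.subject",
    "climate variability",
    expand=("metadata", "bitstreams"),
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually send it, pass req to urllib.request.urlopen() and parse the response with json.load().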

Step 2

Iterate over the item responses and find the retrieveLink for each item's bitstream:

      {
         "bundleName" : "THUMBNAIL",
         "checkSum" : {
            "checkSumAlgorithm" : "MD5",
            "value" : "a619e57f0824c5c023a65e0425b9757d"
         },
         "description" : "IM Thumbnail",
         "expand" : [
            "parent",
            "policies",
            "all"
         ],
         "format" : "JPEG",
         "handle" : null,
         "link" : "/rest/bitstreams/8cb37eee-cdc7-48be-9123-4b78777bf12d",
         "mimeType" : "image/jpeg",
         "name" : "OutcomesCaseStudySummary-CCAFS-P269-OICS2041.pdf.jpg",
         "parentObject" : null,
         "policies" : null,
         "retrieveLink" : "/rest/bitstreams/8cb37eee-cdc7-48be-9123-4b78777bf12d/retrieve",
         "sequenceId" : 4,
         "sizeBytes" : 23739,
         "type" : "bitstream",
         "uuid" : "8cb37eee-cdc7-48be-9123-4b78777bf12d"
      },
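
A rough Python sketch of that iteration follows. It assumes the response is a list of items and that, with expand=bitstreams, each item carries its bitstreams under a "bitstreams" key. The helper name, the sample handle, and the second sample bitstream are made up for illustration.

```python
def retrieve_links(items, base_url="", bundle=None):
    """Collect the retrieveLink of every bitstream in a list of items.

    Assumes each item has a "bitstreams" list (it may be missing or null
    if the request did not use expand=bitstreams). If bundle is given,
    only bitstreams whose bundleName matches are kept.
    """
    links = []
    for item in items:
        for bitstream in item.get("bitstreams") or []:
            if bundle is not None and bitstream.get("bundleName") != bundle:
                continue
            links.append(base_url + bitstream["retrieveLink"])
    return links


# Hypothetical response: the first bitstream is the one shown above,
# the second one and the handle are invented for the example.
sample_items = [
    {
        "handle": "hypothetical/1",
        "bitstreams": [
            {
                "bundleName": "THUMBNAIL",
                "retrieveLink": "/rest/bitstreams/8cb37eee-cdc7-48be-9123-4b78777bf12d/retrieve",
            },
            {
                "bundleName": "ORIGINAL",
                "retrieveLink": "/rest/bitstreams/hypothetical-uuid/retrieve",
            },
        ],
    },
    {"handle": "hypothetical/2", "bitstreams": None},
]

for link in retrieve_links(sample_items, "https://dspacetest.cgiar.org", bundle="THUMBNAIL"):
    print(link)
```

Filtering on bundleName is optional; leave bundle as None to get every bitstream's link.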

Step 3

Profit.

[Image: OutcomesCaseStudySummary-CCAFS-P269-OICS2041 pdf]

Links

@marieALaporte

@alanorth for my use case, I would need to pass an array of keywords (dcterms.subject). I am using the code below, but I am getting an error. Is it even possible to retrieve entries using several metadata fields?

import requests

endpoint = 'https://dspacetest.cgiar.org'

headers = {
    'Content-Type': 'application/json',
}

# Attempt to send several search terms at once as a JSON array
json_data = [
    {'key': 'dcterms.subject', 'value': 'climate change', 'language': 'en_US'},
    {'key': 'dcterms.subject', 'value': 'agriculture', 'language': 'en_US'},
    {'key': 'dcterms.subject', 'value': 'food security', 'language': 'en_US'},
]

response = requests.post(
    endpoint + '/rest/items/find-by-metadata-field?expand=bitstreams',
    headers=headers,
    json=json_data,
)

@alanorth

alanorth commented Dec 7, 2022

Ah yes, you can only search for one term, and you can't use wildcards. :(

@marieALaporte

That might be a real blocker for what we are trying to do.
This type of search is possible through the advanced filters feature on the interface. How hard do you think it would be to make that available through the API? There might be good reasons why it is not available, but for the use cases at CGIAR, where we use a repository and build tools on top of it, it could come in really handy.

@alanorth

alanorth commented Dec 7, 2022

Hmm yeah that's not possible in the DSpace 6.x REST API. You'll have to search each term separately and keep the results in a list, then iterate over each one to get the bitstreams for each handle.
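
Something like the sketch below could do the "search each term separately" part. It is only an illustration: search_terms is a made-up helper, and the search argument stands in for whatever function POSTs one term to find-by-metadata-field and returns the list of items. It also assumes each item has a "uuid" field. Whether you intersect the results (AND) or merge them (OR) is up to you.

```python
def search_terms(terms, search, mode="AND"):
    """Run one search per term and combine the results by item UUID.

    search is any callable taking a term and returning a list of item
    dicts (hypothetical here). mode="AND" keeps items found for every
    term, mode="OR" keeps items found for at least one term.
    """
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")

    by_uuid = {}    # first copy seen of each item, in discovery order
    uuid_sets = []  # one set of UUIDs per term
    for term in terms:
        items = search(term)
        uuid_sets.append({item["uuid"] for item in items})
        for item in items:
            by_uuid.setdefault(item["uuid"], item)

    if not uuid_sets:
        return []
    if mode == "AND":
        keep = set.intersection(*uuid_sets)
    else:
        keep = set.union(*uuid_sets)
    return [item for uuid, item in by_uuid.items() if uuid in keep]


# Stand-in for the real REST call, with invented UUIDs, so the
# sketch runs offline.
fake_index = {
    "climate change": [{"uuid": "a"}, {"uuid": "b"}],
    "agriculture": [{"uuid": "b"}, {"uuid": "c"}],
    "food security": [{"uuid": "b"}],
}


def fake_search(term):
    return fake_index.get(term, [])


terms = ["climate change", "agriculture", "food security"]
print([item["uuid"] for item in search_terms(terms, fake_search, mode="AND")])
print([item["uuid"] for item in search_terms(terms, fake_search, mode="OR")])
```

Keep in mind this makes one request per term, so it gets slow with many keywords.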

> This type of search is possible through the advanced filters feature on the interface.

Yes, true. On that note, there's a feature in DSpace called OpenSearch that is kinda using the advanced filters. You can mimic the advanced search using a query like (pasting in a code block to preserve the spaces):

https://dspacetest.cgiar.org/open-search/discover?query=subject:"climate change" AND subject:"agriculture" AND subject:"food security"&rpp=1

That will give you the results you want, but you will still have to iterate over the results and then go hit the REST API for each item to see which bitstreams it has.

Oh, also, were you expecting those keywords to be ANDed or ORed? In the query above you can probably do more complex query logic, but your mileage may vary...
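
If you build that OpenSearch URL in code, the spaces and quotes in the query need to be percent-encoded. Here is a minimal sketch with the standard library. The function name is made up, and it only handles the simple "same field, one operator" case from the query above.

```python
from urllib.parse import quote, urlencode


def build_opensearch_url(base_url, field, terms, operator="AND", rpp=10):
    """Build an OpenSearch discover URL like the one shown above.

    build_opensearch_url is a hypothetical helper. Terms are wrapped in
    double quotes as-is, so a term that itself contains a double quote
    is not handled.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    clauses = [f'{field}:"{term}"' for term in terms]
    query = f" {operator} ".join(clauses)
    # quote_via=quote encodes spaces as %20 instead of +
    params = urlencode({"query": query, "rpp": rpp}, quote_via=quote)
    return f"{base_url}/open-search/discover?{params}"


url = build_opensearch_url(
    "https://dspacetest.cgiar.org",
    "subject",
    ["climate change", "agriculture", "food security"],
    operator="AND",
    rpp=1,
)
print(url)
```

Switching operator to "OR" gives the ORed version of the same query.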
