@otuoma
Created November 27, 2018 08:29
Enable OAI and import records in DSpace
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# DSpace interoperability with OAI\n",
"\n",
"DSpace can act both as a provider of data and as a consumer of data from other repositories. Several built-in technologies make this possible:\n",
"1. **SWORD** (Simple Web-service Offering Repository Deposit) is enabled by ensuring that the sword webapp is available in the [dspace]/webapps directory. It allows content to be deposited remotely into the repository.\n",
"2. **REST** (Representational State Transfer) is a programming interface (API) that allows developers to build other applications that create, read, update and delete objects (communities, collections and documents) in DSpace.\n",
"3. **OAI-PMH** (Open Archives Initiative - Protocol for Metadata Harvesting) is a widely used protocol for metadata exchange between digital repositories.\n",
"\n",
"**A Data Provider** is a repository that exposes its metadata for harvesting via the OAI-PMH protocol.\n",
"\n",
"**A Service Provider** is a platform that harvests metadata from a data provider and makes it available to users, usually via a searchable web interface.\n",
"\n",
"DSpace is fully compliant with OAI-PMH. It can act as a data provider and as a service provider at the same time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Enabling OAI\n",
"OAI is enabled in DSpace when the oai webapp is copied to the [webapps]/oai directory. However, your existing records must be indexed before external harvesters can discover them.\n",
"\n",
"Ensure that solr.server and the other relevant settings in the [dspace]/config/local.cfg file are correctly configured:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"solr.server = http://localhost/solr"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dspace.hostname = my-university.ac.ke"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dspace.baseUrl = http://my-university.ac.ke"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dspace.name = My University Name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, in [dspace]/config/modules/oai.cfg, ensure the following are correctly set:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"oai.url = http://my-university.ac.ke/oai"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"oai.solr.url = http://localhost/solr/oai"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"oai.identifier.prefix = my-university.ac.ke"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Confirm that the **description.xml** file in [dspace]/config/crosswalks/oai/description.xml has the correct values.\n",
"\n",
"The **Repository identifier** should be the same value as the **hostname** in your repository URL; otherwise some harvesters may fail to harvest your metadata correctly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nano [dspace]/config/crosswalks/oai/description.xml"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To test whether OAI is set up correctly, open this URL on your repository:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"http://[repository-URL]/oai/request?verb=Identify"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should see output similar to what is on this page: http://erepository.mku.ac.ke/oai/request?verb=Identify"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you change values in [dspace]/config/local.cfg and still don't see the changes, delete the cached file in the [dspace]/var/oai/requests/ directory to clear the cache."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sudo rm [dspace]/var/oai/requests/cmVxdWVzdElkZW50aWZ5bnVsbG51bGxudWxsbnVsbG51bGxudWxs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import Records into the Index\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Existing records must be imported into the OAI index before external harvesters can reach them. The dspace executable in [dspace]/bin/dspace is used to import records into the OAI index."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sudo [dspace]/bin/dspace oai import -v -o"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The command accepts the following switches:\n",
"1. **-v** - Verbose, i.e. print progress messages\n",
"2. **-o** - Optimize the index after importing\n",
"3. **-c** - Clear the existing index and import everything afresh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After running this command, the records are listed when you access the following URL on your repository:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"http://[repository-URL]/oai/request?verb=ListRecords&metadataPrefix=oai_dc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Updating the index\n",
"The index needs to be updated each time new records are added to the repository. This can be achieved with a scheduled cron job. The cron job should run as the user who has permission to write to the index, usually the Tomcat user:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sudo crontab -e -u tomcat8"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then add this line to the bottom of the file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"0 0 * * * [dspace]/bin/dspace oai import -o > /dev/null"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This runs the import every day at midnight.\n",
"\n",
"The -v switch is left out because this is an automated process and there is no need to print the output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Bash",
"language": "bash",
"name": "bash"
}
},
"nbformat": 4,
"nbformat_minor": 2
}