@cudevmaxwell
Last active July 31, 2020 05:11
Striking Out with Islandora 2.x, Fedora 4.x, and Apache Camel


Kevin Bowrin, 2015-02-27

A few days ago, Nick Ruest and the Islandora Foundation made available the technical documentation for the upcoming version of Islandora. The Islandora Foundation should be commended for their transparency and community building efforts. Even in this early stage of Islandora 2.x, the documentation has a great introduction to Islandora, the goals of the project, the planned architecture, and installation instructions for their Vagrant development box.

The Technical Design for the next version of Islandora reminds me of something Mike Giarlo said, when I was waxing on about my dream Digital Library / Repository software...

"All you need is a service bus!"

and quite right he was. Aaron Coburn and others from Duraspace have worked on an Apache Camel integration for Fedora, which earlier in January graduated from the labs and is ready to be used.

My Java is extremely rusty, but I'm certainly willing to relearn if it means the library community can start working more closely together on common problems, regardless of their frontend stack of choice.

So, let's do a code strike. As I learned from Harry Percival's excellent book Test-Driven Development with Python, "sometimes you just want to hack something together without any tests at all, just to see if it works, to learn it or get a feel for it. That’s absolutely fine." In this case, let's hack something together with the Services module for Drupal, Camel, and Fedora 4, just to get a feel for what development in Islandora might be like in the future.

I'm going to try and replicate an experiment I did with Go, Hugo, and the Message Broker built into Fedora 4. It updated a webpage with the contents of the dc:title property of a Fedora resource. Very simple, but a good starting point.

I'm using Windows, and I've already downloaded VirtualBox, Vagrant, and GitHub for Windows. I've cloned the Islandora Labs' Islandora repo: https://github.com/Islandora-Labs/islandora and I'm ready to go.

cd install
vagrant up

and wait until the provisioning is complete. The only additional components we need to install are the UUID Drupal module and the Entity API Drupal module. I'll use the UUID to pair the Fedora resources with the Drupal nodes.

SSH into the development machine using

vagrant ssh

or by SSHing into localhost:2222. (More details in the documentation.)

Then, download and enable the UUID and Entity API modules using Drush.

cd /var/www/html/drupal
sudo drush dl uuid --dev
sudo drush dl entity
sudo chown -R www-data:www-data sites/all/modules
sudo drush en -y uuid entity uuid_services

What are the requirements for our experiment?

  • When I create a new resource in Fedora, I'd like a new node to be created in Drupal. It should be a basic page, with its Title field set to that resource's dc:title property.
  • If I update that resource's title in Fedora, I'd like the Basic Page's title to be updated as well.

For this experiment, I won't worry about updates in the other direction, from Drupal to Fedora.

Let's start with the Services module. The Islandora 2.x module, already provisioned in the development server, uses the Services module API to create an endpoint that we can use.

The uuid_services module we enabled earlier changed how Services works slightly. Instead of creating a new node by running this CURL command, which POSTs a JSON payload to the endpoint:

curl -H 'Content-Type: application/json' -d '{"title":"A title", "type":"page"}' http://localhost/islandora/node

we use this command:

(export UUID=$(uuid); curl -H 'Content-Type: application/json' -X PUT -d '{"title":"A title.", "type":"page", "uuid":"'"$UUID"'"}' http://localhost/islandora/node/$UUID)

This command sets a variable UUID to a newly generated uuid, then PUTs to the endpoint a JSON payload which contains the title, the node's content type, and the UUID. It uses the UUID as the identifier.

Running this command, I had this response:

<title>A title.</title> page b1d86244-be8e-11e4-99b3-0800279dae8c ...

We can GET that node, using the UUID from the response (yours will be different, if you're following along).

curl -s http://localhost/islandora/node/b1d86244-be8e-11e4-99b3-0800279dae8c.json | python -m json.tool
{
    ....
    "changed": "1425048080",
    "created": "1425048080",
    "revision_timestamp": "1425048080",
    "title": "A title.",
    "uuid": "b1d86244-be8e-11e4-99b3-0800279dae8c",
    "vid": "11",
    "vuuid": "4bd6d1bb-25e6-4e62-a0fc-71ce24003dbb"
}

And we can update that node. Notice the change in version ID, version UUID, and title:

(export UUID="b1d86244-be8e-11e4-99b3-0800279dae8c" ; curl -v -H 'Content-Type: application/json' -X PUT -d '{"title":"A title - created at '"$(date)"'", "type":"page", "uuid":"'"$UUID"'"}' http://localhost/islandora/node/$UUID)

curl -s http://localhost/islandora/node/b1d86244-be8e-11e4-99b3-0800279dae8c.json | python -m json.tool
{
    ...
    "changed": "1425049158",   
    "created": "1425049158",    
    "revision_timestamp": "1425049158",
    "title": "A title - created at Fri Feb 27 14:59:18 UTC 2015",
    "uuid": "b1d86244-be8e-11e4-99b3-0800279dae8c",
    "vid": "12",
    "vuuid": "016becf0-8552-4592-9f74-b6656c62eb34"
}

Alright! We can create and update Drupal nodes based on their UUID.
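
For readers who would rather see this in Java than in curl, here is a minimal sketch of the same PUT request built with the JDK's own HTTP client. It assumes Java 11 or later; the class and method names (`DrupalNodePut`, `payload`, `buildPut`) are made up for illustration and are not part of Islandora. Note that `UUID.randomUUID()` produces a random (version 4) UUID, whereas the `uuid` command used above produced a time-based one; I'm assuming either is acceptable to the endpoint.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

class DrupalNodePut {
    // Endpoint taken from the curl examples above.
    static final String ENDPOINT = "http://localhost/islandora/node/";

    // Escapes backslashes and double quotes only; a real client
    // would use a JSON library instead of string concatenation.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the same JSON body the curl command sends.
    static String payload(String title, String type, String uuid) {
        return "{\"title\":\"" + escape(title) + "\", \"type\":\"" + escape(type)
                + "\", \"uuid\":\"" + escape(uuid) + "\"}";
    }

    // Builds (but does not send) a PUT to /islandora/node/<uuid>.
    static HttpRequest buildPut(String title, String type, String uuid) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT + uuid))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(payload(title, type, uuid)))
                .build();
    }

    public static void main(String[] args) {
        String uuid = UUID.randomUUID().toString();
        HttpRequest request = buildPut("A title.", "page", uuid);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(payload("A title.", "page", uuid));
        // To actually send it against the development box:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Because the UUID appears both in the URL and in the body, sending the same request twice with a new title should behave like the update shown above.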

The second part is building a Camel-powered webapp which will mimic what we had CURL doing above. For this experiment, I'm just going to hack the files in Islandora Sync, which are simple right now. Sync will probably change quite a lot in the next few weeks or months, so your mileage with these instructions may vary.

The repository is automatically pulled down in our development server. The two files we'll want to edit are the JSON Transformer processor and the Node Create route.

wget -O /home/vagrant/islandora/camel/sync/src/main/java/ca/islandora/sync/processors/DrupalNodeCreateJsonTransform.java https://gist.githubusercontent.com/cudevmaxwell/254e4d387a15934d97e5/raw/b5f8eec6bb65dc5c80b14863d6cd5d6e58f59e2a/DrupalNodeCreateJsonTransform.java
wget -O  /home/vagrant/islandora/camel/sync/src/main/java/ca/islandora/sync/routes/DrupalNodeCreate.java https://gist.githubusercontent.com/cudevmaxwell/254e4d387a15934d97e5/raw/d8b84d4cb577161b119c4541009aba6f69b1d38d/DrupalNodeCreate.java
cd /home/vagrant/islandora/camel/sync
mvn install
rm /var/lib/tomcat7/webapps/sync.war 
# Wait until the webapp has stopped. (tail -f /var/log/tomcat7/catalina.out)
cp target/sync-0.0-SNAPSHOT.war /var/lib/tomcat7/webapps/sync.war
# Wait until the webapp has started. (tail -f /var/log/tomcat7/catalina.out)

You can see the contents of those files here: https://gist.github.com/cudevmaxwell/254e4d387a15934d97e5

Here's a gif of the result:

An animation of the Drupal and Fedora websites, showing that updating the Fedora resource updates the Drupal resource.

So, what's going on here?

When Fedora creates a new resource, it sends a message on a broker, which we can listen for and act on.

from("activemq:topic:fedora")

We give this route a routeID.

.routeId("fedoraInAdded")

We filter the incoming messages to find the ones we care about, in this case "NODE_ADDED" messages.

.filter(header(JmsHeaders.EVENT_TYPE).contains(RdfNamespaces.REPOSITORY + "NODE_ADDED"))
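
To make the filter less magical, here is a rough plain-Java sketch of the check it performs, with no Camel involved. I'm assuming that `JmsHeaders.EVENT_TYPE` resolves to the header name `org.fcrepo.jms.eventType` and that `RdfNamespaces.REPOSITORY` is the same repository namespace used in the processor code in this post. The class name `NodeAddedFilter` and the sample header values in `main` are made up for illustration.

```java
import java.util.Map;

class NodeAddedFilter {
    // Assumed value of RdfNamespaces.REPOSITORY.
    static final String REPOSITORY = "http://fedora.info/definitions/v4/repository#";
    // Assumed header name behind JmsHeaders.EVENT_TYPE.
    static final String EVENT_TYPE = "org.fcrepo.jms.eventType";

    // True when the event type header mentions the NODE_ADDED event.
    // A substring check is used because one message may list several event types.
    static boolean isNodeAdded(Map<String, String> headers) {
        String eventType = headers.get(EVENT_TYPE);
        return eventType != null && eventType.contains(REPOSITORY + "NODE_ADDED");
    }

    public static void main(String[] args) {
        // Hypothetical headers, hand-written for this example.
        Map<String, String> added = Map.of(EVENT_TYPE, REPOSITORY + "NODE_ADDED");
        Map<String, String> changed = Map.of(EVENT_TYPE, REPOSITORY + "PROPERTY_CHANGED");
        System.out.println(isNodeAdded(added));
        System.out.println(isNodeAdded(changed));
    }
}
```

Messages that fail this check are simply dropped by the route, so nothing after the filter runs for them.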

The message tells us a resource was added. We need to get the resource from Fedora to load it into Drupal.

.to("fcrepo:localhost:8080/fcrepo/rest")

We use the fancy new fcrepo: Camel component here. The messages received from ActiveMQ have a header called "org.fcrepo.jms.identifier". Without the custom Camel component, we would have to explicitly read the identifier of the matched resource from that header after the filter, and then make a request to the REST API, as we would if we were using CURL or some other tool. In my earlier experiment with Go, that was done like this:

node := msg.Header.Get("org.fcrepo.jms.identifier")
req, _ := http.NewRequest("GET", "http://localhost:8080/fcrepo/rest"+node, nil)

That's already done for us: "the path will be populated by the org.fcrepo.jms.identifier header and appended to the endpoint URI."

The request to Fedora returns XML-encoded RDF. It is sent to our processor, DrupalNodeCreateJsonTransform.

.process(new DrupalNodeCreateJsonTransform())

What does the processor do?

//Create a JSON document for output to Drupal.
JSONObject outBody = new JSONObject();
//Tell Drupal this is an article.
outBody.put("type", "article");

//XML Boilerplate skipped here.
     
//Get the UUID. Every Fedora Resource should have at least one.
String uuid = inDocument.getElementsByTagNameNS("http://fedora.info/definitions/v4/repository#", "uuid").item(0).getTextContent();

//Get the dc:title. If not set, use the UUID.
String title = uuid;
if (inDocument.getElementsByTagNameNS("http://purl.org/dc/elements/1.1/", "title").getLength() > 0) {
        title = inDocument.getElementsByTagNameNS("http://purl.org/dc/elements/1.1/", "title").item(0).getTextContent();
}

outBody.put("title", title);
outBody.put("uuid", uuid);

Message outMessage = exchange.getOut();
// We want to use the PUT HTTP method.
outMessage.setHeader(Exchange.HTTP_METHOD, "PUT");
// We'll need the UUID to build the correct address for the put later
// so store it in a header. 
outMessage.setHeader("uuid", uuid);
outMessage.setHeader(Exchange.CONTENT_TYPE, "application/json");
outMessage.setBody(outBody.toJSONString());

It sets the out message of our processor to be a JSON string. It contains the title and UUID from the new Fedora resource, and the type of node to create. We also set the Content-Type header to "application/json", and we set the HTTP method to PUT. Let's take a look at that CURL command again:

curl -H 'Content-Type: application/json' -X PUT -d '{"title":"A title", "type":"page", "uuid":"'"$UUID"'"}' http://localhost/islandora/node/$UUID

Everything except the last part of the command is done. We just need to set the correct URI, and send the request.

.setHeader(Exchange.HTTP_URI, simple("http://localhost/islandora/node/${header.uuid}"))
.to("http4:localhost/unusued")
.log("RESPONSE: ${headers} / ${body}")
.to("mock:result");

And we're done! Setting the URI with a dynamic value from the headers was a bit tricky, but I'm probably just doing something wrong.
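
For anyone curious about the "XML Boilerplate skipped here" step in the processor, here is a minimal, self-contained sketch of how the UUID and title could be pulled out of the RDF/XML using only the JDK. It is not the code from the gist: the class name `TitleExtractor`, its helper methods, and the sample document in `main` are hand-written for illustration, and the real Fedora response carries many more properties.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class TitleExtractor {
    static final String REPOSITORY_NS = "http://fedora.info/definitions/v4/repository#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is required for getElementsByTagNameNS to match.
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Text of the first element with this namespace and local name, or null.
    static String firstText(Document doc, String ns, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS(ns, localName);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    // Returns {uuid, title}; the title falls back to the UUID when dc:title is absent.
    static String[] uuidAndTitle(String xml) throws Exception {
        Document doc = parse(xml);
        String uuid = firstText(doc, REPOSITORY_NS, "uuid");
        if (uuid == null) {
            throw new IllegalArgumentException("no uuid element found");
        }
        String title = firstText(doc, DC_NS, "title");
        return new String[] { uuid, title != null ? title : uuid };
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, heavily trimmed RDF/XML document.
        String sample = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:fedora=\"" + REPOSITORY_NS + "\" xmlns:dc=\"" + DC_NS + "\">"
                + "<rdf:Description rdf:about=\"http://localhost:8080/fcrepo/rest/example\">"
                + "<fedora:uuid>b1d86244-be8e-11e4-99b3-0800279dae8c</fedora:uuid>"
                + "<dc:title>A title.</dc:title>"
                + "</rdf:Description></rdf:RDF>";
        String[] result = uuidAndTitle(sample);
        System.out.println(result[0] + " -> " + result[1]);
    }
}
```

Unlike the processor snippet above, this sketch throws a descriptive exception when no UUID is present rather than failing on a null item, which seemed friendlier for experimenting.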


I really like this approach, and I think it's the right choice for the Islandora project. Originally, I was leaning toward using Drupal 8's abstracted Entity Storage to allow Fedora 4 to act as a native Drupal backend. With Camel, we can worry less about those tight integrations and start sharing more of our work with the larger Fedora and library community. For example, we could write Camel processors to replicate the Archivematica format policy registry and conversion microservices, instead of writing separate "preservation modules" for Omeka, Drupal, Hydra, etc.

Another interesting property of Camel is that sending and receiving messages from external web services seems very easy. One could write microservices in their language of choice, and only use Camel for message routing. Locally developed microservices or vendor provided APIs can be used without too much trouble, along with the processors and tools included in Islandora proper.

Kudos again to the Islandora Foundation and the Islandora devs. I can't wait to see what this year has in store.

@DiegoPino
Nice!