Instantly share code, notes, and snippets.

Embed
What would you like to do?
RDF for Relational Database heads

RDF, Fedora, and ActiveFedora for relational heads

Systems Developer at Penn State hjc14@psu.edu / @hectorjcorrea

Slides at: http://tinyurl.com/rdf4rdbms

Notes on my current understanding of how RDF compares to the components of a traditional application that uses Ruby on Rails with a relational database backend (MySQL, PostgreSQL, Oracle.)

RDBMS concepts

Tables/columns/relationships

Book table:

name    type
-----   -------
id      integer
title   string
isbn    string

Page table:

name    type
-----   -------
id      integer
book_id integer    (foreign key to books)
number  integer
text    string

And we use SQL to query and update:

INSERT INTO books(id, title, isbn) 
    VALUE (1, "Lord of the Rings", "123-456-789")

INSERT INTO pages(id, book_id, number, text) 
    VALUE (1, 1, 1, 'hi frodo')

INSERT INTO pages(id, book_id, number, text) 
    VALUE (2, 1, 2, 'dude is that mordor?')

SELECT * FROM books WHERE title = 'lord of the rings'

SELECT * FROM pages WHERE book_id = 1

Rails / ActiveRecord

Assuming you have the tables described above you can define a couple of ActiveRecord classes in Rails to represent each table:

class Book < ActiveRecord::Base
  has_many :pages
end

class Page < ActiveRecord::Base
  belongs_to :book
end

Notice the has_many and belongs_to to indicate the one-to-many relationship. Fields are not indicated on the class but they are automatically picked from the tables at runtime.

You can access the database via ActiveRecord objects:

# Create a new Book object
b = Book.new
b.title = "Lord of the Rings"
b.isbn = "123-456-789"

# Create a couple of page objects and 
# add them to the book.pages collection.
p1 = Page.new(number: 1, text: "hi frodo")
b.pages < p1

p2 = Page.new(number: 2, text: "dude, is that mordor?")
b.pages < p2

# Save the book object 
# (will save both book and pages)
b.save

# Fetch the saved record
b = Book.find(1)
puts b.title              # => "Lord of the Rings"
puts b.pages[0].text      # => "hi frodo"

Resource Description Framework (RDF)

RDF is a W3C standard for data interchange on the Web (See http://www.w3.org/RDF)

There are no tables or columns in RDF. There are triples and graphs.

Triple is a three part statement that includes a subject, a predicate, and an object:

book1    title       "Lord of the Rings"

There are many ways to represent RDF including N-Triples, Turtle, and RDF/XML. The examples below use N-Triples. Here is how the previous triple would look like in N-Triples.

<book1> <title>     "Lord of the Rings"

An RDF graph is a collection of triples:

<book1> <title>   "Lord of the Rings"
<book1> <isbn>    "123-456-789"
<page1> <number>  "1"
<page1> <text>    "hi frodo"
<page2> <number>  "2"
<page2> <text>    "dude, is that mordor?"
<book1> <page>    <page1>
<book1> <page>    <page2>

Subjects and predicates in a triple are URIs. Objects can be URIs (to reference another object) or literals.

<http://libraries.psu.edu/catalog/book1>        <http://abc.org/1.1/title>   "Lord of the Rings"
<http://libraries.psu.edu/catalog/book1/page1>  <http://xyz.org/ns#/number>  "1"
<http://libraries.psu.edu/catalog/book1/page1>  <http://xyz.org/ns#/text>    "hi frodo"
<http://libraries.psu.edu/catalog/book1/page2>  <http://xyz.org/ns#/number>  "2"
<http://libraries.psu.edu/catalog/book1/page2>  <http://xyz.org/ns#/text>    "dude, is that mordor?"
<http://libraries.psu.edu/catalog/book1>        <http://xyz.org/ns#/pages>   <http://libraries.psu.edu/catalog/book1/page1>    
<http://libraries.psu.edu/catalog/book1>        <http://xyz.org/ns#/pages>   <http://libraries.psu.edu/catalog/book1/page2>    

It is common to use standard predicates defined by other organizations so that institutions can share information knowing that a specific predicate means the same thing across datasets. For example the following two predicates represent different things even if both are called "title"

http://purl.org/dc/elements/1.1/title
http://scholarsphere.psu.edu/ns#/title

A triple is roughly the equivalent of a cell (row/column) in a relational database (See http://workingontologist.org, page 31)

Fedora 4

Fedora 4 is a document repository suited for large objects (e.g, text, images, audio and video files) and natively supports RDF to store metadata about these objects.

Fedora stands for Flexible Extensible Digital Object Repository Architecture. See http://www.fedora-commons.org/about

Fedora provides an HTTP API to create and update objects.

For example, this request will create a new object in Fedora:

HTTP POST http://localhost:8983/fedora/rest/book1

...and something like this will add a couple of "fields" (RDF statements) to this new object:

HTTP POST http://localhost:8983/fedora/rest/book1
content-body
    <> <http://whatever/title> "Lord of the Rings"
    <> <http://whatever/isbn> "978-0618640157"

Rails/ActiveFedora

ActiveFedora is a Ruby gem that does for Fedora what ActiveRecord does for relational databases. This means that we can define a class as follow:

class BookObject < ActiveFedora::Base
  property :title, predicate: ::RDF::DC.title
  property :isbn, predicate: ::RDF::URI.new('http://libraries.psu.edu/metadata/isbn')
end

...and then create and fetch data using code as follows:

# Create an object...
b = BookObject.new( title: ["Lord of the Rings"], isbn: ["123-456-789"] )
b.save
puts b.id       # => "123"

# ...and fetch it
b = BookObject.find("123")
puts b.title     # => "Lord of the Rings"
puts b.isbn      # => "123-456-789"

ActiveFedora automatically adds a property hasModel to the Fedora object to represent what Ruby class this object should be serialized into when it's fetched. That's how b.title and b.isbn were populated in the previous example.

Notice that we do specify the fields (predicates) in our ActiveFedora models. This is because there is no table with a specific structure in Fedora where Rails could pick them up as ActiveRecord does for relational databases.

You can also define relationships like the one between Books and Pages.

Behind the scenes ActiveFedora uses ActiveTriples to handle triples and LDP to handle the HTTP communication to Fedora.

Here are several good basic examples on ActiveFedora by Esme: https://github.com/escowles/testdrive

The end

.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment