@ssp
Created May 28, 2013 21:08
Notes for CouchDB Bootcamp at ELAG 2013.
+ Part I: Getting to know CouchDB (10:00-12:30)
+ Hello
+ me = ssp@SUB
- https://github.com/ssp/
- http://www.sub.uni-goettingen.de/
- who are you?
- what are you doing?
- how do you think CouchDB could be interesting for you?
+ What is CouchDB?
- http://couchdb.apache.org
- http://en.wikipedia.org/wiki/Couchdb
+ History
- 2005 started by Damien Katz (Lotus Notes background)
- »Cluster of unreliable commodity hardware Data Base«
- ~2011 development stalled: CouchDB vs. Couchbase
- development going on again: v 1.3.0 was released this winter
- outlook: integrate more features such as BigCouch and GeoCouch, and improve extensibility (tricky because CouchDB is written in Erlang)
- other *ouchDBs exist
+ NoSQL
- No SQL
- schemaless
+ Difference to SQL Databases
- pros and cons?
- what do we need schemas and structured queries for?
- Document Database
+ JavaScript & JSON
- all data stored as JSON objects
- answers in JSON objects
- URL parameters in JSON format
- everybody familiar with JSON?
+ JSON is stricter than JavaScript object literals: double-quoted keys and strings, no trailing commas, no comments
- http://jsoneditoronline.org/
- http://jsonlint.com/
+ JavaScript is used for algorithms provided by the user
+ although the database is implemented in Erlang
- note on other implementations?
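JSON's strictness is easy to trip over when typing documents by hand. A small sketch in plain JavaScript (Node or a browser console) that checks a few strings with the built-in `JSON.parse`; the sample strings are made up for illustration, the last one repeating a missing comma between two fields:

```javascript
// Check candidate document strings with the built-in, strict JSON parser.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    // JSON.parse throws a SyntaxError for anything that is not strict JSON
    return false;
  }
}

// Illustrative inputs; only the first one is valid JSON.
const samples = {
  valid: '{"type": "event", "name": "ELAG 2013"}',
  singleQuotes: "{'type': 'event'}",
  unquotedKey: '{type: "event"}',
  trailingComma: '{"type": "event",}',
  comment: '{"type": "event" /* a comment */}',
  missingComma: '{"location": {"nl": "Gent"} "url": "http://elag2013.org/"}',
};

for (const [name, text] of Object.entries(samples)) {
  console.log(name.padEnd(14), isValidJson(text) ? "valid" : "invalid");
}
```

Pasting a document into jsonlint.com does the same check interactively.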
+ for/on/of the Web
- not just JavaScript
- http as the API
- REST-like
+ Map/Reduce
- for index building and analysis
- Google origin
+ Replication
- supported by CouchDB
- many interesting applications (e.g. mobile sync)
+ A look at Futon
- built into CouchDB at /_utils
- simple browser GUI to look at the database
+ DO IT
- who brought their own CouchDB?
- database list
- look into database
- create a record
+ look at fields:
- _id
- _rev
+ update a record:
- observe _rev
- note history
+ HTTP: the »proper« interface to CouchDB
- REST-like
- uses the HTTP verbs GET/PUT/POST/DELETE
+ can use it with curl
- -X POST [set the HTTP verb]
- -D - [include the response headers in the output]
- -H "Content-Type: application/json" [set the content type!]
- -d '{"key": "value"}' [data to send to the server; use -d @/file/path to upload file content, --data-binary @/file/path for binary files]
- URL of a document: http://localhost:5984/DATABASE/DOCUMENT-ID, e.g. http://localhost:5984/elag/document-id
- authentication using --netrc (with a ~/.netrc file) or user:password@ in the URL
- when querying, use -H "Accept: application/json" to get the correct Content-Type (application/json) in the response
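The same request pieces can be assembled for the `fetch()` API (built into browsers and Node 18+) when talking to CouchDB from JavaScript instead of curl. A minimal sketch; `couchRequest` is a hypothetical helper name, the credentials are placeholders, and HTTP Basic authentication stands in for `--netrc`:

```javascript
// Assemble url and options for fetch(); nothing is sent here.
function couchRequest(baseUrl, path, { method = "GET", body, user, password } = {}) {
  const headers = { Accept: "application/json" }; // like -H "Accept: application/json"
  if (body !== undefined) {
    headers["Content-Type"] = "application/json"; // like -H "Content-Type: application/json"
  }
  if (user !== undefined) {
    // HTTP Basic authentication instead of --netrc or user:password@ in the URL.
    // Buffer is Node-specific; in a browser use btoa() instead.
    headers.Authorization = "Basic " + Buffer.from(`${user}:${password}`).toString("base64");
  }
  return {
    url: baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, ""),
    options: {
      method, // like -X PUT
      headers,
      body: body === undefined ? undefined : JSON.stringify(body), // like -d '...'
    },
  };
}

const req = couchRequest("http://localhost:5984", "elag-demo/elag2013", {
  method: "PUT",
  body: { type: "event", name: "ELAG 2013" },
  user: "admin", // placeholder credentials
  password: "secret",
});
console.log(req.url, req.options.method, req.options.headers);
// To actually send it (inside an async function):
// const res = await fetch(req.url, req.options);
// console.log(res.status, await res.json());
```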
+ »Dev HTTP Client« for Chrome
- pretty and powerful alternative if you’re happier in a GUI environment
- haven’t found an equally powerful Firefox extension yet
+ Getting Data in and out of CouchDB
+ create a database
- curl -D - --netrc -X PUT http://localhost:5984/elag-demo
- {"ok":true}
+ get info on the new database
- curl -D - --netrc -X GET http://localhost:5984/elag-demo
- {"db_name":"elag-demo","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":79,"data_size":0,"instance_start_time":"1369572912989533","disk_format_version":6,"committed_update_seq":0}
+ create a JSON object:
- {"type":"event", "name":"ELAG 2013", "location":{"nl":"Gent", "en":"Ghent"}, "url":"http://elag2013.org/"}
+ PUT JSON into DB as document »elag2013«
- curl -D- --netrc -X PUT -H "Content-Type: application/json" -d '{"type":"event", "name":"ELAG 2013", "location":{"nl":"Gent", "en":"Ghent"}, "url":"http://elag2013.org/"}' http://localhost:5984/elag-demo/elag2013
- 201 Created
- {"ok":true,"id":"elag2013","rev":"1-7fa340aeba1cf40f631db23f88dabdbb"}
+ retrieve record from DB
- curl -D - --netrc -X GET http://localhost:5984/elag-demo/elag2013
- 200 OK
- {"_id":"elag2013","_rev":"1-7fa340aeba1cf40f631db23f88dabdbb","type":"event","name":"ELAG 2013","location":{"nl":"Gent","en":"Ghent"},"url":"http://elag2013.org/"}
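+ Update record in DB by sending a modified version:
- cannot simply PUT the updated version: CouchDB answers 409 Conflict with {"error":"conflict","reason":"Document update conflict."}
- need to add the _rev of the existing record to overwrite it: as the "_rev" field in the JSON, as ?rev=REV in the URL or in an If-Match header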
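A minimal sketch of preparing such an update in JavaScript: start from the document exactly as the GET returned it (so `_rev` is already included) and apply the changes on top. The added `hashtag` field is only an illustrative change; `prepareUpdate` and `revGeneration` are hypothetical helper names:

```javascript
// Build the PUT body for an update; keeping the current _rev is what makes
// CouchDB accept it instead of answering 409 Conflict.
function prepareUpdate(storedDoc, changes) {
  if (!storedDoc._rev) {
    throw new Error("fetch the document first: the current _rev is required");
  }
  // Apply the changes, but never let them overwrite _id or _rev.
  return { ...storedDoc, ...changes, _id: storedDoc._id, _rev: storedDoc._rev };
}

// A revision looks like "GENERATION-HASH"; the generation grows with every update.
function revGeneration(rev) {
  return parseInt(rev.split("-")[0], 10);
}

// The record as retrieved with GET above.
const stored = {
  _id: "elag2013",
  _rev: "1-7fa340aeba1cf40f631db23f88dabdbb",
  type: "event",
  name: "ELAG 2013",
  location: { nl: "Gent", en: "Ghent" },
  url: "http://elag2013.org/",
};

// Illustrative change: add a hashtag field.
const updateBody = prepareUpdate(stored, { hashtag: "elag2013" });
console.log(JSON.stringify(updateBody)); // PUT this to /elag-demo/elag2013
console.log(revGeneration("2-3da35de1601afd878196ca045dd1a57d")); // 2
```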
+ delete record from DB:
- DELETE + rev
- curl -D - --netrc -X DELETE 'http://localhost:5984/elag-demo/elag2013?rev=2-3da35de1601afd878196ca045dd1a57d'
- 200 OK
- {"ok":true,"id":"elag2013","rev":"3-033f8c1fc5d6edb367d7392268613959"}
+ Re-add the deleted record using PUT
- do not need the revision information because it’s been deleted
- but the revision number keeps increasing: the re-added document gets 4-…, not 1-…
- curl -D- --netrc -X PUT -H "Content-Type: application/json" -d '{"type":"event", "name":"ELAG 2013", "location":{"nl":"Gent", "en":"Ghent"}, "url":"http://elag2013.org/"}' http://localhost:5984/elag-demo/elag2013
- 201 Created
- {"ok":true,"id":"elag2013","rev":"4-97caea84f066baeecd43381479a90c56"}
+ POST a bunch of documents to the database: _bulk_docs
- need to send a JSON object with an array of documents: {"docs": […]}
- curl -D - --netrc -X POST -H "Content-Type: application/json" -d '{"docs":[{"_id":"elag2012","type":"event","name":"ELAG 2012","location":{"es":"Palma"}}, {"type":"event","name":"ELAG 2014"}, {"_id":"elag2013","type":"event","name":"ELAG 2013","location":{"nl":"Gent","en":"Ghent"},"url":"http://elag2013.org/","hashtag":"elag2013"} ]}' "http://localhost:5984/elag-demo/_bulk_docs"
- 201 Created
- [{"ok":true,"id":"elag2012","rev":"1-6ec55e49a69306f3b0aa999601f7d693"},{"ok":true,"id":"a142db5e52dec15e28589f522d00d1e3","rev":"1-4cd29401a7018b1c77a98704bc7b754f"},{"id":"elag2013","error":"conflict","reason":"Document update conflict."}]
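- → granular per-document feedback: some documents can be written while others fail (here elag2013 conflicts because it was sent without its current _rev)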
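Because `_bulk_docs` answers with one status row per document, a client has to look at each row. A minimal sketch (hypothetical helper name `summarizeBulkResponse`) that splits the response shown above into written and rejected documents:

```javascript
// Each row either carries the new rev (written) or an error/reason pair (rejected).
function summarizeBulkResponse(rows) {
  const written = [];
  const failed = [];
  for (const row of rows) {
    if (row.error) {
      failed.push({ id: row.id, error: row.error, reason: row.reason });
    } else {
      written.push({ id: row.id, rev: row.rev });
    }
  }
  return { written, failed };
}

// The response body of the _bulk_docs request above.
const bulkResponse = [
  { ok: true, id: "elag2012", rev: "1-6ec55e49a69306f3b0aa999601f7d693" },
  { ok: true, id: "a142db5e52dec15e28589f522d00d1e3", rev: "1-4cd29401a7018b1c77a98704bc7b754f" },
  { id: "elag2013", error: "conflict", reason: "Document update conflict." },
];

const summary = summarizeBulkResponse(bulkResponse);
console.log(summary.written.length, "written;", summary.failed.map((f) => `${f.id}: ${f.error}`));
```

Documents that failed with `conflict` could then be fetched again to obtain their current `_rev` before retrying.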
+ GET all documents from a database: _all_docs
+ just metadata:
- curl -D- --netrc -X GET http://localhost:5984/elag-demo/_all_docs
- 200 OK
- {"total_rows":3,"offset":0,"rows":[
{"id":"a142db5e52dec15e28589f522d974580","key":"a142db5e52dec15e28589f522d974580","value":{"rev":"1-4cd29401a7018b1c77a98704bc7b754f"}},
{"id":"elag2012","key":"elag2012","value":{"rev":"1-6ec55e49a69306f3b0aa999601f7d693"}},
{"id":"elag2013","key":"elag2013","value":{"rev":"4-97caea84f066baeecd43381479a90c56"}}
]}
+ also document content: include_docs=true
- curl -D - --netrc -X GET "http://localhost:5984/elag-demo/_all_docs?include_docs=true"
- 200 OK
- {"total_rows":3,"offset":0,"rows":[
{"id":"a142db5e52dec15e28589f522d974580","key":"a142db5e52dec15e28589f522d974580","value":{"rev":"1-4cd29401a7018b1c77a98704bc7b754f"},"doc":{"_id":"a142db5e52dec15e28589f522d974580","_rev":"1-4cd29401a7018b1c77a98704bc7b754f","type":"event","name":"ELAG 2014"}},
{"id":"elag2012","key":"elag2012","value":{"rev":"1-6ec55e49a69306f3b0aa999601f7d693"},"doc":{"_id":"elag2012","_rev":"1-6ec55e49a69306f3b0aa999601f7d693","type":"event","name":"ELAG 2012","location":{"es":"Palma"}}},
{"id":"elag2013","key":"elag2013","value":{"rev":"4-97caea84f066baeecd43381479a90c56"},"doc":{"_id":"elag2013","_rev":"4-97caea84f066baeecd43381479a90c56","type":"event","name":"ELAG 2013","location":{"nl":"Gent","en":"Ghent"},"url":"http://elag2013.org/"}}
]}
+ attachments:
+ attach a file to a document
- need to pass the document revision
- curl -D - --netrc -H "Content-Type: image/png" --data-binary @/Users/ssp/Desktop/ASCII\ Projektor\ Update.png -X PUT "http://localhost:5984/elag-demo/elag2012/image.png?rev=1-6ec55e49a69306f3b0aa999601f7d693"
- 201 Created
- {"ok":true,"id":"elag2012","rev":"2-997115bae4cb556755e18c35b618f536"}
+ alternate way:
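- instead of the separate rev query parameter, send the revision in the If-Match header (this example sends no file data, which is why image2.png shows "length":0 below; add --data-binary @/file/path to upload content)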
- curl -D - --netrc -X PUT -H "Content-Type: image/png" -H "If-Match: 2-997115bae4cb556755e18c35b618f536" "http://localhost:5984/elag-demo/elag2012/image2.png"
- 201 Created
- {"ok":true,"id":"elag2012","rev":"3-02c0504a6dade7af4d5eadc07178963d"}
+ attachments are not delivered back in the JSON, just as a »stub« in the _attachments object
- curl -D - -X GET "http://localhost:5984/elag-demo/elag2012/"
- 200 OK
- {"_id":"elag2012","_rev":"3-02c0504a6dade7af4d5eadc07178963d","type":"event","name":"ELAG 2012","location":{"es":"Palma"},"_attachments":{"image2.png":{"content_type":"image/png","revpos":3,"digest":"md5-1B2M2Y8AsgTpgAmY7PhCfg==","length":0,"stub":true},"image.png":{"content_type":"image/png","revpos":2,"digest":"md5-F05J2fzMe1L4i5GLoTrt8g==","length":833818,"stub":true}}}
- include the stub when updating the document the next time to ensure attachments are preserved without having to re-upload them
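The stubs are also enough to locate each file: an attachment can be downloaded with GET from the same path it was uploaded to, /DB/DOCID/NAME. A minimal sketch (hypothetical helper name `listAttachments`) that lists them for the document retrieved above:

```javascript
// Turn the _attachments stubs of a fetched document into download URLs.
function listAttachments(baseUrl, db, doc) {
  const stubs = doc._attachments || {};
  return Object.keys(stubs)
    .sort()
    .map((name) => ({
      name,
      contentType: stubs[name].content_type,
      length: stubs[name].length,
      url: `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(doc._id)}/${encodeURIComponent(name)}`,
    }));
}

// The document as returned by the GET above (stub digests omitted for brevity).
const elag2012 = {
  _id: "elag2012",
  _rev: "3-02c0504a6dade7af4d5eadc07178963d",
  type: "event",
  name: "ELAG 2012",
  location: { es: "Palma" },
  _attachments: {
    "image2.png": { content_type: "image/png", revpos: 3, length: 0, stub: true },
    "image.png": { content_type: "image/png", revpos: 2, length: 833818, stub: true },
  },
};

for (const a of listAttachments("http://localhost:5984", "elag-demo", elag2012)) {
  console.log(a.name, a.contentType, a.length, a.url);
}
```

To change a field, modify the fetched object and PUT it back as a whole: the untouched stubs tell CouchDB to keep the stored files.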
- play around a little / everybody comfortable? / can always do this in Futon
+ Example datasets on http://totoro.local/elag/
- bothmer.json
- couch-marc.json
- gnd-smith.json
+ Part II: (13:30-15:15)
+ design documents
- stored as documents _design/NAME (use the NAME elag), e.g. _design/elag
+ views
- stored in the design document’s views object (one entry per view)
- called with subpath _view/NAME (use the NAME type) of design document, e.g. /elag-demo/_design/elag/_view/type
- Futon has a simple interface for creating and testing them
+ map
- map function creates an index entry for the document
- e.g. extract a field
- defined by a JavaScript function: function (doc) {}
- call emit(key, value) to add a row to the index
+ example map function: index documents by type
- function(doc) {
emit(doc.type);
}
+ can also pass a second parameter to emit
- it becomes the value of the index row for that key
- available to the reduce function and returned when querying the view
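To experiment with map functions outside CouchDB, a tiny in-memory imitation of view building can help. This is a sketch, not CouchDB's implementation: `runMap` temporarily provides a global `emit()` (as CouchDB does while a map function runs), and key ordering follows CouchDB's type order (null, false, true, numbers, strings, arrays, objects) but compares strings by code unit, whereas CouchDB uses Unicode collation. The sample documents are hypothetical:

```javascript
function typeRank(v) {
  if (v === null || v === undefined) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects
}

function compareKeys(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0; // approximation of Unicode collation
  if (ra === 5) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = compareKeys(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // objects are not ordered further in this sketch
}

function runMap(mapFn, docs) {
  const rows = [];
  let current;
  // A missing key or value ends up as null, as in the view output above.
  globalThis.emit = (key, value) =>
    rows.push({ id: current._id, key: key === undefined ? null : key, value: value === undefined ? null : value });
  try {
    for (current of docs) mapFn(current);
  } finally {
    delete globalThis.emit;
  }
  // Sort by key; rows with equal keys are ordered by document id.
  return rows.sort((x, y) => compareKeys(x.key, y.key) || (x.id < y.id ? -1 : x.id > y.id ? 1 : 0));
}

// The map function from above, unchanged.
const byType = function (doc) {
  emit(doc.type);
};

// Hypothetical sample documents.
const docs = [
  { _id: "w1", type: "wappen" },
  { _id: "a2", type: "allianzpartner" },
  { _id: "a1", type: "allianzpartner" },
];
console.log(runMap(byType, docs));
```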
+ Query views: Map (examples use the documents from bothmer.json)
+ all IDs and values:
- curl -D - -X GET "http://localhost:5984/elag-demo/_design/elag/_view/type"
- 200 OK
- {"total_rows":2077,"offset":0,"rows":[
{"id":"a142db5e52dec15e28589f522d019630","key":"allianzpartner","value":null},
{"id":"a142db5e52dec15e28589f522d01a528","key":"allianzpartner","value":null}, …]}
+ paging:
- skip=X [skip the first X rows]
- limit=Y [to only show Y documents]
- curl -D - -X GET 'http://localhost:5984/elag-demo/_design/elag/_view/type?skip=1000&limit=3'
- 200 OK
- {"total_rows":2077,"offset":1000,"rows":[
{"id":"a142db5e52dec15e28589f522d201f6d","key":"namensträger","value":null},
{"id":"a142db5e52dec15e28589f522d202d29","key":"namensträger","value":null},
{"id":"a142db5e52dec15e28589f522d203507","key":"namensträger","value":null}
]}
+ only a specific key:
- key=KEY (as JSON; include strings in double quotes)
- curl -D - -X GET 'http://localhost:5984/elag-demo/_design/elag/_view/type?key="allianzpartner"'
- 200 OK
- {"total_rows":2077,"offset":0,"rows":[
{"id":"a142db5e52dec15e28589f522d019630","key":"allianzpartner","value":null},
{"id":"a142db5e52dec15e28589f522d01a528","key":"allianzpartner","value":null},
{"id":"a142db5e52dec15e28589f522d01b358","key":"allianzpartner","value":null}, …]}
+ a specific range of keys:
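- startkey=KEY (JSON, like key)
- endkey=KEY (JSON)
- inclusive_end=false [exclude endkey itself; the default is true]
+ ordering
- descending=true [reverse order; swap startkey and endkey when combining them]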
+ also return the full documents
- include_docs=true
- adds a doc key to each result row object
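Because key, startkey and endkey must be JSON values, building these URLs by hand gets fiddly once quotes, arrays or non-ASCII keys such as »namensträger« are involved. A minimal sketch (hypothetical helper name `viewUrl`) that JSON-encodes those three parameters and URL-encodes everything; all other parameters are passed as plain strings:

```javascript
const JSON_PARAMS = new Set(["key", "startkey", "endkey"]);

function viewUrl(baseUrl, db, ddoc, view, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => {
      // key/startkey/endkey become JSON text; skip, limit, descending, ... stay as-is
      const text = JSON_PARAMS.has(name) ? JSON.stringify(value) : String(value);
      return `${encodeURIComponent(name)}=${encodeURIComponent(text)}`;
    })
    .join("&");
  const path = `${baseUrl}/${encodeURIComponent(db)}/_design/${encodeURIComponent(ddoc)}/_view/${encodeURIComponent(view)}`;
  return query ? `${path}?${query}` : path;
}

const base = "http://localhost:5984";
console.log(viewUrl(base, "elag-demo", "elag", "type", { key: "allianzpartner" }));
// → http://localhost:5984/elag-demo/_design/elag/_view/type?key=%22allianzpartner%22
console.log(viewUrl(base, "elag-demo", "elag", "type", { skip: 1000, limit: 3, include_docs: true }));
// For array keys (like the century view further down): {} sorts after all strings,
// so this range covers every key that starts with "allianzpartner".
console.log(viewUrl(base, "elag-demo", "elag", "type", { startkey: ["allianzpartner"], endkey: ["allianzpartner", {}] }));
```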
- curl -D - -X GET 'http://localhost:5984/elag-demo/_design/elag/_view/type?skip=2000&limit=1&include_docs=true'
- 200 OK
- {"total_rows":2077,"offset":2000,"rows":[
{"id":"a142db5e52dec15e28589f522d3f28e9","key":"wappen","value":null,"doc":{"_id":"a142db5e52dec15e28589f522d3f28e9","_rev":"1-f10b16c3c7ad916a49f1039f0dc67aa9","infourl":"http://de.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=url|size|mime&format=json&titles=Datei:Treskow-Wappen.png\n","url":"http://upload.wikimedia.org/wikipedia/de/d/dd/Treskow-Wappen.png","filename":"Treskow-Wappen.png","height":787,"width":600,"mime":"image/png","descriptionurl":"http://de.wikipedia.org/wiki/Datei:Treskow-Wappen.png","type":"wappen","size":58566}}
]}
+ reduce
- reduce function combines the values of the mapped rows into a single value; with group=true there is one result per distinct key
- e.g. function(keys, values, rereduce) { return sum(values); }
+ already built in:
- _count
- _sum
- _stats
+ demo in Futon: use reduce to find out how many documents of each type are in the DB
- set reduce function of _design/elag/_view/type to: _count
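A hand-written reduce has to handle one subtlety the built-ins take care of: CouchDB may reduce rows in batches and later combine the partial results with `rereduce = true`. A minimal sketch of a JavaScript equivalent of `_count`; `sum()` is a helper CouchDB provides to view functions and is defined here only so the sketch runs on its own, and the batching is simulated by hand:

```javascript
// Stand-in for CouchDB's built-in sum() helper.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Hand-written equivalent of _count.
const countReduce = function (keys, values, rereduce) {
  if (rereduce) {
    return sum(values); // values are partial counts from earlier reduce calls
  }
  return values.length; // values are the emitted values of the mapped rows
};

// Simulate five mapped rows (emit(doc.type) emits null values) reduced in two
// batches and then re-reduced; keys are not used by this function, so null is passed.
const mappedValues = [null, null, null, null, null];
const partial1 = countReduce(null, mappedValues.slice(0, 3), false);
const partial2 = countReduce(null, mappedValues.slice(3), false);
const total = countReduce(null, [partial1, partial2], true);
console.log(partial1, partial2, total); // 3 2 5
```

A reduce that always returned `values.length` would report 2 in the last step (the number of partial results) instead of 5.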
+ Query views: Reduce
+ group
- curl -D - -X GET 'http://localhost:5984/elag-demo/_design/elag/_view/type?group=true'
- 200 OK
- {"rows":[
{"key":"allianzpartner","value":503},
{"key":"bothmerbothmer","value":23},
{"key":"event","value":3},
{"key":"namensträger","value":889},
{"key":"wappen","value":659}
]}
+ non-obvious but very powerful:
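- we can emit an array as the key and then choose (with group_level) how many leading elements must be equal for rows to be grouped together
- function(doc) {
// only documents with a "von" year can be grouped by century
if (doc.von) {
var century = String(doc.von).substr(0, 2);
emit([doc.type, century]);
}
}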
- curl -D - -X GET 'http://localhost:5984/elag-demo/_design/elag/_view/type?group=true'
- 200 OK
- {"rows":[
{"key":["allianzpartner","11"],"value":2},
{"key":["allianzpartner","12"],"value":1},
{"key":["allianzpartner","13"],"value":13},
{"key":["allianzpartner","14"],"value":30},
{"key":["allianzpartner","15"],"value":82},
{"key":["allianzpartner","16"],"value":93},
{"key":["allianzpartner","17"],"value":108},
{"key":["allianzpartner","18"],"value":171},
{"key":["allianzpartner","19"],"value":3},
{"key":["bothmerbothmer","15"],"value":23},
{"key":["namensträger","11"],"value":4},
{"key":["namensträger","12"],"value":10},
{"key":["namensträger","13"],"value":39},
{"key":["namensträger","14"],"value":52},
{"key":["namensträger","15"],"value":116},
{"key":["namensträger","16"],"value":177},
{"key":["namensträger","17"],"value":246},
{"key":["namensträger","18"],"value":245},
{"key":["wappen","18"],"value":659}
]}
- can get a result like the one before (now with one-element array keys) using group_level=1
- curl -D - -X GET 'http://localhost:5984/elag-demo/_design/elag/_view/type?group=true&group_level=1'
- 200 OK
- {"rows":[
{"key":["allianzpartner"],"value":503},
{"key":["bothmerbothmer"],"value":23},
{"key":["namensträger"],"value":889},
{"key":["wappen"],"value":659}
]}
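What group=true and group_level=N do with a `_count` reduce can be imitated on already mapped and sorted rows: array keys are cut to their first N elements, then adjacent rows with equal keys are counted together. A sketch with a hypothetical sample of rows in the shape of the century view above (`groupCount` is an illustrative name):

```javascript
// groupLevel undefined imitates group=true (exact keys); a number imitates group_level.
function groupCount(rows, groupLevel) {
  const groups = [];
  for (const row of rows) {
    const key = Array.isArray(row.key) && groupLevel !== undefined ? row.key.slice(0, groupLevel) : row.key;
    const last = groups[groups.length - 1];
    // rows arrive sorted by key, so equal (truncated) keys are adjacent
    if (last && JSON.stringify(last.key) === JSON.stringify(key)) {
      last.value += 1;
    } else {
      groups.push({ key, value: 1 });
    }
  }
  return groups;
}

// Hypothetical mapped rows, already sorted by key.
const rows = [
  { id: "d1", key: ["allianzpartner", "11"], value: null },
  { id: "d2", key: ["allianzpartner", "11"], value: null },
  { id: "d3", key: ["allianzpartner", "12"], value: null },
  { id: "d4", key: ["wappen", "18"], value: null },
];

console.log(JSON.stringify(groupCount(rows))); // like group=true
console.log(JSON.stringify(groupCount(rows, 1))); // like group=true&group_level=1
```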
+ example:
- for dates you could emit the timestamp as the key
- if you need grouping by day / month / year it may be helpful to emit an array such as [2013, 5, 28] and query with group_level=1, 2 or 3
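A sketch of such a map function, assuming documents carry a hypothetical `date` field in ISO form ("2013-05-28"). The string is split rather than parsed with `new Date()`, which would read a date-only string as UTC midnight and could shift the day when converted to local time. A minimal stand-in for `emit()` lets it run locally on made-up documents:

```javascript
const byDate = function (doc) {
  if (typeof doc.date === "string") {
    var parts = doc.date.split("-");
    emit([parseInt(parts[0], 10), parseInt(parts[1], 10), parseInt(parts[2], 10)], 1);
  }
};

// Local stand-in for CouchDB's emit(), collecting [key, value] pairs.
const emitted = [];
globalThis.emit = (key, value) => emitted.push([key, value]);
[{ date: "2013-05-28" }, { date: "2012-05-16" }, { name: "no date" }].forEach(byDate);
delete globalThis.emit;
console.log(JSON.stringify(emitted)); // [[[2013,5,28],1],[[2012,5,16],1]]
```

With the value 1 and `_sum` (or `_count`) as reduce, group_level=1, 2 and 3 then give totals per year, month and day.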
+ Map/Reduce Summary
- quite simple
- very powerful
- requires different thinking than SQL, possibly simpler if you are used to things like Solr
- no JOIN
+ Intermission: couchapp
- http://couchapp.org/page/index
- editing the following parts of design documents in Futon can be a pain: the functions are stored as JSON strings, so everything has to be escaped
+ the command line tool couchapp can help with that
- you can clone the design document into a file/folder structure on your hard drive
- use your favourite editor
- then push the design document back into CouchDB
- a bit like git, really
- try to install it
- mkdir myCouchDesign; cd myCouchDesign
- couchapp init
- couchapp clone http://localhost:5984/elag-demo/_design/elag
+ set .couchapprc to
- {
"env" : {
"default" : {
"db" : "http://localhost:5984/elag-demo"
}
}
}
- also check out the newer couchapp alternatives listed on the website to see which one suits you the best
+ show interface
- a way to implement different output formats
- in particular for different MIME types
- function (doc, req) {}
+ return object with
+ headers:
- Content-Type:
+ body:
- Text
+ json:
- JavaScript object
+ base64:
- binary data
+ example:
- function (doc, req) {
return {
"headers": {"Content-Type": "text/plain"},
"body": "Hello World, this is Document ID: " + doc._id
};
}
- curl -D - -X GET 'http://localhost:5984/elag-demo/_design/elag/_show/test/elag2013'
- 200 OK
- Content-Type: text/plain
- Hello World, this is Document ID: elag2013
+ includes automatic MIME Type Handling
- register MIME Types with internal name using registerType(INTERNALNAME, MIMETYPES)
- common MIME Types text/json/xml/html/atom/… are predefined
- provide output for them using provides(INTERNALNAME, function() { return {} });
- function(doc, req){
provides('json', function(){
return {'json': doc}
});
provides('html', function(){
var listItem = function (key, value) {
var result = '';
if (key) { result += '<dt>' + key + '</dt><dd>'; }
if (typeof value === 'object') {
result += '<dl>';
for (var i in value) {
if (i[0] !== '_') {
result += listItem(i, value[i]);
}
}
result += '</dl>';
}
else {
result += value;
}
if (key) { result += '</dd>'; }
return result;
}
return '<html><head><title>'.concat(doc._id, '</title>',
'<style type="text/css">dt { font-weight:bold; }</style></head>',
'<body><h1>Document »' + doc._id + '«</h1>',
listItem(undefined, doc),
'</body></html>');
});
registerType('text-json', 'text/json');
provides('text-json', function(){
return toJSON(doc);
});
}
- curl -D - -H "Accept: text/html" -X GET 'http://localhost:5984/elag-demo/_design/elag/_show/content-types/elag2013'
- 200 OK
Vary: Accept
Content-Type: text/html; charset=utf-8
- <html><head><title>elag2013</title><style type="text/css">dt { font-weight:bold; }</style></head><body><h1>Document »elag2013«</h1><dl><dt>type</dt><dd>event</dd><dt>name</dt><dd>ELAG 2013</dd><dt>location</dt><dd><dl><dt>nl</dt><dd>Gent</dd><dt>en</dt><dd>Ghent</dd></dl></dd><dt>url</dt><dd>http://elag2013.org/</dd><dt>hashtag</dt><dd>elag2013</dd></dl></body></html>
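listItem() in the show function writes keys and values into the HTML as they are, so a field value containing "<" or "&" would break the markup. A small escaping helper (the name `escapeHtml` is hypothetical) that could be defined inside the show function and applied to key and value before concatenating them, e.g. `result += escapeHtml(value);`:

```javascript
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // first, so the entities added below are not escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('ELAG <2013> & "Gent"')); // ELAG &lt;2013&gt; &amp; &quot;Gent&quot;
```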
+ Idea: can we use this for a LOD server?
- configure the redirection headers as needed
+ create some triples for a MARC record?
- e.g. the ones in couch-marc
+ list interface
- show is to a document what list is to a view
- can also use provides() / registerType()
+ call as /DB/_design/DESIGN/_list/LIST/VIEW, e.g.
- http://localhost:5984/gnd-smith/_design/elag/_list/json/id?include_docs=true
- useful for a quick CSV export
+ can send output in chunks
- start({'headers': {}}) to begin the response and set its headers
- getRow() returns the next view row; send() can be called repeatedly
+ Applications:
- simple CSV export
- dot file output for GraphViz
- KML output
- TeX output?
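A sketch of the »simple CSV export« as a list function. It could be stored under a hypothetical name such as lists/csv in _design/elag and called as /elag-demo/_design/elag/_list/csv/type. `start()`, `send()` and `getRow()` are provided by CouchDB; the `runList` harness below only stubs them to try the function locally on fixed rows:

```javascript
const csvList = function (head, req) {
  // Quote a field if it contains a comma, quote or line break (RFC 4180 style).
  function csvField(v) {
    var s = v === null || v === undefined ? "" : String(v);
    return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  }
  start({ headers: { "Content-Type": "text/csv; charset=utf-8" } });
  send("id,key,value\n");
  var row;
  while ((row = getRow())) {
    send([row.id, row.key, row.value].map(csvField).join(",") + "\n");
  }
};

// Local stand-ins for CouchDB's list API, fed with a fixed array of view rows.
function runList(listFn, rows) {
  let body = "";
  let headers = null;
  let next = 0;
  globalThis.start = (response) => { headers = response.headers; };
  globalThis.send = (chunk) => { body += chunk; };
  globalThis.getRow = () => (next < rows.length ? rows[next++] : null);
  try {
    listFn({ total_rows: rows.length, offset: 0 }, {});
  } finally {
    delete globalThis.start;
    delete globalThis.send;
    delete globalThis.getRow;
  }
  return { headers, body };
}

const result = runList(csvList, [
  { id: "a142db5e52dec15e28589f522d3f28e9", key: "wappen", value: null },
  { id: "example", key: "a, b", value: 3 }, // hypothetical row with a comma in the key
]);
console.log(result.body);
```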
+ validation
- validate_doc_update
+ each design document can have one
- all of them are applied
- modularity
- e.g.: enforce that each document has a type field:
- function(newDoc, oldDoc, userCtx, secObj) {
if (!newDoc._deleted) {
if (!newDoc.type) {
throw({"forbidden": "documents need to have a type"});
}
}
}
- return normally to accept the document
- throw({'forbidden': 'Explanation'}) when not accepting
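A sketch of a slightly stricter validation function, with example rules that are not from the workshop: every non-deleted document needs a non-empty string type, and an existing document may not change its type. The `check` harness calls it the way CouchDB would and reports the outcome, so the rules can be tried without a server:

```javascript
const validate = function (newDoc, oldDoc, userCtx, secObj) {
  if (newDoc._deleted) {
    return; // always allow deletions
  }
  if (typeof newDoc.type !== "string" || newDoc.type === "") {
    throw { forbidden: "documents need a non-empty string type field" };
  }
  if (oldDoc && !oldDoc._deleted && oldDoc.type !== newDoc.type) {
    throw { forbidden: "the type of an existing document cannot be changed" };
  }
};

// Local harness: oldDoc is null for new documents; userCtx and secObj are dummies.
function check(newDoc, oldDoc) {
  try {
    validate(newDoc, oldDoc || null, { name: null, roles: [] }, {});
    return "accepted";
  } catch (e) {
    return e && e.forbidden ? "forbidden: " + e.forbidden : "error: " + e;
  }
}

console.log(check({ type: "event", name: "ELAG 2013" }));
console.log(check({ name: "ELAG 2014" }));
console.log(check({ type: "wappen" }, { type: "event" }));
```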
+ use cases
+ in web pages:
- DS
- edfu Analyse
+ other formats
- bothmer / GraphViz
- crcg with python / KML / GraphViz
+ Part III: (15:45-17:30)
+ own little project with CouchDB
- we have three data sets available
- did anybody bring their own? / wants to create some?
- Ideas?
- LOD?
+ Replication
- CouchDB includes a replicator
- »eventual consistency«
+ conceptually a good match for today’s frequently disconnected smartphones
- clone data to the phone, to be fully accessible at any time
- re-sync when network is available
+ replication setup stored in _replicator database
- slightly different in older CouchDB versions
- {"_id": "my_rep",
"source": "http://myserver.com:5984/foo",
"target": "bar",
"create_target": true }
- show in Futon
- can replicate continuously
- replication can be filtered
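+ filter functions are stored in the design document at filters/NAME and referenced as "filter": "DESIGN/NAME" in the replication document
- return true/false to indicate whether the document should be replicated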
- function(doc, req) {
if(doc._deleted) {
return true;
}
if (doc.von) {
if (parseInt(doc.von, 10) > 1900) {
return false;
}
}
return true;
}
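The filter above can be tried on a few documents before it is used for replication. A sketch with hypothetical sample documents; the function body is the one above, and deleted documents pass so that deletions also reach the target:

```javascript
const keepUpTo1900 = function (doc, req) {
  if (doc._deleted) {
    return true; // let deletions through so they are replicated as well
  }
  if (doc.von) {
    if (parseInt(doc.von, 10) > 1900) {
      return false;
    }
  }
  return true;
};

// Hypothetical sample documents.
const sampleDocs = [
  { _id: "p1", von: "1871" },
  { _id: "p2", von: "1950" },
  { _id: "e1", type: "event" }, // no "von": replicated
  { _id: "gone", _deleted: true },
];
const replicated = sampleDocs.filter((doc) => keepUpTo1900(doc, { query: {} })).map((doc) => doc._id);
console.log(replicated.join(", ")); // p1, e1, gone
```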
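+ Changes feed
- GET /DB/_changes lists the changes to a database with increasing sequence numbers (replication is based on it)
- since=SEQ returns only later changes; feed=longpoll or feed=continuous keeps the connection open for live updates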
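A sketch of consuming the changes feed: read the `results` of one response, then continue from `last_seq` with long polling. The sample only mimics the response shape of CouchDB 1.x (numeric `seq` values; later versions use opaque strings, which is why `since` is passed through unchanged). Ids are borrowed from the examples above, while the sequence numbers and the second revision are made up; `processChanges` and `nextChangesUrl` are hypothetical names:

```javascript
function processChanges(response) {
  const updated = [];
  const deleted = [];
  for (const change of response.results) {
    (change.deleted ? deleted : updated).push(change.id);
  }
  return { updated, deleted, since: response.last_seq };
}

// URL for the next request: long polling waits until something changes.
function nextChangesUrl(baseUrl, db, since) {
  return `${baseUrl}/${encodeURIComponent(db)}/_changes?feed=longpoll&since=${encodeURIComponent(String(since))}`;
}

// Made-up sample in the shape of a _changes response.
const sample = {
  results: [
    { seq: 7, id: "elag2012", changes: [{ rev: "3-02c0504a6dade7af4d5eadc07178963d" }] },
    { seq: 8, id: "elag2013", changes: [{ rev: "5-made-up" }], deleted: true },
  ],
  last_seq: 8,
};

const state = processChanges(sample);
console.log(state.updated, state.deleted);
console.log(nextChangesUrl("http://localhost:5984", "elag-demo", state.since));
// → http://localhost:5984/elag-demo/_changes?feed=longpoll&since=8
```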
+ Bonus: CouchDB Lucene
- separate Java application that integrates with CouchDB
- https://github.com/rnewson/couchdb-lucene
+ install / config
- brew install couchdb-lucene
- launch it
+ configure CouchDB
- external
fti
/usr/bin/python /usr/local/Cellar/couchdb-lucene/0.9.0/tools/couchdb-external-hook.p
- httpd_db_handlers
_fti
{couch_httpd_external, handle_external_req, <<"fti">>}
- httpd_global_handlers
_fti
{couch_httpd_proxy, handle_proxy_req, <<"http://127.0.0.1:5985">>}
+ add to design document at path fulltext/INDEXNAME/index
- function(doc) {
var result = new Document();
if (!doc._id.match(/^(_design)/)) {
function index(obj, keyPrefix) {
for (var key in obj) {
var value = obj[key];
switch (typeof(value)) {
case 'object':
index(value, (keyPrefix ? keyPrefix + "." : "") + key);
break;
case 'function':
break;
default:
result.add(value);
var fieldName;
if (obj.constructor === Array) {
fieldName = keyPrefix;
}
else {
fieldName = (keyPrefix ? keyPrefix + "." : "") + key;
}
result.add(value, {"field":fieldName , "store":"yes"});
break;
}
}
};
index(doc, "");
if (doc._attachments) {
for (var i in doc._attachments) {
result.attachment("default", i);
}
}
}
return result;
}
+ query at DB/_fti/_design/DESIGN/INDEXNAME
- with q=LUCENEQUERY
- http://localhost:5984/elag-demo/_fti/_design/elag/everything?q=Druchtlev
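Lucene queries contain characters such as ":" and spaces, so `q` should be URL-encoded. A minimal sketch (hypothetical helper name `ftiUrl`) using the standard `URLSearchParams`, which form-encodes the value (spaces become +). Field names follow the index function above, where nested properties are joined with ".", e.g. location.nl:

```javascript
function ftiUrl(baseUrl, db, ddoc, indexName, luceneQuery) {
  const params = new URLSearchParams({ q: luceneQuery });
  return `${baseUrl}/${db}/_fti/_design/${ddoc}/${indexName}?${params}`;
}

console.log(ftiUrl("http://localhost:5984", "elag-demo", "elag", "everything", "Druchtlev"));
// → http://localhost:5984/elag-demo/_fti/_design/elag/everything?q=Druchtlev
console.log(ftiUrl("http://localhost:5984", "elag-demo", "elag", "everything", "location.nl:Gent"));
// → http://localhost:5984/elag-demo/_fti/_design/elag/everything?q=location.nl%3AGent
```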
+ Bonus TouchDB:
- CouchDB compatible library for iOS, Android, Mac, …
- includes replication
+ Bonus: hoodie
- http://hood.ie