@ssp ssp/SWIB CouchDB.md
Last active Dec 29, 2015

Notes for the hands-on CouchDB workshop at SWIB 13 in Hamburg (2013-11-25).

CouchDB Workshop

at SWIB13 in Hamburg (2013-11-25)

Notes for the hands-on steps done by the participants during the workshop.

  • Part I: Getting to know CouchDB (13:00-14:00)
    • Hello
      • who are you?
      • what are you doing?
      • how do you think CouchDB could be interesting for you?
    • What is CouchDB?
      • History
        • 2005 started by Damien Katz (Lotus Notes background)
        • »Cluster of unreliable commodity hardware Data Base«
        • ~ 2011 development stalled: CouchDB vs CouchBase
        • development going on again: v 1.3.0 was released this winter
        • outlook: integrate features such as BigCouch and GeoCouch, and improve extensibility (tricky because of Erlang)
        • other *ouchDBs exist
      • NoSQL
        • No SQL
        • schema less
        • Difference to SQL Databases
          • pros and cons?
          • what do we need schemas and structured queries for?
        • Document Database
      • JavaScript & JSON
        • all data stored as JSON objects
        • answers in JSON objects
        • URL parameters in JSON format
        • everybody familiar with JSON?
      • for/on/of the Web
        • not just JavaScript
        • http as the API
        • REST-like interface
      • Map/Reduce
        • for index building and analysis
        • Google origin
      • Replication
        • supported by CouchDB
        • many interesting applications (e.g. mobile sync)
    • A look at Futon
      • built into CouchDB at /_utils
      • simple browser GUI to look at the database
      • DO IT
        • who brought their own CouchDB?
        • database list
        • look into database
        • create a record
        • look at fields:
          • _id
          • _rev
        • update a record:
          • observe _rev
          • note history
    • http: the »proper« interface to CouchDB
      • REST-like
      • uses http verbs GET/PUT/POST/DELETE
      • can use it with curl
        • -X POST [set the http verb]
        • -D - [include headers in output]
        • -H "Content-Type: application/json" [set content type!]
        • -d '{"key": "value"}' [data to send to the server; use -d @/file/path to upload file content]
        • http://localhost:5984/swib-demo/document-id [URL of the database and document]
          • authentication using --netrc (with a ~/.netrc file) or user:password@ in the URL
        • when querying use -H "Accept: application/json" to get the correct Content-Type
      • »Dev HTTP Client« for Chrome
        • pretty and powerful graphical client
        • haven’t found an equally powerful Firefox extension yet
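The curl flags above map directly onto an HTTP request from JavaScript. A minimal sketch, assuming a local CouchDB at `http://localhost:5984`; the helper name `couchRequest` is made up for illustration. It only prepares the request, so it runs without a server.

```javascript
// Base URL of the CouchDB server (assumption: local default port).
const COUCH_URL = "http://localhost:5984";

// Hypothetical helper: build the arguments for fetch() that mirror the curl flags.
function couchRequest(method, path, body) {
  const options = {
    method: method,                           // curl -X
    headers: { Accept: "application/json" },  // curl -H "Accept: application/json"
  };
  if (body !== undefined) {
    options.headers["Content-Type"] = "application/json"; // curl -H "Content-Type: ..."
    options.body = JSON.stringify(body);                   // curl -d
  }
  return { url: COUCH_URL + "/" + path, options: options };
}

const request = couchRequest("PUT", "swib-demo/swib2013", {
  type: "event",
  name: "SWIB 2013",
});
console.log(request.url);
console.log(request.options);

// To send it (needs a running CouchDB and a runtime that provides fetch):
// fetch(request.url, request.options).then((res) => res.json()).then(console.log);
```

Authentication is left out here; with curl it comes from `--netrc` or `user:password@` in the URL as noted above.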
    • Getting Data in and out of CouchDB
      • create a database
      • get info on the new database
        • curl -D - --netrc -X GET http://localhost:5984/swib-demo
        • {"db_name":"swib-demo","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":79,"data_size":0,"instance_start_time":"1385179213594752","disk_format_version":6,"committed_update_seq":0}
      • create a JSON object:
      • PUT JSON into DB as document »swib2013«
      • retrieve record from DB
      • delete record from DB:
      • Re-add the deleted record using PUT
        • do not need the revision information because it’s been deleted
        • but revision ID increases
        • curl -D- --netrc -X PUT -H "Content-Type: application/json" -d '{"type":"event", "name":"SWIB 2013", "location":{"de":"Hamburg"}, "url":"http://swib.org/swib13"}' http://localhost:5984/swib-demo/swib2013
        • 201 Created
        • {"ok":true,"id":"swib2013","rev":"3-73932ffd3c46a6a1a16236df249689f2"}
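The revision rules from these steps can be tried out without a server. The following is a toy in-memory model, not CouchDB code: it only tracks the generation number at the start of `_rev` and uses the placeholder suffix `-x` where CouchDB puts a hash. The names `createStore`, `put` and `remove` are invented for this sketch.

```javascript
// Toy model of CouchDB's revision check. Real revisions look like
// "3-73932ffd..." (generation number plus hash); only the number is modelled.
function createStore() {
  const docs = {}; // id -> { rev: "N-x", deleted: boolean, doc: object|null }

  function nextRev(current) {
    const generation = current ? parseInt(current.rev, 10) : 0;
    return (generation + 1) + "-x";
  }

  function put(id, doc, rev) {
    const current = docs[id];
    // A live document can only be replaced by naming its current revision.
    if (current && !current.deleted && current.rev !== rev) {
      return { id: id, error: "conflict", reason: "Document update conflict." };
    }
    docs[id] = { rev: nextRev(current), deleted: false, doc: doc };
    return { ok: true, id: id, rev: docs[id].rev };
  }

  function remove(id, rev) {
    const current = docs[id];
    if (!current || current.deleted) {
      return { id: id, error: "not_found" };
    }
    if (current.rev !== rev) {
      return { id: id, error: "conflict", reason: "Document update conflict." };
    }
    // Deleting creates a new revision too, which is why the counter keeps growing.
    docs[id] = { rev: nextRev(current), deleted: true, doc: null };
    return { ok: true, id: id, rev: docs[id].rev };
  }

  return { put: put, remove: remove };
}

const store = createStore();
const created = store.put("swib2013", { type: "event" });        // first revision
const stale = store.put("swib2013", { type: "event", n: 2 });    // no rev given
const deleted = store.remove("swib2013", created.rev);           // second revision
const readded = store.put("swib2013", { type: "event" });        // no rev needed
console.log(created.rev, stale.error, deleted.rev, readded.rev);
// prints: 1-x conflict 2-x 3-x
```

This matches the `"rev":"3-…"` in the response above: create, delete and re-add each produce one revision.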
      • Change record in DB by sending a modified version:
      • POST a bunch of documents to the database: _bulk_docs
        • need to send a JSON object with an array of documents: {"docs": […]}
        • curl -D - --netrc -X POST -H "Content-Type: application/json" -d '{"docs":[{"_id":"swib2012","type":"event","name":"SWIB 2012","location":{"de":"Köln", "en":"Cologne"}}, {"type":"event","name":"SWIB 2014"}, {"_id":"swib2013","type":"event","name":"SWIB 2013","location":{"de":"Hamburg"},"url":"http://swib.org/swib13","hashtag":"swib2013"} ]}' "http://localhost:5984/swib-demo/_bulk_docs"
        • 201 Created
        • [{"ok":true,"id":"swib2012","rev":"1-698cd57203f826306f6b66a835f4bab1"},{"ok":true,"id":"9bd2a1b1e59a18aa615f2e4f23000853","rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"},{"id":"swib2013","error":"conflict","reason":"Document update conflict."}]
        • → granular update feedback, very powerful/flexible
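Because `_bulk_docs` answers per document, a client has to look at every entry of the result. A small sketch that splits the response shown above into saved and failed documents; the function name `splitBulkResult` is an assumption of this example.

```javascript
// Split a _bulk_docs response into documents that were saved and ones that failed.
function splitBulkResult(rows) {
  const saved = [];
  const failed = [];
  rows.forEach(function (row) {
    if (row.error) {
      failed.push(row);
    } else {
      saved.push(row);
    }
  });
  return { saved: saved, failed: failed };
}

// The response from the _bulk_docs call in these notes.
const response = [
  { ok: true, id: "swib2012", rev: "1-698cd57203f826306f6b66a835f4bab1" },
  { ok: true, id: "9bd2a1b1e59a18aa615f2e4f23000853", rev: "1-fb6dbfaea44b5ae3002c7f61e2bee5e2" },
  { id: "swib2013", error: "conflict", reason: "Document update conflict." },
];

const result = splitBulkResult(response);
console.log(result.saved.map(function (row) { return row.id; }));
console.log(result.failed.map(function (row) { return row.id + ": " + row.error; }));
```

The conflict for `swib2013` arises because the document already exists and no `_rev` was sent; fetching the current `_rev` and sending the document again would be one way to resolve it.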
      • GET all documents from a database: _all_docs
        • just metadata:
          • curl -D- --netrc -X GET http://localhost:5984/swib-demo/_all_docs
          • 200 OK
          • {"total_rows":3,"offset":0,"rows":[ {"id":"9bd2a1b1e59a18aa615f2e4f23000853","key":"9bd2a1b1e59a18aa615f2e4f23000853","value":{"rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"}}, {"id":"swib2012","key":"swib2012","value":{"rev":"1-698cd57203f826306f6b66a835f4bab1"}}, {"id":"swib2013","key":"swib2013","value":{"rev":"4-ae91e484504d0df3da5ab0e12b498dd9"}} ]}
        • also document content: include_docs=true
          • curl -D - --netrc -X GET "http://localhost:5984/swib-demo/_all_docs?include_docs=true"
          • 200 OK
          • {"total_rows":3,"offset":0,"rows":[ {"id":"9bd2a1b1e59a18aa615f2e4f23000853","key":"9bd2a1b1e59a18aa615f2e4f23000853","value":{"rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"},"doc":{"_id":"9bd2a1b1e59a18aa615f2e4f23000853","_rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2","type":"event","name":"SWIB 2014"}}, {"id":"swib2012","key":"swib2012","value":{"rev":"1-698cd57203f826306f6b66a835f4bab1"},"doc":{"_id":"swib2012","_rev":"1-698cd57203f826306f6b66a835f4bab1","type":"event","name":"SWIB 2012","location":{"de":"Köln","en":"Cologne"}}}, {"id":"swib2013","key":"swib2013","value":{"rev":"4-ae91e484504d0df3da5ab0e12b498dd9"},"doc":{"_id":"swib2013","_rev":"4-ae91e484504d0df3da5ab0e12b498dd9","type":"event","name":"SWIB 2013","location":{"de":"Hamburg Wilhelmsburg"},"url":"http://swib.org/swib13"}} ]}
      • attachments:
        • attach a file to a document
          • need to pass:
            • file name appended to path
            • document revision
            • attachment MIME Type
            • attachment
          • curl -D - --netrc -X PUT -H "Content-Type: image/png" -H "If-Match: 1-698cd57203f826306f6b66a835f4bab1" --data-binary @IIsaBurito.png http://localhost:5984/swib-demo/swib2012/kitty.png
          • 100 Continue
          • 201 Created
          • {"ok":true,"id":"swib2012","rev":"2-e38f040510caae7aaaf3ec45e7816f63"}
        • attachment is not delivered back in JSON, just a »stub« in the _attachments object
          • curl -D - -X GET "http://localhost:5984/swib-demo/swib2012/"
          • 200 OK
          • {"_id":"swib2012","_rev":"2-e38f040510caae7aaaf3ec45e7816f63","type":"event","name":"SWIB 2012","location":{"de":"Köln","en":"Cologne"},"_attachments":{"kitty.png":{"content_type":"image/png","revpos":2,"digest":"md5-1TF4Rlk5CuCasqC1TfvHCg==","length":231873,"stub":true}}}
          • for the attachment use: curl -D - http://localhost:5984/swib-demo/swib2012/kitty.png
          • include the stub when updating the document the next time to ensure attachments are preserved without having to re-upload them
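Keeping the stub when updating can be sketched as a small merge step before the PUT. The helper `updateFields` and the `hashtag` value are made up for illustration; the document is the one returned above.

```javascript
// Hypothetical helper: apply field changes but keep _id, _rev and the
// attachment stubs, so the next PUT does not drop the attachments.
function updateFields(current, changes) {
  const updated = Object.assign({}, current, changes);
  updated._id = current._id;
  updated._rev = current._rev;
  if (current._attachments) {
    updated._attachments = current._attachments;
  }
  return updated;
}

// The document as returned by GET, including the stub for kitty.png.
const current = {
  _id: "swib2012",
  _rev: "2-e38f040510caae7aaaf3ec45e7816f63",
  type: "event",
  name: "SWIB 2012",
  location: { de: "Köln", en: "Cologne" },
  _attachments: {
    "kitty.png": {
      content_type: "image/png",
      revpos: 2,
      digest: "md5-1TF4Rlk5CuCasqC1TfvHCg==",
      length: 231873,
      stub: true,
    },
  },
};

// The hashtag value is invented for this example.
const updated = updateFields(current, { hashtag: "swib2012" });
console.log(JSON.stringify(updated, null, 2));
```

The result is what would be sent as the body of the next PUT to `/swib-demo/swib2012`.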
      • play around a little / everybody comfortable? / can always do this in Futon
      • Example datasets
        • bothmer.json
        • couch-marc.json
        • ct_sample.json
        • gnd-smith.json
  • Part II: (14:00-15:15)
    • design documents
      • stored as documents _design/NAME (use the NAME swib), e.g. _design/swib
    • views
      • stored in the design document’s views object
      • called with subpath _view/NAME (use the NAME type) of design document, e.g. /swib-demo/_design/swib/_view/type
      • Futon has a simple interface for creating and testing them
    • map
      • map function creates an index entry for the document
      • e.g. extract a field
      • defined by a JavaScript function: function (doc) {}
      • use the emit() command to add something to the index
      • use map function
        • function(doc) { emit(doc.type); }
      • can also pass a second parameter to emit
        • it will be available as the document’s value for the key
        • available for reduce function, also available when calling views
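What a map function does to a set of documents can be tried locally. In this sketch `emit()` is a stand-in for the function CouchDB provides, the sort is a simplification of CouchDB's key collation, and the document `hamburg` is made up.

```javascript
const rows = [];
let currentDocId = null;

// Stand-in for the emit() that CouchDB provides to map functions.
// A missing key or value ends up as null in the index.
function emit(key, value) {
  rows.push({
    id: currentDocId,
    key: key === undefined ? null : key,
    value: value === undefined ? null : value,
  });
}

// Map function: index each document by type, with its name as the value.
function map(doc) {
  emit(doc.type, doc.name);
}

const docs = [
  { _id: "hamburg", type: "place", name: "Hamburg" }, // made-up document
  { _id: "swib2012", type: "event", name: "SWIB 2012" },
  { _id: "swib2013", type: "event", name: "SWIB 2013" },
];

docs.forEach(function (doc) {
  currentDocId = doc._id;
  map(doc);
});

// Views are returned sorted by key (plain string comparison here).
rows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});
console.log(rows);
```

Each row carries the document id, the emitted key and the emitted value, which is the shape a view query returns in its `rows` array.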
    • Query views: Map (examples use the documents from bothmer.json)
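View queries take parameters such as `key`, `startkey` and `endkey`, whose values have to be JSON. A sketch of a URL builder for them; the helper name `viewUrl` is an assumption, the parameter names are CouchDB's.

```javascript
// Parameters whose values CouchDB expects to be JSON encoded.
const JSON_PARAMS = ["key", "keys", "startkey", "endkey"];

// Hypothetical helper: build the URL for querying a view.
function viewUrl(base, db, design, view, params) {
  const query = Object.keys(params || {}).map(function (name) {
    const raw = params[name];
    const value = JSON_PARAMS.indexOf(name) !== -1 ? JSON.stringify(raw) : String(raw);
    return encodeURIComponent(name) + "=" + encodeURIComponent(value);
  });
  const path = [base, db, "_design", design, "_view", view].join("/");
  return query.length ? path + "?" + query.join("&") : path;
}

// All documents of type "wappen", with the documents included.
const byType = viewUrl("http://localhost:5984", "swib-demo", "swib", "type", {
  key: "wappen",
  include_docs: true,
});
console.log(byType);
// prints: http://localhost:5984/swib-demo/_design/swib/_view/type?key=%22wappen%22&include_docs=true
```

Note the quotes around `wappen` in the encoded URL: a string key has to be sent as a JSON string, not as bare text.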
    • reduce
      • reduce function computes a reduced value for all documents with the same mapped key
      • e.g. function(keys, values) { return sum(values); }
      • already built in:
        • _count
        • _sum
        • _stats
      • demo in Futon: use reduce to find out how many documents of each type are in the DB
        • set reduce function of _design/swib/_view/type to: _count
    • Query views: Reduce
      • group
      • non-obvious but very powerful:
        • we can map to an array and then pick the number of components that should be equal
        • function(doc) { var century = doc.von ? doc.von.substr(0, 2) : null; emit([doc.type, century]); }
        • curl -D - -X GET 'http://localhost:5984/swib-demo/_design/swib/_view/type?group=true'
        • 200 OK
        • {"rows":[ {"key":["allianzpartner","11"],"value":2}, {"key":["allianzpartner","12"],"value":1}, {"key":["allianzpartner","13"],"value":13}, {"key":["allianzpartner","14"],"value":30}, {"key":["allianzpartner","15"],"value":82}, {"key":["allianzpartner","16"],"value":93}, {"key":["allianzpartner","17"],"value":108}, {"key":["allianzpartner","18"],"value":171}, {"key":["allianzpartner","19"],"value":3}, {"key":["bothmerbothmer","15"],"value":23}, {"key":["namensträger","11"],"value":4}, {"key":["namensträger","12"],"value":10}, {"key":["namensträger","13"],"value":39}, {"key":["namensträger","14"],"value":52}, {"key":["namensträger","15"],"value":116}, {"key":["namensträger","16"],"value":177}, {"key":["namensträger","17"],"value":246}, {"key":["namensträger","18"],"value":245}, {"key":["wappen","18"],"value":659} ]}
        • can get the same result as before using group_level=1
        • curl -D - -X GET 'http://localhost:5984/swib-demo/_design/swib/_view/type?group=true&group_level=1'
        • 200 OK
        • {"rows":[ {"key":["allianzpartner"],"value":503}, {"key":["bothmerbothmer"],"value":23}, {"key":["namensträger"],"value":889}, {"key":["wappen"],"value":659} ]}
        • example:
          • for dates you could write the timestamp
          • if you need grouping by day / month / year it may be helpful to map [2013, 5, 28]
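The effect of `group_level` on array keys such as `[2013, 5, 28]` can be imitated locally. This is a stand-in for a view with the `_count` reduce, not CouchDB code; `dateKey` and `countByGroupLevel` are names invented here and the dates are sample values.

```javascript
// Turn a date into the array key a map function could emit.
function dateKey(date) {
  return [date.getUTCFullYear(), date.getUTCMonth() + 1, date.getUTCDate()];
}

// Count keys after cutting them down to their first groupLevel components,
// which is what group_level does together with the _count reduce.
function countByGroupLevel(keys, groupLevel) {
  const counts = new Map();
  keys.forEach(function (key) {
    const groupKey = JSON.stringify(key.slice(0, groupLevel));
    counts.set(groupKey, (counts.get(groupKey) || 0) + 1);
  });
  return Array.from(counts, function (entry) {
    return { key: JSON.parse(entry[0]), value: entry[1] };
  });
}

// Sample dates, already in key order.
const keys = [
  dateKey(new Date("2012-11-26T00:00:00Z")),
  dateKey(new Date("2013-05-28T00:00:00Z")),
  dateKey(new Date("2013-05-29T00:00:00Z")),
  dateKey(new Date("2013-11-25T00:00:00Z")),
];

console.log(countByGroupLevel(keys, 1)); // per year
console.log(countByGroupLevel(keys, 2)); // per month
console.log(countByGroupLevel(keys, 3)); // per day
```

With a plain timestamp as the key only exact matches could be grouped; the array form lets one view answer the per-year, per-month and per-day questions.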
    • Map/Reduce Summary
      • quite simple
      • very powerful
      • requires different thinking than SQL, possibly simpler if you are used to things like Solr
      • no JOIN
    • Intermission: couchapp
      • http://couchapp.org/page/index
      • using Futon for editing any of the following bits of design documents can be a bit of a pain thanks to escaping issues
      • the command line tool couchapp can help with that
        • you can clone the design document into a file/folder structure on your hard drive
        • use your favourite editor
        • then push the design document back into CouchDB
        • a bit like git, really
        • try to install it
        • mkdir myCouchDesign; cd myCouchDesign
        • couchapp init
        • couchapp clone http://localhost:5984/swib-demo/_design/swib
        • also check out the newer couchapp alternatives listed on the website to see which one suits you the best
    • show interface
      • a way to implement different output formats
      • in particular for different MIME types
      • function (doc, req) {}
      • return object with
        • headers:
          • Content-Type:
        • body:
          • Text
        • json:
          • JavaScript object
        • base64:
          • binary data
      • example:
      • includes automatic MIME Type Handling
        • register MIME Types with internal name using registerType(INTERNALNAME, MIMETYPES)

        • common MIME Types text/json/xml/html/atom/… are predefined

        • provide output for them using provides(INTERNALNAME, function() { return {} });

        • function(doc, req) {
            provides('json', function() { return {'json': doc}; });

            provides('html', function() {
              // Render a key/value pair as <dt>/<dd>; objects become nested <dl> lists.
              var listItem = function (key, value) {
                var result = '';
                if (key) { result += '<dt>' + key + '</dt><dd>'; }
                if (typeof value === 'object') {
                  result += '<dl>';
                  for (var i in value) {
                    // skip internal fields such as _id and _rev
                    if (i[0] !== '_') { result += listItem(i, value[i]); }
                  }
                  result += '</dl>';
                } else {
                  result += value;
                }
                if (key) { result += '</dd>'; }
                return result;
              };

              return '<html><head><title>'.concat(doc._id, '</title>',
                '<style type="text/css">dt { font-weight:bold; }</style></head>',
                '<body><h1>Document »' + doc._id + '«</h1>',
                listItem(undefined, doc),
                '</body></html>');
            });

            registerType('text-json', 'text/json');
            provides('text-json', function() { return toJSON(doc); });
          }

        • curl -D - -H "Accept: text/html" -X GET 'http://localhost:5984/swib-demo/_design/swib/_show/content-types/swib2013'

        • 200 OK Vary: Accept Content-Type: text/html; charset=utf-8

        • <title>swib2013</title><style type="text/css">dt { font-weight:bold; }</style>
          Document »swib2013«
          type: event
          name: ELAG 2013
          location: nl: Gent, en: Ghent
          url: http://swib2013.org/
          hashtag: swib2013
    • Idea: Can we use this for a LOD server?
      • configure the redirection headers as needed
      • create some triples for a MARC record?
        • e.g. the ones in couch-marc
    • list interface
      • show is to a document what list is to a view: a list function formats the rows of a view
      • can also use provides() / registerType()
      • useful for a quick CSV export
      • can send output in bits
        • start({'headers': {}})
        • send() repeatable
      • Applications:
        • simple CSV export
        • dot file output for GraphViz
        • KML output
        • TeX output?
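A CSV export with a list function could look roughly like this. `start()`, `send()` and `getRow()` are the helpers CouchDB provides to list functions; here they are replaced by local stand-ins over two sample rows so the sketch runs on its own. The name `csvList` is made up.

```javascript
// Local stand-ins for the helpers CouchDB provides to list functions.
const viewRows = [
  { id: "swib2012", key: "event", value: "SWIB 2012" },
  { id: "swib2013", key: "event", value: "SWIB 2013" },
];
let position = 0;
let responseHead = null;
const chunks = [];
function start(head) { responseHead = head; }
function send(chunk) { chunks.push(chunk); }
function getRow() { return position < viewRows.length ? viewRows[position++] : null; }

// The list function: one CSV line per view row, sent piece by piece.
function csvList(head, req) {
  // Quote every field so commas and quotes inside values survive.
  function csvField(value) {
    return '"' + String(value).replace(/"/g, '""') + '"';
  }

  start({ headers: { "Content-Type": "text/csv" } });
  send("id,key,value\n");
  var row;
  while ((row = getRow())) {
    send([row.id, row.key, row.value].map(csvField).join(",") + "\n");
  }
}

csvList({}, {});
console.log(chunks.join(""));
```

In a design document only the body of `csvList` would be stored, under `lists/NAME`, and called via `_list/NAME/VIEWNAME`.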
    • validation
      • validate_doc_update
      • each design document can have one
        • all of them are applied
        • modularity
      • e.g.: enforce that each document has a type field:
      • function(newDoc, oldDoc, userCtx, secObj) { if (!newDoc._deleted) { if (!newDoc.type) { throw({"forbidden": "documents need to have a type"}); } } }
      • return when accepting document
      • throw({'forbidden': 'Explanation'}) when not accepting
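The validation function above can be tried outside CouchDB by calling it directly. The wrapper `check` is a hypothetical helper for this sketch; the empty user context and security object are placeholders.

```javascript
// The validation function from the notes.
function validate(newDoc, oldDoc, userCtx, secObj) {
  if (!newDoc._deleted) {
    if (!newDoc.type) {
      throw({ forbidden: "documents need to have a type" });
    }
  }
}

// Hypothetical helper: call the function roughly the way CouchDB does on a
// write and report the outcome.
function check(newDoc) {
  try {
    validate(newDoc, null, { name: null, roles: [] }, {});
    return "accepted";
  } catch (error) {
    return "rejected: " + error.forbidden;
  }
}

console.log(check({ _id: "swib2013", type: "event" })); // accepted
console.log(check({ _id: "no-type" }));                 // rejected: documents need to have a type
console.log(check({ _id: "gone", _deleted: true }));    // accepted
```

The `_deleted` check matters: without it, documents lacking a type could never be deleted, because a deletion is also a write that passes through the validation.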
    • use cases
      • in web pages:
        • DS
        • edfu Analyse
      • other formats
        • bothmer / GraphViz
        • crcg with python / KML / GraphViz
  • Part III: (16:00-19:00)
    • own little project with CouchDB
      • we have three data sets available
      • did anybody bring their own? / wants to create some?
      • Ideas?
      • LOD?
    • Replication
      • CouchDB includes a replicator
      • »eventual consistency«
      • conceptually a good match for today’s frequently disconnected smartphones
        • clone data to the phone, to be fully accessible at any time
        • re-sync when network is available
      • replication setup stored in _replicator database
        • slightly different in older CouchDB versions
        • {"_id": "my_rep", "source": "http://myserver.com:5984/foo", "target": "bar", "create_target": true }
      • show in Futon
      • can replicate continuously
      • replication can be filtered
      • filter functions at filters/NAME
        • return true/false to indicate whether the document should be replicated
        • function(doc, req) { if(doc._deleted) { return true; } if (doc.von) { if (parseInt(doc.von) > 1900) { return false; } } return true; }
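To see which documents the filter lets through, it can be applied to a few sample documents locally. The documents and the filter name `before1900` are invented for this sketch; the replication document shows how such a filter would be referenced.

```javascript
// The filter function from the notes (with an explicit radix for parseInt).
function filter(doc, req) {
  if (doc._deleted) { return true; } // always pass deletions on
  if (doc.von) {
    if (parseInt(doc.von, 10) > 1900) { return false; }
  }
  return true;
}

// Made-up sample documents.
const docs = [
  { _id: "a", von: "1850" },
  { _id: "b", von: "1920" },
  { _id: "c" },
  { _id: "d", _deleted: true },
];

const replicated = docs
  .filter(function (doc) { return filter(doc, {}); })
  .map(function (doc) { return doc._id; });
console.log(replicated); // [ 'a', 'c', 'd' ]

// Replication document using the filter; assumes it is stored as
// filters/before1900 in _design/swib of the source database.
const replication = {
  _id: "my_rep",
  source: "http://myserver.com:5984/foo",
  target: "bar",
  create_target: true,
  continuous: true,
  filter: "swib/before1900",
};
console.log(JSON.stringify(replication));
```

Letting deleted documents through means deletions on the source also reach the target.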
    • Changes feed
      • (replication is based on it)
      • continuous query also possible
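With `_changes?feed=continuous` CouchDB keeps the connection open and writes one JSON object per line, with empty lines as heartbeats. A sketch of a parser for such a stream; the function name and the sample data (including the shortened revisions and sequence numbers) are made up, and network chunks may end in the middle of a line, so the parser buffers.

```javascript
// Hypothetical helper: returns a function that takes chunks of the
// continuous feed and returns the complete change objects seen so far.
function createChangesParser() {
  let buffer = "";
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // the last piece may be an incomplete line
    return lines
      .filter(function (line) { return line.trim() !== ""; }) // skip heartbeats
      .map(function (line) { return JSON.parse(line); });
  };
}

const push = createChangesParser();

// Two chunks; the second change is split across them.
const first = push('{"seq":1,"id":"swib2012","changes":[{"rev":"1-x"}]}\n\n{"seq":2,"id":"swi');
const second = push('b2013","changes":[{"rev":"4-x"}]}\n');

console.log(first.map(function (change) { return change.id; }));  // [ 'swib2012' ]
console.log(second.map(function (change) { return change.id; })); // [ 'swib2013' ]
```

Each change names the document id and its new revision; fetching the document itself is a separate request unless `include_docs=true` is added to the query.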
    • Bonus: CouchDB River for ElasticSearch
    • Bonus: CouchDB Lucene
      • install / config
        • brew install couchdb-lucene
        • launch it
        • configure CouchDB
          • external fti /usr/bin/python /usr/local/Cellar/couchdb-lucene/0.9.0/tools/couchdb-external-hook.p
          • httpd_db_handlers _fti {couch_httpd_external, handle_external_req, <<"fti">>}
          • httpd_global_handlers _fti {couch_httpd_proxy, handle_proxy_req, <<"http://127.0.0.1:5985">>}
      • add to design document at path fulltext/INDEXNAME/index
        • function(doc) {
            var result = new Document();
            if (!doc._id.match(/^(_design)/)) {
              function index(obj, keyPrefix) {
                for (var key in obj) {
                  var value = obj[key];
                  switch (typeof(value)) {
                    case 'object':
                      index(value, (keyPrefix ? keyPrefix + "." : "") + key);
                      break;
                    case 'function':
                      break;
                    default:
                      result.add(value);
                      var fieldName;
                      if (obj.constructor === Array) {
                        fieldName = keyPrefix;
                      } else {
                        fieldName = (keyPrefix ? keyPrefix + "." : "") + key;
                      }
                      result.add(value, {"field": fieldName, "store": "yes"});
                      break;
                  }
                }
              };

              index(doc, "");

              if (doc._attachments) {
                for (var i in doc._attachments) {
                  result.attachment("default", i);
                }
              }
            }
            return result;
          }

      • query at DB/_fti/_design/DESIGN/fulltext
    • Bonus: TouchDB
      • CouchDB compatible library for iOS, Android, Mac, …
      • includes replication
    • Bonus: hoodie

Thanks for your attention.
