@ssp /SWIB CouchDB.md
Last active Dec 29, 2015

Notes for the hands-on CouchDB workshop at SWIB 13 in Hamburg (2013-11-25).

CouchDB Workshop

at SWIB13 in Hamburg (2013-11-25)

Notes for the hands-on steps done by the participants during the workshop.

  • Part I: Getting to know CouchDB (13:00-14:00)
    • Hello
      • who are you?
      • what are you doing?
      • how do you think CouchDB could be interesting for you?
    • What is CouchDB?
      • History
        • 2005 started by Damien Katz (Lotus Notes background)
        • »Cluster of unreliable commodity hardware Data Base«
        • ~ 2011 development stalled: CouchDB vs CouchBase
        • development going on again: v 1.3.0 was released this winter
        • perspective: integrate more features like BigCouch, GeoCouch and increase extensibility which is tricky due to Erlang
        • other *ouchDBs exist
      • NoSQL
        • No SQL
        • schema less
        • Difference to SQL Databases
          • pros and cons?
          • what do we need schemas and structured queries for?
        • Document Database
      • JavaScript & JSON
        • all data stored as JSON objects
        • answers in JSON objects
        • URL parameters in JSON format
        • everybody familiar with JSON?
      • for/on/of the Web
        • not just JavaScript
        • http as the API
        • REST-like interface
      • Map/Reduce
        • for index building and analysis
        • Google origin
      • Replication
        • supported by CouchDB
        • many interesting applications (e.g. mobile sync)
    • A look at Futon
      • built into CouchDB at /_utils
      • simple browser GUI to look at the database
      • DO IT
        • who brought their own CouchDB?
        • database list
        • look into database
        • create a record
        • look at fields:
          • _id
          • _rev
        • update a record:
          • observe _rev
          • note history
    • http: the »proper« interface to CouchDB
      • REST-like
      • uses http verbs GET/PUT/POST/DELETE
      • can use it with curl
        • -X POST [set the http verb]
        • -D - [include headers in output]
        • -H "Content-Type: application/json" [set content type!]
        • -d '{"key": "value"}' [data to send to the server; use -d @/file/path to upload file content]
        • http://localhost:5984/swib-demo/document-id
          • authentication using --netrc (with a ~/.netrc file) or user:password@ in the URL
        • when querying use -H "Accept: application/json" to get the correct Content-Type
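The curl flags above map directly onto the parts of any HTTP request. A minimal sketch in JavaScript, assuming a hypothetical helper `couchRequest` (not part of CouchDB or of any library) that only assembles the request and does not send it:

```javascript
// Hypothetical helper: collects what curl's -X, -H and -d flags express.
function couchRequest(method, baseUrl, db, docId, body) {
  const headers = { "Accept": "application/json" };       // like -H "Accept: ..."
  const options = { method: method, headers: headers };   // like -X METHOD
  if (body !== undefined) {
    headers["Content-Type"] = "application/json";         // like -H "Content-Type: ..."
    options.body = JSON.stringify(body);                  // like -d '{...}'
  }
  const url = baseUrl + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(docId);
  return { url: url, options: options };
}

const req = couchRequest("PUT", "http://localhost:5984", "swib-demo", "swib2013",
  { type: "event" });
console.log(req.url);
console.log(req.options);
// fetch(req.url, req.options) would send it from a browser or a recent Node.js;
// it is not called here so the sketch runs without a server.
```

Authentication is left out on purpose; with curl it comes from --netrc or user:password@ in the URL as noted above.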
      • »Dev HTTP Client« for Chrome
        • pretty and powerful graphical client
        • haven’t found an equally powerful Firefox extension yet
    • Getting Data in and out of CouchDB
      • create a database
      • get info on the new database
        • curl -D - --netrc -X GET http://localhost:5984/swib-demo
        • {"db_name":"swib-demo","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":79,"data_size":0,"instance_start_time":"1385179213594752","disk_format_version":6,"committed_update_seq":0}
      • create a JSON object:
      • PUT JSON into DB as document »swib2013«
      • retrieve record from DB
      • delete record from DB:
      • Re-add the deleted record using PUT
        • do not need the revision information because it’s been deleted
        • but revision ID increases
        • curl -D- --netrc -X PUT -H "Content-Type: application/json" -d '{"type":"event", "name":"SWIB 2013", "location":{"de":"Hamburg"}, "url":"http://swib.org/swib13"}' http://localhost:5984/swib-demo/swib2013
        • 201 Created
        • {"ok":true,"id":"swib2013","rev":"3-73932ffd3c46a6a1a16236df249689f2"}
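The revision behaviour seen in these steps can be mimicked with a small in-memory model. This is only a sketch under simplifying assumptions: `TinyStore` and its methods are hypothetical names, revisions are reduced to a counter plus the placeholder `x` where CouchDB appends a hash, and a deletion is modelled as a tombstone that still carries a revision.

```javascript
// Toy model of the revision checks; NOT CouchDB's real implementation.
class TinyStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, deleted, body }
  }

  nextRev(entry) {
    const n = entry ? parseInt(entry.rev, 10) : 0;
    return (n + 1) + "-x";
  }

  // PUT: needs the current rev unless the document is new or deleted.
  put(id, body, rev) {
    const entry = this.docs.get(id);
    if (entry && !entry.deleted && entry.rev !== rev) {
      return { status: 409, error: "conflict" };
    }
    const newRev = this.nextRev(entry);
    this.docs.set(id, { rev: newRev, deleted: false, body: body });
    return { status: 201, ok: true, id: id, rev: newRev };
  }

  // DELETE: needs the current rev and leaves a tombstone behind.
  remove(id, rev) {
    const entry = this.docs.get(id);
    if (!entry || entry.deleted) return { status: 404, error: "not_found" };
    if (entry.rev !== rev) return { status: 409, error: "conflict" };
    const newRev = this.nextRev(entry);
    this.docs.set(id, { rev: newRev, deleted: true, body: null });
    return { status: 200, ok: true, id: id, rev: newRev };
  }
}

const store = new TinyStore();
const created = store.put("swib2013", { type: "event" });           // first revision
const deleted = store.remove("swib2013", created.rev);              // tombstone
const readded = store.put("swib2013", { type: "event" });           // no rev needed
const clash = store.put("swib2013", { type: "event", name: "x" });  // rev missing
console.log(created.rev, deleted.rev, readded.rev, clash.status);
```

The re-added document ends up at the third revision, matching the `3-…` revision in the response above, while a later PUT without the current revision is rejected.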
      • Change record in DB by sending a modified version:
      • POST a bunch of documents to the database: _bulk_docs
        • need to send a JSON object with an array of documents: {"docs": […]}
        • curl -D - --netrc -X POST -H "Content-Type: application/json" -d '{"docs":[{"_id":"swib2012","type":"event","name":"SWIB 2012","location":{"de":"Köln", "en":"Cologne"}}, {"type":"event","name":"SWIB 2014"}, {"_id":"swib2013","type":"event","name":"SWIB 2013","location":{"de":"Hamburg"},"url":"http://swib.org/swib13","hashtag":"swib2013"} ]}' "http://localhost:5984/swib-demo/_bulk_docs"
        • 201 Created
        • [{"ok":true,"id":"swib2012","rev":"1-698cd57203f826306f6b66a835f4bab1"},{"ok":true,"id":"9bd2a1b1e59a18aa615f2e4f23000853","rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"},{"id":"swib2013","error":"conflict","reason":"Document update conflict."}]
        • → granular update feedback, very powerful/flexible
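Because _bulk_docs answers per document, a client has to look at each entry of the response. A minimal sketch, assuming a hypothetical helper `splitBulkResult` and using the response from the run above as input:

```javascript
// Hypothetical helper: separate accepted documents from rejected ones.
function splitBulkResult(rows) {
  const accepted = rows.filter((row) => row.ok === true);
  const rejected = rows.filter((row) => row.error !== undefined);
  return { accepted: accepted, rejected: rejected };
}

// Response body copied from the workshop run.
const response = [
  {"ok":true,"id":"swib2012","rev":"1-698cd57203f826306f6b66a835f4bab1"},
  {"ok":true,"id":"9bd2a1b1e59a18aa615f2e4f23000853","rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"},
  {"id":"swib2013","error":"conflict","reason":"Document update conflict."}
];

const result = splitBulkResult(response);
console.log(result.accepted.map((row) => row.id));
console.log(result.rejected.map((row) => row.id + ": " + row.error));
```

The rejected entries are the ones to fetch again and resend with their current _rev.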
      • GET all documents from a database: _all_docs
        • just metadata:
          • curl -D- --netrc -X GET http://localhost:5984/swib-demo/_all_docs
          • 200 OK
          • {"total_rows":3,"offset":0,"rows":[ {"id":"9bd2a1b1e59a18aa615f2e4f23000853","key":"9bd2a1b1e59a18aa615f2e4f23000853","value":{"rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"}}, {"id":"swib2012","key":"swib2012","value":{"rev":"1-698cd57203f826306f6b66a835f4bab1"}}, {"id":"swib2013","key":"swib2013","value":{"rev":"4-ae91e484504d0df3da5ab0e12b498dd9"}} ]}
        • also document content: include_docs=true
          • curl -D - --netrc -X GET "http://localhost:5984/swib-demo/_all_docs?include_docs=true"
          • 200 OK
          • {"total_rows":3,"offset":0,"rows":[ {"id":"9bd2a1b1e59a18aa615f2e4f23000853","key":"9bd2a1b1e59a18aa615f2e4f23000853","value":{"rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2"},"doc":{"_id":"9bd2a1b1e59a18aa615f2e4f23000853","_rev":"1-fb6dbfaea44b5ae3002c7f61e2bee5e2","type":"event","name":"SWIB 2014"}}, {"id":"swib2012","key":"swib2012","value":{"rev":"1-698cd57203f826306f6b66a835f4bab1"},"doc":{"_id":"swib2012","_rev":"1-698cd57203f826306f6b66a835f4bab1","type":"event","name":"SWIB 2012","location":{"de":"Köln","en":"Cologne"}}}, {"id":"swib2013","key":"swib2013","value":{"rev":"4-ae91e484504d0df3da5ab0e12b498dd9"},"doc":{"_id":"swib2013","_rev":"4-ae91e484504d0df3da5ab0e12b498dd9","type":"event","name":"SWIB 2013","location":{"de":"Hamburg Wilhelmsburg"},"url":"http://swib.org/swib13"}} ]}
      • attachments:
        • attach a file to a document
          • need to pass:
            • file name appended to path
            • document revision
            • attachment MIME Type
            • attachment
          • curl -D - --netrc -X PUT -H "Content-Type: image/png" -H "If-Match: 1-698cd57203f826306f6b66a835f4bab1" --data-binary @IIsaBurito.png http://localhost:5984/swib-demo/swib2012/kitty.png
          • 100 Continue
          • 201 Created
          • {"ok":true,"id":"swib2012","rev":"2-e38f040510caae7aaaf3ec45e7816f63"}
        • attachment is not delivered back in JSON, just a »stub« in the _attachments object
          • curl -D - -X GET "http://localhost:5984/swib-demo/swib2012/"
          • 200 OK
          • {"_id":"swib2012","_rev":"2-e38f040510caae7aaaf3ec45e7816f63","type":"event","name":"SWIB 2012","location":{"de":"Köln","en":"Cologne"},"_attachments":{"kitty.png":{"content_type":"image/png","revpos":2,"digest":"md5-1TF4Rlk5CuCasqC1TfvHCg==","length":231873,"stub":true}}}
          • for the attachment use: curl -D - http://localhost:5984/swib-demo/swib2012/kitty.png
          • include the stub when updating the document the next time to ensure attachments are preserved without having to re-upload them
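Keeping the stub when updating can be sketched as follows. `nextVersion` is a hypothetical helper and the `hashtag` value is made up for illustration; the document itself is the one returned by the GET above.

```javascript
// Hypothetical helper: apply changes but carry over _id, _rev and the
// _attachments stubs so the attachment does not need to be uploaded again.
function nextVersion(current, changes) {
  const next = Object.assign({}, current, changes);
  next._id = current._id;
  next._rev = current._rev;
  if (current._attachments) {
    next._attachments = current._attachments;
  }
  return next;
}

const current = {
  "_id": "swib2012",
  "_rev": "2-e38f040510caae7aaaf3ec45e7816f63",
  "type": "event",
  "name": "SWIB 2012",
  "location": {"de": "Köln", "en": "Cologne"},
  "_attachments": {"kitty.png": {"content_type": "image/png", "revpos": 2,
    "digest": "md5-1TF4Rlk5CuCasqC1TfvHCg==", "length": 231873, "stub": true}}
};

// "hashtag" is an invented change for this sketch.
const updated = nextVersion(current, { hashtag: "swib12" });
console.log(JSON.stringify(updated, null, 2));
```

The resulting object is what would be sent as the body of the next PUT to the document's URL.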
      • play around a little / everybody comfortable? / can always do this in Futon
      • Example datasets
        • bothmer.json
        • couch-marc.json
        • ct_sample.json
        • gnd-smith.json
  • Part II: (14:00-15:15)
    • design documents
      • stored as documents _design/NAME (use the NAME swib), e.g. _design/swib
    • views
      • stored in the design document’s views object
      • called with subpath _view/NAME (use the NAME type) of design document, e.g. /swib-demo/_design/swib/_view/type
      • Futon has a simple interface for creating and testing them
    • map
      • map function creates an index entry for the document
      • e.g. extract a field
      • defined by a JavaScript function: function (doc) {}
      • use the emit() command to add something to the index
      • use map function
        • function(doc) { emit(doc.type); }
      • can also pass a second parameter to emit
        • it will be available as the document’s value for the key
        • available for reduce function, also available when calling views
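What a map function does to the index can be tried out without CouchDB. The sketch below is a simplified stand-in for the view indexer: the `emit` defined here only imitates the one CouchDB provides, the `person-1` document is hypothetical, and plain string comparison replaces CouchDB's collation rules.

```javascript
// Collect whatever the map function emits, remembering the source document.
const index = [];
let currentId = null;

function emit(key, value) {
  index.push({ id: currentId, key: key, value: value === undefined ? null : value });
}

// Map function from the notes, using the optional second emit() parameter.
const map = function (doc) { emit(doc.type, doc.name); };

const docs = [
  { _id: "person-1", type: "person", name: "Example Person" }, // hypothetical
  { _id: "swib2013", type: "event", name: "SWIB 2013" },
  { _id: "swib2012", type: "event", name: "SWIB 2012" }
];

for (const doc of docs) {
  currentId = doc._id;
  map(doc);
}

// Index entries are ordered by key, then by document id.
index.sort((a, b) =>
  a.key < b.key ? -1 : a.key > b.key ? 1 : a.id < b.id ? -1 : a.id > b.id ? 1 : 0);

console.log(index);
```

Each entry carries the document id, the emitted key and the emitted value, which is the shape of the rows a view query returns.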
    • Query views: Map (examples use the documents from bothmer.json)
    • reduce
      • reduce function computes a reduced value for all documents with the same mapped key
      • e.g. function(keys, values) { return sum(values); }
      • already built in:
        • _count
        • _sum
        • _stats
      • demo in Futon: use reduce to find out how many documents of each type are in the DB
        • set reduce function of _design/swib/_view/type to: _count
    • Query views: Reduce
      • group
      • non-obvious but very powerful:
        • we can map to an array and then pick the number of components that should be equal
        • function(doc) { var century; if (doc.von) { century = doc.von.substr(0,2); } emit([doc.type, century]); }
        • curl -D - -X GET 'http://localhost:5984/swib-demo/_design/swib/_view/type?group=true'
        • 200 OK
        • {"rows":[ {"key":["allianzpartner","11"],"value":2}, {"key":["allianzpartner","12"],"value":1}, {"key":["allianzpartner","13"],"value":13}, {"key":["allianzpartner","14"],"value":30}, {"key":["allianzpartner","15"],"value":82}, {"key":["allianzpartner","16"],"value":93}, {"key":["allianzpartner","17"],"value":108}, {"key":["allianzpartner","18"],"value":171}, {"key":["allianzpartner","19"],"value":3}, {"key":["bothmerbothmer","15"],"value":23}, {"key":["namensträger","11"],"value":4}, {"key":["namensträger","12"],"value":10}, {"key":["namensträger","13"],"value":39}, {"key":["namensträger","14"],"value":52}, {"key":["namensträger","15"],"value":116}, {"key":["namensträger","16"],"value":177}, {"key":["namensträger","17"],"value":246}, {"key":["namensträger","18"],"value":245}, {"key":["wappen","18"],"value":659} ]}
        • can get the same result as before using group_level=1
        • curl -D - -X GET 'http://localhost:5984/swib-demo/_design/swib/_view/type?group=true&group_level=1'
        • 200 OK
        • {"rows":[ {"key":["allianzpartner"],"value":503}, {"key":["bothmerbothmer"],"value":23}, {"key":["namensträger"],"value":889}, {"key":["wappen"],"value":659} ]}
        • example:
          • for dates you could write the timestamp
          • if you need grouping by day / month / year it may be helpful to map [2013, 5, 28]
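The effect of group_level on array keys can be sketched in plain JavaScript. `countByGroupLevel` is a hypothetical stand-in for a view with the _count reduce, and the date keys are invented; it assumes the keys arrive already sorted, as they would from a view.

```javascript
// Count entries whose keys agree in their first `level` components.
function countByGroupLevel(keys, level) {
  const counts = new Map();
  for (const key of keys) {
    const prefix = JSON.stringify(key.slice(0, level));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return Array.from(counts, ([prefix, value]) => ({ key: JSON.parse(prefix), value: value }));
}

// Hypothetical keys as a map function emitting [year, month, day] would produce.
const emittedKeys = [[2012, 11, 26], [2013, 5, 28], [2013, 11, 25], [2013, 11, 26]];

const byYear = countByGroupLevel(emittedKeys, 1);
const byMonth = countByGroupLevel(emittedKeys, 2);
console.log(JSON.stringify(byYear));
console.log(JSON.stringify(byMonth));
```

Grouping by year, month or day then only differs in the level passed, which is why mapping dates to an array is convenient.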
    • Map/Reduce Summary
      • quite simple
      • very powerful
      • requires different thinking than SQL, possibly simpler if you are used to things like Solr
      • no JOIN
    • Intermission: couchapp
      • http://couchapp.org/page/index
      • using Futon for editing any of the following bits of design documents can be a bit of a pain thanks to escaping issues
      • the command line tool couchapp can help with that
        • you can clone the design document into a file/folder structure on your hard drive
        • use your favourite editor
        • then push the design document back into CouchDB
        • a bit like git, really
        • try to install it
        • mkdir myCouchDesign; cd myCouchDesign
        • couchapp init
        • couchapp clone http://localhost:5984/swib-demo
        • also check out the newer couchapp alternatives listed on the website to see which one suits you the best
    • show interface
      • a way to implement different output formats
      • in particular for different MIME types
      • function (doc, req) {}
      • return object with
        • headers:
          • Content-Type:
        • body:
          • Text
        • json:
          • JavaScript object
        • base64:
          • binary data
      • example:
      • includes automatic MIME Type Handling
        • register MIME Types with internal name using registerType(INTERNALNAME, MIMETYPES)

        • common MIME Types text/json/xml/html/atom/… are predefined

        • provide output for them using provides(INTERNALNAME, function() { return {} });

        • function(doc, req) {
            // JSON: return the document itself
            provides('json', function() { return {'json': doc}; });

            // HTML: render the document as nested definition lists
            provides('html', function() {
              var listItem = function (key, value) {
                var result = '';
                if (key) { result += '<dt>' + key + '</dt><dd>'; }
                if (typeof value === 'object') {
                  result += '<dl>';
                  for (var i in value) {
                    if (i[0] !== '_') { result += listItem(i, value[i]); }
                  }
                  result += '</dl>';
                } else {
                  result += value;
                }
                if (key) { result += '</dd>'; }
                return result;
              };

              return '<html><head><title>'.concat(doc._id, '</title>',
                '<style type="text/css">dt { font-weight:bold; }</style></head>',
                '<body><h1>Document »' + doc._id + '«</h1>',
                listItem(undefined, doc),
                '</body></html>');
            });

            // custom MIME type registered under an internal name
            registerType('text-json', 'text/json');
            provides('text-json', function() { return toJSON(doc); });
          }

        • curl -D - -H "Accept: text/html" -X GET 'http://localhost:5984/swib-demo/_design/swib/_show/content-types/swib2013'

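        • 200 OK Vary: Accept Content-Type: text/html; charset=utf-8

        • <html><head><title>swib2013</title><style type="text/css">dt { font-weight:bold; }</style></head><body><h1>Document »swib2013«</h1><dl><dt>type</dt><dd>event</dd><dt>name</dt><dd>ELAG 2013</dd><dt>location</dt><dd><dl><dt>nl</dt><dd>Gent</dd><dt>en</dt><dd>Ghent</dd></dl></dd><dt>url</dt><dd>http://swib2013.org/</dd><dt>hashtag</dt><dd>swib2013</dd></dl></body></html>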
    • Idea: Can we use this for a LOD server?
      • configure the redirection headers as needed
      • create some triples for a MARC record?
        • e.g. the ones in couch-marc
    • list interface
      • a list is to a view what a show is to a document
      • can also use provides() / registerType()
      • useful for a quick CSV export
      • can send output in bits
        • start({'headers': {}})
        • send() repeatable
      • Applications:
        • simple CSV export
        • dot file output for GraphViz
        • KML output
        • TeX output?
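A CSV export with a list function might look like the sketch below. `start`, `send` and `getRow` are provided by CouchDB inside list functions; here they are replaced by small stubs so the function can be run on its own, and the two view rows are sample data. Values are not escaped, so this only works for keys and values without commas or quotes.

```javascript
// Stubs standing in for what CouchDB provides to list functions.
const viewRows = [
  { key: "allianzpartner", value: 503 }, // sample rows for this sketch
  { key: "wappen", value: 659 }
];
const queue = viewRows.slice();
const response = { headers: null, body: "" };
function start(init) { response.headers = init.headers; }
function send(chunk) { response.body += chunk; }
function getRow() { return queue.length ? queue.shift() : null; }

// The list function itself: set the headers, then send the output in bits.
const csvList = function (head, req) {
  start({ headers: { "Content-Type": "text/csv" } });
  send("key,value\n");
  var row;
  while ((row = getRow())) {
    send(row.key + "," + row.value + "\n");
  }
};

csvList({}, {});
console.log(response.headers);
console.log(response.body);
```

The other output formats listed above follow the same pattern, with only the header and the strings passed to send() changing.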
    • validation
      • validate_doc_update
      • each design document can have one
        • all of them are applied
        • modularity
      • e.g.: enforce that each document has a type field:
      • function(newDoc, oldDoc, userCtx, secObj) { if (!newDoc._deleted) { if (!newDoc.type) { throw({"forbidden": "documents need to have a type"}); } } }
      • return when accepting document
      • throw({'forbidden': 'Explanation'}) when not accepting
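The validation function above can be exercised outside CouchDB by calling it directly. In this sketch `check` is a hypothetical test helper, and the empty objects passed for userCtx and secObj are placeholders for what CouchDB would supply.

```javascript
// Validation function from the notes: every non-deleted document needs a type.
const validate = function (newDoc, oldDoc, userCtx, secObj) {
  if (!newDoc._deleted) {
    if (!newDoc.type) {
      throw({ forbidden: "documents need to have a type" });
    }
  }
};

// Hypothetical helper: report whether a document would be accepted.
function check(newDoc) {
  try {
    validate(newDoc, null, {}, {});
    return "accepted";
  } catch (e) {
    return "forbidden: " + e.forbidden;
  }
}

console.log(check({ _id: "swib2013", type: "event" }));
console.log(check({ _id: "no-type" }));
console.log(check({ _id: "gone", _deleted: true }));
```

Deletions are let through without a type because a deleted document is stored with little more than _id, _rev and _deleted.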
    • use cases
      • in web pages:
        • DS
        • edfu Analyse
      • other formats
        • bothmer / GraphViz
        • crcg with python / KML / GraphViz
  • Part III: (16:00-19:00)
    • own little project with CouchDB
      • we have three data sets available
      • did anybody bring their own? / wants to create some?
      • Ideas?
      • LOD?
    • Replication
      • CouchDB includes a replicator
      • »eventual consistency«
      • conceptually a good match for today’s frequently disconnected smartphones
        • clone data to the phone, to be fully accessible at any time
        • re-sync when network is available
      • replication setup stored in _replicator database
        • slightly different in older CouchDB versions
        • {"_id": "my_rep", "source": "http://myserver.com:5984/foo", "target": "bar", "create_target": true }
      • show in Futon
      • can replicate continuously
      • replication can be filtered
      • filter functions at filters/NAME
        • return true/false to indicate whether the document should be replicated
        • function(doc, req) { if(doc._deleted) { return true; } if (doc.von) { if (parseInt(doc.von) > 1900) { return false; } } return true; }
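To see which documents this filter lets through, it can be applied to a few sample documents. The documents below are hypothetical; only the `von` field name is taken from the bothmer data used earlier.

```javascript
// Filter function from the notes: skip documents dated after 1900,
// but always pass deletions so they reach the target as well.
const filter = function (doc, req) {
  if (doc._deleted) { return true; }
  if (doc.von) {
    if (parseInt(doc.von, 10) > 1900) { return false; }
  }
  return true;
};

// Hypothetical sample documents.
const candidates = [
  { _id: "a", von: "1850" },
  { _id: "b", von: "1912" },
  { _id: "c" },
  { _id: "d", von: "1950", _deleted: true }
];

const replicated = candidates.filter((doc) => filter(doc, {})).map((doc) => doc._id);
console.log(replicated);
```

Documents without a `von` field pass the filter too, which may or may not be wanted for a given replication.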
    • Changes feed
      • (replication is based on it)
      • continuous query also possible
    • Bonus: CouchDB River for ElasticSearch
    • Bonus: CouchDB Lucene
      • install / config
        • brew install couchdb-lucene
        • launch it
        • configure CouchDB
          • external fti /usr/bin/python /usr/local/Cellar/couchdb-lucene/0.9.0/tools/couchdb-external-hook.p
          • httpd_db_handlers _fti {couch_httpd_external, handle_external_req, <<"fti">>}
          • httpd_global_handlers _fti {couch_httpd_proxy, handle_proxy_req, <<"http://127.0.0.1:5985">>}
      • add to design document at path fulltext/INDEXNAME/index
        • function(doc) {
            var result = new Document();
            if (!doc._id.match(/^(_design)/)) {
              function index(obj, keyPrefix) {
                for (var key in obj) {
                  var value = obj[key];
                  switch (typeof(value)) {
                    case 'object':
                      index(value, (keyPrefix ? keyPrefix + "." : "") + key);
                      break;
                    case 'function':
                      break;
                    default:
                      result.add(value);
                      var fieldName;
                      if (obj.constructor === Array) {
                        fieldName = keyPrefix;
                      } else {
                        fieldName = (keyPrefix ? keyPrefix + "." : "") + key;
                      }
                      result.add(value, {"field": fieldName, "store": "yes"});
                      break;
                  }
                }
              };

              index(doc, "");

              if (doc._attachments) {
                for (var i in doc._attachments) {
                  result.attachment("default", i);
                }
              }
            }
            return result;
          }

      • query at DB/_fti/_design/DESIGN/fulltext
    • Bonus TouchDB:
      • CouchDB compatible library for iOS, Android, Mac, …
      • includes replication
    • Bonus: hoodie

Thanks for your attention.
