@rnewson
Created April 14, 2012 21:31
CouchDB Future Feature List - Round 2
A list of features that we want to see in CouchDB. Needs to be voted on so that it can become a priority queue.
User Facing Features
====================
1. Conflicts are the rule, not the exception
All previous versions of CouchDB hide conflicts by default (selecting
an arbitrary but consistent winning revision). Expert users can find
and resolve conflicts.
Instead, expose the true picture by default. This includes:
* Reading a document with conflicts returns all conflicting versions,
not just the winner. This might manifest as the ?conflicts=true
response or could be a 300 (Multiple Choices) response.
* Always accept a write (as long as it passes all validate_doc_update
functions). This means that no response will give a 409 (Conflict) by
default. You can still insist on a matching revision by using the
If-Match header.
* _rev is frequently assumed to be a user-facing revision/versioning
system; our efforts to convince people otherwise have failed. Embrace
this too and rename the field to _mvcc.
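A minimal sketch of the semantics proposed above, using a hypothetical in-memory Store rather than any real CouchDB API. The strict flag stands in for the If-Match header, and the revision strings are invented for illustration.

```python
class ConflictError(Exception):
    """Raised only when the caller opted in to a strict revision match."""


class Store:
    """Hypothetical in-memory model; the names are illustrative, not CouchDB's."""

    def __init__(self):
        self._leaves = {}   # doc id -> {rev: body} for every live leaf revision
        self._counter = 0

    def put(self, doc_id, body, parent_rev=None, strict=False):
        leaves = self._leaves.setdefault(doc_id, {})
        if parent_rev in leaves:
            del leaves[parent_rev]        # ordinary update: replace the parent
        elif strict:
            raise ConflictError(doc_id)   # stands in for If-Match failing (409)
        # Otherwise the write is still accepted and becomes a sibling leaf.
        self._counter += 1
        rev = f"{self._counter}-example"
        leaves[rev] = body
        return rev

    def get(self, doc_id):
        """Return every leaf revision, not just a chosen winner."""
        return dict(self._leaves.get(doc_id, {}))


store = Store()
r1 = store.put("doc", {"v": 1})
r2 = store.put("doc", {"v": 2}, parent_rev=r1)
r3 = store.put("doc", {"v": 3}, parent_rev=r1)   # stale parent, still accepted
```

After these three writes, a read returns both r2 and r3, and only a strict write against the stale r1 is rejected.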
2. Replace Futon
A modern interface that has first class support for all features
(proper editing for validate_doc_update, show and list functions,
etc).
3. Improve the user and security model
* Support distributed identity systems such as OpenID
* Allow for easier external authentication
* Finer grained authorization (instead of the binary _admin or not)
* Instead of exposing /_users as a database, design an API to cover
all expected operations.
Fine-grained authorization would make it possible to grant read and write
access independently, among other things. Specifically, it should be possible
to grant the ability to write but not read.
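As a rough illustration of independent read and write grants, the sketch below models a per-database access list. The DbAcl class and the user names are hypothetical and do not reflect any existing CouchDB security object.

```python
from dataclasses import dataclass, field


@dataclass
class DbAcl:
    """Hypothetical per-database access list with separate grants."""
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)

    def can_read(self, user):
        return user in self.readers

    def can_write(self, user):
        return user in self.writers


# A "drop box" database: field agents may submit documents but not read them.
inbox = DbAcl(readers={"auditor"}, writers={"auditor", "field-agent"})
```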
4. Remove reserved metadata from documents
CouchDB treats some fields specially (_id, _rev, _attachments),
requiring a transformation process when reading and writing documents.
Removing the fields would allow higher performance and alternative
data types. A question remains as to where they would go, as not all
map to a standard HTTP header (_rev maps neatly to ETag,
though). Custom HTTP headers are an obvious solution; are there others?
5. Built-in map functions
To complement the built-in reduce functions (_sum, _count, _stats) for
common use cases.
6. DSL for index creation, validation functions, etc.
The DSL would be very simple, deliberately not capable of expressing
all possible algorithms, but can always be efficiently evaluated
within the native VM. This will be faster in general but also avoids
managing a pool of couchjs processes.
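To make the idea concrete, here is a sketch of what a deliberately limited index DSL might look like. The spec format (when, key, value) is entirely hypothetical; it only shows that a declarative spec can be evaluated without running user-supplied code.

```python
def run_index_spec(spec, doc):
    """Return the (key, value) rows that one document contributes to an index.

    The spec is plain data: an equality filter, a list of key fields and an
    optional value field. It cannot express loops or arbitrary logic.
    """
    for name, expected in spec.get("when", {}).items():
        if doc.get(name) != expected:
            return []                       # filter did not match
    key_fields = spec["key"]
    if any(name not in doc for name in key_fields):
        return []                           # skip documents missing a key field
    key = [doc[name] for name in key_fields]
    value = doc.get(spec["value"]) if "value" in spec else None
    return [(key, value)]


spec = {"when": {"type": "invoice"}, "key": ["customer", "date"], "value": "total"}
invoice = {"type": "invoice", "customer": "acme", "date": "2012-04-14", "total": 120}
rows = run_index_spec(spec, invoice)
```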
7. Support CORS
8. Support WebSockets
9. Support EventSource
10. Support SPDY
11. Support richer reduce functions
Support any reduce function whose output grows no more than
logarithmically with the input, even if that output is substantially
larger than the current 200 byte threshold.
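One example of such a function is a top-N reduce: its output is bounded by a constant, yet can easily exceed a small size limit. The sketch below mirrors the (keys, values, rereduce) shape of CouchDB reduce functions; the bound K is an assumed value.

```python
import heapq

K = 10  # assumed bound on the size of the reduce output


def top_k_reduce(keys, values, rereduce):
    """Keep the K largest values seen so far.

    On rereduce, `values` is a list of earlier outputs (lists), so they are
    flattened before selecting the largest K again.
    """
    if rereduce:
        values = [item for partial in values for item in partial]
    return heapq.nlargest(K, values)


whole = top_k_reduce(None, list(range(100)), False)
partials = [
    top_k_reduce(None, list(range(0, 50)), False),
    top_k_reduce(None, list(range(50, 100)), False),
]
merged = top_k_reduce(None, partials, True)
```

Reducing in one pass and reducing partial results give the same answer, which is the property a reduce function needs to be usable in a view.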
12. Richer querying model
While CouchDB views are powerful, they are not as capable as
relational queries. This is largely deliberate as the fully relational
model is hard to scale.
One method to improve things (and provide the ability to sort by
value) is to introduce chained MapReduce (currently only available on Cloudant).
This item also includes any other enhancement to the kinds of querying that
CouchDB can perform.
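The sort-by-value case can be sketched as two stages, where the output rows of the first view feed the map step of the second. This is a plain in-memory illustration with hypothetical function names, not Cloudant's or CouchDB's API.

```python
from collections import Counter


def stage_one(docs):
    """First view: count documents per tag; rows come back sorted by key."""
    counts = Counter()
    for doc in docs:
        for tag in doc.get("tags", []):
            counts[tag] += 1
    return sorted(counts.items())


def stage_two(rows):
    """Second view: emit the count as the key, so rows sort by the old value."""
    return sorted((count, tag) for tag, count in rows)


docs = [{"tags": ["a", "b"]}, {"tags": ["b"]}, {"tags": ["b", "c"]}, {"tags": ["c"]}]
by_tag = stage_one(docs)
by_count = stage_two(by_tag)
```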
13. Partial updates of documents
It should be possible to change just a subset of a document's
properties without needing to write an update function.
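One possible shape for partial updates is a merge-style patch, where the client sends only the changed properties and null removes a property. The sketch follows the rules of JSON Merge Patch (RFC 7396); choosing that format is an assumption, not something this list specifies.

```python
def merge_patch(target, patch):
    """Apply a merge-style patch and return the new value (inputs are not mutated)."""
    if not isinstance(patch, dict):
        return patch                        # scalars and arrays replace wholesale
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)           # null means "remove this property"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


doc = {"_id": "a", "title": "old", "meta": {"tags": ["x"], "draft": True}}
patched = merge_patch(doc, {"title": "new", "meta": {"draft": None}})
```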
14. Partial reads of documents
It should be possible to read just a subset of a document's properties
without needing to write a show function.
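A partial read could be expressed as a list of property paths to return. The dotted-path syntax and the project helper below are hypothetical; they only illustrate the projection itself.

```python
_MISSING = object()


def _lookup(doc, parts):
    """Walk nested dicts along `parts`; return _MISSING if any step is absent."""
    current = doc
    for part in parts:
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def project(doc, paths):
    """Return a copy of `doc` containing only the requested dotted paths."""
    out = {}
    for path in paths:
        parts = path.split(".")
        value = _lookup(doc, parts)
        if value is _MISSING:
            continue                        # silently skip absent properties
        dest = out
        for part in parts[:-1]:
            dest = dest.setdefault(part, {})
        dest[parts[-1]] = value
    return out


person = {"name": "Ann", "address": {"city": "Oslo", "zip": "0150"}, "bio": "..."}
subset = project(person, ["name", "address.city", "phone.home"])
```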
15. Create an exclusive namespace for databases
Databases have a fixed position in the URL (/<dbname>/) but share it
with many other items (/_log, /_replicate, etc). Fix this by
introducing an exclusive namespace for dbs (e.g., /_db/<dbname>).
16. Improve replication interoperability between implementations and
versions
* Introduce a tiered replication model, starting with a very simple
'dump' and 'load' tier, all the way up to a highly optimized, but
complex, protocol that reduces redundant data transmission.
* This work would also form the basis of an export/import feature for
incremental and complete backups.
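The simplest tier could be as small as the sketch below: dump every document as one JSON line and load the lines back. The line-per-document format and the function names are assumptions for illustration; a real tier would also need to carry revision history and attachments.

```python
import json


def dump(db):
    """Serialise every document as one JSON line, ordered by document id."""
    return "\n".join(json.dumps(db[doc_id], sort_keys=True) for doc_id in sorted(db))


def load(text):
    """Rebuild an {id: document} mapping from the output of dump()."""
    db = {}
    for line in text.splitlines():
        if line.strip():
            doc = json.loads(line)
            db[doc["_id"]] = doc
    return db


source = {
    "a": {"_id": "a", "_rev": "1-x", "n": 1},
    "b": {"_id": "b", "_rev": "2-y", "tags": ["t"]},
}
copy = load(dump(source))
```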
17. Enhance background task management
Currently, replication tasks can be cancelled, but only awkwardly.
Compaction tasks and view building tasks cannot be cancelled, short of
restarting the server or using remote shell access to the Erlang VM.
Provide a consistent and simple API for cancelling any running
task. This API will also provide status/progress information where
appropriate and pause/resume where possible too.
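A sketch of what a uniform task API might offer, independent of task type. TaskRegistry and its method names are hypothetical; the point is that cancel, pause, resume and status behave the same way for replication, compaction and view builds.

```python
class TaskRegistry:
    """Hypothetical registry giving every background task the same controls."""

    def __init__(self):
        self._tasks = {}
        self._next_id = 1

    def start(self, kind, total):
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = {"kind": kind, "state": "running", "done": 0, "total": total}
        return task_id

    def advance(self, task_id, amount):
        task = self._tasks[task_id]
        if task["state"] == "running":      # paused tasks make no progress
            task["done"] = min(task["total"], task["done"] + amount)

    def _transition(self, task_id, expected, new_state):
        task = self._tasks[task_id]
        if task["state"] != expected:
            return False
        task["state"] = new_state
        return True

    def pause(self, task_id):
        return self._transition(task_id, "running", "paused")

    def resume(self, task_id):
        return self._transition(task_id, "paused", "running")

    def cancel(self, task_id):
        """Remove the task; returns False if it was not running or paused."""
        return self._tasks.pop(task_id, None) is not None

    def status(self, task_id):
        return dict(self._tasks[task_id])


registry = TaskRegistry()
compaction = registry.start("compaction", total=100)
registry.advance(compaction, 25)
registry.pause(compaction)
registry.advance(compaction, 50)            # ignored while paused
```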
18. Documentation
Documentation is scattered, dated, and incomplete. Couchbase have
donated their improved docs. We will incorporate these into the new
house style, fill in any gaps, and commit to updating documentation in
line with new releases.
19. Global changes feed
One or both of:
* A feed of server events like db creation, update and deletion.
* A federated changes feed for a selected set of databases and changes feeds.
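The federated variant could be sketched as a merge of per-database feeds, with a checkpoint that records the last sequence seen in each database. The row shape and the federated_changes function are hypothetical.

```python
def federated_changes(feeds, since=None):
    """Merge per-database change lists into one feed.

    feeds: {db_name: [(seq, doc_id), ...]} with each list ordered by seq.
    since: {db_name: last_seq_seen}; databases not listed start from 0.
    Returns the new rows and the updated checkpoint.
    """
    checkpoint = dict(since or {})
    rows = []
    for db_name in sorted(feeds):
        for seq, doc_id in feeds[db_name]:
            if seq > checkpoint.get(db_name, 0):
                checkpoint[db_name] = seq
                rows.append({"db": db_name, "seq": seq, "id": doc_id})
    return rows, checkpoint


feeds = {"orders": [(1, "o1"), (2, "o2")], "users": [(1, "u1")]}
first_rows, checkpoint = federated_changes(feeds)
feeds["users"].append((2, "u2"))
next_rows, checkpoint2 = federated_changes(feeds, checkpoint)
```

Because each database keeps its own sequence, the checkpoint here is a mapping rather than a single number.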
20. Allow database renaming
Self-explanatory, but the feature is complicated by the BigCouch
merger, as the rename of a sharded database is not atomic without effort.
21. Database "aliases" (symlinks)
Visiting a database symlink will seamlessly redirect to the target database.
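Alias resolution might look like the sketch below, which follows aliases until it reaches a real database name and refuses to follow loops. The aliases mapping and resolve_alias are hypothetical names.

```python
def resolve_alias(name, aliases):
    """Follow alias -> target links until a non-alias name is reached."""
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError(f"alias loop at {name!r}")
        seen.add(name)
        name = aliases[name]
    return name


aliases = {"current": "orders_2012", "orders": "current"}
target = resolve_alias("orders", aliases)
```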
22. _changes feed for views
It should be possible to subscribe to view changes the same way we can
subscribe to database changes. This will enable many useful things,
chained map-reduce being a notable one.
23. Per-db overrides of server-wide settings
Allow db-specific overrides for otherwise server-wide configuration
settings, where sensible.
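A layered lookup is one way to picture this: database overrides are consulted first, then the server defaults, and only settings on an allowlist may be overridden. The setting names and the allowlist below are assumptions for illustration.

```python
from collections import ChainMap

# Hypothetical allowlist of settings where a per-db override is sensible.
OVERRIDABLE = frozenset({"revs_limit", "compression"})


def effective_config(server_defaults, db_overrides):
    """Layer permitted per-db overrides on top of the server-wide settings."""
    allowed = {k: v for k, v in db_overrides.items() if k in OVERRIDABLE}
    return ChainMap(allowed, server_defaults)


server = {"revs_limit": 1000, "compression": "snappy", "port": 5984}
config = effective_config(server, {"revs_limit": 10, "port": 1234})
```

Here the revs_limit override takes effect, while the attempt to override port is ignored because it is not on the allowlist.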
Developer Facing Features
=========================
1. OTP compliance/refactoring
2. Different HTTP engine (mochiweb -> cowboy/yaws/etc)
3. Have hard dependencies on SpiderMonkey versions. Also simplifies the build.
4. Test suites for different versions of replication, file formats, etc.
5. Move attachments out of database files (which removes make_blocks)
6. Plugin/addon/module interface
7. View server protocol enhancements/refactoring
8. Make .ini config files optional: (1) move defaults into the code, (2) instead of local/default, ship a fully complete config with all of its lines commented out
9. Database corruption detection and repair
While CouchDB's append-only model is very safe, underlying issues with
filesystems and hardware can still corrupt databases. CouchDB can:
* add checksums on everything (btree nodes, documents, etc.)
* ship a tool to verify all checksums
* include a repair tool (that extracts everything extractable)
* perhaps include ECC information to allow recovery from corruption
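The checksum and verification steps above can be sketched with a simple length-plus-CRC32 frame around each block. The frame layout is an assumption and is not CouchDB's file format; the scan also trusts the length field, which a real repair tool could not do.

```python
import struct
import zlib

HEADER = struct.Struct(">II")   # assumed layout: payload length, CRC32 of payload


def frame(payload):
    """Prefix a block with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def scan(data):
    """Return (intact payloads, offsets of frames that failed verification)."""
    good, bad = [], []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = bytes(data[start:start + length])
        if len(payload) == length and zlib.crc32(payload) == crc:
            good.append(payload)
        else:
            bad.append(offset)
        offset = start + length
    return good, bad


blob = bytearray(frame(b"doc-1") + frame(b"doc-2"))
blob[-1] ^= 0xFF                # simulate corruption in the second block
good, bad = scan(blob)
```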
@mcoolin

mcoolin commented Apr 15, 2012

I'd add a definitions section for the following:
CORS - Cross-origin resource sharing (CORS) is a web browser technology specification, which defines ways for a web server to allow its resources to be accessed by a web page from a different domain. http://en.wikipedia.org/wiki/Cross-origin_resource_sharing

EventSource - Server-sent events is a technology for providing push notifications from a server to a browser client in the form of DOM events. The Server-Sent Events EventSource API is now being standardized as part of HTML5 by the W3C. http://en.wikipedia.org/wiki/Server-sent_events

SPDY - SPDY (pronounced speedy) is an experimental networking protocol developed primarily at Google for transporting web content. Although not currently a standard protocol, the group developing SPDY has stated publicly that it is working toward standardization (available now as an Internet Draft), and has reference implementations available in both Google Chrome and Mozilla Firefox. SPDY is similar to HTTP, with particular goals to reduce web page load latency and improve web security. SPDY achieves reduced latency through compression, multiplexing, and prioritization. The name is not an acronym, but is a shortened version of the word "speedy". SPDY is a trademark of Google. http://en.wikipedia.org/wiki/SPDY

I would include the two lists in any upcoming vote, perhaps as separate votes.

A few other comments:
On number 4 in the first list: this seems to be only a partial description. Remove data from the record and do what with it? How would it be accessed and used? Moving _id especially is likely to have a major impact on any project using CouchDB.

On number 3 in the second list: hard dependencies on SpiderMonkey. Should this not be discussed more? With performance being a big issue, V8 may be a better choice, or a set of native Erlang routines, or some other derivative.

@rnewson
Author

rnewson commented Apr 15, 2012 via email

@marcenuc

4. Remove reserved metadata from documents

Removing the fields would allow higher performance and alternative
data types. A question remains as to where they would go, as not all
map to a standard HTTP header (_rev maps neatly to ETag,
though). Custom HTTP headers is an obvious solution, are there others?

I think it would be very handy to have an object with the metadata and the data. For example the following document:

{ _id: "ID", _rev: "1-abc", foo: "bar", ... }

would become:

{ _id: "ID", _rev: "1-abc", _data: { foo: "bar", ... } }

or, better:

{ id: "ID", rev: "1-abc", data: { foo: "bar", ... } }

This would allow for arrays, or even primitive types, as data:

{ id: "ID", rev: "1-abc", data: [ "bar", ... ] }

and it would not require parsing of HTTP headers. This is also very similar to the response of a view, where the returned object has metadata clearly separated from the document, when include_docs=true.
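A minimal sketch of the envelope idea, converting between today's flat documents and the proposed { id, rev, data } form. The helper names are hypothetical, and only _id and _rev are handled; _attachments and other reserved fields are left out.

```python
META_FIELDS = ("_id", "_rev")


def to_envelope(doc):
    """Wrap a flat document: metadata at the top level, user data under 'data'."""
    data = {k: v for k, v in doc.items() if k not in META_FIELDS}
    return {"id": doc.get("_id"), "rev": doc.get("_rev"), "data": data}


def from_envelope(envelope):
    """Inverse of to_envelope; only possible when the data is an object."""
    data = envelope["data"]
    if not isinstance(data, dict):
        raise TypeError("non-object data has no flat-document equivalent")
    return {"_id": envelope["id"], "_rev": envelope["rev"], **data}


flat = {"_id": "ID", "_rev": "1-abc", "foo": "bar"}
wrapped = to_envelope(flat)
array_doc = {"id": "ID", "rev": "1-abc", "data": ["bar"]}
```

The array-valued envelope has no flat equivalent, which is exactly the case the envelope form makes possible.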

@rnewson
Author

rnewson commented Apr 15, 2012 via email

@Jimflip

Jimflip commented Aug 25, 2013

TTL - auto-expiring records that are completely removed after expiration, similar to Couchbase.
