CouchDB Future Feature List - Round 2

A list of features that we want to see in CouchDB. Needs to be voted on so that it can become a priority queue.
 
User Facing Features
====================
 
1. Conflicts are the rule, not the exception
 
All previous versions of CouchDB hide conflicts by default (selecting
an arbitrary but consistent winning revision). Expert users can find
and resolve conflicts.
 
Instead, expose the true picture by default. This includes:
 
* Reading a document with conflicts returns all conflicting versions,
not just the winner. This might manifest as the ?conflicts=true
response or could be a 300 (Multiple Choices) response.
 
* Always accept a write (as long as it passes all validate_doc_update
functions). This means that no response will give a 409 (Conflict) by
default. You can still insist on a matching revision by using the
If-Match header.
 
* _rev is frequently assumed to be a user-facing revision/versioning
system, and our efforts to convince users otherwise have failed. Accept
this and rename the field to _mvcc.
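 
As a rough illustration of the behaviour described in item 1, here is a
toy in-memory model in Python. It is a sketch under stated assumptions,
not CouchDB code: the ToyStore class, the "N-toy" revision strings and
the if_match flag (standing in for the If-Match header) are all invented
for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


class RevisionMismatch(Exception):
    """Raised only when the caller opted in to a revision check."""


@dataclass
class ToyStore:
    # doc_id -> {rev: body}; every entry is a live (leaf) revision.
    leaves: dict = field(default_factory=dict)
    counter: int = 0

    def write(self, doc_id: str, body: dict,
              parent: Optional[str] = None, if_match: bool = False) -> str:
        revs = self.leaves.setdefault(doc_id, {})
        parent_is_live = parent in revs
        # Opt-in strictness, analogous to sending an If-Match header.
        if if_match and revs and not parent_is_live:
            raise RevisionMismatch(doc_id)
        # A write that names a live parent replaces it. Any other write
        # is still accepted and simply becomes an additional leaf.
        if parent_is_live:
            del revs[parent]
        self.counter += 1
        rev = "%d-toy" % self.counter  # hypothetical revision format
        revs[rev] = body
        return rev

    def read(self, doc_id: str) -> dict:
        # Return every conflicting version, not just one winner.
        return dict(self.leaves.get(doc_id, {}))


store = ToyStore()
first = store.write("a", {"n": 1})
second = store.write("a", {"n": 2}, parent=first)  # normal update
third = store.write("a", {"n": 3}, parent=first)   # stale parent, accepted
conflicts = store.read("a")                        # two live versions
```

The third write names a parent that is no longer live, so it would be a
409 today. In this model it is accepted and shows up as a second version
on read, unless the caller passes if_match=True.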
 
2. Replace Futon
 
A modern interface that has first class support for all features
(proper editing for validate_doc_update, show and list functions,
etc).
 
3. Improve the user and security model
 
* Support distributed identity systems such as OpenID
* Allow for easier external authentication
* Finer grained authorization (instead of the binary _admin or not)
* Instead of exposing /_users as a database, design an API to cover
all expected operations.
 
Fine-grained authorization would make it possible to grant read and
write access independently, among other things. Specifically, it should
be possible to grant the ability to write but not read.
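 
A minimal sketch of what independent read and write grants could look
like, in Python. The Access flags, the GRANTS table and the user and
database names are assumptions made up for this example; the list above
does not define any of them.

```python
from enum import Flag


class Access(Flag):
    NONE = 0
    READ = 1
    WRITE = 2


# Hypothetical grants: (user, database) -> permitted operations.
GRANTS = {
    ("sensor-7", "telemetry"): Access.WRITE,  # write-only drop box
    ("analyst", "telemetry"): Access.READ,
    ("owner", "telemetry"): Access.READ | Access.WRITE,
}


def allowed(user: str, db: str, wanted: Access) -> bool:
    """True if every operation in `wanted` has been granted."""
    granted = GRANTS.get((user, db), Access.NONE)
    return (granted & wanted) == wanted
```

Here "sensor-7" can write to "telemetry" but cannot read it back, which
is the case the binary _admin model cannot express.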
 
4. Remove reserved metadata from documents
 
CouchDB treats some fields specially (_id, _rev, _attachments),
requiring a transformation process when reading and writing documents.
 
Removing the fields would allow higher performance and alternative
data types. A question remains as to where they would go, as not all
map to a standard HTTP header (_rev maps neatly to ETag,
though). Custom HTTP headers are an obvious solution; are there others?
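 
To make the header idea concrete, here is a small Python sketch that
splits a document into a URL path, headers and a metadata-free body.
Only the _rev to ETag mapping comes from the text above. Placing _id in
the URL and the "X-Couch-" header prefix are assumptions for
illustration, not a proposed wire format.

```python
import json

RESERVED = {"_id", "_rev", "_attachments"}


def split_metadata(db: str, doc: dict):
    """Return (path, headers, body) with reserved fields moved out."""
    body = {k: v for k, v in doc.items() if k not in RESERVED}
    path = "/%s/%s" % (db, doc["_id"])  # assumption: _id lives in the URL
    headers = {}
    if "_rev" in doc:
        headers["ETag"] = '"%s"' % doc["_rev"]
    if "_attachments" in doc:
        # Hypothetical custom header; no such header is defined anywhere.
        headers["X-Couch-Attachments"] = json.dumps(doc["_attachments"])
    return path, headers, body
```

With the reserved fields gone, the body no longer has to be a JSON
object, which is where the "alternative data types" benefit comes from.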
 
5. Built-in map functions
 
To complement the built-in reduce functions (_sum, _count, _stats) for
common use cases.
 
6. DSL for index creation, validation functions, etc.
 
The DSL would be very simple, deliberately not capable of expressing
all possible algorithms, but can always be efficiently evaluated
within the native VM. This will be faster in general but also avoids
managing a pool of couchjs processes.
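 
One way to picture such a DSL is a purely declarative index
specification that the server interprets itself. The sketch below uses
Python as a stand-in for the native VM, and the "where", "key" and
"value" spec keys are invented for this example.

```python
def run_index(spec: dict, docs: list) -> list:
    """Build sorted (key, value) rows from a declarative spec."""
    rows = []
    for doc in docs:
        # "where": every listed field must equal the given constant.
        if any(doc.get(f) != want
               for f, want in spec.get("where", {}).items()):
            continue
        # Documents missing any key field are not indexed.
        if any(f not in doc for f in spec["key"]):
            continue
        key = [doc[f] for f in spec["key"]]
        rows.append((key, doc.get(spec.get("value"))))
    return sorted(rows, key=lambda row: row[0])


spec = {"where": {"type": "order"},
        "key": ["customer", "day"],
        "value": "total"}
docs = [
    {"type": "order", "customer": "bob", "day": "2012-04-15", "total": 5},
    {"type": "order", "customer": "alice", "day": "2012-04-16", "total": 7},
    {"type": "note", "customer": "alice", "day": "2012-04-16"},
    {"type": "order", "customer": "carol"},  # no "day": skipped
]
rows = run_index(spec, docs)
```

The spec has no loops or user code, so evaluating it always terminates
and needs no external couchjs process. The cost is that it cannot
express everything a JavaScript map function can.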
 
7. Support CORS
 
8. Support WebSockets
 
9. Support EventSource
 
10. Support SPDY
 
11. Support richer reduce functions
 
Allow any reduce function whose output grows no more than
logarithmically with the input, even if that output is substantially
larger than the current 200 byte threshold.
 
12. Richer querying model
 
While CouchDB views are powerful, they are not as capable as
relational queries. This is largely deliberate as the fully relational
model is hard to scale.
 
One method to improve things (and provide the ability to sort by
value) is to introduce chained MapReduce (currently only available on
Cloudant).
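 
The sort-by-value case can be sketched in a few lines of Python. This is
a toy in-memory model, not Cloudant's or CouchDB's implementation, and
the map_reduce helper is invented for the example.

```python
from collections import defaultdict


def map_reduce(rows, map_fn, reduce_fn):
    """Run one map/reduce stage and return rows sorted by key."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            groups[key].append(value)
    return [(key, reduce_fn(values))
            for key, values in sorted(groups.items())]


docs = [{"tag": "erlang"}, {"tag": "http"}, {"tag": "erlang"},
        {"tag": "json"}, {"tag": "erlang"}, {"tag": "http"}]

# Stage 1: count documents per tag. Rows come back sorted by tag.
counts = map_reduce(docs, lambda d: [(d["tag"], 1)], sum)

# Stage 2: feed stage 1's rows into a second map that emits the count
# as the key, so the result is sorted by what used to be the value.
by_count = map_reduce(counts, lambda row: [(row[1], row[0])], list)
```

A single view can only sort by key, so sorting by the reduced value
needs the second stage.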
 
This item also includes any other enhancement to the kinds of querying that
CouchDB can perform.
 
13. Partial updates of documents
 
It should be possible to change just a subset of a document's
properties without needing to write an update function.
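 
The item does not say what a partial update should look like. One
possible semantics, sketched here in Python, follows JSON Merge Patch
(RFC 7396): objects merge recursively and a null value deletes a
property. Choosing that format is an assumption of this example.

```python
def merge_patch(target, patch):
    """Apply an RFC 7396 style merge patch and return the new value."""
    if not isinstance(patch, dict):
        return patch  # non-object patches replace the target outright
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this property"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


doc = {"name": "a", "address": {"city": "X", "zip": "1"}, "tmp": 1}
patched = merge_patch(doc, {"address": {"city": "Y"}, "tmp": None})
```

Only "address.city" and "tmp" are touched; everything else in the
document is carried over unchanged.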
 
14. Partial reads of documents
 
It should be possible to read just a subset of a document's properties
without needing to write a show function.
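 
A partial read could be expressed as a list of wanted properties. The
Python sketch below assumes dotted paths such as "address.city" for
nested properties; that syntax is invented for the example.

```python
_MISSING = object()


def lookup(doc, parts):
    """Follow a list of property names; return _MISSING if absent."""
    current = doc
    for part in parts:
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def project(doc: dict, fields: list) -> dict:
    """Return a copy of doc containing only the requested paths."""
    out = {}
    for path in fields:
        parts = path.split(".")
        value = lookup(doc, parts)
        if value is _MISSING:
            continue  # silently skip properties the document lacks
        dest = out
        for part in parts[:-1]:
            dest = dest.setdefault(part, {})
        dest[parts[-1]] = value
    return out


doc = {"name": "n", "address": {"city": "X", "zip": "1"}, "big": "..."}
subset = project(doc, ["name", "address.city", "missing.x"])
```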
 
15. Create an exclusive namespace for databases
 
Databases have a fixed position in the URL (/<dbname>/) but share it
with many other items (/_log, /_replicate, etc). Fix this by
introducing an exclusive namespace for dbs (e.g., /_db/<dbname>).
 
16. Improve replication interoperability between implementations and
versions
 
* Introduce a tiered replication model, starting with a very simple
'dump' and 'load' tier, all the way up to a highly optimized, but
complex, protocol that reduces redundant data transmission.
* This work would also form the basis of an export/import feature for
incremental and complete backups.
 
17. Enhance background task management
 
Currently, replication tasks can be cancelled, but only awkwardly.
Compaction tasks and view building tasks cannot be cancelled at all,
short of restarting the server or using remote shell access to the
Erlang VM.
 
Provide a consistent and simple API for cancelling any running
task. This API will also provide status/progress information where
appropriate and pause/resume where possible too.
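 
As a sketch of what "consistent" could mean here, the following Python
model gives every task the same four operations regardless of its type.
The Task class, its field names and its state names are assumptions for
illustration only.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    kind: str              # e.g. "replication", "compaction", "indexer"
    total: int             # units of work, if known
    done: int = 0
    state: str = "running"
    pausable: bool = True  # "pause/resume where possible"

    def status(self) -> dict:
        progress = self.done * 100 // self.total if self.total else 0
        return {"id": self.task_id, "type": self.kind,
                "state": self.state, "progress": progress}

    def pause(self) -> bool:
        if not self.pausable or self.state != "running":
            return False
        self.state = "paused"
        return True

    def resume(self) -> bool:
        if self.state != "paused":
            return False
        self.state = "running"
        return True

    def cancel(self) -> bool:
        if self.state in ("cancelled", "completed"):
            return False
        self.state = "cancelled"
        return True
```

Each operation returns whether it took effect, so a client can treat a
compaction and a replication identically.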
 
18. Documentation
 
Documentation is scattered, dated, and incomplete. Couchbase have
donated their improved docs. We will incorporate these into the new
house style, fill in any gaps, and commit to updating documentation in
line with new releases.
 
19. Global changes feed
 
One or both of:
 
* A feed of server events like db creation, update and deletion.
* A federated changes feed that combines the changes feeds of a
selected set of databases.
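 
The federated variant can be sketched in Python as a merge of per-db
feeds. This is a toy model: the feed layout and the per-db "since"
checkpoint are assumptions of this example, chosen because each
database numbers its own changes and there is no global sequence.

```python
def federated_changes(feeds: dict, since: dict = None):
    """Yield changes from several databases, tagged with their db name.

    feeds: db name -> list of {"seq": int, "id": str} in seq order.
    since: db name -> last seq the client has already seen.
    """
    since = since or {}
    for db in sorted(feeds):
        for change in feeds[db]:
            if change["seq"] > since.get(db, 0):
                yield {"db": db, "seq": change["seq"], "id": change["id"]}


feeds = {
    "a": [{"seq": 1, "id": "x"}, {"seq": 2, "id": "y"}],
    "b": [{"seq": 1, "id": "z"}],
}
merged = list(federated_changes(feeds, since={"a": 1}))
```

A client resuming the merged feed has to remember one sequence per
database rather than a single number.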
 
20. Allow database renaming
 
Self-explanatory, but the feature is complicated by the BigCouch
merger as the rename of a sharded database is not atomic without effort.
 
21. Database "aliases" (symlinks)
 
Visiting a database symlink will seamlessly redirect to the target database.
 
22. _changes feed for views
 
It should be possible to subscribe to view changes the same way we can
subscribe to database changes. This will enable many useful things,
chained map-reduce being a notable one.
 
23. Per-db overrides of server-wide settings
 
Allow db-specific overrides for otherwise server wide configuration
settings, where sensible.
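 
A layered lookup is one simple way to model this. The Python sketch
below uses collections.ChainMap; the setting names, their values and
the OVERRIDABLE whitelist are illustrative assumptions, with the
whitelist standing in for "where sensible".

```python
from collections import ChainMap

# Illustrative server-wide settings.
SERVER = {"delayed_commits": True,
          "max_document_size": 4294967296,
          "port": 5984}

# Hypothetical whitelist of settings a database may override.
OVERRIDABLE = {"delayed_commits", "max_document_size"}


def effective_config(db_overrides: dict) -> ChainMap:
    """Per-db values win; everything else falls back to the server."""
    allowed = {k: v for k, v in db_overrides.items() if k in OVERRIDABLE}
    return ChainMap(allowed, SERVER)


config = effective_config({"delayed_commits": False, "port": 1})
```

The "port" override is ignored because a listening port only makes
sense server wide, while "delayed_commits" takes the per-db value.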
 
Developer Facing Features
=========================
 
1. OTP compliance/refactoring
 
2. Different HTTP engine (mochiweb -> webmachine/cowboy/yaws/etc)
 
3. Have hard dependencies on SpiderMonkey versions. Also simplifies the build.
 
4. Test suites for different versions of replication, file formats, etc.
 
5. Move attachments out of database files (which removes make_blocks)
 
6. Plugin/addon/module interface
 
7. View server protocol enhancements/refactoring
 
8. Make .ini config files optional: (1) move defaults into the code, (2) instead of local/default, ship a fully complete config with all of its lines commented out
 
9. Database corruption detection and repair
 
While CouchDB's append-only model is very safe, underlying issues with
filesystems and hardware can still corrupt databases. CouchDB can:
 
* add checksums on everything (btree nodes, documents, etc)
* ship a tool to verify all checksums.
* include a repair tool (that extracts everything extractable)
* perhaps include ECC information to allow recovery from corruption.
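
The first two bullets can be sketched in Python with a CRC32 per block.
The block layout used here (4 byte length, 4 byte CRC, payload) is
invented for the example and is not CouchDB's file format.

```python
import struct
import zlib

HEADER = ">II"  # big-endian payload length, then CRC32 of the payload
HEADER_SIZE = struct.calcsize(HEADER)


def pack_block(payload: bytes) -> bytes:
    """Prefix a payload with its length and checksum."""
    return struct.pack(HEADER, len(payload), zlib.crc32(payload)) + payload


def verify_blocks(data: bytes):
    """Yield (offset, ok) for each block found in data."""
    offset = 0
    while offset + HEADER_SIZE <= len(data):
        length, crc = struct.unpack_from(HEADER, data, offset)
        start = offset + HEADER_SIZE
        payload = data[start:start + length]
        ok = len(payload) == length and zlib.crc32(payload) == crc
        yield offset, ok
        offset = start + length


good = pack_block(b"doc-1") + pack_block(b"doc-2")
damaged = bytearray(good)
damaged[22] ^= 0xFF  # flip one byte inside the second payload
report = list(verify_blocks(bytes(damaged)))
```

This verifier trusts the length field, so a corrupted length would
derail the walk. A real repair tool would need a way to resynchronise,
which is outside this sketch.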

I'd add a definitions section for the following:
CORS - Cross-origin resource sharing (CORS) is a web browser technology specification, which defines ways for a web server to allow its resources to be accessed by a web page from a different domain. http://en.wikipedia.org/wiki/Cross-origin_resource_sharing

EventSource - Server-sent events is a technology for providing push notifications from a server to a browser client in the form of DOM events. The Server-Sent Events EventSource API is now being standardized as part of HTML5[1] by the W3C. http://en.wikipedia.org/wiki/Server-sent_events

SPDY - SPDY (pronounced speedy)[1] is an experimental networking protocol developed primarily at Google for transporting web content.[1] Although not currently a standard protocol, the group developing SPDY has stated publicly that it is working toward standardization (available now as an Internet Draft[2]), and has reference implementations available in both Google Chrome [3] and Mozilla Firefox.[4] SPDY is similar to HTTP, with particular goals to reduce web page load latency and improve web security. SPDY achieves reduced latency through compression, multiplexing, and prioritization.[1] The name is not an acronym, but is a shortened version of the word "speedy".[5] SPDY is a trademark of Google.[6] http://en.wikipedia.org/wiki/SPDY

I would include the two lists in any upcoming vote, perhaps as separate votes.

A few other comments:
On number 4 in the first list: this seems to be only a partial description. Remove the data from the record and do what with it? How would it be accessed and used? Moving _id in particular is likely to have a major impact on any project using CouchDB.

On number 3 in the second list: hard dependencies on SpiderMonkey. Should this not be discussed more? With performance being a big issue, V8 may be a better choice, or a set of native Erlang routines, or some other derivative.

I deliberately left out definitions for those on the basis that if you
don't know what they are, you have no business asserting it's a high
priority for our project.

For item 4, the description does state that a question remains on
where they go and suggests custom HTTP headers, so I don't follow your
point. _id in particular would still be in the URL.

For the SpiderMonkey issue, it is not at all clear that switching to
V8 will improve performance (it's largely a myth that V8 is faster
than spidermonkey anyway). The main feature for improving performance
is identified as "View server protocol enhancements/refactoring".

B.

On 15 April 2012 12:45, mcoolin wrote:
[quoted text trimmed; it repeats mcoolin's comment above verbatim]

Reply to this email directly or view it on GitHub:
https://gist.github.com/2387973

4. Remove reserved metadata from documents

Removing the fields would allow higher performance and alternative
data types. A question remains as to where they would go, as not all
map to a standard HTTP header (_rev maps neatly to ETag,
though). Custom HTTP headers is an obvious solution, are there others?

I think it would be very handy to have an object with the metadata and the data. For example the following document:

{ _id: "ID", _rev: "1-abc", foo: "bar", ... }

would become:

{ _id: "ID", _rev: "1-abc", _data: { foo: "bar", ... } }

or, better:

{ id: "ID", rev: "1-abc", data: { foo: "bar", ... } }

This would allow for arrays, or even primitive types, as data:

{ id: "ID", rev: "1-abc", data: [ "bar", ... ] }

and it would not require parsing of HTTP headers. This is also very similar to the response of a view, where the returned object has metadata clearly separated from the document, when include_docs=true.

Thanks for the feedback but this is not the place for it. The dev@
thread is better but even there we are not voting on the contents of
the features, just their priority. Nor are we discussing detailed
solutions yet.

On 15 Apr 2012, at 13:58, Marcello Nuccio wrote:
[quoted text trimmed; it repeats Marcello Nuccio's comment above verbatim]

TTL - auto-expiring records that are completely removed after expiration, similar to Couchbase.
