Gist rnewson/2387973
A list of features that we want to see in CouchDB. It needs to be voted on so that it can become a priority queue.
User Facing Features
====================
1. Conflicts are the rule, not the exception
All previous versions of CouchDB hide conflicts by default (selecting | |
an arbitrary but consistent winning revision). Expert users can find | |
and resolve conflicts. | |
Instead, expose the true picture by default. This includes:
* Reading a document with conflicts returns all conflicting versions,
not just the winner. This might take the form of the ?conflicts=true
response, or of a 300 (Multiple Choices) response.
* Always accept a write (as long as it passes all validate_doc_update
functions). This means that, by default, no write will receive a 409
(Conflict) response. You can still insist on a matching revision by
using the If-Match header.
* _rev is frequently assumed to be a user-facing revision/versioning
system, and our efforts to convince people otherwise have failed.
Acknowledge this as well by renaming the field to _mvcc.
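The proposed read and write semantics can be illustrated with a toy in-memory store. This is a hypothetical sketch, not CouchDB's implementation or HTTP API. Every write that passes the validators is accepted, and concurrent edits of the same parent become sibling leaf revisions. A read returns every leaf, with a 300 status when there is more than one. An optional if_match argument stands in for the If-Match header.

```python
import hashlib
import json
from typing import Optional


class ConflictError(Exception):
    """Raised only when the caller insists on a matching revision (If-Match)."""


class ToyStore:
    """Hypothetical in-memory store where conflicts are kept, not rejected."""

    def __init__(self, validators=()):
        self.validators = list(validators)   # stand-ins for validate_doc_update functions
        self.leaves = {}                      # doc id -> {rev: body} for every leaf revision

    @staticmethod
    def _new_rev(parent: Optional[str], body: dict) -> str:
        # "<generation>-<hash>", loosely modelled on CouchDB's revision ids.
        gen = int(parent.split("-")[0]) + 1 if parent else 1
        digest = hashlib.sha256(json.dumps([parent, body], sort_keys=True).encode()).hexdigest()
        return f"{gen}-{digest[:8]}"

    def write(self, doc_id: str, body: dict, parent: Optional[str] = None,
              if_match: Optional[str] = None) -> str:
        for validate in self.validators:
            validate(body)                    # a failing validator raises: the only default rejection
        leaves = self.leaves.setdefault(doc_id, {})
        if if_match is not None:              # opt-in strictness, like the If-Match header
            if if_match not in leaves:
                raise ConflictError(f"{if_match} is not a current leaf of {doc_id}")
            parent = if_match
        rev = self._new_rev(parent, body)
        leaves.pop(parent, None)              # the edited leaf (if still a leaf) is superseded
        leaves[rev] = body
        return rev

    def read(self, doc_id: str):
        """Return (status, versions): 200 for one leaf, 300 when there are conflicts."""
        leaves = self.leaves.get(doc_id, {})
        if not leaves:
            return 404, []
        versions = [dict(body, _rev=rev) for rev, body in sorted(leaves.items())]
        return (200 if len(versions) == 1 else 300), versions
```

A write with no parent against an existing document also just adds another leaf. Under these semantics the client, not the server, decides when siblings are merged.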
2. Replace Futon | |
A modern interface that has first class support for all features | |
(proper editing for validate_doc_update, show and list functions,
etc.).
3. Improve the user and security model | |
* Support distributed identity systems such as OpenID | |
* Allow for easier external authentication | |
* Finer grained authorization (instead of the binary _admin or not) | |
* Instead of exposing /_users as a database, design an API that covers
all expected operations.
Fine-grained authorization would make it possible to grant read and
write access independently, among other things. In particular, it should
be possible to grant the ability to write but not read.
4. Remove reserved metadata from documents | |
CouchDB treats some fields specially (_id, _rev, _attachments), | |
requiring a transformation process when reading and writing documents. | |
Removing the fields would allow higher performance and alternative | |
data types. A question remains as to where they would go, as not all | |
map to a standard HTTP header (_rev maps neatly to ETag, | |
though). Custom HTTP headers are an obvious solution; are there others?
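One way to picture the header option is shown below. _rev travels in the standard ETag header; CouchDB already returns a document's revision as a quoted ETag on GET. _id goes into a custom header. The X-Couch-Id name is a hypothetical placeholder. _attachments is deliberately left out because it does not reduce to a single header value.

```python
import json

# Hypothetical custom header name; only ETag is a standard HTTP header.
ID_HEADER = "X-Couch-Id"


def split_reserved(doc: dict):
    """Return (headers, body) with _id/_rev moved out of the JSON body."""
    body = {k: v for k, v in doc.items() if k not in ("_id", "_rev")}
    headers = {}
    if "_id" in doc:
        headers[ID_HEADER] = doc["_id"]
    if "_rev" in doc:
        headers["ETag"] = '"%s"' % doc["_rev"]   # entity tags are quoted strings
    return headers, json.dumps(body, sort_keys=True)


def join_reserved(headers: dict, body: str) -> dict:
    """Inverse of split_reserved for documents whose body is a JSON object."""
    doc = json.loads(body)
    if ID_HEADER in headers:
        doc["_id"] = headers[ID_HEADER]
    if "ETag" in headers:
        doc["_rev"] = headers["ETag"].strip('"')
    return doc
```

Header values are restricted to a subset of ASCII. Non-ASCII document ids would therefore need an encoding scheme, which is one of the open questions with this approach.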
5. Built-in map functions
Provide built-in map functions for common use cases, to complement the
built-in reduce functions (_sum, _count, _stats).
6. DSL for index creation, validation functions, etc. | |
The DSL would be very simple and deliberately not capable of expressing
all possible algorithms, but it could always be evaluated efficiently
within the native Erlang VM. This would generally be faster, and would
also avoid managing a pool of couchjs processes.
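The phrase "deliberately not capable of expressing all possible algorithms" can be made concrete with a tiny validation DSL. Rules are plain JSON data, evaluated natively with no JavaScript process. The operator names (has, eq, type, and, or, not) and the rule format are invented for illustration. The language has no loops, recursion over data, or user-defined functions, so evaluation cost is bounded by the size of the rule and of the values it touches.

```python
from typing import Any

# Hypothetical rule language: each rule is a list whose first element is an operator.
TYPES = {"string": str, "number": (int, float), "object": dict, "array": list, "boolean": bool}


def get_path(doc: Any, path: str):
    """Follow a dotted path; return (found, value)."""
    cur = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return False, None
        cur = cur[part]
    return True, cur


def evaluate(rule: list, doc: dict) -> bool:
    op, *args = rule
    if op == "has":
        return get_path(doc, args[0])[0]
    if op == "eq":
        found, value = get_path(doc, args[0])
        return found and value == args[1]
    if op == "type":
        found, value = get_path(doc, args[0])
        # bool is a subclass of int in Python; do not let True pass as a number.
        if args[1] == "number" and isinstance(value, bool):
            return False
        return found and isinstance(value, TYPES[args[1]])
    if op == "and":
        return all(evaluate(r, doc) for r in args)
    if op == "or":
        return any(evaluate(r, doc) for r in args)
    if op == "not":
        return not evaluate(args[0], doc)
    raise ValueError(f"unknown operator: {op!r}")
```

A rule such as ["and", ["has", "type"], ["type", "author.name", "string"]] could then play the role of a simple validate_doc_update function.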
7. Support CORS | |
8. Support WebSockets | |
9. Support EventSource | |
10. Support SPDY | |
11. Support richer reduce functions | |
Allow any reduce function whose output grows no more than logarithmically with its input, even if that output is substantially larger than the current 200-byte threshold.
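Here is a reduce whose output can exceed 200 bytes while staying small: a histogram over power-of-two buckets. It is written in Python, but in the (keys, values, rereduce) shape of a CouchDB reduce function. For positive integers no larger than N it has at most floor(log2 N) + 1 buckets. Its size therefore grows with the logarithm of the largest value seen, not with the number of rows. For example, values below one million need 20 buckets. The function is hypothetical and only shows the kind of reduce this item would permit.

```python
def log2_histogram(keys, values, rereduce):
    """Count values per power-of-two bucket: bucket "b" holds integers in [2**b, 2**(b+1)).

    keys is unused; it is kept only to mirror the reduce signature.
    """
    counts = {}
    if not rereduce:
        for v in values:                      # positive integers emitted by the map function
            b = str(v.bit_length() - 1)       # floor(log2(v)) for v >= 1; str keys survive JSON
            counts[b] = counts.get(b, 0) + 1
    else:
        for partial in values:                # earlier outputs of this same function
            for b, c in partial.items():
                counts[b] = counts.get(b, 0) + c
    return counts
```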
12. Richer querying model | |
While CouchDB views are powerful, they are not as capable as
relational queries. This is largely deliberate, as the fully relational
model is hard to scale.
One way to improve things (and provide the ability to sort by
value) is to introduce chained MapReduce (currently only available on Cloudant).
This item also includes any other enhancement to the kinds of querying that | |
CouchDB can perform. | |
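The "sort by value" case can be sketched as two chained passes over plain Python data. The first map/reduce counts tags. A second map re-emits each reduced row with its value as the key, so ordinary key ordering sorts by count. The function names and document shape are hypothetical. The sketch only shows the data flow, not Cloudant's or CouchDB's implementation.

```python
from collections import defaultdict
from itertools import chain


def run_view(docs, map_fn, reduce_fn=None):
    """Emulate a view: collect emitted (key, value) rows, sort by key, optionally reduce per key."""
    rows = sorted(chain.from_iterable(map_fn(d) for d in docs), key=lambda kv: kv[0])
    if reduce_fn is None:
        return rows
    groups = defaultdict(list)
    for k, v in rows:
        groups[k].append(v)
    return [(k, reduce_fn(vs)) for k, vs in sorted(groups.items())]


def by_tag(doc):
    for tag in doc.get("tags", []):
        yield tag, 1


def by_count(row):
    key, count = row
    yield [-count, key], key     # negate so the most frequent tag sorts first
```

Usage: stage1 = run_view(docs, by_tag, sum) produces rows such as ("b", 3). Then run_view(stage1, by_count) lists the tags from most to least frequent.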
13. Partial updates of documents | |
It should be possible to change just a subset of a document's | |
properties without needing to write an update function. | |
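One candidate semantics for partial updates is JSON Merge Patch (RFC 7396). The patch is a document fragment, and null means "remove this field". The list does not specify a format, so this is an assumption; JSON Patch (RFC 6902) would be the other obvious choice. Below is a minimal Python version of the RFC's merge algorithm.

```python
import copy


def merge_patch(target, patch):
    """Apply an RFC 7396 JSON Merge Patch without mutating either input."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)          # non-object patches replace the target wholesale
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            result.pop(name, None)           # null deletes the member
        else:
            result[name] = merge_patch(result.get(name), value)
    return result
```

Merge patch cannot set a field to null, and it replaces arrays wholesale. If those limits matter, JSON Patch is the more precise choice.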
14. Partial reads of documents | |
It should be possible to read just a subset of a document's properties | |
without needing to write a show function. | |
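A partial read could be a projection over dotted field paths. A client might request it with something like a hypothetical ?fields=title,author.name query parameter, which is not an existing option on document reads. Here is a sketch of the server-side projection.

```python
import copy


def project(doc: dict, paths) -> dict:
    """Copy only the given dotted paths (e.g. "author.name") into a new dict."""
    out = {}
    for path in paths:
        parts = path.split(".")
        value = doc
        for part in parts:
            if not isinstance(value, dict) or part not in value:
                break                                    # missing paths are skipped silently
            value = value[part]
        else:
            dst = out
            for part in parts[:-1]:
                dst = dst.setdefault(part, {})           # rebuild the nesting in the output
            dst[parts[-1]] = copy.deepcopy(value)        # never share structure with the stored doc
    return out
```

One design choice is left open here: should _id and _rev always be included, so that a client can still make a conditional write afterwards?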
15. Create an exclusive namespace for databases | |
Databases have a fixed position in the URL (/<dbname>/) but share it | |
with many other items (/_log, /_replicate, etc.). Fix this by
introducing an exclusive namespace for dbs (e.g., /_db/<dbname>).
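With an exclusive prefix, routing no longer has to guess whether a leading path segment is a database or a system resource. The dispatch below assumes the /_db/<dbname> form proposed above. The route labels it returns are hypothetical. Database names containing "/" are expected to arrive percent-encoded as %2F, as they must in URLs today.

```python
from urllib.parse import unquote


def route(path: str):
    """Split a request path into (kind, name, rest) under the /_db/<dbname> scheme."""
    segments = path.strip("/").split("/")
    if segments == [""]:
        return ("root", None, [])
    head = segments[0]
    if head == "_db" and len(segments) > 1 and segments[1]:
        # Decode only after splitting, so %2F inside a name is not treated as a separator.
        return ("db", unquote(segments[1]), [unquote(s) for s in segments[2:]])
    if head.startswith("_"):
        return ("system", head, [unquote(s) for s in segments[1:]])
    return ("not_found", None, [])   # bare /<dbname> is no longer a database path
```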
16. Improve replication interoperability between implementations and versions
* Introduce a tiered replication model, starting with a very simple | |
'dump' and 'load' tier, all the way up to a highly optimized, but | |
complex, protocol that reduces redundant data transmission. | |
* This work would also form the basis of an export/import feature for | |
incremental and complete backups. | |
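The simplest "dump" and "load" tier could be little more than newline-delimited JSON of every leaf revision, which also serves as a complete backup. Several details here are assumptions for illustration, not an existing CouchDB format: the in-memory store shape ({doc id: {rev: body}}), one JSON object per line carrying _id and _rev, and a loader that keeps revision ids as-is instead of creating new edits.

```python
import json


def dump(store: dict) -> str:
    """Serialize every leaf revision as one JSON line."""
    lines = []
    for doc_id in sorted(store):
        for rev, body in sorted(store[doc_id].items()):
            lines.append(json.dumps(dict(body, _id=doc_id, _rev=rev), sort_keys=True))
    return "\n".join(lines) + ("\n" if lines else "")


def load(text: str, store: dict) -> int:
    """Merge dumped revisions into store without rewriting them; return how many were new."""
    added = 0
    for line in text.splitlines():
        if not line.strip():
            continue
        doc = json.loads(line)
        doc_id, rev = doc.pop("_id"), doc.pop("_rev")
        revs = store.setdefault(doc_id, {})
        if rev not in revs:
            revs[rev] = doc
            added += 1
    return added
```

Loading is idempotent, so repeating a load is harmless. An incremental backup would also record the last update sequence dumped, so that the next run only exports later changes.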
17. Enhance background task management | |
Currently, replication tasks can be cancelled, though awkwardly. Compaction
and view building tasks cannot be cancelled at all, short of restarting the
server or using a remote shell into the Erlang VM.
Provide a consistent and simple API for cancelling any running | |
task. This API will also provide status/progress information where | |
appropriate and pause/resume where possible too. | |
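A uniform task API mostly needs each long-running job to check in at safe points. The sketch below uses Python threads and a hypothetical Task class whose work is split into steps, for example one batch of documents per compaction or indexing step. Cancel and pause are cooperative flags, and status reports progress.

```python
import threading


class Task:
    """Hypothetical cancellable, pausable task; the work is a list of step callables."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)
        self.done = 0
        self.state = "pending"
        self._cancel = threading.Event()
        self._running = threading.Event()
        self._running.set()                    # set means "not paused"
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        self.state = "running"
        for step in self.steps:
            self._running.wait()               # blocks while paused
            if self._cancel.is_set():
                self.state = "cancelled"
                return
            step()
            self.done += 1
        self.state = "completed"

    def start(self):
        self._thread.start()

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def cancel(self):
        self._cancel.set()
        self._running.set()                    # wake a paused task so it can exit

    def join(self, timeout=None):
        self._thread.join(timeout)

    def status(self):
        state = self.state
        if state == "running" and not self._running.is_set():
            state = "paused"
        return {"task": self.name, "state": state, "progress": f"{self.done}/{len(self.steps)}"}
```

Cancellation is only checked between steps. Each step should therefore be small for cancel and pause to feel responsive.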
18. Documentation | |
Documentation is scattered, dated, and incomplete. Couchbase have | |
donated their improved docs. We will incorporate these into the new | |
house style, fill in any gaps, and commit to updating documentation in | |
line with new releases. | |
19. Global changes feed | |
One or both of:
* A feed of server events such as database creation, update and deletion.
* A federated changes feed for a selected set of databases and changes feeds.
20. Allow database renaming | |
Self-explanatory, but the feature is complicated by the BigCouch
merger, as renaming a sharded database is not atomic without extra effort.
21. Database "aliases" (symlinks) | |
Visiting a database symlink will seamlessly redirect to the target database. | |
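Resolving an alias is a small graph walk. The only subtle part is refusing cycles and overly long chains. The sketch below uses a hypothetical alias table mapping alias names to targets, which may themselves be aliases. The hop limit is an arbitrary choice.

```python
class AliasError(Exception):
    pass


def resolve(name: str, aliases: dict, databases: set, max_hops: int = 8) -> str:
    """Follow alias -> target links until a real database is reached."""
    seen = []
    while name not in databases:
        if name not in aliases:
            raise AliasError(f"no such database or alias: {name}")
        if name in seen or len(seen) >= max_hops:
            raise AliasError("alias loop or chain too long: " + " -> ".join(seen + [name]))
        seen.append(name)
        name = aliases[name]
    return name
```

Resolving on the server keeps the alias invisible to clients. An HTTP redirect would instead expose the target name, and that is the main design trade-off.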
22. _changes feed for views | |
It should be possible to subscribe to view changes the same way we can | |
subscribe to database changes. This will enable many useful things, | |
chained map-reduce being a notable one. | |
23. Per-db overrides of server-wide settings
Allow db-specific overrides of otherwise server-wide configuration
settings, where sensible.
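Lookup with overrides is naturally a layered map: consult the database's own settings first, then fall back to the server-wide value. The sketch below uses collections.ChainMap, with an explicit allow-list standing in for "where sensible". The section/key pairs and values are illustrative only.

```python
from collections import ChainMap

# Illustrative server-wide settings keyed by (section, key).
SERVER = {
    ("couchdb", "max_document_size"): "4294967296",
    ("query_server_config", "reduce_limit"): "true",
    ("httpd", "port"): "5984",
}

# Only some settings make sense per database; a listening port, for example, does not.
OVERRIDABLE = {("couchdb", "max_document_size"), ("query_server_config", "reduce_limit")}


def db_config(db_overrides: dict) -> ChainMap:
    """Return a view where allowed per-db overrides shadow server-wide values."""
    rejected = set(db_overrides) - OVERRIDABLE
    if rejected:
        raise ValueError(f"not overridable per database: {sorted(rejected)}")
    return ChainMap(dict(db_overrides), SERVER)
```

ChainMap keeps a reference to the server-wide mapping. Later changes to server settings therefore stay visible to every database that does not override them.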
Developer Facing Features | |
========================= | |
1. OTP compliance/refactoring | |
2. Different HTTP engine (webmachine -> cowboy/yaws/etc) | |
3. Have hard dependencies on specific SpiderMonkey versions (this also simplifies the build).
4. Test suites for different versions of replication, file formats, etc. | |
5. Move attachments out of database files (which removes make_blocks) | |
6. Plugin/addon/module interface | |
7. View server protocol enhancements/refactoring | |
8. Make .ini config files optional: (1) move the defaults into the code; (2) instead of local.ini/default.ini, ship one complete config file with every line commented out.
9. Database corruption detection and repair | |
While CouchDB's append-only model is very safe, underlying issues with | |
filesystems and hardware can still corrupt databases. CouchDB could:
* add checksums on everything (btree nodes, documents, etc.)
* ship a tool to verify all checksums
* include a repair tool (that extracts everything extractable)
* perhaps include ECC information to allow recovery from corruption.
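The verify and repair tools could work on a framing as simple as length + checksum + payload per record. The sketch below builds such an append-only format on CRC32 from zlib. The framing is invented for illustration and is not CouchDB's on-disk format. Verification reports the offset of each bad record, and repair keeps every record whose checksum still matches. A corrupted length field ends the scan, which is one reason real formats add sync markers or ECC.

```python
import struct
import zlib

HEADER = struct.Struct(">II")          # payload length, CRC32 of payload (big-endian)


def append(buf: bytearray, payload: bytes) -> None:
    buf.extend(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)


def scan(data: bytes):
    """Yield (offset, payload, ok) per record; stop if a length runs past the end."""
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start, end = pos + HEADER.size, pos + HEADER.size + length
        if end > len(data):
            yield pos, None, False     # implausible length: cannot find the next record
            return
        payload = data[start:end]
        yield pos, payload, zlib.crc32(payload) == crc
        pos = end


def verify(data: bytes):
    """Return the offsets of records that fail their checksum."""
    return [off for off, _, ok in scan(data) if not ok]


def repair(data: bytes):
    """Return the payloads that can still be trusted."""
    return [p for _, p, ok in scan(data) if ok]
```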
4. Remove reserved metadata from documents
I think it would be very handy to have an object that separates the metadata from the data. For example, the following document:
{ _id: "ID", _rev: "1-abc", foo: "bar", ... }
would become:
{ _id: "ID", _rev: "1-abc", _data: { foo: "bar", ... } }
or, better:
{ id: "ID", rev: "1-abc", data: { foo: "bar", ... } }
This would allow for arrays, or even primitive types, as data:
{ id: "ID", rev: "1-abc", data: [ "bar", ... ] }
and it would not require parsing of HTTP headers. This is also very similar to a view response with include_docs=true, where each row keeps its metadata clearly separated from the document.
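The proposed envelope is easy to convert to and from the current shape. The sketch below assumes the unprefixed id/rev/data naming from the example. Only _id and _rev are treated as metadata, and where _attachments would live is left open, as in the comment.

```python
def to_envelope(doc: dict) -> dict:
    """{_id, _rev, ...fields} -> {id, rev, data: {...fields}}"""
    data = {k: v for k, v in doc.items() if k not in ("_id", "_rev")}
    return {"id": doc["_id"], "rev": doc.get("_rev"), "data": data}


def from_envelope(env: dict) -> dict:
    """Inverse of to_envelope; only possible when data is an object."""
    if not isinstance(env["data"], dict):
        raise TypeError("current-style documents must be JSON objects")
    doc = {"_id": env["id"], **env["data"]}
    if env.get("rev") is not None:
        doc["_rev"] = env["rev"]
    return doc
```

The TypeError marks exactly the cases the new shape adds: an envelope whose data is an array or a primitive has no equivalent in the current document format.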
TTL: auto-expiring records that are completely removed after expiration, similar to Couchbase.
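Here is a sketch of TTL expiry, assuming a hypothetical expires_at field that holds a Unix timestamp. Reads treat expired documents as missing, and a periodic sweep removes them. In CouchDB terms, "completely removed" would mean something like a purge rather than a normal deletion, because a deletion leaves a tombstone that replicates.

```python
import time
from typing import Optional

EXPIRES_FIELD = "expires_at"   # hypothetical field name, not a CouchDB reserved field


def is_expired(doc: dict, now: float) -> bool:
    exp = doc.get(EXPIRES_FIELD)
    return exp is not None and exp <= now


def get(store: dict, doc_id: str, now: Optional[float] = None):
    """Return the document, or None if it is missing or expired."""
    now = time.time() if now is None else now
    doc = store.get(doc_id)
    return None if doc is None or is_expired(doc, now) else doc


def sweep(store: dict, now: Optional[float] = None) -> list:
    """Delete expired documents outright (no tombstones); return their ids."""
    now = time.time() if now is None else now
    expired = sorted(k for k, d in store.items() if is_expired(d, now))
    for k in expired:
        del store[k]
    return expired
```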
I'd add a definitions section for the following:
CORS - Cross-origin resource sharing (CORS) is a web browser technology specification, which defines ways for a web server to allow its resources to be accessed by a web page from a different domain. http://en.wikipedia.org/wiki/Cross-origin_resource_sharing
EventSource - Server-sent events is a technology for providing push notifications from a server to a browser client in the form of DOM events. The Server-Sent Events EventSource API is now being standardized as part of HTML5[1] by the W3C. http://en.wikipedia.org/wiki/Server-sent_events
SPDY - SPDY (pronounced speedy)[1] is an experimental networking protocol developed primarily at Google for transporting web content.[1] Although not currently a standard protocol, the group developing SPDY has stated publicly that it is working toward standardization (available now as an Internet Draft[2]), and has reference implementations available in both Google Chrome [3] and Mozilla Firefox.[4] SPDY is similar to HTTP, with particular goals to reduce web page load latency and improve web security. SPDY achieves reduced latency through compression, multiplexing, and prioritization.[1] The name is not an acronym, but is a shortened version of the word "speedy".[5] SPDY is a trademark of Google.[6] http://en.wikipedia.org/wiki/SPDY
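EventSource needs very little on the server side. The sketch below frames changes-feed rows as server-sent events. Each event has an id line, which a reconnecting browser sends back in the Last-Event-ID header, and one data line per line of payload. A blank line ends each event. The row shape mirrors a _changes result, but the functions themselves are hypothetical.

```python
import json


def sse_event(data: str, event_id=None, event=None) -> str:
    """Frame one server-sent event for a text/event-stream response."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    for part in data.split("\n"):        # multi-line payloads become several data lines
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def changes_as_sse(rows) -> str:
    """Turn _changes-style rows ({"seq", "id", "changes"}) into an event stream body."""
    return "".join(sse_event(json.dumps(row, sort_keys=True), event_id=row["seq"]) for row in rows)
```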
I would include the two lists in any upcoming vote, perhaps as separate votes.
A few other comments:
On number 4 in the first list: this seems to be only a partial description. Remove the metadata from the record and do what with it? How would it be accessed and used, especially _id? This is likely to have a major impact on any project using CouchDB.
On number 3 in the second list (hard dependencies on SpiderMonkey): should this not be discussed more? With performance being a big issue, V8 may be a better choice, as might a set of native Erlang routines or some other alternative.