A database in which you PUT/POST documents to trigger replications and DELETE documents to cancel ongoing replications. These documents have exactly the same content as the JSON objects POSTed to /_replicate/ (fields "source", "target", "create_target", "continuous", "doc_ids", "filter", "query_params").
Replication documents can have a user-defined _id. Design documents (and _local documents) added to the replicator database are ignored.
The default name of this database is _replicator. The name can be changed in the .ini configuration, section [replicator], parameter db.
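In the .ini configuration file this corresponds to the following section (shown here with its default value):

```
[replicator]
db = _replicator
```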
Let's say you PUT the following document into _replicator:
{ "_id": "my_rep", "source": "http://myserver.com:5984/foo", "target": "bar", "create_target": true }
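Concretely, creating this document can be sketched as follows (a minimal shell sketch; the curl line is left commented out and assumes a CouchDB server at http://localhost:5984 with the _replicator database present):

```shell
# Write the replication document to a file, then PUT it to the
# replicator database under its user-defined _id.
cat > my_rep.json <<'EOF'
{
  "_id": "my_rep",
  "source": "http://myserver.com:5984/foo",
  "target": "bar",
  "create_target": true
}
EOF
# curl -X PUT http://localhost:5984/_replicator/my_rep -d @my_rep.json
grep -q '"_id": "my_rep"' my_rep.json && echo "document ready"
```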
In the couch log you'll see 2 entries like these:
[Thu, 17 Feb 2011 19:43:59 GMT] [info] [<0.291.0>] Document `my_rep` triggered replication `c0ebe9256695ff083347cbf95f93e280+create_target`
[Thu, 17 Feb 2011 19:44:37 GMT] [info] [<0.124.0>] Replication `c0ebe9256695ff083347cbf95f93e280+create_target` finished (triggered by document `my_rep`)
As soon as the replication is triggered, the document will be updated by CouchDB with 3 new fields:
{ "_id": "my_rep", "source": "http://myserver.com:5984/foo", "target": "bar", "create_target": true, "_replication_id": "c0ebe9256695ff083347cbf95f93e280", "_replication_state": "triggered", "_replication_state_time": 1297974122 }
Note: special fields set by the replicator start with the prefix "_replication_".
- _replication_id: the ID internally assigned to the replication. This is the ID exposed by the output from /_active_tasks/.
- _replication_state: the current state of the replication.
- _replication_state_time: a Unix timestamp (number of seconds since 1 Jan 1970) that tells us when the current replication state (defined in _replication_state) was set.
When the replication finishes, it will set the _replication_state field to "completed" (and update _replication_state_time), so the document will look like:
{ "_id": "my_rep", "source": "http://myserver.com:5984/foo", "target": "bar", "create_target": true, "_replication_id": "c0ebe9256695ff083347cbf95f93e280", "_replication_state": "completed", "_replication_state_time": 1297974122 }
When an error happens during replication, the _replication_state field is set to "error" (and _replication_state_time gets updated of course).
When you PUT/POST a document to the _replicator database, CouchDB will attempt to start the replication up to 10 times (configurable under [replicator], parameter max_replication_retry_count). If it fails on the first attempt, it waits 5 seconds before doing a second attempt. If the second attempt fails, it waits 10 seconds before doing a third attempt. If the third attempt fails, it waits 20 seconds before doing a fourth attempt (each attempt doubles the previous wait period). When an attempt fails, the Couch log will show you something like:
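The retry schedule described above (5 seconds before the second attempt, doubling on each failure) can be sketched as a small shell loop; the attempt count of 10 mirrors the default max_replication_retry_count:

```shell
# Print the wait that precedes each retry attempt, starting at 5s
# and doubling after every failure.
wait=5
for attempt in 2 3 4 5 6 7 8 9 10; do
  echo "attempt ${attempt}: waiting ${wait}s"
  wait=$((wait * 2))
done
```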
[error] [<0.149.0>] Error starting replication `67c1bb92010e7abe35d7d629635f18b6+create_target` (document `my_rep_2`): {db_not_found,<<"could not open http://myserver:5986/foo/">>}
Note: the _replication_state field is only set to "error" when all the attempts were unsuccessful.
There are only 3 possible values for the _replication_state field: "triggered", "completed" and "error". Continuous replications never reach the "completed" state.
Let's suppose two documents are added to the _replicator database in the following order:
{ "_id": "doc_A", "source": "http://myserver.com:5984/foo", "target": "bar" }
and
{ "_id": "doc_B", "source": "http://myserver.com:5984/foo", "target": "bar" }
Both describe exactly the same replication (only their _ids differ). In this case, document "doc_A" triggers the replication and gets updated by CouchDB with the fields _replication_state, _replication_state_time and _replication_id, as described before. Document "doc_B", however, is only updated with one field, _replication_id, so it will look like this:
{ "_id": "doc_B", "source": "http://myserver.com:5984/foo", "target": "bar", "_replication_id": "c0ebe9256695ff083347cbf95f93e280" }
While document "doc_A" will look like this:
{ "_id": "doc_A", "source": "http://myserver.com:5984/foo", "target": "bar", "_replication_id": "c0ebe9256695ff083347cbf95f93e280", "_replication_state": "triggered", "_replication_state_time": 1297974122 }
Note that both documents get exactly the same value for the _replication_id field. This way you can identify which documents refer to the same replication: you can, for example, define a view that maps replication IDs to document IDs.
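Such a view could be defined with a design document along these lines (a hypothetical sketch; the names are arbitrary, and recall that design documents in the replicator database are ignored by the replication manager but can still be queried):

```json
{
  "_id": "_design/replication_info",
  "views": {
    "by_replication_id": {
      "map": "function(doc) { if (doc._replication_id) { emit(doc._replication_id, doc._id); } }"
    }
  }
}
```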
To cancel a replication simply DELETE the document which triggered the replication. The Couch log will show you an entry like the following:
[Thu, 17 Feb 2011 20:16:29 GMT] [info] [<0.125.0>] Stopped replication `c0ebe9256695ff083347cbf95f93e280+continuous+create_target` because replication document `doc_A` was deleted
Note: You need to DELETE the document that triggered the replication. Deleting another document that describes the same replication but did not trigger it will not cancel the replication.
When CouchDB is restarted, it checks its _replicator database and restarts any replication that is described by a document that either has its _replication_state field set to "triggered" or doesn't yet have the _replication_state field set.
Note: Continuous replications always have a _replication_state field with the value "triggered", therefore they're always restarted when CouchDB is restarted.
Imagine your replicator database (default name is _replicator) has the two following documents that represent pull replications from servers A and B:
{ "_id": "rep_from_A", "source": "http://aserver.com:5984/foo", "target": "foo_a", "continuous": true, "_replication_id": "c0ebe9256695ff083347cbf95f93e280", "_replication_state": "triggered", "_replication_state_time": 1297971311 }
{ "_id": "rep_from_B", "source": "http://bserver.com:5984/foo", "target": "foo_b", "continuous": true, "_replication_id": "231bb3cf9d48314eaa8d48a9170570d1", "_replication_state": "triggered", "_replication_state_time": 1297974122 }
Now without stopping and restarting CouchDB, you change the name of the replicator database to another_replicator_db:
$ curl -X PUT http://localhost:5984/_config/replicator/db -d '"another_replicator_db"'
"_replicator"
As soon as this is done, both pull replications defined before are stopped. This is explicitly mentioned in CouchDB's log:
[Fri, 11 Mar 2011 07:44:20 GMT] [info] [<0.104.0>] Stopping all ongoing replications because the replicator database was deleted or changed
[Fri, 11 Mar 2011 07:44:20 GMT] [info] [<0.127.0>] 127.0.0.1 - - PUT /_config/replicator/db 200
Imagine now you add a replication document to the new replicator database named another_replicator_db:
{ "_id": "rep_from_X", "source": "http://xserver.com:5984/foo", "target": "foo_x", "continuous": true }
From now on you have a single replication going on in your system: a pull replication from server X. Now you change the replicator database back to the original _replicator:
$ curl -X PUT http://localhost:5984/_config/replicator/db -d '"_replicator"'
"another_replicator_db"
Immediately after this operation, the replication pulling from server X will be stopped and the replications defined in the _replicator database (pulling from servers A and B) will be resumed.
Changing the replicator database back to another_replicator_db will stop the pull replications from servers A and B, and resume the pull replication from server X.
Imagine you have in server C a replicator database with the two following pull replication documents in it:
{ "_id": "rep_from_A", "source": "http://aserver.com:5984/foo", "target": "foo_a", "continuous": true, "_replication_id": "c0ebe9256695ff083347cbf95f93e280", "_replication_state": "triggered", "_replication_state_time": 1297971311 }
{ "_id": "rep_from_B", "source": "http://bserver.com:5984/foo", "target": "foo_b", "continuous": true, "_replication_id": "231bb3cf9d48314eaa8d48a9170570d1", "_replication_state": "triggered", "_replication_state_time": 1297974122 }
Now you would like to have the same pull replications going on in server D; that is, you would like server D to pull replicate from servers A and B. You have two options:
- Explicitly add two documents to server D's replicator database
- Replicate server C's replicator database into server D's replicator database
Both alternatives accomplish exactly the same goal.
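For instance, the second option could be achieved with a one-shot replication request POSTed to server C's /_replicate endpoint, using a body like the following (the serverD host name is hypothetical):

```json
{
  "source": "_replicator",
  "target": "http://serverD:5984/_replicator"
}
```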
Hi there,
Running 1.1.0 on OS X. After deleting a replicator document, CouchDB crashed and now it won't start. Below is the complete trace from when it crashed and after a typical start attempt.
Thanks,
-Nestor
[info] [<0.760.0>] 127.0.0.1 - - 'GET' / 200
[info] [<0.8222.0>] 127.0.0.1 - - 'DELETE' /_replicator/by_clientId?revs_info=true 409
[info] [<0.9051.0>] 127.0.0.1 - - 'GET' /_replicator/by_clientId 200
[info] [<0.9173.0>] 127.0.0.1 - - 'DELETE' /_replicator/by_clientId?rev=2-f39c18c522709854dcc19c6456565c0a 200
[info] [<0.97.0>] Stopped replication `1df345a6d2f41bc9849f61b51395df03+continuous+create_target` because replication document `by_clientId` was deleted
[info] [<0.9304.0>] 127.0.0.1 - - 'POST' /_replicator 201
[info] [<0.97.0>] Stopping all ongoing replications because the replicator database was deleted or changed
[error] [<0.661.0>] ** Generic server <0.661.0> terminating
** Last message in was {'EXIT',<0.662.0>,
{shutdown,
{gen_server,call,
[<0.659.0>,next_missing_revs,infinity]}}}
** When Server state == {state,<0.630.0>,
{db,<0.625.0>,<0.626.0>,nil,
<<"1319075213275955">>,<0.623.0>,<0.627.0>,
{db_header,5,43,0,
{2774506,{21,0}},
{2775454,21},
{2768987,[]},
0,nil,nil,1000},
43,
{btree,<0.623.0>,
{2774506,{21,0}},
#Fun<couch_db_updater.10.19222179>,
#Fun<couch_db_updater.11.21515767>,
#Fun<couch_btree.5.124754102>,
#Fun<couch_db_updater.12.93888648>},
{btree,<0.623.0>,
{2775454,21},
#Fun<couch_db_updater.13.40165027>,
#Fun<couch_db_updater.14.82810239>,
#Fun<couch_btree.5.124754102>,
#Fun<couch_db_updater.15.104121193>},
{btree,<0.623.0>,
{2768987,[]},
#Fun<couch_btree.0.83553141>,
#Fun<couch_btree.1.30790806>,
#Fun<couch_btree.2.124754102>,nil},
43,<<"dms4">>,
"/usr/local/var/lib/couchdb/dms4.couch",[],[],
nil,
{user_ctx,null,[],undefined},
nil,1000,
[before_header,after_header,on_file_open],
false},
<0.659.0>,<0.662.0>,[],0,
{[],[]},
{<0.663.0>,#Ref<0.0.0.1135>},
false,0,nil,[],[]}
** Reason for termination ==
** {function_clause,
[{couch_rep_reader,handle_info,
[{'EXIT',<0.662.0>,
{shutdown,
{gen_server,call,
[<0.659.0>,next_missing_revs,infinity]}}},
{state,<0.630.0>,
{db,<0.625.0>,<0.626.0>,nil,<<"1319075213275955">>,<0.623.0>,
<0.627.0>,
{db_header,5,43,0,
{2774506,{21,0}},
{2775454,21},
{2768987,[]},
0,nil,nil,1000},
43,
{btree,<0.623.0>,
{2774506,{21,0}},
#Fun<couch_db_updater.10.19222179>,
#Fun<couch_db_updater.11.21515767>,
#Fun<couch_btree.5.124754102>,
#Fun<couch_db_updater.12.93888648>},
{btree,<0.623.0>,
{2775454,21},
#Fun<couch_db_updater.13.40165027>,
#Fun<couch_db_updater.14.82810239>,
#Fun<couch_btree.5.124754102>,
#Fun<couch_db_updater.15.104121193>},
{btree,<0.623.0>,
{2768987,[]},
#Fun<couch_btree.0.83553141>,
#Fun<couch_btree.1.30790806>,
#Fun<couch_btree.2.124754102>,nil},
43,<<"dms4">>,"/usr/local/var/lib/couchdb/dms4.couch",[],
[],nil,
{user_ctx,null,[],undefined},
nil,1000,
[before_header,after_header,on_file_open],
false},
<0.659.0>,<0.662.0>,[],0,
{[],[]},
{<0.663.0>,#Ref<0.0.0.1135>},
false,0,nil,[],[]}]},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
...
$ couchdb
Apache CouchDB 1.1.0 (LogLevel=info) is starting.
[info] [<0.97.0>] Stopping all ongoing replications because the replicator database was deleted or changed
[error] [<0.97.0>] ** Generic server couch_replication_manager terminating
** Last message in was {rep_db_update,
{[{<<"seq">>,3},
{<<"id">>,<<"3115e06cdcaade38e0e748d509000196">>},
{<<"changes">>,
[{[{<<"rev">>,
<<"2-b87fdc78d7e149c8027d17aef29ff279">>}]}]},
{doc,
{[{<<"_id">>,
<<"3115e06cdcaade38e0e748d509000196">>},
{<<"_rev">>,
<<"2-b87fdc78d7e149c8027d17aef29ff279">>},
{<<"id">>,<<"by_clientId">>},
{<<"source">>,<<"dms4">>},
{<<"target">>,
<<"http://bhubint.krfs.com:5984/dms4">>},
{<<"create_target">>,true},
{<<"continuous">>,true},
{<<"filter">>,
<<"replicateFilter/clientFilter">>},
{<<"_replication_state">>,<<"triggered">>},
{<<"_replication_state_time">>,
<<"2011-10-19T21:46:53-04:00">>},
{<<"_replication_id">>,
<<"0bef042c859d21c639c26b8cea151297">>}]}}]}}
** When Server state == {state,<0.103.0>,<0.104.0>,<<"_replicator">>,[],10}
** Reason for termination ==
** {noproc,{gen_server,call,[couch_httpd,{get,port}]}}
=ERROR REPORT==== 19-Oct-2011::22:03:17 ===
** Generic server couch_replication_manager terminating
** Last message in was {rep_db_update,
{[{<<"seq">>,3},
{<<"id">>,<<"3115e06cdcaade38e0e748d509000196">>},
{<<"changes">>,
[{[{<<"rev">>,
<<"2-b87fdc78d7e149c8027d17aef29ff279">>}]}]},
{doc,
{[{<<"_id">>,
<<"3115e06cdcaade38e0e748d509000196">>},
{<<"_rev">>,
<<"2-b87fdc78d7e149c8027d17aef29ff279">>},
{<<"id">>,<<"by_clientId">>},
{<<"source">>,<<"dms4">>},
{<<"target">>,
<<"http://bhubint.krfs.com:5984/dms4">>},
{<<"create_target">>,true},
{<<"continuous">>,true},
{<<"filter">>,
<<"replicateFilter/clientFilter">>},
{<<"_replication_state">>,<<"triggered">>},
{<<"_replication_state_time">>,
<<"2011-10-19T21:46:53-04:00">>},
{<<"_replication_id">>,
<<"0bef042c859d21c639c26b8cea151297">>}]}}]}}
** When Server state == {state,<0.103.0>,<0.104.0>,<<"_replicator">>,[],10}
** Reason for termination ==
** {noproc,{gen_server,call,[couch_httpd,{get,port}]}}
[error] [<0.97.0>] {error_report,<0.30.0>,
{<0.97.0>,crash_report,
[[{initial_call,
{couch_replication_manager,init,['Argument__1']}},
{pid,<0.97.0>},
{registered_name,couch_replication_manager},
{error_info,
{exit,
{noproc,{gen_server,call,[couch_httpd,{get,port}]}},
[{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}},
{ancestors,
[couch_secondary_services,couch_server_sup,<0.31.0>]},
{messages,[]},
{links,[<0.85.0>,<0.104.0>]},
{dictionary,[]},
{trap_exit,true},
{status,running},
{heap_size,987},
{stack_size,24},
{reductions,1513}],
[{neighbour,
[{pid,<0.104.0>},
{registered_name,[]},
{initial_call,{couch_event_sup,init,['Argument__1']}},
{current_function,{gen_server,loop,6}},
{ancestors,
[couch_replication_manager,couch_secondary_services,
couch_server_sup,<0.31.0>]},
{messages,[{'$gen_cast',stop}]},
{links,[<0.97.0>,<0.84.0>]},
{dictionary,[]},
{trap_exit,false},
{status,runnable},
{heap_size,233},
{stack_size,9},
{reductions,32}]}]]}}
=CRASH REPORT==== 19-Oct-2011::22:03:17 ===
crasher:
initial call: couch_replication_manager:init/1
pid: <0.97.0>
registered_name: couch_replication_manager
exception exit: {noproc,{gen_server,call,[couch_httpd,{get,port}]}}
in function gen_server:terminate/6
ancestors: [couch_secondary_services,couch_server_sup,<0.31.0>]
messages: []
links: [<0.85.0>,<0.104.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 987
stack_size: 24
reductions: 1513
neighbours:
neighbour: [{pid,<0.104.0>},
{registered_name,[]},
{initial_call,{couch_event_sup,init,['Argument__1']}},
{current_function,{gen_server,loop,6}},
{ancestors,[couch_replication_manager,
couch_secondary_services,couch_server_sup,
<0.31.0>]},
{messages,[{'$gen_cast',stop}]},
{links,[<0.97.0>,<0.84.0>]},
{dictionary,[]},
{trap_exit,false},
{status,runnable},
{heap_size,233},
{stack_size,9},
{reductions,32}]
[error] [<0.85.0>] {error_report,<0.30.0>,
{<0.85.0>,supervisor_report,
[{supervisor,{local,couch_secondary_services}},
{errorContext,child_terminated},
{reason,
{noproc,
{gen_server,call,[couch_httpd,{get,port}]}}},
{offender,
[{pid,<0.97.0>},
{name,replication_manager},
{mfargs,
{couch_replication_manager,start_link,[]}},
{restart_type,permanent},
{shutdown,1000},
{child_type,worker}]}]}}
Apache CouchDB has started. Time to relax.
=SUPERVISOR REPORT==== 19-Oct-2011::22:03:17 ===
Supervisor: {local,couch_secondary_services}
Context: child_terminated
Reason: {noproc,{gen_server,call,[couch_httpd,{get,port}]}}
Offender: [{pid,<0.97.0>},
{name,replication_manager},
{mfargs,{couch_replication_manager,start_link,[]}},
{restart_type,permanent},
{shutdown,1000},
{child_type,worker}]
[info] [<0.31.0>] Apache CouchDB has started on http://127.0.0.1:5984/
[info] [<0.132.0>] Stopping all ongoing replications because the replicator database was deleted or changed
[error] [<0.132.0>] ** Generic server couch_replication_manager terminating
** Last message in was {rep_db_update,
{[{<<"seq">>,3},
{<<"id">>,<<"3115e06cdcaade38e0e748d509000196">>},
{<<"changes">>,
[{[{<<"rev">>,
<<"2-b87fdc78d7e149c8027d17aef29ff279">>}]}]},
{doc,
{[{<<"_id">>,
<<"3115e06cdcaade38e0e748d509000196">>},
{<<"_rev">>,
<<"2-b87fdc78d7e149c8027d17aef29ff279">>},
{<<"id">>,<<"by_clientId">>},
{<<"source">>,<<"dms4">>},
{<<"target">>,
<<"http://bhubint.krfs.com:5984/dms4">>},
{<<"create_target">>,true},
{<<"continuous">>,true},
{<<"filter">>,
<<"replicateFilter/clientFilter">>},
{<<"_replication_state">>,<<"triggered">>},
{<<"_replication_state_time">>,
<<"2011-10-19T21:46:53-04:00">>},
{<<"_replication_id">>,
<<"0bef042c859d21c639c26b8cea151297">>}]}}]}}
** When Server state == {state,<0.133.0>,<0.134.0>,<<"_replicator">>,[],10}
** Reason for termination ==
** {bad_return_value,{not_found,json_mismatch}}
=ERROR REPORT==== 19-Oct-2011::22:03:17 ===
** Generic server couch_replication_manager terminating
** Last message in was {rep_db_update,
{[{<<"seq">>,3},
{<<"id">>,<<"3115e06cdcaade38e0e748d509000196">>},
{<<"changes">>,
[{[{<<"rev">>,
<<"2-b87fdc78d7e149c8027d17aef29ff279">>}]}]},
{doc,
{[{<<"_id">>,
<<"3115e06cdcaade38e0e748d509000196">>},
{<<"_rev">>,
<<"2-b87fdc78d7e149c8027d17aef29ff279">>},
{<<"id">>,<<"by_clientId">>},
{<<"source">>,<<"dms4">>},
{<<"target">>,
<<"http://bhubint.krfs.com:5984/dms4">>},
{<<"create_target">>,true},
{<<"continuous">>,true},
{<<"filter">>,
<<"replicateFilter/clientFilter">>},
{<<"_replication_state">>,<<"triggered">>},
{<<"_replication_state_time">>,
<<"2011-10-19T21:46:53-04:00">>},
{<<"_replication_id">>,
<<"0bef042c859d21c639c26b8cea151297">>}]}}]}}
** When Server state == {state,<0.133.0>,<0.134.0>,<<"_replicator">>,[],10}
** Reason for termination ==
** {bad_return_value,{not_found,json_mismatch}}
....