@mikeal
Created December 27, 2013 18:25
CouchDB attachments manta thread
https://twitter.com/izs/status/416635696367951873
Let's talk here; we have comments with more than 140 characters!
@indutny

indutny commented Dec 27, 2013

Delay is not that important (if it is < 10-15 minutes) :)

@mikeal

mikeal commented Dec 27, 2013

@izs This will work provided that you write a custom comparison between R1 and R2 documents, because the _rev will be pretty worthless. You'll also need to run constant compaction on R1 if you actually want to keep the disk size down.

@janl

janl commented Dec 27, 2013

  1. No.
  2. No.
  3. No-ish, except that the code path for a local installation then would be different from a regular setup. Special care needs to be taken that this doesn’t break in the future. I’m not too worried about this not working right away, since it used to be the default for so long.

For the viability of R2, as you know, keep close contact with the CouchDB folks through dev@ or JIRA (preferred) or IRC or me personally, but you all know that.

@janl

janl commented Dec 27, 2013

On the R1->R2 migrator script: that’s the one I haven’t managed to think through yet, but it seems to me the part of this most likely to make it fail.

Not to bikeshed too much:

An alternate setup could be a node “proxy” or couch plugin for offline users that knows how to mirror & update attachments from the CDN and serves them as a “local” or “offline” CDN. The proxy would rewrite dist.tarball to point at the “local CDN”, and npm is none the wiser.

This would allow using any sort of [distributed] binary update setup, including rsync and BitTorrent. That seems like more work for now, but might help down the road.

Then R2 is not really required any more.

@janl

janl commented Dec 27, 2013

Addendum: the whole motivation for the “alternate setup” is to avoid having to have R2.

And an added benefit is that npm only needs to know about one type of operation as the remote and local setups would look alike.

@isaacs

isaacs commented Dec 27, 2013

@mikeal

this will work provided that you write custom comparison between R1 and R2 documents because the _rev will be pretty worthless. you'll also need to run constant compaction on R1 if you actually want to keep the disc size down.

The comparison between R1 and R2 docs will be pretty simple, because the replication job will be the only thing ever writing to R2. Basically: "Are there attachments for all the versions? If not, go get them, and slap them on."
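A minimal sketch of that check in Python, assuming the R2 doc is a dict shaped like a registry document (`name`, `versions`, `_attachments`) and that each tarball is attached as `<name>-<version>.tgz`, the naming used in the Manta layout. The function name is hypothetical, and it only reports what is missing; fetching the tarballs and attaching them is left to the replication job.

```python
from __future__ import annotations


def missing_tarballs(doc: dict) -> list[str]:
    """Return the tarball attachment names an R2 doc still lacks.

    Assumption: one attachment per version, named "<name>-<version>.tgz".
    """
    name = doc["name"]
    attached = set(doc.get("_attachments", {}))
    # Every published version should have a matching tarball attachment.
    expected = [f"{name}-{version}.tgz" for version in doc.get("versions", {})]
    return [att for att in expected if att not in attached]


doc = {
    "name": "foo",
    "versions": {"1.0.0": {}, "1.1.0": {}},
    "_attachments": {"foo-1.0.0.tgz": {}},
}
print(missing_tarballs(doc))  # ['foo-1.1.0.tgz']
```

Because the replication job is the only writer to R2, an empty result is enough to treat a doc as up to date, without consulting `_rev`.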

@janl

the code path for a local installation then would be different from a regular setup. Special care needs to be taken that this doesn’t break in the future. I’m not too worried about this not working right away, since it used to be the default for so long.

What's the difference between a "local installation" and a "regular setup"? By "regular setup", do you mean the public registry?

For the viability of R2, as you know, keep close contact with the CouchDB folks through dev@ or JIRA (preferred) or IRC or me personally, but you all know that.

I do, you guys are great :)

@isaacs

isaacs commented Dec 27, 2013

@janl

... offline cdn ..
Then R2 is not really required any more.

R2 is still required for existing replicators to be minimally impacted. And, in fact, if R2 can be found at the current CouchDB location, then so much the better.

That proxy would be cool, though, and I'm sure people could benefit from it. See also:

http://npm.im/npm-proxy
http://npm.im/npm-registry-proxy
http://npm.im/npmd

@ everybody

Thank you for the feedback in this discussion. It's really helped to clarify the needs here. I'll make sure that your use cases keep working, and be sure to give you a heads up if you have to change anything.

@janl

janl commented Dec 27, 2013

@izs

What's the difference between a "local installation" and a "regular setup"? By "regular setup", do you mean the public registry?

Yes, let’s use “remote” and “local” to mean the public registry and any local/offline copy respectively.

I thought about it some more, and the fact that dist.tarball is a URL makes the code path the same for old npm, npm+Manta, or any offline copy, EXCEPT for the logic that transparently follows up a Manta 404 with a GET to the R1 document/attachment, which we can assume exists. (That logic exists to avoid race conditions while attachments are moved, i.e. copied & deleted, to Manta.) So ignore my earlier comment about different code paths.

The only difference would be that the fallback would not work with an offline copy of R2 (in the original proposal), because there is no attachment that should return 404 (other errors aside), whereas in my proposal the fallback has no place to fall back to that is also offline. But it could arguably fall back to the online registry, since at that point the registry would be online anyway.
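The fallback janl describes amounts to roughly the following. This is a Python illustration of the logic only, not the actual CDN configuration; the `fetch` callable (URL in, `(status, body)` out) and both URLs are hypothetical stand-ins.

```python
from typing import Callable, Tuple

# Hypothetical fetcher interface: takes a URL, returns (HTTP status, body).
Fetch = Callable[[str], Tuple[int, bytes]]


def fetch_tarball(fetch: Fetch, manta_url: str, r1_attachment_url: str) -> Tuple[int, bytes]:
    """Try Manta first; on a 404, fall back to the R1 CouchDB attachment."""
    status, body = fetch(manta_url)
    if status == 404:
        # The tarball may not have been copied to Manta yet, but R1
        # still holds it as an attachment, so retry there.
        return fetch(r1_attachment_url)
    # Any other status (success or a different error) is passed through.
    return status, body
```

Only a 404 triggers the retry; other errors are returned unchanged, so a real outage is not masked by the fallback.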

I’m not sure if I am making things more or less complicated now.

@janl

janl commented Dec 27, 2013

R2 is still required for existing replicators to be minimally impacted. And, in fact, if R2 can be found at the current CouchDB location, then so much the better.

Fair enough :)

@konobi

konobi commented Dec 27, 2013

How about a plain HTTP/FTP folder, with tarballs having redirects and the json of the couchdb dumped into a 'json' folder? Then it's just a plain sync and script to read in the json (or another non-couch server to serve up the info).

@isaacs

isaacs commented Dec 27, 2013

@janl

The only difference would be that the fallback would not work with an offline copy of R2....
I’m not sure if I am making things more or less complicated now.

Yes, I <3 you, but now you are definitely making things more complicated :)

That fallback is written in VCL, and exists only on the Fastly CDN config. From the outside, if you're not looking at the http headers, you wouldn't know that npm wasn't fetching from Manta, or through a CDN. npm-the-client just fetches $registry/$pkg, and then follows the dist.tarball url in the doc, and expects it to be a tgz file matching the dist.shasum in the doc. It is completely server-architecture agnostic, other than that. (As proof: this is how it already works, and has for days now, and you're totally ok with it :)
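Sketched in Python, the client side of that contract looks roughly like this. The injected `fetch` callable (URL in, bytes out) is a hypothetical stand-in for an HTTP GET, error handling is simplified, and `dist.shasum` is treated as the hex SHA-1 digest of the tarball.

```python
import hashlib
import json
from typing import Callable


def get_verified_tarball(fetch: Callable[[str], bytes], registry: str, pkg: str, version: str) -> bytes:
    """Fetch $registry/$pkg, follow dist.tarball, and check dist.shasum."""
    doc = json.loads(fetch(f"{registry}/{pkg}"))
    dist = doc["versions"][version]["dist"]
    # The client never cares where the tarball URL points (CouchDB, CDN, Manta).
    tarball = fetch(dist["tarball"])
    actual = hashlib.sha1(tarball).hexdigest()
    if actual != dist["shasum"]:
        raise ValueError(f"shasum mismatch for {pkg}@{version}: {actual} != {dist['shasum']}")
    return tarball
```

Since the only server-side requirements are the doc URL and a tarball whose SHA-1 matches, any backend (or offline mirror) that satisfies them is interchangeable from the client's point of view.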

The bottom line of this inquiry seems to be that I can do exactly what I want, and it's just going to cost me having a second DB setup so that replication users can keep doing their thing. Also, either the current isaacs.iriscouch.com/registry couch can be the canonical R2, or can replicate from it, and no one will even notice the change.

@isaacs

isaacs commented Dec 28, 2013

@konobi

How about a plain HTTP/FTP folder, with tarballs having redirects and the json of the couchdb dumped into a 'json' folder? Then it's just a plain sync and script to read in the json (or another non-couch server to serve up the info).

That's roughly what's sitting in Manta under /isaacs/public/npm. You can even mount this with NFS, I'm told, though I haven't tried myself, and a folder with >52,000 entries might be rough on some systems :)

The format is:

/isaacs/public/npm
+-- $package_name
|   +-- doc.json - The same as the couchdb document
|   `-- _attachments
|       +-- $package_name-$version.tgz - Tarball attachments
|       `-- more tarballs...
`-- more packages...
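The tree above maps a package name and version to storage paths in a predictable way. A small Python sketch of that mapping, assuming tarballs are always named `$package_name-$version.tgz` as shown; the function names are hypothetical, and the public HTTP host for these paths is not given here, so it is left out.

```python
import posixpath

MANTA_ROOT = "/isaacs/public/npm"


def doc_path(name: str) -> str:
    """Path of the package's doc.json (same content as the CouchDB document)."""
    return posixpath.join(MANTA_ROOT, name, "doc.json")


def tarball_path(name: str, version: str) -> str:
    """Path of one version's tarball under _attachments."""
    return posixpath.join(MANTA_ROOT, name, "_attachments", f"{name}-{version}.tgz")


print(doc_path("express"))               # /isaacs/public/npm/express/doc.json
print(tarball_path("express", "3.4.7"))  # /isaacs/public/npm/express/_attachments/express-3.4.7.tgz
```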

@till

till commented Dec 30, 2013

@isaacs Along with what @konobi said: how feasible are static files that I can use as a local registry? I'd like to minimize my operational overhead. On my end, that means something like nginx serving all the files from a directory. For upstream, people could wget on a regular basis, which would make running mirrors a lot, lot easier as well.

Also, I've never heard of Manta: what exactly is this? Joyent's version of S3? Also, is there anything to see and try yet?

@isaacs

isaacs commented Jan 2, 2014

@till

Manta is http://www.joyent.com/products/manta. There's an SDK and docs. Because the files are in my public storage location, you can read them, link them to your space, run jobs over them, etc.

This much is already done. As of this moment, tarballs are being served through Fastly from Manta, not from CouchDB. They still are in CouchDB as attachments, but unless you're hitting the CouchDB endpoint directly (or if Manta doesn't have the file yet), you will be getting them from Manta.

It's probably not trivial to use static files as a local registry, at least not one that you could write to. I haven't explored that.
