@warner
Created January 6, 2011 21:04
Accounting notes
* Ostrom Accounting
** apply Elinor Ostrom's eight "design principles" (of stable local
common pool resource [CPR] management) from
http://en.wikipedia.org/wiki/Elinor_Ostrom
- 1. Clearly defined boundaries (effective exclusion of external
unentitled parties);
- 2. Rules regarding the appropriation and provision of common
resources are adapted to local conditions;
- 3. Collective-choice arrangements allow most resource appropriators
to participate in the decision-making process;
- 4. Effective monitoring by monitors who are part of or accountable
to the appropriators;
- 5. There is a scale of graduated sanctions for resource
appropriators who violate community rules;
- 6. Mechanisms of conflict resolution are cheap and of easy access;
- 7. The self-determination of the community is recognized by
higher-level authorities;
- 8. In the case of larger common-pool resources: organization in the
form of multiple layers of nested enterprises, with small local
CPRs at the base level.
** starting point: know who (self-described) is using your storage
space. Know whose storage space you are using. Make this information
clearly visible to everyone involved (i.e. everyone knows that
everyone else knows, etc).
*** tahoe.cfg:storage gets some new flags:
- accounting=enabled
- this turns on the lease-owner DB. Existing shares are marked
'anonymous'. New shares that arrive through the old
RIStorageServer interface are labeled according to the TubID of
the other end of the connection. New shares that arrive through
the new RIAccountableStorageServer interface are labeled
according to the account under which that interface object was
created (see below).
- accounting=required
- this reads "storage-accounts.txt" for a list of accounts. Each
contains a pubkey, a petname, and maybe some additional
information (either local notes, or self-describing data sent by
the privkey holder)
- the RIStorageServer interface no longer accepts shares. Only
RIAccountableStorageServer accepts them.
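A minimal sketch of how a node might read the proposed flag, assuming it lives in a `[storage]` section and that a missing flag means accounting is off (the name "disabled" and both helper functions are hypothetical, not existing Tahoe code):

```python
import configparser

# Hypothetical tahoe.cfg fragment using the proposed flag.
SAMPLE_CFG = """
[storage]
accounting = required
"""

VALID_MODES = ("disabled", "enabled", "required")


def accounting_mode(cfg_text):
    """Return the accounting mode named in a tahoe.cfg-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # ASSUMPTION: an absent flag means "disabled"; the notes do not say.
    mode = parser.get("storage", "accounting", fallback="disabled")
    if mode not in VALID_MODES:
        raise ValueError("unknown accounting mode: %r" % (mode,))
    return mode


def anonymous_uploads_allowed(mode):
    """With accounting=required, the old RIStorageServer stops accepting shares."""
    return mode != "required"
```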
*** tahoe.cfg:client gets some new flags
- actually it needs to be in private/ somewhere
- add a privkey. If present, clients will connect to
RIStorageServer, then attempt to upgrade to
RIAccountableStorageServer by sending a signed upgrade request
- clients do all their storage ops through the
RIAccountableStorageServer, which causes their shares to be
labeled
- RIAccountableStorageServer also includes get-my-total-usage
methods
*** the welcome page gets a new control panel
- not sure if it needs to be user-private or not
- storage-server panel:
- contains lists of accounts that are consuming your storage
- if accounting=required, add buttons to freeze/thaw the account,
cautious button to delete all shares
- client panel:
- contains lists of servers that are holding your shares
- combo "grid" panel:
- contains both, correlated
*** maybe broadcast channel of activity
- daily, maybe at first hourly digest of aggregate usage
- "Bob uploaded 62MB of data". "Alice downloaded 146MB of data"
- "Bob is currently using 3.5GB of storage space"
- "Alice is currently hosting 4.2GB of shares and has 0.8GB free"
- also include new-server, new-client events
- "Carol joined the grid, offering 3.0GB of storage space"
- "Dave invited Edgar to join the grid"
- and server-admin actions
- "Carol froze Bob's shares: dude, you're using too much"
- "David deleted Alice's shares: you unfriended me on facebook so
I'm deleting all your data"
- also generalized chat
- "Bob says: anyone up for pizza tonight?"
*** storage server needs a new crawler
- or the existing LeaseCrawler needs some new features
- shares contain canonical lease info, but local
who-is-consuming-what and remote get-my-total-usage methods need
pre-generated totals
- once usage DB is complete, new shares are added at time of upload
- but we must be able to generate/regenerate usage DB from just the
shares (er, just shares plus table of ownerid->account data, since
share.lease.ownerid field is too small)
- should I punt and go to SQLite for this? hard, given that the
share files are canonical: you could have a crawler that updates
the SQLite DB, then get usage info by doing a SUM(), but both feel
expensive.
- usage doesn't need to be super accurate.
- crawler can keep a separate table for each prefixdir
- 1024 * numusers
- tell crawler when a lease is added or removed, it +/- the number
from that table
- when the crawler cycles around, the count can be made accurate
- bouncing the server will lose the counting work done on the
current bucket, so it will need to restart.
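The per-prefixdir counting idea could look roughly like this. It is only a sketch: the class and method names are made up, and the tables live in memory rather than in whatever persistent store the crawler ends up using.

```python
from collections import defaultdict


class PrefixUsage:
    """Approximate per-owner usage, kept as one small table per prefixdir."""

    def __init__(self):
        # prefix -> ownerid -> bytes
        self._tables = defaultdict(lambda: defaultdict(int))

    def lease_added(self, prefix, ownerid, size):
        # Called at upload time, between crawler visits.
        self._tables[prefix][ownerid] += size

    def lease_removed(self, prefix, ownerid, size):
        self._tables[prefix][ownerid] -= size

    def crawler_finished_prefix(self, prefix, exact_totals):
        # When the crawler cycles around, replace the running count for
        # this prefix with totals measured from the share files.
        self._tables[prefix] = defaultdict(int, exact_totals)

    def total(self, ownerid):
        # Sum one owner's usage across all prefix tables.
        return sum(t.get(ownerid, 0) for t in self._tables.values())
```

Usage stays cheap to update and cheap to query, at the cost of being slightly wrong until the crawler revisits a prefix.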
*** RIStorageServer gets new upgrade method
- accepts a signed request, returns RIAccountableStorageServer facet
- request needs to be scoped correctly: server1 should not be able
to get Alice's facet on server2. Request should include serverid.
- if #466 lands, we can add new keys to the "storage" service
announcement. Redefine "FURL" to mean "anonymous
RIStorageServer", and add maybe.. "login.furl" to be the login
desk (no, I don't like "login" either).
- .login(request) -> RIAccountableStorageServer or error
- "request" is [msg, sig, pubkey]
- "msg" is JSON-encoded dict of: {serverid=base32serverid,
clientid=base32keyid}
- servers only accept requests that contain their own serverid,
and for which clientid matches the pubkey
- We'll add other fields to this later, for certchains or
transitive introductions or whatever.
- for transitive introductions, request may also contain
recommendations / certchains / introduction path
- upgrade method may fail when server doesn't like the client
- might be a temporary failure: the upgrade request might get
elevated to the storage server admin for approval. Might want "try
again later (at time=T)" response code.
- storage requests to RIAccountableStorageServer might fail if
server-admin freezes or cancels the account. get-my-total-usage
should keep working in many cases.
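A sketch of the server-side scoping checks on a [msg, sig, pubkey] request. Two things here are assumptions: the clientid is taken to be base32(SHA-256(pubkey)), which the notes do not specify, and signature checking is delegated to a caller-supplied `verify_sig(pubkey, sig, msg)` callable so the sketch does not depend on any particular ECDSA library.

```python
import base64
import hashlib
import json


class LoginRejected(Exception):
    pass


def keyid_for(pubkey_bytes):
    # ASSUMPTION: keyid derivation is not specified in the notes.
    digest = hashlib.sha256(pubkey_bytes).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def make_login_msg(serverid, pubkey_bytes):
    """Client side: build the JSON message that will be signed."""
    fields = {"serverid": serverid, "clientid": keyid_for(pubkey_bytes)}
    return json.dumps(fields, sort_keys=True).encode("utf-8")


def check_login(request, my_serverid, verify_sig):
    """Server side: return the clientid if the request is acceptable."""
    msg, sig, pubkey = request
    if not verify_sig(pubkey, sig, msg):
        raise LoginRejected("bad signature")
    fields = json.loads(msg.decode("utf-8"))
    # server1 must not be able to replay Alice's request to server2
    if fields.get("serverid") != my_serverid:
        raise LoginRejected("request is scoped to a different server")
    if fields.get("clientid") != keyid_for(pubkey):
        raise LoginRejected("clientid does not match pubkey")
    return fields["clientid"]
```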
** step two is to make this easier to configure
- Invitations
- transitive introductions
- account managers
- pay-for-storage
- tit-for-tat
** step three is to resolve the issues that blocked us in the past
- repairer: who pays for the new share?
- sub-accounts, delegation, allmydata partners
- public webapi node: extending accounting beyond node and through
webapi/WUI: when Bob uses a public WUI, how can his shares be
counted against his quota instead of the webapi operator's?
** details
*** RIStorageServer upgrade method
- it gets you an AccountDesk object (need better name)
- ideally we'd expose this object directly in the announcement,
rather than going through the legacy RIStorageServer, but having
an upgrade method allows new-client/new-server/old-introducer
*** AccountDesk (now called "Accountant")
- stern accountant type: cold, uncaring, strictly follows the rules
- you must beg for access to your safe deposit box
- maybe a magic share-moving wand, but tagged with a color that
you can't scrape off
**** "please give me access to my account", maybe .login()
- returns Account object
- requires proof of ownership of an account
- input is ECDSA-signed(rxFURL). We expect rxFURL to be on the
client's tub, rather than being a gift, but that's not necessary.
getReference(rxFURL)->granted(account_object)
- account is based on ECDSA signing key, so login requires a data
transfer, which requires the FURL back-reference
- actual method is:
msg = JSON.encode({"please-give-me-access-to-my-account-v1": rxFURL}).encode("utf-8")
account = login(msg, sig, pubkey)
- (not safe to make it a raw signature: make a distinct purpose)
- error is either returned to login() as exception, or to rxFURL
as rejected() method
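The "distinct purpose" point can be sketched as below: the signed bytes are a JSON dict whose single key names the purpose, so a signature made for login cannot be mistaken for a signature over anything else. The helper names are hypothetical, and the rxFURL in the usage line is a placeholder string.

```python
import json

PURPOSE = "please-give-me-access-to-my-account-v1"


def build_login_message(rx_furl):
    """Bytes the client signs; the purpose tag is the only dict key."""
    return json.dumps({PURPOSE: rx_furl}).encode("utf-8")


def parse_login_message(msg):
    """Return the rxFURL, refusing anything that is not a v1 login message."""
    fields = json.loads(msg.decode("utf-8"))
    if not isinstance(fields, dict) or set(fields) != {PURPOSE}:
        raise ValueError("not a v1 account-access request")
    return fields[PURPOSE]


# placeholder FURL, for illustration only
msg = build_login_message("pb://tubid@example:1234/swissnum")
```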
**** Account object
- includes RIStorageServer methods, but scoped to one account
- also includes additional methods
*** lease crawler
- we want efficient updates of a table mapping from ownerid to
(allshares, sizeof_allshares)
- but the canonical data for that table lies in the (flat) share
files. The shares can be changed externally. It must accommodate
startup (shares but no table) and spontaneous loss of the table.
- so table should be regenerated/refreshed periodically. we can
tolerate inaccuracy as long as the time is bounded.
**** sqlite lease tables with generation numbers
- CREATE TABLE leases (prefix, si, ownerid, size, generation)
- 'size' is denormalized but probably helpful
- maybe include more data about the lease: sharing factor,
expiration time
- CREATE TABLE usage (prefix, ownerid, totalsize)
- CREATE TABLE lease_generations (prefix, complete_gen, new_gen)
- when quiescent, new_gen=NULL
- queries work against (leases where generation =
lease_generations[prefix].complete_gen)
- new/updated leases are added/changed in both gen=.complete_gen
and gen=.new_gen . Deleted leases are removed from both. If
.new_gen=NULL, only use .complete_gen . Figure size delta against
.complete_gen and inc/dec usage[prefix,ownerid].totalsize
- when the crawler starts on prefix "aa":
- lease_generations[prefix].new_gen = .complete_gen+1
- walk a chunk of shares, add lease data to .new_gen
- when prefix is done:
- update lease_generations[prefix].complete_gen = .new_gen
- lease_generations[prefix].new_gen = NULL
- DELETE leases[prefix, generation < complete_gen]
- build set of ownerids used in this prefix
- foreach ownerid, sum usage across all leases in prefix,
update usage[prefix,ownerid]
- when Account wants usage: SUM usage[prefix=*,ownerid]
- when Account wants list of all shares, SELECT si FROM leases
WHERE ownerid=OWNERID AND generation = (SELECT complete_gen FROM
lease_generations WHERE lease_generations.prefix=leases.prefix)
- or something like that. Expensive, yeah. Cheaper if we allow
deletions to lag and just use SELECT DISTINCT si FROM leases
WHERE ownerid=OWNERID : if they've deleted a share but the
crawler hasn't noticed yet..
- oh, yeah, just do that. If the crawler has caught up, any share
deletions will also be removed from leases[] (both
generations), so we'll be good. If a share has been deleted
out-of-band (i.e. admin does 'rm SHARE'), we'll be wrong until
the next cycle.
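A rough sqlite3 sketch of the generation scheme, using the three tables above. It only covers the crawler side (start a prefix, walk shares, finish the prefix) plus the two Account queries. Live lease add/remove during a crawl is left out, and the function names are invented for illustration.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE leases (prefix, si, ownerid, size, generation);
        CREATE TABLE usage (prefix, ownerid, totalsize);
        CREATE TABLE lease_generations (prefix PRIMARY KEY, complete_gen, new_gen);
    """)
    return db


def start_prefix(db, prefix):
    # First visit creates the row with complete_gen=0; new_gen is NULL when quiescent.
    db.execute("INSERT OR IGNORE INTO lease_generations VALUES (?, 0, NULL)", (prefix,))
    db.execute("UPDATE lease_generations SET new_gen = complete_gen + 1 "
               "WHERE prefix = ?", (prefix,))


def crawl_share(db, prefix, si, ownerid, size):
    # Lease data found on disk goes into the generation being built.
    (new_gen,) = db.execute("SELECT new_gen FROM lease_generations "
                            "WHERE prefix = ?", (prefix,)).fetchone()
    db.execute("INSERT INTO leases VALUES (?, ?, ?, ?, ?)",
               (prefix, si, ownerid, size, new_gen))


def finish_prefix(db, prefix):
    # Promote new_gen to complete_gen, drop older rows, rebuild usage totals.
    db.execute("UPDATE lease_generations SET complete_gen = new_gen, new_gen = NULL "
               "WHERE prefix = ?", (prefix,))
    db.execute("DELETE FROM leases WHERE prefix = ? AND generation < "
               "(SELECT complete_gen FROM lease_generations WHERE prefix = ?)",
               (prefix, prefix))
    db.execute("DELETE FROM usage WHERE prefix = ?", (prefix,))
    db.execute("INSERT INTO usage SELECT prefix, ownerid, SUM(size) FROM leases "
               "WHERE prefix = ? GROUP BY prefix, ownerid", (prefix,))
    db.commit()


def account_usage(db, ownerid):
    (total,) = db.execute("SELECT COALESCE(SUM(totalsize), 0) FROM usage "
                          "WHERE ownerid = ?", (ownerid,)).fetchone()
    return total


def account_shares(db, ownerid):
    rows = db.execute("SELECT DISTINCT si FROM leases WHERE ownerid = ? "
                      "ORDER BY si", (ownerid,))
    return [si for (si,) in rows]
```

A share that vanished from disk simply never gets a row in the new generation, so it drops out of both the share list and the usage total when the prefix finishes.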
** trying to accommodate future modes
- bitcoin, other payment schemes, reciprocity
- when accounting is enabled but permissive (measure-not-prohibit),
accounts are created on-demand.
*** use RIStorageServer.get_version() to advertise accounting support
- and if accounting/v1 is present, advertise specific modes
- Ostrom-accounting is a required part of accounting/v1
- bitcoin is an optional feature
- accounting/v1 means do RIStorageServer.get_accountant() and then
forget about the initial (anonymous) RIStorageServer rref. Then
use the RIAccountant to get the RIAccount. Do everything else with
the account
- you can always get an account on demand, but it may not be able
to do anything. The server is not obligated to allocate anything
or remember your account until you e.g. add a lease.
*** RIAccount has some basic methods, maybe more if features are enabled
**** get_messages() -> dict
- these are messages that should be displayed to the user
- ["message"] should always be displayed: human-readable welcome or
warning message
- unrecognized keys are displayed, recognized keys are *not*
- e.g. {"bitcoin": "We accept bitcoin! See URL for details"}
- server's opportunity to teach client's user about new features
- if client knows about bitcoin, message is unnecessary
- message needs to be short (fits in small UI space). message
needs to be safe (no arbitrary HTML). maybe let each message
contain a small summary and a larger explanation, or a summary
and a URL (and hard-render the URL as a "learn more" link).
- maybe both ["warning"] and ["message"], display warning in red
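One way the client might pick which messages to show, as a sketch: "message" and "warning" are always shown, other keys only when the client does not recognize the feature. Escaping with `html.escape` is my assumption about how "safe (no arbitrary HTML)" could be enforced.

```python
import html

ALWAYS_SHOWN = ("message", "warning")


def messages_to_display(messages, recognized_keys):
    """Return (key, escaped_text) pairs the UI should render, sorted by key."""
    shown = []
    for key in sorted(messages):
        # A client that already knows about e.g. bitcoin does not need
        # the server's explanation of it.
        if key in ALWAYS_SHOWN or key not in recognized_keys:
            shown.append((key, html.escape(str(messages[key]))))
    return shown
```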
**** get_status() -> dict {write:bool, read:bool, save:bool}
- both Ostrom-mode and BTC-mode share notion of account-status
- when all is well, clients can read and write as much as they
like
- if server admin gets annoyed, or they don't pay, account is
frozen: uploads are rejected but downloads are still allowed
- if they're really annoyed, downloads are rejected too, but
shares are retained
- ultimate punishment is to delete the shares
- (WRS) goes from TTT -> FTT -> FFT -> FFF
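The graduated sanctions can be written down as an ordered list of (write, read, save) states. This is just a sketch; `escalate` and `as_wrs` are illustrative names.

```python
from collections import namedtuple

Status = namedtuple("Status", ["write", "read", "save"])

# Ordered from "all is well" to "shares deleted".
SANCTIONS = [
    Status(True, True, True),     # TTT: normal
    Status(False, True, True),    # FTT: frozen, uploads rejected
    Status(False, False, True),   # FFT: downloads rejected, shares retained
    Status(False, False, False),  # FFF: shares deleted
]


def escalate(status):
    """Move one step down the sanction scale, stopping at FFF."""
    i = SANCTIONS.index(status)
    return SANCTIONS[min(i + 1, len(SANCTIONS) - 1)]


def as_wrs(status):
    """Render a status in the notes' WRS shorthand."""
    return "".join("T" if flag else "F" for flag in status)
```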
**** get_usage() -> dict {stored:int}
- sum of sizes of all shares on which this account has a lease
- should decide who pays for overhead, how it's recorded.. maybe
have two numbers
- leave room for other forms of usage
- in particular bandwidth: bytes in/out over last month (need way
to express time units)
**** get_bitcoin_data -> dict (only if bitcoin_v1 is advertised)
- price (BTC per byte-second, AWS $0.10/GB-month is 1fBTCpBS)
- current pre-paid balance
- lifetime of current usage (balance/(usage*price)=time)
- insert-coin address
- actually, maybe provide get_bitcoin_address() for this. If
bitcoin had a sort of "PO Number" label in the transactions
(which can be done by jamming an unused string into the
scriptSig, which would be tolerated by all clients, but it's
not standardized so clients don't provide APIs in or out), then
the sender could put their clientid in it, and receivers would
watch for transactions.
- but they don't. Easiest approach I can think of is to tell
each client a different bitcoin address, used only for them.
- Having a get_bitcoin_address() would let servers create
them lazily, on demand, rather than as soon as client
connects. Hm, on second thought, this doesn't win much.
- the Accountant needs to know how much BTC has arrived. It
only needs this when checking the books, so maybe once a
week. It could ask the bitcoin client for how much BTC is in
the owning BC-account, but eventually that BTC will be
transferred elsewhere. So really it needs a transaction log.
"bitcoind listreceivedbyaccount" might do it, but there is no
txnid field to distinguish between subsequent payments.
- anyways, there should be a payment dance, initiated by a
pay_bitcoin() method, which returns some information that needs
to be passed to the bitcoin client. Ideally the bitcoin txn is
included in the foolscap message, so the receiver can validate
it right away. If not, the receiver makes a persistent note to
expect the txn, and starts polling the bitcoin client for the
money.
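The lifetime figure follows from the units: price is BTC per byte-second, so usage times price is a burn rate in BTC per second, and the balance divided by that rate is the time remaining. A small sketch (the function name is hypothetical):

```python
def lifetime_seconds(balance_btc, stored_bytes, price_btc_per_byte_second):
    """Seconds until the pre-paid balance runs out at the current usage."""
    burn_rate = stored_bytes * price_btc_per_byte_second  # BTC per second
    if burn_rate == 0:
        return float("inf")  # nothing stored, balance never drains
    return balance_btc / burn_rate
```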
**** set_nickname
- provide Ostrom-mode data about ourselves to the server
- meant for a human to see and consider
- not trusted beyond the Ostrom sense
**** usual share methods: allocate_bucket, get_bucket, add_lease
** known-storage-servers UI page
- this is on the client, showing the servers it knows about
*** each storage server row has a field for the common properties
- server message
- write/read/save status
- current usage
*** specific accounting modes provide additional fields/columns
**** when bitcoin is present on both sides:
- show price, current balance, lifetime
- show "pay for storage" field, with suggested BTC amount and
"Spend!" button
- if sending BTC doesn't increase quota right away, the field needs
to provide a pacifier message
- show previous payment history
**** reciprocity: show space their client is using on us
- blurs line between them-as-servers and them-as-clients. The
read/write/save mode we enforce on them should probably be
displayed next to the space they're using, and then the
freeze/thaw/delete buttons should go there too. And the bandwidth
they've consumed. Tricky.
*** show clientid here, so you can copy-and-paste it to servers
- like when you send them an email saying "please let me store
shares on your server"
** reciprocity
*** storage servers advertise the clientid that they benefit
- Bob's storage server will advertise Bob's clientid
- the rent-a-friend paid server will advertise a clientid for
whoever hired them: when Bob pays for the server, Bob's client
gets the benefits.
- not sure how to share a server between multiple friends: maybe
the advertisement should say 30% Bob, 40% Alice, etc.
*** client-side StorageFarmBroker tracks client usage
- it remembers that it has stored 2GB on server A
*** Accountant asks StorageFarmBroker for reciprocity benefit
- when considering how to treat client A, it asks broker about
clientid A
- broker checks all the servers it knows it is using, finds ones
that benefit clientid A, adds our usage on them, reports total to
Accountant
- Accountant deducts reciprocity benefit from client's total when
deciding if they're overquota or not
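A sketch of the reciprocity calculation, assuming each known server advertises a percentage split of who benefits (the "30% Bob, 40% Alice" idea) and that the broker remembers our usage on each server. The dict layout and function names are made up for illustration.

```python
def reciprocity_benefit(clientid, servers):
    """Bytes we store on servers that benefit clientid, weighted by their share.

    servers: list of dicts with 'our_usage' (bytes) and
             'benefits' (clientid -> integer percent).
    """
    total = 0
    for server in servers:
        percent = server["benefits"].get(clientid, 0)
        total += server["our_usage"] * percent // 100
    return total


def is_over_quota(client_usage, quota, benefit):
    # The Accountant deducts the reciprocity benefit before comparing.
    return client_usage - benefit > quota
```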
** DEV PLAN
*** DONE clientkey generation
*** DONE connection upgrade, signed rxFURL
*** DONE Account wrapper
*** DONE real Accountant.get_account(): need table of accountid->ownernum
- must be persistent
*** DONE send nickname to Account, other identifying data
*** lease crawler, db update, space totaler
*** DONE fake space-totaler numbers
*** DONE server-side space-consumed-per-client-account status display [5/5]
**** DONE nickname
**** DONE clientid
**** DONE current connection status (from address)
**** DONE last-heard-from time
**** DONE first created
*** DONE RIAccount get-space-i'm-using methods
*** DONE client-side retrieve-space-i'm-using, add to webui status display
*** DONE client-side show-status webui
*** DONE client-side show-server-message webui
*** client-side Account object, push message to it, instead of polling
when rendering status WUI page
*** server-side accounting controls: accounting=required, list of pubkeys
*** server-side status/control webui: "freeze" button
*** clean up client-side which-servers-i'm-using webui display
- add last-share-refused notice, and size of the request that was
rejected. This gives you an idea of which servers are full. Greg
Troxel's suggestion from the list, 04-Jan-2011.
*** then start playing with fake bitcoin controls
*** settle on format for client keys
- currently pub-v0-bo7uxkjfuu4zpqpfmzsknogorrvnyfxgs5nc776ddsbaoxswqt47owmnyof75jbzi6zr74cb4hoos
- might want to trim "pub-v0-"
- might want to add "client-pub-v0-" instead?
*** client-side usage tracker: hard
*** rearrange patches to move util.keyutil into #466
*** decide about client.key and server.key (same? separate?)
**** change to use NIST256p, not 192p
** Tahoe leasecrawler
[2010-12-29 Wed 20:59]
- take prodnet share catalog, turn into sqlite db, check performance
- sum size per ownernum
- total size. Guessing 1M shares: 40MB plus index
- pretend 1k owners. Assign each share 1.11 owners (all get 1, 10%
get 2, 1% get 3)
- (SI, ownernum, size)
- index on SI, index on ownernum
- Crawler: at start of each prefix, remove any db rows for which
there are no shares
- for each on-disk bucket, add/remove rows to match disk
- do all space-used-per-ownernum calculations on demand, from db
- db is derived from shares. Manually deleting a share will cause
wrong space usage number until crawler comes around. Same for
adding a share.
- for friendnets (few ownernums), getting size of all accounts is quick
- for prodnet (lots of ownernums), this could be expensive. Consider
using a separate thread, or separate process (account-manager
process, interacts through db and fs)
- move lease-expiration duties out of the crawler over to the db. Do
the expiration check on each prefix just after it finishes. Maybe
add "expires-at" column to db, add a status display showing
histogram of share ages.
- do lease expiration manually via control panel. Panel shows
histogram of shares, ages, cutoff threshold as vertical line,
button to delete everything left of line. This would be a better
way to explain the current summary text on the lease-crawler page.
- db coherence: all client-triggered share ops touch both file and db
in the same turn. Crawler: at start of prefix, compute
set(db)-set(os.listdir) and remove those from db in the same turn.
Then over subsequent turns, scan each bucket and make db match.
That should maintain coherence.
- include flag to say whether all prefixes have been scanned at least
once. Status display should clearly say "incomplete" until this
flag is set.
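A sketch of the start-of-prefix cleanup and the on-demand space query, using the (SI, ownernum, size) layout above. The table name `shares` is invented, and the prefix is assumed to be the first two characters of the SI. The caller passes in the names found on disk (for example the result of `os.listdir(prefix_dir)`).

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE shares (si, ownernum, size);
        CREATE INDEX shares_si ON shares (si);
        CREATE INDEX shares_ownernum ON shares (ownernum);
    """)
    return db


def remove_vanished(db, prefix, on_disk_names):
    """Delete db rows in this prefix whose shares are no longer on disk."""
    in_db = {si for (si,) in db.execute(
        "SELECT DISTINCT si FROM shares WHERE substr(si, 1, 2) = ?", (prefix,))}
    vanished = in_db - set(on_disk_names)
    for si in vanished:
        db.execute("DELETE FROM shares WHERE si = ?", (si,))
    db.commit()
    return vanished


def space_used(db, ownernum):
    """Computed on demand from the db, as proposed above."""
    (total,) = db.execute("SELECT COALESCE(SUM(size), 0) FROM shares "
                          "WHERE ownernum = ?", (ownernum,)).fetchone()
    return total
```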