wiredtiger/issue #20 api.txt #1

=== WiredTiger Overview:

There should be a prominent version on the main page.  I'm using a #define
of WIREDTIGER_VERSION in my wiredtiger.h, I'm guessing we'll do something
like that eventually, can doxygen pick that up?   (I'm guessing the 1.0
opposite the odd-looking little box is the version?  Anyway, I'd make
this much more prominent, and it needs to map to the source code, of
course.)   We should make all released versions of the documents available
from our web site, at some point, separately from releases -- there's
no reason our web site can't serve docs for users.
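For what it's worth, here's a minimal sketch of the header side. It assumes we split WIREDTIGER_VERSION into major/minor/patch macros and add a runtime call, so applications can check that the library they link against matches the header they compiled with. The macro names beyond WIREDTIGER_VERSION, the example_version() function and the 1.0.0 values are all placeholders, not anything we've agreed on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical version macros for wiredtiger.h.  The names extend the
 * WIREDTIGER_VERSION #define mentioned above; the values are placeholders.
 */
#define	WIREDTIGER_VERSION_MAJOR	1
#define	WIREDTIGER_VERSION_MINOR	0
#define	WIREDTIGER_VERSION_PATCH	0

/* Two-level expansion, so the macros' values are stringified, not their names. */
#define	EX_STR(x)	#x
#define	EX_XSTR(x)	EX_STR(x)

#define	WIREDTIGER_VERSION_STRING					\
	"WiredTiger " EX_XSTR(WIREDTIGER_VERSION_MAJOR) "."		\
	EX_XSTR(WIREDTIGER_VERSION_MINOR) "."				\
	EX_XSTR(WIREDTIGER_VERSION_PATCH)

/*
 * Hypothetical runtime query: lets an application compare the library it is
 * linked against with the header it was compiled with.
 */
const char *
example_version(int *majorp, int *minorp, int *patchp)
{
	if (majorp != NULL)
		*majorp = WIREDTIGER_VERSION_MAJOR;
	if (minorp != NULL)
		*minorp = WIREDTIGER_VERSION_MINOR;
	if (patchp != NULL)
		*patchp = WIREDTIGER_VERSION_PATCH;
	return (WIREDTIGER_VERSION_STRING);
}
```

I don't know whether doxygen can pull the number out of the header by itself. Feeding the same value into the Doxyfile's PROJECT_NUMBER from the build would at least keep the docs and the source in step.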

The phrase "public interface" implies there's a private interface?

"We follow SQL terminology: a database is set of tables that are managed
together. Tables logically consist of rows, each row has a key and a
value. Tables may optionally have an associated schema, which splits the
key/value pair into a set of columns. Tables may also have associated
indices, each of which is ordered by some set of columns."
->
"We follow SQL terminology: a database is a set of tables managed together.
Tables consist of rows, where each row is a key and its associated value.
Tables may optionally have an associated schema, splitting the value
into a set of columns.  Tables may also have associated indices, each of
which is ordered by one or more columns."

"WiredTiger supports column-oriented storage in addition to traditional
row-oriented storage. Instead of storing all fields from a row together,
WiredTiger can efficiently store and access sets of columns (including
single columns) separately."
->
"In addition to the traditional row-oriented storage where all columns
of a row are stored together, WiredTiger supports column-oriented storage,
where one or more columns can be stored individually, allowing more
efficient access and storage."

Should we move the rest of the "Introduction" somewhere else?  Does API
documentation normally discuss specific classes as part of the introduction?

Do we need an "Examples" paragraph given there's an "Examples" tab at
the top of the page?

I'd pad out the list of Programmer's Reference docs, that is, put a full
sentence, something like:

	+ WiredTiger Architecture
		A discussion of blah, blah, blah.
	+ Using WiredTiger
		A page for blah, blah, blah

It's a bit odd to have navigation tabs at the top, plus a list of links
in the page itself?   Maybe this is a doxygen thing, and I'm happy to
be guided by your esthetics here, but having a top-level navigation
button for "Data Structures", but not one for "API Reference" seems
backward?

Page titles are too long?   For example, the architecture page's title
is:
	<title>wiredtiger - WiredTiger Data Store API: WiredTiger
	Architecture - Code</title>

which means it won't even begin to fit in a tab's title, so all of the
pages appear to have identical tab browser names.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Related Pages

The "Related Pages" page has a link to "Using WiredTiger", but none of the
other pages listed in the Programmer's Reference?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger Architecture

Isn't there a better word for "Local Interface", ummm, "Functional API"?

We should IM a bit about the whole RPC Server thing -- you and I have
not thought about how the RPC server should fit together, but I was
thinking of using a shared memory segment to move stuff back & forth,
rather than marshalling/unmarshalling stuff to/from the RPC library (the
whole BDB RPC chunk really was a pain in the ass).   In other words, we
might not even use RPC, because we'll have a shared memory chunk we use
and that we define, and the only message passing we use is just enough
to say "there's stuff for the engine", and "the engine has returned to
the client".  As I said, I've only thought about this enough to convince
myself it was a solvable problem, nothing more concrete than that.

But, given that discussion, I'd suggest something like:


C API   <---> remote client <---> Java
              remote client <---> Python
              remote client <---> C

and then the C API block talks to the WiredTiger Engine.
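To make the shared-memory idea a little more concrete, here's a minimal single-process sketch of the handshake. It assumes one request slot and C11 atomics. The slot layout, the state names and the placeholder "work" the engine does are all made up for illustration. A real version would map the struct into a segment both processes share (for example with shm_open and mmap), and would block on something like a semaphore instead of polling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical slot states: the only "message passing" between the sides. */
enum slot_state { SLOT_IDLE, SLOT_REQUEST, SLOT_RESPONSE };

/* Hypothetical layout; in practice this lives in the shared segment. */
struct shm_slot {
	atomic_int state;
	char payload[256];	/* request body, laid out by us, not an RPC library */
	int result;		/* engine's return code */
};

/* Client: copy the request in, then announce "there's stuff for the engine". */
bool
client_post(struct shm_slot *slot, const char *request)
{
	if (atomic_load_explicit(&slot->state, memory_order_acquire) != SLOT_IDLE)
		return (false);
	strncpy(slot->payload, request, sizeof(slot->payload) - 1);
	slot->payload[sizeof(slot->payload) - 1] = '\0';
	atomic_store_explicit(&slot->state, SLOT_REQUEST, memory_order_release);
	return (true);
}

/* Engine: handle a waiting request, then announce "the engine has returned". */
bool
engine_poll(struct shm_slot *slot)
{
	if (atomic_load_explicit(&slot->state, memory_order_acquire) != SLOT_REQUEST)
		return (false);
	slot->result = (int)strlen(slot->payload);	/* placeholder work */
	atomic_store_explicit(&slot->state, SLOT_RESPONSE, memory_order_release);
	return (true);
}

/* Client: collect the reply and free the slot for the next request. */
bool
client_collect(struct shm_slot *slot, int *resultp)
{
	if (atomic_load_explicit(&slot->state, memory_order_acquire) != SLOT_RESPONSE)
		return (false);
	*resultp = slot->result;
	atomic_store_explicit(&slot->state, SLOT_IDLE, memory_order_release);
	return (true);
}
```

The two state transitions are the whole protocol. Everything else is data we lay out ourselves.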

I guess I'm reacting to the fact that I don't understand the difference
between the "C API" and the "Local Interface", and why the C API would
start a "C Client"?  Does it matter from the point of view of a programmer?

Since we no longer have a cache, does it make sense to have a cache
square?

Ditto "Access methods", since we only have 1?  Or are access methods
different flavors of row & column stores?

Here's a more general comment: because everything is focused on an
in-memory tree, we don't have any natural separation between the cache,
concurrency control and the access method(s) -- that's one big happy
glop.   Do you think txns will end up the same way?   I think logging
will continue to be separate.

"Concurency"
->
"Concurrency"

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger Design

Do we still want this in the docs?  (And, if we do, there's work to
be done.)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Table File Formats

Do we still want this in the docs?  (And, if we do, there's work to be
done.)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger API

The link on the main page is to "API Reference", so the two names should
match?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Mapping SQL onto the WiredTiger API

What's the plan/goal for this section?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Getting Started with the API

"does not exist when the program starts running."
->
"does not already exist."

It looks odd that the code in "connecting to a database" and "opening a
session" is split -- I'd make it two separate if statements, it's hard
to read the way it is.
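Something like the following is what I have in mind. Each call's return is captured in ret and tested in its own if statement. To keep it compilable without the library, stub_open and stub_open_session are hypothetical stand-ins for wiredtiger_open and WT_CONNECTION::open_session. Only the control flow is the point.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for WT_CONNECTION/WT_SESSION; not the real API. */
typedef struct conn { int open_sessions; } CONN;
typedef struct session { CONN *conn; } SESSION;

static int
stub_open(const char *home, const char *config, CONN **connp)
{
	(void)config;
	if (home == NULL)
		return (EINVAL);
	if ((*connp = calloc(1, sizeof(CONN))) == NULL)
		return (ENOMEM);
	return (0);
}

static int
stub_open_session(CONN *conn, const char *config, SESSION **sessionp)
{
	(void)config;
	if ((*sessionp = calloc(1, sizeof(SESSION))) == NULL)
		return (ENOMEM);
	(*sessionp)->conn = conn;
	conn->open_sessions++;
	return (0);
}

/* The suggested layout: one call, one "ret =", one if statement. */
int
connect_and_open(const char *home)
{
	CONN *conn;
	SESSION *session;
	int ret;

	/* Connecting to a database. */
	if ((ret = stub_open(home, "create", &conn)) != 0) {
		fprintf(stderr, "open %s: error %d\n",
		    home == NULL ? "(null)" : home, ret);
		return (ret);
	}

	/* Opening a session. */
	if ((ret = stub_open_session(conn, NULL, &session)) != 0) {
		fprintf(stderr, "open_session: error %d\n", ret);
		free(conn);
		return (ret);
	}

	/* ... use the session ... */

	free(session);
	free(conn);
	return (0);
}
```

The same "ret =" pattern is what the create_table and begin_txn calls in the example programs need.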

"The code block above also shows simple error handling with
wiredtiger_strerror. The default behavior for more detailed errors is
to write them to stderr.  The default behavior for more detailed errors
is to write them to stderr. That can be overridden by passing an
implementation of WT_ERROR_HANDLER to wiredtiger_open or
WT_CONNECTION::open_session."
->
"The code block above also shows simple error handling with
wiredtiger_strerror (a function that returns a string describing an error
code passed as its argument).  More complex error handling can be
configured by passing an implementation of WT_ERROR_HANDLER to
wiredtiger_open or WT_CONNECTION::open_session."

Michael, can we not fold the lines in this example code?  There's no
reason to do so, given that the browser window is wider?

I'm still unhappy that set_key & set_value can fail.   It's just going
to be a total pain in the ass to handle errors in C.  If we can't remove
all possible failures, I think we need to simply store an "I failed"
error in the cursor structure, which is checked/returned when the real
function (in this case cursor->insert) is called.  There's no performance
penalty in doing that, and a huge gain for application writers.
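Here's a minimal sketch of the "store the failure in the cursor" idea. It uses a made-up cursor with fixed-size string keys and values. set_key/set_value return nothing, the first failure is remembered, and insert reports it and then clears it. None of these names are the real WT_CURSOR methods.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical cursor: set_key/set_value return nothing; the first failure
 * is remembered in saved_err and reported by the next real operation.
 */
struct cursor {
	char key[16];
	char value[16];
	int saved_err;		/* first error from set_key/set_value, or 0 */
	int nrecords;		/* stand-in for the table's contents */
};

/* Copy src into dst unless an error is already pending or src won't fit. */
static void
copy_field(struct cursor *c, char *dst, size_t dstlen, const char *src)
{
	if (c->saved_err != 0)
		return;			/* keep the first error */
	if (src == NULL || strlen(src) >= dstlen) {
		c->saved_err = EINVAL;
		return;
	}
	strcpy(dst, src);
}

void
cursor_set_key(struct cursor *c, const char *key)
{
	copy_field(c, c->key, sizeof(c->key), key);
}

void
cursor_set_value(struct cursor *c, const char *value)
{
	copy_field(c, c->value, sizeof(c->value), value);
}

/* The only call the application has to check. */
int
cursor_insert(struct cursor *c)
{
	int ret;

	if ((ret = c->saved_err) != 0) {
		c->saved_err = 0;	/* report once, then the cursor is usable */
		return (ret);
	}
	c->nrecords++;			/* placeholder for the real insert */
	return (0);
}
```

Clearing the saved error when it's reported, rather than on the next set_key, is a judgment call. Either way the application checks exactly one return value per operation.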

"marshal" -> "marshall"

I realize set_{key,value} and get_{key,value} take variable arguments;
is there a reason we couldn't list those arguments in the cursor->insert
call?   I guess I'm asking, isn't cursor->insert(cursor, <random stuff>)
semantically equivalent to calling set_key & set_value separately?  Or,
maybe a better way to ask: if set_key/value have to figure out what's
being passed as arguments to them, why can't insert do that same magic,
whatever it is?   Ditto get_{key,value} & cursor->next.

Anyway, what I'm arguing, in general, is to put extra effort into making
things simple for application writers; that's more important than avoiding
magic underneath the covers -- and I think our current get/put API is
more complicated to write to, and handle errors from, than BDB's.
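Here's a rough sketch of what "insert does the same magic" could mean. It assumes the cursor knows its key and value formats, using single characters here: 'S' for a string and 'i' for an int. kv_insert and the format handling are hypothetical. The point is that the argument decoding set_key/set_value already have to do works just as well inside a single insert call.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cursor with one-character key and value formats. */
struct kv_cursor {
	char key_format, value_format;	/* 'S' = string, 'i' = int */
	char key[32], value[32];	/* items rendered as text for simplicity */
};

/* Consume one argument of the given format from *ap into buf. */
static int
take_arg(char fmt, va_list *ap, char *buf, size_t len)
{
	switch (fmt) {
	case 'S':
		snprintf(buf, len, "%s", va_arg(*ap, const char *));
		return (0);
	case 'i':
		snprintf(buf, len, "%d", va_arg(*ap, int));
		return (0);
	default:
		return (EINVAL);	/* unsupported format character */
	}
}

/* insert(cursor, key..., value...): set_key and set_value in one call. */
int
kv_insert(struct kv_cursor *c, ...)
{
	va_list ap;
	int ret;

	va_start(ap, c);
	ret = take_arg(c->key_format, &ap, c->key, sizeof(c->key));
	if (ret == 0)
		ret = take_arg(c->value_format, &ap, c->value, sizeof(c->value));
	va_end(ap);
	return (ret);
}
```

The obvious cost, in either design, is that the compiler can't type-check variadic arguments against the format.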

"If we weren't using the cursor for the call to WT_CURSOR::insert above,
this loop would simplify to:"
->
"Because the cursor was positioned in the table after the WT_CURSOR::insert
call, we had to re-position it using the WT_CURSOR::first call; if we
weren't using the cursor for the call to WT_CURSOR::insert above, this
loop would simplify to:"

Are we using object::method as a standard?   Is there a standard?  I
recall that BDB docs used object.method.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Configuration Strings

I still disagree with "If the "=<value>" part is omitted, the value of
1 is assumed."   (1) I doubt that there are enough defaults of 1 that
this is a significant win, (2) it makes it easier for programmers
to make a mistake, and (3) it makes it hard for maintenance programmers
to figure out what's going on.
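To make the concern concrete, here's a toy lookup over a flat "key=value,key=value" string. It treats a bare key as "=1" only when asked to. The config_get function, the allow_bare switch and the example keys ("create", "cache_size") are illustrative assumptions. Nested lists and quoting are ignored.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Look up key in a flat "k1=v1,k2,k3=v3" string and copy its value out.
 * A bare "k2" reads as "k2=1" only if allow_bare is set; otherwise it is
 * rejected with EINVAL.  Returns ENOENT if the key is absent.
 */
int
config_get(const char *config, const char *key, int allow_bare,
    char *value, size_t len)
{
	size_t keylen = strlen(key);
	const char *item = config;

	for (;;) {
		const char *end = strchr(item, ',');
		size_t itemlen = end == NULL ? strlen(item) : (size_t)(end - item);
		const char *v = NULL;
		size_t vlen = 0;

		if (itemlen >= keylen && strncmp(item, key, keylen) == 0) {
			if (itemlen == keylen) {		/* bare "key" */
				if (!allow_bare)
					return (EINVAL);
				v = "1";
				vlen = 1;
			} else if (item[keylen] == '=') {	/* "key=value" */
				v = item + keylen + 1;
				vlen = itemlen - keylen - 1;
			}					/* else a longer key */
		}
		if (v != NULL) {
			if (vlen >= len)
				return (ENOMEM);
			memcpy(value, v, vlen);
			value[vlen] = '\0';
			return (0);
		}
		if (end == NULL)
			return (ENOENT);
		item = end + 1;
	}
}
```

With allow_bare off, a bare "create" fails loudly instead of silently meaning "create=1".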

"Values may be nested lists, for example:"
->
Why did we switch to Python?  That confused me for a minute, especially
the sudden appearance of parentheses.

"10MiB" -> "10MB"

"priority to a transaction to reduce aborts"
->
"priority to a transaction"

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger API

Should wiredtiger_struct_pack (or some flavor of it) allocate and return
a size plus a buffer?  That might make it easier to build apps, since you
wouldn't have to figure out the size yourself.

Or, maybe the way to ask this question: why does wiredtiger_struct_size
(wiredtiger_struct_sizev) need arguments other than the format string?
Presumably you're calling wiredtiger_struct_size{v} to allocate memory
for the chunk -- why not just let WT allocate (and possibly resize?)
the buffer for you?
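Here's a sketch of the allocate-for-you version. It assumes a two-character format language: 'i' packs an int as 4 native-order bytes and 'S' packs a NUL-terminated string. pack_alloc is hypothetical, not wiredtiger_struct_pack. It also hints at why a size function presumably needs the values and not just the format: string lengths depend on the arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Walk the format once: with buf == NULL only measure, otherwise also write. */
static int
pack_walk(const char *format, va_list ap, unsigned char *buf, size_t *sizep)
{
	size_t off = 0;
	const char *f;

	for (f = format; *f != '\0'; f++)
		switch (*f) {
		case 'i': {
			int32_t v = (int32_t)va_arg(ap, int);
			if (buf != NULL)
				memcpy(buf + off, &v, sizeof(v));
			off += sizeof(v);
			break;
		}
		case 'S': {
			const char *s = va_arg(ap, const char *);
			size_t n = strlen(s) + 1;
			if (buf != NULL)
				memcpy(buf + off, s, n);
			off += n;
			break;
		}
		default:
			return (EINVAL);
		}
	*sizep = off;
	return (0);
}

/* Measure, allocate exactly that much, then pack; the caller frees *bufp. */
int
pack_alloc(void **bufp, size_t *sizep, const char *format, ...)
{
	va_list ap, ap2;
	unsigned char *buf;
	size_t size;
	int ret;

	*bufp = NULL;
	*sizep = 0;
	va_start(ap, format);
	va_copy(ap2, ap);			/* second pass needs its own copy */
	if ((ret = pack_walk(format, ap, NULL, &size)) == 0) {
		if ((buf = malloc(size == 0 ? 1 : size)) == NULL)
			ret = ENOMEM;
		else if ((ret = pack_walk(format, ap2, buf, &size)) == 0) {
			*bufp = buf;
			*sizep = size;
		} else
			free(buf);
	}
	va_end(ap2);
	va_end(ap);
	return (ret);
}
```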

Some of the items aren't sorted?  (This looks like groups of laundry
lists to me, which means they should all be sorted?)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Packing and Unpacking Data

You've got an XXX here, obviously this one is still in motion.

I'd move "packing & unpacking" after schemas; you don't need pack/unpack
until you have columns, right?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Schemas

"*schema*"
->
bold, maybe?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Sharing Between Processes

I'd go with "multiprocess=on" not "sharing=on".

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Data Structures

It occurs to me, we have xxxOR three times, and xxxTION once.   Maybe
WT_CONNECTION should be WT_CONNECTOR?   WT_SESSOR doesn't make sense,
though.   *shrug*

"The WT_CURSOR struct is the interface to a cursor"
->
"The WT_CURSOR object (handle?) is the interface to a cursor"

This happens in a few places; we should probably search for "struct" and
consistently switch to object or handle.  For example, there's a page
"WT_CURSOR Struct Reference".

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_CURSOR Struct Reference

Do we need language on the resulting cursor position after each method?

We need language that "on failure", cursor position is undetermined.
Apps that care need to either dup the cursor, or we could offer a config
string that dups on all ops for you?

I don't think the list of Examples really helps -- it's kind of a laundry
list.  I'd suggest a single Example that looks something like:

	Example:

	/* Close the cursor. */
	ret = cursor->close(cursor);

Obviously, it's more complicated for more complicated methods, but
something to show use on every method will help us in the field, I
believe.  I expect we need an example for each configuration string,
too, can we stuff that into the config string explanations?

Since we're using method names for next, prev and so on, shouldn't we
use method names for the exactp argument to search, that is, search
(exact match), range_next (smallest key larger than search key), range_prev
(largest key smaller than search key)?  I would argue exact matches are
what almost all apps want (BDB didn't add search_range for a long time,
IIRC), so why make programmers know about something they won't care
about?   That gets rid of the "exact >= 0" magic in ex_call_center.c,
for example.
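Here's a sketch of the three-method version. A sorted array of unique int keys stands in for a table. tbl_search, tbl_range_next and tbl_range_prev are hypothetical names that follow the search/range_next/range_prev suggestion. Each returns a position, or -1 if there's no match.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a table: sorted, unique int keys. */
struct sorted_table { const int *keys; size_t n; };

/* Index of the first key >= k, or n if there is none. */
static size_t
lower_bound(const struct sorted_table *t, int k)
{
	size_t lo = 0, hi = t->n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (t->keys[mid] < k)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo);
}

/* search: exact match only. */
long
tbl_search(const struct sorted_table *t, int k)
{
	size_t i = lower_bound(t, k);

	return (i < t->n && t->keys[i] == k ? (long)i : -1);
}

/* range_next: smallest key larger than k. */
long
tbl_range_next(const struct sorted_table *t, int k)
{
	size_t i = lower_bound(t, k);

	if (i < t->n && t->keys[i] == k)
		i++;
	return (i < t->n ? (long)i : -1);
}

/* range_prev: largest key smaller than k. */
long
tbl_range_prev(const struct sorted_table *t, int k)
{
	size_t i = lower_bound(t, k);		/* first key >= k */

	return (i > 0 ? (long)(i - 1) : -1);
}
```

The common case, tbl_search, never exposes a "how close was it" result to the caller.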

There's no wording on the behavior of insert/update in the face of
existing/non-existing records.

I didn't see anything to deal with duplicate sets?  (All of BDB's
prev-dup, next-dup, no-next-dup blah, blah, blah.)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_SESSION Struct Reference

Do we need a salvage table op?

If add_schema can be used to change schema, maybe set_schema is better?

Why is checkpoint here, as opposed to being a WT_CONNECTION method?

Is it standard that processes can't have multiple txns going at the same
time?   Or, I guess you just open multiple sessions, OK.  The txn_begin
method needs to fail if you have one already, that's a programming error.
Ditto rollback_txn if there's no txn.
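Here's a minimal sketch of those checks. It assumes one transaction per session and EINVAL as the "programming error" return. Both are assumptions, and the function names are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-session state: at most one transaction at a time. */
struct txn_session { bool in_txn; };

int
session_begin_txn(struct txn_session *s)
{
	if (s->in_txn)
		return (EINVAL);	/* already in a transaction */
	s->in_txn = true;
	return (0);
}

int
session_commit_txn(struct txn_session *s)
{
	if (!s->in_txn)
		return (EINVAL);	/* nothing to commit */
	s->in_txn = false;
	return (0);
}

int
session_rollback_txn(struct txn_session *s)
{
	if (!s->in_txn)
		return (EINVAL);	/* nothing to roll back */
	s->in_txn = false;
	return (0);
}
```

Applications that want concurrent transactions would open more sessions.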

"column set" feels like an undefined term to me.

The trailing "(multiple)" wording isn't clear.

Using [xxx] for the default isn't clear.   Can't we explicitly call out
the default (for example: the isolation level for this transaction, one of
"serializable", "snapshot", "read-committed" or "read-uncommitted";
default is "snapshot")?

"old log files"
->
"log files no longer needed for transactional durability"

Why have 0/1 for checkpoint "archive", "force", flush_cache or flush_log?
Have a default, and if you want to change it, the keyword changes it?

Ditto WT_SESSION::create_tables "exclusive" keyword.   Why does it need
two values?  It has only one possible meaning ("I want exclusive
access").

Parameters to configuration strings aren't sorted?

I think the "overwrite" keyword should be a per-operation flag, not a
per-cursor flag.  I'd go with 3 method names, myself: insert, insert_update,
update?
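Here's a sketch of the three-method semantics over a tiny in-memory table. It assumes insert fails with EEXIST if the key exists, update fails with ENOENT if it doesn't, and insert_update always succeeds (space permitting). The names and error codes are illustrative. This is also the existing/non-existing behavior the insert/update docs currently don't describe.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define	MAXREC	8

/* Tiny stand-in table keyed by string; not the WiredTiger API. */
struct table {
	char keys[MAXREC][16];
	int values[MAXREC];
	int n;
};

static int
tbl_find(const struct table *t, const char *key)
{
	int i;

	for (i = 0; i < t->n; i++)
		if (strcmp(t->keys[i], key) == 0)
			return (i);
	return (-1);
}

/* insert: the key must not already exist. */
int
tbl_insert(struct table *t, const char *key, int value)
{
	if (tbl_find(t, key) != -1)
		return (EEXIST);
	if (strlen(key) >= sizeof(t->keys[0]))
		return (EINVAL);
	if (t->n == MAXREC)
		return (ENOSPC);
	strcpy(t->keys[t->n], key);
	t->values[t->n++] = value;
	return (0);
}

/* update: the key must already exist. */
int
tbl_update(struct table *t, const char *key, int value)
{
	int i = tbl_find(t, key);

	if (i == -1)
		return (ENOENT);
	t->values[i] = value;
	return (0);
}

/* insert_update: overwrite if present, insert if not. */
int
tbl_insert_update(struct table *t, const char *key, int value)
{
	return (tbl_update(t, key, value) == 0 ? 0 : tbl_insert(t, key, value));
}
```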

If we're going to allow the dup'd cursor to change stuff (for example,
the encoding), should we have a more general interface?  Maybe open_cursor
should take an optional dup-cursor argument, where stuff gets copied
from an existing cursor, but you can also change stuff.  For example,
"open_cursor" has a "dup" config string, but dup_cursor doesn't allow
you to change that behavior in the duplicated cursor.  Rather than have
the dup-cursor method track the open-cursor's arguments (and have to
explain which ones can be overridden), a single method might be simpler,
where the WT_CURSOR *entry arg "initializes the opened cursor to reference
the same table entry as the specified cursor, with the same modes as the
specified cursor, but modified by the new cursor's configuration"?
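Here's a minimal sketch of the single-method approach. It assumes a cursor is just a position plus a couple of modes. The struct layouts, the "packed"/"raw" encodings and the -1 "not specified" convention are hypothetical. The new cursor starts as a copy of the one being duplicated, and any option the caller sets explicitly wins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor: a position plus the modes it was opened with. */
struct cursor {
	long position;		/* -1 = unpositioned */
	char encoding[16];	/* illustrative: "packed" or "raw" */
	int overwrite;
};

/* Hypothetical explicit options; NULL or -1 mean "not specified". */
struct cursor_opts {
	const char *encoding;
	int overwrite;
};

/*
 * open_cursor with an optional cursor to duplicate: start from defaults or
 * from dup's position and modes, then apply what the caller configured.
 */
int
open_cursor(const struct cursor *dup, const struct cursor_opts *opts,
    struct cursor *out)
{
	if (opts != NULL && opts->encoding != NULL &&
	    strlen(opts->encoding) >= sizeof(out->encoding))
		return (EINVAL);

	if (dup != NULL)
		*out = *dup;			/* same entry, same modes */
	else {
		out->position = -1;
		strcpy(out->encoding, "packed");
		out->overwrite = 0;
	}
	if (opts != NULL) {
		if (opts->encoding != NULL)
			strcpy(out->encoding, opts->encoding);
		if (opts->overwrite != -1)
			out->overwrite = opts->overwrite;
	}
	return (0);
}
```

With one method there's a single list of options to document, and no separate list of which ones a dup may override.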

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_CONNECTION Struct Reference

WT_CONNECTION::load_extension -- I wouldn't support a default; maintenance
programmers will hate it (I changed the name, and now it doesn't work?).
Force the programmer to set the name.

Is it useful to be able to set the error handler per session?  I was
expecting apps to set an error handler outside of wiredtiger (so, it's
a function that can't fail), and then it would be used for the life of
the app.  Then WT_CONNECTION::open_session doesn't need the argument.
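Here's a sketch of the app-wide handler. It assumes a plain function pointer that is installed once at startup, before any threads start, and then used by every session. set_error_handler, report_error and record_handler are hypothetical. The point is that installing the handler is a single assignment that can't fail.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical process-wide error handler type. */
typedef void (*error_handler_fn)(int error, const char *message);

static void
default_handler(int error, const char *message)
{
	fprintf(stderr, "error %d: %s\n", error, message);
}

static error_handler_fn current_handler = default_handler;

/* Install (or, with NULL, reset) the handler: an assignment, so it can't fail. */
void
set_error_handler(error_handler_fn fn)
{
	current_handler = fn != NULL ? fn : default_handler;
}

/* What the library would call internally to report an error. */
void
report_error(int error, const char *message)
{
	current_handler(error, message);
}

/* Example application handler: remember the last error instead of printing. */
static int last_error;

static void
record_handler(int error, const char *message)
{
	(void)message;
	last_error = error;
}
```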

I must be misunderstanding something: I don't see any functions that
create the WT_CURSOR_FACTORY or WT_ERROR_HANDLER handles?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_COLLATOR struct reference

Why does the compare function need a WT_SESSION handle, wasn't that
copied into the WT_COLLATOR structure when it was created?

And how do you create a WT_COLLATOR structure?

And how do you specify different collators for the keys and duplicate
values?  (I'm missing the connection between WT_COLLATORs and the table
create?)

I think most of these comments apply to the WT_EXTRACTOR Struct Ref
page as well.
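Here's a sketch of a collator whose compare function takes no session handle. Any context (here a made-up ignore_case flag) is captured when the collator is created, and creation is a plain initializer. The struct layouts and collator_create are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical item: a pointer and a length. */
struct item { const void *data; size_t size; };

/* Hypothetical collator: compare() sees only the collator and two items. */
struct collator {
	int (*compare)(struct collator *, const struct item *, const struct item *);
	int ignore_case;	/* context captured at creation */
};

/* Byte-wise compare, optionally case-insensitive; shorter prefix sorts first. */
static int
bytes_compare(struct collator *c, const struct item *a, const struct item *b)
{
	const unsigned char *pa = a->data, *pb = b->data;
	size_t i, n = a->size < b->size ? a->size : b->size;

	for (i = 0; i < n; i++) {
		int ca = pa[i], cb = pb[i];

		if (c->ignore_case) {
			ca = tolower(ca);
			cb = tolower(cb);
		}
		if (ca != cb)
			return (ca < cb ? -1 : 1);
	}
	return (a->size == b->size ? 0 : (a->size < b->size ? -1 : 1));
}

/* Creating a collator: just an initializer, nothing that can fail. */
struct collator
collator_create(int ignore_case)
{
	struct collator c = { bytes_compare, ignore_case };

	return (c);
}
```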

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Examples

We should say what each example is intended to show.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== ex_config.c

We probably need "ret =" in front of the create_table & begin_txn
statements.  (These lines were copied to a couple of places in
other example programs.)

Oh, and the top of ex_transaction.c says "ex_hello.c", the top of
ex_thread.c says "ex_access.c", ex_schema.c says "ex_column.c".

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Random comments --

If you put the Copyright statement on a line by itself, it makes it
easier to automatically upgrade them.

## issue 20 api.txt
=== WiredTiger Overview:

There should be a prominent version on the main page.  I'm using a #define
of WIREDTIGER_VERSION in my wiredtiger.h, I'm guessing we'll do something
like that eventually, can doxygen pick that up?   (I'm guessing the 1.0
opposite the odd-looking little box is the version?  Anyway, I'd make
this much more prominent, and it needs to map to the source code, of
course.)   We should make all released versions of the documents available
from our web site, at some point, separately from releases -- there's
no reason our web site can't serve docs for users.

The phrase "public interface" implies there's a private interface?

"We follow SQL terminology: a database is set of tables that are managed
together. Tables logically consist of rows, each row has a key and a
value. Tables may optionally have an associated schema, which splits the
key/value pair into a set of columns. Tables may also have associated
indices, each of which is ordered by some set of columns."
->
"We follow SQL terminology: a database is set of tables managed together.
Tables consist of rows, where each row is a key and its associated value.
Tables may optionally have an associated schema, splitting the value
into a set of columns.  Tables may also have associated indices, each of
which is ordered by one or more columns."

"WiredTiger supports column-oriented storage in addition to traditional
row-oriented storage. Instead of storing all fields from a row together,
WiredTiger can efficiently store and access sets of columns (including
single columns) separately.
->
"In addition to the traditional row-oriented storage where all columns
of a row are stored together, WiredTiger supports column-oriented storage,
where one or more columns can be stored individually, allowing more
efficient access and storage."

Should we move the rest of the "Introduction" somewhere else?  Does API
documentation normally discuss specific classes as part of the introduction?

Do we need an "Examples" paragraph given there's an "Examples" tab at
the top of the page?

I'd pad out the list of Programmer's Reference docs, that is, put a full
sentence, something like:

	+ WiredTiger Architecture
		A discussion of blah, blah, blah.
	+ Using WiredTiger
		A page for blah, blah, blah

It's a bit odd to have navigation tabs at the top, plus a list of links
in the page itself?   Maybe this is a doxygen thing, and I'm happy to
be guided by your esthetics here, but having a top-level navigation
button for "Data Structures", but not one for "API Reference" seems
backward?

Page titles are too long?   For example, the architecture page's title
is:
	<title>wiredtiger - WiredTiger Data Store API: WiredTiger
	Architecture - Code</title>

which means it won't even begin to fit in a tab's title, so all of the
pages appear to have identical tab browser names.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Related Pages

The "Related Pages" page has a link to "Using WiredTiger", but none of the
other pages listed in the Programmer's Reference?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger Architecture

Isn't there a better word for "Local Interface", ummm, "Functional API"?

We should IM a bit about the whole RPC Server thing -- you and I have
not thought about how the RPC server should fit together, but I was
thinking of using a shared memory segment to move stuff back & forth,
rather than marshalling/unmarshalling stuff to/from the RPC library (the
whole BDB RPC chunk really was a pain in the ass).   In other words, we
might not even use RPC, because we'll have a shared memory chunk we use
and that we define, and the only message passing we use is just enough
to say "there's stuff for the engine", and "the engine has returned to
the client".  As I said, I've only thought about this enough to convince
myself it was a solveable problem, nothing more concrete that that.

But, given that discussion, I'd suggest something like:


C API   <---> remote client <---> Java
              remote client <---> Python
	      remote client <---> C

and then the C API block talks to the WiredTiger Engine.

I guess I'm re-acting to the fact that I don't understand the difference
between the "C API" and the "Local Interface", and why the C API would
start a "C Client"?  Does it matter from the point of view of a programmer?

Since we no longer have a cache, does it make sense to have a cache
square?

Ditto "Access methods", since we only have 1?  Or are access methods
different flavors of row & column stores?

Here's a more general comment: because everything is focused on an
in-memory tree, we don't have any natural separate between the cache,
concurrency control and the access method(s) -- that's one big happy
glop.   Do you think txns will end up the same way?   I think logging
will continue to be separate.

"Concurency"
->
"Concurrency"

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger Design

Do we still want this in the docs?  (And, if we do, there's work to
be done.)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Table File Formats

Do we still want this in the docs?  (And, if we do, there's work to be
done.)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger API

The link on the main page is to "API Reference", so the two names should
match?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Mapping SQL onto the WiredTiger API

What's the plan/goal for this section?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Getting Started with the API

"does not exist when the program starts running."
->
"does not already exist."

It looks odd that the code in "connecting to a database" and "opening a
session" is split -- I'd make it two separate if statements, it's hard
to read the way it is.

"The code block above also shows simple error handling with
wiredtiger_strerror. The default behavior for more detailed errors is
to write them to stderr.  The default behavior for more detailed errors
is to write them to stderr. That can be overridden by passing an
implementation of WT_ERROR_HANDLER to wiredtiger_open or
WT_CONNECTION::open_session."
->
"The code block above also shows simple error handling with
wiredtiger_strerror (a function that returns a string describing an error
code passed as its argument).  More complex error handling can be
configured by passing an implementation of WT_ERROR_HANDLER to
wiredtiger_open or WT_CONNECTION::open_session."

Michael, can we not fold the lines in this example code?  There's no
reason to do so, given that the browser window is wider?

I'm still unhappy that set_key & set_value can fail.   It's just going
to be a total pain in the ass to handle errors in C.  If we can't remove
all possible failures, I think we need to simply store an "I failed"
error in the cursor structure, which is checked/returned when the real
function (in this case cursor->insert) is called.  There's no performance
penalty in doing that, and a huge gain for application writers.

"marshal" -> "marshall"

I realize set_{key,value} and get_{key,value} take variable arguments;
is there a reason we couldn't list those arguments in the cursor->insert
call?   I guess I'm asking, isn't cursor->insert(cursor, <random stuff>)
semantically equivalent to calling set_key & set_value separately?  Or,
maybe a better way to ask: if set_key/value have to figure out what's
being passed as arguments to them, why can't insert do that same magic,
whatever it is?   Ditto get_{key,value} & cursor->next.

Anyway, what I'm arguing, in general, is to put extra effort into making
things simple for application writers, it's more important than avoiding
magic underneath the covers -- and I think our current get/put API is
more complicated to write to, and handle errors from, than BDB's.

"If we weren't using the cursor for the call to WT_CURSOR::insert above,
this loop would simplify to:"
->
"Because the cursor was positioned in the table after the WT_CURSOR::insert
call, we had to re-position it using the WT_CURSOR::first call; if we
weren't using the cursor for the call to WT_CURSOR::insert above, this
loop would simplify to:"

Are we using object::method as a standard?   Is there a standard?  I
recall that BDB docs used object.method.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Configuration Strings

I still disagree with "If the "=<value>" part is omitted, the value of
1 is assumed."   (1) I doubt that there are enough defaults of 1 that
this is a significant win, and (2) it makes it easier for programmer's
to make a mistake, and (3) it makes it hard for maintenance programmers
to figure out what's going on.

"Values may be nested lists, for example:"
->
Why did we switch to Python?  That confused me for a minute, especially
the sudden appearance of parenthesis.

"10MiB" -> "10MB"

"priority to a transaction to reduce aborts"
->
"priority to a transaction"

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger API

Should wiredtiger_struct_pack (or some flavor of it) allocate and return
a size plus a buffer?  That might make it easier to build apps, you don't
have to figure out the size yourself.

Or, maybe the way to ask this question: why does wiredtiger_struct_size
(wiredtiger_struct_sizev) need arguments other than the format string?
Presumably you're calling wiredtiger_struct_size{v} to allocate memory
for the chunk -- why not just let WT allocate (and possibly resize?)
the buffer for you?

Some of the items aren't sorted?  (This looks like groups of laundry
lists to me, which means they should all be sorted?)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Packing and Unpacking Data

You've got an XXX here, obviously this one is still in motion.

I'd move "packing & unpacking" after schemas, you don't need pack/unpack
until you have columns, right?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Schemas

"*schema*"
->
bold, maybe?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Sharing Between Processes

I'd go with "multiprocess=on" not "sharing=on".

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Data Structures

It occurs to me, we have xxxOR three times, and xxxTION once.   Maybe
WT_CONNECTION should be WT_CONNECTOR?   WT_SESSOR doesn't make sense,
though.   *shrug*

"The WT_CURSOR struct is the interface to a cursor"
->
"The WT_CURSOR object (handle?) is the interface to a cursor"

This happens in a few places, we should probably search for "struct" and
consistently switch to object or handle, for example, there's a page
"WT_CURSOR Struct Reference".

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_CURSOR Struct Reference

Do we need language on the resulting cursor position after each method?

We need language that "on failure", cursor position is undetermined.
Apps that care need to either dup the cursor, or we could offer a config
string that dups on all ops for you?

I don't think the list of Examples really helps -- it's kind of a laundry
list.  I'd suggest a single Example that looks something like:

	Example:

	/* Close the cursor. */
	ret = cursor->close(cursor);

Obviously, it's more complicated for more complicated methods, but
something to show use on every method will help us in the field, I
believe.  I expect we need an example for each configuration string,
too, can we stuff that into the config string explanations?

Since we're using method names for next, prev and so on, shouldn't we
use method names for the exactp argument to search, that is, search
(exact match), range_next (smallest key larger than search key), range_prev
(largest key smaller than search key)?  I would argue exact matches are
what almost all apps want (BDB didn't add search_range for a long time,
IIRC), so why make programmers know about something they won't care
about?   That gets rid of the "exact >= 0" magic in ex_call_center.c,
for example.

There's no wording on the behavior of insert/update in the face of
existing/non-existing records.

I didn't see anything to deal with duplicate sets?  (All of BDB's
prev-dup, next-dup, no-next-dup blah, blah, blah.)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_SESSION Struct Reference

Do we need a salvage table op?

If add_schema can be used to change schema, maybe set_schema is better?

Why is checkpoint here, as opposed to being a WT_CONNECTION method?

Is it standard that processes can't have multiple txns going at the same
time?   Or, I guess you just open multiple sessions, OK.  The txn_begin
method needs to fail if you have one already, that's a programming error.
Ditto rollback_txn if there's no txn.

"column set" feels like an undefined term to me.

The trailing "(multiple)" wording isn't clear.

Using [xxx] for the default isn't clear.   Can't we explicitly call out
the default ("the isolation level for this transaction "serializable"
or "snapshot" or "read-committed" or "read-uncommitted"; default is
"snapshot").

"old log files"
->
"log files no longer needed for transactional durability"

Why have 0/1 for checkpoint "archive", "force", flush_cache or flush_log?
Have a default, and if you want to change it, the keyword changes it?

Ditto WT_SESSION::create_tables "exclusive" keyword.   Why do we need
it to have two values, it has only one possible meaning ("I want exclusive
access").

Parameters to configuration strings aren't sorted?

I think the "overwrite" keyword should be a per-operation flag, not a
per-cursor flag.  I'd go with 3 method names, myself: insert, insert_update,
update?

If we're going to allow the dup'd cursor to change stuff (for example,
the encoding), should we have a more general interface?  Maybe open_cursor
needs takes an optional dup-cursor argument, where stuff gets copied
from an existing cursor, but you can also change stuff.  For example,
"open_cursor" has a "dup" config string, but dup_cursor doesn't allow
you to change that behavior in the duplicated cursor.  Rather than have
the dup-cursor method track the open-cursor's arguments (and have to
explain which ones can be over-ridden), a single method might be simpler,
where the WT_CURSOR *entry arg "initializes the opened cursor to reference
the same table entry as the specified cursor, with the same modes as the
specified cursor, but modified by the new cursor's configuration"?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_CONNECTION Struct Reference

WT_CONNECTION::load_extention -- I wouldn't support a default, maintenance
programmers will hate it (I changed the name, and now it doesn't work?).
Force the programmer to set the name.

Is it useful to be able to set the error handler per session?  I was
expecting apps to set an error handler outside of wiredtiger (so, it's
a function that can't fail), and then it would be used for the life of
the app.  Then WT_CONNECTION::open_session doesn't need the argument.

I must be misunderstanding something: I don't see any functions that
create the WT_CURSOR_FACTORY or WT_ERROR_HANDLER handles?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_COLLATOR struct reference

Why does the compare function need a WT_SESSION handle, wasn't that
copied into the WT_COLLATOR structure when it was created?

And how do you create a WT_COLLATOR structure?

And how do you specify different collators for the keys and duplicate
values?  (I'm missing the connection between WT_COLLATORs and the table
create?)

I think most of these comments apply to the WT_EXTRACTOR Struct Ref
page as well.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Examples

We should say what each example is intended to show.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== ex_config.c

We probably need "ret =" in front of the create_table & begin_txn
statements.  (These lines were copied to a couple of places in
other example programs.)

Oh, and the top of ex_transaction.c says "ex_hello.c", the top of
ex_thread.c says "ex_access.c", ex_schema.c says "ex_column.c".

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Random comments --

Putting the Copyright statement on a line by itself makes it easier
to update automatically.