@wiredtiger
Created January 31, 2012 13:04
=== WiredTiger Overview:
There should be a prominent version on the main page. I'm using a #define
of WIREDTIGER_VERSION in my wiredtiger.h, I'm guessing we'll do something
like that eventually, can doxygen pick that up? (I'm guessing the 1.0
opposite the odd-looking little box is the version? Anyway, I'd make
this much more prominent, and it needs to map to the source code, of
course.) We should make all released versions of the documents available
from our web site, at some point, separately from releases -- there's
no reason our web site can't serve docs for users.
The phrase "public interface" implies there's a private interface?
"We follow SQL terminology: a database is set of tables that are managed
together. Tables logically consist of rows, each row has a key and a
value. Tables may optionally have an associated schema, which splits the
key/value pair into a set of columns. Tables may also have associated
indices, each of which is ordered by some set of columns."
->
"We follow SQL terminology: a database is a set of tables managed together.
Tables consist of rows, where each row is a key and its associated value.
Tables may optionally have an associated schema, splitting the value
into a set of columns. Tables may also have associated indices, each of
which is ordered by one or more columns."
"WiredTiger supports column-oriented storage in addition to traditional
row-oriented storage. Instead of storing all fields from a row together,
WiredTiger can efficiently store and access sets of columns (including
single columns) separately."
->
"In addition to the traditional row-oriented storage where all columns
of a row are stored together, WiredTiger supports column-oriented storage,
where one or more columns can be stored individually, allowing more
efficient access and storage."
Should we move the rest of the "Introduction" somewhere else? Does API
documentation normally discuss specific classes as part of the introduction?
Do we need an "Examples" paragraph given there's an "Examples" tab at
the top of the page?
I'd pad out the list of Programmer's Reference docs, that is, put a full
sentence, something like:
+ WiredTiger Architecture
  A discussion of blah, blah, blah.
+ Using WiredTiger
  A page for blah, blah, blah.
It's a bit odd to have navigation tabs at the top, plus a list of links
in the page itself? Maybe this is a doxygen thing, and I'm happy to
be guided by your esthetics here, but having a top-level navigation
button for "Data Structures", but not one for "API Reference" seems
backward?
Page titles are too long? For example, the architecture page's title
is:
<title>wiredtiger - WiredTiger Data Store API: WiredTiger
Architecture - Code</title>
which means it won't even begin to fit in a tab's title, so all of the
pages appear to have identical browser tab names.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Related Pages
The "Related Pages" page has a link to "Using WiredTiger", but none of the
other pages listed in the Programmer's Reference?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger Architecture
Isn't there a better word for "Local Interface", ummm, "Functional API"?
We should IM a bit about the whole RPC Server thing -- you and I have
not thought about how the RPC server should fit together, but I was
thinking of using a shared memory segment to move stuff back & forth,
rather than marshalling/unmarshalling stuff to/from the RPC library (the
whole BDB RPC chunk really was a pain in the ass). In other words, we
might not even use RPC, because we'll have a shared memory chunk we use
and that we define, and the only message passing we use is just enough
to say "there's stuff for the engine", and "the engine has returned to
the client". As I said, I've only thought about this enough to convince
myself it was a solvable problem, nothing more concrete than that.
But, given that discussion, I'd suggest something like:
C API <---> remote client <---> Java
            remote client <---> Python
            remote client <---> C
and then the C API block talks to the WiredTiger Engine.
I guess I'm reacting to the fact that I don't understand the difference
between the "C API" and the "Local Interface", and why the C API would
start a "C Client"? Does it matter from the point of view of a programmer?
Since we no longer have a cache, does it make sense to have a cache
square?
Ditto "Access methods", since we only have 1? Or are access methods
different flavors of row & column stores?
Here's a more general comment: because everything is focused on an
in-memory tree, we don't have any natural separation between the cache,
concurrency control and the access method(s) -- that's one big happy
glop. Do you think txns will end up the same way? I think logging
will continue to be separate.
"Concurency"
->
"Concurrency"
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger Design
Do we still want this in the docs? (And, if we do, there's work to
be done.)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Table File Formats
Do we still want this in the docs? (And, if we do, there's work to be
done.)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger API
The link on the main page is to "API Reference", so the two names should
match?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Mapping SQL onto the WiredTiger API
What's the plan/goal for this section?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Getting Started with the API
"does not exist when the program starts running."
->
"does not already exist."
It looks odd that the code in "connecting to a database" and "opening a
session" is split -- I'd make it two separate if statements, it's hard
to read the way it is.
"The code block above also shows simple error handling with
wiredtiger_strerror. The default behavior for more detailed errors is
to write them to stderr. That can be overridden by passing an
implementation of WT_ERROR_HANDLER to wiredtiger_open or
WT_CONNECTION::open_session."
->
"The code block above also shows simple error handling with
wiredtiger_strerror (a function that returns a string describing an error
code passed as its argument). More complex error handling can be
configured by passing an implementation of WT_ERROR_HANDLER to
wiredtiger_open or WT_CONNECTION::open_session."
Michael, can we not fold the lines in this example code? There's no
reason to do so, given that the browser window is wider?
I'm still unhappy that set_key & set_value can fail. It's just going
to be a total pain in the ass to handle errors in C. If we can't remove
all possible failures, I think we need to simply store an "I failed"
error in the cursor structure, which is checked/returned when the real
function (in this case cursor->insert) is called. There's no performance
penalty in doing that, and a huge gain for application writers.
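Here's a minimal sketch of the "I failed" idea, to make the suggestion
concrete. Everything in it is hypothetical (struct sketch_cursor and the
sketch_* functions are made up for illustration, they are not the real
WT_CURSOR interface): the setters return void and remember the first
error, and the insert call is the only place the application checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cursor, NOT the real WT_CURSOR: just enough state for the idea. */
struct sketch_cursor {
	const char *key;
	const char *value;
	int saved_err;		/* first error seen by a setter, 0 if none */
};

/* Setters return void: a failure is recorded in the cursor, not reported. */
void
sketch_set_key(struct sketch_cursor *c, const char *key)
{
	if (key == NULL) {
		if (c->saved_err == 0)
			c->saved_err = EINVAL;
		return;
	}
	c->key = key;
}

void
sketch_set_value(struct sketch_cursor *c, const char *value)
{
	if (value == NULL) {
		if (c->saved_err == 0)
			c->saved_err = EINVAL;
		return;
	}
	c->value = value;
}

/* The "real" operation returns any saved error, then clears it. */
int
sketch_insert(struct sketch_cursor *c)
{
	int ret;

	if ((ret = c->saved_err) != 0) {
		c->saved_err = 0;
		return (ret);
	}
	if (c->key == NULL || c->value == NULL)
		return (EINVAL);
	/* A real engine would store c->key and c->value here. */
	return (0);
}
```

The application then has one error check per operation instead of three,
which is the gain I'm after.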
"marshal" -> "marshall"
I realize set_{key,value} and get_{key,value} take variable arguments;
is there a reason we couldn't list those arguments in the cursor->insert
call? I guess I'm asking, isn't cursor->insert(cursor, <random stuff>)
semantically equivalent to calling set_key & set_value separately? Or,
maybe a better way to ask: if set_key/value have to figure out what's
being passed as arguments to them, why can't insert do that same magic,
whatever it is? Ditto get_{key,value} & cursor->next.
Anyway, what I'm arguing, in general, is to put extra effort into making
things simple for application writers, it's more important than avoiding
magic underneath the covers -- and I think our current get/put API is
more complicated to write to, and handle errors from, than BDB's.
"If we weren't using the cursor for the call to WT_CURSOR::insert above,
this loop would simplify to:"
->
"Because the cursor was positioned in the table after the WT_CURSOR::insert
call, we had to re-position it using the WT_CURSOR::first call; if we
weren't using the cursor for the call to WT_CURSOR::insert above, this
loop would simplify to:"
Are we using object::method as a standard? Is there a standard? I
recall that BDB docs used object.method.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Configuration Strings
I still disagree with "If the "=<value>" part is omitted, the value of
1 is assumed." (1) I doubt that there are enough defaults of 1 that
this is a significant win, and (2) it makes it easier for programmers
to make a mistake, and (3) it makes it hard for maintenance programmers
to figure out what's going on.
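To show what I'd prefer, here's a rough sketch of a lookup that refuses to
guess a value for a bare key. It assumes a flat "k1=v1,k2=v2" string with
no nesting or quoting, and sketch_config_get plus the key names in the
usage below are made up for illustration, not taken from the current
parser.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Look up key in a flat "k1=v1,k2=v2" string (no nesting, no quoting).
 * Returns 0 and copies the value into buf, ENOENT if the key is absent,
 * EINVAL if the key appears without an explicit "=<value>", and ENOMEM
 * if buf is too small.
 */
int
sketch_config_get(const char *config, const char *key, char *buf, size_t buflen)
{
	size_t keylen = strlen(key);
	const char *p = config;

	while (*p != '\0') {
		size_t itemlen = strcspn(p, ",");	/* length of this item */

		if (itemlen >= keylen && strncmp(p, key, keylen) == 0) {
			if (itemlen == keylen)		/* bare key: don't assume 1 */
				return (EINVAL);
			if (p[keylen] == '=') {
				size_t vallen = itemlen - keylen - 1;

				if (vallen == 0)	/* "key=" with nothing after it */
					return (EINVAL);
				if (vallen >= buflen)
					return (ENOMEM);
				memcpy(buf, p + keylen + 1, vallen);
				buf[vallen] = '\0';
				return (0);
			}
			/* Otherwise key is only a prefix of a longer key. */
		}
		p += itemlen;
		if (*p == ',')
			++p;
	}
	return (ENOENT);
}
```

With this, "create,cache_size=10MB" fails loudly on "create" instead of
silently meaning "create=1", so the maintenance programmer sees the
problem at the call site.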
"Values may be nested lists, for example:"
->
Why did we switch to Python? That confused me for a minute, especially
the sudden appearance of parentheses.
"10MiB" -> "10MB"
"priority to a transaction to reduce aborts"
->
"priority to a transaction"
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger API
Should wiredtiger_struct_pack (or some flavor of it) allocate and return
a size plus a buffer? That might make it easier to build apps, you don't
have to figure out the size yourself.
Or, maybe the way to ask this question: why does wiredtiger_struct_size
(wiredtiger_struct_sizev) need arguments other than the format string?
Presumably you're calling wiredtiger_struct_size{v} to allocate memory
for the chunk -- why not just let WT allocate (and possibly resize?)
the buffer for you?
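Something like the following sketch is what I have in mind. The format
characters here are invented for the example ('i' for a 32-bit integer in
host byte order, 'S' for a NUL-terminated string) and sketch_pack_alloc is
a hypothetical name, not a proposal for the real format or signature. The
sizing pass does need the arguments, since a string's length isn't known
from the format alone, but the caller never sees that step.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Pack the arguments described by fmt into a newly allocated buffer.
 * On success returns 0 and sets *bufp and *sizep; the caller frees *bufp.
 */
int
sketch_pack_alloc(void **bufp, size_t *sizep, const char *fmt, ...)
{
	va_list ap, ap2;
	const char *f;
	size_t size = 0;
	uint8_t *buf, *p;

	va_start(ap, fmt);
	va_copy(ap2, ap);	/* second copy for the fill pass */

	/* Pass 1: walk the arguments to compute the packed size. */
	for (f = fmt; *f != '\0'; ++f)
		switch (*f) {
		case 'i':
			(void)va_arg(ap, int);
			size += sizeof(int32_t);
			break;
		case 'S':
			size += strlen(va_arg(ap, const char *)) + 1;
			break;
		default:
			va_end(ap);
			va_end(ap2);
			return (EINVAL);
		}
	va_end(ap);

	if ((buf = malloc(size == 0 ? 1 : size)) == NULL) {
		va_end(ap2);
		return (ENOMEM);
	}

	/* Pass 2: fill the buffer (fmt was validated in pass 1). */
	for (p = buf, f = fmt; *f != '\0'; ++f)
		if (*f == 'i') {
			int32_t v = (int32_t)va_arg(ap2, int);

			memcpy(p, &v, sizeof(v));
			p += sizeof(v);
		} else {
			const char *s = va_arg(ap2, const char *);
			size_t len = strlen(s) + 1;

			memcpy(p, s, len);
			p += len;
		}
	va_end(ap2);

	*bufp = buf;
	*sizep = size;
	return (0);
}
```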
Some of the items aren't sorted? (This looks like groups of laundry
lists to me, which means they should all be sorted?)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Packing and Unpacking Data
You've got an XXX here, obviously this one is still in motion.
I'd move "packing & unpacking" after schemas, you don't need pack/unpack
until you have columns, right?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Schemas
"*schema*"
->
bold, maybe?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Sharing Between Processes
I'd go with "multiprocess=on" not "sharing=on".
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Data Structures
It occurs to me, we have xxxOR three times, and xxxTION once. Maybe
WT_CONNECTION should be WT_CONNECTOR? WT_SESSOR doesn't make sense,
though. *shrug*
"The WT_CURSOR struct is the interface to a cursor"
->
"The WT_CURSOR object (handle?) is the interface to a cursor"
This happens in a few places, we should probably search for "struct" and
consistently switch to object or handle, for example, there's a page
"WT_CURSOR Struct Reference".
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_CURSOR Struct Reference
Do we need language on the resulting cursor position after each method?
We need language that "on failure", cursor position is undetermined.
Apps that care need to either dup the cursor, or we could offer a config
string that dups on all ops for you?
I don't think the list of Examples really helps -- it's kind of a laundry
list. I'd suggest a single Example that looks something like:
Example:
/* Close the cursor. */
ret = cursor->close(cursor);
Obviously, it's more complicated for more complicated methods, but
something to show use on every method will help us in the field, I
believe. I expect we need an example for each configuration string,
too, can we stuff that into the config string explanations?
Since we're using method names for next, prev and so on, shouldn't we
use method names for the exactp argument to search, that is, search
(exact match), range_next (smallest key larger than search key), range_prev
(largest key smaller than search key)? I would argue exact matches are
what almost all apps want (BDB didn't add search_range for a long time,
IIRC), so why make programmers know about something they won't care
about? That gets rid of the "exact >= 0" magic in ex_call_center.c,
for example.
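As a sketch of the three-method split, here is the behavior I'm describing
over a plain sorted array of unique integer keys. The sketch_* names and
the SKETCH_NOTFOUND return are assumptions for the example, and a real
cursor would position itself rather than return an index.

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_NOTFOUND	(-1)

/* Index of the first element >= key in sorted keys[0..n), or n if none. */
static size_t
sketch_lower_bound(const int *keys, size_t n, int key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo);
}

/* Exact match only: the index of key, or SKETCH_NOTFOUND. */
int
sketch_search(const int *keys, size_t n, int key)
{
	size_t i = sketch_lower_bound(keys, n, key);

	return (i < n && keys[i] == key ? (int)i : SKETCH_NOTFOUND);
}

/* Smallest key larger than the search key (keys assumed unique). */
int
sketch_range_next(const int *keys, size_t n, int key)
{
	size_t i = sketch_lower_bound(keys, n, key);

	if (i < n && keys[i] == key)
		++i;
	return (i < n ? (int)i : SKETCH_NOTFOUND);
}

/* Largest key smaller than the search key. */
int
sketch_range_prev(const int *keys, size_t n, int key)
{
	size_t i = sketch_lower_bound(keys, n, key);

	return (i > 0 ? (int)(i - 1) : SKETCH_NOTFOUND);
}
```

An application that only wants exact matches calls sketch_search and never
has to learn what a signed "exact" value means.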
There's no wording on the behavior of insert/update in the face of
existing/non-existing records.
I didn't see anything to deal with duplicate sets? (All of BDB's
prev-dup, next-dup, no-next-dup blah, blah, blah.)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_SESSION Struct Reference
Do we need a salvage table op?
If add_schema can be used to change schema, maybe set_schema is better?
Why is checkpoint here, as opposed to being a WT_CONNECTION method?
Is it standard that processes can't have multiple txns going at the same
time? Or, I guess you just open multiple sessions, OK. The txn_begin
method needs to fail if you have one already, that's a programming error.
Ditto rollback_txn if there's no txn.
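The check itself is trivial. A sketch, with a hypothetical one-flag session
standing in for WT_SESSION and EINVAL picked arbitrarily as the error (the
sketch_* names are made up):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical session, NOT the real WT_SESSION: one flag is all we need. */
struct sketch_session {
	int in_txn;		/* non-zero while a transaction is active */
};

/* Starting a transaction inside a transaction is a programming error. */
int
sketch_begin_txn(struct sketch_session *s)
{
	if (s->in_txn)
		return (EINVAL);
	s->in_txn = 1;
	return (0);
}

/* Rolling back with no transaction running is one too. */
int
sketch_rollback_txn(struct sketch_session *s)
{
	if (!s->in_txn)
		return (EINVAL);
	s->in_txn = 0;
	return (0);
}
```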
"column set" feels like an undefined term to me.
The trailing "(multiple)" wording isn't clear.
Using [xxx] for the default isn't clear. Can't we explicitly call out
the default? For example: the isolation level for this transaction,
one of "serializable", "snapshot", "read-committed" or
"read-uncommitted"; default is "snapshot".
"old log files"
->
"log files no longer needed for transactional durability"
Why have 0/1 for checkpoint "archive", "force", flush_cache or flush_log?
Have a default, and if you want to change it, the keyword changes it?
Ditto WT_SESSION::create_tables "exclusive" keyword. Why do we need
it to have two values, it has only one possible meaning ("I want exclusive
access").
Parameters to configuration strings aren't sorted?
I think the "overwrite" keyword should be a per-operation flag, not a
per-cursor flag. I'd go with 3 method names, myself: insert, insert_update,
update?
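A sketch of the semantics I'd expect from the three names, using a tiny
fixed-size table of integer keys. The table, the sketch_* names and the
choice of EEXIST/ENOENT are all assumptions for the example, but this is
also the kind of wording the docs need for existing/non-existing records.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define	SKETCH_SLOTS	8

/* Hypothetical table: parallel arrays of keys and values, n slots in use. */
struct sketch_table {
	int keys[SKETCH_SLOTS];
	int values[SKETCH_SLOTS];
	size_t n;
};

/* Return the slot holding key, or t->n if the key is not present. */
static size_t
sketch_find(const struct sketch_table *t, int key)
{
	size_t i;

	for (i = 0; i < t->n; ++i)
		if (t->keys[i] == key)
			break;
	return (i);
}

/* Store the record whether or not the key already exists. */
int
sketch_insert_update(struct sketch_table *t, int key, int value)
{
	size_t i = sketch_find(t, key);

	if (i == t->n) {
		if (t->n == SKETCH_SLOTS)
			return (ENOSPC);
		t->keys[t->n++] = key;
	}
	t->values[i] = value;
	return (0);
}

/* Fail if the key already exists. */
int
sketch_insert(struct sketch_table *t, int key, int value)
{
	if (sketch_find(t, key) != t->n)
		return (EEXIST);
	return (sketch_insert_update(t, key, value));
}

/* Fail if the key does not exist. */
int
sketch_update(struct sketch_table *t, int key, int value)
{
	if (sketch_find(t, key) == t->n)
		return (ENOENT);
	return (sketch_insert_update(t, key, value));
}
```

The point is that the choice is visible at each call, rather than hidden in
how the cursor happened to be opened.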
If we're going to allow the dup'd cursor to change stuff (for example,
the encoding), should we have a more general interface? Maybe open_cursor
needs to take an optional dup-cursor argument, where stuff gets copied
from an existing cursor, but you can also change stuff. For example,
"open_cursor" has a "dup" config string, but dup_cursor doesn't allow
you to change that behavior in the duplicated cursor. Rather than have
the dup-cursor method track the open-cursor's arguments (and have to
explain which ones can be over-ridden), a single method might be simpler,
where the WT_CURSOR *entry arg "initializes the opened cursor to reference
the same table entry as the specified cursor, with the same modes as the
specified cursor, but modified by the new cursor's configuration"?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_CONNECTION Struct Reference
WT_CONNECTION::load_extension -- I wouldn't support a default, maintenance
programmers will hate it (I changed the name, and now it doesn't work?).
Force the programmer to set the name.
Is it useful to be able to set the error handler per session? I was
expecting apps to set an error handler outside of wiredtiger (so, it's
a function that can't fail), and then it would be used for the life of
the app. Then WT_CONNECTION::open_session doesn't need the argument.
I must be misunderstanding something: I don't see any functions that
create the WT_CURSOR_FACTORY or WT_ERROR_HANDLER handles?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_COLLATOR struct reference
Why does the compare function need a WT_SESSION handle, wasn't that
copied into the WT_COLLATOR structure when it was created?
And how do you create a WT_COLLATOR structure?
And how do you specify different collators for the keys and duplicate
values? (I'm missing the connection between WT_COLLATORs and the table
create?)
I think most of these comments apply to the WT_EXTRACTOR Struct Ref
page as well.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Examples
We should say what each example is intended to show.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== ex_config.c
We probably need "ret =" in front of the create_table & begin_txn
statements. (These lines were copied to a couple of places in
other example programs.)
Oh, and the top of ex_transaction.c says "ex_hello.c", the top of
ex_thread.c says "ex_access.c", ex_schema.c says "ex_column.c".
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Random comments --
If you put the Copyright statement on a line by itself, it makes it
easier to automatically upgrade them.