Created January 31, 2012 13:04
=== WiredTiger Overview
There should be a prominent version number on the main page. I'm using a
#define of WIREDTIGER_VERSION in my wiredtiger.h; I'm guessing we'll do
something like that eventually. Can doxygen pick that up? (I'm guessing
the 1.0 opposite the odd-looking little box is the version? Anyway, I'd
make this much more prominent, and it needs to map to the source code, of
course.) We should make all released versions of the documents available
from our web site, at some point, separately from releases -- there's
no reason our web site can't serve docs for users.
The phrase "public interface" implies there's a private interface?
"We follow SQL terminology: a database is set of tables that are managed
together. Tables logically consist of rows, each row has a key and a
value. Tables may optionally have an associated schema, which splits the
key/value pair into a set of columns. Tables may also have associated
indices, each of which is ordered by some set of columns."
->
"We follow SQL terminology: a database is a set of tables managed together.
Tables consist of rows, where each row is a key and its associated value.
Tables may optionally have an associated schema, splitting the value
into a set of columns. Tables may also have associated indices, each of
which is ordered by one or more columns."
"WiredTiger supports column-oriented storage in addition to traditional
row-oriented storage. Instead of storing all fields from a row together,
WiredTiger can efficiently store and access sets of columns (including
single columns) separately."
->
"In addition to the traditional row-oriented storage where all columns
of a row are stored together, WiredTiger supports column-oriented storage,
where one or more columns can be stored individually, allowing more
efficient access and storage."
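To make the row-oriented versus column-oriented distinction concrete, here is a minimal C sketch. It only illustrates the two memory layouts and is not WiredTiger's storage format; the row_t and columns_t types and the sum_price_* helpers are hypothetical names. A row store keeps every column of a row adjacent (an array of structs); a column store keeps each column in its own array, so a query that reads one column touches only that column's data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row layout: all columns of a row are stored together. */
typedef struct {
    uint32_t id;
    uint32_t price;
    char     name[16];
} row_t;

/* Hypothetical column layout: each column lives in its own array. */
typedef struct {
    uint32_t *id;
    uint32_t *price;
    char    (*name)[16];
    size_t    nrows;
} columns_t;

/* Row store: summing one column still steps over every whole row. */
uint64_t sum_price_rows(const row_t *rows, size_t nrows)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nrows; i++)
        total += rows[i].price;
    return total;
}

/* Column store: the same query reads only the contiguous price array. */
uint64_t sum_price_columns(const columns_t *cols)
{
    uint64_t total = 0;

    assert(cols != NULL);
    for (size_t i = 0; i < cols->nrows; i++)
        total += cols->price[i];
    return total;
}
```

Both functions compute the same total; the difference is how much memory each has to walk, which is the efficiency the suggested wording refers to.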
Should we move the rest of the "Introduction" somewhere else? Does API
documentation normally discuss specific classes as part of the introduction?
Do we need an "Examples" paragraph given there's an "Examples" tab at
the top of the page?
I'd pad out the list of Programmer's Reference docs, that is, give each
entry a full sentence, something like:
  + WiredTiger Architecture
    A discussion of blah, blah, blah.
  + Using WiredTiger
    A page for blah, blah, blah.
It's a bit odd to have navigation tabs at the top, plus a list of links
in the page itself? Maybe this is a doxygen thing, and I'm happy to
be guided by your esthetics here, but having a top-level navigation
button for "Data Structures", but not one for "API Reference" seems
backward?
Page titles are too long? For example, the architecture page's title
is:
  <title>wiredtiger - WiredTiger Data Store API: WiredTiger
  Architecture - Code</title>
which means it won't even begin to fit in a browser tab's title, so all
of the pages appear to have identical browser tab names.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Related Pages
The "Related Pages" page has a link to "Using WiredTiger", but none of the
other pages listed in the Programmer's Reference?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger Architecture
Isn't there a better word for "Local Interface", ummm, "Functional API"?
We should IM a bit about the whole RPC Server thing -- you and I have
not thought about how the RPC server should fit together, but I was
thinking of using a shared memory segment to move stuff back & forth,
rather than marshalling/unmarshalling stuff to/from the RPC library (the
whole BDB RPC chunk really was a pain in the ass). In other words, we
might not even use RPC, because we'll have a shared memory chunk that we
define and use, and the only message passing we need is just enough
to say "there's stuff for the engine" and "the engine has returned to
the client". As I said, I've only thought about this enough to convince
myself it was a solvable problem, nothing more concrete than that.
But, given that discussion, I'd suggest something like:
    C API <---> remote client <---> Java
                remote client <---> Python
                remote client <---> C
and then the C API block talks to the WiredTiger Engine.
I guess I'm reacting to the fact that I don't understand the difference
between the "C API" and the "Local Interface", and why the C API would
start a "C Client"? Does it matter from the point of view of a programmer?
Since we no longer have a cache, does it make sense to have a cache
square?
Ditto "Access methods", since we only have one? Or are access methods
different flavors of row & column stores?
Here's a more general comment: because everything is focused on an
in-memory tree, we don't have any natural separation between the cache,
concurrency control and the access method(s) -- that's one big happy
glop. Do you think txns will end up the same way? I think logging
will continue to be separate.
"Concurency"
->
"Concurrency"
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger Design
Do we still want this in the docs? (And, if we do, there's work to
be done.)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Table File Formats
Do we still want this in the docs? (And, if we do, there's work to be
done.)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger API
The link on the main page is to "API Reference", so the two names should
match?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Mapping SQL onto the WiredTiger API
What's the plan/goal for this section?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Getting Started with the API
"does not exist when the program starts running."
->
"does not already exist."
It looks odd that the code in "connecting to a database" and "opening a
session" is split -- I'd make it two separate if statements; it's hard
to read the way it is.
"The code block above also shows simple error handling with
wiredtiger_strerror. The default behavior for more detailed errors is
to write them to stderr. The default behavior for more detailed errors
is to write them to stderr. That can be overridden by passing an
implementation of WT_ERROR_HANDLER to wiredtiger_open or
WT_CONNECTION::open_session."
->
"The code block above also shows simple error handling with
wiredtiger_strerror (a function that returns a string describing the error
code passed as its argument). More complex error handling can be
configured by passing an implementation of WT_ERROR_HANDLER to
wiredtiger_open or WT_CONNECTION::open_session."
Michael, can we not fold the lines in this example code? There's no
reason to do so, given that the browser window is wider?
I'm still unhappy that set_key & set_value can fail. It's just going
to be a total pain in the ass to handle errors in C. If we can't remove
all possible failures, I think we need to simply store an "I failed"
error in the cursor structure, which is checked/returned when the real
function (in this case cursor->insert) is called. There's no performance
penalty in doing that, and a huge gain for application writers.
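The "store an I-failed error in the cursor" proposal can be sketched in C as follows. Everything here is hypothetical: sketch_cursor, sketch_set_key, sketch_set_value and sketch_insert are illustrative names, not the WiredTiger API, and EINVAL/ENOMEM are assumed error codes. The setters return void and record the first failure in the cursor; the insert call reports it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cursor: copies of the key and value plus a "sticky" error. */
typedef struct {
    char *key;
    char *value;
    int   saved_error;  /* first failure from set_key/set_value, 0 if none */
} sketch_cursor;

/* Copy s into *field; on failure, remember the error instead of returning it. */
static void set_field(sketch_cursor *c, char **field, const char *s)
{
    char *copy;
    size_t len;

    if (c->saved_error != 0)     /* an earlier setter already failed */
        return;
    if (s == NULL) {
        c->saved_error = EINVAL;
        return;
    }
    len = strlen(s) + 1;
    if ((copy = malloc(len)) == NULL) {
        c->saved_error = ENOMEM;
        return;
    }
    memcpy(copy, s, len);
    free(*field);
    *field = copy;
}

/* The setters return nothing, so there is nothing for the caller to check. */
void sketch_set_key(sketch_cursor *c, const char *key)
{
    set_field(c, &c->key, key);
}

void sketch_set_value(sketch_cursor *c, const char *value)
{
    set_field(c, &c->value, value);
}

/* The "real" operation reports (and clears) any saved setter error first. */
int sketch_insert(sketch_cursor *c)
{
    int ret;

    assert(c != NULL);
    ret = c->saved_error;
    c->saved_error = 0;
    if (ret != 0)
        return ret;
    if (c->key == NULL || c->value == NULL)
        return EINVAL;
    /* A real implementation would store c->key and c->value here. */
    return 0;
}
```

The application then writes set_key, set_value and insert back to back and checks a single return value, the one from insert.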
"marshal" -> "marshall"
I realize set_{key,value} and get_{key,value} take variable arguments;
is there a reason we couldn't list those arguments in the cursor->insert
call? I guess I'm asking, isn't cursor->insert(cursor, <random stuff>)
semantically equivalent to calling set_key & set_value and then
cursor->insert(cursor)? Or, maybe a better way to ask: if set_key/value
have to figure out what's being passed as arguments to them, why can't
insert do that same magic, whatever it is? Ditto get_{key,value} &
cursor->next.
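One way to read the question: if set_key and set_value decode their variable arguments according to a format, insert can decode a key followed by a value from the same argument list. This is a minimal sketch that assumes a hypothetical cursor whose key and value each have a one-letter format ('S' for a string, 'i' for an int); va_cursor and the vc_* functions are illustrative names, not WiredTiger's API, and items are stored as text only to keep the sketch short.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical cursor: key and value each have a one-letter format,
 * 'S' for a NUL-terminated string or 'i' for an int. */
typedef struct {
    char key_format, value_format;
    char key[64], value[64];
} va_cursor;

/* Consume one argument of the given format from *ap and render it into buf. */
static int take_arg(char format, char *buf, size_t len, va_list *ap)
{
    int n;

    switch (format) {
    case 'S':
        n = snprintf(buf, len, "%s", va_arg(*ap, const char *));
        break;
    case 'i':
        n = snprintf(buf, len, "%d", va_arg(*ap, int));
        break;
    default:
        return EINVAL;
    }
    return (n < 0 || (size_t)n >= len) ? EINVAL : 0;
}

/* Placeholder for the real store operation. */
int vc_insert(va_cursor *c)
{
    assert(c != NULL);
    return 0;
}

/* Two-step style: set_key(...) and set_value(...), then insert. */
int vc_set_key(va_cursor *c, ...)
{
    va_list ap;
    int ret;

    va_start(ap, c);
    ret = take_arg(c->key_format, c->key, sizeof(c->key), &ap);
    va_end(ap);
    return ret;
}

int vc_set_value(va_cursor *c, ...)
{
    va_list ap;
    int ret;

    va_start(ap, c);
    ret = take_arg(c->value_format, c->value, sizeof(c->value), &ap);
    va_end(ap);
    return ret;
}

/* One-call style: the same format-driven decoding, key first, then value. */
int vc_insert_kv(va_cursor *c, ...)
{
    va_list ap;
    int ret;

    va_start(ap, c);
    ret = take_arg(c->key_format, c->key, sizeof(c->key), &ap);
    if (ret == 0)
        ret = take_arg(c->value_format, c->value, sizeof(c->value), &ap);
    va_end(ap);
    return ret != 0 ? ret : vc_insert(c);
}
```

Here vc_insert_kv(c, "apple", 42) does the same work as vc_set_key(c, "apple"), vc_set_value(c, 42) and vc_insert(c); the decoding is shared through take_arg rather than duplicated.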
Anyway, what I'm arguing, in general, is to put extra effort into making
things simple for application writers; that's more important than avoiding
magic underneath the covers -- and I think our current get/put API is
more complicated to write to, and handle errors from, than BDB's.
"If we weren't using the cursor for the call to WT_CURSOR::insert above,
this loop would simplify to:"
->
"Because the cursor was positioned in the table after the WT_CURSOR::insert
call, we had to reposition it using the WT_CURSOR::first call; if we
weren't using the cursor for the call to WT_CURSOR::insert above, this
loop would simplify to:"
Are we using object::method as a standard? Is there a standard? I
recall that BDB docs used object.method.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Configuration Strings
I still disagree with "If the "=<value>" part is omitted, the value of
1 is assumed.": (1) I doubt that there are enough defaults of 1 for
this to be a significant win, (2) it makes it easier for programmers
to make a mistake, and (3) it makes it hard for maintenance programmers
to figure out what's going on.
"Values may be nested lists, for example:"
->
Why did we switch to Python? That confused me for a minute, especially
the sudden appearance of parentheses.
"10MiB" -> "10MB"
"priority to a transaction to reduce aborts"
->
"priority to a transaction"
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WiredTiger API
Should wiredtiger_struct_pack (or some flavor of it) allocate and return
a size plus a buffer? That might make it easier to build apps; you don't
have to figure out the size yourself.
Or, maybe the way to ask this question: why does wiredtiger_struct_size
(wiredtiger_struct_sizev) need arguments other than the format string?
Presumably you're calling wiredtiger_struct_size{v} to allocate memory
for the chunk -- why not just let WT allocate (and possibly resize?)
the buffer for you?
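A minimal sketch of the two shapes, under clearly hypothetical assumptions: the "format" here is just a list of NUL-terminated strings (not WiredTiger's packing format), and pack_size, pack_into and pack_alloc are illustrative names. Because the fields are variable-length, the packed size depends on the values and not only on the format, which is presumably why the size function takes the arguments. An allocating variant hides that step from the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical format: each field is a NUL-terminated string, so the packed
 * size depends on the values, not just the format.
 * This is NOT WiredTiger's packing format. */
size_t pack_size(const char **fields, size_t n)
{
    size_t size = 0;

    for (size_t i = 0; i < n; i++)
        size += strlen(fields[i]) + 1;
    return size;
}

/* Caller-buffer shape: buf must hold at least pack_size() bytes. */
void pack_into(char *buf, const char **fields, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(fields[i]) + 1;

        memcpy(buf, fields[i], len);
        buf += len;
    }
}

/* Allocating shape: sizes, allocates and packs in one call.
 * Returns 0 on success, -1 if malloc fails; the caller frees *bufp. */
int pack_alloc(const char **fields, size_t n, char **bufp, size_t *sizep)
{
    size_t size = pack_size(fields, n);
    char *buf;

    assert(bufp != NULL && sizep != NULL);
    if ((buf = malloc(size > 0 ? size : 1)) == NULL)
        return -1;
    pack_into(buf, fields, n);
    *bufp = buf;
    *sizep = size;
    return 0;
}
```

With pack_alloc the caller makes one call and one free; with the first shape it has to call pack_size, allocate, and then pass the same arguments again to pack_into.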
Some of the items aren't sorted? (These look like groups of laundry
lists to me, which means they should all be sorted?)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Packing and Unpacking Data
You've got an XXX here; obviously this one is still in motion.
I'd move "packing & unpacking" after schemas; you don't need pack/unpack
until you have columns, right?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Schemas
"*schema*"
->
bold, maybe?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Sharing Between Processes
I'd go with "multiprocess=on" not "sharing=on".
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Data Structures
It occurs to me, we have xxxOR three times, and xxxTION once. Maybe
WT_CONNECTION should be WT_CONNECTOR? WT_SESSOR doesn't make sense,
though. *shrug*
"The WT_CURSOR struct is the interface to a cursor"
->
"The WT_CURSOR object (handle?) is the interface to a cursor"
This happens in a few places; we should probably search for "struct" and
consistently switch to object or handle. For example, there's a page
"WT_CURSOR Struct Reference".
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_CURSOR Struct Reference
Do we need language on the resulting cursor position after each method?
We need language saying that on failure, the cursor position is undetermined.
Apps that care would need to dup the cursor first, or we could offer a
config string that dups on all ops for you?
I don't think the list of Examples really helps -- it's kind of a laundry
list. I'd suggest a single Example that looks something like:
    Example:
        /* Close the cursor. */
        ret = cursor->close(cursor);
Obviously, it's more complicated for more complicated methods, but
something to show use of every method will help us in the field, I
believe. I expect we need an example for each configuration string,
too; can we stuff that into the config string explanations?
Since we're using method names for next, prev and so on, shouldn't we
use method names instead of the exactp argument to search, that is, search
(exact match), range_next (smallest key larger than search key), range_prev
(largest key smaller than search key)? I would argue exact matches are
what almost all apps want (BDB didn't add search_range for a long time,
IIRC), so why make programmers know about something they won't care
about? That gets rid of the "exact >= 0" magic in ex_call_center.c,
for example.
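A minimal sketch of the three proposed methods, using a sorted array of distinct int keys as a hypothetical stand-in for a table. The names sketch_search, sketch_range_next and sketch_range_prev are illustrative, and returning an index or -1 is an assumption. All three share one binary search (a lower-bound search for the first key >= the target).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for cursor search methods over a sorted array of
 * distinct keys. Each returns the index of the matching key, or -1. */

/* Index of the first key >= target, or n if every key is smaller. */
static size_t lower_bound(const int *keys, size_t n, int target)
{
    size_t lo = 0, hi = n;

    assert(keys != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* search: exact match only. */
long sketch_search(const int *keys, size_t n, int target)
{
    size_t i = lower_bound(keys, n, target);

    return (i < n && keys[i] == target) ? (long)i : -1;
}

/* range_next: smallest key strictly larger than target. */
long sketch_range_next(const int *keys, size_t n, int target)
{
    size_t i = lower_bound(keys, n, target);

    if (i < n && keys[i] == target)
        i++;
    return i < n ? (long)i : -1;
}

/* range_prev: largest key strictly smaller than target. */
long sketch_range_prev(const int *keys, size_t n, int target)
{
    size_t i = lower_bound(keys, n, target);  /* keys[0..i-1] < target */

    return i > 0 ? (long)(i - 1) : -1;
}
```

An application that only wants exact matches calls sketch_search and never has to interpret a comparison result like the "exact >= 0" test.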
There's no wording on the behavior of insert/update in the face of
existing/non-existing records.
I didn't see anything to deal with duplicate sets? (All of BDB's
prev-dup, next-dup, no-next-dup blah, blah, blah.)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_SESSION Struct Reference
Do we need a salvage table op?
If add_schema can be used to change schema, maybe set_schema is better?
Why is checkpoint here, as opposed to being a WT_CONNECTION method?
Is it standard that processes can't have multiple txns going at the same
time? Or, I guess you just open multiple sessions, OK. The txn_begin
method needs to fail if you have one already; that's a programming error.
Ditto rollback_txn if there's no txn.
"column set" feels like an undefined term to me.
The trailing "(multiple)" wording isn't clear.
Using [xxx] for the default isn't clear. Can't we explicitly call out
the default? For example: "the isolation level for this transaction:
'serializable', 'snapshot', 'read-committed' or 'read-uncommitted';
the default is 'snapshot'".
"old log files"
->
"log files no longer needed for transactional durability"
Why have 0/1 for checkpoint "archive", "force", "flush_cache" or
"flush_log"? Have a default, and if you want to change it, the keyword
changes it?
Ditto WT_SESSION::create_tables "exclusive" keyword. Why do we need
it to have two values? It has only one possible meaning ("I want exclusive
access").
Parameters to configuration strings aren't sorted?
I think the "overwrite" keyword should be a per-operation flag, not a
per-cursor flag. I'd go with 3 method names, myself: insert, insert_update,
update?
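A minimal sketch of the three-method split, which also spells out the existing/non-existing record behavior the WT_CURSOR notes say is missing. The kv_table type and the table_* functions are hypothetical, the table is a tiny fixed-size array, and EEXIST/ENOENT/ENOMEM are assumed error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TABLE_MAX 16

/* Hypothetical in-memory table standing in for a cursor's underlying data. */
typedef struct {
    int    keys[TABLE_MAX];
    int    values[TABLE_MAX];
    size_t n;
} kv_table;

static long table_find(const kv_table *t, int key)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->keys[i] == key)
            return (long)i;
    return -1;
}

static int table_append(kv_table *t, int key, int value)
{
    if (t->n == TABLE_MAX)
        return ENOMEM;
    t->keys[t->n] = key;
    t->values[t->n] = value;
    t->n++;
    return 0;
}

/* insert: new keys only; an existing key is an error. */
int table_insert(kv_table *t, int key, int value)
{
    assert(t != NULL);
    return table_find(t, key) >= 0 ? EEXIST : table_append(t, key, value);
}

/* update: existing keys only; a missing key is an error. */
int table_update(kv_table *t, int key, int value)
{
    long i = table_find(t, key);

    if (i < 0)
        return ENOENT;
    t->values[i] = value;
    return 0;
}

/* insert_update: insert if missing, otherwise overwrite. */
int table_insert_update(kv_table *t, int key, int value)
{
    long i = table_find(t, key);

    if (i < 0)
        return table_append(t, key, value);
    t->values[i] = value;
    return 0;
}
```

With separate methods, the caller's intent is visible at each call site instead of depending on how the cursor was opened.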
If we're going to allow the dup'd cursor to change stuff (for example,
the encoding), should we have a more general interface? Maybe open_cursor
should take an optional dup-cursor argument, where stuff gets copied
from an existing cursor, but you can also change stuff. For example,
"open_cursor" has a "dup" config string, but dup_cursor doesn't allow
you to change that behavior in the duplicated cursor. Rather than have
the dup-cursor method track the open-cursor's arguments (and have to
explain which ones can be overridden), a single method might be simpler,
where the WT_CURSOR *entry arg "initializes the opened cursor to reference
the same table entry as the specified cursor, with the same modes as the
specified cursor, but modified by the new cursor's configuration"?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_CONNECTION Struct Reference
WT_CONNECTION::load_extension -- I wouldn't support a default; maintenance
programmers will hate it ("I changed the name, and now it doesn't work?").
Force the programmer to set the name.
Is it useful to be able to set the error handler per session? I was
expecting apps to set an error handler outside of WiredTiger (so, it's
a function that can't fail), and then it would be used for the life of
the app. Then WT_CONNECTION::open_session doesn't need the argument.
I must be misunderstanding something: I don't see any functions that
create the WT_CURSOR_FACTORY or WT_ERROR_HANDLER handles?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== WT_COLLATOR Struct Reference
Why does the compare function need a WT_SESSION handle? Wasn't that
copied into the WT_COLLATOR structure when it was created?
And how do you create a WT_COLLATOR structure?
And how do you specify different collators for the keys and duplicate
values? (I'm missing the connection between WT_COLLATORs and the table
create?)
I think most of these comments apply to the WT_EXTRACTOR Struct Reference
page as well.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== Examples
We should say what each example is intended to show.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=== ex_config.c
We probably need "ret =" in front of the create_table & begin_txn
statements. (These lines were copied to a couple of places in
other example programs.)
Oh, and the top of ex_transaction.c says "ex_hello.c", the top of
ex_thread.c says "ex_access.c", and ex_schema.c says "ex_column.c".
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Random comments --
If you put the Copyright statement on a line by itself, it makes it
easier to update them automatically.