Created
April 29, 2010 20:35
-
-
Save mrjjwright/384199 to your computer and use it in GitHub Desktop.
The blueprint for a radically new version of my project NoSQLite
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Topic | |
- NoSQLite | |
- Consider 3 useful things: | |
- SQLite - SQLite is itself a great datastore because it is a | |
fast and powerful SQL database in one file and is widely | |
deployed and supported. | |
- JSON - JSON is a simple and cruft free way to describe and | |
transport objects. That is why programmers love it. There is | |
good support for it in virtually every language. | |
- HTTP - HTTP allows things to connect and talk to each other. | |
Every language supports it. | |
- So 3 things that are really useful, are widely deployed, | |
supported in most every language, and have minimal dependencies. | |
Hmm? Sounds like they should be combined. | |
- Many data stores use HTTP and JSON but they combine with them | |
something not so lightweight and portable. | |
- This makes useful only on bigger machines that can run | |
things like Erlang (I am looking at you CouchDB). | |
- The beauty of SQLite is that it runs on virtually any device. | |
It's really small. (And the same can be said for JSON and | |
HTTP). | |
- Since SQLite is a great datastore it serves as an great | |
foundation for other data stores on top of it. | |
- Let's combine SQLite, HTTP and JSON to make a really cool | |
datastore. | |
- So what kind of datastore do we want? | |
- One that just stores my objects with minimal fuss. | |
- Not just a key-value store, I need to search for things a bit | |
more ad-hoc and I don't want to give up SQL. | |
- One built on SQLite and that doesn't take away most of the | |
power of SQLite. | |
- One that talks HTTP and JSON and come with a nice REST | |
interface. | |
- One that treats other identical dbs as peers and is able to | |
sync with each automatically over HTTP. | |
- Why can I sync my version control repositories like git | |
but not my databases? | |
- How does CouchDB automatically sync? | |
- Will somebody please build something that syncs | |
automatically on top of SQLite? | |
- One that tracks changes to my content and gives me a history? | |
- My version control system gives a history of every | |
commit, why doesn't my db? | |
- My version control system lets me rollback to previous | |
versions of files, why doesn't my db let me rollback | |
objects? | |
- Are you willing to give up a little something on top of pure | |
SQLite to make all this happen? | |
- The ability to update your objects directly in SQL. | |
- A little bit bigger SQLite file size. | |
- Introducing NoSQLite. | |
- Features. | |
- NoSQLite talks JSON. | |
- Exposes a REST oriented HTTP based JSON API. | |
- Translate your programming objects to JSON and pass them | |
off for automatic storage in NoSQLite. | |
- Allows the API to work consistently across many languages | |
and HTTP. | |
- Each JSON object is stored in one row in a single table. | |
- JSON arrays are broken down into multiple rows. | |
- Each attribute of the JSON object is stored in one column. | |
- NoSQLite has a powerful JSON predicate search interface. | |
- This isn't just a key value store. | |
- Common SQL queries are easily realized with | |
predicates. | |
- You can execute arbitrary SQL too from JSON, why not? | |
- Results are returned in JSON format. | |
- NoSQLite is SQLite. | |
- It sounds contradictory but SQL and the relational model | |
are our friends, we use them for you. | |
- NoSQLite is built on top of SQL and the relational | |
model so you don't have to. But you still get most | |
of the benefits such as ad-hoc queries. | |
- Since NoSQLite is a plain old simple SQLite file | |
underneath you get the full power of SQL and can access | |
SQLite via the many programming wrappers available as | |
well. | |
- www.sqlite.org—wiki | |
<http://www.sqlite.org/cvstrac/wiki?p=SqliteWrappers> | |
- NoSQLite is versioned. | |
- NoSQLite tracks changes you make to your objects in a | |
way that you might find in a version control system like | |
git. | |
- NoSQLite tracks immutable snapshots (versions) of your | |
objects and the commits you make to the database. | |
- The snapshots are tamper proof via SHA-1 hashes and | |
are linked together to form a complete history of | |
your database. | |
- You can rollback to previous versions of your database or | |
to an old version of an object. | |
- Tracking is done in special NoSQLite tables that don't | |
affect your tables | |
- You can turn off this tracking if you don't want or need | |
it. | |
- NoSQLite is tamper proof. | |
- With this feature, uuids become SHA-1 hashes are kept of | |
your content. | |
- 2 databases can be compared to see if one has been | |
tampered with. | |
- This feature can be turned off and non SHA-1 hash uuids | |
can be used if you don't care. | |
- NoSQLite works offline and syncs. | |
- Works in disconnected, offline mode. | |
- NoSQLite can push and pull from other NoSQLite instances | |
over HTTP. | |
- Uses simple syncing algorithms adapted from version | |
control systems like git and Fossil. | |
- Object versions with a common parent can be automatically | |
resolved via default policies or put in a special | |
conflict table for your application to resolve. | |
- NoSQLite is mobile. | |
- No dependencies that prevent it from working on mobile | |
devices. | |
- It will work on the iPhone or iPad (where are the really | |
usable apps are). | |
- NoSQLite is HTML5 compatible. | |
- You can use it an HTML5 compatible browser with offline | |
storage. | |
- NoSQLite is Core Data compatible. | |
- It is flexible enough to work with the CoreData schema so | |
you can use it in the Mac, iPhone or iPad. | |
- Sync an iPhone, with an iPad, with a desktop, with a | |
hosted server automatically! | |
- NoSQLite is fun, fuss-free and forgiving. | |
- These are qualities of SQLite and NoSQLite believes | |
strongly in them as well. | |
- NoSQLite strives to anticipate your needs and stay out of | |
your way. | |
- Hopefully using NoSQLite gives you a little bit of | |
unexpected delight. | |
- Basic Usage | |
- NoSQLite works virtually anywhere you can use SQLite. | |
- At it's most basic here is how you use NoSQLite: | |
- You use the API to store objects in one db. | |
- You use the API to store objects in another identical db. | |
- You can clone the the first db if you want or just | |
store the exact same types of objects. | |
- If you want to sync the 2 NoSQLite dbs you start one as a | |
server. | |
- NoSQLite has a built-in webserver. | |
- If you are in a browser or on a mobile device you | |
might not be able to host a webserver. That's ok, | |
only one needs to be a server in order to sync. | |
- You use the API to push and pull your objects from one db | |
to another. | |
- Or sync, which is a push and pull at the same time. | |
- Give me a flavor of the API | |
- There is the HTTP API | |
- NoSQLite must be running in server mode. | |
- Can use from any language that speaks HTTP and JSON | |
- Each call is specified with a URL and an optional | |
JSON body | |
- save | |
- find | |
- delete | |
- And there are 2 basic NoSQLite API wrappers: | |
- C based wrapper | |
- All the core functionality of NoSQLite written in | |
C on top of the NoSQLite db schema | |
- Built in C based webserver and JSON parsing on | |
top of SQLite. | |
- A clean C interface that provides a foundation | |
for other language bindings | |
- Javascript based wrapper | |
- This wrapper implements much of the core | |
functionality of NoSQLite in pure Javascript so | |
it can be used in a browser. | |
- More wrappers will hopefully be written in other | |
languages soon. | |
- How To Get Started | |
- Download NoSQLite | |
- NoSQLite is a single executable that includes SQLite, a | |
built in webserver, and a web interface. | |
- Use the HTTP interface | |
- Just run the command 'nosqlite server' and goto / | |
<http://localhost:5000/> and you can access the NoSQLite | |
web interface. | |
- A complete interface to SQLite! | |
- Add and remove objects | |
- Investigate the NoSQLite API with the API Explorer. | |
- You can easily add and remove objects with this | |
console. | |
- View the history of your database. | |
- Serves as a full featured SQLite admin interface as | |
well! | |
- Or download a wrapper | |
- C wrapper | |
- Javascript wrapper | |
- Documentation | |
- Logical Description | |
- Logical Data Model | |
- Objects | |
- Anything that can be described by a JSON object | |
- Object contents are immutable. They are | |
identified by a SHA-1 hash of their contents | |
(defined below in the schema). | |
- Each object when updated logically creates a new | |
object with a reference to it's parent | |
- Currently each time you update a new row is | |
inserted in the database | |
- Commits - a special object maintained by NoSQLite | |
that represents database commits | |
- A representation of a regular database commit, | |
i.e. one ore more insert or deletes of user | |
tracked tables within a transaction, excepting | |
updates or replaces. | |
- The following items are tracked for each commit | |
- The unique db_code identify the db on which | |
the commit was made | |
- A timestamp of the commit | |
- The objects which make up the commit | |
- Optionally, users can pass in an object | |
describing other attributes to track in the | |
commit. | |
- Possible things to track | |
- The user who made the commit | |
- A comment from a user | |
- The geo location where the commit was | |
made. | |
- Some other attribute that makes sense to | |
the application. | |
- Commit objects are tracked like any other | |
NoSQLite object | |
- Together all the commits form a history of all | |
changes to the database. | |
- The latest commit uuid for a db is referred | |
to as the head. This SHA-1 hash represents | |
all changes every made to the db ensuring the | |
db is tamper proof. | |
- Like Git this hash should be stored | |
somewhere safe if security a concern | |
- If 2 databases have the same db head they are | |
representing that they contain the same data. | |
- Syncing | |
- NoSQLite offers a git-like syncing interface built on | |
it's tamper-proof immutable foundation. | |
- During syncing, the remote DB does not maintain | |
state, the local db keeps track of what it needs | |
across HTTP request/response cycles. | |
- The remote db must be running inside a webserver, | |
either the built in NoSQLite one or another one. | |
- The local db acts as an HTTP client. | |
- 3 sync calls | |
- pull - the client pulls all objects that it | |
doesn't have yet. | |
- push - the client pushes to the server all | |
objects it doesn't have. | |
- sync - a push and pull combined at the same time | |
- Communication between dbs is over HTTP and all data | |
is exchanged in a canonical JSON format. | |
- During a syncing operation, remote and local dbs | |
logically exchange unordered collections of objects | |
until both dbs have the same set and conflicts are | |
resolved. | |
- At the heart of the syncing is an efficient little | |
mechanism for exchanging objects called clustering. | |
- This mechanism was adapted from the Fossil | |
version control system, by D. Richard Hipp, the | |
founder of SQLite. | |
- www.fossil-scm.org—sync.wiki | |
<http://www.fossil-scm.org/index.html/doc/tip/ | |
www/sync.wiki> | |
- Understanding this mechanism makes it easy to | |
understand all the sync operations. | |
- The basics of clustering | |
- Each db creates objects during it's | |
lifetime as normal storing them in their | |
db. | |
- Each time an object is created, a | |
reference to the object is also stored in | |
a table called unclustered | |
- When the number of objects in this table | |
reaches a certain limit, the db takes all | |
the objects currently in it, aggregates | |
references to them in a new object called | |
a cluster, deletes all the objects in | |
unclustered, and then stores the new | |
cluster object. | |
- When the the new cluster object is | |
stored, a reference to it is inserted | |
into the unclustered table. This is | |
because the unclustered table holds | |
references to objects that are not | |
referenced in a cluster which is the case | |
with the new cluster object itself. | |
- So now the unclustered table is empty | |
except for the one new cluster object | |
refence. The db keeps adding new objects | |
until the limit is hit again and then | |
another cluster is created. This one | |
holds all the new objects plus the one | |
cluster. | |
- Now the unclustered table once again | |
holds 1 cluster object reference and | |
there are 2 cluster objects not | |
referenced by this table. | |
- This process repeats itself as more | |
objects are created over the lifetime of | |
a db. | |
- To review: | |
- Each cluster holds at most one cluster | |
plus (cluster_limit - 1) other objects. | |
- The sync tree is made up of a straight | |
line branch of clusters like this | |
terminating in leaf nodes of unclustered | |
objects | |
- The sync tree could look something | |
like this (where a big O is a cluster | |
object and a little o is a regular | |
object). | |
- O | |
- O | |
- O | |
- O | |
- O | |
- ooooooooooO | |
- As will be seen below, this data structure | |
enables efficient syncing between a local and | |
remote db in a 1 or more HTTP | |
request/response cycles that continue until | |
all objects and their contents have been | |
received. | |
- A few other things to note before stepping through | |
the sync steps. | |
- As D.Richard Hipp points out with Fossil, the | |
actual clustering can take place only by the | |
remote db when a pull is requested instead of all | |
dbs creating clusters as they go. This way if | |
there is practically a master-slave like | |
configuration then a minimum amount of cluster | |
objects will be created. But this only an | |
optimization. | |
- There are couple other tables that are useful | |
during this process to keep track of what has | |
been sent and received. | |
- phantom - holds references to objects that a | |
db knows exist in another db but hasn't | |
received the contents for yet. | |
- Clusters for example are objects that | |
tell of the existence of other objects. | |
When a db processes a cluster, it will | |
creates phantoms for each object | |
referenced that does not exist. During | |
the next loop through the | |
request/response cycle it will request | |
those objects. | |
- already_sent - objects already sent to the | |
other db. This is a temp table and only | |
exists for the lifetime of the sync. It | |
serves as a backup measure to ensure that | |
objects are never sent twice. | |
- Syncing works as a series of logical set | |
operations | |
- SQL queries and tables are great for | |
implementing these set operations safely and | |
efficiently without a lot of code. | |
- There are no timestamps, last synced point, or | |
other state kept between dbs. | |
- What is below does not take into account | |
security, content verification and the exact | |
messages that are exchanged. It's only a high | |
level logical description, although accurate. | |
- pull | |
- A local database makes a pull request to a | |
remote db | |
- The local db requests all objects in it's | |
phantom table, a table with references to | |
objects that the db knows about but hasn't | |
received the contents of yet. | |
- The remote db processes the request | |
- It sends all the objects requested by the | |
local db. | |
- If the number of objects in the unclustered | |
table is >= the cluster limit, the remote db | |
creates a cluster, using the optimization | |
mentioned above | |
- It sends all objects in the unclustered table | |
that are not in the phantom table. | |
- The number of objects sent is a practical | |
limit which might 80% of the time be the | |
number of objects created since the last | |
sync allowing for single round trip sync. | |
- The local db processes the response | |
- Add any never before seen objects to it's db | |
- Creates phantoms for any never before seen | |
objects referenced in a cluster. | |
- Ignores all other objects. | |
- Restarts the request/response cycle from the | |
beginning until there are no more phantom | |
objects left to request. | |
- push | |
- The local db sends to the remote db: | |
- a push request | |
- all objects in the unsent table that have | |
never before been sent. | |
- all objects in the unclustered table that are | |
not in the phantom table | |
- The remote db processes the request. | |
- Adds any never seen before objects sent by | |
the local to it's db | |
- Creates phantoms for any never before seen | |
objects referenced in a cluster. | |
- Responds back with a request for any objects | |
in it's phantom table. | |
- The local db processes the response | |
- If the remote db says it needs objects the | |
request/response cycle is started from the | |
top by sending the remote db those objects | |
- sync | |
- A sync is just a pull and push that happen at the | |
same time. | |
- Conflict identification and resolution. | |
- Schema | |
- User tables | |
- No special columns - just user data | |
- Dates are stored in the ISO-8609 format | |
- nsl_object- a NoSQLite tracked object | |
- tbl_name - the name of the table | |
- oid - the rowid/primary key of the object in it's | |
table | |
- uuid - a SHA-1 hash of the object's attributes | |
- The SQLite piped export format is used as | |
canonical representation of rows for computing | |
hashes of objects across all NoSQLite dbs | |
- Specifically. the SHA-1 hash of | |
"col1|col2|col3..." as seen by running | |
sqlite3 foo.db "select * from table_name" | |
- The columns should appear in the same | |
order they are in the create table sql. | |
- Other alternatives were considered like JSON. | |
The disadvantage of all of them was that | |
they require intermediate software to compute | |
the uuid. | |
- With this format any language, even bash, can | |
easily compute the hash in a few lines | |
against the SQLite file. | |
- The data in the underlying rows itself doesn't | |
matter, or the number of columns as long as: | |
- All previous rows inserted are kept immutable. | |
- You can't remove columns e.g. since | |
SQLite doesn't allow it. | |
- The data hashes out the same across all the | |
SQLite dbs. | |
- It is up to the user to come up with a | |
standard for their project for dates and | |
other fields that could be stored | |
differently across SQLite dbs. | |
- nsl_plink - maps objects to their parent or parents | |
- pid - the rowid of the parent object | |
- cid - the rowid of the child object | |
- is_primary - is parent the primary parent | |
- nsl_commit - represents a history of changes to objects | |
in the db. | |
- cid - the rowid, primary key, local to this db. | |
- uuid - a SHA-1 hash of the commit columns in this | |
table, excluding the cid | |
- commit columns | |
- objects_uuid - a SHA-1 hash of all of the objects | |
referenced in the commit. | |
- Specifically a SHA-1 hash of: | |
- A unique letter describing whether the | |
object was inserted, deleted, or updated | |
in the commit. | |
- I - Inserted | |
- D - Deleted | |
- U - Updated | |
- A space | |
- The uuid of the object | |
- This ties the integrity of the commit to the | |
integrity of the objects which comprise it | |
- created_at - the time of the commit. | |
- db_code - a uuid that uniquely identifies each db | |
- Helper tables - these tables are lower-level helper | |
tables used by NoSQLite | |
- nsl_unclustered - any object not referenced yet in a | |
cluster | |
- nsl-cluster - a special type of object that | |
identifies one or more other objects for more | |
efficient syncing | |
- nsl-phantom - objects whose contents have not yet | |
been received from another db | |
- nsl_remote - aliases for remote NoSQLite dbs | |
- name - the name of the remote NoSQLite db. | |
Unique. | |
- url - the url to the remote NoSQLite db. | |
- API | |
- save | |
- remove | |
- beginTransaction | |
- commit | |
- push | |
- pull | |
- sync | |
- merge | |
- clone | |
- migrate_table | |
- FAQ | |
- Is NoSQLite a relational database or an object database? | |
- NoSQLite maintains a simple object database on top of | |
the unique relational storage that SQLite offers. | |
Developers can treat NoSQLite as an object database, | |
simply inserting objects, and getting them back out by | |
key. NoSQLite doesn't take the relational model too | |
far, it stores objects in a single table so it doesn't | |
have to do joins. But because the data is stored | |
relationally and specifically because it is stored in | |
SQLite they can also query it with NoSQLite predicates or | |
directly via SQL. |
If anybody wants the OmniOutliner file from which this was exported, let me know.
this isn't too crazy. see http://github.com/mikeal/IDBCouch for a project which is building a lightweight in-browser CouchDB using HTML5 APIs.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Will, this really missing tons of things and is illogical, hate to show you my brain in progress but wanted to give you a flavor of where I am at.