Skip to content

Instantly share code, notes, and snippets.

@mrjjwright
Created April 29, 2010 20:35
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mrjjwright/384199 to your computer and use it in GitHub Desktop.
Save mrjjwright/384199 to your computer and use it in GitHub Desktop.
The blueprint for a radically new version of my project NoSQLite
Topic
- NoSQLite
- Consider 3 useful things:
- SQLite - SQLite is itself a great datastore because it is a
fast and powerful SQL database in one file and is widely
deployed and supported.
- JSON - JSON is a simple and cruft free way to describe and
transport objects. That is why programmers love it. There is
good support for it in virtually every language.
- HTTP - HTTP allows things to connect and talk to each other.
Every language supports it.
- So 3 things that are really useful, are widely deployed,
supported in most every language, and have minimal dependencies.
Hmm? Sounds like they should be combined.
- Many data stores use HTTP and JSON but they combine with them
something not so lightweight and portable.
- This makes useful only on bigger machines that can run
things like Erlang (I am looking at you CouchDB).
- The beauty of SQLite is that it runs on virtually any device.
It's really small. (And the same can be said for JSON and
HTTP).
- Since SQLite is a great datastore it serves as an great
foundation for other data stores on top of it.
- Let's combine SQLite, HTTP and JSON to make a really cool
datastore.
- So what kind of datastore do we want?
- One that just stores my objects with minimal fuss.
- Not just a key-value store, I need to search for things a bit
more ad-hoc and I don't want to give up SQL.
- One built on SQLite and that doesn't take away most of the
power of SQLite.
- One that talks HTTP and JSON and come with a nice REST
interface.
- One that treats other identical dbs as peers and is able to
sync with each automatically over HTTP.
- Why can I sync my version control repositories like git
but not my databases?
- How does CouchDB automatically sync?
- Will somebody please build something that syncs
automatically on top of SQLite?
- One that tracks changes to my content and gives me a history?
- My version control system gives a history of every
commit, why doesn't my db?
- My version control system lets me rollback to previous
versions of files, why doesn't my db let me rollback
objects?
- Are you willing to give up a little something on top of pure
SQLite to make all this happen?
- The ability to update your objects directly in SQL.
- A little bit bigger SQLite file size.
- Introducing NoSQLite.
- Features.
- NoSQLite talks JSON.
- Exposes a REST oriented HTTP based JSON API.
- Translate your programming objects to JSON and pass them
off for automatic storage in NoSQLite.
- Allows the API to work consistently across many languages
and HTTP.
- Each JSON object is stored in one row in a single table.
- JSON arrays are broken down into multiple rows.
- Each attribute of the JSON object is stored in one column.
- NoSQLite has a powerful JSON predicate search interface.
- This isn't just a key value store.
- Common SQL queries are easily realized with
predicates.
- You can execute arbitrary SQL too from JSON, why not?
- Results are returned in JSON format.
- NoSQLite is SQLite.
- It sounds contradictory but SQL and the relational model
are our friends, we use them for you.
- NoSQLite is built on top of SQL and the relational
model so you don't have to. But you still get most
of the benefits such as ad-hoc queries.
- Since NoSQLite is a plain old simple SQLite file
underneath you get the full power of SQL and can access
SQLite via the many programming wrappers available as
well.
- www.sqlite.org—wiki
<http://www.sqlite.org/cvstrac/wiki?p=SqliteWrappers>
- NoSQLite is versioned.
- NoSQLite tracks changes you make to your objects in a
way that you might find in a version control system like
git.
- NoSQLite tracks immutable snapshots (versions) of your
objects and the commits you make to the database.
- The snapshots are tamper proof via SHA-1 hashes and
are linked together to form a complete history of
your database.
- You can rollback to previous versions of your database or
to an old version of an object.
- Tracking is done in special NoSQLite tables that don't
affect your tables
- You can turn off this tracking if you don't want or need
it.
- NoSQLite is tamper proof.
- With this feature, uuids become SHA-1 hashes are kept of
your content.
- 2 databases can be compared to see if one has been
tampered with.
- This feature can be turned off and non SHA-1 hash uuids
can be used if you don't care.
- NoSQLite works offline and syncs.
- Works in disconnected, offline mode.
- NoSQLite can push and pull from other NoSQLite instances
over HTTP.
- Uses simple syncing algorithms adapted from version
control systems like git and Fossil.
- Object versions with a common parent can be automatically
resolved via default policies or put in a special
conflict table for your application to resolve.
- NoSQLite is mobile.
- No dependencies that prevent it from working on mobile
devices.
- It will work on the iPhone or iPad (where are the really
usable apps are).
- NoSQLite is HTML5 compatible.
- You can use it an HTML5 compatible browser with offline
storage.
- NoSQLite is Core Data compatible.
- It is flexible enough to work with the CoreData schema so
you can use it in the Mac, iPhone or iPad.
- Sync an iPhone, with an iPad, with a desktop, with a
hosted server automatically!
- NoSQLite is fun, fuss-free and forgiving.
- These are qualities of SQLite and NoSQLite believes
strongly in them as well.
- NoSQLite strives to anticipate your needs and stay out of
your way.
- Hopefully using NoSQLite gives you a little bit of
unexpected delight.
- Basic Usage
- NoSQLite works virtually anywhere you can use SQLite.
- At it's most basic here is how you use NoSQLite:
- You use the API to store objects in one db.
- You use the API to store objects in another identical db.
- You can clone the the first db if you want or just
store the exact same types of objects.
- If you want to sync the 2 NoSQLite dbs you start one as a
server.
- NoSQLite has a built-in webserver.
- If you are in a browser or on a mobile device you
might not be able to host a webserver. That's ok,
only one needs to be a server in order to sync.
- You use the API to push and pull your objects from one db
to another.
- Or sync, which is a push and pull at the same time.
- Give me a flavor of the API
- There is the HTTP API
- NoSQLite must be running in server mode.
- Can use from any language that speaks HTTP and JSON
- Each call is specified with a URL and an optional
JSON body
- save
- find
- delete
- And there are 2 basic NoSQLite API wrappers:
- C based wrapper
- All the core functionality of NoSQLite written in
C on top of the NoSQLite db schema
- Built in C based webserver and JSON parsing on
top of SQLite.
- A clean C interface that provides a foundation
for other language bindings
- Javascript based wrapper
- This wrapper implements much of the core
functionality of NoSQLite in pure Javascript so
it can be used in a browser.
- More wrappers will hopefully be written in other
languages soon.
- How To Get Started
- Download NoSQLite
- NoSQLite is a single executable that includes SQLite, a
built in webserver, and a web interface.
- Use the HTTP interface
- Just run the command 'nosqlite server' and goto /
<http://localhost:5000/> and you can access the NoSQLite
web interface.
- A complete interface to SQLite!
- Add and remove objects
- Investigate the NoSQLite API with the API Explorer.
- You can easily add and remove objects with this
console.
- View the history of your database.
- Serves as a full featured SQLite admin interface as
well!
- Or download a wrapper
- C wrapper
- Javascript wrapper
- Documentation
- Logical Description
- Logical Data Model
- Objects
- Anything that can be described by a JSON object
- Object contents are immutable. They are
identified by a SHA-1 hash of their contents
(defined below in the schema).
- Each object when updated logically creates a new
object with a reference to it's parent
- Currently each time you update a new row is
inserted in the database
- Commits - a special object maintained by NoSQLite
that represents database commits
- A representation of a regular database commit,
i.e. one ore more insert or deletes of user
tracked tables within a transaction, excepting
updates or replaces.
- The following items are tracked for each commit
- The unique db_code identify the db on which
the commit was made
- A timestamp of the commit
- The objects which make up the commit
- Optionally, users can pass in an object
describing other attributes to track in the
commit.
- Possible things to track
- The user who made the commit
- A comment from a user
- The geo location where the commit was
made.
- Some other attribute that makes sense to
the application.
- Commit objects are tracked like any other
NoSQLite object
- Together all the commits form a history of all
changes to the database.
- The latest commit uuid for a db is referred
to as the head. This SHA-1 hash represents
all changes every made to the db ensuring the
db is tamper proof.
- Like Git this hash should be stored
somewhere safe if security a concern
- If 2 databases have the same db head they are
representing that they contain the same data.
- Syncing
- NoSQLite offers a git-like syncing interface built on
it's tamper-proof immutable foundation.
- During syncing, the remote DB does not maintain
state, the local db keeps track of what it needs
across HTTP request/response cycles.
- The remote db must be running inside a webserver,
either the built in NoSQLite one or another one.
- The local db acts as an HTTP client.
- 3 sync calls
- pull - the client pulls all objects that it
doesn't have yet.
- push - the client pushes to the server all
objects it doesn't have.
- sync - a push and pull combined at the same time
- Communication between dbs is over HTTP and all data
is exchanged in a canonical JSON format.
- During a syncing operation, remote and local dbs
logically exchange unordered collections of objects
until both dbs have the same set and conflicts are
resolved.
- At the heart of the syncing is an efficient little
mechanism for exchanging objects called clustering.
- This mechanism was adapted from the Fossil
version control system, by D. Richard Hipp, the
founder of SQLite.
- www.fossil-scm.org—sync.wiki
<http://www.fossil-scm.org/index.html/doc/tip/
www/sync.wiki>
- Understanding this mechanism makes it easy to
understand all the sync operations.
- The basics of clustering
- Each db creates objects during it's
lifetime as normal storing them in their
db.
- Each time an object is created, a
reference to the object is also stored in
a table called unclustered
- When the number of objects in this table
reaches a certain limit, the db takes all
the objects currently in it, aggregates
references to them in a new object called
a cluster, deletes all the objects in
unclustered, and then stores the new
cluster object.
- When the the new cluster object is
stored, a reference to it is inserted
into the unclustered table. This is
because the unclustered table holds
references to objects that are not
referenced in a cluster which is the case
with the new cluster object itself.
- So now the unclustered table is empty
except for the one new cluster object
refence. The db keeps adding new objects
until the limit is hit again and then
another cluster is created. This one
holds all the new objects plus the one
cluster.
- Now the unclustered table once again
holds 1 cluster object reference and
there are 2 cluster objects not
referenced by this table.
- This process repeats itself as more
objects are created over the lifetime of
a db.
- To review:
- Each cluster holds at most one cluster
plus (cluster_limit - 1) other objects.
- The sync tree is made up of a straight
line branch of clusters like this
terminating in leaf nodes of unclustered
objects
- The sync tree could look something
like this (where a big O is a cluster
object and a little o is a regular
object).
- O
- O
- O
- O
- O
- ooooooooooO
- As will be seen below, this data structure
enables efficient syncing between a local and
remote db in a 1 or more HTTP
request/response cycles that continue until
all objects and their contents have been
received.
- A few other things to note before stepping through
the sync steps.
- As D.Richard Hipp points out with Fossil, the
actual clustering can take place only by the
remote db when a pull is requested instead of all
dbs creating clusters as they go. This way if
there is practically a master-slave like
configuration then a minimum amount of cluster
objects will be created. But this only an
optimization.
- There are couple other tables that are useful
during this process to keep track of what has
been sent and received.
- phantom - holds references to objects that a
db knows exist in another db but hasn't
received the contents for yet.
- Clusters for example are objects that
tell of the existence of other objects.
When a db processes a cluster, it will
creates phantoms for each object
referenced that does not exist. During
the next loop through the
request/response cycle it will request
those objects.
- already_sent - objects already sent to the
other db. This is a temp table and only
exists for the lifetime of the sync. It
serves as a backup measure to ensure that
objects are never sent twice.
- Syncing works as a series of logical set
operations
- SQL queries and tables are great for
implementing these set operations safely and
efficiently without a lot of code.
- There are no timestamps, last synced point, or
other state kept between dbs.
- What is below does not take into account
security, content verification and the exact
messages that are exchanged. It's only a high
level logical description, although accurate.
- pull
- A local database makes a pull request to a
remote db
- The local db requests all objects in it's
phantom table, a table with references to
objects that the db knows about but hasn't
received the contents of yet.
- The remote db processes the request
- It sends all the objects requested by the
local db.
- If the number of objects in the unclustered
table is >= the cluster limit, the remote db
creates a cluster, using the optimization
mentioned above
- It sends all objects in the unclustered table
that are not in the phantom table.
- The number of objects sent is a practical
limit which might 80% of the time be the
number of objects created since the last
sync allowing for single round trip sync.
- The local db processes the response
- Add any never before seen objects to it's db
- Creates phantoms for any never before seen
objects referenced in a cluster.
- Ignores all other objects.
- Restarts the request/response cycle from the
beginning until there are no more phantom
objects left to request.
- push
- The local db sends to the remote db:
- a push request
- all objects in the unsent table that have
never before been sent.
- all objects in the unclustered table that are
not in the phantom table
- The remote db processes the request.
- Adds any never seen before objects sent by
the local to it's db
- Creates phantoms for any never before seen
objects referenced in a cluster.
- Responds back with a request for any objects
in it's phantom table.
- The local db processes the response
- If the remote db says it needs objects the
request/response cycle is started from the
top by sending the remote db those objects
- sync
- A sync is just a pull and push that happen at the
same time.
- Conflict identification and resolution.
- Schema
- User tables
- No special columns - just user data
- Dates are stored in the ISO-8609 format
- nsl_object- a NoSQLite tracked object
- tbl_name - the name of the table
- oid - the rowid/primary key of the object in it's
table
- uuid - a SHA-1 hash of the object's attributes
- The SQLite piped export format is used as
canonical representation of rows for computing
hashes of objects across all NoSQLite dbs
- Specifically. the SHA-1 hash of
"col1|col2|col3..." as seen by running
sqlite3 foo.db "select * from table_name"
- The columns should appear in the same
order they are in the create table sql.
- Other alternatives were considered like JSON.
The disadvantage of all of them was that
they require intermediate software to compute
the uuid.
- With this format any language, even bash, can
easily compute the hash in a few lines
against the SQLite file.
- The data in the underlying rows itself doesn't
matter, or the number of columns as long as:
- All previous rows inserted are kept immutable.
- You can't remove columns e.g. since
SQLite doesn't allow it.
- The data hashes out the same across all the
SQLite dbs.
- It is up to the user to come up with a
standard for their project for dates and
other fields that could be stored
differently across SQLite dbs.
- nsl_plink - maps objects to their parent or parents
- pid - the rowid of the parent object
- cid - the rowid of the child object
- is_primary - is parent the primary parent
- nsl_commit - represents a history of changes to objects
in the db.
- cid - the rowid, primary key, local to this db.
- uuid - a SHA-1 hash of the commit columns in this
table, excluding the cid
- commit columns
- objects_uuid - a SHA-1 hash of all of the objects
referenced in the commit.
- Specifically a SHA-1 hash of:
- A unique letter describing whether the
object was inserted, deleted, or updated
in the commit.
- I - Inserted
- D - Deleted
- U - Updated
- A space
- The uuid of the object
- This ties the integrity of the commit to the
integrity of the objects which comprise it
- created_at - the time of the commit.
- db_code - a uuid that uniquely identifies each db
- Helper tables - these tables are lower-level helper
tables used by NoSQLite
- nsl_unclustered - any object not referenced yet in a
cluster
- nsl-cluster - a special type of object that
identifies one or more other objects for more
efficient syncing
- nsl-phantom - objects whose contents have not yet
been received from another db
- nsl_remote - aliases for remote NoSQLite dbs
- name - the name of the remote NoSQLite db.
Unique.
- url - the url to the remote NoSQLite db.
- API
- save
- remove
- beginTransaction
- commit
- push
- pull
- sync
- merge
- clone
- migrate_table
- FAQ
- Is NoSQLite a relational database or an object database?
- NoSQLite maintains a simple object database on top of
the unique relational storage that SQLite offers.
Developers can treat NoSQLite as an object database,
simply inserting objects, and getting them back out by
key. NoSQLite doesn't take the relational model too
far, it stores objects in a single table so it doesn't
have to do joins. But because the data is stored
relationally and specifically because it is stored in
SQLite they can also query it with NoSQLite predicates or
directly via SQL.
@mrjjwright
Copy link
Author

Will, this really missing tons of things and is illogical, hate to show you my brain in progress but wanted to give you a flavor of where I am at.

@mrjjwright
Copy link
Author

If anybody wants the OmniOutliner file from which this was exported, let me know.

@jchris
Copy link

jchris commented Jul 29, 2010

this isn't too crazy. see http://github.com/mikeal/IDBCouch for a project which is building a lightweight in-browser CouchDB using HTML5 APIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment