Context: static.wiki and Show HN post
We downloaded static.wiki's 40.3 GiB SQLite database of English Wikipedia and created a compressed version of it with sqlite_zstd_vfs, our read/write Zstandard compression layer for SQLite3. The compressed version is 10.4 GiB (26%), and the VFS supports HTTP random access in the spirit of the original (although we don't yet have a WebAssembly build; it's a library for CLI & desktop apps for now). You can try it out on Linux or macOS x86-64:
pip3 install genomicsqlite
genomicsqlite https://f000.backblazeb2.com/file/mlin-public/static.wiki/en.zstd.db \
"select text from wiki_articles where title = 'SQLite'"
(Replace the query with any other, use .schema to see the schema, or omit it to enter the sqlite3 interactive REPL.)
sqlite_zstd_vfs is a building block of our Genomics Extension for SQLite, which bundles it along with other domain-specific features. This demonstrates a general-purpose use case of the compression layer originally intended for storing genomics big data.
If you want to download the whole 10 GiB compressed database, please get it from Zenodo instead of B2 so that I'm not charged. You can then use the genomicsqlite CLI with the local file in place of the URL above, or write a program using any of the GenomicSQLite language bindings.
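For the language-bindings route, a minimal Python sketch might look like the following. It assumes the Python binding exposes genomicsqlite.connect(path, read_only=True) and returns a standard sqlite3.Connection, and that the table and column names match the query shown earlier. The names fetch_article and connect are hypothetical, introduced only for illustration; the connect parameter lets you swap in another opener when experimenting without the package installed.

```python
import sqlite3
from typing import Callable, Optional


def fetch_article(
    db_path: str,
    title: str,
    connect: Optional[Callable[[str], sqlite3.Connection]] = None,
) -> Optional[str]:
    """Return the text of one article by exact title, or None if absent.

    `fetch_article` and `connect` are illustrative names, not part of
    GenomicSQLite itself.
    """
    if connect is None:
        # Assumption: the binding installed by `pip3 install genomicsqlite`
        # provides connect(path, read_only=True) -> sqlite3.Connection.
        import genomicsqlite

        def connect(path: str) -> sqlite3.Connection:
            return genomicsqlite.connect(path, read_only=True)

    conn = connect(db_path)
    try:
        # Parameterized form of the query from the CLI example.
        row = conn.execute(
            "SELECT text FROM wiki_articles WHERE title = ?", (title,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()
```

With the compressed file downloaded locally, a call such as fetch_article("en.zstd.db", "SQLite") should correspond to the CLI query above, provided the assumptions hold.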
We used the following commands to generate en.zstd.db from static.wiki's original en.db:
genomicsqlite en.db --compact --inner-page-KiB 64 --outer-page-KiB 2 --level 19 -o en.zstd.db
genomicsqlite --dbi en.zstd.db
The second command generates a .dbi helper file to be served alongside the main database file, which the extension can use (optionally) to streamline web access.
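To illustrate why a page-oriented compressed database can still be read at random offsets, here is a toy Python sketch of the general idea: compress fixed-size pages independently and keep an index of where each one lives, so a reader only fetches and decompresses the pages a query touches. This is not the actual storage format of sqlite_zstd_vfs and does not model the .dbi file. It uses zlib from the standard library as a stand-in for Zstandard, and the 64 KiB page size simply echoes the --inner-page-KiB 64 option above. All function names are hypothetical.

```python
import zlib
from typing import List, Tuple

PAGE_SIZE = 64 * 1024  # illustrative; echoes --inner-page-KiB 64


def compress_pages(
    data: bytes, page_size: int = PAGE_SIZE, level: int = 9
) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Compress each page on its own; return (blob, index).

    index[i] is the (offset, length) of compressed page i within blob.
    """
    blob = bytearray()
    index: List[Tuple[int, int]] = []
    for start in range(0, len(data), page_size):
        packed = zlib.compress(data[start:start + page_size], level)
        index.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), index


def read_page(blob: bytes, index: List[Tuple[int, int]], page_no: int) -> bytes:
    """Decompress a single page without touching any other page."""
    offset, length = index[page_no]
    # Over HTTP, this slice would correspond to one byte-range request.
    return zlib.decompress(blob[offset:offset + length])
```

Larger pages tend to compress better but cost more bytes per random read, which is the trade-off a page-size setting controls in a scheme like this.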