@Krazybug
Last active March 22, 2024 19:28
Calishot Howto

What is it?

CALISHOT is a specialised search engine to unearth books on open calibre servers.

It lets you run full-text searches across all of these servers, or browse the database by facet: authors, language, year, series, tags, and so on. You can even run your own SQL queries.

Where is it?

These servers are often up and down so, for now, the data are regularly updated and new snapshots are posted on ... you know, the first rule of the club.

So, stay tuned!

Here is a list of regular mirrors (keep in mind that they are not always online).

English books:

  1. Mirror 1
  2. Mirror 2
  3. Mirror 3
  4. Mirror 4

Non-English books:

  1. Mirror 1
  2. Mirror 2
  3. Mirror 3
  4. Mirror 4

What's under the hood?

Well... It's not so glorious (on our side). This is just a simple SQLite db, indexed for full-text search with this powerful extension.

The web UI and the server are powered by an awesome project that can serve a db as is: Datasette

OK, I just need the dataset. How can I play with it?

Again, join the club, and download the db!

Then, just use your favorite SQLite client. We strongly advise you to use another gem from this first-class fighter: sqlite-utils (pipx is your friend).

Let's say we want Asimov's works:

sqlite-utils index-eng.db --json-cols 'select title, authors, year from summary where instr(authors, "Asimov") > 0 order by title'
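If you would rather script this than use the command line, the same query can be sketched with Python's built-in sqlite3 module. This is only an illustration on a toy in-memory table: the three-column schema and the sample rows are assumptions, with authors stored as a JSON array as the --json-cols option suggests. Point sqlite3.connect at your downloaded db file to try it for real.

```python
import json
import sqlite3


def books_by_author(conn, name):
    """Return (title, authors list, year) for rows whose authors contain `name`."""
    rows = conn.execute(
        "select title, authors, year from summary "
        "where instr(authors, ?) > 0 order by title",
        (name,),
    ).fetchall()
    # authors is assumed to be stored as a JSON array, so decode it
    return [(title, json.loads(authors), year) for title, authors, year in rows]


# Toy stand-in for the real db (hypothetical schema and rows)
conn = sqlite3.connect(":memory:")
conn.execute("create table summary (title text, authors text, year integer)")
conn.executemany(
    "insert into summary values (?, ?, ?)",
    [
        ("I, Robot", json.dumps(["Isaac Asimov"]), 1950),
        ("Foundation", json.dumps(["Isaac Asimov"]), 1951),
        ("Dune", json.dumps(["Frank Herbert"]), 1965),
    ],
)

result = books_by_author(conn, "Asimov")
for title, authors, year in result:
    print(title, authors, year)
```

Note that instr() is case-sensitive, so searching for "asimov" in lowercase finds nothing in this toy table.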

Not only can you run regular SQL queries, you can also run full-text queries, and you get a JSON dataset for free. Note that full-text boolean operators such as AND must be uppercase; a lowercase "and" is searched as an ordinary word:

sqlite-utils index-eng.db --json-cols 'select * from summary_fts where title match("robots")'
sqlite-utils index-eng.db --json-cols 'select * from summary_fts where summary_fts match("robots")'
sqlite-utils index-eng.db --json-cols 'select * from summary_fts where summary_fts match("title:robots AND formats:epub")'

Or, more simply, with recent versions of sqlite-utils:

sqlite-utils search --json-cols index-eng.db summary "robots"
sqlite-utils search --json-cols index-eng.db summary "title:robots AND formats:epub"
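To see how column filters and boolean operators behave in such queries, here is a minimal Python sketch on a toy table. It assumes the index is built with SQLite's FTS5 extension (the extension is not named above) and that your Python's SQLite build includes FTS5; the table contents are made up for illustration.

```python
import sqlite3

# Toy full-text table (hypothetical columns and rows, FTS5 assumed)
conn = sqlite3.connect(":memory:")
conn.execute("create virtual table summary_fts using fts5(title, authors, formats)")
conn.executemany(
    "insert into summary_fts values (?, ?, ?)",
    [
        ("The Robots of Dawn", "Isaac Asimov", "epub mobi"),
        ("Robots and Empire", "Isaac Asimov", "pdf"),
        ("Dune", "Frank Herbert", "epub"),
    ],
)


def search(query):
    """Run a raw full-text query against every column and return matching titles."""
    sql = "select title from summary_fts where summary_fts match ? order by title"
    return [row[0] for row in conn.execute(sql, (query,))]


any_column = search("robots")
filtered = search("title:robots AND formats:epub")
# Lowercase "and" is not an operator: it becomes a third required search term
lowercase = search("title:robots and formats:epub")
print(any_column, filtered, lowercase)
```

In this toy data the lowercase variant returns nothing: the only epub with "robots" in its title does not contain the word "and" anywhere.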

Prefer a CSV? No worries (jq is also your friend):

sqlite-utils index-eng.db 'select title, authors, year from summary where instr(authors, "Asimov") > 0 order by authors limit 100' --json-cols | jq -r '.[] | [.title.label, .authors[0], .year] | @csv'
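If jq is not at hand, the same reshaping can be sketched in Python with the standard csv module. The record shape is inferred from the jq filter above (title as an object with a label key, authors as an array); the sample record and its href value are made up.

```python
import csv
import io
import json


def rows_to_csv(rows):
    """Flatten records to CSV lines: title label, first author, year."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([row["title"]["label"], row["authors"][0], row["year"]])
    return buf.getvalue()


# Hypothetical sample of what the --json-cols output might look like
sample = json.loads(
    '[{"title": {"label": "I, Robot", "href": "http://example.invalid/1"},'
    ' "authors": ["Isaac Asimov"], "year": 1950}]'
)

csv_text = rows_to_csv(sample)
print(csv_text, end="")
```

Unlike jq's @csv, which quotes every string, csv.writer only quotes fields that need it, such as titles containing a comma.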

Any way to self-host the search engine internally?

  1. Install datasette and its plugins using venv/pip:
python -m venv calishot
. ./calishot/bin/activate
pip install datasette
pip install datasette-json-html
pip install datasette-pretty-json
  2. Prepare the calishot settings:

Move the SQLite db file to the same directory, name it index.db (the metadata below refers to the database as "index"), and then:

cat <<EOF > metadata.json
{
  "databases": {
    "index": {
      "tables": {
        "summary": {
          "sort": "title",
          "searchmode": "raw"
        }
      }
    }
  }
}
EOF

You can now run a local test:

datasette serve index.db --setting sql_time_limit_ms 10000 --setting allow_download off --setting max_returned_rows 2000 --setting num_sql_threads 10 --metadata metadata.json

Open your browser to http://localhost:8001/ and check the result.
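Besides the browser, the local server can also be queried from a script through Datasette's JSON API. The sketch below builds a table search URL using the _search, _shape and _size query parameters; the database and table names follow the setup above, while the helper names and the default page size are illustrative assumptions. Only the URL is built here; calling search() requires the server to be running.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8001"  # local test server from the step above


def build_search_url(query, database="index", table="summary", size=20):
    """Build the JSON API URL for a full-text search on one table."""
    params = urlencode({"_search": query, "_shape": "array", "_size": size})
    return f"{BASE_URL}/{database}/{table}.json?{params}"


def search(query):
    """Fetch matching rows as a list of dicts (needs the server to be up)."""
    with urlopen(build_search_url(query)) as response:
        return json.load(response)


url = build_search_url("title:robots AND formats:epub")
print(url)
```

With "searchmode": "raw" set in metadata.json, the search string should reach the full-text index unescaped, so column filters like title:robots keep working.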

Any way to set up a mirror?

Install heroku-cli, then:

heroku login -i

datasette publish heroku index.db -n calishot-3 --install=datasette-json-html --install=datasette-pretty-json --extra-options="--setting sql_time_limit_ms 10000 --setting allow_download off --setting num_sql_threads 10 --setting max_returned_rows 500" --metadata metadata.json
@lbe
lbe commented Nov 13, 2021

First, Thank You for your labor of love!

I love your front end in Datasette. I would really like to get a copy of your SQLite db. Per the above, I think you have it accessible somewhere, but I can't seem to find it. Please advise!

@Krazybug
Author

@lbe Judging by your username, I guess I just answered your question on Reddit.
Let me know if I'm wrong.
