This is a (hacky) implementation of full-text search in Python for Mastodon. Built and tested with Python 3.10.
Errors and disconnections are not handled, so you'll need to add that handling yourself for anything more robust and production-ready. If you just want to monitor certain keywords via SQL queries against a real-time stream, however, this simple implementation may suit your needs.
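If you do want basic disconnection handling, one common approach is to restart the stream with exponential backoff. The following is only a minimal sketch, not part of stream.py: `run_with_reconnect` and its `start_stream` argument are hypothetical names, and `start_stream` is assumed to be a zero-argument callable that blocks while the stream is healthy and raises when the connection drops.

```python
import time


def run_with_reconnect(start_stream, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run start_stream(), restarting it with exponential backoff when it raises.

    start_stream is a hypothetical zero-argument callable (an assumption, not
    something defined in stream.py). Returns the number of retries that were
    needed before the stream ended cleanly.
    """
    attempt = 0
    while True:
        try:
            start_stream()
            return attempt  # the stream ended without raising
        except Exception as exc:
            attempt += 1
            if attempt > max_retries:
                raise  # give up and surface the last error
            # Delay doubles on each retry: base_delay, 2x, 4x, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"stream failed ({exc!r}); retry {attempt}/{max_retries} in {delay}s")
            sleep(delay)
```

Assuming stream.py starts the stream with Mastodon.py's `stream_public(listener)`, you could wrap that call as `run_with_reconnect(lambda: mastodon.stream_public(listener))`, where `mastodon` and `listener` stand for whatever objects the script already creates.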
You can change the instance URL on Line 28 (default: mastodon.social) to fetch the federated timeline from somewhere else, if you'd like. Big, well-federated instances will obviously expose the richest streams.
Install dependencies:
pip3 install Mastodon.py jsonpickle bs4
Begin capturing stream:
python3 stream.py
You can then open the "streaming.db" SQLite database and begin querying it. Note that only statuses created after you begin capturing the stream will be searchable. Here's an example of how to search:
sqlite3
.open streaming.db
SELECT * FROM statuses WHERE statuses MATCH 'cats' ORDER BY created_at DESC;
The above example will return all records matching the term "cats", sorted in reverse chronological order. For more advanced querying options, read the SQLite FTS5 documentation.
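The same search can also be run from Python with the standard-library sqlite3 module. This is just a sketch: `search_statuses` is a hypothetical helper, it assumes only what the query above shows (an FTS5 table named `statuses` with a `created_at` column), and it requires a Python build whose SQLite includes FTS5.

```python
import sqlite3


def search_statuses(conn, term, limit=20):
    """Return rows from the FTS5 table `statuses` matching `term`, newest first.

    `term` is passed as a bound parameter, but it is still interpreted as an
    FTS5 query string, so FTS5 operators inside it keep their meaning.
    """
    query = (
        "SELECT * FROM statuses WHERE statuses MATCH ? "
        "ORDER BY created_at DESC LIMIT ?"
    )
    return conn.execute(query, (term, limit)).fetchall()
```

For example, `search_statuses(sqlite3.connect("streaming.db"), "cats")` would return up to 20 matching rows as tuples.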
Update: I just realized that Mastodon's implementation of Snowflake IDs means that IDs are not guaranteed to be unique across instances, so the on_delete function, which relies on the ID field, may sometimes not work correctly. It's hard to tell. A better solution would be to delete based on the URL of the post, or on some concatenation of fields that forms a universally unique identifier.
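A rough sketch of the URL-based variant is shown here. It is not what stream.py currently does: `delete_by_url` is a hypothetical helper, and it assumes the `statuses` table stores each post's URL in a column named `url`. How the delete handler would obtain the URL for a given delete event is left open.

```python
import sqlite3


def delete_by_url(conn, url):
    """Delete every row in `statuses` whose `url` column equals `url`.

    The `url` column name is an assumption about the table schema.
    """
    conn.execute("DELETE FROM statuses WHERE url = ?", (url,))
    conn.commit()
```

Because a post's URL includes the host of its home instance, two posts that happen to share a numeric ID would still be deleted independently.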