You have a database table with the daily number of hits to your web site. Before you run some calculations, you'd like to check the time span in the database. Here's an example code:
https://gist.github.com/8a60e0aef08e4ae8fd6f9e31daa0b594
When you run this code, you get the following error:
https://gist.github.com/c6502c905cb49415484284b45ced879b
What!? SQL does have a TIMESTAMP
type, and it should be translated to Python's datetime
(or in pandas, pd.Timestamp
).
The answer lies with how SQLite implements types. The documentation says:
Any column in an SQLite version 3 database, except an INTEGER PRIMARY KEY column, may be used to store a value of any storage class.
The built-in sqlite3
follows one of the rules in The Zen of Python: "
In the face of ambiguity, refuse the temptation to guess."
This means we need that when we read from an SQLite database, we need to explicitly convert TIMESTAMP
. We can use Pandas to_datetime
function, but there's a better way. sqlite3
's connect
function accepts a detect_types
parameter telling it how to detect types in the database. I use the PARSE_DECLTYPES
option to parse the types from the SQL schema.
Here's our fix:
https://gist.github.com/d9d9a108e15914a8b40f874737bff7e8
And now the code runs as expected:
https://gist.github.com/0438cc286f1ea31ff29cc66057c1c4fc
The SQLite developers are debating adding string tables that will be more strict with data types.
Don't take it as a jab at SQLite, I love it and used is successfully with several client implementing fast and efficient data transfer. If you want to learn more about Pandas and some quirks you should avoid, grab my Pandas Brain Teasers, it's full of such gems.