@karuppiah7890
Last active October 17, 2022 11:41
Teaching Timescale DB

Richard Feynman Technique

RIVERS principle (check Addy Osmani's post)

Learn basics, basic principles, foundation principles, first principles. A good and solid foundation is key!

  • What is time series data?

Time series data is data recorded over a period of time, with each value tied to the time at which it was recorded.

  • What are some examples of time series data?

Stock price over a period of time.

Weather over a period of time.

CPU usage over a period of time.

  • What does time series data look like?

For CPU usage over a period of time, an example might look like this:

75% at 10am

74% at 10:10am

73% at 10:20am
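
The CPU usage example above can be sketched as a list of (timestamp, value) pairs. This is only an illustration: the calendar date and the variable names are assumptions, since the notes give only times of day. The sketch also measures the gap between consecutive samples, which for this particular example is the same every time.

```python
from datetime import datetime

# The CPU usage example as (timestamp, value) pairs.
# The date is an assumption; the notes only give times of day.
cpu_usage = [
    (datetime(2022, 10, 17, 10, 0), 75.0),
    (datetime(2022, 10, 17, 10, 10), 74.0),
    (datetime(2022, 10, 17, 10, 20), 73.0),
]

# Gap between each pair of consecutive samples, in minutes.
gaps = [
    (later[0] - earlier[0]).total_seconds() / 60
    for earlier, later in zip(cpu_usage, cpu_usage[1:])
]

print(gaps)
```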

  • Based on the example, it seems like time series data is recorded at a constant interval of time. Is that true?

Not sure.

  • What are the different ways in which we can represent time series data?

One way is to just write ✍️ down the values with their corresponding time values. That would be a simple written representation, like the CPU usage example above. Another way is to plot the data values against the time values on a graph, with time on the X axis and the data on the Y axis.
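
As a rough sketch of the second representation, the same CPU usage values can be drawn as a text chart using only the standard library. Note the assumptions: `text_chart` and `scale` are made-up names, and time runs down the page here rather than along an X axis, so this is a stand-in for a real graph and not a replacement for one.

```python
# The CPU usage example, labelled by time of day.
samples = [("10:00", 75), ("10:10", 74), ("10:20", 73)]


def text_chart(points, scale=5):
    """Render one bar per sample; each '#' stands for `scale` percent."""
    lines = []
    for label, value in points:
        bar = "#" * (value // scale)
        lines.append(f"{label} | {bar} {value}%")
    return "\n".join(lines)


chart = text_chart(samples)
print(chart)
```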

  • What is Timescale DB?

It's open source software. It's a relational database that runs on top of PostgreSQL as an extension.

  • What does Timescale mean when it says it can be used for analytical purposes too? Is it an Online Analytical Processing (OLAP) DB? Or an Online Transaction Processing (OLTP) DB? Or a hybrid (HTAP)?

  • What's the compatibility matrix of Timescale and PostgreSQL? Are all versions of Timescale compatible with all versions of PostgreSQL?

  • How to install Timescale extension on PostgreSQL? Like any other extension? Or are there any exceptions?

  • How easy is it to upgrade PostgreSQL and Timescale independently after installing Timescale? And after inserting data into the database? Inserting data using Timescale SQL features

  • How can we use Timescale's features using SQL?

  • What are hyperfunctions? Why are they called that?

  • How do hyperfunctions make time series easier? Why should it be easy? Does easy come with tradeoffs?

  • How would one go about storing, updating and retrieving (and deleting) time series data with vanilla PostgreSQL without Timescale extension? Can it be better than Timescale? How is Timescale 10x to 100x faster?

  • Is Timescale faster than vanilla PostgreSQL, Influx DB, MongoDB? 10x faster? 100x faster? How? Also, data? Benchmark results?

  • How does one store time series data in Influx DB, in MongoDB?

  • "Write millions of data points per second per node. Horizontally scale to petabytes. Don’t worry about cardinality."

  • How is it possible to write ✍️ millions of data points per second per node? What would be the kind and size of data, and the kind, size and architecture of the node? How come we can do horizontal scaling with Timescale DB? Is it true? Or are there catches in horizontal scaling? Also, what is cardinality? And why would anyone worry about cardinality?

  • How is it possible to scale and store petabytes of data in Timescale DB?

  • Pros and cons of Timescale? Usage, cost, operational ease etc

  • What are hypertables in Timescale DB?

  • What are the different problems that Timescale DB team faced while building Timescale DB and how did they solve them?

  • What's the difference between Timescale V1 and Timescale V2?

  • How has Timescale DB evolved over time?

  • In what language is Timescale DB written, and why? History? Reasoning?

  • Why did Timescale DB team build Timescale? How did it happen?

  • How good is Timescale DB? Its performance, stability, reliability? Benchmarks and other results?

  • Who are Timescale DB users? Which big and small companies (startups etc) use them? How do they use them? What are they saying about their experience and usage of Timescale DB? Pros, cons. Problems. Advantages, disadvantages. Happy things. Sad things. Annoying things. Blog posts about issues and resolutions at tech level or process level etc

  • Which users (big and small companies, startups, popular individuals) endorse and advocate for Timescale DB? Why?

  • What are the features required in a DB, especially a time series DB?

  • Different operations people perform while working with time series data?

  • Does Timescale DB do indexing? Searching? Sharding? Does it have High Availability? Cluster mode? Multi node? Multi master?

  • How does Timescale help with cost reduction / optimized cost?

  • How does Timescale compress data? Does it compress time series data? Or all kinds of data? What compression algorithm(s) does it use? Can we use custom algorithms?

  • How does Timescale decouple compute and storage? Sounds like a serverless DB 🤔🤨

  • What kind of data retention policies can we enforce on the time series data in Timescale?

  • What is downsampling?

  • What's the difference between managed Timescale DB and self hosted Timescale DB? In terms of cost (price, management costs like paying the team managing the Timescale DB). Why would anyone go with self hosted Timescale DB? Self hosted Timescale DB can be a bummer if not done well, right? And so it can increase costs due to issues / errors / problems. (Error budget etc)

  • What are users saying about the self hosted Timescale DB and the managed Timescale DB? Given the managed service is paid, Timescale DB could be built to be hard to operate too, to incentivise getting the managed service, which has good management services etc

  • Is there an enterprise version of Timescale DB? That has features different from open source Timescale DB. Like Influx DB vs paid Influx DB

  • Behind the scenes in Timescale DB, how does it provide all the features that it provides? The algorithm, the mechanism, the logic, math, research etc
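
For the question above about storing, updating, retrieving and deleting time series data without the Timescale extension, here is a minimal sketch of the plain relational approach. It uses Python's built-in `sqlite3` as a stand-in for vanilla PostgreSQL, which is an assumption made only so the example runs without a server. The table name, column names and host name are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cpu_usage ("
    " ts TEXT NOT NULL,"           # ISO-8601 text sorts in time order
    " host TEXT NOT NULL,"
    " usage_percent REAL NOT NULL)"
)
# An index on the time column keeps time-range queries from scanning everything.
conn.execute("CREATE INDEX idx_cpu_usage_ts ON cpu_usage (ts)")

# Store
rows = [
    ("2022-10-17 10:00:00", "host-a", 75.0),
    ("2022-10-17 10:10:00", "host-a", 74.0),
    ("2022-10-17 10:20:00", "host-a", 73.0),
]
conn.executemany("INSERT INTO cpu_usage VALUES (?, ?, ?)", rows)

# Update one reading
conn.execute(
    "UPDATE cpu_usage SET usage_percent = ? WHERE ts = ? AND host = ?",
    (74.5, "2022-10-17 10:10:00", "host-a"),
)

# Retrieve a time range
recent = conn.execute(
    "SELECT ts, usage_percent FROM cpu_usage"
    " WHERE ts >= ? AND ts < ? ORDER BY ts",
    ("2022-10-17 10:05:00", "2022-10-17 10:30:00"),
).fetchall()

# Delete old readings
conn.execute("DELETE FROM cpu_usage WHERE ts < ?", ("2022-10-17 10:05:00",))
remaining = conn.execute("SELECT COUNT(*) FROM cpu_usage").fetchone()[0]

print(recent, remaining)
```

This says nothing about speed. Whether this approach can match Timescale, and where the claimed 10x to 100x comes from, is still the open question.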

Below are the features of Timescale DB on the Cloud:

Time-series Analytics

  • Full SQL
  • Super-charged PostgreSQL
  • Full ecosystem of PostgreSQL plug-ins
  • Hypertable abstraction layer
  • Automatic chunking / partitioning
  • Optimized time-based constraint exclusion
  • Join time-series and relational tables
  • Built-in flexible time bucketing
  • Advanced analytical functions (gapfilling, LOCF, interpolation)
  • Hyperfunctions
  • Real-time aggregates
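
To get a feel for two of the items above, time bucketing and LOCF (last observation carried forward) gap filling, here is a minimal sketch in plain Python. It is not how Timescale implements these features; the helper names, the 30 minute bucket width and the sample data are all assumptions for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def time_bucket(ts, width):
    """Round `ts` down to the start of its bucket of size `width`."""
    return EPOCH + ((ts - EPOCH) // width) * width


def bucket_averages(points, width):
    """Average the values that fall into each bucket."""
    sums = {}
    for ts, value in points:
        start = time_bucket(ts, width)
        total, count = sums.get(start, (0.0, 0))
        sums[start] = (total + value, count + 1)
    return {start: total / count for start, (total, count) in sums.items()}


def locf(averages, start, end, width):
    """Fill empty buckets in [start, end) with the last observed value."""
    filled, last = [], None
    current = start
    while current < end:
        last = averages.get(current, last)
        filled.append((current, last))
        current += width
    return filled


points = [
    (datetime(2022, 10, 17, 10, 0), 75.0),
    (datetime(2022, 10, 17, 10, 10), 74.0),
    (datetime(2022, 10, 17, 10, 20), 73.0),
    (datetime(2022, 10, 17, 11, 40), 70.0),  # nothing recorded 10:30-11:30
]
width = timedelta(minutes=30)
averages = bucket_averages(points, width)
filled = locf(
    averages,
    datetime(2022, 10, 17, 10, 0),
    datetime(2022, 10, 17, 12, 0),
    width,
)
for bucket_start, value in filled:
    print(bucket_start, value)
```

The two buckets with no readings (10:30 and 11:00) repeat the previous bucket's average instead of showing up as gaps.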

Data Lifecycle management

  • Automated continuous aggregations
  • Automated data reordering on disk
  • Automated data retention policies
  • Automated downsampling
  • Automated native data compression
  • User-defined actions
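
Two of the items above, data retention policies and downsampling, can be sketched in plain Python to show the idea. This is only an illustration under assumed names and values (`downsample_hourly`, `apply_retention`, a 6 hour retention window); it is not how Timescale automates either one.

```python
from datetime import datetime, timedelta
from statistics import mean


def downsample_hourly(points):
    """Replace the raw readings in each hour with a single average."""
    by_hour = {}
    for ts, value in points:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        by_hour.setdefault(hour, []).append(value)
    return [(hour, mean(values)) for hour, values in sorted(by_hour.items())]


def apply_retention(points, now, keep_for):
    """Keep only the readings that are newer than `now - keep_for`."""
    cutoff = now - keep_for
    return [(ts, value) for ts, value in points if ts >= cutoff]


raw = [
    (datetime(2022, 10, 16, 9, 0), 60.0),   # older than the retention window
    (datetime(2022, 10, 17, 10, 0), 75.0),
    (datetime(2022, 10, 17, 10, 10), 74.0),
    (datetime(2022, 10, 17, 10, 20), 73.0),
    (datetime(2022, 10, 17, 11, 0), 70.0),
]
now = datetime(2022, 10, 17, 12, 0)

hourly = downsample_hourly(raw)
kept = apply_retention(raw, now, timedelta(hours=6))

print(hourly)
print(len(kept))
```

Downsampling trades detail for size: five raw readings become three hourly values. Retention simply drops what is too old to be worth keeping.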

Operational management

  • Provider - AWS, Microsoft Azure, Google Cloud Platform
  • Regions - Choice of 76 deployment regions
  • Compute - 0.25 to 72 CPUs (why the min and max??)
  • Provisioned storage - Up to 16TB (why this limit??)
  • Capacity for uncompressed data - Up to 100+ TBs
  • Dynamic scaling
  • Instant pause/resume
  • High availability - Instantaneous recovery
  • High availability - Synchronous replicas
  • Read replicas and database forking
  • Continuous backup and recovery (what does this even mean? How is it done?? In general and specifically in Timescale DB)
  • Zero-downtime upgrades (meaning and mechanism?)
  • Scheduled maintenance windows (why the need? Especially when the upgrades are zero downtime. Or, what other maintenance work apart from upgrades?)
  • Role-based access control
  • Data security - Data encrypted at rest and in transit
  • Advanced network security - VPC peering, IP whitelisting
  • Compliance - SOC2, ISO 27001, and HIPAA compliance
  • Support Options - Highly-rated support team available via email, portal, and call-back; Community Slack

Goal: Teach Timescale DB to a newbie audience

About the audience: The audience have heard the term database, that's all. They don't know databases, database internals or time series database or Timescale DB

Questions that might come up from the audience:

  • Why learn any of these things? Timescale DB, time series data etc?
  • What is Timescale DB?
  • What is time series data?
  • Where is time series data used? What are its applications?
  • Where is Timescale used? What are its applications? What are its features?
  • Which companies use Timescale DB?
  • Why do companies use Timescale DB?
  • Why use Timescale DB?
  • What other time series databases are out there?
  • Which companies use other time series databases instead of Timescale DB? Why do they use other time series databases?
  • What are the differences between Timescale DB and other time series databases? Is it just yet another time series database? Simply created to be flashy and valuable to sell?
  • How does one build a time series database on their own? Why would anyone use Timescale DB instead of building their own time series database?
  • How to install Timescale DB in a system? Does it work on Linux? Mac? Windows? Does it work on all OS platforms and architectures?
  • What is PostgreSQL DB? Can we use PostgreSQL DB and not use Timescale DB and still have time series data stored in the DB and write SQL queries on top of PostgreSQL? What's the difference between vanilla PostgreSQL DB containing time series data vs Timescale DB containing time series data?
  • How does one go about choosing which database to use for their application and use case? Especially for time series data but in general too (based on data model? stats on CRUD operations? what are the things to consider?)
  • How does Timescale compress time series data? What's the algorithm?
  • What is Promscale? https://github.com/timescale/promscale , https://github.com/timescale/promscale_extension
  • What is Timescale Toolkit? https://github.com/timescale/timescaledb-toolkit

Content formats for explanations

  • Visual
    • Diagrams, with and without animations
  • Writing
    • Blog posts, Articles
    • Slides, Presentations