karuppiah7890/compression-algorithms.md

## compression-algorithms.md

      
Gorilla compression algorithm:
https://faun.pub/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932
https://blog.acolyer.org/2016/05/03/gorilla-a-fast-scalable-in-memory-time-series-database/
https://www.timescale.com/blog/time-series-compression-algorithms-explained/
https://news.ycombinator.com/item?id=23547786
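
The posts linked above describe Gorilla as compressing timestamps with delta-of-delta encoding and values by XOR-ing each value with the previous one. Below is a rough sketch of the timestamp half only, assuming integer timestamps in seconds. The function names `delta_of_delta` and `restore` and the sample values are made up for illustration, and the variable-length bit packing that the real algorithm applies afterwards is left out.

```python
def delta_of_delta(timestamps):
    """Return (first timestamp, first delta, list of delta-of-deltas)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    # Differences between neighbouring timestamps.
    deltas = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    # Differences between neighbouring deltas; near zero for regular sampling.
    dods = [later - earlier for earlier, later in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods


def restore(first, first_delta, dods):
    """Rebuild the original timestamps from the encoded form."""
    timestamps = [first, first + first_delta]
    delta = first_delta
    for dod in dods:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# Hypothetical samples taken roughly every 60 seconds.
timestamps = [1000, 1060, 1120, 1181, 1241]
encoded = delta_of_delta(timestamps)
print(encoded)
print(restore(*encoded) == timestamps)
```

When samples arrive at a nearly constant interval, most delta-of-delta values are zero or very small, which is what makes them cheap to store.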

  
## how-to-learn.md

      
Richard Feynman Technique
RIVERS principle (check Addy Osmani's post)
Learn the basics: basic principles, foundational principles, first principles. A good, solid foundation is key!

  
## notes-1.md

      
What is time series data?

Data recorded over a period of time is time series data

What are some examples of time series data?

Stock price over a period of time.
Weather over a period of time.
CPU usage over a period of time.

What does time series data look like?

For CPU usage over a period of time, an example might look like this:
75% at 10am
74% at 10:10am
73% at 10:20am
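
As a small sketch, the CPU usage example above can be written as a list of (timestamp, value) pairs. The calendar date is an assumption, since the notes only give times of day, and `sampling_intervals` is a hypothetical helper name.

```python
from datetime import datetime

# CPU usage in percent; the date 2024-01-01 is a placeholder assumption.
cpu_usage = [
    (datetime(2024, 1, 1, 10, 0), 75.0),
    (datetime(2024, 1, 1, 10, 10), 74.0),
    (datetime(2024, 1, 1, 10, 20), 73.0),
]


def sampling_intervals(series):
    """Return the gaps between consecutive timestamps."""
    times = [t for t, _ in series]
    return [later - earlier for earlier, later in zip(times, times[1:])]


intervals = sampling_intervals(cpu_usage)
# True only if every gap is identical; this describes this sample, not time series in general.
is_regular = len(set(intervals)) <= 1
print(intervals, is_regular)
```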

Based on the example, it seems like time series data is recorded at a constant interval of time. Is that true?

Not sure.

What are the different ways in which we can represent time series data?

One way is to just write ✍️ down the values with their corresponding time values. That would be a simple written representation, like the CPU usage example above. Another way is to plot the data values against the time values on a graph, with time on the X axis and the data on the Y axis.

  
## notes-2.md

      
What is Timescale DB?

It's open source software: a relational database that runs on top of PostgreSQL as an extension.


What does Timescale mean when it says it can be used for analytical purposes too? Is it an Online Analytical Processing (OLAP) DB, an Online Transaction Processing (OLTP) DB, or a hybrid (HTAP)?


What's the compatibility matrix of Timescale and PostgreSQL? Are all versions of Timescale compatible with all versions of PostgreSQL?


How do you install the Timescale extension on PostgreSQL? Like any other extension, or are there exceptions?


How easy is it to upgrade PostgreSQL and Timescale independently after installing Timescale? And after inserting data into the database using Timescale SQL features?


How can we use Timescale's features using SQL?


What are hyperfunctions? Why are they called that?


How do hyperfunctions make time series easier? Why should it be easy? Does easy come with tradeoffs?


How would one go about storing, updating and retrieving (and deleting) time series data with vanilla PostgreSQL without Timescale extension? Can it be better than Timescale? How is Timescale 10x to 100x faster?


Is Timescale faster than vanilla PostgreSQL, Influx DB, MongoDB? 10x faster? 100x faster? How? Also, data? Benchmark results?


How does one store time series data in Influx DB, in MongoDB?


"Write millions of data points per second per node. Horizontally scale to petabytes. Don’t worry about cardinality."


How is it possible to write ✍️ millions of data points per second per node? What would be the kind and size of the data, and the kind, size and architecture of the node? How can we do horizontal scaling with Timescale DB? Is it true, or are there catches? Also, what is cardinality, and why would anyone worry about it?


How is it possible to scale and store petabytes of data in Timescale DB?


Pros and cons of Timescale? Usage, cost, operational ease etc


What are hypertables in Timescale DB?


What are the different problems that Timescale DB team faced while building Timescale DB and how did they solve them?


What's the difference between Timescale V1 and Timescale V2?


How has Timescale DB evolved over time?


In what language is Timescale DB written, and why? History? Reasoning?


Why did Timescale DB team build Timescale? How did it happen?


How good is Timescale DB? Its performance, stability, reliability? Benchmarks and other results?


Who are Timescale DB users? Which big and small companies (startups etc) use it? How do they use it? What are they saying about their experience and usage of Timescale DB? Pros, cons. Problems. Advantages, disadvantages. Happy things. Sad things. Annoying things. Blog posts about issues and resolutions at tech level or process level etc


Which users (big and small companies, startups, popular individuals) endorse and advocate for Timescale DB? Why?


What are the features required in a DB, especially a time series DB?


Different operations people perform while working with time series data?


Does Timescale DB do indexing? Searching? Sharding? Does it have High Availability? Cluster mode? Multi node? Multi master?


How does Timescale help with cost reduction / optimized cost?


How does Timescale compress data? Does it compress time series data? Or all kinds of data? What compression algorithm(s) does it use? Can we use custom algorithms?


How does Timescale decouple compute and storage? Sounds like a serverless DB 🤔🤨


What kind of data retention policies can we enforce on the time series data in Timescale?


What is downsampling?
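
As a working definition (not taken from the Timescale docs), downsampling reduces the resolution of a series by replacing many raw points with one aggregate per coarser time bucket. Here is a minimal sketch, assuming timestamps are integer seconds and the aggregate is the mean. The function name `downsample` and the sample data are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Average (epoch_seconds, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Round the timestamp down to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical readings every 10 minutes, reduced to one point per hour.
points = [(0, 70.0), (600, 72.0), (1200, 74.0), (3600, 80.0), (4200, 82.0)]
hourly = downsample(points, 3600)
print(hourly)
```

The trade-off is that the individual raw points can no longer be recovered from the hourly averages.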


What's the difference between managed Timescale DB and self hosted Timescale DB in terms of cost (price, plus management costs like paying the team that manages the Timescale DB)? Why would anyone go with self hosted Timescale DB? Self hosted Timescale DB can be a bummer if not done well, right? That would increase costs due to issues / errors / problems. (Error budget etc)


What are users saying about the self hosted Timescale DB and the managed Timescale DB? Given that the managed service is paid, Timescale DB could also be built to be hard to operate, to incentivise buying the managed service, which comes with good management services etc


Is there an enterprise version of Timescale DB with features different from open source Timescale DB? Like Influx DB vs paid Influx DB


Behind the scenes in Timescale DB, how does it provide all the features that it provides? The algorithm, the mechanism, the logic, math, research etc


Below are the features of Timescale DB on the Cloud:
Time-series Analytics

Full SQL
Super-charged PostgreSQL
Full ecosystem of PostgreSQL plug-ins
Hypertable abstraction layer
Automatic chunking / partitioning
Optimized time-based constraint exclusion
Join time-series and relational tables
Built-in flexible time bucketing
Advanced analytical functions (gapfilling, LOCF, interpolation)
Hyperfunctions
Real-time aggregates
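
To make "gapfilling" and "LOCF" (last observation carried forward) from the list above concrete, here is a plain Python sketch. It is not Timescale's implementation. It assumes integer timestamps in seconds on a regular grid, and the function name `locf` and the sample data are hypothetical.

```python
def locf(points, start, end, step):
    """Fill the grid start, start+step, ... (end exclusive) by carrying the last seen value forward."""
    observed = dict(points)
    filled = []
    last = None  # Stays None until the first observation is seen.
    for ts in range(start, end, step):
        if ts in observed:
            last = observed[ts]
        filled.append((ts, last))
    return filled


# Readings exist at 0 s and 1200 s; the 600 s and 1800 s slots are gaps.
points = [(0, 75.0), (1200, 73.0)]
filled = locf(points, 0, 2400, 600)
print(filled)
```

Interpolation would instead estimate each gap from the observations on both sides, rather than repeating the previous one.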

Data Lifecycle management

Automated continuous aggregations
Automated data reordering on disk
Automated data retention policies
Automated downsampling
Automated native data compression
User-defined actions

Operational management

Provider - AWS, Microsoft Azure, Google Cloud Platform
Regions - Choice of 76 deployment regions
Compute - 0.25 to 72 CPUs (why the min and max??)
Provisioned storage - Up to 16TB (why this limit??)
Capacity for uncompressed data - Up to 100+ TBs
Dynamic scaling
Instant pause/resume
High availability - Instantaneous recovery
High availability - Synchronous replicas
Read replicas and database forking
Continuous backup and recovery (what does this even mean? How is it done?? In general and specifically in Timescale DB)
Zero-downtime upgrades (meaning and mechanism?)
Scheduled maintenance windows (why the need? Especially when the upgrades are zero downtime. Or, what other maintenance work apart from upgrades?)
Role-based access control
Data security - Data encrypted at rest and in transit
Advanced network security - VPC peering, IP whitelisting
Compliance - SOC2, ISO 27001, and HIPAA compliance
Support Options - Highly-rated support team available via email, portal, and call-back; Community Slack


## teaching-timescale-db.md

      
Goal: Teach Timescale DB to a newbie audience
About the audience: The audience has heard the term "database", and that's all. They don't know databases, database internals, time series databases or Timescale DB
Questions that might come up from the audience:

Why learn any of these things? Timescale DB, time series data etc?
What is Timescale DB?
What is time series data?
Where is time series data used? What are its applications?
Where is Timescale used? What are its applications? What are its features?
Which companies use Timescale DB?
Why do companies use Timescale DB?
Why use Timescale DB?
What other time series databases are out there?
Which companies use other time series databases instead of Timescale DB? Why do they use other time series databases?
What are the differences between Timescale DB and other time series databases? Is it just yet another time series database? Simply created to be flashy and valuable to sell?
How does one build a time series database on their own? Why would anyone use Timescale DB instead of building their own time series database?
How do you install Timescale DB on a system? Does it work on Linux? Mac? Windows? Does it work on all OS platforms and architectures?
What is PostgreSQL DB? Can we use PostgreSQL DB and not use Timescale DB and still have time series data stored in the DB and write SQL queries on top of PostgreSQL? What's the difference between vanilla PostgreSQL DB containing time series data vs Timescale DB containing time series data?
How does one go about choosing which database to use for their application and use case? Especially for time series data but in general too (based on data model? stats on CRUD operations? what are the things to consider?)
How does Timescale do compression of timeseries data? What's the algorithm?
What is Promscale? https://github.com/timescale/promscale , https://github.com/timescale/promscale_extension
What is Timescale Toolkit? https://github.com/timescale/timescaledb-toolkit

Content formats for explanations

Visual

Diagrams, with and without animations


Writing

Blog posts, Articles
Slides, Presentations


## time-series-db-examples.md

      
What are some examples of Time Series Databases?
OpenTSDB, InfluxDB, Graphite, Timescale DB, Goku (by Pinterest), Gorilla (by Facebook)
Goku - https://medium.com/pinterest-engineering/goku-building-a-scalable-and-high-performant-time-series-database-system-a8ff5758a181
Gorilla - https://jessicagreben.medium.com/four-minute-paper-facebooks-time-series-database-gorilla-800697717d72