- Currently, all log information is in the `logging` table, which as of now holds around ~2,13,92,493 (~21.4 million) records.
- Total data size is ~36 GB (retrieved from the `information_schema` tables).
- A record can be anywhere between 1 KB and 10 KB depending on the amount of data stored (roughly 1.8 KB on average).
- With the current indexes on the table, logs for a single user can be fetched quickly: the query does a `ref` lookup against a `const` value on the `logging_member_id_foreign` index.
```
mysql> EXPLAIN SELECT * FROM `logging` WHERE `member_id` = 45373 ORDER BY `id` DESC LIMIT 100 OFFSET 0;
+----+-------------+---------+------------+------+---------------------------+---------------------------+---------+-------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys             | key                       | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------------------+---------------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | logging | NULL       | ref  | logging_member_id_foreign | logging_member_id_foreign | 5       | const |    1 |   100.00 | Using where |
+----+-------------+---------+------------+------+---------------------------+---------------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.05 sec)
```
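As a quick sanity check on the numbers above (a back-of-the-envelope calculation, not a measurement from the live system), ~36 GB spread over ~21.4 million rows averages out to roughly 1.8 KB per record, which is consistent with the stated 1 KB to 10 KB range:

```sql
-- Back-of-the-envelope: average record size from the totals quoted above.
SELECT 36 * POW(1024, 3) / 21392493 / 1024 AS avg_record_kb;
-- works out to roughly 1.8 KB per record
```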
I have evaluated which of the available datastores might fit the need; here's an overview:
| Name | Description | PHP/Laravel Package | Recommended |
|---|---|---|---|
| ClickHouse | ClickHouse® is an open-source, high-performance columnar OLAP database management system for real-time analytics using SQL. | https://github.com/smi2/phpClickHouse, https://github.com/bavix/laravel-clickhouse | Yes |
| Elasticsearch | Elasticsearch, Kibana, Beats, and Logstash (also known as the ELK Stack). Reliably and securely take data from any source, in any format, then search, analyze, and visualize it in real time. | https://github.com/elastic/elasticsearch-php, https://github.com/sleimanx2/plastic | Yes |
| PrestoDB | Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. | No well-maintained PHP client | No, lacks good integration |
| InfluxDB | At its heart, InfluxDB is a database purpose-built to handle the epic volumes and countless sources of time-stamped data produced by sensors, applications and infrastructure. | https://github.com/influxdata/influxdb-client-php | No, unrelated to our need |
| Apache HBase | Apache HBase™ is good when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows × millions of columns) atop clusters of commodity hardware. | No well-maintained PHP client, requires the Thrift protocol | No, lacks good integration |
| Tarantool | Tarantool keeps all the data in random-access memory (RAM). Solutions of this class are fast but often demonstrate a lack of data integrity; Tarantool solves these problems. | https://github.com/tarantool-php/client | No, unrelated to our need |
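To make the ClickHouse option more concrete, here is a rough sketch of what the table could look like there. The column names and types are illustrative assumptions, not the real schema:

```sql
-- Hypothetical ClickHouse table for the same logs: columnar MergeTree storage,
-- ordered by (member_id, id) so per-user lookups stay cheap.
-- Column names/types are assumptions, not the actual `logging` schema.
CREATE TABLE logging
(
    id         UInt64,
    member_id  UInt32,
    message    String,
    created_at DateTime
)
ENGINE = MergeTree
PARTITION BY toYear(created_at)
ORDER BY (member_id, id);
```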
Possible approaches:

- Keep using MySQL in a separate, dedicated database and, if needed in the future, introduce sharding based on some logic (e.g. by year).
- Introduce Elasticsearch, which has a good official PHP client and popular Laravel packages. It also ships with bundled tools for data visualization, which might come in handy.
- Introduce ClickHouse, which has a community-developed PHP client and a less widely used Laravel package. It features SQL-like syntax, so it could also serve financial data analytics and dashboards.
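For the first option, one way year-based sharding could be realized is MySQL's native RANGE partitioning. This is a sketch only: it assumes the table has a `created_at` column, MySQL requires the partitioning column to be part of every unique key (so the primary key would have to be widened), and partitioned InnoDB tables cannot carry foreign keys, so `logging_member_id_foreign` would need to become a plain index first:

```sql
-- Sketch: widen the PK so it includes the partitioning column
-- (MySQL requires every unique key to contain it).
ALTER TABLE logging DROP PRIMARY KEY, ADD PRIMARY KEY (id, created_at);

-- Then partition by year; old years can later be archived or dropped cheaply.
ALTER TABLE logging PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2020 VALUES LESS THAN (2021),
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

Dropping an entire year then becomes `ALTER TABLE logging DROP PARTITION p2020;`, which is far cheaper than a bulk `DELETE`.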