Skip to content

Instantly share code, notes, and snippets.

View greenlion's full-sized avatar

Justin Swanhart greenlion

View GitHub Profile
package com.leapdb.cdc;
import io.debezium.spi.converter.CustomConverter;
import io.debezium.spi.converter.RelationalColumn;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import java.util.Properties;
public class DatetimeValueConverter implements CustomConverter<SchemaBuilder, RelationalColumn> {
private SchemaBuilder datetimeSchema;
@greenlion
greenlion / gist:d54d32426117e06ebfe2a768bfb70bde
Last active August 29, 2022 18:11
BTS OnTime data WARP vs InnoDB
util.import_table("/data/ontime/ontime.csv", {"schema": "ontime", "table": "ontime", "dialect": "csv-unix", "skipRows": 0, "showProgress": "true", "threads": 12})
Load WARP table (11 mins)
File '/data/ontime/ontime.csv' (79.68 GB) was imported in 9 min 58.1429 sec at 133.22 MB/s
Total rows affected in ontime.ontime: Records: 157928010 Deleted: 0 Skipped: 0 Warnings: 0
Load InnoDB table (37.5 mins)
File '/data/ontime/ontime.csv' (79.68 GB) was imported in 37 min 28.6070 sec at 35.44 MB/s
Total rows affected in ontime.ontime_innodb: Records: 157928010 Deleted: 0 Skipped: 0 Warnings: 0
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDarSyqvN5E51BEtzfNHy/9GuP+qNSo5Pm+ky4+zHYmhtHawGuJFyCdsudA/flY/WewcF8ubep5mVYS1/F24qZ9lpan/2AoWlZ37D4ETrmoHO4VaWKWYDW7ukIgievnDfaclkjRVx1H2HE8U9PC8EwzAPsmFH++OCXZfJfbkzROp3Xq45QYyXccWKrajzTS5OGayMkxY5MDpIVx8Ag3D1S/Dtd01O7He0bVKGvllirPoMpnNc/PsPTAWt8R33XiHb+lyWFJuQpzCz8qv2D+zmiOKXYWKMgDf45Vu176sme/EJWvZQkb8GtchmSGZ3lASsU5f1BkCtVji71MlWD1E3DD9zkJvk1sCWesSlQ1jZAnvFLvCiWVmmWXHL+HgemL9ATbSpP41oo2d4JLDp2rsZRSVBvgF06l5R2e/BUNVcPuQVKK4zJa+7tus7A7b/h/kUjrsuD7e5s9SNZJVZ/DHgPhxp4jgL+S48DXowTFZRdhbIOyCyPzSxWNTes03DQAK50= justin@ubuntu
@greenlion
greenlion / optimizer_trace.json
Last active October 23, 2021 02:09
For MySQL bug 105308
{
"steps": [
{
"join_preparation": {
"select#": 1,
"steps": [
{
"expanded_query": "/* select#1 */ select count(0) AS `count(*)` from ((((`lineorder` join `dim_date` on((`lineorder`.`LO_OrderDateKey` = `dim_date`.`D_DateKey`))) join `customer` on((`lineorder`.`LO_CustKey` = `customer`.`C_CustomerKey`))) join `supplier` on((`lineorder`.`LO_SuppKey` = `supplier`.`S_SuppKey`))) join `part` on((`lineorder`.`LO_PartKey` = `part`.`P_PartKey`))) where ((`customer`.`C_Region` = 'AMERICA') and (`supplier`.`S_Region` = 'AMERICA') and ((`part`.`P_MFGR` = 'MFGR#1') or (`part`.`P_MFGR` = 'MFGR#2')))"
},
{
@greenlion
greenlion / star_schema_optimization.txt
Last active October 21, 2021 17:30
star schema optimization!
mysql>
-- This will force the storage engine to examine only a single partition.
set warp_partition_filter='lineorder: p0';
mysql> explain
select straight_join count(*)
from lineorder
join dim_date on lo_orderdatekey = d_datekey
join customer on lo_custkey = c_customerkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
@greenlion
greenlion / gist:70e16e03736d51dee4ec9e0f76d3a1fb
Last active February 5, 2023 02:27
mysql shell py load command for SSB
util.import_table("/data/lineorder.tbl", {"schema": "ssb_sf100_warp", "table": "lineorder", "fieldsTerminatedBy": "|", "skipRows": 0, "showProgress": "true", "threads": 12});
100% (7.95 GB / 7.95 GB), 30.94 MB/s
File '/data/lineorder.tbl' (7.95 GB) was imported in 2 min 31.2739 sec at 52.53 MB/s
Total rows affected in ssb.lineorder: Records: 71985077 Deleted: 0 Skipped: 0 Warnings: 0
MySQL 127.0.0.1:3306 ssl Py > util.import_table("/data/date.tbl", {"schema": "ssb", "table": "dim_date", "dialect": "csv-unix", "skipRows": 0, "showProgress": "true", "threads": 12});
Importing from file '/data/date.tbl' to table `ssb`.`dim_date` in MySQL Server at 127.0.0.1:3306 using 1 thread
[Worker000] date.tbl: Records: 2556 Deleted: 0 Skipped: 0 Warnings: 0
100% (273.42 KB / 273.42 KB), 0.00 B/s
File '/data/date.tbl' (273.42 KB) was imported in 0.0586 sec at 273.42 KB/s
Total rows affected in ssb.dim_date: Records: 2556 Deleted: 0 Skipped: 0 Warnings: 0
I had to make a few minor changes to FV and work around some CONNECT issues, but it does work.
The following queries are on my "slave" which does not actually replicate any data from the master.
The 'test.t1' table is a CONNECT table so all the data is actually read from the master.
The "mvlog" (changelog) tables ARE written to the "slave".
-- Note the 'test.t1' table is empty so the COUNT (cnt) is 0 (t1_v is equivalent to select count(*) as cnt from t1):
MariaDB [test]> select * from t1_v;
mysql> create table Z(id bigint auto_increment primary key, c1 varchar(30), c2 int, key((reverse(c1)),c2)) engine=innodb;
Query OK, 0 rows affected (0.03 sec)
mysql> insert into z values (1, 'abcdef', 1);
ERROR 1146 (42S02): Table 'test.z' doesn't exist
mysql> insert into Z values (1, 'abcdef', 1);
Query OK, 1 row affected (0.01 sec)
mysql> insert into z values (2,'xyzabc', 2);
ERROR 1146 (42S02): Table 'test.z' doesn't exist
@greenlion
greenlion / raster_magic.sql
Last active August 29, 2020 10:24
Complicated SQL to do raster magic
/* get the 32x32 data for a specific chunk */
CREATE OR REPLACE FUNCTION get_chunk_data(in v_x numeric , in v_y numeric, in chunk_size integer default 32)
RETURNS REFCURSOR
LANGUAGE PLPGSQL
AS
$$
DECLARE srtm_1m_rid INTEGER := 0;
DECLARE rows REFCURSOR;
BEGIN
v_x := CEIL(v_x);
@greenlion
greenlion / gist:cc0874dd2bf06c87117608061c31582a
Last active June 10, 2019 17:59
Digital Physics equations
please see differentphysics.com