Skip to content

Instantly share code, notes, and snippets.

CREATE TABLE example_serial(id serial, name text, created_at timestamp);
SELECT create_distributed_table('example_serial', 'id');
INSERT INTO example_serial (name, created_at) VALUES ('My test', now());
-- ERROR: values given for the partition column must be constants or constant expressions
-- Work around:
-- 1) Create a sequence 'seq' on the coordinator node:
CREATE SEQUENCE seq;
CREATE TABLE key_value (key int, value jsonb);
CREATE TABLE big_table(key int, first_name text, last_name text);
select create_distributed_table('key_value','key');
select create_distributed_table('big_table','key');
select run_command_on_shards($cmd$big_table$cmd$,$cmd$CREATE OR REPLACE FUNCTION insert_fun_%1$I() RETURNS TRIGGER AS $$
BEGIN
--
export PGHOST=...
export PGUSER=citus
export PGPASSWORD=...
export PGSSLMODE=require
export PGDATABASE=citus
export TABLE_NAME=events
export NEW_TABLE_NAME=events_backup
export PARALLEL_FACTOR=4
psql -tA -F" " -c "SELECT s.shardid, \
# Analyzes a specific distributed table
export PGHOST=...
export PGUSER=citus
export PGPASSWORD=...
#export PGSSLMODE=require
export PGDATABASE=citus
export TABLE_NAME=<table_name>
export PARALLEL_FACTOR=4
**Issue**:
1) Start a long running query say ~5 minutes. Ex:
SELECT count(*) from github_events g1,github_events g2 WHERE g1.repo_id=g2.repo_id;
2) Terminate a worker whilst the query is running.
3) Now the system hangs indefinitely.
**Reasoning**
I think what's happening is that removing the network interface doesn't kill the connection. As a result, the coordinator node waits for a response from the worker node, which never comes.
Citus uses PostgreSQL's client connection library (libpq) to communicate between machines; and Postgres uses system defaults to timeout in this scenario. On Linux, I think the default TCP keepalive time (tcp_keepalive_time) is set to 7200 seconds.
My tests:
CREATE TABLE flows(id integer, app_protocol_details jsonb);
SELECT create_distributed_table('flows','id');
Query which din't run:
select app_protocol_details from flows fu, jsonb_array_elements(fu.app_protocol_details->'protocols') protocols where protocols->>'{dnsProtocol,rc}' like '%ERROR%';
--Show Meta data tables
--Distributed tables.
SELECT nodes,partmethod,colocationid FROM pg_dist_partition;
--Shards for the stores table
SELECT logicalrelid,shardid,shardminvalue,shardmaxvalue from pg_dist_shard where logicalrelid='stores'::regclass;
--Shard placements
SELECT shardid,nodename FROM pg_dist_shard_placement;

session_analytics

This PostgreSQL extension provides some functions for querying hstore arrays.

Functions

  • count_elements(elements hstore[], pattern text). returns the number of elements that match the given pattern.
  • multi_count_elements(elements hstore[], patterns text[]). returns an array
#Table and Schema Creation.
CREATE TABLE data_rows(company_id integer,id integer, data text);
ALTER TABLE dATA_ROWS ADD COLUMN tsv tsvector;
select master_create_distributed_table('data_rows','company_id','hash');
select master_create_worker_shards('data_rows',16,1);