https://dtb5pzswcit1e.cloudfront.net/product_files/Pivotal-CF/pcf-vsphere-1.4.2.0.ova?Expires=1432243477&Signature=QEAUQrad9KWq-~eNVCDhhVW7bIHzv48zdPNPg55oTda7~Qv82mFGCpUutRBo3Wdj3kBfFcAUhXJ7ffGTUwJHSQj92O3PdOYpD98ZKeFJCrpYYclNo8UOqFaN-8OOdlk~Sh2s1Ouaujddc8CBcoU2cQMI04BFaSEYS2-wxFU3Hps_&Key-Pair-Id=APKAJLAM6FL65BYZP7UQ
@d
d / regression.diffs
Created June 4, 2016 08:01
external_table failure
*** ./expected/external_table.out Sat Jun 4 01:57:38 2016
--- ./results/external_table.out Sat Jun 4 01:57:39 2016
***************
*** 1452,1465 ****
(SELECT i, j FROM exttab_subtxs_1 WHERE i < 5 ) e1,
(SELECT i, j FROM exttab_subtxs_1 WHERE i < 10) e2
WHERE e1.i = e2.i;
! NOTICE: Found 4 data formatting errors (4 or more input rows). Rejected related input data.
! i | j
! ---+----------
@d
d / jesse_parallel.xml
Created June 30, 2016 00:33
jesse_parallel.mdp
<?xml version="1.0" encoding="UTF-8"?>
<!--
CREATE TEMP TABLE foo (i int);
CREATE TEMP TABLE bar (j int);
SELECT * FROM foo UNION ALL SELECT * FROM bar;
-->
<dxl:DXLMessage xmlns:dxl="http://greenplum.com/dxl/2010/12/">
<dxl:Thread Id="0">
<dxl:OptimizerConfig>
<dxl:EnumeratorConfig Id="0" PlanSamples="0" CostThreshold="0"/>
@d
d / inline.py
Last active July 11, 2016 18:14
Eliminate `.inl` files
#!/usr/bin/env python3
# Usage:
# git grep -l '#include ".*\.inl"' | pypy3 inline.py -I libgpos/libgpos/include
# OR
# git grep -l '#include ".*\.inl"' | pypy3 inline.py -I libgpopt/include -I libnaucrates/include -I libgpdbcost/include -I server/include
import sys
import re
import os.path
@d
d / make_cluster.bash
Last active December 10, 2016 05:50
Nikos this is what we did
env BLDWRAP_POSTGRES_CONF_ADDONS='fsync=off optimizer_disable_missing_stats_collection=on' make -C /build/gpdb4/gpAux/gpdemo
@d
d / create_unique_path.md
Created April 19, 2017 00:52
A reading of GPDB before a big bang merge of SEMI JOIN from 84 (e006a24ad152b3faec748afe8c1ff0829699b2e6)

We need to keep our diff here, maybe?

  1. costsize.c
  • join_in_selectivity
  • set_joinrel_size_estimates

Things changed here: joinpath.c

  • hash_inner_and_outer is split into two:
  1. hashclauses_for_join which takes the bulk of the upstream code; AND
@d
d / YOLO.md
Last active June 21, 2017 21:34
Problem Statement

What challenge are we facing?

  1. As an ORCA pair adding a backwards-compatible change, we'd like to be able to rebuild Greenplum without code changes so as to see our change in effect in the database.
  2. As a non-ORCA developer working on Greenplum Database, I'd like ./configure to fail if my local ORCA version will cause a compilation failure.

A modest proposal

Use semver to convey breaking changes:

  1. Let's start with Greenplum stating a "required version" of 2.9.1
  2. If Venky improves some implementation of stats (interface preserving), he bumps the ORCA version from 2.9.1 to 2.10.0. Venky should be able to rebuild GPDB without changing it to effect his optimizer change. (Implied: 2.10.0 satisfies a "required version" of 2.9.1.)
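
A minimal sketch of the compatibility check implied above, assuming the usual semver reading: an installed ORCA satisfies the "required version" when the major versions match and the installed version is not older than the required one. The function names `parse_version` and `satisfies` are hypothetical, made up for illustration; they are not part of ORCA, Greenplum, or ./configure.

```python
def parse_version(text):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def satisfies(installed, required):
    """Return True if `installed` can stand in for `required`.

    Assumption (not stated in the proposal): a different major version
    signals a breaking change, so only the same major version qualifies,
    and the installed version must be at least the required one.
    """
    inst = parse_version(installed)
    req = parse_version(required)
    # Tuples compare element by element, so (2, 10, 0) >= (2, 9, 1).
    return inst[0] == req[0] and inst >= req


if __name__ == "__main__":
    for candidate in ("2.9.1", "2.10.0", "2.9.0", "3.0.0"):
        print(candidate, satisfies(candidate, "2.9.1"))
```

Under this reading, 2.10.0 is accepted against a required 2.9.1, while 2.9.0 (too old) and 3.0.0 (different major version) would be rejected.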
@d
d / CMakeLists.txt
Last active October 26, 2017 22:48
Postgres/Greenplum CMakeLists.txt
cmake_minimum_required(VERSION 3.0)
project(gpdb5 LANGUAGES C CXX)
include_directories(src/include)
include_directories(/usr/local/opt/openssl/include)
file(GLOB_RECURSE HEADERS
src/include/*.h
)
file(GLOB_RECURSE SRC_FILES
@d
d / CXX11_and_ORCA.md
Last active February 3, 2018 01:21
C++11/14 and ORCA

Modern C++

New language features that may boost productivity

  1. rvalue references, move assignment operator, and move constructors
  2. [using type aliases](http://en.cppreference.com/w/cpp/language/type_alias)
  3. range-based for-loops
  4. auto
  5. Variadic templates

Standardized features that were GCC extensions

@d
d / split_row.sql
Created December 7, 2017 19:03
split rows bug
CREATE TABLE rank_exc(
id int,
year int,
gender char(1)
)
DISTRIBUTED BY (id)
PARTITION BY LIST (gender)
SUBPARTITION BY RANGE (year)
SUBPARTITION TEMPLATE (
SUBPARTITION year1 START (2001),