
Installing Postgres-XL on Docker on Windows

* We need an RDBMS for big data!

To provide a recommendation system, we needed an RDBMS that could handle large volumes of data. (A note to readers: for your own sanity, you are probably better off with Parquet or another part of the Hadoop ecosystem.)

The solutions I looked into:

  • CockroachDB: because it adopts the Raft consensus algorithm I had the highest expectations and used it the most, but a join on a 10-million-row table got killed; smaller row counts worked fine. (It is written in Go, and the problem seems to occur during memory allocation...)
  • Pivotal Greenplum: based on an old PostgreSQL version, 8.3. (It does not appear to track upstream releases...)
  • Citus Data: a nagging feeling that you would end up having to use the commercial edition. Supports PostgreSQL 10. (Personally, it looks like the nicest option.)
  • Postgres-XL: available since 2014 with a sizable user base. (Chosen because the team wanted it.)

* What is Postgres-XL?

  • An open-source solution that modifies PostgreSQL so that it can be distributed across multiple nodes and processed in parallel.

Postgres-XL was released in mid-May 2014 by TransLattice, a company specializing in geographically distributed database solutions. It is a massively parallel processing database that restructures PostgreSQL into a scalable architecture. Its key features are OLTP write scalability, massively parallel processing (MPP), cluster-wide ACID (atomicity, consistency, isolation, durability) properties, and multi-tenant security.


* Main components


1. Global Transaction Monitor (GTM)
The Global Transaction Monitor guarantees transaction consistency across the cluster. As part of Multi-Version Concurrency Control (MVCC), the GTM issues transaction IDs and snapshots.

The cluster can optionally be configured with a GTM Standby to improve availability.

A GTM Proxy can also be configured alongside the Coordinators to reduce traffic to the GTM and improve scalability.

2. Coordinator
The Coordinator manages user sessions and interacts with the GTM and the Data Nodes. It parses and plans queries, then sends a serialized global plan to each of the components involved in the statement.

3. Data Node
The Data Node is where the actual data is stored. How data is distributed across the nodes can be configured by the DBA. For improved availability, a warm standby of each data node can be set up to be failover-ready.
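As a quick illustration (my addition, not part of the original text), once the cluster built below is running you can see how these three components know about each other by querying the pgxc_node catalog from the coordinator:

psql -p 5432 postgres
select node_name, node_type, node_host, node_port from pgxc_node;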


* Prepare Docker on Windows

docker run -d --privileged --name pgxl centos:7 /sbin/init
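Everything below runs inside that container. A minimal sketch for attaching a shell and installing the build prerequisites (the package list is my assumption, adjust as needed):

# on the Docker host: attach a shell to the container
docker exec -it pgxl /bin/bash

# inside the container: tools and libraries needed to build Postgres-XL and to run sshd
yum install -y git gcc make readline-devel zlib-devel bison flex openssh-server openssh-clients
systemctl enable sshd && systemctl start sshd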

* download postgres-xl

mkdir /svc/build/
cd /svc/build

git clone git://git.postgresql.org/git/postgres-xl.git
cd postgres-xl

./configure --prefix=/svc/pgxl
make -j 4
make install

cd contrib/pgxc_ctl
make install

cd /svc/pgxl
cp bin/pgxc_ctl ./

* vi /etc/bashrc

export dataDirRoot=/data/pgxl/nodes
export PGXC_CTL_HOME=/svc/pgxl
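Depending on your shell setup it can also help to have the freshly built binaries on the PATH of every node; an optional line I add here, not in the original:

export PATH=/svc/pgxl/bin:$PATH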

* pgxl configuration

export USER=postgres
# add the postgres user
useradd $USER

# set its password
passwd $USER

export TARGET_DIR=/svc/pgxl
export DATA_DIR=/data/pgxl/nodes
export LOG_DIR=/opt/log/pgxl
export USER_DB_PORT=5432
export MAX_USER_CONNECTIONS=300
export DATA_NODE_SHARED_BUFFERS="2000MB"
export DATA_NODE_WORK_MEM="128MB"
export DATA_NODE_MAINTENANCE_MEM="128MB"
export DATA_NODE_WAL_BUFFERS="16MB"
export DATA_NODE_CHECKPOINT_SEGMENTS="256"

mkdir -p $TARGET_DIR
mkdir -p $DATA_DIR
mkdir -p $LOG_DIR

chown -R postgres:postgres $TARGET_DIR
chown -R postgres:postgres $DATA_DIR
chown -R postgres:postgres $LOG_DIR

cat <<EOT > $TARGET_DIR/pgxc_ctl.conf
pgxcOwner=$USER
pgxcUser=\$pgxcOwner
tmpDir=/tmp/pgxl
localTmpDir=\$tmpDir
configBackup=n
pgxcInstallDir=$TARGET_DIR

gtmName=gtm
gtmMasterServer=localhost
gtmMasterPort=20001
gtmMasterDir=$DATA_DIR/gtm
gtmSlave=n
gtmProxy=n

coordMasterDirs=($DATA_DIR/coord)
coordNames=(coord)
coordMasterServers=(localhost)
coordPorts=($USER_DB_PORT)
poolerPorts=(20002)
coordMaxWALSenders=(5)
coordSlave=n

datanodeMasterDirs=($DATA_DIR/data1 $DATA_DIR/data2)
datanodeNames=(data1 data2)
datanodeMasterServers=(localhost localhost)
datanodePorts=(3001 3002)
datanodePoolerPorts=(4001 4002)
datanodeMaxWALSenders=(5 5)
datanodeSpecificExtraConfig=(none none)
datanodeSpecificExtraPgHba=(none none)
datanodeSlave=n

coordExtraConfig=coordExtraConfig
cat > \$coordExtraConfig <<EOF
#================================================
# Added to all the coordinator postgresql.conf
log_destination = 'stderr'
logging_collector = on
log_directory = $LOG_DIR
listen_addresses = '*'
log_filename = 'coordinator.log'
max_connections = $MAX_USER_CONNECTIONS
max_pool_size = $MAX_USER_CONNECTIONS
shared_buffers = $DATA_NODE_SHARED_BUFFERS
#checkpoint_segments = $DATA_NODE_CHECKPOINT_SEGMENTS
work_mem = $DATA_NODE_WORK_MEM
maintenance_work_mem = $DATA_NODE_MAINTENANCE_MEM
wal_buffers = $DATA_NODE_WAL_BUFFERS
EOF

datanodeExtraConfig=datanodeExtraConfig
cat > \$datanodeExtraConfig <<EOF
#================================================
# Added to all the datanode postgresql.conf
log_destination = 'stderr'
logging_collector = on
log_directory = $LOG_DIR
log_filename = 'datanode.log'
max_connections = $MAX_USER_CONNECTIONS
max_pool_size = $MAX_USER_CONNECTIONS
shared_buffers = $DATA_NODE_SHARED_BUFFERS
#checkpoint_segments = $DATA_NODE_CHECKPOINT_SEGMENTS
work_mem = $DATA_NODE_WORK_MEM
maintenance_work_mem = $DATA_NODE_MAINTENANCE_MEM
wal_buffers = $DATA_NODE_WAL_BUFFERS
EOF
EOT
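One small note: because pgxc_ctl.conf is written after the chown -R above, the file ends up owned by root. It is still readable, but if you want everything under $TARGET_DIR owned by postgres, one extra command (my addition) takes care of it:

chown postgres:postgres $TARGET_DIR/pgxc_ctl.conf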

tee  /usr/lib/sysctl.d/50-pgxl.conf <<EOF
kernel.sem = 1000  32000  32  1000
# up to 400GB shared memory
kernel.shmmax = 429496729600
EOF
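The kernel parameters in that file are normally picked up at boot; to apply them right away in the running container, something like this should do it:

sysctl --system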

* switch user - postgres

Run ssh-copy-id (or the equivalent steps below) on every node so that remote SSH access works without a password.
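The heading above implies the following runs as the postgres user; if you are still root inside the container, switch first (my addition):

su - postgres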

mkdir -p ~/.ssh
chmod 700 ~/.ssh
cd ~/.ssh
if [ ! -f id_rsa.pub ]; then
    ssh-keygen -t rsa -N "" -f id_rsa
fi
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
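Before running pgxc_ctl it is worth checking that passwordless SSH to the local node actually works (a quick sanity check, not in the original; -o StrictHostKeyChecking=no just auto-accepts the host key on first use):

ssh -o StrictHostKeyChecking=no localhost date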

* Archive and copy it to every node

# as the postgres user
cd /svc/
tar cvfz pgxl.tgz pgxl

# if there are multiple nodes...
# scp pgxl.tgz postgres@host2:/svc/
# scp pgxl.tgz postgres@host3:/svc/
# scp pgxl.tgz postgres@host4:/svc/
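On each remote node the archive then has to be unpacked into the same location; a sketch using the same placeholder host names:

# ssh postgres@host2 'cd /svc && tar xzf pgxl.tgz'
# ssh postgres@host3 'cd /svc && tar xzf pgxl.tgz'
# ssh postgres@host4 'cd /svc && tar xzf pgxl.tgz'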

* pgxc init

# run as the postgres user
postgres@fcd727ce0bcf:/svc/pgxl:> pgxc_ctl init all
/bin/bash
Installing pgxc_ctl_bash script as /svc/pgxl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /svc/pgxl/pgxc_ctl_bash.
Reading configuration using /svc/pgxl/pgxc_ctl_bash --home /svc/pgxl --configuration /svc/pgxl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************

Current directory: /svc/pgxl
Initialize GTM master
The files belonging to this GTM system will be owned by user "postgres".
This user must also own the server process.


fixing permissions on existing directory /data/pgxl/nodes/gtm ... ok
creating configuration files ... ok
creating control file ... ok

Success.
waiting for server to shut down.... done
server stopped
Done.
Start GTM master
server starting
Initialize all the coordinator masters.
Initialize coordinator master coord.
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /data/pgxl/nodes/coord ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... creating cluster information ... ok
syncing data to disk ... ok
freezing database template0 ... ok
freezing database template1 ... ok
freezing database postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success.
Done.
Starting coordinator master.
Starting coordinator master coord
2017-11-05 09:00:51.353 UTC [8765] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2017-11-05 09:00:51.353 UTC [8765] LOG:  listening on IPv6 address "::", port 5432
2017-11-05 09:00:51.397 UTC [8765] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2017-11-05 09:00:51.468 UTC [8765] LOG:  redirecting log output to logging collector process
2017-11-05 09:00:51.468 UTC [8765] HINT:  Future log output will appear in directory "logs".
Done.
Initialize all the datanode masters.
Initialize the datanode master data1.
Initialize the datanode master data2.
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /data/pgxl/nodes/data1 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... creating cluster information ... ok
syncing data to disk ... ok
freezing database template0 ... ok
freezing database template1 ... ok
freezing database postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success.
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /data/pgxl/nodes/data2 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... creating cluster information ... ok
syncing data to disk ... ok
freezing database template0 ... ok
freezing database template1 ... ok
freezing database postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success.
Done.
Starting all the datanode masters.
Starting datanode master data1.
Starting datanode master data2.
2017-11-05 09:00:57.122 UTC [9429] LOG:  listening on IPv4 address "0.0.0.0", port 3001
2017-11-05 09:00:57.122 UTC [9429] LOG:  listening on IPv6 address "::", port 3001
2017-11-05 09:00:57.210 UTC [9429] LOG:  listening on Unix socket "/tmp/.s.PGSQL.3001"
2017-11-05 09:00:57.325 UTC [9429] LOG:  redirecting log output to logging collector process
2017-11-05 09:00:57.325 UTC [9429] HINT:  Future log output will appear in directory "logs".
2017-11-05 09:00:57.144 UTC [9441] LOG:  listening on IPv4 address "0.0.0.0", port 3002
2017-11-05 09:00:57.144 UTC [9441] LOG:  listening on IPv6 address "::", port 3002
2017-11-05 09:00:57.232 UTC [9441] LOG:  listening on Unix socket "/tmp/.s.PGSQL.3002"
2017-11-05 09:00:57.355 UTC [9441] LOG:  redirecting log output to logging collector process
2017-11-05 09:00:57.355 UTC [9441] HINT:  Future log output will appear in directory "logs".
Done.
ALTER NODE coord WITH (HOST='localhost', PORT=5432);
ALTER NODE
CREATE NODE data1 WITH (TYPE='datanode', HOST='localhost', PORT=3001, PREFERRED);
CREATE NODE
CREATE NODE data2 WITH (TYPE='datanode', HOST='localhost', PORT=3002, PREFERRED);
CREATE NODE
SELECT pgxc_pool_reload();
 pgxc_pool_reload
------------------
 t
(1 row)

Done.
EXECUTE DIRECT ON (data1) 'CREATE NODE coord WITH (TYPE=''coordinator'', HOST=''localhost'', PORT=5432)';
EXECUTE DIRECT
EXECUTE DIRECT ON (data1) 'ALTER NODE data1 WITH (TYPE=''datanode'', HOST=''localhost'', PORT=3001, PREFERRED)';
EXECUTE DIRECT
EXECUTE DIRECT ON (data1) 'CREATE NODE data2 WITH (TYPE=''datanode'', HOST=''localhost'', PORT=3002, PREFERRED)';
EXECUTE DIRECT
EXECUTE DIRECT ON (data1) 'SELECT pgxc_pool_reload()';
 pgxc_pool_reload
------------------
 t
(1 row)

EXECUTE DIRECT ON (data2) 'CREATE NODE coord WITH (TYPE=''coordinator'', HOST=''localhost'', PORT=5432)';
EXECUTE DIRECT
EXECUTE DIRECT ON (data2) 'CREATE NODE data1 WITH (TYPE=''datanode'', HOST=''localhost'', PORT=3001, PREFERRED)';
EXECUTE DIRECT
EXECUTE DIRECT ON (data2) 'ALTER NODE data2 WITH (TYPE=''datanode'', HOST=''localhost'', PORT=3002, PREFERRED)';
EXECUTE DIRECT
EXECUTE DIRECT ON (data2) 'SELECT pgxc_pool_reload()';
 pgxc_pool_reload
------------------
 t
(1 row)

Done.

* Additional commands

pgxc_ctl monitor all
pgxc_ctl stop all
pgxc_ctl remove all
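There is also a matching start command, e.g. to bring the cluster back up after a stop:

pgxc_ctl start all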

* pgxl process status monitoring

postgres@fcd727ce0bcf:/svc/pgxl:> pgxc_ctl monitor all
/bin/bash
Installing pgxc_ctl_bash script as /svc/pgxl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /svc/pgxl/pgxc_ctl_bash.
Reading configuration using /svc/pgxl/pgxc_ctl_bash --home /svc/pgxl --configuration /svc/pgxl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************

Current directory: /svc/pgxl
Running: gtm master
Running: coordinator master coord
Running: datanode master data1
Running: datanode master data2

* Now run queries

Queries are executed by connecting to the coordinator on port 5432.

* Test

pgxc_ctl
> Createdb test

# run on the coord1 host (insert)
$ psql test -p 5432
test=# create table contact( id int, name text, phone varchar(30)) DISTRIBUTE BY REPLICATION;
test=# insert into contact values (1, 'tom', '1212121');
test=# select * from contact;

# run on the datanode1 host (select)
$ psql test -p 3001
select * from contact;

# run on the datanode2 host (select)
$ psql test -p 3002
select * from contact;
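Because contact was created with DISTRIBUTE BY REPLICATION, every datanode holds a full copy, so the selects on ports 3001 and 3002 return the same row. To see actual sharding, a hash-distributed table behaves differently; a small sketch I added (contact_hash is not part of the original test):

# on the coordinator (port 5432)
test=# create table contact_hash( id int, name text, phone varchar(30)) DISTRIBUTE BY HASH(id);
test=# insert into contact_hash values (1, 'tom', '1212121');
test=# insert into contact_hash values (2, 'jane', '2323232');
test=# select * from contact_hash;

# on each datanode (ports 3001 / 3002) only the locally stored rows are visible
select * from contact_hash;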

* Commands

PAUSE CLUSTER / UNPAUSE CLUSTER
EXECUTE DIRECT ON (nodename) 'command'
CLEAN CONNECTION
CREATE NODE / ALTER NODE

SELECT pgxc_pool_reload()
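For example, PAUSE CLUSTER / UNPAUSE CLUSTER can bracket maintenance work from a coordinator session (a minimal illustration; EXECUTE DIRECT and pgxc_pool_reload() appear in the init log above):

test=# PAUSE CLUSTER;
test=# -- do maintenance while new transactions are held back
test=# UNPAUSE CLUSTER;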

* Reference Architecture

                     +-------------+
                     | GTM Master  |
                     | Coordinator |
                     +-------------+
                    /       |       \
                   /        |        \
                  /         |         \
                 /          |          \
  +-------------+           |           +-------------+
  |  GTM Proxy1 |                       |  GTM Proxy8 |
  |  Datanode1  |--------- ... ---------|  Datanode8  |
  +-------------+                       +-------------+
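Scaling the single-host configuration above toward this layout mostly means giving the per-datanode arrays one entry per host. A partial sketch using only the variables already shown (host names are placeholders; with gtmProxy=y the proxy name/server/port/directory arrays also have to be filled in, which I omit here):

gtmProxy=y
datanodeNames=(data1 data2 data3)
datanodeMasterServers=(host2 host3 host4)
datanodePorts=(3001 3001 3001)
datanodePoolerPorts=(4001 4001 4001)
datanodeMasterDirs=($DATA_DIR/data1 $DATA_DIR/data2 $DATA_DIR/data3)
datanodeMaxWALSenders=(5 5 5)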

* Changes from 9.5 to 9.6

Queries are now executed in parallel.



