QQGoblin /
Last active December 30, 2021 06:18
[Patroni Source Reading] Recover Flow
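class Ha(object):
    def recover(self):
        # Postgres is not running and we will restart in standby mode. Watchdog is not needed until we promote.
        if self.has_lock() and self.update_lock():
            # master_start_timeout: how long (in seconds) the master is allowed to recover from a failure before a failover is triggered. The default is 300 seconds.
            # When set to 0, failover happens immediately after a crash is detected, if possible. With asynchronous replication, a failover may lose transactions.
            # The worst-case failover time for a master failure is loop_wait + master_start_timeout + loop_wait, unless master_start_timeout is zero.
            # PS: note that during the master_start_timeout window, Patroni keeps trying to recover the PG service.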
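The worst-case formula in the comment above can be written as a small helper. This is only a sketch: the function name is made up for illustration, and the zero-timeout branch (a single loop_wait) is an assumption taken from the Patroni configuration documentation, not from this snippet.

```python
def worst_case_failover_seconds(loop_wait: int, master_start_timeout: int) -> int:
    """Upper bound, in seconds, on the time until failover after a master crash."""
    if master_start_timeout == 0:
        # Assumption: with a zero timeout only one HA loop cycle is needed.
        return loop_wait
    # One cycle to notice the crash, the recovery window, one more cycle to act.
    return loop_wait + master_start_timeout + loop_wait
```

For example, loop_wait = 10 with master_start_timeout = 300 gives a bound of 320 seconds.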
QQGoblin /
Created December 30, 2021 05:47
class Ha(object):
    def process_sync_replication(self):
        """Process synchronous standby behavior.
        Synchronous standbys are registered in two places postgresql.conf and DCS. The order of updating them must
        be right. The invariant that should be kept is that if a node is master and sync_standby is set in DCS,
        then that node must have synchronous_standby set to that value. Or more simply, first set in postgresql.conf
        and then in DCS. When removing, first remove in DCS, then in postgresql.conf. This is so we only consider
        promoting standbys that were guaranteed to be replicating synchronously.
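The ordering rule from the docstring can be illustrated with an in-memory model. This is a minimal sketch, not Patroni's code: the class `SyncState` and its attributes are hypothetical stand-ins for postgresql.conf and the DCS key.

```python
class SyncState:
    """In-memory stand-in for the two places that hold the sync standby name."""

    def __init__(self):
        self.conf_sync_standby = None  # stands in for postgresql.conf
        self.dcs_sync_standby = None   # stands in for the DCS key
        self.log = []                  # order in which the two stores were written

    def set_sync_standby(self, name):
        # Adding: postgresql.conf first, then DCS.
        self.conf_sync_standby = name
        self.log.append(("conf", name))
        self.dcs_sync_standby = name
        self.log.append(("dcs", name))

    def remove_sync_standby(self):
        # Removing: DCS first, then postgresql.conf.
        self.dcs_sync_standby = None
        self.log.append(("dcs", None))
        self.conf_sync_standby = None
        self.log.append(("conf", None))

    def invariant_holds(self):
        # If DCS names a sync standby, postgresql.conf must name the same one.
        return self.dcs_sync_standby is None or self.dcs_sync_standby == self.conf_sync_standby
```

With this ordering, a crash between the two writes never leaves DCS naming a standby that postgresql.conf does not, so `invariant_holds()` stays true at every intermediate step.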
QQGoblin /
Created December 28, 2021 08:13
class Bootstrap(object):
    def create_replica(self, clone_member):
        """
        create the replica according to the replica_method
        defined by the user. this is a list, so we need to
        loop through all methods the user supplies
        """
        self._postgresql.set_state('creating replica')
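The "loop through all methods the user supplies" idea can be sketched as follows. The function name and the `runners` mapping are hypothetical and exist only for illustration; the preview above does not show how Patroni dispatches each method.

```python
def create_replica_sketch(methods, runners):
    """Try each configured method in order; return 0 on the first success.

    methods: method names in the order the user configured them.
    runners: maps a method name to a callable returning an exit code (0 = success).
    """
    ret = 1  # non-zero means "no method has succeeded yet"
    for method in methods:
        runner = runners.get(method)
        if runner is None:
            continue  # unknown method: skip it and try the next one
        ret = runner()
        if ret == 0:
            break  # stop at the first method that works
    return ret
```

Later methods act as fallbacks, so the order of the list is what decides which method is preferred.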
QQGoblin /
Last active December 28, 2021 08:12
class Bootstrap(object):
    def basebackup(self, conn_url, env, options):
        # creates a replica data dir using pg_basebackup.
        # this is the default, built-in create_replica_methods
        # tries twice, then returns failure (as 1)
        # uses "stream" as the xlog-method to avoid sync issues
        # supports additional user-supplied options, those are not validated
        maxfailures = 2
        ret = 1
        # Options that users are not allowed to pass
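The "tries twice, then returns failure (as 1)" behavior can be sketched as a bounded retry loop. The helper `run_with_retries` and its `attempt` callable are hypothetical; in the real method the attempt would be a pg_basebackup invocation, which the truncated preview does not show.

```python
def run_with_retries(attempt, maxfailures=2):
    """Call attempt() until it returns 0 or maxfailures attempts have failed.

    Returns 0 on success and 1 on failure, whatever non-zero code attempt() gave.
    """
    for _ in range(maxfailures):
        if attempt() == 0:
            return 0
    return 1
```

Collapsing every failure to 1 keeps the contract simple for the caller, which only needs to know whether the replica was created.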
QQGoblin /
Created December 24, 2021 08:47
class Ha(object):
    def post_bootstrap(self):
        # Once PG is up, the master node initializes users, runs custom scripts, and so on
        with self._async_response:
            result = self._async_response.result
        # bootstrap has failed if postgres is not running
        if not self.state_handler.is_running() or result is False:
            # cancel_initialization() does the following:
            # - deletes the initialize information from DCS
            # - force-stops PG
QQGoblin /
Created December 24, 2021 08:02
class Ha(object):
    def handle_starting_instance(self):
        """Starting up PostgreSQL may take a long time. In case we are the leader we may want to
        fail over to."""
        # Check if we are in startup, when paused defer to main loop for manual failovers.
        if not self.state_handler.check_for_startup() or self.is_paused():
            # PG is not in the startup state, or Patroni is paused
            if self.is_paused():
QQGoblin /
Created December 24, 2021 07:42
[Patroni Source Reading] Leader Updates the Endpoint
class Kubernetes(AbstractDCS):
    def update_leader(self, last_lsn, slots=None):
        # PS: this is the interface the leader uses to update the Endpoint; nothing can be updated until leadership has been acquired
        # Get the Endpoint (ep) that corresponds to the current leader
        kind = self._kinds.get(self.leader_path)
        kind_annotations = kind and kind.metadata.annotations or {}
        # annotations.leader on the ep does not match this instance's name: bail out
        if kind and kind_annotations.get(self._LEADER) != self._name:
            return False
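The guard above can be exercised in isolation. This is a sketch under assumptions: `LEADER_KEY` stands in for `self._LEADER` (its real value is not shown in the preview), and `SimpleNamespace` objects stand in for the Kubernetes Endpoint object.

```python
from types import SimpleNamespace

LEADER_KEY = "leader"  # assumed annotation key; stands in for self._LEADER


def may_proceed_with_update(kind, name):
    """Mirror the guard: refuse only when the object exists and names another leader."""
    if kind is None:
        return True  # nothing cached for the leader path, so the guard does not fire
    annotations = kind.metadata.annotations or {}
    return annotations.get(LEADER_KEY) == name


# Usage: an Endpoint whose leader annotation is "pg-0"
ep = SimpleNamespace(metadata=SimpleNamespace(annotations={LEADER_KEY: "pg-0"}))
```

Here `may_proceed_with_update(ep, "pg-0")` is true, while any other instance name is rejected, which is the point of the "cannot update before acquiring leadership" note.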
QQGoblin /
Created December 23, 2021 12:45
class Ha(object):
    def demote(self, mode):
        """Demote PostgreSQL running as master.
        :param mode: One of offline, graceful or immediate.
            offline is used when connection to DCS is not available.
            graceful is used when failing over to another node due to user request. May only be called running async.
            immediate is used when we determine that we are not suitable for master and want to failover quickly
                without regard for data durability. May only be called synchronously.
            immediate-nolock is used when we find out that we have lost the lock to be master. Need to bring down
QQGoblin /
Created December 23, 2021 12:36
[Patroni Source Reading] Postgresql: Starting PG
class Postgresql(object):
    def start(self, timeout=None, task=None, block_callbacks=False, role=None):
        """Start PostgreSQL
        Waits for postmaster to open ports or terminate so pg_isready can be used to check startup completion
        or failure.
        :returns: True if start was initiated and postmaster ports are open, False if start failed"""
        # make sure we close all connections established against
        # the former node, otherwise, we might get a stalled one
QQGoblin /
Last active December 23, 2021 09:34
[Patroni Source Reading] Postgresql: Stopping PG
class Postgresql(object):
    def stop(self, mode='fast', block_callbacks=False, checkpoint=None,
             on_safepoint=None, on_shutdown=None, stop_timeout=None):
        """Stop PostgreSQL
        Supports a callback when a safepoint is reached. A safepoint is when no user backend can return a successful
        commit to users. Currently this means we wait for user backends to close. But in the future alternate mechanisms
        could be added.
        :param on_safepoint: This callback is called when no user backends are running.
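The safepoint idea, waiting for user backends to close and then firing a callback, can be sketched as a polling loop. This is not Patroni's implementation: `wait_for_safepoint`, `count_user_backends`, and the timing parameters are all hypothetical names chosen for illustration.

```python
import time


def wait_for_safepoint(count_user_backends, on_safepoint, poll_interval=0.01, timeout=1.0):
    """Poll until no user backends remain, then call on_safepoint once.

    count_user_backends: callable returning the number of user backends still running.
    Returns True if the safepoint was reached, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while count_user_backends() > 0:
        if time.monotonic() >= deadline:
            return False  # backends are still connected; the callback is not called
        time.sleep(poll_interval)
    on_safepoint()
    return True
```

Once the callback has fired, no user backend is left to acknowledge a commit, so whatever runs inside `on_safepoint` can assume no further commits will be confirmed to clients.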