kogoto/pacemaker_document.md

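Building an HA Cluster with Pacemaker

This document explains how to configure a two-node cluster.

Shell notation

# When a command is run on the first node only
[root@sv1 ~]$ some command

# When a command is run on both servers
$ some command

Server configuration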

OS

CentOS 7 x 2

Hostname:IP address

sv1:192.168.33.11

sv2:192.168.33.12

Initial setup

Checking connectivity between nodes

Configuring the hosts file

Edit the hosts file on both nodes so that the node names can be resolved.
$ vi /etc/hosts
192.168.33.11 sv1 sv1.localdomain
192.168.33.12 sv2 sv2.localdomain
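
Because the same entries are needed on both nodes, it can be convenient to add them idempotently instead of editing by hand. The following is a minimal sketch, assuming a hypothetical helper named `ensure_host_entry` (not part of any package) that appends a line only when the IP address is not yet present in the file:

```shell
# ensure_host_entry FILE IP NAME...
# Appends "IP NAME..." to FILE unless FILE already mentions IP.
# Hypothetical helper for illustration; run as root when FILE is /etc/hosts.
ensure_host_entry() {
    local file=$1 ip=$2
    shift 2
    # -F treats the dots literally; -w keeps 192.168.33.11 from
    # matching inside 192.168.33.110.
    if ! grep -qwF -- "$ip" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$*" >> "$file"
    fi
}

# Example (commented out so that sourcing this file changes nothing):
# ensure_host_entry /etc/hosts 192.168.33.11 sv1 sv1.localdomain
# ensure_host_entry /etc/hosts 192.168.33.12 sv2 sv2.localdomain
```

Running the helper twice with the same arguments leaves the file unchanged the second time.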

Connectivity check

Confirm that the nodes can reach each other by name.
[root@sv1 ~]$ ping sv2
PING sv2 (192.168.33.12) 56(84) bytes of data.
64 bytes from sv2 (192.168.33.12): icmp_seq=1 ttl=64 time=0.701 ms
64 bytes from sv2 (192.168.33.12): icmp_seq=2 ttl=64 time=0.505 ms
64 bytes from sv2 (192.168.33.12): icmp_seq=3 ttl=64 time=0.557 ms

[root@sv2 ~]$ ping sv1
PING sv1 (192.168.33.11) 56(84) bytes of data.
64 bytes from sv1 (192.168.33.11): icmp_seq=1 ttl=64 time=0.253 ms
64 bytes from sv1 (192.168.33.11): icmp_seq=2 ttl=64 time=0.551 ms
64 bytes from sv1 (192.168.33.11): icmp_seq=3 ttl=64 time=0.379 ms

Installing the cluster software

Install the following packages.

pacemaker
pcs

$ yum install -y pacemaker pcs
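
Configuring the cluster software

Firewall settings

Installing pacemaker creates the following firewalld service definition file, which makes the high-availability service available to firewalld.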
$ cat /usr/lib/firewalld/services/high-availability.xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>Red Hat High Availability</short>
  <description>This allows you to use the Red Hat High Availability (previously named Red Hat Cluster Suite). Ports are opened for corosync, pcsd, pacemaker_remote and dlm.</description>
  <port protocol="tcp" port="2224"/>
  <port protocol="tcp" port="3121"/>
  <port protocol="udp" port="5404"/>
  <port protocol="udp" port="5405"/>
  <port protocol="tcp" port="21064"/>
</service>
Run the following commands to enable this service.
$ firewall-cmd --permanent --add-service=high-availability
success

$ firewall-cmd --reload
success
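
To see at a glance which ports the service opens, the definition file can be reduced to a port list. This is a rough sketch, assuming each `<port>` element sits on its own line with `protocol` before `port`, as in the file shown above; `list_service_ports` is a hypothetical helper name:

```shell
# list_service_ports FILE
# Prints every <port .../> entry of a firewalld service file as "port/protocol".
# Assumes the attribute order protocol="..." port="..." seen in the listing above.
list_service_ports() {
    sed -n 's|.*<port protocol="\([a-z]*\)" port="\([0-9-]*\)".*|\2/\1|p' "$1"
}

# Example (commented out):
# list_service_ports /usr/lib/firewalld/services/high-availability.xml
```

After the reload, `firewall-cmd --list-services` should include high-availability in its output.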
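
Enabling the pcs daemon

Before the cluster can be configured, the pcs daemon must be running and enabled to start at boot.
This daemon is used to synchronize the corosync configuration between the nodes in the cluster.
Start and enable the daemon with the following commands.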
$ systemctl start pcsd.service
$ systemctl enable pcsd
ln -s '/usr/lib/systemd/system/pcsd.service' '/etc/systemd/system/multi-user.target.wants/pcsd.service'
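The hacluster user is created when the packages are installed, but its password is disabled, so set the same password on each node.
This user has permission to synchronize the corosync configuration and to start and stop the cluster.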
$ passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Configuring Corosync

Authenticate the cluster nodes from either node.
[root@sv1 ~]$ pcs cluster auth sv1 sv2 -u hacluster -p [password]
sv1: Authorized
sv2: Authorized
On the same node, generate the corosync configuration file and synchronize it to the other node.
[root@sv1 ~]$ pcs cluster setup --name mycluster sv1 sv2
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
sv1: Succeeded
sv2: Succeeded
On the other node, confirm that the configuration was synchronized correctly.
# Confirm that corosync.conf was generated automatically
[root@sv2 ~]$ cat /etc/corosync/corosync.conf
totem {
  version: 2
  secauth: off
  cluster_name: mycluster
  transport: udpu
}

nodelist {
  node {
    ring0_addr: sv1
    nodeid: 1
  }
  node {
    ring0_addr: sv2
    nodeid: 2
  }
}

quorum {
  provider: corosync_votequorum
  two_node: 1
}

logging {
  to_syslog: yes
}
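
One way to check that both nodes received the same node list is to extract it from the file on each node and compare the results. This is a small sketch, assuming `ring0_addr` precedes `nodeid` inside each node block as in the generated file above; `corosync_nodes` is a hypothetical helper name:

```shell
# corosync_nodes FILE
# Prints "nodeid name" for every node block in the nodelist of a corosync.conf.
# Assumes ring0_addr comes before nodeid within each block.
corosync_nodes() {
    awk '
        $1 == "ring0_addr:" { addr = $2 }
        $1 == "nodeid:"     { print $2, addr }
    ' "$1"
}

# Example (commented out):
# corosync_nodes /etc/corosync/corosync.conf
```

The output should be identical on sv1 and sv2 if the synchronization succeeded.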

Pacemaker tools

Managing the cluster easily from the shell

The main command-line tools are:

pcs
crmsh

This document describes configuration using pcs.

What pcs can do

$ pcs

Usage: pcs [-f file] [-h] [commands]...
Control and configure pacemaker and corosync.

Options:
    -h, --help  Display usage and exit
    -f file     Perform actions on file instead of active CIB
    --debug     Print all network traffic and external commands run
    --version   Print pcs version information

Commands:
    cluster     Configure cluster options and nodes
    resource    Manage cluster resources
    stonith     Configure fence devices
    constraint  Set resource constraints
    property    Set pacemaker properties
    acl         Set pacemaker access control lists
    status      View cluster status
    config      View and manage cluster configuration
pcs commands are divided into several categories.
Usage for the commands in each category is shown by pcs <category> help.
$ pcs status help

Usage: pcs status [commands]...
View current cluster and resource status
Commands:
    [status] [--full]
        View all information about the cluster and resources (--full provides
        more details)

    resources
        View current status of cluster resources

    groups
        View currently configured groups and their resources

    cluster
        View current cluster status

    corosync
        View current membership information as seen by corosync

    nodes [corosync|both|config]
        View current status of nodes from pacemaker. If 'corosync' is
        specified, print nodes currently configured in corosync, if 'both'
        is specified, print nodes from both corosync & pacemaker.  If 'config'
        is specified, print nodes from corosync & pacemaker configuration.

    pcsd <node> ...
        Show the current status of pcsd on the specified nodes

    xml
        View xml version of status (output from crm_mon -r -1 -X)

Starting and verifying the cluster

Starting the cluster

Now that corosync is configured, start the cluster.
The following command starts corosync and pacemaker on every node.
[root@sv1 ~]$ pcs cluster start --all
sv2: Starting Cluster...
sv1: Starting Cluster...

Verifying Corosync

Run corosync-cfgtool to check that cluster communication is working.
[root@sv1 ~]$ corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
	id	    = 192.168.33.11
	status	= ring 0 active with no faults
If the id is a fixed IP address (not a loopback address such as 127.0.0.x) and the status reports no faults, the ring is healthy.
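
Both conditions (a non-loopback id and a status of no faults) can also be checked mechanically. This is a minimal sketch, assuming the corosync-cfgtool output format shown above; `ring_status_ok` is a hypothetical helper name:

```shell
# ring_status_ok
# Reads `corosync-cfgtool -s` output on stdin. Succeeds only if at least one
# ring is listed, no ring id is a 127.x loopback address, and every status
# line reports "no faults".
ring_status_ok() {
    awk '
        $1 == "id"     { rings++; if ($3 ~ /^127\./) bad = 1 }
        $1 == "status" { if ($0 !~ /no faults/) bad = 1 }
        END { exit (rings > 0 && !bad) ? 0 : 1 }
    '
}

# Example (commented out):
# corosync-cfgtool -s | ring_status_ok && echo "ring OK"
```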
Next, check membership and quorum.
[root@sv1 ~]$ corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.33.11)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.33.12)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined

[root@sv1 ~]$ pcs status corosync
Membership information
----------------------
    Nodeid      Votes Name
         1          1 sv1 (local)
         2          1 sv2
This shows that both nodes have joined the cluster.
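
For a scripted check, the number of joined members can be counted from the corosync-cmapctl output. This is a short sketch, assuming the key layout shown above; `joined_members` is a hypothetical helper name:

```shell
# joined_members
# Reads `corosync-cmapctl` output on stdin and prints how many members
# have the status "joined".
joined_members() {
    grep -c 'members\.[0-9]*\.status (str) = joined$'
}

# Example (commented out): expect 2 in this two-node cluster.
# corosync-cmapctl | joined_members
```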

Verifying Pacemaker

Check that the required processes are running.
[root@sv1 ~]$ ps axf
  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
    3 ?        S      0:00  \_ [ksoftirqd/0]
    5 ?        S<     0:00  \_ [kworker/0:0H]
    6 ?        S      0:00  \_ [kworker/u2:0]
...
32578 ?        Ssl    0:03 corosync
32593 ?        Ss     0:00 /usr/sbin/pacemakerd -f
32599 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
32600 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
32601 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
32602 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
32603 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
32604 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd
Next, check the status with pcs.
[root@sv1 ~]$ pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Thu Aug 20 17:17:16 2015
Last change: Thu Aug 20 17:10:37 2015
Stack: corosync
Current DC: sv2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
0 Resources configured


Online: [ sv1 sv2 ]

Full list of resources:


PCSD Status:
  sv1: Online
  sv2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
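
In the Daemon Status block, active/disabled means the daemon is running now but is not enabled to start at boot. The following sketch lists such daemons from the pcs status output; `daemons_not_enabled` is a hypothetical helper name, and the parsing assumes the layout shown above:

```shell
# daemons_not_enabled
# Reads `pcs status` output on stdin and prints the name of each daemon in
# the "Daemon Status:" block whose boot state is "disabled".
daemons_not_enabled() {
    awk '
        /^Daemon Status:/ { in_block = 1; next }
        in_block && NF == 2 && $2 ~ /\/disabled$/ {
            sub(/:$/, "", $1)   # strip the trailing colon from the name
            print $1
        }
    '
}

# Example (commented out):
# pcs status | daemons_not_enabled
```

This walkthrough leaves corosync and pacemaker disabled at boot; if they should start automatically, `pcs cluster enable --all` is the pcs command for that.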
Finally, check that no errors were logged during startup.
(A STONITH configuration error appears, but that is acceptable at this point.)
[root@sv1 ~]$ journalctl | grep -i error
Aug 20 17:09:55 sv1.localdomain pengine[31981]: error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Aug 20 17:09:55 sv1.localdomain pengine[31981]: notice: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
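
Creating an active/standby cluster

Reviewing the existing configuration

The information displayed by pcs status is stored as XML (the CIB).
The following command prints that XML directly.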
[root@sv1 ~]$ pcs cluster cib
<cib crm_feature_set="3.0.9" validate-with="pacemaker-2.3" epoch="5" num_updates="10" admin_epoch="0" cib-last-written="Thu Aug 20 17:10:37 2015" have-quorum="1" dc-uuid="2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.12-a14efad"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="mycluster"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="sv1"/>
      <node id="2" uname="sv2"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
  <status>
    <node_state id="2" uname="sv2" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="2">
        <lrm_resources/>
      </lrm>
      <transient_attributes id="2">
        <instance_attributes id="status-2">
          <nvpair id="status-2-shutdown" name="shutdown" value="0"/>
          <nvpair id="status-2-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
    <node_state id="1" uname="sv1" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="1">
        <lrm_resources/>
      </lrm>
      <transient_attributes id="1">
        <instance_attributes id="status-1">
          <nvpair id="status-1-shutdown" name="shutdown" value="0"/>
          <nvpair id="status-1-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>
Before changing the configuration, validate the current one.
[root@sv1 ~]$ crm_verify -L -V
   error: unpack_resources: 	Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources: 	Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources: 	NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
Validation reports STONITH configuration errors, so disable STONITH for this walkthrough and validate again.
[root@sv1 ~]$ pcs property set stonith-enabled=false
[root@sv1 ~]$ crm_verify -L -V
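
To confirm that the property change reached the CIB, the value can be read back from the XML. This is a rough sketch, assuming each nvpair is on a single line with `name` immediately before `value`, as in the pcs cluster cib output shown earlier; `cib_property` is a hypothetical helper name:

```shell
# cib_property NAME
# Reads CIB XML on stdin and prints the value of the first nvpair whose
# name attribute equals NAME. NAME is used inside a regular expression,
# so it should contain only letters, digits, and hyphens.
cib_property() {
    sed -n 's/.*<nvpair [^>]*name="'"$1"'" value="\([^"]*\)".*/\1/p' | head -n 1
}

# Example (commented out): should print "false" after the command above.
# pcs cluster cib | cib_property stonith-enabled
```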

Adding a resource

Each node has its own IP address, so add a virtual IP as a cluster resource that can move between the nodes.
The address used here is 192.168.33.10.
[root@sv1 ~]$ pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.33.10 cidr_netmask=24 op monitor interval=30s
Confirm that the new resource is running correctly.
[root@sv1 ~]$ pcs status
Cluster name: mycluster
Last updated: Thu Aug 20 17:46:43 2015
Last change: Thu Aug 20 17:38:04 2015
Stack: corosync
Current DC: sv2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
1 Resources configured


Online: [ sv1 sv2 ]

Full list of resources:

 ClusterIP	(ocf::heartbeat:IPaddr2):	Started sv1

PCSD Status:
  sv1: Online
  sv2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
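
When scripting around the cluster, it is often enough to know which node currently runs a resource. This is a minimal sketch that extracts it from the pcs status output, assuming the resource line format shown above; `resource_node` is a hypothetical helper name:

```shell
# resource_node NAME
# Reads `pcs status` output on stdin and prints the node on which resource
# NAME is reported as Started (prints nothing if it is not started).
resource_node() {
    awk -v name="$1" \
        'NF >= 3 && $1 == name && $(NF-1) == "Started" { print $NF }'
}

# Example (commented out):
# pcs status | resource_node ClusterIP
```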

Listing available resources

To list the resource standards (the ocf part of ocf:heartbeat:IPaddr2), run the following command.
[root@sv1 ~]$ pcs resource standards
To list the OCF resource providers (the heartbeat part of ocf:heartbeat:IPaddr2), run the following command.
[root@sv1 ~]$ pcs resource providers
To list the resource agents defined by an OCF provider (the IPaddr2 part of ocf:heartbeat:IPaddr2), run the following command.
[root@sv1 ~]$ pcs resource agents ocf:heartbeat

Performing a failover

Stop the cluster on the first node and confirm that the resource fails over.
[root@sv1 ~]$ pcs cluster stop sv1
[root@sv1 ~]$ pcs status
Error: cluster is not currently running on this node

[root@sv2 ~]$ pcs status
Cluster name: mycluster
Last updated: Thu Aug 20 17:49:07 2015
Last change: Thu Aug 20 17:38:04 2015
Stack: corosync
Current DC: sv2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
1 Resources configured


Online: [ sv2 ]
OFFLINE: [ sv1 ]

Full list of resources:

 ClusterIP	(ocf::heartbeat:IPaddr2):	Started sv2
sv1 is now OFFLINE and ClusterIP has been taken over by sv2.
Failover happens automatically and no error is output.
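
Since the failover produces no message, a script that triggers it needs to poll until the resource has moved. This is a small generic sketch, assuming a hypothetical helper named `wait_until` that retries a command once per second:

```shell
# wait_until ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping one second after each failure.
# Returns 0 as soon as COMMAND succeeds, or 1 if every attempt fails.
wait_until() {
    local tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example (commented out): wait up to about 30 seconds for sv2 to take over.
# wait_until 30 sh -c 'pcs status resources | grep -q "Started sv2"'
```

The example relies on `pcs status resources`, which is listed in the pcs status help above, and on the resource line containing the text "Started sv2" as in the output shown.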
Start the cluster on sv1 again and check the status.
[root@sv1 ~]$ pcs cluster start sv1
sv1: Starting Cluster...

[root@sv1 ~]$ pcs status
Cluster name: mycluster
Last updated: Thu Aug 20 17:53:17 2015
Last change: Thu Aug 20 17:38:04 2015
Stack: corosync
Current DC: sv2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
1 Resources configured


Online: [ sv1 sv2 ]

Full list of resources:

 ClusterIP	(ocf::heartbeat:IPaddr2):	Started sv2

PCSD Status:
  sv1: Online
  sv2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
sv1 is Online again.
ClusterIP remains on sv2; it does not move back to sv1 automatically.