DRY

Web関連の技術の事、食事/お酒の事、旅行の事など

MySQL-MHAを試してみる

たまにはAndroid以外のエントリーを。

最近巷で噂になっているMySQL-MHAを試してみる事にする

コマンド/手順等はこちらのエントリをだいぶ参考にさせて頂きました。
http://myhome.munetika.mydns.jp/ossdbwiki/index.php/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8


環境
OS:CentOS release 5.3 (Final)

インストールは結構適当にしてあって
yum -y update
perlが動く環境です。

サーバはとりあえず2台用意して、サーバA(10.13.13.151)は単独一台でマスターとして、サーバB(10.13.13.152)にはmysqlインスタンスを3つ用意して(ポートを別けてMySQLを起動)スレーブとする感じにします。

MySQLは5.1の最新をインストールします

cd /usr/local/src
wget http://dev.mysql.com/get/Downloads/MySQL-5.1/mysql-5.1.58.tar.gz/from/http://ftp.iij.ad.jp/pub/db/mysql/
tar zxfv mysql-5.1.58.tar.gz
cd mysql-5.1.58
./configure --prefix=/usr/local/mysql --with-charset=sjis --with-extra-charsets=all --with-plugins=max-no-ndb
make
make install

MySQLの初期設定

/usr/local/mysql/bin/mysql_install_db

mysqld_safe編集
vi /usr/local/mysql/bin/mysqld_safe
559行目あたりに追加
for i in "$ledir/$MYSQLD" "--skip-character-set-client-handshake" "--old-passwords" "--skip-name-resolve" "$defaults" "--basedir=$MY_BASEDIR_VERSION" \ "--datadir=$DATADIR" "$USER_OPTION"
do
cmd="$cmd "`shell_quote_string "$i"`
done

パーミッション調整
cd /usr/local/mysql
mkdir /usr/local/mysql/etc
mv /etc/my.cnf /usr/local/mysql/etc
chown -R root .
chown -R mysql var
chgrp -R mysql .

/usr/local/mysql/share/mysql/mysql.server start

■ノード側のMySQL-MHA環境の準備

http://code.google.com/p/mysql-master-ha/downloads/list から MySQL-MasterHA-Node-0.50.tar.gz を取得します。

cd /usr/local/src
wget http://mysql-master-ha.googlecode.com/files/MySQL-MasterHA-Node-0.50.tar.gz
tar zxvf MySQL-MasterHA-Node-0.50.tar.gz
cd MySQL-MasterHA-Node-0.50
perl Makefile.PL

をすると、DBIとDBD::mysqlが入ってないからエラーみたいな事を言われたので

Module::AutoInstall version 1.03
Checking for Perl dependencies...

[Core Features]

  • DBI ...missing.
  • DBD::mysql ...missing.

==> Auto-install the 1 mandatory module(s) from CPAN? [y]

こちらをCPANからinstall

perl -MCPAN -e shell
cpan > install DBI
cpan > install DBD::mysql

したいのだが、お決まりのcpanのエラーが出るので向き先を変更

Error in server response, closing control connection.

cpan> o conf urllist pop ftp://ftp.dti.ad.jp/pub/lang/CPAN/
cpan> o conf urllist push ftp://ftp.jaist.ac.jp/pub/CPAN/ ftp://ftp.ring.gr.jp/pub/lang/perl/CPAN/
cpan> o conf commit

で改めてMySQL-MasterHA-Node-0.50のコンパイル

cd /usr/local/src/MySQL-MasterHA-Node-0.50
perl Makefile.PL
make
make install


■マスタ側のMySQL-MHA環境の準備
マスター側は「MySQL-MasterHA-Node-0.50」と「MySQL-MasterHA-Manager-0.51」の両方を入れて、なおかつNodeから先にインストールする必要が
あるようです。

Nodeはスレーブサーバと同様に入れる。CPAN等の設定も同様。

http://code.google.com/p/mysql-master-ha/downloads/list から MySQL-MasterHA-Manager-0.51.tar.gz を取得します。

cd /usr/local/src
wget http://mysql-master-ha.googlecode.com/files/MySQL-MasterHA-Manager-0.51.tar.gz
tar zxvf MySQL-MasterHA-Manager-0.51.tar.gz
cd MySQL-MasterHA-Manager-0.51
perl Makefile.PL

Managerを入れようとしたら以下3つが入ってないとエラーがでたのでCPANにてインストール

Module::AutoInstall version 1.03
Checking for Perl dependencies...

[Core Features]

  • DBI ...loaded. (1.616)
  • DBD::mysql ...loaded. (4.020)
  • Time::HiRes ...loaded. (1.9715)
  • Config::Tiny ...missing.
  • Log::Dispatch ...missing.
  • Parallel::ForkManager ...missing.
  • MHA::NodeConst ...loaded. (0.50)

==> Auto-install the 4 mandatory module(s) from CPAN? [y] n
==> The module(s) are mandatory! Really skip? [n] n

Dependencies will be installed the next time you type 'make'.
Module::AutoInstall configuration finished.

Checking if your kit is complete...
Looks good
Warning: prerequisite Config::Tiny 0 not found.
Warning: prerequisite Log::Dispatch 0 not found.
Warning: prerequisite Parallel::ForkManager 0 not found.
Writing Makefile for MySQL::MasterHA::Manager
Writing MYMETA.yml and MYMETA.json

perl -MCPAN -e shell
cpan> install Config::Tiny
cpan> install Log::Dispatch
cpan> install Parallel::ForkManager

で改めて

cd /usr/local/src/MySQL-MasterHA-Manager-0.51
perl Makefile.PL
make
make install

■マスター側のスナップショットの取得(サーバAにて)

cd /usr/local/mysql/var

# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 17
Server version: 5.1.58-log Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> FLUSH TABLES WITH READ LOCK;

お約束通りにこの間に

tar cpf /var/tmp/mysql.tar .

マスターの取得

mysql> SHOW MASTER STATUS\G;

                                                                    • -

File: mysql-bin.000003
Position: 106
Binlog_Do_DB:
Binlog_Ignore_DB:

mysql> UNLOCK TABLES;

mysql> SHOW VARIABLES LIKE 'server\_id';

                                                  • +
Variable_name Value
                                                  • +
server_id 200001
                                                  • +

レプリケーションのGRANTもしておく

mysql> GRANT REPLICATION SLAVE ON *.* TO repl@'%' IDENTIFIED BY 'repl';
mysql> flush PRIVILEGES;

・スナップショットを転送(Bサーバへ)

scp /var/tmp/mysql.tar 10.13.13.152:`pwd`

■スレーブ側にスナップショットのコピー(サーバBにて)

rm -fr /usr/local/mysqls/mysql01/var/*
rm -fr /usr/local/mysqls/mysql02/var/*
rm -fr /usr/local/mysqls/mysql03/var/*
※冒頭説明した通り、複数インスタンスをインストール

・スナップショットの展開

cd /usr/local/mysqls/mysql01/var
tar xpf /var/tmp/mysql.tar
rm -fr [マスターのホスト名].*

他2インスタンスも同様に。
my.cnfをそれぞれのmysqls/mysql0?/etc/my.cnfで設定

MySQLの起動

# /usr/local/mysqls/mysql01/share/mysql/mysql.server start
Starting MySQL. SUCCESS!
# /usr/local/mysqls/mysql02/share/mysql/mysql.server start
Starting MySQL. SUCCESS!
# /usr/local/mysqls/mysql03/share/mysql/mysql.server start
Starting MySQL. SUCCESS!

・スレーブの設定

/usr/local/mysqls/mysql01/bin/mysql -P30001
mysql でパス通ってないので(というか/usr/local/mysqlが無いから)

一応サーバIDを見てターゲットサーバである事を確認
mysql> SHOW VARIABLES LIKE 'server\_id';

                                                      • +
Variable_name Value
                                                      • +
server_id 20000201
                                                      • +

mysql> CHANGE MASTER TO
MASTER_HOST = 'ホスト名',
MASTER_USER = 'repl',
MASTER_PASSWORD = 'repl',
MASTER_LOG_FILE = 'mysql-bin.000003',
MASTER_LOG_POS = 106;

mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G;

・稼動確認
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

/usr/local/mysqls/mysql01/bin/mysql -P30002
/usr/local/mysqls/mysql01/bin/mysql -P30003
も同様に設定

一応実稼動を確認
サーバAにて

# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 63
Server version: 5.1.58-log Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database tako;
Query OK, 1 row affected (0.00 sec)

サーバBにて

/usr/local/mysqls/mysql01/bin/mysql -P30001 tako
/usr/local/mysqls/mysql01/bin/mysql -P30002 tako
/usr/local/mysqls/mysql01/bin/mysql -P30003 tako

それぞれ入れる事を確認

MySQL-MHAの稼動
/etc/app1.cnfを作るのが定石のようなので、作成。
今回はポートを分けていて、最初設定で「hostname=10.13.13.152:30001」のような形で書いたが

Failed to get IP address on host 10.13.13.152:30001!
at /usr/lib/perl5/site_perl/5.8.8/MHA/Config.pm line 62

こんな感じで怒られてので、/usr/lib/perl5/site_perl/5.8.8/MHA/Config.pmの62行目を確認してみると、そのちょっと下に

$value{port} = $param_arg->{port};
if ( !defined( $value{port} ) ) {
$value{port} = $default->{port};
$value{port} = 3306 unless ( $value{port} );
}

こんな記載を発見したので、「port=」のセクションを作ってみたら起動しました。

[server default]
# mysql user and password
user=root
password=
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

[server1]
hostname=10.13.13.151

[server2]
hostname=10.13.13.152
port=30001

[server3]
hostname=10.13.13.152
port=30002

[server4]
hostname=10.13.13.152
port=30003

■まずはSSHの確認コマンドのようなものを起動

#masterha_check_ssh --conf=/etc/app1.cnf

Mon Sep 12 16:53:17 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 12 16:53:17 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Mon Sep 12 16:53:17 2011 - [info] Reading server configurations from /etc/app1.cnf..
Mon Sep 12 16:53:17 2011 - [info] Starting SSH connection tests..
Mon Sep 12 16:53:18 2011 - [debug]
Mon Sep 12 16:53:17 2011 - [debug] Connecting via SSH from root@10.13.13.151(10.13.13.151) to root@10.13.13.152(10.13.13.152)..
Mon Sep 12 16:53:17 2011 - [debug] ok.
Mon Sep 12 16:53:17 2011 - [debug] Connecting via SSH from root@10.13.13.151(10.13.13.151) to root@10.13.13.152(10.13.13.152)..
Mon Sep 12 16:53:17 2011 - [debug] ok.
Mon Sep 12 16:53:17 2011 - [debug] Connecting via SSH from root@10.13.13.151(10.13.13.151) to root@10.13.13.152(10.13.13.152)..
Mon Sep 12 16:53:18 2011 - [debug] ok.
Mon Sep 12 16:53:18 2011 - [debug]
Mon Sep 12 16:53:17 2011 - [debug] Connecting via SSH from root@10.13.13.152(10.13.13.152) to root@10.13.13.151(10.13.13.151)..
Mon Sep 12 16:53:17 2011 - [debug] ok.
Mon Sep 12 16:53:17 2011 - [debug] Connecting via SSH from root@10.13.13.152(10.13.13.152) to root@10.13.13.152(10.13.13.152)..
Mon Sep 12 16:53:18 2011 - [debug] ok.
Mon Sep 12 16:53:18 2011 - [debug] Connecting via SSH from root@10.13.13.152(10.13.13.152) to root@10.13.13.152(10.13.13.152)..
Mon Sep 12 16:53:18 2011 - [debug] ok.
Mon Sep 12 16:53:19 2011 - [debug]
Mon Sep 12 16:53:18 2011 - [debug] Connecting via SSH from root@10.13.13.152(10.13.13.152) to root@10.13.13.151(10.13.13.151)..
Mon Sep 12 16:53:18 2011 - [debug] ok.
Mon Sep 12 16:53:18 2011 - [debug] Connecting via SSH from root@10.13.13.152(10.13.13.152) to root@10.13.13.152(10.13.13.152)..
Mon Sep 12 16:53:18 2011 - [debug] ok.
Mon Sep 12 16:53:18 2011 - [debug] Connecting via SSH from root@10.13.13.152(10.13.13.152) to root@10.13.13.152(10.13.13.152)..
Mon Sep 12 16:53:18 2011 - [debug] ok.
Mon Sep 12 16:53:19 2011 - [debug]
Mon Sep 12 16:53:18 2011 - [debug] Connecting via SSH from root@10.13.13.152(10.13.13.152) to root@10.13.13.151(10.13.13.151)..
Mon Sep 12 16:53:18 2011 - [debug] ok.
Mon Sep 12 16:53:18 2011 - [debug] Connecting via SSH from root@10.13.13.152(10.13.13.152) to root@10.13.13.152(10.13.13.152)..
Mon Sep 12 16:53:19 2011 - [debug] ok.
Mon Sep 12 16:53:19 2011 - [debug] Connecting via SSH from root@10.13.13.152(10.13.13.152) to root@10.13.13.152(10.13.13.152)..
Mon Sep 12 16:53:19 2011 - [debug] ok.
Mon Sep 12 16:53:19 2011 - [info] All SSH connection tests passed successfully. ←成功のメッセージ

■次にレプリケーションの接続チェック

# masterha_check_repl --conf=/etc/app1.cnf

Mon Sep 12 16:57:10 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 12 16:57:10 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Mon Sep 12 16:57:10 2011 - [info] Reading server configurations from /etc/app1.cnf..
Mon Sep 12 16:57:10 2011 - [info] MHA::MasterMonitor version 0.51.
Mon Sep 12 16:57:10 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/Server.pm, ln148] Got MySQL error when connecting 10.13.13.151(10.13.13.151:3306) :1045:Access denied for user 'root'@'10.13.13.151' (using password: NO), but this is not mysql crash. Check MySQL server settings.Mon Sep 12 16:57:10 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln316] Error happend on checking configurations. at /usr/lib/perl5/site_perl/5.8.8/MHA/ServerManager.pm line 197
Mon Sep 12 16:57:10 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln397] Error happened on monitoring servers.
Mon Sep 12 16:57:10 2011 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

と赤のエラーだと言っているので、メンドクサイからとりあえずrootにGRANT ALL PRIVILEGESしておく

# mysql
mysql> GRANT ALL PRIVILEGES ON *.* TO root WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;

再び実行してみる

# masterha_check_repl --conf=/etc/app1.cnf
Mon Sep 12 17:15:16 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 12 17:15:16 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Mon Sep 12 17:15:16 2011 - [info] Reading server configurations from /etc/app1.cnf..
Mon Sep 12 17:15:16 2011 - [info] MHA::MasterMonitor version 0.51.
Mon Sep 12 17:15:16 2011 - [info] Dead Servers:
Mon Sep 12 17:15:16 2011 - [info] Alive Servers:
Mon Sep 12 17:15:16 2011 - [info] 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 17:15:16 2011 - [info] 10.13.13.152(10.13.13.152:30001)
Mon Sep 12 17:15:16 2011 - [info] 10.13.13.152(10.13.13.152:30002)
Mon Sep 12 17:15:16 2011 - [info] 10.13.13.152(10.13.13.152:30003)
Mon Sep 12 17:15:16 2011 - [info] Alive Slaves:
Mon Sep 12 17:15:16 2011 - [info] 10.13.13.152(10.13.13.152:30001) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 17:15:16 2011 - [info] Replicating from ホスト名(127.0.0.1:3306)
Mon Sep 12 17:15:16 2011 - [info] 10.13.13.152(10.13.13.152:30002) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 17:15:16 2011 - [info] Replicating from ホスト名(127.0.0.1:3306)
Mon Sep 12 17:15:16 2011 - [info] 10.13.13.152(10.13.13.152:30003) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 17:15:16 2011 - [info] Replicating from ホスト名(127.0.0.1:3306)
Mon Sep 12 17:15:16 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/ServerManager.pm, ln557] FATAL: Replication configuration error. master ip/port(10.13.13.151(10.13.13.151:3306)) is different from slaves' master ip/port(ホスト名(127.0.0.1:3306)).Mon Sep 12 17:15:16 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/ServerManager.pm, ln1069] MySQL master is not correctly configured. Check master/slave settings
Mon Sep 12 17:15:16 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln316] Error happend on checking configurations. at /usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm line 243
Mon Sep 12 17:15:16 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln397] Error happened on monitoring servers.
Mon Sep 12 17:15:16 2011 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

それで
FATAL: Replication configuration error. master ip/port(10.13.13.151(10.13.13.151:3306)) is different from slaves' master ip/port(ホスト名(127.0.0.1:3306)).
的な事を言っているので、何となくMASTER_HOSTをIPにした方がいいのかなと。

なので、3つのスレーブのMASTER_HOSTをIPに変更

STOP SLAVE;
CHANGE MASTER TO
MASTER_HOST = '10.13.13.151';
START SLAVE;
SHOW SLAVE STATUS\G;

また実行

# masterha_check_repl --conf=/etc/app1.cnf
Mon Sep 12 17:59:16 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 12 17:59:16 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Mon Sep 12 17:59:16 2011 - [info] Reading server configurations from /etc/app1.cnf..
Mon Sep 12 17:59:16 2011 - [info] MHA::MasterMonitor version 0.51.
Mon Sep 12 17:59:16 2011 - [info] Dead Servers:
Mon Sep 12 17:59:16 2011 - [info] Alive Servers:
Mon Sep 12 17:59:16 2011 - [info] 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 17:59:16 2011 - [info] 10.13.13.152(10.13.13.152:30001)
Mon Sep 12 17:59:16 2011 - [info] 10.13.13.152(10.13.13.152:30002)
Mon Sep 12 17:59:16 2011 - [info] 10.13.13.152(10.13.13.152:30003)
Mon Sep 12 17:59:16 2011 - [info] Alive Slaves:
Mon Sep 12 17:59:16 2011 - [info] 10.13.13.152(10.13.13.152:30001) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 17:59:16 2011 - [info] Replicating from 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 17:59:16 2011 - [info] 10.13.13.152(10.13.13.152:30002) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 17:59:16 2011 - [info] Replicating from 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 17:59:16 2011 - [info] 10.13.13.152(10.13.13.152:30003) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 17:59:16 2011 - [info] Replicating from 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 17:59:16 2011 - [info] Current Master: 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 17:59:16 2011 - [info] Checking slave configurations..
Mon Sep 12 17:59:16 2011 - [warn] read_only=1 is not set on slave 10.13.13.152(10.13.13.152:30001).
Mon Sep 12 17:59:16 2011 - [warn] relay_log_purge=0 is not set on slave 10.13.13.152(10.13.13.152:30001).
Mon Sep 12 17:59:16 2011 - [warn] read_only=1 is not set on slave 10.13.13.152(10.13.13.152:30002).
Mon Sep 12 17:59:16 2011 - [warn] relay_log_purge=0 is not set on slave 10.13.13.152(10.13.13.152:30002).
Mon Sep 12 17:59:16 2011 - [warn] read_only=1 is not set on slave 10.13.13.152(10.13.13.152:30003).
Mon Sep 12 17:59:16 2011 - [warn] relay_log_purge=0 is not set on slave 10.13.13.152(10.13.13.152:30003).
Mon Sep 12 17:59:16 2011 - [info] Checking replication filtering settings..
Mon Sep 12 17:59:16 2011 - [info] binlog_do_db= , binlog_ignore_db=
Mon Sep 12 17:59:16 2011 - [info] Replication filtering check ok.
Mon Sep 12 17:59:16 2011 - [info] Starting SSH connection tests..
Mon Sep 12 17:59:19 2011 - [info] All SSH connection tests passed successfully.
Mon Sep 12 17:59:19 2011 - [info] Checking MHA Node version..
Mon Sep 12 17:59:19 2011 - [info] Version check ok.
Mon Sep 12 17:59:19 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on the current master..
Mon Sep 12 17:59:20 2011 - [info] Executing command: save_binary_logs --command=test --start_file=xxx-bin.000001 --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.51
Mon Sep 12 17:59:20 2011 - [info] Connecting to root@10.13.13.151(10.13.13.151)..
Failed to save binary log: Binlog not found from /var/lib/mysql,/var/log/mysql!
at /usr/bin/save_binary_logs line 95
eval {...} called at /usr/bin/save_binary_logs line 59
main::main() called at /usr/bin/save_binary_logs line 55
Mon Sep 12 17:59:20 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln94] Master setting check failed!
Mon Sep 12 17:59:20 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln296] Master configuration failed.
Mon Sep 12 17:59:20 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln316] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48
Mon Sep 12 17:59:20 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln397] Error happened on monitoring servers.
Mon Sep 12 17:59:20 2011 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

yumで入れるのが前提?になってるのかな?
なので今回の環境はbin-logは/usr/local/mysql/varの下にあるので

cd /var/lib
ln -s /usr/local/mysql/var mysql


※9/13追記
作者からコメントを頂き、設定ファイルに下記記載で対応
> [server default]
> master_binlog_dir=/data/mysql

再び実行します

# masterha_check_repl --conf=/etc/app1.cnf
Mon Sep 12 18:17:44 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 12 18:17:44 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Mon Sep 12 18:17:44 2011 - [info] Reading server configurations from /etc/app1.cnf..
Mon Sep 12 18:17:44 2011 - [info] MHA::MasterMonitor version 0.51.
Mon Sep 12 18:17:44 2011 - [info] Dead Servers:
Mon Sep 12 18:17:44 2011 - [info] Alive Servers:
Mon Sep 12 18:17:44 2011 - [info] 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 18:17:44 2011 - [info] 10.13.13.152(10.13.13.152:30001)
Mon Sep 12 18:17:44 2011 - [info] 10.13.13.152(10.13.13.152:30002)
Mon Sep 12 18:17:44 2011 - [info] 10.13.13.152(10.13.13.152:30003)
Mon Sep 12 18:17:44 2011 - [info] Alive Slaves:
Mon Sep 12 18:17:44 2011 - [info] 10.13.13.152(10.13.13.152:30001) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 18:17:44 2011 - [info] Replicating from 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 18:17:44 2011 - [info] 10.13.13.152(10.13.13.152:30002) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 18:17:44 2011 - [info] Replicating from 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 18:17:44 2011 - [info] 10.13.13.152(10.13.13.152:30003) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 18:17:44 2011 - [info] Replicating from 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 18:17:44 2011 - [info] Current Master: 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 18:17:44 2011 - [info] Checking slave configurations..
Mon Sep 12 18:17:44 2011 - [warn] read_only=1 is not set on slave 10.13.13.152(10.13.13.152:30001).
Mon Sep 12 18:17:44 2011 - [warn] relay_log_purge=0 is not set on slave 10.13.13.152(10.13.13.152:30001).
Mon Sep 12 18:17:44 2011 - [warn] read_only=1 is not set on slave 10.13.13.152(10.13.13.152:30002).
Mon Sep 12 18:17:44 2011 - [warn] relay_log_purge=0 is not set on slave 10.13.13.152(10.13.13.152:30002).
Mon Sep 12 18:17:44 2011 - [warn] read_only=1 is not set on slave 10.13.13.152(10.13.13.152:30003).
Mon Sep 12 18:17:44 2011 - [warn] relay_log_purge=0 is not set on slave 10.13.13.152(10.13.13.152:30003).
Mon Sep 12 18:17:44 2011 - [info] Checking replication filtering settings..
Mon Sep 12 18:17:44 2011 - [info] binlog_do_db= , binlog_ignore_db=
Mon Sep 12 18:17:44 2011 - [info] Replication filtering check ok.
Mon Sep 12 18:17:44 2011 - [info] Starting SSH connection tests..
Mon Sep 12 18:17:46 2011 - [info] All SSH connection tests passed successfully.
Mon Sep 12 18:17:46 2011 - [info] Checking MHA Node version..
Mon Sep 12 18:17:46 2011 - [info] Version check ok.
Mon Sep 12 18:17:46 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on the current master..
Mon Sep 12 18:17:47 2011 - [info] Executing command: save_binary_logs --command=test --start_file=xxx-bin.000001 --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.51
Mon Sep 12 18:17:47 2011 - [info] Connecting to root@10.13.13.151(10.13.13.151)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to xxx-bin.000001
Mon Sep 12 18:17:47 2011 - [info] Master setting check done.
Mon Sep 12 18:17:47 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Sep 12 18:17:47 2011 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=10.13.13.152 --slave_ip=10.13.13.152 --slave_port=30001 --workdir=/var/log/masterha/app1 --target_version=5.1.58-log --manager_version=0.51 --relay_log_info=/usr/local/mysqls/mysql01/var/relay-log.info --slave_pass=xxx
Mon Sep 12 18:17:47 2011 - [info] Connecting to root@10.13.13.152(10.13.13.152)..
Can't exec "mysqlbinlog": No such file or directory at /usr/lib/perl5/site_perl/5.8.8/MHA/BinlogManager.pm line 99.
mysqlbinlog version not found!
at /usr/bin/apply_diff_relay_logs line 425
Mon Sep 12 18:17:47 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln129] Slaves settings check failed!
Mon Sep 12 18:17:47 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln305] Slave configuration failed.
Mon Sep 12 18:17:47 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln316] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48
Mon Sep 12 18:17:47 2011 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln397] Error happened on monitoring servers.
Mon Sep 12 18:17:47 2011 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

うむむ。。。
前述の通りサーバB(スレーブ)は3306のインスタンスは作っていないので、PATHをちょっと細工(/usr/local/mysqls/mysql01/bin/に通す)

      • 省略 ---

Mon Sep 12 18:58:14 2011 - [info] Checking replication health on 10.13.13.152..
Mon Sep 12 18:58:14 2011 - [info] ok.
Mon Sep 12 18:58:14 2011 - [info] Checking replication health on 10.13.13.152..
Mon Sep 12 18:58:14 2011 - [info] ok.
Mon Sep 12 18:58:14 2011 - [info] Checking replication health on 10.13.13.152..
Mon Sep 12 18:58:14 2011 - [info] ok.
Mon Sep 12 18:58:14 2011 - [warn] master_ip_failover_script is not defined.
Mon Sep 12 18:58:14 2011 - [warn] shutdown_script is not defined.
Mon Sep 12 18:58:14 2011 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

とりあえずOK頂きました。


■マネージャの起動

# masterha_manager --conf=/etc/app1.cnf
Mon Sep 12 19:00:19 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 12 19:00:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Mon Sep 12 19:00:19 2011 - [info] Reading server configurations from /etc/app1.cnf..
Mon Sep 12 19:00:19 2011 - [info] MHA::MasterMonitor version 0.51.
Mon Sep 12 19:00:19 2011 - [info] Dead Servers:
Mon Sep 12 19:00:19 2011 - [info] Alive Servers:
Mon Sep 12 19:00:19 2011 - [info] 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 19:00:19 2011 - [info] 10.13.13.152(10.13.13.152:30001)
Mon Sep 12 19:00:19 2011 - [info] 10.13.13.152(10.13.13.152:30002)
Mon Sep 12 19:00:19 2011 - [info] 10.13.13.152(10.13.13.152:30003)
Mon Sep 12 19:00:19 2011 - [info] Alive Slaves:
Mon Sep 12 19:00:19 2011 - [info] 10.13.13.152(10.13.13.152:30001) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 19:00:19 2011 - [info] Replicating from 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 19:00:19 2011 - [info] 10.13.13.152(10.13.13.152:30002) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 19:00:19 2011 - [info] Replicating from 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 19:00:19 2011 - [info] 10.13.13.152(10.13.13.152:30003) Version=5.1.58-log (oldest major version between slaves) log-bin:enabled
Mon Sep 12 19:00:19 2011 - [info] Replicating from 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 19:00:19 2011 - [info] Current Master: 10.13.13.151(10.13.13.151:3306)
Mon Sep 12 19:00:19 2011 - [info] Checking slave configurations..
Mon Sep 12 19:00:19 2011 - [warn] read_only=1 is not set on slave 10.13.13.152(10.13.13.152:30001).
Mon Sep 12 19:00:19 2011 - [warn] relay_log_purge=0 is not set on slave 10.13.13.152(10.13.13.152:30001).
Mon Sep 12 19:00:19 2011 - [warn] read_only=1 is not set on slave 10.13.13.152(10.13.13.152:30002).
Mon Sep 12 19:00:19 2011 - [warn] relay_log_purge=0 is not set on slave 10.13.13.152(10.13.13.152:30002).
Mon Sep 12 19:00:19 2011 - [warn] read_only=1 is not set on slave 10.13.13.152(10.13.13.152:30003).
Mon Sep 12 19:00:19 2011 - [warn] relay_log_purge=0 is not set on slave 10.13.13.152(10.13.13.152:30003).
Mon Sep 12 19:00:19 2011 - [info] Checking replication filtering settings..
Mon Sep 12 19:00:19 2011 - [info] binlog_do_db= , binlog_ignore_db=
Mon Sep 12 19:00:19 2011 - [info] Replication filtering check ok.
Mon Sep 12 19:00:19 2011 - [info] Starting SSH connection tests..
Mon Sep 12 19:00:21 2011 - [info] All SSH connection tests passed successfully.
Mon Sep 12 19:00:21 2011 - [info] Checking MHA Node version..
Mon Sep 12 19:00:22 2011 - [info] Version check ok.
Mon Sep 12 19:00:22 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on the current master..
Mon Sep 12 19:00:22 2011 - [info] Executing command: save_binary_logs --command=test --start_file=xxx-bin.000001 --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.51
Mon Sep 12 19:00:22 2011 - [info] Connecting to root@10.13.13.151(10.13.13.151)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to xxx-bin.000001
Mon Sep 12 19:00:22 2011 - [info] Master setting check done.
Mon Sep 12 19:00:22 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Sep 12 19:00:22 2011 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=10.13.13.152 --slave_ip=10.13.13.152 --slave_port=30001 --workdir=/var/log/masterha/app1 --target_version=5.1.58-log --manager_version=0.51 --relay_log_info=/usr/local/mysqls/mysql01/var/relay-log.info --slave_pass=xxx
Mon Sep 12 19:00:22 2011 - [info] Connecting to root@10.13.13.152(10.13.13.152)..
aaa at /usr/lib/perl5/site_perl/5.8.8/MHA/BinlogManager.pm line 100.
Checking slave recovery environment settings..
Opening /usr/local/mysqls/mysql01/var/relay-log.info ... ok.
Relay log found at /usr/local/mysqls/mysql01/var, up to xxx-relay-bin.000003
Temporary relay log file is /usr/local/mysqls/mysql01/var/xxx-relay-bin.000003
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Sep 12 19:00:22 2011 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=10.13.13.152 --slave_ip=10.13.13.152 --slave_port=30002 --workdir=/var/log/masterha/app1 --target_version=5.1.58-log --manager_version=0.51 --relay_log_info=/usr/local/mysqls/mysql02/var/relay-log.info --slave_pass=xxx
Mon Sep 12 19:00:22 2011 - [info] Connecting to root@10.13.13.152(10.13.13.152)..
aaa at /usr/lib/perl5/site_perl/5.8.8/MHA/BinlogManager.pm line 100.
Checking slave recovery environment settings..
Opening /usr/local/mysqls/mysql02/var/relay-log.info ... ok.
Relay log found at /usr/local/mysqls/mysql02/var, up to xxx-relay-bin.000003
Temporary relay log file is /usr/local/mysqls/mysql02/var/xxx-relay-bin.000003
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Sep 12 19:00:23 2011 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=10.13.13.152 --slave_ip=10.13.13.152 --slave_port=30003 --workdir=/var/log/masterha/app1 --target_version=5.1.58-log --manager_version=0.51 --relay_log_info=/usr/local/mysqls/mysql03/var/relay-log.info --slave_pass=xxx
Mon Sep 12 19:00:23 2011 - [info] Connecting to root@10.13.13.152(10.13.13.152)..
aaa at /usr/lib/perl5/site_perl/5.8.8/MHA/BinlogManager.pm line 100.
Checking slave recovery environment settings..
Opening /usr/local/mysqls/mysql03/var/relay-log.info ... ok.
Relay log found at /usr/local/mysqls/mysql03/var, up to xxx-relay-bin.000003
Temporary relay log file is /usr/local/mysqls/mysql03/var/xxx-relay-bin.000003
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Sep 12 19:00:23 2011 - [info] Slaves settings check done.
Mon Sep 12 19:00:23 2011 - [info]
10.13.13.151 (current master)
+--10.13.13.152
+--10.13.13.152
+--10.13.13.152

Mon Sep 12 19:00:23 2011 - [warn] master_ip_failover_script is not defined.
Mon Sep 12 19:00:23 2011 - [warn] shutdown_script is not defined.
Mon Sep 12 19:00:23 2011 - [info] Set master ping interval 3 seconds.
Mon Sep 12 19:00:23 2011 - [warn] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon Sep 12 19:00:23 2011 - [info] Starting ping health check on 10.13.13.151(10.13.13.151:3306)..
Mon Sep 12 19:00:23 2011 - [info] Ping succeeded, sleeping until it doesn't respond..

■別のscreenやterminalでマネージャの起動状況を確認

# masterha_check_status --conf=/etc/app1.cnf
app1 (pid:10195) is running(0:PING_OK), master:10.13.13.151

さて、これで準備は出来たはずなので改めてスレーブの状況を確認(スレーブサーバにて)

# /usr/local/mysqls/mysql01/bin/mysql -P30001 -e "SHOW VARIABLES LIKE 'server\_id'"

                                                      • +
Variable_name Value
                                                      • +
server_id 20000201
                                                      • +

# /usr/local/mysqls/mysql01/bin/mysql -P30001 -e 'show slave status\G'

                                                                    • -

Slave_IO_State: Waiting for master to send event
Master_Host: 10.13.13.151
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: xxx-bin.000001
Read_Master_Log_Pos: 511
Relay_Log_File: xxx-relay-bin.000003
Relay_Log_Pos: 251
Relay_Master_Log_File: xxx-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
以下省略

# /usr/local/mysqls/mysql02/bin/mysql -P30002 -e "SHOW VARIABLES LIKE 'server\_id'"

                                                      • +
Variable_name Value
                                                      • +
server_id 20000202
                                                      • +

# /usr/local/mysqls/mysql02/bin/mysql -P30002 -e 'show slave status\G'

                                                                    • -

Slave_IO_State: Waiting for master to send event
Master_Host: 10.13.13.151
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: xxx-bin.000001
Read_Master_Log_Pos: 511
Relay_Log_File: xxx-relay-bin.000003
Relay_Log_Pos: 251
Relay_Master_Log_File: xxx-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

# /usr/local/mysqls/mysql03/bin/mysql -P30003 -e "SHOW VARIABLES LIKE 'server\_id'"

                                                      • +
Variable_name Value
                                                      • +
server_id 20000203
                                                      • +

# /usr/local/mysqls/mysql03/bin/mysql -P30003 -e 'show slave status\G'

                                                                    • -

Slave_IO_State: Waiting for master to send event
Master_Host: 10.13.13.151
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: xxx-bin.000001
Read_Master_Log_Pos: 511
Relay_Log_File: xxx-relay-bin.000003
Relay_Log_Pos: 251
Relay_Master_Log_File: xxx-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

リブートは一旦メンドイのでマスターのMySQLを止めてみる(マスタサーバにて)

# /usr/local/mysql/share/mysql/mysql.server stop
Shutting down MySQL..... SUCCESS!

そうすると「masterha_manager --conf=/etc/app1.cnf」を動かしていたターミナルでメッセージが大量に出力されて最終的に以下のようなメッセージが出た

          • Failover Report -----

app1: MySQL Master failover 10.13.13.151 to 10.13.13.152 succeeded

Master 10.13.13.151 is down!

Check MHA Manager logs at xxx.paq.to for details.

Started automated(non-interactive) failover.
The latest slave 10.13.13.152(10.13.13.152:30001) has all relay logs for recovery.
Selected 10.13.13.152 as a new master.
10.13.13.152: OK: Applying all logs succeeded.
10.13.13.152: This host has the latest relay log events.
10.13.13.152: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.13.13.152: OK: Applying all logs succeeded. Slave started, replicating from 10.13.13.152.
10.13.13.152: OK: Applying all logs succeeded. Slave started, replicating from 10.13.13.152.
10.13.13.152: Resetting slave info succeeded.
Master failover to 10.13.13.152(10.13.13.152:30001) completed successfully.

見る限りでは
Master failover to 10.13.13.152(10.13.13.152:30001) completed successfully.
と言っているので、サーバBのポート30001のインスタンスがマスターになったようだ

実際コマンドで確認してみると

# /usr/local/mysqls/mysql02/bin/mysql -P30002 -e 'show slave status\G'

Slave_IO_State: Waiting for master to send event
Master_Host: 10.13.13.152
Master_User: repl
Master_Port: 30001
Connect_Retry: 60
Master_Log_File: mysql-bin.000004
Read_Master_Log_Pos: 187
Relay_Log_File: xxx-relay-bin.000002
Relay_Log_Pos: 251
Relay_Master_Log_File: mysql-bin.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

# /usr/local/mysqls/mysql03/bin/mysql -P30003 -e 'show slave status\G'

Slave_IO_State: Waiting for master to send event
Master_Host: 10.13.13.152
Master_User: repl
Master_Port: 30001
Connect_Retry: 60
Master_Log_File: mysql-bin.000004
Read_Master_Log_Pos: 187
Relay_Log_File: xxx-relay-bin.000002
Relay_Log_Pos: 251
Relay_Master_Log_File: mysql-bin.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

確かに変わっている!成功だ!!!


マスターに昇格したインスタンスはどうなっているのかというと

# /usr/local/mysqls/mysql01/bin/mysql -P30001 -e 'show slave status\G'

Slave_IO_State:
Master_Host: 10.13.13.151
Master_User: test
Master_Port: 3306
Connect_Retry: 60
Master_Log_File:
Read_Master_Log_Pos: 4
Relay_Log_File: xxx-relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File:
Slave_IO_Running: No
Slave_SQL_Running: No

Master_HostやMaster_Portは元のまんまだ。
ただ、Slave_IO_Running: No/Slave_SQL_Running: Noに設定されている。
という事みたいだね。
でも何でUSERがtestになってるんだろう?そういう設定にしてるのかな?


※9/13追記
こちらも作者からコメント頂き
>MySQLの仕様ですね。MySQLの内部でハードコーディングされています。
との事。

あと、元のマスターが落ちた時点で「masterha_manager --conf=/etc/app1.cnf」のコマンドは帰ってきてしまっているので(実行されていない)
障害対応した後にもう一度正しいマスターなどで「masterha_manager --conf=/etc/app1.cnf」し直すという事なんですかね。
松信さんのエントリーを見ればいいのか。まだあんまり読んでないので確認します。


とりあえず後は負荷を掛けてハングアップとかした状態で見てみたいな。また試します。


最後にエラーが出た時、こちらのサイトも参考にさせて貰いました。
http://code.google.com/p/mysql-master-ha/wiki/Requirements