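Code Walkthrough of MHA Failover and Online Switchover
Some time ago my colleague 沈龙星 (Shen Longxing) wrote up the code flow of MHA failover and online switchover. I am reposting it here with his permission. The main text follows.
This article is based on MySQL 5.5, so it does not cover GTID. MHA switches the master in one of two ways, failover and rotate: the former applies when the original master is down, the latter is used for an online switchover. The failover flow is described first.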
- MHA::MasterFailover::main()
- ->do_master_failover
- Phase 1: Configuration Check Phase
- -> check_settings:
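- check_node_version: checks the MHA version information
- connect_all_and_read_server_status: confirms that the MySQL instance on each node can be connected to
- get_dead_servers/get_alive_servers/get_alive_slaves: double-checks whether each node is dead or alive
- start_sql_threads_if: checks whether Slave_SQL_Running is Yes; if not, starts the SQL thread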
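- Phase 2: Dead Master Shutdown Phase: in our setup, its only effect is to stop the IO threads
- -> force_shutdown($dead_master):
- stop_io_thread: stops the IO thread on every slave (so that they stop replicating from the master)
- force_shutdown_internal (in practice this just runs the master_ip_failover_script/shutdown_script from the configuration file; if neither is set, nothing is run):
- master_ip_failover_script: if a VIP is configured, the VIP is switched first
- shutdown_script: if a shutdown script is configured, it is run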
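Phases 1 and 2 both reduce to a few replication-thread statements per slave. The Python sketch below is a hypothetical illustration, not MHA's Perl code: it assumes each slave's SHOW SLAVE STATUS row is already available as a dict keyed by column name, and the function name plan_thread_statements is made up. START SLAVE SQL_THREAD and STOP SLAVE IO_THREAD are standard MySQL 5.5 statements.

```python
from typing import Dict, List

# Hypothetical input shape: host -> a few SHOW SLAVE STATUS columns for that slave.
SlaveStatus = Dict[str, str]


def plan_thread_statements(slaves: Dict[str, SlaveStatus]) -> Dict[str, List[str]]:
    """Return, per slave host, the statements Phases 1 and 2 would issue (sketch only)."""
    plan: Dict[str, List[str]] = {}
    for host, status in slaves.items():
        stmts: List[str] = []
        # Phase 1 (start_sql_threads_if): make sure the SQL thread is applying relay logs.
        if status.get("Slave_SQL_Running") != "Yes":
            stmts.append("START SLAVE SQL_THREAD")
        # Phase 2 (stop_io_thread): stop pulling events from the dead master on every slave.
        stmts.append("STOP SLAVE IO_THREAD")
        plan[host] = stmts
    return plan
```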
- Phase 3: Master Recovery Phase
- -> Phase 3.1: Getting Latest Slaves Phase (obtains the latest slave)
- read_slave_status: obtains each slave's binlog file/position
- check_slave_status: runs "SHOW SLAVE STATUS" to obtain the following fields for each slave:
- Slave_IO_State, Master_Host,
- Master_Port, Master_User,
- Slave_IO_Running, Slave_SQL_Running,
- Master_Log_File, Read_Master_Log_Pos,
- Relay_Master_Log_File, Last_Errno,
- Last_Error, Exec_Master_Log_Pos,
- Relay_Log_File, Relay_Log_Pos,
- Seconds_Behind_Master, Retrieved_Gtid_Set,
- Executed_Gtid_Set, Auto_Position,
- Replicate_Do_DB, Replicate_Ignore_DB, Replicate_Do_Table,
- Replicate_Ignore_Table, Replicate_Wild_Do_Table,
- Replicate_Wild_Ignore_Table
- identify_latest_slaves:
- finds the latest slave by comparing Master_Log_File/Read_Master_Log_Pos across the slaves
- identify_oldest_slaves:
- finds the oldest slave by comparing Master_Log_File/Read_Master_Log_Pos across the slaves
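The latest/oldest comparison amounts to ordering slaves by (Master_Log_File, Read_Master_Log_Pos). A minimal sketch, assuming every slave replicates from the same master and binlog names follow a pattern such as mysql-bin.000281; the function names are hypothetical. It compares the numeric suffix rather than the raw string so the order stays correct if the sequence number grows past six digits, and because several slaves can tie, both results are lists.

```python
from typing import Dict, List, Tuple


def binlog_key(master_log_file: str, read_pos: int) -> Tuple[int, int]:
    """Turn (Master_Log_File, Read_Master_Log_Pos) into a sortable key.

    Assumes names like 'mysql-bin.000281': compare the numeric suffix, then the position.
    """
    seq = int(master_log_file.rsplit(".", 1)[1])
    return (seq, read_pos)


def identify_latest_and_oldest(slaves: Dict[str, Tuple[str, int]]) -> Tuple[List[str], List[str]]:
    """Return (latest hosts, oldest hosts); ties are kept, sorted by host name."""
    keys = {host: binlog_key(f, p) for host, (f, p) in slaves.items()}
    newest, oldest = max(keys.values()), min(keys.values())
    latest_hosts = sorted(h for h, k in keys.items() if k == newest)
    oldest_hosts = sorted(h for h, k in keys.items() if k == oldest)
    return latest_hosts, oldest_hosts
```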
- -> Phase 3.2: Saving Dead Master's Binlog Phase:
- save_master_binlog:
- If the dead master is reachable over SSH, the following branch is taken:
- save_master_binlog_internal: (uses the save_binary_logs script from the MHA node package to make the copy on the dead master)
- save_binary_logs --command=save --start_file=mysql-bin.000281 --start_pos=107 --binlog_dir=/opt/mysql/data/binlog --output_file=/opt/mha/log/saved_master_binlog_from_10.27.177.245_3306_20160108211857.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.55
- generate_diff_binary_log:
- concat_all_binlogs_from:
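- dump_binlog: dumps the binlog file into the target file, simply reading it in binmode
- dump_binlog_header_fde: dumps the binlog header (FDE = Format Description Event), reading from offset 0 to position-1
- dump_binlog_from_pos: dumps the binlog file into the target file starting at position
- file_copy:
- copies the binlog file generated above into the manager_workdir directory on the manager node
- If the dead master cannot be reached over SSH, any transactions on the master that had not been replicated to a slave are lost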
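The header-plus-tail copy performed by dump_binlog_header_fde and dump_binlog_from_pos can be sketched as follows. This is a hypothetical Python illustration rather than MHA's code. It assumes a binlog v4 file (the 4-byte magic `\xfebin` followed by a Format Description Event whose 19-byte common header stores the event length at offset 9) and a start_pos that falls on an event boundary, as Read_Master_Log_Pos does. In MySQL 5.5 the magic number plus FDE normally end at offset 107, the same value as --start_pos in the save_binary_logs example.

```python
import shutil
import struct

BINLOG_MAGIC = b"\xfebin"      # first 4 bytes of every binlog file
COMMON_HEADER_LEN = 19         # v4 header: timestamp(4) type(1) server_id(4) event_length(4) next_pos(4) flags(2)
FORMAT_DESCRIPTION_EVENT = 15  # type code of the FDE


def dump_binlog_from(src_path: str, dst_path: str, start_pos: int) -> int:
    """Write magic + FDE from src, then every byte from start_pos to EOF (sketch only).

    Returns the number of bytes written to dst_path.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        magic = src.read(4)
        if magic != BINLOG_MAGIC:
            raise ValueError("not a binlog file")
        header = src.read(COMMON_HEADER_LEN)
        if len(header) != COMMON_HEADER_LEN or header[4] != FORMAT_DESCRIPTION_EVENT:
            raise ValueError("first event is not a Format Description Event")
        (event_length,) = struct.unpack_from("<I", header, 9)
        if start_pos < 4 + event_length:
            raise ValueError("start_pos points inside the header/FDE")
        # Header part: magic number plus the whole FDE.
        dst.write(magic + header + src.read(event_length - COMMON_HEADER_LEN))
        # Tail part: the events the slaves have not received yet.
        src.seek(start_pos)
        shutil.copyfileobj(src, dst)
        return dst.tell()
```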
- -> Phase 3.3: Determining New Master Phase
- find_latest_base_slave:
- find_latest_base_slave_internal:
- pos_cmp( $oldest_mlf, $oldest_mlp, $latest_mlf, $latest_mlp )
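- checks whether the latest and oldest slaves are at the same binlog position; if they are, no relay log needs to be synchronized
- apply_diff_relay_logs --command=find --latest
- checks whether the latest slave holds the relay log events that the oldest slave is missing; if it does, the process continues, otherwise the failover fails
- The search is simple: it reads the latest slave's relay log files in reverse order until it reaches the oldest slave's file/position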
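The reverse search can be sketched over a simplified, hypothetical model in which each relay log is a list of the master coordinates (Master_Log_File, end_log_pos) of its events; the real tool has to parse the relay log files to obtain these. The function name find_relay_offset is made up. Returning None corresponds to the case where the latest slave no longer has the events the oldest slave needs, so the failover cannot continue.

```python
from typing import List, Optional, Tuple

# Hypothetical model: each event is represented by the master (file, end_log_pos) it ends at.
MasterPos = Tuple[str, int]
RelayLog = Tuple[str, List[MasterPos]]  # (relay log file name, events in write order)


def find_relay_offset(relay_logs: List[RelayLog], target: MasterPos) -> Optional[Tuple[str, int]]:
    """Scan relay logs newest-first, events last-first, for the event ending at `target`.

    Returns (relay file, index of the first event the oldest slave still needs);
    the index may equal len(events), meaning the diff starts with the next relay log.
    Returns None if the position is not found.
    """
    for relay_file, events in reversed(relay_logs):
        for i in range(len(events) - 1, -1, -1):
            if events[i] == target:
                return relay_file, i + 1
    return None
```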
- select_new_master: selects the new master node
- If a preferred node is specified, one of the active preferred nodes becomes the new master.
- If the latest server is too far behind (e.g. its SQL thread is stopped for online backups), it should not be used as the new master; its relay logs should be fetched instead. Even if a preferred master is configured, it does not become the master if it is far behind.
- get_candidate_masters:
- the nodes configured with candidate_master>0 in the configuration file
- get_bad_candidate_masters:
- # The following servers can not be master:
- # - dead servers
- # - Set no_master in conf files (i.e. DR servers)
- # - log_bin is disabled
- # - Major version is not the oldest
- # - too much replication delay (the binlog position gap between the slave and the master exceeds 100000000)
- Searching from candidate_master slaves which have received the latest relay log events
- if NOT FOUND:
- Searching from all candidate_master slaves
- if NOT FOUND:
- Searching from all slaves which have received the latest relay log events
- if NOT FOUND:
- Searching from all slaves
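The exclusion rules and the four-step search order above can be combined into a small selection function. This is a hypothetical sketch: the Node fields, the use of input order as a tie-breaker, and measuring delay as a single pos_gap number are all assumptions; only the 100000000 threshold, the exclusion rules and the search order come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_POS_GAP = 100_000_000  # "too much replication delay" threshold quoted in the text


@dataclass
class Node:
    host: str
    alive: bool = True
    candidate_master: bool = False  # candidate_master>0 in the config file
    no_master: bool = False         # no_master set, e.g. a DR server
    log_bin: bool = True
    major_version: Tuple[int, int] = (5, 5)
    is_latest: bool = False         # has received the latest relay log events
    pos_gap: int = 0                # hypothetical: binlog position gap used for the delay check


def is_bad_candidate(node: Node, oldest_major: Tuple[int, int]) -> bool:
    """Apply the 'can not be master' rules listed above."""
    return (not node.alive
            or node.no_master
            or not node.log_bin
            or node.major_version != oldest_major
            or node.pos_gap > MAX_POS_GAP)


def select_new_master(slaves: List[Node]) -> Optional[Node]:
    alive = [n for n in slaves if n.alive]
    if not alive:
        return None
    oldest_major = min(n.major_version for n in alive)
    good = [n for n in slaves if not is_bad_candidate(n, oldest_major)]
    search_order = [
        [n for n in good if n.candidate_master and n.is_latest],
        [n for n in good if n.candidate_master],
        [n for n in good if n.is_latest],
        good,
    ]
    for group in search_order:
        if group:
            return group[0]  # first match in input order
    return None
```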
- -> Phase 3.4: New Master Diff Log Generation Phase
- recover_relay_logs:
- checks whether the new master is the latest slave; if not, uses the apply_diff_relay_logs command to generate a differential log and send it to the new master
- recover_master_internal:
- sends the dead master's binlog saved in Phase 3.2 to the new master
- -> Phase 3.5: Master Log Apply Phase
- recover_slave:
- apply_diff:
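- 0. wait_until_relay_log_applied: waits for the new master to finish executing its relay log
- 1. checks whether Exec_Master_Log_Pos == Read_Master_Log_Pos; if not, generates a differential log with save_binary_logs --command=save
- 2. calls the apply_diff_relay_logs command to have the new master recover, where:
- 2.1 the logs to recover consist of three parts:
- exec_diff: the difference between Exec_Master_Log_Pos and Read_Master_Log_Pos
- read_diff: the relay log difference between the new master and the latest slave
- binlog_diff: the binlog difference between the latest slave and the dead master
- In practice, apply_diff_relay_logs simply calls the mysqlbinlog command to perform the recovery
- // If a VIP is configured, master_ip_failover_script must be called to fail the VIP over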
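Since apply_diff_relay_logs ultimately relies on mysqlbinlog, the apply step can be pictured as replaying the three diff parts in order (exec_diff, then read_diff, then binlog_diff, i.e. oldest events first) and piping the output into the mysql client. The sketch below only builds that shell pipeline as a string. The function name, file paths and the single mysqlbinlog invocation over several files are assumptions; the real tool may merge the files before calling mysqlbinlog.

```python
import shlex
from typing import Optional


def build_apply_pipeline(exec_diff: Optional[str], read_diff: Optional[str],
                         binlog_diff: Optional[str], host: str, port: int) -> str:
    """Return a shell pipeline that replays the diff logs on the new master (sketch only)."""
    # Oldest events first: the new master's own unexecuted relay log, then relay log
    # events only the latest slave received, then the dead master's saved binlog.
    files = [f for f in (exec_diff, read_diff, binlog_diff) if f]
    if not files:
        return ""  # nothing to apply: the new master already has everything
    replay = " ".join(shlex.quote(a) for a in ["mysqlbinlog", *files])
    client = " ".join(shlex.quote(a) for a in ["mysql", f"--host={host}", f"--port={port}"])
    return f"{replay} | {client}"
```

Skipping empty parts mirrors the text: exec_diff only exists when Exec_Master_Log_Pos differs from Read_Master_Log_Pos, and read_diff only when the new master is not the latest slave.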
- Phase 4: Slaves Recovery Phase
- -> Phase 4.1: Starting Parallel Slave Diff Log Generation Phase
- Generates the differential logs between each slave and the new master, and copies them into each slave's working directory.
- -> Phase 4.2: Starting Parallel Slave Log Apply Phase
- recover_slave:
- recovers each slave, the same way as in Phase 3.5
- change_master_and_start_slave:
- points these slaves at the new master with CHANGE MASTER TO, and finally starts replication (START SLAVE)
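The statements sent to each slave in change_master_and_start_slave can be sketched as follows. The coordinates would come from SHOW MASTER STATUS on the new master after its recovery. The helper names and the leading STOP SLAVE (MySQL 5.5 refuses CHANGE MASTER TO while the slave threads are running) are assumptions, not taken from MHA's source.

```python
from typing import List


def sql_quote(value: str) -> str:
    """Quote a MySQL string literal (assumes the default sql_mode, where backslash escapes apply)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def repoint_slave_statements(host: str, port: int, user: str, password: str,
                             log_file: str, log_pos: int) -> List[str]:
    """Statements that point one slave at the new master and restart replication (sketch only)."""
    change = (
        "CHANGE MASTER TO "
        f"MASTER_HOST={sql_quote(host)}, MASTER_PORT={port}, "
        f"MASTER_USER={sql_quote(user)}, MASTER_PASSWORD={sql_quote(password)}, "
        f"MASTER_LOG_FILE={sql_quote(log_file)}, MASTER_LOG_POS={log_pos}"
    )
    return ["STOP SLAVE", change, "START SLAVE"]
```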
- Phase 5: New master cleanup phase
- reset_slave_on_new_master
- Cleaning up the new master just means resetting its slave info, i.e. removing its old slave configuration. This completes the whole master failover process.
The rotate process
- MHA::MasterRotate::main()
-> do_master_online_switch:
Phase 1: Confi