The Production Issue
To improve MySQL read performance under high concurrency, at the end of last November we deployed a threadpool build of MySQL in production (Percona 5.5.18, with the threadpool patch ported from MariaDB 5.5.28). It ran as a read-only replica for more than four months without incident. Last week, however, a DBA running "show processlist;" found a large number of connections stuck in the Killed state, and they stayed that way for a very long time. For background on threadpool scheduling, see: mariadb 5.5 threadpool, mysql thread scheduler
```
mysql> show processlist;
+----+------+-----------+------+---------+------+-------+------------------+-----------+---------------+-----------+
| Id | User | Host      | db   | Command | Time | State | Info             | Rows_sent | Rows_examined | Rows_read |
+----+------+-----------+------+---------+------+-------+------------------+-----------+---------------+-----------+
|  3 | root | localhost | NULL | Query   |    0 | NULL  | show processlist |         0 |             0 |         1 |
|  7 | root | localhost | NULL | Killed  | 2304 |       | NULL             |         0 |             0 |         1 |
| 15 | root | localhost | NULL | Killed  | 2285 |       | NULL             |         0 |             0 |         1 |
| 25 | root | localhost | NULL | Killed  | 2258 |       | NULL             |         0 |             0 |         1 |
| 32 | root | localhost | NULL | Killed  | 2240 |       | NULL             |         0 |             0 |         1 |
| 34 | root | localhost | NULL | Killed  | 2232 |       | NULL             |         0 |             0 |         1 |
| 37 | root | localhost | NULL | Killed  | 2221 |       | NULL             |         0 |             0 |         1 |
| 46 | root | localhost | NULL | Killed  | 2191 |       | NULL             |         0 |             0 |         1 |
| 54 | root | localhost | NULL | Killed  | 2178 |       | NULL             |         0 |             0 |         1 |
| 55 | root | localhost | NULL | Killed  | 2172 |       | NULL             |         0 |             0 |         1 |
| 58 | root | localhost | NULL | Killed  | 2165 |       | NULL             |         0 |             0 |         1 |
| 59 | root | localhost | NULL | Killed  | 2159 |       | NULL             |         0 |             0 |         1 |
| 67 | root | localhost | NULL | Killed  | 2145 |       | NULL             |         0 |             0 |         1 |
| 70 | root | localhost | NULL | Killed  | 2136 |       | NULL             |         0 |             0 |         1 |
| 73 | root | localhost | NULL | Killed  | 2126 |       | NULL             |         0 |             0 |         1 |
| 78 | root | localhost | NULL | Killed  | 2114 |       | NULL             |         0 |             0 |         1 |
| 81 | root | localhost | NULL | Sleep   | 2104 |       | NULL             |         0 |             0 |         1 |
+----+------+-----------+------+---------+------+-------+------------------+-----------+---------------+-----------+
17 rows in set (0.00 sec)
```
The Analysis
This looked like a rather odd problem at first. Reading the code, the execution path of "KILL connection_id" is:
```
sql_kill
->THD.awake
->thd.killed=KILL_CONNECTION
->close_active_vio
->vio_close
->mysql_socket_shutdown(vio->sd, SHUT_RDWR)
->closesocket(vio->sd)
->post_kill_notification
```
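In C terms, the tail of that chain amounts to roughly the following (a simplified sketch of the vio close path, not the verbatim Percona 5.5.18 source):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Simplified sketch of vio_close() as called above (not verbatim
 * source): the shutdown and the close run back to back. */
static int vio_close_sketch(int sd)
{
    shutdown(sd, SHUT_RDWR);  /* mysql_socket_shutdown(): wake blocked I/O */
    return close(sd);         /* closesocket(): release the descriptor */
}
```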
To reproduce it we simulated a large number of concurrent long-lived connections, logged into mysql, and ran KILL. At first the problem was hard to trigger: the statement execution path is full of if (thd.killed == KILL_CONNECTION) checks, so once a connection's THD is marked KILL_CONNECTION, the executing thread is bound to hit one of them sooner or later, roll back or abort the current transaction, and let the connection exit cleanly.
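That polling pattern looks roughly like this (a hypothetical sketch with invented names, just to illustrate the per-row kill check; the real checks are scattered through the executor):

```c
#include <stdbool.h>

enum killed_state { NOT_KILLED = 0, KILL_CONNECTION };

struct THD {
    volatile enum killed_state killed;  /* set by KILL from another thread */
};

/* Hypothetical executor loop: the kill flag is re-checked at natural
 * boundaries (per row here), so a running statement notices the KILL. */
static bool process_rows(struct THD *thd, long nrows)
{
    for (long i = 0; i < nrows; i++) {
        if (thd->killed == KILL_CONNECTION)
            return false;  /* caller rolls back / aborts the transaction */
        /* ... fetch and process row i ... */
    }
    return true;
}
```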
Reproduction was hard precisely because that polling covers every case but one, and we had missed it: a connection in the Sleep state. Kill such a connection and "show processlist;" shows it as Killed, but the entry never goes away.
Under the one-thread-per-connection scheduler, when there is no data to read, the connection's thread blocks in my_net_read inside do_command. When the connection is killed, its state is set to KILL_CONNECTION, and after mysql_socket_shutdown the blocked my_net_read returns an error; the loop around do_command then checks thd.killed and the connection exits normally. Under the threadpool scheduler, however, a connection with no data to read has its sockfd handed to epoll_wait to wait for events, while the worker threads go off to serve other connections or sit in a condition wait. This scheduling model greatly improves performance for workloads of many short read transactions; its downside is that under heavy concurrency of conflicting read-write transactions, a capped worker-thread count can lead to scheduling deadlock.
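A minimal sketch of the epoll side of that threadpool model (invented names; the real code lives in MariaDB's threadpool implementation):

```c
#include <sys/epoll.h>

/* Sketch only: a connection with no pending data is parked in epoll
 * instead of keeping a thread blocked in my_net_read(). */
static void park_connection(int epfd, int sockfd, void *conn)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLONESHOT;  /* wake exactly one worker */
    ev.data.ptr = conn;
    epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);
}

/* Sketch of a worker: it serves whichever connection becomes ready,
 * instead of being pinned to a single connection for its lifetime. */
static void worker_main_sketch(int epfd)
{
    struct epoll_event ev;
    for (;;) {
        if (epoll_wait(epfd, &ev, 1, -1) == 1) {
            /* handle_event(ev.data.ptr): run the query, or tear the
               connection down if the socket was shut down */
        }
    }
}
```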
Continuing the analysis: the "show processlist;" output showed that the target connection's state had indeed been set to KILL_CONNECTION, yet the connection never exited. Our suspicion was that the connection was in effect still sleeping, which would be explained if epoll_wait never saw the socket being torn down; but in fact epoll_wait can observe a shutdown on a monitored socket.
With a reliable way to reproduce the problem, we wanted to know whether the latest MariaDB had the same bug; if so, we could file a bug and ask the community for help. But the problem did not reproduce on MariaDB 5.5.30. That left the crudest method: diffing the two code bases. The diff ran to tens of thousands of lines (MariaDB has clearly done a great deal of work on top of MySQL), so we narrowed the comparison to the files around threadpool and sql_kill and found a suspect: the THD.awake function differs. In MariaDB 5.5.30 the logic is:
```
sql_kill
->THD.awake
->thd.killed=KILL_CONNECTION
->vio_shutdown(active_vio, SHUT_RDWR)
->vio_socket_shutdown(vio->sd, SHUT_RDWR)
->post_kill_notification
```
Here there is only a shutdown, no close. The threadpool patch we applied was based on MariaDB 5.5.28, and comparison showed 5.5.28 agrees with 5.5.30 on this point, so we suspected the problem lay right here. With gdb we set a breakpoint between the shutdown and the close inside vio_close: once the shutdown had executed (with the close not yet run), epoll_wait successfully detected the event and the connection exited cleanly! Googling confirmed the mechanism: close() causes the corresponding sockfd to be deregistered from the epoll set that epoll_wait is watching.
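This is easy to verify outside the server. The standalone test below (illustrative, not code from the patch) parks one end of a socketpair in epoll and then shuts it down; with DO_CLOSE set to 1 it mimics the buggy shutdown-then-close sequence, and epoll_wait times out with no event because close() silently removed the fd from the interest list:

```c
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define DO_CLOSE 0   /* 1 = mimic the buggy shutdown()+close() sequence */

int main(void)
{
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP };
    ev.data.fd = sv[0];
    epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev);

    shutdown(sv[0], SHUT_RDWR);  /* what the KILL path does first */
#if DO_CLOSE
    close(sv[0]);                /* auto-removes sv[0] from the epoll set */
#endif

    /* DO_CLOSE=0 -> returns 1 (shutdown event seen);
       DO_CLOSE=1 -> returns 0 (timeout, the event is lost). */
    int n = epoll_wait(epfd, &ev, 1, 1000 /* ms */);
    printf("epoll_wait returned %d event(s)\n", n);
    return 0;
}
```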
Root Cause
Because close runs immediately after shutdown, by the time the asynchronous epoll_wait would have returned the shutdown event, the sockfd had already been removed from the interest list!
The patch is therefore very simple: defer closing the sockfd. THD.awake performs only the shutdown; as long as epoll_wait returns, a worker thread can pick the connection up, and once the thd.killed check fires, close_connection is eventually called, which closes the sockfd:
```
worker_main
->handle_event
->connection_abort
->threadpool_remove_connection
->close_connection
->thd->disconnect
->thd->close_active_vio
```
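In code terms, the fix amounts to something like this (a hedged sketch of the two halves, not the actual patch text):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Sketch of the fix (not the actual patch): THD.awake now only shuts
 * the socket down, so the fd stays in the epoll interest list long
 * enough for the event to be delivered to a worker. */
static void awake_sketch(int sd)
{
    shutdown(sd, SHUT_RDWR);  /* wakes epoll_wait / my_net_read */
    /* no close() here any more */
}

/* The worker that picks up the event runs the teardown chain above;
 * close_connection is where the descriptor is finally released. */
static void close_connection_sketch(int sd)
{
    close(sd);  /* safe now: the shutdown event was already delivered */
}
```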