问题背景
产品DBA在做日常SQL REVIEW时,遇到一个问题:一个包含子查询的query,使用order by(order by query)竟然比不使用order by(normal query)执行时间更短,而且是在order by无法使用index排序的情况下(使用filesort),这个问题不好解释,只能借助一些工具以及跟踪代码寻找原因
MySQL版本
现象在Percona 5.5.18和Oracle MySQL 5.1.48上都存在,本文数据和分析基于Percona 5.5.18
表ugcbutterfly结构
mysql> show create table ugcbutterfly\G *************************** 1. row *************************** Table: ugcbutterfly Create Table: CREATE TABLE `ugcbutterfly` ( `id` bigint(20) NOT NULL AUTO_INCREMENT , `status` tinyint(4) NOT NULL, `gmt_create` datetime NOT NULL, `gmt_modified` datetime NOT NULL, `memo` varchar(512) DEFAULT NULL, `userid` bigint(20) NOT NULL COMMENT, `u_contact1` varchar(512) DEFAULT NULL, `u_contact2` varchar(512) DEFAULT NULL, `b_name` varchar(512) DEFAULT NULL, `b_desc` varchar(1024) DEFAULT NULL, `b_type` tinyint(4) NOT NULL COMMENT, `b_total` bigint(20) NOT NULL COMMENT, `b_quantity` bigint(20) NOT NULL COMMENT, `startime` datetime NOT NULL COMMENT, `endtime` datetime NOT NULL COMMENT, `b_url1` varchar(1024) DEFAULT NULL, `b_url2` varchar(1024) DEFAULT NULL, `b_url3` varchar(1024) DEFAULT NULL, `b_url4` varchar(1024) DEFAULT NULL, `b_longitude` double(10,6) NOT NULL, `b_latitude` double(10,6) NOT NULL, `b_radius` bigint(20) NOT NULL, `b_max_longitude` double(10,6) NOT NULL, `b_max_latitude` double(10,6) NOT NULL, `b_min_longitude` double(10,6) NOT NULL, `b_min_latitude` double(10,6) NOT NULL, `user_name` varchar(512) DEFAULT NULL, `device_id` varchar(60) DEFAULT NULL, `store_name` varchar(512) DEFAULT NULL, `store_addr` varchar(1024) DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_wmc_cm_ugcbutterfly` (`userid`), KEY `idx_starttime_endtime_quantity` (`startime`,`endtime`,`b_quantity`), KEY `idx_start_end_tude` (`startime`,`endtime`,`b_min_longitude`), KEY `idx_6cols` (`startime`,`endtime`,`b_max_longitude`,`b_min_longitude`,`b_max_latitude`,`b_min_latitude`) ) ENGINE=InnoDB 1 row in set (0.00 sec) |
表catchlog结构
mysql> show create table catchlog\G *************************** 1. row *************************** Table: catchlog Create Table: CREATE TABLE `catchlog` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `status` tinyint(4) NOT NULL, `gmt_create` datetime NOT NULL , `gmt_modified` datetime NOT NULL, `userid` bigint(20) NOT NULL COMMENT, `butterfly_id` bigint(20) NOT NULL, `activity_id` bigint(20) NOT NULL, `memo` varchar(512) DEFAULT NULL, `username` varchar(128) NOT NULL, `distance` int(11) DEFAULT NULL, `type` tinyint(4) NOT NULL, PRIMARY KEY (`id`), KEY `idx_cm_catchlog_userid` (`userid`,`butterfly_id`,`activity_id`), KEY `idx_bid_status_gmt_modified` (`butterfly_id`,`status`,`gmt_modified`) ) ENGINE=InnoDB 1 row in set (0.00 sec) |
表数据规模
mysql> select count(*) from catchlog; +----------+ | count(*) | +----------+ | 1930879 | +----------+ 1 row in set (11.14 sec) mysql> select count(*) from ugcbutterfly; +----------+ | count(*) | +----------+ | 120952 | +----------+ 1 row in set (0.02 sec) |
query不带order by执行计划:
mysql> explain select id,status,gmt_create,gmt_modified,memo, userid,u_contact1,u_contact2,b_name,b_desc, b_type,b_total,b_quantity,startime,endtime, b_url1,b_url2,b_url3,b_url4,b_longitude, b_latitude,b_radius,b_max_longitude,b_max_latitude,b_min_longitude, b_min_latitude, store_name, store_addr, u_contact1 as store_phone, ROUND(6378.138*2*ASIN(SQRT(POW(SIN((22.299439*PI()/180-b_latitude*PI()/180)/2),2)+COS(22.299439*PI()/180)*COS(b_latitude*PI()/180)*POW(SIN((114.173881*PI()/180-b_longitude*PI()/180)/2),2)))*1000) AS juli from ugcbutterfly b where b.status = 0 and b_quantity > 0 and (b.startime < now() and b.endtime > now()) and b.b_type = 7 and b.b_max_longitude >= 120.118867 and b.b_min_longitude <= 120.118867 and b.b_max_latitude >= 30.277737 and b.b_min_latitude <= 30.277737 and b.b_radius > 0 and b.id not in (select butterfly_id from catchlog where userid = 279104765); *************************** 1. row *************************** id: 1 select_type: PRIMARY table: b type: ALL possible_keys: idx_starttime_endtime_quantity,idx_start_end_tude,idx_6cols key: NULL key_len: NULL ref: NULL rows: 127059 Extra: Using where *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: catchlog type: ref possible_keys: idx_cm_catchlog_userid,idx_bid_status_gmt_modified key: idx_cm_catchlog_userid key_len: 16 ref: const,func rows: 1 Extra: Using index 2 rows in set (0.00 sec) |
查询结果及执行时间:
4 rows in set (0.34 sec)
query带order by执行计划,及执行时间
mysql> explain select id,status,gmt_create,gmt_modified,memo, userid,u_contact1,u_contact2,b_name,b_desc, b_type,b_total,b_quantity,startime,endtime, b_url1,b_url2,b_url3,b_url4,b_longitude, b_latitude,b_radius,b_max_longitude,b_max_latitude,b_min_longitude, b_min_latitude, store_name, store_addr, u_contact1 as store_phone, ROUND(6378.138*2*ASIN(SQRT(POW(SIN((22.299439*PI()/180-b_latitude*PI()/180)/2),2)+COS(22.299439*PI()/180)*COS(b_latitude*PI()/180)*POW(SIN((114.173881*PI()/180-b_longitude*PI()/180)/2),2)))*1000) AS juli from ugcbutterfly b where b.status = 0 and b_quantity > 0 and (b.startime < now() and b.endtime > now()) and b.b_type = 7 and b.b_max_longitude >= 120.118867 and b.b_min_longitude <= 120.118867 and b.b_max_latitude >= 30.277737 and b.b_min_latitude <= 30.277737 and b.b_radius > 0 and b.id not in (select butterfly_id from catchlog where userid = 279104765) order by juli; *************************** 1. row *************************** id: 1 select_type: PRIMARY table: b type: ALL possible_keys: idx_starttime_endtime_quantity,idx_start_end_tude,idx_6cols key: NULL key_len: NULL ref: NULL rows: 127059 Extra: Using where; Using filesort *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: catchlog type: ref possible_keys: idx_cm_catchlog_userid,idx_bid_status_gmt_modified key: idx_cm_catchlog_userid key_len: 16 ref: const,func rows: 1 Extra: Using index 2 rows in set (0.00 sec) |
查询结果及执行时间:
4 rows in set (0.13 sec)
profile
mysql> show profile for query 1; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000168 | | checking permissions | 0.000003 | | checking permissions | 0.000005 | | Opening tables | 0.000028 | | System lock | 0.000011 | | init | 0.000081 | | optimizing | 0.000020 | | statistics | 0.000137 | | preparing | 0.000021 | | executing | 0.000002 | | Sending data | 0.335989 | | optimizing | 0.000022 | | statistics | 0.000075 | | preparing | 0.000023 | | executing | 0.000002 | | Sending data | 0.001634 | | executing | 0.000002 | | Sending data | 0.000073 | | executing | 0.000002 | | Sending data | 0.000163 | | executing | 0.000002 | | Sending data | 0.000567 | | end | 0.000006 | | query end | 0.000004 | | closing tables | 0.000017 | | freeing items | 0.000044 | | logging slow query | 0.000002 | | logging slow query | 0.000157 | | cleaning up | 0.000004 | +----------------------+----------+ 29 rows in set (0.00 sec) mysql> show profile for query 2; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000158 | | checking permissions | 0.000003 | | checking permissions | 0.000004 | | Opening tables | 0.000023 | | System lock | 0.000010 | | init | 0.000076 | | optimizing | 0.000014 | | statistics | 0.000113 | | preparing | 0.000022 | | executing | 0.000002 | | Sorting result | 0.128376 | | optimizing | 0.000036 | | statistics | 0.000082 | | preparing | 0.000023 | | executing | 0.000002 | | Sending data | 0.000632 | | executing | 0.000001 | | Sending data | 0.000032 | | executing | 0.000002 | | Sending data | 0.000066 | | executing | 0.000002 | | Sending data | 0.000269 | | Sending data | 0.000162 | | end | 0.000006 | | query end | 0.000004 | | closing tables | 0.000012 | | freeing items | 0.000046 | | logging slow query | 0.000002 | | logging slow query | 0.000149 | | cleaning up | 0.000009 | +----------------------+----------+ 30 rows in set (0.00 sec) |
问题分析
首先总结一下问题:
1、相对于normal query,通过explain可以发现order by query执行计划多了一个Using filesort,这个说明order by字段`juli`无法利用索引进行排序
2、查询命中的结果集较小(4 rows),即order by代价较小
3、在执行时间上,order by query时间短于normal query,通过profile来看,差别主要在(Sorting result: 0.128376s) vs. (Sending data:0.335989s)上,而且可以发现,order by query的profile结果多了一行:Sorting result
systemtap
通过神器systemtap追踪重要函数调用次数:(脚本很简单,写几个probe就OK)
normal query # of evaluate_join_record: 120957, sub_select: 10, do_select: 5, JOIN::optimize 2, JOIN::exec 5 order by query # of evaluate_join_record: 9, sub_select: 10, do_select: 5, JOIN::optimize 2, JOIN::exec 5 |
发现evaluate_join_record函数在normal query中调用的次数基本等于表ugcbutterfly的行数+结果集行数,通过代码分析这个函数的功能:
1、这个函数被sub_select调用,但是它在自查或者多表join时也会调用sub_select,normal query执行过程中的一个函数堆栈
#0 evaluate_join_record (join=0x7fa1cc078d90, join_tab=0x7fa1cc07b770, error=-1) at sql/sql_select.cc:11805 #1 0x00000000005a70b6 in sub_select (join=0x7fa1cc078d90, join_tab=0x7fa1cc07b770, end_of_records=false) at sql/sql_select.cc:11745 #2 0x00000000005abf8d in do_select (join=0x7fa1cc078d90, fields=0x7fa1cc064318, table=0x0, procedure=<value optimized out>) at sql/sql_select.cc:11535 #3 0x00000000005c208e in JOIN::exec (this=0x7fa1cc078d90) at sql/sql_select.cc:2391 #4 0x00000000006fbef9 in subselect_single_select_engine::exec (this=0x7fa1cc073c00) at sql/item_subselect.cc:2001 #5 0x00000000006fad1d in Item_subselect::exec (this=0x7fa1cc073b10) at sql/item_subselect.cc:277 #6 0x00000000006fb3c0 in Item_in_subselect::val_bool (this=0x7fa1cc073b10) at sql/item_subselect.cc:878 #7 0x00000000006b1322 in Item_in_optimizer::val_int (this=0x7fa1cc074388) at sql/item_cmpfunc.cc:1928 #8 0x000000000069722c in Item::val_bool (this=0x7fa1cc074388) at sql/item.cc:200 #9 0x00000000006b074c in Item_func_not::val_int (this=0x7fa1cc073c40) at sql/item_cmpfunc.cc:289 #10 0x000000000069722c in Item::val_bool (this=0x7fa1cc073c40) at sql/item.cc:200 #11 0x00000000006b24fa in Item_cond_and::val_int (this=0x7fa1cc07a8f0) at sql/item_cmpfunc.cc:4586 #12 0x00000000005a224c in evaluate_join_record (join=0x7fa1cc074dc0, join_tab=0x7fa1cc078990, error=<value optimized out>) at sql/sql_select.cc:11818 #13 0x00000000005a70b6 in sub_select (join=0x7fa1cc074dc0, join_tab=0x7fa1cc078990, end_of_records=<value optimized out>) at sql/sql_select.cc:11770 #14 0x00000000005abf8d in do_select (join=0x7fa1cc074dc0, fields=0x6450f50, table=0x0, procedure=<value optimized out>) at sql/sql_select.cc:11535 #15 0x00000000005c208e in JOIN::exec (this=0x7fa1cc074dc0) at sql/sql_select.cc:2391 #16 0x00000000005bdc69 in mysql_select (thd=0x644eb90, rref_pointer_array=<value optimized out>, tables=<value optimized out>, wild_num=0, fields=<value optimized out>, conds=<value optimized out>, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fa1cc073d70, unit=0x6450828, select_lex=0x6450e48) at sql/sql_select.cc:2600 |
通过代码可以发现:mysql的多表join是通过最外层的do_select进去,调用sub_select完成nested-loop join,而sub_select是一个递归函数,根据索引或者全表扫描循环读取记录,然后调用evaluate_join_record,evaluate_join_record根据where条件过滤,对于满足要求的记录递归调用sub_select进入下一层nested-loop join,evaluate_join_record部分代码分析:
... if (found) { enum enum_nested_loop_state rc; /* A match from join_tab is found for the current partial join. */ rc= (*join_tab->next_select)(join, join_tab+1, 0); join->thd->warning_info->inc_current_row_for_warning(); if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS) return rc; ... } |
其中join_tab->next_select是一个函数指针,对于join_tables中的最后一个join_tab,next_select在最外层的do_select中被赋值为end_send(发送结果到客户端),对于其它join_tab,join_tab->next_select在make_join_readinfo(该函数被JOIN::optimize调用)中赋初值sub_select.
查看explain结果,ugcbutterfly表采用全表扫描方式读取记录,因此evaluate_join_record的执行次数应该等于:ugcbutterfly + (满足ugcbutterfly表上出subquery外where条件的记录行数),这个数字与120957基本吻合
那为什么order by query执行evaluate_join_record却不是120957? 查看执行堆栈:
#0 evaluate_join_record (join=0x7fa1cc078d90, join_tab=0x7fa1cc07ba08, error=-1) at sql/sql_select.cc:11805 #1 0x00000000005a70b6 in sub_select (join=0x7fa1cc078d90, join_tab=0x7fa1cc07ba08, end_of_records=<value optimized out>) at sql/sql_select.cc:11770 #2 0x00000000005abf8d in do_select (join=0x7fa1cc078d90, fields=0x7fa1cc064318, table=0x0, procedure=<value optimized out>) at sql/sql_select.cc:11535 #3 0x00000000005c208e in JOIN::exec (this=0x7fa1cc078d90) at sql/sql_select.cc:2391 #4 0x00000000006fbef9 in subselect_single_select_engine::exec (this=0x7fa1cc073c00) at sql/item_subselect.cc:2001 #5 0x00000000006fad1d in Item_subselect::exec (this=0x7fa1cc073b10) at sql/item_subselect.cc:277 #6 0x00000000006fb3c0 in Item_in_subselect::val_bool (this=0x7fa1cc073b10) at sql/item_subselect.cc:878 #7 0x00000000006b1322 in Item_in_optimizer::val_int (this=0x7fa1cc0744b0) at sql/item_cmpfunc.cc:1928 #8 0x000000000069722c in Item::val_bool (this=0x7fa1cc0744b0) at sql/item.cc:200 #9 0x00000000006b074c in Item_func_not::val_int (this=0x7fa1cc073c40) at sql/item_cmpfunc.cc:289 #10 0x000000000069722c in Item::val_bool (this=0x7fa1cc073c40) at sql/item.cc:200 #11 0x00000000006b24fa in Item_cond_and::val_int (this=0x7fa1cc07ab48) at sql/item_cmpfunc.cc:4586 #12 0x00000000006895a9 in skip_record (thd=<value optimized out>, table=<value optimized out>, sortorder=0x7fa1d9b6fe30, s_length=3422616976, select=0x7fa1cc07acd8, max_rows=140333119110416, sort_positions=false, examined_rows=0x7fa1d9b700c0) at sql/opt_range.h:888 #13 find_all_keys (thd=<value optimized out>, table=<value optimized out>, sortorder=0x7fa1d9b6fe30, s_length=3422616976, select=0x7fa1cc07acd8, max_rows=140333119110416, sort_positions=false, examined_rows=0x7fa1d9b700c0) at sql/filesort.cc:628 #14 filesort (thd=<value optimized out>, table=<value optimized out>, sortorder=0x7fa1d9b6fe30, s_length=3422616976, select=0x7fa1cc07acd8, max_rows=140333119110416, sort_positions=false, examined_rows=0x7fa1d9b700c0) at sql/filesort.cc:249 #15 0x00000000005b756e in create_sort_index (thd=0x644eb90, join=0x7fa1cc074dc0, order=<value optimized out>, filesort_limit=18446744073709551615, select_limit=<value optimized out>, is_order_by=<value optimized out>) at sql/sql_select.cc:14178 #16 0x00000000005c1faf in JOIN::exec (this=0x7fa1cc074dc0) at sql/sql_select.cc:2355 #17 0x00000000005bdc69 in mysql_select (thd=0x644eb90, rref_pointer_array=<value optimized out>, tables=<value optimized out>, wild_num=0, fields=<value optimized out>, conds=<value optimized out>, og_num=1, order=0x7fa1cc073dd8, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fa1cc073e98, unit=0x6450828, select_lex=0x6450e48) at sql/sql_select.cc:2600 |
通过上面两个堆栈输出,进一步分析代码,可以得到这样的结论:
1、filesort通过find_all_keys–>skip_record–>Item_cond_and::val_int调用子查询筛选数据
2、normal query中,通过evaluate_join_record–>test(select_cond->val_int())–>Item_cond_and::val_int调用子查询筛选数据
可以看出filesort函数其实已经完成了查询和排序,但是在profile中为什么order by query多了一行? 查看JOIN::exec代码:
... if (curr_join->group_list || curr_join->order) { DBUG_PRINT("info",("Sorting for send_result_set_metadata")); thd_proc_info(thd, "Sorting result"); ... if (create_sort_index(thd, curr_join, curr_join->group_list ? curr_join->group_list : curr_join->order, curr_join->select_limit, (select_options & OPTION_FOUND_ROWS ? HA_POS_ERROR : unit->select_limit_cnt), curr_join->group_list ? TRUE : FALSE)) ... } ... thd_proc_info(thd, "Sending data"); DBUG_PRINT("info", ("%s", thd->proc_info)); result->send_result_set_metadata((procedure ? curr_join->procedure_fields_list : *curr_fields_list), Protocol::SEND_NUM_ROWS | Protocol::SEND_EOF); error= do_select(curr_join, curr_fields_list, NULL, procedure); ... |
只要是order by query,都会比normal query多一句:thd_proc_info(thd, “Sorting result”),这个会反映在profile中,create_sort_index把do_select要做的事情都做的差不多了(还没把数据发送到客户端),检索出结果并已经完成排序,那么do_select不是做重复工作了? 带着疑问追踪代码,发现,在create_sort_index执行完filesort之后,select_cond被置为空了,create_sort_index中部分代码:
... if (select) { ... select->cleanup(); // filesort did select tab->select= 0; table->quick_keys.clear_all(); // as far as we cleanup select->quick table->sort.io_cache= tablesort_result_cache; } ... |
由于select_cond为NULL, order by query在filesort之后,do_select–>sub_select->evaluate_join_record时,evaluate_join_record并不会执行test(select_cond->val_int()),子查询不会被执行,从而直接调用end_send发送数据到客户端
而且,filesort之后,do_select调用sub_select读取数据时使用的是rr_from_pointers,这个与normal query中使用index/scan不同(catchlog: join_read_always_key–>ha_index_init,ugcbutterfly: join_init_read_record–>init_read_record)
因此问题就变得简单了:order by query的两个阶段中,filesort把数据检索出来并排好序,do_select负责将数据发送到客户端;normal query只有do_select阶段,它循环从外层表(ugcbutterfly,全表扫描)读出记录,满足条件的行与内层表(catchlog,使用索引)join,发送数据到客户端
从本质上讲,这个问题应该是由order by引起的,与subquery应该无关,将query中的subquery去掉后测试一下:(query 3和4是在query1和2的基础上去掉子查询,查询结果集不变,还是4条记录)
mysql> show profile for query 3; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000133 | | checking permissions | 0.000006 | | Opening tables | 0.000022 | | System lock | 0.000010 | | init | 0.000069 | | optimizing | 0.000018 | | statistics | 0.000139 | | preparing | 0.000021 | | executing | 0.000002 | | Sending data | 0.339549 | | end | 0.000009 | | query end | 0.000005 | | closing tables | 0.000010 | | freeing items | 0.000043 | | logging slow query | 0.000002 | | logging slow query | 0.000123 | | cleaning up | 0.000004 | +----------------------+----------+ 17 rows in set (0.00 sec) mysql> show profile for query 4; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000144 | | checking permissions | 0.000007 | | Opening tables | 0.000021 | | System lock | 0.000010 | | init | 0.000069 | | optimizing | 0.000018 | | statistics | 0.000132 | | preparing | 0.000027 | | executing | 0.000002 | | Sorting result | 0.129252 | | Sending data | 0.000190 | | end | 0.000006 | | query end | 0.000005 | | closing tables | 0.000010 | | freeing items | 0.000039 | | logging slow query | 0.000002 | | logging slow query | 0.000140 | | cleaning up | 0.000010 | +----------------------+----------+ 18 rows in set (0.00 sec) |
猜测成立,下一步将找出为什么采用filesort会省却那么多时间,继续使用神器systemtap检测下述函数调用次数和总共执行时间:
normal query: JOIN::exec:1(1586518us), evaluate_join_record:120953(1058604us), sub_select:2(1586451us) do_select:1(1586488us), Item_cond_and::val_int:120952(450491us), create_sort_index:0(0us) order by query: JOIN::exec:1(766256us), evaluate_join_record:5(84us), sub_select:2(193us) do_select:1(217us), Item_cond_and::val_int:120952(449748us), create_sort_index:1(765997us) |
注:由于systemtap注入了一些钩子,导致query执行时间大大延长,但是不影响结果判断
可以发现:
1、normal query主要时间在JOIN::exec–>do_select–>sub_select,order by query主要时间在JOIN::exec–>create_sort_index
2、normal query中的sub_select调用evaluate_join_record达120953次,占用了主要时间
3、normal query与order by query在过滤where条件上占用的时间基本相同(Item_cond_and::val_int),normal query中的sub_select除了evaluate_join_record占用了绝大部分调用时间,还有部分较大调用开销(1586451us-1586451us,通过代码分析应该为全表扫描代价)