一个MySQL子查询与Order By问题分析

问题背景

产品DBA在做日常SQL REVIEW时，遇到一个问题：一个包含子查询的query，使用order by(order by query)竟然比不使用order by(normal query)执行时间更短，而且是在order by无法使用index排序的情况下（使用filesort），这个问题不好解释，只能借助一些工具以及跟踪代码寻找原因

MySQL版本

现象在Percona 5.5.18和Oracle MySQL 5.1.48上都存在，本文数据和分析基于Percona 5.5.18

表ugcbutterfly结构

mysql> show create table ugcbutterfly\G
*************************** 1. row ***************************
       Table: ugcbutterfly
Create Table: CREATE TABLE `ugcbutterfly` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT ,
  `status` tinyint(4) NOT NULL,
  `gmt_create` datetime NOT NULL,
  `gmt_modified` datetime NOT NULL,
  `memo` varchar(512) DEFAULT NULL,
  `userid` bigint(20) NOT NULL COMMENT,
  `u_contact1` varchar(512) DEFAULT NULL,
  `u_contact2` varchar(512) DEFAULT NULL,
  `b_name` varchar(512) DEFAULT NULL,
  `b_desc` varchar(1024) DEFAULT NULL,
  `b_type` tinyint(4) NOT NULL COMMENT,
  `b_total` bigint(20) NOT NULL COMMENT,
  `b_quantity` bigint(20) NOT NULL COMMENT,
  `startime` datetime NOT NULL COMMENT,
  `endtime` datetime NOT NULL COMMENT,
  `b_url1` varchar(1024) DEFAULT NULL,
  `b_url2` varchar(1024) DEFAULT NULL,
  `b_url3` varchar(1024) DEFAULT NULL,
  `b_url4` varchar(1024) DEFAULT NULL,
  `b_longitude` double(10,6) NOT NULL,
  `b_latitude` double(10,6) NOT NULL,
  `b_radius` bigint(20) NOT NULL,
  `b_max_longitude` double(10,6) NOT NULL,
  `b_max_latitude` double(10,6) NOT NULL,
  `b_min_longitude` double(10,6) NOT NULL,
  `b_min_latitude` double(10,6) NOT NULL,
  `user_name` varchar(512) DEFAULT NULL,
  `device_id` varchar(60) DEFAULT NULL,
  `store_name` varchar(512) DEFAULT NULL,
  `store_addr` varchar(1024) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_wmc_cm_ugcbutterfly` (`userid`),
  KEY `idx_starttime_endtime_quantity` (`startime`,`endtime`,`b_quantity`),
  KEY `idx_start_end_tude` (`startime`,`endtime`,`b_min_longitude`),
  KEY `idx_6cols` (`startime`,`endtime`,`b_max_longitude`,`b_min_longitude`,`b_max_latitude`,`b_min_latitude`)
) ENGINE=InnoDB
1 row in set (0.00 sec)

表catchlog结构

mysql> show create table catchlog\G
*************************** 1. row ***************************
       Table: catchlog
Create Table: CREATE TABLE `catchlog` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `status` tinyint(4) NOT NULL,
  `gmt_create` datetime NOT NULL ,
  `gmt_modified` datetime NOT NULL,
  `userid` bigint(20) NOT NULL COMMENT,
  `butterfly_id` bigint(20) NOT NULL,
  `activity_id` bigint(20) NOT NULL,
  `memo` varchar(512) DEFAULT NULL,
  `username` varchar(128) NOT NULL,
  `distance` int(11) DEFAULT NULL,
  `type` tinyint(4) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_cm_catchlog_userid` (`userid`,`butterfly_id`,`activity_id`),
  KEY `idx_bid_status_gmt_modified` (`butterfly_id`,`status`,`gmt_modified`)
) ENGINE=InnoDB
1 row in set (0.00 sec)

表数据规模

mysql> select count(*) from catchlog;
+----------+
| count(*) |
+----------+
|  1930879 |
+----------+
1 row in set (11.14 sec)
 
mysql> select count(*) from ugcbutterfly;
+----------+
| count(*) |
+----------+
|   120952 |
+----------+
1 row in set (0.02 sec)

query不带order by执行计划：

mysql> explain select id,status,gmt_create,gmt_modified,memo,   
       userid,u_contact1,u_contact2,b_name,b_desc, 
       b_type,b_total,b_quantity,startime,endtime,
       b_url1,b_url2,b_url3,b_url4,b_longitude,        
       b_latitude,b_radius,b_max_longitude,b_max_latitude,b_min_longitude,
       b_min_latitude, store_name, store_addr, u_contact1 as store_phone,
       ROUND(6378.138*2*ASIN(SQRT(POW(SIN((22.299439*PI()/180-b_latitude*PI()/180)/2),2)+COS(22.299439*PI()/180)*COS(b_latitude*PI()/180)*POW(SIN((114.173881*PI()/180-b_longitude*PI()/180)/2),2)))*1000) AS juli 
     from ugcbutterfly b      
     where b.status = 0 and b_quantity > 0
       and (b.startime < now() and b.endtime > now())      
       and b.b_type = 7   
       and b.b_max_longitude >= 120.118867 and b.b_min_longitude <= 120.118867
       and b.b_max_latitude >= 30.277737    and b.b_min_latitude <= 30.277737
       and b.b_radius > 0
       and b.id not in (select butterfly_id from catchlog where userid = 279104765);
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: b
         type: ALL
possible_keys: idx_starttime_endtime_quantity,idx_start_end_tude,idx_6cols
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 127059
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: catchlog
         type: ref
possible_keys: idx_cm_catchlog_userid,idx_bid_status_gmt_modified
          key: idx_cm_catchlog_userid
      key_len: 16
          ref: const,func
         rows: 1
        Extra: Using index
2 rows in set (0.00 sec)

查询结果及执行时间：
4 rows in set (0.34 sec)

query带order by执行计划，及执行时间

 
mysql> explain select id,status,gmt_create,gmt_modified,memo,   
       userid,u_contact1,u_contact2,b_name,b_desc, 
       b_type,b_total,b_quantity,startime,endtime,
       b_url1,b_url2,b_url3,b_url4,b_longitude,        
       b_latitude,b_radius,b_max_longitude,b_max_latitude,b_min_longitude,
       b_min_latitude, store_name, store_addr, u_contact1 as store_phone,
       ROUND(6378.138*2*ASIN(SQRT(POW(SIN((22.299439*PI()/180-b_latitude*PI()/180)/2),2)+COS(22.299439*PI()/180)*COS(b_latitude*PI()/180)*POW(SIN((114.173881*PI()/180-b_longitude*PI()/180)/2),2)))*1000) AS juli 
     from ugcbutterfly b      
     where b.status = 0 and b_quantity > 0
       and (b.startime < now() and b.endtime > now())      
       and b.b_type = 7   
       and b.b_max_longitude >= 120.118867 and b.b_min_longitude <= 120.118867
       and b.b_max_latitude >= 30.277737    and b.b_min_latitude <= 30.277737
       and b.b_radius > 0
       and b.id not in (select butterfly_id from catchlog where userid = 279104765) order by juli;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: b
         type: ALL
possible_keys: idx_starttime_endtime_quantity,idx_start_end_tude,idx_6cols
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 127059
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: catchlog
         type: ref
possible_keys: idx_cm_catchlog_userid,idx_bid_status_gmt_modified
          key: idx_cm_catchlog_userid
      key_len: 16
          ref: const,func
         rows: 1
        Extra: Using index
2 rows in set (0.00 sec)

查询结果及执行时间：
4 rows in set (0.13 sec)

profile

mysql> show profile for query 1;                  
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000168 |
| checking permissions | 0.000003 |
| checking permissions | 0.000005 |
| Opening tables       | 0.000028 |
| System lock          | 0.000011 |
| init                 | 0.000081 |
| optimizing           | 0.000020 |
| statistics           | 0.000137 |
| preparing            | 0.000021 |
| executing            | 0.000002 |
| Sending data         | 0.335989 |
| optimizing           | 0.000022 |
| statistics           | 0.000075 |
| preparing            | 0.000023 |
| executing            | 0.000002 |
| Sending data         | 0.001634 |
| executing            | 0.000002 |
| Sending data         | 0.000073 |
| executing            | 0.000002 |
| Sending data         | 0.000163 |
| executing            | 0.000002 |
| Sending data         | 0.000567 |
| end                  | 0.000006 |
| query end            | 0.000004 |
| closing tables       | 0.000017 |
| freeing items        | 0.000044 |
| logging slow query   | 0.000002 |
| logging slow query   | 0.000157 |
| cleaning up          | 0.000004 |
+----------------------+----------+
29 rows in set (0.00 sec)
 
mysql> show profile for query 2;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000158 |
| checking permissions | 0.000003 |
| checking permissions | 0.000004 |
| Opening tables       | 0.000023 |
| System lock          | 0.000010 |
| init                 | 0.000076 |
| optimizing           | 0.000014 |
| statistics           | 0.000113 |
| preparing            | 0.000022 |
| executing            | 0.000002 |
| Sorting result       | 0.128376 |
| optimizing           | 0.000036 |
| statistics           | 0.000082 |
| preparing            | 0.000023 |
| executing            | 0.000002 |
| Sending data         | 0.000632 |
| executing            | 0.000001 |
| Sending data         | 0.000032 |
| executing            | 0.000002 |
| Sending data         | 0.000066 |
| executing            | 0.000002 |
| Sending data         | 0.000269 |
| Sending data         | 0.000162 |
| end                  | 0.000006 |
| query end            | 0.000004 |
| closing tables       | 0.000012 |
| freeing items        | 0.000046 |
| logging slow query   | 0.000002 |
| logging slow query   | 0.000149 |
| cleaning up          | 0.000009 |
+----------------------+----------+
30 rows in set (0.00 sec)

mysql> show profile for query 1; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000168 | | checking permissions | 0.000003 | | checking permissions | 0.000005 | | Opening tables | 0.000028 | | System lock | 0.000011 | | init | 0.000081 | | optimizing | 0.000020 | | statistics | 0.000137 | | preparing | 0.000021 | | executing | 0.000002 | | Sending data | 0.335989 | | optimizing | 0.000022 | | statistics | 0.000075 | | preparing | 0.000023 | | executing | 0.000002 | | Sending data | 0.001634 | | executing | 0.000002 | | Sending data | 0.000073 | | executing | 0.000002 | | Sending data | 0.000163 | | executing | 0.000002 | | Sending data | 0.000567 | | end | 0.000006 | | query end | 0.000004 | | closing tables | 0.000017 | | freeing items | 0.000044 | | logging slow query | 0.000002 | | logging slow query | 0.000157 | | cleaning up | 0.000004 | +----------------------+----------+ 29 rows in set (0.00 sec) mysql> show profile for query 2; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000158 | | checking permissions | 0.000003 | | checking permissions | 0.000004 | | Opening tables | 0.000023 | | System lock | 0.000010 | | init | 0.000076 | | optimizing | 0.000014 | | statistics | 0.000113 | | preparing | 0.000022 | | executing | 0.000002 | | Sorting result | 0.128376 | | optimizing | 0.000036 | | statistics | 0.000082 | | preparing | 0.000023 | | executing | 0.000002 | | Sending data | 0.000632 | | executing | 0.000001 | | Sending data | 0.000032 | | executing | 0.000002 | | Sending data | 0.000066 | | executing | 0.000002 | | Sending data | 0.000269 | | Sending data | 0.000162 | | end | 0.000006 | | query end | 0.000004 | | closing tables | 0.000012 | | freeing items | 0.000046 | | logging slow query | 0.000002 | | logging slow query | 0.000149 | | cleaning up | 0.000009 | +----------------------+----------+ 30 rows in set (0.00 sec)

问题分析

首先总结一下问题：
1、相对于normal query，通过explain可以发现order by query执行计划多了一个Using filesort，这个说明order by字段`juli`无法利用索引进行排序
2、查询命中的结果集较小（4 rows），即order by代价较小
3、在执行时间上，order by query时间短于normal query，通过profile来看，差别主要在(Sorting result: 0.128376s) vs. (Sending data：0.335989s)上，而且可以发现，order by query的profile结果多了一行：Sorting result

systemtap

通过神器systemtap追踪重要函数调用次数：(脚本很简单，写几个probe就OK)

normal query
# of evaluate_join_record: 120957, sub_select: 10, do_select: 5, JOIN::optimize 2, JOIN::exec 5
 
order by query
# of evaluate_join_record: 9, sub_select: 10, do_select: 5, JOIN::optimize 2, JOIN::exec 5

发现evaluate_join_record函数在normal query中调用的次数基本等于表ugcbutterfly的行数+结果集行数，通过代码分析这个函数的功能：
1、这个函数被sub_select调用，但是它在自查或者多表join时也会调用sub_select，normal query执行过程中的一个函数堆栈

#0  evaluate_join_record (join=0x7fa1cc078d90, join_tab=0x7fa1cc07b770, error=-1) at sql/sql_select.cc:11805
#1  0x00000000005a70b6 in sub_select (join=0x7fa1cc078d90, join_tab=0x7fa1cc07b770, end_of_records=false) at sql/sql_select.cc:11745
#2  0x00000000005abf8d in do_select (join=0x7fa1cc078d90, fields=0x7fa1cc064318, table=0x0, procedure=<value optimized out>) at sql/sql_select.cc:11535
#3  0x00000000005c208e in JOIN::exec (this=0x7fa1cc078d90) at sql/sql_select.cc:2391
#4  0x00000000006fbef9 in subselect_single_select_engine::exec (this=0x7fa1cc073c00) at sql/item_subselect.cc:2001
#5  0x00000000006fad1d in Item_subselect::exec (this=0x7fa1cc073b10) at sql/item_subselect.cc:277
#6  0x00000000006fb3c0 in Item_in_subselect::val_bool (this=0x7fa1cc073b10) at sql/item_subselect.cc:878
#7  0x00000000006b1322 in Item_in_optimizer::val_int (this=0x7fa1cc074388) at sql/item_cmpfunc.cc:1928
#8  0x000000000069722c in Item::val_bool (this=0x7fa1cc074388) at sql/item.cc:200
#9  0x00000000006b074c in Item_func_not::val_int (this=0x7fa1cc073c40) at sql/item_cmpfunc.cc:289
#10 0x000000000069722c in Item::val_bool (this=0x7fa1cc073c40) at sql/item.cc:200
#11 0x00000000006b24fa in Item_cond_and::val_int (this=0x7fa1cc07a8f0) at sql/item_cmpfunc.cc:4586
#12 0x00000000005a224c in evaluate_join_record (join=0x7fa1cc074dc0, join_tab=0x7fa1cc078990, error=<value optimized out>) at sql/sql_select.cc:11818
#13 0x00000000005a70b6 in sub_select (join=0x7fa1cc074dc0, join_tab=0x7fa1cc078990, end_of_records=<value optimized out>) at sql/sql_select.cc:11770
#14 0x00000000005abf8d in do_select (join=0x7fa1cc074dc0, fields=0x6450f50, table=0x0, procedure=<value optimized out>) at sql/sql_select.cc:11535
#15 0x00000000005c208e in JOIN::exec (this=0x7fa1cc074dc0) at sql/sql_select.cc:2391
#16 0x00000000005bdc69 in mysql_select (thd=0x644eb90, rref_pointer_array=<value optimized out>, 
    tables=<value optimized out>, wild_num=0, fields=<value optimized out>, conds=<value optimized out>, og_num=0, 
    order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fa1cc073d70, unit=0x6450828, select_lex=0x6450e48) 
    at sql/sql_select.cc:2600

#0 evaluate_join_record (join=0x7fa1cc078d90, join_tab=0x7fa1cc07b770, error=-1) at sql/sql_select.cc:11805 #1 0x00000000005a70b6 in sub_select (join=0x7fa1cc078d90, join_tab=0x7fa1cc07b770, end_of_records=false) at sql/sql_select.cc:11745 #2 0x00000000005abf8d in do_select (join=0x7fa1cc078d90, fields=0x7fa1cc064318, table=0x0, procedure=<value optimized out>) at sql/sql_select.cc:11535 #3 0x00000000005c208e in JOIN::exec (this=0x7fa1cc078d90) at sql/sql_select.cc:2391 #4 0x00000000006fbef9 in subselect_single_select_engine::exec (this=0x7fa1cc073c00) at sql/item_subselect.cc:2001 #5 0x00000000006fad1d in Item_subselect::exec (this=0x7fa1cc073b10) at sql/item_subselect.cc:277 #6 0x00000000006fb3c0 in Item_in_subselect::val_bool (this=0x7fa1cc073b10) at sql/item_subselect.cc:878 #7 0x00000000006b1322 in Item_in_optimizer::val_int (this=0x7fa1cc074388) at sql/item_cmpfunc.cc:1928 #8 0x000000000069722c in Item::val_bool (this=0x7fa1cc074388) at sql/item.cc:200 #9 0x00000000006b074c in Item_func_not::val_int (this=0x7fa1cc073c40) at sql/item_cmpfunc.cc:289 #10 0x000000000069722c in Item::val_bool (this=0x7fa1cc073c40) at sql/item.cc:200 #11 0x00000000006b24fa in Item_cond_and::val_int (this=0x7fa1cc07a8f0) at sql/item_cmpfunc.cc:4586 #12 0x00000000005a224c in evaluate_join_record (join=0x7fa1cc074dc0, join_tab=0x7fa1cc078990, error=<value optimized out>) at sql/sql_select.cc:11818 #13 0x00000000005a70b6 in sub_select (join=0x7fa1cc074dc0, join_tab=0x7fa1cc078990, end_of_records=<value optimized out>) at sql/sql_select.cc:11770 #14 0x00000000005abf8d in do_select (join=0x7fa1cc074dc0, fields=0x6450f50, table=0x0, procedure=<value optimized out>) at sql/sql_select.cc:11535 #15 0x00000000005c208e in JOIN::exec (this=0x7fa1cc074dc0) at sql/sql_select.cc:2391 #16 0x00000000005bdc69 in mysql_select (thd=0x644eb90, rref_pointer_array=<value optimized out>, tables=<value optimized out>, wild_num=0, fields=<value optimized out>, conds=<value optimized out>, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fa1cc073d70, unit=0x6450828, select_lex=0x6450e48) at sql/sql_select.cc:2600

通过代码可以发现：mysql的多表join是通过最外层的do_select进去，调用sub_select完成nested-loop join，而sub_select是一个递归函数，根据索引或者全表扫描循环读取记录，然后调用evaluate_join_record，evaluate_join_record根据where条件过滤，对于满足要求的记录递归调用sub_select进入下一层nested-loop join，evaluate_join_record部分代码分析：

    ...
    if (found)
    {
      enum enum_nested_loop_state rc;
      /* A match from join_tab is found for the current partial join. */
      rc= (*join_tab->next_select)(join, join_tab+1, 0);
      join->thd->warning_info->inc_current_row_for_warning();
      if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS)
        return rc;
      ...
    }

其中join_tab->next_select是一个函数指针，对于join_tables中的最后一个join_tab，next_select在最外层的do_select中被赋值为end_send(发送结果到客户端)，对于其它join_tab，join_tab->next_select在make_join_readinfo(该函数被JOIN::optimize调用)中赋初值sub_select.

查看explain结果，ugcbutterfly表采用全表扫描方式读取记录，因此evaluate_join_record的执行次数应该等于：ugcbutterfly + (满足ugcbutterfly表上出subquery外where条件的记录行数)，这个数字与120957基本吻合

那为什么order by query执行evaluate_join_record却不是120957? 查看执行堆栈：

#0  evaluate_join_record (join=0x7fa1cc078d90, join_tab=0x7fa1cc07ba08, error=-1) at sql/sql_select.cc:11805
#1  0x00000000005a70b6 in sub_select (join=0x7fa1cc078d90, join_tab=0x7fa1cc07ba08, end_of_records=<value optimized out>)
    at sql/sql_select.cc:11770
#2  0x00000000005abf8d in do_select (join=0x7fa1cc078d90, fields=0x7fa1cc064318, table=0x0, procedure=<value optimized out>)
    at sql/sql_select.cc:11535
#3  0x00000000005c208e in JOIN::exec (this=0x7fa1cc078d90) at sql/sql_select.cc:2391
#4  0x00000000006fbef9 in subselect_single_select_engine::exec (this=0x7fa1cc073c00) at sql/item_subselect.cc:2001
#5  0x00000000006fad1d in Item_subselect::exec (this=0x7fa1cc073b10) at sql/item_subselect.cc:277
#6  0x00000000006fb3c0 in Item_in_subselect::val_bool (this=0x7fa1cc073b10) at sql/item_subselect.cc:878
#7  0x00000000006b1322 in Item_in_optimizer::val_int (this=0x7fa1cc0744b0) at sql/item_cmpfunc.cc:1928
#8  0x000000000069722c in Item::val_bool (this=0x7fa1cc0744b0) at sql/item.cc:200
#9  0x00000000006b074c in Item_func_not::val_int (this=0x7fa1cc073c40) at sql/item_cmpfunc.cc:289
#10 0x000000000069722c in Item::val_bool (this=0x7fa1cc073c40) at sql/item.cc:200
#11 0x00000000006b24fa in Item_cond_and::val_int (this=0x7fa1cc07ab48) at sql/item_cmpfunc.cc:4586
#12 0x00000000006895a9 in skip_record (thd=<value optimized out>, table=<value optimized out>, sortorder=0x7fa1d9b6fe30, 
    s_length=3422616976, select=0x7fa1cc07acd8, max_rows=140333119110416, sort_positions=false, 
    examined_rows=0x7fa1d9b700c0) at sql/opt_range.h:888
#13 find_all_keys (thd=<value optimized out>, table=<value optimized out>, sortorder=0x7fa1d9b6fe30, s_length=3422616976, 
    select=0x7fa1cc07acd8, max_rows=140333119110416, sort_positions=false, examined_rows=0x7fa1d9b700c0) at sql/filesort.cc:628
#14 filesort (thd=<value optimized out>, table=<value optimized out>, sortorder=0x7fa1d9b6fe30, s_length=3422616976, 
    select=0x7fa1cc07acd8, max_rows=140333119110416, sort_positions=false, examined_rows=0x7fa1d9b700c0) at sql/filesort.cc:249
#15 0x00000000005b756e in create_sort_index (thd=0x644eb90, join=0x7fa1cc074dc0, order=<value optimized out>, 
    filesort_limit=18446744073709551615, select_limit=<value optimized out>, is_order_by=<value optimized out>) at sql/sql_select.cc:14178
#16 0x00000000005c1faf in JOIN::exec (this=0x7fa1cc074dc0) at sql/sql_select.cc:2355
#17 0x00000000005bdc69 in mysql_select (thd=0x644eb90, rref_pointer_array=<value optimized out>, 
    tables=<value optimized out>, wild_num=0, fields=<value optimized out>, conds=<value optimized out>, og_num=1, 
    order=0x7fa1cc073dd8, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fa1cc073e98, 
    unit=0x6450828, select_lex=0x6450e48) at sql/sql_select.cc:2600

#0 evaluate_join_record (join=0x7fa1cc078d90, join_tab=0x7fa1cc07ba08, error=-1) at sql/sql_select.cc:11805 #1 0x00000000005a70b6 in sub_select (join=0x7fa1cc078d90, join_tab=0x7fa1cc07ba08, end_of_records=<value optimized out>) at sql/sql_select.cc:11770 #2 0x00000000005abf8d in do_select (join=0x7fa1cc078d90, fields=0x7fa1cc064318, table=0x0, procedure=<value optimized out>) at sql/sql_select.cc:11535 #3 0x00000000005c208e in JOIN::exec (this=0x7fa1cc078d90) at sql/sql_select.cc:2391 #4 0x00000000006fbef9 in subselect_single_select_engine::exec (this=0x7fa1cc073c00) at sql/item_subselect.cc:2001 #5 0x00000000006fad1d in Item_subselect::exec (this=0x7fa1cc073b10) at sql/item_subselect.cc:277 #6 0x00000000006fb3c0 in Item_in_subselect::val_bool (this=0x7fa1cc073b10) at sql/item_subselect.cc:878 #7 0x00000000006b1322 in Item_in_optimizer::val_int (this=0x7fa1cc0744b0) at sql/item_cmpfunc.cc:1928 #8 0x000000000069722c in Item::val_bool (this=0x7fa1cc0744b0) at sql/item.cc:200 #9 0x00000000006b074c in Item_func_not::val_int (this=0x7fa1cc073c40) at sql/item_cmpfunc.cc:289 #10 0x000000000069722c in Item::val_bool (this=0x7fa1cc073c40) at sql/item.cc:200 #11 0x00000000006b24fa in Item_cond_and::val_int (this=0x7fa1cc07ab48) at sql/item_cmpfunc.cc:4586 #12 0x00000000006895a9 in skip_record (thd=<value optimized out>, table=<value optimized out>, sortorder=0x7fa1d9b6fe30, s_length=3422616976, select=0x7fa1cc07acd8, max_rows=140333119110416, sort_positions=false, examined_rows=0x7fa1d9b700c0) at sql/opt_range.h:888 #13 find_all_keys (thd=<value optimized out>, table=<value optimized out>, sortorder=0x7fa1d9b6fe30, s_length=3422616976, select=0x7fa1cc07acd8, max_rows=140333119110416, sort_positions=false, examined_rows=0x7fa1d9b700c0) at sql/filesort.cc:628 #14 filesort (thd=<value optimized out>, table=<value optimized out>, sortorder=0x7fa1d9b6fe30, s_length=3422616976, select=0x7fa1cc07acd8, max_rows=140333119110416, sort_positions=false, examined_rows=0x7fa1d9b700c0) at sql/filesort.cc:249 #15 0x00000000005b756e in create_sort_index (thd=0x644eb90, join=0x7fa1cc074dc0, order=<value optimized out>, filesort_limit=18446744073709551615, select_limit=<value optimized out>, is_order_by=<value optimized out>) at sql/sql_select.cc:14178 #16 0x00000000005c1faf in JOIN::exec (this=0x7fa1cc074dc0) at sql/sql_select.cc:2355 #17 0x00000000005bdc69 in mysql_select (thd=0x644eb90, rref_pointer_array=<value optimized out>, tables=<value optimized out>, wild_num=0, fields=<value optimized out>, conds=<value optimized out>, og_num=1, order=0x7fa1cc073dd8, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fa1cc073e98, unit=0x6450828, select_lex=0x6450e48) at sql/sql_select.cc:2600

通过上面两个堆栈输出，进一步分析代码，可以得到这样的结论：
1、filesort通过find_all_keys–>skip_record–>Item_cond_and::val_int调用子查询筛选数据
2、normal query中，通过evaluate_join_record–>test(select_cond->val_int())–>Item_cond_and::val_int调用子查询筛选数据

可以看出filesort函数其实已经完成了查询和排序，但是在profile中为什么order by query多了一行? 查看JOIN::exec代码：

  ...
  if (curr_join->group_list || curr_join->order)
  {
    DBUG_PRINT("info",("Sorting for send_result_set_metadata"));
    thd_proc_info(thd, "Sorting result");
    ...
    if (create_sort_index(thd, curr_join,
			  curr_join->group_list ? 
			  curr_join->group_list : curr_join->order,
			  curr_join->select_limit,
			  (select_options & OPTION_FOUND_ROWS ?
			   HA_POS_ERROR : unit->select_limit_cnt),
                          curr_join->group_list ? TRUE : FALSE))
    ...
  }
  ...
  thd_proc_info(thd, "Sending data");
  DBUG_PRINT("info", ("%s", thd->proc_info));
  result->send_result_set_metadata((procedure ? curr_join->procedure_fields_list :
                                    *curr_fields_list),
                                   Protocol::SEND_NUM_ROWS | Protocol::SEND_EOF);
  error= do_select(curr_join, curr_fields_list, NULL, procedure);
  ...

只要是order by query，都会比normal query多一句：thd_proc_info(thd, “Sorting result”)，这个会反映在profile中，create_sort_index把do_select要做的事情都做的差不多了（还没把数据发送到客户端），检索出结果并已经完成排序，那么do_select不是做重复工作了? 带着疑问追踪代码，发现，在create_sort_index执行完filesort之后，select_cond被置为空了，create_sort_index中部分代码：

  ...
  if (select)
  {
    ...
    select->cleanup();				// filesort did select
    tab->select= 0;
    table->quick_keys.clear_all();  // as far as we cleanup select->quick
    table->sort.io_cache= tablesort_result_cache;
  }
  ...

由于select_cond为NULL, order by query在filesort之后，do_select–>sub_select->evaluate_join_record时，evaluate_join_record并不会执行test(select_cond->val_int())，子查询不会被执行，从而直接调用end_send发送数据到客户端

而且，filesort之后，do_select调用sub_select读取数据时使用的是rr_from_pointers，这个与normal query中使用index/scan不同（catchlog: join_read_always_key–>ha_index_init，ugcbutterfly: join_init_read_record–>init_read_record）

因此问题就变得简单了：order by query的两个阶段中，filesort把数据检索出来并排好序，do_select负责将数据发送到客户端；normal query只有do_select阶段，它循环从外层表（ugcbutterfly，全表扫描）读出记录，满足条件的行与内层表（catchlog，使用索引）join，发送数据到客户端

从本质上讲，这个问题应该是由order by引起的，与subquery应该无关，将query中的subquery去掉后测试一下：(query 3和4是在query1和2的基础上去掉子查询，查询结果集不变，还是4条记录)

mysql> show profile for query 3; 
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000133 |
| checking permissions | 0.000006 |
| Opening tables       | 0.000022 |
| System lock          | 0.000010 |
| init                 | 0.000069 |
| optimizing           | 0.000018 |
| statistics           | 0.000139 |
| preparing            | 0.000021 |
| executing            | 0.000002 |
| Sending data         | 0.339549 |
| end                  | 0.000009 |
| query end            | 0.000005 |
| closing tables       | 0.000010 |
| freeing items        | 0.000043 |
| logging slow query   | 0.000002 |
| logging slow query   | 0.000123 |
| cleaning up          | 0.000004 |
+----------------------+----------+
17 rows in set (0.00 sec)
 
mysql> show profile for query 4;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000144 |
| checking permissions | 0.000007 |
| Opening tables       | 0.000021 |
| System lock          | 0.000010 |
| init                 | 0.000069 |
| optimizing           | 0.000018 |
| statistics           | 0.000132 |
| preparing            | 0.000027 |
| executing            | 0.000002 |
| Sorting result       | 0.129252 |
| Sending data         | 0.000190 |
| end                  | 0.000006 |
| query end            | 0.000005 |
| closing tables       | 0.000010 |
| freeing items        | 0.000039 |
| logging slow query   | 0.000002 |
| logging slow query   | 0.000140 |
| cleaning up          | 0.000010 |
+----------------------+----------+
18 rows in set (0.00 sec)

猜测成立，下一步将找出为什么采用filesort会省却那么多时间，继续使用神器systemtap检测下述函数调用次数和总共执行时间：

normal query：
JOIN::exec:1(1586518us), evaluate_join_record:120953(1058604us), sub_select:2(1586451us)
do_select:1(1586488us), Item_cond_and::val_int:120952(450491us), create_sort_index:0(0us)
 
order by query：
JOIN::exec:1(766256us), evaluate_join_record:5(84us), sub_select:2(193us)
do_select:1(217us), Item_cond_and::val_int:120952(449748us), create_sort_index:1(765997us)

注：由于systemtap注入了一些钩子，导致query执行时间大大延长，但是不影响结果判断

可以发现：
1、normal query主要时间在JOIN::exec–>do_select–>sub_select，order by query主要时间在JOIN::exec–>create_sort_index
2、normal query中的sub_select调用evaluate_join_record达120953次，占用了主要时间
3、normal query与order by query在过滤where条件上占用的时间基本相同（Item_cond_and::val_int），normal query中的sub_select除了evaluate_join_record占用了绝大部分调用时间，还有部分较大调用开销（1586451us-1586451us，通过代码分析应该为全表扫描代价）

一个MySQL子查询与Order By问题分析

问题背景

MySQL版本

表ugcbutterfly结构

表catchlog结构

表数据规模

query不带order by执行计划：

query带order by执行计划，及执行时间

profile

问题分析

systemtap

相关

发表回复取消回复

问题背景

MySQL版本

表ugcbutterfly结构

表catchlog结构

表数据规模

query不带order by执行计划：

query带order by执行计划，及执行时间

profile

问题分析

systemtap

相关

发表回复 取消回复

发表回复取消回复