Contents
  1. 1. NATURAL JOIN
  2. 2. STRAIGHT_JOIN
  3. 3. 其他联接分类
  4. 4. SELF JOIN
  5. 5. NONEQUI JOIN
  6. 6. SEMI JOIN
  7. 7. ANTI SEMI JOIN

NATURAL JOIN

ANSI SQL还支持NATURAL JOIN,即自然联接。NATURAL JION等同于INNER JOIN与USING的组俣,它隐含的作用是将两个表中具有相同名称的列进行匹配。同样的,NATURAL LEFT(RIGHT) JOIN等同于LEFT(RIGHT) OUTER JOIN与USING的组合。对于下面这句INNER JOIN:

mysql> select a.emp_no,first_name,last_name
from employees a inner join dept_manager b
on a.emp_no=b.emp_no;

+--------+-------------+--------------+
| emp_no | first_name  | last_name    |
+--------+-------------+--------------+
| 110022 | Margareta   | Markovitch   |
| 110039 | Vishwani    | Minakawa     |
| 110085 | Ebru        | Alpin        |
| 110114 | Isamu       | Legleitner   |
| 110183 | Shirish     | Ossenbruggen |
| 110228 | Karsten     | Sigstam      |
| 110303 | Krassimir   | Wegerle      |
| 110344 | Rosine      | Cools        |
| 110386 | Shem        | Kieras       |
| 110420 | Oscar       | Ghazalie     |
| 110511 | DeForest    | Hagimont     |
| 110567 | Leon        | DasSarma     |
| 110725 | Peternela   | Onuegbe      |
| 110765 | Rutger      | Hofmeyr      |
| 110800 | Sanjoy      | Quadeer      |
| 110854 | Dung        | Pesch        |
| 111035 | Przemyslawa | Kaelbling    |
| 111133 | Hauke       | Zhang        |
| 111400 | Arie        | Staelin      |
| 111534 | Hilary      | Kambil       |
| 111692 | Tonny       | Butterworth  |
| 111784 | Marjo       | Giarratana   |
| 111877 | Xiaobin     | Spinelli     |
| 111939 | Yuchang     | Weedman      |
+--------+-------------+--------------+
24 rows in set (0.00 sec)

上述语句和如下的NATURAL JOIN是等价的:

mysql> select a.emp_no,first_name,last_name from employees a natural join dept_manager b;
+--------+-------------+--------------+
| emp_no | first_name  | last_name    |
+--------+-------------+--------------+
| 110022 | Margareta   | Markovitch   |
| 110039 | Vishwani    | Minakawa     |
| 110085 | Ebru        | Alpin        |
| 110114 | Isamu       | Legleitner   |
| 110183 | Shirish     | Ossenbruggen |
| 110228 | Karsten     | Sigstam      |
| 110303 | Krassimir   | Wegerle      |
| 110344 | Rosine      | Cools        |
| 110386 | Shem        | Kieras       |
| 110420 | Oscar       | Ghazalie     |
| 110511 | DeForest    | Hagimont     |
| 110567 | Leon        | DasSarma     |
| 110725 | Peternela   | Onuegbe      |
| 110765 | Rutger      | Hofmeyr      |
| 110800 | Sanjoy      | Quadeer      |
| 110854 | Dung        | Pesch        |
| 111035 | Przemyslawa | Kaelbling    |
| 111133 | Hauke       | Zhang        |
| 111400 | Arie        | Staelin      |
| 111534 | Hilary      | Kambil       |
| 111692 | Tonny       | Butterworth  |
| 111784 | Marjo       | Giarratana   |
| 111877 | Xiaobin     | Spinelli     |
| 111939 | Yuchang     | Weedman      |
+--------+-------------+--------------+
24 rows in set (0.00 sec)

STRAIGHT_JOIN

STRAIGHT_JOIN其实不是新的联接类型,而是用户对SQL优化器的控制,其等同于JOIN。通过STRAIGHT_JOIN,MySQL数据库会强制先读取左边的表。先看一个未使用STRAIGHT_JOIN的SQL语句的执行计划,其SQL语句如下,

mysql> desc select a.emp_no,first_name,last_name from employees a inner join dept_manager b on a.emp_no=b.emp_no;
+----+-------------+-------+--------+----------------+---------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys  | key     | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+----------------+---------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | b     | index  | PRIMARY,emp_no | emp_no  | 4       | NULL               |   24 | Using index |
|  1 | SIMPLE      | a     | eq_ref | PRIMARY        | PRIMARY | 4       | employees.b.emp_no |    1 | NULL        |
+----+-------------+-------+--------+----------------+---------+---------+--------------------+------+-------------+
2 rows in set (0.00 sec)

可以看到,MySQL数据库先选择b表,也就是dept_manager表,然后进行匹配。这样做的好处实际只进行了24次匹配。如果使用STRAIGHT_JOIN,则会强制使用左表,也就是employees表,例如:

mysql> desc select a.emp_no,first_name,last_name from employees a straight_join dept_manager b on a.emp_no=b.emp_no;   
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------------+
| id | select_type | table | type | possible_keys  | key     | key_len | ref                | rows   | Extra       |
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------------+
|  1 | SIMPLE      | a     | ALL  | PRIMARY        | NULL    | NULL    | NULL               | 299290 | NULL        |
|  1 | SIMPLE      | b     | ref  | PRIMARY,emp_no | PRIMARY | 4       | employees.a.emp_no |      1 | Using index |
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------------+
2 rows in set (0.00 sec)

可以看到,MySQL优化器会强制使用左边的表进行匹配,因为左表employees表大约30多万行的数据,因此一共要匹配30多万次,显然选择的这个联接并不是最有效率的。
对两张表进行INNER JOIN,通常MySQL数据库的优化器都能工作得很好。但是对于多张表参与联接的语句,MySQL数据库的优化器选择可能并不总是正确的。这时,对于有经验的DBA,要确定最优的路径,可以使用STRAIGHT_JOIN,强制优化器按照自己的联接顺序来进行联接操作。不过随着MySQL数据库的不断完善,这种情况正变得越来越少。

其他联接分类

到目前为止介绍的都是基本的联接类型。除此之外还有其他几种联接分类方式。这一节将介绍SELF JOIN、NONEQUI JOIN和SEMI JOIN。

SELF JOIN

SELF JOIN是同一个表的两个实例之间的JOIN操作。前面的最小缺失值问题使用的就是SELF JON,只是没有显式地进行归类。

mysql> select min(x.a+1) from t x left outer join t y 
     ->on x.a+1=y.a where y.a is null; 
+------------+
| min(x.a+1) |
+------------+
|          3 |
+------------+
1 row in set (0.00 sec)

显然是表t自己对自己进行的JOIN操作。再次提醒,对同一个表进行联接操作必须指定表的别名。下面介绍另外一个见的使用SELF JOIN的问题—-员工—-经理问题。先根据下列语句来创建表emp。

mysql> create table emp(
    -> emp_no int primary key,
    -> mgr_no int,
    -> emp_name varchar(30)
    -> );
Query OK, 0 rows affected (0.03 sec)

mysql> 

insert into emp select 1,null,"Michael";
insert into emp select 4,1,"Jim";
insert into emp select 3,1,"Tommy";
insert into emp select 2,3,"Mariah";
insert into emp select 5,3,"Beyond";
insert into emp select 6,4,"John";
insert into emp select 8,3,"Monty";

在表emp中,emp_no代表员工编号,mgr_no代表员工的上级经理的员工编号。员工—-经理问题就是得到每位员工的经理信息。这个问题的解决方案如下:

mysql> select a.emp_name as employee,b.emp_name as manager 
    -> from emp a 
    -> left join emp b
    -> on a.mgr_no=b.emp_no;
+----------+---------+
| employee | manager |
+----------+---------+
| Michael  | NULL    |
| Mariah   | Tommy   |
| Tommy    | Michael |
| Jim      | Michael |
| Beyond   | Tommy   |
| John     | Jim     |
| Monty    | Tommy   |
+----------+---------+
7 rows in set (0.01 sec)

mysql> 

由于Michael是最高级别的员工,所以他没有经理信息,其manager为NULL。如果再深入地分析员工—-经理问题,会发现它其实是一个层次结构问题,表emp表示的员工组织图如下

员工组织结构图

对于层次结构的问题,通常可以通过SELF JOIN来解决。

NONEQUI JOIN

前面介绍的都是EQUAL JOIN(等值联接),即联接条件是基于“等于”运算的联接操作,NONEQUI JOIN的联手条件包含“等于”运算符之外的运算符。
要生成dept_manager表中所有两不同经理的组合,这里先假设当前表中仅包含员工号A、B、C,执行CROSS JOIN后生成九对:(A,A),(A,B),(A,C),(B,A),(B,B),(B,C),(C,A),(C,B),(C,C)。
显然,(A,A),(B,B),(C,C)包含相同的员工号,不是有效的员工组合。而(A,B),(B,A)又表示同样的组合。要解决这个问题,可以指定一个左边值小于右边值的联接条件,这样可以移除上述两种情况。该问题的解决方案为:

mysql> select a.emp_no,b.emp_no from emp a inner join emp b on a.emp_no<b.emp_no;            
+--------+--------+
| emp_no | emp_no |
+--------+--------+
|      1 |      2 |
|      1 |      3 |
|      1 |      4 |
|      1 |      5 |
|      1 |      6 |
|      1 |      8 |
|      2 |      3 |
|      2 |      4 |
|      2 |      5 |
|      2 |      6 |
|      2 |      8 |
|      3 |      4 |
|      3 |      5 |
|      3 |      6 |
|      3 |      8 |
|      4 |      5 |
|      4 |      6 |
|      4 |      8 |
|      5 |      6 |
|      5 |      8 |
|      6 |      8 |
+--------+--------+
21 rows in set (0.01 sec)

通过1*N的CROSS JOIN产生行号。利用如下SQL语句来产生行号:

mysql> select a.emp_no,b.emp_no,count(1) as rownum from emp a inner join emp b on a.emp_no>=b.emp_no group by a.emp_no;
+--------+--------+--------+
| emp_no | emp_no | rownum |
+--------+--------+--------+
|      1 |      1 |      1 |
|      2 |      1 |      2 |
|      3 |      1 |      3 |
|      4 |      1 |      4 |
|      5 |      1 |      5 |
|      6 |      1 |      6 |
|      8 |      1 |      7 |
+--------+--------+--------+
7 rows in set (0.00 sec)

SEMI JOIN

SEMI JOIN是根据一个表中存在的想着记录找到另一个表中想要数据的联接。如果从左表返回记录,该联接被称为左半联接;如果从右表返回记录,该联接被称为右半联接。

实现SEMI JOIN的方法有多种,如内部联接,子查询、集合操作等。在使用内部联接方式时,只从一个表中选择记录,然后应用DISTINCT。下面的SQL查询返回的是来自杭州且发生过订单的客户信息。

mysql> select distinct c.customer_id,city from customers as c
    -> join orders as o on c.customer_id=o.customer_id
    -> where c.city='HangZhou';
+-------------+----------+
| customer_id | city     |
+-------------+----------+
| 139         | Hangzhou |
| tx          | Hangzhou |
+-------------+----------+
2 rows in set (0.00 sec)

mysql> desc select distinct c.customer_id,city from customers as c
    -> join orders as o on c.customer_id=o.customer_id where c.city='HangZhou';
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
|  1 | SIMPLE      | o     | ALL  | NULL          | NULL | NULL    | NULL |    7 | Using temporary                                    |
|  1 | SIMPLE      | c     | ALL  | PRIMARY       | NULL | NULL    | NULL |    4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
2 rows in set (0.01 sec)

ANTI SEMI JOIN

与SEMI JOIN相反的是ANTI SEMI JOIN,它根据一个表中不存在的记录而从另一个表中返回记录。使用OUTER JOIN并过滤外部行,可以实现ANTI SEMI JOIN。例如,下面的SQL查询返回的是来自杭州但没有订单的客户信息。

mysql> desc select c.customer_id,c.city from customers as c left outer 
    ->join orders as o  on c.customer_id=o.customer_id where 
    ->c.city='Hangzhou' AND o.customer_id is null;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
|  1 | SIMPLE      | c     | ALL  | NULL          | NULL | NULL    | NULL |    5 | Using where                                        |
|  1 | SIMPLE      | o     | ALL  | NULL          | NULL | NULL    | NULL |    7 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
2 rows in set (0.00 sec)

可以看到所有来自杭州的订单都被访问到。如果来自杭州的客户数量为a,每个消费者的平均订单数为b,则霜访问的订单数是a*b。然后过滤外部行。最后得到的结果如下所示。

mysql>  select c.customer_id,c.city from customers as c left outer 
     ->join orders as o  on c.customer_id=o.customer_id where 
     ->c.city='Hangzhou' AND o.customer_id is null;
    +-------------+----------+
    | customer_id | city     |
    +-------------+----------+
    | 186         | Hangzhou |
    +-------------+----------+
    1 row in set (0.00 sec)
Contents
  1. 1. NATURAL JOIN
  2. 2. STRAIGHT_JOIN
  3. 3. 其他联接分类
  4. 4. SELF JOIN
  5. 5. NONEQUI JOIN
  6. 6. SEMI JOIN
  7. 7. ANTI SEMI JOIN