联接与集合操作二--其他联接
NATURAL JOIN
ANSI SQL还支持NATURAL JOIN,即自然联接。NATURAL JION等同于INNER JOIN与USING的组俣,它隐含的作用是将两个表中具有相同名称的列进行匹配。同样的,NATURAL LEFT(RIGHT) JOIN等同于LEFT(RIGHT) OUTER JOIN与USING的组合。对于下面这句INNER JOIN:
mysql> select a.emp_no,first_name,last_name
from employees a inner join dept_manager b
on a.emp_no=b.emp_no;
+--------+-------------+--------------+
| emp_no | first_name | last_name |
+--------+-------------+--------------+
| 110022 | Margareta | Markovitch |
| 110039 | Vishwani | Minakawa |
| 110085 | Ebru | Alpin |
| 110114 | Isamu | Legleitner |
| 110183 | Shirish | Ossenbruggen |
| 110228 | Karsten | Sigstam |
| 110303 | Krassimir | Wegerle |
| 110344 | Rosine | Cools |
| 110386 | Shem | Kieras |
| 110420 | Oscar | Ghazalie |
| 110511 | DeForest | Hagimont |
| 110567 | Leon | DasSarma |
| 110725 | Peternela | Onuegbe |
| 110765 | Rutger | Hofmeyr |
| 110800 | Sanjoy | Quadeer |
| 110854 | Dung | Pesch |
| 111035 | Przemyslawa | Kaelbling |
| 111133 | Hauke | Zhang |
| 111400 | Arie | Staelin |
| 111534 | Hilary | Kambil |
| 111692 | Tonny | Butterworth |
| 111784 | Marjo | Giarratana |
| 111877 | Xiaobin | Spinelli |
| 111939 | Yuchang | Weedman |
+--------+-------------+--------------+
24 rows in set (0.00 sec)
上述语句和如下的NATURAL JOIN是等价的:
mysql> select a.emp_no,first_name,last_name from employees a natural join dept_manager b;
+--------+-------------+--------------+
| emp_no | first_name | last_name |
+--------+-------------+--------------+
| 110022 | Margareta | Markovitch |
| 110039 | Vishwani | Minakawa |
| 110085 | Ebru | Alpin |
| 110114 | Isamu | Legleitner |
| 110183 | Shirish | Ossenbruggen |
| 110228 | Karsten | Sigstam |
| 110303 | Krassimir | Wegerle |
| 110344 | Rosine | Cools |
| 110386 | Shem | Kieras |
| 110420 | Oscar | Ghazalie |
| 110511 | DeForest | Hagimont |
| 110567 | Leon | DasSarma |
| 110725 | Peternela | Onuegbe |
| 110765 | Rutger | Hofmeyr |
| 110800 | Sanjoy | Quadeer |
| 110854 | Dung | Pesch |
| 111035 | Przemyslawa | Kaelbling |
| 111133 | Hauke | Zhang |
| 111400 | Arie | Staelin |
| 111534 | Hilary | Kambil |
| 111692 | Tonny | Butterworth |
| 111784 | Marjo | Giarratana |
| 111877 | Xiaobin | Spinelli |
| 111939 | Yuchang | Weedman |
+--------+-------------+--------------+
24 rows in set (0.00 sec)
STRAIGHT_JOIN
STRAIGHT_JOIN其实不是新的联接类型,而是用户对SQL优化器的控制,其等同于JOIN。通过STRAIGHT_JOIN,MySQL数据库会强制先读取左边的表。先看一个未使用STRAIGHT_JOIN的SQL语句的执行计划,其SQL语句如下,
mysql> desc select a.emp_no,first_name,last_name from employees a inner join dept_manager b on a.emp_no=b.emp_no;
+----+-------------+-------+--------+----------------+---------+---------+--------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------+---------+---------+--------------------+------+-------------+
| 1 | SIMPLE | b | index | PRIMARY,emp_no | emp_no | 4 | NULL | 24 | Using index |
| 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | employees.b.emp_no | 1 | NULL |
+----+-------------+-------+--------+----------------+---------+---------+--------------------+------+-------------+
2 rows in set (0.00 sec)
可以看到,MySQL数据库先选择b表,也就是dept_manager表,然后进行匹配。这样做的好处实际只进行了24次匹配。如果使用STRAIGHT_JOIN,则会强制使用左表,也就是employees表,例如:
mysql> desc select a.emp_no,first_name,last_name from employees a straight_join dept_manager b on a.emp_no=b.emp_no;
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------------+
| 1 | SIMPLE | a | ALL | PRIMARY | NULL | NULL | NULL | 299290 | NULL |
| 1 | SIMPLE | b | ref | PRIMARY,emp_no | PRIMARY | 4 | employees.a.emp_no | 1 | Using index |
+----+-------------+-------+------+----------------+---------+---------+--------------------+--------+-------------+
2 rows in set (0.00 sec)
可以看到,MySQL优化器会强制使用左边的表进行匹配,因为左表employees表大约30多万行的数据,因此一共要匹配30多万次,显然选择的这个联接并不是最有效率的。
对两张表进行INNER JOIN,通常MySQL数据库的优化器都能工作得很好。但是对于多张表参与联接的语句,MySQL数据库的优化器选择可能并不总是正确的。这时,对于有经验的DBA,要确定最优的路径,可以使用STRAIGHT_JOIN,强制优化器按照自己的联接顺序来进行联接操作。不过随着MySQL数据库的不断完善,这种情况正变得越来越少。
其他联接分类
到目前为止介绍的都是基本的联接类型。除此之外还有其他几种联接分类方式。这一节将介绍SELF JOIN、NONEQUI JOIN和SEMI JOIN。
SELF JOIN
SELF JOIN是同一个表的两个实例之间的JOIN操作。前面的最小缺失值问题使用的就是SELF JON,只是没有显式地进行归类。
mysql> select min(x.a+1) from t x left outer join t y
->on x.a+1=y.a where y.a is null;
+------------+
| min(x.a+1) |
+------------+
| 3 |
+------------+
1 row in set (0.00 sec)
显然是表t自己对自己进行的JOIN操作。再次提醒,对同一个表进行联接操作必须指定表的别名。下面介绍另外一个见的使用SELF JOIN的问题—-员工—-经理问题。先根据下列语句来创建表emp。
mysql> create table emp(
-> emp_no int primary key,
-> mgr_no int,
-> emp_name varchar(30)
-> );
Query OK, 0 rows affected (0.03 sec)
mysql>
insert into emp select 1,null,"Michael";
insert into emp select 4,1,"Jim";
insert into emp select 3,1,"Tommy";
insert into emp select 2,3,"Mariah";
insert into emp select 5,3,"Beyond";
insert into emp select 6,4,"John";
insert into emp select 8,3,"Monty";
在表emp中,emp_no代表员工编号,mgr_no代表员工的上级经理的员工编号。员工—-经理问题就是得到每位员工的经理信息。这个问题的解决方案如下:
mysql> select a.emp_name as employee,b.emp_name as manager
-> from emp a
-> left join emp b
-> on a.mgr_no=b.emp_no;
+----------+---------+
| employee | manager |
+----------+---------+
| Michael | NULL |
| Mariah | Tommy |
| Tommy | Michael |
| Jim | Michael |
| Beyond | Tommy |
| John | Jim |
| Monty | Tommy |
+----------+---------+
7 rows in set (0.01 sec)
mysql>
由于Michael是最高级别的员工,所以他没有经理信息,其manager为NULL。如果再深入地分析员工—-经理问题,会发现它其实是一个层次结构问题,表emp表示的员工组织图如下
对于层次结构的问题,通常可以通过SELF JOIN来解决。
NONEQUI JOIN
前面介绍的都是EQUAL JOIN(等值联接),即联接条件是基于“等于”运算的联接操作,NONEQUI JOIN的联手条件包含“等于”运算符之外的运算符。
要生成dept_manager表中所有两不同经理的组合,这里先假设当前表中仅包含员工号A、B、C,执行CROSS JOIN后生成九对:(A,A),(A,B),(A,C),(B,A),(B,B),(B,C),(C,A),(C,B),(C,C)。
显然,(A,A),(B,B),(C,C)包含相同的员工号,不是有效的员工组合。而(A,B),(B,A)又表示同样的组合。要解决这个问题,可以指定一个左边值小于右边值的联接条件,这样可以移除上述两种情况。该问题的解决方案为:
mysql> select a.emp_no,b.emp_no from emp a inner join emp b on a.emp_no<b.emp_no;
+--------+--------+
| emp_no | emp_no |
+--------+--------+
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
| 1 | 8 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 6 |
| 2 | 8 |
| 3 | 4 |
| 3 | 5 |
| 3 | 6 |
| 3 | 8 |
| 4 | 5 |
| 4 | 6 |
| 4 | 8 |
| 5 | 6 |
| 5 | 8 |
| 6 | 8 |
+--------+--------+
21 rows in set (0.01 sec)
通过1*N的CROSS JOIN产生行号。利用如下SQL语句来产生行号:
mysql> select a.emp_no,b.emp_no,count(1) as rownum from emp a inner join emp b on a.emp_no>=b.emp_no group by a.emp_no;
+--------+--------+--------+
| emp_no | emp_no | rownum |
+--------+--------+--------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
| 5 | 1 | 5 |
| 6 | 1 | 6 |
| 8 | 1 | 7 |
+--------+--------+--------+
7 rows in set (0.00 sec)
SEMI JOIN
SEMI JOIN是根据一个表中存在的想着记录找到另一个表中想要数据的联接。如果从左表返回记录,该联接被称为左半联接;如果从右表返回记录,该联接被称为右半联接。
实现SEMI JOIN的方法有多种,如内部联接,子查询、集合操作等。在使用内部联接方式时,只从一个表中选择记录,然后应用DISTINCT。下面的SQL查询返回的是来自杭州且发生过订单的客户信息。
mysql> select distinct c.customer_id,city from customers as c
-> join orders as o on c.customer_id=o.customer_id
-> where c.city='HangZhou';
+-------------+----------+
| customer_id | city |
+-------------+----------+
| 139 | Hangzhou |
| tx | Hangzhou |
+-------------+----------+
2 rows in set (0.00 sec)
mysql> desc select distinct c.customer_id,city from customers as c
-> join orders as o on c.customer_id=o.customer_id where c.city='HangZhou';
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 7 | Using temporary |
| 1 | SIMPLE | c | ALL | PRIMARY | NULL | NULL | NULL | 4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
2 rows in set (0.01 sec)
ANTI SEMI JOIN
与SEMI JOIN相反的是ANTI SEMI JOIN,它根据一个表中不存在的记录而从另一个表中返回记录。使用OUTER JOIN并过滤外部行,可以实现ANTI SEMI JOIN。例如,下面的SQL查询返回的是来自杭州但没有订单的客户信息。
mysql> desc select c.customer_id,c.city from customers as c left outer
->join orders as o on c.customer_id=o.customer_id where
->c.city='Hangzhou' AND o.customer_id is null;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 5 | Using where |
| 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 7 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
2 rows in set (0.00 sec)
可以看到所有来自杭州的订单都被访问到。如果来自杭州的客户数量为a,每个消费者的平均订单数为b,则霜访问的订单数是a*b。然后过滤外部行。最后得到的结果如下所示。
mysql> select c.customer_id,c.city from customers as c left outer
->join orders as o on c.customer_id=o.customer_id where
->c.city='Hangzhou' AND o.customer_id is null;
+-------------+----------+
| customer_id | city |
+-------------+----------+
| 186 | Hangzhou |
+-------------+----------+
1 row in set (0.00 sec)