联接与集合操作一--常见联接
联接查询
联接查询是一种常见的数据库操作,即在两张表(或更多表)中进行行匹配的操作。一般称之为水平操作,这是因为对几张表进行联接操作所产生的结果集可以包含这几张表中所有的列。对应于联接的水平操作,一般集合操作视为垂直操作。
MySQL数据库支持如下的联接查询:
- CROSS JOIN (交叉联接)
- INNER JOIN (内联接)
- OUTER JOIN (外联接)
- 其他
新旧查询语法
MySQL数据库支持两种不同的联接操作语法。在实际使用中,很少有DBA或开发人员可以观察到或区分出两者的不同。
ANSI SQL 89 | ANSI SQL 92 |
---|---|
SELECT … FROM A,B WHERE A.X=B.X | SELECT … FROM A INNER JION B ON A.X=B.X |
左边为ANSI SQL 89标准,右边为ANSI SQL 92标准,ANSI SQL 89 与新语法的区别是FROM 子句中的表名之间用逗号分隔,没有JOIN关键字,也没有ON子句。其语法格式如下:
FROM T1,T2 WHERE where_condition
ANSI SQL 89 只支持CROSS JOIN和INNER JOIN ,不支持OUTER JOIN。新语法是由ANSI SQL 92 引入的,与旧语法的区别是引入了JOIN关键字和ON过滤子句,并去掉了表之间的逗号,引入了外部联接的支持,因此要严格区分ON过滤器和WHERE过滤器的作用,其语法格式如下:
FROM T1
<join_type> JOIN T2
ON on_condition
对于这两种语法其逻辑查询和物理查询也是相同的。
CRONSS JOIN
CROSS JOIN对两个表执行笛卡儿积,返回两个表中所有列的组合。若左表有m行数据,右表有n行数据,则CROSS JOIN将返回m*n行的表。
mysql> desc select a.customer_id,b.customer_id from customer as a cross join customer as b;
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | a | index | NULL | idx_fk_store_id | 1 | NULL | 599 | Using index |
| 1 | SIMPLE | b | index | NULL | idx_fk_store_id | 1 | NULL | 599 | Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+----------------------------------------------------+
2 rows in set (0.00 sec)
因为customer表共有599行记录,所以CROSS JOIN后共有599*599=358801条记录。
也可以使用ANSI SQL 89的语法来实现相同任务的查询。
mysql> desc select * from customer as a, customer as b;
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------------+
| 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 599 | NULL |
| 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 599 | Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------------+
2 rows in set (0.00 sec
对于交叉联接,ANSI SQL 89语法代码更短,语法更加易读。不必担心两者的性能,因为正如前面据说,优化器将为两者生成相同的执行计划。
row_num行号的示例
mysql> select customer_id,store_id,@a:=@a+1 as row_num from customer,(select @a:=0) t;
+-------------+----------+---------+
| customer_id | store_id | row_num |
+-------------+----------+---------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 5 | 1 | 4 |
| 7 | 1 | 5 |
| 10 | 1 | 6 |
| 12 | 1 | 7 |
| 15 | 1 | 8 |
| 17 | 1 | 9 |
| 19 | 1 | 10 |
INNER JOIN
通过INNER JOIN用户可以根据一些过滤条件来匹配表之间的数据。在逻辑查询的前三个处理阶段中,INNER JOIN应用前两个阶段,即首先产生笛卡儿积的虚拟表,再按照ON过滤条件来进行数据的匹配操作。INNER JOIN没有第三步操作,即不添加外部行,这是和OUTER JOIN 最大的区别之一。也正是因为不会添加外部行,指定过滤条件在ON子句和WHERE子句是没有任何区别的。
在表之间匹配记录的过滤器放在ON子句中,而只从一个表中过滤数据的条件放在WHERE子句中,下面语句实现的是找出地址ID为1的地址信息
select a.address,a.address_id,a.city_id
from address a
[inner] join city b
on a.city_id=b.city_id
where a.city_id=1;
mysql> select a.address,a.address_id,a.city_id from address a inner join city b on a.city_id=b.city_id where a.city_id=1;
+----------------------+------------+---------+
| address | address_id | city_id |
+----------------------+------------+---------+
| 939 Probolinggo Loop | 56 | 1 |
+----------------------+------------+---------+
1 row in set (0.00 sec)
INNER 关键字可以省略。前面说过INNER JOIN 中WHERE 的过滤条件可以写在ON子句中,因此下面的SQL查询语句得到的结果和上面是一样的。
select a.address,a.address_id,a.city_id
from address a
inner join city b
on a.city_id=b.city_id
and a.city_id=1;
mysql> select a.address,a.address_id,a.city_id from address a inner join city b on a.city_id=b.city_id and a.city_id=1;
+----------------------+------------+---------+
| address | address_id | city_id |
+----------------------+------------+---------+
| 939 Probolinggo Loop | 56 | 1 |
+----------------------+------------+---------+
1 row in set (0.00 sec)
mysql>
如果使用ANSI SQL 89语法,用户只能在WHERE条件中指定所有的过滤条件,其查询语句如下:
select a.address,a.address_id,a.city_id
from address a,city b
where a.city_id=b.city_id
and a.city_id=1;
mysql> select a.address,a.address_id,a.city_id from address a,city b where a.city_id=b.city_id and a.city_id=1;
+----------------------+------------+---------+
| address | address_id | city_id |
+----------------------+------------+---------+
| 939 Probolinggo Loop | 56 | 1 |
+----------------------+------------+---------+
1 row in set (0.00 sec)
mysql>
对于CROSS JOIN笔者更喜欢使用ANSI SQL 89语法,而对于INNER JOIN正好相反,更倾向于使用ANSI SQL 92语法。如果忘记指定联手条件,则使用ANSI SQL 89语法可能有些危险,因为可能会得到很大的笛卡积返回集。
mysql> desc select a.address,a.address_id,a.city_id from address a,city b where a.city_id=1;
+----+-------------+-------+-------+----------------+-------------------+---------+-------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------+-------------------+---------+-------+------+----------------------------------------------------+
| 1 | SIMPLE | a | ref | idx_fk_city_id | idx_fk_city_id | 2 | const | 1 | NULL |
| 1 | SIMPLE | b | index | NULL | idx_fk_country_id | 2 | NULL | 600 | Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+----------------+-------------------+---------+-------+------+----------------------------------------------------+
2 rows in set (0.00 sec)
特别注意的是,在MySQL数据库中,如果INNER JOIN后不跟ON子句,也是可以通过语法解析的,这时INNER JOIN 等于CROSS JOIN,即产生笛卡儿积,示例如下
mysql> desc select a.customer_id,b.customer_id from customer as a inner join customer as b;
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | a | index | NULL | idx_fk_store_id | 1 | NULL | 599 | Using index |
| 1 | SIMPLE | b | index | NULL | idx_fk_store_id | 1 | NULL | 599 | Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+----------------------------------------------------+
2 rows in set (0.04 sec)
mysql>
这一点和很多其他关系数据库非常不同,例如,在Microsoft SQL Server中必须指定ON子句,否则语法解析器会抛出异常。而在MySQL数据库中,CROSS JOIN其实和INNER JOIN是同义词关系,因此当没有ON子句时,SQL解析器会将INNER JOIN理解为CROSS JON。
此外,如果ON子句中的列具有相同的名称,可以使用USING子句来进行简化,得到的结果和上述两语法的语句结果是一样的:
mysql> select a.address,a.address_id,a.city_id from address a inner join city b on a.city_id=b.city_id where a.city_id=1;
+----------------------+------------+---------+
| address | address_id | city_id |
+----------------------+------------+---------+
| 939 Probolinggo Loop | 56 | 1 |
+----------------------+------------+---------+
1 row in set (0.00 sec)
mysql> select a.address,address_id,city_id from address a inner join city b using(city_id) where city_id=1;
+----------------------+------------+---------+
| address | address_id | city_id |
+----------------------+------------+---------+
| 939 Probolinggo Loop | 56 | 1 |
+----------------------+------------+---------+
1 row in set (0.00 sec)
mysql>
OUTER JOIN
通过OUTER JOIN 用户可以按照上些过滤条件来匹配表之间的数据。与INNER JOIN不同的是,在通过OUTER JOIN添加的保留表中存在未找到的匹配数据。MySQL数据库支持LEFT OUTER JOIN和RIGHT OUTER JOIN。与INNER关键字一样,可以省略OUTER关键字。
注意:目前MySQL不支持FULL OUTER JOINOUTER JOIN应用逻辑查询的前三个步骤,即产生笛卡儿积、应用ON过滤器和添加外部行。对于保留表中的行数据,如果是未找到匹配数据而被添加的记录,其值NULL进行填充。查询处理 中我们已经接触过了LEFT JOIN,即返回客户信息、订单信息,同时返回没有订单的客户,因为指定了LEFT关键字,因此表customers是保留表。查询过程如下:
mysql> select c.customer_id,o.order_id from customers as c left outer join orders as o on c.customer_id=o.customer_id;
+-------------+----------+
| customer_id | order_id |
+-------------+----------+
| 139 | 1 |
| 139 | 2 |
| guo | 3 |
| guo | 4 |
| guo | 5 |
| tx | 6 |
| baidu | NULL |
+-------------+----------+
7 rows in set (0.01 sec)
OUTER JOIN只在ANSI SQL 92 中得到支持,在其他一些数据库中可以使用(+)=、=来表示LEFT JOIN,用=(+)、=来扩展ANSI SQL 89语法使其支持OUTER JOIN。但是对MySQL数据库来说,只有一种OUTER JOIN的联接语法。
通过OUTER JOIN和IS NULL,可以返回没有用户订单的客户,例如:
mysql> select c.customer_id,o.order_id from customers as c left outer join orders as o on c.customer_id=o.customer_id where o.order_id is null;
+-------------+----------+
| customer_id | order_id |
+-------------+----------+
| baidu | NULL |
+-------------+----------+
1 row in set (0.00 sec)
需要注意的是,INNER JOIN中的过滤条件都可以写在ON子句中,而OUTER JOIN的过滤条件不可以这样处理,因为可能会得到不正确的结果,例如:
mysql> select c.customer_id,o.order_id from customers as c left outer join orders as o on c.customer_id=o.customer_id and o.order_id is null;
+-------------+----------+
| customer_id | order_id |
+-------------+----------+
| 139 | NULL |
| baidu | NULL |
| guo | NULL |
| tx | NULL |
+-------------+----------+
4 rows in set (0.00 sec
这次得到了4条记录,显然不是我们想要的结果。
对于OUTER JOIN,同样可以使用USING来简化ON子句,因此上述的语句可以改写为:
mysql> select c.customer_id,o.order_id from customers as c left outer join orders as o using(customer_id);
+-------------+----------+
| customer_id | order_id |
+-------------+----------+
| 139 | 1 |
| 139 | 2 |
| guo | 3 |
| guo | 4 |
| guo | 5 |
| tx | 6 |
| baidu | NULL |
+-------------+----------+
7 rows in set (0.00 sec)
与INNER JOIN不同的是,对于OUTER JION,必须制定ON子句,否则MySQL数据库会抛出异常,例如:
mysql> select c.customer_id,o.order_id from customers as c left outer join orders as o ;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1