User's Guide
PART 1. Working with Databases
CHAPTER 7. Using Subqueries
Understanding which queries are valid and which ones aren't can be complicated when a query contains a subquery. Similarly, figuring out what a multi-level query does can also be very involved, and it helps to understand how SQL processes subqueries. For general information about how queries are processed, see .
In a simple query, SQL evaluates and processes the query's WHERE clause once for each row of the query. Sometimes, though, the subquery returns only one result, making it unnecessary for SQL to evaluate it more than once for the entire result set.
Consider this query:
SELECT name, description FROM product WHERE quantity < 2 * ( SELECT avg (quantity) FROM sales_order_items)
In this example, the subquery calculates exactly one value: the average quantity from the sales_order_items table. In evaluating the query, SQL computes this value once, and compares each value in the quantity field of the product table to it to determine whether the corresponding row is selected.
When a subquery contains an outer reference, this shortcut can no longer be used. For instance, the subquery in the query
SELECT name, description FROM product WHERE quantity < 2 * ( SELECT avg (quantity) FROM sales_order_items WHERE product.id=sales_order_items.prod_id)
returns a value that is dependent upon the active row in the product table. Such subqueries are called correlated subqueries. In these cases, there the subquery might return a different value for each row of the outer query, making it necessary for SQL to perform more than one evaluation.
In general, a query that uses joins is executed faster than is a multi-level query. For this reason, whenever possible, Adaptive Server Anywhere's query optimizer will convert a multi-level query to a query that uses joins. The conversion is carried out without any user action. This section describes which subqueries can be converted to joins so that you can understand the performace of queries in your database.
The question "When did Mrs. Clarke and Suresh place their orders, and by which sales representatives?" can be written as a two-level query:
SELECT order_date, sales_rep FROM sales_order WHERE cust_id IN ( SELECT id FROM customer WHERE lname = 'Clarke' OR fname = 'Suresh')
An alternate, and equally correct way to write the query uses joins:
SELECT fname, lname, order_date, sales_rep FROM sales_order, customer WHERE cust_id=customer.id AND (lname = 'Clarke' OR fname = 'Suresh')
The critieria that must be satisfied in order for a multi-level query to be able to be rewritten with joins differ for the various types of operators. Recall that when a subquery appears in the query's WHERE clause, it is of the form
SELECT select-list
FROM table
WHERE [NOT] expression comparison-operator (subquery) |
[NOT] expression comparison-operator ANY / SOME (subquery) |
[NOT] expression comparison-operator ALL (subquery) |
[NOT] expression IN (subquery) |
[NOT] EXISTS (subquery)
GROUP BY group-by-expression
HAVING search-condition
Whether a subquery can be converted to a join depends on a number of factors, such as the type of operator and the structure of the query.
A subquery that follows a comparison operator (=, <>, <, <=, >, >=) must satisfy certain conditions if it is to be converted into a join. Subqueries that follow comparison operators in general are valid only if they return exactly one value for each row of the main query. In addition to this criterion, a subquery is converted to a join only if
It does not contain a GROUP BY clause
It does not contain the keyword DISTINCT
It is not a UNION query
It is not an aggregate query
Suppose the request "When were Suresh's products ordered, and by which sales representative?" were phrased as the subquery
SELECT order_date, sales_rep FROM sales_order WHERE cust_id = ( SELECT id FROM customer WHERE fname = 'Suresh')
This query satisfies the criteria, and therefore, it would be converted to a query that uses a join:
SELECT order_date, sales_rep FROM sales_order WHERE cust_id=customer.id AND (lname = 'Clarke' OR fname = 'Suresh')
However, the request, "Find the products whose in-stock quantities are less than double the average ordered quantity" cannot be converted to a join, as the subquery contains the aggregate function AVG:
SELECT name, description FROM product WHERE quantity < 2 * ( SELECT avg (quantity) FROM sales_order_items)
A subquery that follows one of the keywords ALL, ANY and SOME is converted into a join only if it satisfies certain criteria.
The subquery does not contain a GROUP BY clause
The subquery does not contain the keyword DISTINCT
The subquery is not a UNION query
The subquery is not an aggregate query
If the subquery follows the keywords ANY or SOME, it must not be negated; if it follows the keyword ALL, it must be negated.
The first four of these conditions are relatively straightforward.
The request "When did Mrs. Clarke and Suresh place their orders, and by which sales representatives?" can be handled in subquery form:
SELECT order_date, sales_rep FROM sales_order WHERE cust_id = ANY ( SELECT id FROM customer WHERE lname = 'Clarke' OR fname = 'Suresh')
Alternately, it can be phrased in join form
SELECT fname, lname, order_date, sales_rep FROM sales_order, customer WHERE cust_id=customer.id AND (lname = 'Clarke' OR fname = 'Suresh')
However, request, "When did Mrs. Clarke, Suresh, and any employee who is also a customer, place their orders?" would be phrased as a union query, and thus cannot be converted to a join:
SELECT order_date, sales_rep FROM sales_order WHERE cust_id = ANY ( SELECT id FROM customer WHERE lname = 'Clarke' OR fname = 'Suresh' UNION SELECT id FROM employee)
Similarly, the request "Find the customer and product ID's of those products that were not ordered after all orders of product #2001 were shipped," is naturally expressed with a subquery
SELECT id, cust_id FROM sales_order WHERE NOT order_date > ALL ( SELECT ship_date FROM sales_order_items WHERE id=2001)
It would be converted to the join:
SELECT sales_order.id, cust_id FROM sales_order, sales_order_items WHERE (sales_order_items.id=2001) and (order_date <= ship_date)
However, the request "Find the customer and product ID's of those products that were not shipped after the first shipping dates of all the products" would be phrased as the aggregate query
SELECT id, cust_id FROM sales_order WHERE NOT order_date > ALL ( SELECT first (ship_date) FROM sales_order_items )
Therefore, it would not be converted to a join.
The fifth criterion is a little more puzzling: queries of the form
SELECT select-list
FROM table
WHERE [NOT] expression comparison-operator ALL (subquery)
are converted to joins, as are queries of the form
SELECT select-list
FROM table
WHERE expression comparison-operator ANY (subquery)
but the queries
SELECT select-list
FROM table
WHERE expression comparison-operator ALL (subquery)
and
SELECT select-list
FROM table
WHERE NOT expression comparison-operator ANY (subquery)
are not.
This is because the first two queries are in fact equivalent, as are the last two. Recall that the any operator analogous to the or operator, but with a variable number of arguments; and that the ALL operator is similarly analogous to the AND operator. Just as the expression
NOT ((> A) AND (> B))
is equivalent to the expression
(<= A) OR (<= B)
the expression
NOT order_date > ALL ( SELECT first (ship_date) FROM sales_order_items )
is equivalent to the expression
order_date <= ANY ( SELECT first (ship_date) FROM sales_order_items )
In general, the expression
NOT column-name operator ANY (subquery)
is equivalent to the expression
column-name inverse-operator ALL (subquery)
and the expression
NOT column-name operator ALL (subquery)
is equivalent to the expression
column-name inverse-operator ANY (subquery)
where inverse-operator is obtained by negating operator, as shown in the table:
Operator |
inverse-operator |
---|---|
= |
<> |
< |
=> |
> |
=< |
=< |
> |
=> |
< |
<> |
= |
A query containing a subquery that follows the keyword IN is converted into a join only if:
The subquery does not contain a GROUP BY clause
The subquery does not contain the keyword DISTINCT
The subquery is not a UNION query
The subquery is not an aggregate query
The subquery must not be negated
So, the request "Find the names of the employees who are also department heads", which is expressed by the query
SELECT emp_fname, emp_lname FROM employee WHERE emp_id IN ( SELECT dept_head_id FROM department WHERE (dept_name='Finance' or dept_name = 'Shipping'))
would be converted to a joined query, as it satisfies the conditions; however, the request, "Find the names of the employees who are also either department heads or customers" would not be converted to a join if it were expressed by the UNION query
SELECT emp_fname, emp_lname FROM employee WHERE emp_id IN ( SELECT dept_head_id FROM department WHERE (dept_name='Finance' or dept_name = 'Shipping') UNION SELECT cust_id FROM sales_order)
Similarly, the request "Find the names of employees who are not department heads" is formualted as the negated subquery
SELECT emp_fname, emp_lname FROM employee WHERE NOT emp_id IN ( SELECT dept_head_id FROM department WHERE (dept_name='Finance' OR dept_name = 'Shipping'))
and would not be converted.
The conditions that must be fulfilled in order for a subquery that follows the must be fulfilled in order for a subquery that follows the must be fulfilled in order for a subquery that follows the IN keyword to be converted to a join are identical to those that must be fulfilled in order for a subquery that follows the ANY keyword to be converted. This is not a coincidence, and the reason for this is that the expression
WHERE column-name IN (subquery)
is logically equivalent to the expression
WHERE column-name = ANY (subquery)
So the query
SELECT emp_fname, emp_lname FROM employee WHERE emp_id IN ( SELECT dept_head_id FROM department WHERE (dept_name='Finance' or dept_name = 'Shipping'))
is equivalent to the query
SELECT emp_fname, emp_lname FROM employee WHERE emp_id = ANY ( SELECT dept_head_id FROM department WHERE (dept_name='Finance' or dept_name = 'Shipping'))
Conceptually, SQL converts a query with the IN operator to one with an ANY operator, and decides accordingly whether to convert the subquery to a join.
A subquery that follows the keyword EXISTS is converted to a join only if it satisfies the following two conditions:
The subquery is not negated
The subquery is correlated; that is, it contains an outer reference.
Therefore, the request, "Which customers placed orders after July 13, 1994?", which can be formulated by this query whose non-negated subquery subquery contains the outer reference customer.id = sales_order.cust_id, could be converted to a join.
SELECT fname, lname FROM customer WHERE EXISTS ( SELECT * FROM sales_order WHERE (order_date > '1994-07-13') AND (customer.id = sales_order.cust_id))
The EXISTS keyword essentially tells SQL to check for NULLs. When inner joins are used, SQL automatically only displays the rows in which there is data from all of the tables in the FROM clause; that is, SQL only returns the rows that do not contain NULLs. So this query returns the same rows as does the one with the subquery:
SELECT fname, lname FROM customer, sales_order WHERE (sales_order.order_date > '1994-07-13') AND (customer.id = sales_order.cust_id).