New Features and Upgrading Guide
CHAPTER 3. New Features in Adaptive Server Anywhere 6.0
This section describes the major enhancements that have been made to the query optimizer since the release of SQL Anywhere Version 5.
When an application sends a SQL statement to Adaptive Server Anywhere, the DBMS carries out a number of operations:
Parsing The statement is converted into a form suitable for internal use.
Rewrite optimization The optimizer may re-write the query in an equivalent form that provides more opportunities for cost-based optimization.
Cost-based optimization For many queries, there are many alternative ways of executing, which generate the same results but with huge differences in the work required. In which order should the tables be accessed? Which indexes can be used, and which should not be used?
In cost-based optimization, alternative access plans are generated and the cost (roughly, the execution time) of each one estimated. The optimizer selects the access plan that has the lowest cost.
Data access Once an access plan has been generated, it is executed. The efficiency of the access depends on how the execution is carried out.
For example, if intermediate results were stored during the operation rather than being recalculated, some computation is saved, but extra memory is required to hold the intermediate results.
The optimizer is the part of Adaptive Server Anywhere that handles steps 2 and 3 in the above list.
The optimizer is commonly called a query optimizer, it optimizes many statements that are not queries. DELETE and UPDATE statements can also include references to many tables.
More accurately, the optimizer attempts to improve the efficiency of queries and search conditions (also called predicates in database literature).
Adaptive Server Anywhere includes query rewrite optimization. This feature has been introduced since the release of SQL Anywhere Version 5.
The goal of rewrite optimization is to transform a query into an equivalent query that has a greater potential for cost-based optimization to find the most efficient access plan.
The rewrite optimization carries out the following kinds of transformation:
Subquery rewriting In many cases, subqueries are rewritten as joins.
Elimination of unnecessary DISTINCT clauses There are cases when a result set is guaranteed distinct, without needing to specify a DISTINCT in the query itself. The rewrite optimizer eliminates the extra work in such cases.
The goals of these optimizations are:
Performance By uncovering more efficient access plans, the performance of many queries and other operations is improved.
Ease of use In many cases, you do not need to test different ways of phrasing queries to see if they perform differently, as the optimizer rewrites inefficient phrasings for you.
Some subqueries are equivalent to joins. The following is a simple example using the sample database.
Type the following in Interactive SQL:
SELECT * FROM sales_order o WHERE o.cust_id IN ( SELECT id FROM customer WHERE customer.state = 'MB' )
There is a straightforward but inefficient way to evaluate this query.
For each row in the sales_order table, check the customer id.
For each customer id, evaluate the subquery. This involves scanning the customer table looking for all customers from Manitoba, and listing their id values.
Compare the id values, and list those rows of sales_order for which they match.
Evaluating the subquery each time is unnecessarily expensive. The query can be rewritten in the following form:
SELECT sales_order.* FROM sales_order KEY JOIN customer WHERE customer.state = 'MB'
This query has exactly the same result set. However, it can be evaluated using a single scan of the index for the foreign key. Rewriting the subquery as a join allows the cost-based optimizer to find a more efficient way of executing the query.
The following query has an unnecessary DISTINCT qualifier:
SELECT DISTINCT * FROM product WHERE unit_price > 10
The result set contains all columns of the table. This includes the primary key, and so the result set is guaranteed to contain no duplicates even without the DISTINCT keyword.
The rewrite optimizer eliminates the unnecessary DISTINCT, eliminating the need for the server to carry out the fruitless task of searching for duplicates.
There are subqueries that cannot be rewritten as joins—for example, subqueries involving aggregate functions. In these cases, there is a trade off between evaluating the subquery each time it is referenced (CPU time) or caching (storing) the subquery result for repeated use (memory use, and possible disk access).
Adaptive Server Anywhere Version 6 carries out some caching of subqueries. (In some circles, this caching is called memoization.) The caching is limited to four database pages to avoid using excess memory.
For example, the following query computes the average quantity that is ordered at once, for each product:
SELECT prod_id, AVG(quantity) FROM sales_order_items S2 GROUP BY prod_id
Consider a related query that uses this kind of information as a subquery. The following query retrieves all order items that have a shipping quantity that is less than the average for that product:
SELECT * FROM sales_order_items S WHERE S.quantity < ( SELECT AVG(quantity) FROM sales_order_items S2 WHERE S2.prod_id = S.prod_id )
Instead of computing the subquery once for each order item, it is computed only once for each product, and the results are held in a cache for access from the outer query.
In those cases where the cost-based optimizer does not generate the best access plan, and where query performance is important, you can supply estimates of various parts of the query in order to prompt the optimizer.
For a discussion of user-supplied estimates, see How the optimizer works.
For example, in the following query, the optimizer is told that the percentage of ship_date values later than 30 June 1994 is 1%.
SELECT ship_date FROM sales_order_items WHERE ( ship_date > '1994/06/30', 1 ) ORDER BY ship_date DESC
In Adaptive Server Anywhere, fractional estimates (such as 0.25) are permitted, allowing the real distribution of data in your database to be accurately made available to the optimizer.