Query optimization enhancements

New Features and Upgrading Guide
CHAPTER 3. New Features in Adaptive Server Anywhere 6.0

Query optimization enhancements

This section describes the major enhancements that have been made to the query optimizer since the release of SQL Anywhere Version 5.

What is a query optimizer?

When an application sends a SQL statement to Adaptive Server Anywhere, the DBMS carries out a number of operations:

Parsing The statement is converted into a form suitable for internal use.
Rewrite optimization The optimizer may re-write the query in an equivalent form that provides more opportunities for cost-based optimization.
Cost-based optimization For many queries, there are many alternative ways of executing, which generate the same results but with huge differences in the work required. In which order should the tables be accessed? Which indexes can be used, and which should not be used?

In cost-based optimization, alternative access plans are generated and the cost (roughly, the execution time) of each one estimated. The optimizer selects the access plan that has the lowest cost.
Data access Once an access plan has been generated, it is executed. The efficiency of the access depends on how the execution is carried out.

For example, if intermediate results were stored during the operation rather than being recalculated, some computation is saved, but extra memory is required to hold the intermediate results.

The optimizer is the part of Adaptive Server Anywhere that handles steps 2 and 3 in the above list.

What statements are optimized?

The optimizer is commonly called a query optimizer, it optimizes many statements that are not queries. DELETE and UPDATE statements can also include references to many tables.

More accurately, the optimizer attempts to improve the efficiency of queries and search conditions (also called predicates in database literature).

Rewrite optimization

Adaptive Server Anywhere includes query rewrite optimization. This feature has been introduced since the release of SQL Anywhere Version 5.

The goal of rewrite optimization is to transform a query into an equivalent query that has a greater potential for cost-based optimization to find the most efficient access plan.

The rewrite optimization carries out the following kinds of transformation:

Subquery rewriting In many cases, subqueries are rewritten as joins.
Elimination of unnecessary DISTINCT clauses There are cases when a result set is guaranteed distinct, without needing to specify a DISTINCT in the query itself. The rewrite optimizer eliminates the extra work in such cases.

The goals of these optimizations are:

Performance By uncovering more efficient access plans, the performance of many queries and other operations is improved.
Ease of use In many cases, you do not need to test different ways of phrasing queries to see if they perform differently, as the optimizer rewrites inefficient phrasings for you.

Subquery rewriting

Some subqueries are equivalent to joins. The following is a simple example using the sample database.

To list all sales orders from customers in Manitoba:

Type the following in Interactive SQL:

SELECT *
FROM sales_order o
WHERE o.cust_id IN 
   (   SELECT id 
      FROM customer
      WHERE customer.state = 'MB' )

There is a straightforward but inefficient way to evaluate this query.

For each row in the sales_order table, check the customer id.
For each customer id, evaluate the subquery. This involves scanning the customer table looking for all customers from Manitoba, and listing their id values.
Compare the id values, and list those rows of sales_order for which they match.

Evaluating the subquery each time is unnecessarily expensive. The query can be rewritten in the following form:

SELECT sales_order.*
FROM sales_order KEY JOIN customer
WHERE customer.state = 'MB'

This query has exactly the same result set. However, it can be evaluated using a single scan of the index for the foreign key. Rewriting the subquery as a join allows the cost-based optimizer to find a more efficient way of executing the query.

Elimination of unnecessary DISTINCT qualifiers

The following query has an unnecessary DISTINCT qualifier:

SELECT DISTINCT *
FROM product
WHERE unit_price > 10

The result set contains all columns of the table. This includes the primary key, and so the result set is guaranteed to contain no duplicates even without the DISTINCT keyword.

The rewrite optimizer eliminates the unnecessary DISTINCT, eliminating the need for the server to carry out the fruitless task of searching for duplicates.

Subquery caching

There are subqueries that cannot be rewritten as joins—for example, subqueries involving aggregate functions. In these cases, there is a trade off between evaluating the subquery each time it is referenced (CPU time) or caching (storing) the subquery result for repeated use (memory use, and possible disk access).

Adaptive Server Anywhere Version 6 carries out some caching of subqueries. (In some circles, this caching is called memoization.) The caching is limited to four database pages to avoid using excess memory.

For example, the following query computes the average quantity that is ordered at once, for each product:

SELECT prod_id, AVG(quantity)
FROM sales_order_items S2
GROUP BY prod_id

Consider a related query that uses this kind of information as a subquery. The following query retrieves all order items that have a shipping quantity that is less than the average for that product:

SELECT *
FROM sales_order_items S
WHERE S.quantity < ( SELECT AVG(quantity)
      FROM sales_order_items S2
      WHERE S2.prod_id = S.prod_id )

Instead of computing the subquery once for each order item, it is computed only once for each product, and the results are held in a cache for access from the outer query.

Fractional user-supplied estimates

In those cases where the cost-based optimizer does not generate the best access plan, and where query performance is important, you can supply estimates of various parts of the query in order to prompt the optimizer.

For Info For a discussion of user-supplied estimates, see How the optimizer works.

For Info For example, in the following query, the optimizer is told that the percentage of ship_date values later than 30 June 1994 is 1%.

SELECT  ship_date
FROM  sales_order_items
WHERE ( ship_date > '1994/06/30', 1 )
ORDER BY ship_date DESC

For Info In Adaptive Server Anywhere, fractional estimates (such as 0.25) are permitted, allowing the real distribution of data in your database to be accurately made available to the optimizer.