Underlying assumptions

User's Guide
PART 4. Database Administration and Advanced Use
CHAPTER 25. Query Optimization

Underlying assumptions

A number of assumptions underlie the design direction and philosophy of the Adaptive Server Anywhere query optimizer. You can improve the quality or performance of your own applications through an understanding of the optimizer's decisions. These assumptions provide a context in which you may understand the information contained in the remaining sections.

Assumptions

The list below summarizes the assumptions upon which the Adaptive Server Enterprise optimizer is based.

Assumption

Implications

Minimal administration work

Self-tuning design that requires fewer performance controls.

No separate statistics-gathering utility

Applications tend to retrieve only the first few rows of a cursor

Indices are used whenever possible

Use of temporary tables is discouraged

Selectivity statistics necessary for optimization are available in the Column Statistics Registry

Optimization decisions are based on prior query execution.

Dropping optimizer statistics makes the optimizer ineffective.

An index can be found to satisfy a join predicate in virtually all cases

Performance is poor if a suitable index cannot be found.

Virtual memory is a scarce resource

Intermediate results are not materialized unless absolutely necessary.

Assumption	Implications
Minimal administration work	Self-tuning design that requires fewer performance controls. No separate statistics-gathering utility
Applications tend to retrieve only the first few rows of a cursor	Indices are used whenever possible Use of temporary tables is discouraged
Selectivity statistics necessary for optimization are available in the Column Statistics Registry	Optimization decisions are based on prior query execution. Dropping optimizer statistics makes the optimizer ineffective.
An index can be found to satisfy a join predicate in virtually all cases	Performance is poor if a suitable index cannot be found.
Virtual memory is a scarce resource	Intermediate results are not materialized unless absolutely necessary.

Minimal administration work

Traditionally, high-performance database engines have relied heavily on the presence of a knowledgeable, dedicated, database administrator. This person spent a great deal of time adjusting data storage and performance controls of all kinds to achieve good database performance. These controls often required continuing adjustment as the data in the database changed.

Anywhere learns and adjusts as the database grows and changes. Each query betters its knowledge of the data distribution in the database. Anywhere automatically stores and uses this information to optimize future queries.

Every query both contributes to this internal knowledge and benefits from it. Every user can benefit from knowledge that Anywhere has gained through executing another user's queries.

Statistics gathering mechanisms are thus an integral part of the database server. Because of this design, no external mechanism is required. Should you find an occasion where it would help, you can provide the database server with estimates of data distributions to use during optimization. If you encode these into a trigger or procedure, for example, you then assume responsibility for maintaining these estimates and updating them whenever appropriate.

Only first few rows of a cursor used frequently

Many application programs examine only the first few rows of a cursor, particularly when the cursor is ordered. Select the ordering carefully for best results.

To accommodate this observation, the optimizer avoids materializing cursors whenever possible. Since few rows of the cursor are likely to be fetched, this strategy allows Adaptive Server Anywhere to reduce the time required to pass the first row of the result to the application.

Statistics are present and correct

The optimizer is self-tuning. All the needed information is stored internally. The column statistics registry is a persistent repository of data distributions and predicate selectivity estimates. At the completion of each query, Adaptive Server Anywhere uses statistics gathered during query execution to update this registry. In consequence, all subsequent queries gain access to more accurate estimates.

The optimizer relies heavily on these statistics and, because it does so, the quality of the access plans it generates depends heavily on them. If you have recently reloaded your database or inserted a lot of new rows, these statistics may no longer accurately describe the data. You may find that your first subsequent queries execute unusually slowly.

You can assist Anywhere in its efforts to correct its statistical information by executing sample queries. As Anywhere executes these statements, it will learn from its experience. Correct statistical information can dramatically improve the efficiency of subsequent queries.

An index can usually be found to satisfy a predicate

Often, Anywhere can evaluate predicates with the aid of an index. By using an index, the optimizer can speed access to data and reduce the amount of information read. Whenever possible, Anywhere uses indices to satisfy ORDER BY, GROUP BY, and DISTINCT clauses.

When the optimizer cannot find a suitable index, it must resort to a table scan, which can be expensive. An index can improve performance dramatically when joining tables. You should add indices to tables or rewrite queries wherever doing so will facilitate the efficient processing of common requests.

Virtual Memory is a scarce resource

The operating system and a number of applications frequently vie for the memory of a typical computer. Adaptive Server Anywhere treats memory as a scarce resource. Because it uses memory economically, Anywhere can run on relatively small computers. This economy is important if you wish your database to operate on portable computers or on older machines.

Reserving extra memory, for example to hold the contents of a cursor, may be expensive. If the buffer cache is full, one or more pages may have to be written to disk to make room for new pages. Some pages may need to be re-read to complete a subsequent operation.

In recognition of this situation, Adaptive Server Anywhere associates a higher cost with execution plans that require additional buffer cache overhead. This cost discourages the optimizer from choosing plans that use temporary tables.

On the other hand, it is careful to use memory where it will improve performance. For example, caches the results of subqueries when they will be needed repeatedly during the processing of the query.

Rewriting sub-queries as exists predicates

The assumptions which underlie the design of Anywhere require that it conserve memory and that it return the first few results of a cursor quickly as possible. In keeping with these objectives, Adaptive Server Anywhere rewrites all set-operation sub-queries, such as IN, ANY, or SOME predicates, as EXISTS predicates. By doing so, Anywhere avoids creating unnecessary temporary tables and may more easily identify a suitable index through which to access a table.

Non-correlated sub-queries are sub-queries that contain no explicit reference to the table or tables contained in the rest higher-level portions of the tables.

The following is an ordinary query that contains a non-correlated subquery. It selects information about all the customers who did not place an order on January 1, 1998.

Non-correlated subquery

SELECT *
FROM customer c
WHERE c.id NOT IN
   (   SELECT o.cust_id
      FROM sales_order o
      WHERE o.order_date = '1998-01-01' )

PLAN> c (seq): o (ky_so_customer)

One possible access plan is to first read the sales_order table and create a temporary table of all the customers who placed orders on January 1, 1998, then, read the customer table and extract one row for each customer listed in the temporary table.

However, Adaptive Server Anywhere avoids materializing results. It also gives preference to plans that return the first few rows of a result most quickly. Thus, the optimizer rewrites such queries using EXISTS predicates. In this form, the subquery becomes correlated: the subquery now contains an explicit reference to the id column of the customer table.

Correlated subquery

SELECT *
FROM customer c
WHERE NOT EXISTS
   (   SELECT *
      FROM sales_order o
      WHERE o.order_date = '1993-01-01'
         AND (    o.cust_id = c.id
            OR   o.cust_id IS NULL
            OR   c.id IS NULL ) )

PLAN> c (seq): o (seq)

This query is semantically equivalent to the one above, but when expressed in this new syntax two advantages become apparent.

The optimizer can choose to use either the index on the cust_id attribute or the order_date attribute of the sales_order table. (However, in the sample database, only the id and cust_id columns are indexed.)
The optimizer has the option of choosing to evaluate the subquery without materializing intermediate results.

Anywhere can cache the results of this subquery during processing. This strategy lets Anywhere reuse previously computed results. In the case of query above, caching will not help because customer identification numbers are unique in the customer table.

For Info Further information on subquery caching is located in Subquery caching.