Chapter 3 A High Performance DBMS Minimizes Contention

Overview

With Sybase, database performance is a combination of high scalability, large throughput, and fast response time. Sybase defines these factors as follows:

Scalability means that throughput increases linearly with increased resources.
Throughput refers to the volume of transactions the RDBMS can process in a fixed time period; the higher the volume, the more work your system can perform.
Response time is the time it takes for the RDBMS to complete a task to process a query and return results to the user.

Scalability is the ability to maintain consistent performance while supporting an increasing number of users and applications. Scalability is more than the ability to run on a variety of hardware, from low-end to high-end systems. It also means taking full advantage of that hardware to deliver very high levels of throughput.

The key to building a fully scalable RDBMS is finding and removing "hot spots," or contention points. Contention points are areas in the system that require concurrent access by multiple users or multiple database servers. Most contention points are found in the areas of cache management, logging, and locking, which can cause transactions to slow each other down. To find and eliminate these hot spots from a Sybase environment, the SQL Server 11 performance team built a very sophisticated performance monitor to simulate a wide variety of user workloads and thoroughly evaluate every major component of SQL Server. The result is an optimized RDBMS that fully exploits CPU, memory, and I/O resources to provide real-world application scalability across a very wide range of hardware configurations.

This chapter describes how SQL Server 11 minimizes contention in the areas of caching, logging, and locking to achieve scalable high performance. It discusses the following topics:

DBMS Components

The DBMS contains three components that handle all the database-specific tasks: the DBMS engine, the resource managers, and the access manager.

The DBMS engine takes care of parsing, optimizing compiling, and executing Transact-SQL statements and returning results.

Resource managers are a set of functions that manage global system resources such as pages, buffers, and data structures. The DBMS uses these functions to manipulate different system resources. Resource managers and other specialized tasks rely on the kernel component (see Chapter 4, "A Multiprocessing Kernel") to perform low-level functions. The resource managers handle:

Query processing
Caching
Indexing
Transaction processing
Locking

Access methods manage data on disk and manipulate database objects such as tables and rows.

Cache Management

SQL Server 11 data cache management supports an evolving database environment by letting you customize data caches to maintain high performance. Data cache performance also benefits from System 11's "housekeeper" task, which takes advantage of lulls in activity to perform disk writes.

SQL Server has a unique procedure cache technology as well. After a stored procedure is called the first time, the precompiled procedure is held in the procedure cache so it is ready to run the next time it is called. This greatly reduces compile time for frequently run stored procedures.

This section describes SQL Server's data and procedure cache technology.

The Data Cache and Named Caches

The data cache contains pages from recently accessed objects such as tables, log pages, and indexes. The Logical Memory Manager (LMM) (also referred to as the Buffer Cache Manager in some documents) lets you "carve up" the global data cache into independent caches that you name. You can then bind database objects or entire databases to your named caches.

These user-defined caches tailor SQL Server's use of the global data cache to meet the system's current work-load requirements. Figure 3-1 illustrates how you might divide the global cache in a hypothetical production environment. If you have a small, frequently accessed Customer_List table, you might want to devote to it a buffer cache that contains enough buffers to keep the entire table in memory. Because no other table shares that cache, the table's pages remain in memory for quick access. If, on the other hand, you have a large Payment_History table that is only used for low priority batch processing, you can allocate a small cache and assign the Payment_History table to it. Because queries on this table only use buffers from the small cache, only old Payment_History pages are read. These reads do not cause valuable pages (such as those in the frequently accessed Customer_List table) to be removed from cache.

Figure 3-1: SQL Server 11 cache architecture

The system procedures that allow you to do this type of configuration are sp_cacheconfig, sp_poolconfig, sp_bindcache, and sp_unbindcache. These system procedures let you reconfigure named caches "on the fly" to match changing workload requirements even within the same day's operations.

For comparison, Figure 3-2 shows a standard DBMS cache architecture in which all objects occupy a single global cache. Queries must compete with each other for a single buffer manager semaphore.

Figure 3-2: Standard DBMS cache architecture

The graph in Figure 3-3 illustrates that using multiple caches reduces contention on the buffer manager semaphore, improving scalability. The logical memory manager allows the system to optimally exploit the available memory resources. This is particularly beneficial in tuning small memory systems; it also helps to get the most out of large memory configurations.

Figure 3-3: The Logical Memory Manager reduces contention

Named Caches and Performance

Adding named data caches can improve performance in the following ways:

When changes are made to a cache by any user process, a spinlock denies all other processes access to the cache. Although spinlocks are very brief, they can slow performance in multiprocessor systems with high transaction rates. When you configure multiple caches, each is controlled by a separate spinlock, increasing concurrency on multiprocessor systems.
You can configure caches so they are large enough to hold critical tables and indexes. This keeps other server activity from contending for cache space and speeds up queries that use these tables because the hit rate is 100 percent; that is, the needed pages are always found in cache.
You can bind a "hot" table¯a table in high demand by user applications¯to one cache and indexes on the table to other caches to increase concurrency.
You can create a cache large enough to hold the "hot pages" of a table where a high percentage of the queries reference only a portion of the table. For example, if a table contains data for a year, but 75 percent of the queries reference data from the most recent month (about 8 percent of the table), configuring a cache of about 10 percent of the table size provides room to keep the most frequently used pages in cache and leaves some space for less frequently used pages as well.
You can assign tables or databases used in decision support systems (DSS) to specific caches with large I/O configured. This keeps DSS applications from contending for cache space with online transaction processing (OLTP) applications. DSS applications typically access large numbers of sequential pages, and OLTP applications typically access relatively few random pages.
You can bind the tempdb database to its own cache. Because all processes that create worktables or temporary tables use tempdb, binding it to its own cache means that other processes don't have to compete for cache access because tempdb is heavily accessed. Proper sizing of tempdb's cache can keep most tempdb activity in memory for many applications. If this cache is large enough, you can even avoid I/O for tempdb.
You can bind a database's log to a cache, again reducing contention for cache space and access to the cache.

Most of these uses for named data caches have the greatest impact on multiprocessor systems with high transaction rates or frequent DSS queries and multiple users. Some of these uses will increase performance on single CPU systems if they lead to improved utilization of memory and reduced I/O.

Cache Optimization Strategies

System 11 provides commands that give you control over how SQL Server applies optimization strategies to objects, sessions, and queries. Here are some options that support named cache capabilities and provide additional control when tuning queries:

The set forceplan on option forces SQL Server to join the tables in a query in the order in which they are listed in the from clause.
The set table count option allows you to specify the number of tables that are considered at once as joins are optimized.
The system procedure sp_cachestrategy disables and re-enables cache strategies and large I/O for individual objects.
Clauses for select, delete, and insert statements allow you to specify the index, cache strategy, or I/O size for a query.
The set prefetch option allows you to enable or disable large I/O for a session.

The Off-peak Disk Writer

When SQL Server has no user tasks to process, a housekeeper task automatically begins writing dirty buffers from cache to disk. Because these writes are done during the server's idle cycles, they are known as free writes. The housekeeper provides huge benefits for "smoothing" system response in transaction-intensive environments. Other benefits are:

Improved CPU utilization
Decreased need for buffer washing during transaction processing
Faster checkpoints
Shorter recovery time

SQL Server has long been known to have a very efficient checkpoint capability with recovery measured in seconds rather than in minutes or even hours, as in some other systems. SQL Server 11 significantly extends Sybase's advantage in this area. According to users of Oracle and Informix, checkpointing I/O load and recovery times remain significant constraints for their OLTP systems.

Data Cache Management Summary

No other vendor offers the same level of workload adaptability or sophisticated memory management. Oracle has added "direct reads and writes" so that DSS queries bypass the buffer cache, but there is no way to explicitly manage or configure this capability. Further, direct writes can only be used in special situations where the transaction does not have to be recovered in case of system failure. Oracle also has no ability to perform sequential prefetch operations for large tables or datatypes (there's more on the prefetch capability in Chapter 4, "A Multiprocessing Kernel"). Informix and Microsoft still buffer all memory activity into a single global cache, incurring constant mixed workload conflict and reducing system throughput. SQL Server 11 lets customers configure the RDBMS to meet their mixed workload requirements for high performance at the lowest possible resource cost on all standard SMP systems.

Procedure Cache

The procedure cache is used for query plans, stored procedures, and triggers. SQL Server maintains a most recently used/least recently used (MRU/LRU) chain of stored procedure query plans. As users execute stored procedures, SQL Server looks in the procedure cache for a query plan to use. If a query plan is available, it is placed on the MRU end of the chain and execution begins. If no plan is in memory, or if all copies are in use, the query tree for the procedure is read from a system table. It is then optimized, using the parameters provided to the procedure, and put on the MRU end of the chain, and execution begins. Plans at the LRU end of the page chain that are not in use are aged out of the cache.

Figure 3-4: The procedure cache

The memory allocated for the procedure cache holds the optimized query plans for all batches, including any triggers. See ``Stored Procedures'' for more detailed information about how the procedure cache works.

Transaction Management and Logging

A DBMS must have a mechanism for keeping track of all transactions. The SQL Server RDBMS does this by keeping a transaction log for each Sybase database. The transaction log records changes that any user makes to that database.

Figure 3-5 shows the data cache during a transaction. The data cache holds the data pages that the transaction is using and all other log pages and data pages that currently reside in the data cache.

The transaction log is a write-ahead log. When a user issues a SQL statement that modifies any data in the database, SQL Server automatically writes the change to the log before it writes the change to the actual data page in cache. This strategy ensures that SQL Server can reverse all changes made by a transaction if any part of the transaction fails.

Figure 3-5: Writing data changes

SQL Server logs transactions on a per-database basis rather than on a per-user basis as some database managers do. Because all database users share the same transaction log, there are fewer writes to disk, which is an automatic performance gain. In fact, in some situations the amount of I/O per user actually decreases as the number of users increases.

Changing the log I/O size using sp_logiosize can improve performance by reducing the number of times that SQL Server writes transaction log pages to disk. By default, SQL Server defines the log I/O size of a database to be 4K. If the transaction log for a database is bound to the default cache or a user-defined cache that does not contain a 4K memory pool, SQL Server sets the log I/O size to 2K (a 2K memory pool is always present in any cache).

Minimizing Log Contention with the User Log Cache

Most relational databases write log records for each update, insert, or delete statement and processes each log entry one at a time. When a log record is written, the logging system must navigate through a synchronization point, the log semaphore, which controls concurrent access to the log by multiple database transactions.

Because every transaction involves the logging system, its efficiency is the most important factor for high transaction throughput. An analytical study of the logging subsystem revealed that the real bottleneck for transaction throughput was not the log file on disk, but the log semaphore in memory. As database engines are added and the transaction volume increases, the contention on the log semaphore dramatically increases, resulting in reduced scalability.

System 11's logging mechanism boosts overall performance by enabling user log caches (ULCs) (also referred to as the private log caches (PLCs) in some documents) and group commits. There is one user log cache for each configured user connection. SQL Server buffers the user transaction log records with user log caches, which reduces the contention at the end of the transaction log. When a user log cache becomes full or when another event occurs (such as when the transaction completes), SQL Server "flushes" all log records from the user log cache to the database transaction log. You can configure the size of all user log caches for your server.

A user log cache improves I/O and scalability on multiprocessor systems because it is only necessary to acquire the log semaphore once for all log records. Expanded group commits decrease the system overhead by permitting the logging mechanism to log more transactions in a single I/O.

Because more than one process can access the contents of a user log cache when determining a possible "flush," SQL Server uses spinlocks to protect the user log caches. A spinlock is an internal locking mechanism that prevents a second process from accessing a data structure when another process is using it. All processes trying to access the resource must wait (or spin) until the lock is released. SQL Server uses sp_configure's configurable user log cache spinlock ratio parameter to specify the ratio of user log caches per user log cache spinlock.

Because the log records for a complete transaction are immediately transferred through the log semaphore, contention on the log semaphore becomes almost nonexistent. Figure 3-6 shows that using the ULC eliminates contention on the append_log semaphore. Contention alleviated by the ULC dramatically increases SQL Server's transaction throughput.

Figure 3-6: The ULC eliminates log contention

Lock Management

SQL Server locks the tables or databases that are currently used by active transactions to protect them from being modified by other transactions at the same time. Rather than rely on an operating system's lock manager, which can be quite inefficient, SQL Server handles its own locking by means of a memory-resident table. Obtaining a lock takes only a few microseconds and requires no I/O or message traffic.

This gives you a great deal of control over how the lock manager behaves. System 11's rich set of configuration parameters lets you fine tune lock behavior to get the best performance possible from your system as it evolves.

Page Locks and Table Locks

Table locks generate more lock contention than page locks because no other tasks can access a table while there is an exclusive table lock on it, and if a task requires an exclusive table lock, it must wait until all shared locks are released. Some systems use row-level locking. However, if any change to a row is made that alters the row's space requirements or its location, the lock must be escalated to page level. Other systems escalate to a table-level lock during the actual update. When this happens, all transactions are effectively single-threaded, which is not acceptable in an on-line environment. Many DBMS vendors that use row-level locking "turn it off" during benchmark tests, during which they use page locking to achieve better throughput results.

SQL Server uses page-level locking, and only escalates to table locks when it detects that a large portion of the table will be affected, as in the case of a bulk load or an update operation that affects many rows throughout the table. In addition, Sybase supplies extensions such as intent and demand locks to extend concurrency.

Lock Promotion

You can use system procedures to configure the number of page locks that SQL Server acquires on a table before it attempts to escalate to a table lock on a server-wide, per-database, and per-table basis. This mechanism is particularly useful for very large databases. A lock promotion threshold lets you set either the number of page locks or a percentage allowed by a statement before the lock is promoted to a table lock.

Spinlock Ratios

SQL Server manages table, page, and address locks using an internal hash table. These tables use one or more spinlocks to serialize access between processes that are running on different engines. For SQL Servers running with multiple SQL Server engines, you can set the ratio of spinlocks protecting each hash table using the table lock spinlock ratio, page lock spinlock ratio, and address lock spinlock ratio configuration parameters.

Deadlocks

A deadlock occurs when two process each hold a lock on a page that the other process needs. Each waits for the other process to release the lock so it can continue processing. You can configure the minimum amount of time that a process must wait before SQL Server checks for deadlocks. If you expect your applications to deadlock infrequently, you can configure long deadlock checking delays and reduce the overhead cost.

Freelocks

A freelock list defines the engine's portion of freelocks from the global freelock pool. The portion is definable through the max engines freelocks configuration parameter as a percentage of the total number of locks. After an engine completes a process, all locks held by that process are released and returned to that engine's freelock list. This reduces the contention for each engine accessing the global freelock list. However, if the number of locks released to the engine exceeds the maximum number of locks allowed in the engine's freelock list, SQL Server moves locks to the global freelock list. This replenishes the number of locks that are available to other engines.

For most SQL Server configurations, you can increase the percentage of locks assigned to the engine freelock lists and improve application performance.

Rows Per Page, Contention, and Concurrency

You can reduce lock contention and improve concurrency for frequently accessed tables by restricting the maximum number of rows stored on a data page, clustered index leaf page, or a nonclustered index leaf page. You can configure the maximum number of rows per page by specifying a value for max_rows_per_page in a create table, create index, or alter table statement.

max_rows_per_page applies to the data pages of an unindexed table or the leaf pages of an index and is retained and used even after the table or index is created. SQL Server uses the max_rows_per_page value whenever it adds or deletes rows.

Page Allocation

The page utilization percent parameter is used during page allocations to control whether SQL Server scans a table's OAM (object allocation map) to find unused pages, or simply allocates a new extent to the table. A low page utilization percent value results in more unused pages. A high page utilization percent value slows page allocations in large tables, as SQL Server performs an OAM scan to locate each unused page before allocating a new extent.

For example, when inserting data into a 10GB table that has 120 OAM pages and only one unused data page:

A page utilization percent of 100 tells SQL Server to scan through all 120 OAM pages to locate the unused data page.
A page utilization percent of 95 allows SQL Server to allocate a new extent to the object, because 95 is lower than the ratio of used pages to used and unused pages.

Isolation Level 0

Queries executing at isolation level 0 do not acquire any read locks for their scans, so they do not block other transactions from writing to the same data (and vice versa). Applications that are not impacted by read-through locks may see better concurrency and reduced deadlocks when accessing the same data using isolation level 0. However, transactions that require data consistency should not use isolation level 0.

You can selectively choose isolation level 0 for a query in a transaction using the at isolation clause as follows:

select *

from titles

at isolation read uncommitted

You can also set isolation level 0 at the session level.

[Top] [Prev] [Next] [Bottom]