Sunday, 30 June 2013

Locking, catalogue tables and index rebuilds

This isn't meant to be a blog devoted to SQL Server, but since I manage a relatively complex SQL Server application I come across a lot of interesting things relating to the product and to building applications with it. This particular example came up in production, but as usual I have created a simpler version based on AdventureWorks 2012.

To give some context, the real application from which this example is derived needs to load quite a lot of data as quickly as possible every night. The strategy is to partition switch the data in, and to speed up the load into the stage table we disable all the indexes prior to inserting the rows. (We then re-enable the indexes, by rebuilding them, before partition switching - when partition switching, the source table must match the destination exactly.) Not only does this ensure that the data is loaded very quickly, but it also leaves no fragmentation.
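As a rough sketch of the pattern (the table, index and partition names here are made up for illustration, not taken from the real system):

-- disable the nonclustered indexes on the stage table before the load
ALTER INDEX IX_Stage_Amount ON dbo.FactSales_Stage DISABLE;

-- bulk load the rows into the stage table here

-- rebuild the indexes so the stage table matches the destination again
ALTER INDEX ALL ON dbo.FactSales_Stage REBUILD;

-- switch the loaded data into the target partition
ALTER TABLE dbo.FactSales_Stage SWITCH TO dbo.FactSales PARTITION 42;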

This approach was used in a number of places, so a generic stored proc was written that queries sys.indexes to determine the set of indexes to be rebuilt or disabled:

CREATE PROC DYNAMIC_REBUILD
(
    @TABLE_NAME varchar(255),
    @ACTION varchar(50)
)
AS
BEGIN
    DECLARE @cmd varchar(8000)

    -- build one ALTER INDEX statement per index on the target table
    SELECT @cmd = COALESCE(@cmd, '') +
                  ' ALTER INDEX [' + i.name +
                  '] ON ' + @TABLE_NAME + ' ' +
                  @ACTION + ';' + CHAR(13) + CHAR(10)
    FROM
        sys.indexes i 
    INNER JOIN
        sys.objects o
    ON i.object_id = o.object_id
    WHERE
        OBJECT_SCHEMA_NAME(o.object_id) + '.' + OBJECT_NAME(o.object_id) =
            @TABLE_NAME
    AND o.type_desc = 'USER_TABLE'
    -- nonclustered indexes only: disabling the clustered index would block access to the data
    AND i.index_id > 1

    exec(@cmd)
END
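Typical usage during the load would be something like this (the table name is just illustrative):

EXEC DYNAMIC_REBUILD @TABLE_NAME = 'dbo.FactSales_Stage', @ACTION = 'DISABLE';
-- load the data here
EXEC DYNAMIC_REBUILD @TABLE_NAME = 'dbo.FactSales_Stage', @ACTION = 'REBUILD';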

The issue I am going to describe in detail is contained in that proc, but it wasn't obvious to me or to the developer who had written it.

During the course of the overnight batch process, the proc was called multiple times, often in parallel (but targeting different tables, of course) because of parallel data loads. The first issue was that we experienced deadlocks nearly every day. The deadlock was on sysschobjs, which was initially surprising, although the proc was called within a transaction. That is easily resolved by adding a nolock hint, but in hindsight this should have been a big indication of the underlying problem with the proc.
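Concretely, the hinted version of the join inside the proc looks like this:

FROM
    sys.indexes i WITH (NOLOCK)
INNER JOIN
    sys.objects o WITH (NOLOCK)
ON i.object_id = o.object_id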

The more interesting issue is that we found that executions blocked each other. To see this for yourself, create the proc above on AdventureWorks (making sure you add the nolock hint against both sys.indexes and sys.objects). Then in two separate sessions execute:

BEGIN TRAN
exec DYNAMIC_REBUILD @TABLE_NAME='Sales.SalesOrderDetail', 
                     @ACTION='REBUILD'

and

BEGIN TRAN
exec DYNAMIC_REBUILD @TABLE_NAME='Sales.Customer', 
                     @ACTION='REBUILD'

You should see that the second session blocks. Let's look at what it is waiting on by executing this query:

SELECT blockee.wait_type, blockee.wait_resource, *
FROM sys.dm_exec_sessions blocker
INNER JOIN sys.dm_exec_requests blockee
ON blocker.session_id = blockee.blocking_session_id

This should return something like this:

wait_type    wait_resource
LCK_M_SCH_S  OBJECT: 7:1154103152:0 

So, the second session is waiting for a schema stability lock (Sch-S) but on which table?

SELECT OBJECT_NAME(1154103152)

returns: SalesOrderDetail

How odd - the proc that was passed Sales.Customer to rebuild is waiting for a Sch-S lock on the table whose indexes are being rebuilt by the first session. Just for completeness, let's check the locks held by the first session (SPID 51 in my case):

SELECT request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id=51
AND resource_associated_entity_id = 1154103152
 
returns:

request_mode  request_status
Sch-M         GRANT

No surprise really, since you would expect a schema modification lock to be held by an index rebuild, and Sch-S is incompatible with Sch-M, so that causes the blocking. The question is why the query attempts to acquire a Sch-S lock on an unrelated table at all. I did not know when I faced this, and reading the MSDN documentation on locks does not give any clues (at least not to me).

In order to figure out what was going on I used extended events. (There are several good online resources to help if you aren't familiar with using them, but I would particularly recommend John Huang's site and Jonathan Kehayias's blog, from which I learned a lot). My objective was to understand which locks were being acquired and released by the query in the proc, and in what order. I created a session as follows:

CREATE EVENT SESSION understand_locking
ON SERVER
ADD EVENT sqlserver.lock_acquired
(
ACTION (sqlserver.session_id)
),
ADD EVENT sqlserver.lock_released
(
ACTION (sqlserver.session_id)
)
ADD TARGET package0.asynchronous_file_target
(
SET filename = 'c:\dev\xelog.xel', metadatafile = 'c:\dev\xelog.xem'
)

This just adds the SPID to the set of standard fields that are recorded. You can see the standard fields by executing this query:

SELECT *
FROM sys.dm_xe_object_columns
WHERE object_name = 'lock_acquired'
AND column_type = 'data'
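Starting and stopping the session can be done from Object Explorer in SSMS, or with T-SQL along these lines:

ALTER EVENT SESSION understand_locking ON SERVER STATE = START;

-- run the two DYNAMIC_REBUILD calls in separate sessions here

ALTER EVENT SESSION understand_locking ON SERVER STATE = STOP;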

After starting the session (either interactively in SSMS or with the ALTER EVENT SESSION statement above) and executing the procedures, you will have a file of event data that you need to process. To do that, you have to do some XML processing. You can use SSMS to examine the results to get a feel for the structure (I could not find an xsd) - the following query is a reasonable template for how to process the data:

DECLARE @Temp table (ID int identity(1,1), EventData xml);
INSERT INTO @Temp(EventData)
SELECT  cast(event_data as xml) event_data
FROM
    sys.fn_xe_file_target_read_file ('c:\dev\xelog*', 'c:\dev\xelog.xem', null, null) a;

SELECT OBJECT_NAME(ObjectID), * FROM
(
    select ID,
    cast(EventName as varchar(20)) as EventName,
    cast(SessionID as varchar(20)) as SessionID,
    cast(resource_type as varchar(10)) as ResourceType,
    cast(object_id as varchar(16)) as ObjectID,
    cast(mode as varchar(10)) Mode,
    cast(owner_type as varchar(12)) OwnerType,
    cast(database_id as int) DBID,
    cast(resource_0 as varchar(16)) Res0,
    cast(resource_1 as varchar(16)) Res1,
    cast(resource_2 as varchar(16)) Res2,
    cast(lockspace_workspace_id as varchar(20)) lockspace_workspace_id,
    cast(lockspace_sub_id as varchar(10)) lockspace_sub_id,
    cast(lockspace_nest_id as varchar(10)) lockspace_nest_id,
    cast(transaction_id as int) TransID,
    EventTime
    FROM
    (
        SELECT
            a.ID,
            xml.value('../@name', 'varchar(128)') EventName,
            xml.value('../@timestamp', 'datetime') EventTime,
            xml.value('(../action/value)[1]', 'varchar(20)') SessionID,
            xml.value('@name','varchar(128)') [Col],
            CASE 
                WHEN isnull(rtrim(xml.value('(./text)[1]', 'varchar(128)')), '') = '' then xml.value('(.)[1]', 'varchar(128)')
                ELSE xml.value('(./text)[1]', 'varchar(128)')
            END [Value]
        FROM
            @Temp a
        CROSS APPLY
            a.EventData.nodes('//data') ed(xml)
    ) x
    PIVOT
    (
        MAX([Value]) FOR Col IN
        ( resource_type, object_id, mode, owner_type, transaction_id, database_id, 
          lockspace_workspace_id, lockspace_sub_id, lockspace_nest_id, resource_0, resource_1,      
          resource_2)
    ) pvt
) y
WHERE
    (Mode='SCH_M' OR Mode='SCH_S')

This might look a bit daunting, but all it does is extract the values from the XML and then pivot them so that you get the data in a single row per event rather than multiple rows. The result looks like this (I have removed columns to make it readable on this blog):

ObjectName                                    EventName     SessionID
CountryRegionCurrency                         lock_acquired     55
CountryRegionCurrency                         lock_released     55
FK_Product_UnitMeasure_WeightUnitMeasureCode  lock_acquired     55
Product                                       lock_acquired     55
FK_Product_UnitMeasure_WeightUnitMeasureCode  lock_released     55
Product                                       lock_released     55

From the events, it is clear that the query is acquiring and releasing Sch-S locks on a large number of tables in the database. At this point it is time to look at the query plan for the query in the proc that builds the SQL.


The highlighted section shows that there is an index scan on sysschobjs. I haven't read it in any official documentation, but I can only assume that when reading from the catalogue tables, in addition to shared locks, schema stability locks are acquired too. Reviewing the query, the following bit provokes the scan:

WHERE OBJECT_SCHEMA_NAME(o.object_id) + '.' + OBJECT_NAME(o.object_id) = @TABLE_NAME

And it can easily be rewritten as:

WHERE o.object_id = OBJECT_ID(@TABLE_NAME)
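Putting that together with the nolock hints, the statement inside the proc becomes, roughly:

SELECT @cmd = COALESCE(@cmd, '') +
              ' ALTER INDEX [' + i.name +
              '] ON ' + @TABLE_NAME + ' ' +
              @ACTION + ';' + CHAR(13) + CHAR(10)
FROM
    sys.indexes i WITH (NOLOCK)
INNER JOIN
    sys.objects o WITH (NOLOCK)
ON i.object_id = o.object_id
WHERE
    o.object_id = OBJECT_ID(@TABLE_NAME)
AND o.type_desc = 'USER_TABLE'
AND i.index_id > 1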

Reviewing the modified query plan, we now see index seeks:


If we rerun the procs in two sessions, they no longer block each other, and analysis of the extended events shows that Sch-S locks are no longer acquired on all the other tables.

In the end, this was a fairly straightforward fix - optimising a poor query. However, the thing I learned was that you need to pay attention to queries that access the catalogue tables, because they can acquire Sch-S locks and lead to contention in some circumstances.

Monday, 10 June 2013

Table variables and the optimiser

The MSDN documentation is very clear about the limitations of table variables in SQL Server, but their convenience does mean that they are quite tempting to use. I recently had to look at a performance issue which turned out to be related to their use. I thought I would share a similar example here, using the AdventureWorks2012 database.

The example is contrived but illustrates the key point. Consider a stored procedure that populates a table variable with rows identifying SalesOrderDetails. So we would have:

DECLARE @TAB AS TABLE
(
    SalesOrderId int not null,
    SalesOrderDetailId int not null,
    PRIMARY KEY CLUSTERED (SalesOrderId, SalesOrderDetailId)
)

In a real world example there would be non-trivial logic to populate that table, but in this example let's just take 3000 rows:

INSERT INTO @TAB
SELECT TOP 3000 SalesOrderId, SalesOrderDetailId
FROM Sales.SalesOrderDetail

Finally, the stored procedure returns the matching rows from the SalesOrderDetail table by joining to the table variable:

SELECT sd.* FROM @TAB t
INNER JOIN Sales.SalesOrderDetail sd
ON t.SalesOrderId = sd.SalesOrderId
AND t.SalesOrderDetailId = sd.SalesOrderDetailID

All pretty simple? Looking at the query plan we have:

At first glance this looks OK. No table scans, no index scans, just a clustered index seek. However, if you look more closely, you can see it is doing a nested loops join, and if you compare the estimated and actual row counts there is a very different picture:

As you can see, highlighted in red, the estimated number of rows in the table variable is 1, whereas it actually contains 3000 rows. Because the plan does one seek per row in the table variable - 3000 seeks at roughly three logical reads each - the I/O statistics show:

Table 'SalesOrderDetail'. Scan count 0, logical reads 9000, physical reads 0....
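To see these figures yourself, turn on I/O statistics for the session before running the batch:

SET STATISTICS IO ON;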

The problem is that, as documented, the optimiser always assumes a table variable contains a single row. In this case, it results in a suboptimal plan since a hash or merge join would be significantly more efficient. If we convert the above to use a temp table instead, the following plan is used:

This results in significantly reduced I/O:

Table 'SalesOrderDetail'. Scan count 1, logical reads 1226, physical reads 0....
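For reference, the temp table version that produces these numbers is just a mechanical change from the code above - roughly:

CREATE TABLE #TAB
(
    SalesOrderId int not null,
    SalesOrderDetailId int not null,
    PRIMARY KEY CLUSTERED (SalesOrderId, SalesOrderDetailId)
);

INSERT INTO #TAB
SELECT TOP 3000 SalesOrderId, SalesOrderDetailId
FROM Sales.SalesOrderDetail;

SELECT sd.* FROM #TAB t
INNER JOIN Sales.SalesOrderDetail sd
ON t.SalesOrderId = sd.SalesOrderId
AND t.SalesOrderDetailId = sd.SalesOrderDetailID;

DROP TABLE #TAB;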

Table variables can seem appealing, since working with temp tables is more awkward. If you are trying to write modular code, the fact that you can create table types and instantiate table variables from them lets you avoid having table definitions scattered throughout your procedures. It also makes it easier to pass data to other procedures as arguments, rather than relying on a suitably named temp table already existing. However, you need to be very careful about the performance cost. The real world example from which this is derived took about 350 seconds to run with table variables, and that dropped to about 40 seconds just by changing to temp tables.
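As a sketch of what that modular approach looks like (the type name here is made up):

CREATE TYPE dbo.SalesOrderKeyList AS TABLE
(
    SalesOrderId int not null,
    SalesOrderDetailId int not null,
    PRIMARY KEY CLUSTERED (SalesOrderId, SalesOrderDetailId)
);
GO

-- the table definition now lives in one place...
DECLARE @TAB AS dbo.SalesOrderKeyList;

-- ...and the same type can be used for a READONLY table-valued parameter
-- on any proc that needs to receive the rows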

It would be useful if you could instantiate temp tables from table types, rather than just table variables. Or, alternatively, improve table variables so that they have far fewer limitations and can be used almost interchangeably with temp tables. (Even just allowing an estimated cardinality to be specified would be a good first step.)