
Monday, 10 June 2013

Table variables and the optimiser

The MSDN documentation is very clear about the limitations of table variables in SQL Server, but their convenience does mean that they are quite tempting to use. I recently had to look at a performance issue which turned out to be related to their use. I thought I would share a similar example here, using the AdventureWorks2012 database.

The example is contrived but illustrates the key point. Consider a stored procedure that populates a table variable with key pairs identifying rows in Sales.SalesOrderDetail. So we would have:

DECLARE @TAB AS TABLE
(
    SalesOrderId int not null,
    SalesOrderDetailId int not null,
    PRIMARY KEY CLUSTERED (SalesOrderId, SalesOrderDetailId)
)

In a real-world example there would be non-trivial logic to populate that table, but here let's just take 3000 rows:

INSERT INTO @TAB
SELECT TOP 3000 SalesOrderId, SalesOrderDetailId
FROM Sales.SalesOrderDetail
ORDER BY SalesOrderId, SalesOrderDetailId  -- TOP without ORDER BY is nondeterministic

Finally, the stored procedure returns the matching rows from the SalesOrderDetail table by joining to the table variable:

SELECT sd.* FROM @TAB t
INNER JOIN Sales.SalesOrderDetail sd
ON t.SalesOrderId = sd.SalesOrderId
AND t.SalesOrderDetailId = sd.SalesOrderDetailId

All pretty simple? Now look at the actual execution plan.

At first glance this looks OK: no table scans, no index scans, just a clustered index seek. Look more closely, though, and you can see it is doing a nested loops join, and comparing the estimated row counts with the actual ones gives a very different picture.

The estimated number of rows coming out of the table variable is 1, whereas it actually contains 3000 rows, so the clustered index seek on SalesOrderDetail runs 3000 times. The I/O statistics confirm it:

Table 'SalesOrderDetail'. Scan count 0, logical reads 9000, physical reads 0....
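
If you want to reproduce these figures, run the whole batch with I/O statistics enabled and the actual execution plan included (Ctrl+M in Management Studio). A minimal harness, assuming the AdventureWorks2012 database:

USE AdventureWorks2012;
SET STATISTICS IO ON;

-- Everything in one batch: a table variable is scoped to the batch.
DECLARE @TAB AS TABLE
(
    SalesOrderId int not null,
    SalesOrderDetailId int not null,
    PRIMARY KEY CLUSTERED (SalesOrderId, SalesOrderDetailId)
);

INSERT INTO @TAB
SELECT TOP 3000 SalesOrderId, SalesOrderDetailId
FROM Sales.SalesOrderDetail
ORDER BY SalesOrderId, SalesOrderDetailId;

SELECT sd.* FROM @TAB t
INNER JOIN Sales.SalesOrderDetail sd
ON t.SalesOrderId = sd.SalesOrderId
AND t.SalesOrderDetailId = sd.SalesOrderDetailId;

SET STATISTICS IO OFF;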

The problem is that, as documented, the optimiser always assumes a table variable contains a single row. Here that produces a suboptimal plan: with 3000 rows driving the join, a hash or merge join would be significantly more efficient than 3000 nested loop seeks. If we convert the above to use a temp table instead, the optimiser has real statistics to work with and chooses a different plan.
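
A minimal sketch of the temp table version (same shape as the original script; the #TAB name simply mirrors the table variable):

CREATE TABLE #TAB
(
    SalesOrderId int not null,
    SalesOrderDetailId int not null,
    PRIMARY KEY CLUSTERED (SalesOrderId, SalesOrderDetailId)
);

INSERT INTO #TAB
SELECT TOP 3000 SalesOrderId, SalesOrderDetailId
FROM Sales.SalesOrderDetail
ORDER BY SalesOrderId, SalesOrderDetailId;

-- Unlike a table variable, a temp table has statistics, so the
-- optimiser knows it holds 3000 rows rather than assuming 1.
SELECT sd.* FROM #TAB t
INNER JOIN Sales.SalesOrderDetail sd
ON t.SalesOrderId = sd.SalesOrderId
AND t.SalesOrderDetailId = sd.SalesOrderDetailId;

DROP TABLE #TAB;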

This results in significantly reduced I/O:

Table 'SalesOrderDetail'. Scan count 1, logical reads 1226, physical reads 0....

Table variables can seem appealing because working with temp tables is more awkward. If you are trying to write modular code, the fact that you can create table types and instantiate table variables from them lets you avoid having the same table definition scattered throughout your procedures. It also makes it easier to pass data to other procedures explicitly, as arguments, rather than relying on a suitably named temp table already existing. However, you need to be very careful about the performance cost. The real-world procedure from which this example is derived took about 350 seconds to run with table variables; that dropped to about 40 seconds just by switching to temp tables.
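
As an illustration of that modular pattern, here is a sketch using a user-defined table type passed as a table-valued parameter. The names dbo.SalesOrderDetailKey and dbo.GetOrderDetails are invented for the example:

CREATE TYPE dbo.SalesOrderDetailKey AS TABLE
(
    SalesOrderId int not null,
    SalesOrderDetailId int not null,
    PRIMARY KEY CLUSTERED (SalesOrderId, SalesOrderDetailId)
);
GO

-- Table-valued parameters must be declared READONLY.
CREATE PROCEDURE dbo.GetOrderDetails
    @Keys dbo.SalesOrderDetailKey READONLY
AS
    SELECT sd.* FROM @Keys t
    INNER JOIN Sales.SalesOrderDetail sd
    ON t.SalesOrderId = sd.SalesOrderId
    AND t.SalesOrderDetailId = sd.SalesOrderDetailId;
GO

-- Caller: instantiate the type, populate it, pass it as an argument.
DECLARE @Keys dbo.SalesOrderDetailKey;
INSERT INTO @Keys
SELECT TOP 3000 SalesOrderId, SalesOrderDetailId
FROM Sales.SalesOrderDetail
ORDER BY SalesOrderId, SalesOrderDetailId;

EXEC dbo.GetOrderDetails @Keys;

Bear in mind that a table-valued parameter is itself a table variable under the covers, so the same single-row cardinality estimate applies inside the procedure.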

It would be useful if you could instantiate temp tables from table types, rather than just table variables. Alternatively, table variables could be improved so that they have far fewer limitations and can be used almost interchangeably with temp tables. (Even just allowing an estimated cardinality to be specified would be a good first step.)
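
In the meantime, one mitigation is OPTION (RECOMPILE): it forces the statement to be compiled at execution time, when the table variable's actual row count is visible to the optimiser, at the cost of a recompile on every run. Applied to the join in the original batch:

-- Recompiling at execution time lets the optimiser see the actual
-- row count in @TAB (3000) instead of the default estimate of 1.
SELECT sd.* FROM @TAB t
INNER JOIN Sales.SalesOrderDetail sd
ON t.SalesOrderId = sd.SalesOrderId
AND t.SalesOrderDetailId = sd.SalesOrderDetailId
OPTION (RECOMPILE);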
