Massive tempdb usage implies a missing index opportunity

Tripped over a great example yesterday, where an application was recently upgraded, and ran out of temp db space for the tables, and not the log. The TLDR was that an inefficient query was sticking 500MB of data into a tempdb table to satisfy a query, and the query was called hundreds of times, with various parameters.

How it all happened: I had participated in a third party application upgrade over the weekend, so I was a bit more alert for anything related to that newly upgraded application than I otherwise would be for a long owned app stack.

I got an automated space alert about tempdb log running out of space. By the time I connected, the mount point that contains tempdb files and log was already at zero space. Normally, you would look and say “tempdb log out of space, it has gotta be a long running transaction” I look and the mdf/ndf files are 2.5x their normal size, 250GB instead of 90GB, and the log file is a measly 3GB. yes, there was an open transaction, but that was because log space ran out, and it could not write anymore. so the mdf/ndf were bloating like crazy for some reason.

I used my envwho wrapper for Adam Machanic’s sp_whoisactive to review current queries. I found a stack /dozens of queries running effectively the same query from the application, but with different parameters, and all of them had huge numbers related to CPU, tempdb allocations, reads and writes, as seen below.

Grabbing the query itself, it looks very simple, and really implies no indexes, right? I see a table, the columns in the WHERE will map as the columns in the index i need, and two columns in the INCLUDE portion.

<?query --
SELECT DataID, Name FROM DTree WHERE Status = 1 AND SubType = 482

My next step was to review the indexes on the table, I doubted there was a matching index but maybe there was one that looked the same, and i could modify it. I ran my EXECUTE sp_GetDDLa ‘DTree’ via a keyboard shortcut to get the table definition, which I found was actually a VIEW instead.

IF OBJECT_ID('[dbo].[DTree]') IS NOT NULL 
DROP  VIEW      [dbo].[DTree] 

And now the core issue jumped right into my face. a view, on a column with no selectivity, [Deleted] = 0. I know that requires a table scan, and with the other columns from the WHERE clause, that’s going to spool into tempdb. yuck. how big is the table? 7.8 M rows.

So At this point, the app is now effectively shut down, no queries going through, and I loop in the lead developer to let him see what I see via Teams. My options are ask for more space, maybe fail the instance over to the other node on it’s cluster to clear all bloated connections that have jammed up tempdb, and then fix it via this indexing opportunity. I got our engineering team to add some space, and as soon as that was added, I got a salute of ran index from the Lead developer, and created an index(my initials _LI_ in the index name so I can differentiate it form pre-existing indexes)

CREATE INDEX IX_LI_DTreeCore_Status_SubType ON [dbo].[DTreeCore] (Status,SubType) INCLUDE (DataID, NAME) WHERE (Deleted = 0)

I immediately went back to my previous query, and see the index greatly satisfied the query and performance; CPU very low, often no tempdb, reads and writes near zero.

Problem solved, and I sent a typical email notification tot he business groups about my findings and solution.

Leave a Reply