Tuesday, October 21, 2014

What On Earth Is Consuming My Transaction Log - I've Got That Bloating Feeling

One more than one occasion, I've gotten the call that the log drive on one of my database servers has suddenly filled to capacity.  Some things may continue to work, but the situation gets worse and as time goes on, the possibility of a SQL Server crash increases.

Usually, a quick check of files in the log folder (sorted by size) reveals that the log growth is attributed to one database (in my experience, TempDB seems to be the most common database for this to happen in, but I have also seen other databases pop up with this issue). 

Massive log growth typically results from one rogue transaction that either has a really nasty execution plan, or is just incredibly inefficient.

Once you've identified which database is the problem child, check the free space on the log file.  I usually use this script:
SELECT DB_NAME() AS DBName, name AS LogicalFileName, filename AS PhysicalFileName,
        CONVERT(DECIMAL(12, 2), ROUND(size / 128.000, 2)) AS FileSizeMB,
        CONVERT(DECIMAL(12, 2), ROUND(FILEPROPERTY(name, 'SpaceUsed') / 128.000, 2)) AS DataSizeMB,
        CONVERT(DECIMAL(12, 2), ROUND((size - FILEPROPERTY(name,
'SpaceUsed')) / 128.000, 2)) AS FreeSpaceMB
FROM    dbo.sysfiles;


If there is plenty of free space in the log file, then then rogue transaction completed.  If there is no (or precious little) free space, you'll want to run the following command in the context of that database to find the open transactions in the database:
DBCC OPENTRAN

This will give you the SPID(s) for the open transactions in that database.  A quick sp_WhoIsActive SPID# (or a combination of EXEC sp_who2 SPID# and DBCC INPUTBUFFER (SPID#) if you have not yet implemented sp_WhoIsActive) will reveal what query is running and who is running it.

You can find Adam Machanic's awesome sp_WhoIsActive here (personally, I recommend v11.11): http://sqlblog.com/files/default.aspx

At this point, chances are you just want to kill the offending SPID(s) using a simple KILL SPID# command.  However, if it is a long running UPDATE, INSERT, or DELETE, you may be in for quite a wait while a rollback is performed and it may be better to try to free up or allocate additional space to allow the transaction to complete.

Once the transaction is killed (or completes), you should see the free space in the log file increase dramatically.  At this point you can shrink the log file back down to a reasonable, appropriate size (emphasis here on appropriate size, you do not want to shrink it to 0 MB as it will need to grow again to the normal working size).

SQL Server should recover at this point and you're on your way to fight another fire.  However, before you chalk it up as complete, you should follow up on the cause of the rogue query.  If it was a user query, check with the user to see what they were trying to accomplish.  If it was an automated process that has worked fine in the past, check to see if the statistics are up-to-date.  It's amazing how bad of an execution plan SQL Server can pick/generate when the statistics are no good.

By following up on the issue, you'll become more than "just a DBA"; you're well on your way to becoming a Rockstar DBA!  Many thanks to Thomas LaRock for publishing an awesome book (and making it a free download)!