Saturday, January 17, 2015

Interview With A DBA - Why Do You Do What You Do?

I recently had a student conducting a career study/project contact me with questions about my career and the choices I made to get where I am.  I'm certainly not perfect by any stretch of the imagination, but I give it my all.  I hope that my answers to his questions helped inspire him to go after his dream, whatever it may be (even if it's not to be a DBA).

Student: What is the title of your career?
Me: My career title is essentially Database Administrator.  My current job title is Tier III Database Administrator (Senior DBA), and I specialize in architecture/infrastructure engineering for Microsoft SQL Server.

Student: How many years have you had on the job?
Me: I've been working with databases in general for about 15 years, and specifically SQL Server for over 10 years.

Student: What attracted you to this career? Did you have an experience that influenced your decision to choose this field?
Me: I started out my professional career in a small call center providing phone support.  I gradually worked my way into more of an IT/Help Desk role and then up to Network Engineer.  When I started working as a Network Engineer, I happened to be the only one on the team who wasn't scared of the database servers.  As I learned more about SQL Server, I realized that I really liked working with it and wanted to learn more about it.  There is a bit of an infinite cycle there: the more you learn about a subject, the more you realize how much more there is to learn about it.  Once I realized that I was more passionate about working with data and SQL Server than anything else, I made that my primary focus.
One thing that has really solidified my decision in my career is the tremendous sense of community and even family when you work with SQL Server.  There are plenty of people who are more than happy to take some time out of their day to help you out if you have a problem, or just want to chat.  Check out the #SQLHelp and #SQLFamily hashtags on Twitter for just a couple examples of what I'm referring to here.

Student: What type of education or training is required for your job?
Me: A strong foundation in math and logic/analytical thinking comes into play when working with SQL Server.  There are multiple job roles/responsibilities/career paths in working with SQL Server (and even more if you consider other database platforms like Oracle, MySQL, Hadoop, etc...).  Programming classes (specifically T-SQL classes would be ideal) would also help you prepare for working with SQL Server.  Almost every aspect of SQL Server requires some familiarity with T-SQL code and the logic behind structuring a query and getting it to perform.  While it's pretty obvious that a SQL developer needs to have strong T-SQL coding skills, as a DBA, I also need to be very familiar with T-SQL coding so that I can tune queries that no longer perform well.
One way to get free training is to attend local events or online training.  I am the president of the Rochester PASS SQL Server User Group, and we have monthly meetings during most of the year (I skip Summer and December).
More online training can be found in the form of virtual chapters (periodic webinars/meetings held online). And finally, there is a one-day SQL Server mini-conference (appropriately named SQL Saturday) coming to Rochester on Saturday, May 16th.

Student: What specific skills are needed to do the job?
I think my previous comments speak to this well, but I will also add that there are times you need to have tenacity and focus in order to solve problems (this applies to many other careers than just a career in SQL Server).  I am a highly distractible person, and I find it hard to focus on a problem at times, but I still manage quite well.  It is just more difficult for me to go heads-down and focus solely on a specific issue.

Student: What are your major daily tasks?
Major daily tasks for me include reviewing alerts (backups, critical errors, performance issues, failed scheduled jobs, etc...) from the database servers (and other infrastructure alerts like storage and network). Typically our Junior and Mid-level DBAs handle those tasks, but I keep an eye on these as well so that I have a good understanding of the health of our servers.  I am an escalation point for major issues and incidents, and my typical role (outside of problem/incident response) is project based.  We recently migrated to a new SAN, and I was responsible for migrating all the database files from the drives associated with the old SAN to the drives/paths on the new SAN.  I'm also working on a project to upgrade our infrastructure to SQL Server 2014 (our last production servers will upgrade this weekend and next week).  I also provide specifications for ordering new server hardware, balance server loads, and handle a host of other tasks.  No two DBA jobs/roles will be exactly the same, but the career is extremely rewarding.

Student: What do you like most about your job and why?
I love working with other people and learning new things.  I learned a long time ago that even after you graduate and you are "done" with school, you never stop learning.  The best employees make sure they're staying up to date on the technologies that they work with so that they can provide the best solutions to the problems that they face.  I love working with data!  I was a math and science geek in school and college, and that really helped me prepare for a career working with data, servers and code.

Student: What are some dislikes or frustrations you have and why:
One of my biggest frustrations is that when you work with people, you will always find that one person who just does not understand how things really work and won't take no for an answer.  For example, someone puts in a request that would require a week's worth of work (and that's assuming you drop everything else and focus on just that one request, which is almost never the case), and then asks if it can be done in a day or two.  Poor planning on their part doesn't always necessitate an emergency on mine, but sometimes it does.  This gives me an opportunity to discuss the situation with the person requesting the task so we can set proper expectations (that's a critical point: make sure that you understand what is required of you and that the person requesting it understands what they can expect and when).  However, you're always going to need to work with other people (management, team members, customers, etc...), so interpersonal skills are a must.

Student: How do you perceive the future of this field in terms of expanding opportunities and security?
Technology is always changing, and you need to continually learn about the products you're working with (that way you don't become obsolete).  SQL Server is not going away any time soon, but it is adapting to fit more modern business needs.  I go to the premier SQL Server conference every year, the PASS Summit.  

There have been a lot of security improvements in SQL Server over the years, including integration with Windows Active Directory, encryption, and multiple methods to protect your data while in-flight (in memory) or at rest (on disk).  I am a big advocate of least privilege.  Only give the minimum number of people the minimum permissions that they need.  However, depending on the situation, that is not always easy, or possible (there are exceptions to every rule).  If you work with any sensitive data (credit cards, Social Security numbers, Dates of Birth, etc...), you will be audited on how you handle and protect the data.  The more important the data, the more steps you need to take to protect it (and the more paperwork you need to fill out to prove that it is protected).

Student: Has the job changed since you began? If so, please elaborate:
Not much in the technology field stays the same for very long, and SQL Server is no exception.  Apart from the product changes already mentioned, one of the biggest changes is that the amount of data I work with has done nothing but grow.  When I first started working with databases, they would fit on a floppy disk (just a few hundred KB).  Currently, I'm working with databases that are multiple terabytes (our largest single database is almost 4 TB, but I know others who work with much larger databases), and this results in a lot more data flying around when you're querying the database.  You have to do a lot more tuning of queries that hit large amounts of data in order to get them to perform well.

Student: Is there anything I currently can do to prepare for this career?
I'd recommend checking out one of the blog series written for the accidental DBA.  Gaining an understanding of relational databases and their structures will definitely help.  Personally, I had no clue when I first started working with SQL Server, but I was able to learn as I went along.

Student: Is there other information that may be helpful?
The bottom line is to find something you love to do.  To do that, you have to try a lot of things.  I didn't discover my passion for SQL Server until over a decade into my professional career.  Don't be afraid to try new things.  Don't be afraid to learn new things.  And don't be afraid to make mistakes.  That's how we all learn.

One of my favorite quotes by Albert Einstein is, "Anyone who has never made a mistake has never tried anything new."  Don't be afraid to give it all you've got to follow your dream.  You may make mistakes, but you'll never wonder, "What if I had just..."

No regrets!  I know I can certainly say that about my career.

To my new friend Hunter, I wish you the very best of luck in whatever you do.  Find your passion and give it your all!

Wednesday, December 17, 2014

SQL Server Clustered Instance Error 422 - Ambiguous Error With A (Hopefully) Simple Fix

Recently I observed an issue with bringing up SQL Services in a clustered environment.  In this particular Windows Failover Cluster, there are two instances of SQL Server that typically run on the same physical node in the cluster.  All of the clustered services ran without issue on the primary node.  When the instances were failed over to the second node, the SQL Server services would start, but the SQL Agent services would not start.  Failover Cluster Manager was reporting Error 422 bringing resource online.  Our investigation efforts ended up being fruitless, as there was no additional detail reported in Failover Cluster Manager or the Event Log.  There was also no SQLAGENT.OUT log file generated for that particular service start attempt.

Without any additional detail into the issue, we decided to see what would happen if we tried to start the services manually through the Windows Services console.  This is when we discovered the issue.  Someone had decided to be “helpful” and had reset the services to Disabled.  I can only imagine that the thought process included: “The services don't normally run on this node in the cluster, so they don't ever need to run on this node, right?”

Once the SQL Agent services were reset to Manual (they should not be set to Automatic in a Windows Failover Cluster), we were able to bring the clustered SQL Agent services online through Failover Cluster Manager without issue and the cluster reported that it was completely healthy.

There are a few other scenarios that can cause this issue, but hopefully this helps if you also have someone “helpful” reset your clustered services to Disabled.

Tuesday, November 25, 2014

Red Gate SQL In The City Seattle 2014 - Training The Red Gate Way

I've attended Red Gate's SQL In The City Seattle 2012 and SQL In The City Charlotte 2013.  This year, I not only had the privilege of attending the SQL In The City Seattle event, but I was thrilled to be selected to speak at it!  It's a fantastic opportunity to share my passion for SQL Server with others.

The venue this year was McCaw Hall in Seattle Center. It was a fabulous multilevel facility that was a perfect fit for an event of this type.  McCaw Hall is not far from the Seattle Monorail and the iconic EMP Museum.

John Theron (President of Red Gate) and Steve Jones (SQL Server MVP) presented the keynote which focused on Database Lifecycle Management, "Ship often, ship safe."  You can catch a recording of the keynote here (from the London event).

After the keynote, I caught Steve Jones's session on Avoiding A DBA's Worst Days With Monitoring. It was a great session that highlighted the importance of monitoring SQL Server so that you know what's going on, can quickly respond to sudden issues, and can anticipate performance slowdowns before they become critical.  SQL Monitor v4 was featured and is a fantastic option for monitoring your database servers.

I followed Steve's session with my own, 101 Stupid Mistakes Your Colleagues Make When Setting Up A SQL Server (because, of course, no one that is reading this would ever make any of these mistakes). You can snag the slide deck here if you're interested in checking out the presentation.  The presentation went very well, with great questions and audience participation.

Lunch was a great time to hang out with members of the SQL community and chat with Red Gate product experts.

After lunch, I attended Kevin Boles's session on new SQL Server 2014 features (I'm in the middle of a SQL Server 2014 upgrade project, so this was a key session to attend).  I also attended Bob Pusateri's excellent session Passive Security For Hostile Environments.  Bob blogged about the SQL In The City event as well.

After that I needed to go check into the apartment that I rented for the week, so I reluctantly left the event early to make sure I had a place to sleep at night.  I made sure to get a picture of the Space Needle on my way by.

Overall, it is a fantastic, free event to get your SQL Server learning on and get ready for the main event, the PASS Summit.  This is a brilliant option if you can't get your company to pay for a pre-con at the PASS Summit, and I very easily convinced several friends to attend the event (it was an easy sell for their employers, as they only had to cover an additional day of food and lodging in exchange for another full day of SQL Server training).

Many thanks to Red Gate for selecting me as a speaker.  I had a blast and I look forward to attending (and hopefully speaking at) the event again next year.

Tuesday, October 21, 2014

What On Earth Is Consuming My Transaction Log - I've Got That Bloating Feeling

On more than one occasion, I've gotten the call that the log drive on one of my database servers has suddenly filled to capacity.  Some things may continue to work, but the situation gets worse as time goes on, and the possibility of a SQL Server crash increases.

Usually, a quick check of files in the log folder (sorted by size) reveals that the log growth is attributed to one database (in my experience, TempDB seems to be the most common database for this to happen in, but I have also seen other databases pop up with this issue). 
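If browsing the log folder isn't convenient, a quick query against sys.master_files gives the same picture from inside SQL Server (a sketch; file size is stored in 8 KB pages, hence the division by 128):

```sql
-- List every log file on the instance, largest first
SELECT  DB_NAME(database_id) AS DBName,
        name AS LogicalFileName,
        physical_name AS PhysicalFileName,
        size / 128 AS FileSizeMB   -- size is in 8 KB pages
FROM    sys.master_files
WHERE   type_desc = 'LOG'
ORDER BY size DESC;
```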

Massive log growth typically results from one rogue transaction that either has a really nasty execution plan, or is just incredibly inefficient.

Once you've identified which database is the problem child, check the free space on the log file.  I usually use this script:
SELECT  DB_NAME() AS DBName,
        name AS LogicalFileName,
        filename AS PhysicalFileName,
        CONVERT(DECIMAL(12, 2), ROUND(size / 128.000, 2)) AS FileSizeMB,
        CONVERT(DECIMAL(12, 2), ROUND(FILEPROPERTY(name, 'SpaceUsed') / 128.000, 2)) AS DataSizeMB,
        CONVERT(DECIMAL(12, 2), ROUND((size - FILEPROPERTY(name, 'SpaceUsed')) / 128.000, 2)) AS FreeSpaceMB
FROM    dbo.sysfiles;

If there is plenty of free space in the log file, then the rogue transaction completed.  If there is no (or precious little) free space, you'll want to run the following command in the context of that database to find the open transactions in the database:
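A minimal option for that check is DBCC OPENTRAN, which reports the oldest active transaction in the current database:

```sql
-- Run in the context of the bloated database; reports the SPID and
-- start time of the oldest active transaction (if any)
USE TempDB;  -- placeholder: substitute the problem database
DBCC OPENTRAN;
```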

This will give you the SPID(s) for the open transactions in that database.  A quick sp_WhoIsActive SPID# (or a combination of EXEC sp_who2 SPID# and DBCC INPUTBUFFER (SPID#) if you have not yet implemented sp_WhoIsActive) will reveal what query is running and who is running it.

You can find Adam Machanic's awesome sp_WhoIsActive online (personally, I recommend v11.11).

At this point, chances are you just want to kill the offending SPID(s) using a simple KILL SPID# command.  However, if it is a long running UPDATE, INSERT, or DELETE, you may be in for quite a wait while a rollback is performed and it may be better to try to free up or allocate additional space to allow the transaction to complete.
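If you do decide to kill it, WITH STATUSONLY is handy for keeping an eye on the rollback (the SPID below is a placeholder):

```sql
KILL 57;                  -- 57 is a placeholder SPID
KILL 57 WITH STATUSONLY;  -- reports estimated rollback completion and time remaining
```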

Once the transaction is killed (or completes), you should see the free space in the log file increase dramatically.  At this point you can shrink the log file back down to a reasonable, appropriate size (emphasis here on appropriate: you do not want to shrink it to 0 MB, as it will just need to grow again to the normal working size).
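As a sketch (the database, logical file name, and the 4096 MB target below are placeholders; pick a target that matches your normal working size):

```sql
USE TempDB;  -- placeholder: substitute the affected database
DBCC SHRINKFILE (templog, 4096);  -- target size is in MB
```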

SQL Server should recover at this point and you're on your way to fight another fire.  However, before you chalk it up as complete, you should follow up on the cause of the rogue query.  If it was a user query, check with the user to see what they were trying to accomplish.  If it was an automated process that has worked fine in the past, check to see if the statistics are up-to-date.  It's amazing how bad of an execution plan SQL Server can pick/generate when the statistics are no good.

By following up on the issue, you'll become more than "just a DBA"; you're well on your way to becoming a Rockstar DBA!  Many thanks to Thomas LaRock for publishing an awesome book (and making it a free download)!

Monday, March 10, 2014

SQL 2012 Installation Error - Error while enabling Windows feature NexFx3

While installing SQL Server 2012 on a server running Windows Server 2012, I occasionally run into an issue where the installer can't properly enable the .NET 3.5 role.  I was able to make it completely through all of the setup dialogs and configurations (including clicking Install to start the installation process), only to have it fail shortly into the install process with this error message:
Error while enabling Windows feature : NexFx3, Error Code : -2146498298 , Please try enabling Windows feature : NetFx3 from Windows management tools and then run setup again. For more information on how to enable Windows features , see

In order to get around this issue, you need to manually enable the .NET Framework v3.5 role:
  • Launch Server Manager
  • Go to Manage and choose Add Roles and Features
  • Click Next at the Before you begin page
  • Verify Role-based or feature-based installation is selected and click Next
  • On the Select destination server page, verify the local server is selected and click Next
  • On the Select server roles page, just click Next
  • On the Features page, check the box for .NET Framework 3.5 Features and click Next
  • Click Install to perform the installation

If you have issues with the Add Roles and Features wizard, you can also run the following command from a Command Prompt (update Z:\ to the appropriate drive letter where the Windows Server DVD is inserted or mounted):
DISM /online /enable-feature /featurename:NetFx3 /all /source:Z:\Sources\sxs

Once the .NET Framework 3.5 feature is enabled, you will no longer get the aforementioned error message and SQL Server 2012 installation will proceed.

You should not experience this issue if you check the box to have the SQL Server 2012 installer Include SQL Server product updates.  Note that this check-box only updates the setup files, it will not automatically slipstream any service packs.

Thursday, February 20, 2014

Why Are My Database Restores So Slow? - How To Take Advantage Of Instant File Initialization

I like to keep a close eye on things, and one of my favorite scripts queries the sys.dm_exec_requests Dynamic Management View.  This query reports the status of backups, restores, and DBCC commands (like SHRINKFILE), including Wait Type and Estimated Time Remaining (I convert these values to seconds):

SELECT  session_id AS [SPID],
        wait_time/1000 AS [WaitTime(sec)],
        wait_type AS [WaitType],
        percent_complete AS [PercentComplete],
        estimated_completion_time/1000 AS [TimeRemaining(sec)]
FROM    sys.dm_exec_requests
WHERE   command LIKE '%RESTORE%'
        OR command LIKE '%BACKUP%'
        OR command LIKE '%DBCC%';

What's happening?
When SQL Server needs to create or expand a data file (AUTOGROWTH, or CREATE, RESTORE, or ALTER DATABASE commands) it needs to write zeros to the entire contents of the file (or the portion of the file that has been expanded) before it can perform any IO to that file.  If you change the default AUTOGROWTH size (and you should change the default to prevent heavily fragmented data or log files), SQL Server will then need to initialize all of the freshly allocated space before performing additional IO to that data file.  
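For example, a fixed growth increment can be set with ALTER DATABASE (the database and logical file names here are placeholders):

```sql
-- Replace the default percentage-based autogrowth with a fixed increment
ALTER DATABASE MyDB
MODIFY FILE (NAME = MyDB_Data, FILEGROWTH = 256MB);
```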

If you see a lot of ASYNC_IO_COMPLETION waits when you start to restore a database, you're waiting for SQL Server to write out the file to disk.  

Why does this happen?
SQL Server best practice is to run the SQL Server service as a user that does not have full rights to the server that it is running on.  However, if this restricted user does not have the appropriate rights, you can't take advantage of instant file initialization when creating, restoring, or expanding database data files.  

Instant file initialization allows SQL Server to immediately start writing data to a file without having to write out the entire file.  This dramatically speeds database creation, restore and expansion processes by nearly eliminating the ASYNC_IO_COMPLETION waits (there is still a negligible amount of wait incurred while writing to the file allocation table, which typically takes mere milliseconds).
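One way to see whether instant file initialization is actually working is to watch for zeroing messages in the error log (a sketch for a test server; trace flag 3004 logs file zeroing and 3605 routes those messages to the error log):

```sql
DBCC TRACEON (3004, 3605, -1);
CREATE DATABASE IFI_Test;  -- hypothetical scratch database
GO
-- With instant file initialization enabled, only the log file should
-- show "Zeroing" messages; the data file should not
EXEC sp_readerrorlog 0, 1, 'Zeroing';
DBCC TRACEOFF (3004, 3605, -1);
DROP DATABASE IFI_Test;
```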

How do I fix it?
The account that the SQL Server service runs as needs only one right granted via the Group Policy Editor.

  • Launch the Services management console (Services.msc) and record the account that the SQL Server service is running as
  • Launch Group Policy Editor (GPEdit.msc)
  • Expand Computer Configuration -> Windows Settings -> Security Settings -> Local Policies
  • Select User Rights Assignment 
  • Double-click Perform volume maintenance tasks and click Add User or Group...
  • Type in the name of the account (you may need to click the Locations... button to select the correct source domain/server).
  • Click the Check Names button to be sure it is properly recognized and then click OK
You will need to restart the SQL Server service for it to recognize this change.  Once this has been enabled, your database creation, restoration, and expansion processes will no longer incur an IO penalty just for writing out the data file.

Note: This only affects the data files; log files still need to be fully written (zero-initialized) to disk.

I've run across this many times with multiple clients as they are locking down user permissions.  It is very easy to overlook if you don't know about it.

Microsoft also has a blog post on this with some additional technical detail.

Thursday, February 13, 2014

HELP - My Distribution Database Is HUGE, But I Don't Have A Lot Of Commands In Queue

I recently ran across an issue where the Distribution database on the Distributor was almost 150 GB, but when I looked at the Publications in Replication Monitor, I found that there were less than 50,000 commands in queue across all Publications.

I immediately thought, "Why on earth is the Distribution database bigger than this truck?"
I started my investigation by running this query to see how many Transactions were in the Distribution database (there were several million):
SELECT PD.Publisher_DB, COUNT(RT.xact_id) AS #TransactionsInDistributionDB
FROM distribution.dbo.MSrepl_transactions RT
JOIN distribution.dbo.MSpublisher_databases PD ON PD.id = RT.publisher_database_id
GROUP BY PD.Publisher_DB;

However, Replication Monitor tells you how many Commands are in queue (not Transactions), so I ran this query and found that there were almost 500 million commands total in queue (NOTE: Use this query with caution; it will take a long time to run if there are a lot of commands in queue - in my case it took almost 45 minutes):
SELECT PD.Publisher_DB, COUNT(RC.command_id) AS #CommandsInDistributionDB
FROM distribution.dbo.MSrepl_commands RC
JOIN distribution.dbo.MSpublisher_databases PD ON PD.id = RC.publisher_database_id
GROUP BY PD.Publisher_DB;

We also observed a lot of Disk IO on the Distributor, and the Distribution clean up: distribution job was taking a lot longer than normal (1-4  hours instead of 2-5 minutes).

After some Googling I found a great article by Paul Ibison that reveals that the immediate_sync option causes the Distributor to queue all Transactions for a Publication until the retention period is reached, regardless of whether or not the Transaction has been delivered.

I ran this query at the Distributor to see what Publications had the immediate_sync option enabled:
SELECT SS.Name AS PublisherName, Pubs.Publisher_DB, Pubs.Publication, Pubs.Immediate_Sync
FROM distribution.dbo.MSpublications AS Pubs
JOIN master.sys.servers AS SS ON SS.server_id = Pubs.publisher_ID
WHERE Pubs.Immediate_Sync = 1
ORDER BY PublisherName, Publisher_DB, Publication;

This revealed 2 Snapshot Publications that had that option enabled.  Yes, you read that correctly: the immediate_sync option causes the Distributor to queue all Transactions even for Snapshot Publications, where they're not necessary for the Publication at all.

How Do I Fix It?
To correct the issue, you need to run a couple scripts in the Published database (update PublicationName as appropriate):
EXEC sp_changepublication
@publication = 'PublicationName',
@property = 'allow_anonymous',
@value = 'FALSE';

EXEC sp_changepublication
@publication = 'PublicationName',
@property = 'immediate_sync',
@value = 'FALSE';

The next execution of the Distribution clean up: distribution job will take much longer, but it will then clean up all the unnecessary transactions.  In my case, there is now less than 10 GB of data in the Distribution database.

These scripts can be executed on the fly without impact as they do not interrupt or affect the Publication (aside from telling the Distributor that it no longer needs to queue commands that it doesn't really need).

How Did It Happen?
This option is a result of checking a very innocent looking box in the New Publication Wizard:
This setting is very BAD

If you have scripted out the Publication creation, edit your scripts and look for this snippet:
EXEC sp_addpublication ... @immediate_sync = N'true'

This will need to be changed to:
EXEC sp_addpublication ... @immediate_sync = N'false'

This option can greatly impact the performance and storage of your Distributor.  If you suspect this is an issue on a Distributor you manage, I recommend you run the scripts above to check for and correct the issue.

Thanks again Paul Ibison for your invaluable assistance.