Tuesday, September 20, 2016

I've Got The SP2 Replication Blues - SP2 For SQL 2014 Changes A System SPROC

I ran into this replication error the other day and I was completely stumped.
Procedure or function sp_MSreplraiserror has too many arguments specified.
We started getting that error message shortly after we had applied SP2 for SQL Server 2014 to a server that is a replication Publisher (source of replicated data).

We dug into the commands that were being replicated and found that there were missing rows in the table on the replication Subscriber (destination for replicated data).  Once the rows were populated the errors stopped.  However, after digging in a bit more, we found that this error has an explainable source.

In SP2 for SQL Server 2014, the system SPROC sp_MSreplraiserror has been updated to support an additional parameter for additional error logging detail.  This is good!

When we updated the Publisher with SP2 for SQL Server 2014, the replication SPROCs that actually perform the work (sp_MSdel_*, sp_MSins_*, and sp_MSupd_*) were updated with the additional parameter value to specify when running the sp_MSreplraiserror SPROC.  Unfortunately, the Subscriber (which is where the SPROCs are executed) has not yet had SP2 applied and thus the error is thrown because the system SPROC sp_MSreplraiserror has not yet been updated to support an additional parameter.

To mitigate the issue, I edited the sp_MSdel_* and sp_MSupd_* SPROCs (the sp_MSins_* SPROCs do not execute sp_MSreplraiserror when an error is encountered) and commented out the very last parameter.  Once we apply SP2 on the Subscriber, I will go back in and uncomment the last parameter to gain the additional error logging detail.

Friday, May 6, 2016

Trace, Trace, Trace - Too Many And SQL Can't Keep Up The Pace!

I recently ran into an issue on a client development (thankfully not production) server where the server was performing very poorly.  We didn't see anything obvious in any of the monitoring detail we had, aside from the fact that our monitoring tools also had issues with how SQL was performing (lots of missing data).

Before the client took the last ditch effort of just restarting SQL Server, I checked traces.  There were 9 user traces collecting a ton of trace events.

I manually killed them all and suddenly performance returned to normal.  Phew!  Crisis averted.

As a follow up from that issue, I created a script that stops and then deletes all user traces.  We are discussing setting up a job to run this script periodically to keep traces at bay and I am educating the group on proper trace discipline (and Extended Events).

-- Turn off (## rows returned) messages

-- Set the stage by declaring working variables

-- Create a temp table to hold the results of the trace detail
Property SMALLINT,

-- Populate the temp table with only user trace data (TraceID 1 is the default system trace)
-- In our environment, TraceID #2 is our monitoring software, so we actually have WHERE TraceID > 2
INSERT INTO #TraceDetail (TraceID, Property, Value)
SELECT * FROM ::fn_trace_getinfo(NULL) WHERE TraceID > 1

-- Check to see if there is anything in the table, and if so, start stopping and killing the traces
WHILE (SELECT COUNT(*) FROM #TraceDetail) > 1
-- Just grab the first ID in the table
SELECT TOP 1 @TraceID = TraceID FROM #TraceDetail
-- Build the command to stop (status = 0) and then delete (status = 2) the trace
SELECT @SQLcmd = 'EXEC sp_trace_setstatus @traceid = ' + CONVERT(VARCHAR(3), @TraceID) + ', @status = 0;
EXEC sp_trace_setstatus @traceid = 'CONVERT(VARCHAR(3), @TraceID) + ', @status = 2;'
-- Execute the command
EXEC (@SQLcmd)
-- Report what we killed
PRINT 'Killed TraceID ' + CONVERT(VARCHAR(3), @TraceID)
-- Delete all entries for that TraceID (typically 5 rows each)
DELETE FROM #TraceDetail WHERE TraceID = @TraceID

-- Drop the temp table
DROP TABLE #TraceDetail

Monday, December 7, 2015

Clustered Instance Patch Failure - The Cluster Group Cannot Be Determined

I recently ran into an issue where I was unable to apply a cumulative update to one node of a Failover Clustered Instance (FCI) of SQL Server 2014.  The Summary log from the CU install had the following error:
The cluster group cannot be determined for the instance name 'MSSQLSERVER'.  This indicates there is a problem with the product registry setting for ClusterName, with product discovery, or the cluster resources.

This particular FCI was a DR cluster and we had renamed the network name resource for the instance to match the production server it was replacing.  I believe the instance name in the registry should automatically change when you change the network name, but in our case it did not do so.

To find the correct name, I launched Failover Cluster Manager and then expanded the cluster name and selected Roles in the left-hand pane.  From there I selected the Resources tab at the bottom of the center page.  
Note: I'm running Windows Server 2012 R2.  If you are on Windows Server 2008 (or 2008 R2), you'll need to expand Services and applications in the left-hand pane and then select the specific service you are looking for.  

In the Resources list, I found the Server Name resource and made note of the value specified there.

Next is just a simple matter of updating the registry value.  I launched Registry Editor (RegEdit.exe) and navigated to HKLM > Software > Microsoft > Microsoft SQL Server > MSSQL12.MSSQLSERVER > Cluster
Note: MSSQL12.MSSQLSERVER will vary based on SQL version (SQL 2012 is "MSSQL11") and instance name ("MSSQLSERVER" is the default instance and named instances are the actual instance name instead of "MSSQLSERVER").

Within the Cluster key I updated the value of ClusterName to match what was specified in Failover Cluster Manager:

I then ran the CU install again and it completed successfully!  So far this is a good Monday!

Saturday, January 17, 2015

Interview With A DBA - Why Do You Do What You Do?

I recently had a student conducting a career study/project contact me with questions about my career and the choices I made to get where I am.  I'm certainly not perfect by any stretch of the imagination, but I give it my all.  I hope that my answers to his questions helped inspire him to go after his dream, whatever it may be (even if it's not to be a DBA).

Student: What is the title of your career?
Me: My career title is essentially Database Administrator.  My current job title is Tier III Database Administrator (Senior DBA), and I specialize in architecture/infrastructure engineering for Microsoft SQL Server.

Student: How many years have you had on the job?
Me: I've been working with databases in general for about 15 years, and specifically SQL Server for over 10 years.

Student: What attracted you to this career? Did you have an experience that influenced your decision to choose this field?
Me:  I started out my professional career in a small call center providing phone support.  I gradually worked my way into more of an IT/Help Desk role and then up to Network Engineer.  When I started working as a Network Engineer, I happened to be the only one on the team that wasn't scared of the database servers.  As I learned more about SQL Server, I realized that I really liked working with it and I wanted to learn more about it.  There is a bit of an infinite cycle there, the more you learn about a subject, the more you realize how much more there is to learn about it.  Once I realized that I was more passionate about working with data and SQL Server than anything else, I made that my primary focus.
One thing that has really solidified my decision in my career is the tremendous sense of community and even family when you work with SQL Server.  There are plenty of people who are more than happy to take some time out of their day to help you out if you have a problem, or just want to chat.  Check out the #SQLHelp and #SQLFamily hash tags on Twitter for just a couple examples of what I'm referring to here.

Student: What type of education or training is required for your job?
Me: A strong foundation on math, and logic/analytical thinking comes into play when working with SQL Server.  There are multiple different job roles/responsibilities/career paths in working with SQL Server (and even more if you consider other database platforms like Oracle, MySQL, Hadoop, etc...).  Programming classes (specifically T-SQL classes would be ideal) would also help you to prepare for working with SQL Server.  Almost every aspect of SQL Server requires some familiarity with T-SQL code and the logic behind structuring a query and getting it to perform.  While it's pretty obvious that a SQL developer needs to have strong T-SQL coding skills, as a DBA, I also need be very familiar with T-SQL coding so that I can tune queries that no longer perform well.
One way to get free training is to attend local events or online training.  I am the president of the Rochester PASS SQL Server User Group and we have monthly meetings during most of the year (I skip Summer and December).
More online training can be found in the form of virtual chapters (periodic webinars/meetings online). More virtual chapter information can be found here: http://sqlpass.org/PASSChapters/VirtualChapters.aspx And finally, there is a one day SQL Server mini-conference (appropriately named SQL Saturday) coming to Rochester on Saturday, May 16th. https://www.sqlsaturday.com/383/eventhome.aspx

Student: What specific skills are needed to do the job?
I think my previous comments speak to this well, but I will also add that there are times that you need to have tenacity and focus in order to solve problems (this applies to many other careers than just a career in SQL Server).  I am a highly distractable person, and I find it hard to focus on a problem at times, but I still manage quite well.  It is just more difficult for me to go heads-down and focus solely on a specific issue.

Student: What are your major daily tasks?
Major daily tasks for me include reviewing alerts (backups, critical errors, performance issues, failed scheduled jobs, etc...) from the database servers (and other infrastructure alerts like storage and network). Typically our Junior and Mid-level DBAs handle those tasks, but I will also keep an eye on these so that I have a good understanding of the health of our servers.  I am an escalation point for major issues and incidents. And my typical role (outside of problem/incident response) is project based.  We recently migrated to a new SAN and I was responsible for migrating all the database files from the drives associated with the old SAN to the drives/paths on the new SAN.  I'm also working on a project to upgrade our infrastructure to SQL Server 2014 (our last production servers will upgrade this weekend and next week).  I also provide specifications for ordering new server hardware and balancing server loads and a host of other tasks.  No two DBA jobs/roles will be exactly the same, but the career is extremely rewarding.

Student: What do you like most about your job and why?
I love working with other people and learning new things.  I learned a long time ago that even after you graduate and you are "done" with school, you never stop learning.  The best employees make sure they're staying up to date on the technologies that they work with so that they can provide the best solutions to the problems that they face.  I love working with data!  I was a math and science geek in school and college, and that really helped me prepare for a career working with data, servers and code.

Student: What are some dislikes or frustrations you have and why:
One of my biggest frustrations is that when you work with people, you will always find that one person that just does not understand how things really work, and won't take no for an answer.  Like if someone puts in a request that would require a week worth of work (and that's assuming that you drop everything else and focus on just this one request, which is likely never the case), and then asks if it can be done in a day or two.  Poor planning on their part doesn't always necessitate an emergency on mine, but sometimes it does.  This gives me an opportunity to discuss the situation with the person requesting the task, and we can set proper expectations (that's a critical point, make sure that you understand what is required of you and that the person requesting it understands what and when they can expect a response).  However, you're always going to need to work with other people (management, team members, customers, etc...), so interpersonal skills are a must. 

Student: How do your perceive the future of this field in terms of expanding opportunities and security?
Technology is always changing, and you need to continually learn about the products you're working with (that way you don't become obsolete).  SQL Server is not going away any time soon, but it is adapting to fit more modern business needs.  I go to the premier SQL Server conference every year, the PASS Summit.  

There have been a lot of security improvements in SQL Server over the years, including integration with Windows Active Directory, encryption, and multiple methods to protect your data while in-flight (in memory) or at rest (on disk).  I am a big advocate of least privilege.  Only give the minimum amount of people the minimum permissions that they need.  However, depending on the situation, that is not always easy, or possible (there are exceptions to every rule).  If you work with any sensitive data (credit cards, Social Security numbers, Dates of Birth, etc...), you will be audited on how you handle and protect the data.  The more important the data, the more steps you need to take to protect it (and the more paperwork you need to fill out to prove that it is protected).

Student: Has the job changed since you began? if so please elaborate:
There isn't much in the technology field stays the same for very long, and SQL Server is no exception.  Apart from the product changes already mentioned, one of the biggest changes is that the amount of data that I work with has done nothing but grow.  When I first started working with Databases, they would fit on a floppy disk (just a few hundred KB).  Currently, I'm working with databases that are multiple terabytes (our largest single database is almost 4TB, but I know others that work with much larger databases), and this results in a lot more data flying around when you're querying the database.  You have to do a lot more tuning to queries that are hitting large amounts of data in order to get them to perform well.

Student: Is there anything I currently can do to prepare for this career?
I'd recommend you check out this blog series for the accidental DBA: https://www.sqlskills.com/help/accidental-dba/  Gaining an understanding of relational databases and their structures will definitely help.  Personally, I had no clue when I first started working with SQL Server, but I was able to learn as I went along.

Student: Is there other information that may be helpful?
The bottom line is to find something you love to do.  To do that, you have to try a lot of things.  I didn't discover my passion for SQL Server until over a decade into my professional career.  Don't be afraid to try new things.  Don't be afraid to learn new things.  And don't be afraid to make mistakes.  That's how we all learn.

One of my favorite quotes by Albert Einstein is, "Anyone who has never made a mistake has never tried anything new."  Don't be afraid to give it all you've got to follow your dream.  You may make mistakes, but you'll never wonder, "What if I had just..."

No regrets!  I know I can certainly say that about my career.

To my new friend Hunter, I wish you the very best of luck in whatever you do.  Find your passion and give it your all!

Wednesday, December 17, 2014

SQL Server Clustered Instance Error 422 - Ambiguous Error With A (Hopefully) Simple Fix

Recently I observed an issue with bringing up SQL Services in a clustered environment.  In this particular Windows Failover Cluster, there are two instances of SQL Server that typically run on the same physical node in the cluster.  All of the clustered services ran without issue on the primary node.  When the instances were failed over to the second node, the SQL Server services would start, but the SQL Agent services would not start.  Failover Cluster Manager was reporting Error 422 bringing resource online.  Our investigation efforts ended up being fruitless as there was no additional detail reported in Failover Cluster Manager or the Event Log.  There was no also SQLAGENT.OUT log file generated for that particular service start attempt.

Without any additional detail into the issue, we decided to see what would happen if we tried to start the services manually through the Windows Services console.  This is when we discovered the issue.  Someone had decided to be “helpful” and had reset the services to Disabled.  I can only imagine that the thought process included: “The services don't normally run on this node in the cluster, so they don't ever need to run on this node, right?”

Once the SQL Agent services were reset to Manual (they should not be set to Automatic in a Windows Failover Cluster), we were able to bring the clustered SQL Agent services online through Failover Cluster Manager without issue and the cluster reported that it was completely healthy.

There are a few other scenarios that can cause this issue, but hopefully this helps if you also have someone “helpful” reset your clustered services to Disabled.

Tuesday, November 25, 2014

Red Gate SQL In The City Seattle 2014 - Training The Red Gate Way

I've attended Red Gate's SQL In The City Seattle 2012 and SQL In The City Charlotte 2013.  This year, I not only had the privilege of attending SQL In The City in Seattle event, but I was thrilled to be selected to speak at the event!  It's a fantastic opportunity to share my passion of SQL Server with others.  

The venue this year was McCaw Hall in Seattle Center. It was a fabulous multilevel facility that was a perfect fit for an event of this type.  McCaw Hall is not far from the Seattle Monorail and the iconic EMP Museum:

John Theron (President of Red Gate) and Steve Jones (SQL Server MVP) presented the keynote which focused on Database Lifecycle Management, "Ship often, ship safe."  You can catch a recording of the keynote here (from the London event).

After the keynote, I caught Steve Jones session on Avoiding A DBA's Worst Days With Monitoring. It was a great session that highlighted the importance of monitoring SQL Server so that you know what's going on so you can quickly respond to sudden issues and anticipate performance slowdowns before they become critical.  SQL Monitor v4 was featured and is a fantastic option for monitoring your database servers.

I followed Steve's session with my own, 101 Stupid Mistakes Your Colleagues Make When Setting Up A SQL Server (because, of course, no one that is reading this would ever make any of these mistakes). You can snag the slide deck here if you're interested in checking out the presentation.  The presentation went very well, there were great questions and audience participation.

Lunch was a great time to hang out with members of the SQL community and chat with Red Gate product experts.

After lunch, I attended Kevin Boles session on new SQL Server 2014 features (I'm in the middle of a SQL Server 2014 upgrade project, so this was a key session to attend).  I also attended Bob Pusateri excellent session Passive Security For Hostile Environments.  Bob blogged about the SQL In The City event here.

After that I needed to go check into the apartment that I rented for the week, so I reluctantly left the event early to make sure I had a place to sleep at night.  I made sure to get a picture of the Space Needle on my way by:

Overall, it is a fantastic, free event to get your SQL Server learning on and get ready for the main event, the PASS Summit.  This is a brilliant option if you can't get your company to pay for a pre-con at the PASS Summit, and I very easily convinced several friends to attend the event (it was an easy sell for their employers, as they only had to cover an additional day of food and lodging in exchange for another full day of SQL Server training).

Many thanks to Red Gate for selecting me as a speaker.  I had a blast and I look forward to attending (and hopefully speaking at) the event again next year.

Tuesday, October 21, 2014

What On Earth Is Consuming My Transaction Log - I've Got That Bloating Feeling

One more than one occasion, I've gotten the call that the log drive on one of my database servers has suddenly filled to capacity.  Some things may continue to work, but the situation gets worse and as time goes on, the possibility of a SQL Server crash increases.

Usually, a quick check of files in the log folder (sorted by size) reveals that the log growth is attributed to one database (in my experience, TempDB seems to be the most common database for this to happen in, but I have also seen other databases pop up with this issue). 

Massive log growth typically results from one rogue transaction that either has a really nasty execution plan, or is just incredibly inefficient.

Once you've identified which database is the problem child, check the free space on the log file.  I usually use this script:
SELECT DB_NAME() AS DBName, name AS LogicalFileName, filename AS PhysicalFileName,
        CONVERT(DECIMAL(12, 2), ROUND(size / 128.000, 2)) AS FileSizeMB,
        CONVERT(DECIMAL(12, 2), ROUND(FILEPROPERTY(name, 'SpaceUsed') / 128.000, 2)) AS DataSizeMB,
        CONVERT(DECIMAL(12, 2), ROUND((size - FILEPROPERTY(name,
'SpaceUsed')) / 128.000, 2)) AS FreeSpaceMB
FROM    dbo.sysfiles;

If there is plenty of free space in the log file, then then rogue transaction completed.  If there is no (or precious little) free space, you'll want to run the following command in the context of that database to find the open transactions in the database:

This will give you the SPID(s) for the open transactions in that database.  A quick sp_WhoIsActive SPID# (or a combination of EXEC sp_who2 SPID# and DBCC INPUTBUFFER (SPID#) if you have not yet implemented sp_WhoIsActive) will reveal what query is running and who is running it.

You can find Adam Machanic's awesome sp_WhoIsActive here (personally, I recommend v11.11): http://sqlblog.com/files/default.aspx

At this point, chances are you just want to kill the offending SPID(s) using a simple KILL SPID# command.  However, if it is a long running UPDATE, INSERT, or DELETE, you may be in for quite a wait while a rollback is performed and it may be better to try to free up or allocate additional space to allow the transaction to complete.

Once the transaction is killed (or completes), you should see the free space in the log file increase dramatically.  At this point you can shrink the log file back down to a reasonable, appropriate size (emphasis here on appropriate size, you do not want to shrink it to 0 MB as it will need to grow again to the normal working size).

SQL Server should recover at this point and you're on your way to fight another fire.  However, before you chalk it up as complete, you should follow up on the cause of the rogue query.  If it was a user query, check with the user to see what they were trying to accomplish.  If it was an automated process that has worked fine in the past, check to see if the statistics are up-to-date.  It's amazing how bad of an execution plan SQL Server can pick/generate when the statistics are no good.

By following up on the issue, you'll become more than "just a DBA"; you're well on your way to becoming a Rockstar DBA!  Many thanks to Thomas LaRock for publishing an awesome book (and making it a free download)!