I was busy today with one SQL server 2000. Problem was constant 100% CPU usage for long period several hours.
In normal situation customers schedules data for processing with maximum of 500-700k records in queue table with maximum CPU usage of 25-30%. Today queue went above 1M records. SQL server is hosted on dedicated high end class piece of hardware, with previous hardware and software performance tweaks. Good resource http://www.sql-server-performance.com/tips/all_main.aspx
Recently we have changed some part of the code so I decided to check what went wrong.
One thing I have noticed is many active connection (above 500). It was normal behavior, clients were busy today ;-).
I turned on the performance monitor which already has performance counter profile for monitoring which includes following:
- % Disk Time
- Current Disk Queue Lenght
- Disk Reads/sec
- Disk Writes/sec
- % Processor time
- Batch Requests/sec
- SQL Compilations/sec
- SQL Re-Compilations/sec
- Context Switches/sec
All values were in normal ranges except % Processor time (98-100) and Context Switches/sec (>8000).
I have checked general hardware health, RAID, HDD fragmentation, memory usage etc.. but everything was just fine.
Then I got idea to turn on Fiber mode (also called Lightweight pooling) for test, how it will react but I could recall that this option was tested before and that we had all kind of problems with SQL server. One of the problems was that sometimes SQL server did not wanted to start at all! Last thing you could see in SQL Server Log is this line:
"Recovery is checkpointing database ...."
After turning off the fiber mode SQL server would start properly.
Here is very nice article about SQL Server fiber mode:
Ken Henderson explains the effects that SQL Server fiber mode coupled with the User Mode Scheduler can have on your system, and what to consider carefully before enabling fiber mode
He's conclusion is:
Fiber mode was intended for niche situations in which a scalability ceiling is
hit due to UMS workers spending significant amounts of time switching between
thread contexts, or switching the CPU into kernel mode and back again. Unless
you have encountered this yourself, and have already tuned the system as much as
possible using more obvious (and safer) techniques, I recommend that you stay
away from fiber mode and focus your tuning efforts on other things. If you run
into a situation where you feel strongly you need fiber mode, it's probably
worth a call into Microsoft Product Support Services to confirm your diagnosis.
Better to get a second opinion than to break your server in subtle and
pernicious ways with little hope of determining how or why, and with no likely
improvement in overall performance.
So fiber mode was not an option. Hardware was working properly, but all 4 CPU's were still on 100% usage.
Then I turned on SQL Profiler. After 1h of data collection I had enough material to work with.
After running "Index Tuning Wizard" several times I got the results. 2 new indexes on one table should improve performance up to 77%!
CPU usage dropped from 98-100% to 3-5%. After 1/2h queue was reduced for almost 200k records!
My golden tip would be:
When you want to boost performance of your SQL server first thing to check are indexes!