Working on a particular data intensive project has thought me a lot about SQL optimization both about structure optimization and stored procedure optimizations. We all know about introducing redundancy and indexes to improve read performance but I've picked up a couple of tricks from our resident SQL expert I thought I'd pass along. Please bear in mind that these are only my observations of how our systems behaves your mileage may vary.
Main Problem Area
For our web solutions the performance issues we generally encounter with an application are usually related to the data tier. I can recall maybe two instances where the application tier was the culprit otherwise always the data tier. When diagnosing performance issues you can save yourself a lot of time by eliminating the frontend and going directly to the database. Also determining whether you have a problem in the data tier is pretty straight forward using the Profiling tools of SQL Server. Granted my experience with Windows application is limited but I would imagine that the likelihood of more issues related to memory and CPU usage cropping up is higher but you'll probably still need to deal with some kind of database if you're doing anything in the business.
Temporary Tables / Table Variables
My number one tip for performance optimization when dealing with large amounts of data is to try to limit the number of rows you're dealing with. It's a nobrainer using filters to do so but my .NET brain needed some time to get used to the idea of doing it using temporary tables and tables variables. For some reason I had a hard time dealing with the fact that I should pull data out of a comfortable data structure and stick into a data variable somewhere; maybe because I'm used to avoiding transferring a lot of data across the database boundary into my application. Keeping in mind that we don't actually move the data outside the database and only store the temporary tables for the duration of the execution of our stored procedure it makes a lot of sense to pick out the rows we need and only work on the smaller set.
I had a case yesterday where a query would execute for almost a solid two minutes, actually 1 minute and 40 seconds. The case was a view joined onto itself to do some rolling calculations. By extracting my set from the view once and placing it into a table variable I was able to squeeze execution down to 6 seconds. That's an improvement in the order of 16x people! :)
Default Parameter Values vs. Execution Plans
An interesting case came up a while back where when we were trying to figure out why the users were experiencing long execution times when doing a particular operation. However much we tried we were unsuccessful at reproducing the problem. Whenever we'd run the procedure the performance was great and very much in line with what we were expecting to see. Doing the operation from the interface ourselves did yield the same lack of performance. Basically we were seeing great performance when executing the procedure directly in Management Studio and poor performance when execution the same procedure from the application UI.
Eventually we figured it out: It turned out that the stored procedure had some default parameter values set for a couple of optional search parameters. The application UI would not pass these parameters to the stored procedure and SQL Server would select an execution plan optimized for the default scenario. We were passing all the parameters to the procedure in Management Studio and were causing SQL Server to select a more optimized execution plan thus giving us the better performance and the impression that "something was happening between ASP.NET and SQL Server".
Removing the default parameter values and always passing in the same number of parameters fixed the issue. Probably more a gotcha that anything else :)
Ah WITH RECOMPILE, I remember it fondly from some years ago when I was studying for the SQL Server 2000 exam and I thought to myself, "I'm NEVER going to need that". Well it turns out that I did need it and the scenario was a text book example too. Basically we have an application in production which visualizes work load for an entire week. When the week starts out the users start scheduling that work for individual work days attaching a date and the actual work needing to be handled. Basically the result of this is that the dataset Monday morning is vastly different from the dataset Tuesday when most of the work is scheduled and good to go.
The main take away from this is this: When your datasets change a lot from execution to execution you should look into adding WITH RECOMPILE to stored procedures working on the dataset.
Yes IF statements! The rule of thumb is that they're evil. Ok maybe not but I've seen a couple of instances where the IF statement actually affected performance in a negative way because SQL Server was unable to select the best execution plan. Also in most cases you can embed the IF logic in a single SQL statement using the where clause like so
I know that redundancy is a pretty well known technique but I still want to mention it because you can gain massive performance increased by sprinkling a little bit of redundancy into your schemas. The basic idea is to avoid doing calculation by storing the result but course this introduces the possible of data which contradicts each other so tread carefully.
For this one we had data structures containing status history for inventory. For each item we knew when it had been set to a specific status so we just went ahead and used that existing knowledge to determine inventory status for a particular item. Not what I would call a speed demon once we started adding aprox. 150,000 items to the inventory. Simply adding the current status code to the item gave us many times the performance we were seeing and it scales well.
Views are a great way of simplifying queries but unfortunately we can't use filtering in many cases so we end up with a lot of views assembled from various table. Before we can limit the resultset SQL Server will have to give us back the entire thing which isn't a very cost effective way of doing things. We tried to create very general purpose views but that just got us in the situation I just described so instead we went with more specific views with lots more filtering inside them. Alternately simply move the results from the view into a temporary store which also gave us some great results.
In many cases we need to join a view onto itself in order to do rolling summations to determine what data is available in the future. Basically a kind of what if calculation without actually going ahead and updating anything. This is a very intensive process where the general purpose views needed to be assembled a couple of times before we could get our results back. Limiting the resultset was the only option to actually make this stuff perform at acceptable levels.
This is not a very scientific SQL optimization post nor is it intended to be. You could probably pick apart my observations and start talking about the what, the why, exclusive lock this, dirty read that, etc. but instead think of it as a more practical look at SQL optimization.I bring you this post with some examples because you might have an easier time determining what action to take if you face situation similar to the ones I just described.
For me starting out with a SQL optimization is pretty vexing because looking at the SQL statement reveals nothing. I really need to get into the meat of things before the possibilities start to reveal themselves. Once I pry open the statement points of issue blossom before my very eyes to reveal where I can get my results. The moment when I see a query speed up from minutes to mere seconds is probably the most satisfying part of my job. Quick wins don't come along very often in software development but SQL affords some of the best ones :)