Yesterday I looked into the building of Background Motion using the Composite Web Block, the Enterprise Library and putting all of the different .NET 3.x technologies together in a demonstration product named Dinner Now. Today was focused around SQL Server 2005 performance, optimisation and scalability followed by .NET language pragmatics.
Writing Applications That Make SQL Server 2005 Fly
Shu Scott presented about writing applications that make SQL Server 2005 fly, however I don’t think that name reflected the presentation all that well. The talk would have been better titled ‘Understanding The Cost Based Optimiser To Make SQL Server 2005 Fly’. None the less, Shu raised a lot of great points in her presentation and some of them I thought interesting are below:
- Don’t use a function passing in a column as a parameter within a query, such as in a
WHERE
clause. SQL Server 2005 calculates statistics for a table per column, so as soon as you use a function on the column the statistics are unusable. The off shoot of this is that SQL Server 2005 can massively under or over estimate the selectivity of a table which on a complex-ish based query can dramatically change the query plan that SQL Server will choose.
- Don’t alter an input parameter to a function or stored procedure after the procedure has started. Shu didn’t specify exactly why this is the case, however after investigating it further on the internet; it is related to the point below regarding parameter sniffing.
- Avoid using local variables within a function or procedure in
WHERE
statements. During the presentation, I didn’t get a clear understanding of why this was sub-optimal, however after some research online it is caused by the cost based optimiser having to use a reduced quality estimate for the selectivity of the table. You can avoid this problem by supplying the OPTION(RECOMPILE>
hint, use a literal instead of a local variable if possible, parameterise the query an accept the value via an input parameter.
- Use automatic statistics, if you have a requirement to not use it – disable it on a per table basis if possible as having quality statistics for your database is vital in the cost based optimiser doing its job.
- Do parameterise your queries where they are common, with a lot of reuse and are hit often. Do not parameterise queries that are ad-hoc or long running. Presumably there is no gain in parameterising a long running query as the server is already going to be spending significant time processing the query, in which case the few milliseconds the server spends generating a query plan won’t be noticed.
- Be aware of parameter sniffing, which is where SQL Server uses the values of the input values to a function/procedure to produce the query plan. This is normally a very good thing, however if the cached plan happens to represent an atypical set of input values – then it is likely that the performance of a typical query is going to be severely impacted.
- Look to utilise the
INCLUDE
keyword when creating non-clustered indexes. The INCLUDE
keyword allows you to extend past the 900 byte limit on the index key and also allows you to include previously disallowed column types within the index (ex: nvarchar(max)). This type of index is excellent for index coverage, as all columns identified are stored within the index in leaf nodes, however only the key columns enforce the index type.
- If you are unable to edit an SQL statement for some reason, consider using the plan guides. A plan guide is essentially a set of option hints for the query, however you aren’t editing the query itself to apply them. You configure the plan guides for a stored procedure, function or an individual statement and when it is matched SQL Server 2005 will automatically apply the suggested guides to the statement.
- In a similar fashion to the plan guide, there is a more complex option called
USE PLAN
which lets you supply an actual execution plan to the SQL statement, again without editing the SQL statement directly. Essentially, you extract the XML representation for the execution plan you would prefer to have execute and supply that to the SQL statement. If you have skewed data sets, this would be a good option to guarantee consistent access speed for a particular query. Using the skewed data sets as an example, it would be possible to have SQL Server cache a query plan which represents the atypical data and as such performs very poorly for the majority of the typical data. Supply the query plan to the SQL statement can avoid that happening. It is worth noting though, if you do supply a query plan you would need to revisit the stored plan periodically to make sure that it still reflects the most efficient access path for your particular data set.
Implementing Scale Out Solutions With SQL Server 2005
This presentation was about scaling SQL Server 2005 out, such that you’re able to continue adding more and more servers to the mix to distribute the load across. I had previously read the majority of the information covered, however I learned about two features named the SQL Server Service Broker and Query Notifications.
- Linked servers let you move whole sections of a database onto another server and you tell the primary server where the other data resides. Linked servers are transactionally safe, however will perform only as fast as the slowest component within the linked server group.
- Distributed Partitioned Views allows you to move subsets of a tables data across servers and uses linked servers behind the scenes to communicate between servers. A partition might be as simple as customers 1 through 100,000 in partition A and 100,001 through 200,000 in partition B and so on.
- SQL Server Shared Database (SSD) allows you to attach many servers to a single read only copy of the database, which might be a great way of increasing performance for a heavily utilised reporting server with relatively static data. Unfortunately, the servers reading from the database need to be detached to refresh the database but this could be managed in an off peak period to reduce impact.
- Snapshop Replication snapshots an entire database and replicates it into the subscribers. Snapshot replication isn’t used a lot as it’s data and resource intensive. It is most commonly used to set up a base system and then enable merge replication to bring it up to date with the publisher or to refresh an infrequently changing database periodically.
- Merge Replication tracks the changes to the data on the publishers and bundles them together, only sending the net changes when appropriate. Merge replication supports bi-directional replication and it also implements conflict resolution as well, however there is no guarantee of consistency as the changes aren’t necessarily being replicated in a near real time environment.
- Transaction Replication sends all changes to all subscribers and is transactionally safe. If there were a lot of DML taking place in a database, there would be considerable overhead for using transactional replication as a simple
UPDATE
statement which might effect 100 rows locally is sent to the subscribers as 100 independent SQL statements; in case some or all of the subscribes have additional data that the publisher does not.
- Peer To Peer (P2P) Replication is a variation of transactional replication however it requires that each peer be the master of it’s own data so as to avoid key read/write problems across servers and consistency issues. As an example, all of the Brisbane office writes its changes into server A, while Sydney writes its changes into server B. By making sure that each server ‘owns’ its respective block of data, it is then possible and safe to replicate data between all peers safely.
- SQL Server Service Broker (SSB) provides a reliable asynchronous messaging system to SQL Server 2005, that allows you to pass messages between application either within the same database, same server or distributed over many servers and databases. The service broker doesn’t do the work for you, however it does provide the plumbing to make developing your system a whole lot simpler. Using the service broker, it would even be possible to send messages from one service into another service on a different machine; might be useful to help keep different pieces of information up to date in a vastly distributed database set up when replication doesn’t quite suit the purpose.
- Query Notification, as it suggests is a notification system which is used to notify clients or caches that they need to update certain data. Once again, the query notification doesn’t do the updating – it merely provides the event to tell you do perform your own action. The Query Notification engine utilises the service broker under the hood.
- Data Dependent Routing isn’t a SQL Server feature but more of an architectural design pattern. Using Data Dependent Routing, the client (whatever it is), knows a little bit about the storage system and optimistically seeks out the data store which is likely to return the best performance.
.NET Programming Language Pragmatics
Joel Pobar presentated on .NET programming language pragmatics and contrasted some of the recent developments in that space. At the start of the talk, he pointed out that there are generally three types of programming languages – static, dynamic and functional. The original version of the .NET Common Language Runtime was based around a static environment and has recently been enhanced to support functional programming and more recently dynamic.
The dynamic programming languages are handled through a new component, the Dynamic Language Runtime which sits on top of the existing CLR. The new Dynamic Language Runtime has allowed people to build IronPython and IronRuby, which are implementations of those particular languages sitting over the top of the .NET CLR.
Outside of the fact that it means you’ll be able to run a Python script in the native Python runtime or inside of the .NET DLR, which is just plain cool; the biggest picture here is that the .NET CLR is being enhanced and soon we’ll have a new super language (LISP advocates, back in your boxes!) which will support all of the current types of programming languages at once.
The presentation was fantastic and it is exciting to hear Joel present as he is so passionate about the field. In fact, I would go as far to say that his enthusiasm for his work is infectious; it is hard to walk away from one of his presentations and not have at least some of his excitement and enthusiasm rubbed off on you.
I’ve heard on the grape vine that Joel might be able to present at the one of the next Gold Coast .NET User Groups, can’t wait if he does!