Currently I’m hard at work on a project where we use Linq to SQL quite a lot. It has led to some interesting patterns and anti-patterns which I want to share in this article. The idea is to show some principals that me and others on the team have encountered and what you can do to prevent the same errors from happening when you use Linq to SQL in a real-world application.
Using Linq to SQL objects in datacontracts
Building services that expose certain domain types requires a lot of code most of the time. You need to define datacontracts for quite a lot of entities that, when you use Linq to SQL reside inside your DBML too.
The logical choice would be to place the DBML at a location that is accessible to the consumers of the service. Mind you, this only works when using conceptual services without employing techniques like WCF. It does save you quite a bit of code and time, but it’s what you get back that will consume the most time after that.
The main concern with exposing Linq to SQL objects in a contract is the fact that the datacontext is also exposed and will be used by others in means that you never thought off. It’s not a matter if other developers will use it, but rather a matter of when. When this happens the encapsulation is gone and you will end up with errors and security leaks that take quite a bit of time to track down, since you don’t know that people went through the datacontext directly instead of your service that is secured.
Static data contexts
DataContexts in Linq to SQL are IDisposable, meaning that they probably use unmanaged resources or resources that shouldn’t be kept in memory for a longer period of time. However if you start using deferred loading you may notice that it is smarter to keep the datacontext alive for as long as the front-end is using the objects retrieved from the datacontext. This makes working with the loaded objects and related objects easier, as the related objects are loaded when you need them. Load a single invoice and present the customer data and the invoice items as you go. It is what Linq to SQL is made for right?
While keeping datacontexts alive longer may help you solve issues with deferred loading it also causes a number of other problems:
- Memory leaks
The garbage collector will not collect static datacontext instances. Thus keeping them alive even after a controller is no longer needed and increasing the memory footprint of your application over time. 4GB of memory in a computer is plenty, but not when you have one of these in there 😉
- Unexpected inserts/updates/deletes
When you have a static datacontext, it is possible to observe some pretty strange behavior over time. Spontanious insert, update and delete actions when you don’t expect them at the moment you call datacontext.SubmitChanges(). I have experienced this first hand earlier when writing about repository patterns in ADO.NET entity framework. You may well end up with an unmanagable situation with transactions that should not cross eachother, crossing eachother.
The data context was originally meant to be used in a unit of work pattern, as described on MSDN.
Multiple developers, multiple connectionstrings
We have had quite a bit of "fun" about this one. On our team we are with four developers working together and we sometimes have to fiddle with DBML files that others have worked on before. This should work out OK, since we all have the same database.
Unfortunatly, Microsoft made the designer in such a way that everytime the connectionstring in the DBML is not the same as for the objects you are dropping on the designer, it will be replaced by a new one. The old one doesn’t get deleted, instead it is kept in the settings of your application. This has caused us quite a bit of trouble, since builds were failing because of wrong connectionstrings in the designer.
One step in the process of solving this issue is creating datacontexts in code and specify a connectionstring for it as a parameter for the constructor. This controls the connectionstring setting at runtime and prevents unexplainable errors from occurring.
The second step is a bit trickier and involves creating aliases for SQL server. You can define an alias for a SQL server through the SQL server management tool that comes with SQL server 2005. You can do the same for SQL server 2000 through the client configuration utility. Use this alias to connect to the SQL server using the same login scheme (Integrated security or username and password) and you should notice that the designer will no longer show those nasty dialogs telling you that it wants to replace the connectionstring.
Lazy loading or deferred loading always made the hairs in my neck stand up straight. It’s a feature that I don’t want or need. This is simply because it causes more trouble than it makes up for. One of the problems that comes with deferred loading in Linq to SQL is the famous error message: "DataContext accessed after Dispose..". This can happen any time after you retrieved data from the datacontext. It most probably happens when serializing data in the case of WCF or in a frontend when the datacontext no longer exists.
You can prevent this error from happening by controlling what is loaded, this however is not an easy task. You need to be absolutely sure that you don’t forget to preload objects that are required at the other end, otherwise you will end up having quite a bit of trouble. Also make sure you don’t model relations that you don’t need to access directly through code e.g. you are using Invoice.CustomerID instead of Invoice.Customer. This prevents developers from accessing stuff that isn’t there.
What I really wanted is to disable Deferred loading all together, but this causes a new and far more interesting problem. When I retrieve a single object with a lot of relations all the related objects will get loaded too. This is performance-wise not really what you want, so deferred loading may look optional, but actually isn’t.
You can prevent deferred loading problems completely when you do not expose types generated through a DBML in your service. So for people that do take the extra time for mapping code and writing data contracts, this should be no problem at all.
Merging your DBML
The DBML file is an XML file describing the mapping between your database and the generated C# code. When working on a team this file is sometimes the victim of merge actions. This may sound harmless when done right, however I have seen quite a lot of problems during development after a DBML was merged. The error that you may receive is an InvalidCastException, casting an object to itself. The cause is unknown, but not less annoying.
The solution is both silly and time consuming. The only way to resolve the InvalidCastException is to remove the offending object and drop it back on the designer from the database. Not really what you want if you are creating larger DBML files.
So for those sharing DBML files among developers: Lock the DBML file and this pitfall belongs to the past 😉
Stored procedures versus LINQ queries
Crazy as I am, I decided that a search operation was best done by generating a dynamic LINQ query in code. So I wrote typical code like the following:
1: IQueryable<A> items = context.Items.AsQueryable();
2: items = items.Where(item => item.B == b);
3: items = items.Where(item => item.C == c);
5: if (d != null)
7: items = items.Where(item => item.D >= d);
10: Collection<A> result = new Collection<A>(items.ToList());
Looks great, but actually isn’t. I forgot one little thing: The last operation (which isn’t here) was a delegate that did some complex calculation on the items and couldn’t be translated to an expression tree compatible with SQL. The result was that I did a ToList() on the sequence and a Where() after that. As soon as I performed the ToList() the whole thing just stopped performing, because I had to transfer a few thousand records over to memory and iterate over them.
That was a moment where I had to write: "Note to self: Stored procedures aren’t dead!". Stored procedures are way better at complex queries and you almost never need C# to perform that complex query you are writing. Heck, I never needed .NET assemblies in my databases before, so why would I need Linq to SQL to perform that complex selection on the database?
Programming hasn’t become any easier with Linq to SQL, there is still a lot to think about when working with this new technology and some things have become even harder than ever before. On the other hand, you have your data-access organised in no time, so you have more time to think about the hard parts like deferred loading entities and writing complex business logic.
I hope that this article helps other developers to overcome the challenges of working with Linq to SQL. If you have more tips and notes on Linq to SQL, feel free to post them as a comment to this post. I would love to hear them.