If you use WCF Data Services, you’ll probably know something about the OData protocol. For example, the OData protocol supports two kinds of updates:
- PUT requests: An entity is replaced with the received entity data. This means that all the entity data must be transferred over the wire.
- MERGE requests: Only the properties that are available in the request are updated on the server. Not all the properties of the entity have to be present in the request, only the ones that need to be updated.
The WCF Data Services client library uses MERGE by default, so you might think that the updates over the wire are optimized to only send the properties that have actually changed. Sadly, this isn’t the case and a lot of users aren’t aware of this. In this post I’m going to provide a solution which can really cut down on the bandwidth used by WCF Data Services when sending updates to the server.
Let’s take a look at a sample database:
It’s a small database but it illustrates the problem quite well. Now let’s create an Entity Framework Data Model for this database:
Now create a WCF Data Service for it and enable access to the entity sets:
This all leads to the following project:
Now we can add a client project, add a service reference to our service and test it:
This should work, and if we take a look at the database the data should be correctly updated. But in a lot of scenarios (ASP.NET, stateless services) you only get the updated data along with an id, and it’s not acceptable to first send a request to retrieve the entity and then update it. Let’s change our code to simulate this:
When running the above code you will get an exception:
When inspecting the SQL that was executed, the cause of the exception becomes clear:
The Entity Framework tries to update all the columns, including the OwnerId column. The problem is that this property has the value 0, since I left it at its default value after constructing the new Pet object in the client. When we take a look with Fiddler, this is the data that has been transferred over the wire:
On line 1 you can see that this is a MERGE request, but on lines 23-28 it becomes clear that the WCF Data Services client library always sends all the properties in the request, even when using a MERGE and updating only one property. This means that a MERGE essentially becomes the same as a PUT request, which of course isn’t ideal, and certainly not when updating large objects. Note that this was also the case in the first example, where I retrieved the entity before updating it, but it didn’t throw an exception because all the properties had their database values.
The culprit in this example is the UpdateObject() method in the DataServiceContext class. Without it, we can’t notify the context that an object has been modified. With it, the whole object becomes modified. It really is an all-or-nothing story. To make matters worse, the client library doesn’t have a lot of hooks which let us customize the serialization process or the code generation process. I decided to model my solution on the one used in the Entity Framework. In the Entity Framework the ObjectContext has an ObjectStateManager, which keeps track of the entities that have been modified. It does this with ObjectStateEntry objects. Each ObjectStateEntry keeps track of which properties have been modified for one entity. Here is an overview of the solution:
The role of the ObjectStateEntry is played by the UpdatedPropertiesEntry class, the role of the ObjectStateManager by the UpdatedPropertiesManager class. We’ll cover the other classes later, but let’s begin with the UpdatedPropertiesEntry class:
The responsibility of this class is to keep track of which properties of an entity have been modified. The entity is supplied in the constructor of this class. The class has a lot of methods, but the most interesting one is the SetPropertyModified method on line 43. This method accepts a generic Expression of Func. I decided to use this so that the user of my API can supply a lambda expression to indicate which property has been modified. Modified properties are instances of two classes, SimpleProperty or ComplexProperty (comparable to scalar properties and complex properties in the Entity Framework), which both inherit from the abstract Property class. A SimpleProperty is not much more than a wrapper around a PropertyInfo object; a ComplexProperty is a wrapper around a collection of PropertyInfo objects. Each property in a ComplexProperty can be modified independently.
The heavy lifting is done in the ConvertExpressionToProperty method of the PropertyUtil class. This class defines a couple of convenience methods that convert a lambda expression or a property path string to the correct Property object: a SimpleProperty if the expression or path points directly at a property of the entity, and a ComplexProperty if it goes through a complex property first (for a property path string, that is when it contains a ‘.’).
So let’s take a look at the ConvertExpressionToProperty() method on line 9:
On line 12 it turns the supplied lambda expression into one PropertyInfo object if it’s a simple expression, or into two PropertyInfo objects if the expression points to a property of a complex property of an entity. The convert methods on lines 16 and 21 are quite easy and are defined on lines 47 and 62. The real work is done in the ConvertExpressionToPropertyInfos() method, which is defined on line 68.
It first does a couple of checks. The expression body is converted to a MemberExpression; if it’s not a MemberExpression, an exception is thrown on line 79. On line 84 it retrieves the MemberInfo object of the MemberExpression and casts it to a PropertyInfo object. If it’s not a property, again an exception is thrown. Remember that an expression can point to a property of a complex property, for example: (Person p) => p.Address.ZipCode. The property retrieved on line 84 is a PropertyInfo object that describes the ZipCode property. On line 90 we determine whether the expression before ZipCode (Address.) accesses the lambda parameter directly. In this case it doesn’t, since ZipCode is a property of Address, not of Person. When it doesn’t, this must mean that a complex property is accessed and we can extract another PropertyInfo object, in this case one which describes the Address property. This is done on line 102. Finally the two PropertyInfo objects are wrapped in an object and returned. Note that I’ve defined a similar method on line 113 which accepts a property path string to indicate a modified property.
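The walk over the expression tree described above can be sketched roughly as follows. This is a minimal illustration and not the library’s actual ConvertExpressionToPropertyInfos() method: the ExpressionPaths class, the ToPropertyInfos name and the Person/Address sample types are hypothetical, and the sketch follows any nesting depth instead of stopping at two levels.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical sample types, used only to exercise the sketch.
public class Address { public string ZipCode { get; set; } }
public class Person
{
    public string Name { get; set; }
    public Address Address { get; set; }
}

public static class ExpressionPaths
{
    // Returns the PropertyInfo chain from the entity down to the accessed property,
    // e.g. p => p.Address.ZipCode yields [Address, ZipCode].
    public static IList<PropertyInfo> ToPropertyInfos<TEntity, TResult>(
        Expression<Func<TEntity, TResult>> expression)
    {
        var chain = new List<PropertyInfo>();
        Expression current = expression.Body;

        // Walk from the outermost member access back towards the lambda parameter.
        while (!(current is ParameterExpression))
        {
            var member = current as MemberExpression;
            if (member == null)
                throw new ArgumentException("Only property access expressions are supported.");

            var property = member.Member as PropertyInfo;
            if (property == null)
                throw new ArgumentException("'" + member.Member.Name + "' is not a property.");

            chain.Insert(0, property);   // the outermost property ends up last
            current = member.Expression; // step one level closer to the parameter
        }
        return chain;
    }
}

public static class ExpressionPathsDemo
{
    public static void Main()
    {
        var simple = ExpressionPaths.ToPropertyInfos((Person p) => p.Name);
        var complex = ExpressionPaths.ToPropertyInfos((Person p) => p.Address.ZipCode);

        Console.WriteLine(string.Join(".", simple.Select(x => x.Name)));  // Name
        Console.WriteLine(string.Join(".", complex.Select(x => x.Name))); // Address.ZipCode
    }
}
```

A chain of length one corresponds to what the post calls a SimpleProperty, a chain of length two to a property inside a ComplexProperty.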
So we have an UpdatedPropertiesEntry object, which holds Property (SimpleProperty or ComplexProperty) objects per entity. Users can indicate to an UpdatedPropertiesEntry object whether a property has changed with a strongly typed lambda expression or a property path string. Next to that, the UpdatedPropertiesEntry object has methods to “UnModify” a property and to check whether a property was modified. This can always be a simple property of an entity, or a simple property of a complex property of an entity. All the methods have overloads to support both a strongly typed lambda expression as well as a property path string.
In order for this all to work, there must be an object which keeps track of all the UpdatedPropertiesEntry objects for a DataServiceContext, one per entity. This is the responsibility of the UpdatedPropertiesManager class (I’ve included the source of this class as a link, since my blog engine can’t handle it within this post, so open it in a new tab :)).
This class is essentially a wrapper around a dictionary. This dictionary associates each entity with an UpdatedPropertiesEntry object. Furthermore, this class has a couple of convenience methods which wrap methods of the UpdatedPropertiesEntry class:
Most of these methods do nothing more than retrieve the UpdatedPropertiesEntry object for an entity and delegate to it. The SetPropertyModified method on line 104, for example, checks whether an entry for the entity exists, creates it if it doesn’t, and then calls the SetPropertyModified method on this entry. The methods in this class also support lambda expressions and property path strings.
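To make the "wrapper around a dictionary" idea concrete, here is a stripped-down sketch. It is not the real UpdatedPropertiesManager: the ModifiedPropertiesTracker name is hypothetical, it stores plain property path strings instead of UpdatedPropertiesEntry objects, and keying the dictionary by object reference is an assumption of this sketch rather than something the post states.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical, simplified stand-in for the manager: entity -> set of modified property paths.
public class ModifiedPropertiesTracker
{
    // Assumption: entities are tracked by reference, so an overridden Equals() on an
    // entity class cannot make two different instances share one entry.
    private readonly Dictionary<object, HashSet<string>> entries =
        new Dictionary<object, HashSet<string>>(new ReferenceComparer());

    public void SetPropertyModified(object entity, string propertyPath)
    {
        HashSet<string> entry;
        if (!entries.TryGetValue(entity, out entry))
        {
            // No entry yet for this entity: create it on first use.
            entry = new HashSet<string>(StringComparer.Ordinal);
            entries.Add(entity, entry);
        }
        entry.Add(propertyPath);
    }

    public bool IsPropertyModified(object entity, string propertyPath)
    {
        HashSet<string> entry;
        return entries.TryGetValue(entity, out entry) && entry.Contains(propertyPath);
    }

    // Forgets everything that was recorded for one entity.
    public void Clear(object entity)
    {
        entries.Remove(entity);
    }

    private sealed class ReferenceComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y)
        {
            return ReferenceEquals(x, y);
        }

        int IEqualityComparer<object>.GetHashCode(object obj)
        {
            return RuntimeHelpers.GetHashCode(obj);
        }
    }
}

public static class TrackerDemo
{
    public static void Main()
    {
        var tracker = new ModifiedPropertiesTracker();
        var pet = new object(); // any entity instance would do

        tracker.SetPropertyModified(pet, "PetName");
        Console.WriteLine(tracker.IsPropertyModified(pet, "PetName")); // True
        Console.WriteLine(tracker.IsPropertyModified(pet, "OwnerId")); // False
    }
}
```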
Gluing everything together
Okay, we’ve got an UpdatedPropertiesEntry, which holds the properties that have been modified for an entity, and we’ve got an UpdatedPropertiesManager, which keeps track of all the entities and their corresponding UpdatedPropertiesEntry objects. Now we need to make sure that the DataServiceContext uses this UpdatedPropertiesManager. I wanted a solution that played nicely with the code that is generated by Visual Studio when you add a service reference to a WCF Data Service. The only way you can extend the generated code is by creating another partial DataServiceContext class. I decided that if you want to use my API, the DataServiceContext must implement an interface:
You can let your client side generated DataServiceContext class implement this interface as follows:
Now if only there were a way to automatically add functionality to a class as soon as it implements a certain interface... Oh, but there is! These are of course extension methods, and they are defined in the DataServiceExtensions class:
Most of these methods are wrappers around the methods in the UpdatedPropertiesManager class. Take a look at the SetPropertyModified() method on line 63, for example. What you should notice is that it has three generic parameters:
- T: This is needed because my methods only apply to DataServiceContext classes which implement my IDataServiceWithPropertiesManager interface. You can only specify these two constraints with generics.
- TEntity: This is the type of the entity that must be supplied and is inferred from the second argument (the first when you call the method using Extension method syntax).
- TResult: The return type of the supplied lambda expression, which must be generic since I don’t know which type of property it’s going to point at.
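The three generic parameters and the double constraint can be illustrated with the following self-contained sketch. Everything in it is a stand-in: ContextBase takes the place of DataServiceContext so the code compiles without the WCF Data Services client assembly, and ITracksModifiedProperties is a hypothetical interface whose single member is an assumption of this sketch (the members of IDataServiceWithPropertiesManager are not shown in this post).

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Stand-in for System.Data.Services.Client.DataServiceContext.
public class ContextBase { }

// Hypothetical interface; its single member is an assumption of this sketch.
public interface ITracksModifiedProperties
{
    IDictionary<object, ISet<string>> ModifiedProperties { get; }
}

public static class ContextExtensions
{
    // T       : the context type, which must derive from the base class AND implement the interface.
    // TEntity : inferred from the entity argument.
    // TResult : inferred from the return type of the lambda.
    public static void SetPropertyModified<T, TEntity, TResult>(
        this T context, TEntity entity, Expression<Func<TEntity, TResult>> property)
        where T : ContextBase, ITracksModifiedProperties
        where TEntity : class
    {
        var member = property.Body as MemberExpression;
        if (member == null)
            throw new ArgumentException("Expected a property access.", "property");

        ISet<string> names;
        if (!context.ModifiedProperties.TryGetValue(entity, out names))
        {
            names = new HashSet<string>();
            context.ModifiedProperties.Add(entity, names);
        }
        names.Add(member.Member.Name);
    }
}

// Sample entity with the properties mentioned in this post.
public class Pet
{
    public int Id { get; set; }
    public string PetName { get; set; }
    public int OwnerId { get; set; }
}

// Plays the role of the partial class you would write next to the generated context.
public class PetsContext : ContextBase, ITracksModifiedProperties
{
    private readonly IDictionary<object, ISet<string>> modified =
        new Dictionary<object, ISet<string>>();

    public IDictionary<object, ISet<string>> ModifiedProperties
    {
        get { return modified; }
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        var context = new PetsContext();
        var pet = new Pet { Id = 1, PetName = "soes" };

        // All three type arguments are inferred by the compiler.
        context.SetPropertyModified(pet, p => p.PetName);

        Console.WriteLine(string.Join(",", context.ModifiedProperties[pet])); // PetName
    }
}
```

The call site never spells out a type argument, which is what keeps an API with three generic parameters pleasant to use.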
Most methods in this class look a lot like the SetPropertyModified() method, so I’m not going to cover them all. There are two interesting methods left. The first is the ClearModifcation() method on line 34. This method is used to reset the state of an entity to Unchanged and is called automatically after all the modified properties have been “UnModified”. You should notice three things:
- You can only set the state of an entity to Unchanged by detaching and attaching it.
- When optimistic concurrency is used you need the ETag when attaching. You can get this from the EntityDescriptor.
- You need to know the entity set name when attaching and this isn’t always the same as the class name. I extract it from the link which is used when an entity needs to be updated, which you can also get from the EntityDescriptor of an entity.
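Extracting the entity set name from an edit link could look something like the sketch below. The EditLinks class and GetEntitySetName name are hypothetical, and the sketch assumes the link has the usual OData shape "EntitySet(key)", optionally preceded by the service root; it is not the code from the library.

```csharp
using System;

public static class EditLinks
{
    // Assumes an edit link such as "Pets(1)" or "http://host/Service.svc/Pets(1)".
    // Service roots that themselves contain '(' are not handled by this sketch.
    public static string GetEntitySetName(Uri editLink)
    {
        if (editLink == null) throw new ArgumentNullException("editLink");

        // OriginalString works for both relative and absolute URIs.
        string text = editLink.OriginalString;

        // Everything before the first '(' is the path up to and including the entity set.
        int paren = text.IndexOf('(');
        string beforeKey = paren < 0 ? text : text.Substring(0, paren);

        // The entity set name is the last path segment of that part.
        int slash = beforeKey.LastIndexOf('/');
        return beforeKey.Substring(slash + 1);
    }
}

public static class EditLinksDemo
{
    public static void Main()
    {
        Console.WriteLine(EditLinks.GetEntitySetName(new Uri("Pets(1)", UriKind.Relative))); // Pets
        Console.WriteLine(EditLinks.GetEntitySetName(
            new Uri("http://localhost/PetService.svc/Pets(2)")));                            // Pets

        // With the real client library the reset itself would then be along the lines of:
        //   context.Detach(entity);
        //   context.AttachTo(entitySetName, entity, etag);
        // where the ETag and edit link come from the entity's EntityDescriptor.
    }
}
```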
The second important method is OptimizedSaveChanges() on line 100. This is the extension method a user of my API should call when they want to send all the changes to the database without sending all the properties of an entity, only the modified ones. This method attaches an event handler to the WritingEntity event of the DataServiceContext class. Then it saves all changes. After the changes have been saved it clears out all its entities and their entries, since all the entities go back to the Unchanged state. If an exception occurs, then depending on the SaveChangesOptions value it extracts information from the response about which entities were saved and which weren’t. The ones that were saved have their modified properties cleared. Finally it removes the event handler for the WritingEntity event. As you’ve probably guessed, all the heavy lifting is done in the event handler for the WritingEntity event.
The WritingEntity event handler starts on line 154. The interesting parts:
- Line 160: Optimizations will only occur when an entity is being updated. The WritingEntity event also fires for inserts, so a check is needed.
- Line 166: This is the namespace declaration Microsoft uses in OData for metadata. If you scroll back to the part where I showed you what was transferred over the wire with Fiddler, you can find this declaration on line 14 with the “m” prefix.
- Line 168: I extract the “properties” element. You can find this element in the “Fiddlered” request on line 23.
- Lines 171-177: I iterate over all the child elements of the properties element previously extracted. These are all the properties of an entity and not only the modified ones. I check with the help of my UpdatedPropertiesManager whether a property has been modified, if not, I remove it from the XML.
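The pruning step from the list above can be sketched on its own with LINQ to XML. This is an illustration under assumptions, not the handler from the library: PayloadTrimmer and RemoveUnmodifiedProperties are hypothetical names, the set of modified property names is passed in directly instead of coming from the UpdatedPropertiesManager, and the sample entry is hand-written to resemble an OData Atom payload. In the real handler the XElement would come from the Data property of the WritingEntity event arguments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PayloadTrimmer
{
    // The OData metadata namespace (the "m" prefix in the Atom payload).
    private static readonly XNamespace Metadata =
        "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";

    // Removes every child of <m:properties> whose name is not in the modified set.
    public static void RemoveUnmodifiedProperties(XElement entry, ICollection<string> modified)
    {
        XElement properties = entry.Descendants(Metadata + "properties").FirstOrDefault();
        if (properties == null) return;

        // ToList() takes a snapshot, so removing elements does not disturb the iteration.
        foreach (XElement property in properties.Elements().ToList())
        {
            if (!modified.Contains(property.Name.LocalName))
                property.Remove();
        }
    }
}

public static class PayloadTrimmerDemo
{
    public static void Main()
    {
        // Hand-written sample that resembles the entry the client library produces.
        XElement entry = XElement.Parse(
            @"<entry xmlns='http://www.w3.org/2005/Atom'
                     xmlns:d='http://schemas.microsoft.com/ado/2007/08/dataservices'
                     xmlns:m='http://schemas.microsoft.com/ado/2007/08/dataservices/metadata'>
                <content type='application/xml'>
                  <m:properties>
                    <d:Id m:type='Edm.Int32'>1</d:Id>
                    <d:PetName>soes</d:PetName>
                    <d:OwnerId m:type='Edm.Int32'>0</d:OwnerId>
                  </m:properties>
                </content>
              </entry>");

        PayloadTrimmer.RemoveUnmodifiedProperties(entry, new HashSet<string> { "PetName" });

        // Only the PetName element is left inside <m:properties>.
        Console.WriteLine(entry);
    }
}
```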
I don’t feel like this is the nicest solution, but I think it’s the only solution. The problem is that you can only change the XML that will be sent to the server after it has been created by the DataServiceContext class. I’d much rather create the correct XML in the first place, but this is not possible at the moment with the WCF Data Services client library. Also note that if a property of a complex property is modified, all the properties of the complex property are sent to the server. This is done on purpose, since if you don’t send all the properties of a complex property to the server, WCF Data Services updates the missing ones with their default .NET values. This seems to be a bug in WCF Data Services, and as soon as it is fixed I’ll update the library, since it already has full support for complex properties.
Using the DataServiceExtensions
Using my extensions is quite easy. First make sure that the generated DataServiceContext class implements my IDataServiceWithPropertiesManager interface, as shown in the second-to-last code example. After this you can use my API as follows:
Note that in the example above I’ve included a Pet to insert, just for testing. What has changed when using my API:
- Line 3: You need this using statement to bring my extension methods into scope.
- Lines 27-29: You don’t call UpdateObject() anymore, but SetPropertyModified() instead, indicating which property has been modified with a strongly typed lambda expression. You can see that one property was modified for the pet “soes” and two for “milo”.
- Line 32: You don’t call SaveChanges() anymore, but my OptimizedSaveChanges() extension method.
Easy, isn’t it? Here is the “Fiddlered” request:
This is an OData batch request as you can see on lines 12 and 13. On line 19 you see the first MERGE request for the Pet with id 1. This is Pet “soes” and she had one property changed, namely “PetName”. Now scroll to line 35. You can see that only the “PetName” property is present in the XML. On line 43 you can see the second MERGE request for Pet with id 2. This is “Milo” and he had two properties changed: “PetName” and “OwnerId”. Now scroll to lines 59 and 60. You can see that only these changed properties are present in the XML. On line 68 is the last request. This is a POST request and indicates an insert. All the properties should be present for an insert and if you take a look at lines 84-87, you can see that this is the case. This optimization can really save you a lot of bandwidth usage. Just try it with the Product table from the AdventureWorks database for example.
Well I think the library is pretty complete for a first release but I still have a couple of good ideas that would make great WCF Data Service Extensions as well:
- An ETag generator at the client.
- Extend above example with Automatic Change Tracking when objects are attached or materialized.
- A Silverlight version of above project.
- Strongly typed Expand() method.
- Methods on the DataServiceContext to call [WebInvoke] service operations.
As always, the code is posted without warranties. I’ve tested my code pretty well, with all kinds of different databases and scenarios, but I’m sure that I’ve still missed some scenarios and there may still be some bugs present.
You can find my WCF Data Services Extensions here.
And before I forget…….
Happy New Year!!!!!