# Friday, January 23, 2009
« Workflow 4.0 Services and “Invalid... | Main | BlobExplorer Now Supports Blob Metadata »

Sometimes you have to wonder if this subject will ever go away …

A few weeks ago I posted on using OO constructs in your DataContracts. It’s one of those things that is understandable when .NET developers first start building message based systems. Another issue that raises its head over and over again goes along the lines of “I’m trying to send a DataSet back to my client and it just isn’t working properly”. This reminds me of the old joke:

Patient: Doctor, doctor it hurts when I do this (patient raises his arm into a strange position over his head)
Doctor: Well don’t do it then

So what is the issue with using DataSets as parameters or return types in your service operations?

Lets start off with interoperability – or the total lack thereof. If we are talking about interoperability we have to think about what goes into the WSDL for the DataSet – after all it is a .NET type. In fact DataSets, by default serialize as XML so surely it must be ok! Here’s what a DataSet looks like in terms of XML Schema in the WSDL

  <xs:element ref="xs:schema" /> 
  <xs:any />

In other words I’m going to send you some … XML – you work out what do with it. But hold on – if I use Add Service Reference, it *knows* its a DataSet so maybe it is ok. Well WCF cheats; just above that sequence is another piece of XML

       <ActualType Name="DataSet" Namespace="http://schemas.datacontract.org/2004/07/System.Data"
="http://schemas.microsoft.com/2003/10/Serialization/" /> 

So WCF cheats by putting an annotation only it understands into the schema so it knows to use a DataSet. If you really do want to pass back arbitrary XML as part of a message then use an XElement.

So how about if I have WCF on both ends of the wire? Well then you’ve picked a really inefficient way to transfer around the data. You have to remember how highly functional a DataSet actually is. Its not just the data in the tables that support that functionality, there is also : change tracking data to keep track of what rows have been added, updated and removed since the DataSet was filled; relationship data between tables; a schema  describing itself. DataSets are there to support disconnected processing of tabular data, not as a general purpose data transfer mechanism.

Then you may say – “hey we’re running on an intranet – the extra data is unimportant”. So the final issue you get with a DataSet is tight coupling of the service and the consumer. Changes to the structure of the data on one side of the wire cascade to the other side to someone who may not be expecting the changes. Admittedly not all changes will be breaking ones but are you sure you know which ones will be and which ones won’t. As long as the data you actually want to pass isn’t changing why are you inflicting this instability on the other party in the message exchange. The likelihood that you will have to make unnecessary changes to, say, the client when the service changes is increased with DataSets 

So what am I suggesting to do instead? Instead model the data that you do want to pass around using DataContract (or XElement if you truly want to be able to pass untyped XML). Does this mean you have to translate the data from a DataSet to this DataContract when you want to send it? Yes it does, but that code can be isolated in a single place. When you receive the data as a DataContract and want to process it as a DataSet, does this mean you have to recreate a DataSet programmatically? Yes it does, but again you can isolate this code in a single place.

So what does doing this actually buy you if you do that work? You get something which is potentially interoperable, that only passes the required data across the wire and that decouples the service and the consumer.