Wednesday, February 06, 2008

For the third year running I'm going to be speaking at BearPark's DevWeek conference. This year I actually managed to submit some breakout session titles on time too.

Wednesday 12th March

11:30 A developer’s guide to Windows Workflow Foundation
There are many challenges to writing software. Not least of these are lack of transparency of code and creating software that can execute correctly in the face of process or machine restart. Windows Workflow Foundation (WF) introduces a new way of writing software that solves these problems and more. This session explains what WF brings to your applications and explains how it works. Along the way we will see the major features of WF that make it a very powerful tool in your toolkit, removing the need for you to write a lot of complex plumbing.

14:00 Creating robust, long-running Workflows
Long-running processes have unique requirements in that they need to maintain state over process restart; Windows Workflow Foundation (WF) enables this with its persistence infrastructure. However, there are issues around hosting and activity development that require attention for long-running workflows to be robust. This session looks at the design of the workflow persistence service, issues around hosting, and creating full-featured asynchronous activities. This session assumes some familiarity with WF.

16:00 Cross my palm with Silver – creating workflow-based WCF services
There are very good reasons for using a workflow to implement a WCF service: workflows can provide a clear platform for service composition (using a number of building block services to generate a functionally richer service); workflows can manage long-running stateful services without you having to write your own plumbing to achieve this. This session introduces the new Visual Studio 2008 Workflow Services. This technology, previously known as “Silver”, provides a relatively seamless integration between WF and WCF, enabling the service developer to concentrate on the application functionality rather than the plumbing. This session assumes some familiarity with WF and WCF.

Friday 14th March - Postcon

A day of connected systems with Visual Studio 2008
Most businesses find themselves building applications that use two or more machines working together to produce their functionality. One of the challenges in this world, apart from the actual business logic being implemented, is connecting the different parts of the application in a way that best fits the environment the machines are placed in – are there firewalls in place? Are some parts of the application written on different platforms such as Java? Do the different parts of the application have to maintain their state over machine restart? Late 2006 saw Microsoft release WCF and WF to tackle some of these challenges. However, parts of the story were left untold – especially the integration between the two.
Visual Studio 2008 introduces a number of new features for writing service based software. Its features build on the libraries released as part of .NET 3.0, providing an integration layer between the two. In this pre/post conference session we start at the basics of how WCF and WF work and then look at the various integration technologies introduced in Visual Studio 2008.

So if you're attending DevWeek I hope to see you there.

Wednesday, February 06, 2008 2:16:10 PM (GMT Standard Time, UTC+00:00)
 Monday, February 04, 2008

One of the most powerful features of the Windows Workflow Foundation is its ability to automate state management of long running processes using a persistence service. Microsoft supply a persistence service out of the box - the SqlWorkflowPersistenceService. Unsurprisingly this persists state in SQL Server (or SQL Express).

Once you have a persistence service more possibilities open up: different applications can progress the same workflow instance over time; multiple hosts can process workflows in a scale out model. This second feature needs a little investigation - there is a gotcha hiding there you need to be aware of.

The issue is how to stop more than one host picking up the same instance of a workflow and processing it at the same time - imagine the workflow transferred $10,000,000 from one account to another; you'd hardly want this happening twice. So if the possibility exists for multiple hosts to see the same persistence store, the persistence service must be able to ensure only one host is executing a workflow at any one time.

The SqlWorkflowPersistenceService handles multiple concurrent hosts by using the concept of workflow ownership - the guid of a host (created randomly by the persistence service constructor) is stamped against a workflow that it is actively executing (not in an idle or completed state). Now the question comes "what if the host dies while executing a workflow?". This is what the ownership timeout is for. You set the ownership timeout in the constructor of the SqlWorkflowPersistenceService.

SqlWorkflowPersistenceService sql =
    new SqlWorkflowPersistenceService(CONN,
                                      true,
                                      TimeSpan.FromSeconds(10),
                                      TimeSpan.FromSeconds(5));

workflowRuntime.AddService(sql);

Here the third parameter is the ownership timeout: how long a host is allowed to run a workflow before persistence occurs. If the host takes longer than this then it will get an error when it attempts to persist. The fourth parameter is the polling interval: how often the persistence service checks for expired timers.

Now the idea is that if a host dies then its lock will time out and another host can pick up the work. There is a problem, however, in the implementation. The persistence service only looks for expired ownership locks when it first starts - not when it polls for expired timers. Therefore, a workflow instance whose host has died mid-processing will only recover if a new host instance starts after the timeout has occurred.
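To make the ownership idea concrete, here is a toy in-memory model of it. This is only a sketch of the concept: the OwnershipTable class and its TryAcquire method are hypothetical names invented for illustration and are not how the SqlWorkflowPersistenceService is implemented (that work is done in SQL Server). The clock is passed in so the behaviour is easy to follow.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: models "a host guid stamped against a
// workflow instance, valid until an ownership timeout expires".
class OwnershipTable
{
    class Claim
    {
        public Guid Owner;
        public DateTime OwnedUntil;
    }

    readonly Dictionary<Guid, Claim> claims = new Dictionary<Guid, Claim>();
    readonly object guard = new object();

    // Returns true if hostId now owns the instance. A claim held by another
    // host blocks the caller only while it has not yet expired.
    public bool TryAcquire(Guid instanceId, Guid hostId, DateTime now, TimeSpan duration)
    {
        lock (guard)
        {
            Claim current;
            if (claims.TryGetValue(instanceId, out current)
                && current.Owner != hostId
                && current.OwnedUntil > now)
            {
                return false; // still owned by a different, live claim
            }

            claims[instanceId] = new Claim { Owner = hostId, OwnedUntil = now + duration };
            return true;
        }
    }
}
```

With a 10 second timeout, a second host asking for the instance 5 seconds after the first is refused, but succeeds once the timeout has passed. The catch described above is that something has to ask again after the timeout; in this model nothing happens until TryAcquire is called.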

So how can you make this more robust? Well we need a way to explicitly load the workflows that have had their ownership expire - unfortunately there is no exposed method to do this on the SqlWorkflowPersistenceService. Instead we have to get all the workflows, catch the exception if we load a locked one and unload any that aren't ready to run. Here is an example:

TimerCallback cb = delegate
{
  // get all the persisted workflows
  foreach (var item in sql.GetAllWorkflows())
  {
    try
    {
      // load the workflow - this will throw a WorkflowOwnershipException if
      // the workflow is currently owned
      WorkflowInstance inst = workflowRuntime.GetWorkflow(item.WorkflowInstanceId);

      // Unload workflow if it's still idle on a timer
      DateTime timerExpiry = (DateTime)item.NextTimerExpiration;
      if (timerExpiry > DateTime.Now)
      {
        inst.Unload();
      }
    }
    catch (WorkflowOwnershipException)
    {
      // Tried to load a workflow locked by another host - ignore it
    }
  }
};

Timer t = new Timer(cb, null, 0, 1000);

So this code will attempt to load workflow instances with expired locks every second. Is it a hack? Yes. But without one of two things in the SqlWorkflowPersistenceService it's the sort of code you have to write to pick up unlocked workflow instances robustly. The workflow team could:

  • Check for expired ownership locks in the stored procedure that checks for timer expiration
  • Provide a method on the persistence service that explicitly allows the loading of unlocked workflow instances

Maybe in the next version :-)

Monday, February 04, 2008 10:07:39 AM (GMT Standard Time, UTC+00:00)
 Tuesday, January 29, 2008

We, at DevelopMentor, are very excited to be running our SilverLight 2.0 course, Essential SilverLight 2.0 in the UK on 17th March - all the new bits straight from Mix'08! Course author, Dave Wheeler, will be delivering the course at our London facility so if you are interested in the next generation of Rich Internet Applications register now.

Tuesday, January 29, 2008 10:02:35 AM (GMT Standard Time, UTC+00:00)
 Friday, January 04, 2008

Christian noticed a bug in my Async Activity base class that I talked about here and here. The latest code is here

Friday, January 04, 2008 4:20:57 PM (GMT Standard Time, UTC+00:00)
 Friday, December 14, 2007

When working with WF I always find it useful having a BizTalk background. Issues that are reasonably well known in BizTalk are not immediately apparent in Workflow unless you know they are part and parcel of that style of programming.

One issue in BizTalk is where you are waiting for a number of messages to start off some processing and you don't know in which order they are going to arrive. In this situation you use a Parallel Shape and put activating receives in the branches initializing the same correlation set. The orchestration engine understands what you are trying to do and creates a convoy for the remaining messages when the first one arrives. This is known as a Concurrent Parallel Receive. You don't leave the parallel shape until all the messages have arrived, and the convoy ensures that the remaining messages are routed to the same orchestration instance.

There is, however, an inherent race condition in this architecture in that: if two messages arrive simultaneously, due to the asynchronous nature of BizTalk, both messages could be processed before the orchestration engine has a chance to set up the convoy infrastructure. We will end up with two instances of the orchestration both waiting for messages that will never arrive. All you can do is put timeouts in place to ensure your orchestrations can recover from that situation and flag the fact that the messages require resubmitting.

With Workflow Services we essentially have the same issue waiting for us. Let's set up the workflow ...

If we call this from a client as follows:

PingClient proxy = new PingClient();
proxy.Ping1();
proxy.Ping2();

then everything works ok - in fact it works irrespective of the order the operations are called in, that's the nature of this pattern. It works because by the time we make the second call, the first has completed and the context is now cached on the proxy.

But let's make the client a bit more complex:

static IDictionary<string, string> ctx = null;

static void Main(string[] args)
{
  PingClient proxy = new PingClient();
  IContextManager mgr = ((IChannel)proxy.InnerChannel).GetProperty<IContextManager>();

  Thread t = new Thread(DoIt);
  t.Start();

  if (ctx != null)
  {
    mgr.SetContext(ctx);
  }

  proxy.Ping1();
  ctx = mgr.GetContext();

  Console.WriteLine("press enter to exit");
  Console.ReadLine();
}

static void DoIt()
{
  PingClient proxy = new PingClient();
  IContextManager mgr = ((IChannel)proxy.InnerChannel).GetProperty<IContextManager>();

  if (ctx != null)
  {
    mgr.SetContext(ctx);
  }

  proxy.Ping2();
  ctx = mgr.GetContext();
}

Here we make the two calls on different proxies on different threads. A successful call stores the context in the static ctx field. Now a proxy will use the context if it has already been set, otherwise it assumes that it is the first call. So here the race condition has made its way all the way back to the client. There are things we could do about this in the client code (taking a lock out while we're making the call so we complete one and store the context before the other checks to see if the context is null), however, that really isn't the point. The messages may come from two separate client applications which both check a database for the context. Again we have an inherent race condition that we need to put architecture in place to detect and recover from. It would be nice to put the ParallelActivity in a ListenActivity with a timeout DelayActivity. However you can't do this because a ParallelActivity does not implement IEventActivity and so the ListenActivity validation will fail (the first activity in a branch must implement IEventActivity). We therefore have to put each receive in its own ListenActivity and time out the waits individually.

So is the situation pretty much the same for both WF and BizTalk? Well not really. The BizTalk window of failure is much smaller than the WF one as the race condition is isolated server side. Because WF ropes the client into message correlation the client has to receive the context and put it somewhere visible to other interested parties before the race condition is resolved.

Hopefully Oslo will bring data orientated correlation to the WF world

Friday, December 14, 2007 9:31:35 AM (GMT Standard Time, UTC+00:00)
 Thursday, December 13, 2007

I've just been reading the latest team blog entry from the Volta team. This entry is meant to address the issue that I discussed here about the dangers of taking arbitrary objects and remoting them.

They say that they are abstracting away "unnecessary details", leaving only the necessary ones. Firstly, every abstraction leaks (just look at WCF for an example of that) so no matter how hard you try to abstract away the plumbing it will come through the abstraction in unexpected ways. Secondly, the remote boundary is not an unnecessary detail. It's a fundamental part of the design of an application.

Unless Volta has heuristics inside it to generate a remote facade during execution, the interface into a remote object is of huge importance to how an application will perform and scale. Volta should at least give you a warning if you apply the [RunAtOrigin] attribute to a class with properties on it.

Does this mean that the whole idea is broken? Not at all - it just means that applications have to be designed with Volta in mind. Decisions have to be made in the design about which layers *may* get remoted and the interface into that layer should be designed accordingly. Then the exact decision about which layers to *actually* remote can be deferred and tuned according to profiling the application.

Thursday, December 13, 2007 9:15:49 AM (GMT Standard Time, UTC+00:00)
 Wednesday, December 12, 2007

Ian pointed out that there was a race condition in the UniversalAsyncResult that I posted a few days ago. I have amended the original post rather than repost the code.

The changes are to make the complete flag and the waithandle field volatile and to check the complete flag after allocating the waithandle in case Complete was called while the waithandle was being allocated.
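For readers who have not seen the original post, the shape of that fix can be sketched roughly as follows. This is not the UniversalAsyncResult code itself: the class name LazyWaitHandleResult and its members are hypothetical, and the lock and the explicit Thread.MemoryBarrier() calls are my own additions to keep the sketch safe, on the assumption that volatile alone does not stop a write being reordered with a later read of a different field.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a lazily allocated wait handle that stays correct
// when Complete() races with the first request for the handle.
class LazyWaitHandleResult
{
    volatile bool complete;
    volatile ManualResetEvent waitHandle;
    readonly object guard = new object();

    public bool IsCompleted
    {
        get { return complete; }
    }

    public WaitHandle AsyncWaitHandle
    {
        get
        {
            ManualResetEvent h = waitHandle;
            if (h == null)
            {
                lock (guard)
                {
                    if (waitHandle == null)
                    {
                        waitHandle = new ManualResetEvent(false);
                    }
                    h = waitHandle;
                }

                // Complete() may have run while the handle was being
                // allocated, so check the flag again and signal if needed.
                Thread.MemoryBarrier();
                if (complete)
                {
                    h.Set();
                }
            }
            return h;
        }
    }

    public void Complete()
    {
        complete = true;

        // Make the flag visible before looking for a handle to signal.
        Thread.MemoryBarrier();
        ManualResetEvent h = waitHandle;
        if (h != null)
        {
            h.Set();
        }
    }
}
```

Whichever order the two sides run in, at least one of them sees the other's write: either Complete() finds the handle and sets it, or the getter finds the flag set and signals the handle itself. Setting the event twice is harmless.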

Isn't multithreaded programming fun :-)

Wednesday, December 12, 2007 8:29:13 AM (GMT Standard Time, UTC+00:00)
 Tuesday, December 11, 2007

On 5th December, Microsoft Live Labs announced Volta. The idea of Volta is to be able to write your application in a .NET language of your choice and then the Volta infrastructure takes care of targeting the right platform (JavaScript or SilverLight, IE or FireFox). This side of things is pretty neat. The other thing it allows you to do is to write your application monolithically and then decide on distribution later on based on profiling, etc. This sounds quite neat but I have some reservations.

I went through the DCOM wave, or "COM with a longer wire" as it was labelled at the time, and learned some hard lessons. I then went through the .NET Remoting experience and learned ... well ... the same hard lessons. The reality is that unless you design something to be remote, then when you remote it it's going to suck.

Let's look at a Volta example. First some basics:

  1. Download and install the CTP
  2. Create a Volta application project

Now let's build our Volta application. You create the HTML page for the UI (here's the important snippet)

<body>
  <p>Press the button to do chatty stuff</p>
  <p><button id="chat">Chat!</button></p>
  <div id="output" />
</body>

You create the Volta code to wire up your code to the HTML

public partial class VoltaPage1 : Page
{
  Button chat;
  Div output;

  public VoltaPage1()
  {
    InitializeComponent();
  }

  partial void InitializeComponent()
  {
    chat = Document.GetById<Button>("chat");
    output = Document.GetById<Div>("output");
  }
}

Now create a class that has a chatty interface

class HoldStatement
{
  public string The { get { return "The "; } }
  public string Quick { get { return "Quick "; } }
  public string Brown { get { return "Brown "; } }
  public string Fox { get { return "Fox "; } }
  public string Jumped { get { return "Jumped "; } }
  public string Over { get { return "Over "; } }
  public string Lazy { get { return "Lazy "; } }
  public string Dog { get { return "Dog!"; } }
}

OK, now let's wire up the event handler on the button to output a statement in the output div

partial void InitializeComponent()
{
  chat = Document.GetById<Button>("chat");
  output = Document.GetById<Div>("output");

  HoldStatement hs = new HoldStatement();

  chat.Click += delegate
  {
    output.InnerText = hs.The + hs.Quick + hs.Brown + hs.Fox + hs.Jumped +
                       hs.Over + hs.The + hs.Lazy + hs.Dog;
  };
}

If we compile this in Release mode and press F5 we get a browser and a server appears in the system tray.

If you double click on this server icon you get a monitor window that allows you to trace interaction with the server. Bring that up and check the Log Requests checkbox.

Now click on the Chat! button in the browser. Notice you get a bunch of assembly loads in the log window. Clear the log and press the Chat! button again. Now you see there are no requests going across the wire as the entire application is running in the browser.

Now go to the project properties and select the Volta tab. Enable the <cue dramatic music> Tiersplitter. Go to the HoldStatement class, right click on it and select Refactor -> Tier Split to Run at Origin. Notice we now get a [RunAtOrigin] attribute on the class.

Rebuild and press F5 again. Bring up the Log window again and select to log messages. Now every time you press the Chat! button it makes a request to the server. We have distributed our application with just one attribute! - how cool is that?! But hold on ... how many requests are going to the server? Well one for each property access.

Latency is the death knell of many distributed applications. You have to design remote services to be chunky (lots of data in one round trip) otherwise the network roundtripping will kill your application performance. So saying you can defer architecting your solution in terms of distribution then simply annotate classes with an attribute is a dangerous route to go down. We have already learned the hard way that taking a component and just putting it on the other end of a wire does not make for a good distributed application - Volta appears to be pushing us in that direction again.
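To show what "chunky" would mean for this example, here is a minimal sketch. The StatementService class and its GetStatement method are hypothetical names of my own, not part of Volta or of the sample above; the assumption is that this facade, rather than HoldStatement, would be the class you tier split, so the button click costs one round trip instead of one per property.

```csharp
using System;

// The chatty class from the example: each property read becomes a separate
// request once the class itself is remoted.
class HoldStatement
{
    public string The { get { return "The "; } }
    public string Quick { get { return "Quick "; } }
    public string Brown { get { return "Brown "; } }
    public string Fox { get { return "Fox "; } }
    public string Jumped { get { return "Jumped "; } }
    public string Over { get { return "Over "; } }
    public string Lazy { get { return "Lazy "; } }
    public string Dog { get { return "Dog!"; } }
}

// Hypothetical chunky facade: the property reads happen on the same side of
// the wire as HoldStatement, and only the finished string crosses it.
class StatementService
{
    readonly HoldStatement hs = new HoldStatement();

    public string GetStatement()
    {
        return hs.The + hs.Quick + hs.Brown + hs.Fox + hs.Jumped +
               hs.Over + hs.The + hs.Lazy + hs.Dog;
    }
}
```

The click handler would then assign the result of a single GetStatement() call to output.InnerText. The point is that this interface had to be designed with the remote boundary in mind; the attribute on its own cannot do that for you.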

Tuesday, December 11, 2007 3:01:02 PM (GMT Standard Time, UTC+00:00)

Yesterday I announced the release of the MapperActivity. I said I'd clean up the source code and post it.

Two things are worth noting:

  1. The dependency on the CodeProject Type Browser has gone
  2. There is a known issue where you can map more than one source element to the same destination - in this case the last mapping you create will win. We will fix this in a subsequent release.

Here's the latest version with the code

MapperActivityLibrary.zip (933.62 KB)

Tuesday, December 11, 2007 7:31:41 AM (GMT Standard Time, UTC+00:00)