Tuesday, February 12, 2008

Recently I wrote an article on Volta for the DevelopMentor newsletter DevelopMents. I concentrated on the core of what Volta is doing, IL rewriting, rather than highlight the IL to JavaScript functionality that has caught most attention. You can read the article here.

Tuesday, February 12, 2008 8:09:49 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]Trackback

Recently I discussed creating robust multi-host workflow architectures with WF using the SqlWorkflowPersistenceService. I talked about an issue with the way workflow ownership is implemented and  showed some code that would fix the issue but that I also said was a hack. Ayende rightly pointed out that for high volume systems it would be a disaster. The point of that post was to highlight the problem.

Ideally the workflow team will fix the persistence service to recover from abandoned workflows more robustly - in the meantime the following stored procedure will unlock the abandoned workflows and make then runnable. The idea is to make this a scheduled job in the database running every few seconds - this is essentially what BizTalk does.

CREATE PROCEDURE dbo.ClearUnlockedWorkflows
AS
BEGIN
  SET NOCOUNT ON

  UPDATE
InstanceState Set ownerID=null
                           ownedUntil=null
                           unlocked=1,
                           nextTimer=getdate
()
  WHERE NOT ownerID is NULL and 
            
ownedUntil < getdate
()

  RETURN
END

The only oddity in the code is the setting of the nextTimer to the current time. The issue is that straight persistence (as opposed to unloading on a delay) sets this value to 31st Dec 9999 which is obviously not going to be reached for some time. Unfortunately the workflow will only be scheduled due to an expired timer so I have to reset the timer such that it will expire immediately. I can't think of any issues this would cause but if there's a scenario I haven't thought of all comments are welcome.

 |  | 
Tuesday, February 12, 2008 8:57:13 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]Trackback
 Wednesday, February 06, 2008

For the third year running I'm going to be speaking at BearPark's DevWeek conference. This year I actually managed to submit some breakout session titles on time too.

Wednesday 12th March

11:30 A developer’s guide to Windows Workflow Foundation
There are many challenges to writing software. Not least of these are lack of transparency of code and creating software that can execute correctly in the face of process or machine restart. Windows Workflow Foundation (WF) introduces a new way of writing software that solves these problems and more. This session explains what WF brings to your applications and explains how it works. Along the way we will see the major features of WF that make it a very powerful tool in your toolkit, removing the need for you to write a lot of complex plumbing.

14:00 Creating robust, long-running Workflows
Long-running processes have unique requirements in that they need to maintain state over process restart; Windows Workflow Foundation (WF) enables this with its persistence infrastructure. However, there are issues around hosting and activity development that require attention for long running workflows to be robust. This session looks at the design of the workflow persistence service; issues around hosting and creating full featured asynchronous activities. This session assumes some familiarity with WF.

16:00 Cross my palm with Silver – creating workflow-based WCF services
There are very good reasons for using a workflow to implement a WCF service: workflows can provide a clear platform for service composition (using a number of building block services to generate functionally richer service); workflows can manage long running stateful services without having to write your own plumbing to achieve this. This session introduces the new Visual Studio 2008 Workflow Services. This technology, previously known as “Silver”, provides a relatively seamless integration between WF and WCF, enabling the service developer to concentrate on the application functionality rather than the plumbing. This session assumes some familiarity with WF and WCF.

Friday 14th March - Postcon

A day of connected systems with Visual Studio 2008
Most businesses find themselves building applications that use two or more machines working together to produce their functionality. One of the challenges in this world, apart from the actual business logic being implemented, is connecting the different parts of the application in a way that best fits the environment the machines are places – are there firewalls in place? Are some parts of the application written on different platforms such as Java? Do the different parts of the application have to maintain their state over machine restart? Late 2006 saw Microsoft release WCF and WF to tackle some of these challenges. However, parts of the story were left untold – especially the integration between the two.
Visual Studio 2008 introduces a number of new features for writing service based software. Its features build on the libraries released as part of .NET 3.0, providing an integration layer between the two. In this pre/post conference session we start at the basics of how WCF and WF work and then look at the various integration technologies introduced in Visual Studio 2008.

So if you're attending DevWeek I hope to see you there

 |  | 
Wednesday, February 06, 2008 2:16:10 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]Trackback
 Monday, February 04, 2008

One of the most powerful features of the Windows Workflow Foundation is its ability to automate state management of long running processes using a persistence service. Microsoft supply a persistence service out of the box - the SqlWorkflowPersistenceService. Unsurprisingly this persists state in SQL Server (or SQL Express).

Once you have a persistence service more possibilities open up: different applications can progress the same workflow instance over time; multiple hosts can process workflows in a scale out model. This second feature needs a little investigation - there is a gotcha hiding there you need to be aware of.

The issue is how to stop more than one host picking up the same instance of a workflow and processing it at the same time - imagine the workflow transfered $10,000,000 from one account to another, you'd hardly want this happening twice. So if the possiblity exists for multiple hosts to see the same persistence store, the persistence service must be able to ensure only one host is executing a workflow at any one time.

The SqlWorkflowPersistenceService handles multiple concurrent hosts by using the concept of workflow ownership - the guid of a host (created randomly by the persistence service constructor) is stamped against a workflow that it is actively executing (not in an idle or completed state). Now the question comes "what if the host dies while executing a workflow?". This is what the ownership timeout is for. You set the ownership timeout in the constructor of the SqlWorkflowPersistenceService.

SqlWorkflowPersistenceService sql = 
            
new SqlWorkflowPersistenceService(CONN, 
                                             
true
                                              TimeSpan.FromSeconds(10), 
                                              TimeSpan
.FromSeconds(5));

workflowRuntime.AddService(sql);

Here the third parameter specifies how long a host is allowed to run a workflow before persistence occurs. If the host takes longer than this then it will get an error when it atempts to persist. The fourth parameter is the polling interval for how often the persistence service will check for expired timers.

Now the idea is that if a host dies then it's lock will timeout and another host can pick up the work. There is a problem, however, in the implementation. The persistence service only looks for expired ownership locks when it first starts - not when it polls for expired timers. Therefore, for a workflow instance whose host has died mid-processing, it will only recover if a new host instance starts after the timeout has occurred.

So how can you make this more robust? Well we need a way to explicitly load the workflows that have had their ownership expire - unfortunately there is no exposed method to do this on the SqlWorkflowPersistenceService. Instead we have to get all the workflows, catch the exception if we load a locked one and unload any that aren't ready to run. Here is an example:

TimerCallback cb = delegate
{
 
// get all the persisted workflows
  foreach (var item in
sql.GetAllWorkflows())
 
{
   
try
    {
      // load the workflow - this will throw a WorkflowOwnershipException if
      // the workflow is currently owned
      WorkflowInstance
inst = workflowRuntime.GetWorkflow(item.WorkflowInstanceId);

      // Unload workflow if its still idle on a timer
      DateTime timerExpiry = (DateTime)item.NextTimerExpiration;
      if (timerExpiry > DateTime
.Now)
      {
        inst.Unload();
      }
    }
    catch(WorkflowOwnershipException
e)
    {
      // Loaded a workflow locked by another instance
    }
  }
};

Timer t = new Timer(cb, null, 0, 1000);

So this code will attempt to load workflow instances with expired locks every second. Is it a hack? Yes. But without one of two things in the SqlWorkflowPersistenceService its the sort of code you have to write to pick up unlocked workflow instances robustly. The workflow team could:

  • Check for expired ownership locks in the stored procedure that checks for timer expiration
  • Provide a method on the persistence service that explicitly allows the loading of unlocked workflow instances

Maybe in the next version :-)

 | 
Monday, February 04, 2008 10:07:39 AM (GMT Standard Time, UTC+00:00)  #    Comments [3]Trackback
 Tuesday, January 29, 2008

We, at DevelopMentor, are very excited to be running our SilverLight 2.0 course, Essential SilverLight 2.0 in the UK on 17th March - all the new bits straight from Mix'08! Course author, Dave Wheeler, will be delivering the course at our London facility so if you are interested in the next generation of Rich Internet Applications register now.

Tuesday, January 29, 2008 10:02:35 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]Trackback
 Friday, January 04, 2008

Christian noticed a bug in my Async Activity base class that I talked about here and here. The latest code is here

 | 
Friday, January 04, 2008 4:20:57 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]Trackback
 Friday, December 14, 2007

When working with WF I always find it useful having a BizTalk background. Issues that are reasonably well known in BizTalk are not immediatedly apparent in Workflow unless you know they are part and parcel of that style of programming.

One issue in BizTalk is where you are waiting for a number of messages to start off some processing and you don't know in which order they are going to arrive. In this situation you use a Parallel Shape and put activating receives in the branches initializing the same correlation set. The orchestration engine understands what you are trying to do and creates a convoy for the remaining messages when the first one arrives. This is known as a Concurrent Parallel Receive. You don't leave the parallel shape until the all the messages have arrived and the convoy ensures that the remaining messages are routed to the same orchestration instance.

There is, however, an inherent race condition in this architecture in that: if two messages arrive simultaneously, due to the asynchronous nature of BizTalk, both messages could be processed before the orchestration engine has a chance to set up the convoy infrastructure. We will end up with two instances of the orchestration both waiting for messages that will never arrive. All you can do is put timeouts in place to ensure your orchestrations can recover from that situation and flag the fact that the messages require resubmitting.

With Workflow Services we essentially have the same issue waiting for us. Lets set up the workflow ...

If we call this from a client as follows:

PingClient proxy = new PingClient();
proxy.Ping1();
proxy.Ping2();

then everything works ok - in fact it works irrespective of the order the operations are called in, that's the nature of this pattern. It works because by the time we make the second call, the first has completed and the context is now cached on the proxy.

But lets make the client a bit more complex:

static IDictionary<string, string> ctx = null;
static void Main(string[] args)
{
  PingClient proxy = new PingClient();
  IContextManager mgr = ((IChannel)proxy.InnerChannel).GetProperty<IContextManager>();

  Thread t = new Thread(DoIt);
  t.Start();


  if (ctx != null)
  {
    mgr.SetContext(ctx);
 
}

  proxy.Ping1();
  ctx = mgr.GetContext();

  Console.WriteLine("press enter to exit");
  Console.ReadLine();
}

static void DoIt()
{
  PingClient proxy = new PingClient();
  IContextManager mgr = ((IChannel)proxy.InnerChannel).GetProperty<IContextManager>();

  if
(ctx != null
)
  {
    mgr.SetContext(ctx);
  }
  proxy.Ping2();
  ctx = mgr.GetContext();
}

Here we make the two calls on different proxies on different threads. A successful call stores the context in the static ctx field. Now a proxy will use the context if it has already been set, otherwise it assumes that it is the first call. So here the race condition has made its way all the way back to the client. There are things we could do about this in the client code (taking a lock out while we're making the call so we complete one and store the context before the other checks to see if the context is null), however, that really isn't the point. The messages may come from two separate client applications which both check a database for the context. Again we have an inherent race condition that we need to put architecture in place to detect and recover from. It would be nice to put the ParallelActivity in a ListenActivity with a timeout DelayActivity. However you can't do this because a ParallelActivity does not implement IEventActivity and so the ListenActivity validation will fail (the first activity in a branch must implement IEventActivity). We therefore have to put each receive in its own ListenActivity and time out the waits individually.

So is the situation pretty much the same for both WF and BizTalk? Well not really. The BizTalk window of failure is much smaller than the WF one as the race condition is isolated server side. Because WF ropes the client into message correlation the client has to receive the context and put it somewhere visible to other interested parties before the race condition is resolved.

Hopefully Oslo will bring data orientated correlation to the WF world

 |  |  |  | 
Friday, December 14, 2007 9:31:35 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]Trackback
 Thursday, December 13, 2007

I've just been reading the latest team blog entry from the Volta team. This entry is meant to address the issue that I discussed here about the dangers of taking arbitrary objects and remoting them.

They say that they are abstracting away "unnecessary details" only leaving the necessary ones. Firstly, every abstraction leaks (just look at WCF for an example of that) so no matter how hard you try to abstract away the plumbing it will come through the abstraction in unexpected ways. Secondly the remote boundary is not an unnecessary detail. Its a fundemental part of the design of an application.

Unless Volta has heuristics inside it to generate a remote facade during execution, the interface into a remote object is of huge importance to how an application will perform and scale. Volta should at least give you a warning if you apply the [RunAtOrigin] attribute to a class with properties on it.

Does this mean that the whole idea is broken? Not at all - it just means that applications have to be designed with Volta in mind. Decisions have to be made in the design about which layers *may* get remoted and the interface into that layer should be designed accordingly. Then the exact decision about which layers to *actually* remote can be deferred and tuned according to profiling the application.

Thursday, December 13, 2007 9:15:49 AM (GMT Standard Time, UTC+00:00)  #    Comments [4]Trackback
 Wednesday, December 12, 2007

Ian pointed out that there was a race condition in the UniversalAsyncResult that I posted a few days ago. I have amended the original post rather than repost the code.

The changes are to make the complete flag and the waithandle field volatile and to check the complete flag after allocating the waithandle in case Complete was called while the waithandle was being allocated.

Isn't multithreaded programming fun :-)

 | 
Wednesday, December 12, 2007 8:29:13 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]Trackback