One of the most powerful features of Windows Workflow Foundation is its ability to automate state management of long-running processes using a persistence service. Microsoft supplies a persistence service out of the box - the SqlWorkflowPersistenceService. Unsurprisingly, this persists state in SQL Server (or SQL Express).
Once you have a persistence service, more possibilities open up: different applications can progress the same workflow instance over time, and multiple hosts can process workflows in a scale-out model. This second feature needs a little investigation - there is a gotcha hiding there that you need to be aware of.
The issue is how to stop more than one host picking up the same instance of a workflow and processing it at the same time. Imagine the workflow transferred $10,000,000 from one account to another - you'd hardly want this happening twice. So if the possibility exists for multiple hosts to see the same persistence store, the persistence service must be able to ensure that only one host is executing a given workflow instance at any one time.
The SqlWorkflowPersistenceService handles multiple concurrent hosts by using the concept of workflow ownership - the GUID of a host (created randomly by the persistence service constructor) is stamped against any workflow that the host is actively executing (that is, one not in an idle or completed state). The obvious question is: what if the host dies while executing a workflow? This is what the ownership timeout is for. You set the ownership timeout in the constructor of the SqlWorkflowPersistenceService.
SqlWorkflowPersistenceService sql =
    new SqlWorkflowPersistenceService(CONN,  // connection string
        true,                                // unloadOnIdle
        TimeSpan.FromSeconds(10),            // instanceOwnershipDuration
        TimeSpan.FromSeconds(5));            // loadingInterval
workflowRuntime.AddService(sql);
Here the third parameter (instanceOwnershipDuration) specifies how long a host is allowed to run a workflow before persistence occurs. If the host takes longer than this, it will get an error when it attempts to persist. The fourth parameter (loadingInterval) is the polling interval for how often the persistence service will check for expired timers.
Now the idea is that if a host dies then its lock will time out and another host can pick up the work. There is a problem, however, in the implementation. The persistence service only looks for expired ownership locks when it first starts - not when it polls for expired timers. Therefore, a workflow instance whose host has died mid-processing will only be recovered if a new host instance starts after the timeout has expired.
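To make the ownership rule concrete, here is a minimal in-memory sketch of the idea. It is an illustration only, not how the SqlWorkflowPersistenceService is actually implemented: the OwnershipTable class and its TryAcquire and Release methods are hypothetical names invented for this example, and the current time is passed in explicitly so the behaviour is easy to follow.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of workflow ownership (not the real service).
class OwnershipTable
{
    private class Lock
    {
        public Guid Owner;
        public DateTime OwnedUntil;
    }

    private readonly Dictionary<Guid, Lock> locks = new Dictionary<Guid, Lock>();
    private readonly TimeSpan ownershipDuration;

    public OwnershipTable(TimeSpan ownershipDuration)
    {
        this.ownershipDuration = ownershipDuration;
    }

    // Returns true if 'host' now owns 'instanceId'. A lock held by another
    // host blocks the caller only until its OwnedUntil time has passed.
    public bool TryAcquire(Guid instanceId, Guid host, DateTime now)
    {
        Lock current;
        if (locks.TryGetValue(instanceId, out current)
            && current.Owner != host
            && current.OwnedUntil > now)
        {
            return false; // still owned by a host whose lock has not timed out
        }
        locks[instanceId] = new Lock { Owner = host, OwnedUntil = now + ownershipDuration };
        return true;
    }

    // Models a host persisting an idle or completed workflow.
    public void Release(Guid instanceId, Guid host)
    {
        Lock current;
        if (locks.TryGetValue(instanceId, out current) && current.Owner == host)
        {
            locks.Remove(instanceId);
        }
    }
}

static class OwnershipDemo
{
    static void Main()
    {
        var table = new OwnershipTable(TimeSpan.FromSeconds(10));
        Guid workflow = Guid.NewGuid(), hostA = Guid.NewGuid(), hostB = Guid.NewGuid();
        DateTime t0 = new DateTime(2008, 1, 1, 12, 0, 0, DateTimeKind.Utc);

        Console.WriteLine(table.TryAcquire(workflow, hostA, t0));                // True
        Console.WriteLine(table.TryAcquire(workflow, hostB, t0.AddSeconds(5)));  // False
        Console.WriteLine(table.TryAcquire(workflow, hostB, t0.AddSeconds(11))); // True
    }
}
```

In this model host B succeeds only because it asks again after host A's lock has expired. The gotcha described above is that a running host never asks again on its own for instances with expired locks: that check only happens at startup.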
So how can you make this more robust? Well, we need a way to explicitly load the workflows whose ownership has expired - unfortunately there is no exposed method to do this on the SqlWorkflowPersistenceService. Instead we have to get all the workflows, catch the exception if we try to load a locked one, and unload any that aren't ready to run. Here is an example:
// Requires: using System; using System.Threading;
// using System.Workflow.Runtime; using System.Workflow.Runtime.Hosting;
TimerCallback cb = delegate
{
    // get all the persisted workflows
    foreach (var item in sql.GetAllWorkflows())
    {
        try
        {
            // load the workflow - this will throw a WorkflowOwnershipException if
            // the workflow is currently owned by another host
            WorkflowInstance inst = workflowRuntime.GetWorkflow(item.WorkflowInstanceId);
            // unload the workflow if it's still idle on a timer
            DateTime timerExpiry = (DateTime)item.NextTimerExpiration;
            if (timerExpiry > DateTime.Now)
            {
                inst.Unload();
            }
        }
        catch (WorkflowOwnershipException)
        {
            // tried to load a workflow that is locked by another host - skip it
        }
    }
};
// keep a reference to the timer for as long as polling should continue
Timer t = new Timer(cb, null, 0, 1000);
So this code will attempt to load workflow instances with expired locks every second. Is it a hack? Yes. But without one of two things in the SqlWorkflowPersistenceService, it's the sort of code you have to write to pick up unlocked workflow instances robustly. The workflow team could:
- Check for expired ownership locks in the stored procedure that checks for timer expiration
- Provide a method on the persistence service that explicitly allows the loading of unlocked workflow instances
Maybe in the next version.
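One practical footnote on the polling timer used earlier: System.Threading.Timer runs its callback on thread pool threads, so if a scan of the persistence store takes longer than the one-second period, two scans can overlap. The following is a minimal sketch of one way to skip a tick while the previous scan is still running. PollGuard and TryRun are hypothetical names made up for this example, not part of Workflow Foundation.

```csharp
using System;
using System.Threading;

// Hypothetical helper: lets at most one poll run at a time and
// skips (rather than queues) any poll attempted while one is in progress.
class PollGuard
{
    private int running; // 0 = idle, 1 = a poll is in progress

    // Runs 'poll' and returns true, or returns false without running it
    // if another poll is already executing.
    public bool TryRun(Action poll)
    {
        if (Interlocked.CompareExchange(ref running, 1, 0) != 0)
        {
            return false;
        }
        try
        {
            poll();
            return true;
        }
        finally
        {
            Interlocked.Exchange(ref running, 0);
        }
    }
}

static class PollGuardDemo
{
    static void Main()
    {
        var guard = new PollGuard();
        bool nested = true;

        // The inner call stands in for a second timer tick arriving mid-scan.
        bool outer = guard.TryRun(() => { nested = guard.TryRun(() => { }); });

        Console.WriteLine(outer);  // True
        Console.WriteLine(nested); // False
    }
}
```

With this in place, the timer callback would pass the body of the scan to TryRun instead of executing it directly, so a slow scan simply causes the next tick to be skipped.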