.Net Meanderings

Richard Blewett's wanderings around .NET

More on the Secret Life of Value Types

In a recent post I made a misleading assertion. I said:

Now, when an instance of a type is created it needs to be initialized to a known state. The CLR provides two mechanisms for this: field initializers and constructors

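This is misleading because the CLR doesn’t provide these; C# and VB.NET (among others) do. The CLR actually provides three mechanisms for initialization:

  • it guarantees to zero all local variables on entry to a method (provided the locals are marked init, which the C# compiler does, as the .locals init lines in the IL below show)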
  • the IL instruction initobj zeros allocated instances
  • it can run a constructor

What C# does with a field initializer is to emit the IL instructions to initialize the member variable on your behalf.

For example, take the following code:

class Foo
{
    int x = 5;
    static void Main()
    {
        Foo f = new Foo();
    }
}

If we look at the IL for the generated constructor we see:

IL_0000: ldarg.0
IL_0001: ldc.i4.5
IL_0002: stfld     int32 Foo::x
IL_0007: ldarg.0
IL_0008: call      instance void [mscorlib]System.Object::.ctor()
IL_000d: ret

We can see the C# compiler has emitted the field initialization in the generated constructor body before the call to the base class constructor.
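The ordering is observable from C# itself. Here is a minimal sketch, assuming a hypothetical Base/Derived pair (none of these names come from the code above): the base constructor makes a virtual call, and the override reports what the derived fields hold at that moment. The field with an initializer has already been set; the field assigned in the constructor body has not.

```csharp
using System;

class Base
{
    public string Seen;

    public Base()
    {
        // Virtual call from the base constructor: this runs Derived.Describe
        // before the body of Derived's constructor has executed.
        Seen = Describe();
    }

    protected virtual string Describe() { return "base"; }
}

class Derived : Base
{
    int fromInitializer = 5;   // field initializer: emitted before the base .ctor call
    int fromCtorBody;          // assigned in the ctor body: runs after the base .ctor returns

    public Derived()
    {
        fromCtorBody = 7;
    }

    protected override string Describe()
    {
        return fromInitializer + "," + fromCtorBody;
    }
}

class InitOrderDemo
{
    static void Main()
    {
        Console.WriteLine(new Derived().Seen);   // prints 5,0
    }
}
```

The 0 is the CLR's zeroing of the freshly allocated instance; the 5 is there only because the compiler placed the stfld ahead of the base constructor call.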

So having cleared that up, let’s dig a little deeper into value type initialization.

Let’s take a simple example:

using System;

struct Point
{
    public int x;
    public int y; 
    public Point( int x, int y )
    {
        this.x = x;
        this.y = y;
    }   

    public override string ToString()
    {
        return string.Format("({0},{1})", x, y);
    }
}

class App
{
    static void Main()
    {
        // Run each test so we can see its output
        Test1();
        Test2();
        Test3();
    }
    
    static void Test1()
    {
        Point p;
        p.x = 10;
        p.y = 20;
        
        Console.WriteLine(p);
    }            

    static void Test2()
    {
        Point p = new Point();

        Console.WriteLine(p);
    }                  
    
    static void Test3()
    {
        Point p = new Point( 10, 20 );

        Console.WriteLine(p);
    }
}

We have a simple 2D point value type that has its own constructor and also an override of ToString (just so we can see the effects of the code). We also have three methods in the App class that create and initialize the Point in a number of ways.

  • Firstly through simply declaring an instance and manually setting all of the state.
  • Secondly through using the supposed autogenerated default constructor.
  • Lastly using the custom constructor.

Let's run this through the compiler and see how each of those Test methods looks.

Here's Test1:

.locals init (valuetype Point V_0)
IL_0000:  ldloca.s   V_0
IL_0002:  ldc.i4.s   10
IL_0004:  stfld      int32 Point::x
IL_0009:  ldloca.s   V_0
IL_000b:  ldc.i4.s   20
IL_000d:  stfld      int32 Point::y
IL_0012:  ldloc.0
IL_0013:  box        Point
IL_0018:  call       void [mscorlib]System.Console::WriteLine(object)
IL_001d:  ret

This is fairly straightforward. The instance is declared in the .locals section and then the fields are initialized via the stfld instruction.

Let's look at Test2. I guess you would expect to see a call to a generated default constructor, but the reality is different.

.locals init (valuetype Point V_0)
IL_0000:  ldloca.s   V_0
IL_0002:  initobj    Point
IL_0008:  ldloc.0
IL_0009:  box        Point
IL_000e:  call       void [mscorlib]System.Console::WriteLine(object)
IL_0013:  ret

Now this is a bit surprising. There is no default constructor generated; instead, the initobj IL instruction is used to zero the memory.
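You can confirm the absence of a default constructor without reading IL. The following is a rough sketch using reflection; the CtorReflectionDemo class and its helper methods are my own illustrative names, and the Point here is a trimmed copy of the one above.

```csharp
using System;

struct Point
{
    public int x;
    public int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}

class CtorReflectionDemo
{
    // True only if the type's metadata contains a public parameterless instance constructor.
    public static bool HasParameterlessCtor()
    {
        return typeof(Point).GetConstructor(Type.EmptyTypes) != null;
    }

    // Number of public instance constructors declared on Point.
    public static int PublicCtorCount()
    {
        return typeof(Point).GetConstructors().Length;
    }

    static void Main()
    {
        Console.WriteLine(HasParameterlessCtor());   // False
        Console.WriteLine(PublicCtorCount());        // 1

        Point p = new Point();                       // compiles to initobj, not a call
        Console.WriteLine(p.x == 0 && p.y == 0);     // True
    }
}
```

The only constructor reflection finds is the two-argument one we wrote, which fits with new Point() compiling to initobj rather than to a constructor call.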

Finally, for completeness, let's look at Test3:

.locals init (valuetype Point V_0)
IL_0000:  ldloca.s   V_0
IL_0002:  ldc.i4.s   10
IL_0004:  ldc.i4.s   20
IL_0006:  call       instance void Point::.ctor(int32,
                                                int32)  
IL_000b:  ldloc.0
IL_000c:  box        Point
IL_0011:  call       void [mscorlib]System.Console::WriteLine(object)
IL_0016:  ret

Here we see no call to initobj, but instead a call to the custom Point constructor to initialize the value type.

OK, so we have seen all three ways the CLR uses to initialize a value type. Well, almost. I claimed that the CLR will zero the memory anyway, and yet in Test2 the C# compiler emits the initobj IL instruction to perform that operation. So was I lying? Let's change the IL to remove that initobj instruction (and, as it happens, we'll also have to remove the preceding ldloca.s instruction) and see what the effect is. The resulting IL looks like this:

.locals init (valuetype Point V_0)
IL_0008:  ldloc.0
IL_0009:  box        Point
IL_000e:  call       void [mscorlib]System.Console::WriteLine(object)
IL_0013:  ret  

If we assemble this with ILASM.EXE it assembles cleanly and the program runs with the same output as before; in other words, Test2 outputs "(0,0)". So the CLR truly does zero the memory. However, if that's the case, why does the C# compiler emit the initobj instruction in the first place? Well, look at this function:

static void Test4()
{
    Point p = new Point();
    p.x = 10;
    p = new Point();
}

The IL looks like this:

.locals init (valuetype Point V_0)
IL_0000:  ldloca.s   V_0
IL_0002:  initobj    Point
IL_0008:  ldloca.s   V_0
IL_000a:  ldc.i4.s   10
IL_000c:  stfld      int32 Point::x
IL_0011:  ldloca.s   V_0
IL_0013:  initobj    Point
IL_0019:  ret

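We see that each new Point() emits an initobj instruction. The second one is genuinely needed: by that point p.x has been set to 10, so the zeroing the CLR performed on entry to the method no longer holds. For Test2 to avoid the redundant initobj, the C# compiler team would have to special-case the first initialization of a local in a method. This is unlikely to be a worthwhile optimization.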
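The same point can be checked from C#. This is a small sketch, assuming a hypothetical ResetDemo class and a constructor-less copy of Point: it mutates a Point, assigns new Point() again, and reads the field back.

```csharp
using System;

struct Point
{
    public int x;
    public int y;
}

class ResetDemo
{
    // Mirrors Test4, but returns p.x so the effect of the second new Point() is visible.
    public static int XAfterReset()
    {
        Point p = new Point();   // first initobj: redundant with the zeroing on method entry
        p.x = 10;                // the local is no longer all zeros
        p = new Point();         // second initobj: has to zero the local again
        return p.x;
    }

    static void Main()
    {
        Console.WriteLine(XAfterReset());   // prints 0
    }
}
```

If the compiler dropped every initobj on the grounds that locals start out zeroed, this method would return 10.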

Finally, my thanks to Ian for his pedantry that led me down this road.

09/02/2004 8:23 PM | #.NET

Content © 2003 Richard Blewett