In a recent post I made a misleading assertion. I
said:
"Now, when an instance of a type is created it needs to be initialized to a
known state. The CLR provides two mechanisms for this: field initializers and
constructors."
This is misleading because the CLR doesn’t provide these mechanisms; C# and
VB.NET (among others) do. The CLR actually provides three mechanisms for initialization:
- it guarantees to zero all local variables (this is the .locals init directive in the IL listings below)
- the IL instruction initobj zeros allocated instances
- it can run a constructor
What C# does with a field initializer is to emit the IL instructions to
initialize the member variable on your behalf.
For example, suppose we compile the following code:
class Foo
{
    int x = 5;
    static void Main()
    {
        Foo f = new Foo();
    }
}
If we look at the IL for the generated constructor we see:
IL_0000: ldarg.0
IL_0001: ldc.i4.5
IL_0002: stfld int32 Foo::x
IL_0007: ldarg.0
IL_0008: call instance void [mscorlib]System.Object::.ctor()
IL_000d: ret
We can see the C# compiler has emitted the field initialization in the
generated constructor body before the call to the base class constructor.
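One way to observe that ordering from C# itself, without reading any IL, is to have a base class constructor call a virtual method that reads the derived class's fields. The sketch below is my own illustration rather than part of the sample above; the names Base, Derived, Describe and InitOrderDemo are made up for the purpose.

```csharp
using System;

class Base
{
    public string Seen;

    protected Base()
    {
        // Virtual call made from the base constructor. It runs after the
        // derived field initializers but before the derived constructor body.
        Seen = Describe();
    }

    protected virtual string Describe()
    {
        return "base";
    }
}

class Derived : Base
{
    int x = 5;   // field initializer: emitted before the base constructor call
    int y;       // assigned in the constructor body: runs after the base constructor

    public Derived()
    {
        y = 7;
    }

    protected override string Describe()
    {
        return string.Format("x={0}, y={1}", x, y);
    }
}

static class InitOrderDemo
{
    // Returns what the base constructor observed while Derived was being built.
    public static string Run()
    {
        return new Derived().Seen;
    }
}
```

Calling InitOrderDemo.Run() should return "x=5, y=0": the initializer for x has already run by the time the base constructor executes, whereas the assignment to y in the constructor body has not.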
So having cleared that up, let’s dig a little deeper into value type
initialization.
Let’s take a simple example:
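using System;
struct Point
{
    public int x;
    public int y;
    public Point( int x, int y )
    {
        this.x = x;
        this.y = y;
    }
    public override string ToString()
    {
        return string.Format("({0},{1})", x, y);
    }
}
class App
{
    static void Main()
    {
        // Run each test so that its output can be observed
        Test1();
        Test2();
        Test3();
    }
    static void Test1()
    {
        // No "new": every field is assigned by hand before the value is read
        Point p;
        p.x = 10;
        p.y = 20;
        Console.WriteLine(p);
    }
    static void Test2()
    {
        // Parameterless "new" on a value type
        Point p = new Point();
        Console.WriteLine(p);
    }
    static void Test3()
    {
        // The custom two-argument constructor
        Point p = new Point( 10, 20 );
        Console.WriteLine(p);
    }
}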
We have a simple 2D point value type that has its own constructor and an override of ToString (just so we can see the effects of the code). We also have three
methods in the App class that create and initialize the Point in different ways:
- Firstly, by simply declaring an instance and
manually setting all of the state.
- Secondly, by using the supposed autogenerated
default constructor.
- Lastly, by using the custom constructor.
Let's run this through the compiler and see what the IL for each of those Test
methods looks like.
Here's Test1:
.locals init (valuetype Point V_0)
IL_0000: ldloca.s V_0
IL_0002: ldc.i4.s 10
IL_0004: stfld int32 Point::x
IL_0009: ldloca.s V_0
IL_000b: ldc.i4.s 20
IL_000d: stfld int32 Point::y
IL_0012: ldloc.0
IL_0013: box Point
IL_0018: call void [mscorlib]System.Console::WriteLine(object)
IL_001d: ret
This is fairly straightforward. The instance is declared in the .locals section and then the fields are initialized via the stfld instruction.
Let's look at Test2. You might expect to see a call to a
generated default constructor, but the reality is different:
.locals init (valuetype Point V_0)
IL_0000: ldloca.s V_0
IL_0002: initobj Point
IL_0008: ldloc.0
IL_0009: box Point
IL_000e: call void [mscorlib]System.Console::WriteLine(object)
IL_0013: ret
Now this is a bit surprising. There is no default constructor generated; instead, the initobj IL instruction is used to zero the memory.
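The zeroing is easy to confirm from C# as well. This is only a small sketch: Point is repeated from the sample above so that the snippet stands alone, and ZeroInitDemo is a name I have invented for it.

```csharp
using System;

struct Point
{
    public int x;
    public int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    public override string ToString()
    {
        return string.Format("({0},{1})", x, y);
    }
}

static class ZeroInitDemo
{
    // Builds a Point with the parameterless "new" and another with
    // default(Point), then reports both so they can be compared.
    public static string Run()
    {
        Point a = new Point();
        Point b = default(Point);
        return a.ToString() + " " + b.ToString();
    }
}
```

ZeroInitDemo.Run() should return "(0,0) (0,0)": both forms yield a Point whose fields are zero, and the custom two-argument constructor is never involved.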
Finally, for completeness, let's look at Test3:
.locals init (valuetype Point V_0)
IL_0000: ldloca.s V_0
IL_0002: ldc.i4.s 10
IL_0004: ldc.i4.s 20
IL_0006: call instance void Point::.ctor(int32, int32)
IL_000b: ldloc.0
IL_000c: box Point
IL_0011: call void [mscorlib]System.Console::WriteLine(object)
IL_0016: ret
Here we see no call to initobj, but instead a call to the custom Point
constructor to initialize the value type.
OK, so we have seen all three ways the CLR uses to initialize a value type -
well, almost. I claimed that the CLR will zero the memory anyway, and yet in Test2
the C# compiler emits the initobj IL instruction to perform that
operation. So was I lying? Let's change the IL to remove that initobj instruction (and, as it happens, the
preceding ldloca.s instruction too, since all it does is push the address that initobj consumes) and see what the effect is. The resulting IL looks like this:
.locals init (valuetype Point V_0)
IL_0008: ldloc.0
IL_0009: box Point
IL_000e: call void [mscorlib]System.Console::WriteLine(object)
IL_0013: ret
If we assemble this with ILASM.EXE we get a clean build and the program runs with the same output as before - in other words, Test2 outputs "(0,0)". So the CLR truly does zero the memory.
However, if that's the case, why does the C# compiler emit the initobj instruction in the first place? Well, look at this function:
static void Test4()
{
    Point p = new Point();
    p.x = 10;
    p = new Point();
}
The IL looks like this:
.locals init (valuetype Point V_0)
IL_0000: ldloca.s V_0
IL_0002: initobj Point
IL_0008: ldloca.s V_0
IL_000a: ldc.i4.s 10
IL_000c: stfld int32 Point::x
IL_0011: ldloca.s V_0
IL_0013: initobj Point
IL_0019: ret
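We see that each new Point() emits an initobj instruction. The second one is essential: by that point p.x is 10, so the zeroing done at method entry no longer helps. For the situation in Test2 not to do this, the C# compiler team would have to special-case the first
new Point() on a local that has not yet been written to. This is unlikely to be a worthwhile optimization.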
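To see why the second initobj cannot be dropped, here is a sketch of Test4 that records the value before and after the reassignment. Point is repeated from the earlier sample so the snippet is self-contained; ReinitDemo is a hypothetical name of my own.

```csharp
using System;

struct Point
{
    public int x;
    public int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    public override string ToString()
    {
        return string.Format("({0},{1})", x, y);
    }
}

static class ReinitDemo
{
    // Same shape as Test4, but captures the state on either side of the
    // second "new Point()" so the reset is visible.
    public static string Run()
    {
        Point p = new Point();
        p.x = 10;
        string before = p.ToString();

        p = new Point();   // must zero p again; it was only zeroed for free at method entry
        string after = p.ToString();

        return before + " -> " + after;
    }
}
```

ReinitDemo.Run() should return "(10,0) -> (0,0)". The local was dirty when the second new Point() ran, so something has to clear it explicitly.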
Finally, my thanks to Ian for his pedantry that led me down this road.