C#, CLR

CLR Boxing Demystified

One area of programming that many developers do not fully understand is boxing, and this can lead to some interesting bugs and poor performance. A firm grasp of boxing is very important for any programmer who works with .NET and the CLR. I will attempt to demystify boxing from a C# perspective.

What? Why?

Firstly, it’s important to understand some of the fundamentals of how types are allocated in memory on the CLR. There are two main categories of types in any CLR language, C# included: reference types and value types. Reference types are always allocated on the managed heap, whereas value types are stored inline: a value type local variable typically lives on the thread’s stack, while a value type field of a class lives inside that object on the heap. Value types are the more lightweight of the two. When designing types it’s very important that you understand this difference, mostly for reasons of memory management and performance. Reference types (being on the managed heap) are subject to garbage collection (and an allocation can trigger a GC), and every heap object carries some per-instance overhead (an object header and a type pointer). Value type locals, on the other hand, are not individually tracked by the garbage collector: their memory is reclaimed as soon as the method that declares them returns.

Some examples of value types are structs, enums, int, bool, etc.

It’s also important to understand that a value type variable stores its value directly, whereas a reference type variable stores a reference to an object on the managed heap. Another important point is that value types are sealed and cannot be inherited from. All value types derive from System.ValueType, which itself derives from System.Object (all types, reference and value alike, ultimately derive from System.Object, which is important to remember when thinking about boxing).
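A small sketch of the difference described above: assigning a struct copies its whole value, while assigning a class instance copies only the reference. The type names PointStruct and PointClass are made up for illustration, and the struct is deliberately mutable here only so the copy can be observed.

```csharp
using System;

// Hypothetical types used only to contrast the two kinds of type.
struct PointStruct { public int X; }   // mutable only to make the copy visible
class PointClass { public int X; }

static class CopyDemo
{
    // Returns the value seen through the original variable after the copy is changed.
    public static int StructCopy()
    {
        PointStruct a = new PointStruct { X = 1 };
        PointStruct b = a;   // the whole value is copied
        b.X = 2;             // only b changes
        return a.X;          // still 1
    }

    public static int ClassCopy()
    {
        PointClass a = new PointClass { X = 1 };
        PointClass b = a;    // only the reference is copied
        b.X = 2;             // a and b point at the same heap object
        return a.X;          // now 2
    }

    static void Main()
    {
        Console.WriteLine(StructCopy());                 // 1
        Console.WriteLine(ClassCopy());                  // 2
        Console.WriteLine(typeof(int).BaseType);         // System.ValueType
        Console.WriteLine(typeof(ValueType).BaseType);   // System.Object
        Console.WriteLine(typeof(PointStruct).IsSealed); // True: structs are sealed
    }
}
```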

Important: When designing a value type, you should make it immutable, which means that no member updates any of its fields after construction. This is very important and will save you confusion one day.

You should only use a value type when all of the following statements are true:

  • Type is simple and immutable (i.e. it acts as a primitive type)
  • Does not need to inherit from another type
  • No types will need to derive from it
  • Data will be relatively small in the type, or the type will never be passed into or returned from methods (this is important because each time a value type is assigned to a new variable its entire value is copied, whereas a reference type only has its reference to the object on the heap copied)
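As a sketch of a type that meets the checklist above, here is a hypothetical immutable Temperature struct: every field is readonly, and the "modifying" method returns a new value instead of changing the existing one. The name and members are invented for illustration, and the readonly struct modifier assumes C# 7.2 or later (on older compilers, drop that modifier and keep the readonly field).

```csharp
using System;

// Hypothetical immutable value type: state is fixed at construction.
readonly struct Temperature
{
    private readonly double celsius;

    public Temperature(double celsius)
    {
        this.celsius = celsius;
    }

    public double Celsius => celsius;

    // Returns a new value; the original instance is left untouched.
    public Temperature Add(double delta) => new Temperature(celsius + delta);
}

static class ImmutableDemo
{
    static void Main()
    {
        Temperature t1 = new Temperature(20);
        Temperature t2 = t1.Add(5);
        Console.WriteLine(t1.Celsius); // 20
        Console.WriteLine(t2.Celsius); // 25
    }
}
```

Because nothing can change a Temperature after it is built, it does not matter how many copies the runtime makes (including boxed copies): they all hold the same value.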

So why do we need boxing?

A boxing operation occurs whenever a value type has to be treated as a reference, for example when it is passed to a method that takes an object parameter. It also occurs when a value type is converted to an interface type that it implements (oh yeah, value types can implement interfaces :)). As I mentioned earlier, all value types ultimately inherit from System.Object, but System.Object is a reference type… so for a value type to be used as a System.Object, a copy of it must be wrapped in an object and placed on the managed heap. This is called boxing. The boxed copy is a genuine heap object and behaves like any other reference type; the reverse operation, copying the value back out of the box, is called unboxing.
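A minimal sketch of the two triggers just described: passing an int to a parameter of type object, and converting an int to an interface it implements (System.IComparable). The Describe helper is a hypothetical name used only for this example.

```csharp
using System;

static class BoxingTriggers
{
    // Any value type passed here is boxed at the call site,
    // because the parameter type is object (a reference type).
    public static string Describe(object o)
    {
        return o.GetType().Name;
    }

    static void Main()
    {
        int x = 42;

        // 1. Passing a value type where object is expected boxes it.
        Console.WriteLine(Describe(x));         // Int32

        // 2. Converting a value type to an interface it implements boxes it too.
        IComparable c = x;

        // IComparable.CompareTo takes an object, so 41 is boxed as well.
        Console.WriteLine(c.CompareTo(41) > 0); // True
        Console.WriteLine(c is int);            // True: the box still holds an int
    }
}
```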

Why is this important to understand? Because boxing is relatively expensive: every box operation allocates a new object on the managed heap and copies the value into it, which costs time at allocation and adds work for the garbage collector later. The boxed copy can also outlive the original unboxed value. It also means you can end up with more than one copy of the value (one on the stack and one on the heap), and the two copies change independently. Working with boxed objects can also be a little strange to the uninitiated.
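A quick sketch of the "two copies" point: after boxing, changing the original variable has no effect on the boxed copy. The class and method names are hypothetical.

```csharp
using System;

static class TwoCopies
{
    public static int BoxedValueAfterChange()
    {
        int x = 5;
        object o = x;   // the value 5 is copied into a new heap object
        x = 10;         // changes only the original (stack) copy
        return (int)o;  // the boxed copy still holds 5
    }

    static void Main()
    {
        Console.WriteLine(BoxedValueAfterChange()); // 5
    }
}
```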

So what does a box operation look like? Simple:

int x = 5;
Object o = x;

Done. A copy of x’s value has now been placed in a new object on the managed heap, and o refers to that object. The original variable x stays on the stack until the current method exits.
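Getting the value back out is an unboxing operation, which is a cast to the value type. A sketch of one rule that often surprises people: unboxing must target the exact type that was boxed, so a boxed int cannot be unboxed straight to long even though int converts to long implicitly. TryUnboxAsLong is a hypothetical helper for this example.

```csharp
using System;

static class UnboxDemo
{
    // Attempts to unbox directly as long; fails when the box holds an int.
    public static bool TryUnboxAsLong(object o, out long value)
    {
        try
        {
            value = (long)o;   // unbox.any System.Int64
            return true;
        }
        catch (InvalidCastException)
        {
            value = 0;
            return false;
        }
    }

    static void Main()
    {
        int x = 5;
        object o = x;          // box

        int y = (int)o;        // unbox: copies the value out of the heap object
        Console.WriteLine(y);  // 5

        long direct;
        Console.WriteLine(TryUnboxAsLong(o, out direct)); // False

        long widened = (int)o; // unbox to the exact type first, then widen
        Console.WriteLine(widened); // 5
    }
}
```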

How can I tell when a box or unbox operation happens easily?

It’s very handy to know when a boxing or unboxing operation is happening, and it may not always be apparent (especially when you start overloading methods, implementing type-safe versions of default methods and so on). A very easy way to check is to open your compiled assembly in ILDasm and look at the IL version of your code. Here is the IL for the code above, extended with an unboxing statement (int y = (int)o;) that copies the value back out:

.locals init ([0] int32 x,
          [1] object o,
          [2] int32 y)
 IL_0000:  nop
 IL_0001:  ldc.i4.5
 IL_0002:  stloc.0
 IL_0003:  ldloc.0
 IL_0004:  box        [mscorlib]System.Int32
 IL_0009:  stloc.1
 IL_000a:  ldloc.1
 IL_000b:  unbox.any  [mscorlib]System.Int32
 IL_0010:  stloc.2

You can clearly see the box and unbox.any IL instructions here… you don’t need to fully understand Intermediate Language to benefit from this.
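Here is a sketch of boxing that is easy to miss in C# source but shows up in ILDasm: the non-generic ArrayList stores object, so every int added is boxed and every read is unboxed, while the generic List&lt;int&gt; stores the ints directly. If you compile this and inspect both methods, you should expect box and unbox.any instructions in SumArrayList but not in SumGenericList. The method names are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class HiddenBoxing
{
    public static int SumArrayList(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++)
            list.Add(i);          // Add(object): each int is boxed
        int sum = 0;
        foreach (object item in list)
            sum += (int)item;     // each element is unboxed
        return sum;
    }

    public static int SumGenericList(int count)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < count; i++)
            list.Add(i);          // Add(int): no boxing
        int sum = 0;
        foreach (int item in list)
            sum += item;          // no unboxing needed
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumArrayList(10));   // 45
        Console.WriteLine(SumGenericList(10)); // 45

        // string.Format takes object arguments, so 42 is boxed here too.
        Console.WriteLine(string.Format("{0}", 42));
    }
}
```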

Okay, so what are the traps – this seems pretty easy

Well, aside from the performance issues associated with unwanted boxing and unboxing operations, there is a really big gotcha when dealing with boxed versions of types.

Imagine this code:

A test structure

struct TestStruct
{
    private int value;

    public TestStruct(int initVar)
    {
        this.value = initVar;
    }

    public override string ToString()
    {
        return value.ToString();
    }

    public void ChangeVal(int newVal)
    {
        value = newVal;
    }
}

A test main method

static void Main(string[] args)
{
    TestStruct ts = new TestStruct(10);

    Object boxedObj = ts;

    ((TestStruct)boxedObj).ChangeVal(20);

    Console.WriteLine(string.Format("Boxed version: {0}", boxedObj.ToString()));
    Console.WriteLine(string.Format("Normal version: {0}", ts.ToString()));
    Console.ReadLine();
}

What will this code display? We first create the test struct and initialize it to 10. Then we box it into an object. Then we cast the boxed object back to TestStruct and change its value to 20. Finally we write out the two values. The output:

Boxed version: 10
Normal version: 10

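What happened? Well first, Object knows nothing about TestStruct’s ChangeVal method, so it needs to be cast back to a TestStruct. That cast is an unboxing operation, which copies the boxed value into a temporary TestStruct on the stack. We then change that temporary copy to 20 and it is immediately thrown away, so the boxed version never gets updated! Oh, and by the way, C# will not let you assign directly to a field of the result of an unboxing conversion (if TestStruct had a public field, assigning to it through ((TestStruct)boxedObj) would fail to compile with error CS0445, "Cannot modify the result of an unboxing conversion"), but calling a method that changes a field is allowed and silently changes only the temporary copy.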
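If you really do need to change a boxed value, there are two approaches, sketched below under one assumption: the IChangeable interface is hypothetical and added only for this example (the TestStruct above implements no interface). Casting a boxed object to an interface is a reference conversion, so no new box is created and the method runs against the boxed instance itself. Alternatively, unbox into a local, modify it, and box the result again, which produces a new object rather than updating the old one.

```csharp
using System;

// Hypothetical interface added for this sketch.
interface IChangeable
{
    void ChangeVal(int newVal);
}

struct TestStruct : IChangeable
{
    private int value;

    public TestStruct(int initVar)
    {
        this.value = initVar;
    }

    public override string ToString()
    {
        return value.ToString();
    }

    public void ChangeVal(int newVal)
    {
        value = newVal;
    }
}

static class BoxedMutation
{
    // Unbox into a local, change it, then box the result again.
    public static object ChangeViaCopy(object boxed, int newVal)
    {
        TestStruct copy = (TestStruct)boxed; // unbox into a local copy
        copy.ChangeVal(newVal);              // mutate the local copy
        return copy;                         // box a new object holding the change
    }

    static void Main()
    {
        TestStruct ts = new TestStruct(10);
        object boxedObj = ts;

        // Interface call: runs against the boxed instance, not a copy.
        ((IChangeable)boxedObj).ChangeVal(20);
        Console.WriteLine("Boxed version: {0}", boxedObj); // Boxed version: 20
        Console.WriteLine("Normal version: {0}", ts);      // Normal version: 10

        object reboxed = ChangeViaCopy(boxedObj, 30);
        Console.WriteLine(reboxed);  // 30
        Console.WriteLine(boxedObj); // 20: the old box is untouched
    }
}
```

Both approaches are workarounds for a mutable struct; an immutable value type avoids the question entirely.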

So be very careful when dealing with boxing and unboxing; it could cause you some headaches one day!