Managing parent and child collection relationships in Entity Framework – What is an identifying relationship anyway?

Some of the posts on this site serve mainly as reminders to myself about a problem I have encountered and the solution I found for it. This is one such post. The issue addressed below is fairly common and has been described and solved more comprehensively elsewhere, but I have added it here so I can find it again should the need arise. If it proves helpful to anyone else then all the better.

It is often desirable to have a parent entity that has a collection of child entities. For example, an Order entity may have a collection of OrderLine entities, or a User entity may have a collection of Role entities. These examples may look similar, but they have a subtle difference that can cause behavioural problems in some circumstances.

In the first example the order lines belong to an order and it doesn't make sense for them to exist on their own. In the second example a Role can exist independently of the User.

This is the difference between an identifying and non-identifying relationship. Another example from Stack Overflow:

A book belongs to an owner, and an owner can own multiple books. But the book can also exist without the owner, and it can change owners. The relationship between a book and an owner is a non-identifying relationship.

A book, however, is written by an author, and the author could have written multiple books. But the book needs to be written by an author; it cannot exist without an author. The relationship between the book and the author is therefore an identifying relationship.

Whatever the type of relationship, it would be nice to be able to simply add and remove child items in the parent collection and let EF worry about all the bothersome adding to and deleting from the database.

Adding to the collection works correctly in both cases, and if you have a non-identifying relationship (your child foreign key is nullable) you will also be fine when removing items from the collection.

If you have only implied an identifying relationship (your child primary key is the table's Id field and your child foreign key is non-nullable) you are likely to be in for a world of pain when trying to remove items from the child collection.

When you remove an item from a child collection with this kind of relationship, Entity Framework tries to update the child entity's foreign key to null. This causes problems because the foreign key is non-nullable, and you get this System.InvalidOperationException when the context saves:

The operation failed: The relationship could not be changed because one or more of the foreign-key properties is non-nullable. When a change is made to a relationship, the related foreign-key property is set to a null value. If the foreign-key does not support null values, a new relationship must be defined, the foreign-key property must be assigned another non-null value, or the unrelated object must be deleted.
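
To make the failure concrete, here is a minimal sketch (the context and entity names are hypothetical) of the kind of code that triggers it:

using (var context = new StoreContext())
{
    var order = context.Orders.Include("OrderLines").First();

    // Removing the line only detaches it from the order; EF tries to
    // null the foreign key rather than delete the row...
    order.OrderLines.Remove(order.OrderLines.First());

    context.SaveChanges(); // ...so this throws the InvalidOperationException
}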

The way to stop this from happening is to make the identifying relationship explicit by making the foreign key field of the child part of its primary key. I know that in the ORM world composite keys are not really in vogue, but this makes it explicit to Entity Framework that the child cannot exist without its parent, and so should be deleted when it is removed from the parent's collection.

How do I specify a composite key?

To set the composite key, simply annotate the entity (or use the fluent API as part of OnModelCreating in the context). If you have an OrderLine entity that has an identifying relationship with an Order entity, the OrderLine entity should be annotated like this:

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema; // EF 5+; in EF 4.x these attributes live in System.ComponentModel.DataAnnotations

public class OrderLine
{
    [Key, Column(Order = 0), DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }

    public string Description { get; set; }

    public decimal Cost { get; set; }

    [Key, ForeignKey("Order"), Column(Order = 1)]
    public int OrderId { get; set; }

    public virtual Order Order { get; set; }
}
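
If you prefer the fluent API, the same identifying relationship can be declared in OnModelCreating. This is a sketch assuming the OrderLine type above and an EF 5+ DbContext:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Composite primary key: the line's own Id plus the foreign key to its parent Order
    modelBuilder.Entity<OrderLine>()
        .HasKey(ol => new { ol.Id, ol.OrderId });

    // Keep Id as an identity column even though it is part of a composite key
    modelBuilder.Entity<OrderLine>()
        .Property(ol => ol.Id)
        .HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
}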

Dynamic Method Invocation in C# with PetaPoco (Part 2)

In Part 1 I looked at how PetaPoco uses Dynamic Method Invocation to return a method that can convert an IDataReader record into a dynamic ExpandoObject. Now it's time to look at the other functionality used to return scalars and POCOs.

Scalars

In PetaPoco a scalar value is returned in the following way:

string column = db.ExecuteScalar<string>("SELECT Column FROM Table");

The functionality to create a factory for scalar values is part of the same GetFactory method covered in the previous post. I will pick up the method body at the point where we already have the ILGenerator. After a check to see whether the required type is a value type, a string or a byte[] (indicating a scalar value), the method body is built up in IL. As before, there is logic to perform some conversion for particular IDataReader implementations, and once again I am going to ignore it for the sake of brevity.

il.Emit(OpCodes.Ldarg_0);		// rdr
il.Emit(OpCodes.Ldc_I4_0);		// rdr,0
il.Emit(OpCodes.Callvirt, fnIsDBNull);	// bool

The first operations are the start of the if-statement that checks whether the value is null. First the IDataReader is pushed onto the stack, then Ldc_I4_0 pushes an int 0. As ExecuteScalar always returns the first column of the first row, this 0 represents column 0. The Callvirt calls a previously defined MethodInfo for IDataRecord.IsDBNull(). These three statements are the equivalent of rdr.IsDBNull(0). The result of the method call is left on the top of the stack.

var lblCont = il.DefineLabel();
il.Emit(OpCodes.Brfalse_S, lblCont);
il.Emit(OpCodes.Ldnull);		// null

If the value on the top of the stack is false (indicating that the value returned from the IDataReader is not null), execution branches to a label further on. If it is true, null is pushed onto the top of the stack.

var lblFin = il.DefineLabel();
il.Emit(OpCodes.Br_S, lblFin);
il.MarkLabel(lblCont);
il.Emit(OpCodes.Ldarg_0);		// rdr
il.Emit(OpCodes.Ldc_I4_0);		// rdr,0
il.Emit(OpCodes.Callvirt, fnGetValue);	// value

After the null has been pushed to the stack, a Br_S jumps to the lblFin label, which we will see in a moment. We can now pick up the execution from the lblCont label. Recall that we branch to this point if the value is not null. Once again the IDataReader is pushed onto the stack, followed by 0. We have encountered fnGetValue previously: it is a MethodInfo for IDataRecord.GetValue(i), and it is called on the IDataReader with the parameter 0. This leaves the value on the top of the stack.

il.MarkLabel(lblFin);
il.Emit(OpCodes.Unbox_Any, type);       // value converted

Finally Unbox_Any is called on the value. This converts the object value to the unboxed type specified, which has the effect of casting the value from the IDataReader to the specified type. If the wrong type had been specified in the ExecuteScalar method, an InvalidCastException would be thrown at this point.

As before, the value is returned by il.Emit(OpCodes.Ret).
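
Putting those pieces together, my reading is that the emitted body is roughly equivalent to this hand-written factory (converter logic omitted):

using System.Data;

static class ScalarFactoryExample
{
    // Rough C# equivalent of the IL emitted for the scalar case
    public static T ScalarFactory<T>(IDataReader rdr)
    {
        // Ldarg_0, Ldc_I4_0, Callvirt fnIsDBNull / Brfalse_S, Ldnull
        object value = rdr.IsDBNull(0) ? null : rdr.GetValue(0);

        // Unbox_Any: throws an InvalidCastException on a type mismatch
        return (T)value;
    }
}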

POCOs

Now it's time for the main event, as POCO support is how PetaPoco came to have its name and is one of the key differentiators from similar micro ORMs. Let's jump straight in. We have already encountered many of the opcodes and MethodInfos, so we can pick up the pace a little.

// var poco=new T()
il.Emit(OpCodes.Newobj, type.GetConstructor(BindingFlags.Instance 
    | BindingFlags.Public | BindingFlags.NonPublic, null, new Type[0], null));

// Enumerate all fields generating a set assignment for the column
for (int i = firstColumn; i < firstColumn + countColumns; i++)
{

The first thing is to push a new instance of the POCO type onto the stack with Newobj. Then it's time to loop through the columns.

 // Get the PocoColumn for this db column, ignore if not known
 PocoColumn pc;
 if (!Columns.TryGetValue(r.GetName(i), out pc))
     continue;

PetaPoco is only interested in columns from the IDataReader that are actually part of the POCO. Columns is a Dictionary holding the POCO property information. If there is no entry in Columns for the current IDataReader column, move on to the next one.

 // Get the source type for this column
 var srcType = r.GetFieldType(i);
 var dstType = pc.PropertyInfo.PropertyType;

Next, get the source and destination types for the column.

il.Emit(OpCodes.Ldarg_0);		 // poco,rdr
il.Emit(OpCodes.Ldc_I4, i);		 // poco,rdr,i
il.Emit(OpCodes.Callvirt, fnIsDBNull);	 // poco,bool
var lblNext = il.DefineLabel();
il.Emit(OpCodes.Brtrue_S, lblNext);	 // poco

We have seen something very similar to this before, used to check whether the IDataReader value is DBNull. If it is, we branch to lblNext, which is directly before the end of the loop. This means that if the value is null in the database, no mapping to the POCO takes place.

il.Emit(OpCodes.Dup);			// poco,poco

var valuegetter = typeof(IDataRecord)
            .GetMethod("Get" + srcType.Name, new Type[] { typeof(int) });

Get a MethodInfo for the required Get method on the IDataReader, where the method name is built from the source type (GetInt32, GetString and so on).

if (valuegetter != null
    && valuegetter.ReturnType == srcType
    && (valuegetter.ReturnType == dstType 
        || valuegetter.ReturnType == Nullable.GetUnderlyingType(dstType)))
{

Check that the return type of the valuegetter method matches the source type from the IDataReader, and that it also matches the destination type (or its underlying type; Nullable.GetUnderlyingType handles the case where dstType is a Nullable type).

Once we have established that the mapping can take place, we can start to build up the IL.

il.Emit(OpCodes.Ldarg_0);		// poco,poco,rdr
il.Emit(OpCodes.Ldc_I4, i);		// poco,poco,rdr,i
il.Emit(OpCodes.Callvirt, valuegetter);	// poco,poco,value

Get the value from the IDataReader using the MethodInfo defined above and put it on the top of the stack.

// Convert to Nullable
if (Nullable.GetUnderlyingType(dstType) != null)
{
    il.Emit(OpCodes.Newobj, 
    dstType.GetConstructor(new Type[] { Nullable.GetUnderlyingType(dstType) }));
}

If the destination type (the POCO property) is a Nullable type, convert the value from the IDataReader to the nullable version of the type. The code within the if-statement is equivalent to new Nullable<T>(value). The stack is now poco, poco, value.

il.Emit(OpCodes.Callvirt, pc.PropertyInfo.GetSetMethod(true));	// poco
il.MarkLabel(lblNext);
}

Finally, the property setter for the POCO property is called with the value. The POCO with the assigned value is left on the stack, and the process is repeated for the next column.

When all the columns have been processed there is one final thing to do before we return the POCO.

var fnOnLoaded = RecurseInheritedTypes<MethodInfo>(type, 
        (x) => x.GetMethod("OnLoaded", BindingFlags.Instance | BindingFlags.Public 
            | BindingFlags.NonPublic, null, new Type[0], null));

if (fnOnLoaded != null)
{
    il.Emit(OpCodes.Dup);
    il.Emit(OpCodes.Callvirt, fnOnLoaded);
}

Look for an OnLoaded method somewhere in the POCO type's inheritance hierarchy. If the method is present, call it on the POCO to signify that the POCO has been loaded.

il.Emit(OpCodes.Ret);

Finally return the POCO complete with the values from the IDataReader.
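
Because the columns are enumerated at generation time, the emitted method is straight-line code with one block per mapped column. For a hypothetical POCO with an int Id and a string Name, my reading is that the generated factory is roughly equivalent to:

using System.Data;

class MyPoco
{
    public int Id { get; set; }
    public string Name { get; set; }
    public void OnLoaded() { /* optional hook */ }
}

static class PocoFactoryExample
{
    public static MyPoco PocoFactory(IDataReader rdr)
    {
        var poco = new MyPoco();           // Newobj

        if (!rdr.IsDBNull(0))              // Brtrue_S lblNext
            poco.Id = rdr.GetInt32(0);     // Dup, valuegetter, setter

        if (!rdr.IsDBNull(1))
            poco.Name = rdr.GetString(1);

        poco.OnLoaded();                   // Dup, Callvirt fnOnLoaded

        return poco;                       // Ret
    }
}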

I would definitely suggest having a look at the PetaPoco source. There are lots of interesting techniques to discover.

Dynamic Method Invocation in C# with PetaPoco (Part 1)

A look at a powerful development technique as used in the PetaPoco Micro ORM

In a previous post I talked about using PetaPoco on a recent project. I mentioned that it uses Dynamic Method Invocation to create a method on the fly to convert an IDataReader record into a POCO.

The technique itself is very powerful in the right situation, and is used to good effect in PetaPoco (and other micro ORMs). If you had to write a method to convert a known IDataReader instance into a known POCO instance it would be straightforward enough. However, if you knew neither the format of the values in the IDataReader nor the type of the POCO, it would be more difficult. You would have to write quite a convoluted method to deal with it, and the conditional logic (and reflection) would make it perform poorly, especially as the method could potentially be called many times to load a dataset. Instead you can examine the IDataReader and the type, and use Dynamic Method Invocation to build a method that works for that exact situation.

There are lots of problems that can be solved by the generalised form of this approach, but there are a couple of reasons I think it is not more widely used: the technique is a little misunderstood and can sometimes feel a bit like magic, and it can look intimidating because the method body is built by emitting .NET Intermediate Language (which has gone through various guises, but is now known as Common Intermediate Language, or CIL for short), which can be outside some developers' comfort zone.

The basic idea is that PetaPoco generates a method to hydrate a POCO at runtime by examining the IDataReader and creating a one-off factory method to perform the conversion, so let's dive in and have a look at what PetaPoco is doing. A useful place to start is by looking at how the dynamically created method will be used.

At this point it is worth mentioning that PetaPoco makes use of the Dynamic Language Runtime introduced as part of .NET 4.0 and the ExpandoObject type. This means that you can return a dynamic ExpandoObject with properties matching the fields returned from the IDataReader. PetaPoco also supports .NET 3.5, without the DLR, via #if directives. I am going to assume that we are using .NET 4.0 and will be looking at the code used to return an ExpandoObject.

This particular usage is from the Query method, which returns an IEnumerable<T>, where T is the type of the POCO you want to return. Inside you will find this:

var factory = pd.GetFactory(cmd.CommandText, _sharedConnection.ConnectionString, 
   ForceDateTimesToUtc, 0, r.FieldCount, r) as Func<IDataReader, T>;
using (r)
{
	while (true)
	{
		T poco;
		try
		{
			if (!r.Read())
				yield break;
			poco = factory(r);
		}
		catch (Exception x)
		{
			OnException(x);
			throw;
		}

		yield return poco;
	}
}

Our first clue is that Query uses the GetFactory method to return a factory, which is then used to return the POCOs for the IEnumerable one at a time using yield return. Notice that the return value from GetFactory is cast to a Func<IDataReader, T>, as GetFactory returns a Delegate.

public Delegate GetFactory(string sql, 
                string connString, bool ForceDateTimesToUtc, 
                int firstColumn, int countColumns, IDataReader r)

The first part of GetFactory does some caching to ensure that the factories are reused as far as possible, then a new DynamicMethod is created:

var m = new DynamicMethod("petapoco_factory_" + PocoFactories.Count.ToString(), 
    type, new Type[] { typeof(IDataReader) }, true);

This particular constructor creates an anonymously hosted dynamic method, which means the dynamic method is associated with an anonymous assembly rather than an existing type. This isolates the dynamic method from the other code and provides some safety for use in partial trust environments. The constructor takes a name, a return type (in this case the POCO type), a Type[] containing the parameter types (just the one, the IDataReader) and a bool that sets restrictedSkipVisibility, allowing the method access to private, protected and internal members of existing types.

So now that we have a new DynamicMethod, we can start to generate the IL. First get the ILGenerator.

var il = m.GetILGenerator();

Now we need to start building up the method body with IL. Everything you can do in C# can be written in IL (after all, C# and the other .NET languages are compiled down to IL), but the IL can be more verbose. IL is also a stack-based language, so operands are pushed onto the stack, then operators pop the operands from the stack, perform an operation and push the result onto the top of the stack.
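
As a toy illustration of that stack-based style (this is not PetaPoco code), here is a complete DynamicMethod that adds two ints:

using System;
using System.Reflection.Emit;

class Program
{
    static void Main()
    {
        var m = new DynamicMethod("add", typeof(int),
                                  new[] { typeof(int), typeof(int) });
        var il = m.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push the first argument
        il.Emit(OpCodes.Ldarg_1); // push the second argument
        il.Emit(OpCodes.Add);     // pop both, push the sum
        il.Emit(OpCodes.Ret);     // return the top of the stack

        var add = (Func<int, int, int>)m.CreateDelegate(typeof(Func<int, int, int>));
        Console.WriteLine(add(2, 3)); // 5
    }
}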

There are three different sections depending on the type to be returned. Although geared towards POCOs, PetaPoco can also happily return an ExpandoObject or a single scalar value.

First up is the section that returns an Expando with properties mirroring the IDataReader row. We will have a look at the highlights and hopefully learn some IL as we go. The first thing is to create the ExpandoObject and place it on the top of the stack (and also the bottom, as the stack is currently empty).

il.Emit(OpCodes.Newobj, typeof(System.Dynamic.ExpandoObject)
                            .GetConstructor(Type.EmptyTypes));

We are going to need to call a method on the Expando at some point to add the property. A MethodInfo is defined to hold the method meta-data for the Add method of the Expando. It will be used later.

MethodInfo fnAdd = typeof(IDictionary<string, object>).GetMethod("Add");

Now we can loop through all of the columns in the IDataReader and add them to the Expando using IL. There is some additional logic to perform data type conversion to support various IDataReader implementations for certain databases. For the sake of brevity I will leave the converter logic out and focus on the main logic. I have left in the comments from the PetaPoco source.

// Enumerate all fields generating a set assignment for the column
for (int i = firstColumn; i < firstColumn + countColumns; i++)
{
	var srcType = r.GetFieldType(i);

	il.Emit(OpCodes.Dup);	                // obj, obj
	il.Emit(OpCodes.Ldstr, r.GetName(i));	// obj, obj, fieldname

First get the type of the IDataRecord field corresponding to column i. Recall that the IL stack already has the Expando we created earlier on it (or, to be more precise, the reference to the object). Create a duplicate of that reference and use Ldstr to push a string reference for the column name.

        // r[i]
	il.Emit(OpCodes.Ldarg_0);	        // obj, obj, fieldname, rdr
	il.Emit(OpCodes.Ldc_I4, i);		// obj, obj, fieldname, rdr,i
	il.Emit(OpCodes.Callvirt, fnGetValue);  // obj, obj, fieldname, value

The next three statements are used to get the value from the IDataReader. Ldarg_0 pushes the argument at position 0 onto the stack, which is the IDataReader. Then Ldc_I4 pushes the int value of i onto the stack. Callvirt is used to call a method on an object. fnGetValue has been defined previously; it is a MethodInfo for IDataRecord.GetValue(i). The object in question is the IDataReader and the argument is i. The result of the method call is left on the top of the stack.

        // Convert DBNull to null
	il.Emit(OpCodes.Dup);  // obj, obj, fieldname, value, value
	il.Emit(OpCodes.Isinst, typeof(DBNull)); // obj, obj, fieldname, value, (value or null)

Call Dup to duplicate the value on the top of the stack. Isinst is used to check whether the object on the top of the stack is an instance of a particular type, in this case DBNull. If the value is a DBNull it is cast to DBNull and pushed onto the top of the stack; if it is not, a null reference is pushed instead. The top of the stack now contains either a reference to the DBNull or a null reference.

        var lblNotNull = il.DefineLabel();
	il.Emit(OpCodes.Brfalse_S, lblNotNull);	 // obj, obj, fieldname, value
	il.Emit(OpCodes.Pop);			 // obj, obj, fieldname
        il.Emit(OpCodes.Ldnull);		 // obj, obj, fieldname, null
        il.MarkLabel(lblNotNull);
        il.Emit(OpCodes.Callvirt, fnAdd);
    }

This section can be a bit confusing, so I will step through it slowly. It defines an if-statement where execution branches depending on a condition. First define a label; this label will be used as the target for the branch. Recall that the top two items on the stack are the value from the IDataReader and either DBNull (if the value is DBNull) or null (if it is not).

The Brfalse_S instruction pops the top item from the stack and checks it for a false-ish value: false, null or zero. If it is false-ish, execution jumps to the label lblNotNull. Remember that a null on the top of the stack means a non-DBNull value, which is now left on the top of the stack. Execution passes to the point marked by MarkLabel and continues with the Callvirt on the fnAdd MethodInfo we defined earlier, which pops values from the stack to use as arguments until it gets to the ExpandoObject, and then calls the method on it. This has the effect of adding a property with the field name of column i and the value from IDataReader column i.

If the branch condition is not false (meaning the value on the stack is DBNull), execution continues without jumping. As DBNull is no use to us in the Expando, it is popped from the stack and Ldnull is used to push null onto the top of the stack. It is this null value that is then used by the Callvirt to add a property with a null value to the Expando.

This set of instructions is repeated for each column in the IDataReader until the stack contains just a reference to the ExpandoObject, which has a property for each column. Finally the Expando object is returned.

il.Emit(OpCodes.Ret);

You may be thinking that the above steps could be done without resorting to building up a dynamic method in IL, and you would be correct; it is trivially easy to add named properties to an ExpandoObject. PetaPoco does it in IL because it is part of a larger area of code that can return an ExpandoObject, a scalar or a POCO depending on what has been asked for.
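
For comparison, here is the trivial C# version of what the IL achieves for the Expando case (a sketch, without PetaPoco's converter logic):

using System;
using System.Collections.Generic;
using System.Data;
using System.Dynamic;

static class ExpandoExample
{
    public static dynamic ReadRow(IDataReader rdr)
    {
        var obj = new ExpandoObject();
        var dict = (IDictionary<string, object>)obj; // the same cast the IL relies on

        for (int i = 0; i < rdr.FieldCount; i++)
        {
            object value = rdr.GetValue(i);
            dict.Add(rdr.GetName(i), value is DBNull ? null : value);
        }

        return obj;
    }
}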

This covers the case when PetaPoco needs to return a dynamic ExpandoObject. Head over to Part 2 to see how PetaPoco goes about returning a scalar value or a typed POCO.

PetaPoco in the wild

A few brief thoughts on my experiences using PetaPoco micro ORM on a ‘real’ project in a production environment.

The problem

I have been involved with several projects that have used Object Relational Mappers, specifically NHibernate and Entity Framework. For these projects the decision to use an ORM was made at the start of the project and the architecture designed accordingly. The project team had full control of the database and the domain of the project was used to drive out the database schema in a code first style. My entire experience (and biases) of ORMs was based on this kind of usage, together with the learning curve that comes with them.

In my latest project the context was very different for a few reasons:

  • The project was a re-engineering of an existing application, which was in fairly bad shape.
  • It featured a hand-rolled data access layer with lots of conflicting and ambiguous helper methods that still required lots of boilerplate to deal with things like mapping the results to POCOs and connection management.
  • A rather limited domain model, often with one class representing multiple domain entities.
  • Very limited control of the databases in use. One in particular appears to be from a time when Hungarian notation was de rigueur and vowels were something to be frowned upon when naming tables and columns. Add to that the spectre of Entity Attribute Value pairs and no primary keys.

After stabilising the code base with sensible separation of concerns and lots of refactoring, I wanted to replace the data access layer. I had already separated the data access into repositories, but the hand-rolled code still left a lot to be desired.

Introducing something like Entity Framework (my current ‘full scale’ ORM of choice) didn’t feel like the right thing to do for a few reasons:

  • First and foremost it would be a lot of work to retrofit EF and would cause a lot of disruption for the project, and could seriously impede forward progress. Although not immense, the learning curve for the team would still take time to overcome.
  • The databases are not ORM friendly in any way. Dozens of poorly named tables, each with dozens (some with hundreds) of poorly named columns.
  • EF might just be a bit too heavyweight. The project isn't huge, and just needs something to take care of the heavy lifting.

Choosing an ORM

I was aware of micro ORMs and their aims through a couple of the more popular examples, Dapper and Massive. Both are notable for their creators. Dapper was created by Sam Saffron (of Stack Overflow) and is used to drive the data access on the Stack Exchange family of sites. Massive was created by Rob Conery of TekPub.com.

Although both were promising, neither seemed quite right. While looking at the performance comparisons of ORMs on Dapper's project page I noticed another micro ORM called PetaPoco by Topten Software, and on closer inspection it appeared to meet most of my vague criteria:

  • It is lightweight. Although not as small as either Dapper or Massive, it is packed with features that help to take the pain out of data access.
  • It can be used with POCOs without the need for attributes or sub-classing.
  • It uses SQL to interact with the database. In light of the project this seemed sensible as it would aid refactoring of the data access code. It would be possible to do a straight replacement of the repository methods reusing the existing SQL with PetaPoco’s functionality, and then expanding the domain model as required. This would allow for minimum disruption.
  • It has a notion of transactions, including nested transactions. This is useful as I had introduced a unit of work to deal with transactions across different service methods (a sketch of the usage follows this list).
  • It had to perform relatively well. The project didn't call for best-in-class performance, it just had to be good enough. But it's good to know that PetaPoco has it covered fairly well.
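
As mentioned in the list above, the transaction scope is pleasantly simple. A sketch (the connection string name, table and SQL are hypothetical):

// Nested scopes only commit when the outermost scope completes
var db = new PetaPoco.Database("connectionStringName");
using (var scope = db.GetTransaction())
{
    db.Execute("UPDATE orders SET status = @0 WHERE id = @1", "Shipped", 42);
    scope.Complete(); // without this, Dispose rolls the transaction back
}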

Getting started

I grabbed the latest version of the PetaPoco core package from NuGet. There is also a version with some T4 templates for generating POCO entities from your database. I didn't need this functionality, and in most cases the database should follow the domain model, not the other way round. One of the big advantages of working code first is keeping the focus on the domain and not on the persistence.

PetaPoco is added to your project as a single .cs file, not as a referenced assembly. This would turn out to be useful later on. Many of the micro ORMs work in a similar way and are fairly straightforward wrappers around IDbCommand. If you have ever created your own DAL using IDbCommand, you have probably been two thirds of the way to a micro ORM.

PetaPoco makes the most of the latest C# features. The heavy lifting of materialising entities from the database into objects is done through Dynamic Method Invocation. When a particular entity is requested, a method is generated on the fly that takes an IDataReader record and returns the required type. The methods are generated by examining the IDataReader's columns and creating a custom mapping from the IDataReader record to the entity. The generation is done by emitting and then executing IL at runtime. It's a deceptively powerful technique when applied to problems like this.

If you aren't afraid of using the dynamic feature of C#, there is support for that too. During the transitional stages of the project I find it useful in situations where the domain model is not fully realised and data may be coming from multiple tables at once. Just execute a select and a dynamic object is returned with the properties from the select. Similar things can be done for inserts and updates.
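
For example (the SQL, table and column names are hypothetical):

// Each row comes back as a dynamic object with a property per column
foreach (var row in db.Query<dynamic>(
    "SELECT o.id, o.total, c.name FROM orders o JOIN customers c ON o.customer_id = c.id"))
{
    Console.WriteLine("{0}: {1} ({2})", row.id, row.total, row.name);
}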

First impressions

My first impressions are overwhelmingly positive, but there are a few things to take into account. In summary:

  • It is very easy to get up and running. Download the package from NuGet, add a connection string and you are good to go in a couple of minutes.
  • The syntax is easy and intuitive, and most things work as you would expect. However, there is currently no support for updates to tables with composite keys, which in my opinion is a serious oversight. I appreciate that PetaPoco is geared towards working with well-designed, clean databases with sensible names and keys, but I think it could easily be used with legacy databases with a few tweaks. Half an hour with the source (I told you it would be useful to have the .cs file) and I had added composite key support for updates. I would hope to see composite key support added at some point.
  • Although I was initially drawn to the POCO support, the syntax is friendlier when using attributes on the entity classes. Indeed, some features, like result-only columns (i.e. not persisted), can only be used through attributes. I found that the more I used PetaPoco, the more comfortable I became with attributes on the classes.

Overall I think that PetaPoco (and micro ORMs in general) are probably enough for a lot of the applications you want to create, especially the type of line-of-business applications that a lot of people spend their time writing. Before my experiences using a micro ORM on a 'real' project I would have always opted for one of the feature-rich, enterprisey ORMs, but now I would investigate whether a micro ORM would work just as well without the weight. They are certainly not just a niche product for hobby projects, and I hope to see more wide-scale use in business environments.

Time will tell…