Improving Entity Framework performance by tracking changes manually

Introduction

While working on a recent personal project, I had the not-too-unfamiliar pleasure of fighting Entity Framework to try and get some half-decent performance out of a console application that I was using to bulk add and update records based on some external data feeds.

It’s often said that EF isn’t really geared towards bulk database operations and that other approaches should be considered instead (ADO.NET, Integration Services etc.), but there is naturally a trade-off between time spent implementing the solution and the performance gained. Most people (myself included) choose EF not only because it’s familiar (Integration Services? I’m no DBA!), but because they’re already using it elsewhere in the solution – in my case an MVC web application. It seems awfully like re-inventing the wheel to go back to ADO.NET when, with some tweaking, Entity Framework can be made to run reasonably quickly if you know how to fine-tune your implementation.

The following article should provide some useful guidance for anyone needing to add or update large-ish data sets using Entity Framework. I’m using EF 6 but I believe the principles described are applicable to all recent versions.

How not to do it

Let’s start by examining the typical approach you would take if starting such a project from scratch. I’ll assume you’re already set up with EF and have a DbContext instance ready to add and load/update data with. In my scenario I’m taking a source data set (in this case loaded from a large XML file), looping through the element collection and creating an instance of an entity class using data extracted from each node. I then check whether a matching entity already exists in my database (based on some key properties of the entity); if it does, I grab it and update it, otherwise I insert the entity as new and save it away. In pseudo-code the process looks like this:

Load data from XML file into memory.
Select elements from XML.
    For each element in selected elements:
        Create entity from element nodes/attributes.
        Check if matching entity exists on DbContext entity set.
        If match found update entity properties, else add entity to DbContext.
    End For
Save changes.
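Translated into a deliberately naive C# sketch, the pseudo-code above might look something like this. Note that MyDbContext, MyEntity, its Key/Name properties and the XML shape are illustrative assumptions on my part, not code from the actual project:

```csharp
using System.Linq;
using System.Xml.Linq;

// Naive first attempt: one context for the whole run, assuming a feed of
// <record key="..." name="..."/> elements and a hypothetical MyEntity class
// with string Key and Name properties.
using (var context = new MyDbContext())
{
    var document = XDocument.Load("feed.xml");

    foreach (var element in document.Descendants("record"))
    {
        var key = (string)element.Attribute("key");

        // A speculative SELECT on every iteration.
        var entity = context.MyEntities.SingleOrDefault(x => x.Key == key);

        if (entity == null)
        {
            entity = new MyEntity { Key = key };
            context.MyEntities.Add(entity);
        }

        entity.Name = (string)element.Attribute("name");

        // Saving inside the loop; this is where the slow-down creeps in.
        context.SaveChanges();
    }
}
```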

In this example you would probably start by creating an instance of your DbContext up-front and using the same instance the whole way through. This will certainly work, but as I discovered it’s anything but fast. More interestingly, once I added console output so I could watch each record being processed, it became obvious that the process gets slower and slower as each successive record is saved. Why is this?

The answer lies in EF’s change tracking capabilities: whenever you call SaveChanges on the DbContext, Entity Framework first needs to work out exactly what it has to do in order to synchronise your local object instances with what’s currently stored in the database. I won’t go into detail about how this process works (there are plenty of other articles online about it – we’re more interested in how to improve the situation), but it’s fairly obvious that the more objects the DbContext is aware of, the more work it has to do each time you try to save changes. Armed with this knowledge we can move forward and try to improve matters by changing the way we interact with the DbContext.

Beware of multiple DbContext instances

Now the first and perhaps most obvious solution to rectify our earlier problem is simply creating a new DbContext instance inside our main execution loop:

// Load XML

foreach (var element in xmlElements)
{
    var context = new MyDbContext();

    // Rest of code
    context.SaveChanges();
}

Now on first inspection it might seem like this has solved all our problems – it’s certainly consistently faster than the first attempt, due in no small part to the fact that creating new instances of DbContext objects was intentionally made as cheap (i.e. fast) as possible. But when our example becomes a little more complicated we can see this approach has a major flaw which may not even be noticeable at first.
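One side note on the snippet above: DbContext implements IDisposable, so if you do create short-lived contexts like this, each one should really be wrapped in a using block so it is disposed deterministically (MyDbContext remains an illustrative name):

```csharp
foreach (var element in xmlElements)
{
    // Each short-lived context is disposed at the end of the iteration
    // instead of lingering until the garbage collector gets to it.
    using (var context = new MyDbContext())
    {
        // Rest of code...
        context.SaveChanges();
    }
}
```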

Let’s suppose we want to speed things up a bit by pre-loading our table of existing entities from the database, so we don’t incur the cost of performing a speculative SELECT for each iteration of our loop. We can easily grab the entire list of entities and store them in a list or array:

// Create a context instance
var myDbContext = new MyDbContext();

// Preload existing data
var existingEntities = myDbContext.MyEntitySet.ToList();

But what happens now when we combine this with our new solution and we need to update an existing entity and save the changes back to the data store?

// Preload data
var context = new MyDbContext();
var existingEntities = context.MyEntities.ToList();

// Load XML...

foreach (var element in xmlElements)
{
    context = new MyDbContext();

    if (existingEntities.Any(x => x.Key == element.Value))
    {
        var entity = existingEntities.Single(x => x.Key == element.Value);

        // Update entity properties with data from xml...
    }
    else
    {
        // Code to handle new entities...
    }

    context.SaveChanges();
}

At first I thought I’d found the Holy Grail of solutions as this seemed to execute MUCH faster, but when I checked my database table I found that none of my records had actually been updated! Of course anyone familiar with EF will see the problem straight away: as soon as we overwrite the context object with a new instance of MyDbContext Entity Framework loses track of all the existing entities we loaded at the start of our method. When we update the properties and call SaveChanges(), nothing actually happens as the current context instance has no idea the entity even exists!
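You can verify this by inspecting the change tracker; a brand-new context simply isn’t tracking the entities we loaded earlier. (The diagnostic below is my own addition for illustration, not part of the original code.)

```csharp
using System;
using System.Linq;

// A freshly created context tracks no entities at all, so updating
// properties on objects loaded through an *earlier* context instance
// gives SaveChanges() nothing to persist.
using (var freshContext = new MyDbContext())
{
    Console.WriteLine(freshContext.ChangeTracker.Entries().Count());
    // 0 - none of our previously loaded entities are attached here.
}
```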

Where do we go now

Now we have a sticky problem: we can’t create new contexts each time we loop through our source data set or we lose track of all entities loaded via that context instance, and it appears we can’t use the same context all the way through without the incremental slow-down we saw first off… or can we?

The answer lies in the change tracking we discussed at the start of the article. If we allow EF to manage change tracking, it wastes a lot of time checking for modifications to objects that we know won’t have changed at all. By taking control of this ourselves, we can get the performance boost of pre-loading our existing data set and still have EF update our entities in a fast and efficient manner. Here’s how we do it:

var context = new MyDbContext();

// Disable automatic change tracking
context.Configuration.AutoDetectChangesEnabled = false;

// Preload existing data
var existingEntities = context.MyEntities.ToList();

// Load XML...

foreach (var element in xmlElements)
{
    if (existingEntities.Any(x => x.Key == element.Value))
    {
        var entity = existingEntities.Single(x => x.Key == element.Value);

        // Update entity properties with data from xml...

        // Tell EF our entity has been modified
        context.Entry(entity).State = EntityState.Modified;
    }
    else
    {
        // Code to handle new entities...
    }

    context.SaveChanges(); 
}

The important aspects here are in these two lines:

context.Configuration.AutoDetectChangesEnabled = false;

This tells EF that we don’t want it to automatically check all attached entities for modifications each time we call SaveChanges().

context.Entry(entity).State = EntityState.Modified;

This tells EF that this particular entity is ‘dirty’, that is, it has been modified and changes must be persisted the next time SaveChanges() is called.
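One caveat worth knowing: with automatic detection switched off, nothing that relies on DetectChanges fires implicitly any more, but EF 6 still lets you run the scan explicitly whenever you need an accurate picture of what has changed:

```csharp
// With AutoDetectChangesEnabled = false, snapshot comparison no longer
// runs automatically; call it by hand if you ever need EF to work out
// modifications itself (for example before inspecting entry states).
context.ChangeTracker.DetectChanges();
context.SaveChanges();
```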

Summary

By disabling automatic change tracking within Entity Framework and managing object state ourselves, we can achieve usable performance levels in situations where many records are being processed as part of a batch or long-running operation.

Note that in most (if not all) cases, ADO.NET would still offer better raw performance, but at the expense of additional coding time spent writing stored procedures or parameterised insert/update statements, not to mention the headache of maintaining these additional pieces each time our underlying structure changes. With EF migrations we can easily make changes to our database schema and continue to use the existing code with little or no interruption.

Caution

It should be noted that pre-loading an entire entity set is not a general recommendation: keeping all of these objects in memory for the duration of the program carries an obvious overhead. My example is somewhat contrived in order to illustrate the common pitfall of re-creating DbContexts and assuming they’ll magically know about entities we loaded elsewhere.

One thought on “Improving Entity Framework performance by tracking changes manually”

  1. Hey, nice post…

    I use this method to also specify the exact properties that have changed.
    This way the generated SQL is even lighter, as it only sends the columns that it knows have changed.


    public void Update(TModel entity, params Expression<Func<TModel, object>>[] properties)
    {
        _Context.Attach(ref entity);
        var entry = _Context.Entry(entity);
        var newValues = entry.CurrentValues.Clone();
        entry.State = System.Data.EntityState.Unchanged;

        foreach (var prop in properties)
        {
            entry.Property(prop).CurrentValue = newValues[entry.Property(prop).Name];
            entry.Property(prop).IsModified = true;
        }
    }

    I added my own Attach method to my DbContext so I don’t even need to load the entity up from SQL: as long as I set the Id on the entity and tell it which properties changed, I can call SaveChanges.


    public void Attach<T>(ref T entity) where T : class
    {
        ObjectStateEntry entry;
        bool attach = false;

        if (ObjectContext.ObjectStateManager.TryGetObjectStateEntry(
            ObjectContext.CreateEntityKey(GetSetName(entity.GetType().BaseType), entity), out entry))
        {
            attach = entry.State == EntityState.Detached;
            entity = (T)entry.Entity;
        }
        else
        {
            attach = true;
        }

        if (attach)
            Set<T>().Attach(entity);
    }

    public string GetSetName(Type entityType)
    {
        return ObjectContext.
            MetadataWorkspace.
            GetEntityContainer(ObjectContext.DefaultContainerName, DataSpace.CSpace).
            BaseEntitySets.
            First(x => x.ElementType.Name == entityType.Name).
            Name;
    }
