Monty's Gush

Caching the results of LINQ queries

Posted by: Pete Montgomery on: August 7, 2008

An essential feature of many applications is a caching architecture. What if your query layer could offer an easy way to optionally cache the result of any query issued against it? Ideally, this would

work for any LINQ query (over objects, XML, SQL, Entities…)
work for anonymous type projections, as well as business entities
be statically type safe (no casting required)
deal transparently with cache key creation
look syntactically like a custom query operator

var q = from c in context.Customers
        where c.City == "London"
        select new { c.Name, c.Phone };

var result = q.Take(10).FromCache();

The FromCache extension method offers a nice way to opt in for local result caching on any LINQ query. Here’s the method signature:

/// <summary>
/// Returns the result of the query; if possible from the cache, otherwise
/// the query is materialized and the result cached before being returned.
/// </summary>
public static IEnumerable<T> FromCache<T>(this IQueryable<T> query) { ... }

Note that this works for any IQueryable<T>, and for any T (including anonymous types). The complete source is available at the bottom of the article. There’s an additional overload to allow more sophisticated caching policies to be specified, and of course you could easily add more.

Formulating the cache key

The main challenge is to automatically generate a cache key which is sufficiently unique for the query, using only the query and no additional user-supplied information. Normally, cache key generation logic tends to permeate application code and really clutter it up. Handliy, LINQ queries are self-describing – what “identifies” a query is the expression it represents, and this is completely captured by the query’s expression tree.

Clearly it is possible to generate a textual representation of an expression tree – that’s almost exactly what they were designed for, and exactly how LINQ to SQL works. All we need to do is build a query evaluater which, instead of generating SQL, returns a suitable textual representation of the query.

Luckily the ToString method on the Expression class already does almost exactly this. Look what it produces for this simple predicate expression:

Expression<Func<Customer, bool>> predicate = c => c.Name.StartsWith("Smith");

Console.WriteLine(predicate.ToString());

"c => c.Name.StartsWith(\"Smith\")"

The information contained in the expression tree data structure allows the Expression.ToString implementation to produce a string representation which looks something like the query’s source code. This is a great candidate for our cache key!

Partial evaluation

A subtle problem is that the ToString output won’t be unique enough if the expression contains “local” nodes which haven’t been evaluated yet. In particular, the way the C# and VB compilers implement closures means that local variable values you might expect to be present in the query are actually on the other end of closure references.

string pattern = "Smith";
predicate = c => c.Name.StartsWith(pattern);

Console.WriteLine(predicate.ToString());

"c => c.Name.StartsWith(value(App.Program+<>c__DisplayClass1).pattern)"

Unfortunately, exactly the same string representation would be also produced by a query for “Jones”. This is because the expression now contains a reference to an unevaluated member of a captured variable. This is what Matt Warren calls the closure mess.

Recall that LINQ queries are not evaluated until the last possible opportunity. If we were to build a query provider, we would need to evaluate all such unevaluated local expressions just before we generated our SQL (or whatever). If only we had a function like so:

predicate = Evaluator.PartialEval(predicate)

Such a sample implementation of partial query evaluation is provided by Microsoft in Creating an IQueryable LINQ Provider. Applying the PartialEval function to the expression walks the tree, evaluates locally-evaluable nodes and returns the simplified expression. This neatly gets us back to our original ToString expression representation.

"c => c.Name.StartsWith(\"Smith\")"

An additional complexity is this: For ConstantExpressions, the ToString implementation simply delegates to the underlying wrapped object. We therefore need to be careful about what we allow to be wrapped in a ConstantExpression during partial evaluation. For example, the string representation of a LINQ to SQL IQueryable object is some version of the SQL that it would like to generate. This may not be enough to ensure uniqueness. Therefore we need to supply a slightly more sophisticated rule to PartialEval which determines whether a given node in the query is locally evaluable in order to prevent raw queries being locally evaluated and wrapped in ConstantExpressions by the partial evaluator.

This issue caused a few bug reports when I first released the source code, in particular for TVF functions in LINQ to SQL.

Local collections

Local collection values can be supplied as parameters to query operators such as Contains and Any, if the query provider supports them.

LINQ to SQL and now Entity Framework v4 both support local collection values. However, local collections (such as lists or arrays) are just constant expressions in a query, so special support is needed to ensure that a suitable cache key is created representing each of their elements.

This is implemented by a pass over the expression tree to make appropriate local collection values expand themselves during ToString invocation. Each method call in the expression is examined, and the argument to any parameters of type IEnumerable<> or List<> is wrapped in an object with an implementation of ToString which prints every element in the collection.

Please see this post for more info on LocalCollectionExpander.

Squashing the key

A final problem is that the cache key string could get very big, especially for complicated queries. Unless your query results are really super-sensitive, it’s fine to use an MD5 fingerprint of the key instead. MD5 fingerprints are not guaranteed to be be unique, but it should be computationally infeasible to find two identical fingerprints.

Gotchas

(1) Multiple data sources of the same type

While generating a cache key from the query expression in this manner is pretty elegant and very useful for LINQ-based applications and libraries, you should take care when using more than one query datasource of the same data type.

Constants in the query (including IQueryables) create the same key. Two different sources of type IQueryable<Customer> are not distinguished from each other – the same queries against them will generate identical cache keys. Usually this is what you want – for example, you want to be able to cache the results of a query regardless of which data context instance the query was issued against.

It (theoretically) may not be what you want in some scenarios. If in doubt, double-check that the cache key strings being generated don’t conflict.

(2) Caching and object-relational mapping

ORMs generally have their own internal object manager. You will need to make sure that your query results from any such provider (LINQ to SQL, Entity Framework, etc.) are released from their object manager.

For LINQ-to-SQL, set ObjectTrackingEnabled = false on your DataContext.
For Entity Framework, this can be enforced in the FromCache method itself.
For this, and other data sources, see the todo: in the source code.

(3) It’s a cache

Remember that the normal consequences of caching still apply. Any object reference put into an application-wide cache will be kept alive indeterminately. Once you put something into cache it can (and probably will) be accessed from arbitrary threads. Cached objects should probably be treated as read-only data.

Cache provider

This example implementation of FromCache uses the System.Web.Caching.Cache class. It’s worth noting that this useful class isn’t really specific to ASP.NET, and can be used by any .NET application or library. You’ll just need a reference to the System.Web assembly. (Note that its use outside web applications is not officially supported by Microsoft.)

Alternatively, you could just as easily use this technique with another cache implementation.

In a recent update to the source code I have extracted the public method IQueryable.GetCacheKey which could help emphasise that the meat of the idea is to automatically generate a cache key for a query. You can do whatever you like with the key, such as make a simple FromCache extension method with the ASP.NET cache, or doing something more complicated.

Source code

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Security.Cryptography;
using System.Text;
using System.Web;
using System.Web.Caching;

namespace Monty.Linq
{
    /// <remarks>
    /// Copyright (c) 2010 Pete Montgomery.
    /// https://petemontgomery.wordpress.com
    /// Licenced under GNU LGPL v3.
    /// http://www.gnu.org/licenses/lgpl.html
    /// </remarks>
    public static class QueryResultCache
    {
        /// <summary>
        /// Returns the result of the query; if possible from the cache, otherwise
        /// the query is materialized and the result cached before being returned.
        /// The cache entry has a one minute sliding expiration with normal priority.
        /// </summary>
        public static IEnumerable<T> FromCache<T>(this IQueryable<T> query)
        {
            return query.FromCache(CacheItemPriority.Normal, TimeSpan.FromMinutes(1));
        }

        /// <summary>
        /// Returns the result of the query; if possible from the cache, otherwise
        /// the query is materialized and the result cached before being returned.
        /// </summary>
        public static IEnumerable<T> FromCache<T>(this IQueryable<T> query,
            CacheItemPriority priority,
            TimeSpan slidingExpiration)
        {
            string key = query.GetCacheKey();

            // try to get the query result from the cache
            var result = HttpRuntime.Cache.Get(key) as List<T>;

            if (result == null)
            {
                // todo: ... ensure that the query results do not
                // hold on to resources for your particular data source
                //
                //////// for entity framework queries, set to NoTracking
                //////var entityQuery = query as ObjectQuery<T>;
                //////if (entityQuery != null)
                //////{
                //////    entityQuery.MergeOption = MergeOption.NoTracking;
                //////}

                // materialize the query
                result = query.ToList();

                HttpRuntime.Cache.Insert(
                    key,
                    result,
                    null, // no cache dependency
                    Cache.NoAbsoluteExpiration,
                    slidingExpiration,
                    priority,
                    null); // no removal notification
            }

            return result;
        }

        /// <summary>
        /// Gets a cache key for a query.
        /// </summary>
        public static string GetCacheKey(this IQueryable query)
        {
            var expression = query.Expression;

            // locally evaluate as much of the query as possible
            expression = Evaluator.PartialEval(expression, QueryResultCache.CanBeEvaluatedLocally);

            // support local collections
            expression = LocalCollectionExpander.Rewrite(expression);

            // use the string representation of the expression for the cache key
            string key = expression.ToString();

            // the key is potentially very long, so use an md5 fingerprint
            // (fine if the query result data isn't critically sensitive)
            key = key.ToMd5Fingerprint();

            return key;
        }

        static Func<Expression, bool> CanBeEvaluatedLocally
        {
            get
            {
                return expression =>
                {
                    // don't evaluate parameters
                    if (expression.NodeType == ExpressionType.Parameter)
                        return false;

                    // can't evaluate queries
                    if (typeof(IQueryable).IsAssignableFrom(expression.Type))
                        return false;

                    return true;
                };
            }
        }
    }

    /// <summary>
    /// Enables the partial evaluation of queries.
    /// </summary>
    /// <remarks>
    /// From http://msdn.microsoft.com/en-us/library/bb546158.aspx
    /// Copyright notice http://msdn.microsoft.com/en-gb/cc300389.aspx#O
    /// </remarks>
    public static class Evaluator
    {
        /// <summary>
        /// Performs evaluation & replacement of independent sub-trees
        /// </summary>
        /// <param name="expression">The root of the expression tree.</param>
        /// <param name="fnCanBeEvaluated">A function that decides whether a given expression node can be part of the local function.</param>
        /// <returns>A new tree with sub-trees evaluated and replaced.</returns>
        public static Expression PartialEval(Expression expression, Func<Expression, bool> fnCanBeEvaluated)
        {
            return new SubtreeEvaluator(new Nominator(fnCanBeEvaluated).Nominate(expression)).Eval(expression);
        }

        /// <summary>
        /// Performs evaluation & replacement of independent sub-trees
        /// </summary>
        /// <param name="expression">The root of the expression tree.</param>
        /// <returns>A new tree with sub-trees evaluated and replaced.</returns>
        public static Expression PartialEval(Expression expression)
        {
            return PartialEval(expression, Evaluator.CanBeEvaluatedLocally);
        }

        private static bool CanBeEvaluatedLocally(Expression expression)
        {
            return expression.NodeType != ExpressionType.Parameter;
        }

        /// <summary>
        /// Evaluates & replaces sub-trees when first candidate is reached (top-down)
        /// </summary>
        class SubtreeEvaluator : ExpressionVisitor
        {
            HashSet<Expression> candidates;

            internal SubtreeEvaluator(HashSet<Expression> candidates)
            {
                this.candidates = candidates;
            }

            internal Expression Eval(Expression exp)
            {
                return this.Visit(exp);
            }

            public override Expression Visit(Expression exp)
            {
                if (exp == null)
                {
                    return null;
                }
                if (this.candidates.Contains(exp))
                {
                    return this.Evaluate(exp);
                }
                return base.Visit(exp);
            }

            private Expression Evaluate(Expression e)
            {
                if (e.NodeType == ExpressionType.Constant)
                {
                    return e;
                }
                LambdaExpression lambda = Expression.Lambda(e);
                Delegate fn = lambda.Compile();
                return Expression.Constant(fn.DynamicInvoke(null), e.Type);
            }

        }

        /// <summary>
        /// Performs bottom-up analysis to determine which nodes can possibly
        /// be part of an evaluated sub-tree.
        /// </summary>
        class Nominator : ExpressionVisitor
        {
            Func<Expression, bool> fnCanBeEvaluated;
            HashSet<Expression> candidates;
            bool cannotBeEvaluated;

            internal Nominator(Func<Expression, bool> fnCanBeEvaluated)
            {
                this.fnCanBeEvaluated = fnCanBeEvaluated;
            }

            internal HashSet<Expression> Nominate(Expression expression)
            {
                this.candidates = new HashSet<Expression>();
                this.Visit(expression);
                return this.candidates;
            }

            public override Expression Visit(Expression expression)
            {
                if (expression != null)
                {
                    bool saveCannotBeEvaluated = this.cannotBeEvaluated;
                    this.cannotBeEvaluated = false;
                    base.Visit(expression);
                    if (!this.cannotBeEvaluated)
                    {
                        if (this.fnCanBeEvaluated(expression))
                        {
                            this.candidates.Add(expression);
                        }
                        else
                        {
                            this.cannotBeEvaluated = true;
                        }
                    }
                    this.cannotBeEvaluated |= saveCannotBeEvaluated;
                }
                return expression;
            }
        }
    }

    /// <summary>
    /// Enables cache key support for local collection values.
    /// </summary>
    public class LocalCollectionExpander : ExpressionVisitor
    {
        public static Expression Rewrite(Expression expression)
        {
            return new LocalCollectionExpander().Visit(expression);
        }

        protected override Expression VisitMethodCall(MethodCallExpression node)
        {
            // pair the method's parameter types with its arguments
            var map = node.Method.GetParameters()
                .Zip(node.Arguments, (p, a) => new { Param = p.ParameterType, Arg = a })
                .ToLinkedList();

            // deal with instance methods
            var instanceType = node.Object == null ? null : node.Object.Type;
            map.AddFirst(new { Param = instanceType, Arg = node.Object });

            // for any local collection parameters in the method, make a
            // replacement argument which will print its elements
            var replacements = (from x in map
                                where x.Param != null && x.Param.IsGenericType
                                let g = x.Param.GetGenericTypeDefinition()
                                where g == typeof(IEnumerable<>) || g == typeof(List<>)
                                where x.Arg.NodeType == ExpressionType.Constant
                                let elementType = x.Param.GetGenericArguments().Single()
                                let printer = MakePrinter((ConstantExpression) x.Arg, elementType)
                                select new { x.Arg, Replacement = printer }).ToList();

            if (replacements.Any())
            {
                var args = map.Select(x => (from r in replacements
                                            where r.Arg == x.Arg
                                            select r.Replacement).SingleOrDefault() ?? x.Arg).ToList();

                node = node.Update(args.First(), args.Skip(1));
            }

            return base.VisitMethodCall(node);
        }

        ConstantExpression MakePrinter(ConstantExpression enumerable, Type elementType)
        {
            var value = (IEnumerable) enumerable.Value;
            var printerType = typeof(Printer<>).MakeGenericType(elementType);
            var printer = Activator.CreateInstance(printerType, value);

            return Expression.Constant(printer);
        }

        /// <summary>
        /// Overrides ToString to print each element of a collection.
        /// </summary>
        /// <remarks>
        /// Inherits List in order to support List.Contains instance method as well
        /// as standard Enumerable.Contains/Any extension methods.
        /// </remarks>
        class Printer<T> : List<T>
        {
            public Printer(IEnumerable collection)
            {
                this.AddRange(collection.Cast<T>());
            }

            public override string ToString()
            {
                return "{" + this.ToConcatenatedString(t => t.ToString(), "|") + "}";
            }
        }
    }

    public static class Utility
    {
        /// <summary>
        /// Creates an MD5 fingerprint of the string.
        /// </summary>
        public static string ToMd5Fingerprint(this string s)
        {
            var bytes = Encoding.Unicode.GetBytes(s.ToCharArray());
            var hash = new MD5CryptoServiceProvider().ComputeHash(bytes);

            // concat the hash bytes into one long string
            return hash.Aggregate(new StringBuilder(32),
                (sb, b) => sb.Append(b.ToString("X2")))
                .ToString();
        }

        public static string ToConcatenatedString<T>(this IEnumerable<T> source, Func<T, string> selector, string separator)
        {
            var b = new StringBuilder();
            bool needSeparator = false;

            foreach (var item in source)
            {
                if (needSeparator)
                    b.Append(separator);

                b.Append(selector(item));
                needSeparator = true;
            }

            return b.ToString();
        }

        public static LinkedList<T> ToLinkedList<T>(this IEnumerable<T> source)
        {
            return new LinkedList<T>(source);
        }
    }
}

The source code now assumes .NET 4 and above. If you need to run on .NET 3.5, you can use the MSDN ExpressionVisitor and the framework methods below.

    /// <summary>
    /// These methods are built-in to .NET 4 and above.
    /// </summary>
    public static class Framework35Utility
    {
        public static IEnumerable<R> Zip<T, U, R>(
            this IEnumerable<T> first, 
            IEnumerable<U> second, 
            Func<T, U, R> selector)
        {
            using (var e1 = first.GetEnumerator())
            using (var e2 = second.GetEnumerator())
            {
                while (e1.MoveNext() && e2.MoveNext())
                    yield return selector(e1.Current, e2.Current);                
            }
        }
        
        public static MethodCallExpression Update(
            this MethodCallExpression source,
            Expression @object,
            IEnumerable<Expression> arguments)
        {
            if (@object == source.Object && arguments.SequenceEqual(source.Arguments))
            {
                return source;
            }

            return Expression.Call(@object, source.Method, arguments);
        }
    }

Tags: C#, LINQ

Don’t read this

Posted by: Pete Montgomery on: April 11, 2008

In: Uncategorized
3 Comments

My first post.

static void Main()
{
    Console.WriteLine("Hello world.");
}

Hope this works!

Newer Entries

	C# PredicateBuilder… on A universal PredicateBuilder
	How do you implement… on Caching the results of LINQ…
	c# - Entity Framewor… on A universal PredicateBuilder
	Check if any value e… on A universal PredicateBuilder
	ADO.NET EF 4.2 中的查询缓… on Caching the results of LINQ…
	Cоздание переиспольз… on A universal PredicateBuilder
	Cоздание переиспольз… on A universal PredicateBuilder
	Tsahi Asher on A universal PredicateBuilder
	Morpheus on Caching the results of LINQ…
	Grover Preas on Caching the results of LINQ…
	Nichelle Cresencio on Caching the results of LINQ…
	Gabriel T Semonis on A universal PredicateBuilder

Monty’s Gush