Monty’s Gush

I still find it remarkable how syntax alone can alter the way I conceptualise a problem.

Here’s a useful helper for pulling the next n elements out of an enumerator. Think of it as a generalisation of MoveNext. I’ve found it handy for implementing certain IEnumerable extensions.

public static class EnumeratorExtensions
{
    /// <summary>
    /// Returns a specified number of contiguous elements
    /// from the enumerator.
    /// </summary>
    public static IEnumerable<T> Next<T>(
        this IEnumerator<T> source,
        int count)
    {
        int counter = 0;

        while (counter < count)
        {
            if (source.MoveNext())
                yield return source.Current;
            else
                break;

            counter++;
        }
    }
}

kick it on DotNetKicks.com

One handy function in the F# Seq module is Seq.pairwise.

[ 1 ; 2 ; 3 ; 4 ; 5] |> Seq.pairwise

It produces a sequence of, well, pairs:

val it : seq<int * int> = seq [(1, 2); (2, 3); (3, 4); (4, 5)]

Its main use is comparing each value in a sequence with its successor. One example might be analysing a web server log for suspicious crawlers.

Obviously you’ll need your own Pair class, which I’ll leave to you, reader (at least until we get tuples in .NET 4). No argument checking, but otherwise I think complete. Enjoy!


public static class EnumerableExtensions
{
    /// <summary>
    /// Returns a sequence of each element in the input sequence and its
    /// predecessor, with the exception of the first element which is only
    /// returned as the predecessor of the second element.
    /// </summary>
    public static IEnumerable<Pair<T, T>> Pairwise<T>(
        this IEnumerable<T> source)
    {
        T previous = default(T);

        using (var it = source.GetEnumerator())
        {
            // skip the first element
            if (it.MoveNext())
                previous = it.Current;

            while (it.MoveNext())
            {
                yield return new Pair<T, T>
                {
                    First  = previous,
                    Second = it.Current
                };

                previous = it.Current;
            }
        }
    }
}
Update .NET 4 introduced the Enumerable.Zip method into the framework which can be used to easily create a pairwise sequence in C#.

ts.Zip(us.Skip(1), (t, u) => new { t, u});

Tags:

kick it on DotNetKicks.com

In contrast with its more famous sibling System.StringBuilder, the System.UriBuilder class goes mostly unnoticed and unloved.

This is unfortunate, since although the URI is just a string, it’s restricted by detailed specification to a limited subset of all possible strings. This is why we have framework and library code to help us out. Think how many bugs and security errors have been caused by concatenating general-purpose strings!

The .NET UriBuilder class provides additional safety but inexplicably lacks support for one of the most common reasons for concatenating strings to form a URL – manipulating query strings.

In particular, you often find that you would like to set a query key/value pair on an existing URL. You want to add it if it doesn’t already exist (but change it if it does) and leave the rest of the URL intact.

A convenient API can be grafted with a couple of extension methods, GetQueryParams and SetQueryParam.

var uri = new UriBuilder("http://blah.com");

Assert.True( uri.GetQueryParams().Count() == 0 );
uri.SetQueryParam("sortBy", "price");

Assert.True( uri.GetQueryParams().Count() == 1 );
Assert.True( uri.Query.EndsWith("?sortBy=price") );
uri.SetQueryParam("page", "3");

Assert.True( uri.GetQueryParams().Count() == 2 );
Assert.True( uri.Query.Contains("page=3") );
uri.SetQueryParam("page", "4");

Assert.True( uri.GetQueryParams().Count() == 2 );
Assert.True( uri.Query.Contains("page=4") );

Here’s the full source.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Web;
using System.Collections.Specialized;

// ...

    public static class UriBuilderExtensions
    {
        /// <summary>
        /// Sets the specified query parameter key-value pair of the URI.
        /// If the key already exists, the value is overwritten.
        /// </summary>
        public static UriBuilder SetQueryParam(this UriBuilder uri, string key, string value)
        {
            var collection = uri.ParseQuery();

            // add (or replace existing) key-value pair
            collection.Set(key, value);

            string query = collection
                .AsKeyValuePairs()
                .ToConcatenatedString(pair =>
                    pair.Key == null
                    ? pair.Value
                    : pair.Key + "=" + pair.Value, "&");

            uri.Query = query;

            return uri;
        }

        /// <summary>
        /// Gets the query string key-value pairs of the URI.
        /// Note that the one of the keys may be null ("?123") and
        /// that one of the keys may be an empty string ("?=123").
        /// </summary>
        public static IEnumerable<KeyValuePair<string, string>> GetQueryParams(
            this UriBuilder uri)
        {
            return uri.ParseQuery().AsKeyValuePairs();
        }

        /// <summary>
        /// Converts the legacy NameValueCollection into a strongly-typed KeyValuePair sequence.
        /// </summary>
        static IEnumerable<KeyValuePair<string, string>> AsKeyValuePairs(this NameValueCollection collection)
        {
            foreach (string key in collection.AllKeys)
            {
                yield return new KeyValuePair<string, string>(key, collection.Get(key));
            }
        }

        /// <summary>
        /// Parses the query string of the URI into a NameValueCollection.
        /// </summary>
        static NameValueCollection ParseQuery(this UriBuilder uri)
        {
            return HttpUtility.ParseQueryString(uri.Query);
        }
    }

The general-purpose folding function ToConcatenatedString is used in the implementation. Here it is:


	public static class EnumerableExtensions
	{
		/// <summary>
		/// Creates a string from the sequence by concatenating the result
		/// of the specified string selector function for each element.
		/// </summary>
		public static string ToConcatenatedString<T>(this IEnumerable<T> source,
			Func<T, string> stringSelector)
		{
			return source.ToConcatenatedString(stringSelector, String.Empty);
		}

		/// <summary>
		/// Creates a string from the sequence by concatenating the result
		/// of the specified string selector function for each element.
		/// </summary>
		///<param name="separator">The string which separates each concatenated item.</param>
		public static string ToConcatenatedString<T>(this IEnumerable<T> source,
			Func<T, string> stringSelector,
			string separator)
		{
			var b = new StringBuilder();
			bool needsSeparator = false; // don't use for first item

			foreach (var item in source)
			{
				if (needsSeparator)
					b.Append(separator);

				b.Append(stringSelector(item));
				needsSeparator = true;
			}

			return b.ToString();
		}
	}

An essential feature of many applications is a caching architecture. What if your query layer could offer an easy way to optionally cache the result of any query issued against it? Ideally, this would

  • work for any LINQ query (over objects, XML, SQL, Entities…)
  • work for anonymous type projections, as well as business entities
  • be statically type safe (no casting required)
  • deal transparently with cache key creation
  • look syntactically like a custom query operator
var q = from c in context.Customers
        where c.City == "London"
        select new { c.Name, c.Phone };

var result = q.Take(10).FromCache();

The FromCache extension method offers a nice way to opt in for local result caching on any LINQ query. Here’s the method signature:

/// <summary>
/// Returns the result of the query; if possible from the cache, otherwise
/// the query is materialized and the result cached before being returned.
/// </summary>
public static IEnumerable<T> FromCache<T>(this IQueryable<T> query) { ... }

Note that this works for any IQueryable<T>, and for any T (including anonymous types). The complete source is available at the bottom of the article. There’s an additional overload to allow more sophisticated caching policies to be specified, and of course you could easily add more.

Formulating the cache key

The main challenge is to automatically generate a cache key which is sufficiently unique for the query, using only the query and no additional user-supplied information. Normally, cache key generation logic tends to permeate application code and really clutter it up. Handliy, LINQ queries are self-describing – what “identifies” a query is the expression it represents, and this is completely captured by the query’s expression tree.

Clearly it is possible to generate a textual representation of an expression tree – that’s almost exactly what they were designed for, and exactly how LINQ to SQL works. All we need to do is build a query evaluater which, instead of generating SQL, returns a suitable textual representation of the query.

Luckily the ToString method on the Expression class already does almost exactly this. Look what it produces for this simple predicate expression:

Expression<Func<Customer, bool>> predicate = c => c.Name.StartsWith("Smith");

Console.WriteLine(predicate.ToString());
"c => c.Name.StartsWith(\"Smith\")"

The information contained in the expression tree data structure allows the Expression.ToString implementation to produce a string representation which looks something like the query’s source code. This is a great candidate for our cache key!

Partial evaluation

A subtle problem is that the ToString output won’t be unique enough if the expression contains “local” nodes which haven’t been evaluated yet. In particular, the way the C# and VB compilers implement closures means that local variable values you might expect to be present in the query are actually on the other end of closure references.

string pattern = "Smith";
predicate = c => c.Name.StartsWith(pattern);

Console.WriteLine(predicate.ToString());
"c => c.Name.StartsWith(value(App.Program+<>c__DisplayClass1).pattern)"

Unfortunately, exactly the same string representation would be also produced by a query for “Jones”. This is because the expression now contains a reference to an unevaluated member of a captured variable. This is what Matt Warren calls the closure mess.

Recall that LINQ queries are not evaluated until the last possible opportunity. If we were to build a query provider, we would need to evaluate all such unevaluated local expressions just before we generated our SQL (or whatever). If only we had a function like so:

predicate = Evaluator.PartialEval(predicate)

Such a sample implementation of partial query evaluation is provided by Microsoft in Creating an IQueryable LINQ Provider. Applying the PartialEval function to the expression walks the tree, evaluates locally-evaluable nodes and returns the simplified expression. This neatly gets us back to our original ToString expression representation.

"c => c.Name.StartsWith(\"Smith\")"

An additional complexity is this: For ConstantExpressions, the ToString implementation simply delegates to the underlying wrapped object. We therefore need to be careful about what we allow to be wrapped in a ConstantExpression during partial evaluation. For example, the string representation of a LINQ to SQL IQueryable object is some version of the SQL that it would like to generate. This may not be enough to ensure uniqueness. Therefore we need to supply a slightly more sophisticated rule to PartialEval which determines whether a given node in the query is locally evaluable in order to prevent raw queries being locally evaluated and wrapped in ConstantExpressions by the partial evaluator.

This issue caused a few bug reports when I first released the source code, in particular for TVF functions in LINQ to SQL.

Local collections

Local collection values can be supplied as parameters to query operators such as Contains and Any, if the query provider supports them.

LINQ to SQL and now Entity Framework v4 both support local collection values. However, local collections (such as lists or arrays) are just constant expressions in a query, so special support is needed to ensure that a suitable cache key is created representing each of their elements.

This is implemented by a pass over the expression tree to make appropriate local collection values expand themselves during ToString invocation. Each method call in the expression is examined, and the argument to any parameters of type IEnumerable<> or List<> is wrapped in an object with an implementation of ToString which prints every element in the collection.

Please see this post for more info on LocalCollectionExpander.

Squashing the key

A final problem is that the cache key string could get very big, especially for complicated queries. Unless your query results are really super-sensitive, it’s fine to use an MD5 fingerprint of the key instead. MD5 fingerprints are not guaranteed to be be unique, but it should be computationally infeasible to find two identical fingerprints.

Gotchas

(1) Multiple data sources of the same type

While generating a cache key from the query expression in this manner is pretty elegant and very useful for LINQ-based applications and libraries, you should take care when using more than one query datasource of the same data type.

Constants in the query (including IQueryables) create the same key. Two different sources of type IQueryable<Customer> are not distinguished from each other – the same queries against them will generate identical cache keys. Usually this is what you want – for example, you want to be able to cache the results of a query regardless of which data context instance the query was issued against.

It (theoretically) may not be what you want in some scenarios. If in doubt, double-check that the cache key strings being generated don’t conflict.

(2) Caching and object-relational mapping

ORMs generally have their own internal object manager. You will need to make sure that your query results from any such provider (LINQ to SQL, Entity Framework, etc.) are released from their object manager.

  • For LINQ-to-SQL, set ObjectTrackingEnabled = false on your DataContext.
  • For Entity Framework, this can be enforced in the FromCache method itself.
  • For this, and other data sources, see the todo: in the source code.

(3) It’s a cache

Remember that the normal consequences of caching still apply. Any object reference put into an application-wide cache will be kept alive indeterminately. Once you put something into cache it can (and probably will) be accessed from arbitrary threads. Cached objects should probably be treated as read-only data.

Cache provider

This example implementation of FromCache uses the System.Web.Caching.Cache class. It’s worth noting that this useful class isn’t really specific to ASP.NET, and can be used by any .NET application or library. You’ll just need a reference to the System.Web assembly. (Note that its use outside web applications is not officially supported by Microsoft.)

Alternatively, you could just as easily use this technique with another cache implementation.

In a recent update to the source code I have extracted the public method IQueryable.GetCacheKey which could help emphasise that the meat of the idea is to automatically generate a cache key for a query. You can do whatever you like with the key, such as make a simple FromCache extension method with the ASP.NET cache, or doing something more complicated.

Source code

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Security.Cryptography;
using System.Text;
using System.Web;
using System.Web.Caching;

namespace Monty.Linq
{
    /// <remarks>
    /// Copyright (c) 2010 Pete Montgomery.
    /// https://petemontgomery.wordpress.com
    /// Licenced under GNU LGPL v3.
    /// http://www.gnu.org/licenses/lgpl.html
    /// </remarks>
    public static class QueryResultCache
    {
        /// <summary>
        /// Returns the result of the query; if possible from the cache, otherwise
        /// the query is materialized and the result cached before being returned.
        /// The cache entry has a one minute sliding expiration with normal priority.
        /// </summary>
        public static IEnumerable<T> FromCache<T>(this IQueryable<T> query)
        {
            return query.FromCache(CacheItemPriority.Normal, TimeSpan.FromMinutes(1));
        }

        /// <summary>
        /// Returns the result of the query; if possible from the cache, otherwise
        /// the query is materialized and the result cached before being returned.
        /// </summary>
        public static IEnumerable<T> FromCache<T>(this IQueryable<T> query,
            CacheItemPriority priority,
            TimeSpan slidingExpiration)
        {
            string key = query.GetCacheKey();

            // try to get the query result from the cache
            var result = HttpRuntime.Cache.Get(key) as List<T>;

            if (result == null)
            {
                // todo: ... ensure that the query results do not
                // hold on to resources for your particular data source
                //
                //////// for entity framework queries, set to NoTracking
                //////var entityQuery = query as ObjectQuery<T>;
                //////if (entityQuery != null)
                //////{
                //////    entityQuery.MergeOption = MergeOption.NoTracking;
                //////}

                // materialize the query
                result = query.ToList();

                HttpRuntime.Cache.Insert(
                    key,
                    result,
                    null, // no cache dependency
                    Cache.NoAbsoluteExpiration,
                    slidingExpiration,
                    priority,
                    null); // no removal notification
            }

            return result;
        }

        /// <summary>
        /// Gets a cache key for a query.
        /// </summary>
        public static string GetCacheKey(this IQueryable query)
        {
            var expression = query.Expression;

            // locally evaluate as much of the query as possible
            expression = Evaluator.PartialEval(expression, QueryResultCache.CanBeEvaluatedLocally);

            // support local collections
            expression = LocalCollectionExpander.Rewrite(expression);

            // use the string representation of the expression for the cache key
            string key = expression.ToString();

            // the key is potentially very long, so use an md5 fingerprint
            // (fine if the query result data isn't critically sensitive)
            key = key.ToMd5Fingerprint();

            return key;
        }

        static Func<Expression, bool> CanBeEvaluatedLocally
        {
            get
            {
                return expression =>
                {
                    // don't evaluate parameters
                    if (expression.NodeType == ExpressionType.Parameter)
                        return false;

                    // can't evaluate queries
                    if (typeof(IQueryable).IsAssignableFrom(expression.Type))
                        return false;

                    return true;
                };
            }
        }
    }

    /// <summary>
    /// Enables the partial evaluation of queries.
    /// </summary>
    /// <remarks>
    /// From http://msdn.microsoft.com/en-us/library/bb546158.aspx
    /// Copyright notice http://msdn.microsoft.com/en-gb/cc300389.aspx#O
    /// </remarks>
    public static class Evaluator
    {
        /// <summary>
        /// Performs evaluation & replacement of independent sub-trees
        /// </summary>
        /// <param name="expression">The root of the expression tree.</param>
        /// <param name="fnCanBeEvaluated">A function that decides whether a given expression node can be part of the local function.</param>
        /// <returns>A new tree with sub-trees evaluated and replaced.</returns>
        public static Expression PartialEval(Expression expression, Func<Expression, bool> fnCanBeEvaluated)
        {
            return new SubtreeEvaluator(new Nominator(fnCanBeEvaluated).Nominate(expression)).Eval(expression);
        }

        /// <summary>
        /// Performs evaluation & replacement of independent sub-trees
        /// </summary>
        /// <param name="expression">The root of the expression tree.</param>
        /// <returns>A new tree with sub-trees evaluated and replaced.</returns>
        public static Expression PartialEval(Expression expression)
        {
            return PartialEval(expression, Evaluator.CanBeEvaluatedLocally);
        }

        private static bool CanBeEvaluatedLocally(Expression expression)
        {
            return expression.NodeType != ExpressionType.Parameter;
        }

        /// <summary>
        /// Evaluates & replaces sub-trees when first candidate is reached (top-down)
        /// </summary>
        class SubtreeEvaluator : ExpressionVisitor
        {
            HashSet<Expression> candidates;

            internal SubtreeEvaluator(HashSet<Expression> candidates)
            {
                this.candidates = candidates;
            }

            internal Expression Eval(Expression exp)
            {
                return this.Visit(exp);
            }

            public override Expression Visit(Expression exp)
            {
                if (exp == null)
                {
                    return null;
                }
                if (this.candidates.Contains(exp))
                {
                    return this.Evaluate(exp);
                }
                return base.Visit(exp);
            }

            private Expression Evaluate(Expression e)
            {
                if (e.NodeType == ExpressionType.Constant)
                {
                    return e;
                }
                LambdaExpression lambda = Expression.Lambda(e);
                Delegate fn = lambda.Compile();
                return Expression.Constant(fn.DynamicInvoke(null), e.Type);
            }

        }

        /// <summary>
        /// Performs bottom-up analysis to determine which nodes can possibly
        /// be part of an evaluated sub-tree.
        /// </summary>
        class Nominator : ExpressionVisitor
        {
            Func<Expression, bool> fnCanBeEvaluated;
            HashSet<Expression> candidates;
            bool cannotBeEvaluated;

            internal Nominator(Func<Expression, bool> fnCanBeEvaluated)
            {
                this.fnCanBeEvaluated = fnCanBeEvaluated;
            }

            internal HashSet<Expression> Nominate(Expression expression)
            {
                this.candidates = new HashSet<Expression>();
                this.Visit(expression);
                return this.candidates;
            }

            public override Expression Visit(Expression expression)
            {
                if (expression != null)
                {
                    bool saveCannotBeEvaluated = this.cannotBeEvaluated;
                    this.cannotBeEvaluated = false;
                    base.Visit(expression);
                    if (!this.cannotBeEvaluated)
                    {
                        if (this.fnCanBeEvaluated(expression))
                        {
                            this.candidates.Add(expression);
                        }
                        else
                        {
                            this.cannotBeEvaluated = true;
                        }
                    }
                    this.cannotBeEvaluated |= saveCannotBeEvaluated;
                }
                return expression;
            }
        }
    }

    /// <summary>
    /// Enables cache key support for local collection values.
    /// </summary>
    public class LocalCollectionExpander : ExpressionVisitor
    {
        public static Expression Rewrite(Expression expression)
        {
            return new LocalCollectionExpander().Visit(expression);
        }

        protected override Expression VisitMethodCall(MethodCallExpression node)
        {
            // pair the method's parameter types with its arguments
            var map = node.Method.GetParameters()
                .Zip(node.Arguments, (p, a) => new { Param = p.ParameterType, Arg = a })
                .ToLinkedList();

            // deal with instance methods
            var instanceType = node.Object == null ? null : node.Object.Type;
            map.AddFirst(new { Param = instanceType, Arg = node.Object });

            // for any local collection parameters in the method, make a
            // replacement argument which will print its elements
            var replacements = (from x in map
                                where x.Param != null && x.Param.IsGenericType
                                let g = x.Param.GetGenericTypeDefinition()
                                where g == typeof(IEnumerable<>) || g == typeof(List<>)
                                where x.Arg.NodeType == ExpressionType.Constant
                                let elementType = x.Param.GetGenericArguments().Single()
                                let printer = MakePrinter((ConstantExpression) x.Arg, elementType)
                                select new { x.Arg, Replacement = printer }).ToList();

            if (replacements.Any())
            {
                var args = map.Select(x => (from r in replacements
                                            where r.Arg == x.Arg
                                            select r.Replacement).SingleOrDefault() ?? x.Arg).ToList();

                node = node.Update(args.First(), args.Skip(1));
            }

            return base.VisitMethodCall(node);
        }

        ConstantExpression MakePrinter(ConstantExpression enumerable, Type elementType)
        {
            var value = (IEnumerable) enumerable.Value;
            var printerType = typeof(Printer<>).MakeGenericType(elementType);
            var printer = Activator.CreateInstance(printerType, value);

            return Expression.Constant(printer);
        }

        /// <summary>
        /// Overrides ToString to print each element of a collection.
        /// </summary>
        /// <remarks>
        /// Inherits List in order to support List.Contains instance method as well
        /// as standard Enumerable.Contains/Any extension methods.
        /// </remarks>
        class Printer<T> : List<T>
        {
            public Printer(IEnumerable collection)
            {
                this.AddRange(collection.Cast<T>());
            }

            public override string ToString()
            {
                return "{" + this.ToConcatenatedString(t => t.ToString(), "|") + "}";
            }
        }
    }

    public static class Utility
    {
        /// <summary>
        /// Creates an MD5 fingerprint of the string.
        /// </summary>
        public static string ToMd5Fingerprint(this string s)
        {
            var bytes = Encoding.Unicode.GetBytes(s.ToCharArray());
            var hash = new MD5CryptoServiceProvider().ComputeHash(bytes);

            // concat the hash bytes into one long string
            return hash.Aggregate(new StringBuilder(32),
                (sb, b) => sb.Append(b.ToString("X2")))
                .ToString();
        }

        public static string ToConcatenatedString<T>(this IEnumerable<T> source, Func<T, string> selector, string separator)
        {
            var b = new StringBuilder();
            bool needSeparator = false;

            foreach (var item in source)
            {
                if (needSeparator)
                    b.Append(separator);

                b.Append(selector(item));
                needSeparator = true;
            }

            return b.ToString();
        }

        public static LinkedList<T> ToLinkedList<T>(this IEnumerable<T> source)
        {
            return new LinkedList<T>(source);
        }
    }
}

The source code now assumes .NET 4 and above. If you need to run on .NET 3.5, you can use the MSDN ExpressionVisitor and the framework methods below.

    /// <summary>
    /// These methods are built-in to .NET 4 and above.
    /// </summary>
    public static class Framework35Utility
    {
        public static IEnumerable<R> Zip<T, U, R>(
            this IEnumerable<T> first, 
            IEnumerable<U> second, 
            Func<T, U, R> selector)
        {
            using (var e1 = first.GetEnumerator())
            using (var e2 = second.GetEnumerator())
            {
                while (e1.MoveNext() && e2.MoveNext())
                    yield return selector(e1.Current, e2.Current);                
            }
        }
        
        public static MethodCallExpression Update(
            this MethodCallExpression source,
            Expression @object,
            IEnumerable<Expression> arguments)
        {
            if (@object == source.Object && arguments.SequenceEqual(source.Arguments))
            {
                return source;
            }

            return Expression.Call(@object, source.Method, arguments);
        }
    }

kick it on DotNetKicks.com

Tags: ,

My first post.

static void Main()
{
    Console.WriteLine("Hello world.");
}

Hope this works!