Monty’s Gush

Understanding IEnumerable and the iterator design pattern

Posted on: December 8, 2010

This is an unpolished post I used for a talk I just did at work…

The generic IEnumerable interface is the most important abstraction in the .NET framework after System.Object.

If you’re not 100% comfortable with IEnumerable and IEnumerable<T>, this introductory article might help!

Prerequisites
What’s an interface?
An interface, in the general sense, is just the publicly exposed signature of a type.
##use the word “type” for class or struct.
We can describe an interface by writing out the full public signature of a type.
##e.g. signature of well-known class;
All types have “an interface” in this basic sense.
##implementation
.NET interface types
Java and .NET also support a programming construct called an interface type, defined using the interface keyword. This is a formalization of the general OO concept of “interface”.
##e.g. definition of some interface
What? An interface type is just an abstract class with no implementation allowed.
Why? To allow a limited form of multiple inheritance.
##e.g. definition of a class with inheriting base class and interfaces
When you “inherit” from an interface, this means that your type must implement the interface’s methods. This means your type can be used in situations that only require that your type supports a given interface. Often this functionality not directly related to the main job of the type. Interfaces give you a way of mixing and composing functionality into a type, allowing it to be used in more situations.
Common .NET interfaces
IComparable – an object that can be compared with other objects of the same type
IConvertible – an object that can be converted to different types
ISerializable – an object that can be serialized
IDisposable – an object that can deterministically clean up after itself
By convention, .NET interface type names are prefixed with I. Unfortunately this makes them less friendly and intuitive than normal classes. Just pretend the ‘I’ isn’t there when you read them, and keep in mind that interface types are simply a kind of abstract class.
So, interface types can just be used in the same way as other types. In particular, you can declare variables and parameters of interface types.
bool IsABiggerThanB(IComparable a, IComparable b)
{
return a.CompareTo(b) > 0;
}
Here, parameters a and b can be of any type * that implements IComparable, which means they must define a CompareTo method.
[* Footnote. In this case, a and b have have to be the same type as each other, or a runtime exception will be thrown by the CompareTo.]
The CompareTo method returns 1 if the object is bigger, 0 if the same, and -1 if smaller than the object passed to it.
Many types in the .NET Framework support IComparable. For example, you can compare strings to sort them into alphabetical order.
Notice how inflexible this would be without interface types. We’d either have to
(1) inherit from some kind of “Comparable” class, using up our only base class, or
(2) support comparability in an ad-hoc, incompatible way for each type (making the polymorphic code above impossible).
Where are we?
Interface types support rich object oriented polymorphism without the burden of full-blown multiple inheritance.
The Iterator design pattern
Provides a way to traverse the elements of an aggregate object without exposing the underlying representation.
[GoF, 1995]
Aggregate object = a data structure; a collection, list or sequence of things of the same type.
Why?
- You might need different kinds of traversal (e.g. forwards, backwards, in-order, pre-order).
- You might want to keep track of several traversals on the same collection at once.
- All collections, lists, sequences etc. should be traversable in the same way.
But what about…?
Often a list can be traversed with a basic for loop. For example, we can keep track of the current position in a list with a local integer variable and index into an array.
for (int i = 0; i < 10; i++)
{
object current = myCollection[i];
}
But this will only work if the collection is the kind of collection that can be indexed in to, such as a simple list or array. It doesn’t make any sense to “index” into a binary tree, for example; nor into many other data structures that we might want to traverse.
Could we use this method to traverse an infinite list? What if the size were unknown?
The Iterator pattern in .NET
The Iterator design pattern takes the responsibility for traversal out of the collection itself, and puts it into another object. This specialised object is responsible only for traversing the collection (in a particular way), and nothing else.
For example, an instance of this specialised object will probably contain a pointer to keep track of the current item, and code implementing the particular traversal logic. The class will have intimate knowledge of the collection class that it supports traversal over, and may well be defined as a nested class.
The Iterator pattern is supported in the .NET framework via two interface types in the System.Collections namespace.
IEnumerator – a specialised object responsible for doing the traversal
IEnumerable – a data structure that can be traversed
Notice how Microsoft helps matters by using ‘enumerate’ instead of ‘iterate’. So you can think instead of “the Enumerator pattern” if you like. Also note that this sense of ‘enumerate’ has nothing whatever to do with enumerated types (enums in C#).
public interface IEnumerator
{
object Current { get; }
bool    MoveNext();
}
This type is the key to the Iterator pattern. It defines the simplest possible interface for traversing a data structure. An enumerator object can tell you what the current item is, and you can tell it to move to the next item, and that’s it.(*) If MoveNext is false, we have run out of items. These two methods are just a slightly more flexible way of supporting the operation “Give me the next one”, which is the essence of the enumerator interface.
(*) There are one or two other methods on the interface, but they’re not essential to understanding the pattern.
Notice the interface makes no assumptions at all about the structure, internal implementation or size of the data strucure that is being traversed. It’s perhaps the purest notion of a “sequence” you can think of.
while (enumerator.MoveNext())
{
object current = enumerator.Current;
}
Note that Microsoft could have designed this type as an abstract class instead of an interface type.
Great – now we can iterate over any kind of collection or sequence using this code, provided we have an enumerator for that kind of sequence.
If only we had a standardised way to get an enumerator!
public interface IEnumerable
{
IEnumerator GetEnumerator();
}
This is the public face of the Iterator pattern in .NET. It simply provides a well-known way to get an instance of a default enumerator for a particular sequence or collection.
So, if I give you an object that is enumerable, you can call GetEnumerator to give you a nice enumerator object that you can use to do a standard traversal of the object’s items.
The IEnumerable interface is implemented by many different types, including all collection types, and even the String class.
string s = “abcdef”;
IEnumerator enumerator = s.GetEnumerator();
while (enumerator.MoveNext())
{
Console.Write( ((char) enumerator.Current) + “.” );
}
// output: “a.b.c.d.e.f.”
Language-level support
It turns out that this usage pattern (obtaining the default enumerator, then calling MoveNext on it in a loop) is so common it is formalised into a rather helpful language-level construct in C# and VB.
If you’ve ever wondered how the foreach keyword works, now you know. The following code is identical to the previous example.
string s = “abcdef”;
foreach (char c in s)
{
Console.Write(c + “.”);
}
The C# compiler simply expands this code to that in the previous example. The intermediate language produced is identical.(*)
(*) OK, it’s not, but assume it is. Check out the difference in Reflector.
Where are we?
The Iterator design pattern is deeply supported in the .NET framework class library and languages.
Where next? Generics in .NET 2.0 improved support for the pattern by adding strongly-typed versions of the interfaces, IEnumerator<T> and IEnumerable<T>. IEnumerable<T> in particular plays a key role in the language integrated query (LINQ) features and C# 3.0 in general. Further support was provided in C# 2.0 for writing iterators with the yield keyword.
Why can’t the collection manage its own traversal?
Depending on the kind of data structure, we’d probably need some kind private pointer inside the object to keep track of the current element, and some additional methods on the class itself. But you can see that this is not going to provide a solution to any of the above three requirements. This solution would allow one traversal at a time, and you would need to add additional methods to the class for each traversal algorithm you wanted to provide.

Prerequisite – what’s an interface?

An interface, in the general sense, is just the publicly exposed signature of a type, where ‘type’ means class or struct in C#.

We can describe an interface by writing out the full public signature of a type.

public class Customer

{

public int Age{ get; }

public void MakeHappy();

// … more methods and properties!

}

This is not valid C#, but all types have “an interface” in this basic sense. Right click on string in your code and click Go to definition to see what I mean.

.NET interface types

Java and .NET also support a programming construct called an interface type, defined using the interface keyword. This is a formalization of the general OO concept of “interface”.

What? An interface type is really just an abstract class guaranteed to contain no implementation code.

Why? To allow a limited form of multiple inheritance.

When you “inherit” from an interface, you declare that you support the interface’s methods. This means your type can be used in situations that only require that it supports a given interface. I practice, this functionality is not directly related to the main job of the type. Interfaces give you a way of mixing and composing functionality into a type, allowing it to be used in more situations.

Common .NET interfaces

IComparable – an object that can be compared with other objects of the same type

IConvertible - an object that can be converted to different types

ISerializable - an object that can be serialized

IDisposable - an object that can deterministically clean up after itself

By convention, .NET interface type names are prefixed with I. Unfortunately this makes them less friendly and intuitive than normal classes. Just pretend the ‘I’ isn’t there when you read them, and keep in mind that interface types are simply a special kind of abstract class.

So, interface types can just be used in the same way as other types. In particular, you can declare variables and parameters of interface types.

bool IsABiggerThanB(IComparable a, IComparable b)

{

return a.CompareTo(b) > 0;

}

Here, parameters a and b can be of any type* that implements IComparable, which means they must define a CompareTo method.

[* Footnote. In this case, a and b have have to be the same type as each other, or a runtime exception will be thrown by CompareTo.]

The CompareTo method returns 1 if the object is bigger, 0 if the same, and -1 if smaller than the object passed to it. Many types in the .NET Framework support IComparable. For example, you can compare strings to sort them into alphabetical order.

Notice how inflexible this would be without interface types. We’d either have to

(1) inherit from some kind of “Comparable” class, using up our only base class, or

(2) support comparability in an ad-hoc, incompatible way for each type (making the polymorphic code above impossible).

Where are we?

Interface types support rich object oriented polymorphism without the burden of full-blown multiple inheritance.

The Iterator design pattern

Provides a way to traverse the elements of an aggregate object without exposing the underlying representation.

[GoF, 1995]

Aggregate object = a data structure; a collection, list or sequence of things of the same type.

Why?

  • You might need different kinds of traversal (e.g. forwards, backwards, in-order, pre-order).
  • You might want to keep track of several traversals on the same collection at once.
  • All collections, lists, sequences etc. should be traversable in the same way.

Sure, often a list can be traversed with a basic for loop. For example, we can keep track of the current position in a list with a local integer variable and index into an array.

for (int i = 0; i < 10; i++)

{

object current = myCollection[i];

}

But this will only work if the collection is the kind of collection that can be indexed in to, such as a simple list or array. It doesn’t make any sense to “index” into a binary tree, for example; nor into many other data structures that we might want to traverse.

A Binary Tree (from http://www.math.bas.bg)

A binary tree from math.bas.bg

What about infinite lists? What if the size were unknown?

Why can’t the collection manage its own traversal?

Depending on the kind of data structure, we’d probably need some kind private pointer inside the object to keep track of the current element, and some additional methods on the class itself. But you can see that this is not going to provide a solution to any of the above three requirements. This solution would allow one traversal at a time, and you would need to add additional methods to the class for each traversal algorithm you wanted to provide.

The Iterator design pattern in .NET

The Iterator pattern takes the responsibility for traversal out of the collection itself, and puts it into another object. This specialised object is responsible only for traversing the collection (in a particular way), and nothing else.

For example, an instance of this specialised object might contain a pointer to keep track of the current item, and some code implementing the particular traversal logic. The class will probably have intimate knowledge of the collection class that it supports traversal over, and might well be defined as a nested class.

The Iterator pattern is supported in the .NET framework v1 via two interface types in the System.Collections namespace.

IEnumerator – a specialised object responsible for doing the traversal

IEnumerable - a data structure that can be traversed

Microsoft uses ‘enumerate’ instead of ‘iterate’. So you can think instead of “the Enumerator pattern” if you like. Also note that this sense of ‘enumerate’ has nothing really to do with enumerated types (enums in C#).

public interface IEnumerator

{

object Current { get; }

bool    MoveNext();

}

This interface is the really the key to the pattern. It defines pretty much the simplest possible interface for traversing over a data structure. An enumerator object can tell you what the current item is, and you can tell it to move to the next item, and that’s it.(*) If MoveNext is false, we have run out of items. These two methods are just a slightly more verbose way of supporting the operation “Give me the next one“, which is the essence of the enumerator interface.

Notice the enumerator interface makes no assumptions at all about the structure, internal implementation or size of the data strucure that is being traversed. It’s perhaps the purest notion of a “sequence” you can think of.

(*) There are one or two other methods on the interface, but they’re not essential to understanding the pattern.

while (enumerator.MoveNext())

{

object current = enumerator.Current;

}

Note that Microsoft could have designed this type as an abstract class instead of an interface type.

Great – so now we can iterate over any kind of collection or sequence using this code, provided we have an enumerator for that kind of sequence.

If only we had a standardised way to get an enumerator!

public interface IEnumerable

{

IEnumerator GetEnumerator();

}

This is the public face of the Iterator pattern in .NET. It simply provides a well-known way to get an instance of a default enumerator for a particular sequence or collection.

So, if I give you an object that is enumerable, you can call GetEnumerator to give you a nice enumerator object that you can use to do a standard traversal of the object’s items.

The IEnumerable interface is implemented by many different types, including all collection types, and even the String class.

string s = “abcdef”;

IEnumerator enumerator = s.GetEnumerator();

while (enumerator.MoveNext())

{

Console.Write( ((char) enumerator.Current) + “.” );

}

// output: “a.b.c.d.e.f.”

Language-level support

It turns out that this usage pattern (obtaining the default enumerator, then calling MoveNext on it in a loop) is so common it is formalised into a helpful language-level construct in C# and VB.

If you’ve ever wondered how the foreach keyword works, now you know. The following code is identical to the previous example.

string s = “abcdef”;

foreach (char c in s)

{

Console.Write(c + “.”);

}

The C# compiler simply expands this code to that in the previous example. The intermediate language produced is identical.(*)

(*) OK, it’s not, but assume it is. Check out the difference in Reflector.

Where are we?

The Iterator design pattern is deeply supported in the .NET framework class library and languages.

Where next?

Generics in .NET 2.0 improved support for the pattern by adding strongly-typed versions of the interfaces, IEnumerator<T> and IEnumerable<T>. IEnumerable<T> in particular plays a key role in the language integrated query (LINQ) features and C# 3.0 in general. The C# 2.0 compiler was given the power to write its own iterator implementations via the yield keyword.

2 Responses to "Understanding IEnumerable and the iterator design pattern"

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: