Category Archives: design patterns

Hashtable and Dictionary

As I have mentioned in the previous post – the .NET runtime is now open source. This means that we can dive into it’s code. What is important – most of the code is written in C++ (as expected obviously!).

As part of the runtime I have go through the mscorlib project. What got me interested were Dictionary and Hashtable implementations. There is quite sophisticated arithmetic there as we might have expected.

They share the code

The main difference between this two is obvious – the idea is the same, using hashes to store values in buckets. But it is more important that Dictionary code is just the copy of the initial Hashtable implementation. This means that if any bug fix is implemented in either of them, it needs to be made in the other one as well.

Hashtable has thread safety for multiple readers and single writer build into some of its methods and properties. As you probably know the hash table (aka hash map) is a data structure that maps keys to values and uses hash function for calculating the key (index) to put values into buckets.

In .NET implementation the hash function uses double hashing which looks like this:

h(key, n) = h1(key) + n*h2(key)

where n is the number of collisions for a bucket. h1 is our implementation of GetHash(key) (or the default if we did not override it). More complicated is the h2 one which uses h1 and calculates the value between 1 and hashsize-1.

The table where values are stored is initialized with 3 elements when we do not specify the size (we should not treat this as the constant value from version to version – this can differ). Even when we specify smaller number – there is a check to the “awfully small sizes” are avoided. Very interesting is the procedure for extending the size of the array.

Resize hash array

When the procedure reaches the maximum number of buckets in the array (load factor reaches the maximum load factor) – it needs to be resized. The number of buckets in the hashtable is “automatically increased by approximately a factor of two”. What does it mean? It means that that the size is increased to the smallest prime number which is larger than the current size of hashtable.

How is this organized in the code? First of all – the first 72 prime numbers are stored as the readonly array – so there is no need to calculate them. The biggest one is 7 199 369 (that’s quite big number). When the hashtable size is even bigger – the next prime number is going to be calculated.

We know that by doubling we are preserving the asymptotic complexity of the hashtable operations such as add. What is more important – by having a prime guarantees that double hashing does not lead to infinite loops or just one bucket with plenty of collisions.

Summary

By releasing .NET Framework as open source Microsoft allowed us, developers, to verify the code they wrote. We can even make changes and build our own version of the .NET Framework. I think it is very interesting to check how some of the objects were implemented by devs at Microsoft.

In this short post I dig a little bit into the Hashtable implementation in mscorlib.

Advertisements

Fluent interfaces – Builder Pattern (2/2)

Previously I have introduced the concept of the builder pattern – in a very short way but it should have been clear what problem this pattern solves.

What we want to achieve by introducing the fluent interface is the code which will fulfill these assumptions:
– code will be easy to read,
– it will be easy to change,
– changes in the business object will be easy to introduce into the builder object,
– we can encapsulate some additional logic into building the object (e.g. additional associations to dependent objects, default values for required fields).

Each of these assumptions is important but the last one seems to be very common problem for plenty of real-life scenarios. In many cases every object must have the default value set, more importantly there are some objects which need to be created together with our business object.

Having these points in mind we can provide such code for our business object (see previous post):

class BusinessObjectFluentBuilder
{
    private int _id;
    private string _name;
    private IEnumerable<BusinessObject> _children;

    public BusinessObjectFluentBuilder WithName(string name)
    {
        this._name = name;

        return this;
    }

    public BusinessObject Build()
    {
        // some more sophisticated logic here
        return new BusinessObject { Name = _name };
    }
}

Be aware that I have simplified this code as much as possible (we can set only one property) – but this code is just a guidance, not the real life code. Such a builder allows building the object in such way.

var obj2 = new BusinessObjectFluentBuilder()
               .WithName("MyObject")
               .Build();

As you can see this code is easy to read. We have a builder object, we call method to set the name, and next we just call the Build to prepare us a required object. When the business object will get another property the only thing we need to do is to add one method into our builder to set such property. The difference in the building – we refactor Build method.

This way of implementation could be improved – we could skip the Build method in our implementation! Let’s have a look on another implementation of the fluent builder pattern.

class BusinessObjectFluentBuilderSecond
{
    private int _id;
    private string _name;
    private IEnumerable<BusinessObject> _children;

    public BusinessObjectFluentBuilderSecond WithName(string name)
    {
        this._name = name;

        return this;
    }

    public static implicit operator BusinessObject(BusinessObjectFluentBuilderSecond builder)
    {
        // some more sophisticated logic here
        return new BusinessObject { Name = builder._name };
    }
}

As you can see I have introduced the new operator – when we require object, the method is executed. For most of you it is easy to spot that this code has one disadvantage. For those of you who like the implicit typing this code won’t work properly. We have to use this object in such way:

BusinessObject obj3 = new BusinessObjectFluentBuilderSecond()
                          .WithName("MyObject3");

As you can see – in this place it is your choice how to implement it. Maybe even combine both versions into one – so you can use implicit (with Build() ) and explicit (without Build() ) programming paths.

Of course it is hard in cases when the new property is required – then refactoring becomes a nightmare. Of course this can be easy resolved – for example by throwing an exception in the Build method in case the new property is not set. It is also the interesting topic whether the builder object should be reusable – in the real world we do not need another builder to build another building. We re-use them. It is good to know that the builder object should permit the reset option – in such way we can reuse builder.

Most of the real-life objects are very complicated – especially when you use DDD (Domain Driven Design – http://goo.gl/rIOjN) and aggregate entities. For such scenarios you need to build objects as the connection between many. In such scenarios the builder patterns seems to be a good choice for the easy introduction of changes in the future.

Please leave a comment if you have a question.

Fluent interfaces – Builder Pattern, Intro (1/2)

The idea of fluent interfaces has been introduced by Martin Fowler (goo.gl/j4rdu) and Eric Evans – in short words you call one method after another. This way of implementation improves readibility, significantly – makes your code more human-readable than the usual method calls.

.NET framework uses this technique in many places – many of you think now about extension methods which can be chained one with another (about what I will post another time). Currently this seems to be the path for programming world.

The real-life applications enforces on programmers models which creation is very often quite awkward and difficult. Of course in day-to-day scenarios it is common to use builder pattern to help the new object creation process – the place where the object is needed does not have to care about the creation and (more importantly) about dependencies between other objects in the environment. Using builder pattern also helps with testing – there is possibility to change implementation of the builder to make just mock object instead of creation a real one.

The idea of builder pattern is quite easy to understand. What I would like to discuss is the fluent version of the builder – this will be covered in the next post.

Let’s start with a dummy version of the object the application is working on:

class BusinessObject
 {
   public Int32 Id { get; set; }
   public string Name { get; set; }
   public IEnumerable<BusinessObject> Children { get; set; }
 }

The first idea of a builder for such object is for example the BusinessObjectBuilder class.

class BusinessObjectBuilder
 {
    public BusinessObject Build(Int32 id, string name, IEnumerable<BusinessObject> children)
    {
       return new BusinessObject() {Id = id, Name = name, Children = children};
    }
 }

The disadvantages of such solution is trivial. Such scenario could be improved by the default parameters (introduced in the C# 4) – but it still would be not easily readable for a programmer. This would also be hard to refactor the code when the project of the process would change.

The second solution has less cons but it still is far from the ideal situation – where the code will speak for itself rather than the comment (which in plenty of times become outdated!).

class BusinessObjectBuilderSecond
 {
    private int _id;
    private string _name;
    private IEnumerable<BusinessObject> _children;
   public void SetName(string name)
    {
       this._name = name;
    }
// .....
   public BusinessObject Build()
    {
       return new BusinessObject() { Id = _id, Name = _name, Children = _children };
    }
 }

As seen in the code above – the situation is strictly different. In second version the program we could skip some parameters – for example the optional ones. But there is still one con- the one should use it in such way:

var builder = new BusinessObjectBuilderSecond();
 builder.SetName("name");
 // ...
 var obj = builder.Build();

It is clearer and nicer to use than the first version but it is still quite messy. But it makes the code of building an object easier to maintain – when the object will be updated, it is easier to add one method into builder without changing the usage of the builder in every place in the code. In the first scenario the programmer would have two paths – extend the method signature or add another Build method with some additional parameters so the builder class would have plenty of Build methods which is quite frustrating to get on your intelisense.

None of the presented solutions can resolve two main problems in business environment – domain model changes and easy readabilityof the code (which of course result in easier refactoring after changes in the business object). In the next post I will propose two solutions which can help to solve these issues.