Category Archives: c#

Hashtable and Dictionary

As I have mentioned in the previous post – the .NET runtime is now open source. This means that we can dive into it’s code. What is important – most of the code is written in C++ (as expected obviously!).

As part of the runtime I have go through the mscorlib project. What got me interested were Dictionary and Hashtable implementations. There is quite sophisticated arithmetic there as we might have expected.

They share the code

The main difference between this two is obvious – the idea is the same, using hashes to store values in buckets. But it is more important that Dictionary code is just the copy of the initial Hashtable implementation. This means that if any bug fix is implemented in either of them, it needs to be made in the other one as well.

Hashtable has thread safety for multiple readers and single writer build into some of its methods and properties. As you probably know the hash table (aka hash map) is a data structure that maps keys to values and uses hash function for calculating the key (index) to put values into buckets.

In .NET implementation the hash function uses double hashing which looks like this:

h(key, n) = h1(key) + n*h2(key)

where n is the number of collisions for a bucket. h1 is our implementation of GetHash(key) (or the default if we did not override it). More complicated is the h2 one which uses h1 and calculates the value between 1 and hashsize-1.

The table where values are stored is initialized with 3 elements when we do not specify the size (we should not treat this as the constant value from version to version – this can differ). Even when we specify smaller number – there is a check to the “awfully small sizes” are avoided. Very interesting is the procedure for extending the size of the array.

Resize hash array

When the procedure reaches the maximum number of buckets in the array (load factor reaches the maximum load factor) – it needs to be resized. The number of buckets in the hashtable is “automatically increased by approximately a factor of two”. What does it mean? It means that that the size is increased to the smallest prime number which is larger than the current size of hashtable.

How is this organized in the code? First of all – the first 72 prime numbers are stored as the readonly array – so there is no need to calculate them. The biggest one is 7 199 369 (that’s quite big number). When the hashtable size is even bigger – the next prime number is going to be calculated.

We know that by doubling we are preserving the asymptotic complexity of the hashtable operations such as add. What is more important – by having a prime guarantees that double hashing does not lead to infinite loops or just one bucket with plenty of collisions.

Summary

By releasing .NET Framework as open source Microsoft allowed us, developers, to verify the code they wrote. We can even make changes and build our own version of the .NET Framework. I think it is very interesting to check how some of the objects were implemented by devs at Microsoft.

In this short post I dig a little bit into the Hashtable implementation in mscorlib.

CoreCLR–.NET runtime is now open source

.NET team has just announced that .NET Runtime is open source starting now. You can find the repo on GitHub – here to be precise.

This repo contains an execution engine for .NET apps – this is up-to-date CoreCLR codebase which includes RyuJIT, GC, native interop and plenty of other .NET runtime components. This means that we can fork, clone and build our own version of runtime!

I will keep digging into the details of the repo – until then I highly recommend .NET team blog where you can find more details about the release.

Work on C# 7 has already started

When version 6 of the C# language is on its way to the release, the design process for ver. 7 has started. The notes from the design meeting was released couple of days ago. According to Mads Torgersten’s notes – this was the first design meeting.

We can find there couple very interesting parts about future of the language itself but the consequences of making the .NET open source as well.

Design team is still in charge

This is quite important information from these notes – the design process for C# language is not a democratic process. This means that all that community will suggest will be verified by the team but in the end the governance model for C# is benevolent dictatorship. I personally think that having people responsible for the language is a good choice – with community involvement, the language can gain more than it used to be.

The dictatorship does not mean that team will not listen to the community – as the write in notes, they are open for other ideas and suggestions. We must also remember that the day is limited to 24 hours so not all ideas will be verified at the same level.

New version parts (aka themes)

Team has prepared areas on which they want to focus during next version design process. But this is also important that they are willing to add a feature whether it fits in a theme or not.

Working with data

The volume of data in our applications is growing. We can see that more and more interest is focused on functional programming. We have LINQ which is functional-kind-of in the C# language. The design team will investigate further the possibilities to add more features to make working with data much more plesant and easy. They mentioned couple of areas they will investigate – such as pattern matching, immutability, tuples and many other.

Performance and reliability

C# and .NET have areas where the performance and reliability are not the best. There are research projects inside Microsoft which focus on that topic, especially the connection with unmanaged code – the team will investigate this and decide how C# 7 and .NET Framework will be supplemented.

Componentization

This area would probably not influence language itself but they want  to be sure that change how .NET programs are factored and combined could be done.

Distribution

We got async-await in C# 5 for single values – but the area for sequences should be explored. They are also into serialization topic for C#7 – which is quite interesting, as they write “we may no longer be into directly providing built-in serialization“.

Metaprogramming

They are aware that for some reasons in enterprise applications developers are forced to generate code based on some constraints or values from other files. There are areas and ideas to reduce the need to generate code (i.e. virtual extension methods, static ctor constraints, default interface implementations). We may see some of these features in next C# version.

Null

In we will (we got? 🙂 ) get null-conditional operator (x?.Sth) but there are some some more places where such features could be added (i.e. null in foreach statement).

They are also considering non-nullable reference types – which they see as very hard to fully support.

Other

There are number of other not so big feature which are also considered to be implemented/added to version 7.

New version features

Mads makes great effort to fully describe features considered for new version of C# – I think there is no better way as if just read this section from his post. In a nutshell they are interested in:

1. Pattern matching – especially making is operator more powerful, and making case in switch statement more pattern-like than just simple values matching.

2. Array Slices – there is ArraySegment<T> as mentioned in comments but these would be a little bit more language support. This basically mean that the arrays would be better divided into “windows” which could provide us better performance.

3. ref locals and ref returns – this got me thinking that using this is going be not so easy. They are aware that assignment to the getter-only property with ref value – this would create nightmare. They are considering this feature with lots of conditions and explanation how and when use it.

4. Method contracts – we have CodeContracts but there is no language support. What is more important, the runtime has pre- and postconditions for methods – they are considering adding this into the language.

Summary

This is fast! We are waiting for C# 6 release – and we hear about version 7. I expect more information from them as they progress with the design and they will start making decisions. What is more important – they expect us, developers to take part and post ideas on GitHub. The voting system will be used so other developers will have possibility to show which features are more important for real-life applications.

Generics–some things all developers should know

generics 000

Generics are available in most programming languages which are currently used by developers. In this post we will dig into details of the C# language. We will also check how they differ from Java and C++ implementation.

generics 001

When the C# was released in 2002, it did not contain generics. This could also be called “generic-free world”. In that time, a developer had to apply casting when he was using collections such as ArrayList. Sometimes that led to exceptions which either stopped the application or had to be wrapped in try/catch clauses. But more importantly, casting was influencing the performance, which in my opinion, put forward the biggest argument to introduce generics in the next version of the C# framework.

During that time we had to use an object as either a parameter or return type in almost every place. This is bad – like Jon Skeet says – it’s “bad in a necessary evil kind of way”. At some point you will simply need to do the casting – and that is just bad.

generics 002

I think it’s very important to know what is going on out there in the universe. However, before we dig into details about the C# implementation of generics, let’s see how other languages implemented them or emulated such behavior.

generics 003

In C++ there are no generics, there are templates. In a nutshell, they are macros on steroids.

When a particular type is used in C++, the compiler for that particular type parameter generates appropriate code. The compiler will change all the places where the type parameter is used and will change it to the used type for creation of a template. The compiler creates the type only once – so using the same type again is going to apply the same code. This means that there can be a great output library size increase because of that – we will just have more types (as we will see, this is quite more clever in CLR).

One of the biggest advantages of the C# implementation, which the templates have, is the fact that the type parameter does not have to be just type names. We can use function names, variable names or even constant expressions (e.g. 20). This is a very interesting feature which I sometimes miss in C# (but very rarely clip_image001 ).

generics 004

Object = one (ring?) to rule them all.

This is all that JVM knows about a particular object (ok, there is some metadata to describe a type as generic in the bytecode before compilation) that it is generic.

generics 005

This is great example from Skeet book (which I highly recommend  – C# in depth is just must-read position if you are C# developer!) – if it only compiled Smile 

1 ArrayList strings = new ArrayList(); 2 strings.add("hello"); 3 String entry = (String) strings.get(0); 4 strings.add(new Object());

and this one

1 ArrayList<String> strings = new ArrayList<String>(); 2 strings.add("hello"); 3 String entry = strings.get(0); 4 strings.add(new Object());

Both of these codes would, in fact, generate the same bytecode – just because of Java’s feature which is called type erasure. As it is described in the documentation – it is applied by the compiler to implement generics. This just replaces all the type parameters in generic types, insert type casts where necessary and generate bridge methods to preserve polymorphism in extended generic types.

Why?! This is a normal question at this point. I think that the main reason is the back-compatibility. Even from the new code with generics you are able to run your application on an older runtime. This is not possible in C# 2.0 – your application just won’t start on CLR 1.1. Another interesting feature, which can be missing in C#, is the generic variance. Through it you are can form something like this:

1 ArrayList<? extends SuperBase> myList;

generics 006

I wish I could describe more languages and their generics implementation and I hope I can extend this in the future. If any of you have any interesting information – please leave a comment.

generics 007

Microsoft presented a new version of a language in a pack with Visual Studio 2005 and .NET Framework 2.0. One of the most important features introduced by the new version were generics. You may also remember that in 2005 we received partial classes from MS, anonymous methods, a yield keyword and thanks to generics we got Nullable types as well.

generics 008

Using System.Object does not only cost the runtime to perform casting, but also when you keep your value type as an object – there is boxing involved. And you probably know very well that this causes a lot of operations – when saving (there is boxing), and when you retrieve value from the collection – there is unboxing. The JIT can treat value types, so there is no boxing and unboxing involved!

There is not much improvement in performance for reference types. Though this was also slightly improved, you probably won’t see much improvement in your application just by changing the type to the generic one.

generics 009

Thanks to generics we got compile checks – so the safety of our programs increased. But did we get…

generics 010

This question was on my mind for quite a long time. And I am not sure how many bugs in applications were caused by the wrong casting. I talked to a couple of people in my company and we agreed that this is not the greatest output programming the world got from generics. It’s undoubtedly true, but we should not treat generics as a solution for a programmer’s lack of knowledge, experience and time. clip_image001[4]

generics 011

When you use generics in your code it is very often more self-explanatory than the non-generic version. Sometimes we just call our collection list or set and without a comment (which very often goes out-of-date very soon) or deeper analysis it is not easy to recognize its purpose. Thanks to generics the IDE support is also possible.

generics 012

Ok, finally we can see how it looks like:

1 IList<int> myList = new List<int>();

generics 013

There are several ways you can declare a type:

1 List<int> list1 = new List<int>(); 2 var list2 = new List<int>(); 3 IList<int> list3 = new List<int>();

I have presented three different ways of declaring a list. At first glance they would all give us the same.

To be honest, only the 1st and 2nd will give us *exactly* the same resultvar is just removed during a compilation time and changed by the type itself. It does not cause any performance overhead during runtime – you can use it.

In my opinion, the best option is the third one where you use an interface type – I will prepare another post about this behavior, and since I’ve seen so many developers using the first one, I need to give it more focused attention. In brief, you will be able to assign any other type which implements IList to list3 – for example your own type. When using the 1st – such a possibility does not exist.

generics 014

It is very usual that you want to call a method of the generic type – to do that you must define what that type is. You can do that by type constraints – where you force the type to fulfill the constraint.

There are three types of constraints:

1. The first type of constraint ensures that the type is either reference or value type. To say that the type is a reference type you just set where T : class.  The twin constraint is T : struct – where you ensure that the generic is a value type.

2. The second and most encountered by me is the constructor type constraint. In this one you just order that the type must have a parameterless constructor. The only thing you need to remember while using this one is that it must be the last constraint for any particular type parameter.

3. This one is, of course, the most complicated of constraint types– where you ensure that a type can be converted to another type (or in other words – that your generic type implements a specific base class or interface). There is also another interesting (let’s call it a sub-type) feature – that your generic type can be converted to another type argument – this is called a type parameter constraint. I often find this one difficult to understand, but after a while with playing around this comes very handy. Here are couple of samples:

1 class Base1<T> where T : IDisposable 2 class Base2<T> where T : IEnumerable<T> 3 class Base3<T, U> where T : U

These are very easy cases of the conversion type constraint – but you probably already got it. In the future I will dig into the details about these constraint types – because I sometimes receive questions from developers in my company regarding these types of parameter constraints (these are probably harder to understand).

generics 015

Why do we even care if the new() constraint should be the last one?! Here’s what the answer is – it can just frequently save you a lot of time. Of course this simple one is caught by the compiler with a correct error message:

new_error

But you need to remember – that not all of these cases are so easy to spot. And what is also more important – you will need no compiler to design your class on a whiteboard correctly and talk to your peers.

generics 016

Let’s see an example when you work with arrays. Let’s say that you have a hierarchy of fruits – where you have a class Fruit and two derivatives from that – Apple and Banana.

1 Apple apple = new Apple(); 2 Fruit fruit = apple; 3 fruit = new Banana();

Thanks to polymorphism this would work correctly – the runtime will have the information about the banana as the fruit. And this will be resolved correctly.

But hey – if it works correctly on the types, maybe we could do the same with arrays, right? You probably would like to be able to do that:

1 Apple[] apples = new Apple[] { new Apple(), new Apple() }; 2 Fruit[] fruits = apples;

All of this works fine – up to this point. But you have already imagined an example when you just try to assign a banana as the first element of fruits. This will case the ArrayTypeMismatchException. There is a special IL method which checks whether you assign the correct type to the array.

arraytypemismatch 

generics 017

OK, so for five years there were no changes in the generics. But when the version C# 4.0 came into the light, there were some changes in the generics implementation.  Microsoft introduced covariance and contravariance to the language – everything thanks to two additional keywords – in and out.

When you think variance you should think – possibility to use an object as one type, as if it was another, but always in a type-safe way. You are used to that – but you probably never heard of it by such a name – for example when you should return BaseTypeClassand you return SomeDerivedClass instead.  In case of generics, things get a little bit more complicated – because this is not the type, but the type parameters on which the variance is applied.

generics 018

Every book, blog post or any other kind of an article about the variance starts with the covariance – probably because this one is just easier to understand as it seems to be more natural for us.

Covariance means that the value is being returned from an operation back to the caller. One of the most commonly used examples of the covariant type is IEnumerable<T>. When you check the implementation you will see this:

1 public interface IEnumerable<out T> : IEnumerable 2 { 3 IEnumerator<T> GetEnumerator(); 4 }

There is one particular thing you should look at – the out keyword. By setting this keyword you guarantee that the type parameter is returned only by your API. Thanks to that you are able to make such an operation:

1 IEnumerable<Manager> managers = GetManagers(); 2 IEnumerable<Employee> employees = managers;

And everything will work as usually thanks to the improved implementation of the language and runtime. Nothing will be broken, because the base collection (managers) will never be changed – because it is OUT-going parameter.

generics 019

As you probably have guessed – the other way around is the cotravariance where the type goes IN-to the type. That is why the keyword in is used for these types. One of the most used example of the contravariant class is

1 public interface IComparable<in T> 2 { 3 int CompareTo(T other); 4 }

As you can see, the contravariant IComparable<T> is marked with inkeyword – this means that API is consuming the values instead of producing them.

Now you probably understand why it is called contra-variant – because it is contrary to the usual derivation direction.

1 IComparer<Employee> empl = GetEmployeeComparer(); 2 IComparer<Manager> mng = empl;

You probably ask – why are the two additional keywords even introduced – and the compiler is able to derive that information directly from the source code. I think that you should consider two things when you think about that:

1. We are all humans and the more we express our ideas, the easier another person (or even ourselves in the future!) will understand. By looking at the source code you don’t have to think whether it is contravariant, covariant or even invariant (which means that it goes both ways and we cannot mark it either way!).

2. It is easy to add another method into the class (interface) which could probably break some code using it. By having these keywords you will see that by changing that, you are making breaking change by making something invariant. And that’s something we don’t want to do – break some code we are not aware of (either in our own implementation or even worse, in the client who uses our API!).

The topic of contra- and covariance is so broad that it requires more attention, so you can expect another blog post about it in the future.

generics 020

As you have seen – generics gives C# very much. I have only covered the most important parts. Some of them need more detailed attention and this is what I would like to do – prepare shorter posts in which I will dive into more details for some cool features the generics have.

generics 021

This whole post is based on my presentation I created for my colleagues. I wanted to share it with you and I hope you found this more interesting than just a plain text you usually see on the programming blog.

FYI: I do not have rights to the pictures and photos which you can see.

System.Generic.Collections.List–internal structure

Ever wondered how List<T> is implemented? Do you know what is the default capacity for List<T> when you call it’s constructor? If not, read on – after this post you should be able to answer such kind of questions. This is not the post for ordinary developers 🙂

Capacity vs. Count

These two can be sometimes troublesome and one can mistake one from another. But most of us get it correct – the count is the actual number of elements on the List<T>. Where the capacity is the total number of elements the internal data structure can hold without resizing.

As you can expect – behind the curtain there is a data structure like this:

private T[] _items;

In such table the List stores it’s elements – all the methods base on this.

So in other words:

  • count – the number of elements actually in the table,
  • capacity – the length of the table.

Default capacity

Ok, so there is a table behind the List<T> object – but actually what is the default capacity when you call default ctor of list? We do not provide any elements at all, we also do not tell how many elements we do expect.

Here, I have to mention that this is not DEFINED and can be changed in the future. I have checked it on the .NET Framework v4.0 (and before) – it is still the same value.

private const int _defaultCapacity = 4;

This is not a big number you say. So the other question is – what happen when you would like to add 5 elements.

Going further, let’s check the Add(T item) method where the first two lines are like that:

if (this._size == this._items.Length)
        this.EnsureCapacity(this._size + 1);

So going further, analyze of EnsureCapacity(int) method leaves us such operation in case if there is no more space to add new element to the table:

int num = this._items.Length == 0 ? _defaultCapacity  : this._items.Length * 2;

As you can see – this line is responsible not only for extending the capacity (binary growth!) but also for setting the default storage capacity.

This gives us quite important info – when you call List() ctor actually the internal table used for storing data object is EMPTY! That’s quite a surprise, you have to admit..

Summary

So in other words List<T> is a simple implementation of the table data objects where extending the data structure is always a nightmare. Thankfully folks at Microsoft created BCL with a great implementation of the List<T> generic collection which gives us all the speed known from table data structure and the nice execution and usage options at the same time.

Object initializers

When you would like to create an object – you simply call the ctor of specific class. But sometimes building plenty of constructors just for adding another parameter is a waste of time (yours and developer who will use your code). This is why in C# 3.0 the object initializers were introduced.

But let’s start with simple example where we have a class such as below.

class Person
{
    public string Name { get; set; }
    public string Surname { get; set; }

    public Person(string name)
    {
        Name = name;
    }
}

With that to initialize it we were used to do it like that:

var p1 = new Person("Tom");
p1.Surname = "Kuj";

But starting from C# 3.0 you could also do it like that:

var p2 = new Person("Tom2") { Surname = "Kuj2" };

Ok – looking at that there seems to be only the language (compiler) feature to write the initialization differently. BUT there is also a different execution approach which can have significant influence on your multithreaded application.

IL code examination

Let’s check the IL code generated by the compiler.

.locals init (
  [0] class TestObjectInitialazerConsoleApplication.Person p1,
  [1] class TestObjectInitialazerConsoleApplication.Person p2,
  [2] class TestObjectInitialazerConsoleApplication.Person '<>g__initLocal0'
)
IL_0000: nop IL_0001: ldstr "Tom" IL_0006: newobj instance void TestObjInitApp.Person::.ctor(string) IL_000b: stloc.0 IL_000c: ldloc.0 IL_000d: ldstr "Kuj" IL_0012: callvirt instance void TestObjInitApp.Person::set_Surname(string) IL_0017: nop IL_0018: ldstr "Tom2" IL_001d: newobj instance void TestObjInitApp.Person::.ctor(string) IL_0022: stloc.2 IL_0023: ldloc.2 IL_0024: ldstr "Kuj2" IL_0029: callvirt instance void TestObjInitApp.Person::set_Surname(string) IL_002e: nop IL_002f: ldloc.2 IL_0030: stloc.1 IL_0031: ret

First look at this code does not provide much information but with a closer look – there will be interesting flow.

First of all let’s look at the locals initialization – we have 3, not 2 objects in there!!! We have two named p1 and p2 and there is something called ‘<>g__initLocal0’ – this is just temporary variable which will be used for the object creation (you probably understand it already :-)).

For our investigation the most interesting lines starts at IL_0022  where we allocate the object created in IL_001d object into [2] local and we work on that one. The lines IL_002f and IL_0030 – the temporary variable is assigned to the p2 object.

Summary

So object initialization gives us another temporary variable and when it’s finished – this is assigned to our object. But what do we get because of that? Why did designers decide not to work on the original object?

The answer is – the multithreading. You can imagine that it’s possible that some thread will read p1 when its ctor was called but not all properties were set. That is why using the object initialization is a better idea – will make our code more “atomic” I can write. So the object will never be in the state which we did not expect – it’s either not assigned or assigned as expected.

Additional summary (EDIT)

This seems to be the path taken by the designers with more than just object initializers. If you inspect the IL code of collection initializers – this works exactly the same way (with the temporary variable).

C# application connected to ElasticSearch server

Some time ago I mentioned that I will start playing around with ElasticSearch. This is the search engine which is based on Lucene but can be easily distributed to the cloud. There is a server side (git repository) which is responsible for building the indexes and for distributing the requests.

Start the server

This is the simplest step – you need to download source from http://www.elasticsearch.org/ and extract it. Then run bin\elasticsearch.bat and voila! For the first project there is no more interesting things so far (more interesting things will come when the indexes will just be too big to maintain it on one ES server).

Test the connection

The documentation of the ElasticSearch suggest that we should use curl. Of course that is cool if we would like to check whether the connection is established – but more fun we will have when playing with C# code. 🙂

Simple application testing the server

OK, so we have a running server. For todays post the only thing to be done is the test of the connection. ES offers us the RESTful interface which we can easily consume in our application

string uri = "http://localhost:9200/";

var req = System.Net.WebRequest.Create(uri);
System.Net.WebResponse resp = req.GetResponse();
System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream());

Console.WriteLine(sr.ReadToEnd().Trim());

The result we get is shown on the picture below.

elastic_ping

This is the information from the server that it exists and the default response for the server.

As you can see parsing it is not a very beautiful way of connection from the C# application – one need to consume the response (JSON data) and act according to the response. Building the query is also quite tricky.

I have found one of the .NET clients – PlainElastic.Net which seems to be a good choice to go next.

Stay tuned for more ES posts..