Search engine for an online shop in C# with Lucene.NET (1/2)

In the two previous posts we have seen a basic implementation of a search engine and then the queries which can be built in your application. Both can be improved so that your search engine becomes more usable and fulfills your requirements. In the next two posts I will introduce some more sophisticated techniques which you could use when building your own search engine.

After this post your search engine should have the core functionality expected from a modern search engine in a product. I will also try to explain how to integrate the Lucene.NET library into a web-based application whose data is mostly stored in a database.

GOAL: after this post you should know the important parts of the search engine library and some techniques you could use in your app. For the sake of this post I will assume that we are building a search engine for an online shop where the user can search based on:

  • name of the product,
  • price range (from.. to…).

As you can see, this is not a very sophisticated search engine, but you can build much more complex ones on top of it.

Filtering the search results by price

When I was discussing the queries which you could build in your application (or leave to your users), I mentioned the range query. As you know, it is possible to ask Lucene for documents where a certain field lies in a range of values. Unfortunately, this query does not fit the goal of a search engine for an online shop. What we would like to do is ask Lucene for the products matching the user's text and then filter them by price range.

Of course we could filter the list returned by Lucene in our own code. But is it possible to use Lucene's power (indexes, caching etc.) for this? YES! This is the place where I shall introduce the FilteredQuery class. Its definition is rather short: a query that applies a filter to the results of another query.

To build a FilteredQuery object you need two parts: a query and a filter. The two previous posts were dedicated to creating queries, so you can use them as a reference. Now we just need a filter.

Filter filter = NumericRangeFilter.NewIntRange("Price", min, max, true, true);

The factory method is as simple as it can be. Its arguments are:

  1. The name of the field in the document.
  2. Lower bound (can be null, meaning there is no lower limit).
  3. Upper bound (can be null, meaning there is no upper limit).
  4. Whether to include documents where the field (1) is equal to the lower bound (2).
  5. The same as 4, but for the upper bound.
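To make the bound semantics concrete, here is a minimal plain C# sketch that applies arguments 2 to 5 to an in-memory list. It is only an illustration of how null bounds and the inclusive flags behave. It is not how Lucene evaluates NumericRangeFilter, which works on the index. The Product, PriceRange, Matches and Filter names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical product model used only for this illustration.
public sealed class Product
{
    public string Name { get; set; }
    public int Price { get; set; }
}

public static class PriceRange
{
    // Mirrors arguments 2-5 of the filter: lower bound, upper bound and the
    // two "inclusive" flags. A null bound leaves that side of the range open.
    public static bool Matches(int value, int? min, int? max, bool minInclusive, bool maxInclusive)
    {
        if (min.HasValue && (minInclusive ? value < min.Value : value <= min.Value))
            return false; // below (or, for an exclusive bound, equal to) the lower bound
        if (max.HasValue && (maxInclusive ? value > max.Value : value >= max.Value))
            return false; // above (or, for an exclusive bound, equal to) the upper bound
        return true;
    }

    // Keeps only the products whose price satisfies the range.
    public static List<Product> Filter(IEnumerable<Product> products, int? min, int? max,
                                       bool minInclusive, bool maxInclusive)
    {
        return products
            .Where(p => Matches(p.Price, min, max, minInclusive, maxInclusive))
            .ToList();
    }
}
```

For example, Matches(100, 100, 200, false, true) is false because the lower bound is exclusive. Matches(5000, 100, null, true, true) is true because the upper side is open.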

Our filter is designed for the price of the product. It is important to mention right now that if you would like to use a field for this kind of filter, you need to index it as a special, numeric type of field.

var doc = new Document();
doc.Add(new Field("Id", sampleData.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED)); // (1)
doc.Add(new Field("Name", sampleData.Name, Field.Store.YES, Field.Index.ANALYZED)); // (2)
doc.Add(new Field("Description", sampleData.Description, Field.Store.YES, Field.Index.ANALYZED)); 
doc.Add(new NumericField("Price").SetIntValue(sampleData.Price)); // (3)

You need to use the NumericField class if you would like to apply a numeric range filter to the field. With that in place we can create the filtered query:

var fq = new FilteredQuery(query, filter); // (4)

Building a filtered query is as simple as its definition. What is important to know is that FilteredQuery derives from Query, so you can build very sophisticated queries from it: first search for something, then apply a filter, and then wrap the result in another filter (on another field). This makes the filtered query a very powerful tool for your search engine.

Promotions – how to promote certain documents?

As you may remember (if not, you can check it here), you can boost a term in a query to make it more important than the rest. You probably already expect that you could do something similar with documents, and you would be right.

It is possible to boost any document to make it more relevant. This is done when you build the index: when we declared our doc object earlier, we could simply set a property:

doc.Boost = 2;

This value influences the score which Lucene uses to rank documents for your query. By default the boost factor of each document is 1. The boost scales the document's score (in Lucene 3.x it is folded into the field norms at indexing time), so you can promote documents by making the boost factor greater than 1. Setting the value below 1 lowers the score and makes the document less relevant.

What is very important: once the index is created, you cannot read the boost value back, because it is not stored with the document itself (a document loaded from the index will report the default value of 1). You can still be sure that the score reflects it.

This factor makes your search more flexible: you could give promoted products a very large boost (and perhaps programmatically add a term to the query that matches them). Then the first result, or the first couple of results, will be the documents which you would like to show your customer as the most relevant and interesting.
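As a rough illustration of promotion, the sketch below ranks hits by a raw relevance score multiplied by a per-document boost. This is a deliberately simplified model. Real Lucene 3.x scoring combines many factors, and it stores the boost inside lossy-encoded norms, so actual scores will differ. The ScoredProduct and Promotion names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical search hit: a relevance score plus the boost set at indexing time.
public sealed class ScoredProduct
{
    public string Name { get; }
    public float RawScore { get; }
    public float Boost { get; }

    public ScoredProduct(string name, float rawScore, float boost = 1f)
    {
        Name = name;
        RawScore = rawScore;
        Boost = boost;
    }

    // Simplified model: the boost scales the score (1 = neutral, >1 promotes, <1 demotes).
    public float FinalScore => RawScore * Boost;
}

public static class Promotion
{
    // Orders hits from the highest to the lowest boosted score.
    public static List<ScoredProduct> Rank(IEnumerable<ScoredProduct> hits)
    {
        return hits.OrderByDescending(h => h.FinalScore).ToList();
    }
}
```

Consider a product with raw score 0.5 and boost 2. Its final score is 1.0, so it overtakes an unboosted product scoring 0.9. A product scoring 0.8 with boost 0.5 drops to 0.4.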

How to build a search engine on production systems?

Nowadays most data is probably stored in database systems. How, then, can you use Lucene in such an environment? What I would suggest is to keep the index alongside the database tables and include in it only what the search needs. Keep the index as compact as possible, because this matters for performance.

Of course there is a big challenge: how do you keep your database and your search index consistent? Unfortunately, that is up to you. I will come back to this problem in the next post, where we will dig into some more interesting stuff.


So here it is: your fully operational search engine. But there is still room for improvement. You have probably noticed that, even with all these techniques, your search engine still lacks one thing: paging. It is common to show on the results page not all results but just a subset of a big result set. This and some other, more sophisticated topics will be covered in the next post.

Multi-parameter search engine in C# with Lucene.NET

In my previous post (see here) we created a simple search engine in C# with Lucene.NET. It was rather an introduction to this technology. As you can expect, Lucene offers much more than a simple one- or multi-word query. You can build queries through Lucene's API, but Lucene also provides a rich query language: the QueryParser parses an input string into a Lucene Query. If you find this topic interesting, I strongly recommend the documentation available online. The details can vary from version to version, so it is the best source of knowledge.

This post will cover a couple of the available query techniques:

  • querying specific fields,
  • wildcard searches,
  • range searches,
  • boosting the term (and the document as well!),
  • Boolean operators.


As presented in the simple search engine, each entry in the index is built from a set of fields. For example, we previously defined our document as:

var doc = new Document();
doc.Add(new Field("Id", sampleData.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("Name", sampleData.Name, Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("Description", sampleData.Description, Field.Store.YES, Field.Index.ANALYZED));

The document above contains three fields, and two of them are analyzed during indexing. Lucene allows us to query not only a set of fields: it is also possible to query the value of a specific field. Let's assume that we would like to find the documents whose name is "searching".

This could be done by:

Name:searching

If we would like to search for more than one word, we should use double quotes ("), as shown below (we try to find a document with "search engine" in the Name field). This is called a phrase.

Name:"search engine"

If we had used

Name:search engine

Lucene would look for "search" only in the Name field and for "engine" in the parser's default fields (Name and Description when using the MultiFieldQueryParser from the previous post). Of course, it is possible to ask Lucene for words in more than one specific field:

Name:search Description:engine

This is closely related to the Boolean operators, which let us use Boolean logic to build more sophisticated queries.


Wildcard search

Wildcards such as * and ? are well known from other search engines. Lucene implements them too, so you can use them inside your search terms.

The difference between the two is:

  • * is used for multiple character wildcard
  • ? is used for single character wildcard
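The matching rule of the two wildcards can be sketched in a few lines of plain C#. This is an illustration of the semantics only, using a classic greedy matcher with backtracking. Lucene's WildcardQuery does its own matching against the terms stored in the index, so this is not its implementation. The Wildcard.IsMatch name is hypothetical.

```csharp
public static class Wildcard
{
    // '*' matches any sequence of characters (including none),
    // '?' matches exactly one character; everything else must match literally.
    public static bool IsMatch(string pattern, string text)
    {
        int p = 0, t = 0;
        int star = -1;  // position of the last '*' seen in the pattern
        int mark = 0;   // text position where that '*' started matching

        while (t < text.Length)
        {
            if (p < pattern.Length && (pattern[p] == '?' || pattern[p] == text[t]))
            {
                p++; t++;                  // single character matched
            }
            else if (p < pattern.Length && pattern[p] == '*')
            {
                star = p++; mark = t;      // let '*' match nothing for now
            }
            else if (star != -1)
            {
                p = star + 1; t = ++mark;  // backtrack: '*' absorbs one more character
            }
            else
            {
                return false;
            }
        }

        while (p < pattern.Length && pattern[p] == '*') p++; // trailing '*' can match nothing
        return p == pattern.Length;
    }
}
```

With this sketch, te?t matches test and text but not tet, while test* matches both test and tester.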

Range search

It is possible to create a query where the field value will be in a range of values.

Name:[Ada TO Tom] // (1)

Name:{Ada TO Tom} // (2)

Both queries return documents whose values fall within the range. The difference lies in the bounds: in (1) documents whose name equals Ada or Tom are included, in (2) they are excluded.

To sum up:

  • [] brackets (square) are used in inclusive range queries,
  • {} brackets (curly) are used in exclusive range queries.
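The difference between the two bracket styles boils down to >= versus > on the lower bound and <= versus < on the upper one. The plain C# sketch below shows this for string terms using ordinal (character code) comparison. For lowercase words like these, that comparison gives alphabetical order. It is an illustration, not Lucene's range implementation. The TermRange.InRange name is hypothetical. Note that an analyzed field such as Name is matched term by term.

```csharp
public static class TermRange
{
    // inclusive = true  behaves like [lower TO upper]
    // inclusive = false behaves like {lower TO upper}
    public static bool InRange(string term, string lower, string upper, bool inclusive)
    {
        int fromLower = string.CompareOrdinal(term, lower); // >0: term sorts after lower
        int toUpper = string.CompareOrdinal(term, upper);   // <0: term sorts before upper

        return inclusive
            ? fromLower >= 0 && toUpper <= 0
            : fromLower > 0 && toUpper < 0;
    }
}
```

InRange("ada", "ada", "tom", true) is true, while the exclusive variant InRange("ada", "ada", "tom", false) is false. A term such as "bob" lies inside both ranges.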

Boost the term

As you know, Lucene calculates the relevance of matching documents based on the terms found. It is possible to promote (boost) a term simply by appending the caret (^) symbol and a boost factor to it. The higher the boost factor, the more relevant the term. As you will see in the next post, it is also possible to boost a document during indexing.

Previously we searched for search engine. Let's assume that we would like to give the term search more weight than engine instead of treating them equally. This is done by the following query:

search^2 engine

The query above states that "search" is twice as important as "engine". The default boost factor is 1. It must be positive, but it can be less than 1: for example, you could build a query with a factor of 0.1 (search^0.1).

Boolean operators

Lucene offers rich Boolean logic for your queries. The available operators are AND, "+", OR, NOT and "-". NOTE: the word operators must be written in uppercase to be recognized.

The default conjunction operator is OR: if you do not specify any Boolean operator between two terms, Lucene puts OR there. This operator means that a document matches if either of the terms exists in it.

Note that the two queries below are equivalent:

"search engine" search
"search engine" OR search


Boolean operators can be a very powerful tool, together with field names in the query and sophisticated grouping.

AND

This operator states that both terms must appear in the matching document.

To find a document containing both "search engine" and "cool", use this query:

"search engine" AND cool


+

The plus operator means that the term after + must appear in the document.

For example, if you look for documents where search must appear and engine may appear, use:

+search engine

NOT (or !)

On the other side there is the NOT operator, which is the opposite of the plus operator: documents containing the term after NOT will not appear in the result.

"search engine" NOT gear

IMPORTANT: you cannot use a query like this one:

NOT "search engine"

It will always return zero documents.

-

The minus (prohibit) operator excludes documents in which the term after - appears, for example "search engine" -gear. In the standard query parser it behaves the same way as NOT: both mark the term as prohibited.
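To see how these operators combine, here is a simplified plain C# model of Boolean matching. In this model +term is required, -term and NOT term are prohibited, and unmarked terms are optional. These roles correspond to Lucene's MUST, MUST_NOT and SHOULD clause occurrences. With the default OR operator, a AND b makes both terms required. The model works on single terms only and ignores phrases and scoring. The ClauseKind, Clause and BooleanMatch names are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical clause kinds named after the operators above.
public enum ClauseKind
{
    Should,  // no operator (default OR): optional
    Must,    // +term: required
    MustNot  // -term or NOT term: prohibited
}

public sealed class Clause
{
    public string Term { get; }
    public ClauseKind Kind { get; }

    public Clause(string term, ClauseKind kind)
    {
        Term = term;
        Kind = kind;
    }
}

public static class BooleanMatch
{
    // Decides whether a document (given as its set of terms) matches the clauses.
    public static bool Matches(ISet<string> documentTerms, IReadOnlyList<Clause> clauses)
    {
        bool hasRequired = false;
        bool hasOptional = false;
        bool optionalHit = false;

        foreach (var clause in clauses)
        {
            bool present = documentTerms.Contains(clause.Term);
            switch (clause.Kind)
            {
                case ClauseKind.Must:
                    if (!present) return false;   // a required term is missing
                    hasRequired = true;
                    break;
                case ClauseKind.MustNot:
                    if (present) return false;    // a prohibited term is present
                    break;
                default:
                    hasOptional = true;
                    optionalHit |= present;
                    break;
            }
        }

        // With required clauses, optional ones would only affect ranking.
        // Without them, at least one optional clause must match, so a query
        // made only of prohibited clauses (NOT x) matches nothing.
        return hasRequired || (hasOptional && optionalHit);
    }
}
```

The last line of Matches also explains the IMPORTANT note above: a query consisting only of NOT clauses has nothing that can make a document match.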


It is also important to know that you can group clauses with parentheses to build nested queries such as this one:

(Name:"search engine" AND cool) OR interesting


As you can see, Lucene can be queried with quite sophisticated query strings. Well-prepared queries can provide very accurate results for your solution.

Keep in mind that the query does not always have to be typed in by the user. Sometimes (for multi-field searches) the app should build the query from the input fields of a form, formatting it with the techniques described above.
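A minimal sketch of building a query string from form input could look like the code below. It assumes a form with a required product name and optional description keywords. FormQueryBuilder and its methods are hypothetical. The escaped characters are those with special meaning in the classic query syntax. In real code you would normally rely on the library's own escaping (Lucene.NET's QueryParser exposes a static Escape method) rather than this hand-written version.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: turns form input into a string for the query parser.
public static class FormQueryBuilder
{
    // Characters with special meaning in the classic Lucene query syntax.
    private const string Special = "\\+-!():^[]\"{}~*?|&";

    // Prefixes every special character with a backslash so it is taken literally.
    public static string Escape(string input)
    {
        var sb = new StringBuilder();
        foreach (char ch in input)
        {
            if (Special.IndexOf(ch) >= 0) sb.Append('\\');
            sb.Append(ch);
        }
        return sb.ToString();
    }

    // name: required phrase in the Name field; keywords: optional terms in Description.
    public static string Build(string name, IEnumerable<string> keywords)
    {
        var parts = new List<string>();
        if (!string.IsNullOrWhiteSpace(name))
            parts.Add("+Name:\"" + Escape(name.Trim()) + "\"");

        foreach (var keyword in keywords)
            if (!string.IsNullOrWhiteSpace(keyword))
                parts.Add("Description:" + Escape(keyword.Trim()));

        return string.Join(" ", parts);
    }
}
```

For instance, Build("Lucene in Action", new[] { "search", "c#" }) produces +Name:"Lucene in Action" Description:search Description:c#, which can then be passed to the parser.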

There is still one point missing: numeric values. You might expect the range query to do the job of building a search engine which returns only results from a range (for example by price or score). Unfortunately, this requires a different way of building a query. Of course, this does not mean you should stop building queries that are more interesting than a plain list of terms (which, as you now know, are connected with the OR operator).

We will build a really useful search engine in the next post: one which can be used, for example, in an online shop. In such cases you not only search for products by name, but you would also like to allow your customers to narrow the results to a specified price range.

Simple search engine in C# with Lucene.NET

Nowadays search boxes are common on websites. Many of them rely on popular search engines such as Bing or Google to search the site. This way of providing search is not very sophisticated, and a dedicated developer may want to provide his or her own search engine. In this post I will briefly present the capabilities of Lucene.

Lucene and Lucene.NET

It is not easy to build a search tool which is more than a simple SQL query with a couple of LIKE clauses. This is where developers need to find a suitable solution. One possibility is to use a third-party library, and one of the best known is Lucene, a full-text search engine library. One of its biggest disadvantages for a C# developer is that Lucene is written entirely in Java. Fortunately there is a port called Lucene.NET. Apache Lucene as well as Lucene.NET are open source projects available for free download (Lucene.NET also as a NuGet package).

As you can expect, this ported library is under ongoing development and can cause problems. The current version of the core is stable and no major bugs have been announced so far. Thanks to this library you do not need to implement sophisticated search logic in your application or in your SQL queries. You just need to include and use Lucene.NET in your application properly.


There are a couple of concepts which need to be introduced before we dig into the code. Lucene works on something called an index: an inverted index built from your data, on which the search methods operate. It can be kept in two main forms, in files on disk or in memory. Built on this index, your search engine can use the power of Lucene.

Each query returns a set of documents which fulfill your requirements. But it is very important to understand that every document (as Lucene calls its entries) can be a better or worse match for the query. So we need a way of scoring the returned results, and the library does this for you. Every result you receive contains not only information about the documents but also a score for each of them. You can decide what score is high enough to treat a document as a search result.
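Deciding which score is "good enough" can be as simple as a threshold over the hits. The plain C# sketch below uses a hypothetical Hit type as a stand-in for Lucene's ScoreDoc, which exposes Doc and Score as shown later in this post. Keep in mind that scores are relative to a particular query, so a fixed threshold tuned for one query may not suit another. The ScoreThreshold.Relevant name is also hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search hit: document id plus its score.
public struct Hit
{
    public int DocId { get; }
    public float Score { get; }

    public Hit(int docId, float score)
    {
        DocId = docId;
        Score = score;
    }
}

public static class ScoreThreshold
{
    // Keeps hits whose score reaches minScore, best first.
    public static List<Hit> Relevant(IEnumerable<Hit> hits, float minScore)
    {
        return hits.Where(h => h.Score >= minScore)
                   .OrderByDescending(h => h.Score)
                   .ToList();
    }
}
```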

Build index

The first step is to create the index for Lucene. This part consists of a couple of steps. Let's go through them.

1. Open the index directory and create the analyzer and the index writer.

var dir = FSDirectory.Open(new DirectoryInfo(@"C:/test_lucene")); // (1)
var analyzer = new StandardAnalyzer(Version.LUCENE_30); // (2) Version is Lucene.Net.Util.Version, not System.Version
var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED); // (3)

In (1) a directory on the C: drive is opened; this line is straightforward. In (2) the analyzer is instantiated. In short, an analyzer turns text into tokens: it tokenizes the text and filters the tokens, for example by lower-casing them, removing stop words or stemming them. The StandardAnalyzer used here filters the token stream with StandardFilter (normalizes tokens), LowerCaseFilter (lower-cases token text) and StopFilter (removes English stop words). In (3) we create an IndexWriter, which creates the index; we can think of it as an index in a database. Such an index is typically 20-30% of the size of the indexed text.

2. Add data into the index.

foreach (var sampleData in data)
{
    var doc = new Document();
    doc.Add(new Field("Id", sampleData.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("Name", sampleData.Name, Field.Store.YES, Field.Index.ANALYZED));
    doc.Add(new Field("Description", sampleData.Description, Field.Store.YES, Field.Index.ANALYZED));
    writer.AddDocument(doc);
}

This loop goes through the data collection (its contents are not important for this example, so I have omitted them). As you can see, there is a new concept: a Document object is created for each element. As you can check in the API documentation, documents are the unit of search and indexing. Each Document is a set of fields, each of which has a name and a textual value. A document should typically contain one or more stored fields which uniquely identify it.

The constructor of Field used in the example takes 4 arguments:

    1. The name of the field, which we can reference later (this should probably be a constant, for easy reuse and refactoring).
    2. The actual value of the field for this document.
    3. Whether the value should be stored in the index or not.
    4. Whether and how the field should be indexed. In the example I have used only two of the five possible values: NOT_ANALYZED and ANALYZED. With NOT_ANALYZED the field's value is indexed as a single token, without using an Analyzer. With ANALYZED the value is run through the Analyzer and the resulting tokens are indexed. More can be found here.

For each element in the list a document is created and then added to the index writer.

3. Close the stream objects.

There is not much to explain here: we just close all the objects we used (the index writer and the directory) to release the index and make it available to other parts of the app.

In this part of the application we have created the index for our data. I can imagine creating a Lucene index for only part of the data stored in the database, with the remaining information kept in the database itself: only the parts needed for search are included in Lucene's index.

Use index

In the previous section we have created the index. Now it is time to use it and see the magic of Lucene.

1. First we need to open the index and prepare the analyzer.

var directory = FSDirectory.Open(new DirectoryInfo(@"C:/test_lucene"));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);

In the first line we open our index, and in the second we create the analyzer (it should be the same kind of analyzer that was used for indexing).

2. Now it is time for the most interesting part of this post: actually using the index to search for the input text.

var parser = new MultiFieldQueryParser(Version.LUCENE_30, new[] { "Name", "Description" }, analyzer); // (1)
Query query = parser.Parse(text); // (2)
var searcher = new IndexSearcher(directory, true); // (3)
TopDocs topDocs = searcher.Search(query, 10); // (4)

First (1) we create a query parser. As you can see, in this example I have used a parser for multiple fields; for a single field you would use QueryParser instead of MultiFieldQueryParser. The parser is used in (2) to parse the input value (text). In (3) a searcher over the index is created; here we pass the directory in which we created the index (true opens it read-only). In (4) we search for the top 10 results matching the text.

3. Use the result from the search.

int results = topDocs.ScoreDocs.Length; // number of hits returned (at most 10 here)
Console.WriteLine("Found {0} results", results);

for (int i = 0; i < results; i++)
{
   ScoreDoc scoreDoc = topDocs.ScoreDocs[i];
   float score = scoreDoc.Score;
   int docId = scoreDoc.Doc;
   Document doc = searcher.Doc(docId);

   Console.WriteLine("{0}. score {1}", i + 1, score);
   Console.WriteLine("ID: {0}", doc.Get("Id")); // field names are case-sensitive
   Console.WriteLine("Text found: {0}\r\n", doc.Get("Name"));
}

In the previous step we received topDocs, which we can iterate over to get the data we are interested in. As I have already mentioned, at this point we could fetch more information about the found documents from a database or the file system. One interesting part is the Score value (note that the results are ordered by it!), which is the document's score for the query. It is a floating-point number: the higher it is, the better the document matches the query.


It took only around 100 lines of code, including the sample data, to create a simple search engine. Of course, in a real-world scenario there will be more sophisticated logic and more operations for optimizing the index, especially when it grows very large.

There are some important assumptions to know while working with Lucene. One of the most important is that IndexWriter, IndexReader and IndexSearcher instances are fully thread-safe, which means that multiple threads can call any of their methods concurrently.

As you can imagine, the index should be prepared when the application loads. I can imagine the index staying in memory and being updated when new data goes into the database. You decide which part of the data is included in the text search source.

This post was not a full introduction to text-based search. It presented the potential of Lucene and its port for the .NET framework.

I think playing around with this library can be quite interesting and eye-opening, especially once we understand the sophisticated algorithms behind the scenes.

Fluent interfaces – Builder Pattern (2/2)

Previously I introduced the concept of the builder pattern. It was a short introduction, but it should have made clear what problem this pattern solves.

What we want to achieve by introducing a fluent interface is code which fulfills these assumptions:
– code will be easy to read,
– it will be easy to change,
– changes in the business object will be easy to introduce into the builder object,
– we can encapsulate some additional logic into building the object (e.g. additional associations to dependent objects, default values for required fields).

Each of these assumptions is important, but the last one seems to be a very common problem in many real-life scenarios. In many cases every object must have default values set and, more importantly, some objects need to be created together with our business object.

Having these points in mind we can provide such code for our business object (see previous post):

class BusinessObjectFluentBuilder
{
    private int _id;
    private string _name;
    private IEnumerable<BusinessObject> _children;

    public BusinessObjectFluentBuilder WithName(string name)
    {
        this._name = name;

        return this;
    }

    public BusinessObject Build()
    {
        // some more sophisticated logic here
        return new BusinessObject { Name = _name };
    }
}

Be aware that I have simplified this code as much as possible (we can set only one property); it is just guidance, not real-life code. Such a builder allows building the object like this:

var obj2 = new BusinessObjectFluentBuilder()
    .WithName("Sample name")
    .Build();

As you can see, this code is easy to read: we have a builder object, we call a method to set the name, and then we just call Build to get the required object. When the business object gets another property, the only thing we need to do is add one method to the builder to set it. If the way the object is built changes, we refactor only the Build method.

This implementation could be improved: we could skip the Build call altogether! Let's have a look at another implementation of the fluent builder pattern.

class BusinessObjectFluentBuilderSecond
{
    private int _id;
    private string _name;
    private IEnumerable<BusinessObject> _children;

    public BusinessObjectFluentBuilderSecond WithName(string name)
    {
        this._name = name;

        return this;
    }

    public static implicit operator BusinessObject(BusinessObjectFluentBuilderSecond builder)
    {
        // some more sophisticated logic here
        return new BusinessObject { Name = builder._name };
    }
}

As you can see, I have introduced an implicit conversion operator. It is executed when the builder is converted to a BusinessObject, for example when it is assigned to a variable of that type. Many of you will quickly spot that this code has one disadvantage. If you like implicit typing (var), it will not work as expected, because the variable will be of the builder's type. We have to use this builder like this:

BusinessObject obj3 = new BusinessObjectFluentBuilderSecond()
    .WithName("Sample name");

As you can see, at this point the choice of implementation is yours. You could even combine both versions into one, so that you can use implicitly typed variables (var, with Build()) and explicitly typed variables (without Build()).
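A combined builder could look like the sketch below. It keeps a Build method for var users and an implicit conversion for explicitly typed variables, with the conversion simply delegating to Build. It uses the BusinessObject shape from the previous post. The builder's class name and the WithId and WithChild methods are hypothetical additions for this illustration.

```csharp
using System.Collections.Generic;

public class BusinessObject
{
    public int Id { get; set; }
    public string Name { get; set; }
    public IEnumerable<BusinessObject> Children { get; set; }
}

public class BusinessObjectCombinedBuilder
{
    private int _id;
    private string _name;
    private readonly List<BusinessObject> _children = new List<BusinessObject>();

    public BusinessObjectCombinedBuilder WithId(int id) { _id = id; return this; }
    public BusinessObjectCombinedBuilder WithName(string name) { _name = name; return this; }
    public BusinessObjectCombinedBuilder WithChild(BusinessObject child) { _children.Add(child); return this; }

    // Explicit path: works with var, e.g. var o = builder.WithName("x").Build();
    public BusinessObject Build()
    {
        return new BusinessObject { Id = _id, Name = _name, Children = _children.ToArray() };
    }

    // Implicit path: lets an explicitly typed variable skip Build().
    public static implicit operator BusinessObject(BusinessObjectCombinedBuilder builder)
    {
        return builder.Build();
    }
}
```

Because the conversion operator only calls Build, the creation logic lives in one place. Both usage styles therefore always produce the same object.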

Of course, it gets harder when the new property is required: then refactoring can become a nightmare. This can be resolved fairly easily, for example by throwing an exception in the Build method when the required property is not set. Another interesting question is whether the builder object should be reusable. In the real world we do not need a new builder to build another building; we reuse them. For the same reason, a builder object should offer a reset option, so that it can be reused.
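Both ideas can be sketched together: Build throws when a required value is missing, and Reset clears the state so the same builder instance can be reused. Treating Name as the required property is an assumption made for this example. The ValidatingBusinessObjectBuilder class and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class BusinessObject
{
    public int Id { get; set; }
    public string Name { get; set; }
    public IEnumerable<BusinessObject> Children { get; set; }
}

public class ValidatingBusinessObjectBuilder
{
    private int _id;
    private string _name;

    public ValidatingBusinessObjectBuilder WithId(int id)
    {
        _id = id;
        return this;
    }

    public ValidatingBusinessObjectBuilder WithName(string name)
    {
        _name = name;
        return this;
    }

    public BusinessObject Build()
    {
        // Name is treated as required here (an assumption for this sketch).
        if (string.IsNullOrEmpty(_name))
            throw new InvalidOperationException("Name is required before calling Build().");

        return new BusinessObject { Id = _id, Name = _name, Children = new BusinessObject[0] };
    }

    // Clears the builder's state so the same instance can build another object.
    public ValidatingBusinessObjectBuilder Reset()
    {
        _id = 0;
        _name = null;
        return this;
    }
}
```

When a new required property is added later, the check in Build makes every call site that forgets to set it fail immediately, instead of silently creating an incomplete object.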

Most real-life objects are very complicated, especially when you use DDD (Domain-Driven Design) and aggregate entities. In such scenarios you need to build objects that are connected with many others, and the builder pattern seems to be a good choice for making future changes easy to introduce.

Please leave a comment if you have a question.

Fluent interfaces – Builder Pattern, Intro (1/2)

The idea of fluent interfaces was introduced by Martin Fowler and Eric Evans. In short, you call one method after another in a chain. This style significantly improves readability: it makes your code more human-readable than the usual method calls.

The .NET framework uses this technique in many places. Many of you are probably thinking of extension methods, which can be chained one after another (I will cover them another time). Currently this seems to be the direction the programming world is taking.

Real-life applications force programmers to work with models whose creation is very often awkward and difficult. In day-to-day scenarios it is common to use the builder pattern to help with creating new objects: the place where the object is needed does not have to care about its creation and, more importantly, about dependencies on other objects in the environment. Using the builder pattern also helps with testing, since the builder implementation can be changed to produce a mock object instead of a real one.

The idea of builder pattern is quite easy to understand. What I would like to discuss is the fluent version of the builder – this will be covered in the next post.

Let’s start with a dummy version of the object the application is working on:

class BusinessObject
{
   public Int32 Id { get; set; }
   public string Name { get; set; }
   public IEnumerable<BusinessObject> Children { get; set; }
}

The first idea of a builder for such object is for example the BusinessObjectBuilder class.

class BusinessObjectBuilder
{
    public BusinessObject Build(Int32 id, string name, IEnumerable<BusinessObject> children)
    {
       return new BusinessObject() {Id = id, Name = name, Children = children};
    }
}

The disadvantages of such a solution are obvious: every call has to pass all the parameters in the right order. This could be improved with default parameters (introduced in C# 4), but it still would not be easily readable for a programmer. It would also be hard to refactor the code when the design of the object changes.
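For comparison, the default-parameter variant mentioned above could look like the following sketch. BusinessObjectBuilderWithDefaults is a hypothetical name, and the defaults chosen (0, null and an empty list) are assumptions. Named arguments make the call site somewhat clearer, but the method signature still grows with every new property.

```csharp
using System.Collections.Generic;

public class BusinessObject
{
    public int Id { get; set; }
    public string Name { get; set; }
    public IEnumerable<BusinessObject> Children { get; set; }
}

public class BusinessObjectBuilderWithDefaults
{
    // Optional parameters (C# 4) let callers omit arguments;
    // named arguments let them choose which ones to pass.
    public BusinessObject Build(int id = 0, string name = null,
                                IEnumerable<BusinessObject> children = null)
    {
        return new BusinessObject
        {
            Id = id,
            Name = name,
            Children = children ?? new List<BusinessObject>()
        };
    }
}
```

A call such as Build(name: "Only name") reads well. However, adding a property still means changing this signature, which is exactly the refactoring problem described above.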

The second solution has fewer cons, but it is still far from the ideal situation, where the code speaks for itself rather than relying on comments (which often become outdated!).

class BusinessObjectBuilderSecond
{
    private int _id;
    private string _name;
    private IEnumerable<BusinessObject> _children;

    public void SetName(string name)
    {
        this._name = name;
    }
    // .....
    public BusinessObject Build()
    {
        return new BusinessObject() { Id = _id, Name = _name, Children = _children };
    }
}

As seen in the code above, the situation is quite different. In the second version we can skip some parameters, for example the optional ones. But there is still one drawback: the builder has to be used like this:

var builder = new BusinessObjectBuilderSecond();
// ...
var obj = builder.Build();

It is clearer and nicer to use than the first version, but it is still quite messy. However, it makes the object-building code easier to maintain: when the object is updated, it is easy to add one method to the builder without changing every place in the code where the builder is used. In the first scenario the programmer would have two options. One is to extend the method signature. The other is to add another Build method with additional parameters, so the builder class would end up with plenty of Build overloads, which is quite frustrating to see in IntelliSense.

None of the presented solutions resolves the two main problems in a business environment: domain model changes and readability of the code (which of course results in easier refactoring after changes in the business object). In the next post I will propose two solutions which can help to solve these issues.

Kinect: installing the device

The Kinect sensor was designed as part of the Xbox 360 console, but fortunately Microsoft decided to release an SDK for Windows 7 users.

This article is the first of a series of posts I will write while I have this sensor at home and can play with it a little.

To install the sensor there are requirements:

This is everything you need to install the Kinect sensor on your Windows 7 computer.

Installing the device

1. Install the VS and .NET framework.
2. Install Kinect SDK (Beta so far).
3. Connect the device to your computer (first plug it into the power supply, then into your USB port).
4. You can verify the installation in Device Manager. The green LED on the device should be blinking.

If you have correctly completed these steps, the device should be connected to your Windows 7 PC and you can start developing your own application.

Device Manager with Kinect installed correctly