I had considered approaches similar to the ones you mention, but I was curious whether there is a more efficient way to do it, since some of our result sets have more than 100,000 records. It looks like Lucene does not have a function for selecting distinct values.

As I understand it, you would like to show result counts for certain terms, e.g. car (10), toy (13), electronic (3). For such a scenario I can think of two possible solutions:

– run a subquery for each of your categories, where you just count the results,

– analyze the results Lucene returns for your query.
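
To illustrate the second option, here is a minimal sketch in plain Java. It assumes you have already read the category value out of each hit into a list; the class name CategoryCounter, the method countByCategory and the sample values are made up for the example and are not part of Lucene. The counting itself is a single pass over the hits, so it should stay cheap even for result sets of 100,000 records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CategoryCounter {

    // Counts how often each category occurs among the hits in one pass.
    // `categories` is a hypothetical stand-in for the category field
    // value taken from each document the query returned.
    public static Map<String, Integer> countByCategory(List<String> categories) {
        // LinkedHashMap keeps categories in the order they were first seen.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String category : categories) {
            // Insert 1 for a new category, otherwise add 1 to the current count.
            counts.merge(category, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data, one entry per hit.
        List<String> hits = List.of("car", "toy", "car", "electronic", "toy", "car");
        System.out.println(countByCategory(hits)); // prints {car=3, toy=2, electronic=1}
    }
}
```

The keys of the resulting map are also the distinct category values, so the same pass answers the "select distinct" question.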