
Prototyping Lucene-based search in eZ publish: first results
June 2, 2006

Posted by Paul Borgermans in eZ publish, Lucene.

After roughly 12 hours of coding (including removing some dust from my Java skills), Kristof and I reached a first milestone in our Lucene project: full text search implemented as a normal search plug-in for eZ publish. First impressions in a nutshell: fast, accurate and even faster :-)

Even though we use the PHP-Java bridge in a non-optimal development mode with default Java settings (like low memory), indexing and searching are way faster than with the ezsearch plugin. Accurate benchmarks are lacking at this point (we'll do that later), but it appears to be at least an order of magnitude faster. A typical search over an index with ~4000 documents takes 0.050 seconds; including the step to fetch the content objects from the eZ publish database for the displayed hits (10 at a time), this increases to typically 0.1 seconds (the machine is a dual 3 GHz 64-bit CPU system with 4 GB RAM).

Some technical details on the milestone reached:

  • the plugin is written in PHP, and calls to the Lucene classes and methods (Java version) go through the PHP-Java bridge
  • object attributes are indexed as separate fields, as is object meta-data (owner, dates, section, class, path…)
  • for Lucene users: the analyzer used is the multi-field analyzer
  • full text search is over all fields (our test case: 137 fields)
  • all the typical richness of Lucene queries is available (Boolean searches on keywords, fuzzy matching, keyword boosting, …)
  • no field (attribute) or document (object) boosting during the indexing phase at this time; we use only the standard heuristics of Lucene (for example, a keyword hit in a short field increases the relevance ranking more than a hit in a long field)
  • sub-tree searches are implemented
  • no security yet
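
To make the query features and the sub-tree restriction a bit more concrete, here is a minimal sketch in plain Java (no Lucene dependency) that only builds a query string in the classic Lucene query syntax: `AND` for Boolean search, `~` for fuzzy matching, `^` for keyword boosting and `+` for required clauses. The field name `path_node_ids` and the helper `restrictToSubtree` are assumptions made up for illustration; the post does not say how the path is indexed or how the plugin actually limits a search to a sub-tree.

```java
public class SubtreeQuerySketch {
    // Hypothetical field name: the post only says that "path" meta-data is indexed.
    static final String PATH_FIELD = "path_node_ids";

    // Wraps the user's query as a required clause and adds a second required
    // clause on the (assumed) path field, so only hits below the node match.
    static String restrictToSubtree(String userQuery, int subtreeNodeId) {
        return "+(" + userQuery + ") +" + PATH_FIELD + ":" + subtreeNodeId;
    }

    public static void main(String[] args) {
        // Fuzzy match on "lucene", boosted keyword "search", limited to node 42.
        System.out.println(restrictToSubtree("lucene~ AND search^2", 42));
        // prints: +(lucene~ AND search^2) +path_node_ids:42
    }
}
```

The resulting string would then be handed to a Lucene query parser; a real implementation might equally build the query objects directly instead of going through a string.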

The next phase will consist of:

  • experimenting with boosting factors during indexing (for example, using the number of reverse object relations to determine a boost factor at the object level, treating keyword attributes as more important than the rest, using configured boost factors from an ini file, …)
  • implementing the advanced eZ search interface (class/attribute filtering, range queries including dates)
  • implementing a normal query interface (mainly for template programmers who want to include dedicated search results on certain pages/node views)
  • implementing security: this will be done in Java by writing a dedicated Lucene filter which interrogates the database like the eZ publish search does
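
As a rough illustration of the boosting idea in the first item, the sketch below computes an object-level boost from the number of reverse relations and looks up an attribute-level boost from a configured map. Everything here is an assumption for illustration: the logarithmic formula, the names `objectBoost` and `attributeBoost`, and the use of a `Map` as a stand-in for settings read from an ini file. None of it is taken from our code, and the factors we end up using may look quite different.

```java
import java.util.Map;

public class BoostSketch {
    // Assumed formula: 1.0 for an object nobody links to, growing slowly
    // (logarithmically) with the number of reverse object relations.
    static float objectBoost(int reverseRelationCount) {
        int count = Math.max(0, reverseRelationCount);
        return 1.0f + (float) Math.log1p(count);
    }

    // Looks up a configured boost for an attribute identifier; attributes
    // without an entry get the neutral factor 1.0.
    static float attributeBoost(Map<String, Float> configured, String attributeIdentifier) {
        return configured.getOrDefault(attributeIdentifier, 1.0f);
    }

    public static void main(String[] args) {
        // Stand-in for values that would come from an ini file.
        Map<String, Float> configured = Map.of("keywords", 2.0f);
        System.out.println(objectBoost(0));
        System.out.println(objectBoost(25));
        System.out.println(attributeBoost(configured, "keywords"));
        System.out.println(attributeBoost(configured, "body"));
    }
}
```

A logarithm is used here only so that a handful of very heavily linked objects do not dominate the ranking; the resulting numbers would be passed to Lucene as index-time boosts.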

Stay tuned: source code will be released around the eZ summer conference, where I'll give a talk on this and other developments done here ;-)


Comments»

1. Tore Skobba - June 6, 2006

Sounds like a great contribution. I always feel eZ/PHP has been lacking without support for a search engine. MySQL or any other SQL database is not really appropriate for text searches, at least if you want scalable text searching. I guess they are used since that is what is available.

Cheers
Tore

2. jonathan reid - May 3, 2008

How does it compare to the Zend implementation?

