
Starting work on new search functionality for eZ publish based on Lucene

May 11, 2006

Posted by Paul Borgermans in eZ publish, Lucene.

After exploring various options for improving the search functionality in eZ publish, I finally settled on Lucene (the Java version), which will serve as the base platform to build upon. I won't give a detailed list of pros and cons of the alternatives (like Xapian and Egothor), or explain why I'm not using complete search engines like mnogosearch or htdig, because … I'm lazy. But here are my reasons for choosing Lucene.

From a management / high level point of view:

  • It is a mature open source project, backed by the Apache Software Foundation
  • It has very powerful features for information retrieval (aka search)
  • Feature-wise it is a good to excellent match to eZ publish (more below)
  • Integration with eZ publish is feasible (maybe even easy) through a PHP-Java bridge
  • It can be extended with special ranking algorithms, filtering and object-level security

From a technical point of view:

  • It has a beautiful, simple API
  • Can be used with both PHP4 and PHP5
  • The concepts match well with eZ publish:
    • Lucene "documents" <-> eZ publish objects
    • Lucene fields within documents <-> eZ publish object attributes
    • Lucene special fields <-> eZ publish object meta-data
  • It appears fast enough for typical eZ publish use
  • Possibility to index a wide variety of file types
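
To make the concept mapping in the list above a bit more concrete, here is a minimal sketch in plain Java. It deliberately does not use the Lucene API: `ContentObject`, `SearchDocument`, `ContentObjectMapper` and the `meta_` / `attr_` field prefixes are all hypothetical names invented for illustration, not part of eZ publish or Lucene. It only shows the idea that one content object becomes one document, each attribute becomes a field, and meta-data ends up in specially named fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an eZ publish content object (not the real class).
final class ContentObject {
    final int id;                     // object meta-data
    final String classIdentifier;     // object meta-data, e.g. "article"
    final Map<String, String> attributes = new LinkedHashMap<>();

    ContentObject(int id, String classIdentifier) {
        this.id = id;
        this.classIdentifier = classIdentifier;
    }
}

// Hypothetical stand-in for a search "document": a flat set of named fields.
final class SearchDocument {
    final Map<String, String> fields = new LinkedHashMap<>();
}

final class ContentObjectMapper {
    // Assumed naming convention that keeps meta-data apart from attributes.
    static final String META_PREFIX = "meta_";
    static final String ATTR_PREFIX = "attr_";

    // One content object maps to exactly one search document.
    static SearchDocument toDocument(ContentObject obj) {
        SearchDocument doc = new SearchDocument();

        // Object meta-data goes into "special" fields.
        doc.fields.put(META_PREFIX + "id", Integer.toString(obj.id));
        doc.fields.put(META_PREFIX + "class", obj.classIdentifier);

        // Each object attribute becomes one field within the document.
        for (Map.Entry<String, String> attr : obj.attributes.entrySet()) {
            doc.fields.put(ATTR_PREFIX + attr.getKey(), attr.getValue());
        }
        return doc;
    }
}
```

In a real integration the `SearchDocument` would be replaced by a Lucene document, and the meta-data fields would be the natural place to hang filtering and object-level security.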

The "drawback" is perhaps that additional software needs to be installed (Java and a PHP-Java bridge). This means you will need almost full control over the servers, which may not be feasible with some hosting companies.

Over the next few weeks, Kristof and I will be coding and prototyping … which will allow us to come up with a schedule and a more detailed feature implementation plan.