Improving search in eZ publish January 31, 2006
Posted by Paul Borgermans in eZ publish.
Search in eZ publish is not really rocket science with the current plugins (ezsearch and openfts). Our team at SCK-CEN had a meeting with the Mathematics department of the University of Ghent in order to brainstorm on a possible collaboration with them in the areas of fuzzy logic, AI, … applied to search with eZ publish.
It was a very interesting discussion, and it revived both my old complaints about eZ search and my interest in doing something about it. So here is my plan for the 3.x series: a gradual (step-wise) addition of features, possibly also a test-bed for searching in the 4.x series:
- Do a lot more logging of what is going on in eZ publish sites (Kristof started with this, again in a very elegant way), to be used as research data with AI and fuzzy logic algorithms
- Improve ranking based on priorities at the class and attribute level
- Provide stemming in case of non-existing keywords
- Take into account synonyms
- Take into account object relations
- Take advantage of the keyword and indexed word statistics:
  - as they are indexed
  - as they are used in searches
  - as they match user profiles (if a user is logged in of course)
- Improve the deterministic rules and heuristics behind the above with some basic AI/fuzzy algorithms
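To make three of these items a bit more concrete (class/attribute priorities, synonyms, and stemming only when a keyword does not exist in the index), here is a rough sketch in Python. It is not eZ publish code: the weight tables, the synonym map, the `naive_stem` helper and the object layout are all made up for illustration, and a real implementation would use a proper stemmer and the existing search tables. Object relations and statistics are left out.

```python
from collections import defaultdict

# Hypothetical priorities and synonyms; these are NOT eZ publish settings.
CLASS_WEIGHT = {"article": 1.0, "comment": 0.5}
ATTRIBUTE_WEIGHT = {"title": 3.0, "body": 1.0}
SYNONYMS = {"search": {"find"}}


def naive_stem(word):
    """Crude suffix stripper, a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(objects):
    """Map each word to {object_id: score weighted by class and attribute}."""
    index = defaultdict(lambda: defaultdict(float))
    for obj in objects:
        class_w = CLASS_WEIGHT.get(obj["class"], 1.0)
        for attr, text in obj["attributes"].items():
            attr_w = ATTRIBUTE_WEIGHT.get(attr, 1.0)
            for word in text.lower().split():
                index[word][obj["id"]] += class_w * attr_w
    return index


def search(index, query):
    """Return object ids ordered by descending score (ties broken by id)."""
    scores = defaultdict(float)
    for term in query.lower().split():
        # Exact keyword plus its synonyms first.
        candidates = {term} | SYNONYMS.get(term, set())
        hits = [w for w in candidates if w in index]
        if not hits:
            # Keyword does not exist: fall back to words sharing its stem.
            stem = naive_stem(term)
            hits = [w for w in index if naive_stem(w) == stem]
        for word in hits:
            for obj_id, score in index[word].items():
                scores[obj_id] += score
    return sorted(scores, key=lambda obj_id: (-scores[obj_id], obj_id))


# Made-up sample content.
objects = [
    {"id": 1, "class": "article",
     "attributes": {"title": "Searching eZ publish", "body": "notes on indexes"}},
    {"id": 2, "class": "comment",
     "attributes": {"title": "Re", "body": "how to find and search content"}},
    {"id": 3, "class": "article",
     "attributes": {"title": "Find content", "body": "search tips"}},
]
index = build_index(objects)

print(search(index, "search"))    # [3, 2]: exact keyword and synonym, no stemming
print(search(index, "searches"))  # [1, 3, 2]: unknown keyword, stemming fallback
```

In the first query the title match on the synonym "find" pushes object 3 above the comment, which is the kind of effect class and attribute priorities are meant to have. The second query only reaches "Searching eZ publish" because the keyword itself is not in the index and the stem is tried instead.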
Design work is planned for February/March 2006, with implementation and prototypes from April 2006. Later on, collaboration projects with the University of Ghent for more advanced features and research are possible.
do you setup ez? can u help with setup? charge… is ?
if u can, email me
and see this thread: http://ez.no/community/forum/install_configuration/installation_problem_with_ezpublish_3_7_2
Great idea! While users don’t usually complain about bad search, the irrelevancy of the results often annoys me, especially at ez.no. The users may not know what they want, but they will be well served by a better search engine. I’m really looking forward to seeing this!
Any news on that topic? You were going to brainstorm with smart guys at universities, weren’t you?
X+
Yes,
One project is the current outcome: a master's thesis proposal on linguistic algorithms and relevance ranking, taking into account the structure (read: intra- and extra-document/page relations). This is for the academic year 2006-2007 and, of course, a student has to be found to actually do it.
In the short term, the plan is a plugin and enhancements based on Apache Lucene.