Life can be beautiful

Happy birthday! June 3, 2008

Posted by Paul Borgermans in eZ publish.

A little over a year ago, eZ Labs Belgium was born as a result of a long-standing symbiosis between a few eZ Publish community members and eZ Systems.

While there are still just two of us in the office (when not traveling, that is), it has been quite an adventure already …

So it is time to have a drink at one of the many bars and pubs near our base camp in the historic city of Leuven! If you are in the neighborhood, just tell us and we’ll show you why the next eZ Conference should be held here.

Cheers

eZ Barcamp, Lyon: advanced searching and navigation topic January 17, 2007

Posted by Paul Borgermans in eZ publish, Lucene, Searching, Solr.

During the international partner meeting of eZ Systems in Lyon (January 24-26), there will also be a Barcamp. I will use this occasion to talk about a new advanced search and navigation engine for eZ publish that is in the works.

This search engine goes by the name Aurora, as it builds on the Apache Solr (pronounced solar) incubator project and, well, the home of eZ publish is in Norway, where auroras can be seen more often than here in Belgium 🙂

In fact, the Aurora plugin/engine is a follow-up to the Lucene-based plugin Kristof and I developed some time ago. The features I wanted to add, like faceted search/navigation, keyword highlighting, high-performance caching and more, are all built into the Solr backend, which is itself based on the Lucene Java IR libraries. So instead of writing this myself with Lucene and the PHP-Java bridge, I can concentrate more on fundamental aspects. Also, there is no need to install a PHP-Java bridge extension: the Solr backend is used over an HTTP connection. I think this is good news for all those who complained about that aspect (but you still need Java 1.5, aka J2SE 5.0, installed).

The effort for this new search plugin also has a few benefits for the larger PHP world:

- I created a new Solr response writer in Java for PHP: no XML result parsing is necessary, as the results are returned as a string which can be eval’ed as a multidimensional PHP array (so PHP now joins the Ruby, Python, XSLT, XML and JSON response writers already available). The code is not in Solr yet, but will be in the coming weeks.
- A core PHP utility class/library for Solr is in the works, which will form the basis for a component in the eZ components PHP library (if the eZ team accepts this, of course).
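
To give an idea of what "used over an HTTP connection" means in practice, here is a minimal sketch (in Python, purely for illustration) of how a faceted, highlighted search request to Solr can be assembled as a plain URL. The endpoint, the field names and the helper function are assumptions and not part of the Aurora plugin; `q`, `rows`, `wt`, `facet`, `facet.field`, `hl` and `hl.fl` are standard Solr request parameters, with `wt` selecting the response writer.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint: a local Solr instance with the classic /select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(keywords, facet_fields=(), highlight_fields=(), rows=10, writer="json"):
    """Build a Solr select URL; the search itself is then a plain HTTP GET."""
    params = [("q", keywords), ("rows", str(rows)), ("wt", writer)]
    if facet_fields:
        # Faceting: ask Solr to count matching documents per field value.
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    if highlight_fields:
        # Highlighting: ask Solr for snippets with the keywords marked up.
        params.append(("hl", "true"))
        params.append(("hl.fl", ",".join(highlight_fields)))
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Hypothetical field names, chosen only for this example.
url = build_search_url(
    "aurora",
    facet_fields=["class", "section"],
    highlight_fields=["title", "body"],
)
print(url)
```

Fetching that URL with any HTTP client returns the result set in the format chosen by `wt`, which is why no bridge extension is needed on the PHP side.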

And a note to the PHP lovers who do not like Java: the object/class persistence and caching for Java web applications (like Solr, which runs inside a servlet container) has no counterpart in the PHP world. The speed is simply amazing.

Cheers!

Eggplant (Aubergine) rolls with Ricotta and Tomato sauce August 28, 2006

Posted by Paul Borgermans in Cooking, Recipes.

This is a recipe that apparently pleases quite a few of my friends, so upon request I am sharing it with anyone who wants to try. In fact, it may be regarded as one of the many variations on the Italian/Tuscan classic: eggplant with tomatoes, baked in the oven. It also serves the same purposes: you can serve it as a light meal or let it accompany another dish.

The inspiration for this recipe comes from many sources, as well as from my own preferences and experiments.

(more…)

Damn, my Nikon D70 gave up August 9, 2006

Posted by Paul Borgermans in Hobbies, Photography.

After a few weird over- and underexposed pictures, my beloved D70 refuses to work anymore. Of course, the warranty period expired a few months back …

Anyone who is making money with the Lucene search plugin is hereby invited to sponsor my purchase of a D200 body. Whoever pays the full price of the D200 body (or sends one to me) will get 16 hours of high-quality support on the Lucene plugin from me in exchange … now that’s a deal!!

Sniff

Second release of our Lucene based search plugin for eZ publish July 12, 2006

Posted by Paul Borgermans in eZ publish, Lucene.

.. or an open source ECMS meets an open source enterprise level search engine ;-).

If you are using eZ publish and are in control of the servers it runs on, please test our contribution (beta) by downloading:

http://ez.no/community/contribs/applications/lucene_java_search_plugin

Though you need to install the PHP-Java bridge as an additional PHP extension, it is worth the trouble if you need a good search engine. We already use it on production sites. One experimental feature added is a “more like this” module/view. You can use it in node/view templates as follows:

<a href={concat("/lucene/similar/",$node.node_id)|ezurl}>Show similar objects</a>

For comments, requests or bug reports, use the comments feature here or on ez.no.

In the future, we plan to add quite a few more exciting features, such as image search by example.
Happy searching!

Paul

Some eZ publish summer conference pictures July 7, 2006

Posted by Paul Borgermans in eZ publish, Lucene.

Trying out Picasa Web Albums: not bad and pretty fast 🙂

Here is a selection of my eZ publish summer conference pictures:

http://picasaweb.google.com/Paul.Borgermans/20060623Ezsummer

FYI, Flickr also has quite a few pictures of this magnificent event/conference:

http://www.flickr.com/photos/tags/ezconference2006/
Enjoy!

–paul

Prototyping lucene based search in eZ publish: first results June 2, 2006

Posted by Paul Borgermans in eZ publish, Lucene.

After roughly 12 hours of coding (including removing some dust from my Java skills), Kristof and I reached a first milestone in our Lucene project: full text search implemented as a normal search plug-in for eZ publish. First impressions in a nutshell: fast, accurate and even faster 🙂

Even though we use the PHP-Java bridge in a non-optimal development mode with default Java settings (like low memory), indexing and searching are way faster than with the ezsearch plugin. Accurate benchmarks are lacking at this point (we'll do them later), but it appears to be at least an order of magnitude faster. A typical search over an index with ~4000 documents takes 0.050 secs; including the step to fetch the content objects from the eZ publish database for the displayed hits (10 at a time), this increases to typically 0.1 secs (the machine has dual 3 GHz 64-bit CPUs and 4 GB RAM).

Some technical details on the milestone reached:

- the plugin is written in PHP, and calls to the Lucene classes and methods (Java version) go through the PHP-Java bridge
- object attributes are indexed as separate fields, as is object meta-data (owner, dates, section, class, path…)
- for Lucene users: the analyzer used is the multi-field analyzer
- full text search is over all fields (our test case: 137 fields)
- all the typical richness of Lucene queries is available (Boolean searches on keywords, fuzzy matching, keyword boosting, …)
- no field (attribute) or document (object) boosting during the indexing phase at this time; only the standard heuristics of Lucene are used (for example, keyword hits in short fields increase the relevance ranking more than hits in long fields)
- sub-tree searches are implemented
- no security yet

The next phase will consist of:

- experimenting with boosting factors during indexing (for example, using the number of reverse object relations to determine a boost factor at the object level, treating keyword attributes as more important than the rest, using configured boost factors from an ini file, …)
- implementing the advanced eZ search interface (class/attribute filtering, range queries including dates)
- implementing a normal query interface (mainly for template programmers who want to include dedicated search results on certain pages/node views)
- implementing security: this will be done in Java by writing a dedicated Lucene filter which interrogates the database like the eZ publish search does
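
As an illustration of the first item, a boost calculation could look roughly like the sketch below (Python, for readability only). Everything in it is an assumption: the logarithmic damping, the attribute names and the dictionary standing in for the ini file are hypothetical and not what the plugin does.

```python
import math

# Hypothetical per-attribute boosts, standing in for values read from an ini file.
ATTRIBUTE_BOOSTS = {"keywords": 2.0, "title": 1.5}
DEFAULT_ATTRIBUTE_BOOST = 1.0


def object_boost(reverse_relation_count):
    """Boost an object by how many other objects relate to it.

    The logarithm damps the effect, so a very popular object does not
    drown out everything else (an assumed design choice).
    """
    return 1.0 + math.log1p(max(reverse_relation_count, 0))


def attribute_boost(attribute_identifier):
    """Look up the configured boost for an attribute, falling back to 1.0."""
    return ATTRIBUTE_BOOSTS.get(attribute_identifier, DEFAULT_ATTRIBUTE_BOOST)


for relations in (0, 1, 10, 100):
    print(relations, round(object_boost(relations), 2))
```

The two factors would then be handed to the indexer as document-level and field-level boosts respectively.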

Stay tuned: the source code will be released around the eZ summer conference, where I'll give a talk on this and other developments done here 😉

Starting work on new search functionality for eZ publish based on Lucene May 11, 2006

Posted by Paul Borgermans in eZ publish, Lucene.

After exploring various options for improving the search functionality in eZ publish, I finally settled on Lucene … (the Java version), which will serve as a base platform to build upon. I won't give a detailed list of pros and cons of the alternatives (like Xapian, Egothor) or explain why not to use complete search engines like mnogosearch, htdig … because … I'm lazy. But here are my reasons for choosing Lucene.

From a management / high level point of view:

- It is a mature open source project, backed by the Apache Software Foundation
- It has very powerful features for information retrieval (aka search)
- Feature-wise it is a good to excellent match to eZ publish (more below)
- Integration with eZ publish is feasible (maybe even easy) through a PHP-Java bridge
- It is extensible for special ranking algorithms, filtering and to implement object level security

From a technical point of view:

- It has a beautiful, simple API
- Can be used with both PHP4 and PHP5
- The concepts match well with eZ publish:
  - Lucene "documents" <-> eZ publish objects
  - Lucene fields within documents <-> eZ publish object attributes
  - Lucene special fields <-> eZ publish object meta-data
- It appears fast enough for typical eZ publish use
- Possibility to index a wide variety of file types
The "drawback" is maybe that additional software needs to be installed (Java, PHP-Java bridge) which means you will need almost full control over the servers which may not be feasible with some hosting companies.

In the next weeks, Kristof and I will be coding and prototyping … which will allow us to come up with a schedule and a more detailed feature implementation plan.

Improving search in eZ publish January 31, 2006

Posted by Paul Borgermans in eZ publish.

Search in eZ publish is not really rocket science with the current plugins (ezsearch and openfts). Our team at SCK-CEN had a meeting with the Mathematics department of the University of Ghent in order to brainstorm on a possible collaboration with them in the areas of fuzzy logic, AI, … applied to search with eZ publish.

It was a very interesting discussion and revived my old complaints on eZ search and my interest in doing something about it. So here is my plan for the 3.x series, a gradual (step-wise) addition of features, possibly also a test-bed for searching in the 4.x series:

- Do a lot more logging of what is going on in eZ publish sites (Kristof started with this, again in a very elegant way), to be used as research data with AI and fuzzy logic algorithms
- Improve ranking based on priorities at the class and attribute level
- Provide stemming in case of non-existing keywords
- Take into account synonyms
- Take into account object relations
- Take advantage of the keyword and indexed word statistics:
  - as they are indexed
  - as they are used in searches
  - as they match user profiles (if a user is logged in of course)
- Improve deterministic/heuristics of the above with some basic AI/fuzzy algorithms

Design work is planned for February/March 2006, with implementation and prototypes from April 2006. Later on, collaboration projects with the University of Ghent for more advanced features and research are possible.

We got rid of the “internal” draft issue, who’s next? January 20, 2006

Posted by Paul Borgermans in eZ publish.

If you are using eZ publish with lots of “editors”, usually when implementing a portal with collaborative features, you probably faced the issue of drafts created by users without storing anything.

It has bothered me for a long time.
The reason: eZ publish creates a draft version of an object whenever an edit action is called by a user. Whether that user stores a real draft version or not, the next time someone tries to edit the given object, a “nice” screen is presented saying that the object is currently being edited by you or someone else.

In highly collaborative situations, this turns users off, confuses them, and in our real-life applications makes them think it is impossible to do what they intended. Really annoying, or even worse…
The issue was discussed quite some time ago in the forums and during the eZ publish 2005 summer conference by Derick and me, and the best idea to solve it was to create a new object status value for the draft created by the edit action when it is called. In this way, this “internal” draft could be distinguished from real stored drafts … and a cron job could periodically scan the database and remove the internal drafts.
I did not have the time to do it, but my colleague Kristof recently implemented the above idea and more: he also patched the edit action itself. When there is an “internal draft” and the user who created it is no longer logged in, the internal draft is ignored and a normal edit screen is presented. That even avoids the need for running a cron job.
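
The decision described above can be summarised in a few lines. This is only a sketch in Python of the logic as explained in this post, not the actual patch: the status names, the `Version` type and the way logged-in users are passed in are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical status values; the real constants in eZ publish are not named here.
STATUS_DRAFT = "draft"
STATUS_INTERNAL_DRAFT = "internal_draft"


@dataclass
class Version:
    creator_id: int
    status: str


def blocks_editing(version, logged_in_user_ids):
    """Return True if this version should trigger the "being edited" screen."""
    if version.status == STATUS_DRAFT:
        # A real, stored draft always counts.
        return True
    if version.status == STATUS_INTERNAL_DRAFT:
        # An untouched draft only counts while its creator is still logged in.
        return version.creator_id in logged_in_user_ids
    return False


logged_in = {7}
print(blocks_editing(Version(creator_id=7, status=STATUS_INTERNAL_DRAFT), logged_in))
print(blocks_editing(Version(creator_id=9, status=STATUS_INTERNAL_DRAFT), logged_in))
```

Because stale internal drafts are simply skipped at edit time, nothing has to clean them up for the edit screen to behave correctly.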

Go check out the patch at http://pubsvn.ez.no/community/trunk/hacks/untoucheddrafts/ for the 3.6 and 3.7 versions and tell us here or in the forums what you think. If all goes well, it may land in 3.8 with patches from us against the previous versions on pubsvn.

Cheers
