James Lorenzen's Blog: RSS, Lucene, and REST

Thursday, July 1, 2010

RSS, Lucene, and REST

Sorry for the horrible title. I struggled trying to come up with a worthy title, but after a few minutes I decided to not let perfection get in the way of good.

My team recently worked on a new feature I am pretty excited about: adding support for RSS/Atom in our application. I know you're thinking, "so what?" It's not really the what I am excited about but the how: how the story was defined and implemented.

Approach
We had the simple requirement from a newer customer to provide an RSS feed for newly created items. This actually wasn't the first time for this requirement. We prototyped a similar capability a long time ago using OpenESB and the RSS BC, but for multiple reasons it just didn't work out.

So our first decision had to answer: how were we going to implement it... again, but better? Before the sprint began, a few of us got together and hashed out a potential solution: how about we use the Search REST Service, which is backed by Lucene, to support Advanced searches and return RSS?

Why does this excite me so much? To understand that, I need to explain our application at a high level. It's a completely JavaScript-based application using ExtJS (now Sencha), backed by REST Services using Jersey. Consequently, we have a lot of REST Services. Right now those REST Services support returning XML or JSON using a custom Response Builder we have created internally.

I'm excited because this single user story could bring a huge improvement to the entire system:

If we modified the Search Service to return RSS, then all our REST Services could support RSS.
The REST Service would now support Advanced searches. Previously, it only really supported basic keyword searches.
Any search users perform could now be subscribed to via RSS.
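
As a rough sketch of the first point, here is one way search hits could be rendered as an RSS 2.0 feed. Everything here is an assumption for illustration: the RssFeedSketch class, its Item type, and the example.com URLs are hypothetical, and the code uses the JDK's StAX writer rather than our internal Response Builder or Jersey.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: renders search hits as RSS 2.0 with the JDK's StAX API.
final class RssFeedSketch {

    // Hypothetical minimal view of one search hit.
    static final class Item {
        final String title;
        final String link;
        final String description;

        Item(String title, String link, String description) {
            this.title = title;
            this.link = link;
            this.description = description;
        }
    }

    // Builds the feed; RSS 2.0 requires title, link, and description on the channel.
    static String toRss(String feedTitle, String feedLink, List<Item> items) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument("1.0");
            xml.writeStartElement("rss");
            xml.writeAttribute("version", "2.0");
            xml.writeStartElement("channel");
            writeElement(xml, "title", feedTitle);
            writeElement(xml, "link", feedLink);
            writeElement(xml, "description", "Search results for " + feedTitle);
            for (Item item : items) {
                xml.writeStartElement("item");
                writeElement(xml, "title", item.title);
                writeElement(xml, "link", item.link);
                writeElement(xml, "description", item.description);
                xml.writeEndElement(); // item
            }
            xml.writeEndElement(); // channel
            xml.writeEndElement(); // rss
            xml.writeEndDocument();
            xml.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write RSS feed", e);
        }
    }

    // Writes <name>text</name>; writeCharacters escapes &, <, and > in the text.
    private static void writeElement(XMLStreamWriter xml, String name, String text)
            throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text);
        xml.writeEndElement();
    }

    public static void main(String[] args) {
        List<Item> hits = Arrays.asList(
            new Item("Hurricane & flood", "http://example.com/items/1", "Newly created item"));
        System.out.println(toRss("New items", "http://example.com/search", hits));
    }
}
```

Because the feed is built from whatever list of hits the search returns, the same rendering step would apply to any query, which is what makes every search subscribable.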

Implementation
I'm not going to go into every detail of how it was done. I wasn't even the one who implemented it (see Matt White; he did a fantastic job). We did have one major hurdle to overcome: how to index items to enable advanced searches like Status=New.

Previously this wasn't possible given how we were indexing our items. We were basically indexing the item by building up a large String containing all the item information like the following:

import org.apache.lucene.document.Document
import org.apache.lucene.document.Field

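// Build a Lucene Document with a single catch-all "content" field.
Document createDocument(item) {
    Document doc = new Document()

    // Analyzed (tokenized) for keyword search; the raw text is not stored.
    doc.add(new Field("content",
        getContent(item),
        Field.Store.NO,
        Field.Index.ANALYZED))

    return doc
}

// Concatenate every searchable property into one space-separated String.
String getContent(item) {
    def s = new StringBuilder()

    s.append(item.getTitle()).append(" ")
    s.append(item.getStatus()).append(" ")
    s.append(item.getPriority()).append(" ")
    s.append(item.getDescription()).append(" ")

    return s.toString()
}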

The problem with this approach is that a search for "New" would return any item with a status of New as well as any item that merely contained the word New somewhere else, such as in its title or description. The solution was to just add another Field to the Document:

doc.add(new Field("Status",
    item.getStatus(),
    Field.Store.NO,
    Field.Index.NOT_ANALYZED));
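
To make the difference concrete, here is a toy model of the two indexing strategies. It is not Lucene: FieldSearchSketch and its methods are hypothetical, the "analyzer" is just lowercasing plus splitting on non-alphanumerics, and priority is left out for brevity. It only illustrates why a dedicated, unanalyzed Status field separates status matches from incidental keyword matches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model (not Lucene): each document maps a field name to its indexed terms.
final class FieldSearchSketch {

    private final List<Map<String, List<String>>> docs = new ArrayList<>();

    // Stand-in for an analyzed field: lowercase, then split into word tokens.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Indexes one item and returns its document id.
    int add(String title, String status, String description) {
        Map<String, List<String>> doc = new HashMap<>();
        // Catch-all field: everything concatenated, then tokenized.
        doc.put("content", analyze(title + " " + status + " " + description));
        // Dedicated field: the whole status value kept as one exact term.
        doc.put("Status", Collections.singletonList(status));
        docs.add(doc);
        return docs.size() - 1;
    }

    // Returns the ids of documents whose field contains exactly this term.
    List<Integer> search(String field, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (docs.get(id).get(field).contains(term)) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        FieldSearchSketch index = new FieldSearchSketch();
        index.add("Road closure", "New", "Bridge out");
        index.add("Storm update", "In Progress", "New shelter opened");
        System.out.println(index.search("content", "new")); // both items
        System.out.println(index.search("Status", "New"));  // only the first item
    }
}
```

In this model the second item matches a keyword search for "new" only because of its description, while the Status lookup ignores it. Note also that the unanalyzed value must be matched exactly, including case and spaces.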

Now the Search Service could support advanced searches like: Status:"New". You should put the value in quotes in case the value contains spaces (i.e. Status:"In Progress"). And since Lucene is so powerful, it also means the following search would work: Status:"New" AND Priority:"High" AND "Hurricane". Now users have the freedom to subscribe to a nearly limitless number of RSS feeds based on Advanced Searches.
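
Since these queries contain quotes, colons, and spaces, they need to be URL-encoded before they can go into a feed address. A minimal sketch, assuming a hypothetical endpoint and hypothetical q and format parameters (the real Search Service's URL scheme is not shown in this post), and requiring Java 10 or later for this URLEncoder overload:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: turns an advanced search into a subscribable feed URL.
final class FeedUrlSketch {

    // baseUrl and the "q" / "format" parameter names are assumptions for illustration.
    static String feedUrl(String baseUrl, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "?q=" + encoded + "&format=rss";
    }

    public static void main(String[] args) {
        System.out.println(feedUrl("http://example.com/rest/search",
            "Status:\"New\" AND Priority:\"High\" AND \"Hurricane\""));
    }
}
```

For example, the query Status:"In Progress" would be encoded as Status%3A%22In+Progress%22, so the quoted value survives the trip to the server intact.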

Start to Finish
I think there were several reasons why this story was a success in my eyes. Most important were the two really smart co-workers who worked on it: Matt White and Chuck Hinson. All three of us knew of this user story ahead of time, and we were able to discuss it technically days before backlog selection. This allowed us to brainstorm some ideas. Once we narrowed them down, we spent some more time separately looking into the code to gauge the level of difficulty and whether Advanced Searches like Status:New would be possible. Overall, I'd say we spent 3-4 hours together on the preliminary work. That preliminary work, I think, really enabled us to give a proper WAG for the story.

I really can't speak for how the development went (I was at Disney World for 10 days with the family), but I was really impressed with the tests Matt wrote. He wrote a number of unit tests making sure advanced searches worked and basic searches still worked. On top of that, he wrote an overall functional test using HttpBuilder that executes the REST Service just as our JavaScript client would.

Finally, once the main work was finished, we uploaded a diff file to our internal instance of Review Board. From there I was able to perform a peer review where we found a minor bug in the changes.

Summary
I am sure it's not an original idea, but I thought it was a fun User Story that hopefully will provide a lot of value beyond what was originally estimated. Ideally, this might help others who are in similar situations.
