Attending the Haystack conference
For the past two days I attended the Haystack conference in Charlottesville: two days packed with talks about search relevance. It is always nice to spend some days with like-minded people. Besides the talks, there was plenty of room to network. As a committer to the Learning to Rank plugin, it was nice to finally meet some of the other committers to the project. In this blog post, I give a summary of the sessions I attended.
The Haystack conference is all about relevant search and is organized by OpenSource Connections. Around 150 people came to learn and talk about search relevance. All sessions were recorded and will be made available online, so if I write about a session that interests you, check the Haystack website for the talk.
Keynote by Max Irwin
As with every conference, it all starts with a keynote. Max Irwin is a managing consultant at OpenSource Connections. What I took away from his talk has to do with judgment lists and experts. Judgment lists are lists in which experts assign the most relevant documents to specific queries. Max talked about the differences in judgments between experts. Such differences do not always mean that one expert is right and the other is wrong; they can also point to a difference in context. When you have multiple judgments, you can calculate a disagreement factor. In the talk, this factor was calculated using an Elo rating. Using this rating you can find the best experts for your specific situation, and you might give more weight to their opinion when creating the judgment list. Having a good judgment list enables you to start tuning your search relevance.
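The talk did not go into implementation details that I can reproduce here, so the following is only a minimal sketch of how an Elo rating could be used to rank experts. The update formula is the standard Elo one. The `duel_score` helper, and the idea that the expert whose grade is closer to a consensus grade "wins" a comparison, are my own assumptions for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A scores against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b).

    score_a is 1.0 when A wins, 0.5 for a draw and 0.0 when B wins.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def duel_score(grade_a: int, grade_b: int, consensus: int) -> float:
    """Hypothetical rule: the expert closer to the consensus grade wins."""
    diff_a = abs(grade_a - consensus)
    diff_b = abs(grade_b - consensus)
    if diff_a < diff_b:
        return 1.0
    if diff_a > diff_b:
        return 0.0
    return 0.5


# Two experts start with the same rating and judge the same query/document pair.
rating_a, rating_b = elo_update(1500.0, 1500.0, duel_score(3, 1, consensus=3))
```

After many of these comparisons, the ratings give you an ordering of experts that you could use as weights when merging their judgments.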
Ontology and Oncology: NLP for Precision Medicine
This talk was presented by Sean Mullane from the University of Virginia. It was an interesting talk about using Natural Language Processing techniques to search through all the available papers about cancer treatments to find the right treatment for a specific patient. A lot of research is done on cancer treatment. Many different forms of cancer exist, and the treatment of one form can be completely different from the treatment of another. With so much information available, Sean did research to help find the best information for creating a treatment plan for a specific form of cancer.
In his research, Sean worked on mapping terms and phrases to specific concepts. With these concepts available during indexing, it is easier to find a concept through its different names or identifiers. He showed a form of query expansion using graphs. Using the graphs he can find the relatedness of one concept to another. It was interesting to see how he found the most important relations: not by looking at concepts that are related to a lot of other concepts, but by finding those that have an almost unique relation with the concept you are searching for. Using this technique each path in the graph gets a relatedness factor, and this factor is used to expand the query.
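To make the idea of a path relatedness factor more concrete, here is a minimal sketch. It is not Sean's implementation: the scoring rule (multiply by one over the number of neighbours for every node on the path) is an assumption of mine that captures the "almost unique relation scores higher" intuition, and the concept names are toy data.

```python
from typing import Dict, List, Set

# Toy undirected concept graph; all names are made up for illustration.
GRAPH: Dict[str, Set[str]] = {
    "query_concept": {"specific_concept", "hub_concept"},
    "specific_concept": {"query_concept"},
    "hub_concept": {"query_concept", "other_1", "other_2", "other_3"},
    "other_1": {"hub_concept"},
    "other_2": {"hub_concept"},
    "other_3": {"hub_concept"},
}


def path_relatedness(graph: Dict[str, Set[str]], path: List[str]) -> float:
    """Score a path; every node after the start is penalized by its degree."""
    score = 1.0
    for node in path[1:]:
        score *= 1.0 / len(graph[node])
    return score


def expansion_terms(graph: Dict[str, Set[str]], concept: str, threshold: float) -> List[str]:
    """Return direct neighbours whose relatedness reaches the threshold, best first."""
    scored = [(path_relatedness(graph, [concept, n]), n) for n in graph[concept]]
    return [n for s, n in sorted(scored, reverse=True) if s >= threshold]


# Only the neighbour with an almost unique relation survives the threshold.
terms = expansion_terms(GRAPH, "query_concept", threshold=0.5)
```

In this toy graph the hub concept is connected to four concepts, so it is penalized and left out of the expansion.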
Some references:
- https://github.com/seanmullane/TREC_2018_UVA
- https://www.researchgate.net/publication/300337713_Path-Based_Semantic_Relatedness_on_Linked_Data_and_Its_Use_to_Word_and_Entity_Disambiguation
Autocomplete as Relevancy
This talk was presented by three speakers from LexisNexis: Rimple Shah, Revant Malay and David Rhodes. The intention of their project was to help the user write better queries, so the improvement happens before the query gets to the search engine. They do this by optimizing the suggestions shown while typing. They started out with the suggestions as provided by Solr. In the end, they created their own custom endpoint, which gave them more flexibility in sorting the suggestions and let them add more properties for scoring and matching. I liked this approach, as it is similar to what I use for one of my customers. Some of the takeaways from this talk:
- With longer queries, the focus of the suggestions is only the last few words.
- To boost matches on the starting word using edismax queries, they added a placeholder term at the beginning of both the indexed sentence and the query. That way a match on one word is already a match on two words. An interesting way to use pf2/ps2 with only one term.
After the talk, I had a good chat with the speakers about the topic. If any of you happen to read this post: thanks for that.
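The placeholder trick from the last takeaway can be sketched as follows. This is my own reconstruction, not the code of the speakers: the marker token and the field name `suggestion` are assumptions, while `defType`, `qf`, `pf2` and `ps2` are regular edismax parameters. The same marker has to be prepended to the suggestions at index time.

```python
from typing import Dict

# Hypothetical marker token; it must not occur in real suggestions.
START_MARKER = "startmarker"


def with_start_marker(text: str) -> str:
    """Prepend the marker so the first real word forms a bigram with it."""
    return f"{START_MARKER} {text.strip()}"


def build_suggest_params(user_input: str) -> Dict[str, str]:
    """Build edismax parameters for a suggestion query (field name is assumed)."""
    return {
        "defType": "edismax",
        "q": with_start_marker(user_input),
        "qf": "suggestion",
        # pf2 boosts documents that contain word pairs from the query as a phrase.
        # Thanks to the marker, a single typed word already forms such a pair.
        "pf2": "suggestion^5",
        "ps2": "0",
    }


# At index time the stored suggestion gets the same marker.
indexed_value = with_start_marker("contract law")
params = build_suggest_params("contract")
```

A suggestion that starts with "contract" now matches the pair "startmarker contract" and gets the pf2 boost, while a suggestion that only contains "contract" somewhere in the middle does not.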
Query relaxation – a rewriting technique between search and recommendations
For me, Rene Kriegler presented one of the most interesting talks of the conference. The goal of the talk was to show how to present results to the user when the actual query does not return any. For some of my own customers, the first step is to switch from the AND operator to the OR operator. We have even implemented a way to show the missing terms in case of an OR query. Rene took another approach. He used techniques to determine which word to skip to get the most relevant results. He presented multiple approaches and made it really interesting by presenting a technique to determine which approach returned the best results.
He used word vectors as the input for a neural network that is trained to determine the best word to remove from the query. When working with word2vec input vectors you can also add other features. He showed some examples with the length of terms and the frequency of terms.
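To get a feeling for the mechanics, here is a minimal sketch of query relaxation by dropping a single term. It is not Rene's implementation: the scoring function is a simple stand-in for the trained network, the features are limited to term length and frequency, and the document frequencies are made-up numbers.

```python
from typing import Callable, Dict, List, Tuple


def relaxation_candidates(query: str) -> List[Tuple[str, str]]:
    """Return (dropped_term, relaxed_query) pairs, one per term in the query."""
    terms = query.split()
    if len(terms) < 2:
        return []
    return [
        (terms[i], " ".join(terms[:i] + terms[i + 1:]))
        for i in range(len(terms))
    ]


def term_features(term: str, doc_freq: Dict[str, int]) -> List[float]:
    """Extra features that could be appended to a word vector: length and frequency."""
    return [float(len(term)), float(doc_freq.get(term, 0))]


def pick_relaxation(
    query: str,
    score: Callable[[List[float]], float],
    doc_freq: Dict[str, int],
) -> Tuple[str, str]:
    """Pick the candidate whose dropped term gets the highest score."""
    candidates = relaxation_candidates(query)
    if not candidates:
        raise ValueError("need at least two terms to relax a query")
    return max(candidates, key=lambda c: score(term_features(c[0], doc_freq)))


# Made-up document frequencies; the stand-in scorer drops the most frequent term.
DOC_FREQ = {"red": 500, "leather": 120, "sofa": 80}
dropped, relaxed = pick_relaxation("red leather sofa", lambda f: f[1], DOC_FREQ)
```

In a real setup the `score` function would be replaced by the model's prediction for each term.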
An interesting presentation that deserves more research.
Evolution of Yelp search to a generalized ranking platform
Umesh Dungat works for Yelp. If you do not know Yelp, they provide a way to search for local businesses. Umesh talked about how they moved from a custom Lucene-based solution to an Elasticsearch-based solution. At Yelp they have multiple custom plugins to support their business. They are also a big user of the Learning to Rank plugin. The most important reason for them to start working with the LTR plugin was the option to dynamically change the model without real downtime. Check these two items on their technical blog: "Moving Yelp’s Core Business Search to Elasticsearch" and "Fast Order Search Using Yelp’s Data Pipeline and Elasticsearch".
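Changing the model dynamically works because the model is referenced by name in the query. Below is a minimal sketch of a rescore request using the `sltr` query of the Learning to Rank plugin. The field name `title` and the model names are assumptions of mine, not taken from the Yelp setup.

```python
import json
from typing import Any, Dict


def build_ltr_search(keywords: str, model_name: str, window_size: int = 100) -> Dict[str, Any]:
    """Build a search body that rescores the top hits with a stored LTR model."""
    return {
        # Cheap baseline query; the field name is hypothetical.
        "query": {"match": {"title": keywords}},
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": keywords},
                        # Switching models is just a different name here.
                        "model": model_name,
                    }
                }
            },
        },
    }


# Hypothetical model names: the same query, rescored by two different models.
body_old = build_ltr_search("pizza", "model_v1")
body_new = build_ltr_search("pizza", "model_v2")
request_json = json.dumps(body_new)
```

Because the model name is only a parameter of the request, a new model can be uploaded and then selected per query, without restarting the cluster.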
Lightning Talks
The first day ended with lightning talks. As they are very short by nature, the intention is to trigger viewers to start some research or an experiment. These are the things on my list to have a look at in the coming period:
- Quaerite – Search relevance evaluation toolkit
- Smui – Search Management UI
- Querqy – A framework for query preprocessing in Java-based search engines
- Quepid – Makes improving your app’s search results a repeatable, reliable engineering process that the whole team can understand
That was the end of day 1. After the presentations, there was beer and food and games outside. A well-spent evening with some nice chats, mostly about search :-).