Auto-complete and Query Classification

New England Search Technologies Meetup – Thursday, February 28, 2019

5:30 – 6:00 PM – Food/Drink and Networking (Sponsored by Rakuten)

Autocomplete at Rakuten – Keith Thoma

Autocomplete, or query completion, is an important part of the search experience on any website, and it is often the first feature users engage with when performing a search. Bad term suggestions, slow responses, or poor filters can degrade the user experience and reduce sales for an e-commerce organization. Rakuten has built a solution that can be deployed quickly to its properties around the world. The solution combines technologies such as Solr, NLP, Python, and ML models, and it covers generating suggestions, cleaning suggestions, suggesting filters, and handling public traffic.
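To make "generating suggestions" and "cleaning suggestions" concrete, here is a minimal in-memory sketch, assuming suggestions are mined from a log of past queries and ranked by how often each was searched. The blocklist, the length filter, and the names `clean_suggestions` and `PrefixSuggester` are hypothetical illustrations, not Rakuten's implementation; in production this role is usually played by a search engine component such as Solr's Suggester.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical blocklist for the cleaning step; a real one would be much richer.
BLOCKED_TERMS = {"test", "asdf"}


def normalize(text):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def clean_suggestions(raw_queries):
    """Normalize logged queries, drop obvious junk, and count what remains."""
    counts = Counter()
    for q in raw_queries:
        norm = normalize(q)
        if len(norm) < 2 or norm in BLOCKED_TERMS:
            continue  # too short or blocklisted: never suggest it
        counts[norm] += 1
    return counts


class PrefixSuggester:
    def __init__(self, counts):
        self.counts = counts
        # Sorting keeps every key that shares a prefix in one contiguous run.
        self.keys = sorted(counts)

    def suggest(self, prefix, k=5):
        prefix = normalize(prefix)
        start = bisect_left(self.keys, prefix)  # first key >= prefix
        matches = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break  # left the contiguous run of matching keys
            matches.append(key)
        # Most popular first; ties broken alphabetically.
        matches.sort(key=lambda key: (-self.counts[key], key))
        return matches[:k]


if __name__ == "__main__":
    logs = ["Running Shoes", "running shoes", "running socks", "rain jacket", "asdf", "r"]
    suggester = PrefixSuggester(clean_suggestions(logs))
    print(suggester.suggest("run"))  # ['running shoes', 'running socks']
```

Keeping the keys sorted lets a binary search jump straight to the first candidate, so a lookup only walks the keys that actually share the typed prefix.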

About the Presenter

Keith Thoma is a software engineer in the Rakuten Americas Big Data Department in Boston, MA. His primary role is to develop search and data solutions for Rakuten subsidiaries as part of the Americas Big Data team, including tasks such as relevancy tuning, NLP, and platform migrations. The team has successfully launched search and data projects in the United States, Brazil, Europe, and Japan. Before joining Rakuten, Keith worked on search projects for companies around the globe, including Dell and European Directories affiliates.

Query Classification – Yiu-Chang Lin

Query classification has been widely studied as a way to understand users’ search intent and thereby improve user satisfaction and e-commerce conversion rates. A query can be associated with a category label from a taxonomy tree that describes the items in the catalog. However, product-related search queries are typically short, ambiguous, and continuously changing with seasonal trends and the introduction of new products. Moreover, having humans annotate a large number of queries with the proper category labels is nearly infeasible in practice.

In this talk, we will introduce an unsupervised method that converts the browsing behavior of millions of users into automatically labeled data that machine learning models can consume. We also compare and contrast several state-of-the-art text classifiers and demonstrate that an ensemble of linear SVMs achieves the best performance in terms of F1 score.
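One simple way to turn browsing behavior into labels, shown here only as an illustration and not as the method presented in the talk, is to look at the categories of items users clicked after each query and keep a label only when one category clearly dominates. The `(query, category)` log format, the `min_clicks` and `min_share` thresholds, and the function name `label_queries` are all hypothetical assumptions.

```python
from collections import Counter, defaultdict


def label_queries(click_log, min_clicks=3, min_share=0.6):
    """Turn (query, category_of_clicked_item) pairs into weak labels.

    Hypothetical heuristic: a query is labeled only when it has at least
    `min_clicks` clicks and its top category receives at least `min_share`
    of them. Ambiguous or rarely clicked queries stay unlabeled.
    """
    per_query = defaultdict(Counter)
    for query, category in click_log:
        per_query[" ".join(query.lower().split())][category] += 1

    labels = {}
    for query, cats in per_query.items():
        total = sum(cats.values())
        category, top = cats.most_common(1)[0]
        if total >= min_clicks and top / total >= min_share:
            labels[query] = category
    return labels


if __name__ == "__main__":
    log = (
        [("iphone case", "Electronics > Accessories")] * 4
        + [("iphone case", "Clothing")]
        + [("apple", "Electronics")] * 2
        + [("apple", "Grocery")] * 2
        + [("tv", "Electronics")] * 2
    )
    print(label_queries(log))  # {'iphone case': 'Electronics > Accessories'}
```

The thresholds trade coverage for precision: "apple" is skipped because its clicks split evenly between two categories, and "tv" is skipped because it has too few clicks. The resulting query/label pairs could then serve as training data for a text classifier such as the linear SVMs mentioned above.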

About the Presenter

Yiu-Chang Lin is a research scientist at Rakuten Institute of Technology (RIT) in Boston, MA. Before that, he was a graduate research assistant in the School of Computer Science while pursuing an M.S. in Language Technologies at Carnegie Mellon University. His research interests lie at the intersection of machine learning and natural language processing. He has worked on numerous e-commerce projects, including query understanding, learning to rank, and product linking.

Lightning Talk – Al Cole

Search platforms like Apache Solr and Elasticsearch can deliver a highly relevant search experience when configured properly for the use case they serve. However, there are times when overriding those results makes sense, and business rules are one way to do it. In this lightning talk, Al will introduce what business rules are (e.g., triggers and actions) and when they might be appropriate for your use case.
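As a rough picture of the trigger/action idea, the sketch below post-processes an engine's ranked result IDs: a rule fires when the query contains one of its trigger terms, and its actions pin some documents to the top and hide others. The `Rule` fields and the `apply_rules` function are hypothetical and not taken from the talk; engines also ship built-in features for parts of this, such as Solr's Query Elevation Component.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Rule:
    """A hypothetical business rule: one trigger plus the actions it fires."""
    trigger_terms: Set[str]  # lowercase terms; the rule fires if the query contains any of them
    pin_ids: List[str] = field(default_factory=list)  # action: promote these docs to the top
    hide_ids: Set[str] = field(default_factory=set)  # action: remove these docs from the results


def apply_rules(query: str, ranked_ids: List[str], rules: List[Rule]) -> List[str]:
    """Override the engine's ranked result IDs with every rule the query triggers."""
    terms = set(query.lower().split())
    results = list(ranked_ids)
    for rule in rules:
        if not terms & rule.trigger_terms:
            continue  # trigger did not fire; leave results untouched
        results = [d for d in results if d not in rule.hide_ids]
        pinned = [d for d in rule.pin_ids if d not in rule.hide_ids]
        # Pinned docs go first; the rest keep the engine's relevance order.
        results = pinned + [d for d in results if d not in pinned]
    return results


if __name__ == "__main__":
    rules = [Rule(trigger_terms={"tv"}, pin_ids=["promo-1"], hide_ids={"d3"})]
    print(apply_rules("4K TV", ["d1", "d2", "d3"], rules))   # ['promo-1', 'd1', 'd2']
    print(apply_rules("laptop", ["d1", "d2", "d3"], rules))  # ['d1', 'd2', 'd3']
```

Keeping rules outside the relevance model means merchandisers can change them without retuning the engine, at the cost of results that no longer reflect pure relevance for the triggered queries.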

About the Presenter

Al Cole of NorthRidge Software provides search consulting services and develops insight applications for his clients.
https://www.linkedin.com/in/coleal