Adding search to a data pipeline

After scraping all of the big ecommerce sites for prices, crunching the data, and storing the results of the machine learning models, the client wanted their employees to be able to easily access the price data and use it in sales conversations.

Project Outline

Below is a quick overview of the project's requirements and characteristics.

The Problem

The client was in the green tech space and provided devices as a service to B2B customers. Think equipping entire organizations with iPhones, laptops, tablets, etc.

This required a large number of devices in stock that could go into the company's fleet of devices. And since they also offered a service where they would buy up their customers' old devices, they needed reliable price data.

To solve this, they had built an impressive system of web scrapers to get price information, and machine learning models to predict a price based on certain parameters.

However, they lacked a good way to provide access to this data.

The Solution

The team responsible for this wasn't large, so they had chosen Ruby on Rails to speed up development of their internal tools. The solution therefore needed to work well within a Rails codebase.

Lots of options exist in this space, but they can quickly grow very expensive if you're not careful. And since this was an internal-facing tool, they wanted to keep costs low and try to use an open source tool.

I approached the problem by quickly getting an overview of the open source tools that would fit the requirements and then building a simple MVP with each option. As the project was limited to one month, there wasn't much time to ponder over options if there was to be a working tool at the end of that month.

A tool called Typesense ended up fitting the bill. It was open source, could be self-hosted, and would allow for easy re-indexing of the data. Basically, you choose a property (or properties) that the search query will look in. To stay up to date, these indexed properties have to be created as new data comes in and updated when old data changes.
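
To make the idea of "choosing properties to search" a bit more concrete, here is a minimal Ruby sketch of what that setup could look like. It only builds the plain hashes involved (a collection schema, a document, and search parameters) and does not talk to a Typesense server. The collection name, the field names, and the PriceRecord struct are all hypothetical and are not the client's actual data model.

```ruby
# Hypothetical price record; in a Rails app this would likely be a model.
PriceRecord = Struct.new(
  :id, :model, :brand, :condition, :price_cents, :updated_at,
  keyword_init: true
)

# Collection schema in the general shape Typesense uses: a name plus
# typed fields. All names below are assumptions for illustration.
DEVICE_PRICES_SCHEMA = {
  "name" => "device_prices",
  "fields" => [
    { "name" => "model",       "type" => "string" },
    { "name" => "brand",       "type" => "string", "facet" => true },
    { "name" => "condition",   "type" => "string", "facet" => true },
    { "name" => "price_cents", "type" => "int64" },
    { "name" => "updated_at",  "type" => "int64" }
  ],
  "default_sorting_field" => "updated_at"
}.freeze

# Turn a record into the flat document that would be indexed.
# The id is a string so the same record always maps to the same document,
# which is what lets an update overwrite the old version.
def to_search_document(record)
  {
    "id"          => record.id.to_s,
    "model"       => record.model,
    "brand"       => record.brand,
    "condition"   => record.condition,
    "price_cents" => record.price_cents,
    "updated_at"  => record.updated_at.to_i # Unix timestamp
  }
end

# Search parameters: "q" is the user's query and "query_by" lists the
# properties the query is matched against.
def search_params(query)
  { "q" => query, "query_by" => "model,brand" }
end
```

In a setup like this, each new or changed price record would be converted with to_search_document and sent to Typesense as an upsert, so new records are created and existing ones are overwritten under the same id.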

Technologies used