The challenge
Imagine we give you a box filled with 1,000 rings of three barely distinguishable types, and you have to sort them by hand. Time-consuming, to say the least. As you can probably guess, this was a daily reality for our client. The company receives packages filled with thousands of unclassified jewels to replenish its inventory. When these jewels arrive at its warehouse, employees must identify each one and put it in the right spot, which takes a lot of time.
The client had a specific technical solution in mind: a classical classification tool powered by artificial intelligence. The idea was a machine learning algorithm that could scan every jewelry bag and suggest exactly which product it contains. It was a good starting point, but we decided to challenge the concept and pin down the role machine learning could play in this case.
There are a few reasons why we believed classical classification was not the way to go here. First, adding a new product would require retraining the whole AI model. That isn’t a small task, as each product, and therefore each product number, would need a minimum of 50 to 100 different images to train the model correctly. Second, classical classification doesn’t scale well to a large number of classes, and in this instance there are more than a thousand products. Conclusion: not ideal.
For these reasons, we proposed another approach: similarity embedding. Similarity embedding is a machine learning method in which an algorithm compares a new image with the products in a database and selects the top-n matches in terms of visual similarity. The matches are then shown in order of relevance.
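To make the idea concrete, here is a minimal sketch of a top-n similarity search. It assumes that some embedding model (not shown, and not specified in this article) has already turned each product image into a vector. The function names, the product IDs, and the tiny 3-dimensional vectors are hypothetical and only illustrate the ranking step. Cosine similarity is used as the similarity measure here, which is a common choice but an assumption on our part.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_n_matches(query, catalog, n=3):
    """Return the n (product_id, score) pairs most similar to the query, best first."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical embeddings; a real model would output far more dimensions.
catalog = {
    "ring-A": [0.9, 0.1, 0.0],
    "ring-B": [0.8, 0.2, 0.1],
    "necklace-C": [0.0, 0.1, 0.9],
}

# Embedding of the photo taken of an unclassified jewel (also hypothetical).
query = [0.88, 0.12, 0.02]

matches = top_n_matches(query, catalog, n=2)
for product_id, score in matches:
    print(f"{product_id}: {score:.3f}")
```

The employee sees a short ranked list instead of a single hard prediction, so the final decision stays with a person.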
This approach stays accurate even with a large number of classes. Moreover, adding a new product does not require retraining the whole AI model. Even better, a single image per class already suffices. More images per class improve accuracy, but they are not required.
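The sketch below illustrates why a new product needs no retraining: adding it only means storing one more embedding. It is a simplified illustration, not the production system. The `SimilarityIndex` class, its methods, and the example vectors are hypothetical, and the embeddings are assumed to come from a fixed, already trained model. When a product has several reference images, this sketch keeps the best score per product.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


class SimilarityIndex:
    """In-memory store of (product_id, embedding) pairs. Hypothetical helper."""

    def __init__(self):
        self._entries = []

    def add(self, product_id, embedding):
        # Adding a product is just an append: the embedding model is untouched.
        self._entries.append((product_id, _normalize(embedding)))

    def search(self, query, n=3):
        q = _normalize(query)
        best = {}
        for product_id, emb in self._entries:
            score = sum(a * b for a, b in zip(q, emb))
            # Several reference images per product: keep the best-scoring one.
            if product_id not in best or score > best[product_id]:
                best[product_id] = score
        ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
        return ranked[:n]


index = SimilarityIndex()
index.add("ring-A", [0.9, 0.1, 0.0])
index.add("necklace-C", [0.0, 0.1, 0.9])

# A new product arrives: one reference image is enough to make it searchable.
index.add("ring-B", [0.1, 0.9, 0.1])
# An optional second image of the same product can only help the match.
index.add("ring-B", [0.2, 0.8, 0.0])

results = index.search([0.15, 0.85, 0.05], n=2)
print(results)
```

A linear scan like this is fine for around a thousand products. A much larger catalog would likely call for an approximate nearest-neighbor index instead.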
At the beginning of the project, we established some non-functional requirements with the client. First, the search speed (how fast similar images are shown) should be at most 10 seconds. Second, the accuracy (how often our algorithm is correct) should be above 90%. Thanks to our talented developers, we beat both targets: the search speed ended up at 5 seconds and the model’s accuracy at 93%. And because more images per class improve accuracy, the results can keep improving as reference images are added.