Nitro-Net.com – Internet Marketing Services – A Global Marketing Group Company
A new report by Perficient Digital has sought to compare some of the key players in the image recognition engine sector.
Published in July, the report, Who Has the Best Recognition Engine?, analyzed the accuracy of AWS Rekognition (Amazon), Google Vision, IBM Watson and Microsoft Azure. The study also enlisted some of their users to tag images by hand in order to show how the engines compare to our own human image processing abilities.
The results are fascinating. Of the image recognition engines in the experiment, Google performed very well, especially on basic accuracy and when the results are broken down by confidence level. Let’s have a look at some of the data, how the engines compare to humans, and link it back to some of Google’s latest moves in visual search.
Overall scores of image recognition engines
The top-level results of the study show Google Vision to be the best of the engines when it comes to basic accuracy.
It tagged the 2,000 images with 81.7% accuracy, second only to the human team, which managed 87.7%.
Accuracy at greater than 90% confidence
Along with each tag, the engines also return a score indicating how confident they are in it. Overall, 55% of tags were returned with a confidence level of 70% or higher.
When it came to images the engines scored at 90% confidence or higher, both Google Vision and Microsoft Azure boasted impressively high accuracy at 92.4% and 90.9% respectively. For comparison, when the human team returned tags of over 90% confidence, 87.7% were accurate.
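To make the idea of "accuracy at a given confidence level" concrete, here is a minimal Python sketch. It is not the study's own method or code: the `Tag` structure, the `accuracy_at_confidence` helper and the sample tags are all hypothetical, and it assumes each tag carries a confidence score between 0 and 1 plus a human judgment of whether the tag was correct.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    label: str
    confidence: float  # engine-reported score, assumed to be in [0, 1]
    correct: bool      # hypothetical human judgment of the tag


def accuracy_at_confidence(tags, threshold):
    """Share of correct tags among those at or above the threshold.

    Returns None when no tag meets the threshold.
    """
    kept = [t for t in tags if t.confidence >= threshold]
    if not kept:
        return None
    return sum(t.correct for t in kept) / len(kept)


# Made-up sample data for illustration only (not from the report).
sample = [
    Tag("dog", 0.97, True),
    Tag("puppy", 0.92, True),
    Tag("cat", 0.91, False),
    Tag("grass", 0.85, True),
    Tag("ball", 0.60, False),
]

for threshold in (0.9, 0.8):
    print(threshold, accuracy_at_confidence(sample, threshold))
```

Raising the threshold shrinks the pool of tags being scored, which is why figures at different confidence levels are not directly comparable with overall accuracy.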
Accuracy at greater than 80% confidence
The Google and Microsoft engines also performed very well with images at the 80% or higher confidence level.
Again, the engines returned just slightly better accuracy levels than the human team.
Microsoft Azure actually did best in this instance, with 89.6% accuracy. This was closely followed by Google Vision at 88.2% and the human team at 87.7%.
How good are image recognition engines at matching human descriptions?
Another part of the study looked at how human-like (or not) the image recognition engines were when it came to tagging images. After all, image tags may be technically accurate, but they still might be very different to the phrase or word a human would use to describe them.
To analyse this, users were asked to rank the top five tags each engine suggested for each image. They also did the same for the tags chosen by the human team.
The results showed that all of the engines scored considerably lower than the human tags. Of the engines, Google did best, matching 217 times.
‘Humans can still see and explain what they are seeing to other humans better than machine APIs can,’ the study concludes. ‘This is because of several factors, including language specificity and a greater contextual knowledge base.’
Takeaways for search and ecommerce
The study will be of interest to digital marketers. Image recognition technology is increasingly permeating search, social and ecommerce – with Google, in particular, rolling out new tools for users and businesses.
Google Lens was launched in 2017, allowing users to conduct searches simply by capturing an image on their smartphone camera. It is now available across Android and iOS devices.
On the retailer side, back in March Google also launched shoppable ads on Google Images. These ads are displayed in the image SERPs and allow users to select items within images in order to discover ecommerce sites which sell those products.
As the blog post accompanying the launch of shoppable ads highlighted: ‘50 percent of online shoppers said images of the product inspired them to purchase, and increasingly, they’re turning to Google Images.’
Both users and businesses are increasingly benefiting from smarter image recognition
While the Perficient Digital report shows us that machine learning might have some ground to make up in terms of human-like tagging and vocabulary, the accuracy of these engines is making impressive progress. As we have seen, at high confidence levels the engines' tags were sometimes more accurate than the human team's.
We can also see how competitive the field is. Some of the biggest names in digital are investing vast amounts in these machine APIs, with Google ahead on several of the study's measures. But the engine that continues to lead the way will be the one that achieves better-than-human image recognition while also satisfying consumers' need to interact with fresh, useful, rich media images.