

The new algorithm is twice as good as its predecessor, says Microsoft

These apps include Microsoft’s own Seeing AI, which the company first released in 2017. Seeing AI uses computer vision to describe the world as seen through a smartphone camera for the visually impaired. It can identify household items, read and scan text, describe scenes, and even identify friends. It can also be used to describe images in other apps, including email clients, social media apps, and messaging apps like WhatsApp. Seeing AI has been voted best app or best assistive app three years in a row by AppleVis, a community of blind and low-vision iOS users. Microsoft does not disclose user numbers for Seeing AI, but Eric Boyd, corporate vice president of Azure AI, told The Verge the software is “one of the leading apps for people who are blind or have low vision.”

Microsoft’s new image-captioning algorithm will improve the performance of Seeing AI significantly, as it’s able to not only identify objects but also more precisely describe the relationship between them. So the algorithm can look at a picture and not just say what items and objects it contains (e.g., “a person, a chair, an accordion”) but also how they are interacting (e.g., “a person is sitting on a chair and playing an accordion”). Microsoft says the algorithm is twice as good as its previous image-captioning system, in use since 2015.

The algorithm, which was described in a pre-print paper published in September, achieved the highest ever scores on an image-captioning benchmark known as “nocaps.” This is an industry-leading scoreboard for image captioning, though it has its own constraints. The nocaps benchmark consists of more than 166,000 human-generated captions describing some 15,100 images taken from the Open Images Dataset. These images span a range of scenarios, from sports to holiday snaps to food photography and more. Algorithms are tested on their ability to create captions for these pictures that match those from humans. (You can get an idea of the mixture of images and captions by exploring the nocaps dataset here or looking at the gallery below.)
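To give a rough sense of how that matching works, here is a minimal sketch of scoring a model-generated caption against human-written reference captions. The article doesn’t spell out the scoring details; nocaps reports automatic metrics such as CIDEr and SPICE, and this illustration substitutes BLEU (via NLTK) as a simpler stand-in, with made-up captions.

```python
# Illustrative sketch only: score one generated caption against human references.
# nocaps itself reports metrics such as CIDEr and SPICE; BLEU is used here as a
# simpler, widely available stand-in. All captions below are invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical human-written reference captions for a single image
references = [
    "a person is sitting on a chair and playing an accordion".split(),
    "a man plays an accordion while seated on a wooden chair".split(),
]

# Hypothetical caption produced by an image-captioning model
candidate = "a person sitting on a chair playing an accordion".split()

# Higher is better; smoothing keeps the score nonzero when some n-grams are missing
score = sentence_bleu(
    references,
    candidate,
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU score: {score:.3f}")
```

Metrics like this reward overlap with the reference captions, which is part of why, as Agrawal notes below, they only roughly correlate with human preferences.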
As Harsh Agrawal, one of the creators of the benchmark, told The Verge over email: “Surpassing human performance on nocaps is not an indicator that image captioning is a solved problem.” Agrawal noted that the metrics used to evaluate performance on nocaps “only roughly correlate with human preferences” and that the benchmark itself “only covers a small percentage of all the possible visual concepts.”
