Microsoft uses artificial intelligence in Bing to enhance image search

As Google recently demonstrated with its latest language models, artificial intelligence and machine learning can significantly improve the accuracy of search engine results. Not to be outdone, Microsoft has announced that it is bringing several AI technologies to Bing's image search engine, making it better at finding images with specific contexts or attributes.

"Today we would like to share how our image search is evolving further towards a more intelligent and more precise search engine through multi-granularity matches, improved understanding of user queries, images, and webpages, as well as the relationships between them," the Bing Image Processing team writes in a blog post.

One of these tools is vector matching, which maps queries and documents into a shared semantic space to help surface more relevant results. By adding BERT and Transformer models to Bing's technology stack, the system uses pre-training and attention mechanisms to model the relationships between words and to embed images and webpages so that related items land close together, dramatically improving its understanding of photos and pages.
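To make the idea concrete, here is a minimal sketch of vector matching, assuming a hypothetical encode() function standing in for the BERT/Transformer encoders Bing actually uses: queries and documents are mapped into the same vector space and candidates are ranked by cosine similarity.

```python
import numpy as np


def encode(text: str) -> np.ndarray:
    """Hypothetical stand-in for a BERT/Transformer encoder that maps
    text (queries, captions, page snippets) into a shared semantic space."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine similarity


def rank_by_vector_match(query: str, documents: list[str]) -> list[tuple[str, float]]:
    """Rank candidate documents by cosine similarity to the query embedding."""
    q = encode(query)
    scored = [(doc, float(np.dot(q, encode(doc)))) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    results = rank_by_vector_match(
        "red vintage convertible car",
        [
            "classic red cabriolet on a coastal road",
            "blue mountain bike in a garage",
            "antique convertible automobile with bright red paint",
        ],
    )
    for doc, score in results:
        print(f"{score:.3f}  {doc}")
```

With a real encoder, semantically related phrases would score highest even without shared keywords; the dummy encoder here only illustrates the ranking mechanics.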

Image: Bing

The Transformer is a neural network architecture introduced by Google AI researchers in 2017. Like all deep neural networks, a Transformer contains neurons (mathematical functions) arranged in interconnected layers that transmit signals from the input data and gradually adjust the strength (weight) of each connection. That is how every AI model extracts features and learns to make predictions, but Transformers are unique in that every output element is connected to every input element, and the weightings between them are computed dynamically and efficiently.
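That everything-connects-to-everything behaviour is the attention mechanism at the heart of the Transformer. Below is a small, self-contained sketch of scaled dot-product self-attention in plain NumPy, simplified from the full architecture (no multiple heads, masking, or learned projection matrices).

```python
import numpy as np


def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Every output position attends to every input position, with attention
    weights computed dynamically from the data rather than stored as fixed
    parameters."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # pairwise query-key affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                         # weighted sum of values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(5, 16))                          # 5 input elements, 16-dim each
    out = scaled_dot_product_attention(tokens, tokens, tokens) # self-attention
    print(out.shape)                                           # (5, 16): one output per input
```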

Another method recently applied to Bing image search, attribute matching, extracts a set of object attributes from queries and candidate documents and uses them for matching. The team trained the detector with a multi-task optimization strategy so it can identify certain attributes from the image content and surrounding text, even on pages with little text, although the technology currently covers only a limited set of scenes and attributes.
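As a rough illustration only, the sketch below mimics attribute matching with a toy keyword-based extractor and a hypothetical Candidate record; Bing's actual detector is a learned model trained with multi-task optimization, not a word lookup.

```python
from dataclasses import dataclass

# Illustrative attribute vocabulary; the real system covers a limited set of
# scenes and attributes learned by a trained detector.
COLORS = {"red", "blue", "black", "white", "green"}
OBJECTS = {"dress", "car", "sofa", "shoe"}


@dataclass
class Candidate:
    url: str
    detected_attributes: set[str]  # attributes found in the image and surrounding text


def extract_attributes(text: str) -> set[str]:
    """Pull known color/object attributes out of a query or page snippet."""
    tokens = {t.strip(".,").lower() for t in text.split()}
    return (tokens & COLORS) | (tokens & OBJECTS)


def attribute_match_score(query: str, candidate: Candidate) -> float:
    """Fraction of the query's attributes that the candidate satisfies (0.0 to 1.0)."""
    wanted = extract_attributes(query)
    if not wanted:
        return 0.0
    return len(wanted & candidate.detected_attributes) / len(wanted)


if __name__ == "__main__":
    candidates = [
        Candidate("https://example.com/a", {"red", "dress"}),
        Candidate("https://example.com/b", {"blue", "dress"}),
    ]
    for c in candidates:
        print(c.url, attribute_match_score("long red dress", c))
```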

The Bing team is also working to enrich image metadata with high-quality information, using the vector matching and attribute matching methods described above. An image's best representative query (a natural-language query that summarizes the webpage and image content) is generated by feeding the page's text into a machine learning model that extracts candidate phrases from the longer text. That text is then embedded together with the image into a single semantic vector and compared against other queries in a repository to identify close matches.
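A simplified, hypothetical version of that pipeline might look like the following, where embed() and extract_candidate_phrases() are placeholders for Bing's learned models: candidate phrases are pulled from the page text, fused with the image embedding, and compared against a repository of known queries.

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    """Hypothetical embedding of text (or an image caption) into a shared semantic space."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)


def extract_candidate_phrases(page_text: str, max_words: int = 6) -> list[str]:
    """Crude stand-in for the phrase-extraction model: keep short sentences only."""
    phrases = [s.strip() for s in page_text.split(".") if s.strip()]
    return [p for p in phrases if len(p.split()) <= max_words]


def best_representative_query(page_text: str, image_caption: str,
                              query_repository: list[str]) -> tuple[str, str]:
    """Pick the candidate phrase whose joint page+image embedding best matches
    a known query; returns (chosen phrase, closest repository query)."""
    image_vec = embed(image_caption)
    best_phrase, best_query, best_score = "", "", -1.0
    for phrase in extract_candidate_phrases(page_text):
        joint = (embed(phrase) + image_vec) / 2.0       # fuse text and image signals
        for q in query_repository:
            score = float(np.dot(joint, embed(q)))
            if score > best_score:
                best_phrase, best_query, best_score = phrase, q, score
    return best_phrase, best_query
```

In this toy form the fusion is a simple average of two vectors; the point is only to show how page text, image content, and the query repository interact in the matching step.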