Image recognition, or computer vision, is the foundation of new opportunities in everything from automotive to advertising. One sign of its growing importance: the upcoming LDV Vision Summit, an annual conference on visual technology, is now in its third year. Computer vision has advanced on the same trends that have benefited other forms of AI, including open-source software, deep learning, easier programming tools, and faster, cheaper computing, opening up opportunities for a wide range of businesses.
TechCrunch explains the foundations of computer vision and how different participants are using it. To identify images accurately, computers must first learn from massive amounts of data: “The only way computers can accurately identify cats in photos is because they have already learned what cats look like by analyzing millions of pictures tagged with the word ‘cat’.” Two major, free visual databases supply the training data that most companies use as the basis for image-based machine learning.
In 2009, Stanford and Princeton computer scientists launched ImageNet, which holds an annual visual recognition challenge. It has grown from 80,000 tagged images to its current size of more than 14 million. The second image database is Pascal VOC, run out of several British universities, which has fewer images but more descriptive tags. That richer tagging “improves the accuracy and breadth of the machine learning and, for some applications, speeds up the overall process.”
In addition to these freely available databases, some big tech companies, Google and Facebook chief among them, have access to images their own users have identified. That, notes TechCrunch, is why these platforms let users upload so many photos for free. “It’s because those pictures are used to train their deep learning networks to become more accurate.”
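The idea of learning from tagged pictures can be sketched in a few lines of Python. This is a toy illustration only, not how ImageNet-scale systems or deep learning networks work: the "images" are made-up lists of 16 pixel values, the assumption that "cat" images are brighter than "dog" images is invented for the example, and the function names (make_image, train, predict, accuracy) are hypothetical. It swaps in a simple nearest-centroid classifier in place of a neural network.

```python
import random


def make_image(tag, rng, size=16):
    """Return a fake 'image': a flat list of pixel intensities in [0, 1].

    Invented assumption: 'cat' images are brighter on average than
    'dog' images. Real photos are nothing like this.
    """
    base = 0.7 if tag == "cat" else 0.3
    return [min(1.0, max(0.0, rng.gauss(base, 0.15))) for _ in range(size)]


def train(tagged_images):
    """Learn one average image (centroid) per tag from (pixels, tag) pairs."""
    sums, counts = {}, {}
    for pixels, tag in tagged_images:
        if tag not in sums:
            sums[tag] = [0.0] * len(pixels)
            counts[tag] = 0
        sums[tag] = [s + p for s, p in zip(sums[tag], pixels)]
        counts[tag] += 1
    return {tag: [s / counts[tag] for s in sums[tag]] for tag in sums}


def predict(model, pixels):
    """Return the tag whose centroid is closest to the given pixels."""
    def squared_distance(centroid):
        return sum((c - p) ** 2 for c, p in zip(centroid, pixels))
    return min(model, key=lambda tag: squared_distance(model[tag]))


def accuracy(n_train, n_test=200, seed=0):
    """Train on n_train tagged fakes, then score on n_test unseen fakes."""
    rng = random.Random(seed)
    tags = ["cat", "dog"]
    training = [(make_image(tags[i % 2], rng), tags[i % 2])
                for i in range(n_train)]
    model = train(training)
    tests = [(make_image(tags[i % 2], rng), tags[i % 2])
             for i in range(n_test)]
    correct = sum(predict(model, pixels) == tag for pixels, tag in tests)
    return correct / n_test


if __name__ == "__main__":
    print(f"accuracy on unseen images: {accuracy(200):.2f}")
```

The point of the sketch is the workflow rather than the model: the program is never told what a cat looks like, only shown examples that carry the tag, and it is then judged on images it has not seen.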
To build systems that can learn from the data, developers rely on freely available open-source software libraries, which act as frameworks for creating specific services such as facial recognition or medical screening. Google’s TensorFlow, originally proprietary and open-sourced in part last year, is helping the company build autonomous cars.
Beginning in 2009, UC Berkeley created Caffe, which offers “ease of customizability and large community of innovators, not to mention heavy use by Pinterest and Yahoo!/Flickr.” Google even used Caffe for DeepDream. Facebook AI Research (FAIR) uses Torch, created in 2002; in 2015, Facebook open-sourced some of its Torch modules.
To optimize the computer’s GPU (graphics processing unit) performance, Nvidia offers cuDNN, a library of GPU-accelerated routines for deep neural networks that is free to download, though not itself open source. Other tools can spread the work across more than one computer or GPU to speed up machine learning.