Researchers at MIT have developed a method for teaching computers to understand what is happening in video content. The approach resembles textual analysis techniques such as natural language processing: it examines each part of a video to figure out what the whole means. The researchers created an algorithm that identifies what occurs in individual frames and then determines what those frames mean when combined in a particular order.
“It could prove meaningful as more companies look to images and video to analyze everything from consumer behavior to health care,” reports GigaOM.
“As one might expect, identifying the actions taking place within videos is a machine learning problem,” notes the article. “The algorithm was trained on videos of specific actions, but it had to learn on its own which steps comprise a larger action (e.g., making tea or lifting weights) and the normal flow from one step to the next.”
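The two-stage idea described above, label each frame, then check whether the labels follow a learned flow from one step to the next, can be illustrated with a toy Markov-style transition model. This is not the researchers' actual algorithm; the class, labels, and training sequences below are invented purely for illustration.

```python
from collections import defaultdict

class ActionSequenceModel:
    """Toy sketch: learn which step tends to follow which, then score
    an ordering of per-frame labels against that learned flow."""

    def __init__(self):
        # counts of transitions between consecutive step labels
        self.transitions = defaultdict(lambda: defaultdict(int))

    def train(self, labeled_sequences):
        """Learn the normal flow from one step to the next."""
        for seq in labeled_sequences:
            for prev, curr in zip(seq, seq[1:]):
                self.transitions[prev][curr] += 1

    def transition_prob(self, prev, curr):
        total = sum(self.transitions[prev].values())
        if total == 0:
            return 0.0
        return self.transitions[prev][curr] / total

    def sequence_score(self, seq):
        """Product of transition probabilities along the sequence."""
        score = 1.0
        for prev, curr in zip(seq, seq[1:]):
            score *= self.transition_prob(prev, curr)
        return score

# Invented per-frame labels for a "making tea" action
model = ActionSequenceModel()
model.train([
    ["boil_water", "pour_water", "steep_tea", "drink"],
    ["boil_water", "pour_water", "steep_tea", "drink"],
])

print(model.sequence_score(["boil_water", "pour_water", "steep_tea"]))  # 1.0, matches the learned flow
print(model.sequence_score(["drink", "boil_water"]))                    # 0.0, never observed in training
```

A real system would, of course, first need a vision model to produce the frame labels; this sketch only covers the sequence side, learning "which step normally follows which" from examples.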
This type of algorithm could be useful in automatically tagging and indexing online videos, including videos on YouTube. The researchers are also hoping their algorithm could be used for medical purposes, including monitoring exercise or reminding people to take their medication.
The technology also has the potential to detect whether an armed robbery is taking place at an ATM, for example, or to alert zoologists when zoo animals are breeding.
Dropcam, a startup that makes cloud-connected cameras, is pursuing similar goals by building its own model for identifying actions in the footage its cameras capture.
“It’s clear that video will soon become just as important a source of data as text and photos for smart companies and other institutions trying to glean insights in any number of areas,” suggests GigaOM. “There was always lots of information buried within tweets, photos and videos, but few organizations had the manpower to look at them all. Thanks to advances in artificial intelligence, they’ll soon only need a credit card.”