Meta Develops Computer Vision AI That Learns Like Humans

Meta Platforms continues to make progress on its mission to develop artificial intelligence that can teach itself how the world works. Chief AI Scientist Yann LeCun has taken a special interest in the new model, called Image Joint Embedding Predictive Architecture, or I-JEPA, which learns by building an internal representation of the outside world, analyzing abstract representations of images rather than comparing raw pixels. The approach allows the technology to learn more like humans do, figuring out complex tasks and adapting to new situations.

I-JEPA and other joint embedding predictive architecture models operate on the premise that humans soak up enormous amounts of “background knowledge” about the world through passive observation, without consciously setting out to learn. AI researchers have been trying to devise learning algorithms that capture this background knowledge and store it away for later, when a situation arises in which it would be useful.

“To be effective, the system must learn these representations in a self-supervised manner — that is to say, directly from unlabeled data such as images or sounds, rather than from manually assembled labeled datasets,” Meta explained in a blog post.

“At a high level, I-JEPA can predict the representation of part of an input, such as an image or piece of text, using the representation of other parts of that same input,” writes SiliconANGLE. “That’s different from newer generative AI models, which learn by removing or distorting portions of the input, for instance by erasing part of an image or hiding some words in a passage, then attempting to predict the missing input.”
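To make that distinction concrete, here is a minimal, hypothetical PyTorch sketch of the joint-embedding predictive idea. The toy encoders, layer sizes, and single-patch masking below are illustrative assumptions, not Meta’s released I-JEPA code; the point is only that the prediction target and the loss live in representation space rather than pixel space.

```python
# A minimal, illustrative sketch of the joint-embedding predictive idea,
# not Meta's actual I-JEPA implementation. The module names, sizes, and
# toy MLP encoders below are hypothetical stand-ins for the real
# vision-transformer components.
import torch
import torch.nn as nn

EMBED_DIM = 64

class Encoder(nn.Module):
    """Toy patch encoder: maps each image patch to an embedding vector."""
    def __init__(self, patch_dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(patch_dim, EMBED_DIM), nn.GELU())

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        return self.proj(patches)  # (..., EMBED_DIM)

class Predictor(nn.Module):
    """Predicts the embedding of a masked patch from visible-patch embeddings."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(EMBED_DIM, EMBED_DIM), nn.GELU(),
                                 nn.Linear(EMBED_DIM, EMBED_DIM))

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # Pool the visible context, then predict the target's embedding.
        return self.net(context.mean(dim=1))

patch_dim = 16 * 16 * 3                      # flattened 16x16 RGB patch
context_encoder = Encoder(patch_dim)
target_encoder = Encoder(patch_dim)          # in I-JEPA this is an EMA copy
target_encoder.load_state_dict(context_encoder.state_dict())
for p in target_encoder.parameters():        # targets are never backpropagated
    p.requires_grad_(False)
predictor = Predictor()

patches = torch.randn(8, 10, patch_dim)      # batch of 8 images, 10 patches each
visible, masked = patches[:, :9], patches[:, 9]

# The loss compares *representations*, never raw pixels.
context = context_encoder(visible)
with torch.no_grad():
    target = target_encoder(masked)
loss = nn.functional.mse_loss(predictor(context), target)
loss.backward()
```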

Meta says a flaw in generative AI models is that they are built to fill in every piece of missing information, even though the world is inherently unpredictable.
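Continuing the hypothetical sketch above, a generative-style objective would instead decode back to pixel space and score the prediction against raw pixel values, committing the model to every detail of the missing patch:

```python
# Continuing the toy example above: a generative-style objective (an
# illustrative contrast, not any specific Meta model) adds a pixel decoder
# and scores the prediction against raw pixels, so the model must commit
# to every pixel value even when many completions are equally plausible.
pixel_decoder = nn.Linear(EMBED_DIM, patch_dim)  # hypothetical decoder head
pixel_loss = nn.functional.mse_loss(pixel_decoder(predictor(context)), masked)
```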

That’s why, for all their smarts, generative models often make mistakes a human wouldn’t: they expend too much effort on irrelevant details. (A familiar example is how image generators often botch human hands, adding extra fingers and other distortions.)

“We believe this is an important step towards applying and scaling self-supervised methods for learning a general model of the world,” Meta said, adding that “in the future, JEPA models could have exciting applications for tasks like video understanding.” In May, Meta introduced ImageBind, which also tries to replicate human learning.

“I-JEPA demonstrates there’s a lot of potential for architectures that can learn competitive off-the-shelf representations without the need for additional knowledge encoded in hand-crafted image transformations,” SiliconANGLE writes, noting Meta says it is “open-sourcing both I-JEPA’s training code and model checkpoints, and their next steps will be to extend the approach to other domains, such as image-text paired data and video data.”
