Apple’s ReALM AI Advances the Science of Digital Assistants

Apple has developed a large language model it says has advanced screen-reading and comprehension capabilities. ReALM (Reference Resolution as Language Modeling) is artificial intelligence that can see and read computer screens in context, according to Apple, which says it advances technology essential for a true AI assistant “that aims to allow a user to naturally communicate their requirements to an agent, or to have a conversation with it.” Apple claims that in a benchmark against GPT-3.5 and GPT-4, its smallest ReALM model performed comparably to GPT-4, with its larger models “substantially outperforming it.” Continue reading Apple’s ReALM AI Advances the Science of Digital Assistants

Amazon Increases Its Investment in Anthropic AI to $4 Billion

Amazon has added $2.75 billion to its initial September 2023 investment of $1.25 billion in Anthropic, completing its announced $4 billion stake in the artificial intelligence startup formed in 2021 by former members of OpenAI. As part of the resulting strategic collaboration, Anthropic’s most powerful models, including the Claude 3 series, are available on Amazon Bedrock, a service providing fully managed foundation models. Anthropic is using Amazon Web Services as its primary cloud provider and Amazon says Anthropic will use AWS Trainium and Inferentia chips “to build, train, and deploy its future models.” Continue reading Amazon Increases Its Investment in Anthropic AI to $4 Billion

Apple Unveils Progress in Multimodal Large Language Models

Apple researchers have gone public with new multimodal methods for training large language models using both text and images. The results are said to enable AI systems that are more powerful and flexible, which could have significant ramifications for future Apple products. These new models, which Apple calls MM1, support up to 30 billion parameters. The researchers identify multimodal large language models (MLLMs) as “the next frontier in foundation models,” which exceed the performance of LLMs and “excel at tasks like image captioning, visual question answering and natural language inference.” Continue reading Apple Unveils Progress in Multimodal Large Language Models

Cineverse to Launch cineSearch Powered by Google’s Vertex AI

Global streamer Cineverse has launched an AI-powered search and discovery tool called cineSearch. Developed using the Vertex AI platform from Google Cloud, cineSearch will be released in public beta this spring. A waitlist has already been started, and Cineverse says it will be made more widely available in partnership with OEM and third-party streaming platform partners in the coming months. Simultaneously, the Los Angeles-based firm is rolling out a new ad platform called Cineverse 360, or C360 for short, that will connect brands across omnichannel services reaching 82 million monthly viewers at launch. Continue reading Cineverse to Launch cineSearch Powered by Google’s Vertex AI

ChatGPT Goes Multimodal: OpenAI Adds Vision, Voice Ability

OpenAI began previewing vision capabilities for GPT-4 in March, and the company is now starting to roll out the image input and output to users of its popular ChatGPT. The multimodal expansion also includes audio functionality, with OpenAI proclaiming late last month that “ChatGPT can now see, hear and speak.” The upgrade vaults GPT-4 into the multimodal category with what OpenAI is apparently calling GPT-4V (for “Vision,” though equally applicable to “Voice”). “We’re rolling out voice and images in ChatGPT to Plus and Enterprise users,” OpenAI announced. Continue reading ChatGPT Goes Multimodal: OpenAI Adds Vision, Voice Ability

vETC Coming to IBC 2023: The Path of Sustainable Innovation

ETC@USC is hosting its 9th vETC virtual conference at this year’s IBC in Amsterdam, September 15-18. The event — which highlights significant presentations of emerging technologies and their impact on the M&E industry — will explore how generative AI, machine learning, and other compelling new tools help simplify building 3D worlds and tackle today’s computer vision challenges. The sessions will be recorded and posted on ETC’s YouTube channel. IBC attendees interested in the sessions (located in Hall 7 at Booth C28) can visit the program guide, which includes a full schedule and speaker bios. Continue reading vETC Coming to IBC 2023: The Path of Sustainable Innovation

Amazon Integrating AI to Modernize NFL Viewing Experience

Amazon is using artificial intelligence to change the way viewers experience “Thursday Night Football” on Amazon Prime Video this season. Now in the second year of its 10-year NFL deal, Amazon joins Disney’s ESPN in using AI to change how people experience televised sports by parsing a variety of analytics and using machine learning to interpret 2D video into 3D for a variety of viewpoints on any play. Amazon is auto-generating highlights feeds for each game, so late arrivals can catch up. The new AI-powered Prime Video features, along with games in 1080p HDR, debut September 14. Continue reading Amazon Integrating AI to Modernize NFL Viewing Experience

ETC Will Host Sessions at SIGGRAPH Conference This Week

ETC@USC will host its 8th vETC virtual conference at SIGGRAPH 2023 in Los Angeles, August 8-10. The event — which highlights significant presentations of emerging technologies and their impact on the M&E industry — will explore how generative AI, machine learning, and other compelling new tools help simplify building 3D worlds and tackle today’s computer vision challenges. Three days of sessions will be recorded and posted on ETC’s YouTube channel. SIGGRAPH attendees interested in the sessions (located at Z by HP Booth 215) can visit the program guide, which includes a full schedule and speaker bios. Continue reading ETC Will Host Sessions at SIGGRAPH Conference This Week

MAGE AI Unifies Generative and Recognition Image Training

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have introduced a computer vision system that combines image recognition and image generation technology into one training model instead of two. The result, MAGE (short for MAsked Generative Encoder) holds promise for a wide variety of use cases and is expected to reduce costs through unified training, according to the team. “To the best of our knowledge, this is the first model that achieves close to state-of-the-art results for both tasks using the same data and training paradigm,” the researchers said. Continue reading MAGE AI Unifies Generative and Recognition Image Training

Meta Develops Computer Vision AI That Learns Like Humans

Meta Platforms continues to make progress on a mission to develop artificial intelligence that can teach itself to learn how the world works. Chief AI Scientist Yann LeCun has taken a special interest in developing the new model, called Image Joint Embedding Predictive Architecture, or I-JEPA, which learns by building an internal representation of the outside world and analyzing image abstractions instead of comparing pixels. The approach allows AI technology to learn more like humans do, with their ability to figure out complex tasks and adapt to new situations. Continue reading Meta Develops Computer Vision AI That Learns Like Humans

Apple Eyes AI Video Compression with WaveOne Acquisition

Apple has acquired WaveOne, a Mountain View-based startup that has been developing AI algorithms for video compression. Cupertino has been mum about the purchase, but the deal reportedly closed in January, and WaveOne employees are said to have been absorbed into Apple’s machine learning groups. WaveOne’s codecs use machine learning to squeeze more picture out of less bandwidth, including optimizing for signal interruptions, so the picture doesn’t freeze or disappear, making it ideal for mobile. As Netflix and YouTube tout picture improvements, WaveOne could potentially advantage Apple TV+ and a mixed reality headset. Continue reading Apple Eyes AI Video Compression with WaveOne Acquisition

Pinterest Grows Its Active Users, Focuses on Video Shopping

Pinterest grew Q4 year-over-year revenue by 4 percent, to $877 million, while full-year sales jumped 9 percent in 2022, totaling $2.8 billion. The company said that global monthly active users also grew by 4 percent in the three-month period ending December 31, to a total of 450 million. CEO Bill Ready emphasized on the earnings call the intent to eventually “make every pin shoppable.” Similar to how it monetizes still images, Pinterest is focusing on making videos “more actionable” by applying what it calls “our computer vision technology.” Continue reading Pinterest Grows Its Active Users, Focuses on Video Shopping

CES: Encoding Environmental Intelligence with New Chip Tech

The design of truly contextual experiences — whether for utility or entertainment — requires a knowledge of both the user and the environment they are in. This becomes especially relevant when we think of what it means to build interesting mixed reality experiences. CES this year showcased a variety of computer vision AI software tools oriented towards understanding environmental context. At Eureka Park in the Venetian, however, MantiSpectra’s chip sensor technology provided a peek into the benefits for user experience enabled by environmental intelligence arising from hardware. Continue reading CES: Encoding Environmental Intelligence with New Chip Tech

Google Search Reinvention Focuses on Visuals and Discovery

Google is the latest tech giant to be swayed by the influence of TikTok and Instagram as it reimagines a more visual, discovery-centric type of search. That was major media’s takeaway from the third annual Google Search On event, which continued the trend of trying to find more intuitive ways to search, namely visually and vocally, by snapping a photo or asking your phone a question. Thanks to advances in artificial intelligence, the Alphabet company says it is “going far beyond the search box to create search experiences that work more like our minds.” Continue reading Google Search Reinvention Focuses on Visuals and Discovery