Apple Launches Public Demo of Its Multimodal 4M AI Model

Apple has released a public demo of the 4M AI model it developed in collaboration with the Swiss Federal Institute of Technology Lausanne (EPFL). The technology debuts seven months after the model was first open-sourced, allowing informed observers the opportunity to interact with it and assess its capabilities. Apple says 4M was built by applying masked modeling to a single unified Transformer encoder-decoder “across a wide range of input/output modalities — including text, images, geometric and semantic modalities, as well as neural network feature maps.”
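The masked-modeling recipe described above can be sketched in a few lines of toy Python. The token scheme and function names below are our own illustration, not Apple's code: each modality is reduced to discrete tokens, everything is concatenated into one sequence, and a random subset is hidden for the model to predict.

```python
import random

def tokenize(sample):
    """Flatten a multimodal sample into (modality, token) pairs."""
    return [(modality, tok) for modality, toks in sample.items() for tok in toks]

def mask_tokens(tokens, mask_ratio=0.5, seed=0):
    """Split tokens into visible inputs and masked prediction targets."""
    rng = random.Random(seed)
    idx = list(range(len(tokens)))
    rng.shuffle(idx)
    n_masked = int(len(tokens) * mask_ratio)
    masked = set(idx[:n_masked])
    inputs = [t for i, t in enumerate(tokens) if i not in masked]
    targets = [(i, t) for i, t in enumerate(tokens) if i in masked]
    return inputs, targets

sample = {
    "text": ["a", "red", "car"],
    "image": [101, 102, 103, 104],  # e.g. discretized image-patch codes
    "depth": [7, 8],                # a geometric modality, also as tokens
}
tokens = tokenize(sample)
inputs, targets = mask_tokens(tokens)
print(len(tokens), len(inputs), len(targets))
```

In 4M's actual setup, a single encoder-decoder is trained to reconstruct the masked targets from the visible tokens, whatever modality each token came from.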

DeepMind’s V2A Generates Music, Sound Effects, Dialogue

Google DeepMind has unveiled new research on AI tech it calls V2A (“video-to-audio”) that can generate soundtracks for videos. The initiative complements the wave of AI video generators from companies ranging from biggies like OpenAI and Alibaba to startups such as Luma and Runway, all of which require a separate app to add sound. V2A technology “makes synchronized audiovisual generation possible” by combining video pixels with natural language text prompts “to generate rich soundscapes for the on-screen action,” DeepMind writes, explaining that it can “create shots with a dramatic score, realistic sound effects or dialogue.”
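The conditioning idea — pairing per-frame video features with a text-prompt embedding — can be illustrated with a toy sketch. This is our own simplification, not DeepMind's code: in V2A itself, the fused conditioning drives a diffusion model that produces an audio waveform aligned with the video.

```python
def fuse_conditioning(frame_features, text_embedding):
    """Concatenate each frame's features with the shared text embedding."""
    return [frame + text_embedding for frame in frame_features]

frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames, toy visual features
text = [1.0, 0.0]  # toy embedding of a prompt like "dramatic score"
cond = fuse_conditioning(frames, text)
print(len(cond), len(cond[0]))  # one conditioning vector per frame
```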

Apple Unveils Progress in Multimodal Large Language Models

Apple researchers have gone public with new multimodal methods for training large language models using both text and images. The results are said to enable AI systems that are more powerful and flexible, which could have significant ramifications for future Apple products. These new models, which Apple calls MM1, support up to 30 billion parameters. The researchers identify multimodal large language models (MLLMs) as “the next frontier in foundation models,” which exceed the performance of LLMs and “excel at tasks like image captioning, visual question answering and natural language inference.”
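For a sense of scale, a back-of-the-envelope calculation (our arithmetic, not a figure from Apple) shows what merely holding 30 billion parameters in memory costs at 16-bit precision:

```python
# Memory footprint of model weights alone, ignoring activations and optimizer state.
params = 30e9
bytes_per_param = 2  # fp16/bf16
gib = params * bytes_per_param / 1024**3
print(round(gib, 1), "GiB")  # roughly 56 GiB of weights
```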

Startup Cognition Launches AI Software Coding Engine Devin

Months-old startup Cognition AI has emerged from stealth mode with Devin, a generative platform it is calling “the world’s first fully autonomous AI software engineer.” Although Cognition has yet to make Devin widely available, much less allow independent testing, if its claims are true it would mark a turning point in the AI coding space, moving it from a field of AI assistants to a full-fledged AI engineer. Based on natural language instruction, Devin could potentially take a project from concept to execution rather than simply suggesting code snippets or offering barebones frameworks.

ElevenLabs Promotes Its Latest Advances in AI Audio Effects

“What if you could describe a sound and generate it with AI?” asks startup ElevenLabs, which set out to do just that, and says it has succeeded. The two-year-old company explains it “used text prompts like ‘waves crashing,’ ‘metal clanging,’ ‘birds chirping,’ and ‘racing car engine’ to generate audio.” Best known for using machine learning to clone voices, the AI firm founded by Google and Palantir alums has yet to make its new text-to-sound model publicly available, but began teasing it by releasing online demos this week. Some see the technology as a natural complement to the latest wave of image generators.

Amazon Claims ‘Emergent Abilities’ for Text-to-Speech Model

Researchers at Amazon have trained what they are calling the largest text-to-speech model ever created, which they claim exhibits “emergent” qualities — an inherent ability to speak complex sentences naturally without having been explicitly trained on them. Called BASE TTS, for Big Adaptive Streamable TTS with Emergent abilities, the new model could pave the way for more human-like interactions with AI, reports suggest. Trained on 100,000 hours of public domain speech data, BASE TTS offers “state-of-the-art naturalness” in English as well as some German, Dutch and Spanish. Text-to-speech models are used in developing voice assistants for smart devices and apps, and in accessibility tools.

Apple’s Keyframer AI Tool Uses LLMs to Prototype Animation

Apple has taken a novel approach to animation with Keyframer, using large language models to add motion to static images through natural language prompts. “The application of LLMs to animation is underexplored,” Apple researchers say in a paper that describes Keyframer as an “animation prototyping tool.” Based on input from animators and engineers, Keyframer lets users refine their work through “a combination of prompting and direct editing,” the paper explains. The LLM can generate CSS animation code. Users can also use natural language to request design variations.
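The prompt-then-edit loop the paper describes can be sketched as follows. The prompt and the CSS below are our own illustration of the kind of output an LLM might produce, not output from Apple's tool:

```python
prompt = "Make the rocket in #rocket drift upward and fade out over 3 seconds."

# What an LLM might generate from the prompt above: standard CSS animation code.
generated_css = """
#rocket {
  animation: drift-up 3s ease-out forwards;
}
@keyframes drift-up {
  from { transform: translateY(0);      opacity: 1; }
  to   { transform: translateY(-120px); opacity: 0; }
}
""".strip()

# A "direct edit" in Keyframer's sense: the user tweaks the generated code by hand.
edited_css = generated_css.replace("3s", "5s")
print("@keyframes" in generated_css, "5s" in edited_css)
```

Because the output is ordinary CSS, the user can refine it either by re-prompting or by editing the code directly, which is the combination the paper highlights.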

Conversational Chatbot Optimizes Google Ads, Search Results

Google’s multimodal Gemini large language model will offer chat capabilities that help advertisers build and scale Search campaigns within the Google Ads platform using natural language prompts. “We’ve been actively testing Gemini to further enhance our ads solutions, and we’re pleased to share that Gemini is now powering the conversational experience,” Google said, explaining the functionality is now available in beta to English-language advertisers in the U.S. and UK, and will roll out globally to all English-language advertisers over the next few weeks, with additional languages offered in the months ahead.

CES: Rabbit Launches AI-Powered Pocket Controller for Apps

Santa Monica-based AI startup Rabbit Inc. is offering a virtual assistant in the form of a pocket device that the company says can improve upon mobile phones by learning to use your apps and running them for you. Heavily publicized at CES 2024 in Las Vegas this week, the initial run of the company’s r1 units had as of Tuesday sold out at $199 each. Preorders continue for the retro-looking device, which features a 2.88-inch touchscreen; shipments are scheduled to begin in late March. The company says its proprietary Rabbit OS is the first operating system built on a Large Action Model (LAM) foundation. LAMs are LLMs trained on datasets of actions and consequences.
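To make the "actions and consequences" framing concrete, here is a sketch of what a single LAM training record might look like. The schema is our assumption for illustration, not Rabbit's published format:

```python
from dataclasses import dataclass

@dataclass
class ActionRecord:
    intent: str        # natural-language goal from the user
    app: str           # application the action runs in
    action: str        # concrete UI step taken
    consequence: str   # observed result, usable as a training signal

record = ActionRecord(
    intent="order my usual coffee",
    app="coffee-app",
    action="tap('Reorder last purchase')",
    consequence="order confirmed, pickup in 10 min",
)
print(record.app, "->", record.consequence)
```

Training on many such records is what would let the model map a spoken intent directly to app operations, rather than to text alone.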

SageMaker HyperPod: Amazon Accelerates AI Model Training

Amazon has added five new capabilities to its SageMaker service, including SageMaker HyperPod, which accelerates large language and foundation model training and tuning. SageMaker HyperPod is said to shorten training time by up to 40 percent using purpose-built infrastructure designed for distributed training at scale. By optimizing acceleration, SageMaker Inference reduces foundation model deployment costs by 50 percent and latency by 20 percent on average, Amazon claims. “SageMaker HyperPod removes the undifferentiated heavy lifting involved in building and optimizing machine learning infrastructure,” said Amazon.
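As a rough sketch of what provisioning such a cluster involves, here is a hedged example of a HyperPod request. The field names follow the SageMaker CreateCluster API as we understand it — check the current AWS documentation before relying on them — and the bucket, script and role values are placeholders:

```python
request = {
    "ClusterName": "llm-training-demo",
    "InstanceGroups": [
        {
            "InstanceGroupName": "workers",
            "InstanceType": "ml.p4d.24xlarge",  # GPU nodes for distributed training
            "InstanceCount": 4,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://example-bucket/lifecycle/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/ExampleHyperPodRole",
        }
    ],
}
# With AWS credentials configured, this would be submitted via
# boto3.client("sagemaker").create_cluster(**request).
print(request["InstanceGroups"][0]["InstanceCount"], "nodes requested")
```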

Meta Touts Its Emu Foundational Model for Video and Editing

Having made the leap from image generation to video generation over the course of a few months in 2022, Meta Platforms has introduced Emu, its first visual foundational model, along with Emu Video and Emu Edit, positioned as milestones in the trek to AI moviemaking. Emu uses just two diffusion models to generate four-second, 512×512 videos at 16 frames per second, Meta said, comparing that to 2022’s Make-A-Video, which required a “cascade” of five models. Internal research found Emu video generations were “strongly preferred” over the Make-A-Video model based on quality (96 percent) and prompt fidelity (85 percent).
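Meta's stated specs imply a fixed per-clip workload, which a quick calculation (our arithmetic, from the figures above) makes concrete:

```python
# Emu Video specs as reported: 512x512 resolution, four seconds at 16 fps.
width = height = 512
seconds, fps = 4, 16
frames = seconds * fps
pixels_per_clip = frames * width * height
print(frames, "frames,", f"{pixels_per_clip:,}", "pixels per clip")
```

So each clip is 64 frames — on the order of 16.8 million pixels that the two diffusion models must produce coherently.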

Aptos Teams with Microsoft Azure OpenAI on Web3 Solutions

Blockchain startup Aptos Labs will use the Microsoft Azure OpenAI Service to “explore innovative solutions” in blockchain and Web3 for technologies involving artificial intelligence, tokenization and payments. As part of the deal Aptos describes as a “partnership,” the company is launching Aptos Assistant, which will enable natural language prompts, making Web3 applications like smart contracts and decentralized apps more “user-friendly and secure” for “everyday Internet users and organizations” as well as developers. Aptos offers what is known as a Layer 1 blockchain, technology designed to facilitate transactions at scale.

Wix AI Site Generator Builds Websites Using Only AI Prompts

Global SaaS and website creation platform Wix Ltd. will release an AI Site Generator that allows people to create websites using only natural language artificial intelligence prompts. The generator will include a suite of AI-powered capabilities, many of which Wix is already offering as part of its template-based site-building framework. The package “significantly streamlines the entire website-building, design and management process,” offering automated tools that provide the opportunity for Wix users to “operationalize and grow their businesses with never-before-seen ease,” said company co-founder and CEO Avishai Abrahami.

IBM Bows Watsonx Suite of Enterprise AI Products, Services

With artificial intelligence development dating back to the 1950s, IBM was clearly ahead of its time. The company has quietly built a commercial portfolio, with more than 100 million customers across 20 industries using its Watson suite, the company says. At its annual Think conference, the company unboxed IBM Watsonx, a next-generation platform that leverages the scale and scope of foundation models to provide custom solutions for data-driven clients. Described as an “enterprise studio for AI builders,” Watsonx is an end-to-end framework that combines the tools, infrastructure and consulting expertise corporations can use to onboard AI.

Walmart Leans into AI, Retools Site to Compete with Amazon

Walmart has rolled out a new online look in a bid to catch up with Amazon, simultaneously advancing its conversational AI capabilities using OpenAI’s GPT-4 and Google’s BERT. Generative AI has reportedly been a major initiative at the Arkansas-based retailer since last year in key areas including search, supply chain management and virtual shopping, although only now is the company emphasizing the tools to customers by expanding offerings like Text to Shop. The text- or voice-activated way to add items to Walmart.com shopping carts is one of nearly two dozen conversational AI experiences at Walmart.