CES: Getty Rolls Out iStock Generative AI Powered by Nvidia

Getty Images and Nvidia are expanding their AI partnership with the addition of the text-to-image platform Generative AI by iStock, designed to produce stock photos that can be used by individuals or enterprise customers. Built on Nvidia Picasso, a foundry for custom AI models, and trained exclusively on data from Getty Images’ proprietary creative libraries, Generative AI by iStock “has been engineered to guard against generations of known products, people, places or other copyrighted elements,” Getty explains, adding that “any licensed visual that a customer generates comes with iStock’s standard $10,000 USD legal coverage.”

VideoPoet: Google Launches a Multimodal AI Video Generator

Google has unveiled a new large language model designed to advance video generation. VideoPoet is capable of text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio. “The leading video generation models are almost exclusively diffusion-based,” Google says, citing Imagen Video as an example. Google finds this counterintuitive, since “LLMs are widely recognized as the de facto standard due to their exceptional learning capabilities across various modalities.” VideoPoet eschews the diffusion approach of relying on separately trained tasks in favor of integrating many video generation capabilities in a single LLM.

Google Debuts Turnkey Gemini AI Studio for Developing Apps

Google is rolling out Gemini to developers, enticing them with tools including AI Studio, an easy-to-navigate Web-based platform that will serve as a portal to the multi-tiered Gemini ecosystem, beginning with Gemini Pro, with Gemini Ultra to come next year. The service aims to let developers quickly create prompts and Gemini-powered chatbots, and provides API keys for integrating them into apps. Developers will also be able to access the code, should projects require a full-featured IDE. The site is essentially a revamped version of what was formerly Google’s MakerSuite.

Microsoft Says Phi-2 Can Outperform Large Language Models

Microsoft is releasing Phi-2, a text-to-text small language model (SLM) that outperforms some LLMs, yet is light enough to run on a mobile device or laptop, according to Microsoft CEO Satya Nadella. The 2.7 billion-parameter SLM beat Meta Platforms’ Llama 2 and Mistral 7B from France (each with 7 billion parameters), says Microsoft, emphasizing that its complex reasoning and language comprehension are exceptional for a model with fewer than 13 billion parameters. For now, Microsoft is making it available “for research purposes only” under a custom license.

EU Makes Provisional Agreement on Artificial Intelligence Act

The EU has reached a provisional agreement on the Artificial Intelligence Act, making it the first Western democracy to establish comprehensive AI regulations. The sweeping new law predominantly focuses on so-called “high-risk AI,” establishing parameters, largely in the form of reporting and third-party monitoring, “based on its potential risks and level of impact.” Parliament and the 27-country European Council must still hold final votes before the AI Act is finalized and goes into effect, but the agreement, reached Friday in Brussels after three days of negotiations, means the main points are set.

Google’s NotebookLM is a Personalized Lite Language Model

Google’s personalized AI assistant NotebookLM is an experimental product that has been in early access since July. Now the company is integrating its new Gemini Pro LLM with NotebookLM and making it available to U.S. residents 18 and older. NotebookLM is engineered “to help you do your best thinking,” Google says, with documents uploaded to the service making it “an instant expert in the information you need,” allowing it to answer questions about your data. Unlike generic chatbots, NotebookLM draws responses from the documents you feed it, meaning it will be hyper-focused, like a lite version of a custom-trained model.

Amazon Previews Titan Image Generator for Bedrock Clients

Amazon is debuting its Titan Image Generator in preview for AWS Bedrock customers. The new Titan generative AI model can create new images from a text prompt or existing image, and automatically adds watermarking to protect intellectual property. The move into generative imaging puts Amazon in competition with a growing field that includes large firms like Adobe and Google. Unlike those companies and others, the e-retail giant is at present focusing exclusively on enterprise customers. Amazon Bedrock is a managed service giving developers access to a range of foundation models from companies including Meta Platforms, Anthropic, and Amazon itself.

SageMaker HyperPod: Amazon Accelerates AI Model Training

Amazon has added five new capabilities to its SageMaker service, including SageMaker HyperPod, which accelerates large language and foundation model training and tuning. SageMaker HyperPod is said to shorten training time by up to 40 percent using purpose-built infrastructure designed for distributed training at scale. By optimizing acceleration, SageMaker Inference reduces foundation model deployment costs by 50 percent and latency by 20 percent on average, Amazon claims. “SageMaker HyperPod removes the undifferentiated heavy lifting involved in building and optimizing machine learning infrastructure,” said Amazon.

Stability Introduces GenAI Video Model: Stable Video Diffusion

Stability AI has opened a research preview of its first foundation model for generative video, Stable Video Diffusion, offering text-to-video and image-to-video. Based on the company’s Stable Diffusion text-to-image model, the new open-source model generates video by animating existing still frames, including “multi-view synthesis.” While the company plans to enhance and extend the model’s capabilities, it currently comes in two versions: SVD, which transforms stills into 576×1024 videos of 14 frames, and SVD-XT, which generates up to 25 frames, each at between three and 30 frames per second.

Meta Touts Its Emu Foundational Model for Video and Editing

Having made the leap from image generation to video generation over the course of a few months in 2022, Meta Platforms introduces Emu, its first visual foundational model, along with Emu Video and Emu Edit, positioned as milestones in the trek to AI moviemaking. Emu uses just two diffusion models to generate 512×512 four-second videos at 16 frames per second, Meta said, comparing that to 2022’s Make-A-Video, which requires a “cascade” of five models. Internal research found Emu video generations were “strongly preferred” over the Make-A-Video model based on quality (96 percent) and prompt fidelity (85 percent).

Elon Musk’s xAI Rolling Out ‘Grok’ LLM in Early Access Beta

Elon Musk’s startup xAI has unveiled its first product, a large language model with chatbot capabilities named Grok, currently available via an early access waitlist with plans to go wide to Premium+ subscribers to the X social platform (formerly Twitter) following beta tests. The company says Grok has “access to search tools and real-time information” and is extremely up-to-date, but “as with all the LLMs trained on next-token prediction, our model can still generate false or contradictory information.” The chatbot is distinguished by sarcasm and wit, “so please don’t use it if you hate humor,” xAI warns.

Woodpecker: Chinese Researchers Combat AI Hallucinations

The University of Science and Technology of China (USTC) and Tencent YouTu Lab have released a research paper on a new framework called Woodpecker, designed to correct hallucinations in multimodal large language models (MLLMs). “Hallucination is a big shadow hanging over the rapidly evolving MLLMs,” writes the group, describing the phenomenon as when MLLMs “output descriptions that are inconsistent with the input image.” Solutions to date focus mainly on “instruction-tuning,” a form of retraining that is data- and computation-intensive. Woodpecker takes a training-free approach that purports to correct hallucinations working from the generated text.

Nightshade Data Poisoning Tool Targets AI to Protect Artist IP

A new tool called Nightshade offers creators a way to fend off artificial intelligence models attempting to train on visual artwork without permission. Created by a University of Chicago team led by Professor Ben Zhao, Nightshade makes it possible to include an instruction set that can cause AI models to “break” during unauthorized scraping. It does this by inserting “invisible pixels.” As a result, popular AI models including DALL-E, Midjourney and Stable Diffusion will subsequently render erratic results, turning dogs into cats and cars into cows, and so forth.

Dell Partnering with Nvidia and Starburst for GenAI Solutions

Dell Technologies is expanding its Generative AI Solutions portfolio to help enterprise customers add GenAI to their workflow. The expansion includes support for advanced infrastructure and collaborative data solutions that optimize and help secure intelligence gathering and utilization. Dell takes a “validated design” approach to optimization and acceleration, testing different hardware configurations designed to fit the needs of various use cases. Dell has partnered with Nvidia for validated GenAI design for model customization, and with Starburst on data lakehouse solutions that tap multi-cloud data for AI end-use.

DeepMind and Academics Advance General Purpose Robots

“Robots are great specialists, but poor generalists,” according to Google DeepMind, which says models are typically trained for individual tasks, and changing a single variable can mean starting again from scratch. Now the London-based Alphabet subsidiary thinks it has come up with a way to combine knowledge across robotics for a general-purpose machine helper. In conjunction with 33 academic labs, Google DeepMind has pooled data from 22 different robot types to create the Open X-Embodiment dataset. Simultaneously, the group is releasing the RT-1-X robotics transformer (RT) model derived from RT-1.