Audio-First Social Platform Airchat Has Successful Relaunch

Airchat is the latest app to take tech leaders in Silicon Valley by storm. Described as a “combination of voice notes and Twitter,” Airchat lets you follow other users and scroll through posts — adding replies, likes and shares — but the twist is the content is generated through audio recordings the app then transcribes. Airchat ranked 27th on the App Store’s social networking chart, even though users must be invited to join. Launched last year by Naval Ravikant, founder of AngelList, and erstwhile Tinder product exec Brian Norgard, Airchat was just relaunched on iOS and Android. Continue reading Audio-First Social Platform Airchat Has Successful Relaunch

Google Offers Public Preview of Gemini Pro for Cloud Clients

Google is moving its most powerful artificial intelligence model, Gemini 1.5 Pro, into public preview for developers and Google Cloud customers. Gemini 1.5 Pro includes what Google claims is a breakthrough in long context understanding, with the ability to run 1 million tokens of information “opening up new possibilities for enterprises to create, discover and build using AI.” Gemini’s multimodal capabilities allow it to process audio, video, text, code and more, which when combined with long context, “enables enterprises to do things that just weren’t possible with AI before,” according to Google. Continue reading Google Offers Public Preview of Gemini Pro for Cloud Clients

OpenAI Integrates New Image Editor for DALL-E into ChatGPT

OpenAI has updated the editor for DALL-E, the artificial intelligence image generator that is part of the ChatGPT premium tiers. The update, based on the DALL-E 3 model, makes it easier for users to adjust their generated images. Shortly after DALL-E 3’s September debut, OpenAI integrated it into ChatGPT, enabling paid subscribers to generate images from text or image prompts. The new DALL-E editor interface lets users edit images “by selecting an area of the image to edit and describing your changes in chat” without using the selection tool. Desired changes can also be prompted “in the conversation panel,” according to OpenAI. Continue reading OpenAI Integrates New Image Editor for DALL-E into ChatGPT

Deepgram’s Speech Portfolio Now Includes Human-Like Aura

Deepgram’s new Aura software turns text into generative audio with a “human-like voice.” The 9-year-old voice recognition company has raised nearly $86 million to date on the strength of its Voice AI platform. Aura is an extremely low-latency text-to-speech voice AI that can be used for voice AI agents, the company says. Paired with Deepgram’s Nova-2 speech-to-text API, developers can use it to “easily (and quickly) exchange real-time information between humans and LLMs to build responsive, high-throughput AI agents and conversational AI applications,” according to Deepgram. Continue reading Deepgram’s Speech Portfolio Now Includes Human-Like Aura

Apple Unveils Progress in Multimodal Large Language Models

Apple researchers have gone public with new multimodal methods for training large language models using both text and images. The results are said to enable AI systems that are more powerful and flexible, which could have significant ramifications for future Apple products. These new models, which Apple calls MM1, support up to 30 billion parameters. The researchers identify multimodal large language models (MLLMs) as “the next frontier in foundation models,” which exceed the performance of LLMs and “excel at tasks like image captioning, visual question answering and natural language inference.” Continue reading Apple Unveils Progress in Multimodal Large Language Models

Grok-1 Architecture Open-Sourced for General Release by xAI

Elon Musk’s xAI has released its Grok chatbot and open-sourced part of the underlying Grok-1 model architecture for any developer or entrepreneur to use for purposes including commercial applications. Musk unveiled Grok in November and announced that it would be publicly released this month. The chatbot itself is available to X social premium members, who can ask the cheeky AI questions and get answers with a snarky attitude inspired by “The Hitchhiker’s Guide to the Galaxy” sci-fi novel. The training for Grok’s foundation LLM is said to include X social posts. Continue reading Grok-1 Architecture Open-Sourced for General Release by xAI

AI Video Startup Haiper Announces Funding and Plans for AGI

London-based AI video startup Haiper has emerged from stealth mode with $13.8 million in seed funding and a platform that generates up to two seconds of HD video from text prompts or images. Founded by alumni from Google DeepMind, TikTok and various academic research labs, Haiper is built around a bespoke foundation model that aims to serve the needs of the creative community while the company pursues a path to artificial general intelligence (AGI). Haiper is offering a free trial of what is currently a web-based user interface similar to offerings from Runway and Pika. Continue reading AI Video Startup Haiper Announces Funding and Plans for AGI

GenAI Lets Snapchat+ Subscribers Create and Share Images

Snapchat+ is rolling out new artificial intelligence features that let subscribers use text prompts to create generative AI images to share with friends. In addition, the Dreams feature, which creates generative AI selfies, is now able to add your friends to those photos. Snapchat+ subscribers get one pack of 8 Dreams per month as part of their $3.99 monthly fee. An onscreen button labeled “AI” lets subscribers access the AI image generator to choose from a menu of prompts (including “sunny day at the beach” and “planet made of cheese”) or they can enter their own descriptions. Continue reading GenAI Lets Snapchat+ Subscribers Create and Share Images

Threads Lets Users Delete Accounts Separate from Instagram

Threads, the Twitter competitor launched in July by Meta Platforms to record-breaking numbers, has added features that make it easier for users to separate their Threads feeds from Instagram and Facebook. Users can now delete their Threads accounts separate from Instagram, something that previously confounded users. Because those signing up for Threads were required to do so either from their existing or a new Instagram account, the two were entwined. Instagram/Threads CEO Adam Mosseri also announced that propagation of Threads posts to Instagram and Facebook can now be turned off, to keep discussions separate. Continue reading Threads Lets Users Delete Accounts Separate from Instagram

Meta’s WhatsApp Launches Voice Chat for Up to 128 People

Meta Platforms-owned instant messaging and VoIP service WhatsApp has updated its Voice Chat feature for mobile so it can now host group calls of up to 128 participants. Voice chats allow WhatsApp users to instantly talk live with members of a group chat while still being able to message within the group. The new feature, which is being compared to a Discord server, is being rolled out globally. The idea is to have the Voice Chat be less disruptive than group calling, which rings-in all group members. Voice chats can be quietly started with an in-chat bubble users tap to join. The updated version will have end-to-end encryption by default. Continue reading Meta’s WhatsApp Launches Voice Chat for Up to 128 People

Woodpecker: Chinese Researchers Combat AI Hallucinations

The University of Science and Technology of China (USTC) and Tencent YouTu Lab have released a research paper on a new framework called Woodpecker, designed to correct hallucinations in multimodal large language AI models. “Hallucination is a big shadow hanging over the rapidly evolving MLLMs,” writes the group, describing the phenomenon as when MLLMs “output descriptions that are inconsistent with the input image.” Solutions to date focus mainly on “instruction-tuning,” a form of retraining that is data and computation intensive. Woodpecker takes a training-free approach that purports to correct hallucinations from the basis of the generated text. Continue reading Woodpecker: Chinese Researchers Combat AI Hallucinations

Yasa-1: Startup Reka Launches New AI Multimodal Assistant

Startup Reka AI is releasing in preview its first artificial intelligence assistant, Yasa-1. The multimodal AI is described as “a language assistant with visual and auditory sensors.” The year-old company says it “trained Yasa-1 from scratch,” including pretraining foundation models “from ground zero,” then aligning them and optimizing to its training and server infrastructures. “Yasa-1 is not just a text assistant, it also understands images, short videos and audio (yes, sounds too),” said Reka AI co-founder and Chief Scientist Yi Tay. Yasa-1 is available via Reka’s APIs and as docker containers for on-site or virtual private cloud deployment. Continue reading Yasa-1: Startup Reka Launches New AI Multimodal Assistant

OpenAI’s ChatGPT Upgraded with ‘Talk’ Tech, Image Search

OpenAI is experimenting with new voice and image capabilities in ChatGPT. According to the company, users can now “speak with ChatGPT and have it talk back,” thanks to an intuitive new interface that, in addition to facilitating voice conversations, will allow users to show ChatGPT an image to discuss. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” OpenAI explains, alternatively suggesting you “snap pictures of your fridge and pantry to figure out what’s for dinner” or have it help with homework based on pictures of a math problem. Continue reading OpenAI’s ChatGPT Upgraded with ‘Talk’ Tech, Image Search

Google Introduces an AI Watermark That Cannot Be Removed

Google DeepMind and Google Cloud have teamed to launch what they claim is an indelible AI watermark tool, which if it works would mark an industry first. Called SynthID, the technique for identifying AI-generated images is being launched in beta. The technology embeds its digital watermark “directly into the pixels of an image, making it imperceptible to the human eye, but detectable for identification,” according to DeepMind. SynthID is being released to a limited number of Google’s Vertex AI customers using Imagen, a Google AI language model that generates photorealistic images. Continue reading Google Introduces an AI Watermark That Cannot Be Removed

Meta’s AudioCraft Turns Words into Music with Generative AI

Meta Platforms is releasing AudioCraft, a generative AI framework that creates “high-quality,” “realistic” audio and music from text prompts. AudioCraft consists of three models: MusicGen, AudioGen and EnCodec, all of which Meta announced it is open-sourcing. Released in June, MusicGen was trained on Meta-owned and licensed music, and generates music from text prompts, while AudioGen, which was trained on public domain samples, generates sound effects (like honking horns and barking dogs) from text prompts. The EnCodec decoder allows “higher quality music generation with fewer artifacts,” according to Meta. Continue reading Meta’s AudioCraft Turns Words into Music with Generative AI