Music Industry Considers Impact of AI as New Tools Emerge

Alphabet is developing an AI tool that would let creators generate music in the voice of famous recording artists. Lyor Cohen, global head of music for Google and its YouTube subsidiary, has reportedly been in discussions with music labels for several months about obtaining the rights to use songs by major artists to train an AI model in this manner. The discussions continue, but not without raising concerns in the music business. Meanwhile, other AI tools are already generating new content, though they face some resistance. The use of artificial intelligence to generate creative works in the style of others is being hashed out in the courts.

ChatGPT Goes Multimodal: OpenAI Adds Vision, Voice Ability

OpenAI began previewing vision capabilities for GPT-4 in March, and the company is now starting to roll out the image input and output to users of its popular ChatGPT. The multimodal expansion also includes audio functionality, with OpenAI proclaiming late last month that “ChatGPT can now see, hear and speak.” The upgrade vaults GPT-4 into the multimodal category with what OpenAI is apparently calling GPT-4V (for “Vision,” though equally applicable to “Voice”). “We’re rolling out voice and images in ChatGPT to Plus and Enterprise users,” OpenAI announced.

OpenAI’s ChatGPT Upgraded with ‘Talk’ Tech, Image Search

OpenAI is experimenting with new voice and image capabilities in ChatGPT. According to the company, users can now “speak with ChatGPT and have it talk back,” thanks to an intuitive new interface that, in addition to facilitating voice conversations, will allow users to show ChatGPT an image to discuss. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” OpenAI explains, alternatively suggesting you “snap pictures of your fridge and pantry to figure out what’s for dinner” or have it help with homework based on pictures of a math problem.

Google’s MusicLM AI Can Generate Tunes from Text Prompts

Google is introducing a new artificial intelligence app called MusicLM that creates music in any style or genre based on text prompts and can translate a whistled melody or casually hummed snippet into instrument sounds. TechCrunch calls the technology “impressive” but says the Alphabet company, “fearing the risks, has no immediate plans to release it,” in recognition of the controversy surrounding AI models trained using copyrighted material. MusicLM was created using a dataset of 280,000 musical hours, resulting in the ability to generate minutes-long songs of “significant complexity.”

CES: Startup Leverages AI to Address Problematic Acoustics

A growing number of companies are working on technologies that strive to make a person’s voice more intelligible to the listener over speakers, headphones, hearing aids and other consumer audio devices. Augmented Hearing, a Danish startup launched two years ago, is one of the more interesting companies at CES 2023 focusing on this space. The firm’s software-based solution runs on iOS, Windows and other CE operating systems. It could mitigate the current trend of people across all age groups turning on closed captioning because they often find video dialogue difficult to understand.

WhatsApp Debuts Communities with End-to-End Encryption

Meta Platforms is globally releasing a major update for WhatsApp called Communities, which doubles the number of group chat members to 1,024, and adds video (and voice) for up to 32. Designed for schools, clubs, churches, the workplace and other organizations, Communities features include support for sub-groups, admin controls and in-chat polls. “We’re aiming to raise the bar for how organizations communicate with a level of privacy and security not found anywhere else,” the company said of the upgrade, stressing end-to-end encryption. In fact, Communities are not publicly discoverable, requiring an invitation.

Meta Says Its AI-Compressed Audio Codec Beats MP3 by 10x

Meta Platforms says its vision for the metaverse will rely heavily on compression technology “to deliver high-quality, uninterrupted experiences for everyone.” With that in mind, it has tasked its Fundamental AI Research (FAIR) lab with developing “hypercompression” solutions. First up is EnCodec, an audio technology it says compresses at 64 kbps, with no loss in quality, and at 10 times the efficiency of MP3. EnCodec has the potential to greatly improve the sound and reliability of speech over low-bandwidth connections (like when your mobile phone is only getting one bar). It also works for music.

Adobe Study: Most Companies Are Investing in Voice Tech

According to a study released by Adobe this week, nine out of 10 companies are currently investing in voice technologies, including things like voice-based commerce. Of the 401 companies surveyed, just over one-fifth have already released a voice app, while 44 percent plan to release one this year. A total of 88 percent are building apps for both Amazon and Google smart speakers and other voice-enabled devices, while only 39 percent are building for Apple’s iOS ecosystem; even fewer are building for Microsoft’s Cortana or Samsung’s Bixby.

Microsoft Shares Vision For Present and Future Productivity

Microsoft recently invited journalists into its Envisioning Center for a peek into its vision of the future, in particular the future of productivity. Inside the center, Microsoft houses some of its prototype work. Journalists witnessed teams working together on giant collaborative screens, meeting in rooms equipped with devices to automatically recognize participants, and doing work at touch-powered desks. Most of the demonstrations revolved around the use of touch, voice, and augmented reality. This marks a new way forward for Microsoft.

AI Firm Shows Multilingual Translator That Fits in Your Pocket

The iFLYTEK Translator 2.0 is a handheld spoken language translator developed with Chinese AI technology and training. The size of a mobile phone, it can translate between any two of 63 languages and is trained in a number of “professional vocabularies.” The device touts a 5-hour battery life and, at $450, would be a useful and affordable business and personal tool. This Chinese tech also raises some interesting privacy and geopolitical issues. In addition to the upgraded Translator 2.0, the company also announced its iFLYREC Series voice-to-text products, AI Note for recording and transcription, and iFLYOS voice-interaction system at CES.

Alexa, Cortana, Watson Execs Discuss Today’s AI Limitations

In what might have been the most popular panel at CES 2018, the executives responsible for three major AI-enabled applications (IBM Watson, Microsoft Cortana and Amazon Alexa) met to dig deep into artificial intelligence today and tomorrow. In a conversation led by Tom’s Guide editorial director Avram Piltch, the three executives stressed that machine learning and AI are nothing new but have, in fact, been the technology behind long-established activities from recommendations to warehouse robots.

Rovi Renames Itself TiVo After Buyout, Launches UX Interface

Rovi has completed its $1.1 billion cash and stock deal to acquire DVR pioneer TiVo and, in an unusual move, announced that it would rename itself after the company it just purchased. The company also unveiled TiVo UX, its new on-screen user experience that integrates programming options from multiple platforms for a seamless search and recommendation interface. The new UI, featuring TiVo’s innovative Prediction tech, is designed to access content from TV and mobile sources more quickly and easily, in an effort to “allow every device to become a primary screen for video consumption.”

EU Proposes Regulations for Online Communication Services

The European Union’s executive arm is poised to propose that online communication services such as Microsoft’s Skype and Facebook’s WhatsApp be regulated similarly to telecoms, a move that telecom executives have long advocated as creating a level playing field. Telecoms would actually prefer that the EU repeal regulations on user privacy among other specifics but, in lieu of that, are content to see their industry-specific regulations extended to online communication services, most of which are currently free.

New Voice-Powered App Takes On Leading Digital Assistants

Santa Clara-based startup SoundHound has developed a voice-powered digital assistant that could take on early players in the field, including Siri, Google Now and Cortana. Like the others, the Hound app (for iOS and Android) allows users to interact via voice so that it can perform requested tasks. However, Hound claims to be faster and smarter than its competitors. The app has been in beta with 150,000 testers since last summer, and is now publicly available along with new Yelp and Uber partnerships for restaurant info and ride hailing from within the app.

New Hound App Could Prove Rival to Siri, Cortana, Google Now

As the battle among tech companies over artificial intelligence and digital assistants heats up, SoundHound released an app this week called “Hound” that promises to enhance voice search with its ability to quickly and efficiently handle complex questions. According to Keyvan Mohajer, SoundHound founder and chief exec, Hound has a leg up on the competition since it performs voice recognition and natural-language processing in a single step, as opposed to translating speech to text and then performing a search using that text.