New AI-Based Google System Converts Webpages to Video

Google announced it has developed URL2Video, an AI-enabled system that automatically converts webpages into short videos by extracting text and images. The system also harvests design styles such as colors, fonts, graphics and layouts from HTML sources and organizes all the elements into a sequence of shots that looks and feels similar to the original webpage. Google is targeting businesses with websites for their products and services, enabling them to easily create marketing videos out of existing resources.

VentureBeat reports that “a typical video costs between $880 and $1,200 and can take days to weeks to produce,” but Google’s AI-based offering could simplify the process and reduce costs. Google, which first presented URL2Video at the 2020 User Interface Software and Technology Symposium, said the system “automatically selects key content from a page and decides the temporal and visual presentation of each asset.”

It is based on “a set of heuristics identified through a study with designers … and [captures] video editing styles including content hierarchy, constraining the amount of information in a shot and its time duration while providing consistent color and style for branding.”

URL2Video, relying on raw assets as well as HTML tags, CSS styles and rendered locations, “extracts document object model information and multimedia materials on a per-webpage basis, identifying visually distinguishable elements as a candidate list of asset groups containing headings, product images, descriptions, and call-to-action buttons.” It then “ranks the asset groups by assigning each a priority score based on their visual appearance and annotations, with the asset group that occupies a larger area at the top of the page [receiving] a higher score.”
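The ranking step can be illustrated with a short sketch. This is not Google’s implementation: the `AssetGroup` fields, the `annotation_bonus` weight and the scoring formula are all assumptions made for illustration. It only reflects the stated idea that a group occupying a larger area nearer the top of the page receives a higher score.

```python
from dataclasses import dataclass


@dataclass
class AssetGroup:
    """Hypothetical record for one visually distinguishable page element."""
    name: str
    y: float                        # rendered top edge, in pixels from the page top
    width: float                    # rendered width in pixels
    height: float                   # rendered height in pixels
    annotation_bonus: float = 0.0   # assumed extra weight from tags or annotations


def priority_score(group: AssetGroup, page_width: float, page_height: float) -> float:
    """Assumed heuristic: larger area and a higher position both raise the score."""
    area_fraction = (group.width * group.height) / (page_width * page_height)
    top_proximity = 1.0 - min(group.y / page_height, 1.0)
    return area_fraction + top_proximity + group.annotation_bonus


def rank_asset_groups(groups, page_width, page_height):
    """Return the groups sorted from highest to lowest priority."""
    return sorted(
        groups,
        key=lambda g: priority_score(g, page_width, page_height),
        reverse=True,
    )


# Illustrative page: a large hero image at the top and a small footer link.
example_groups = [
    AssetGroup("footer_link", y=2800, width=1200, height=200),
    AssetGroup("hero_image", y=0, width=1200, height=600),
]
ranked = rank_asset_groups(example_groups, page_width=1200, page_height=3000)
print([g.name for g in ranked])
```

In this sketch the hero image outranks the footer link because it covers more of the page and sits at the top; a real system would presumably weigh many more signals.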

To keep the video concise, the system “presents only dominant elements from a page, such as a headline and a few multimedia assets, and constrains the duration of elements.” URL2Video “transfers the layout of elements into the video’s aspect ratio and applies the style choices including fonts and colors, adjusting the presentation timing of assets and rendering the content into an MPEG-4 video.”
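A rough sketch of two of these ideas, mapping an element’s layout into the video’s aspect ratio and capping how long it stays on screen, might look like the following. The function names, the letterbox-style scaling and the one-to-five-second limits are assumptions for illustration, not details from Google’s description.

```python
def fit_to_frame(box, source_size, frame_size):
    """Scale an (x, y, width, height) box from source coordinates into a video frame.

    Assumed approach: uniform scaling that preserves the aspect ratio,
    with the scaled content centered in the frame.
    """
    source_w, source_h = source_size
    frame_w, frame_h = frame_size
    scale = min(frame_w / source_w, frame_h / source_h)
    offset_x = (frame_w - source_w * scale) / 2
    offset_y = (frame_h - source_h * scale) / 2
    x, y, w, h = box
    return (x * scale + offset_x, y * scale + offset_y, w * scale, h * scale)


def clamp_duration(seconds, min_seconds=1.0, max_seconds=5.0):
    """Keep an element's on-screen time within assumed lower and upper bounds."""
    return max(min_seconds, min(seconds, max_seconds))


# Illustrative use: a 960x480 page region placed into a 1920x1080 frame.
placed = fit_to_frame((0, 0, 960, 480), (960, 480), (1920, 1080))
shown_for = clamp_duration(10.0)
print(placed, shown_for)
```

Uniform scaling keeps fonts and images from being stretched, at the cost of unused space when the page region and the video frame have different proportions.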

The system was tested with designers at Google, and it “effectively extracted elements from a webpage and supported the designers by bootstrapping the video creation process.” Google research scientists Peggy Chi and Irfan Essa wrote that “while this current research focuses on the visual presentation, we are developing new techniques that support the audio track and a voiceover in video editing.”

“We envision a future where creators focus on making high-level decisions and an ML model interactively suggests detailed temporal and graphical edits for a final video creation on multiple platforms,” they added.

A link to Google’s white paper can be found here.
