OpenAI’s Point-E Offers a New Take on Text-to-3D Modeling

By Paula Parisi
January 3, 2023

In the wake of overwhelming public response to recent offerings DALL-E 2 and ChatGPT, OpenAI this week introduced Point-E, a text-to-3D model generator that is garnering positive feedback. Faster and less resource-intensive than comparable systems, it is still in the early stages and prone to occasional disjointed results, but it has advanced the proposition. Using a single Nvidia V100 GPU, Point-E can create a 3D model in under two minutes by generating “point clouds,” data sets of discrete points in space that represent a 3D shape. Point clouds compute more easily than the wireframe meshes traditionally used to model 3D objects.

While point clouds speed the process, that efficiency comes at the expense of detail in shape and texture. “To get around this limitation, the Point-E team trained an additional AI system to convert Point-E’s point clouds to meshes,” TechCrunch explains.
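
To make the trade-off concrete, here is a minimal Python sketch of the two representations. It is an illustration only and does not use Point-E's code; the `PointCloud` and `Mesh` classes and the `bounding_box` helper are hypothetical names chosen for this example. A point cloud stores only positions, while a mesh must also record how its vertices connect into faces, which is where surface detail lives.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class PointCloud:
    # Hypothetical container: just a list of (x, y, z) positions, no connectivity.
    points: List[Point]


@dataclass
class Mesh:
    # Hypothetical container: vertices plus triangles that index into them.
    vertices: List[Point]
    faces: List[Tuple[int, int, int]]


def bounding_box(cloud: PointCloud) -> Tuple[Point, Point]:
    """Return the (min corner, max corner) of the box enclosing the cloud."""
    xs = [p[0] for p in cloud.points]
    ys = [p[1] for p in cloud.points]
    zs = [p[2] for p in cloud.points]
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Four corner points of a tetrahedron, stored as a bare point cloud.
corners = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
cloud = PointCloud(points=corners)

# The same shape as a mesh needs four triangles to describe its surface.
mesh = Mesh(vertices=corners, faces=[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)])

print(bounding_box(cloud))
print(len(cloud.points), "points;", len(mesh.faces), "faces")
```

The cloud alone says where the shape is but not what its surface looks like between points, which is the gap the additional cloud-to-mesh model described above is meant to fill.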

In addition to the mesh-generating model, Point-E combines a text-to-image model with an image-to-3D model. “Currently, our pipeline requires synthetic renderings, but this limitation could be lifted in the future by training 3D generators that condition on real-world images,” the Point-E developers wrote in a paper published through Cornell University.

Conceding their method produces shapes “at a relatively low resolution,” the team concludes that with further work, “extending this method to produce high-quality 3D representations such as meshes or NeRFs” could result in a more polished look.

The concept of AI-powered text-to-3D model generators is tantalizing to the entertainment industry as a stepping stone to 3D-video image generators that could reduce the cost of high-end 3D animation. After researching the space, Google announced DreamFusion in October. That and similar platforms offer more detail but are slower and require more compute power.

Text-to-2D image generators like DALL-E 2 and Stability AI’s Stable Diffusion were trained on labeled images to learn associations between words and visual concepts. While text-to-3D evolved from that research, Point-E “was fed a set of images paired with 3D objects so that it learned to effectively translate between the two,” TechCrunch reports. The code is posted on GitHub.

“Creating photorealistic 3D images is still a resource and time consuming process, despite Nvidia’s work to automate object generation and Epic Games’ RealityCapture mobile app, which allows anyone with an iOS phone to scan real-world objects as 3D images,” Engadget reports.

Nvidia’s Magic3D is dissected by Ars Technica, which also lists “rudimentary text-to-video generators from Google and Meta” as advancing the cause. Disney’s recently unveiled FRAN also leverages some of the same concepts.

In addition to eventually being used for game development and animation workflows, “Point-E’s point clouds could be used to fabricate real-world objects, for example through 3D printing,” TechCrunch says.

Related:
What to Expect from AI in 2023, TechCrunch, 12/26/22
AI Creation Tools Will Change the Way We Create, Engage and Interact in 2023, Social Media Today, 12/20/22
There’s Now an Open Source Alternative to ChatGPT, but Good Luck Running It, TechCrunch, 12/30/22
AI-Assisted Plagiarism? ChatGPT Bot Says It Has an Answer for That, The Guardian, 12/31/22
How the ChatGPT Watermark Works and Why It Could Be Defeated, Search Engine Journal, 12/30/22
