At Amazon Lab126, researchers proposed three related AI algorithms to create Outfit-VITON, an image-based virtual try-on system for apparel. The algorithms could form the basis of an assistant that helps a customer shop for clothes by describing a product’s variations, recommending items that go with the one selected, and synthesizing an image of a model wearing the clothes to show how all the items work as an outfit. The algorithms will be presented at the annual IEEE Conference on Computer Vision and Pattern Recognition (CVPR), which will be held virtually this year, June 14-19.
VentureBeat reports that, “with style recommendations and programs like Prime Wardrobe, which allows users to try on clothes and return what they don’t want to buy, the retailer is vying for a larger slice of sales in a declining apparel market while surfacing products that customers might not normally choose.”
Outfit-VITON “can be trained on a single picture using a generative adversarial network (GAN) … a type of model with a component called a discriminator that learns to distinguish generated items from real images.” The model comprises several parts: “a shape generation model whose inputs are a query image, which serves as the template for the final image; and any number of reference images, which depict clothes that will be transferred to the model from the query image.”
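The adversarial setup the researchers describe can be illustrated with a minimal sketch: a discriminator scores images as real or generated, and the generator is trained to fool it. Everything here — the logistic discriminator, the image sizes, the loss terms — is an illustrative assumption, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(images, weights):
    """Score each flattened image; a sigmoid maps the logit to P(real)."""
    logits = images.reshape(len(images), -1) @ weights
    return 1.0 / (1.0 + np.exp(-logits))

def adversarial_losses(real, fake, weights):
    """Binary cross-entropy terms for one GAN step: the discriminator
    pushes real scores toward 1 and generated scores toward 0, while
    the generator is trained to make its outputs score as real."""
    eps = 1e-9
    p_real = discriminator(real, weights)
    p_fake = discriminator(fake, weights)
    d_loss = -np.mean(np.log(p_real + eps)) - np.mean(np.log(1 - p_fake + eps))
    g_loss = -np.mean(np.log(p_fake + eps))
    return d_loss, g_loss

# Toy 8x8 "images": real ones happen to be brighter than generated ones.
real = rng.uniform(0.5, 1.0, size=(16, 8, 8))
fake = rng.uniform(0.0, 0.5, size=(16, 8, 8))
weights = rng.normal(size=64) * 0.1
d_loss, g_loss = adversarial_losses(real, fake, weights)
print(d_loss > 0 and g_loss > 0)  # both cross-entropy losses are positive
```

In a real GAN these two losses would drive alternating gradient updates to the discriminator and the generator.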
The shape representation then “moves to a second model — the appearance generation model — that encodes information about texture and color, producing a representation that’s combined with the shape representation to create a photo of the person wearing the garments.” A third model “fine-tunes the variables of the appearance generation model to preserve features like logos or distinctive patterns without compromising the silhouette.”
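The three-stage pipeline described above — shape, appearance, then fine-tuning — can be sketched as a sequence of functions. The interfaces and internals below are assumptions for illustration only (the real models are learned neural networks operating on full photos, not these toy placeholders).

```python
import numpy as np

rng = np.random.default_rng(1)

H = W = 16  # toy resolution; the real system works on full photos

def shape_model(query_person, reference_garments):
    """Stage 1 (assumed interface): produce a segmentation map that
    conforms the reference garments' shapes to the query person."""
    # One channel per garment plus background; argmax picks a label per pixel.
    n = len(reference_garments) + 1
    scores = rng.random((n, H, W))  # placeholder for learned per-pixel scores
    return scores.argmax(axis=0)    # integer label map, shape (H, W)

def appearance_model(segmentation, reference_garments):
    """Stage 2: encode texture/color of each garment and paint it into
    the region the segmentation map assigns to it."""
    out = np.zeros((H, W, 3))
    for label, garment in enumerate(reference_garments, start=1):
        mean_color = garment.reshape(-1, 3).mean(axis=0)
        out[segmentation == label] = mean_color
    return out

def finetune(output, garment_detail, strength=0.2):
    """Stage 3: online fine-tuning nudges the output toward fine details
    (logos, prints) without changing the silhouette."""
    return (1 - strength) * output + strength * garment_detail

query = rng.random((H, W, 3))
garments = [rng.random((H, W, 3)) for _ in range(2)]
seg = shape_model(query, garments)
image = appearance_model(seg, garments)
image = finetune(image, garments[0])
print(image.shape)  # (16, 16, 3)
```

The key design point the paper highlights is that shape and appearance are handled by separate models, so garment geometry can be corrected before texture is committed.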
“Our approach generates a geometrically correct segmentation map that alters the shape of the selected reference garments to conform to the target person,” explained the researchers.
Another paper to be presented looks at “the challenge of using text to refine an image that matches a customer-provided query.” That requires the system to translate requests like “something more formal” or “change the neck style” into edits that follow the instruction while preserving the rest of the image’s features.
Models are trained on triplets of inputs: “a source image, a textual revision, and a target image that matches the revision.” The inputs [also] “pass through three different sub-models in parallel, and at distinct points in the pipeline, the representation of the source image is fused with the representation of text before it’s correlated with the representation of the target image.”
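The triplet pipeline above — three parallel sub-models, with the source-image representation fused with the text representation before being correlated with the target — can be sketched as follows. The linear encoders, feature dimensions, and additive fusion are assumptions; the paper's actual sub-models and fusion points are not specified in this article.

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 32

def encode(x, weights):
    """Stand-in encoder: a linear map followed by L2 normalization."""
    v = x @ weights
    return v / np.linalg.norm(v)

# Three parallel sub-models (assumed here to be simple linear encoders).
w_source = rng.normal(size=(64, DIM))
w_text = rng.normal(size=(16, DIM))
w_target = rng.normal(size=(64, DIM))

source_image = rng.random(64)   # e.g. flattened source-image features
text_revision = rng.random(16)  # e.g. embedded "something more formal"
target_image = rng.random(64)

src = encode(source_image, w_source)
txt = encode(text_revision, w_text)
tgt = encode(target_image, w_target)

# Fuse the source representation with the text, then correlate with the
# target; training would pull this score toward 1 for matching triplets.
fused = (src + txt) / np.linalg.norm(src + txt)
score = float(fused @ tgt)  # cosine similarity in [-1, 1]
print(-1.0 <= score <= 1.0)
```

Training on (source, revision, target) triplets would maximize this correlation for matching triplets and minimize it for mismatched ones.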
The last paper “investigates a technique for large-scale fashion data retrieval, where a system predicts an outfit item’s compatibility with other clothing, wardrobe, and accessory items.” “Being able to recommend compatible items at the right moment would improve their shopping experience,” said the researchers. “Our system is designed for large-scale retrieval and outperforms the state-of-the-art on compatibility prediction, fill-in-the-blank, and outfit complementary item retrieval.”
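A large-scale compatibility retrieval system of the kind described can be sketched as embedding lookup plus ranking: each catalogue item gets a learned compatibility embedding, a partial outfit is summarized into a query vector, and every remaining item is scored against it. The catalogue, embedding size, mean-pooling, and dot-product scoring below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical catalogue: each item has a learned compatibility embedding.
catalogue = {f"item_{i}": rng.normal(size=16) for i in range(100)}

def outfit_embedding(item_names):
    """Represent a partial outfit as the mean of its item embeddings."""
    return np.mean([catalogue[n] for n in item_names], axis=0)

def retrieve_complements(outfit_items, k=3):
    """Retrieval sketch: rank every other catalogue item by dot-product
    compatibility with the partial outfit and return the top k."""
    query = outfit_embedding(outfit_items)
    scores = {name: float(vec @ query)
              for name, vec in catalogue.items()
              if name not in outfit_items}
    return sorted(scores, key=scores.get, reverse=True)[:k]

suggestions = retrieve_complements(["item_1", "item_7"], k=3)
print(len(suggestions))  # 3
```

Because scoring reduces to vector operations against precomputed embeddings, this style of system scales to large catalogues with approximate nearest-neighbor indexes, which is what makes it suitable for retrieval tasks like fill-in-the-blank and complementary-item suggestion.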
For additional details, a PDF version of “Image Based Virtual Try-on Network from Unpaired Data” is available online.
Due to the coronavirus pandemic, IEEE’s CVPR will be held virtually next week. Visit the CVPR site for a full list of workshops, tutorials, and conference offerings.