
AI system learns to model how fabrics interact by watching videos

In a paper published on the preprint server arXiv.org, researchers at MIT CSAIL, Nvidia, the University of Washington, and the University of Toronto describe an AI system that learns the physical interactions affecting materials like fabric by watching videos. They claim the system can extrapolate to interactions it hasn’t seen before, such as those involving multiple shirts and pants, enabling it to make long-term predictions.

Causal understanding is the basis of counterfactual reasoning: imagining possible alternatives to events that have already happened. For example, given an image of a pair of balls connected by a spring, counterfactual reasoning would entail predicting how the balls would move if the spring were removed or replaced with a stiffer one.

The researchers’ system, a Visual Causal Discovery Network (V-CDN), infers interactions with three modules: one for visual perception, one for structure inference, and one for dynamics prediction. The perception module is trained to extract keypoints (areas of interest) from videos; from these keypoints, the inference module identifies the variables that govern the interactions between each pair. The dynamics module, a graph neural network, then learns to predict the keypoints’ future movements on the structure the inference module discovers.
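The article doesn’t include code, but a rough sketch can make the division of labor concrete. In the PyTorch-style example below, every class name, architecture, and tensor shape is an illustrative assumption rather than the authors’ implementation; it only shows how the three modules could hand data to one another.

```python
import torch
import torch.nn as nn

# A minimal sketch of a three-module pipeline in the spirit of V-CDN.
# All class names, architectures, and shapes are illustrative assumptions.

class PerceptionModule(nn.Module):
    """Extracts K (x, y) keypoints from each video frame."""
    def __init__(self, num_keypoints=32):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_keypoints * 2),
        )

    def forward(self, frames):                     # frames: (T, 3, H, W)
        return self.encoder(frames).view(-1, self.num_keypoints, 2)

class InferenceModule(nn.Module):
    """Infers edge weights between every pair of keypoints from their
    observed trajectories, a stand-in for the discovered structure."""
    def __init__(self, hidden=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(8, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, keypoints):                  # keypoints: (T, K, 2)
        # Summarize each trajectory by its last and mean positions.
        feats = torch.cat([keypoints[-1], keypoints.mean(0)], dim=-1)
        K = feats.size(0)
        src = feats.unsqueeze(1).expand(-1, K, -1)
        dst = feats.unsqueeze(0).expand(K, -1, -1)
        pairs = torch.cat([src, dst], dim=-1)      # (K, K, 8)
        return torch.sigmoid(self.edge_mlp(pairs)).squeeze(-1)  # (K, K)

class DynamicsModule(nn.Module):
    """Graph-network-style predictor: aggregates messages along the
    inferred edges to predict each keypoint's next position."""
    def __init__(self, hidden=64):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(4, hidden), nn.ReLU())
        self.update = nn.Linear(hidden + 2, 2)

    def forward(self, keypoints, graph):           # (K, 2), (K, K)
        K = keypoints.size(0)
        src = keypoints.unsqueeze(1).expand(-1, K, -1)
        dst = keypoints.unsqueeze(0).expand(K, -1, -1)
        messages = self.msg(torch.cat([src, dst], dim=-1))      # (K, K, H)
        incoming = (graph.unsqueeze(-1) * messages).sum(dim=0)  # (K, H)
        delta = self.update(torch.cat([incoming, keypoints], dim=-1))
        return keypoints + delta                   # residual position update

# Usage: observe frames, infer the graph once, then roll dynamics forward.
perception, inference, dynamics = PerceptionModule(), InferenceModule(), DynamicsModule()
frames = torch.randn(10, 3, 64, 64)                # ten observed frames
keypoints = perception(frames)                     # (10, K, 2)
graph = inference(keypoints)                       # (K, K) edge weights
next_pos = dynamics(keypoints[-1], graph)          # keypoints at t+1
```

In this sketch the graph is inferred once from the observed frames and then held fixed while the dynamics module rolls predictions forward one step at a time, which mirrors how a keypoint-and-graph factorization can support the long-term predictions described above.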

The researchers studied V-CDN in a simulated environment containing fabric of various shapes: shirts, pants, and towels of varying appearances and lengths. They applied forces on the contours of the fabrics to deform them and move them around, with the goal of producing a single model that could handle fabrics of different types and shapes.
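The article doesn’t detail the simulator, but the setup it describes, applying forces at the contour of a piece of fabric to deform it, can be illustrated with a toy mass-spring cloth model. The grid size, stiffness, damping, and time step below are assumptions for the sketch, not values from the paper.

```python
import numpy as np

# Toy mass-spring cloth: a grid of particles joined to grid neighbors.
# Purely illustrative; the paper's own simulator is not described here.

N = 8                                  # cloth resolution (N x N particles)
rest_len = 1.0 / (N - 1)               # rest length of each spring
k_spring, damping, dt = 500.0, 0.98, 1e-3

# Particle positions on a unit grid, zero initial velocity.
xs, ys = np.meshgrid(np.linspace(0, 1, N), np.linspace(0, 1, N))
pos = np.stack([xs, ys], axis=-1).reshape(-1, 2)
vel = np.zeros_like(pos)

# Springs between horizontally and vertically adjacent particles.
idx = np.arange(N * N).reshape(N, N)
springs = np.concatenate([
    np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1),  # horizontal
    np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1),  # vertical
])

def step(pos, vel, external):
    """One explicit-Euler step with spring forces plus external forces."""
    force = external.copy()
    d = pos[springs[:, 1]] - pos[springs[:, 0]]
    length = np.linalg.norm(d, axis=1, keepdims=True) + 1e-9
    f = k_spring * (length - rest_len) * d / length   # Hooke's law
    np.add.at(force, springs[:, 0], f)
    np.add.at(force, springs[:, 1], -f)
    vel = damping * (vel + dt * force)
    return pos + dt * vel, vel

# Pull the cloth's right edge outward, a force applied at the contour.
external = np.zeros_like(pos)
external[idx[:, -1]] = [50.0, 0.0]
for _ in range(200):
    pos, vel = step(pos, vel, external)
```

A system like V-CDN would only ever see rendered frames of such a simulation; the spring constants and connectivity are exactly the hidden variables its inference module has to estimate from video.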

The results show that V-CDN’s performance improved as it observed more video frames, the researchers report, consistent with the intuition that more observations provide a better estimate of the variables governing the fabrics’ behavior. “The model neither assumes access to the ground truth causal graph, nor … the dynamics that describes the effect of the physical interactions,” they wrote. “Instead, it learns to discover the dependency structures and model the causal mechanisms end-to-end from images in an unsupervised way, which we hope can facilitate future studies of more generalizable visual reasoning systems.”

The researchers are careful to note that V-CDN doesn’t solve the grand challenge of causal modeling. Rather, they see their work as an initial step toward the broader goal of building physically grounded “visual intelligence” capable of modeling dynamic systems. “We hope to draw people’s attention to this grand challenge and inspire future research on generalizable physically grounded reasoning from visual inputs without domain-specific feature engineering,” they wrote.
