
Meta is expanding the Segment Anything family with SAM 3D, shifting the focus to 3D reconstruction from natural images. The release comprises two models: SAM 3D Objects for object and scene reconstruction, and SAM 3D Body for capturing human body pose and shape. The goal is robust 3D perception in real-world environments, suitable for creative applications in areas such as robotics, interactive media, and sports analysis. This combination of datasets and models offers a new option for workflows that derive 3D meshes from image data.
SAM 3D Objects addresses a known bottleneck: the scarcity of real-world 3D data. Until now, mostly isolated synthetic assets were available, often a single high-resolution object against a plain background. Meta combines a scalable data-engine approach with a multi-stage training recipe: annotators rank several model-generated mesh candidates, and difficult cases are routed to specialized 3D artists. The result is around 3.14 million meshes with shape, texture, and layout for almost one million images.
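The annotation loop described above can be sketched as a simple model-in-the-loop pipeline. This is a hypothetical illustration only: the function names (`generate_candidates`, `annotator_rank`, `artist_fix`) and the confidence threshold are placeholders, not Meta's actual data-engine API.

```python
def data_engine_round(images, generate_candidates, annotator_rank, artist_fix,
                      confidence_threshold=0.5):
    """Hypothetical sketch of the data engine described in the article:
    models propose mesh candidates, human annotators rank them, and
    low-confidence cases are escalated to specialized 3D artists."""
    accepted = []
    for image in images:
        candidates = generate_candidates(image)        # model proposals
        best, confidence = annotator_rank(candidates)  # human ranking step
        if confidence < confidence_threshold:
            best = artist_fix(image)                   # expert escalation
        accepted.append((image, best))
    return accepted
```

The design point is the escalation path: cheap human ranking handles the bulk of the data, while expensive expert modeling is reserved for the cases the models and annotators cannot resolve.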
The training strategy uses synthetic assets for pre-training and then relies on post-training with real-world image data to narrow the gap between simulation and reality. In parallel, Meta is releasing SAM 3D Artist Objects (SA-3DAO), an evaluation dataset of challenging everyday scenes intended as a benchmark for visually grounded 3D reconstruction. According to Meta, SAM 3D Objects significantly outperforms competing methods and delivers dense, textured reconstructions in seconds, making it suitable for near-real-time scenarios. Meta does, however, describe limitations in fine detail, full-body reconstructions, and the understanding of physical relationships between objects.
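The two-stage recipe can be summarized in pseudocode-like form. Again, this is a hedged sketch under assumed names (`train_step`, `two_stage_training`, the learning rates); the source does not specify Meta's actual training code.

```python
def train_step(model_state, batch, lr):
    # Placeholder update: in practice this would be a gradient step.
    model_state["steps"] += 1
    model_state["domain_counts"][batch["domain"]] += 1
    return model_state

def two_stage_training(synthetic_batches, real_batches):
    """Hypothetical sketch of the recipe described above: pre-train on
    synthetic assets, then post-train on real-world images (typically at
    a lower learning rate) to narrow the sim-to-real gap."""
    state = {"steps": 0, "domain_counts": {"synthetic": 0, "real": 0}}
    for batch in synthetic_batches:          # stage 1: synthetic pre-training
        state = train_step(state, batch, lr=1e-4)
    for batch in real_batches:               # stage 2: real-world post-training
        state = train_step(state, batch, lr=1e-5)
    return state
```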
SAM 3D Body aims to recover precise 3D body pose and shape from a single image, even with occluded body parts, unusual postures, or multiple people in the frame. The model can optionally be steered by prompts such as segmentation masks or 2D keypoints. It is built on the new Meta Momentum Human Rig (MHR), which separates skeletal structure from soft-tissue shape. A transformer encoder-decoder processes high-resolution image details and outputs the parameters of the MHR mesh.
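A promptable interface of this kind might look as follows. The function name, argument names, and return structure are illustrative assumptions, not Meta's published API; the point is only that the image alone suffices, while masks or 2D keypoints can optionally steer the prediction.

```python
def reconstruct_body(image, mask=None, keypoints_2d=None):
    """Hypothetical prompt interface for a SAM-3D-Body-style model:
    the image is the only required input; a segmentation mask or 2D
    keypoints act as optional guidance prompts."""
    prompts = {}
    if mask is not None:
        prompts["mask"] = mask
    if keypoints_2d is not None:
        prompts["keypoints_2d"] = keypoints_2d
    # A real model would return MHR parameters (skeleton + soft tissue);
    # this stub only records which prompts were supplied.
    return {"mhr_params": None, "prompts_used": sorted(prompts)}
```

Usage would be as simple as `reconstruct_body(img)` for fully automatic operation, or `reconstruct_body(img, mask=person_mask)` to disambiguate one person in a crowded scene.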
For training, Meta compiled a dataset of approximately eight million images, drawn from billions of photos, multi-camera capture systems, and synthetic material. An automated data-engine pipeline selects images with rare poses and difficult capture conditions. SAM 3D Body is said to outperform existing models on several benchmarks, but still shows weaknesses in person-object interactions and in hand poses. MHR is available under an open commercial license. Both models can be tried with your own images in the Segment Anything Playground, for example for interactive 3D scenes based on real-world footage.
You can find out more directly from Meta.



















