NASA's Jet Propulsion Laboratory has integrated Meta's DINOv2 vision foundation model into robotic explorers designed for future deep space missions, reducing total model parameter count by 67% while maintaining perception capabilities comparable to those of larger rovers. The implementation addresses critical constraints facing next-generation planetary exploration robots, which must operate autonomously with reduced size, lower power capacity, and limited onboard compute resources.
JPL engineers developed a Visual Perception Engine built on Meta's open-source DINOv2 model, released in April 2023 by Meta Fundamental AI Research. The system enables efficient reuse of vision foundation model features across multiple tasks through a common backbone while minimising feature copies and reducing GPU compute and memory requirements. The framework is available to the open source community on GitHub with a Robot Operating System (ROS) interface.
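To make the shared-backbone idea concrete, the sketch below shows one frozen DINOv2 encoder feeding several lightweight task heads, so image features are extracted only once per frame. This is a minimal illustration assuming PyTorch and the publicly available DINOv2 ViT-S/14 checkpoint; the head architectures and sizes are assumptions for illustration, not JPL's actual VPEngine code.

```python
# Minimal sketch (illustrative, not JPL's VPEngine): one frozen DINOv2 backbone
# shared by several small task heads, so features are computed once per image.
import torch
import torch.nn as nn

class SharedBackbonePerception(nn.Module):
    def __init__(self, embed_dim: int = 384):  # 384 = ViT-S/14 embedding width
        super().__init__()
        # DINOv2 ViT-S/14 from the official repo, frozen as a fixed feature extractor.
        self.backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Lightweight task-specific heads consuming the same features
        # (hypothetical architectures, not the flight heads).
        self.depth_head = nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        self.seg_head = nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, 16))

    @torch.no_grad()
    def forward(self, image: torch.Tensor) -> dict:
        # Patch tokens are computed once and reused by every head.
        tokens = self.backbone.forward_features(image)["x_norm_patchtokens"]
        return {
            "depth": self.depth_head(tokens),       # per-patch depth estimate
            "segmentation": self.seg_head(tokens),  # per-patch class logits
        }

# Example usage: image sides must be multiples of the 14-pixel patch size.
# model = SharedBackbonePerception(); out = model(torch.randn(1, 3, 224, 224))
```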
The team tested their approach using the NeBula-Spot robot, equipped with an inertial measurement unit (IMU) and an RGB camera feeding data to an Nvidia Jetson AGX Orin onboard computer. Traditional approaches required separate machine learning models for different tasks, creating computational bottlenecks that prevented real-time operation on limited hardware. The new Visual Perception Engine streamlines processing by retaining DINOv2-extracted features in GPU memory and sharing them across multiple tasks, enabling parallel deployment of smaller model heads.
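One way to realise this pattern, sketched below under the assumption of PyTorch and CUDA streams on the Orin's GPU, is to keep the extracted feature tensor resident in GPU memory and fan it out to independent heads without copying it; the released VPEngine may schedule its heads differently, so treat this as a sketch of the concept rather than the actual implementation.

```python
# Sketch of feature sharing across parallel heads (assumed API, not VPEngine's).
import torch

@torch.no_grad()
def run_heads_shared_features(backbone, heads: dict, image: torch.Tensor) -> dict:
    """Extract DINOv2 features once, then fan them out to several task heads."""
    device = image.device  # assumes a CUDA device such as the Jetson AGX Orin
    feats = backbone.forward_features(image)["x_norm_patchtokens"]  # stays resident on the GPU
    current = torch.cuda.current_stream(device)
    streams = {name: torch.cuda.Stream(device=device) for name in heads}
    outputs = {}
    for name, head in heads.items():
        streams[name].wait_stream(current)      # ensure features are ready before the head runs
        with torch.cuda.stream(streams[name]):
            outputs[name] = head(feats)         # every head reads the same tensor, no copies
    for s in streams.values():
        current.wait_stream(s)                  # rejoin the default stream before using outputs
    return outputs
```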
The system addresses critical robotic functions including depth measurements for mapping, object detection for science objectives, and segmentation capabilities for object interaction. Unlike task-specific models, each of which must extract its own vision features, the unified approach increases overall image throughput while decreasing the total parameter count.
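For intuition only, the arithmetic below shows why sharing one backbone shrinks the total parameter count; the model sizes are illustrative assumptions, not JPL's published figures, and the resulting percentage will differ with real head and backbone sizes.

```python
# Illustrative arithmetic with assumed model sizes (not JPL's figures):
# three standalone models each carry their own backbone; the shared design
# pays for the backbone only once.
backbone_params = 22_000_000   # e.g. a ViT-S/14-sized encoder
head_params = 1_000_000        # a lightweight task head
tasks = 3

separate_total = tasks * (backbone_params + head_params)  # 69,000,000
shared_total = backbone_params + tasks * head_params      # 25,000,000
reduction = 1 - shared_total / separate_total
print(f"{reduction:.0%} fewer parameters")                # ~64% with these assumed numbers
```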
JPL has implemented additional learning capabilities enabling robots to adapt to unknown terrain through vision and terrain interaction. Using DINOv2 features combined with robot power usage data, the system learns to identify terrain traversal costs in real time, helping it avoid treacherous conditions such as smooth, soft sand that could trap rovers. This capability addresses historical challenges, including the 2005 incident in which the Mars rover Opportunity became stuck in sand, requiring five weeks of Earth-based commands sent across 108 million miles to free it.
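The general idea can be sketched as an online regression from pooled visual features to measured power draw, so the robot can predict the energy cost of terrain it is about to cross. The names, pooling choice, and SGD setup below are assumptions for illustration; JPL's actual traversal-cost learner is not described in this detail.

```python
# Hedged sketch of terrain-cost learning (illustrative, not JPL's algorithm):
# regress measured power draw onto pooled DINOv2 patch features.
import torch
import torch.nn as nn

class TraversalCostModel(nn.Module):
    def __init__(self, feat_dim: int = 384):
        super().__init__()
        self.regressor = nn.Linear(feat_dim, 1)  # predicted energy cost for a terrain patch

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        pooled = patch_tokens.mean(dim=1)        # pool patch features into one terrain descriptor
        return self.regressor(pooled).squeeze(-1)

def online_update(model, optimizer, patch_tokens, measured_power_w):
    """One incremental update from a (features, measured power) pair logged while driving."""
    pred = model(patch_tokens)
    loss = nn.functional.mse_loss(pred, measured_power_w)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage (hypothetical data shapes):
# model = TraversalCostModel()
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# loss = online_update(model, optimizer, torch.randn(1, 256, 384), torch.tensor([45.0]))
```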
The technology enables simultaneous deployment of multiple smaller robotic explorers for rapid exploration across multiple planetary bodies while maintaining cost-effectiveness. Communication latencies spanning minutes to hours between Earth and distant spacecraft demand autonomous operation on minimal computational resources, making the 67% parameter reduction crucial for mission success.
Organisations developing autonomous robotics for extreme environments can leverage this open-source framework to achieve equivalent capabilities with significantly reduced computational overhead. The dual-purpose technology applies to Earth-based applications including humanitarian rescue efforts in challenging terrain like cave systems. JPL envisions deployment for extraterrestrial life detection missions where caves provide natural radiation shielding for potential organisms, combining space exploration objectives with terrestrial emergency response capabilities.