
Job Description
About the Role
We are looking for a Data Engineer to build and operate the backbone of our robotics data infrastructure. In this role, you will design and maintain scalable data pipelines that collect, process, and store large volumes of multimodal data generated by robots at the edge.
You will work closely with cross-functional teams including Vision, Conversation AI, and Robotics Engineering to ensure high-quality data flows into centralized systems for training, analysis, and intelligent querying.
Key Responsibilities
- Build and Maintain Data Pipelines
Design and implement end-to-end data pipelines that ingest processed data from edge devices (robots) and deliver it to centralized storage and processing systems.
Ensure reliable, scalable, and efficient data flow across different layers of the system architecture.
- Manage Knowledge Databases
Deploy and optimize vector databases and graph databases to manage metadata and vectorized multimodal data (audio, text, video).
Enable efficient and intelligent data retrieval for downstream AI systems.
- Ensure Data Quality
Collaborate with internal teams (e.g., Conversation AI, Vision) to implement high-quality data filtering and distillation pipelines.
Support the development of robust processes for large-scale data processing and refinement.
- Security and Monitoring
Implement access control, monitoring, and alerting systems to ensure secure and stable data operations across multiple sites.
Monitor pipeline health and system performance to maintain reliability.
Your Skills and Experience
Technical Requirements
- Programming
Strong proficiency in Python and SQL for building and maintaining automated data pipelines.
- Cloud Infrastructure
Hands-on experience with AWS, particularly EC2 and S3, including compute and storage resource management.
- Databases
Experience with vector databases such as Qdrant, Pinecone, or similar technologies.
Familiarity with handling multimodal data (e.g., video, LiDAR, robot state data).
Experience with MCAP or similar robotics data formats is a plus.
- Systems & Infrastructure
Solid understanding of distributed systems, large-scale data processing, and data synchronization mechanisms.
Expected Outcomes
Build a complete data pipeline infrastructure connecting cloud databases, processing servers, and local storage.
Enable:
- Large-scale data movement with fast retrieval
- Efficient data querying and visualization
Successfully implement the data distillation pipeline to produce high-quality datasets for downstream AI systems.
Why You'll Love Working Here
- Competitive Compensation
- World-Class Team in Humanoid Robotics
- Cutting-Edge Humanoid Robot Products