Maximilian Sieb

I am a Founding Engineer & Head of AI at an early-stage start-up in San Francisco working on large-language models. Before that, I was working on computer vision and reinforcement & imitation learning for robot manipulation at Covariant.ai.

I graduated from the MSR program at the Robotics Institute at Carnegie Mellon University, where I was glad to have been co-advised by Katerina Fragkiadaki & Oliver Kroemer.
I also got a Bachelors in Mechanical Engineering and a Masters in Computational Engineering from the Technical University Darmstadt.

Google Scholar  /  Twitter  /  GitHub  /  LinkedIn

Research




sosmc

Convolutional Occupancy Models for Dense Packing of Complex, Novel Objects
Nikhil Mishra, Pieter Abbeel, Xi Chen, Maximilian Sieb
IROS, 2023
paper | project page

In this work, we present a fully-convolutional shape completion model, F-CON, which can be easily combined with off-the-shelf planning methods for dense packing in the real world. We also release a simulated dataset, COB-3D-v2, that can be used to train shape completion models for real-word robotics applications, and use it to demonstrate that F-CON outperforms other state-of-the-art shape completion method.

sosmc

Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction
YuXuan Liu, Nikhil Mishra, Maximilian Sieb, Yide Shentu, Pieter Abbeel, Xi Chen
ECCV, 2022
paper | project page

We propose methods for leveraging autoregressive models to make high confidence 3D bounding box predictions, achieving strong results on SUN-RGBD, Scannet, KITTI, and our new dataset, COB-3D. We release this simulated dataset which highlights new types of ambiguity that arise in real-world robotics applications.

sosmc

Embodied Language Grounding with 3D Visual Feature Representations
Mihir Prabhudesai*, Hsiao-Yu Fish Tung*, Syed Ashar Javed*, Maximilian Sieb, Adam W. Harley, Katerina Fragkiadaki
CVPR, 2020
paper | project page

We introduce a computational model of simulation semantics that associate language utterances to 3D visual abstractions of the scene they describe. We encode the visual abstractions via 3-dimensional visual feature maps that we obtain via view prediction from different RGB images of the scene in a self-supervised manner.

sosmc

Graph-Structured Visual Imitation
Maximilian Sieb*, Zhou Xian*, Audrey Huang, Oliver Kroemer, Katerina Fragkiadaki
CoRL, 2019 (spotlight)
paper | project page | slides

We propose a graph-structured state representation for visual imitation learning. We show how we can leverage different visual entities of various granularities to obtain a state representation that can be used for reinforcement learning to learn manipulation skills within a few minutes of real-life policy rollouts.

sosmc

Data Dreaming for Object Detection: Learning Object-Centric State Representations for Visual Imitation
Maximilian Sieb, Katerina Fragkiadaki
International Conference on Humanoid Robots, 2018 (oral)
paper | slides

We show how we can use synthetically generated image data from only a few background-subtracted ground-truth images to build instance-specific object detectors robust to partial occlusions. Further, we demonstrate how we can use these detectors to imitate human demonstrations of manipulation tasks in a sample-efficient manner, where the overall imitation learning process takes less than 10 minutes.

sosmc

Probabilistic Trajectory Segmentation by Means of Hierarchical Dirichlet Process Switching Linear Dynamical Systems
Maximilian Sieb*, Matthias Schultheis*, Sebastian Szelag*, Rudolf Lioutikov, Jan Peters
arXiv, 2018
paper | code

This work is about inferring dynamical modes of a given trajectory in a non-parametric fashion. Simply said, the algorithm tries to fit multiple linear segments within the trajectory where the number of fitted segments does not have to specified, but is inferred as well by using a non-parametric dirichlet prior.

Patents

sosmc

Identifying scene correspondences with neural networks
Maximilian Sieb, Nikhil Mishra, Yan Duan
US patent, 2021
patent page

The idea here is the following: We have a bin from which we wish to pick items. For every pick, we want to avoid picking items which we have failed to pick before. However, because items in the bin might move around between subsequent picks, it is not obvious how to correlate objects across different picks to keep track of the failure count. In this work, we introduce a deep learning based approach to predict correspondences of objects given two scenes.

sosmc

Trajectory optimization using neural networks
Haoran Tang, Xi Chen, Yan Duan, Nikhil Mishra, Philipp Wu, Maximilian Sieb, Yide Shentu
US patent, 2021
patent page

We use use deep neural networks to quickly predict optimized robotic arm trajectories according to certain constraints in pick & place application for industrial robotics.

Selected Projects


sosmc

Visual Imitation Learning for Robot Manipulation
CMU, Master thesis, 2019

In my Master thesis I talk about how we can leverage visual imitation learning for robot manipulation. More specifically, I focus on how we can use state-of-the-art computer vision models and reinforcement learning to imitate an expert trajectory from just a single demonstration.

sosmc

Aligning Word Embeddings of Different Languages within a Shared Embedding Space
CMU, 10-725 Convex Optimization, 2018

Learning translations between different languages has drawn a lot of focus recently. In this work, we examined how we can leverage shared embedding spaces of different languages to learn better translations overall.

sosmc

Design and Analysis of a Ball-Balancing Plate
TU Darmstadt/UIUC, Bachelor thesis, 2016

This project revolves around the design of a ball-balancing plate and the anaylsis & implementation of different control algorithms. The thesis describes the entire design process: The CAD-design of the contraption, the dynamical & mechanical analysis, the control design & analysis, the microcontroller implementation, and experimental verification.


Yet another Jon Barron website.
Last updated July 2022.