
RoboGlobal Insights

Hate making your bed? Robots are using deep learning and depth sensing for everyday tasks

By Ken Goldberg, William S. Floyd Jr. Distinguished Chair in Engineering, UC Berkeley; ROBO Global Strategic Advisor

Hate making your bed? You’re not alone. According to an article in the Huffington Post, making the bed is one of the most annoying household jobs. My students and I at UC Berkeley’s AUTOLab are exploring how robots might take on daily chores like this. By combining depth sensing with deep learning, we are developing methods that enable robots to grasp objects in heaps—and, soon, even to make your bed.

Robots are getting better at assisting human workers. An exciting class of mobile manipulation robots, including the Fetch Robot and the Toyota Human Support Robot, has been making headlines by combining robotic arms with built-in depth-sensing cameras that enable the robots to identify, pick up, and carry objects. Investors have taken note. In late July, Fetch Robotics announced it had raised $46M in Series C funding that it plans to use to expand its operations globally, boost production to meet growing demand, and fund new R&D. Toyota is investing $1B in its AI-focused Toyota Research Institute in Silicon Valley and establishing a $100M fund to invest in startups and new robotics technology. In the healthcare space, Johnson & Johnson recently acquired Auris Health for $3.4B, marking its largest investment to date in surgical robotics, a market that Global Market Insights projects will reach $24B by 2025. The race is on to create robots that can live up to the promise of our imaginations, and one key to bringing those visions to life is technology that enables robots to perceive and interact with the complex world around them.

Over the past five years, deep learning has helped robots improve perception, often using convolutional neural networks (CNNs). CNNs typically operate on RGB (red, green, blue) color channels, the kind of image a standard digital camera produces, to capture and analyze visual imagery. But in many tasks, such as driving and grasping, knowing only an object’s colors is limiting. What matters even more is the spatial geometry of the scene. The physical process of manipulation depends not on an object’s color or texture, but on its geometry, pose, and size. When you manipulate a pen with your hand, for instance, you can often move it seamlessly without looking at it, so long as you have a good understanding of the location and orientation of the contact points. When it comes to the perception needed to manipulate objects, geometry matters.

Adding depth imagery to the equation

In the past, robotic depth sensing has involved matching pairs of points between aligned color images from two different cameras, and then analyzing the differences between those points to perceive depth. While this stereo-vision approach has been somewhat effective, there is still often substantial error in the resulting depth estimates.
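The core geometry behind stereo depth can be sketched in a few lines: for a rectified camera pair with focal length f (in pixels) and baseline B, a point whose matched pixels are offset by disparity d lies at depth z = f·B/d. The numbers below are illustrative assumptions, not from the article, but they show why small matching errors produce the substantial depth errors mentioned above:

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Triangulate depth (meters) from stereo disparity: z = f * B / d."""
    return f_px * baseline_m / disparity_px

# Assumed camera parameters for illustration.
f_px = 600.0       # focal length in pixels
baseline_m = 0.1   # 10 cm between the two cameras

z_true = depth_from_disparity(f_px, baseline_m, 20.0)  # 3.0 m
z_off  = depth_from_disparity(f_px, baseline_m, 19.0)  # ~3.16 m
# A single-pixel matching error shifts the estimate by roughly 16 cm,
# and the error grows quadratically with distance.
print(z_true, z_off)
```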

An alternative is to use depth cameras that output single-channel grayscale images that specifically measure depth values from the camera. Depth is also used to ‘filter’ points beyond a certain distance to remove any background noise that can confuse the robot’s ability to perceive an object accurately.
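That distance filter is simple to express on a single-channel depth image, which is just a 2-D array of distances. A minimal NumPy sketch (the array values and the 1.0 m cutoff are assumptions for illustration):

```python
import numpy as np

def filter_background(depth, max_depth_m=1.0):
    """Mask out pixels farther than max_depth_m so distant clutter
    does not interfere with perceiving the object of interest."""
    filtered = depth.copy()
    filtered[filtered > max_depth_m] = 0.0  # 0 = masked / no return
    return filtered

# Toy 2x3 depth image in meters: an object at ~0.6 m, a wall at 2.5 m.
depth = np.array([[0.6, 0.6, 2.5],
                  [0.6, 2.5, 2.5]])
print(filter_background(depth))
# Wall pixels are zeroed out; the object's pixels survive.
```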

Microsoft had a breakthrough in this technology when it rolled out the Kinect for the Xbox 360 in 2010. The low-cost Kinect 3D (depth) camera changed the game (pun intended) by enabling much better body tracking than previous methods.

Another benefit of depth images is that realistic examples can be synthesized effectively from geometric models. My students and I developed the Dexterity Network—or Dex-Net—research project to focus on robot grasping using deep learning and synthetic depth images. Using this method, we were able to ‘train’ an ABB YuMi robot to pick up a broad variety of previously unseen objects. The most striking fact about this achievement: our robot outperformed RGB-based approaches despite never being trained on any real images.
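Part of why synthetic depth data works is that rendering a depth image of a simple geometric model takes only a few lines of code. The toy sketch below renders an orthographic, top-down depth image of a sphere resting on a table; the shapes, sizes, and scale are assumptions for illustration, not Dex-Net’s actual rendering pipeline:

```python
import numpy as np

def render_sphere_depth(size=64, radius_px=20, cam_height_m=1.0):
    """Synthetic orthographic depth image (meters) of a sphere on a
    flat table, viewed straight down from cam_height_m above it."""
    yy, xx = np.mgrid[:size, :size]
    cx = cy = size / 2
    r2 = (xx - cx) ** 2 + (yy - cy) ** 2
    depth = np.full((size, size), cam_height_m)   # background: the table plane
    inside = r2 < radius_px ** 2
    # Height of the sphere's surface above the table, assuming 100 px per meter.
    h = np.sqrt(np.maximum(radius_px ** 2 - r2, 0)) / 100.0
    depth[inside] = cam_height_m - h[inside]      # nearer where the sphere bulges
    return depth

img = render_sphere_depth()
# The sphere's top (image center) is the closest point: 1.0 - 0.2 = 0.8 m.
print(img[32, 32], img.max())
```

Training on many such rendered images, with randomized object models and poses, is what lets a grasping network generalize to real objects it has never seen.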

Combining the power of deep learning and depth imagery, approaches like this can give robots the ability to learn about the world around them in new ways. Ultimately, this powerful combination will enable robots like Fetch to pick and pack objects in warehouses, and Toyota’s Human Support Robot to help with daily chores, including making the bed—a feat that will surely be welcomed by senior citizens, parents, and teenagers around the world.


For a closer look at our research in this area (including photos of the AUTOLab’s robot making a quarter-scale bed), see Drilling Down on Depth Sensing and Deep Learning by UC Berkeley’s Daniel Seita, Jeff Mahler, Mike Danielczuk, Matthew Matl, and Ken Goldberg. To learn more about the lab, including links to published papers, visit autolab.berkeley.edu.
