Home robot cuts planning time in half with AI



A new domestic robot is delivered to your house, and you ask it to make you a cup of coffee. It knows some basic skills from previous practice in simulated kitchens, but the number of actions it could possibly take is enormous: turning on the faucet, flushing the toilet, emptying out the flour container, and so on. Only a handful of those actions are actually helpful. How does the robot decide which steps are sensible in a new situation?

It could use PIGINet, a new system that aims to efficiently enhance the problem-solving capabilities of household robots. Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are using machine learning to cut down on the typical iterative process of task planning that considers all possible actions. PIGINet eliminates task plans that can’t satisfy collision-free requirements, reducing planning time by 50-80 percent when trained on only 300-500 problems.

Robots typically attempt various task plans and iteratively refine their movements until they find a feasible solution, which can be inefficient and time-consuming, especially when there are movable or articulated obstacles. Maybe after cooking, for example, you want to put all the sauces in the cabinet. That problem can take anywhere from two to eight steps, depending on what the world looks like at the moment. Does the robot need to open multiple cabinet doors, or are there obstacles inside the cabinet that need to be moved out of the way to make room? You don’t want your robot to be annoyingly slow, and it’s even worse if it burns your dinner while it’s thinking.
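To make that bottleneck concrete, here is a minimal, self-contained Python sketch of this kind of plan-then-refine loop. It is an illustrative assumption, not the team's actual system: the callables passed in stand for a symbolic task planner and a collision-checked motion planner, with a hook showing where a learned feasibility score like PIGINet's could reorder candidates so the promising ones are tried first.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Action = str      # placeholder type for one symbolic action, e.g. "open left door"
Motion = object   # placeholder type for a motion-planner trajectory

def solve(
    task_plans: Iterable[Sequence[Action]],
    find_motion: Callable[[Action], Optional[Motion]],
    feasibility_score: Optional[Callable[[Sequence[Action]], float]] = None,
) -> Optional[Tuple[Sequence[Action], List[Motion]]]:
    """Try candidate task plans until one admits a full motion plan.

    `task_plans` would come from a symbolic planner; `find_motion` is a
    collision-checked motion planner for a single action. A learned model
    like PIGINet would supply `feasibility_score`, so likely-feasible
    plans are attempted first instead of grinding through doomed ones.
    """
    candidates = list(task_plans)
    if feasibility_score is not None:
        candidates.sort(key=feasibility_score, reverse=True)

    for plan in candidates:
        motions: List[Motion] = []
        for action in plan:
            motion = find_motion(action)
            if motion is None:        # step collides or is unreachable:
                break                 # discard this plan, try the next
            motions.append(motion)
        else:
            return plan, motions      # every step found a feasible motion
    return None                       # no candidate plan worked
```

Without the scoring hook, the loop pays the full motion-planning cost on every infeasible candidate; pruning or reordering up front is where the 50-80 percent savings come from.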

Domestic robots are usually thought of as following predefined recipes to perform tasks, which isn’t always suitable for diverse or changing environments. So how does PIGINet avoid those predefined rules? PIGINet is a neural network that predicts the probability that a candidate task plan can be refined into feasible, collision-free motions. In a nutshell, it employs a transformer encoder, a versatile, state-of-the-art model designed to operate on sequences of data. The input sequence, in this case, is information about the task plan under consideration, an image of the environment, and symbolic encodings of the initial state and the desired goal. The encoder combines the task plan, image, and text to generate a prediction about the feasibility of the selected task plan.
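As a rough illustration of that idea, the PyTorch sketch below runs a transformer encoder over one fused sequence of plan tokens and image features and reads a feasibility probability off a summary token. The dimensions, token layout, and layer counts are assumptions for illustration, not PIGINet's published configuration.

```python
import torch
import torch.nn as nn

class FeasibilityPredictor(nn.Module):
    """Toy multimodal transformer-encoder feasibility classifier.

    Illustrative only: sizes and the fusion scheme are assumptions,
    not PIGINet's actual architecture.
    """
    def __init__(self, vocab_size=1000, d_model=256, img_feat_dim=512):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, d_model)  # plan/goal/initial-fact tokens
        self.img_proj = nn.Linear(img_feat_dim, d_model)      # project image features to d_model
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))   # learned summary token
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, 1)                     # feasibility logit

    def forward(self, plan_tokens, img_features):
        # plan_tokens: (B, T) token ids encoding plan + goal + initial state
        # img_features: (B, N, img_feat_dim) patch features from a vision encoder
        tokens = self.token_embed(plan_tokens)
        imgs = self.img_proj(img_features)
        cls = self.cls.expand(tokens.size(0), -1, -1)
        seq = torch.cat([cls, tokens, imgs], dim=1)           # one fused sequence
        encoded = self.encoder(seq)
        return torch.sigmoid(self.head(encoded[:, 0]))        # P(plan is feasible)

# Usage: score a batch of two candidate plans against the same scene.
model = FeasibilityPredictor()
p = model(torch.randint(0, 1000, (2, 12)), torch.randn(2, 4, 512))
print(p.shape)  # torch.Size([2, 1])
```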

The team created hundreds of simulated environments, each with a different layout and specific tasks that require objects to be rearranged among counters, refrigerators, cabinets, sinks, and cooking pots. They compared PIGINet with prior approaches by measuring the time needed to solve each problem. One correct task plan might be: open the refrigerator’s left door, remove the pot lid, move the cabbage from the pot to the refrigerator, move a potato to the refrigerator, pick up the bottle from the sink, place the bottle in the sink, pick up the tomato, and place the tomato. PIGINet significantly reduced planning time: by 80 percent in simpler scenarios, and by 20-50 percent in more complex scenarios with longer plan sequences and less training data.

“Systems like PIGINet, which harness the power of data-driven techniques to handle familiar cases efficiently but can still fall back on ‘first-principles’ planning methods to verify learning-based suggestions and solve novel problems, offer the best of both worlds,” said Leslie Pack Kaelbling, MIT professor and CSAIL principal investigator.

PIGINet’s use of multimodal embeddings in its input sequence allowed it to better represent and reason about complex geometric relationships. The image data let the model grasp spatial arrangements and object configurations without needing the objects’ 3D meshes for precise collision checking, enabling fast decision-making in a wide range of environments.

One of the main challenges the team faced while developing PIGINet was the scarcity of good training data, since all the feasible and infeasible plans must be generated by traditional planners, which is slow in the first place. By using pre-trained vision-language models and data-augmentation tricks, however, they were able to overcome this challenge, demonstrating impressive reductions in planning time not only on problems with seen objects, but also zero-shot generalization to previously unseen objects.
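For a sense of how a pre-trained vision-language model can stand in for task-specific visual training, here is a minimal sketch that uses CLIP (via Hugging Face transformers) as a frozen image encoder, plus one geometry-preserving augmentation. The choice of CLIP and of color jitter are assumptions for illustration; the article doesn't name the exact model or tricks the team used.

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import CLIPModel, CLIPProcessor

# Frozen, pre-trained image encoder. Using CLIP specifically is an
# assumption; the article only says a pre-trained vision-language model.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def encode_scene(image: Image.Image) -> torch.Tensor:
    """Embed one environment image into a (1, 512) feature vector."""
    inputs = processor(images=image, return_tensors="pt")
    return model.get_image_features(**inputs)

# Photometric augmentation preserves geometry, so a plan's feasibility
# label stays valid for the augmented copies: extra training data
# without rerunning the slow traditional planner.
jitter = transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3)

def augment(image: Image.Image, copies: int = 3) -> list[Image.Image]:
    return [image] + [jitter(image) for _ in range(copies)]
```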

“Because every home is different, robots should be adaptable problem-solvers instead of just recipe followers. Our key idea is to let a general-purpose task planner generate candidate task plans and use a learned model to select the promising ones. The result is home robots that are more efficient, more adaptable, and better able to navigate nimbly through complex and dynamic environments. Moreover, the practical applications of PIGINet are not confined to households,” said Zhutian Yang, a PhD student at MIT CSAIL and lead author of the study. “Our future goal is to further refine PIGINet to suggest alternative task plans after identifying infeasible actions, which will speed up the generation of feasible task plans even more, without needing big datasets to train a general-purpose planner from scratch. We believe this could revolutionize the way robots are trained during development and then deployed to every home.”

“This paper addresses a fundamental challenge in implementing general-purpose robots: how to learn from past experience to speed up decision-making in unstructured environments filled with a large number of articulated and movable obstacles,” said Beomjoon Kim PhD ’20, associate professor at the Korea Advanced Institute of Science and Technology (KAIST). “The central bottleneck in such problems is determining a high-level task plan for which there exists a low-level motion plan that realizes it. Typically, you have to oscillate between motion planning and task planning, which causes significant computational inefficiency. Zhutian’s work addresses this by using learning to eliminate infeasible task plans, and it is a step in a promising direction.”

Yang wrote the paper alongside NVIDIA research scientist Caelan Garrett SB ’15, MEng ’15, PhD ’21; MIT professors of electrical engineering and computer science and CSAIL members Tomás Lozano-Pérez and Leslie Pack Kaelbling; and Dieter Fox, senior director of robotics research at NVIDIA and professor at the University of Washington. The team was supported by grants from AI Singapore, the National Science Foundation, the Air Force Office of Scientific Research, and the Army Research Office. The project was partially carried out while Yang was an intern at NVIDIA Research. Their work will be presented in July at the Robotics: Science and Systems conference.


