Artificial Intelligence – Predicting the behavior of others on the road

A robot must be able to predict the actions of nearby pedestrians and cyclists if it is to safely navigate a vehicle through Boston.

Predicting behavior is difficult, and current artificial intelligence solutions tend to fall short. They may be too simple (assuming, for example, that pedestrians always walk in straight lines), too conservative (to avoid hitting pedestrians, the robot simply leaves the car in park), or too limited (forecasting the next moves of only one agent, even though roads often carry many users at once).

Researchers at MIT have devised a simple solution to this complex problem. They break a multiagent behavior-prediction problem into smaller pieces and tackle each piece separately, so that a computer can solve the complex task in real time.

The behavior-prediction framework first determines the relationship between two road users: which one (pedestrian, cyclist, or car) has the right of way and which will yield. It then uses these relationships to predict future trajectories for multiple agents.

When compared against real traffic flow in a huge dataset compiled by the autonomous driving company Waymo, these estimated trajectories were more accurate than those from other machine-learning models. The MIT technique even outperformed Waymo’s own model. And because the researchers broke the problem into smaller pieces, their technique required less memory.

“This is an intuitive idea that no one has fully explored before, but it works very well. Its simplicity is a great advantage. We compared our model with models from other companies in the field and found that it performs well on this demanding benchmark. This model has a lot of potential,” says co-lead author Xin Cyrus Huang, a graduate student in the Department of Aeronautics and Astronautics and a research associate in the lab of Brian Williams, professor of aeronautics and astronautics and a member of the Computer Science and Artificial Intelligence Laboratory.

Joining Huang and Williams on the paper are three researchers from Tsinghua University in China: co-lead author Qiao Sun, a research assistant; Junru Gu; and Hang Zhao PhD ’19, an assistant professor. The paper will be presented at the Conference on Computer Vision and Pattern Recognition.

Multiple small models

The researchers’ machine-learning method, called M2I, takes two inputs: the past trajectories of the cars, cyclists, and pedestrians interacting in a traffic scene, and a map showing street locations and lane configurations.

A relation predictor first uses this information to determine which agent has the right of way, classifying one agent as the passer and the other as the yielder. A prediction model known as a marginal predictor then guesses the trajectory of the passing agent, since that agent behaves independently of the yielder.

A second prediction model, known as a conditional predictor, then predicts the behavior of the yielding agent based on the actions of the passing agent. The system predicts a number of different trajectories for the passer and the yielder, computes the probability of each one, and then selects the six joint outcomes most likely to occur.
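The passer/yielder factorization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not M2I’s actual implementation: the learned marginal and conditional predictors are replaced by hypothetical toy functions (`toy_marginal`, `toy_conditional`), and trajectories are reduced to string labels instead of sequences of waypoints. It shows how each joint outcome can be scored as P(passer) × P(yielder | passer) and how the six most likely joint outcomes are kept.

```python
import heapq
from typing import Callable, List, Tuple

# A candidate is a (trajectory_label, probability) pair. Real predictors emit
# sequences of (x, y) waypoints; string labels keep this sketch readable.
Candidate = Tuple[str, float]


def predict_joint(
    marginal: Callable[[], List[Candidate]],
    conditional: Callable[[str], List[Candidate]],
    k: int = 6,
) -> List[Tuple[str, str, float]]:
    """Score each (passer, yielder) pair as P(passer) * P(yielder | passer)
    and return the k most likely joint outcomes, highest first."""
    joint = []
    for passer_traj, p_passer in marginal():
        # The yielder's candidate trajectories depend on what the passer does.
        for yielder_traj, p_yielder in conditional(passer_traj):
            joint.append((passer_traj, yielder_traj, p_passer * p_yielder))
    return heapq.nlargest(k, joint, key=lambda item: item[2])


# Hypothetical stand-ins for the learned marginal and conditional predictors.
def toy_marginal() -> List[Candidate]:
    return [("car_fast", 0.6), ("car_slow", 0.4)]


def toy_conditional(passer_traj: str) -> List[Candidate]:
    if passer_traj == "car_fast":
        return [("ped_waits", 0.9), ("ped_crosses", 0.1)]
    return [("ped_waits", 0.3), ("ped_crosses", 0.7)]


if __name__ == "__main__":
    for passer, yielder, p in predict_joint(toy_marginal, toy_conditional):
        print(f"{passer:<9} {yielder:<12} {p:.2f}")
```

With only four possible joint outcomes in this toy setup, all four are returned, led by `car_fast` / `ped_waits` at 0.54. Because the yielder is predicted only for each passer candidate, the number of combinations stays small compared with scoring every agent’s trajectories jointly from scratch.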

M2I outputs a prediction of how these agents will move through traffic over the next eight seconds. In one example, the method predicted that a vehicle would slow down to let pedestrians cross the street and then speed up once they had cleared the intersection. In another, it predicted that a vehicle would wait until several cars had passed before turning left from a side street onto the main road.

This initial research focuses on interactions between two agents, but M2I could infer relationships among many agents and then predict their trajectories by linking multiple marginal and conditional predictors.
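One way to picture the many-agent extension is as an ordering problem: every agent that yields must be predicted after the agents it yields to. The sketch below assumes, hypothetically, that the pairwise pass/yield relations form a directed acyclic graph and orders agents with a topological sort from Python’s standard `graphlib` module. It is an illustration of the idea, not the authors’ method; the function name `prediction_plan` is invented for this example.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


def prediction_plan(
    agents: List[str], relations: List[Tuple[str, str]]
) -> List[Tuple[str, str, List[str]]]:
    """Order agents so each passer is predicted before any agent yielding to it.

    `relations` holds (passer, yielder) pairs, e.g. from a relation predictor.
    Every name in `relations` must also appear in `agents`.
    Returns (agent, predictor_kind, conditioned_on) triples in prediction order.
    Raises graphlib.CycleError (a ValueError) if the relations contain a cycle.
    """
    # graphlib expects node -> set of predecessors; a yielder's predecessors
    # are the agents it yields to.
    yields_to: Dict[str, Set[str]] = {agent: set() for agent in agents}
    for passer, yielder in relations:
        yields_to[yielder].add(passer)

    plan = []
    for agent in TopologicalSorter(yields_to).static_order():
        passers = sorted(yields_to[agent])
        # Agents that yield to no one get a marginal prediction; the rest are
        # predicted conditionally on the agents they yield to.
        kind = "conditional" if passers else "marginal"
        plan.append((agent, kind, passers))
    return plan


if __name__ == "__main__":
    agents = ["car_a", "car_b", "pedestrian"]
    relations = [("car_a", "car_b"), ("car_b", "pedestrian")]
    for step in prediction_plan(agents, relations):
        print(step)
```

In this chain, `car_a` gets a marginal prediction, `car_b` is predicted conditionally on `car_a`, and the pedestrian conditionally on `car_b`. A cycle of “yields to” relations has no valid order, which is why the sketch lets `graphlib` raise an error rather than guess.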

Driving tests in the real world

The researchers trained the models on the Waymo Open Motion Dataset, which contains millions of real traffic scenes involving vehicles, pedestrians, and cyclists, recorded by lidar (light detection and ranging) sensors and cameras mounted on the company’s vehicles. They focused specifically on cases involving multiple agents.

They compared six prediction samples from each method, weighted by their confidence levels, with the actual trajectories followed by the cars, cyclists, and pedestrians in each scene. Their method was the most accurate. M2I also outperformed the baseline models on a metric called overlap rate: if two predicted trajectories overlap, that indicates a collision, and M2I had the lowest overlap rate.
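The overlap-rate idea above can be illustrated with a simplified calculation. This sketch assumes agents are points and counts two predicted trajectories as overlapping if the agents come within a hypothetical `radius` (1 metre here) at the same timestep; the official benchmark definition is more detailed, for example accounting for agent size. The function names are invented for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position in metres at one timestep


def trajectories_overlap(
    a: Sequence[Point], b: Sequence[Point], radius: float = 1.0
) -> bool:
    """True if the two agents come closer than `radius` at the same timestep."""
    return any(math.dist(p, q) < radius for p, q in zip(a, b))


def overlap_rate(
    scenes: List[Tuple[Sequence[Point], Sequence[Point]]], radius: float = 1.0
) -> float:
    """Fraction of predicted trajectory pairs that overlap (i.e. collide)."""
    if not scenes:
        return 0.0
    hits = sum(trajectories_overlap(a, b, radius) for a, b in scenes)
    return hits / len(scenes)


if __name__ == "__main__":
    car = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    ped_waits = [(2.0, -3.0)] * 4  # pedestrian stays on the kerb
    ped_crosses = [(2.0, -2.0), (2.0, -1.0), (2.0, 0.0), (2.0, 1.0)]
    print(overlap_rate([(car, ped_waits), (car, ped_crosses)]))
```

Here the crossing pedestrian reaches the car’s position at the same timestep, so one of the two predicted scenes counts as a collision and the script prints 0.5. A lower rate means the predictor less often produces joint futures in which agents run into each other.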

“Rather than simply building a more complicated model to solve this problem, we used a method that is closer to how humans think about their interactions with other people. Humans don’t reason about every possible combination of future behaviors. We make decisions quite fast,” Huang says.