Training a dog and training a robot aren’t so different

For AI and robotics researcher Josiah Hanna and his lab, high throughput computing is a critical tool in reinforcement learning.

Artificial intelligence (AI) and robotics expert Josiah Hanna’s research has a lot in common with training dogs: Both robot training and dog training use a type of reinforcement learning to encourage the desired behavior. For computers and robots, however, reinforcement learning is a branch of machine learning (ML) that models an intelligent agent interacting with a task environment.

Comparing robotic reinforcement learning to teaching a dog to sit, Hanna explains that “you don’t explicitly tell the dog how to sit, but you coax the dog into sitting, and when it shows that behavior, you reward that. Over time, the robot dog learns these are the actions that lead to getting the reward, and it learns to avoid actions that don’t lead to the reward. We want to give computers and robots the ability to learn through experience, by seeing what works and what leads to them achieving the goals we set for them. Then, when they see the actions that lead to reaching their goals, they know that they should do that again in the future.”

In other words, Hanna’s research seeks to develop algorithms that enable computers to learn goal-oriented behavior. Unlike a dog, a robot isn’t necessarily rewarded; instead, it learns from past mistakes and uses that information to determine what a successful action is. Through trial and error, the agent learns which actions it needs to take to achieve its goals. “It’s critical that they’re [computers] able to learn through their experience. That’s what my research and the whole field of reinforcement learning studies — the kinds of algorithms which will enable this to happen,” Hanna elaborates.
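To make that trial-and-error loop concrete, here is a minimal sketch in the style of tabular Q-learning on a toy task. The environment, reward values, and parameter settings are illustrative assumptions for this article, not the Hanna Lab’s code:

```python
import random
from collections import defaultdict

class ChainEnv:
    """Toy task: walk right along a 5-cell chain; reward 1 at the end."""
    actions = ["left", "right"]

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos = max(0, self.pos + (1 if action == "right" else -1))
        done = self.pos == 4
        return self.pos, (1.0 if done else 0.0), done

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    q = defaultdict(float)  # (state, action) -> estimated long-run value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:  # explore occasionally
                action = random.choice(env.actions)
            else:  # otherwise exploit the best-known action, ties broken randomly
                action = max(env.actions,
                             key=lambda a: (q[(state, a)], random.random()))
            next_state, reward, done = env.step(action)
            # Nudge the estimate toward reward plus discounted future value.
            target = reward + gamma * max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning(ChainEnv())
print(q[(0, "right")] > q[(0, "left")])  # True: "right" leads to the reward
```

The update line is where the “learning through experience” happens: actions that led toward the reward have their estimated value nudged upward, so the agent chooses them more often in the future.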

Another way to describe it, says UW–Madison Computer Sciences Ph.D. student Nicholas Corrado, is teaching a robot how to walk. Initially, the robot moves its legs randomly and likely falls over. Through trial and error, however, the robot eventually discovers that it can make forward progress by moving its legs to take a single step forward. Wanting to maximize its forward progress, the robot then increases the probability of executing this stepping behavior and eventually learns how to walk. “It requires a lot of computing to do this because starting from random movements and getting to walking behavior is not super straightforward,” Corrado says.
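Corrado’s walking example matches the intuition behind policy-gradient methods, which directly raise the probability of actions that produced reward. A hypothetical single-decision toy in that spirit, using a standard REINFORCE-style update (the “step vs. flail” setup is invented for illustration):

```python
import math
import random

# Hypothetical toy: one decision per episode. Taking a "step" earns
# reward 1; flailing earns 0. The agent raises the probability of
# whichever action was rewarded, mirroring how a walking robot would
# reinforce its stepping behavior.
theta = 0.0  # preference for "step"; p(step) = sigmoid(theta)
lr = 0.5     # learning rate

def p_step():
    return 1.0 / (1.0 + math.exp(-theta))

for _ in range(200):
    took_step = random.random() < p_step()  # sample an action
    reward = 1.0 if took_step else 0.0
    # Gradient of the log-probability of the sampled action w.r.t. theta.
    grad_logp = (1.0 - p_step()) if took_step else -p_step()
    theta += lr * reward * grad_logp        # reinforce rewarded actions

print(f"p(step) after training: {p_step():.3f}")  # climbs toward 1
```

A real walking policy has continuous joint actions and a neural network in place of `theta`, but the principle is the same: reward makes the sampled behavior more probable.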

Unlike classification-based types of ML, much of reinforcement learning relies on simulation because it models agents performing a task. The difference between other areas of ML and reinforcement learning, Corrado explains, is that with reinforcement learning, “You have this multi-step decision-making process that you must learn how to solve optimally. It’s so much harder because the agent needs to learn how its action right now affects its performance way down the road, so reinforcement learning feels like a much harder problem to focus on than what we call supervised learning methods.”
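The “way down the road” effect Corrado describes is usually formalized as a discounted return: the agent maximizes not its immediate reward but the sum of all future rewards, each down-weighted the further out it arrives. In the field’s standard notation (a convention, not a formula from the article):

```latex
G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad 0 \le \gamma < 1
```

Because the action taken at time t is credited for rewards arbitrarily far in the future, the agent faces a credit-assignment problem that one-shot supervised prediction never has to solve.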

Since learning on physical robots is difficult, Hanna’s lab will sometimes use simulations as a “surrogate” for physical robots. This is where high throughput computing (HTC) becomes a valuable tool. Hanna shares that “it’s really useful to have high throughput computing so you can run your simulation or learning algorithm for many different processes. You can see how different learning algorithms or different parameters for learning algorithms affect the ability of an algorithm to produce robust behavior or high-performing behavior.” In this sense, the Center for High Throughput Computing (CHTC) is a “huge resource” for Hanna’s students, who evaluate a wide variety of algorithms they think might improve on previous ones. It greatly increases the lab’s experimentation bandwidth, or the number of experiments they can run; in fact, the Hanna Lab’s CHTC usage totals nearly 5.7 million hours.
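Concretely, this kind of workload maps onto the HTCondor Software Suite that powers CHTC: each algorithm-and-parameter combination becomes one independent job. A hypothetical submit-file sketch, where the executable name, arguments, and resource requests are placeholders rather than the lab’s actual configuration:

```
universe       = vanilla
executable     = run_experiment.sh
arguments      = --algo $(algo) --seed $(seed)
output         = logs/$(Cluster).$(Process).out
error          = logs/$(Cluster).$(Process).err
log            = logs/sweep.log
request_cpus   = 1
request_memory = 2GB

# One job per line of params.txt, e.g. "ppo 0", "ppo 1", "sac 0", ...
queue algo, seed from params.txt
```

A single `queue ... from` statement turns a parameter list into hundreds of independent jobs, which is exactly the pattern high throughput computing is built for.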

One project the Hanna lab is working on is enabling robots to learn to play soccer, Corrado says. Using reinforcement learning, the researchers trained robots to play and then entered an annual international competition, where the team placed third despite it being their first time participating, “greatly exceeding our expectations,” Corrado highlights. The end goal isn’t necessarily to teach robots to play soccer but rather to “develop reinforcement learning techniques that enable us to train agents to work cooperatively” and to “develop techniques that improve the data efficiency of reinforcement learning. If we can reduce the data requirement, reinforcement learning is going to be much, much more practical for industrial applications.”

From the annual RoboCup Standard Platform League (SPL) competition, a research competition that aims to advance the capabilities of robotics in challenging, real-time domains.


Even before Hanna came to UW–Madison, he had experience with the HTCondor Software Suite (HTCSS) from graduate school. It was a “critical resource” for him then and remains one today in his role as a researcher and professor at UW–Madison. “One of the first things I did when I got here [UW–Madison] was tap into HTC resources,” Hanna recalls. As a new principal investigator (PI), he also met with a CHTC facilitator to learn how to obtain access and what resources the center provides.

Since he found the tool so valuable as a graduate student, Hanna tries to set his students up with CHTC early on rather than having them run experiments locally on their own computers. “It’s a great resource we have to leverage that helps speed things up,” Hanna shares. For the research group, running a high volume of simulations and experiments is a key enabler of progress, so Hanna encourages his students to run experiments whenever they hit an uncertainty that results could resolve. “Oftentimes it’s just easier to run the experiment. Something I try to guide the students on is knowing when some experiments just need to be run to understand some aspect of designing reinforcement learning algorithms.” His students are developing their own pipelines with CHTC, learning how to work with it more efficiently, and writing scripts to launch experiments on it.

To put into context exactly how many experiments reinforcement learning requires, Corrado says, “Benchmarks contain anywhere from 5–10 tasks, and maybe you need to compare four different algorithms and run 20 independent runs of each algorithm on each task. At that point, you’re running hundreds of experiments. I’ve even had to run thousands of experiments.” In fact, for a paper currently under review, Corrado ran a hyperparameter sweep of an algorithm, a search for the hyperparameter combination that performs best out of many, and submitted enough jobs to hit CHTC’s default limit of 10,000 jobs per submission. That is something he could not have accomplished on his personal laptop or on a lab-specific server.
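The job count in a sweep like that is the product of every swept dimension, which is how a submission can reach five figures. A hypothetical sketch of generating such a parameter list (the hyperparameter names and values are invented for illustration; a submit file like the earlier sketch would queue one job per line):

```python
import itertools

# Hypothetical sweep: every combination becomes one independent job,
# so the total multiplies quickly across dimensions.
learning_rates = [1e-4, 3e-4, 1e-3]
batch_sizes = [64, 256, 1024]
clip_ranges = [0.1, 0.2, 0.3]
seeds = range(20)  # independent runs per combination

combos = list(itertools.product(learning_rates, batch_sizes, clip_ranges, seeds))
print(len(combos))  # 3 * 3 * 3 * 20 = 540 jobs, for one algorithm on one task

# Write one line per job for a submit file's "queue ... from" statement.
with open("params.txt", "w") as f:
    for lr, bs, clip, seed in combos:
        f.write(f"{lr} {bs} {clip} {seed}\n")
```

Sweep a few more dimensions, or more than one algorithm and task, and the count passes 10,000 quickly.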

Hanna says he is also seeing a shift toward more high-performance computing with GPUs in his lab, which CHTC has helped enable. “Up until recently, reinforcement learning was separate from other forms of deep learning that were going on, and you really couldn’t benefit that much from a GPU unless you had a lot of CPUs as well, which is what high throughput computing is really good for,” Hanna explains.

When asked about the future use of CHTC in his lab, Hanna imagines doing more multi-processing and networking several CPUs together, both of which could benefit reinforcement learning experiments. As CHTC continues increasing its GPU capacity, Hanna says he plans to use those resources more in his lab’s work as well.

Without the CHTC, the kind of large-scale experimentation the Hanna Lab relies on would be impractical, Corrado says. For this type of work, HTC is almost always necessary, and it continues to expand the lab’s horizons.