A fast, accurate and mathematically rigorous method of training algorithms for reinforcement learning tasks has been developed by KAUST researchers. Improving on current best-practice optimization methods, the new approach has broad applications, from training robots to walk, to developing video games that can outsmart players.
Similar to the learning process in animals and humans, reinforcement learning works by providing reward feedback when an algorithm is close to achieving a specified goal. The closer the algorithm gets to performing a task correctly, the more it is rewarded.
While many machine learning methods, such as stochastic gradient descent, use derivative information to plot the fastest path to convergence, this information is not available in reinforcement learning. This makes it more challenging to train algorithms.
As a result, reinforcement learning methods tend to rely on guesswork. “Much of the work on algorithms in this area is heuristic; there is no mathematical guarantee that they will work,” says Peter Richtárik, computer scientist at KAUST’s Visual Computing Center.
To tackle this issue, Richtárik developed a derivative-free algorithm that was able to reach an optimal value using just one parameter [1]. Known as simplified direct search (SDS), the algorithm provided a quicker and more reliable alternative to the standard direct search method.
“It was the best approach at the time, but we have since improved it,” says Richtárik.
Working with computer scientist El Houcine Bergou, Richtárik introduced randomness into the SDS algorithm, streamlining the process further. The modified approach, called the stochastic three points method, compares three candidate points at each iteration and proceeds with the one that gives the best objective value [2].
“This method is extremely simple to use in practice,” says Bergou. “Its analysis is also simple compared with state-of-the-art methods.”
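The three-point idea can be sketched in a few lines: at each iteration, draw a random direction, evaluate the objective at the current point and at a small step forward and backward along that direction, and keep whichever of the three is best. The function names, step-size schedule and quadratic test problem below are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def stochastic_three_points(f, x0, alpha=0.5, iterations=1000, seed=0):
    """Derivative-free minimization with the stochastic three points
    (STP) scheme: at each step, sample a random direction s and keep
    the best of x, x + step*s, and x - step*s."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(iterations):
        s = rng.standard_normal(x.shape)
        s /= np.linalg.norm(s)              # unit-length random direction
        step = alpha / np.sqrt(k + 1)       # decaying step size (an assumption)
        candidates = [x, x + step * s, x - step * s]
        x = min(candidates, key=f)          # keep the best of the three points
    return x

# Example: minimize a simple quadratic using only function evaluations.
x_min = stochastic_three_points(lambda x: float(np.sum(x**2)), x0=[3.0, -2.0])
```

Because the current point is always among the candidates, the objective value never increases, which is what makes the method's analysis comparatively simple.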
Developing the process further with Ph.D. student Adel Bibi, Richtárik and Bergou added importance sampling into the mix, which enables the method to sample the most informative search directions with higher probability.
When the team used this approach to teach simulated humanoid, ant and cheetah robots to walk in a reinforcement learning scenario, they found that their method outperformed leading standard practices [3].
“The proposed algorithm enjoys theoretical convergence rates that are the best we know of in the derivative-free optimization literature,” says Bibi.
The researchers also applied momentum, another technique that fast tracks convergence, to the stochastic three points method with importance sampling to control the movements of the virtual robots [4].
While momentum is most often used in deep learning, this was the first time it had been applied in derivative-free optimization.
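The intuition can be illustrated by letting the accepted displacement carry over into the next iteration as a velocity, so successful directions build up speed. This is a simplified heavy-ball sketch under assumed parameters, not the exact momentum update analyzed in the paper.

```python
import numpy as np

def stp_with_momentum(f, x0, alpha=0.3, beta=0.5, iterations=1000, seed=0):
    """Illustrative heavy-ball variant of STP: the accepted displacement
    is remembered as a velocity v that is partially carried into the
    next step's candidate moves (a sketch, not the paper's exact rule)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for k in range(iterations):
        s = rng.standard_normal(x.shape)
        s /= np.linalg.norm(s)               # unit-length random direction
        step = alpha / np.sqrt(k + 1)
        # Candidate displacements: stay put, or move +/- along s plus momentum.
        moves = [np.zeros_like(x), beta * v + step * s, beta * v - step * s]
        best = min(moves, key=lambda d: f(x + d))
        x, v = x + best, best                # update position and velocity
    return x

x_min = stp_with_momentum(lambda x: float(np.sum(x**2)), x0=[2.0, -1.0])
```

Keeping the "stay put" candidate preserves the monotone decrease of the objective, while the momentum term lets consecutive improvements compound rather than restart from scratch each iteration.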
“We did the analysis and showed that it works mathematically,” says Richtárik. “It’s better than any other robot training approaches that have been proposed.”
The next step for Richtárik and his colleagues is to apply these new optimization methods to other reinforcement learning tasks, such as training real robots and drones.
“We like doing stuff where we can have guarantees and it works in practice,” says Richtárik. “That’s really the jackpot.”
1. Konečný, J. & Richtárik, P. Simple complexity analysis of simplified direct search. arXiv:1410.0390 (2014).
2. Bergou, E. H., Gorbunov, E. & Richtárik, P. Stochastic three points method for unconstrained smooth minimization. arXiv:1902.03591 (2019).
3. Bibi, A., Bergou, E., Sener, O., Ghanem, B. & Richtárik, P. A stochastic derivative-free optimization method with importance sampling: Theory and learning to control. arXiv:1902.01272 (2019).
4. Gorbunov, E., Bibi, A., Sener, O., Bergou, E. & Richtárik, P. A stochastic derivative-free optimization method with momentum. arXiv:1905.13278 (2019).