FlipNet: Fourier Lipschitz Smooth Policy Network for Reinforcement Learning

Anonymous Author(s)
ICLR 2025

Abstract

Deep reinforcement learning (RL) is an effective method for decision-making and control tasks. However, RL-trained policies suffer from the action fluctuation problem, where consecutive actions differ significantly despite only minor variations in adjacent states. This problem causes actuator wear, safety risks, and performance degradation in real-world applications. To address it, we identify the two fundamental reasons for action fluctuation, i.e., policy non-smoothness and observation noise, and propose the Fourier Lipschitz Smooth Policy Network (FlipNet). FlipNet adopts two innovative techniques to tackle the two reasons in a decoupled manner. Firstly, we prove that the Jacobian norm is an approximation of the Lipschitz constant and introduce a Jacobian regularization technique to enhance the smoothness of the policy network. Secondly, we introduce a Fourier filter layer to deal with observation noise. The filter layer includes a trainable filter matrix that automatically extracts important observation frequencies and suppresses noise frequencies. FlipNet can be seamlessly integrated into most existing RL algorithms as an actor network. Simulated tasks on DMControl and a real-world experiment on vehicle-robot driving show that FlipNet has excellent action smoothness and noise robustness, achieving new state-of-the-art performance. The code and videos are publicly available at https://iclr-anonymous-2025.github.io/FlipNet .

TL;DR

The paper proposes FlipNet, a policy network incorporating Jacobian regularization and a Fourier filter layer. It can be used as the policy network in most actor-critic RL algorithms to obtain smoother control actions in real-world applications.

Identifying the Reasons for Action Fluctuation

Our paper identifies the two fundamental reasons that cause action fluctuation:

  • Non-smoothness of policy network
  • Existence of observation noise
Our paper then proposes two techniques to address these two reasons, respectively.
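
For intuition, the two reasons can be separated with a one-line bound (our own paraphrase, not a result stated on this page): assume the policy \pi is L_\pi-Lipschitz and the agent observes the true state s_t corrupted by additive noise \epsilon_t. Then

    \|a_t - a_{t-1}\|
      = \|\pi(s_t + \epsilon_t) - \pi(s_{t-1} + \epsilon_{t-1})\|
      \le L_\pi \left( \|s_t - s_{t-1}\| + \|\epsilon_t\| + \|\epsilon_{t-1}\| \right)

The Lipschitz factor L_\pi reflects policy smoothness and the \epsilon terms reflect observation noise, which is exactly the decoupling that FlipNet's two techniques target.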

Policy Network Structure

FlipNet incorporates two techniques:

  • Jacobian regularization
  • Fourier filter layer
The two techniques respectively tackle the two fundamental reasons causing action fluctuation; minimal sketches of both are given below.
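
A minimal sketch of how a Jacobian-norm penalty can be attached to an actor loss (our illustration in PyTorch; the function name, the coefficient lam, and the exact norm are placeholders and may differ from the released code):

    import torch

    def jacobian_penalty(policy, obs, lam=1e-3):
        """Penalize the squared Frobenius norm of d(action)/d(observation).

        A small Jacobian norm at sampled states approximates a small local
        Lipschitz constant, i.e. a smoother policy.
        """
        obs = obs.clone().requires_grad_(True)
        act = policy(obs)                                # (batch, act_dim)
        rows = []
        for i in range(act.shape[1]):
            # Gradient of the i-th action dimension w.r.t. the observation.
            g, = torch.autograd.grad(act[:, i].sum(), obs,
                                     create_graph=True, retain_graph=True)
            rows.append(g)                               # (batch, obs_dim)
        jac = torch.stack(rows, dim=1)                   # (batch, act_dim, obs_dim)
        return lam * jac.pow(2).sum(dim=(1, 2)).mean()

    # Hypothetical usage inside any actor-critic update:
    # actor_loss = rl_actor_loss + jacobian_penalty(policy, obs_batch)

And a rough sketch of a Fourier filter layer in the spirit described in the abstract (our reconstruction only: an rFFT over a short history of observations, a trainable per-frequency filter matrix, and an inverse transform; the history length and layer interface are assumptions):

    import torch
    import torch.nn as nn

    class FourierFilterLayer(nn.Module):
        """Trainable frequency-domain filter over a stack of recent observations."""

        def __init__(self, obs_dim, history_len):
            super().__init__()
            n_freq = history_len // 2 + 1                # rFFT bins along the time axis
            # One trainable gain per (frequency, observation channel).
            self.filter = nn.Parameter(torch.ones(n_freq, obs_dim))

        def forward(self, obs_history):
            # obs_history: (batch, history_len, obs_dim)
            spec = torch.fft.rfft(obs_history, dim=1)    # complex spectrum
            spec = spec * self.filter                    # reweight frequencies
            filtered = torch.fft.irfft(spec, n=obs_history.shape[1], dim=1)
            return filtered[:, -1, :]                    # denoised current observation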
FlipNet is applicable in almost all actor-critic RL algorithms, including DDPG, TD3, PPO, TRPO, SAC, and DSAC. FlipNet produces smooth control actions, which facilitates the application of RL in the real world.

Mini-Vehicle Driving Task

This is a real-world experiment on a vehicle-robot driving task. The vehicle is controlled by the RL-trained policy network, i.e. FlipNet.

A video of the driving task is available on the project page: https://iclr-anonymous-2025.github.io/FlipNet

User-friendly Packaging

The user-friendly packaging of FlipNet does not disturb the original RL algorithm, allowing it to be applied to various RL algorithms. Practitioners can use FlipNet just like an MLP, as in the sketch below. The code will be released after review.
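
A hypothetical usage sketch (the import path, class name, and constructor parameters below are illustrative placeholders; the released package may expose a different interface):

    import torch
    from flipnet import FlipNet              # hypothetical import path

    # Drop-in replacement for an MLP actor: same call signature,
    # plus knobs for the two smoothing techniques (argument names assumed).
    actor = FlipNet(obs_dim=24, act_dim=6,
                    hidden_sizes=(256, 256),
                    jacobian_coef=1e-3,       # weight of the Jacobian regularization
                    history_len=8)            # observations fed to the Fourier filter layer

    obs = torch.randn(1, 8, 24)               # batch of the 8 most recent 24-dim observations
    action = actor(obs)                        # called exactly like an MLP policy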