A research paper by students from Shanghai Jiao Tong University Global College (SJTUGC, abbreviated as GC hereafter) has recently been accepted by the International Conference on Machine Learning (ICML), one of the world’s premier conferences in machine learning. Titled “Secure Multi-agent Reinforcement Learning for Service Systems with Affinity and Byzantine Nodes: Stability Analysis and Protection Design,” the paper was co-first authored by GC undergraduate student Yifan Jiang and doctoral student Jiasheng Pan, under the supervision of Associate Professor Li Jin. The research was supported by the National Natural Science Foundation of China, with collaboration from Mengtian Li of Shanghai University.

ICML is widely recognized as one of the most influential conferences in the field of machine learning, dedicated to advancing frontier research in artificial intelligence. The conference is renowned for presenting cutting-edge studies across machine learning, statistics, and data science, with applications spanning computer vision, computational biology, speech recognition, robotics, and other key domains.

The study addresses the challenge of “Byzantine attacks,” a concept derived from the classic Byzantine Generals Problem in computer science. In the original problem, multiple generals must coordinate an attack through messengers, while some traitors deliberately send conflicting information to disrupt consensus. In modern AI systems, Byzantine attacks manifest as compromised or malfunctioning nodes that intentionally transmit misleading gradients or parameters during collaborative learning. Such “internal corruption” is particularly difficult to detect because it directly undermines the communication channels upon which intelligent agents rely for cooperation.

The research focuses on three representative scenarios: large language model (LLM) inference routing, edge computing scheduling, and intelligent manufacturing logistics. These systems share two important characteristics. First, task queues can grow without predefined limits. Second, tasks and servers often exhibit implicit “affinity” relationships. For example, a GPU that has cached a certain type of request can process similar tasks more efficiently, while mobile robots repeatedly serving the same warehouse can reduce switching costs. Once these affinity structures are maliciously disrupted, system efficiency deteriorates rapidly and overall stability may collapse.

To defend against malicious nodes while maintaining system stability and efficiency, the team designed two complementary algorithmic mechanisms. The first is synchronous policy mixing, in which each intelligent agent combines a learned strategy with a predefined “safe strategy,” such as always assigning tasks to the server with the shortest queue. As the system becomes riskier, the weight of the safe strategy increases, functioning like a “safety airbag” for the learning process. The second mechanism is W-MSR (Weighted Mean Subsequence Reduced) resilient consensus. After receiving values from neighboring nodes, each node ranks the inputs, discards the most extreme high and low values, and updates itself using only the average of the remaining data, effectively filtering out malicious interference. Working together, the two mechanisms suppress divergence at the behavioral level while filtering corrupted information at the communication level, so that learning can still converge under attack.
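The two mechanisms can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the uniform averaging weights, and the parameter f (the number of extreme values trimmed on each side) are assumptions made for illustration. The trimming rule follows the standard W-MSR convention of discarding only values that differ from the node's own value, and the way the safe-strategy weight grows with risk is left to the caller because the article does not specify it.

```python
def join_shortest_queue(queue_lengths):
    """Safe strategy from the text: send the task to the shortest queue."""
    shortest = min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])
    return [1.0 if i == shortest else 0.0 for i in range(len(queue_lengths))]


def mix_policies(learned_probs, safe_probs, safe_weight):
    """Blend a learned routing distribution with the safe one.

    safe_weight is a hypothetical parameter in [0, 1]; a larger value
    means the agent leans more heavily on the safe strategy.
    """
    if not 0.0 <= safe_weight <= 1.0:
        raise ValueError("safe_weight must lie in [0, 1]")
    return [
        (1.0 - safe_weight) * p + safe_weight * q
        for p, q in zip(learned_probs, safe_probs)
    ]


def wmsr_update(own_value, neighbor_values, f):
    """One W-MSR-style step with uniform weights (an assumption).

    Among neighbor values strictly above the node's own value, the f
    largest are dropped (all of them if there are fewer than f); the same
    is done for values strictly below. The node then averages its own
    value with whatever remains.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    larger = sorted(v for v in neighbor_values if v > own_value)
    smaller = sorted(v for v in neighbor_values if v < own_value)
    equal = [v for v in neighbor_values if v == own_value]
    kept_larger = larger[: max(len(larger) - f, 0)]
    kept_smaller = smaller[min(f, len(smaller)):]
    kept = [own_value] + equal + kept_smaller + kept_larger
    return sum(kept) / len(kept)


if __name__ == "__main__":
    safe = join_shortest_queue([5, 2, 9])
    print(mix_policies([0.7, 0.2, 0.1], safe, safe_weight=0.5))
    # Two of the five neighbors report wildly wrong values; both are trimmed.
    print(wmsr_update(1.0, [0.9, 1.1, 100.0, -50.0, 1.2], f=1))
```

In this sketch a single outlier on each side cannot move the node's estimate, which reflects the filtering effect described above. Choosing f larger than the true number of faulty neighbors discards more honest information than necessary.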

On the theoretical side, the research overcomes the limitations of existing approaches that typically rely on bounded state spaces or finite basis functions. For the first time, the team established almost sure convergence guarantees for decentralized multi-agent reinforcement learning under unbounded state spaces. By constructing an exponential Lyapunov function, the study proved the geometric ergodicity of the policy mixing mechanism. Based on the Poisson equation and a stochastic approximation differential inclusion framework, the researchers further characterized the joint convergence conditions of resilient consensus and policy optimization under a two-timescale setting. The study also provided theoretical descriptions of convergence neighborhoods and non-asymptotic relative errors under Byzantine perturbations. The analysis treats system stability, learning convergence, and resilience in adversarial environments within a single rigorous mathematical framework, providing a theoretical foundation for future studies in related areas.
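For readers unfamiliar with the terminology, the following is a generic textbook form of a geometric drift condition with an exponential Lyapunov function. It is not the paper's exact statement: the symbols (queue-length vector x, rate θ, contraction factor λ, constant b, finite set C) are assumptions chosen for illustration.

```latex
% Exponential Lyapunov function of the queue-length vector x = (x_1, ..., x_n)
V(x) = \exp\Big(\theta \sum_{i=1}^{n} x_i\Big), \qquad \theta > 0

% Geometric (Foster-Lyapunov) drift condition for the queue process X_t
\mathbb{E}\big[\, V(X_{t+1}) \mid X_t = x \,\big] \;\le\; \lambda\, V(x) + b\, \mathbf{1}_{C}(x),
\qquad \lambda \in (0,1),\quad b < \infty
```

For an irreducible, aperiodic Markov chain on a countable state space, a condition of this form with a finite set C implies geometric ergodicity: outside C the Lyapunov function contracts in expectation by the factor λ, so long queues are pushed back toward C at an exponential rate.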

Experiments were conducted across all three application scenarios with attackers accounting for 20 percent of system nodes. Without protection mechanisms, service times deteriorated sharply and queues expanded almost without bound. After enabling the proposed defense algorithms, service times returned to levels only slightly above normal conditions, while affinity structures between tasks and servers recovered from severe disorder to clear organization. As AI computing networks and industrial Internet of Things systems move toward large-scale decentralized collaboration, learning mechanisms capable of tolerating malicious internal nodes are becoming increasingly essential. The study offers not only a solution for specific scenarios, but also an integrated design framework that combines stability guarantees with resilient learning.

Author Introduction

Yifan Jiang
Undergraduate student enrolled in 2022

During his undergraduate studies, Jiang published two CCF-A papers and received the Outstanding Graduate award from Shanghai Jiao Tong University. He will begin his master’s studies at GC in September 2026.

Jiasheng Pan
Doctoral student enrolled in 2024

His research focuses on the theory of multi-agent reinforcement learning.

Li Jin
Associate Professor

Jin is a nationally recognized young scholar and a senior member of the IEEE. He received his bachelor’s degree from Shanghai Jiao Tong University and his Ph.D. from the Massachusetts Institute of Technology, and previously taught at New York University. His research interests include network control theory and its applications in intelligent transportation systems and computing clusters.