I am Chuheng (Seal) Zhang

Currently, I am a Researcher/Post-doc at Microsoft Research Asia. My research interests lie in reinforcement learning (RL) methods in the era of foundation models, including RL for foundation models (e.g., alignment, RLHF) and foundation models for control (e.g., embodied AI). I am enthusiastic about applying foundation models to practical industrial problems, and I collaborate widely with industrial teams across fields including quantitative investment, recommender systems, dialog systems, and inventory management. I have published 30+ papers at ICLR, ICML, AAAI, WWW, SIGIR, CIKM, ICDM, etc., and I serve as a reviewer for top AI conferences and journals. I received the Best Student Paper Award runner-up at ICDM-22, as well as several competition awards, including the ABC Credit Risk Prediction Contest and the UBIQUANT Quantitative Trading Contest.

Experiences

Researcher

2022 - Present
Microsoft Research

As a researcher at Microsoft, I focus mainly on applying reinforcement learning to inventory management, RLHF, and embodied AI. I collaborate closely with Dr. Li Zhao and Dr. Jiang Bian.

Projects

I am fascinated by the goal of artificial general intelligence (AGI), and I believe in the role reinforcement learning (RL) can play on the path to AGI. RL is how a human baby learns from birth, and it is also how I learn: staying curious, exploring, failing fast, and learning fast.

Embodied AI - I am a core contributor to IGOR and Villa-x, and I am interested in learning latent action models (LAMs).
RL for LLMs - We have developed PF-PPO to improve the popular PPO-based RLHF algorithm, and AdaptiveStep to efficiently learn and use process-level reward models.
RL for Industry - I have studied applications of RL to inventory management using Whittle index and primal-dual approaches.

Publications

See my publications on Google Scholar.