INTRODUCTION

Hongshi TAN is currently a Ph.D. student at the School of Computing, National University of Singapore, supervised by Prof. Bingsheng He and Prof. Weng-Fai Wong. His research interests include high-performance computing, with special emphasis on FPGA-based heterogeneous systems for graph processing and machine learning. He also works closely with Prof. Qizhen Zhang from the University of Toronto and Prof. Gustavo Alonso from ETH Zurich (ETHZ).

He is a committee member of the NUS SoC Student Area Search Committee and serves as the Student Lab Manager of the System & Network Lab. He was responsible for the Heterogeneous Accelerated Compute Cluster (HACC) at NUS under the AMD University Program.

Hongshi was an embedded system engineer in the Dept. of Flight Control at DJI and has extensive experience in the architectural design and implementation of FPGA-CPU heterogeneous systems. He led the embedded system team that developed the world's first digital beamforming (DBF) frequency-modulated continuous-wave radar for agricultural drones. He was also interested in solving real-world problems in system security, including firmware anti-hacking and communication protection. The global navigation satellite system (GNSS) information signature scheme he developed is the most critical component for flight safety and has been adopted in millions of DJI drones around the world.

EDUCATION

School of Computing,
National University of Singapore

Jul 2022 - Present

Ph.D. Student in Computer Science

School of Computing,
National University of Singapore

Oct 2019 - Jul 2022

Master's Degree in Computer Science (Master of Computing)
Research Assistant in Department of Computer Science

School of Automotive Engineering,
Harbin Institute of Technology

Sep 2012 - Jun 2016

Bachelor's Degree in Thermal Energy and Power Engineering
Thesis: Analysis of Blade Motion Flow Field on Quad Rotor Drones Using OpenFOAM

WORK EXPERIENCE

ByteDance Ltd., Singapore

Jul 2023 - Jul 2024

Internship in Large-scale Graph Neural Network System

Shenzhen DJI Technology Co., Ltd

Jan 2016 - Mar 2019

Flight Control Embedded System Engineer (Top Five Ranked)
Leader of Radar Embedded System Team

Shanghai Engineering Center for Microsatellites

Nov 2015 - Jan 2016

Internship in Satellite Embedded System

ACADEMIC PUBLICATIONS

RidgeBridge: Random Access Optimized Interconnect Architecture for Scalable Graph Random Walks

Hongshi Tan, Yao Chen, Xinyu Chen, Qizhen Zhang, Weng-Fai Wong, and Bingsheng He.

Full paper accepted by MICRO'26.

2026-06-27

HiCAM: Accelerating Parallel Triangle Counting via Bit-Efficient Content-Addressable Memory on FPGA

Yao Chen, Feng Yu, Hongshi Tan, Xuanhua Shi, Weng-Fai Wong, Bingsheng He, and Hai Jin.

Full paper accepted by MICRO'26.

2026-06-27

Approaching Shannon Bound with Lossless LLM Weight Compression

Hongshi Tan, Yao Chen, Gustavo Alonso, Weng-Fai Wong, and Bingsheng He.

Full paper accepted by ISCA'26.

2026-06-14

Hardware-accelerated Aggregation: Unification and Specialization

Alireza Shateri, Hongshi Tan, Michael Ng, Bingsheng He, and Qizhen Zhang.

Preprint available on arXiv.

2026-06-08

MGI: A Communication Framework for Data Processing in Massive GPU Infrastructures

Di Wu, Hongshi Tan, Hanzhang Yang, Bingsheng He, and Qizhen Zhang.

Full paper accepted by VLDB'26.

2026-06-01

XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA

Feng Yu, Hongshi Tan, Yao Chen, Weng-Fai Wong, and Bingsheng He.

Full paper accepted by ISCA'26.

2026-05-07

RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs

Hongshi Tan, Yao Chen, Xinyu Chen, Qizhen Zhang, Cheng Chen, Weng-Fai Wong, and Bingsheng He.

Full paper accepted by HPCA'26.

2025-11-08

Efficient Graph Data Access for Out-of-Memory GPU Streaming Graph Processing

Qiange Wang, Yongze Yan, Hongshi Tan, Cheng Chen, Cheng Zhao, Jiaming Tian, Jiaxin Jiang

Full paper accepted by VLDB'25.

2025-08-12

Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing

Feng Yu, Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, and Weng-Fai Wong.

Full paper accepted by SIGMOD'25.

2025-01-31

Towards a Better 16-bit Number Representation for Training Neural Networks

Himeshi De Silva, Hongshi Tan, Nhut-Minh Ho, John L Gustafson, Weng-Fai Wong.

Full paper accepted by CoNGA'23.

2023-03-01

LightRW: FPGA Accelerated Graph Dynamic Random Walks

Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, and Weng-Fai Wong.

Full paper accepted by SIGMOD'23 (Acceptance rate: 28%).

2022-11-15

ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines

Xinyu Chen, Yao Chen, Feng Cheng, Hongshi Tan, Bingsheng He, and Weng-Fai Wong.

Full paper accepted by MICRO'22 (Acceptance rate: 22%).

2022-07-21

ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS

Xinyu Chen, Feng Cheng, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong and Deming Chen.

Invited journal paper accepted by ACM TRETS.

2022-03-05

Skew-Oblivious Data Routing for Data-Intensive Applications on FPGAs with HLS.

Xinyu Chen, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong and Deming Chen.

Full paper accepted by DAC'21 (Acceptance rate: 23%).

2021-12-05

ThundeRiNG: Generating Multiple Independent Random Number Sequences on FPGAs.

Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, and Weng-Fai Wong.

Full paper accepted by ICS'21 (Acceptance rate: 25%).

2021-06-14

ThunderGP: HLS-based Graph Processing Framework on FPGAs.

Xinyu Chen, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong and Deming Chen.

Full paper accepted by FPGA'21 (Acceptance rate: 20%).

2021-02-27

HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy.

Yao Chen, Xin Long, Jiong He, Yuhang Chen, Hongshi Tan, Zhenxiang Zhang, Marianne Winslett and Deming Chen.

Demo paper accepted by ICDCS'20.

2020-12-05

MAJOR PATENTS

Time Synchronization Method, Device and System, and Storage Medium.

Zebin Fang, Hongshi Tan and Wenxin Hu.

U.S. Patent App. 17/326,316

2021-09-09

Method and Apparatus for Detecting Radar Wave Offset.

Huangjian Zhu, Bin Huang, Hongshi Tan, Wenxin Hu, and Chunming Wang.

U.S. Patent App. 17/090,263

2021-07-29

Continuous Wave Radar Terrain Prediction Method, Device, System, and Unmanned Aerial Vehicle.

Huangjian Zhu, Chunming Wang, Di Gao and Hongshi Tan.

U.S. Patent App. 17/183,315

2021-07-01

PROJECTS

INTERESTS

POSTS

June 27, 2026

RidgeBridge: Random Access Optimized Interconnect Architecture for Scalable Graph Random Walks

Abstract Graph Random Walks (GRWs) are fundamental to applications such as recommendation, fraud detection, and graph-enhanced AI, creating an urgent need for distributed solutions that can keep pace with the...

June 27, 2026

HiCAM: Accelerating Parallel Triangle Counting via Bit-Efficient Content-Addressable Memory on FPGA

Abstract Triangle counting is a fundamental primitive in large-scale graph analytics, yet its execution is bottlenecked by irregular memory access patterns and limited parallelism on conventional architectures. Prior FPGA-based CAM...

June 14, 2026

Approaching Shannon Bound with Lossless LLM Weight Compression

Abstract Large language models (LLMs) now scale to trillions of parameters, driving weight storage into the terabyte regime and creating an acute mismatch with GPU memory capacity. Although lossless compression...

June 08, 2026

Hardware-accelerated Aggregation: Unification and Specialization

Abstract The high efficiency of domain-specific hardware has sparked substantial interest in adopting accelerators in data analytics systems. Among many choices, GPUs and FPGAs thrived as two popular solutions due...

June 01, 2026

MGI: A Communication Framework for Data Processing in Massive GPU Infrastructures

Abstract This paper presents MGI, a general communication framework for performing data processing tasks in massive GPU infrastructures. Inter-GPU data transfer performance is crucial to multi-GPU data processing, and existing...
