INTRODUCTION



Hongshi TAN is currently a Ph.D. student at the School of Computing, National University of Singapore, supervised by Prof. Bingsheng He and Prof. Weng-Fai Wong. His research interests include high-performance computing, with a special emphasis on FPGA-based heterogeneous systems for graph processing and machine learning. He also works closely with Prof. Qizhen Zhang from the University of Toronto and Prof. Gustavo Alonso from ETH Zurich (ETHZ).

He is a member of the NUS SoC Student Area Search Committee and serves as the Student Lab Manager of the System & Network Lab. He was responsible for the Heterogeneous Accelerated Compute Cluster (HACC) at NUS under the AMD University Program.

Hongshi was previously an embedded systems engineer in the Department of Flight Control at DJI, where he gained extensive experience in the architectural design and implementation of FPGA-CPU heterogeneous systems. He led the embedded systems team that developed the world's first digital beamforming (DBF) frequency-modulated continuous-wave (FMCW) radar for agricultural drones. He was also interested in solving real-world problems in system security, including firmware anti-hacking and communication protection. The Global Navigation Satellite System (GNSS) information signature scheme he developed is the most critical component for flight safety and has been adopted in millions of DJI drones around the world.


RECENT NEWS

EDUCATION

WORK EXPERIENCE

ACADEMIC PUBLICATIONS



Approaching Shannon Bound with Lossless LLM Weight Compression
Hongshi Tan, Yao Chen, Gustavo Alonso, Weng-Fai Wong, and Bingsheng He.
Full paper accepted by ISCA'26.

Hardware-accelerated Aggregation: Unification and Specialization
Alireza Shateri, Hongshi Tan, Michael Ng, Bingsheng He, and Qizhen Zhang.
Preprint available on arXiv.

MGI: A Communication Framework for Data Processing in Massive GPU Infrastructures
Di Wu, Hongshi Tan, Hanzhang Yang, Bingsheng He, and Qizhen Zhang.
Full paper accepted by VLDB'26.

XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA
Feng Yu, Hongshi Tan, Yao Chen, Weng-Fai Wong, and Bingsheng He.
Full paper accepted by ISCA'26.

RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs
Hongshi Tan, Yao Chen, Xinyu Chen, Qizhen Zhang, Cheng Chen, Weng-Fai Wong, and Bingsheng He.
Full paper accepted by HPCA'26.

Efficient Graph Data Access for Out-of-Memory GPU Streaming Graph Processing
Qiange Wang, Yongze Yan, Hongshi Tan, Cheng Chen, Cheng Zhao, Jiaming Tian, and Jiaxin Jiang.
Full paper accepted by VLDB'25.

Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing
Feng Yu, Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, and Weng-Fai Wong.
Full paper accepted by SIGMOD'25.

Towards a Better 16-bit Number Representation for Training Neural Networks
Himeshi De Silva, Hongshi Tan, Nhut-Minh Ho, John L. Gustafson, and Weng-Fai Wong.
Full paper accepted by CoNGA'23.

LightRW: FPGA Accelerated Graph Dynamic Random Walks
Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, and Weng-Fai Wong.
Full paper accepted by SIGMOD'23 (Acceptance rate: 28%).

ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines
Xinyu Chen, Yao Chen, Feng Cheng, Hongshi Tan, Bingsheng He, and Weng-Fai Wong.
Full paper accepted by MICRO'22 (Acceptance rate: 22%).

ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS
Xinyu Chen, Feng Cheng, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, and Deming Chen.
Invited journal paper accepted by ACM TRETS.

Skew-Oblivious Data Routing for Data-Intensive Applications on FPGAs with HLS
Xinyu Chen, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, and Deming Chen.
Full paper accepted by DAC'21 (Acceptance rate: 23%).

ThundeRiNG: Generating Multiple Independent Random Number Sequences on FPGAs
Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, and Weng-Fai Wong.
Full paper accepted by ICS'21 (Acceptance rate: 25%).

ThunderGP: HLS-based Graph Processing Framework on FPGAs
Xinyu Chen, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, and Deming Chen.
Full paper accepted by FPGA'21 (Acceptance rate: 20%).

HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy
Yao Chen, Xin Long, Jiong He, Yuhang Chen, Hongshi Tan, Zhenxiang Zhang, Marianne Winslett, and Deming Chen.
Demo paper accepted by ICDCS'20.

PROJECTS



INTERESTS




POSTS


Approaching Shannon Bound with Lossless LLM Weight Compression

Abstract: Large language models (LLMs) now scale to trillions of parameters, driving weight storage into the terabyte regime and creating an acute mismatch with GPU memory capacity. Although lossless compression...

Hardware-accelerated Aggregation: Unification and Specialization

Abstract: The high efficiency of domain-specific hardware has sparked substantial interest in adopting accelerators in data analytics systems. Among many choices, GPUs and FPGAs thrived as two popular solutions due...

MGI: A Communication Framework for Data Processing in Massive GPU Infrastructures

Abstract: This paper presents MGI, a general communication framework for performing data processing tasks in massive GPU infrastructures. Inter-GPU data transfer performance is crucial to multi-GPU data processing, and existing...

XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA

Abstract: The widespread adoption of mixed-precision quantization in large language models (LLMs) has created demand for hardware that can efficiently perform multiply-accumulate (MAC) operations across mixed datatypes and switch datatypes...

RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs

Abstract: Graph Random Walks (GRWs) offer efficient approximations of key graph properties and have been widely adopted in many applications. However, GRW workloads are notoriously difficult to accelerate due to...