INTRODUCTION



Hongshi TAN is currently a Ph.D. student at the School of Computing, National University of Singapore, supervised by Prof. Bingsheng He and Prof. Weng-Fai Wong. His research interests include high-performance computing, with special emphasis on FPGA-based heterogeneous systems for graph processing and graph representation learning. He is responsible for the Heterogeneous Accelerated Compute Cluster at NUS (HACC) under the AMD University Program.

Hongshi was an embedded system engineer in the Dept. of Flight Control at DJI and has extensive experience in the architectural design and implementation of FPGA-CPU heterogeneous systems. He led the embedded system team that developed the world's first digital beamforming (DBF) frequency-modulated continuous-wave radar for agricultural drones. He was also interested in solving real-world problems in system security, including firmware anti-hacking and communication protection. The global navigation satellite system (GNSS) information signature scheme he developed is the most critical component for flight safety and has been adopted in millions of DJI drones around the world.

                             

ACADEMIC PUBLICATIONS



RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs
Hongshi Tan, Yao Chen, Xinyu Chen, Qizhen Zhang, Cheng Chen, Weng-Fai Wong, Bingsheng He
Full paper accepted by HPCA'26.

Efficient Graph Data Access for Out-of-Memory GPU Streaming Graph Processing
Qiange Wang, Yongze Yan, Hongshi Tan, Cheng Chen, Cheng Zhao, Jiaming Tian, Jiaxin Jiang
Full paper accepted by VLDB'25.

Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing
Feng Yu, Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong
Full paper accepted by SIGMOD'25.

Towards a Better 16-bit Number Representation for Training Neural Networks
Himeshi De Silva, Hongshi Tan, Nhut-Minh Ho, John L. Gustafson, Weng-Fai Wong
Full paper accepted by CoNGA'23.

LightRW: FPGA Accelerated Graph Dynamic Random Walks
Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong
Full paper accepted by SIGMOD'23 (Acceptance rate: 28%).

ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines
Xinyu Chen, Yao Chen, Feng Cheng, Hongshi Tan, Bingsheng He, Weng-Fai Wong
Full paper accepted by MICRO'22 (Acceptance rate: 22%).

ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS
Xinyu Chen, Feng Cheng, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, Deming Chen
Invited journal paper accepted by ACM TRETS.

Skew-Oblivious Data Routing for Data-Intensive Applications on FPGAs with HLS
Xinyu Chen, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, Deming Chen
Full paper accepted by DAC'21 (Acceptance rate: 23%).

ThundeRiNG: Generating Multiple Independent Random Number Sequences on FPGAs
Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong
Full paper accepted by ICS'21 (Acceptance rate: 25%).

ThunderGP: HLS-based Graph Processing Framework on FPGAs
Xinyu Chen, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, Deming Chen
Full paper accepted by FPGA'21 (Acceptance rate: 20%).

HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy
Yao Chen, Xin Long, Jiong He, Yuhang Chen, Hongshi Tan, Zhenxiang Zhang, Marianne Winslett, Deming Chen
Demo paper accepted by ICDCS'20.

POSTS


RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs

Abstract: Graph Random Walks (GRWs) offer efficient approximations of key graph properties and have been widely adopted in many applications. However, GRW workloads are notoriously difficult to accelerate due to...

Efficient Graph Data Access for Out-of-Memory GPU Streaming Graph Processing

Abstract: Leveraging GPUs’ high parallelism can significantly improve the real-time computation efficiency of streaming graph processing. However, when a large-scale graph exceeds GPU memory capacity, CPU-GPU cooperative processing often results...

Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing

Abstract: Efficient graph processing is critical in various modern applications, such as social network analysis, recommendation systems, and large-scale data mining. Traditional single-FPGA systems struggle to handle the increasing size...

Towards a Better 16-bit Number Representation for Training Neural Networks

Abstract: Error resilience in neural networks has allowed for the adoption of low-precision floating-point representations for mixed-precision training to improve efficiency. Although the IEEE 754 standard had long defined a...

LightRW: FPGA Accelerated Graph Dynamic Random Walks

Abstract: Graph dynamic random walks (GDRWs) have recently emerged as a powerful paradigm for graph analytics and learning applications, including graph embedding and graph neural networks. Despite the fact that...