MGI: A Communication Framework for Data Processing in Massive GPU Infrastructures

Di Wu; Hongshi Tan; Hanzhang Yang; Bingsheng He; Qizhen Zhang

doi:10.14778/3819518.3819549

Proceedings of the VLDB Endowment, Vol. 19, No. 9. VLDB'26.
Written by Di Wu, Hongshi Tan, Hanzhang Yang, Bingsheng He, and Qizhen Zhang on June 01, 2026.

Abstract

This paper presents MGI, a general communication framework for performing data processing tasks in massive GPU infrastructures. Inter-GPU data transfer performance is crucial to multi-GPU data processing, and existing solutions repeatedly implement the same set of communication optimizations. MGI identifies these techniques and applies them judiciously behind a simple interface. Enabling MGI are (1) a central controller that models relevant hardware resources as an annotated graph and automates infrastructure-level optimizations to construct transfer plans and (2) a scalable data plane where buffers and executors are carefully designed to incorporate device- and link-level optimizations to execute data transfers efficiently. Our experiments on a variety of GPU infrastructures and workloads show that MGI significantly improves multi-GPU data processing performance compared to existing frameworks. This work is open-sourced on GitHub.
