Efficient ML & Distributed GPU Orchestration
We publish practical research on workload optimization and orchestration across heterogeneous GPUs. Find papers, reproducible benchmarks, grants, and media coverage.

Publications
Peer-reviewed papers and preprints on efficient training, model offloading, inference latency, and GPU scheduling.
Highly efficient billion-scale AI model training and inference using affordable GPUs
ZeRO-Offload and Sentinel for transformers
DyNN-Offload for Mixture-of-Experts (MoE)
TECO-Offload on disaggregated memory
Billion-scale graph neural network
AI training based on parallelism management
Runtime Concurrency Control and Operation Scheduling
Tree structure-aware high performance inference engine
Award
Decentralized AI Computing Operating System for Accessible and Cost-Effective AI


Media & Press
Coverage of our research, open-source projects, and product releases.