Efficient ML & Distributed GPU Orchestration

We publish practical research on workload optimization and orchestration across heterogeneous GPUs. Find papers, reproducible benchmarks, grants, and media coverage here.


Publications

Peer-reviewed papers and preprints on efficient training, model offloading, inference latency, and GPU scheduling.

Highly efficient billion-scale AI model training and inference using affordable GPUs

ZeRO-Offload and Sentinel for transformers

USENIX ATC’21, HPCA’21

DyNN-Offload for Mixture-of-Experts (MoE)

HPCA’24

TECO-Offload on disaggregated memory

SC’24

Billion-scale graph neural network

ASPLOS’23

AI training based on parallelism management

Runtime Concurrency Control and Operation Scheduling

IPDPS’21

Tree structure-aware, high-performance inference engine

EuroSys’21

AI training using novel hardware

Energy-efficient training on GPU-FPGA accelerators

ICS’21

Processing-in-memory for energy-efficient DNN

MICRO’18
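
For readers who want a concrete starting point for the offloading ideas listed above, the sketch below shows one common way to enable ZeRO-Offload through DeepSpeed, keeping optimizer state in host memory so larger models fit on a single affordable GPU. The placeholder model, batch size, and learning rate are illustrative assumptions, not the exact setups used in the papers.

```python
# Minimal sketch: CPU offloading of optimizer state with DeepSpeed's
# ZeRO-Offload. Generic illustration only; not the papers' configuration.
import torch
import deepspeed

# A small placeholder model; any torch.nn.Module works here.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_batch_size": 8,                        # illustrative value
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                               # partition optimizer state and gradients
        "offload_optimizer": {"device": "cpu"},   # keep optimizer state in host memory
    },
}

# deepspeed.initialize wraps the model in an engine that runs the
# offloaded optimizer steps on the CPU transparently during training.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```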

Award

Decentralized AI Computing Operating System for Accessible and Cost-Effective AI

NSF

Media & Press

Coverage of our research, open-source projects, and product releases.

1. Microsoft And The University Of California, Merced Introduces ZeRO-Offload, A Novel Heterogeneous DeepLearning...

2. Fit More and Train Faster With ZeRO via DeepSpeed and FairScale

3. What’s New in HPC Research: Galaxies, Fugaku, Electron Microscopes & More

4. Microsoft Releases AI Training Library ZeRO-3 Offload