Ziyi Zhang

I am Ziyi Zhang, an incoming research scientist at ByteDance Seed, where I will work on building better infrastructure for large language models.

I was a PhD student at the University of Chicago, where I received my Master’s degree and had the pleasure of being advised by Prof. Hank Hoffmann. Before that, I completed my undergraduate studies at the University of Wisconsin-Madison, where I was advised by Prof. Shivaram Venkataraman on systems research and by Prof. Dieter van Melkebeek on competitive programming.

news

Apr 22, 2024	My teammates and I won 8th place and a silver medal at the 2023 (46th) ICPC World Finals in Luxor, Egypt! (Check out the school news (English) and my reflection (Chinese / English, AI-translated).)
Apr 15, 2022	I am going to UChicago for my PhD this fall!

selected papers

arXiv, 2025

SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding

Ziyi Zhang, Ziheng Jiang, Chengquan Jiang, Menghan Yu, Size Zheng, Haibin Lin, Henry Hoffmann, and Xin Liu

2025

NSDI’24

GRACE: Loss-Resilient Real-Time Video through Neural Codecs

Yihua Cheng, Ziyi Zhang, Hanchen Li, Anton Arapin, Yue Zhang, Qizheng Zhang, Yuhan Liu, Kuntai Du, Xu Zhang, Francis Y. Yan, Amrita Mazumdar, Nick Feamster, and Junchen Jiang

In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), Apr 2024

SOSP’23

Bagpipe: Accelerating Deep Recommendation Model Training

Saurabh Agarwal, Chengpo Yan, Ziyi Zhang, and Shivaram Venkataraman

In Proceedings of the 29th Symposium on Operating Systems Principles (SOSP 23), Koblenz, Germany, Oct 2023


Deep learning-based recommendation models (DLRM) are widely used in several business-critical applications. Training such recommendation models efficiently is challenging because they contain billions of embedding-based parameters, leading to significant overheads from embedding access. By profiling existing systems for DLRM training, we observe that around 75% of the iteration time is spent on embedding access and model synchronization. Our key insight in this paper is that embedding access has a specific structure which can be used to accelerate training. We observe that embedding accesses are heavily skewed, with around 1% of embeddings representing more than 92% of total accesses. Further, we also observe that during offline training we can look ahead at future batches to determine which embeddings will be needed at what iteration in the future. Based on these insights, we develop Bagpipe, a system for training deep recommendation models that uses caching and prefetching to overlap remote embedding accesses with the computation. We design an Oracle Cacher, a new component that uses a lookahead algorithm to generate optimal cache update decisions while providing strong consistency guarantees against staleness. We also design a logically replicated, physically partitioned cache and show that our design can reduce synchronization overheads in a distributed setting. Finally, we propose a disaggregated system architecture and show that our design can enable low-overhead fault tolerance. Our experiments using three datasets and four models show that Bagpipe provides a speedup of up to 5.6x compared to state-of-the-art baselines, while providing the same convergence and reproducibility guarantees as synchronous training.