Jiayi Yuan

I write efficient code at xAI.

About me

I’m Jiayi Yuan ([dʒa-ˈi:], 袁加熠). I received my Ph.D. from the Department of Computer Science at Rice University, advised by Dr. Xia "Ben" Hu. I aim to build efficient machine learning algorithms and systems (MLSys) through methods like quantization, sparsity, and re-parameterization while enhancing system robustness and security. My research applications span language, vision, time series, graph, and healthcare domains. Previously, I worked on:

Efficiency problems of long-context LLMs. [BLASST] [KIVI] [KVBench] [Stop Overthinking] [AutoL2S]
LLM post-training: fine-tuning, RL, and evaluation. [Give Me FP32] [The Science] [DHP]
LLM agents, LLM routing, and LLM safety. [Honeypot] [Rethink Router] [RouterArena] [Taylor Unswift] [LoRATK]

Earlier, I received my bachelor’s degree in computer science from Tsinghua University, where I also studied statistics as a minor.

I lived in Beijing for 22 years and in Houston for 4 years.

Education & Experience

Internship, 2025, NVIDIA
Internship, 2024, Amazon
Ph.D. in Computer Science, 2022 - 2025. Rice University
B.Eng. in Computer Science and Technology, 2017 - 2021. Tsinghua University

Highlights

BLASST has been integrated into TensorRT-LLM and NVIDIA Model Optimizer.
"Give me FP32" studies nondeterminism, which has become a heated topic; e.g., it was recently featured in a blog post by Thinking Machines Lab.
KIVI largely inspired KV cache quantization in Hugging Face and is integrated into Transformers. Full code is available here.
Rice News: Large language models could be the key to better patient-trial matching - Rice CS Ph.D. student wins AMIA Best Student Paper Award.
Rice News: Rice CS' Xia Ben Hu investigates LLMs and likely applications.

News

BLASST was accepted by MLSys 2026 and RouterArena by ICLR 2026. A good ending!
"Give Me FP32 or Give Me Death" got accepted to NeurIPS 2025 as an Oral (77 out of 21575 submissions) — numerical precision errors have become a hot topic! Code & Talk
I got three papers accepted at NAACL, ACL, and EMNLP 2025 each; I wish I could visit Albuquerque, Vienna, and Suzhou this year.
One survey on efficient LLM reasoning has been accepted by TMLR! Feel free to UPVOTE
Check out our recent insights and discussions on LLM evaluation
Two papers accepted by EMNLP 2024 (Main + Findings). See you in Miami!
Check out our recent benchmarking works on KV Cache compression, time series foundation models and LLM evaluation!
KIVI and SEED-GNN got accepted by ICML 2024. See you in Vienna!
Our LLM-PTM paper was selected as a best student paper at AMIA 2023
One paper accepted by NeurIPS 2023
Joined Microsoft Accelerating Foundation Models Research program
...