-
vLLM
High-throughput and memory-efficient inference and serving engine for LLMs
python · cuda · pytorch
-
TileLang
DSL for high-performance GPU/CPU/Accelerator kernels
python · compiler
-
SGLang
High-performance serving framework for LLMs and multimodal models
python · inference
-
DeepGEMM
Clean and efficient FP8 GEMM kernels with fine-grained scaling
cuda · fp8
-
FlashMLA
Efficient MLA decoding kernels
c++ · cuda
-
nanochat
Train a 500M+ parameter LLM end-to-end for under $100
python · llm training