About Me
Hi there! I'm Jinyang Su, known online as susun. I'm a systems engineer working on LLM inference and infrastructure.
what i do now
I work as an LLM inference engineer, building production serving infrastructure. My main project is Pegaflow — a distributed KV cache system with RDMA support for LLM inference. The daily work is a mix of storage architecture, networking, and GPU-side optimization — figuring out how to move data as fast as possible so models can serve at scale.
On the side, I'm building pegainfer — a from-scratch LLM inference engine written in Rust with hand-written CUDA kernels.
where i come from
Before LLM inference, I was a database storage engineer. I worked on storage engines, write-ahead logs, compaction strategies, and all the low-level plumbing that makes databases reliable. That background turns out to be surprisingly relevant: distributed caching, eviction algorithms, async I/O, and memory management are just as central to inference serving as they are to databases.
what i think about
I care about controlling complexity. The essence of programming is managing complexity, and I've learned (sometimes the hard way) that understanding must come before delegation — whether to a teammate or an AI coding agent. I use OKRs to keep myself focused on what matters, and I try to remind myself: if everything is equally important, nothing is.
this blog
Writing forces clarity. This blog is where I write about systems engineering, inference optimization, RDMA, storage internals, and lessons from building production systems. I write primarily for my future self — to solidify understanding and document the journey. If others find it useful, that's a bonus.
get in touch
You can find my work on GitHub. Feel free to reach out if you want to chat about systems, inference, or anything in between.