Dr. Leo

Embracing the Future of AGI!

About Me

I'm Leo, an AI infrastructure architect focused on pushing the limits of modern hardware scalability—bridging GPUs, interconnects, and distributed systems to unlock their full potential. My work isn't just about performance tuning; it's about understanding the workarounds, bottlenecks, and structural shifts that define this technological era. Beyond engineering, I'm personally drawn to the deeper, human questions behind these changes—how they reshape not only computation, but our relationship to intelligence itself.

Four Layers of AI Infrastructure

1. GPU & NVLink

Modern GPUs provide immense compute power, but their true value emerges when linked via NVLink into coherent clusters. This high-bandwidth, low-latency interconnect transforms isolated devices into a unified compute fabric—critical for training large-scale models like LLMs. From my perspective, this trend marks a fundamental shift in computing: from linear pipelines to two-dimensional, matrix-like computation, where performance depends not just on raw power, but on the architecture of connections themselves.
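
For a concrete view of that fabric, here is a minimal sketch (assuming a multi-GPU host with CUDA-enabled PyTorch) that prints which GPU pairs can reach each other over a direct peer-to-peer path; on NVLink-connected nodes, those pairs are exactly where collectives avoid the PCIe/host detour. On such a node, nvidia-smi topo -m reports the same relationships at the physical-link level.

    # Probe peer-to-peer (P2P) connectivity between GPUs on one node.
    # Assumes a multi-GPU host with CUDA-enabled PyTorch installed;
    # NVLink-linked pairs typically report P2P access as available.
    import torch

    def p2p_matrix():
        n = torch.cuda.device_count()
        for src in range(n):
            row = []
            for dst in range(n):
                if src == dst:
                    row.append(" self")
                else:
                    ok = torch.cuda.can_device_access_peer(src, dst)
                    row.append("  P2P" if ok else "   --")
            print(f"GPU{src}:" + "".join(row))

    if __name__ == "__main__":
        if torch.cuda.is_available():
            p2p_matrix()
        else:
            print("No CUDA devices visible on this host.")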

2. RDMA & Networking

Traditional networking protocols like TCP/IP were never designed for the demands of modern AI. RDMA fundamentally redefines this layer by enabling memory-to-memory transfers without CPU involvement—achieving microsecond latencies and full link saturation at 200 Gbps+. But more than a performance hack, RDMA represents a shift toward spatial communication models. In two-dimensional computing, data is no longer passed linearly, but flows across a fabric—emphasizing topology, synchronization, and bandwidth symmetry as first-class design principles.
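
As a hedged sketch rather than a verbs-level RDMA program: in practice, training jobs usually reach the RDMA fabric through a collective library such as NCCL. The snippet below points a torch.distributed worker at an InfiniBand/RoCE NIC via NCCL environment variables; the HCA name mlx5_0 and the rendezvous address are placeholders for a specific cluster.

    # Sketch: initialize a torch.distributed process group over an RDMA fabric.
    # Assumes CUDA-enabled PyTorch built with NCCL and an InfiniBand/RoCE NIC;
    # the HCA name ("mlx5_0") and rendezvous address are placeholders.
    import os
    import torch
    import torch.distributed as dist

    os.environ.setdefault("NCCL_IB_DISABLE", "0")     # allow InfiniBand/RoCE transport
    os.environ.setdefault("NCCL_IB_HCA", "mlx5_0")    # pin traffic to a specific HCA (placeholder)
    os.environ.setdefault("NCCL_NET_GDR_LEVEL", "2")  # prefer GPUDirect RDMA when topology allows

    def init_worker(rank: int, world_size: int):
        # The rendezvous endpoint is control-plane only; bulk data rides the fabric.
        dist.init_process_group(
            backend="nccl",
            init_method="tcp://10.0.0.1:29500",   # placeholder address
            rank=rank,
            world_size=world_size,
        )
        torch.cuda.set_device(rank % torch.cuda.device_count())
        # The all-reduce below runs over NCCL rings/trees on the RDMA fabric,
        # with the host CPU off the data path.
        t = torch.ones(1, device="cuda")
        dist.all_reduce(t)
        dist.destroy_process_group()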

3. Kubernetes & Scheduling

Kubernetes brought scale to the cloud, but AI workloads require more than generic orchestration—they demand hardware-aware, interconnect-sensitive scheduling. My work pushes K8s toward a topology-driven model, where GPU placement, NVLink awareness, and RDMA path optimization are tightly integrated. This aligns with the broader move from time-based to space-based computation: where resource geometry and inter-node relationships directly impact performance and scalability.
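
To make "hardware-aware scheduling" concrete, here is a minimal sketch of a pod manifest that requests a full GPU island and steers placement with node labels. The resource name nvidia.com/gpu is the standard NVIDIA device-plugin name; the gpu.fabric and rdma.capable labels are illustrative placeholders that a site-specific node labeller would have to publish.

    # Sketch: a GPU pod spec with topology hints, emitted as JSON.
    # "nvidia.com/gpu" is the standard device-plugin resource name;
    # the steering labels (gpu.fabric, rdma.capable) are illustrative
    # placeholders, not part of upstream Kubernetes.
    import json

    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "trainer-worker-0"},
        "spec": {
            "nodeSelector": {
                "gpu.fabric": "nvlink",    # placeholder label for NVLink-connected nodes
                "rdma.capable": "true",    # placeholder label for RDMA-attached nodes
            },
            "containers": [{
                "name": "trainer",
                "image": "example.com/llm-trainer:latest",   # placeholder image
                "resources": {
                    "limits": {
                        "nvidia.com/gpu": "8",   # request a full NVLink island on one node
                    }
                },
            }],
        },
    }

    print(json.dumps(pod, indent=2))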

4. AI-Native Application Stack

AI applications today aren’t just services—they're dynamic, emergent systems. LLM inference, multimodal fusion, token streaming—all rely on real-time orchestration of compute, memory, and network. To support this, infrastructure must become more reactive, adaptive, and tensor-native. In my view, this signals a new layer of computing logic—one rooted in matrix structures and flow fields, where software doesn't just run on hardware, it resonates with it.
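
A small sketch of that reactive style: token streaming expressed as an asynchronous flow. The "model" below is a stub yielding canned tokens with an artificial delay; in a real stack the tokens would come from an inference engine, and the consumer could be a websocket handler or another agent reacting to partial output.

    # Sketch: token streaming as an async flow.
    # The "model" is a stub that yields canned tokens with a fixed delay;
    # in a real stack, tokens would come from an inference engine.
    import asyncio

    async def generate_tokens(prompt: str):
        for token in ["The", " future", " is", " fabric", "-native", "."]:
            await asyncio.sleep(0.05)   # stand-in for per-token decode latency
            yield token

    async def stream_response(prompt: str):
        # Consumers can start reacting to partial output
        # long before generation finishes.
        async for token in generate_tokens(prompt):
            print(token, end="", flush=True)
        print()

    if __name__ == "__main__":
        asyncio.run(stream_response("hello"))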

Books (coming soon)

  • 📘 Beyond Algorithms — Explores the structural and philosophical foundations of intelligence through AI, networks, and quantum theory.
  • 📗 Behind Algorithms — A technical and historical tour from CUDA cores to fabric-native GPU clusters.
  • 📙 Between Algorithms — Reflects on how AI infrastructure reshapes society, cognition, and our understanding of life.

Education

  • Ph.D., University of Wisconsin - Madison
  • B.S., University of Science and Technology of China

Professional Affiliations

  • Voting member, Linux Foundation Edge and Akraino Project (2022, 2023)
  • Vice Director, Film Advanced Technology Committee, CSMPTE (2018)
  • Industry Professor, Jiangnan University (2017–2021)

Selected Patents

No.  Patent ID        Title
21   US20120290696    Longest Prefix Matching of Variable-Sized Hierarchical Names by Treelets
20   CN20150605       File Transfer Based on Named Data Networking Caching Algorithm
19   CN119179440A     Distributed Storage Based on Centralized Metadata
18   CN119336686A     Expanding Base Address Register Space in PCIe Devices
17   CN114827151A     Heterogeneous Clustered Devices via PCIe, CXL, and UCIe
16   CN114745325A     MAC-in-MAC Network Encoding via PCIe, CXL, UCIe
15   CN110891081A     Packet Routing and Broadcasting Method
14   CN20150605CN     Vehicular Networking with Content-Centric Networking
13   CN111027396A     Assisted Driving System and Cloud Server
12   CN110929087A     Audio Classification Method and Device
11   CN106708749A     Fast Searching Based on Fractional Algorithms
10   CN109688204A     File Download Using Named Data Networking
9    CN109448684A     Intelligent Music Composition
8    CN111209098A     Intelligent Rendering Scheduling System
7    CN110955515A     File Processing for Electronic Devices
6    CN111178151A     Micro-Expression Recognition Based on AI
5    CN106095996B     Text and Content Classification System
4    CN110944034A     Web-Based Resumable Transmission Protocol
3    CN111125045A     Lightweight ETL Processing Platform
2    2017211770803    Portable Accelerated Mobile Video Transmission
1    CN107819704A     Scalable Wireless Media App for Edge Computing

Contact

Feel free to connect via [email protected] or on GitHub.