Dr. Fu Li (Leo)
LinkedIn Profile: www.linkedin.com/in/leoustc
GitHub: https://github.com/leoustc
Professional Summary
Dr. Leo is an AI Infrastructure Architect with 15+ years of experience in AI infrastructure, high-performance computing, distributed systems, and advanced networking. His work has focused on architecting and benchmarking large-scale AI systems across GPU clusters, RDMA fabrics, NVLink environments, and cloud superclusters, with hands-on experience in CUDA, MPI, FPGA, PCIe, and CXL-based architectures. He is currently experimenting with a practical approach he refers to as Guided AI Engineering, which combines progressive context and goal setting, harness engineering, and deterministic automated workflows to help AI coding systems address real-world problems and frontier engineering challenges.
Core Skills
Problem Solving • Guided AI Engineering • AI Infrastructure • Heterogeneous Computing • CUDA/MPI GPU Computing • RDMA/NVLink Fabrics • PCIe/CXL Architectures • Performance Benchmarking • Superclusters • Systems Innovation
Selected Projects
- Agent-Runner: A Scalable Multi-Agent Orchestration Platform
  Uses systemd, inotifywait, and a mailbox protocol to scale local agents up to 100+ per node.
  https://github.com/leoustc/agent-runner
- Method and System for GPU Resource Management in Distributed Computing
  (patent filed)
- System And Method For Dynamic Distributed Code Execution And Orchestration Of Interactive Kernels
  (patent filed)
- Nextflow IaC Plugin for Heterogeneous HPC Orchestration
  Developed a Nextflow IaC plugin to run pipelines on Arm, GPU, and x86.
  https://blogs.oracle.com/cloud-infrastructure/run-nextflow-with-heterogeneous-computing-on-oci
  https://blogs.oracle.com/cloud-infrastructure/cut-nextflow-costs-by-70-with-oci
  https://github.com/leoustc/nf-iac-plugin
- PCI Subsystem over Network
  Built a remote PCIe virtualization approach that allows remote PCIe subsystems to be accessed as local devices.
  https://www.youtube.com/watch?v=PlotPtpnI38
- Distributed MCP Protocol for AI-native CDN Architecture
  https://www.youtube.com/watch?v=Vszq1UBnhU4
- PCIe-Net
  Worked on a TCP/IP-over-PCIe/CXL fabric concept for extending communication semantics across high-speed interconnects.
- RDMA over PCIe/CXL
  Contributed to high-performance data movement architectures based on RDMA principles over PCIe and CXL environments.
  https://www.youtube.com/watch?v=EfgwZvFfmts
- CXL Switch SoC
  Explored a switch-chip architecture as an alternative interconnect model for scalable AI systems.
- PCIe Switch SoC and 64-CPU SuperPod Architecture
  Contributed to scalable multi-CPU and multi-node interconnect system design.
- Multi-rail HPC Clusters for Rendering and Postproduction
  Built and optimized multi-rail HPC cluster architectures for media and rendering workloads.
Professional Experience
- March 2025 – present, Singapore, AI Infra Architect at Center of Excellence, Oracle
  - Benchmarking the state-of-the-art RDMA networking and NVLink system of OCI
    - https://blogs.oracle.com/cloud-infrastructure/zettascale-osu-nccl-benchmark-h100-ai-workloads
  - Innovation in GPU resource management and programming models (two patents)
  - Solution and accelerator for GPU cluster optimization
  - Solution for hybrid and heterogeneous AI/GPU computing (Nextflow on infrastructure as code)
    - https://blogs.oracle.com/cloud-infrastructure/run-nextflow-with-heterogeneous-computing-on-oci
    - https://blogs.oracle.com/cloud-infrastructure/cut-nextflow-costs-by-70-with-oci
  - Solution for AI-in-a-box and pipeline-driven AI platforms (data ETL + AI pipeline)
- 2022 – 2023, Technical Steering Committee Member, Akraino, Linux Foundation
  - Project: Akraino Type 5: Integrated Edge Cloud
    Developed a compact all-in-one edge computing server utilizing PCIe fabric, achieving an 80% reduction in networking.
    https://www.youtube.com/watch?v=EfgwZvFfmts
- March 2014 – Oct 2024, Startups and Projects
  - Project: High-performance interconnection protocol using CXL and CXL Switch (RDMA over CXL)
    Directed a skilled team to resolve PCIe and CXL memory mapping challenges and developed PCIe/CXL routing modules enabling networking functionality for next-generation AI systems.
  - Project: Scalable memory pooling system using CXL Switch
    Led the hardware and logic team creating a scalable FPGA-based memory pooling system with a two-layer networking architecture and a lightweight protocol akin to CXL 3.1.
  - Project: High-performance interconnection protocol using PCIe switch SoC (RDMA over PCIe)
    Pioneered protocols including PCIe-Net (TCP/IP over PCIe fabric) and RDMA over PCIe for up to 64 nodes, plus an NVLink system with a PCIe SoC.
  - Project: Next-generation Server with Shared NIC (AWS Graviton Servers)
    Designed multi-way CPU systems with a shared NIC architecture, an alternative to AWS Graviton servers, and led the hardware team.
  - Project: High-Performance Computing for Media Creation
- May 2012 – March 2014, Cisco Systems, Inc., Software Engineer III
  - Project: Cat4500 Switch ASIC Bringup
- Jul 2011 – May 2012, FutureWei Technologies Inc.
  - Project: GPU-Accelerated LPM Switch Design and Implementation
Education
- Dec. 2011, Ph.D., University of Wisconsin – Madison
- Dec. 2008, M.S., University of Wisconsin – Madison
- Jul. 2006, B.S., University of Science and Technology of China
Professional Affiliations and Memberships
- Voting member of Linux Foundation Edge and Akraino Project, 2022, 2023
- Vice Director of the Film Advanced Technology Committee of CSMPTE, 2018
- Industry Professorship at Jiangnan University, 2017-2021
Patents
| NO | ID | Title |
|---|---|---|
| 1 | US20120290696 | Method and System for Longest Prefix Matching of Variable-Sized Hierarchical Names by Treelets |
| 2 | CN20150605 | Method and System for File Transfer based on Named Data Networking Caching Algorithm |
| 3 | CN114827151A | Method and Device for Heterogeneous Clustered Devices and Servers Based on PCIe, CXL, and UCIe Physical Links |
| 4 | CN114745325A | Method and Device for MAC in MAC Network Encoding Based on PCIe, CXL, and UCIe Physical Links |
| 5 | CN110891081A | Method and Device for Packet Sending, Routing, Broadcasting and Receiving |
| 6 | CN 20150605CN | Method and Device for Vehicular Networking Based on Content-Centric Networking |
| 7 | CN111027396A | Method and System for Assisted Driving, Apparatus, Onboard Terminal and Cloud Server |
| 8 | CN110929087A | Method and System for Audio Classification, Apparatus, Electronic Device, and Storage Medium |
| 9 | CN106708749A | Method and System for Fast Searching based on Fractional Algorithms |
| 10 | CN109688204A | Method and System for File Download Based on Named Data Networking, Node, and Terminal |
| 11 | CN109448684A | Method and System for Intelligent Music Composition |
| 12 | CN111209098A | Method and System for Intelligent Rendering Scheduling, Server, Management Node and Storage Medium |
| 13 | CN110955515A | Method and System for Processing Files, Apparatus, Electronic Devices, and Storage Medium |
| 14 | CN111178151A | Method and System for Recognizing Micro-Expressions in Facial Changes Based on AI Technology |
| 15 | CN106095996B | Method and System for Text and Content Classification |
| 16 | CN110944034A | Method and System for Web-based Resumable Transmission, Device, Electronic Device and Storage Medium |
| 17 | CN111125045A | Method and System for a Lightweight ETL Processing Platform |
| 18 | 2017211770803 | Method and System for a Portable Mobile Video Content Accelerated Transmission Device |
| 19 | CN107819704A | Method and System of Scalable Wireless Media Application for Edge Computing |