Human in the Loop: Engineering with Vibe Coding
Vibe coding accelerates innovation. HPC workloads demand discipline.
Recently, I was asked to help with a customer case involving Nextflow pipelines on OCI. The request seemed simple on the surface: could we run real Nextflow pipelines efficiently on OCI? The challenge was that, unlike on other clouds, there was no production-ready Nextflow plugin/executor integration for OCI. So the task was not just “can OCI run Nextflow,” but whether we could make it run predictably, end-to-end, and in a way customers could adopt.
The context and the constraint
I had to build nf-iac in a few weeks, starting from a point where I was not yet comfortable with Nextflow internals. It became a real test of where AI helps and where engineering discipline still has to lead.
I did have an advantage:
- solid infrastructure and IaC experience
- distributed systems instincts
- tolerance for operational edge cases (resources, locality, retries, and data movement)
Time was short, so I did not start by trying to become a Nextflow expert. I used a practical method:
Deconstruct → observe → understand → reassemble.
Deconstruction: split the problem first
I split the system into two layers:
- a thin Nextflow-side executor/plugin that extracts task context and sends JSON to a REST gateway
- an nf-server service that receives that context and performs execution with IaC logic
This design gave immediate observability while I was still learning:
- JSON payloads could be logged, diffed, and replayed
- each transition became explicit
- integration behavior became testable under time pressure
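To make the split concrete, here is a hedged sketch of what a task-context payload could look like. The field names (`taskId`, `resources`, `workDir`, and so on) are illustrative, not the actual nf-iac REST contract:

```python
import json

# Hypothetical shape of the task context a thin Nextflow-side plugin
# could extract and POST to a REST gateway. Field names are illustrative.
def build_task_payload(task_id, script, cpus, memory_gb, inputs, workdir):
    """Serialize one task's context as a self-contained JSON document."""
    return json.dumps({
        "taskId": task_id,
        "script": script,            # the bash block Nextflow would run
        "resources": {"cpus": cpus, "memoryGb": memory_gb},
        "inputs": inputs,            # staged file paths or object URIs
        "workDir": workdir,
    }, sort_keys=True)               # stable key order keeps payloads diffable

payload = build_task_payload(
    "methylseq:fastqc-1", "fastqc sample_1.fq.gz", 4, 8,
    ["oci://bucket/sample_1.fq.gz"], "/work/ab/12cd",
)
# Because the payload is plain JSON, it can be logged, diffed, and replayed.
replayed = json.loads(payload)
```

The `sort_keys=True` detail is what makes the "logged, diffed, and replayed" workflow practical: two payloads for the same task serialize identically.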
With AI, a working version came together surprisingly fast.
The inputs were narrow and explicit:
- architecture split
- REST contract
- IaC starting template
- a few real sample pipelines
That first phase felt like a real “vibe coding” moment: mechanical tasks disappeared quickly and scaffolding appeared.
Why AI moved us to “it runs” quickly but not “it’s ready”
AI is very good at producing something that runs.
It is less effective at enforcing maintainability, minimalism, and coherence on its own.
So I kept boundaries tight:
- clear contracts
- deterministic behavior
- deliberate ownership of taste and structure
The first lesson was simple:
AI can take you to “it runs.”
Engineering is what takes it from “it runs” to “it is trustworthy.”
Reality check: from demos to production-shaped tests
The first version worked on my own pipeline, then on small baselines.
Then came the real workload: nf-core/methylseq on a large dataset.
That moved the problem from “demo success” to “systems reliability.”
The break happened quietly: early-stage failures, intermittent missing outputs, and behavior that passed simple checks but broke under load.
The failure pattern pointed to assumptions, not syntax:
- timing
- consistency
- filesystem semantics
- metadata behavior under pressure
Object storage semantics became the blocker.
Object storage mismatch and the storage bridge
This shifted the project into a design problem.
REST and API-style decomposition helped for speed, but HPC workloads exposed mismatched assumptions between:
- POSIX-oriented workflow expectations
- object storage behavior
I built BucketFS to make the gap explicit:
- work directories on local NVMe for normal filesystem behavior
- bucket as durable namespace through a FUSE mount
- explicit data movement points
- observable upload/commit flow
The prototype came quickly, but systems bugs were immediate:
- path handling edge cases
- directory vs file semantics
- mount behavior nuances
- hidden workflow assumptions around symlinks and metadata
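These semantics gaps can be probed directly rather than discovered mid-pipeline. A minimal sketch, assuming a mounted path to test; the checks and names here are illustrative, not nf-iac code:

```python
import os

# Before trusting a FUSE mount with POSIX-shaped workflow assumptions,
# verify the semantics that Nextflow-style pipelines rely on.
def probe_mount(root: str) -> dict:
    results = {}
    target = os.path.join(root, "probe-target")
    link = os.path.join(root, "probe-link")
    with open(target, "w") as fh:
        fh.write("x")
    try:
        os.symlink(target, link)   # many object-storage mounts reject this
        results["symlinks"] = os.path.islink(link)
        os.unlink(link)
    except OSError:
        results["symlinks"] = False
    # Directory-vs-file semantics: object stores have no true directories,
    # so mkdir on a mount may be emulated and behave inconsistently.
    sub = os.path.join(root, "probe-dir")
    os.mkdir(sub)
    results["directories"] = os.path.isdir(sub)
    os.rmdir(sub)
    os.unlink(target)
    return results
```

Running a probe like this against the mount point at startup turns "hidden workflow assumptions" into an explicit pass/fail report.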
AI accelerated scaffolding and surfaced likely traps, but it did not replace the design decisions.
I simplified further:
- explicit stage-in / stage-out
- deterministic local-first execution
- scripts over daemons where possible
- minimal dependency surface
- predictable cloud CLI path for operational simplicity
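The "scripts over daemons" shape reduces to one deterministic stage-in → run → stage-out sequence per task. A sketch under stated assumptions: the bucket is modeled as a plain directory here, whereas in practice the copy steps would shell out to a cloud CLI; none of this is the actual nf-iac implementation:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# One deterministic stage-in -> run -> stage-out sequence per task.
# The bucket is modeled as a local directory for illustration only.
def run_task(bucket_dir, task_inputs, command, outputs):
    work = Path(tempfile.mkdtemp(prefix="task-"))    # local NVMe work dir
    for name in task_inputs:                         # explicit stage-in
        shutil.copy2(bucket_dir / name, work / name)
    subprocess.run(command, cwd=work, check=True)    # normal POSIX execution
    for name in outputs:                             # explicit stage-out:
        shutil.copy2(work / name, bucket_dir / name) # the commit point
    shutil.rmtree(work)
```

Because every data movement point is a visible copy, failures localize to one of three phases instead of hiding inside a background sync daemon.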
That move was boring in the best sense: stable, readable, and reliable.
Reassembly: back to a single operational shape
With enough understanding, version 1 had done its job.
I had learned what Nextflow really does, and where failures propagate.
So I collapsed to nf-iac:
- one coherent wrapper around the execution model
- clearer operational surface
- stronger support for IaC-driven lifecycle
The merge took about three days of code work, largely because the uncertainty was already gone.
AI worked best when the target behavior was already clear:
- high-volume refactors
- interface translation across layers
- wrapper-style consolidation
It still could not replace:
- contract ownership
- invariants
- tradeoff decisions
Why tool-shaped architecture worked better than service-shaped API design
Version 1 worked as a RESTful API split: thin plugin + execution service.
For learning, that was perfect.
For scale, that became extra lifecycle complexity.
nf-iac as a tool was easier to reason about because it kept:
- narrow input/output boundaries
- deterministic behavior
- fewer moving operational surfaces
This project reinforced what I now prefer in many AI-assisted systems: tools with explicit contracts rather than service-like flexibility that drifts into ambiguity.
Vibe Coding to Value
Only after stability did I run the real cost comparison versus an alternative cloud execution path with:
- same pipeline
- same dataset size
- comparable performance expectations
For this measurement, nf-iac showed a clear cost advantage without sacrificing performance.
The result is not a universal benchmark; it is one workload-specific measurement. But it validated one practical point:
closing the execution/contract gap can make OCI cost advantages visible and real.
Why it did not stop at completion
Once the execution became declarative, extending to heterogeneous compute (Arm + GPU stages in one framework) became natural.
The architecture was now a shape:
- one framework
- multiple compute targets
- lower cost of experimentation
That extension took about two days and validated the same pattern:
AI can build quickly. Humans still decide what should be built, how far to extend it, and why.
Working with AI: what works, what does not
What I use AI for
- quick orientation in unfamiliar systems
- language and layer translation
- low-level scaffolding where constraints are clear
- fast iteration on scoped refactors
What I do not use AI for
- owning architecture tradeoffs
- guaranteeing end-to-end correctness
- solving previously unseen contract mismatches on its own
AI can enumerate options fast. It can shorten feedback loops.
Humans still verify, decide, and own the consequences.
Final reflection
Before this work, I already trusted OCI's strengths.
This project proved the difference is not AI speed alone, but:
- clear method
- workload-driven testing
- explicit contracts
- disciplined ownership of correctness
The lesson is simple:
AI can compress mechanical work.
Humans still own the hard parts.