How We Gave Jenkins Superpowers
We nearly locked ourselves out of production. An automated firewall policy push replaced the nftables rules on our CI host — and among them, the SSH allow rule. The machine vanished. We had to drive to the datacenter and plug in a keyboard. That night, we started building the Jenkins Firecracker Cloud Plugin (JC).
Jenkins has a reputation problem. It's seen as the old guard — a GUI-configured relic from an era before GitLab CI and GitHub Actions made YAML the default. But YAML isn't the point. The point is isolation, speed, and treating CI infrastructure as programmable hardware. JC turns Jenkins inside out: every stage runs in its own Linux kernel. Every pipeline is a hardware specification. And every failure becomes an interactive debug session — not a log file.
Declarative First, Not YAML First
GitLab CI and GitHub Actions won because they made configuration readable. A .gitlab-ci.yml file is self-documenting: you see stages, images, scripts, and artifacts in one place. Jenkins has had this capability for years via Declarative Pipeline syntax, but the agent model held it back. Docker executors share the host kernel, so kernel state leaks between builds. SSH agents need pre-provisioned hosts. Kubernetes pods add scheduling latency to every cell of a matrix build and still run on whatever kernel the node provides.
JC solves this by making the agent declaration the hardware specification. You don't configure a Docker image or an SSH host. You declare a kernel, a root filesystem, CPU cores, memory, and cache mounts — and JC provisions a hardware-isolated Firecracker microVM that boots directly into the Jenkins agent process. Here's a real pipeline that compiles and tests a Linux firewall policy compiler across seven kernel variants in parallel:
pipeline {
    // No global agent: each matrix cell declares its own microVM below.
    agent none
    parameters { string(name: 'COMMIT', defaultValue: 'main') }
    stages {
        stage('Kernel Matrix') {
            matrix {
                axes {
                    axis {
                        name 'KERNEL'
                        values 'linux-5.15', 'linux-6.1', 'linux-6.6', 'linux-6.12',
                               'linux-rt-6.1', 'linux-cloud-6.6', 'linux-hardened-6.6'
                    }
                }
                stages {
                    stage('Build and Test') {
                        // One Firecracker microVM per kernel; the image name comes from the axis value.
                        agent { firecracker {
                            image "lpf-test-${KERNEL}"
                            cpus 4
                            memory '4096M'
                            snapshotOnFailure true
                        }}
                        steps {
                            sh 'dune build @install'
                            sh 'dune runtest'
                            sh 'lpf check fixtures/policies/basic.lpf'
                            sh 'lpf diff --live fixtures/policies/basic.lpf'
                            sh 'lpf sysctl check'
                            sh 'lpf state list --json'
                        }
                        // Keep the evidence directory when a cell fails.
                        post { failure { archiveArtifacts 'evidence/' } }
                    }
                }
            }
        }
    }
}
Seven VMs boot simultaneously, each with a different Linux kernel. The wall-clock time is bounded by the slowest kernel, not the sum of all seven.
In a live run, six kernels pass. The hardened kernel fails because it rejects unprivileged network namespace creation. Instead of a cryptic log line, the snapshotOnFailure flag froze the VM at the exact moment of failure and wrote its full memory image to S3. The build page shows a "Debug Failure" button. Clicking it boots a clone of that snapshot (same registers, same stack, same open file descriptors) and opens a terminal via jc exec where the assertion just fired.
Time-Travel Debugging with GDB and RR
Traditional CI gives you a log. You read it, guess what happened, push a speculative fix, and wait for the pipeline to rerun. This loop costs minutes per iteration. With snapshot debugging, the loop collapses to seconds.
When a test fails inside a JC-provisioned VM, the operator issues PATCH /vm with the state set to Paused to pause the microVM, then PUT /snapshot/create to capture its exact state. The snapshot holds RAM contents, CPU registers, and device state. Firecracker does not copy block device contents into a snapshot, so the operator has to preserve the disk image backing the root filesystem alongside it to keep the filesystem consistent with the instant of failure. The snapshot is written to the build's evidence directory and synced to durable storage. From the Jenkins UI, a developer clicks "SSH into Failure": the operator boots a clone of the snapshot in a new microVM, bridges the vsock interface to a WebSocket terminal, and the developer is inside the failure.
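The pause-then-snapshot sequence can be sketched against Firecracker's HTTP API, which listens on a Unix domain socket. This is a minimal sketch, not JC's operator code: the function names and the file paths are assumptions, while the two endpoints and their JSON fields follow Firecracker's published API (pause is a PATCH to /vm with a state field).

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket, which is how Firecracker exposes its API."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def snapshot_requests(snapshot_path, mem_file_path):
    """Return the (method, path, body) calls that pause a microVM and snapshot it."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,   # vCPU and device state
            "mem_file_path": mem_file_path,   # guest memory image
        }),
    ]


def capture_failure(socket_path, snapshot_path, mem_file_path):
    """Send both calls in order; raise if Firecracker rejects either one."""
    for method, path, body in snapshot_requests(snapshot_path, mem_file_path):
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            detail = resp.read().decode()
            if resp.status >= 300:
                raise RuntimeError(f"{method} {path} failed: {resp.status} {detail}")
        finally:
            conn.close()
```

A caller would run something like capture_failure("/run/firecracker.sock", "evidence/vm.snap", "evidence/vm.mem") (hypothetical paths) and then copy the rootfs image next to those two files.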
For deterministic replay, JC integrates with rr, the record-and-replay debugger that originated at Mozilla. The operator can boot a VM with rr record wrapping the test process. When the test fails, you have a deterministic recording. Replay it forward and backward, set breakpoints and run in reverse to reach them, inspect every instruction. The same recording replays identically on any machine with a compatible CPU, which removes the "works on my machine" ambiguity.
The Message Bus: NATS as the Compute Plane Spine
Most CI systems are request-response: you push code, a pipeline runs, you get a result. JC replaces that with a pub-sub architecture built on NATS. Every Jenkins node, every Firecracker operator, every microVM agent, and every external client connects to the same NATS cluster. Pipelines are triggered by publishing to a subject. Test results stream back over reply subjects. The entire compute fleet becomes a single addressable mesh.
An AI agent publishes a job request to firecracker.provision. The operator picks it up, boots a VM, and publishes the VM handle to firecracker.provisioned.<id>. The agent subscribes to that subject, gets the vsock CID, and starts streaming commands over firecracker.exec.<id>. Stdout and stderr stream back on dedicated reply subjects in real time: no polling, no REST loops, just publish and subscribe, with JetStream persistence on the subjects where delivery must be guaranteed.
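The handshake described above can be traced with a toy in-memory bus. This is only a sketch of the message flow: ToyBus stands in for a real NATS connection, the payload fields (id, image, cid) are hypothetical, and the agent choosing the VM id is an assumption made to keep the example short.

```python
from collections import defaultdict


class ToyBus:
    """In-memory stand-in for a NATS connection: exact-subject publish/subscribe."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, subject, handler):
        self.handlers[subject].append(handler)

    def publish(self, subject, payload):
        # Deliver synchronously to every handler registered on this subject.
        for handler in list(self.handlers[subject]):
            handler(payload)


def run_operator(bus):
    """Operator side: 'boot' a VM per request and announce its handle."""
    def on_provision(request):
        vm_id = request["id"]
        # Pretend exec handler: echo each command back on the stdout subject.
        bus.subscribe(
            f"firecracker.exec.{vm_id}",
            lambda cmd: bus.publish(f"firecracker.stdout.{vm_id}", f"ran: {cmd}"),
        )
        # Guest vsock CIDs start at 3; lower values are reserved.
        bus.publish(f"firecracker.provisioned.{vm_id}", {"id": vm_id, "cid": 3})

    bus.subscribe("firecracker.provision", on_provision)


def run_agent(bus, vm_id, command):
    """Agent side: request a VM, send one command once it is up, collect stdout."""
    output = []
    bus.subscribe(f"firecracker.stdout.{vm_id}", output.append)
    bus.subscribe(
        f"firecracker.provisioned.{vm_id}",
        lambda handle: bus.publish(f"firecracker.exec.{vm_id}", command),
    )
    bus.publish("firecracker.provision", {"id": vm_id, "image": "lpf-test-linux-6.6"})
    return output
```

With a real NATS client, the same subjects would carry the messages asynchronously; the ordering of subscribe-before-publish on the agent side still matters so the provisioned message is not missed.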
In a full session, an agent provisions a VM, runs a test suite, finds a failure, patches the source in-VM, reruns, confirms the fix, and discards the environment.
NATS gives this architecture three properties that a WebSocket or REST API can't match. First, location transparency — the agent doesn't know or care which physical host the VM is on. It publishes to a subject and NATS routes the message to the correct operator. Second, guaranteed delivery — JetStream-backed subjects ensure that a test failure notification is never lost, even if the subscriber is temporarily disconnected. Third, fan-out observability — multiple subscribers can listen to the same firecracker.stdout.* wildcard subject simultaneously, so a human operator watching the pipeline console sees the exact same output stream as the AI agent driving the fix.
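The fan-out property rests on NATS subject wildcards: `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. The matcher below is a small reimplementation of those rules for illustration, not code from NATS or JC; the subscriber names are made up.

```python
def subject_matches(pattern, subject):
    """NATS-style match: '*' is exactly one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one token to consume.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def fan_out(subscriptions, subject, payload):
    """Append payload to every matching subscriber's inbox; return who received it."""
    delivered = []
    for name, pattern, inbox in subscriptions:
        if subject_matches(pattern, subject):
            inbox.append(payload)
            delivered.append(name)
    return delivered
```

A human console subscribed to firecracker.stdout.* and an agent subscribed to firecracker.stdout.vm42 both receive a message published on firecracker.stdout.vm42, while a subscriber on firecracker.exec.* does not.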
The NATS cluster runs alongside the Firecracker operators on bare metal. Each operator connects to a local NATS server. The cluster handles routing and failover automatically. If an operator node dies mid-build, its VM leases are re-advertised on the bus and another operator picks them up. The pipeline doesn't fail — it migrates.
How JC Compares
| | Jenkins + JC | GitLab CI | GitHub Actions |
|---|---|---|---|
| Isolation | Hardware (KVM) | Container (runc) | VM per job on hosted runners; containers optional |
| Boot time | ~125ms cold boot (Firecracker's documented figure) | ~3s (pod) | ~5s (runner) |
| Kernel matrix | Native, parallel | Not possible | Not possible |
| Config format | Groovy DSL | YAML | YAML |
| Snapshot debug | gdb + rr replay | No | No |
| Message bus | NATS pub-sub | No | No |
Ad-Hoc Pipeline Execution
The message bus doesn't require a Jenkinsfile. You can submit a pipeline definition over the bus and get a VM back: no repository, no branch, no push. This is useful for running a test suite against an in-progress patch, evaluating a security fix across your entire kernel matrix, or giving an AI agent a sandbox to iterate in.
You can also run an ad-hoc pipeline directly against a specific commit or branch, add kernel matrix axes, or attach a policy file for validation, all without touching the Jenkins UI.
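To make the idea of submitting a pipeline over the bus concrete, here is a sketch of building such a request as a JSON payload. Only the firecracker.provision subject comes from the text; every field name (image, steps, commit, matrix, policy) is hypothetical, and the real JC wire format may look nothing like this.

```python
import json

PROVISION_SUBJECT = "firecracker.provision"  # subject named in the article


def build_job_request(image, steps, commit=None, kernels=(), policy_file=None):
    """Encode an ad-hoc job as JSON bytes. All field names are hypothetical."""
    if not steps:
        raise ValueError("a job needs at least one step")
    request = {"image": image, "steps": list(steps)}
    if commit:
        request["commit"] = commit                      # pin to a commit or branch
    if kernels:
        request["matrix"] = {"KERNEL": list(kernels)}   # one VM per kernel
    if policy_file:
        request["policy"] = policy_file                 # policy to validate
    # sort_keys gives a stable byte encoding, which helps when logging requests.
    return json.dumps(request, sort_keys=True).encode("utf-8")
```

The resulting bytes would then be published to PROVISION_SUBJECT with whatever NATS client the caller uses.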
VM Density and Image Pipeline
Running a fleet of microVMs needs efficient packing. Firecracker's balloon device lets the operator reclaim unused memory pages from running VMs: for a VM allocated 8GB that only uses 1GB, inflating the balloon can return most of the remaining 7GB to the host. This lets us overcommit memory, scheduling against actual usage rather than reservation. Combined with a memory snapshot cache that reuses identical kernel pages across VMs, a single bare-metal node runs 50+ concurrent builds without swapping.
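The reclaim arithmetic is simple enough to write down. The sketch below computes a balloon target and the body for Firecracker's PATCH /balloon call, whose amount_mib field sets the balloon size. The headroom policy and function names are assumptions, not JC's actual scheduler.

```python
def balloon_target_mib(allocated_mib, used_mib, headroom_mib=256):
    """Memory to reclaim: what the guest was given, minus what it uses, minus a safety margin."""
    reclaimable = allocated_mib - used_mib - headroom_mib
    return max(0, reclaimable)


def balloon_patch_body(allocated_mib, used_mib, headroom_mib=256):
    """JSON body for PATCH /balloon on the Firecracker API socket."""
    return {"amount_mib": balloon_target_mib(allocated_mib, used_mib, headroom_mib)}


def host_commitment_mib(vms, headroom_mib=256):
    """Memory the host must actually back after each VM's balloon is inflated.

    vms is a list of (allocated_mib, used_mib) pairs.
    """
    return sum(
        allocated - balloon_target_mib(allocated, used, headroom_mib)
        for allocated, used in vms
    )
```

For the 8GB VM using 1GB, the target is 7168 MiB with no headroom and 6912 MiB with the default 256 MiB margin, which is why "most of" the 7GB comes back rather than all of it.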
Images are built from standard OCI containers. A Dockerfile defines the environment: FROM ubuntu:22.04, install toolchains, cache warmup scripts. The JC image builder converts the OCI layers into a Firecracker-compatible rootfs, injects the custom init binary, and publishes the image to the fleet's object store. No snowflake VM images. Every image is reproducible from its Dockerfile. The same jc pipeline run --image lpf-test-linux-6.12 command works identically on a developer's laptop and in the production cluster, provided the laptop can run Firecracker, which needs Linux with KVM (on macOS, that means inside a Linux VM).
Security: Trust Domains and Jailer
Caching at scale introduces trust problems: an untrusted pull request must not poison the cache for the mainline. JC enforces Trust Domains — caches produced in an untrusted context are cryptographically isolated from the trusted domain. The operator validates every cache export before promotion to the distribution mesh.
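The text does not say how the cryptographic isolation is implemented. One common way to get that property, shown here purely as an assumption, is to give each trust domain its own secret and derive both cache keys and export tags with HMAC, so an entry written by one domain can neither collide with nor validate in another.

```python
import hashlib
import hmac


def cache_key(domain_secret: bytes, content_digest: str) -> str:
    """Derive a cache key that is unique per trust domain for the same content."""
    return hmac.new(domain_secret, content_digest.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_export(domain_secret: bytes, blob: bytes) -> str:
    """Tag a cache export so the operator can check which domain produced it."""
    return hmac.new(domain_secret, blob, hashlib.sha256).hexdigest()


def verify_export(domain_secret: bytes, blob: bytes, tag: str) -> bool:
    """Validate an export before promotion; constant-time comparison avoids timing leaks."""
    expected = sign_export(domain_secret, blob)
    return hmac.compare_digest(expected, tag)
```

Under this scheme a pull-request build holding only the untrusted secret cannot produce a tag that verifies against the trusted domain's secret, so its cache exports are rejected at promotion time.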
At the process level, each Firecracker instance runs inside jailer, which applies cgroups (CPU and memory caps per VM), namespaces (network, PID, and mount isolation), and a chroot. Secrets never touch environment variables; the agent queries the MicroVM Metadata Service (MMDS) at 169.254.169.254 for short-lived credentials scoped to that build. Untrusted code (a kernel module, an nftables ruleset, an eBPF program) runs with hardware isolation and ephemeral, least-privilege access. The worst it can do is crash its own VM.
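From inside the guest, the MMDS lookup is two HTTP calls when MMDS version 2 is enabled: a PUT that obtains a session token, then a GET that presents it. The sketch below follows that documented token flow; the credentials path and the function names are hypothetical, and the guest needs a route to 169.254.169.254 configured for it to work.

```python
import json
import urllib.request

MMDS_ADDR = "http://169.254.169.254"


def token_request(ttl_seconds=60):
    """Build the PUT that asks MMDS (version 2) for a short-lived session token."""
    return urllib.request.Request(
        f"{MMDS_ADDR}/latest/api/token",
        method="PUT",
        headers={"X-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(token, path):
    """Build the GET for a metadata path, presenting the session token."""
    return urllib.request.Request(
        f"{MMDS_ADDR}/{path.lstrip('/')}",
        headers={"X-metadata-token": token, "Accept": "application/json"},
    )


def fetch_credentials(path="latest/meta-data/build-credentials"):
    """Fetch build-scoped credentials. The default path is a made-up example."""
    with urllib.request.urlopen(token_request(), timeout=2) as resp:
        token = resp.read().decode("utf-8")
    with urllib.request.urlopen(metadata_request(token, path), timeout=2) as resp:
        return json.loads(resp.read())
```

Because the credentials live only in the metadata store and expire with the token and the VM, nothing sensitive is left in the process environment or on disk after the build.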
Why This Matters
The hardest software to test operates below the application layer. Firewall policy compilers, network function virtualizers, kernel modules, eBPF programs — these manipulate kernel state directly. A test failure in this space isn't a stack trace; it's a kernel panic, a network partition, a routing loop. Testing these systems demands hardware-level isolation with zero interference between test cases.
JC provides that by making every test run in its own Linux kernel. The seven-kernel matrix above runs concurrently because each VM is a fully independent machine. The hardened kernel failure doesn't corrupt the other six. The snapshot of the failure captures the exact kernel state at the moment of the assertion — not just a log line, but the registers, the page tables, the netfilter rules, the conntrack table, the active qdiscs. Every subsystem is frozen mid-execution and available for interactive inspection.
The jc CLI ships alongside the plugin. A developer who sees a failure in CI can run jc exec <build-id> to boot the identical VM image on their laptop, replay the failure deterministically with jc replay --with-rr, and debug it with gdb, all before pushing a fix. This closes the gap between CI-driven discovery and local reproduction that has plagued infrastructure testing for decades.
The Jenkins Firecracker Cloud Plugin is open source and available at github.com/ingresslabs/lpf. It requires Jenkins 2.x, a Linux host with KVM enabled, and a Firecracker binary. No Kubernetes cluster needed — the operator runs as a systemd service on bare metal and speaks the same message bus protocol as the clustered deployment. Start with one host, scale to a fleet when you need the kernel matrix.