
How We Gave Jenkins Superpowers

We locked ourselves out of production. An automated firewall policy push replaced the nftables rules on our CI host, including the SSH allow rule. The machine vanished from the network, and we had to drive to the datacenter and plug in a keyboard. That night, we started building the Jenkins Firecracker Cloud Plugin (JC).

Jenkins has a reputation problem. It's seen as the old guard — a GUI-configured relic from an era before GitLab CI and GitHub Actions made YAML the default. But YAML isn't the point. The point is isolation, speed, and treating CI infrastructure as programmable hardware. JC turns Jenkins inside out: every stage runs in its own Linux kernel. Every pipeline is a hardware specification. And every failure becomes an interactive debug session — not a log file.

Declarative First, Not YAML First

GitLab CI and GitHub Actions won because they made configuration readable. A .gitlab-ci.yml file is self-documenting: you see stages, images, scripts, and artifacts, all in one place. Jenkins has had this capability for years through Declarative Pipeline syntax, but its agent model held it back. Docker executors share the host kernel, so kernel state such as loaded modules and global sysctls leaks between builds. SSH agents need pre-provisioned hosts. Kubernetes pods add scheduling latency and, like Docker containers, run on their node's kernel, so a matrix build across 15 kernels needs 15 separately provisioned nodes.

JC solves this by making the agent declaration the hardware specification. You don't configure a Docker image or an SSH host. You declare a kernel, a root filesystem, CPU cores, memory, and cache mounts — and JC provisions a hardware-isolated Firecracker microVM that boots directly into the Jenkins agent process. Here's a real pipeline that compiles and tests a Linux firewall policy compiler across seven kernel variants in parallel:

pipeline {
    agent none
    parameters { string(name: 'COMMIT', defaultValue: 'main', description: 'Commit or branch to test') }
    stages {
        stage('Kernel Matrix') {
            matrix {
                axes {
                    axis {
                        name 'KERNEL'
                        values 'linux-5.15', 'linux-6.1', 'linux-6.6', 'linux-6.12',
                               'linux-rt-6.1', 'linux-cloud-6.6', 'linux-hardened-6.6'
                    }
                }
                stages {
                    stage('Build and Test') {
                        agent { firecracker {
                            image "lpf-test-${KERNEL}"
                            cpus 4
                            memory '4096M'
                            snapshotOnFailure true
                        }}
                        steps {
                            sh 'dune build @install'
                            sh 'dune runtest'
                            sh 'lpf check fixtures/policies/basic.lpf'
                            sh 'lpf diff --live fixtures/policies/basic.lpf'
                            sh 'lpf sysctl check'
                            sh 'lpf state list --json'
                        }
                        post { failure { archiveArtifacts artifacts: 'evidence/**', allowEmptyArchive: true } }
                    }
                }
            }
        }
    }
}
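The firecracker { ... } block is declarative, so something has to translate it into Firecracker API calls. The Python sketch below shows one plausible translation of the image, cpus, and memory fields into Firecracker's documented /machine-config, /boot-source, /drives, and /actions endpoints. AgentSpec, the /var/lib/jc/images layout, and the kernel boot arguments are assumptions for illustration, not JC's actual code, and sending the calls over the API socket is left out.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class AgentSpec:
    """Hypothetical mirror of the Jenkinsfile's firecracker { ... } block."""
    image: str
    cpus: int
    memory: str  # e.g. '4096M' or '4G', as written in the Jenkinsfile


def parse_mib(memory: str) -> int:
    """Convert a '4096M' / '4G' style size into MiB."""
    m = re.fullmatch(r"(\d+)([MG])", memory.strip().upper())
    if m is None:
        raise ValueError(f"unsupported memory size: {memory!r}")
    value, unit = int(m.group(1)), m.group(2)
    return value * 1024 if unit == "G" else value


def boot_plan(spec: AgentSpec, image_dir: str = "/var/lib/jc/images"):
    """Return the (method, path, body) calls that configure and start one microVM."""
    kernel = f"{image_dir}/{spec.image}/vmlinux"      # assumed image layout
    rootfs = f"{image_dir}/{spec.image}/rootfs.ext4"  # assumed image layout
    return [
        ("PUT", "/machine-config",
         {"vcpu_count": spec.cpus, "mem_size_mib": parse_mib(spec.memory)}),
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel,
          "boot_args": "console=ttyS0 reboot=k panic=1"}),
        ("PUT", "/drives/rootfs",
         {"drive_id": "rootfs", "path_on_host": rootfs,
          "is_root_device": True, "is_read_only": False}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


if __name__ == "__main__":
    spec = AgentSpec(image="lpf-test-linux-6.6", cpus=4, memory="4096M")
    for method, path, body in boot_plan(spec):
        print(method, path, json.dumps(body))
```

Each matrix cell would get its own plan with a different image, which is what lets the seven kernels boot independently.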

Seven VMs boot simultaneously, each with a different Linux kernel. The wall-clock time is bounded by the slowest kernel, not the sum of all seven. Here's the actual output from a live run:

[Pipeline] matrix - Kernel Matrix (7 axes)
linux-5.15 #1 dune build OK (12.4s)
linux-6.1 #2 dune build OK (11.8s)
linux-6.6 #3 dune build OK (10.9s)
linux-6.12 #4 dune build OK (11.1s)
linux-rt-6.1 #5 dune build OK (15.2s)
linux-cloud #6 dune build OK (13.0s)
linux-hardened #7 dune build FAILED (line 312: assert)
=== dune runtest: 312/313 passed ===
linux-hardened-6.6: hardened kernel rejects unprivileged CLONE_NEWNET
Snapshot captured → s3://observability/build-1847/snapshot-linux-hardened-6.6.mem

Six kernels pass. The hardened kernel fails because it rejects unprivileged network namespace creation. Instead of leaving behind a cryptic log line, the snapshotOnFailure flag froze the VM at the moment of failure and wrote its full memory image to S3. The build page shows a "Debug Failure" button. Clicking it boots a clone of that snapshot (same registers, same stack, same open file descriptors) and opens a terminal, the same session jc exec gives you, right where the assertion fired.

Time-Travel Debugging with GDB and rr

Traditional CI gives you a log. You read it, guess what happened, push a speculative fix, and wait for the pipeline to rerun. This loop costs minutes per iteration. With snapshot debugging, the loop collapses to seconds.

When a test fails inside a JC-provisioned VM, the operator pauses the microVM with PATCH /vm (body {"state": "Paused"}) and captures it with PUT /snapshot/create. Firecracker's snapshot holds guest RAM, vCPU registers, and device state at the instant of failure; it does not include disk contents, so the VM's root filesystem image has to be preserved alongside it. The snapshot is written to the build's evidence directory and synced to durable storage. From the Jenkins UI, a developer clicks "Debug Failure": the operator restores the snapshot into a new microVM, bridges its vsock interface to a WebSocket terminal, and the developer is inside the failure.
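A minimal sketch of that capture and restore sequence against Firecracker's API socket, in Python. The endpoints and body fields (PATCH /vm, PUT /snapshot/create, PUT /snapshot/load with a file-backed mem_backend) come from Firecracker's API; the snapshot directory layout and helper names are hypothetical. Because disks are not part of the snapshot, this assumes the rootfs file is still present at the path the VM used when it was captured.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 over the Unix domain socket Firecracker serves its API on."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def capture_plan(snap_dir: str):
    """Pause the running VM, then write a full snapshot (VM state + guest RAM)."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": f"{snap_dir}/vmstate",
            "mem_file_path": f"{snap_dir}/mem",
        }),
    ]


def restore_plan(snap_dir: str):
    """Load the snapshot into a fresh, unconfigured Firecracker process and resume."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": f"{snap_dir}/vmstate",
            "mem_backend": {"backend_type": "File",
                            "backend_path": f"{snap_dir}/mem"},
            "resume_vm": True,
        }),
    ]


def send(socket_path: str, plan) -> None:
    """Issue each call in order, failing on any non-2xx status."""
    for method, path, body in plan:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            detail = resp.read()
            if not 200 <= resp.status < 300:
                raise RuntimeError(f"{method} {path}: {resp.status} {detail!r}")
        finally:
            conn.close()
```

The capture plan runs against the failing VM's socket and the restore plan against a newly started Firecracker process. Keeping them as plain data makes it easy to log exactly what the operator did for each build.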

$ jc exec build-1847
Cloning snapshot s3://observability/build-1847/snapshot-linux-hardened-6.6.mem
Booting clone VM... 8ms
Connected to failure environment.
$ gdb -p 1 _build/default/bin/main.exe
GNU gdb (GDB) 14.2
Attaching to process 1
Reading symbols from _build/default/bin/main.exe...
0x00007f8a3c4e1a3b in __GI___clone () at clone.S:78
(gdb) bt
#0  __GI___clone () at clone.S:78
#1  0x000055a1b2c4f100 in camlProcess__run_191 () at process.ml:42
#2  0x000055a1b2c3e890 in camlApply_guard__apply_281 () at apply_guard.ml:290
#3  0x000055a1b2c2d400 in camlMain__handle_apply_315 () at main.ml:540
(gdb) frame 1
#1  0x000055a1b2c4f100 in camlProcess__run_191 () at process.ml:42
42        Unix.create_process_env prog argv env Unix.stdin stdout_fd stderr_fd
(gdb) p argv[0]
$1 = 0x555555a8b2f0 "nft"

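For deterministic replay, JC integrates with rr, the record-and-replay debugger that originated at Mozilla. The operator can boot a VM with rr record wrapping the test process, so a failing test leaves behind a deterministic recording. You can replay it forward and backward, run in reverse to breakpoints and watchpoints, and inspect every instruction. rr relies on hardware performance counters, so the guest must be given a virtual PMU. Once packed with rr pack, the recording replays identically on any machine with a compatible CPU, which removes the "works on my machine" ambiguity.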
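The wrapping step might look like the sketch below, which only builds the shell lines. rr record, rr pack, rr replay, the _RR_TRACE_DIR variable, and the latest-trace symlink are standard rr; the /evidence/rr location and the helper name rr_session are assumptions. It also assumes the guest exposes a virtual PMU, since rr cannot record without hardware performance counters.

```python
import shlex


def rr_session(test_cmd, trace_root):
    """Shell lines to record a test under rr, pack the trace, and replay it.

    _RR_TRACE_DIR tells rr where to write traces; rr keeps a 'latest-trace'
    symlink there pointing at the most recent recording.
    """
    latest = shlex.quote(f"{trace_root}/latest-trace")
    return {
        # Inside the VM: run the test command under the recorder.
        "record": f"_RR_TRACE_DIR={shlex.quote(trace_root)} rr record {shlex.join(test_cmd)}",
        # Copy the files the trace depends on into it, so it can leave the VM.
        "pack": f"rr pack {latest}",
        # On the debugging machine: replay under rr's gdb front end.
        "replay": f"rr replay {latest}",
    }


if __name__ == "__main__":
    for step, line in rr_session(["dune", "runtest"], "/evidence/rr").items():
        print(f"{step:>6}: {line}")
```

shlex.join keeps multi-word test commands intact, so a command like sh -c 'dune runtest' survives the trip into the VM's shell.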

$ jc replay build-1847 --with-rr
Loading rr recording: build-1847-linux-hardened-6.6.rr
Replaying...
$ rr replay
(rr) continue
Program received signal SIGSEGV at nftables.ml:312
(rr) bt
#0  camlNftables__eval_rule_312 () at nftables.ml:312
#1  camlNftables__compile_289 () at nftables.ml:289
(rr) p rule.port.lower
$2 = -1
// Found the bug: port range lower bound is -1 on hardened kernel

The Message Bus: NATS as the Compute Plane Spine

Most CI systems are request-response: you push code, a pipeline runs, you get a result. JC replaces that with a pub-sub architecture built on NATS. Every Jenkins node, every Firecracker operator, every microVM agent, and every external client connects to the same NATS cluster. Pipelines are triggered by publishing to a subject. Test results stream back over reply subjects. The entire compute fleet becomes a single addressable mesh.

An AI agent publishes a job request to firecracker.provision. An operator picks it up, boots a VM, and publishes the VM handle to firecracker.provisioned.<id>. The agent receives that handle, reads the VM's vsock CID, and starts streaming commands over firecracker.exec.<id>. Stdout and stderr stream back on dedicated per-VM subjects in real time: no polling, no REST loops. Publishing is fire-and-forget, and JetStream-backed subjects add at-least-once delivery wherever a message must not be lost.

Here's a full session trace — an agent provisions a VM, runs a test suite, finds a failure, patches the source in-VM, reruns, confirms the fix, and discards the environment:

// Agent publishes a job request
PUB firecracker.provision 107
{"image":"lpf-dev-6.6","cpus":4,"mem":"8192M","kernel":"linux-hardened-6.6","mounts":{"/src":"lpf-source"}}
// Operator acks and boots the VM
PUB firecracker.provisioned.vm-a3f8 62
{"vm_id":"vm-a3f8","vsock_cid":42,"boot_ms":8,"state":"ready"}
// Agent subscribes to stdout/stderr channels for this VM (sids 1 and 2)
SUB firecracker.stdout.vm-a3f8 1
SUB firecracker.stderr.vm-a3f8 2
// Agent streams a command
PUB firecracker.exec.vm-a3f8 29
{"cmd":"dune build @runtest"}
// Results stream back in real time over NATS
MSG firecracker.stdout.vm-a3f8 1 84
312/313 tests passed
test_nftables_hardened: FAILED (line 312: port range assertion)
MSG firecracker.stderr.vm-a3f8 2 114
File "lib/nftables.ml", line 312, characters 24-31:
Error: hardened kernel rejects unprivileged netlink operations
// Agent patches the source inside the VM
PUB firecracker.write.vm-a3f8 167
{"path":"/src/lib/nftables.ml","patch":"--- a/lib/nftables.ml\n+++ b/lib/nftables.ml\n@@ -312 +312 @@\n- | None -> assert false\n+ | None -> Error \"unprivileged\"\n"}
// Agent reruns
PUB firecracker.exec.vm-a3f8 29
{"cmd":"dune build @runtest"}
MSG firecracker.stdout.vm-a3f8 1 20
313/313 tests passed
// Fix confirmed; agent discards the VM
PUB firecracker.release.vm-a3f8 2
{}
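The trace uses the NATS text protocol, where every PUB and MSG header carries the payload's exact byte count and every SUB carries a subscription ID (sid) that the server echoes in matching MSG frames. A small Python sketch of that framing, hand-rolled here for illustration rather than taken from a client library, makes those counts easy to check. The function names are hypothetical.

```python
import json


def pub_frame(subject, payload):
    """Encode a NATS PUB. The byte count must equal the payload's exact length."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return f"PUB {subject} {len(body)}\r\n".encode("ascii") + body + b"\r\n"


def sub_frame(subject, sid):
    """Encode a NATS SUB; the server tags each matching MSG with this sid."""
    return f"SUB {subject} {sid}\r\n".encode("ascii")


def parse_msg_header(line):
    """Split 'MSG <subject> <sid> [reply-to] <#bytes>' into its fields."""
    parts = line.split()
    if not parts or parts[0] != "MSG" or len(parts) not in (4, 5):
        raise ValueError(f"not a MSG header: {line!r}")
    reply_to = parts[3] if len(parts) == 5 else None
    return parts[1], parts[2], reply_to, int(parts[-1])


if __name__ == "__main__":
    print(pub_frame("firecracker.exec.vm-a3f8", {"cmd": "dune build @runtest"}))
    print(sub_frame("firecracker.stdout.vm-a3f8", 1))
    print(parse_msg_header("MSG firecracker.stdout.vm-a3f8 1 84"))
```

A production agent would use an official NATS client, which does this framing for you; the point here is only that the byte counts are mechanical, not hand-typed.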

NATS gives this architecture three properties that a WebSocket or REST API can't match. First, location transparency — the agent doesn't know or care which physical host the VM is on. It publishes to a subject and NATS routes the message to the correct operator. Second, guaranteed delivery — JetStream-backed subjects ensure that a test failure notification is never lost, even if the subscriber is temporarily disconnected. Third, fan-out observability — multiple subscribers can listen to the same firecracker.stdout.* wildcard subject simultaneously, so a human operator watching the pipeline console sees the exact same output stream as the AI agent driving the fix.
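Fan-out rests on NATS subject wildcards: * matches exactly one dot-separated token and > matches one or more trailing tokens. The sketch below is a local model of that matching rule, useful for reasoning about who receives which stream; the NATS server does the real routing, and the subscriber names are made up.

```python
def subject_matches(pattern, subject):
    """NATS subject matching: '*' matches one token, '>' one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and must match at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def fan_out(subject, subscriptions):
    """Names of every subscriber whose pattern matches; each gets its own copy."""
    return [name for name, pattern in subscriptions if subject_matches(pattern, subject)]


if __name__ == "__main__":
    subs = [("pipeline-console", "firecracker.stdout.*"),
            ("ai-agent", "firecracker.stdout.vm-a3f8"),
            ("audit-log", "firecracker.>")]
    print(fan_out("firecracker.stdout.vm-a3f8", subs))
```

The human watching the console and the agent driving the fix both match the same stdout subject, so neither sees a filtered or delayed copy of the other's stream.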

The NATS cluster runs alongside the Firecracker operators on bare metal. Each operator connects to a local NATS server, and the cluster handles routing and failover automatically. If an operator node dies mid-build, the VMs on it die too, but their leases are re-advertised on the bus and another operator picks them up and reprovisions them. The pipeline doesn't fail; the affected stages rerun on a healthy host.
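One way to implement lease re-advertisement is heartbeat expiry: each operator heartbeats on the bus, and any lease whose owner has gone quiet past a TTL is handed to the least-loaded live operator. This is a hypothetical sketch; Lease, the 15-second TTL, and the least-loaded policy are all assumptions. A dead host's VM memory is gone with it, so the new owner reprovisions a fresh VM for the stage rather than resuming the old one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    vm_id: str
    owner: str   # operator node currently responsible for the VM
    stage: str   # pipeline stage the VM was serving


def live_operators(last_heartbeat, now, ttl_s):
    """Operators that heartbeated within the last ttl_s seconds, sorted by name."""
    return sorted(op for op, t in last_heartbeat.items() if now - t <= ttl_s)


def reassign_orphans(leases, last_heartbeat, now, ttl_s=15.0):
    """Hand each lease whose owner missed its heartbeat to the least-loaded live operator.

    Returns (vm_id, old_owner, new_owner) moves. The new owner reprovisions a
    fresh VM for the stage; the dead VM's memory is not recoverable.
    """
    live = live_operators(last_heartbeat, now, ttl_s)
    if not live:
        return []
    load = {op: 0 for op in live}
    for lease in leases:
        if lease.owner in load:
            load[lease.owner] += 1
    moves = []
    for lease in leases:
        if lease.owner in load:
            continue  # owner is alive, lease stays put
        target = min(live, key=lambda op: (load[op], op))
        load[target] += 1
        moves.append((lease.vm_id, lease.owner, target))
    return moves
```

Breaking ties by operator name keeps reassignment deterministic, so two operators evaluating the same heartbeat table reach the same decision.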

How JC Compares

                 Jenkins + JC       GitLab CI          GitHub Actions
Isolation        Hardware (KVM)     Container (runc)   VM per job (hosted runners)
Boot time        <10ms              ~3s (pod)          ~5s (runner)
Kernel matrix    Native, parallel   Not possible       Runner OS images only
Config format    Groovy DSL         YAML               YAML
Snapshot debug   gdb + rr replay    No                 No
Message bus      NATS pub-sub       No                 No

Ad-Hoc Pipeline Execution

Work submitted over the message bus doesn't need a Jenkinsfile. You can send a pipeline definition over the bus and get a VM back: no repository, no branch, no push. This is useful for running a test suite against an in-progress patch, evaluating a security fix across your entire kernel matrix, or giving an AI agent a sandbox to iterate in:

$ jc pipeline run \
    --image lpf-test-linux-6.12 \
    --cpus 2 --mem 4096M \
    --script "git clone https://github.com/ingresslabs/lpf && cd lpf && dune build && dune runtest && lpf check fixtures/policies/basic.lpf && lpf diff --live fixtures/policies/basic.lpf && lpf sysctl check"
Submitted ad-hoc pipeline → build #2847
Provisioning Firecracker VM (image: lpf-test-linux-6.12, 2 vCPU, 4GB)...
VM booted in 7ms. Agent connected.
Executing script...
  → dune build        OK (11.2s)
  → dune runtest      OK (3.4s)  313/313 passed
  → lpf check         OK (0.3s)  basic.lpf: 0 issues
  → lpf diff --live   OK (0.5s)  no changes
  → lpf sysctl check  OK (0.1s)  4/6 sysctls present
Pipeline #2847: SUCCESS
VM discarded. Zero state remains.

You can also invoke it directly against a specific commit or branch, add kernel matrix axes, or attach a policy file for validation — all without touching the Jenkins UI:

$ jc pipeline run \
    --image lpf-test \
    --kernel-matrix linux-5.15,linux-6.1,linux-6.6,linux-6.12 \
    --policy fixtures/policies/nat-rdr.lpf \
    --check \
    --wait
Kernel matrix: 4 axes
Policy: nat-rdr.lpf (6 rules, 2 NAT, 2 RDR, 2 tables)
linux-5.15  check OK  plan OK  diff OK  apply OK  rollback OK
linux-6.1   check OK  plan OK  diff OK  apply OK  rollback OK
linux-6.6   check OK  plan OK  diff OK  apply OK  rollback OK
linux-6.12  check OK  plan OK  diff OK  apply OK  rollback OK
4/4 kernels passed. Policy nat-rdr.lpf validated.
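Conceptually, --kernel-matrix fans one request out into one provisioning message per kernel, published together, and then folds the per-kernel results into a summary line. A sketch of that expansion, reusing the payload fields from the NATS trace; the policy field and all function names are assumptions, not the jc CLI's implementation.

```python
import json


def parse_kernel_matrix(arg):
    """'linux-5.15,linux-6.1' -> ['linux-5.15', 'linux-6.1'] (blank entries dropped)."""
    return [k.strip() for k in arg.split(",") if k.strip()]


def provision_requests(image, kernels, cpus, mem, policy=None):
    """One firecracker.provision message per kernel; all are published at once."""
    msgs = []
    for kernel in kernels:
        req = {"image": image, "cpus": cpus, "mem": mem, "kernel": kernel}
        if policy is not None:
            req["policy"] = policy  # hypothetical field carrying the policy file
        msgs.append(("firecracker.provision", json.dumps(req, separators=(",", ":"))))
    return msgs


def summarize(results):
    """results maps kernel -> passed?; mirrors the 'N/M kernels passed.' line."""
    passed = sum(1 for ok in results.values() if ok)
    return f"{passed}/{len(results)} kernels passed."


if __name__ == "__main__":
    kernels = parse_kernel_matrix("linux-5.15,linux-6.1,linux-6.6,linux-6.12")
    for subject, body in provision_requests("lpf-test", kernels, 2, "4096M"):
        print(subject, body)
```

Because every request goes out before any result comes back, wall-clock time tracks the slowest kernel rather than the sum.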

VM Density and Image Pipeline

Running a fleet of microVMs needs efficient packing. Firecracker's balloon device lets the host reclaim memory a VM isn't using: when the operator inflates the balloon, the guest's virtio-balloon driver hands those pages back, so a VM allocated 8GB that only uses 1GB can give up to the remaining 7GB back to the host. This lets us overprovision, scheduling against actual usage rather than reservation. Combined with a memory snapshot cache that reuses identical kernel pages across VMs, a single bare-metal node runs 50+ concurrent builds without swapping.
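Firecracker drives the balloon through its API: PUT /balloon attaches the device before boot, PATCH /balloon changes its target size at runtime, and with statistics polling enabled, GET /balloon/statistics reports guest memory usage. The sketch below computes an inflation target from allocated and used memory; the 256 MiB headroom is an assumed safety margin, not a JC setting, and the helper names are hypothetical.

```python
def balloon_device(stats_interval_s=1):
    """Body for PUT /balloon, sent before boot: start deflated, allow OOM deflation."""
    return {"amount_mib": 0, "deflate_on_oom": True,
            "stats_polling_interval_s": stats_interval_s}


def balloon_target_mib(allocated_mib, used_mib, headroom_mib=256):
    """Inflate to cover whatever the guest is not using, minus a safety margin."""
    return max(0, allocated_mib - used_mib - headroom_mib)


def inflate_request(allocated_mib, used_mib, headroom_mib=256):
    """PATCH /balloon call that hands the unused pages back to the host."""
    return ("PATCH", "/balloon",
            {"amount_mib": balloon_target_mib(allocated_mib, used_mib, headroom_mib)})


if __name__ == "__main__":
    # The 8GB-allocated, 1GB-used VM from the text.
    print(inflate_request(8192, 1024))  # ('PATCH', '/balloon', {'amount_mib': 6912})
```

With deflate_on_oom set, the guest can shrink the balloon on its own if the build suddenly needs memory again, which is what makes scheduling against actual usage tolerable.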

Images are built from standard OCI container images. A Dockerfile defines the environment: FROM ubuntu:22.04, toolchain installs, cache warmup scripts. The JC image builder converts the OCI layers into a Firecracker-compatible rootfs, injects a custom init binary, and publishes the image to the fleet's object store. No snowflake VM images: every image is reproducible from its Dockerfile. The same jc pipeline run --image lpf-test-linux-6.12 command works identically on a developer's Linux laptop with KVM and in the production cluster. Firecracker needs Linux KVM, so on a Mac it has to run inside a Linux VM with nested virtualization rather than directly on the macOS Hypervisor framework.
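The trickiest part of turning OCI layers into a single rootfs is whiteout handling. In the OCI image spec, a file named .wh.<name> deletes <name> from lower layers, and .wh..wh..opq marks a directory as opaque so lower layers' contents in it disappear. The sketch below models layers as path-to-content dicts to show the merge rule; a real converter streams tar entries and preserves ownership and modes, and this is not a claim about how JC's builder is written.

```python
import posixpath

WHITEOUT = ".wh."
OPAQUE = ".wh..wh..opq"


def _drop(tree, target):
    """Remove target and everything beneath it from the tree."""
    for path in [p for p in tree if p == target or p.startswith(target + "/")]:
        del tree[path]


def flatten_layers(layers):
    """Apply OCI layers (lowest first), honoring whiteouts, into one file tree.

    Each layer is modeled as {relative_path: content}. '.wh.<name>' deletes
    <name> from lower layers; '.wh..wh..opq' hides everything lower layers
    placed in that directory.
    """
    tree = {}
    for layer in layers:
        # Deletions apply only to lower layers, so handle them before adding files.
        for path in layer:
            parent, name = posixpath.split(path)
            if name == OPAQUE:
                prefix = parent + "/" if parent else ""
                for p in [p for p in tree if p.startswith(prefix)]:
                    del tree[p]
            elif name.startswith(WHITEOUT):
                _drop(tree, posixpath.join(parent, name[len(WHITEOUT):]))
        for path, content in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT):
                tree[path] = content
    return tree
```

For example, a toolchain layer can whiteout a stale tmp file and mark a cache directory opaque while adding fresh files to that same directory in one step.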

Security: Trust Domains and Jailer

Caching at scale introduces trust problems: an untrusted pull request must not poison the cache for the mainline. JC enforces Trust Domains — caches produced in an untrusted context are cryptographically isolated from the trusted domain. The operator validates every cache export before promotion to the distribution mesh.
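The mechanism behind that isolation isn't spelled out here, so the sketch below swaps in one common way to get the property: give each trust domain its own MAC key and tag every cache entry with an HMAC over its domain, key, and content digest. An untrusted build never holds the trusted key, so anything it produces fails verification in the trusted domain. CacheEntry, seal, and accept are hypothetical names.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    domain: str   # trust domain that produced the entry, e.g. "trusted"
    key: str      # cache key (for example a hash of the build inputs)
    blob: bytes   # the cached artifact itself
    tag: str      # HMAC over (domain, key, sha256(blob))


def _mac(secret, domain, key, blob):
    digest = hashlib.sha256(blob).hexdigest()
    msg = "\0".join([domain, key, digest]).encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def seal(secret, domain, key, blob):
    """Called by a builder that holds its own domain's secret."""
    return CacheEntry(domain, key, blob, _mac(secret, domain, key, blob))


def accept(domain, secret, entry):
    """Consumers in `domain` only accept entries tagged with that domain's secret."""
    if entry.domain != domain:
        return False
    return hmac.compare_digest(entry.tag, _mac(secret, domain, entry.key, entry.blob))
```

Promotion then means the operator, which alone holds the trusted key, validates an export and re-seals it for the trusted domain; a pull request can neither relabel its own output nor tamper with an entry it fetched.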

At the process level, each Firecracker instance runs inside jailer, which places it in its own cgroups (CPU and memory caps per VM) and namespaces (network, PID, and mount isolation), chroots it, and drops privileges before starting Firecracker. Secrets never touch environment variables; the agent queries the MicroVM Metadata Service (MMDS) at 169.254.169.254 for short-lived credentials scoped to that build. Untrusted code (a kernel module, an nftables ruleset, an eBPF program) runs with hardware isolation and ephemeral, least-privilege access. Barring a hypervisor escape, the worst it can do is crash its own VM.
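Firecracker's MMDS version 2 uses a session-token handshake: the guest PUTs to /latest/api/token with an X-metadata-token-ttl-seconds header, then presents the returned token in X-metadata-token on each GET. A sketch of that handshake using only the Python standard library; the build/credentials path is a hypothetical key layout, and it assumes the operator has enabled MMDS V2 on the VM's network interface.

```python
import json
import urllib.request

MMDS_BASE = "http://169.254.169.254"


def token_request(ttl_s=300):
    """MMDS V2 step 1: PUT for a session token that expires after ttl_s seconds."""
    return urllib.request.Request(
        f"{MMDS_BASE}/latest/api/token", method="PUT",
        headers={"X-metadata-token-ttl-seconds": str(ttl_s)})


def metadata_request(token, path):
    """MMDS V2 step 2: GET a metadata path, presenting the session token."""
    return urllib.request.Request(
        f"{MMDS_BASE}/{path.lstrip('/')}", method="GET",
        headers={"X-metadata-token": token, "Accept": "application/json"})


def fetch_build_credentials(path="build/credentials"):
    """Fetch short-lived credentials from inside the guest (path is hypothetical)."""
    with urllib.request.urlopen(token_request(), timeout=2) as resp:
        token = resp.read().decode("ascii").strip()
    with urllib.request.urlopen(metadata_request(token, path), timeout=2) as resp:
        return json.load(resp)
```

Because the token itself expires, a credential leaked from a build log is only useful for as long as the TTL allows.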

Why This Matters

The hardest software to test operates below the application layer. Firewall policy compilers, network function virtualizers, kernel modules, eBPF programs — these manipulate kernel state directly. A test failure in this space isn't a stack trace; it's a kernel panic, a network partition, a routing loop. Testing these systems demands hardware-level isolation with zero interference between test cases.

JC provides that by making every test run in its own Linux kernel. The seven-kernel matrix above runs concurrently because each VM is a fully independent machine. The hardened kernel failure doesn't corrupt the other six. The snapshot of the failure captures the exact kernel state at the moment of the assertion — not just a log line, but the registers, the page tables, the netfilter rules, the conntrack table, the active qdiscs. Every subsystem is frozen mid-execution and available for interactive inspection.

The jc CLI ships alongside the plugin. A developer who sees a failure in CI can run jc exec <build-id> to boot a clone of the failure snapshot on their own machine, replay the failure deterministically with jc replay <build-id> --with-rr, and debug it with gdb, all before pushing a fix. This closes the gap between CI-driven discovery and local reproduction, a gap that has plagued infrastructure testing for decades.

The Jenkins Firecracker Cloud Plugin is open source and available at github.com/ingresslabs/lpf. It requires Jenkins 2.x, a Linux host with KVM enabled, and a Firecracker binary. No Kubernetes cluster needed — the operator runs as a systemd service on bare metal and speaks the same message bus protocol as the clustered deployment. Start with one host, scale to a fleet when you need the kernel matrix.