The CI/CD tool changes; the lessons don't
Every team has a story about migrating CI from one tool to another, only for the same problems to show up on the new tool within a year. After watching a fair number of pipelines mature and rot, these are the patterns that survive across tools.
The tool comparison, briefly
- GitHub Actions — the default if your code is on GitHub. Marketplace is wide. YAML is reasonable. Cost scales acceptably for small-medium teams; can surprise at scale.
- GitLab CI — tightly integrated with GitLab, including the security/dependency scanning add-ons. Solid choice when GitLab is the platform.
- Jenkins — the legacy default. Endless flexibility via plugins, the corresponding endless plugin maintenance. Still common in enterprises with deep Jenkins expertise.
- Buildkite, CircleCI, Drone, Argo Workflows, Tekton — each has a niche. Don't pick one without a specific reason.
The right tool is the one your team can run without it being a project of its own. For most teams in 2026 starting fresh, GitHub Actions is the path of least resistance.
The pipeline structure that holds
A working pipeline does, at minimum:
- Compile / build the artefact
- Run unit tests
- Run linters / static analysis
- Build the container image (or other deployable artefact)
- Run integration tests against a deployed instance of the artefact
- Run security scans (SAST, dependency scan, image scan)
- Tag and publish the artefact, gated on tests passing
- Deploy to staging automatically on main branch
- Deploy to production via a manual approval gate
The order matters. Cheap fast checks (lint) before expensive slow checks (integration). Fail fast.
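That ordering can be sketched as a GitHub Actions workflow; the job names, make targets, and image tag below are illustrative, not a prescription:

```yaml
# Illustrative workflow: cheap checks gate the expensive ones via `needs`.
# Job names, paths, and commands are placeholders.
name: ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  lint-and-unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint test-unit      # fast feedback, target < 5 minutes

  build-image:
    needs: lint-and-unit              # only build once cheap checks pass
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t app:${{ github.sha }} .

  integration:
    needs: build-image
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test-integration    # slow checks last, so failures surface early
```

The `needs` edges are what enforce "cheap before expensive": a lint failure stops the run before any image is built.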
The cardinal sin: a multi-hour pipeline
A pipeline that takes 45 minutes to give a developer feedback is a pipeline they will avoid. The result: features merged with broken tests, fixes deferred to "later", and a culture of cynicism toward the pipeline.
The targets we hold ourselves to:
- Compile + lint + unit tests: under 5 minutes for any single PR
- Full pipeline (build + test + integration + scan): under 20 minutes
- If exceeded: split, parallelise, cache aggressively. Pipeline speed is a first-class problem.
Caching: the highest-leverage optimisation
Most slow pipelines are slow because they're rebuilding things they could have cached.
- Dependency caches (npm, pip, go mod, cargo, maven) — keyed by lockfile hash
- Docker BuildKit cache — multi-stage caching, exported to a registry for shared cache across runs
- Test result caching — when no source files have changed since the last passing run, skip the tests
A well-cached pipeline runs in 2-5 minutes for typical PRs. An uncached pipeline doing the same work takes 20+ minutes. The difference compounds.
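For the lockfile-keyed dependency cache, a minimal GitHub Actions sketch (npm shown here; the path and key change per ecosystem):

```yaml
# Illustrative dependency cache step, keyed by the lockfile hash.
# A lockfile change invalidates the cache; restore-keys allow a
# partial fallback to the most recent cache for this OS.
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-
```

The same pattern applies to pip (`~/.cache/pip`, keyed on `requirements*.txt`), go mod, cargo, and maven.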
Secrets in pipelines: the ways it goes wrong
- Secrets in environment variables — fine if scoped to the job, dangerous if logged
- Secrets in pull request runs from forks — disable or scope tightly; malicious code in a fork PR can exfiltrate secrets
- Long-lived static credentials — replace with OIDC-based federation where the platform supports it (GitHub Actions → AWS / GCP / Azure all support this)
- Secret rotation — automate it; manual rotation gets skipped
The default pipeline configuration on most CI tools is more permissive than appropriate. Audit early.
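For the OIDC point above, a sketch of what federation looks like in GitHub Actions targeting AWS; the role ARN and region are placeholders, and the IAM role's trust policy must allow GitHub's OIDC provider:

```yaml
# Illustrative OIDC federation: the job requests a short-lived token
# instead of reading long-lived keys from secrets.
permissions:
  id-token: write   # allow the job to request an OIDC token
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/ci-deploy  # placeholder ARN
      aws-region: eu-west-1
```

There is nothing to rotate here, which is the point: the credential expires on its own.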
Branch protection, deployment protection, environment protection
- main branch requires PR + review + green CI to merge
- production environment requires manual approval from a defined approver group
- rollback path — every deploy must have a tested rollback. Not "we can rebuild from yesterday's tag" — a rollback that's been exercised.
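The production approval gate maps to a GitHub Actions `environment` key; the job and deploy command below are placeholders, and the approver group itself is configured on the environment in repository settings, not in YAML:

```yaml
# Illustrative deploy job gated on a protected "production" environment.
deploy-production:
  needs: deploy-staging
  runs-on: ubuntu-latest
  environment: production          # run pauses here until an approver signs off
  steps:
    - run: ./deploy.sh production  # placeholder deploy command
```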
Pipeline observability — yes, it's a thing
Track:
- Pipeline duration — trending over time
- Pipeline failure rate — by job, not just overall
- Flaky test rate — tests that fail intermittently destroy trust in the suite
- Mean time to fix a broken main — if main is broken for hours, the team's velocity stops
These metrics are usually missing from CI dashboards. They're the ones that matter for engineering velocity.
One pattern we'd warn about
Configuration via giant pipeline files with no abstraction. Every CI system supports reusable templates / composite actions / shared workflows. Use them. The team that copy-pastes 200-line pipeline blocks across 30 services pays the maintenance cost forever.
One pattern that always pays off
A "pipelines as code" repo separate from application code, with pipeline templates the application repos depend on. When a security policy changes, you change one file, not 30. We do this with reusable workflows in GitHub Actions; the equivalent exists in every modern CI system.
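From the application repo's side, delegating to a shared template is a few lines; the org, repo path, input name, and tag below are illustrative:

```yaml
# Illustrative caller: the application repo delegates its whole pipeline
# to a reusable workflow in a shared "pipelines" repo.
name: ci
on: [push, pull_request]

jobs:
  pipeline:
    uses: your-org/pipelines/.github/workflows/standard-ci.yml@v2  # placeholder path
    with:
      service-name: payments-api   # placeholder input
    secrets: inherit
```

Pinning to a tag (`@v2`) is the trade-off to be aware of: a policy change ships to all 30 services when they pick up the new tag, not silently on every run.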
What's your CI tool, and what's your worst flaky-test story?