After years of building and maintaining CI/CD pipelines at Dell Technologies, I’ve learned that enterprise CI/CD is a different beast than what you read about in tutorials. The principles are the same, but the scale, the politics, and the consequences of getting it wrong are amplified.
Here’s what I’ve learned works—and what I wish someone had told me earlier.
Pipeline Design That Actually Scales
Keep Stages Focused
Early in my career, I built pipelines that tried to do everything in one stage. Build, test, scan, deploy—all crammed together. It worked until it didn’t. When something failed, good luck figuring out what.
Now I follow a simple rule: one responsibility per stage.
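pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh 'go build -o app ./cmd/server'
            }
        }
        stage('Unit Tests') {
            steps {
                sh 'go test -v -coverprofile=coverage.out ./...'
                // Convert Go's coverage profile to the Cobertura XML the adapter below expects.
                // Assumes the gocover-cobertura tool is installed on the agent.
                sh 'gocover-cobertura < coverage.out > coverage.xml'
            }
            post {
                always {
                    publishCoverage adapters: [coberturaAdapter('coverage.xml')]
                }
            }
        }
        stage('Security Scan') {
            steps {
                sh 'snyk test --severity-threshold=high'
            }
        }
        stage('Build Image') {
            steps {
                // ${GIT_COMMIT} is expanded by the shell, not by Groovy
                sh 'docker build -t myapp:${GIT_COMMIT} .'
            }
        }
        stage('Deploy to Dev') {
            when {
                branch 'develop'
            }
            steps {
                sh 'kubectl apply -f k8s/dev/'
            }
        }
    }
}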
When the security scan fails, I know exactly where to look. When tests pass but deployment fails, the problem is isolated.
Fail Fast, Fail Cheap
Run your fastest checks first. There’s no point waiting 20 minutes for integration tests to complete if a linting error would have caught the problem in 30 seconds.
My typical ordering:
- Linting and static analysis (seconds)
- Unit tests (minutes)
- Security scanning (minutes)
- Build artifacts (minutes)
- Integration tests (longer)
- Deployment (varies)
stage('Quick Checks') {
    parallel {
        stage('Lint') {
            steps {
                sh 'golangci-lint run'
            }
        }
        stage('Format Check') {
            steps {
                // gofmt -l lists unformatted files; fail if that list is non-empty
                sh 'test -z "$(gofmt -l .)"'
            }
        }
    }
}
Running lint and format checks in parallel saves time. If either fails, the pipeline stops before burning compute on longer tests.
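The same cheapest-first ordering also works outside Jenkins, for example in a local pre-push script. Here is a minimal sketch, assuming two hypothetical helpers (`run_stage` and `run_checks`) that are not part of any tool; the commands themselves are the ones used elsewhere in this post.

```shell
#!/bin/sh
# run_stage NAME COMMAND [ARGS...]
# Hypothetical helper: runs one check and names the stage if it fails.
run_stage() {
    stage_name=$1
    shift
    printf '==> %s\n' "$stage_name"
    if "$@"; then
        return 0
    else
        stage_status=$?
        printf 'FAILED: stage "%s" (exit code %d)\n' "$stage_name" "$stage_status" >&2
        return "$stage_status"
    fi
}

# run_checks: cheapest checks first; the && chain stops at the first failure,
# so slower stages never start once a fast one has failed.
run_checks() {
    run_stage 'Lint' golangci-lint run &&
    run_stage 'Unit tests' go test ./... &&
    run_stage 'Security scan' snyk test --severity-threshold=high &&
    run_stage 'Build' go build -o app ./cmd/server
}

# Usage: call run_checks from a pre-push hook or a Makefile target.
```

Naming the failed stage in the error message is what makes the early exit useful: you see which check stopped the run without scrolling through output.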
Security Scanning That Doesn’t Slow You Down
Security scanning is non-negotiable at enterprise scale. But I’ve seen teams disable scanners because they added 15 minutes to every build. That’s a process problem, not a security problem.
Scan Dependencies Separately
Don’t scan your entire dependency tree on every commit. Instead, keep a fast code scan on every commit and run the comprehensive dependency scan only on the main branch and on a schedule.
stage('Security') {
    parallel {
        stage('Code Scan') {
            // Fast - runs on every commit
            steps {
                sh 'snyk code test'
            }
        }
        stage('Dependency Scan') {
            // Only on main branch or weekly
            when {
                anyOf {
                    branch 'main'
                    triggeredBy 'TimerTrigger'
                }
            }
            steps {
                sh 'snyk test --all-projects'
            }
        }
    }
}
Don’t Block on Every Finding
Not every vulnerability is critical. Configure your scanner to fail only on high and critical issues, and track the rest separately.
# Fail on high/critical only
snyk test --severity-threshold=high
# Report everything for tracking
snyk monitor
We review medium and low findings weekly rather than blocking every deployment.
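If your scanner has no built-in threshold flag, the same gate is easy to express yourself. The sketch below assumes you have already reduced the scanner's report to one severity label per line; that input format and the `severity_gate` name are assumptions for illustration, not the native output or API of any scanner.

```shell
#!/bin/sh
# severity_gate: reads one severity label per line on stdin
# (low, medium, high, critical; case-insensitive).
# The input format is an assumption of this sketch.
# Prints a one-line summary and returns 1 only when a high or
# critical finding is present, so lower severities never block.
severity_gate() {
    awk '
        { count[tolower($1)]++ }
        END {
            blocking = count["high"] + count["critical"]
            tracked  = count["low"] + count["medium"]
            printf "blocking=%d tracked=%d\n", blocking, tracked
            exit (blocking > 0 ? 1 : 0)
        }
    '
}

# Usage: some_report_command | severity_gate
```

The exit code decides whether the build is blocked, while the summary line can feed whatever you use for the weekly review.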
Testing Strategies That Work at Scale
The Testing Pyramid Still Applies
I’ve worked with teams that had thousands of end-to-end tests and almost no unit tests. Builds took hours. Failures were cryptic. Everyone was afraid to touch the tests.
The pyramid exists for a reason:
- Unit tests: Fast, isolated, lots of them
- Integration tests: Test component interactions, fewer than unit tests
- E2E tests: Validate critical paths only, expensive to maintain
# Example test distribution
tests:
  unit:
    count: 500+
    runtime: ~2 minutes
    runs_on: every_commit
  integration:
    count: ~100
    runtime: ~10 minutes
    runs_on: every_commit
  e2e:
    count: ~20
    runtime: ~30 minutes
    runs_on: main_branch_and_nightly
Parallelize Everything
If your tests can run independently (they should), run them in parallel. Jenkins, GitHub Actions, and GitLab CI all support this.
stage('Integration Tests') {
    parallel {
        stage('API Tests') {
            steps {
                sh 'go test -v ./tests/api/...'
            }
        }
        stage('Database Tests') {
            steps {
                sh 'go test -v ./tests/db/...'
            }
        }
        stage('Auth Tests') {
            steps {
                sh 'go test -v ./tests/auth/...'
            }
        }
    }
}
We cut our test runtime from 45 minutes to 12 minutes by parallelizing tests that were running sequentially for no good reason.
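For anyone curious what a CI server is doing under the hood, here is a rough sketch of the fan-out and join in plain shell. The `run_parallel` helper is hypothetical; it starts every command in the background, waits for all of them, and fails if any one failed.

```shell
#!/bin/sh
# run_parallel CMD...
# Hypothetical helper: each argument is a shell command string.
# All commands start at once; the function returns non-zero if any failed.
run_parallel() {
    pids=''
    for cmd in "$@"; do
        sh -c "$cmd" &          # start in the background
        pids="$pids $!"         # remember the child's PID
    done
    failures=0
    for pid in $pids; do
        # wait returns the exit status of that specific child
        wait "$pid" || failures=$((failures + 1))
    done
    [ "$failures" -eq 0 ]
}

# Usage (paths taken from the pipeline stage above):
# run_parallel 'go test ./tests/api/...' 'go test ./tests/db/...' 'go test ./tests/auth/...'
```

Total runtime becomes roughly that of the slowest suite rather than the sum of all suites, which is where the savings come from. It only works if the suites share no state, such as a common database schema.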
Lessons Learned the Hard Way
Always Have a Rollback Plan
Early in my DevOps career, I deployed a change that passed all tests but broke production in a way we didn’t test for. We had no quick rollback. It took four hours to fix.
Now every deployment includes a rollback strategy:
# Deploy and wait for the rollout (Kubernetes records each rollout as a revision)
kubectl apply -f k8s/deployment.yaml
kubectl rollout status deployment/myapp
# If something goes wrong
kubectl rollout undo deployment/myapp
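The manual sequence above can be wrapped so that the undo happens automatically when a rollout does not complete. This is a sketch under assumptions: the `deploy_with_rollback` name, the overridable `KUBECTL` variable, and the 120-second timeout are illustrative choices, not values from our pipeline.

```shell
#!/bin/sh
# KUBECTL can be overridden, e.g. with a stub when testing this script.
KUBECTL=${KUBECTL:-kubectl}

# deploy_with_rollback MANIFEST DEPLOYMENT
# Applies the manifest, waits for the rollout, and undoes it if the
# rollout fails or does not finish within the (arbitrary) timeout.
deploy_with_rollback() {
    manifest=$1
    deployment=$2
    "$KUBECTL" apply -f "$manifest" || return 1
    if "$KUBECTL" rollout status "deployment/$deployment" --timeout=120s; then
        echo "rollout of $deployment succeeded"
        return 0
    fi
    echo "rollout of $deployment failed, rolling back" >&2
    "$KUBECTL" rollout undo "deployment/$deployment"
    return 1   # still report failure so the pipeline stage goes red
}

# Usage: deploy_with_rollback k8s/deployment.yaml myapp
```

Returning non-zero even after a successful undo is deliberate: the cluster is healthy again, but the deployment itself did not go out, and someone needs to know.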
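Note that kubectl rollout undo only reverts the Deployment; it does not undo schema or data changes. For database migrations, a rollback plan therefore means writing reversible migrations or having a tested restore procedure.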
Secrets Management is Not Optional
I’ve seen API keys in Jenkins environment variables, database passwords in plain text config files, and tokens committed to Git. At enterprise scale, this isn’t just bad practice—it’s a compliance violation waiting to happen.
We use HashiCorp Vault for secrets, injected at runtime:
stage('Deploy') {
    steps {
        withVault(configuration: [vaultUrl: 'https://vault.internal'],
                  vaultSecrets: [[path: 'secret/myapp/prod',
                                  secretValues: [[vaultKey: 'db_password',
                                                  envVar: 'DB_PASSWORD']]]]) {
            sh 'kubectl apply -f k8s/prod/'
        }
    }
}
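For scripts that run outside Jenkins, the same runtime-injection idea can be sketched with the Vault CLI. The `fetch_secret` helper and the overridable `VAULT_BIN` variable are hypothetical, and the sketch assumes `VAULT_ADDR` and a valid token are already set in the environment.

```shell
#!/bin/sh
# VAULT_BIN can be overridden, e.g. with a stub when testing this script.
VAULT_BIN=${VAULT_BIN:-vault}

# fetch_secret PATH FIELD
# Hypothetical helper: reads one field from Vault at runtime and prints it
# on stdout. Fails loudly instead of continuing with an empty value.
fetch_secret() {
    value=$("$VAULT_BIN" kv get -field="$2" "$1") || {
        echo "could not read field '$2' from '$1'" >&2
        return 1
    }
    if [ -z "$value" ]; then
        echo "field '$2' at '$1' is empty" >&2
        return 1
    fi
    printf '%s' "$value"
}

# Usage (path and key from the Jenkins example above):
# DB_PASSWORD=$(fetch_secret secret/myapp/prod db_password) || exit 1
# export DB_PASSWORD
```

The secret lives only in the process environment for the duration of the run; nothing is written to disk or to the repository.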
Monitor Your Pipelines
A pipeline that fails silently is worse than no pipeline. We track:
- Build success rate
- Average build time
- Time to recovery after failures
- Queue wait times
When our average build time crept from 8 minutes to 15, we caught it early and fixed the root cause (a slow dependency mirror) before it became a real problem.
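Two of these metrics can be computed from very little data. The sketch below assumes a hypothetical export with one build per line in the form `STATUS DURATION_SECONDS`; the format and the `build_stats` name are illustrative, not something Jenkins produces out of the box.

```shell
#!/bin/sh
# build_stats: reads "STATUS DURATION_SECONDS" lines on stdin
# (an assumed export format) and prints the build count, the
# success rate, and the average build time in minutes.
build_stats() {
    awk '
        NF >= 2 {
            total++
            if ($1 == "SUCCESS") ok++
            seconds += $2
        }
        END {
            if (total == 0) { print "no builds"; exit }
            printf "builds=%d success_rate=%.1f%% avg_minutes=%.1f\n",
                   total, 100 * ok / total, seconds / total / 60
        }
    '
}

# Usage: build_stats < builds.txt
```

Run over a rolling window such as the last week of builds, a number like avg_minutes makes a slow drift visible long before anyone complains.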
What I’d Tell My Past Self
- Start simple. A working pipeline that deploys reliably beats a complex one that fails mysteriously.
- Document everything. Six months from now, you won’t remember why that stage exists. Write it down.
- Make failures actionable. “Build failed” is useless. “Unit test failed in auth/login_test.go:47” tells you exactly where to look.
- Treat pipeline code like application code. Version control, code review, and testing all apply.
- Talk to your developers. The best pipeline improvements came from developers telling me what slowed them down, not from me guessing.
Working on CI/CD at scale? I’m always interested in comparing notes. Connect with me on LinkedIn.