pwillis-els/JenkinsAntiPattern-OpsToil.md

Jenkins: A DevOps Anti-Pattern

Jenkins is the WordPress of CI/CD. Designed in another era, it creates more problems than it needs to and is more complex than it needs to be. But because it’s free and user-friendly, it is ubiquitous and perennial, like a weed. Every year, people will try to use it, unaware of the problems it will create.
The "tl;dr" is that Jenkins was not designed to be used like modern cloud-native, DevOps-friendly software. You can "make it work", in the same sense that you can make pigs fly, but they're really not designed to fly.
What is Jenkins?

Jenkins is an “automation server”. Basically it’s software that can continuously run automated tasks for you. It has a friendly web-based user interface, and because it’s written in Java, you can run it on any computer. And it has a lot of plugins to add features to do whatever you need.
What is it used for?

Software developers typically use Jenkins to do continuous integration and continuous delivery. Operations people typically use Jenkins to run automated tasks such as managing infrastructure or doing deployments.
Because you can run it on any computer, and it’s been around for a long time, most people have used it before.
If it’s easy to use, why do Ops people spend so much time setting it up and maintaining it?

Jenkins is easy to use if you’re a single user on a desktop computer. You might back up the files yourself, edit its configuration locally, upgrade it at your leisure. Basically, you’re only responsible to yourself.
When a group of people need to use it, it’s not so easy to maintain anymore. You run into a series of small problems that add up to a lot more complexity than you anticipate. And it can be difficult to build one Jenkins server that fits all needs/use cases.
The next problem is the version of Jenkins itself, and of its plugins. It's sort of like standardizing everyone on "one version of Linux": each team's software has different dependencies, upgrading the shared install breaks some of them, and you end up running multiple versions of Jenkins to support different teams.
Problems managing Jenkins

The design of Jenkins inherently leads to usability, management, and maintenance problems.
1. Jenkins Core version

The version of Jenkins that you download from their website is called Jenkins Core. It’s a single .war file that you run with Java (after you install Java).
There are two kinds of Jenkins Core releases: regular and LTS (Long Term Support). The regular releases are released... regularly (weekly). The LTS releases are farther apart (a new LTS baseline is chosen roughly every 12 weeks), and they are supported longer than the regular releases.
Each release is compatible with particular versions of Java and particular versions of each Plugin (which are themselves compatible only with particular versions of Java). So you have to make sure your version of Java, your Jenkins Core, and your Jenkins Plugins are all compatible with each other.
If you change one thing and it’s incompatible, everything breaks, and you have to try to revert your changes (I hope you were backing up everything before you made that change).
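The three-way dependency described above can be sketched as a pre-flight check run before an upgrade. This is only a minimal sketch, not how Jenkins itself resolves compatibility: the `Plugin` record, its field names, and every version number below are hypothetical, and versions are assumed to be plain dotted integers.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.426.3' into (2, 426, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Plugin:
    # Hypothetical record: the minimum Core and Java a plugin release needs.
    name: str
    version: str
    min_core: str
    min_java: int


def incompatibilities(core: str, java: int, plugins: list[Plugin]) -> list[str]:
    """Return one message per requirement that the given Core/Java fails."""
    problems = []
    for p in plugins:
        if parse_version(core) < parse_version(p.min_core):
            problems.append(f"{p.name} {p.version} needs Core >= {p.min_core}, have {core}")
        if java < p.min_java:
            problems.append(f"{p.name} {p.version} needs Java >= {p.min_java}, have {java}")
    return problems


if __name__ == "__main__":
    # Illustrative values only, not real plugin requirements.
    planned = [
        Plugin("example-scm", "5.2.0", min_core="2.400", min_java=11),
        Plugin("example-pipeline", "3.1.0", min_core="2.440.1", min_java=17),
    ]
    for line in incompatibilities(core="2.426.3", java=11, plugins=planned):
        print(line)
```

A check like this only catches declared requirements. It says nothing about plugins that are nominally compatible but still break each other at run time.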
2. Jenkins Plugin versions

Unlike the Jenkins Core versions, there are no LTS releases of Plugins. Once a new version of a Plugin comes out, it may break, as it may be incompatible with your other Plugins, your Jenkins Core, or your Java.
Every upgrade of a Plugin is like playing Russian Roulette. Will it break my whole Jenkins system? Will I need to back out the latest changes, find out what broke, submit a bug report, and wait for it to be fixed? What if the new plugin version was a security patch?
There is no simple way to say “Only update plugins with the latest security fixes”. Either upgrade everything to latest, or manually pick through release notes to determine whether an upgrade is safe. And pray [in vain] for no unknown bugs.
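To make the missing feature concrete, here is a rough sketch of what "security fixes only" selection would look like if advisory data were available in a convenient shape. The `advisories` structure (a list of `plugin` / `fixed_in` records) is an assumption made for illustration, not a format Jenkins provides, and versions are again assumed to be dotted integers.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.4' into (1, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def security_upgrades(installed: dict[str, str], advisories: list[dict]) -> dict[str, str]:
    """Map plugin name -> version that clears every advisory affecting it.

    `advisories` is a hypothetical list of {"plugin", "fixed_in"} records.
    A plugin counts as affected when its installed version is older than
    `fixed_in`.
    """
    upgrades: dict[str, str] = {}
    for adv in advisories:
        name, fixed = adv["plugin"], adv["fixed_in"]
        current = installed.get(name)
        if current is None or parse_version(current) >= parse_version(fixed):
            continue  # not installed, or already patched
        # Keep the highest fix version if several advisories hit one plugin.
        if name not in upgrades or parse_version(fixed) > parse_version(upgrades[name]):
            upgrades[name] = fixed
    return upgrades
```

Even with a list like this in hand, each selected version still carries its own Core and Java requirements, so the selection narrows the upgrade but does not make it safe.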
3. Jenkins Server Configuration

Jenkins uses a typical ‘manager/agent’ (formerly known as ‘master/slave’) topology. One node contains the server configuration, and it tells other nodes what to do.
This server configuration (along with job configuration) is created in the web interface and stored as XML files. And of course, every version change of Jenkins Core or Plugins can result in different configuration, which may or may not be compatible with any other version of Jenkins Core or Plugins.
Jenkins has no notion of version-controlling its configuration files, or of storing them in some cloud-based object storage. They’re just XML files written to disk with the current configuration. Anyone with permission in the web interface can edit them at any time, and you’ll have no idea who changed what, or when, or why, and you won’t be able to revert it.
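Since the configuration is just XML on disk, the most an outside script can do is notice that something changed. The sketch below fingerprints every XML file under a Jenkins home directory and compares two snapshots. The function names are made up for this example, and it assumes the configuration of interest lives in `*.xml` files under that directory.

```python
import hashlib
from pathlib import Path


def snapshot(home: Path) -> dict[str, str]:
    """Return {relative path: SHA-256 hex digest} for every XML file under home."""
    return {
        path.relative_to(home).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(home.rglob("*.xml"))
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and group paths into added, removed and modified."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in shared if before[p] != after[p]),
    }
```

This tells you which files changed between two points in time. It cannot tell you who changed them or why, which is exactly the gap described above.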
In recent times, a project called JCasC (Jenkins Configuration as Code) emerged, which promises the ability to load a simple configuration file at run time, rather than manually-edited XML files. This enables Jenkins server administrators to deploy a Jenkins server using automation, rather than manually.
However, not all Jenkins plugins and functionality are supported by JCasC, and using it can be difficult, due to a lack of documentation. In addition, it poses some other complexity challenges. It cannot install plugins, so you must already have all plugins installed that are needed to load the configuration (a potential chicken-and-egg problem). You might also need to run Jenkins first to create the configuration (another chicken-and-egg).
Finally, for secret configuration values, there are plugins that can retrieve secrets at run-time from a service like AWS Secrets Manager. However, you need to configure AWS properly first to provide the secrets, and then configure JCasC to load them properly.
3.1. Authentication

If you’re running Jenkins locally, you only have one admin user account. You probably never change it once you first set it up.
If you’re running Jenkins for a bunch of different people, you may need to manage accounts for them. To comply with company policies (or just make it more scalable) you might instead like users to login with their company Active Directory account, or their GitHub credentials.
To do that, you need to configure the Jenkins Server Configuration for that authentication method. It may not be simple or straightforward, depending on the method you choose and the limitations of that method. You may have to deal with network access issues, self-signed certificates, publicly-accessible servers (for an OAuth service), setting up a service account, and storing the credentials for it somewhere for Jenkins to load.
3.2. Authorization

If you’re just running Jenkins locally, you basically have an admin account, and can do whatever you want.
But if you’re running Jenkins for a team, you probably need to limit what specific people can do. You don’t want users to have admin access, because then they’ll change the Jenkins Server Configuration and break the server for everyone. (They don’t mean to, but Jenkins is surprisingly fragile and complicated.)
So you’ll have to figure out how to authorize specific groups of users to perform specific functions. This changes based on the authentication method. You then have to figure out how to configure Jenkins to load this configuration each time it starts up.
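As a rough illustration of what "authorize specific groups to perform specific functions" means, here is a small group-to-role-to-permission lookup. It is a sketch of the idea only: the group names and roles are invented, the permission strings merely imitate Jenkins' "Group/Action" naming, and a real setup would express this through whichever authorization plugin you choose.

```python
# Hypothetical role matrix; the strings imitate Jenkins' "Group/Action" naming.
ROLE_PERMISSIONS = {
    "admin": {"Overall/Administer"},
    "developer": {"Overall/Read", "Job/Read", "Job/Build"},
    "viewer": {"Overall/Read", "Job/Read"},
}

# Hypothetical mapping from directory groups (e.g. Active Directory) to roles.
GROUP_ROLES = {
    "ops-team": "admin",
    "dev-team": "developer",
    "all-staff": "viewer",
}


def is_allowed(user_groups: set[str], permission: str) -> bool:
    """True if any of the user's groups maps to a role granting the permission."""
    for group in user_groups:
        role = GROUP_ROLES.get(group)
        if role is None:
            continue  # unknown groups grant nothing
        granted = ROLE_PERMISSIONS[role]
        # In this sketch, Overall/Administer implies every other permission.
        if permission in granted or "Overall/Administer" in granted:
            return True
    return False
```

For example, `is_allowed({"dev-team"}, "Job/Build")` is true while `is_allowed({"dev-team"}, "Job/Configure")` is false, so developers can run jobs but not edit them.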
4. Jenkins Jobs Configuration

Jenkins Jobs (aka “Freestyle Jobs”) are also primarily XML files created in the web interface. As with Server Configuration, there is no version control.
In recent times, there are new, simpler ways to configure Jenkins Jobs as Code... but not using JCasC. That would be too simple.
Instead, you need to choose one of the following formats to store your Jenkins jobs.
4.1. Jenkinsfile Declarative Pipelines

This DSL (Domain-Specific Language) allows a user to write a “simple” job configuration, and keep it in their Git repository. This allows the job to be version-controlled, and allows the developers to define their build pipelines as they see fit.
However, there are some limitations:


It cannot be loaded automatically when the Jenkins Server starts up. You have to write a JobDSL job to load the job at start-up time.


It cannot easily load other Jenkinsfile files. You can use a ‘shared code library’, but that then requires using Groovy.


The development team needs to learn how to write Jenkinsfiles. Luckily, there are lots of examples on the web. But the more complex it gets, the fewer examples there are, and using it can slow the team down. The team can work faster if the Jenkinsfile stays very simple and any advanced logic is moved into a programming language the team is familiar with.


The language is quirky.
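The advice in the third limitation (keep the Jenkinsfile thin and put the logic in a language the team knows) might look like the sketch below. The file name `build.py`, the step names, and the placeholder commands are all assumptions for illustration. A pipeline stage would then contain little more than a shell step such as `python3 build.py test`.

```python
import subprocess
import sys

# Hypothetical steps; each maps a name to the command a pipeline stage would run.
# The commands are placeholders so the sketch runs anywhere Python is installed.
STEPS = {
    "lint": [sys.executable, "-c", "print('lint placeholder')"],
    "test": [sys.executable, "-c", "print('test placeholder')"],
}


def run_step(name: str) -> int:
    """Run one named step and return its exit code (2 for an unknown step)."""
    command = STEPS.get(name)
    if command is None:
        print(f"unknown step: {name}", file=sys.stderr)
        return 2
    return subprocess.run(command, check=False).returncode


def main(argv: list[str]) -> int:
    """Run the requested steps in order (all steps if none given), stopping at the first failure."""
    for name in argv or list(STEPS):
        code = run_step(name)
        if code != 0:
            return code
    return 0


if __name__ == "__main__":
    exit_code = main(sys.argv[1:])
    if exit_code != 0:
        sys.exit(exit_code)
```

The point of the design is that developers can run exactly the same steps on their own machines, and the Jenkinsfile no longer needs anything beyond stage names and one-line shell calls.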


4.2. JobDSL

This DSL is like a less friendly but more functional version of a Jenkinsfile. However, it has to be loaded into the server configuration. Typically it is used to create seed jobs, or to do things that a Jenkinsfile can’t easily do. It is also a quirky language.
4.3. Groovy (java source code)

This is a very-slightly-easier version of straight up Java code. Most people don’t know the language and it’s very quirky. Usually teams end up writing entire libraries of Groovy code, just to eventually run a single job that could have been a shell script. This is great for job security, not so much for collaboration on changes by lots of people.
5. Jenkins Backups

When you run a Jenkins Server, it stores all its files in a single directory. Server configuration, Job configuration, build artifacts, Jenkins Core software, Jenkins Plugins software, everything. In. One. Directory.
Build Artifacts (or realistically, the build logs) need to be backed up, so that if they are lost, developers can go back and see which builds succeeded or failed in the past.
All the rest (configuration, software, etc) needs to be backed up in case a new change to one of them breaks Jenkins. You’ll have to restore everything to how it was before Jenkins broke, or you may have to rebuild it all from scratch.
Luckily, since all the software is in there along with the configuration, you can restore the whole backup directory to get a running version of Jenkins. But don’t try to load configuration into the wrong version of Jenkins, or it’ll just blow up again.
You may need to restore a Jenkins backup due to something like an upgraded plugin. If you aren’t version-controlling the configuration & the software versions, you will lose all the changes that happened since the last backup.
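Because everything lives in one directory, a backup can be as blunt as archiving that directory. The following is a minimal sketch under stated assumptions: the `SKIP_TOP_LEVEL` names are a guess at directories that hold rebuildable data and may not match your installation, and the archive must be written somewhere outside the Jenkins home so it does not include itself.

```python
import tarfile
from pathlib import Path

# Assumption: these top-level directories hold rebuildable data and can be skipped.
SKIP_TOP_LEVEL = {"workspace", "caches"}


def backup_home(home: Path, archive: Path) -> list[str]:
    """Write a gzip tarball of the files under `home`, skipping SKIP_TOP_LEVEL.

    Returns the relative names that were stored, in sorted order.
    """
    stored = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(home.rglob("*")):
            relative = path.relative_to(home)
            if relative.parts[0] in SKIP_TOP_LEVEL or not path.is_file():
                continue
            tar.add(path, arcname=relative.as_posix())
            stored.append(relative.as_posix())
    return stored
```

A tarball like this restores the state at the time of the backup and nothing later, which is why it complements version control of the configuration rather than replacing it.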
6. Jenkins Cluster Scaling

As the number of Jenkins Job runs increases, so does the load on the server. Eventually you will need to add capacity in some regard. Often, it’s disk performance that goes first. Other times, the server configuration artificially limits the number of jobs that can run at a time. And sometimes you just need to add extra servers to run jobs on.
The Jenkins Server Configuration needs to be set up to add new build nodes. You also need the networking set up correctly, depending on which of the two different ways you’ve chosen to connect the Manager to the Agent nodes (the Manager launching Agents over SSH, or the Agents connecting inbound to the Manager). This requires coordinating your AWS configuration & infrastructure with your Jenkins Server Configuration.
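A back-of-the-envelope way to decide how much job capacity to add is Little's law: average concurrent builds equals the build arrival rate times the average build duration. The sketch below applies that, with a headroom factor that is purely an assumption to absorb bursts. It estimates executor slots only and says nothing about disk or memory.

```python
import math


def executors_needed(builds_per_hour: float, avg_minutes: float, headroom: float = 0.3) -> int:
    """Estimate executor slots: arrival rate x duration, plus a headroom fraction."""
    average_concurrency = builds_per_hour * avg_minutes / 60.0
    return math.ceil(average_concurrency * (1.0 + headroom))
```

For example, 120 builds per hour at 6 minutes each is 12 builds running at once on average, so `executors_needed(120, 6, headroom=0.25)` returns 15.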
And of course, you need to incorporate backups & version control into this scaling.
7. Jenkins Infrastructure

By default, Jenkins is just a Java program. You need to figure out how you will run it.
If you run it on your desktop, running java -jar jenkins.war will suffice.
You can also run it in a Docker container, with docker run --rm -d -v jenkins_home:/var/jenkins_home -p 8080:8080 jenkins/jenkins:lts
But in the Cloud, you’ll need to do a lot more work:


You’ll have to choose between using EC2, ECS, or EKS, and custom-tailor the way you configure & run Jenkins based on the platform you chose.


You’ll need to figure out DNS, load balancing, IAM permissions, and secrets management.


Then you can figure out the Software Versions, Authentication & Authorization, Server Configuration, Job Configuration, Backups, Scaling, etc.


Each time you create a new Cloud-based Jenkins, it’s like you’re starting all over again. Some of the components may be the same, but the whole still requires a ton of coordination to bring it all together.
So, why don’t we just pick one method and stick with it? They all have upsides and downsides.


Kubernetes is the most expensive and complicated method. Just to get to the Jenkins part, first you have to have a working Kubernetes cluster and experienced admins.


ECS takes away some of the EC2 management headaches, but it brings its own management problems when you try to tweak it to be scalable.


EC2 is the simplest, but still requires a lot of set-up work. Choosing ECS or Kubernetes instead essentially adds that platform's maintenance tasks on top of the Jenkins ones.