The 7 (and a half) tools to assist you on your DevOps journey

Of course, having almost no automation, monitoring only 2-3 crucial HTTP calls, having no real way to save yourself quickly if something goes terribly wrong, and releasing every 3 weeks while yelling “Leeroy Jeeenkins” (to name a few of my top favorites) has to stop at some point.

The DevOps journey of any company or team starts whenever you deploy anything anywhere. Of course, the need to understand the term, not to mention really adopt some of its principles, emerges at a later time.

You do, of course, have a choice of when that happens. I will not dwell on the bloody price you will pay if you do it too late; instead, I will offer you some tools to take with you when you decide to embark on this journey.

Two important things before getting further into reading:

  1. These are engineering tools - a list of concepts you can use to develop tools, or to choose tools for certain tasks in your team. The other side of the whole thing is organizational, and that aspect is not the subject of this article.
  2. The DevOps journey never ends - it is an infinite loop of continuous improvement and software delivery, until our software’s end of life takes us somewhere else.

1 - the ChatOps

ChatOps is the meta-tool - the automation that glues together the things you need to operate your CICD more easily. It gets your team to the next level of DevOps.

ChatOps is about chat and about operations (obviously); it is also known as conversation-driven development. It is usually built by the teams themselves, because it is highly dependent on the tooling you use.

Here is, quickly, how the concept works:

  • For your team’s communication tool of choice, for example Slack or MS Teams, you build an integration with your CICD task runner/build system, for example Jenkins (not Leeroy in this case).
  • The integration assumes a special channel to which you push notification cards for any successful or failed build.
  • A failed-build notification card carries the URL of the build’s stack trace, or of some other page that shows why it failed.
  • For successful builds on branches, the card carries a URL to visit the branch’s dev environment.
  • For a successful build on the main branch, it carries:
    • A URL to its dev env
    • A button to trigger promotion to staging
  • A successful promotion to staging carries a button to trigger the automated part of testing, and would potentially trigger other means of communication to set things in motion so we can quickly prepare to release.
  • Testing ending in a positive result gives you the button to push and go to prod.
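The card logic in the flow above can be sketched as a small helper. This is only an illustrative sketch - the `build_card` function, the build dict keys, and the button names are made up, not any real Slack/Jenkins API:

```python
# Sketch of the ChatOps card logic: given a build event, decide which
# links and buttons the chat notification card should carry.
# All names here (dict keys, button actions) are illustrative only.

def build_card(build):
    """Return a notification card (as a plain dict) for a build event."""
    card = {"title": f"Build {build['id']} on {build['branch']}",
            "links": [], "buttons": []}
    if not build["success"]:
        # Failed builds link straight to the reason for the failure.
        card["links"].append(("stack trace", build["log_url"]))
        return card
    # Every successful build links to its dev environment.
    card["links"].append(("dev env", build["dev_env_url"]))
    if build["branch"] == "main":
        # Main-branch builds can be promoted to staging from the chat.
        card["buttons"].append("promote-to-staging")
    return card
```

In a real integration, the returned dict would be rendered into your chat tool’s card format, and each button would call back into the CICD system.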

Voila - the conversation-driven… automation for something… development… CICD :).

It is great because you can control your CICD from the place where you have your day-to-day, work-related conversations, which greatly reduces context switching and helps the team keep its focus on the task at hand.

I have initiated building and polishing ChatOps on almost every project I have worked on for years now, and I can highly recommend it as a focus and scalability tool for your team.

2 - Feature flags

Another meta-concept that has meant a whole DevOps world to me is the concept of feature flags.

Feature flags are configuration-controlled logical encapsulations of features, which allow us to decouple code deployment/release and feature release.

Imagine developing a feature for your application: the code is done and tested, but you have to hold off merging it to the main branch because the production content, the database, or something else is not ready, or because the business wants to release it on a certain day. This is a costly wait, because come release time you have to go through a new round of source code alignment, a new round of testing, etc.

What the feature flag concept brings to the table is the idea that, in a situation similar to the above, you can put a logical (if) block around the feature-enabling code, and control the entry logic with a flag in the config (naively simplifying it for the sake of article length).

Having this, you can test before the release, disable the flag in the production configuration, and deploy to prod. Everything is still in the code, tested and ready, so you can enable it when the external conditions are met.

You can control the config in whatever way you generally do - some teams need a code release to make this type of config change (it is still a lot better to release just a config change than a full code change), some teams use databases or third-party config control systems… whichever you use, the concept is the same.

Dimensions of feature flags - a feature flag does not have to be just true or false; it can also depend on the environment or other factors. A basic implementation of feature flags needs an env dimension, so you can say “it is enabled in dev, but not in staging and prod”. In a multi-market e-commerce setup, it would probably be useful to have configuration like “enable in dev for all markets, in staging for a given set of markets, and in prod for another set of markets”. Every application is different, though, and the environment dimension is the only rule of thumb I can offer.
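The env-and-market dimensions described above can be sketched as a naive flag lookup. The config shape, the flag name, and the market codes are all made up for illustration - real setups would pull this from a config service or database:

```python
# A naive feature-flag lookup with environment and market dimensions.
# Flag names, markets, and the config shape are illustrative only.
FLAGS = {
    "new-checkout": {
        "dev":     {"markets": "*"},            # enabled everywhere in dev
        "staging": {"markets": {"DE", "NL"}},   # a given set of markets
        "prod":    {"markets": {"DE"}},         # another set in prod
    },
}

def is_enabled(flag, env, market):
    rule = FLAGS.get(flag, {}).get(env)
    if rule is None:
        return False              # unknown flag or env: feature stays off
    markets = rule["markets"]
    return markets == "*" or market in markets

# The feature-enabling code is then wrapped in a simple logical block:
#   if is_enabled("new-checkout", env, market):
#       ...new code path...
#   else:
#       ...current behavior...
```

Note the safe default: anything not explicitly configured evaluates to off, which is exactly what you want when deploying dark code to prod.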

3 - Circuit breaker

The circuit breaker is the simplest concept here, but the hardest to implement well. It is what we call a self-healing mechanism, and it represents the pinnacle of reliability engineering.

Assume you have a feature that is an occasional source of instability, or an integration so important for your flow (value stream) that you want to have a backup for it.

The circuit breaker is the concept that uses system metrics to evaluate the state of that feature/integration and reacts when it becomes problematic - by shutting the thing down or activating the backup integration.

Example: a microservice in which one of the integrations used to enrich the data occasionally starts performing super slowly. To keep performance at a proper level, we want the integration shut down - accepting a bit of data richness degradation - whenever this happens. Our circuit breaker system would trigger the operation of turning off the integration and degrading the data when the response time of the service stays past a certain threshold for X amount of time.

4 - Canary/shadow-prod releases

Kubernetes gives you this ability out of the box - you would use certain Kubernetes annotations to manage two separate deployments under the same host and steer the traffic to one or the other.

There are two usual uses of this tool:

  • Having a portion of your real consumer traffic go to your new release while the rest still goes to your current/old release, until you confirm all is working correctly - and then shifting 100% of the traffic to the new release.
  • Having the code released to production as a secondary (shadow) production deployment and testing it there internally.
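The first use case - splitting real traffic - can be sketched at the application level. This is an illustrative sketch only; in Kubernetes the split would be done with ingress annotations rather than app code, and the function name and percentage are assumptions:

```python
# Sketch of canary traffic splitting: send a fixed percentage of real
# traffic to the new release, the rest to the current one. Hashing the
# user id keeps each consumer "sticky" to one version across requests.
import hashlib

def route(user_id, canary_percent=10):
    """Return "canary" or "stable" for this user, deterministically."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = digest[0] * 100 // 256   # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Ramping up the release then just means raising `canary_percent` until it hits 100, at which point the canary becomes the stable version.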

Of course, having the old version still deployed, with the ability to send traffic there in case of emergency, is a possible option for some failure scenarios. Which brings me to the next tool.

5 - Rollback

Disclaimer - my recommendation is to always follow the roll-forward strategy, which means you always fix the problem by deploying a new change. Of course, that “always” means “as much as possible”.

In cases where it will take time to fix a big production bug, or only one SRE is available and there are no feature flags to switch off, it is always handy to have a last-resort panic button. It goes nicely with the “do not panic” sign.

Rollback is usually just another CICD job, deploying (current version of the container − 1) to production, effectively neutralizing the complete last release to production.
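The "current version − 1" logic can be sketched as follows. The deploy step is stubbed out, and the function names and tag format are assumptions - in real life this would shell out to your CICD or cluster tooling:

```python
# Sketch of a rollback job: given the ordered release history, deploy
# the tag just before the current one. Deployment itself is a stub.

def previous_tag(history):
    """history is the ordered list of released tags, newest last."""
    if len(history) < 2:
        raise RuntimeError("nothing to roll back to")
    return history[-2]

def rollback(history, deploy):
    """Redeploy the previous release; `deploy` is the actual CICD step."""
    tag = previous_tag(history)
    deploy(tag)   # e.g. retag the container and run the prod deploy job
    return tag
```

Keeping the rollback as a plain CICD job means the panic button is exercised through the same pipeline as any normal release, so it stays tested.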

6 - The seeds

DevOps is equal parts a smooth development-prod-operations cycle and good developer experience. Technical enablement is a huge thing in a DevOps-savvy environment.

Nothing speaks to developer experience better than a smooth development project bootstrap, and a good step in this direction is what we call “seeds”.

The concept is simple: if you want to drive something like:

  • Microservice adoption
  • Cloud adoption
  • Kubernetes adoption
  • All of the above - or you just want to boost the DevOps culture in your engineering

One of the things you can do is develop seeds for the projects, so engineering teams can jump almost straight into writing business logic.

Types of seeds you can have:

  • Microservice code seeds - a bootstrap repo containing the initial microservice project setup, including standard library imports, initiation of the app, standard logging and code instrumentation, tests, a container file, etc. - for every programming language you use in your ecosystem.
  • CICD seed - build and deployment setup with standardized containerization and all of the setup for uploading containers, cluster deployments, infrastructure as code, quality gates, different env setups, etc. You use these together with a code seed.
  • Cloud bootstrap - if you are in a business where you often prep a new cloud account in a standard way (certain compliances, monitoring, budgets, etc.), the initial seed can be an infrastructure-as-code setup for all of these.

This is not a final list - you can have whatever you want, actually; these are just the most usual cases I have encountered.

Seeds are often:

  • Repos you can clone/fork
  • Scaffolding scripts pulling from different repos and assembling new ones based on your project needs
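The scaffolding-script flavor of seeds can be sketched as a placeholder-filling step. The template layout and the `__NAME__` placeholder convention are made up for illustration - a real script would clone template repos and write files to disk:

```python
# Sketch of a seed/scaffolding step: take template files (path -> content)
# and fill in project-specific placeholders so a new repo is ready to go.

def render_seed(template_files, project_name):
    """Return a copy of the template with placeholders replaced."""
    rendered = {}
    for path, content in template_files.items():
        rendered[path.replace("__NAME__", project_name)] = \
            content.replace("__NAME__", project_name)
    return rendered
```

Everything that is standard - logging setup, instrumentation, the container file, the CICD config - lives in the template, so each new project starts from the same baseline.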

In practice, these are amazing for driving any standardization and transformation in the engineering space - if done right, of course.

6.5 - Libraries

Libraries are about standards - the standard way you do HTTP calls, the standard way to log stuff, instrumenting the code in a standard way, with proper logging, secure handling of headers, parameter sanitization… blah blah.

These are integrated into seeds, but they are important to mention on their own - if you develop libraries that give your app, and any new apps that decide to integrate them, a fully standardized way of working on certain topics, you are a star of your team’s DevOps journey.
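As one tiny example of what such a standards library might contain, here is a sketch of a header-sanitization helper of the kind mentioned above. The function name and the list of sensitive headers are assumptions for illustration:

```python
# Sketch of a "standards" library helper: one shared place that masks
# sensitive header values before they ever reach a log line.
SENSITIVE = {"authorization", "cookie", "x-api-key"}

def sanitize_headers(headers):
    """Return a copy of headers that is safe to log."""
    return {k: ("***" if k.lower() in SENSITIVE else v)
            for k, v in headers.items()}
```

When every service logs through the same helper, nobody has to remember the masking rules - which is exactly the point of a standards library.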

7 - DevOps checklist

Over time, as you build your toolset and set up the process, the list of essential things that define progress along the DevOps path will crystallize.

This list should be written down and refined as a checklist to start any new project.

As a next step, I would propose attaching measurements to each point in the checklist and, annually (or more often), running through it as a sort of maturity assessment.

As an example of a quite good and mature tool of this kind (and a completely open-source tool you can use), I will leave here the adidas DevOps maturity framework.

Yes, I did say this is a list of tech tools, and I smuggled in a governance tool. If it is worth anything - it is an amazing tool.

Conclusion - the affordable governance

Smoothness in creating a DevOps culture, and standardization around it, is all about achieving a certain level of self-service - or rather, making it easier to follow the rules and standards than to do anything else.

Tools should break the communication and boring-work boundaries, and standards should be guidance toward a better state. It should be a no-brainer for standards to be followed - governance should not be a burden on the way to achieving the goal, and when driving DevOps adoption and lifecycle, it is quite easy to make it affordable.