Writing A DevOps Vision

DevOps as a concept has been around since about 2010, but implementing the ideas behind it is challenging, particularly when you’re in a team that is supporting old monolithic codebases. For several years we had engineers fulfilling the role of a “DevOps Engineer”. However, we always knew that having a specific person working on DevOps is a bit antithetical to the DevOps concept - it’s supposed to be a state of mind and a set of practices rather than a job role.

The aim was always to have that engineer act as a source of expert knowledge and an enabler. Teams were still supposed to own their code, processes and deployments, but in reality, DevOps-related work was often thrown over the wall to that engineer with the expectation that it was their problem, and not the team’s problem.

We ended up in a situation where we had to make a choice - hire a new engineer into the same role, or attempt to spread the work across all engineers. We chose the second option, but that raises the question of how to change team culture across a department, so that DevOps becomes a standard part of each team’s process, much like Kanban, Scrum or any of the other ways a team organises itself.

Read More...

The Power Of Team Dashboards

Using metrics and dashboards is a well-understood technique for monitoring the health and performance of software, or your profitability or other key business metrics. What is less common is using the same tools and techniques to monitor the health and performance of the team behind the software. I’m not suggesting using dashboards to report on individual developers, but as a tool to help the team focus on improving their own processes, they can be very useful, provided they’re handled carefully.

My own journey started when I was promoted from a team leader to an engineering manager, responsible for five teams. The change in level resulted in a significantly different view, but also great difficulty in knowing where to focus my efforts. When you’re a team leader you are so close to the team that you hear and feel every change in mood, and have intimate knowledge of all projects and their current state. Suddenly being responsible for five teams gives you a great view of where teams could collaborate, and removes you from the noise of day-to-day life so you can focus on the biggest issues. However, it also removes you from the firehose of raw information, so it can be hard to know where you should spend your time to get the best return on your energy.

Read More...

Replacing Travis CI With BuildBot

Back when I reactivated this blog I posted about using Travis CI to automate the build process. Sadly at the end of last year Travis announced they were ending free builds for all public repositories, and only authorised open source projects will now get free build credits.

The repository for this blog is publicly accessible, partly in case anyone wants to see my draft posts, or raise a merge request to fix a typo, but mostly because why not? That previously allowed me to not worry about the cost of building the site, but it’s not unreasonable for a private company who need to make a profit to want to focus their generosity on actual open-source projects. I certainly don’t blame them for the policy change, although I hope the approval process for open-source projects is easy and widely applied, so it’s not just a few big projects that can take advantage of it.

Read More...

Scheduled SMART Checks

For years hard disks (both spinning rust and SSDs) have had a built-in monitoring system, called SMART, that tracks various metrics about the health of your disk. In the old days, if you were lucky, you might get some warning that your disk was about to fail because it would start to make a nasty noise. In the modern era of SSDs you likely won’t get any warning, and suddenly boom, your laptop won’t boot or mount the disk.

Obviously nothing is perfect, and any monitoring can miss a failure, but the potential of some warning is better than definitely not getting any. Also, this is no substitute for a proper backup and recovery strategy, but in most home situations people don’t have spare laptops or hard drives just sitting around.

It would be relatively easy for operating system vendors to automatically detect SMART-capable drives and run a check every so often. If it fails, they could pop up a warning about a potential imminent failure. As far as I know though, no-one does this.
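
To give a feel for how little is involved, here is a minimal sketch of such a check in Python. It assumes the smartctl tool from smartmontools is installed (version 7.0 or later for JSON output) and that the script runs with enough privileges to query the disk. The function names parse_health and check_disk are made up for this illustration, and it is not necessarily the approach the full post takes.

```python
import json
import subprocess
from typing import Optional


def parse_health(smartctl_json: str) -> Optional[bool]:
    """Return the overall SMART verdict from smartctl's JSON output.

    True means the drive reports itself healthy, False means it reports a
    failure, and None means no verdict could be found in the output.
    """
    try:
        report = json.loads(smartctl_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(report, dict):
        return None
    status = report.get("smart_status")
    if not isinstance(status, dict):
        return None
    passed = status.get("passed")
    return passed if isinstance(passed, bool) else None


def check_disk(device: str) -> Optional[bool]:
    """Ask smartctl for the health of one device, e.g. "/dev/sda"."""
    try:
        # smartctl uses its exit status as a bitmask of conditions, so a
        # non-zero exit is not treated as an error here (check=False).
        result = subprocess.run(
            ["smartctl", "--health", "--json", device],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # smartctl is not installed, so there is no verdict to report.
        return None
    return parse_health(result.stdout)
```

Calling check_disk("/dev/sda") (the device path is only an example) from a cron job or systemd timer, and sending a notification whenever it returns False, would give roughly the scheduled check described above. A None result is worth surfacing too, since it means the check itself did not work.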

Read More...

Working In A Crisis Is The Easy Part

Recently one of my team asked me how they were doing, and to be fair I think they’ve been doing great. The challenge though is that they started as team leader right about the time that COVID-19 hit. Working for an online supermarket, around March 2020 was definitely challenging, but in some ways it actually made things easier. When there is a fire to put out, it’s easy to know what to do next - you put the fire out! And after that? You put the next fire out!

Certainly, there are challenges to being a team leader during a crisis. Perhaps people in your team are stressed or losing their cool. You might have stakeholders shouting at you to fix things quicker. Maybe you have to take shortcuts that you know lead to technical debt, but that will get things fixed quicker.

Calming people down and protecting your team are important skills for a team leader. Knowing when to make pragmatic technical decisions is something all developers should know. And recognising when your stakeholders are crying wolf and you can push back on fixing their fire is important. For us, back in March, it was pretty clear that this was a real crisis and we needed to do all we could to help.

Read More...