The Power Of Team Dashboards

Using metrics and dashboards is a well-understood tool when monitoring the health and performance of software, or your profitability or other key business metrics. What is less common is using the same tools and techniques to monitor the health and performance of the team behind the software. I’m not suggesting using dashboards to report on individual developers, but as a tool to help the team focus on improving their own processes, it can be very useful, provide it’s handled carefully.

My own journey started when I was promoted from a team leader to an engineering manager, responsible for five teams. The change in level resulted in a significantly different view, but also great difficulty in knowing where to focus my efforts. When you’re a team leader you are so close to the team that you hear and feel every change in mood, and have intimate knowledge of all projects and their current state. Suddenly being responsible for five teams gives you a great view to take advantage of areas of collaboration between teams, and removes you from the noise of day to day life so you can focus on the biggest issues. However, it also removes you from the firehose of raw information so it can be hard to know where you should spend your time to get the best return on your energy.


Replacing Travis CI With BuildBot

Back when I reactivated this blog I posted about using Travis CI to automate the build process. Sadly at the end of last year Travis announced they were ending free builds for all public repositories, and only authorised open source projects will now get free build credits.

The repository for this blog is publically accessible, partly in case anyone wants to see my draft posts, or raise a merge request to fix a typo, but mostly because why not? That previously allowed me to not worry about the cost of building the site, but it’s not unreasonable for a private company who need to make a profit to want to focus their generosity on actual open-source projects. I certainly don’t blame them for the policy change, although I hope the approval process for open source projects is easy and widely applied, so it’s not just a few big projects that can take advantage of it.


Scheduled SMART Checks

For years hard disks (both spinning rust and SSDs) have had a built in monitoring system that tracks various metrics about the health of your disk, called SMART. In the old days if you were lucky you might get some warning that your disk was about to fail because it would start to make a nasty noise. In the modern era of SSDs you likely won’t get any warning, and suddenly boom, your laptop won’t boot or mount the disk.

Obviously nothing is perfect, and any monitoring can miss a failure, but the potential of some warning is better than definitely not getting any. Also this is no subtitute for a proper backup and recovery strategy, but in most home situations people don’t have spare laptops or hard drives just sitting around.

It would be relatively easy for operating system vendors to automatically detect SMART capabable drives and automatically run a check every so often. If it fails, they could pop up a warning about a potential imminent failure. As far as I know though, no-one does this.


Working In A Crisis Is The Easy Part

Recently one of my team asked me how they were doing, and to be fair I think they’ve been doing great. The challenge though is that they started as team leader right about the time that COVID-19 hit. Working for an online supermarket, around March 2020 was definitely challenging, but in some ways it actually made things easier. When there is a fire to put out, it’s easy to know what to do next - you put the fire out! And after that? You put the next fire out!

Certainly, there are challenges for being a team leader during a crisis. Perhaps people in your team are stressed or losing their cool. You might have stakeholders shouting at you to fix things quicker. Maybe you have to take shortcuts that you know lead to technical debt, but that will get things fixed quicker.

Calming people down, and protecting your team are important skills for a team leader. Knowing when to make pragmatic technical decisions is something all developers should know. And recognising when your stakeholders are crying wolf and you can push back on fixing their fire is important. For us, back in March, it was pretty clear that this was a real crisis and we needed to do all we can to help.


Meter Readings Over MQTT

In a previous post I talked about swapping the in-home device (IHD) supplied by my electricity and gas company for one produced by Glow. This connects over wifi and gives access to the raw data coming from your smart meters over MQTT. To collect this data and integrate it into my house monitor I needed to listen to the MQTT topic and expose the data to Prometheus.

This was quite similar to my previous project to expose statistics from my router, but as this was processing JSON data being exposed over MQTT it was a lot simpler than parsing raw text over SSH. As before I created a GitHub repo with my usual Python linting, testing and build scripts set up.

Connecting to MQTT is straightforward, using the Paho MQTT library. When they’ve set up your MQTT account Glow’s support team will supply you with a topic name that you can use, along with the username and password you set, to connect. The code below shows how to do this. We’ll cover the on_message callback next.