Router Stats To Prometheus
I’ve previously written about my plan to collect much more data about my house. In the current work-from-home environment the quality of our internet connection is paramount, and I wanted to be able to monitor it and potentially be alerted to any degradation before it becomes an issue.
Although I’ve replaced my wifi with a UniFi-based system, I still use the router that was supplied by my ISP, which is a ZyXEL VMG1312-B10D. Like most networking equipment the ZyXEL supports SNMP, which is a protocol for reading and writing stats and configuration from equipment, and aggregating them together. On paper it sounds great, but unfortunately SNMP is a nightmare to work with, and you need a mapping file for each device, which doesn’t exist for this model. After looking into creating this mapping, and integrating it with my preferred technology stack of Grafana and Prometheus, I decided to change tack and extract the data myself.
Fortunately the router UI contains some plain text data which looks easy to scrape. So, filled with confidence that this would be an easier approach than learning SNMP, I spun up a GitHub project and got to work cranking out some code.
    ============================================================================
    VDSL Training Status:      Showtime
    Mode:                      VDSL2 Annex B
    VDSL Profile:              Profile 17a
    G.Vector:                  Disable
    Traffic Type:              PTM Mode
    Link Uptime:               1 day: 4 hours: 28 minutes
    ============================================================================
    VDSL Port Details          Upstream         Downstream
    Line Rate:                 7.881 Mbps       39.998 Mbps
    Actual Net Data Rate:      7.853 Mbps       39.999 Mbps
    Trellis Coding:            ON               ON
    SNR Margin:                5.7 dB           7.3 dB
    Actual Delay:              0 ms             0 ms
    Transmit Power:            - 2.6 dBm        11.4 dBm
    Receive Power:             -20.8 dBm        -11.0 dBm
    Actual INP:                0.0 symbols      55.0 symbols
    Total Attenuation:         18.2 dB          22.4 dB
    Attainable Net Data Rate:  7.853 Mbps       47.093 Mbps
    ============================================================================
    VDSL Band Status           U0      U1      U2      U3      D1      D2      D3
    Line Attenuation(dB):      7.2     40.1    N/A     N/A     18.1    50.3    77.6
    Signal Attenuation(dB):    7.2     39.9    N/A     N/A     20.0    50.1    N/A
    SNR Margin(dB):            5.5     5.7     N/A     N/A     7.3     7.3     N/A
    Transmit Power(dBm):       -13.7   - 3.0   N/A     N/A     8.9     7.8     N/A
    ============================================================================
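To give a feel for why this text looked easy to scrape, here is a minimal sketch of parsing the upstream/downstream rows of the port details section. It is not the code from the project; the function names and the result layout are made up for illustration, and the sample lines are copied from the output above. Note the router sometimes prints a space between the minus sign and the digits ("- 2.6"), which the pattern allows for.

```python
import re

# A few rows copied from the router's status page.
SAMPLE = """\
Line Rate: 7.881 Mbps 39.998 Mbps
SNR Margin: 5.7 dB 7.3 dB
Transmit Power: - 2.6 dBm 11.4 dBm
Receive Power: -20.8 dBm -11.0 dBm
"""

# label: <upstream value> <unit> <downstream value> <unit>
# The optional whitespace after the minus sign copes with values like "- 2.6".
ROW = re.compile(
    r"^\s*(?P<label>[^:]+):\s*"
    r"(?P<up>-?\s*\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)\s+"
    r"(?P<down>-?\s*\d+(?:\.\d+)?)\s*[A-Za-z]+\s*$"
)


def to_float(text):
    """Convert a value such as '- 2.6' or '39.998' to a float."""
    return float(text.replace(" ", ""))


def parse_port_details(text):
    """Return {label: {"unit", "upstream", "downstream"}} for numeric rows."""
    stats = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # non-numeric rows such as "Trellis Coding" are skipped
            stats[match["label"].strip()] = {
                "unit": match["unit"],
                "upstream": to_float(match["up"]),
                "downstream": to_float(match["down"]),
            }
    return stats
```

Rows that don't fit the two-value shape, such as the link uptime or the per-band table, simply don't match and would need their own patterns.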
Unfortunately, but perhaps not surprisingly, at this point I hit what appears to be a bug in the router. The stats worked well for a few hours, but then the router stopped responding. Even more strangely, after this occurred it was impossible to log in manually to the web UI - the router responded with a "username or password is not valid" error. Seemingly the only solution was to reboot the router. Extending the Prometheus scrape interval extended the time that the stats worked for, but eventually the same error reoccurred.
Given there’s no way to see what was happening inside the router, and therefore it’s unlikely I could work around the bug, a different strategy was needed. Digging around the router UI revealed that you can turn on SSH access. Connecting to that gives you a simple shell that seems to replicate most of the web UI. The commands xdslctl info and ifconfig return information about the speed of the internet connection, and statistics about the number of bytes sent and received.
    ZySH> xdslctl info
    xdslctl: ADSL driver and PHY status
    Status: Showtime
    Last Retrain Reason: 1
    Last initialization procedure status: 0
    Max: Upstream rate = 7987 Kbps, Downstream rate = 46854 Kbps
    Bearer: 0, Upstream rate = 7987 Kbps, Downstream rate = 40000 Kbps
    Bearer: 1, Upstream rate = 0 Kbps, Downstream rate = 0 Kbps
The excellent Paramiko library makes it easy to connect to an SSH server, and despite a few problems caused by the fact that ZySH is clearly ZyXEL’s own implementation of a shell and doesn’t quite work how Paramiko expects, it was simple enough to get the data returned by these commands into a string that I could parse. A few regular expressions later and it’s turned into a set of Prometheus metrics. Using the built-in Python HTTP server makes it easy to serve them up for Prometheus to scrape.
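As a rough illustration of the parse-and-serve step, the sketch below turns the rate lines from xdslctl info into Prometheus text format and serves them with the standard library http.server module. The metric names, label names and function names are invented for this example rather than taken from the project, and the SSH fetch is replaced by a canned sample string.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Canned output standing in for what would be read over SSH.
SAMPLE = """\
Status: Showtime
Max:    Upstream rate = 7987 Kbps, Downstream rate = 46854 Kbps
Bearer: 0, Upstream rate = 7987 Kbps, Downstream rate = 40000 Kbps
Bearer: 1, Upstream rate = 0 Kbps, Downstream rate = 0 Kbps
"""

RATES = r"Upstream rate = (\d+) Kbps, Downstream rate = (\d+) Kbps"
MAX_RE = re.compile(r"^Max:\s*" + RATES, re.MULTILINE)
BEARER_RE = re.compile(r"^Bearer:\s*(\d+),\s*" + RATES, re.MULTILINE)


def render_metrics(xdsl_output):
    """Turn `xdslctl info` text into Prometheus text exposition lines."""
    lines = []
    match = MAX_RE.search(xdsl_output)
    if match:
        up, down = match.groups()
        lines.append("# TYPE zyxel_max_rate_kbps gauge")
        lines.append('zyxel_max_rate_kbps{direction="upstream"} ' + up)
        lines.append('zyxel_max_rate_kbps{direction="downstream"} ' + down)
    bearers = BEARER_RE.findall(xdsl_output)
    if bearers:
        lines.append("# TYPE zyxel_line_rate_kbps gauge")
    for bearer, up, down in bearers:
        lines.append(
            f'zyxel_line_rate_kbps{{bearer="{bearer}",direction="upstream"}} {up}'
        )
        lines.append(
            f'zyxel_line_rate_kbps{{bearer="{bearer}",direction="downstream"}} {down}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A real exporter would fetch fresh output from the router here.
        body = render_metrics(SAMPLE).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Block forever, answering every GET with the current metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() would expose the metrics on port 9100 for Prometheus to scrape; keeping render_metrics separate from the handler means the parsing can be tested without a router or a running server.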
Docker containers are usually configured using environment variables, but the Python argparse library doesn’t make it as easy as perhaps it could to accept arguments either on the command line or by environment variable. To work around this I needed to write some code to check the environment variable if the equivalent command line option wasn’t set. Once that was in place I could configure the relevant password using the Docker Compose env_file option, which reads environment variables from a file.
    zyxel:
      image: andrewjw/zyxelprometheus:0.5.2
      container_name: zyxel
      ports:
        - 9101:9100
      env_file:
        - ./secrets/zyxel
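The environment variable fallback described above could look something like the following sketch. The option names (--host, --user, --passwd) and the ZYXEL_* variable names are assumptions for illustration and may not match what the project actually uses.

```python
import argparse
import os

# Hypothetical mapping from argparse attribute name to environment variable.
ENV_FALLBACKS = {
    "host": "ZYXEL_HOST",
    "user": "ZYXEL_USER",
    "passwd": "ZYXEL_PASSWD",
}


def parse_args(argv=None, environ=None):
    """Parse command line options, falling back to environment variables."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Router stats exporter")
    parser.add_argument("--host", help="router address (or ZYXEL_HOST)")
    parser.add_argument("--user", help="login name (or ZYXEL_USER)")
    parser.add_argument("--passwd", help="login password (or ZYXEL_PASSWD)")
    args = parser.parse_args(argv)

    # Only consult the environment when the option was not given explicitly,
    # so the command line always wins.
    for name, env_name in ENV_FALLBACKS.items():
        if getattr(args, name) is None:
            setattr(args, name, environ.get(env_name))

    if args.passwd is None:
        parser.error("a password is required: use --passwd or set ZYXEL_PASSWD")
    return args
```

With something like this in place, a file referenced by env_file only needs a line such as ZYXEL_PASSWD=... for the container to pick the password up, while the same program can still be run by hand with explicit options.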
Using the SSH interface appears to be much more reliable than the web UI, and it has been running for several weeks with no issues at this point. I’ve currently not got any alerts configured (no point in emailing if the connection is down), but next time there are complaints about slow internet at least I’ll have the statistics to investigate.
At the moment the code is specific to my brand and model of router, so it’s not that widely useful. If your router has an SSH interface, feel free to open a pull request adding support for it. I’d love to support more routers, even non-ZyXEL ones.