Router Stats To Prometheus

I’ve previously written about my plan to collect much more data about my house. In the current work-from-home environment the quality of our internet connection is paramount, and I wanted to be able to monitor it and potentially be alerted to any degradation before it becomes an issue.

Although I’ve replaced my wifi with a UniFi-based system, I still use the router that was supplied by my ISP, a ZyXEL VMG1312-B10D. Like most networking equipment, the ZyXEL supports SNMP, a protocol for reading and writing stats and configuration on equipment and aggregating the results. On paper it sounds great, but unfortunately SNMP is a nightmare to work with, and you need a mapping file for each device, which doesn’t exist for this model. After looking into creating this mapping and integrating it with my preferred technology stack of Grafana and Prometheus, I decided to change tack and extract the data myself.

Fortunately, the router UI contains some plain text data which looks easy to scrape. So, filled with confidence that this would be an easier approach than learning SNMP, I spun up a GitHub project and got to work cranking out some code.

    VDSL Training Status:   Showtime
                    Mode:   VDSL2 Annex B
            VDSL Profile:   Profile 17a
                G.Vector:   Disable
            Traffic Type:   PTM Mode
             Link Uptime:   1 day: 4 hours: 28 minutes
       VDSL Port Details       Upstream         Downstream
               Line Rate:      7.881 Mbps       39.998 Mbps
    Actual Net Data Rate:      7.853 Mbps       39.999 Mbps
          Trellis Coding:         ON                ON
              SNR Margin:        5.7 dB            7.3 dB
            Actual Delay:          0 ms              0 ms
          Transmit Power:      - 2.6 dBm          11.4 dBm
           Receive Power:      -20.8 dBm         -11.0 dBm
              Actual INP:        0.0 symbols      55.0 symbols
       Total Attenuation:       18.2 dB           22.4 dB
Attainable Net Data Rate:      7.853 Mbps       47.093 Mbps
      VDSL Band Status    U0      U1      U2      U3      D1      D2      D3
  Line Attenuation(dB):  7.2    40.1     N/A     N/A    18.1    50.3    77.6
Signal Attenuation(dB):  7.2    39.9     N/A     N/A    20.0    50.1     N/A
        SNR Margin(dB):  5.5     5.7     N/A     N/A     7.3     7.3     N/A
   Transmit Power(dBm):-13.7   - 3.0     N/A     N/A     8.9     7.8     N/A
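As a rough illustration of why this text looked easy to scrape, here is a minimal sketch that pulls the upstream and downstream values out of the "VDSL Port Details" rows with a regular expression. The function name and the shape of the returned dictionary are my own choices for this example, not necessarily what the project does, and the pattern has only been checked against the sample output above.

```python
import re

# Matches rows such as "Line Rate:      7.881 Mbps       39.998 Mbps".
# The optional space after the minus sign copes with values like "- 2.6 dBm".
ROW_RE = re.compile(
    r"^\s*(?P<label>[^:]+):\s+"
    r"(?P<up>-?\s?\d+(?:\.\d+)?)\s+(?P<unit>\S+)\s+"
    r"(?P<down>-?\s?\d+(?:\.\d+)?)\s+(?P=unit)\s*$"
)

# A few lines copied from the router output shown above.
SAMPLE = """\
    VDSL Training Status:   Showtime
               Line Rate:      7.881 Mbps       39.998 Mbps
          Trellis Coding:         ON                ON
              SNR Margin:        5.7 dB            7.3 dB
          Transmit Power:      - 2.6 dBm          11.4 dBm
"""


def parse_port_details(text):
    """Return {label: (upstream, downstream, unit)} for numeric two-column rows.

    Rows that are not "number unit number unit" (status lines, ON/ON,
    the per-band table) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue
        up = float(match.group("up").replace(" ", ""))
        down = float(match.group("down").replace(" ", ""))
        stats[match.group("label").strip()] = (up, down, match.group("unit"))
    return stats
```

Calling parse_port_details(SAMPLE) gives entries for Line Rate, SNR Margin and Transmit Power, and skips the status and Trellis Coding rows.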

The first step in implementing this was to investigate how the built-in UI requests this data. An early discovery was that when the UI is accessed over HTTP, responses are encrypted with AES and then decrypted in JavaScript. When it is accessed over HTTPS (which uses a self-signed ZyXEL certificate), responses are in plain text.

Unfortunately, but perhaps not surprisingly, at this point I hit what appears to be a bug in the router. The stats worked well for a few hours, but then the router stopped responding. Even more strangely, once this happened it was impossible to log in manually to the web UI; the router responded with a "username or password is not valid" error. Seemingly the only solution was to reboot the router. Extending the Prometheus scrape interval extended the time that the stats worked for, but eventually the same error recurred.

Given that there was no way to see what was happening inside the router, and it was therefore unlikely that I could work around the bug, a different strategy was needed. Digging around the router UI revealed that you can turn on SSH access. Connecting to that gives you a simple shell that seems to replicate most of the web UI. The commands xdslctl info and ifconfig return information about the speed of the internet connection and statistics about the number of bytes sent and received.

    ZySH> xdslctl info
    xdslctl: ADSL driver and PHY status
    Status: Showtime
    Last Retrain Reason:  1
    Last initialization procedure status:  0
    Max: Upstream rate = 7987 Kbps, Downstream rate = 46854 Kbps
    Bearer: 0, Upstream rate = 7987 Kbps, Downstream rate = 40000 Kbps
    Bearer: 1, Upstream rate = 0 Kbps, Downstream rate = 0 Kbps
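To give a feel for how little work this output needs, here is a minimal sketch that turns the rate lines into the Prometheus text format. The metric names (zyxel_max_line_rate_bps, zyxel_line_rate_bps) and labels are invented for this example and may not match what the project exports, and I'm assuming 1 Kbps means 1000 bits per second.

```python
import re

MAX_RE = re.compile(
    r"^Max:\s*Upstream rate = (\d+) Kbps, Downstream rate = (\d+) Kbps",
    re.MULTILINE,
)
BEARER_RE = re.compile(
    r"^Bearer:\s*(\d+), Upstream rate = (\d+) Kbps, Downstream rate = (\d+) Kbps",
    re.MULTILINE,
)

# Copied from the xdslctl output shown above.
SAMPLE = """\
xdslctl: ADSL driver and PHY status
Status: Showtime
Max: Upstream rate = 7987 Kbps, Downstream rate = 46854 Kbps
Bearer: 0, Upstream rate = 7987 Kbps, Downstream rate = 40000 Kbps
Bearer: 1, Upstream rate = 0 Kbps, Downstream rate = 0 Kbps
"""


def xdsl_to_metrics(output):
    """Render the rate lines of `xdslctl info` as Prometheus text format."""
    lines = []
    match = MAX_RE.search(output)
    if match:
        up, down = match.groups()
        lines.append("# TYPE zyxel_max_line_rate_bps gauge")
        lines.append(f'zyxel_max_line_rate_bps{{stream="up"}} {int(up) * 1000}')
        lines.append(f'zyxel_max_line_rate_bps{{stream="down"}} {int(down) * 1000}')
    bearers = BEARER_RE.findall(output)
    if bearers:
        # All samples of one metric are kept together, after its TYPE line.
        lines.append("# TYPE zyxel_line_rate_bps gauge")
    for bearer, up, down in bearers:
        lines.append(
            f'zyxel_line_rate_bps{{bearer="{bearer}",stream="up"}} {int(up) * 1000}'
        )
        lines.append(
            f'zyxel_line_rate_bps{{bearer="{bearer}",stream="down"}} {int(down) * 1000}'
        )
    return "\n".join(lines) + "\n"
```

Running xdsl_to_metrics(SAMPLE) produces one pair of samples for the maximum rate and one pair per bearer.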

The excellent Paramiko library makes it easy to connect to an SSH server. ZySH is clearly ZyXEL’s own implementation of a shell and doesn’t quite work how Paramiko expects, which caused a few problems, but it was simple enough to get the data returned by these commands into a string that I could parse. A few regular expressions later and it’s turned into a set of Prometheus metrics. Using the built-in Python HTTP server makes it easy to serve them up for Prometheus to scrape.
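For the serving side, a minimal sketch using only the standard library's http.server might look like the following. The names build_server and collect are made up for this example; collect stands for any callable that returns the metrics as a string, for instance by running the SSH commands and parsing their output. The default port of 9100 is taken from the container port in the Docker Compose snippet further down.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_server(collect, port=9100, host=""):
    """Create an HTTP server that answers GET /metrics with collect()'s output."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            # Collect on every request so each scrape sees fresh data.
            body = collect().encode("utf-8")
            self.send_response(200)
            self.send_header(
                "Content-Type", "text/plain; version=0.0.4; charset=utf-8"
            )
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            # Silence the default per-request logging to stderr.
            pass

    return HTTPServer((host, port), MetricsHandler)
```

To run it, call build_server(collect).serve_forever(). Collecting inside the request handler keeps things simple, at the cost of making every scrape wait for the router to respond.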

The last step is to create a Dockerfile, which is only a few lines long. It installs the library from PyPI and then runs the app in daemon mode when started.

Docker containers are usually configured through environment variables, but the Python argparse library doesn’t make it as easy as it perhaps could to accept an argument either on the command line or from an environment variable. To work around this I needed to write some code to check the environment variable if the equivalent command line option wasn’t set. Once that was in place I could configure the relevant password using the Docker Compose env_file option, which reads environment variables from a file.

    image: andrewjw/zyxelprometheus:0.5.2
    container_name: zyxel
    ports:
      - 9101:9100
    env_file:
      - ./secrets/zyxel
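The environment variable fallback described above can be sketched roughly as follows. The option names (--host, --user, --passwd) and the ZYXEL_ prefix are assumptions made for this example and may differ from the ones the project really uses.

```python
import argparse
import os


def get_arguments(argv=None, environ=None):
    """Parse options, falling back to ZYXEL_* environment variables.

    A value given on the command line always wins over the environment.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Export router stats")
    parser.add_argument("--host", help="router address (or set ZYXEL_HOST)")
    parser.add_argument("--user", help="login name (or set ZYXEL_USER)")
    parser.add_argument("--passwd", help="password (or set ZYXEL_PASSWD)")
    args = parser.parse_args(argv)

    # Fill in anything not given on the command line from the environment.
    for name in ("host", "user", "passwd"):
        if getattr(args, name) is None:
            setattr(args, name, environ.get("ZYXEL_" + name.upper()))

    if args.passwd is None:
        parser.error("a password is required: use --passwd or set ZYXEL_PASSWD")
    return args
```

With this in place, the file named by env_file only needs a line such as ZYXEL_PASSWD=... (again, a hypothetical variable name) and the container can be started without any command line options.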

The SSH interface appears to be much more reliable than the web UI, and the exporter has now been running for several weeks with no issues. I’ve not got any alerts configured yet (there’s no point in emailing if the connection is down), but next time there are complaints about slow internet at least I’ll have the statistics to investigate.

[Image: Internet statistics in Grafana]

At the moment the code is specific to my brand and model of router, so it’s not that widely useful. If your router has an SSH interface, feel free to open a pull request adding support for it. I’d love to support more routers, even non-ZyXEL ones.