Seeing the Signals: My Homelab Gets an Observability Upgrade

This week I rolled up my sleeves and finally got around to something I’ve been meaning to do for a while — and something I’m genuinely passionate about: setting up a proper monitoring stack in my homelab.

I wanted something lightweight, reliable, and easy to manage across both dev and prod environments. After a few nights of tinkering, it’s all up and running.

What I Built

The stack includes:

  • Grafana for visualization and alerting
  • Prometheus as the primary metrics scraper
  • VictoriaMetrics as the backend storage for Prometheus, stored on my NAS
  • Blackbox Exporter to monitor endpoint availability
  • Prometheus node_exporter installed on every server (both VMs and Raspberry Pis)
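
To make the pieces above concrete, here is a minimal sketch of how a Prometheus config could tie them together. It is illustrative only, not the actual file from this setup: the hostnames, the probed URL, the container names (blackbox-exporter, victoriametrics), and the ports are assumptions based on each tool's defaults (9100 for the per-host exporter, 9115 for Blackbox Exporter, 8428 for VictoriaMetrics).

```yaml
# prometheus.yml (illustrative sketch; all hostnames and targets are placeholders)
global:
  scrape_interval: 30s   # assumed interval

scrape_configs:
  # Host metrics from the exporter running on each VM and Raspberry Pi
  - job_name: node
    static_configs:
      - targets:
          - vm-example:9100   # hypothetical host
          - pi-example:9100   # hypothetical host

  # Endpoint availability checks, proxied through Blackbox Exporter
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]      # module from Blackbox Exporter's default config
    static_configs:
      - targets:
          - https://example.com   # hypothetical endpoint to probe
    relabel_configs:
      # Pass the listed target to the exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the Blackbox Exporter container
      - target_label: __address__
        replacement: blackbox-exporter:9115

# Forward scraped samples to VictoriaMetrics for storage
remote_write:
  - url: http://victoriametrics:8428/api/v1/write
```

The relabeling block is the standard pattern for Blackbox Exporter: Prometheus scrapes the exporter, and the exporter probes the URL it receives in the target parameter.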

All services run in Docker containers, and I’m using Docker Compose for orchestration. It’s a simple setup, but more than enough for my current needs.
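
As a rough idea of what such a Compose file might look like, here is a sketch under stated assumptions. The image names are the publicly published ones for each project, but the service names, the NAS mount path, the prometheus.yml file name, and the published ports are hypothetical and not taken from the real setup.

```yaml
# docker-compose.yaml (illustrative sketch, not the actual file from this homelab)
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      # Provisioning files live next to the Compose file on the host
      - ./provisioning:/etc/grafana/provisioning:ro
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro   # hypothetical file name
    restart: unless-stopped

  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    command:
      - "-storageDataPath=/storage"
    ports:
      - "8428:8428"
    volumes:
      - /mnt/nas/victoria-metrics:/storage   # hypothetical NAS mount point
    restart: unless-stopped

  blackbox-exporter:
    image: prom/blackbox-exporter:latest
    ports:
      - "9115:9115"
    restart: unless-stopped
```

Pinning image tags instead of using latest would make dev and prod deployments more reproducible; latest is used here only to keep the sketch short.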

Automating with GitHub Actions

One of the goals for this stack was hands-off deployments. I used GitHub Actions to automate provisioning across dev and prod, with two separate workflows:

.github/workflows/
├── deploy-dev.yaml
└── deploy-prod.yaml

These workflows handle everything from container updates to provisioning configuration. The prod deployment includes a manual approval step, so I can test things in dev first before pushing changes live.

name: Deploy Dev Environment

on:
  push:
    branches:
      - main

env:
  SSH_USER: ${{ vars.SSH_USER }}
  PI_DEV002: ${{ vars.PI_DEV002 }}

jobs:
  deploy-dev:
    runs-on: [self-hosted, runner-dev]

    steps:
      - uses: actions/checkout@v4

      - name: Setup SSH key
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.SSH_PRIVATE_AUTOMATION_KEY_DEV }}" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
          ssh-keyscan -H ${{ env.PI_DEV002 }} >> ~/.ssh/known_hosts

      - name: Deploy via SSH
        run: |
          ssh -i ~/.ssh/deploy_key ${{ env.SSH_USER }}@${{ env.PI_DEV002 }} << 'EOF'
            mkdir -p ~/homelab-grafana/provisioning/datasources ~/homelab-grafana/provisioning/dashboards
          EOF
          scp -i ~/.ssh/deploy_key ./docker-compose.yaml ${{ env.SSH_USER }}@${{ env.PI_DEV002 }}:~/homelab-grafana/
          scp -i ~/.ssh/deploy_key ./provisioning-dev/datasources/datasources.yaml ${{ env.SSH_USER }}@${{ env.PI_DEV002 }}:~/homelab-grafana/provisioning/datasources/datasources.yaml
          scp -r -i ~/.ssh/deploy_key ./provisioning-dev/dashboards/ ${{ env.SSH_USER }}@${{ env.PI_DEV002 }}:~/homelab-grafana/provisioning
          ssh -i ~/.ssh/deploy_key ${{ env.SSH_USER }}@${{ env.PI_DEV002 }} << 'EOF'
            cd ~/homelab-grafana
            export LOGSTASH_HOST=${{ vars.LOGSTASH_HOST_DEV }}
            sudo -E docker compose up -d
          EOF

      - name: Cleanup SSH key
        if: always()  # remove the key even if an earlier step failed
        run: rm -f ~/.ssh/deploy_key
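
The prod workflow isn't shown here. One common way to get a manual approval step in GitHub Actions is to target an environment that has required reviewers configured in the repository settings, and the sketch below assumes that approach. The environment name (production) and the runner label (runner-prod) are hypothetical, and the deploy step is a placeholder.

```yaml
name: Deploy Prod Environment

on:
  push:
    branches:
      - main

jobs:
  deploy-prod:
    runs-on: [self-hosted, runner-prod]   # hypothetical runner label
    # A job that targets an environment with required reviewers
    # waits until a reviewer approves the run.
    environment: production               # hypothetical environment name

    steps:
      - uses: actions/checkout@v4

      - name: Deploy via SSH
        # Placeholder: same SSH/scp steps as the dev workflow,
        # pointed at the prod host and ./provisioning-prod/
        run: echo "deploy steps go here"
```

With this shape, a push to main can deploy to dev immediately while the prod job sits pending until it is approved.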

Provisioned Grafana Resources

I also spent some time dialing in Grafana provisioning. Rather than clicking around in the UI to add data sources or rebuild dashboards, everything is defined in code. Each environment has its own provisioning directory:

├───provisioning-dev
│   ├───dashboards
│   │       blackbox.json
│   │       dashboard1.json
│   │       dashboards.yaml
│   │
│   └───datasources
│           datasources.yaml
│
└───provisioning-prod
    ├───dashboards
    │       blackbox.json
    │       dashboard1.json
    │       dashboards.yaml
    │
    └───datasources
            datasources.yaml

That means when Grafana starts, it picks up the predefined dashboards and data sources automatically, which makes rebuilding or duplicating an environment straightforward.
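
For reference, the two YAML files in each provisioning directory might look roughly like the sketch below. It follows Grafana's file-based provisioning format, but the data source name, the URL, the provider name, and the container path are assumptions rather than the real values from this setup.

```yaml
# provisioning-dev/datasources/datasources.yaml (illustrative sketch)
apiVersion: 1

datasources:
  - name: Prometheus              # hypothetical display name
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # assumed address of the Prometheus container
    isDefault: true
```

```yaml
# provisioning-dev/dashboards/dashboards.yaml (illustrative sketch)
apiVersion: 1

providers:
  - name: default                 # hypothetical provider name
    type: file
    options:
      # Directory inside the Grafana container that holds the dashboard JSON files
      path: /etc/grafana/provisioning/dashboards
```

Because VictoriaMetrics exposes a Prometheus-compatible query API, the same data source type could point at it instead of Prometheus if longer-range queries are the goal.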

Wrapping Up

This monitoring stack gives me near-real-time visibility into my entire homelab: CPU and memory stats from each node, Blackbox probes for uptime checks, and a clear path to alerting if something goes sideways. And thanks to automation, it’s all manageable with a couple of YAML files and a push to GitHub.

Next up, I'll look into expanding my provisioned resources and sending these services' container logs to Graylog through Docker's logging driver. I'm also in the process of adding certificates to my sites, using one of the domains I own together with Nginx Proxy Manager.

Until then, the graphs are live — and they’re looking great. Cheers! 👋