10 days and 10 tips for Microsoft Tunnel Gateway: Day 5

When Tunnel Updates Fail

Updates to the Microsoft Tunnel containers can apparently fail mid-process. I’ve seen this on three different deployments where the automatic update pulled container images with SHA256 hashes that weren’t documented anywhere in Microsoft’s official release notes.

This left the agent container endlessly restarting. Here’s how I identified and fixed it without reinstalling.

Note: I’ve only encountered this on Ubuntu Server with Docker containers. I don’t currently work with Red Hat/Podman, so I can’t speak to that environment.

The problem

Symptoms:
  • The mst-health.sh script reports containers as “starting” or “unhealthy” and shows errors in recent logs
  • Containers restart continuously (crash-loop)
  • Container SHA256 hashes don’t match official Microsoft documentation
  • Recent container update failed

Root Cause: The update pulled container images whose SHA256 hashes don’t appear anywhere in Microsoft’s official documentation or release notes. These undocumented images caused the agent container to crash-loop, so the update failed and the metadata files were left pointing at the problematic versions.

Detection

The easiest way to detect this issue is using the mst-health.sh script:

sudo ./mst-health.sh

Look for these indicators:

  1. Container Status Issues:

    Containers:
      mstunnel-server - Up 12 minutes (healthy)
      mstunnel-agent - Up 2 seconds (health: starting)
    [WARN] 2 container(s) running but not all healthy
           Some containers are still starting or unhealthy

  2. Container Image Hashes:

    Container image hashes:
      Agent:  sha256:abbdc...
      Server: sha256:ad57d...

  3. Recent Log Errors:

    [FAIL] 15 error(s) found (showing last 5)

    - Apr 23 10:15:30 hostname mstunnel-agent[1234]: Error: Failed to initialize
    - Apr 23 10:15:32 hostname mstunnel-agent[1234]: Connection refused
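
If the health script isn’t on the box, the first indicator can be approximated with Docker alone. This is a minimal sketch, not part of mst-health.sh: unhealthy_containers is a hypothetical helper name, and it assumes input lines shaped like “name - status”, which is how the output above is laid out.

```shell
# Sketch: print the names of containers whose status is not "(healthy)".
# unhealthy_containers is a hypothetical helper, not part of mst-health.sh.
# It reads "name - status" lines on stdin and ignores everything else.

unhealthy_containers() {
  local line name
  while IFS= read -r line; do
    # Trim leading whitespace so indented health-script output also works.
    line="${line#"${line%%[![:space:]]*}"}"
    case "$line" in
      *" - "*) ;;        # looks like "name - status"
      *) continue ;;     # skip headings, warnings and blank lines
    esac
    name="${line%% - *}"
    case "$line" in
      *"(healthy)"*) ;;                # healthy: nothing to report
      *) printf '%s\n' "$name" ;;      # starting, unhealthy, restarting, exited
    esac
  done
}

# Usage on a live host (needs Docker and the mstunnel containers):
#   sudo docker ps -a --filter name=mstunnel \
#     --format '{{.Names}} - {{.Status}}' | unhealthy_containers
```

An empty result means every listed container reported healthy. Any name that comes back is worth a closer look in the logs.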

What Actually Went Wrong

When I was troubleshooting this, the health script showed the containers weren’t healthy and there were errors in the logs, but it didn’t tell me why.

That troubleshooting led me to notice that the tunnel installation had fetched SHA256 hashes I couldn’t find in the official tunnel upgrade documentation. I manually checked /etc/mstunnel/version-info.json and /etc/mstunnel/images_configured to confirm:

sudo cat /etc/mstunnel/version-info.json
sudo cat /etc/mstunnel/images_configured

When I compared those full hashes against Microsoft’s official upgrade documentation, nothing matched.

The containers were running images with sha256:abbdc... and sha256:ad57d... – values that don’t appear anywhere in the official update history.

Examples of official hashes:

  • February 2026 (20251219.1-01):
    • Agent: sha256:2859a8e1466f...
    • Server: sha256:34aee0978f7c...
  • March 2026 (20260330.1):
    • Agent: sha256:163214b6a22e...
    • Server: sha256:dd62ce7e23f6...
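
Comparing hashes by eye is error-prone, so a small prefix check can help. The sketch below is an illustration only: is_documented_digest and DOCUMENTED_PREFIXES are hypothetical names, and the list holds just the truncated values quoted above, so it can only confirm that a hash starts the same way. Extend the list from Microsoft’s upgrade documentation before relying on it.

```shell
# Sketch: check whether a hash starts with one of the documented prefixes.
# Hypothetical helper; the prefixes are the truncated values quoted in this post,
# so a match here is a prefix match, not a full-hash comparison.

DOCUMENTED_PREFIXES='
sha256:2859a8e1466f
sha256:34aee0978f7c
sha256:163214b6a22e
sha256:dd62ce7e23f6
'

is_documented_digest() {
  # $1 = the full hash to check, as shown by the health script or version-info.json
  local digest="$1" prefix
  for prefix in $DOCUMENTED_PREFIXES; do
    case "$digest" in
      "$prefix"*) return 0 ;;   # starts with a documented prefix
    esac
  done
  return 1                      # no documented prefix matched
}

# Usage: paste the full hash in place of the placeholder.
#   is_documented_digest "<full hash>" && echo "documented" || echo "NOT documented"
```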

Note: After dealing with this issue, I added version and hash display to the health script (v1.3) so it’s easier to spot this problem in the future.

The Fix: Repair Installation

This procedure recovers your tunnel without a full uninstall, preserving configuration, certificates and Intune registration.

Step 1: Stop Services

sudo mst-cli agent stop
sudo mst-cli server stop

Step 2: Remove Containers

# Remove the containers only; the images stay on disk and setup pulls fresh ones in Step 4
sudo docker rm -f mstunnel-agent mstunnel-server

Step 3: Clear Problematic Metadata

# Remove version metadata files
sudo rm -f /etc/mstunnel/version-info.json
sudo rm -f /etc/mstunnel/images_configured
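
If you would rather keep a copy of the metadata before running the rm commands above, something like the following could be run first. It is a sketch under assumptions: backup_tunnel_metadata is a hypothetical helper, and the /root default for the backup location is an arbitrary choice. Copying /etc/mstunnel needs root, so run it from a root shell.

```shell
# Sketch: copy the two metadata files into a timestamped folder before deleting them.
# backup_tunnel_metadata is a hypothetical helper, not part of mst-cli or mstunnel-setup.

backup_tunnel_metadata() {
  # $1 = directory holding the metadata (default: /etc/mstunnel)
  # $2 = directory to create the backup folder in (default: /root, an arbitrary choice)
  local src="${1:-/etc/mstunnel}" dest_root="${2:-/root}" dest f
  dest="$dest_root/mstunnel-metadata-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" || return 1
  for f in version-info.json images_configured; do
    if [ -e "$src/$f" ]; then
      cp -a "$src/$f" "$dest/" || return 1   # keep ownership and timestamps
    fi
  done
  printf '%s\n' "$dest"                      # print where the copies went
}

# Usage (from a root shell):
#   backup_tunnel_metadata
```

The copies are also handy afterwards for comparing the old hashes against whatever the fresh setup writes.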

Step 4: Re-run Setup

# Re-run the installation script
sudo ./mstunnel-setup

What happens:

  • Detects existing registration – no re-authentication needed
  • Detects existing certificates – no certificate prompts
  • Pulls fresh container images from Microsoft registry
  • Creates new metadata pointing to current stable release
  • Preserves all configuration (routes, DNS, ports, etc.)

Verification

Run the health script again to verify everything is fixed:

sudo ./mst-health.sh

What to look for:

  • Containers showing “healthy” (not “starting” or “unhealthy”)
  • SHA256 hashes now match official Microsoft documentation
  • No errors in recent logs
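
Containers can sit in “starting” for a while after setup finishes, so it can be easier to poll than to re-run the check by hand. A minimal sketch follows, assuming the containers define a Docker health check (the status output earlier suggests they do). wait_for_healthy, docker_health and HEALTH_PROBE are hypothetical names, and the attempt and delay defaults are arbitrary.

```shell
# Sketch: poll a container's Docker health status until it reports "healthy".
# wait_for_healthy, docker_health and HEALTH_PROBE are hypothetical names.

# Default probe: ask Docker for the health status of the named container.
docker_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null
}

wait_for_healthy() {
  # $1 = container name
  # $2 = maximum attempts (default 30), $3 = seconds between attempts (default 10)
  local name="$1" attempts="${2:-30}" delay="${3:-10}"
  local probe="${HEALTH_PROBE:-docker_health}" i=1 status
  while [ "$i" -le "$attempts" ]; do
    status=$("$probe" "$name")
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (as root or a user allowed to talk to Docker):
#   wait_for_healthy mstunnel-agent && wait_for_healthy mstunnel-server \
#     && echo "both healthy" || echo "still not healthy, check the logs"
```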

Why This Works

As far as I can tell, the version-info.json file records which container images the tunnel should run, identified by their SHA256 hash. When it references undocumented hashes, the same problematic images keep being used, causing the crash-loop.

Clearing the metadata forces a fresh start – the installer pulls the current stable release by version tag instead of the bad SHA256 references. That’s why re-running setup fixes it without losing any configuration.
