WireGuard connect fails on Ethernet NICs whose driver resets the PHY during NM device_reapply (e.g. igb / Intel I211)
Summary
When connecting via the WireGuard protocol, the kill-switch code path calls NetworkManager's device_reapply_async() on the physical interface to inject a host route for the VPN server. On NICs whose driver performs a PHY reset in response to device_reapply (Intel I211 with the igb driver, in this report), the interface loses carrier (NO-CARRIER, state DOWN) for several seconds. The kernel marks every route via that interface — including the freshly-added VPN-server host route — as linkdown and unusable. The subsequent TCP reachability check in networkmanager.py:start() cannot find a usable route to the server IP and fails immediately with "VPN server NOT reachable". Connection is aborted with "Error: Connection failed. Try connecting to a different server or check your network settings." Every WireGuard connect attempt fails identically.
OpenVPN-TCP works fine on the same system because its kill-switch path (NMKillSwitch) injects exclusion routes into a dummy kill-switch connection profile instead of calling device_reapply on the physical interface.
Environment
- OS: Ubuntu 24.04 LTS
- Kernel: 6.8.0-generic (x86_64)
- NetworkManager: 1.46.0
- libnetplan1: 1.1.2
- NIC: Intel I211 Gigabit (PCI ID 8086:1539), driver igb
- Physical interface: managed by NM, DHCP (referred to below as <eth>)
ProtonVPN packages (all installed from repo.protonvpn.com/debian stable, latest versions available):
| Package | Version |
| --- | --- |
| proton-vpn-cli | 1.0.1 |
| proton-vpn-daemon | 0.13.7 |
| python3-proton-vpn-api-core | 5.2.4 |
| python3-proton-vpn-local-agent | 1.6.3 |
| python3-proton-core | 0.7.4 |
| python3-proton-keyring-linux | 0.2.1 |
Reproduction
- On a host with an Intel I211 NIC managed by the igb driver (or any NIC whose driver triggers a PHY reset on device_reapply).
- Default protocol = wireguard, default kill switch setting (off).
- protonvpn signin, then protonvpn connect (or protonvpn connect --country US).
Result: Error: Connection failed. Try connecting to a different server or check your network settings. 100% reproducible.
Diagnostic data
Below, <eth> is the physical interface name, <gw> is the LAN gateway, and <server_ip> is the chosen VPN-server IPv4.
CLI verbose log (relevant lines)
T+0.000s | proton.vpn.core.vpnconnector | INFO | CONN.CONNECT:START | Protocol: wireguard
T+0.275s | proton.vpn.core.vpnconnector | INFO | CONN:STATE_CHANGED | Connecting
T+3.480s | proton.vpn.backend.networkmanager.core.networkmanager:80 | INFO | VPN server NOT reachable.
T+3.481s | proton.vpn.connection.states:401 | WARNING | Reached connection error state: Timeout (None)
T+3.482s | proton.vpn.core.vpnconnector | INFO | CONN:STATE_CHANGED | Disconnected
Physical interface link state and routing table, sampled every 200 ms during connect
T+0.000s | LINK: <eth> UP, <BROADCAST,MULTICAST,UP,LOWER_UP>
ROUTES: default via <gw> dev <eth> ... metric 100
T+5.530s | LINK: <eth> UP, <BROADCAST,MULTICAST,UP,LOWER_UP> ← kill switch added, interface still up
ROUTES: default via 100.85.0.1 dev pvpnksintrf0 metric 98
default via <gw> dev <eth> metric 100
100.85.0.0/24 dev pvpnksintrf0 ...
T+5.744s | LINK: <eth> DOWN, <NO-CARRIER,BROADCAST,MULTICAST,UP> ← device_reapply fired; PHY reset; carrier lost
ROUTES: default via 100.85.0.1 dev pvpnksintrf0 metric 98
default via <gw> dev <eth> metric 100 linkdown
<server_ip> via <gw> dev <eth> metric 100 linkdown ← host route added but already linkdown
<LAN>/24 dev <eth> ... linkdown
T+8.711s | LINK: <eth> DOWN, <NO-CARRIER,...> ← TCP check failed, ProtonVPN tore down pvpnksintrf0
ROUTES: default via <gw> dev <eth> metric 100 linkdown
<server_ip> via <gw> dev <eth> metric 100 linkdown
<LAN>/24 dev <eth> ... linkdown
T+11.686s | LINK: <eth> DOWN, <NO-CARRIER,...> ← still down, ~6 seconds after device_reapply
ROUTES: (none)
Approximate carrier-recovery time on this driver: 10–15 seconds.
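A timeline like the one above can be collected with a small sampler. The following is a minimal sketch, not the script used for this report: the helper names (parse_link_flags, has_carrier, sample) are hypothetical, and it assumes the iproute2 commands `ip -o link show dev <iface>` and `ip route show` are available on the host.

```python
import re
import subprocess
import time


def parse_link_flags(link_line: str) -> set[str]:
    """Return the flags between '<' and '>' in one `ip -o link show` line."""
    match = re.search(r"<([^>]*)>", link_line)
    return set(match.group(1).split(",")) if match else set()


def has_carrier(link_line: str) -> bool:
    """True when the link reports LOWER_UP and does not report NO-CARRIER."""
    flags = parse_link_flags(link_line)
    return "LOWER_UP" in flags and "NO-CARRIER" not in flags


def _run(argv: list[str]) -> str:
    """Run a command and return its stripped stdout (empty string on failure)."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout.strip()


def sample(iface: str, interval: float = 0.2, duration: float = 15.0) -> None:
    """Print carrier state and the main routing table every `interval` seconds."""
    start = time.monotonic()
    while (elapsed := time.monotonic() - start) < duration:
        link = _run(["ip", "-o", "link", "show", "dev", iface])
        routes = _run(["ip", "route", "show"])
        print(f"T+{elapsed:.3f}s | carrier={has_carrier(link)}")
        print(routes)
        time.sleep(interval)
```

Starting `sample("<eth>")` in one terminal just before running protonvpn connect in another should show the carrier flag dropping at the moment the host route appears.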
TCP reachability sanity check (no VPN, server known reachable)
Port 443: reachable
Port 7770: reachable
Port 8443: reachable
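For anyone who wants to repeat this sanity check without the VPN client, here is a minimal sketch of a plain TCP probe. It is not the code in tcpcheck.py; the function name probe_tcp and the 5-second default timeout are assumptions chosen for illustration.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> tuple[bool, str]:
    """Attempt one TCP connect; return (reachable, errno name or "OK")."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns an errno value instead of raising on failure.
        code = sock.connect_ex((host, port))
    if code == 0:
        return True, "OK"
    return False, errno.errorcode.get(code, str(code))
```

Calling `probe_tcp("<server_ip>", 443)` with the VPN disconnected should report the port as reachable. During a failing WireGuard connect, the returned errno name shows whether the failure is an immediate routing error or a timeout.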
Root cause
The call sequence in python3-proton-vpn-api-core 5.2.4:
1. proton/vpn/connection/states.py::Connecting.run_tasks() (line 250)
   → self.context.kill_switch.enable(server, permanent=...)
2. proton/vpn/backend/networkmanager/killswitch/wireguard/wgkillswitch.py::WGKillSwitch.enable() (line 59)
   → self._ks_handler.add_kill_switch_connection(permanent) (adds pvpnksintrf0 with a metric 98 default route; this part is fine)
   → self._ks_handler.add_vpn_server_route(server_ip=...)
3. proton/vpn/backend/networkmanager/killswitch/wireguard/killswitch_connection_handler.py::add_vpn_server_route() (line 144)
   → for each physical device: self.nm_client.add_route_to_device(device, ...)
   → await self._wait_for_vpn_server_route(server_ip, device.get_iface(), found=True)
4. proton/vpn/backend/networkmanager/killswitch/wireguard/nmclient.py::add_route_to_device() (line 354)
   → modifies the in-memory connection profile (adds host route to NM's IPv4 settings)
   → cls._apply_connection_async(active_connection, ...)
   → device.reapply_async(connection, version_id=0, flags=0, ...) (line 340)
5. On the igb driver, reapply_async triggers a PHY reset. The interface goes NO-CARRIER, state DOWN. The kernel marks all routes via the physical interface, including the just-added host route, as linkdown.
6. killswitch_connection_handler.py::_wait_for_vpn_server_route() (line 207)
   → polls ip route with delays [0.5, 0.5, 1, 1, 2]s, returns as soon as the route string matches.
   → The regex f"{server_ip} via .* dev {interface_name} .*" does not check the linkdown flag, so the function returns "success" within 500 ms while the interface is still NO-CARRIER.
7. proton/vpn/connection/states.py::Connecting.run_tasks() then calls self.context.connection.start().
8. proton/vpn/backend/networkmanager/core/networkmanager.py::start() (line 65)
   → await tcpcheck.is_any_port_reachable(self._vpnserver.server_ip, self._vpnserver.openvpn_ports.tcp)
9. proton/vpn/backend/networkmanager/core/tcpcheck.py::is_port_reachable() opens a plain TCP socket with no SO_BINDTODEVICE. With the host route linkdown and the only viable default route being the dummy pvpnksintrf0, connect_ex() returns EHOSTUNREACH (or ENETUNREACH) immediately, without waiting for the 5-second timeout. All three concurrent ports fail.
10. is_any_port_reachable() returns False. networkmanager.py logs "VPN server NOT reachable.", fires events.Timeout, the state machine goes to Error, and the CLI prints Error: Connection failed.
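The false positive in _wait_for_vpn_server_route() can be reproduced in isolation. The sketch below applies the pattern quoted above to a sample `ip route` line. The addresses are documentation placeholders, the interface name eth0 stands in for <eth>, and the helper name route_matches is hypothetical.

```python
import re

# One line as `ip route` prints it while the interface has no carrier.
SAMPLE_ROUTES = (
    "default via 100.85.0.1 dev pvpnksintrf0 metric 98 \n"
    "203.0.113.10 via 192.168.1.1 dev eth0 proto static metric 100 linkdown \n"
)


def route_matches(ip_route_output: str, server_ip: str, interface_name: str):
    """Apply the pattern quoted in this report; return the match or None."""
    pattern = f"{server_ip} via .* dev {interface_name} .*"
    return re.search(pattern, ip_route_output)


match = route_matches(SAMPLE_ROUTES, "203.0.113.10", "eth0")
# The route is reported as present even though the kernel will not use it.
print(bool(match), "linkdown" in match.group(0))
```

Both values are truthy: the pattern matches, and the matched line carries the linkdown flag that the check ignores.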
Why it works on most systems but fails here
Most users don't hit this because:
- WiFi adapters do not reset the radio on device_reapply; carrier remains.
- Many Ethernet drivers (e1000e, r8169, etc.) update L3 config without touching the PHY.
igb on Intel I211 is unusual in triggering a PHY reset for the kind of in-memory route reapply that ProtonVPN performs.
The race is latent in the code for all users; the igb driver just exposes it deterministically.
Why OpenVPN works on the same system
NMKillSwitch (used for openvpn-tcp / openvpn-udp) takes a different approach: it adds exclusion routes (covering 0.0.0.0/0 minus the server IP) into the dummy kill-switch connection profile. It never calls device_reapply on the physical interface, so the link stays up and traffic to the server IP falls through to it via the main routing table. Confirmed working on the same hardware with Protocol: openvpn-tcp.
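To illustrate what "0.0.0.0/0 minus the server IP" means as a route set, here is a minimal sketch using Python's standard ipaddress module. It is not taken from NMKillSwitch, which may compute its routes differently; the function name exclusion_routes and the placeholder address are assumptions.

```python
import ipaddress


def exclusion_routes(server_ip: str) -> list:
    """Return the CIDR blocks that cover all of IPv4 except server_ip."""
    everything = ipaddress.ip_network("0.0.0.0/0")
    server = ipaddress.ip_network(f"{server_ip}/32")
    # address_exclude yields the networks left after removing `server`.
    return sorted(everything.address_exclude(server))


routes = exclusion_routes("203.0.113.10")
print(len(routes))  # one block per prefix length from /1 to /32
```

Because the server address is covered by none of these blocks, traffic to it is not captured by routes on the dummy interface and follows the physical interface's own routes instead.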
Suggested fixes (any of these would resolve it)
The TCP-reachability check exists specifically to guard against the kill switch breaking server access ("after introducing the dummy kill switch network interface, the VPN connection backend tries to use it…" — comment in networkmanager.py:67). The fix needs to make that guard actually work when the kill-switch path itself disrupts the physical interface.
1. Make _wait_for_vpn_server_route() wait for the route to become usable, not just present. Reject matches whose line contains linkdown:

       match = re.search(server_route, result.stdout)
       route_exists = bool(match)
       if found and route_exists and "linkdown" not in match.group(0):
           return

   And/or extend the polling schedule beyond 5 s, since PHY reset recovery on some NICs takes 10–15 s.
2. Bind the TCP check socket to the physical interface (SO_BINDTODEVICE) so the probe's route lookup is restricted to that interface and cannot fall through to the pvpnksintrf0 default route. The kernel still requires the link to be LOWER_UP for packets to leave, so this only helps if the link recovers, but it makes the probe correct in cases where the host route is marked linkdown spuriously, and is the canonical approach for "probe via this specific interface".
3. Don't use device_reapply to inject the host route. Add it directly via ip route add <server>/32 via <gw> dev <iface> (subprocess, like _run_ip_route_command already does), or via netlink. This avoids triggering driver-specific reapply behaviors entirely. Removing the route on disconnect would mirror this.
4. Run the TCP reachability check before the kill switch is enabled, since at that point the routing table is untouched and the check accurately reflects whether the upstream network can reach the server.
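The first suggestion (wait for a usable route, with a longer schedule) could look roughly like the sketch below. It is standalone, not a patch against killswitch_connection_handler.py: the names usable_route_present and wait_for_usable_route, the read_routes callback, and the extended delay schedule are all assumptions.

```python
import asyncio
import re
from typing import Awaitable, Callable, Sequence

# Hypothetical schedule: the existing 5 s of delays plus 10 s more (15 s total).
DELAYS = (0.5, 0.5, 1, 1, 2, 5, 5)


def usable_route_present(ip_route_output: str, server_ip: str, iface: str) -> bool:
    """True if a host route via `iface` exists and is not flagged linkdown."""
    pattern = re.compile(
        rf"^{re.escape(server_ip)} via \S+ dev {re.escape(iface)}(?=\s|$).*$",
        re.MULTILINE,
    )
    return any(
        "linkdown" not in m.group(0).split()
        for m in pattern.finditer(ip_route_output)
    )


async def wait_for_usable_route(
    read_routes: Callable[[], Awaitable[str]],
    server_ip: str,
    iface: str,
    delays: Sequence[float] = DELAYS,
) -> None:
    """Poll until the route is usable; raise TimeoutError when delays run out."""
    for delay in (0, *delays):
        await asyncio.sleep(delay)
        if usable_route_present(await read_routes(), server_ip, iface):
            return
    raise TimeoutError(f"no usable route to {server_ip} via {iface}")
```

Passing the route reader in as a callback keeps the sketch testable without root privileges. In the real handler the equivalent would be whatever already runs ip route.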
Any of these would unblock WireGuard on Intel I211 / igb hosts. (1) is the smallest patch and likely the safest.
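For the suggestion to add the host route directly instead of going through device_reapply, a minimal sketch of the command construction follows. The function names host_route_argv and apply_host_route are hypothetical, and running the command requires the same privileges as any other ip route add (CAP_NET_ADMIN).

```python
import ipaddress
import subprocess


def host_route_argv(action: str, server_ip: str, gateway: str, iface: str) -> list[str]:
    """Build the `ip route add|del <server>/32 via <gw> dev <iface>` argv."""
    if action not in ("add", "del"):
        raise ValueError(f"unsupported action: {action!r}")
    # Validate the addresses so malformed input never reaches the command line.
    server = ipaddress.IPv4Address(server_ip)
    gw = ipaddress.IPv4Address(gateway)
    return ["ip", "route", action, f"{server}/32", "via", str(gw), "dev", iface]


def apply_host_route(action: str, server_ip: str, gateway: str, iface: str) -> None:
    """Run the command; raises CalledProcessError if `ip` reports a failure."""
    subprocess.run(host_route_argv(action, server_ip, gateway, iface), check=True)
```

Using "add" on connect and "del" on disconnect keeps the two paths symmetric, and the physical interface's NetworkManager profile is never reapplied.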
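Related existing issues
- _wait_for_vpn_server_route function, observed as a TimeoutError rather than spurious success.
- TimeoutError in killswitch_connection_handler.py during connect, WireGuard + NM.
None of those identify device_reapply-induced PHY reset as the root cause, or the missing linkdown check in _wait_for_vpn_server_route as the amplifier.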