Supplicant Recovery on Roam Failures

Matt Causey matt.causey at gmail.com
Thu Dec 27 16:18:26 EST 2012


Hello!

I have a question that hopefully you can help me with.  We run
wpa_supplicant 1.x and Linux 2.6.39 in a somewhat dense Cisco-based WLAN.
There are lots of clients and they spend a lot of time roaming between
access points.  Intermittently, the client gets into a state where
wpa_supplicant has received an RSSI_LOW event, and has selected a BSSID
that has better coverage.  The client gets to
ieee80211_send_probe_req<http://lxr.free-electrons.com/ident?i=ieee80211_send_probe_req>()
[1], and the BSSID that it has selected fails to respond to broadcast
probes.:

Dec 11 01:54:36 wpa_supplicant[477]: wlan0: Event SIGNAL_CHANGE (28)
received
Dec 11 01:54:36 wpa_supplicant[477]: nl80211: Event message available
Dec 11 01:54:36 wpa_supplicant[477]: nl80211: Connection quality monitor
event: RSSI low
Dec 11 01:54:36 wpa_supplicant[477]: nl80211: Signal: -79 dBm  txrate: 48000
Dec 11 01:54:36 wpa_supplicant[477]: nl80211: Noise: -101 dBm
Dec 11 01:54:36 wpa_supplicant[477]: wlan0: Event SIGNAL_CHANGE (28)
received
Dec 11 01:54:36 kernel: wlan0: direct probe to c (try 2/3)
Dec 11 01:54:37 kernel: wlan0: direct probe to dead:beef (try 3/3)
Dec 11 01:54:37 wpa_supplicant[477]: nl80211: Event message available
Dec 11 01:54:37 kernel: wlan0: direct probe to dead:beef timed out
Dec 11 01:54:37 wpa_supplicant[477]: nl80211: MLME event 37; timeout with
dead:beef

Dec 11 01:54:37 wpa_supplicant[477]: wlan0: Event AUTH_TIMED_OUT (14)
received
Dec 11 01:54:37 wpa_supplicant[477]: wlan0: SME: Authentication timed out
Dec 11 01:54:37 wpa_supplicant[477]: wlan0: Setting scan request: 5 sec 0
usec
Dec 11 01:54:37 wpa_supplicant[477]: wlan0: State: AUTHENTICATING ->
DISCONNECTED at 1355190877.307

Dec 11 01:54:37 wpa_supplicant[477]: wpa_driver_nl80211_set_operstate:
operstate 1->0 (DORMANT)
Dec 11 01:54:37 wpa_supplicant[477]: netlink: Operstate: linkmode=-1,
operstate=5
Dec 11 01:54:37 wpa_supplicant[477]: EAPOL: External notification -
portEnabled=0

Dec 11 01:54:37 wpa_supplicant[477]: EAPOL: SUPP_PAE entering state
DISCONNECTED
Dec 11 01:54:37 wpa_supplicant[477]: EAPOL: Supplicant port status:
Unauthorized
Dec 11 01:54:37 wpa_supplicant[477]: EAPOL: SUPP_BE entering state
INITIALIZE
Dec 11 01:54:37 wpa_supplicant[477]: EAPOL: Supplicant port status:
Unauthorized
Dec 11 01:54:37 wpa_supplicant[477]: EAPOL: External notification -
portValid=0
Dec 11 01:54:37 wpa_supplicant[477]: EAPOL: Supplicant port status:
Unauthorized
Dec 11 01:54:37 wpa_supplicant[477]: EAPOL: External notification - EAP
success=0
Dec 11 01:54:37 wpa_supplicant[477]: EAPOL: Supplicant port status:
Unauthorized
Dec 11 01:54:37 wpa_supplicant[477]: RTM_NEWLINK: operstate=0
ifi_flags=0x11023 ([UP][LOWER_UP])
Dec 11 01:54:37 wpa_supplicant[477]: RTM_NEWLINK, IFLA_IFNAME: Interface
'wlan0' added
Dec 11 01:54:37 wpa_supplicant[477]: nl80211: if_removed already cleared -
ignore event
Dec 11 01:54:38 wpa_supplicant[477]: CTRL_IFACE: GET_NETWORK id=0
name='ssid'
Dec 11 01:54:41 wpa_supplicant[477]: CTRL_IFACE: GET_NETWORK id=0
name='ssid'
Dec 11 01:54:42 wpa_supplicant[477]: wlan0: State: DISCONNECTED -> SCANNING
at 1355190882.309
Dec 11 01:54:42 wpa_supplicant[477]: ap_scan 1 Conf ssid redacted
prev_scan_ssid=WILDCARD_SSID_SCAN
Dec 11 01:54:42 wpa_supplicant[477]: wlan0: Starting AP scan for wildcard
SSID
Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Scan frequency 5180 MHz
Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Scan frequency 5200 MHz
Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Scan frequency 5220 MHz
Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Scan frequency 5240 MHz
Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Scan frequency 5745 MHz
Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Scan frequency 5765 MHz

Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Scan frequency 5785 MHz
Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Scan frequency 5805 MHz
Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Scan frequency 5825 MHz
Dec 11 01:54:42 kernel: nl80211_trigger_scan scan
Dec 11 01:54:42 kernel: starting software scan
Dec 11 01:54:42 wpa_supplicant[477]: Scan requested (ret=0) - scan timeout
30 seconds
Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Event message available
Dec 11 01:54:42 wpa_supplicant[477]: nl80211: Scan trigger
Dec 11 01:54:43 wpa_supplicant[477]: CTRL_IFACE: GET_NETWORK id=0
name='ssid'
Dec 11 01:54:44 wpa_supplicant[477]: nl80211: Event message available
Dec 11 01:54:44 wpa_supplicant[477]: nl80211: New scan results available
Dec 11 01:54:44 wpa_supplicant[477]: wlan0: Event SCAN_RESULTS (3) received
Dec 11 01:54:44 wpa_supplicant[477]: nl80211: Associated on 5180 MHz
Dec 11 01:54:44 wpa_supplicant[477]: nl80211: Associated with dead:beef
Dec 11 01:54:44 wpa_supplicant[477]: nl80211: Received scan results (161
BSSes)
Dec 11 01:54:44 wpa_supplicant[477]: nl80211: Scan results indicate BSS
status with dead:beef as associated
Dec 11 01:54:44 wpa_supplicant[477]: nl80211: Local state (not associated)
does not match with BSS state
Dec 11 01:54:44 wpa_supplicant[477]: nl80211: Clear possible state mismatch
(dead:beef)
Dec 11 01:54:44 kernel: wlan0: deauthenticating from dead:beef by local
choice (reason=2)
Dec 11 01:54:44 kernel: cfg80211: Calling CRDA for country: US
Dec 11 01:54:44 wpa_supplicant[477]: Sorted scan results
Dec 11 01:54:44 wpa_supplicant[477]: dead:beef freq=5240 qual=0 noise=0
level=-68 flags=0xb

<processing all 161 BSSes . snipped>

Dec 11 01:54:45 wpa_supplicant[477]: Add randomness: count=2223 entropy=18
Dec 11 01:54:45 wpa_supplicant[477]: random pool - hexdump(len=128):
[REMOVED]
Dec 11 01:54:45 wpa_supplicant[477]: random_mix_pool - hexdump(len=8):
[REMOVED]
Dec 11 01:54:45 wpa_supplicant[477]: random_mix_pool - hexdump(len=5):
[REMOVED]
Dec 11 01:54:45 wpa_supplicant[477]: random pool - hexdump(len=128):
[REMOVED]
Dec 11 01:54:45 wpa_supplicant[477]: Add randomness: count=2224 entropy=19
Dec 11 01:54:45 wpa_supplicant[477]: random pool - hexdump(len=128):
[REMOVED]
Dec 11 01:54:45 wpa_supplicant[477]: random_mix_pool - hexdump(len=8):
[REMOVED]
Dec 11 01:54:45 wpa_supplicant[477]: random_mix_pool - hexdump(len=5):
[REMOVED]
Dec 11 01:54:45 wpa_supplicant[477]: random pool - hexdump(len=128):
[REMOVED]
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: New scan results available
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: Selecting BSS from priority
group 5
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: 0: dead:beef ssid='redacted'
wpa_ie_len=0 rsn_ie_len=20 caps=0x111 level=-68
Dec 11 01:54:45 wpa_supplicant[477]: wlan0:    skip - SSID mismatch
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: 1: dead:beef ssid='redacted'
wpa_ie_len=0 rsn_ie_len=20 caps=0x111 level=-69
Dec 11 01:54:45 wpa_supplicant[477]: wlan0:    skip - SSID mismatch
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: 2: dead:beef ssid='redacted'
wpa_ie_len=0 rsn_ie_len=20 caps=0x111 level=-69
Dec 11 01:54:45 wpa_supplicant[477]: wlan0:    selected based on RSN IE
Dec 11 01:54:45 wpa_supplicant[477]: wlan0:    selected BSS dead:beef
ssid='redacted'
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: Request association:
reassociate: 0  selected: dead:beef  bssid: 00:00:00:00:00:00  pending:
00:00:00:00:00:00  wpa_state: SCANNING
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: Automatic auth_alg selection:
0x1
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: RSN: using IEEE 802.11i/D9.0
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: WPA: Selected cipher suites:
group 16 pairwise 16 key_mgmt 2 proto 2
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: WPA: clearing AP WPA IE
Dec 11 01:54:45 wpa_supplicant[477]: WPA: set AP RSN IE - hexdump(len=22):
30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 02 28 00
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: WPA: using GTK CCMP
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: WPA: using PTK CCMP
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: WPA: using KEY_MGMT WPA-PSK
Dec 11 01:54:45 wpa_supplicant[477]: WPA: Set own WPA IE default -
hexdump(len=22): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac
02 00 00
Dec 11 01:54:45 wpa_supplicant[477]: FT: Stored MDIE and FTIE from
(Re)Association Response - hexdump(len=0):
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: Cancelling scan request
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: SME: Trying to authenticate
with dead:beef (SSID='redacted' freq=5240 MHz)
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: No keys have been configured -
skip key clearing
Dec 11 01:54:45 wpa_supplicant[477]: wlan0: State: SCANNING ->
AUTHENTICATING at 1355190885.209
Dec 11 01:54:45 wpa_supplicant[477]: EAPOL: External notification - EAP
success=0
Dec 11 01:54:45 wpa_supplicant[477]: EAPOL: Supplicant port status:
Unauthorized
Dec 11 01:54:45 wpa_supplicant[477]: EAPOL: External notification - EAP
fail=0
Dec 11 01:54:45 wpa_supplicant[477]: EAPOL: Supplicant port status:
Unauthorized
Dec 11 01:54:45 wpa_supplicant[477]: EAPOL: External notification -
portControl=Auto
Dec 11 01:54:45 wpa_supplicant[477]: EAPOL: Supplicant port status:
Unauthorized
Dec 11 01:54:45 wpa_supplicant[477]: nl80211: Authenticate (ifindex=3)
Dec 11 01:54:45 wpa_supplicant[477]:   * bssid=dead:beef
Dec 11 01:54:45 wpa_supplicant[477]:   * freq=5240
Dec 11 01:54:45 wpa_supplicant[477]:   * IEs - hexdump(len=0): [NULL]
Dec 11 01:54:45 wpa_supplicant[477]:   * Auth Type 0
Dec 11 01:54:45 kernel: wlan0: authenticate with dead:beef (try 1)
Dec 11 01:54:45 kernel: wlan0: authenticated
Dec 11 01:54:45 wpa_supplicant[477]: nl80211: Authentication request send
successfully


In the 802.11 capture data, we see clearly that the client is sending valid
Probe Requests, and the access point is failing to respond, though the same
access point is responding to other clients.  We have the vendor engaged to
answer the question of why the access point is not responding.

Obviously, the access point is in the client's scan list and is not dead,
and should be responding.  While discussions with the vendor are underway
though, I have a question about the way that wpa_supplicant responds to
this scenario.

What appears to happen after the MLME layer issues the times-out, is that
the supplicant then kicks off a scan and disconnects.  In our dense
environment, this scan takes seconds to complete, and breaks our clients.
In the case where these failures happen, the client already has a very full
scan list that contains a lot of very solid roam choices.  Is there some
way that I can cause the supplicant to attempt the next BSS on the list
rather than invoking a whole new scan?

Thanks in advance for your input!

Cheers,

--
Matt

[1] - http://lxr.free-electrons.com/source/net/mac80211/mlme.c#L2725
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.shmoo.com/pipermail/hostap/attachments/20121227/7a7c3f69/attachment.htm 


More information about the HostAP mailing list