Unusable hostapd

Ben Greear greearb at candelatech.com
Wed Jan 11 15:24:44 EST 2012


On 01/11/2012 12:03 PM, Bryan Phillippe wrote:
>
> Disappointed to hear crickets on this because it makes wireless unusable for us in a production environment, but here is some more information on it anyway...
>
> This has happened again on another one of our WiFi units running the same code, and the contents of the wpa_driver_nl80211_data struct appear almost identical.  After looking at this more closely and comparing to a normal structure, it looks to me like this structure is corrupted.  This would suggest a specific type of memory corruption (since the corruption looks the same in both cases on different units on different days).
>
> Anybody have some thoughts or advice for this lone wolf?

Well, don't give up yet... it sometimes takes folks a few days to look
at problems.

See more below.

> On Jan 9, 2012, at 2:59 PM, Bryan Phillippe wrote:
>
>> Hi All,
>>
>> Well, I was able to debug this problem more during a repro today.  I found a lot of information.  Basically, we're stuck in wpa_driver_nl80211_send_frame() from src/drivers/driver_nl80211.c here:
>>
>> 	};
>>
>> 	if (encrypt)
>> 		rtap_hdr[8] |= IEEE80211_RADIOTAP_F_WEP;
>>
>> 	return sendmsg(drv->monitor_sock,&msg, 0); /*<--------------- stucky */
>> }
>>
>> The socket is blocked, so hostapd sleeps forever.  I can hit it with signals and the signal handler will execute, but I have to kill the program and restart it to restore operation.  Here is the backtrace:
>>
>> (gdb) where
>> #0  0x0fd16088 in sendmsg ()
>>    from /home/bp/P4/rootfs/root/lib/libc.so.6
>> #1  0x1002b614 in wpa_driver_nl80211_send_frame (drv=<value optimized out>,
>>     data=<value optimized out>, len=<value optimized out>,
>>     encrypt=<value optimized out>) at ../src/drivers/driver_nl80211.c:2783
>> #2  0x1000b6bc in hostapd_send_mgmt_frame (hapd=<value optimized out>,
>>     msg=<value optimized out>, len=<value optimized out>)
>>     at ../src/ap/ap_drv_ops.c:64
>> #3  0x10045bf8 in handle_probe_req (hapd=0x10072c40, mgmt=0x9fffeef6,
>>     len=<value optimized out>) at ../src/ap/beacon.c:331
>> #4  0x10049674 in ieee802_11_mgmt (hapd=0x10072c40, buf=0x9fffeef6 "@",
>>     len=70, fi=<value optimized out>) at ../src/ap/ieee802_11.c:1425
>> #5  0x1000b5ec in hostapd_mgmt_rx (ctx=<value optimized out>,
>>     event=<value optimized out>, data=0x9fffee80)
>>     at ../src/ap/drv_callbacks.c:302
>> #6  wpa_supplicant_event (ctx=<value optimized out>,
>>     event=<value optimized out>, data=0x9fffee80)
>>     at ../src/ap/drv_callbacks.c:422
>> #7  0x10030ca8 in handle_frame (sock=<value optimized out>,
>>     eloop_ctx=0x10072700, sock_ctx=<value optimized out>)
>>     at ../src/drivers/driver_nl80211.c:3141
>> #8  handle_monitor_read (sock=<value optimized out>, eloop_ctx=0x10072700,
>>     sock_ctx=<value optimized out>) at ../src/drivers/driver_nl80211.c:3229
>> #9  0x1001b1f4 in eloop_sock_table_dispatch (table=0x10071100, fds=0x100737b8)
>>     at ../src/utils/eloop.c:222
>> #10 0x1001b914 in eloop_run () at ../src/utils/eloop.c:588
>> #11 0x100044ec in hostapd_global_run (argc=<value optimized out>,
>>     argv=0x9ffffec4) at main.c:439
>> #12 main (argc=<value optimized out>, argv=0x9ffffec4) at main.c:548
>> (gdb)
>>
>> The sendmsg() is blocked on the monitor_sock, which is apparently blocking IO and unable to send for some reason.
>> More information:
>>
>> (gdb) p *(struct wpa_driver_nl80211_data *)0x100727d0
>> $6 = {ctx = 0x10072700, netlink = 0x0, ioctl_sock = 13,
>>   brname = "ath1", '\000'<repeats 11 times>, ifindex = 8388608,
>>   if_removed = 21, capa = {key_mgmt = 16, enc = 13, auth = 18, flags = 16,
>>     max_scan_ssids = 16, max_remain_on_chan = 16}, has_capability = 0,
>>   operstate = 0, scan_complete_events = 0, nl_sock = 0x0, nl_sock_event = 0x0,
>>   nl_cache = 0x0, nl_cache_event = 0x0, nl_cb = 0x0, nl80211 = 0x0,
>>   auth_bssid = "\000\000\000\000\020\a", bssid = "'\364\000\000\000\020",
>>   associated = 0,
>>   ssid = '\000'<repeats 15 times>, "\021\020\a'\000\020\003M\020\020\003(@\000\000\001I", ssid_len = 268904872, nlmode = 268904872, ap_scan_as_station = 1,
>>   assoc_freq = 13107200, monitor_sock = 2346, monitor_ifidx = 2346,
>>   probe_req_report = 17432576, disable_11b_rates = 1,
>>   pending_remain_on_chan = 0, added_bridge = 0, added_if_into_bridge = 0,
>>   remain_on_chan_cookie = 0, send_action_cookie = 1154435205700780032,
>>   filter_ssids = 0x0, num_filter_ssids = 0, first_bss = {drv = 0x0,
>>     next = 0xffffffff, ifindex = 0,
>>     ifname = '\000'<repeats 12 times>"\377, \377\377\377", beacon_set = 0},
>>   eapol_sock = 0, default_if_indices = {0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0,
>>     -1, 0, 0, 0}, if_indices = 0x0, num_if_indices = -1, last_freq = 0,
>>   last_freq_ht = 0}
>> (gdb) p ((struct wpa_driver_nl80211_data *)0x100727d0)->monitor_sock
>> $7 = 2346
>> (gdb)

That looks wrong.  The socket should normally be a smaller number.  I notice it's
the same as monitor_ifidx, which may be a clue.

That said, I would expect the sendmsg() call to just return an error (EBADF)
in this case.

Can you use sysrq to figure out the kernel stack trace for that
blocking write?

Can you try top-of-tree hostapd code to see if problem persists there?

Can you try compiling without optimization (-O0 instead of -O2)?  That will give
you better gdb output, but is unlikely to change the runtime behaviour.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


