Recurring traffic/authentication freeze - detailed information (pt1)
Bryan.Phillippe at watchguard.com
Mon Feb 6 20:17:01 EST 2012
On Feb 4, 2012, at 9:11 AM, Jouni Malinen wrote:
> On Fri, Feb 03, 2012 at 02:44:06AM +0000, Bryan Phillippe wrote:
>> However, it seems that when we have more than a dozen or so clients concurrently connected to any one AP, we begin to get strange traffic freezes where no traffic passes from the AP wireless interface to the bridged wired interface, and nobody can authenticate or associate with the AP during this time. But probe requests still seem to work, wired-only traffic is fine, and I can't see anything wrong from the shell (hostapd running, load under 1, no strange kernel messages, etc.). While the problem is occurring, it affects everyone at the same time: Windows, Mac, mobile clients, etc. After a minute or so, the problem self-corrects for everyone. If I kill and restart hostapd, the problem corrects right away.
>> I have captured normal and debug logs plus tcpdump output, both during normal operation and during the problem, and would very much appreciate any help anyone would be willing to offer!
> I did not notice any clear issues in the logs, but since the issue may
> be some frames not getting transmitted or received properly, it would be
> necessary to see what actually went out rather than what hostapd may
> have tried to transmit. In other words, debugging this further is likely
> to require using a separate wireless sniffer to capture the frames to
> confirm what exactly is happening during the connectivity issue.
Thanks for looking, Jouni. I have updated to the most recent-ish version
of hostapd (v2.0-devel), built from git pulled the other day. I will retest
for this problem on this version and report my findings. I'll also try to
find a wireless sniffer to see what actually goes out over the air.
However, I have noticed something else worth reporting. When running
hostapd-0.7.3, I found that the sendmsg() call in
wpa_driver_nl80211_send_frame() (driver_nl80211.c) would occasionally block
*forever*, leaving the wireless side dead. I posted about this, but nobody
had any idea, so I changed the code to make the socket non-blocking and
call exit(1) if sendmsg() failed with EAGAIN. After upgrading to
hostapd-2.0-devel, the same problem bit me again: sendmsg() stalled forever
in the analogous location (wpa_driver_nl80211_send_mntr()). So I suspect
this is probably NOT hostapd's fault; otherwise I'd guess that many other
people would have seen it. Do you or anyone else have advice on how I can
debug this problem?
I have already added some code to my kernel's socket implementation to log
a message when a write blocks on a blocking socket, and I can see that it
blocks because the write queue is full. I'd like to figure out who the
likely owner of this bug is so I can work with them to resolve it.