wpa_supplicant: iwl3945 + EAP-PEAP/MSCHAPv2 (eduroam) == random disconnects

Pedro Francisco pedrogfrancisco at gmail.com
Wed Nov 28 06:59:01 EST 2012


On Mon, Jul 30, 2012 at 10:12 PM, Dan Williams <dcbw at redhat.com> wrote:
> On Mon, 2012-07-30 at 09:56 -0500, Dan Williams wrote:
>> On Thu, 2012-07-26 at 17:52 +0100, Pedro Francisco wrote:
>> > On Mon, Jul 16, 2012 at 3:15 PM, Dan Williams <dcbw at redhat.com> wrote:
>> > > On Mon, 2012-07-16 at 09:08 -0500, Dan Williams wrote:
>> > >> On Fri, 2012-07-13 at 20:34 +0100, Pedro Francisco wrote:
>> > >> > I have random disconnects when connecting to an ESS (eduroam) using
>> > >> > EAP-PEAP/MSCHAPv2 and iwl3945 (Fedora 17).
>> > >> >
>> > >> Best thing to do here is run wpa_supplicant with debug logging (ie, with
>> > >> arguments -dddt) to get more information.  You can do this by stopping
>> > >> NM, killall -TERM wpa_supplicant, then "wpa_supplicant -dddtu" (as root
>> > >> of course), then restarting NM, and waiting for the disconnect.  Then we
>> > >> can see what's actually going on.
>> > >
>> > > Note that if you do this, the resulting logs may contain some
>> > > identifying information like your username, and the IDs of various APs
>> > > on your school network. So if you don't feel comfortable posting that,
>> > > feel free to mail the logs to me privately.
>> >
>> > Ok, I'll email you personally.
>> > Do you mind if the logs have the iwl3945 module debug info as well?
>>
>> I don't mind at all.
>
> There are a few unexpected deauthentication events, all appear to be:
>
> 1343297539.871522: wlan0: Event DEAUTH (12) received
> 1343297539.871535: wlan0: Deauthentication notification
> 1343297539.871547: wlan0:  * reason 10
> 1343297539.871560: wlan0:  * address b8:62:1f:45:2c:20
>
> or
>
> 1343297675.593175: wlan0: Event DEAUTH (12) received
> 1343297675.593188: wlan0: Deauthentication notification
> 1343297675.593200: wlan0:  * reason 2
> 1343297675.593214: wlan0:  * address 00:0e:84:ab:0d:d0
>
> (about 70 seconds after the successful connection, though the AP has a
> pretty bad RSSI at this point)
>
> though there are some during connections like:
>
> 1343297547.200592: wlan0: Event DEAUTH (12) received
> 1343297547.200605: wlan0: Deauthentication notification
> 1343297547.200616: wlan0:  * reason 4
> 1343297547.200629: wlan0:  * address b8:62:1f:62:4f:d0
>
>
> Basically, I'm seeing a lot of deauthentications due to reason #10 which
> is apparently "Disassociated because the information in the Power
> Capability element is unacceptable" from 802.11-2007.  Which I don't
> actually know much about.  Any comments here Jouni?

Hi again!
I've rewritten this about 10 times now. I'm sending it anyway because
you may have a clearer picture of what is going on here, and I've
changed theories often enough already.

I contacted the IT department, which told me that all they see from
their side is "disconnected due to excess retries".
When testing with the following script
$ while [ 1 ]; do sudo wpa_cli blacklist clear; sleep 1; done
things seem to get better -- though disconnect #10 keeps happening.
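
A minimal sketch of a variant of that loop which also records what
was on the blacklist before clearing it, so the log shows which APs
keep getting blacklisted. It assumes "wpa_cli blacklist" prints one
BSSID per line; log_and_clear_blacklist and the WPA_CLI variable are
hypothetical names used only for this illustration.

```shell
# Command used to talk to wpa_supplicant; override WPA_CLI to change it.
WPA_CLI=${WPA_CLI:-"sudo wpa_cli -iwlan0"}

# Print a timestamped line listing the blacklisted BSSIDs (if any),
# then clear the blacklist.
log_and_clear_blacklist() {
    entries=$($WPA_CLI blacklist)
    if [ -n "$entries" ]; then
        # Unquoted $entries joins the BSSIDs onto one line
        printf '%s blacklisted: %s\n' "$(date +%s)" "$(echo $entries)"
    fi
    $WPA_CLI blacklist clear >/dev/null
}

# Usage, same cadence as the loop above:
#   while true; do log_and_clear_blacklist; sleep 1; done
```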

So my theory is that there are three issues with my iwl3945 -- note:
using sw_scan!:
* Reason #10 (at 2 out of 3 locations) disconnects me from the nearest
Cisco AP.
* All other APs are too far away, and we get all the other disconnect
reasons due to wireless interference.
* wpa_supplicant's blacklist fills up and NetworkManager complains
visibly; if the blacklisted AP were retried, the connection
would succeed with the nearest AP (since disconnect reason #10 only
happens after a while and only sometimes).
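
To check how the disconnect reasons split across APs, the counts can
be pulled from the debug log. A rough sketch, assuming the
"wpa_supplicant -dddt" line format quoted earlier in this thread;
tally_deauth is a hypothetical helper name, and the reason names cover
only the codes seen here (taken from the 802.11-2007 reason code
table).

```shell
# Count deauthentication events per AP and reason code.
# Reads log files given as arguments, or stdin if none are given.
tally_deauth() {
    awk '
        BEGIN {
            # Reason names from IEEE 802.11-2007, limited to codes seen in this thread
            name[2]  = "previous authentication no longer valid"
            name[4]  = "disassociated due to inactivity"
            name[10] = "Power Capability element unacceptable"
        }
        /Deauthentication notification/ { pending = 1; next }
        pending && /\* reason /  { reason = $NF; next }
        pending && /\* address / {
            count[$NF " reason=" reason]++
            pending = 0
        }
        END {
            for (key in count) {
                split(key, part, "=")
                code = part[2]
                label = (code in name) ? name[code] : "see 802.11 reason code table"
                printf "%d %s (%s)\n", count[key], key, label
            }
        }
    ' "$@" | sort -rn
}

# Usage: tally_deauth wpa_supplicant.log
```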

Two additional notes:
* a friend of mine with a newer Intel wireless card, connected to the
same AP, remained connected (while I got disconnected due to reason=10)
* another friend of mine with a newer Intel card was being disconnected
as well, with reason=0.

At some point I got the idea that the disconnects happened faster in
places with more frequent group rekeying (every ~10s, which I assume
is due to wpa_strict_rekey or an equivalent setting). It may be
worth noting that group rekeying resets bgscan [2].
However, I'm not sure how that might be related.

I wanted to disable scanning, but the following commands didn't disable
all scanning:
$ sudo wpa_cli -iwlan0 set_network eduroam scan_ssid 0
$ sudo wpa_cli -iwlan0 set_network eduroam bgscan \"simple:4000:-200:8000\"
NM also resets bgscan for some reason.


iwl3945 apparently still defaults to disable_hw_scan=1, though the
issue that setting worked around has since been fixed -- I think [1].
So, I used disable_hw_scan=0.
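
For reference, a rough sketch of how the parameter can be set and
checked. It assumes the driver exports the parameter under
/sys/module/iwl3945/parameters/ (not verified here); module_param and
SYSFS_ROOT are hypothetical names used only for this illustration.

```shell
# Persistent setting (standard modprobe.d syntax), e.g. in
# /etc/modprobe.d/iwl3945.conf:
#   options iwl3945 disable_hw_scan=0
# One-off reload as root (this drops the connection):
#   modprobe -r iwl3945 && modprobe iwl3945 disable_hw_scan=0

# Print the value a loaded module is using for a parameter, or "unknown"
# when the module is not loaded or does not export the parameter.
# SYSFS_ROOT is a hypothetical override, handy for testing.
module_param() {
    param_file="${SYSFS_ROOT:-/sys}/module/$1/parameters/$2"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo unknown
    fi
}

# Usage: module_param iwl3945 disable_hw_scan
```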

After a 3-hour test, using disable_hw_scan=0 seems to have almost
fixed the problem: occasional "reason=0" disconnects occur, but recovery
seems faster [3] (sorry, no timestamps, but userland seems to barely
notice most of the time -- and since gnome-shell 3.4 freezes compositing
every time telepathy-gabble disconnects, it's easy to notice).
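
Recovery time could be measured from a timestamped log instead of
judged by eye. A minimal sketch, assuming the log was taken with -t
and contains wpa_supplicant's CTRL-EVENT-DISCONNECTED (with a
reason=N field) and CTRL-EVENT-CONNECTED lines; recovery_times is a
hypothetical helper name.

```shell
# For each disconnect, print how long it took until the next
# successful connection. Reads log files given as arguments, or stdin.
recovery_times() {
    awk '
        /CTRL-EVENT-DISCONNECTED/ && !down {
            down = 1
            lost_at = $1 + 0          # "1343297539.871522:" converts to a number
            why = "reason=?"
            for (i = 1; i <= NF; i++) if ($i ~ /^reason=/) why = $i
        }
        /CTRL-EVENT-CONNECTED/ && down {
            printf "%.1f s to recover after %s\n", ($1 + 0) - lost_at, why
            down = 0
        }
    ' "$@"
}

# Usage: recovery_times wpa_supplicant.log
```

Comparing the output of a sw_scan run against a hw_scan run would
show whether recovery really is faster.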

I'm not saying I am 100% sure this is fixed, since scanning behaviour
may depend on the environment (it depends on the number of clients
at a given location, which triggers different group rekeying behaviour,
which in turn changes the scanning pattern).
Furthermore, as I said, disconnects still happen [3].

I'm also starting to think this happens on Windows as well.

Theory:
* The Cisco AP misbehaves: it sends a random error message or stops
replying.
* The wireless card decides to roam to a 2nd, far AP.
* It should return to the 1st AP once that AP recovers, but doesn't,
since the AP is blacklisted when using sw_scan.
* When using hw_scan, the first AP is returned to after a while.

Do you think this could be the explanation?

References:
[1] http://thread.gmane.org/gmane.linux.drivers.ipw3945.devel/7573/focus=84052
[2] http://code.google.com/p/chromium-os/issues/detail?id=33948
[3] http://pastebin.com/U9UG5MKN

-- 
Pedro Francisco

