Wpa_supplicant in a shaky network environment

Jouni Malinen j at w1.fi
Sat Mar 31 23:00:50 EDT 2007


On Fri, Feb 09, 2007 at 03:04:47PM +0200, Antti Mäkelä wrote:

>   could you take a look at a bug I filed at Gentoo (and got suggestions 
>   that this should be addressed at wpa_cli or wpa_supplicant):
> 
> http://bugs.gentoo.org/show_bug.cgi?id=164965

I haven't had a chance to take a closer look at this before, but the
last entry on that case is from Feb 2nd. Are there any updates on this
front since then (or, well, since this message of yours)?

If I understood correctly, what you are asking for is an option in
wpa_cli to allow some DISCONNECT action calls to be skipped in cases
where the re-association happens within a short amount of time. While I
understand and can agree with the reason for not completely closing the
network interface (and the daemons using it), I am not sure I agree that
wpa_cli is the right place to fix this.

Do you know what exactly is causing the disconnection? wpa_supplicant
reports a disconnect event only if the driver reports disassociation. It
would be interesting to know why the driver did this. If it is just
roaming to another AP, it should just report a new association.
Likewise, if it is returning immediately to the same AP, I would
expect to see a new association, not a disassociation. Do you know whether
the driver is actually re-associating at this point, or does it just wait
for wpa_supplicant to re-associate, since you seem to be using
ap_scan=1 mode? By the way, have you tried this with ap_scan=2 mode,
which should give the driver more control over this type of
re-association?

>   Anyway - it seems that at least wpa_cli.c has quite a lot of similarities 
>   to ifplugd code, but, if I have interpreted things correctly, parsing events 
> and executing action scripts happens within the same function 
> (wpa_cli_action_process()), on the same call.
> 
>   Thus, any patch to include a "downtime threshold" would change the entire 
> structure a bit - separate parsing of wpa events from executing the 
> actions, and including a timer.

While this is a potential workaround for the issue, I would prefer to
have this handled in the drivers, without needing such a threshold timer
in user space to process events when there was not really a
disassociation in the first place.

This kind of delayed disconnect event would mean that all "valid cases"
would also have to wait before being processed, so filtering out the
unwanted events at a lower layer might be preferable.

>   Personally, I've found this problem annoying enough recently (the office 
> building I frequently visit must have a gazillion access points) that I'll 
> probably have enough motivation to try to write something to fix this, at 
> least eventually.

Why would a large number of APs cause this kind of issue? Based on the
wpa_cli log, it looked like the client was just returning to the same AP
every time. Which driver are you using in the client?

I have close to 50 APs in scan results at home and between 50 and 100
APs at work depending on the day, so dealing with long scan results is
quite common for me, but I still have not seen this sort of issue show
up frequently. Then again, even though I'm using Gentoo, I don't actually
use the networking scripts to set up my wireless, so there is not really
anything tearing down IP addresses and shutting down services, and I
might therefore not notice this in every case.


>   Anyway, what I'd like to change is
> 
>   the current functionality of
> 
> static void wpa_cli_action_process(const char *msg)
> 
>   should be split to
> 
> static char* wpa_cli_action_process_parse(const char *msg)
> 
>   and
> 
> static void wpa_cli_action_process_exec(char *event)
> 
>   where the first would simply get the event (connect or disconnect) and 
>   upon return, start a timer. The latter would then be executed when the 
> timer goes off. A newly received re-connect to the same network (including 
> re-completing 802.1X) within a threshold time would not cause the action 
> script to be executed.
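As a rough illustration of the parse half of the proposed split, the sketch below only classifies an event message and runs nothing. It assumes control-interface messages of the form `<N>CTRL-EVENT-...` (an optional priority prefix in angle brackets, then the event name). The enum, macro, and function names are hypothetical, and it returns an enum rather than the `char *` in the proposed `wpa_cli_action_process_parse()` signature. The event strings are written out by hand here; the authoritative definitions are the `WPA_EVENT_*` macros in wpa_supplicant's `wpa_ctrl.h`.

```c
#include <string.h>

/* Hypothetical event classes handed from the parse step to the exec step. */
enum action_event { EV_NONE, EV_CONNECTED, EV_DISCONNECTED };

/* Hand-copied event prefixes (assumption; see WPA_EVENT_* in wpa_ctrl.h). */
#define EV_PREFIX_CONNECTED    "CTRL-EVENT-CONNECTED "
#define EV_PREFIX_DISCONNECTED "CTRL-EVENT-DISCONNECTED "

/* Parse step only: strip an optional "<N>" priority prefix and map the
 * message to an event class. No action script is executed here. */
static enum action_event classify_event(const char *msg)
{
    const char *pos = msg;

    if (*pos == '<') {
        const char *end = strchr(pos, '>');
        if (end != NULL)
            pos = end + 1;      /* skip "<N>" */
    }

    if (strncmp(pos, EV_PREFIX_CONNECTED, strlen(EV_PREFIX_CONNECTED)) == 0)
        return EV_CONNECTED;
    if (strncmp(pos, EV_PREFIX_DISCONNECTED,
                strlen(EV_PREFIX_DISCONNECTED)) == 0)
        return EV_DISCONNECTED;
    return EV_NONE;             /* scan results, EAP events, etc. */
}
```

Keeping classification free of side effects is what makes it possible to hold a result back behind a timer and decide later whether to run the action script at all.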

If no other solution can be found (i.e., something at the driver
level), I would be willing to consider this kind of change if it is
made configurable and the delayed operation is disabled by default.
However, I cannot agree to removing any CONNECTED events. If IEEE 802.1X
authentication has been re-run, the network parameters may have changed.
For example, the client may have been moved to another VLAN, and DHCP
must then be re-run. It would be up to the network scripts to
handle this more gracefully, e.g., by pinging the default gateway and
only reconfiguring parameters if the gateway is no longer reachable.
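A minimal sketch of the timer half, written to respect the conditions above: the threshold is a setting that defaults to 0 (no delay at all), only DISCONNECTED is ever delayed or cancelled, and every CONNECTED is passed through so that the script can re-check the network. The struct and function names are hypothetical, and the caller is assumed to supply the current time and to call `filter_tick()` periodically, for instance whenever its wait for control-interface events times out.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical actions handed to whatever runs the action script. */
enum action { ACT_NONE, ACT_RUN_CONNECTED, ACT_RUN_DISCONNECTED };

struct disconnect_filter {
    time_t threshold;   /* seconds; 0 = filtering disabled (default) */
    bool pending;       /* a DISCONNECTED is waiting for its deadline */
    time_t deadline;    /* when the pending DISCONNECTED is reported */
};

/* Feed one parsed event; `now` is passed in so the logic is testable. */
static enum action filter_event(struct disconnect_filter *f, bool connected,
                                time_t now)
{
    if (connected) {
        /* Never drop CONNECTED: 802.1X may have moved us to another VLAN. */
        f->pending = false;                 /* cancel a delayed DISCONNECTED */
        return ACT_RUN_CONNECTED;
    }
    if (f->threshold == 0)
        return ACT_RUN_DISCONNECTED;        /* disabled: report immediately */
    if (!f->pending) {                      /* first DISCONNECTED starts timer */
        f->pending = true;
        f->deadline = now + f->threshold;
    }
    return ACT_NONE;
}

/* Call periodically; reports the DISCONNECTED once its deadline passes. */
static enum action filter_tick(struct disconnect_filter *f, time_t now)
{
    if (f->pending && now >= f->deadline) {
        f->pending = false;
        return ACT_RUN_DISCONNECTED;
    }
    return ACT_NONE;
}
```

For example, with a 5-second threshold, a DISCONNECTED at t=100 followed by a CONNECTED at t=103 yields only the CONNECTED action, while a DISCONNECTED at t=200 with no reconnect yields the DISCONNECTED action when `filter_tick()` runs at t=205. The cost is the one noted earlier: a genuine disconnect is reported up to `threshold` seconds late.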

-- 
Jouni Malinen                                            PGP id EFC895FA


