Group rekey with lots of stations.
greearb at candelatech.com
Mon Sep 23 12:27:46 EDT 2013
On 09/23/2013 01:06 AM, Jouni Malinen wrote:
> On Thu, Sep 05, 2013 at 11:12:16AM -0700, Ben Greear wrote:
>> We are seeing an issue where a few of our 60+ stations are getting kicked
>> out by a customer's AP when it rekeys. It seems that at least a few rekey messages
>> are lost and the hostapd gives up and disconnects the client stations.
> Do you know why the messages are lost?
No. The RF environment is fairly crowded. They are currently
running code without the QoS fix on the station machine, so that could be part of the problem
as well. The sniff showed that sometimes our box failed to respond quickly,
and sometimes their AP failed to respond (though ACKs were seen),
so it could be CPU related on both systems, or just plain bugs.
I did run some tests on my own hardware with recent hostapd/supplicant.
The hardware is a dual-core Atom, so not all that fast.
With the QoS patch in place and increased EAPOL timers
(patch posted years ago, and written by someone else),
I still see a few lost connections on most rekey events
when using 128 stations in a mostly clean RF environment.
It was clean at 64 stations, however.
>> These 60+ stations are all on one machine, so the supplicant there has lots of work
>> to do in a short time. There are some additional station machines connected and running
>> some background traffic.
> If there are any stations that are unable to reply to group rekey
> messages in reasonable time, such stations are expected to be
> disconnected to allow the rekey operation to go ahead.
>> Since it appears all stations need to rekey at once, I am wondering if
>> it would be valid to be more lenient in hostapd's retransmit timers?
> Your use case sounds quite special and if you want to test something
> like that, you may need more CPU on the simulated station side. The
> default timeouts should not be modified unless a more real world use
> case is showing failures for this to avoid delaying rekeying. The new
> GTK can be taken into use only after all associated STAs have received
> it (or are disconnected due to timeout or are using WNM-Sleep Mode).
This problem has been on hold for a while, so I haven't yet looked
at the hostapd/supplicant rekey code, but I am curious if the retry timers
have any fuzz to them (i.e., the timer expires at base-timer + random-fuzz)
to at least spread the retries a bit. I suspect adding fuzz should
help not only reduce peak load on my simulated systems, but also on
the RF environment and AP in real-world scenarios.
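
To make the fuzz idea concrete, here is a rough sketch of what I mean. It
is not taken from the hostapd/supplicant code, which I have not yet read;
the function names and the 1.0 s base / 0.5 s fuzz values are hypothetical
and only illustrate "base-timer + random-fuzz".

```python
import random


def jittered_timeout(base_s, max_fuzz_s, rng=random):
    """Return base_s plus a uniform random fuzz drawn from [0, max_fuzz_s]."""
    if base_s < 0 or max_fuzz_s < 0:
        raise ValueError("timeouts must be non-negative")
    return base_s + rng.uniform(0.0, max_fuzz_s)


def retry_schedule(n_stations, base_s, max_fuzz_s, seed=None):
    """Sorted retry expiry times (seconds) for n_stations independent timers."""
    rng = random.Random(seed)  # private generator so runs can be reproduced
    return sorted(
        jittered_timeout(base_s, max_fuzz_s, rng) for _ in range(n_stations)
    )


if __name__ == "__main__":
    # Hypothetical values: 128 stations, 1.0 s base timer, up to 0.5 s of fuzz.
    times = retry_schedule(128, base_s=1.0, max_fuzz_s=0.5, seed=1)
    print("first retry at %.3f s, last at %.3f s" % (times[0], times[-1]))
```

Without fuzz (max_fuzz_s = 0) all 128 retries expire at the same instant;
with fuzz they are spread across the 0.5 s window, at the cost of a slightly
longer worst-case rekey time.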
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc http://www.candelatech.com