[PATCH 0/2] dbus: Do Not Coalesce State Changed Signals

Dan Williams dcbw at redhat.com
Thu Oct 27 02:08:24 EDT 2011


On Mon, 2011-10-24 at 12:07 -0700, Sam Leffler wrote:
> On Mon, Oct 24, 2011 at 10:35 AM, Dan Williams <dcbw at redhat.com> wrote:
> > On Mon, 2011-10-24 at 10:08 -0700, Sam Leffler wrote:
> >> On Mon, Oct 24, 2011 at 8:40 AM, Dan Williams <dcbw at redhat.com> wrote:
> >> > On Mon, 2011-10-24 at 18:26 +0300, Jouni Malinen wrote:
> >> >> On Mon, Oct 24, 2011 at 10:05:04AM -0500, Dan Williams wrote:
> >> >> > I used to have a problem with this, but I now see where it makes sense
> >> >> > for the State property alone.  Most properties *should* be coalesced to
> >> >> > reduce dbus traffic, but the State property should probably just trigger
> >> >> > an immediate emit of the interface's changed properties.  State doesn't
> >> >> > change that often so it wouldn't meaningfully increase D-Bus traffic.
> >> >>
> >> >> What exactly are these state changes used for? The main problem I see
> >> >> with this is that wpa_s->wpa_state is supposed to be internal data for
> >> >> wpa_supplicant and the values it has and the way they are used are
> >> >> subject to change whenever needed.. As such, I would really not want any
> >> >> external functional to depends on that for anything beyond maybe showing
> >> >> some debug information.
> >> >
> >> > NM uses it to determine what state the supplicant is in, obviously.  If
> >> > the supplicant is scanning, then we don't want to do certain operations.
> >> > If the supplicant is connecting, there are other operations we don't
> >> > want to do (like request a scan).
> >>
> >> You get an explicit signal for scanning.
> >
> > Yeah, because I added that, because scanning happens in other states
> > than SCANNING.
> >
> >> >
> >> > If the supplicant fails during association/authentication, then the
> >> > connection manager wants to know that, because that could indicate
> >> > specific failures: a failure during 4-way handshake usually indicates
> >> > that a password is incorrect, and that's very useful information to a
> >> > connection manager.
> >>
> >> We identify 4-way handshake failure when the CurrentBSS transitions to
> >> "/".  One might argue this is a hack but it's been as reliable as
> >> monitoring the state (both are unsatisfactory--there should be an
> >> explicit signal/property).
> >
> > Well, that can happen for a number of reasons and could in the future,
> > and it relies on specific behavior of the D-Bus interface that might
> > change in the future.  In general you shouldn't be relying on signal
> > ordering for any sort of meaning.  It would be a lot clearer and more
> > robust to have an indication of 4-way handshake failure, or even better,
> > just some indication that the connection attempt failed due to invalid
> > X, where X is invalid certificates, invalid password, AP rejection,
> > whatever.  Doesn't need to be "4way handshake failed" so much as
> > "connection attempt failed and here's why..." with detailed reason codes
> > or something.
> 
> We agree about having explicit signal.  The way we infer a 4WHS issue
> is no less reliable than we would get if State change signals were NOT
> coalesced.  Having an explicit status code / indication is what I
> suggested >year ago when we first hit this issue.  Having the 802.11
> status / reason codes would definitely improve life when a connection
> manager wants to get directly involved in handling certain events.

Yes, having the reason codes for certain operations would help a lot;
not just for 802.11 stuff but also for EAP failures for wired as well.

> >
> >> >
> >> > Obviously we want notification of disconnection events so that in NM we
> >> > can start a timer for overall connection failure and after a certain
> >> > period of retries, terminate the attempt and look for a different AP.
> >>
> >> Again, monitor CurrentBSS.
> >
> > Again, seems like a hack.  CurrentBSS is useful when connected to detect
> > roaming behavior between APs, but really shouldn't be a proxy for
> > auth/assoc attempts and detection of failures.
> 
> Not sure what we're talking about here but I'll just say in general
> that programming to the new dbus api greatly simplified the
> client-side code (relative to the old api) and a large part of that
> was using CurrentBSS (instead of juggling a plethora state vars that
> were not necessarily consistent).

In reality there are only two state variables: State and Scanning.
Those are pretty clear it seems to me.

> >
> >> >
> >> > We need to know, after adding an interface, whether that interface is
> >> > now ready for action or not.  When the interface enters state COMPLETED,
> >> > we need to know to start IP configuration.  When the interface is DOWN
> >> > we need to know to clean stuff up and forget about the interface.  If
> >> > the interface is AUTHENTICATED and we're waiting for an EAP identity or
> >> > password, then when the interface transitions to DISCONNECTED we need to
> >> > know that we should stop waiting for user input.
> >> >
> >> > All sorts of stuff :)
> >>
> >> We were able to accomplish all the above w/o using the state variable.
> >>  In fact it wasn't until we stopped using the state setting that we
> >> got reliable integration (even for the old dbus api).  The typical
> >> problem seems to there are two async state machines trying to control
> >> each other and that just causes horrible confusion.
> >
> > What do you define "reliable" as?  The State variable has been pretty
> > solid for the last 3+ years for NetworkManager and I don't believe I've
> > encountered any of the cases that you've described here or in the past.
> > Are you by chance dumping all network blocks into the supplicant config
> > at the same time and letting the supplicant do network block selection?
> > That's something that NM does not do, but is something we'd like to do
> > in the future, and might be the cause of some of these issues.  So I'd
> > love to fix that for real, and some sort of synchronous state emission
> > would do that.
> 
> "reliable" as in synchronizing the connection manager w/ supplicant.
> I know the data in the "State" property has been reliable; the only
> complaint was this signal coalescing.

There's actually no synchronization because the supplicant isn't going
to wait for any one client; the client also has to be able to handle and
arbitrary amount of time elapsed between when the supplicant has emitted
a state change and when the client gets around to processing it.  That's
just normal client/server.

And even if there wasn't coalescing the client still has to be able to
handle arbitrary state transitions; of which coalescing is simply one of
those.  That's part of making the client robust.  Having more detailed
or "reliable" state transitions is a bonus and very useful, but the
client really can't depend on them.

> FWIW we used to monitor "State" and do lots of things based on what we
> thought supplicant was doing, Once we switched over to depending only
> on the COMPLETED state and using CurrentBSS for a more reliable
> indicator of network operation lots of issues went away.  It's been a
> long time (>>year) so it's hard for me to be specific but I distinctly
> remember competing state machines trying to re-connect at the least
> (causing churn and long delays in reaching a steady-state).

Ok, that's interesting; it doesn't match my experience but then
everyone's allowed to have their own experience too :)

> As to how we work; we too send a single netblock to supplicant.  We
> depend on supplicant to roam within a BSS and handle inter-BSS roaming
> in the connection manager.  Our preference is to move to multi-block
> constructs and let supplicant manage all roaming but given we don't
> offload this to the driver/firmware at the moment doing that has been
> low priority.  This stuff has definitely been less than satisfying and
> way too slow to deem it production quality but that's not what we're
> discussing here.

Good to know; I'd like to move NM more in this direction to since I
believe we'd get better inter-ESS roaming this way as well.  Possibly
better connect-at-resume time too.

> >
> >> >
> >> >> There are already some horrible hacks in place for wpa_state (just see
> >> >> how WPA_IDLE is generated in Android builds) and I really do not want to
> >> >> have to consider supporting something like that with the D-Bus
> >> >> interface.
> >> >
> >> > Well, it's extremely useful to have something like this, as opposed to a
> >> > plethora of signals for other stuff.  Having one state signal for an
> >> > interface means one centralized place that client code can perform
> >> > actions based on what's happening in the supplicant.  If not state
> >> > signals then I'm not sure what we'd use, but hopefully it's not a
> >> > combination of other signals since those arrive asynchronously and thus
> >> > are harder to handle.
> >>
> >> Integration of supplicant w/ a connection manager is problematic given
> >> both want control under certain conditions.  Until supplicant stops
> >> wanting to do so much or connection managers _really_ defer control to
> >> supplicant there will continue to be madness.  We started w/ the
> >> connman approach that the connection manager should be in control but
> >> very quickly recognized this would never work except in the simplest
> >> scenarios.  We stabilized once we moved more control to supplicant and
> >> continue in this direction.
> >
> > Hmm, again, it's worked pretty well so far for NM; I'm apparently just
> > not familiar with the problems you've encountered.  NM takes a more CM
> > oriented approach by doing network selection itself and only sending one
> > network block down the supplicant.  That does slightly increase
> > connection latency and we're moving in the direction of pushing all
> > network blocks down to the supplicant and letting it handle network
> > selection, but there's some features we need from the supplicant first
> > that I've started pushing patches for.
> 
> See above; I think we're in violent agreement.
> 
> >
> > So I'm interested in what problems you had with the more CM oriented
> > approach, and what problems you have with more control in the
> > supplicant.
> 
> I don't have the time to write it down here; we should visit.
> 
> >
> >> >
> >> >> If there is desire to actually use this type of information for taking
> >> >> some actions, it would be much safer to define a new value that will be
> >> >> generated based on internal state (including wpa_s->wpa_state) and
> >> >> define that in a way that makes it easier to maintain this in a stable
> >> >> way even if the internal implementation changes.
> >> >
> >> > That's fine with me, we could simply keep the same state values and just
> >> > change how the D-Bus property value is calculated.  But having a
> >> > notification of supplicant interface state is really, really useful.
> >>
> >> I asked for this to be optional because it just adds overhead for us
> >> and may introduce subtle breakage.  I still haven't seen a good reason
> >> to change the current behaviour (e.g. we've demonstrated there's no
> >> need for this change).  If there are explicit events clients want it
> >> would seem better add signals than expose internal supplicant state
> >> that will be used to inter these events.
> >
> > I think mainly because other mechanisms aren't really appropriate as
> > proxies for actual interface state.  CurrentBSS is simply a pointer to
> > the current BSS the supplicant is working with, but that doesn't
> > actually give you any indication of what's really going on, or what the
> > failures actually were if there were failures.  Next up, how do you
> > determine when the supplicant has completed the process and is fully
> > associated, so you can being IP configuration?  AFAIK the only way to
> > figure that out is the State property, because CurrentBSS is (or if not,
> > should be) set during the association/authentication states as well.
> 
> See above; we use the COMPLETED State.  My response to Grant was two-fold:
> 
> 1. Don't speak for flimflam because you don't understand how it works.
> 2. You can do what you want w/o the proposed change so don't
> arbitrarily change existing API's and/or penalize existing clients of
> supplicant.

RE #2 if doing what he wants w/o the proposed change isn't the optimal
or "right" way of doing it, then I feel the supplicant should change.
But at the same time, I don't see how this specific change would
adversely affect clients, because these clients already have to handle
every individual state correctly anyway.  So if they are coded
correctly, the additional state emissions would not have any effect.
The only real change would be a slight increase in additional D-Bus
traffic, which is negligable since the state doesn't actually change
that often.

Dan



More information about the HostAP mailing list