[wpa_supplicant] essid with non-ascii characters
dcbw at redhat.com
Fri Aug 3 13:05:04 EDT 2012
On Wed, 2012-08-01 at 10:58 +0200, Jouke Witteveen wrote:
> On Wed, Aug 1, 2012 at 12:01 AM, Dan Williams <dcbw at redhat.com> wrote:
> > On Tue, 2012-07-31 at 11:11 +0200, Jouke Witteveen wrote:
> >> On Mon, Jul 30, 2012 at 6:15 PM, Jouni Malinen <j at w1.fi> wrote:
> >> > On Mon, Jul 30, 2012 at 02:10:51PM +0200, Jouke Witteveen wrote:
> >> >> The network is supposedly named "Wifi Château d'Olonne", thus the
> >> >> experiment shows that wpa_cli substitutes the 'â' with a '_'.
> >> >
> >> > If it happens to be encoded as a single character in the SSID.. It could
> >> > also end up showing up as "__" if multi-byte encoding was used.
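
As an illustration of the "_" versus "__" point above, here is a minimal sketch. It assumes the display code replaces every byte outside printable ASCII (32..126) with '_'; that rule is inferred from the behaviour described in this thread, and `mask_nonprintable` is a hypothetical name, not a wpa_supplicant function.

```python
def mask_nonprintable(ssid: bytes) -> str:
    """Replace each byte outside printable ASCII (32..126) with '_'.

    Assumption: this mirrors the substitution seen in the wpa_cli output
    discussed in the thread; it is not copied from wpa_supplicant.
    """
    return "".join(chr(b) if 32 <= b <= 126 else "_" for b in ssid)


name = "Wifi Château d'Olonne"

# In Latin-1 the 'â' is a single byte, so one underscore appears.
print(mask_nonprintable(name.encode("latin-1")))

# In UTF-8 the 'â' is two bytes (0xC3 0xA2), so two underscores appear.
print(mask_nonprintable(name.encode("utf-8")))
```

The same visible name therefore yields different masked output depending on which encoding the AP administrator's tool happened to use, which is why the masked text cannot be mapped back to the original bytes.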
> >> >
> >> >> If it
> >> >> would just output whatever bytes are in the SSID, the result of the
> >> >> printf would be usable in shell scripts to connect to the network.
> >> >
> >> > SSID is not a string and it cannot be printed as such. It could even
> >> > include things like '\0' in it. If you want to get the raw SSID binary
> >> > data, you can get it from the beginning of the "ie" line in the BSS
> >> > ctrl_iface command output as a hexdump (starting with two octet header
> >> > of IE id/len).
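
A rough sketch of what extracting the raw SSID from such a hexdump could look like. It assumes the "ie" value is a plain hex string of concatenated information elements, each with a one-octet ID and a one-octet length, and that the SSID element has ID 0. The function name and the sample data are made up for illustration.

```python
def ssid_from_ie_hex(ie_hex: str) -> bytes:
    """Walk a hexdump of information elements and return the SSID bytes.

    Each element is: 1 octet ID, 1 octet length, then <length> octets of
    payload. The SSID element uses ID 0.
    """
    data = bytes.fromhex(ie_hex)
    pos = 0
    while pos + 2 <= len(data):
        elem_id, length = data[pos], data[pos + 1]
        body = data[pos + 2:pos + 2 + length]
        if len(body) < length:
            raise ValueError("truncated information element")
        if elem_id == 0:
            return body
        pos += 2 + length
    raise ValueError("no SSID element found")


# Hypothetical sample: an SSID element followed by one unrelated element.
ssid = "Wifi Château d'Olonne".encode("utf-8")
sample = (bytes([0, len(ssid)]) + ssid + bytes([1, 1, 0x82])).hex()

raw = ssid_from_ie_hex(sample)
print(raw == ssid)   # the bytes come back unchanged
print(raw.hex())     # hex form, with no text decoding involved
```

Nothing here ever decodes the SSID as text, so embedded '\0', '\n' or '\t' bytes pass through untouched.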
> >> This would be quite cumbersome and it means that the ssid=... part of
> >> the bss output cannot be used as the ssid=... part of a config
> >> file. It would be convenient if the SSID reported by scan_results can
> >> be copied to the config file in many cases. I don't really care about
> >> SSID's containing '\0': network maintainers that choose to have such
> >> SSID's deserve to face problems.
> >> >> The only problems I see with outputting the SSID as-is, is with '\n'
> >> >> and '\t'. Both mess up the output of `wpa_cli scan_results`. One way
> >> >> to solve this problem is to have ' ' match all three of them (spaces,
> >> >> newlines and tabs), another is by introducing escaping.
> >> >
> >> > The proper way of handling the SSID is to copy the exact binary data
> >> > as-is rather than try to pretend that it can be handled as text. As such, the
> >> > scan_results output is not suitable for this purpose.
> >> Using printf as in the experiment makes it possible to use
> >> extraordinary text values:
> >> ---
> >> printf "%q\n" "â"
> >> $'\303\242'
> >> ---
> >> I believe it works for non-printable characters too, so outputting
> >> whatever octets make up the SSID (perhaps except for '\n', '\t', '\0')
> >> makes sense to me.
> > Except as Jouni says, those are valid bytes for an SSID. Perhaps the
> > bss output could be extended with an ssid_hex=... option that *could* be
> > fed right back into the ssid= part of the config.
> Good suggestion, although mapping '\n', '\t', '\0' to ' ' and
> accepting ' ' in the config file as wildcard for the three would be
> simpler from my point of view as a network utility writer. Is there a
> better way to get a list of SSID's for a connection manager (from the
That would be simpler, but still wrong. If that happened, the
supplicant could not differentiate between two similarly named networks
that happen to contain a different byte where the substituted character
was. Really, they are not strings, and you simply cannot treat them
like strings. They are byte arrays.
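
To make the ambiguity concrete, here is a small sketch of the proposed substitution. The two SSIDs are invented examples and `munge` is a hypothetical helper that maps '\0', '\t' and '\n' to a space, as suggested earlier in the thread.

```python
def munge(ssid: bytes) -> bytes:
    """Map NUL, tab and newline bytes to a space (the proposed shortcut)."""
    return bytes(0x20 if b in (0x00, 0x09, 0x0A) else b for b in ssid)


# Two distinct (made-up) SSIDs that differ in a single byte.
net_a = b"cafe\tnet"
net_b = b"cafe\nnet"

print(net_a == net_b)                # False: different networks
print(munge(net_a) == munge(net_b))  # True: indistinguishable after munging
```

Once both names collapse to the same bytes, a config entry written from the munged form could match either network.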
Which is why I proposed something like a read-only ssid_hex config item
that just copies back the SSID in hex form or something like that. I'm
pretty sure that doing any sort of magic munging of the SSID in the
config simply will not happen.
The ssid field already accepts hex SSIDs if you do something like:
i.e., by not enclosing the SSID in quotes. So the only piece you're
missing is being able to *retrieve* the SSID in hex, right?
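
For a connection manager that already has the raw SSID bytes, producing the unquoted hex form is straightforward. The sketch below assumes only what is stated above, namely that an unquoted ssid= value is read as hex; the helper names are hypothetical.

```python
def ssid_config_line(ssid: bytes) -> str:
    """Build an unquoted, hex-encoded ssid= line for a config file."""
    return "ssid=" + ssid.hex()


def ssid_from_config_line(line: str) -> bytes:
    """Recover the raw bytes from an unquoted hex ssid= line."""
    key, _, value = line.strip().partition("=")
    if key != "ssid" or value.startswith('"'):
        raise ValueError("expected an unquoted hex ssid= line")
    return bytes.fromhex(value)


raw = "Wifi Château d'Olonne".encode("utf-8")
line = ssid_config_line(raw)

print(line)
print(ssid_from_config_line(line) == raw)  # round-trips byte for byte
```

Because the hex form never passes through a text encoding, it also round-trips SSIDs that contain '\0', '\n' or '\t'.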
> > But really, you're not going to get around the fact that SSIDs are not
> > strings, no matter what. That's the way it is, and applications have to
> > cope with it. You have no idea what encoding the browser was in when
> > the user typed in the SSID when configuring their AP, it could be
> > ShiftJIS or Chinese or UCS2 or something like that. There is no
> > guarantee that the SSID is printable ASCII. That's not to say the
> > supplicant couldn't help out a bit. Patches welcome, I'm sure.
> The printf trick works, even for the three troublesome characters.
> Their problem is one of output formatting. I think wpa_supplicant
> might already fail to connect to an AP with a '\0' in its SSID, but I
> wouldn't spend too much time investigating such bad behaviour.
> I would like to propose some patches, but I couldn't find the place
> where characters become '_' in my first search through the codebase.
> Perhaps someone more familiar with the code can point out where to
> - Jouke