[wpa_supplicant] essid with non-ascii characters

Jouke Witteveen j.witteveen at gmail.com
Fri Aug 3 17:53:07 EDT 2012


On Fri, Aug 3, 2012 at 7:05 PM, Dan Williams <dcbw at redhat.com> wrote:
> On Wed, 2012-08-01 at 10:58 +0200, Jouke Witteveen wrote:
>> On Wed, Aug 1, 2012 at 12:01 AM, Dan Williams <dcbw at redhat.com> wrote:
>> > On Tue, 2012-07-31 at 11:11 +0200, Jouke Witteveen wrote:
>> >> On Mon, Jul 30, 2012 at 6:15 PM, Jouni Malinen <j at w1.fi> wrote:
>> >> > On Mon, Jul 30, 2012 at 02:10:51PM +0200, Jouke Witteveen wrote:
>> >> >> The network is supposedly named "Wifi Château d'Olonne", thus the
>> >> >> experiment shows that wpa_cli substitutes the 'â' with a '_'.
>> >> >
>> >> > Only if it happens to be encoded as a single octet in the SSID. It could
>> >> > also show up as "__" if a multi-byte encoding was used.
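The substitution can be sketched in Python. This assumes the rule is that every byte outside printable ASCII (0x20 to 0x7e) becomes '_'. The function name `ssid_txt` is hypothetical, and the code is not copied from the wpa_supplicant source. It shows why a single-byte (Latin-1) 'â' produces one underscore while UTF-8 produces two:

```python
def ssid_txt(ssid: bytes) -> str:
    # Assumed rule: keep printable ASCII, replace any other byte with '_'.
    return "".join(chr(b) if 32 <= b < 127 else "_" for b in ssid)

name = "Wifi Château d'Olonne"
print(ssid_txt(name.encode("latin-1")))  # 'â' is one byte here
print(ssid_txt(name.encode("utf-8")))    # 'â' is two bytes here
```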
>> >> >
>> >> >> If it just output whatever bytes are in the SSID, the result of the
>> >> >> printf would be usable in shell scripts to connect to the network.
>> >> >
>> >> > SSID is not a string and it cannot be printed as such. It could even
>> >> > include things like '\0' in it. If you want to get the raw SSID binary
>> >> > data, you can get it from the beginning of the "ie" line in the BSS
>> >> > ctrl_iface command output as a hexdump (starting with a two-octet header
>> >> > of IE id/len).
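Here is a sketch of pulling the raw SSID out of the "ie" hexdump. It assumes the value is a plain hex string of information elements. Each element is a one-octet ID, then a one-octet length, then the payload. The SSID is carried in element ID 0. The helper name `ssid_from_ie_hex` is hypothetical. The result is returned as bytes, so an embedded '\0' survives intact:

```python
def ssid_from_ie_hex(ie_hex: str) -> bytes:
    """Walk the id/len/payload elements and return the SSID (id 0) payload."""
    data = bytes.fromhex(ie_hex)
    pos = 0
    while pos + 2 <= len(data):
        eid, length = data[pos], data[pos + 1]
        body = data[pos + 2:pos + 2 + length]
        if len(body) != length:
            raise ValueError("truncated information element")
        if eid == 0:  # SSID element
            return body
        pos += 2 + length
    raise ValueError("no SSID element found")

# SSID "A\0b" followed by a one-byte element with id 1.
print(ssid_from_ie_hex("0003410062010182"))
```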
>> >>
>> >> This would be quite cumbersome, and it means that the ssid=... part of
>> >> the bss output cannot be used as the ssid=... part of a config
>> >> file. It would be convenient if the SSID reported by scan_results could
>> >> be copied to the config file in many cases. I don't really care about
>> >> SSIDs containing '\0': network maintainers who choose to have such
>> >> SSIDs deserve to face problems.
>> >>
>> >> >> The only problems I see with outputting the SSID as-is are with '\n'
>> >> >> and '\t'. Both mess up the output of `wpa_cli scan_results`. One way
>> >> >> to solve this problem is to have ' ' match all three of them (spaces,
>> >> >> newlines and tabs), another is by introducing escaping.
>> >> >
>> >> > The proper way of handling the SSID is to copy the exact binary data
>> >> > as-is rather than pretending that it can be handled as text. As such, the
>> >> > scan_results output is not suitable for this purpose.
>> >>
>> >> Using printf as in the experiment makes it possible to handle
>> >> unusual text values:
>> >> ---
>> >> printf "%q\n" "â"
>> >> $'\303\242'
>> >> ---
>> >> I believe it works for non-printable characters too, so outputting
>> >> whatever octets make up the SSID (perhaps except for '\n', '\t', '\0')
>> >> makes sense to me.
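The escaping idea can be sketched in Python. It is loosely modelled on the $'...' form that bash's printf %q printed in the experiment. This is an approximation, not bash's exact quoting rules, and `ansi_c_quote` is a hypothetical name. Every byte outside printable ASCII becomes a three-digit octal escape, and so do the backslash and single quote. As a result, '\n' and '\t' no longer break line- or tab-based output:

```python
def ansi_c_quote(raw: bytes) -> str:
    """Render arbitrary bytes as a bash $'...' literal using octal escapes."""
    out = []
    for b in raw:
        if 32 <= b < 127 and chr(b) not in "\\'":
            out.append(chr(b))          # printable ASCII passes through
        else:
            out.append("\\%03o" % b)    # everything else becomes \ooo
    return "$'" + "".join(out) + "'"

print(ansi_c_quote("â".encode("utf-8")))  # the two UTF-8 bytes of 'â'
print(ansi_c_quote(b"a\nb\tc"))           # newline and tab stay on one line
```

A NUL byte is still a problem. It is escaped as \000, but a shell string cannot hold it. That is the point that SSIDs are byte arrays rather than strings.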
>> >
>> > Except as Jouni says, those are valid bytes for an SSID.  Perhaps the
>> > bss output could be extended with an ssid_hex=... option that *could* be
>> > fed right back into the ssid= part of the config.
>>
>> Good suggestion, although mapping '\n', '\t', '\0' to ' ' and
>> accepting ' ' in the config file as wildcard for the three would be
>> simpler from my point of view as a network utility writer. Is there a
>> better way to get a list of SSIDs for a connection manager (from the
>> shell)?
>
> That would be simpler, but still wrong.  If that happened, the
> supplicant could not differentiate between two similarly named networks
> that happen to contain a different byte where the substituted character
> was.  Really, they are not strings, and you simply cannot treat them
> like strings.  They are byte arrays.
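The collision can be shown with a sketch of the proposed mapping. The name `lossy_map` is hypothetical. Folding '\n', '\t' and '\0' into a space makes distinct SSIDs indistinguishable:

```python
def lossy_map(ssid: bytes) -> bytes:
    # Hypothetical proposal: replace newline, tab and NUL bytes with a space.
    return bytes(0x20 if b in (0x0A, 0x09, 0x00) else b for b in ssid)

# Four different SSIDs as they could appear on the air...
ssids = [b"Home Net", b"Home\tNet", b"Home\nNet", b"Home\x00Net"]
# ...all collapse to a single value after the mapping.
print({lossy_map(s) for s in ssids})  # {b'Home Net'}
```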
>
> Which is why I proposed something like a read-only ssid_hex config item
> that just copies back the SSID in hex form or something like that.  I'm
> pretty sure that doing any sort of magic munging of the SSID in the
> config simply will not happen.
>
> The ssid field already accepts hex SSIDs if you do something like:
>
> ssid=AABBCCDD1199
>
> i.e., by not enclosing the SSID in quotes.  So the only piece you're
> missing is being able to *retrieve* the SSID in hex, right?

That is right. The simpler, the better (i.e. preferably directly
through scan_results).
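Once the raw bytes are available, producing the unquoted hex form of ssid= is a plain hex encoding. This sketch assumes the raw SSID bytes have already been obtained, for example from the "ie" hexdump. `ssid_config_line` is a hypothetical helper, not an existing wpa_cli command:

```python
def ssid_config_line(ssid: bytes) -> str:
    # Unquoted value: the ssid field reads it as hex, not as text.
    return "ssid=" + ssid.hex().upper()

line = ssid_config_line("Wifi Château d'Olonne".encode("utf-8"))
print(line)
# Decoding the hex gives back exactly the original bytes.
assert bytes.fromhex(line[len("ssid="):]).decode("utf-8") == "Wifi Château d'Olonne"
```

Unlike a substituted or escaped text form, this round-trips every byte, including '\n', '\t' and '\0'.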
>
> Dan
>
>> > But really, you're not going to get around the fact that SSIDs are not
>> > strings, no matter what.  That's the way it is, and applications have to
>> > cope with it.  You have no idea what encoding the browser was in when
>> > the user typed in the SSID when configuring their AP; it could be
>> > Shift_JIS or a Chinese encoding or UCS-2 or something like that.  There is no
>> > guarantee that the SSID is printable ASCII.  That's not to say the
>> > supplicant couldn't help out a bit.  Patches welcome, I'm sure.
>>
>> The printf trick works, even for the three troublesome characters.
>> Their problem is one of output formatting. I think wpa_supplicant
>> might already fail to connect to an AP with a '\0' in its SSID, but I
>> wouldn't spend too much time investigating such bad behaviour.
>>
>> I would like to propose some patches, but I couldn't find the place
>> where characters become '_' in my first search through the codebase.
>> Perhaps someone more familiar with the code can point out where to
>> look.
>>
>> Regards,
>> - Jouke


More information about the HostAP mailing list