[wpa_supplicant] essid with non-ascii characters

Jouke Witteveen j.witteveen at gmail.com
Wed Aug 1 04:58:48 EDT 2012


On Wed, Aug 1, 2012 at 12:01 AM, Dan Williams <dcbw at redhat.com> wrote:
> On Tue, 2012-07-31 at 11:11 +0200, Jouke Witteveen wrote:
>> On Mon, Jul 30, 2012 at 6:15 PM, Jouni Malinen <j at w1.fi> wrote:
>> > On Mon, Jul 30, 2012 at 02:10:51PM +0200, Jouke Witteveen wrote:
>> >> The network is supposedly named "Wifi Château d'Olonne", thus the
>> >> experiment shows that wpa_cli substitutes the 'â' with a '_'.
>> >
>> > If it happens to be encoded as a single character in the SSID. It could
>> > also end up showing up as "__" if multi-byte encoding was used.
>> >
>> >> If it
>> >> would just output whatever bytes are in the SSID, the result of the
>> >> printf would be usable in shell scripts to connect to the network.
>> >
>> > SSID is not a string and it cannot be printed as such. It could even
>> > include things like '\0' in it. If you want to get the raw SSID binary
>> > data, you can get it from the beginning of the "ie" line in the BSS
>> > ctrl_iface command output as a hexdump (starting with two octet header
>> > of IE id/len).
>>
>> This would be quite cumbersome and it means that the ssid=... part of
>> the bss output cannot be used as the ssid=... part of a config
>> file. It would be convenient if the SSID reported by scan_results can
>> be copied to the config file in many cases. I don't really care about
>> SSIDs containing '\0': network maintainers that choose to have such
>> SSIDs deserve to face problems.
>>
>> >> The only problems I see with outputting the SSID as-is are with '\n'
>> >> and '\t'. Both mess up the output of `wpa_cli scan_results`. One way
>> >> to solve this problem is to have ' ' match all three of them (spaces,
>> >> newlines and tabs), another is by introducing escaping.
>> >
>> > The proper way of handling the SSID is to copy the exact binary data
>> > as-is rather than try to pretend that it can be handled as text. As such, the
>> > scan_results output is not suitable for this purpose.
>>
>> Using printf as in the experiment makes it possible to use
>> extraordinary text values:
>> ---
>> printf "%q\n" "â"
>> $'\303\242'
>> ---
>> I believe it works for non-printable characters too, so outputting
>> whatever octets make up the SSID (perhaps except for '\n', '\t', '\0')
>> makes sense to me.
>
> Except as Jouni says, those are valid bytes for an SSID.  Perhaps the
> bss output could be extended with an ssid_hex=... option that *could* be
> fed right back into the ssid= part of the config.

Good suggestion, although mapping '\n', '\t', '\0' to ' ' and
accepting ' ' in the config file as a wildcard for the three would be
simpler from my point of view as a network utility writer. Is there a
better way to get a list of SSIDs for a connection manager (from the
shell)?
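
For reference, the route Jouni describes (reading the SSID from the
start of the "ie" line) could be sketched in bash roughly as follows.
The function name ssid_hex_from_ie is made up for this mail, and the
sketch assumes the SSID element (id 0) really is the first element in
the hexdump, as stated above.

```bash
#!/usr/bin/env bash
# ssid_hex_from_ie IE_HEX  (hypothetical helper, not part of wpa_supplicant)
# Prints the SSID octets as hex, taken from the start of an IE hexdump.
# Assumes the first element is the SSID element: one octet id (00),
# one octet length, then that many SSID octets.
ssid_hex_from_ie() {
    local ie=$1 len ssid
    # Require whole octets in hex, at least the two header octets.
    [[ $ie =~ ^([0-9A-Fa-f]{2}){2,}$ ]] || return 1
    # Element id 0 is the SSID element.
    [[ ${ie:0:2} == 00 ]] || return 1
    # Second octet is the SSID length in octets.
    len=$((16#${ie:2:2}))
    ssid=${ie:4:len*2}
    # Reject a hexdump that is shorter than the declared length.
    (( ${#ssid} == len * 2 )) || return 1
    printf '%s\n' "$ssid"
}

# Possible use (untested against a live interface; $bssid is a placeholder):
#   ssid_hex_from_ie "$(wpa_cli bss "$bssid" | sed -n 's/^ie=//p')"
```

Since the config file accepts an unquoted hex string for ssid=, the
result should be usable there without ever treating the SSID as text.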

> But really, you're not going to get around the fact that SSIDs are not
> strings, no matter what.  That's the way it is, and applications have to
> cope with it.  You have no idea what encoding the browser was in when
> the user typed in the SSID when configuring their AP, it could be
> ShiftJIS or Chinese or UCS2 or something like that.  There is no
> guarantee that the SSID is printable ASCII.  That's not to say the
> supplicant couldn't help out a bit.  Patches welcome, I'm sure.

The printf trick works, even for the three troublesome characters.
Their problem is one of output formatting. I think wpa_supplicant
might already fail to connect to an AP with a '\0' in its SSID, but I
wouldn't spend too much time investigating such bad behaviour.
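
To illustrate the escaping option for the formatting problem: a
minimal bash sketch that renders SSID octets (given as hex) on a single
line, keeping printable ASCII and escaping everything else. The
function name escape_ssid_hex and the \xHH notation are my own choices
for this example, not something wpa_cli is known to output.

```bash
#!/usr/bin/env bash
# escape_ssid_hex SSID_HEX  (hypothetical helper for illustration)
# Prints the SSID on one line: printable ASCII is kept, a backslash is
# doubled, and every other octet (including tab, newline and NUL)
# becomes \xHH, so the output never contains a field separator.
escape_ssid_hex() {
    local hex=$1 out='' i byte code ch
    # Require whole octets in hex.
    [[ $hex =~ ^([0-9A-Fa-f]{2})*$ ]] || return 1
    for ((i = 0; i < ${#hex}; i += 2)); do
        byte=${hex:i:2}
        code=$((16#$byte))
        if ((code == 0x5c)); then
            out+='\\'                       # literal backslash, doubled
        elif ((code >= 0x20 && code < 0x7f)); then
            printf -v ch '%b' "\\x$byte"    # printable ASCII, kept as-is
            out+=$ch
        else
            out+="\\x$byte"                 # everything else, escaped
        fi
    done
    printf '%s\n' "$out"
}

escape_ssid_hex 57696669204368c3a2746561752064274f6c6f6e6e65
# prints: Wifi Ch\xc3\xa2teau d'Olonne
```

This makes no assumption about the encoding of the SSID, at the cost of
showing multi-byte characters as escapes.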

I would like to propose some patches, but I couldn't find the place
where characters become '_' in my first search through the codebase.
Perhaps someone more familiar with the code can point out where to
look.

Regards,
- Jouke

