From: Guy Thornley Date: 04:43 on 22 Dec 2006 Subject: locales.

Well, this is new to me. Perhaps it's new to you, too. Or maybe not. Follow closely.

$ ls
Alan Parsons Project - 1976 - Tales of Mystery and Imagination/
cd1/
cd2/
Yahel - Waves of sound/
Younger Brother - A Flock of Bleeps/
$ mv [A-Z]* cd2/
mv: cannot move `cd2' to a subdirectory of itself, `cd2/cd2'
$ ls
cd2/

Uhm? Since when were shell globs case *in*sensitive??

Yes, I know about nocaseglob:

$ shopt nocaseglob
nocaseglob off

which, according to the manpage, should make globs case sensitive.

$ echo $LANG
en_NZ.UTF-8

It gets worse:

$ touch a b C D
$ ls
a b C cd2/ D
$ echo [A-Z]*
b C cd2 D

Geezuz, where did little-'a' go??

A colleague pointed out that little-'a' is sorting before big-'A' now.

This is just wrong, on every single level I think of, this is WRONG.

Easy to demonstrate it is locale:

$ bash -c 'echo [A-Z]*'
b C cd2 D
$ unset LANG; bash -c 'echo [A-Z]*'
C D

Why should I use locales ever again? This behaviour is not just hateful; it is outright terrifying.

.Guy
From: jrodman Date: 04:58 on 22 Dec 2006 Subject: Re: locales.

On Fri, Dec 22, 2006 at 05:43:21PM +1300, Guy Thornley wrote:
> This is just wrong, on every single level I think of, this is WRONG.

Wow, they still haven't fixed that crap. Been biting people for at least 8 years now.

I can't tell if locales are insanely buggy or insane by design, or if the shell is insane. Just digging into that code makes my head hurt so I have no real idea.

The only way I've found to get sane behavior in the shell is LC_COLLATE=C which forces your ranges such as [A-Z] to match, you know, A to Z, instead of whatever horrible buggy inexcusable behavior you get with most locales.

-josh
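For anyone wanting to try the LC_COLLATE=C workaround without touching their own files, here is a minimal sketch. It assumes bash and mktemp are available; the scratch directory and the variable names `dir` and `matched` are made up for illustration. LC_ALL is cleared for the child shell because, when set, it overrides LC_COLLATE.

```shell
# Scratch directory holding the same four names as in Guy's demonstration.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/C" "$dir/D"

# Run the glob in a child bash with C collation.
# LC_ALL (if set) would override LC_COLLATE, so it is unset in the subshell only.
matched=$(cd "$dir" && unset LC_ALL && LC_COLLATE=C bash -c 'echo [A-Z]*')

echo "$matched"   # prints: C D

rm -rf "$dir"
```

With C collation the range [A-Z] covers only the ASCII capitals, so the lower-case names are left alone.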
From: Joshua Rodman Date: 05:21 on 22 Dec 2006 Subject: Re: locales.

On Thu, Dec 21, 2006 at 08:58:52PM -0800, jrodman@xxxx.xxxxxxxxxx.xxx wrote:
> On Fri, Dec 22, 2006 at 05:43:21PM +1300, Guy Thornley wrote:
> > This is just wrong, on every single level I think of, this is WRONG.
[...]
> I can't tell if locales are insanely buggy or insane by design, or if
> the shell is insane. Just digging into that code makes my head hurt so
> I have no real idea.

It seems it's mostly the shell that's insane.

Locales provide a sort order, which indicates lexical sorting. Certainly you'd find capital letter A and lower case letter A in the same place in the dictionary.

Then bash developers decide that users of other languages should be able to use range expressions that make sense in their dictionaries. Okay fine.

Then bash developers decide that they're going to change the meaning of the existing glob expression [A-Z] from its time-honored behavior of sorting in character numeric order (ASCII) to lexical order.

They couldn't have, you know, provided some way to express a lexical range differently from a character range, or allowed the body of existing scripts to work.

I'm sure they can quote some POSIX standard somewhere that will tell you that you are wrong. Oh wait, they DO!

"This is what POSIX.2 and SUSv3/XPG6 specify."

Thanks, idiot POSIX committees.

-josh
From: Tony Finch Date: 13:28 on 22 Dec 2006 Subject: Re: locales.

On Thu, 21 Dec 2006, Joshua Rodman wrote:
>
> It seems it's mostly the shell that's insane.

No, no, other things are just as fucked by locales. My favourite is to sort two files using `sort` and then feed them to `comm` and find that `comm` does not consider the files to be sorted and so produces bogus output.

Tony.
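One way to keep `sort` and `comm` in agreement is to pin both to the same locale. The following is only a sketch of that idea, not a diagnosis of Tony's particular case; the file names `one` and `two` and their contents are invented for illustration.

```shell
# Two small unsorted input files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' b A a B > "$dir/one"
printf '%s\n' a C B   > "$dir/two"

# Sort both files under the C locale (plain byte order) ...
LC_ALL=C sort "$dir/one" > "$dir/one.sorted"
LC_ALL=C sort "$dir/two" > "$dir/two.sorted"

# ... and run comm under the same locale, so it expects that same order.
# -12 suppresses columns 1 and 2, leaving only lines common to both files.
common=$(LC_ALL=C comm -12 "$dir/one.sorted" "$dir/two.sorted")

echo "$common"   # prints B and a, one per line

rm -rf "$dir"
```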
From: A. Pagaltzis Date: 06:08 on 22 Dec 2006 Subject: Re: locales.

* jrodman@xxxx.xxxxxxxxxx.xxx <jrodman@xxxx.xxxxxxxxxx.xxx> [2006-12-22 06:00]:
> The only way I've found to get sane behavior in the shell is
> LC_COLLATE=C which forces your ranges such as [A-Z] to match,
> you know, A to Z, instead of whatever horrible buggy
> inexcusable behavior you get with most locales.

Yeah. Which leads to my Frankensteinian settings:

LANG=en_US.utf8
LC_CTYPE=de_DE.utf8
LC_COLLATE=C

Because, y'know, I *don't* want German localised error messages for a number of reasons; but I *do* want my umlauts considered word letters; but I DO NOT want my files sorted any other way than ASCIIbetically.

Unbreak me, unbreak me harder!

Regards,
From: Rafael Garcia-Suarez Date: 07:59 on 22 Dec 2006 Subject: Re: locales.

On 22/12/06, jrodman@xxxx.xxxxxxxxxx.xxx <jrodman@xxxx.xxxxxxxxxx.xxx> wrote:
> The only way I've found to get sane behavior in the shell is
> LC_COLLATE=C which forces your ranges such as [A-Z] to match, you know,
> A to Z, instead of whatever horrible buggy inexcusable behavior you get
> with most locales.

That is so horrible. Especially when you forget to put LC_ALL=C at the top of some shell script, and it breaks horribly as soon as it's deployed on a machine with another locale setting.
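For reference, the LC_ALL=C preamble Rafael mentions might look like the sketch below. The sorted list is just an invented stand-in for whatever locale-sensitive work the real script does.

```shell
#!/bin/sh
# Pin the locale before anything else runs, so sorting and pattern ranges
# behave the same on every machine the script is deployed to.
LC_ALL=C
export LC_ALL

# Illustration only: with the C locale, sort uses plain byte order,
# so all the capitals come before all the lower-case letters.
sorted=$(printf '%s\n' b A a B | sort)

echo "$sorted"   # prints A, B, a, b, one per line
```

The cost is that everything the script runs, error messages included, falls back to the C locale as well.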
From: Aaron Crane Date: 10:39 on 22 Dec 2006 Subject: Re: locales.

jrodman@xxxx.xxxxxxxxxx.xxx writes:
> On Fri, Dec 22, 2006 at 05:43:21PM +1300, Guy Thornley wrote:
> > This is just wrong, on every single level I think of, this is WRONG.
>
> I can't tell if locales are insanely buggy or insane by design,

Mostly, I'd say they're insane by design, and that the POSIX spec that requires the shell to make [A-Z] unpredictable and useless is similarly insane.

> The only way I've found to get sane behavior in the shell is LC_COLLATE=C

Yep, definitely a good unbreak-me option for one's personal settings.

Another alternative for scripts and the like is to say [[:upper:]] instead of [A-Z]. It takes more than twice as many characters, which sucks, but at least it's predictable. It also includes non-ASCII upper-case letters, though, which may or may not be a good thing in any particular situation.
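A quick sketch of the [[:upper:]] alternative, reusing the four file names from the start of the thread. The scratch directory and the variable names `dir` and `upper` are made up for illustration, and no locale variables are changed here.

```shell
# Scratch directory with two lower-case and two upper-case names.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/C" "$dir/D"

# The character class asks "is this an upper-case letter?" rather than
# "does this fall between A and Z in the collation order?".
upper=$(cd "$dir" && echo [[:upper:]]*)

echo "$upper"   # prints: C D

rm -rf "$dir"
```

As noted above, in a non-C locale the class would also match names starting with non-ASCII capitals, which this sketch does not exercise.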
From: Yossi Kreinin Date: 12:34 on 22 Dec 2006 Subject: Re: locales.

>
> Why should I use locales ever again? This behaviour is not just hateful; it
> is outright terrifying.
>

Locales are very useful. They make internationalization of large software systems a snap.

For example, I once saw a hieroglyphic error message from what looked like perror (it read something like "my/path: ^&@^#@($^", under circumstances suggesting the translation "my/path: Is a directory"). An inquiry led to the conclusion that I'd unintentionally asked for a "Japanese" session at the KDE login dialog, playing with the menu before logging in.

OK, so the other zillion strings remained in English. Still, it must feel much better for Japanese people to see:

> hairy_program
opening x...
doing y...
z: Is a directory

Translated to their mother tongue:

> hairy_program
opening x...
doing y...
z: ^&@^#@($^

Locales are very useful indeed.
Generated at 10:25 on 16 Apr 2008 by mariachi