[Caml-list] Bug with really_input under cygwin
Date: -- (:)
From: Nuutti Kotivuori <naked+caml@n...>
Subject: Re: [Caml-list] Bug with really_input under cygwin
skaller@users.sourceforge.net wrote:
> On Thu, 2004-03-11 at 02:25, Nuutti Kotivuori wrote:
>> Luckily not everybody sees the world as glum :-)
>
> I'm not seeing it as glum. I'm pointing out that today the situation
> is vastly more complex due to belated recognition of the need for
> Standards to support I18N issues.
>
> Because of this the idea that \r\n <-> \n is the only real encoding
> issue across platforms is wrong.  If only that were the case today,
> it would be a trivial problem to resolve.
>
> For example, text files may contain certain header bytes that
> indicate if the file is UTF8 encoded, or UCS-2 with big or little
> endian: these bytes if found must not be considered as 'text',
> they're just encoding indicators.
>
> Even within Unicode/ISO-10646 there are myriad 'encoding' problems,
> the famous ones being the use of combining characters -- and that's
> *after* you have found the ISO10646 code points :)
>
> So, if you want to handle *text* in a portable way, you have some
> work ahead of you. Don't even try to render it correctly, the
> required algorithm competes with Mr Ackermann in performance :D
>
> As long as these kinds of comments are labelled as 'rants' people
> will continue to write non-portable software and fail to face up to
> the issues.

I have left the entire text here quoted to point out the difference in
subjects.

Sure, handling *text* is a really, really complex beast in today's
world. I end up fighting with those problems almost daily. You are
preaching to the choir.

But - there's nothing ambiguous about slurping an entire file into a
string. And there's nothing complex about doing that portably.
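
As a minimal sketch of what "slurping" means here, assuming OCaml 4.08 or
later (for Fun.protect and really_input_string): the helper name `slurp` is
hypothetical, not something from the standard library. The only point is
that the channel is opened in binary mode, so the length reported by
in_channel_length and the number of bytes actually read agree on every
platform.

```ocaml
(* Hypothetical helper: read a whole file into a string as raw bytes.
   open_in_bin disables any \r\n <-> \n translation, so the length
   reported by in_channel_length matches what really_input_string reads. *)
let slurp (path : string) : string =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      let len = in_channel_length ic in
      really_input_string ic len)
```

With a text-mode channel (open_in) on a platform that translates line
endings, the reported length can exceed the number of characters delivered,
and a really_input of that length then runs into end of file.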

Encodings, byte-order-marks, combining characters, text printing and
all that do not enter into it. The \r\n <-> \n translation issue is
the first portability hurdle, since it affects plain byte input and
output, regardless of implications for text. String as an array of
characters is a really complex beast to handle. String as an array of
bytes is trivial to handle.

And the encoding issues do not suddenly make 'md5sum' any less
portable. Or 'rsync'. Or 'wget'. But the \r\n <-> \n issue does.
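
To make the 'md5sum' point concrete, here is a small sketch, again assuming
OCaml 4.08 or later; `md5_hex_of_file` is a hypothetical name. It hashes the
file's raw bytes through a binary-mode channel, so no encoding question ever
arises. A line-ending translation layer underneath it, on the other hand,
would change the bytes and therefore the digest.

```ocaml
(* Hypothetical helper: MD5 of a file's raw bytes, as a hex string.
   Digest.channel with a negative length reads until end of file. *)
let md5_hex_of_file (path : string) : string =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> Digest.to_hex (Digest.channel ic (-1)))
```

The same content with "\r\n" and with "\n" line endings hashes to different
values, which is exactly why a silent translation breaks tools of this kind.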

-- Naked
