The sentence "e.g. ? in utf-8" should be read as "e.g. [unicode U+2205] in utf-8".
I uploaded the file foo.mli so that its contents are visible. Using LaTeX to express the non-ascii symbols, the sentence "gives a file foo.tex where ? is faithfully translated but ? and ? are not." should be read as "gives a file foo.tex where \Gamma is faithfully translated but \vdash and \emptyset are not."
Original bug ID: 7048
Reporter: herbelin
Status: resolved (set by @xavierleroy on 2017-10-09T17:22:50Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 4.02.3
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: ocamldoc
Monitored by: @gasche
Bug description
Hi,
I'm trying to use ocamldoc on files with utf-8 encoded characters. It seems, however, that ocamldoc -latex is hard-wired to latin1. Even when compiling with -noheader so that we can use our own inputenc settings, there is still an additional automatic expansion of latin1-encoded accented letters into LaTeX expressions of the form \'e. This effectively breaks any encoding other than latin-1 that uses byte values >= 128, such as utf-8 (e.g. ? in utf-8 starts with the byte that is â in latin1, octal code 0342, and is hence translated into the 5 bytes \^a\0210\0205 instead of the 3 bytes \0342\0210\0205, where I use \0XXX to denote a byte in octal notation).
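The byte counts above can be checked with a small toy model. This is only a sketch of the kind of per-byte expansion described in the report, not the code ocamldoc actually uses: the table `latin1_to_latex` and the function `expand` are hypothetical names, and the table holds just two illustrative entries.

```ocaml
(* Toy model, NOT the actual ocamldoc code: rewrite a few latin1 accented
   bytes into LaTeX accent macros, one byte at a time. *)

(* Hypothetical table: latin1 byte -> LaTeX macro. *)
let latin1_to_latex = function
  | '\xE2' -> Some "\\^a"   (* a with circumflex, octal 0342 *)
  | '\xE9' -> Some "\\'e"   (* e with acute accent, octal 0351 *)
  | _ -> None

(* Apply the table to every byte, with no knowledge of the input encoding. *)
let expand (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match latin1_to_latex c with
      | Some macro -> Buffer.add_string buf macro
      | None -> Buffer.add_char buf c)
    s;
  Buffer.contents buf

(* U+2205 (empty set) encoded in utf-8: octal 0342 0210 0205. *)
let empty_set_utf8 = "\xE2\x88\x85"

let () =
  let out = expand empty_set_utf8 in
  (* The lead byte is replaced by a 3-byte macro; the two continuation
     bytes are left alone, so the result is no longer valid utf-8. *)
  Printf.printf "%d bytes in, %d bytes out\n"
    (String.length empty_set_utf8) (String.length out)
```

Under these assumptions the 3-byte utf-8 sequence grows to 5 bytes, which matches the figures given in the report.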
By contrast, "ocamldoc -html -charset utf-8" works fine with utf-8.
Marking this as blocking is of course subjective. It blocks the use of ocamldoc -latex in a non-pure-ascii environment, which I believe is the norm nowadays. We will probably settle for ocamldoc -html only, since we try to support compiling Coq with versions of OCaml that are not the most recent anyway. We could also try to reverse the translation, though it is tricky to tell which occurrences of \^a and the like come from ocamldoc and which come from the original source.
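The difficulty with translating backwards can be illustrated with another toy model. It is a sketch under two assumptions that are not confirmed by the report: that the forward step only rewrites the byte 0342 into \^a, and that a macro written by hand in the source reaches the output unchanged. The names `replace_all`, `forward` and `backward` are hypothetical.

```ocaml
(* Toy model under stated assumptions, not the real ocamldoc pipeline. *)

(* Replace every occurrence of the non-empty string [sub] in [s] by [by]. *)
let replace_all ~sub ~by s =
  let n = String.length sub in
  let buf = Buffer.create (String.length s) in
  let i = ref 0 in
  while !i < String.length s do
    if !i + n <= String.length s && String.sub s !i n = sub then begin
      Buffer.add_string buf by;
      i := !i + n
    end else begin
      Buffer.add_char buf s.[!i];
      incr i
    end
  done;
  Buffer.contents buf

(* Assumed forward step: the byte 0342 becomes the accent macro. *)
let forward s = replace_all ~sub:"\xE2" ~by:"\\^a" s

(* Naive backward step: every accent macro becomes the byte 0342 again. *)
let backward s = replace_all ~sub:"\\^a" ~by:"\xE2" s

let () =
  let from_utf8 = "\xE2\x88\x85" in     (* U+2205 in utf-8 *)
  let from_author = "ch\\^ateau" in     (* macro typed by hand *)
  (* The utf-8 input round-trips correctly... *)
  assert (backward (forward from_utf8) = from_utf8);
  (* ...but the hand-written macro is rewritten as well. *)
  assert (backward (forward from_author) <> from_author)
```

In this model the forward step maps two different inputs to the same output, so no backward pass can recover both.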
Best,
Hugo
Steps to reproduce
Create a file foo.mli with the following utf-8 contents:
(** ? ? ? : ? *)
Then
ocamldoc -noheader -notrailer -latex foo.mli -o foo.tex
gives a file foo.tex where ? is faithfully translated but ? and ? are not.
File attachments