Mantis Bug Tracker

ID: 0007048
Project: OCaml
Category: ocamldoc
View Status: public
Date Submitted: 2015-11-19 12:55
Last Update: 2017-10-09 19:22
Reporter: herbelin
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: 4.02.3
Target Version:
Fixed in Version: 4.06.0+dev
Summary: 0007048: Support for input encodings (inputenc) other than latin1 in ocamldoc -latex
Description: Hi,

I'm trying to use ocamldoc on files containing UTF-8 encoded characters. However, ocamldoc -latex seems to be hard-wired to Latin-1. Even when running it with -noheader so that we can use our own inputenc, ocamldoc still automatically expands Latin-1-encoded accented letters into LaTeX expressions of the form \'e. This effectively breaks any encoding other than Latin-1 that uses byte codes >= 128, such as UTF-8. For example, ∅ (U+2205) in UTF-8 starts with the byte that Latin-1 reads as â (octal code 0342), so it is translated into the 5 bytes \^a\0210\0205 instead of the 3 bytes \0342\0210\0205 (where \0XXX denotes a byte in octal notation).
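The byte-level arithmetic above can be checked with a small OCaml sketch. It encodes U+2205 as UTF-8 by hand and then runs the bytes through a hypothetical `latin1_style_escape` function that knows only one Latin-1 mapping (0xE2, â, to \^a). This is an illustration of the mechanism, not ocamldoc's actual translation table.

```ocaml
(* Encode a Unicode code point (0 .. 0x10FFFF) as a UTF-8 byte string. *)
let utf8_of_code_point cp =
  if cp < 0 || cp > 0x10FFFF then invalid_arg "utf8_of_code_point";
  let b = Buffer.create 4 in
  let add n = Buffer.add_char b (Char.chr n) in
  if cp < 0x80 then add cp
  else if cp < 0x800 then begin
    add (0xC0 lor (cp lsr 6));
    add (0x80 lor (cp land 0x3F))
  end else if cp < 0x10000 then begin
    add (0xE0 lor (cp lsr 12));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end else begin
    add (0xF0 lor (cp lsr 18));
    add (0x80 lor ((cp lsr 12) land 0x3F));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end;
  Buffer.contents b

(* Hypothetical Latin-1-minded escaper: rewrites byte 0xE2 (Latin-1 'â')
   as the LaTeX accent command \^a and copies every other byte unchanged. *)
let latin1_style_escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | '\xe2' -> Buffer.add_string b "\\^a"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

let () =
  let s = utf8_of_code_point 0x2205 in
  (* prints "3 bytes in, 5 bytes out" *)
  Printf.printf "%d bytes in, %d bytes out\n"
    (String.length s) (String.length (latin1_style_escape s))
```

The first UTF-8 byte of U+2205 is 0xE2 (octal 0342), so the escaper turns a valid 3-byte sequence into \^a followed by two orphaned continuation bytes, which a UTF-8 inputenc can no longer decode.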

By contrast, "ocamldoc -html -charset utf-8" works fine with UTF-8.

Marking this as blocking is, of course, subjective. It blocks the use of ocamldoc -latex in a non-pure-ASCII environment, which I believe is the norm nowadays. Still, we will probably decide to live with ocamldoc -html alone, since we try to support compiling Coq with OCaml versions that are not the most recent anyway. We could also try to apply a backward translation, though it is tricky to tell which \^a (or similar) sequences come from ocamldoc and which come from the original source.

Best,

Hugo
Steps To Reproduce: Create a file foo.mli with UTF-8 contents:

(** ? ? ? : ? *)

Then

ocamldoc -noheader -notrailer -latex foo.mli -o foo.tex

produces a file foo.tex in which Γ is faithfully translated but ⊢ and ∅ are not.

Tags: No tags attached.
Attached Files: foo.mli (24 bytes) 2015-11-19 13:35


- Notes
(0014726)
herbelin (reporter)
2015-11-19 13:41

Apparently, Mantis does not support utf-8 either.

The sentence "e.g. ? in utf-8" should be read as "e.g. [unicode U+2205] in utf-8".

I uploaded the file foo.mli so that its contents are visible. Using LaTeX to express the non-ASCII symbols, the sentence "gives a file foo.tex where ? is faithfully translated but ? and ? are not." should be read as "gives a file foo.tex where \Gamma is faithfully translated but \vdash and \emptyset are not."
(0014727)
xleroy (administrator)
2015-11-19 13:48

A minimal fix would be to not translate bytes >= 0x80 at all and simply copy them verbatim to the LaTeX output.

Perhaps the default header should include \usepackage[utf8]{inputenc}, too.
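A rough OCaml sketch of the minimal fix suggested above: a hypothetical `escape_latex` function escapes only LaTeX's ASCII special characters and passes every byte >= 0x80 through untouched, leaving decoding to inputenc. The `default_header` string is likewise an assumed example preamble, not ocamldoc's actual header, and neither is the code merged in GPR#1420.

```ocaml
(* Hypothetical escaper: quote LaTeX's ASCII special characters and copy
   all other bytes, including every byte >= 0x80, verbatim. *)
let escape_latex s =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | '#' | '$' | '%' | '&' | '_' | '{' | '}' ->
          Buffer.add_char b '\\';
          Buffer.add_char b c
      | '~' -> Buffer.add_string b "\\textasciitilde{}"
      | '^' -> Buffer.add_string b "\\textasciicircum{}"
      | '\\' -> Buffer.add_string b "\\textbackslash{}"
      (* ASCII letters, digits, punctuation and all non-ASCII bytes *)
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Assumed example preamble that lets LaTeX decode the raw UTF-8 bytes. *)
let default_header =
  "\\documentclass{article}\n\\usepackage[utf8]{inputenc}\n"

let () =
  print_string default_header;
  (* U+2205 (UTF-8 bytes E2 88 85) is emitted unchanged; '_' is escaped *)
  print_endline (escape_latex "x_1 : \xe2\x88\x85")
```

Keeping the escaper byte-oriented and encoding-agnostic means it never splits a multi-byte sequence, so the choice of encoding moves entirely into the LaTeX preamble.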
(0018517)
xleroy (administrator)
2017-10-09 16:49

Proposal at https://github.com/ocaml/ocaml/pull/1420
(0018519)
xleroy (administrator)
2017-10-09 19:22

GPR#1420 merged, will be in 4.06.

- Issue History
Date Modified Username Field Change
2015-11-19 12:55 herbelin New Issue
2015-11-19 13:35 herbelin File Added: foo.mli
2015-11-19 13:41 herbelin Note Added: 0014726
2015-11-19 13:48 xleroy Note Added: 0014727
2015-11-19 13:48 xleroy Severity block => minor
2015-11-19 13:48 xleroy Status new => acknowledged
2015-11-19 13:48 xleroy Target Version => 4.03.0+dev / +beta1
2016-04-15 15:28 doligez Target Version 4.03.0+dev / +beta1 => 4.03.1+dev
2017-02-16 14:01 doligez Target Version 4.03.1+dev => undecided
2017-02-23 16:46 doligez Category OCamldoc => ocamldoc
2017-04-10 14:38 doligez Target Version undecided =>
2017-10-09 16:49 xleroy Note Added: 0018517
2017-10-09 19:22 xleroy Note Added: 0018519
2017-10-09 19:22 xleroy Status acknowledged => resolved
2017-10-09 19:22 xleroy Resolution open => fixed
2017-10-09 19:22 xleroy Fixed in Version => 4.06.0+dev

