Support for other inputenc than latin1 in ocamldoc -latex #7048

Closed
vicuna opened this issue Nov 19, 2015 · 4 comments

Comments


vicuna commented Nov 19, 2015

Original bug ID: 7048
Reporter: herbelin
Status: resolved (set by @xavierleroy on 2017-10-09T17:22:50Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 4.02.3
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: ocamldoc
Monitored by: @gasche

Bug description

Hi,

I'm trying to use ocamldoc on files with utf-8 encoded characters. It seems, however, that ocamldoc -latex is hard-wired to latin1. Even when compiling with -noheader so that we can use our own inputenc, there is still an additional automatic expansion of latin1-encoded accented letters into LaTeX expressions of the form \'e. This virtually breaks any encoding other than latin-1 that uses codes >= 128, such as utf-8 (e.g. ? in utf-8 starts with â in latin1 - octal code 0342 -, and is hence translated into the 5 bytes \^a\0210\0205 instead of the 3 bytes \0342\0210\0205, where I use \0XXX to denote a byte in octal notation).
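
To make the byte counts above concrete, here is a minimal sketch of the kind of latin1-oriented escaping described. The function `latin1_escape` is a hypothetical stand-in, not ocamldoc's actual code, and it assumes the accent macro is written `\^a` (three bytes), which is what the reported five-byte total implies.

```ocaml
(* Hypothetical stand-in for the latin1-oriented escaping described in the
   report; this is NOT ocamldoc's actual code. Only two illustrative
   mappings are included. *)
let latin1_escape (s : string) : string =
  let buf = Buffer.create (String.length s * 2) in
  String.iter
    (fun c ->
      match c with
      | '\xE2' -> Buffer.add_string buf "\\^a"  (* latin1 a-circumflex, octal 0342 *)
      | '\xE9' -> Buffer.add_string buf "\\'e"  (* latin1 e-acute *)
      | c -> Buffer.add_char buf c)             (* every other byte is copied *)
    s;
  Buffer.contents buf

(* U+2205 (EMPTY SET) encoded in UTF-8: octal 0342 0210 0205. *)
let empty_set_utf8 = "\xE2\x88\x85"

let () =
  let out = latin1_escape empty_set_utf8 in
  (* The lead byte 0342 is rewritten, the two continuation bytes are kept,
     so the 3-byte sequence becomes 5 bytes and is no longer valid UTF-8. *)
  Printf.printf "input:  %d bytes\n" (String.length empty_set_utf8);
  Printf.printf "output: %d bytes\n" (String.length out)
```

The point of the sketch is only that a byte-wise latin1 table cannot tell a latin1 â from the first byte of a UTF-8 sequence.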

By contrast, "ocamldoc -html -charset utf-8" works fine with utf-8.

Setting it as blocking is of course subjective. It blocks using ocamldoc -latex in a non-pure-ASCII environment, which I believe is the norm nowadays, but we will probably decide to live with only ocamldoc -html instead, since we try to support compiling Coq with versions of OCaml that are not the most recent anyway. We could also try to apply a translation backwards, though it is subtle to identify which \^a or similar sequences come from ocamldoc and which possibly come from the original source.

Best,

Hugo

Steps to reproduce

Build foo.mli with utf-8 contents

(** ? ? ? : ? *)

Then

ocamldoc -noheader -notrailer -latex foo.mli -o foo.tex

gives a file foo.tex where ? is faithfully translated but ? and ? are not.

File attachments


vicuna commented Nov 19, 2015

Comment author: herbelin

Apparently, Mantis does not support utf-8 either.

The sentence "e.g. ? in utf-8" should be read as "e.g. [unicode U+2205] in utf-8".

I uploaded the file foo.mli so that its contents are visible. Using LaTeX to express the non-ascii symbols, the sentence "gives a file foo.tex where ? is faithfully translated but ? and ? are not." should be read as "gives a file foo.tex where \Gamma is faithfully translated but \vdash and \emptyset are not."


vicuna commented Nov 19, 2015

Comment author: @xavierleroy

A minimal fix would be not to translate bytes >= 0x80 at all, just copying them verbatim to the LaTeX output.

Perhaps the default header should \usepackage[utf8]{inputenc}, too.
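
The minimal fix suggested in this comment (copy bytes >= 0x80 verbatim) could look roughly like the sketch below. It is an illustration under assumptions, not the code merged in the ocamldoc sources: `escape_ascii_only` is a hypothetical name, and the handful of ASCII specials it escapes is chosen only for the example.

```ocaml
(* Sketch of the suggested minimal fix, not ocamldoc's actual code:
   only a few ASCII characters special to LaTeX are escaped, and every
   byte >= 0x80 falls through to the default case unchanged, so
   multi-byte UTF-8 sequences survive intact. *)
let escape_ascii_only (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | '_' | '&' | '%' | '#' | '$' ->
          Buffer.add_char buf '\\';
          Buffer.add_char buf c
      | c -> Buffer.add_char buf c)  (* includes every byte >= 0x80 *)
    s;
  Buffer.contents buf

let () =
  (* UTF-8 bytes for U+0393, U+22A2 and U+2205, followed by ASCII text. *)
  let doc = "\xCE\x93 \xE2\x8A\xA2 \xE2\x88\x85 : 100%" in
  let out = escape_ascii_only doc in
  (* The non-ASCII bytes are untouched; only the '%' gains a backslash. *)
  assert (out = "\xCE\x93 \xE2\x8A\xA2 \xE2\x88\x85 : 100\\%")
```

With output like this, the generated file still needs a LaTeX preamble that declares the matching input encoding, which is what the header suggestion in the same comment addresses.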


vicuna commented Oct 9, 2017

Comment author: @xavierleroy

Proposal at #1420


vicuna commented Oct 9, 2017

Comment author: @xavierleroy

#1420 merged, will be in 4.06.
