Alterations to handling of \013 in source files breaking other tools #6165

Closed
vicuna opened this issue Sep 7, 2013 · 9 comments

Comments

vicuna commented Sep 7, 2013

Original bug ID: 6165
Reporter: @dra27
Status: closed (set by @damiendoligez on 2014-01-22T13:04:42Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 4.01.0+beta/+rc
Target version: 4.01.1+dev
Fixed in version: 4.01.1+dev
Category: platform support (windows, cross-compilation, etc)
Tags: patch, junior_job
Related to: #5598

Bug description

#5598 alters parsing/lexer.mll so that "\r" is no longer accepted as a line-breaking character (removing support for legacy MacOS 9 and earlier). Unfortunately, this also breaks external tools (in particular, menhir) which accidentally generate \r\r\n on Windows (this can happen when \r\n is erroneously passed through a print routine that expects \n on all platforms).
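To illustrate how \r\r\n can arise, here is a minimal sketch that simulates text-mode output on Windows, where each \n written is expanded to \r\n. The function name `text_mode_translate` is hypothetical and exists only for this illustration; it is not taken from menhir or from the OCaml runtime.

```ocaml
(* Hypothetical helper for illustration only: simulates what a Windows
   text-mode channel does on output, i.e. every '\n' becomes "\r\n".
   A '\r' already present in the data is passed through unchanged. *)
let text_mode_translate (s : string) : string =
  let buf = Buffer.create (String.length s + 8) in
  String.iter
    (fun c ->
       if c = '\n' then Buffer.add_string buf "\r\n"
       else Buffer.add_char buf c)
    s;
  Buffer.contents buf

let () =
  (* A tool that emits "\r\n" itself ends up with "\r\r\n" on disk. *)
  assert (text_mode_translate "x\r\n" = "x\r\r\n");
  (* A tool that emits plain "\n" gets the expected "\r\n". *)
  assert (text_mode_translate "x\n" = "x\r\n")
```

A tool can usually avoid the doubling either by emitting plain \n and leaving the translation to the channel, or by switching the channel to binary mode with `set_binary_mode_out` and writing the exact bytes it wants. Which of these applies to any particular tool is not stated in this report.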

Steps to reproduce

Attempt to build menhir-20130116 on Windows using the mingw 32-bit port and OCaml 4.01.0+rc2

Additional information

See also https://sympa.inria.fr/sympa/arc/caml-list/2013-08/msg00009.html

File attachments

vicuna commented Sep 7, 2013

Comment author: @dra27

Getting libraries primarily written with Unix in mind to support Windows is already enough of a faff - would it be acceptable for the OCaml lexer to be slightly more forgiving about stray \r characters while still retiring MacOS 9 support?
The attached patch changes the newline regexp to \r*\n (i.e. Unix line ending of \n definitely supported and a more relaxed Windows version). This is still an improvement on the previous situation where \r\r\n would be interpreted as two newlines.
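The difference between the three behaviours discussed here can be sketched outside the lexer. The code below is a hand-written approximation, not the contents of parsing/lexer.mll: the type `policy`, its constructors and the function `count_newlines` are hypothetical names, and the exact regular expressions assumed for each policy are given in the comments.

```ocaml
(* Assumed line-break rules, for illustration only:
     Legacy  : "\n" | "\r" | "\r\n"   (before #5598)
     Strict  : "\n" | "\r\n"          (after #5598)
     Relaxed : "\r"* "\n"             (the patch proposed here) *)
type policy = Legacy | Strict | Relaxed

(* Counts line breaks in [s]. Returns [Error i] when the '\r' at index [i]
   is not part of a line break under the chosen policy. *)
let count_newlines (policy : policy) (s : string) : (int, int) result =
  let n = String.length s in
  (* Index just past the run of '\r' characters starting at [i]. *)
  let rec skip_cr i = if i < n && s.[i] = '\r' then skip_cr (i + 1) else i in
  let rec go i count =
    if i >= n then Ok count
    else
      match s.[i] with
      | '\n' -> go (i + 1) (count + 1)
      | '\r' ->
        let next_is_lf = i + 1 < n && s.[i + 1] = '\n' in
        (match policy with
         | Legacy ->
           (* "\r\n" is one break; a lone '\r' is a break of its own. *)
           if next_is_lf then go (i + 2) (count + 1)
           else go (i + 1) (count + 1)
         | Strict ->
           if next_is_lf then go (i + 2) (count + 1) else Error i
         | Relaxed ->
           (* Any number of '\r' followed by '\n' is a single break. *)
           let j = skip_cr i in
           if j < n && s.[j] = '\n' then go (j + 1) (count + 1)
           else Error i)
      | _ -> go (i + 1) count
  in
  go 0 0

let () =
  let src = "a\r\r\nb\n" in
  assert (count_newlines Legacy src = Ok 3);   (* "\r\r\n" counted twice *)
  assert (count_newlines Strict src = Error 1);(* rejected at the first '\r' *)
  assert (count_newlines Relaxed src = Ok 2)   (* one break per line *)
```

Under the relaxed rule, line numbers reported for a file containing \r\r\n match what an editor would show, which is the improvement over the legacy behaviour mentioned above. How a '\r' that is not followed by '\n' should be treated is an assumption of this sketch.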
With this patch, menhir does compile on Windows again.

vicuna commented Sep 7, 2013

Comment author: @protz

I'll ask François Pottier on Monday if he can easily fix this on the Menhir side, which would save Damien the hassle of issuing an rc3. I still agree that we should be more forgiving about \r's, but if we could postpone this to the next bugfix release, that would save Damien some trouble, I think.

vicuna commented Sep 9, 2013

Comment author: @protz

François and I fixed the problem on the Menhir side. Menhir now compiles just fine on Windows (and generates proper files without extra \r's). A new release will be issued shortly.

vicuna commented Sep 11, 2013

Comment author: @dra27

OK - would the intention still be to consider/target this patch for ocaml-next (or whatever the new marker is!), though?

vicuna commented Sep 11, 2013

Comment author: @protz

I would be in favor of a warning that's "as-error" by default (you could then override it, but we would be pretty harsh already). Thoughts? :)

vicuna commented Sep 12, 2013

Comment author: @dra27

A warning's a good idea - though perhaps as a warning, rather than as an error? To me, it's part of the "be liberal in input you receive and strict in output you send" mantra. What we're concerned about is that there's a newline, which, for the two major OS categories supported, means a \n. My other "objection" is that the current regular expression is very strict about an incorrect number of \rs (e.g. \r\r\n) but doesn't care about mixed Windows and Unix linebreaks in the same file (e.g. \r\n used for some lines and \n for the others).

vicuna commented Sep 12, 2013

Comment author: @damiendoligez

> The attached patch changes the newline regexp to \r*\n

This is a feature request that I view with a very favourable eye.

vicuna commented Sep 12, 2013

Comment author: @protz

> My other "objection" is that the current regular expression is very strict about an incorrect number of \rs (e.g. \r\r\n) but doesn't care about mixed Windows and Unix linebreaks in the same file (e.g. \r\n used for some lines and \n for the others).

That's a good point, so I'm in favor of reverting to a more forgiving behavior as well.

vicuna commented Jan 22, 2014

Comment author: @damiendoligez

Patch applied in 4.01 branch (commit 14404) and trunk (commit 14405).
