Str library: $ inconsistency #7024

Closed
vicuna opened this issue Oct 18, 2015 · 5 comments

Comments

vicuna commented Oct 18, 2015

Original bug ID: 7024
Reporter: flindgren
Status: closed (set by @xavierleroy on 2017-02-16T14:16:33Z)
Resolution: fixed
Priority: normal
Severity: minor
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: otherlibs
Monitored by: @gasche

Bug description

The Str library states that the $ metacharacter "[m]atches at end of line (either at the end of the matched string, or just before a newline character)". However, it appears to match only before LF, not before other line endings such as CRLF. The documentation is not consistent with the observed behaviour.

Steps to reproduce

From an OCaml toplevel:

```ocaml
#load "str.cma";;
let stringlf = "test\n";;
let stringcrlf = "test\r\n";;
let regexp = Str.regexp "test$";;
Str.string_match regexp stringlf 0;;   (* true *)
Str.string_match regexp stringcrlf 0;; (* false *)
```
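To see why the second call fails, a minimal sketch using only `Str.search_forward` can report the first index at which `$` matches in each string. The helper name `first_eol` is hypothetical, and the program must be linked against the Str library.

```ocaml
(* Hypothetical helper: index of the first position where "$" matches.
   "$" always matches at the end of the string, so Not_found cannot occur. *)
let first_eol (s : string) : int = Str.search_forward (Str.regexp "$") s 0

let () =
  (* "test\n": index 4 holds '\n', so "$" matches right after "test". *)
  Printf.printf "LF:   %d\n" (first_eol "test\n");
  (* "test\r\n": index 4 holds '\r', so "$" first matches at index 5,
     just before '\n'; the '\r' sits between "test" and the line end. *)
  Printf.printf "CRLF: %d\n" (first_eol "test\r\n")
```

Because `"test$"` requires the end of line immediately after `"test"`, the stray `'\r'` at index 4 makes `Str.string_match` return `false` for the CRLF string.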

vicuna commented Oct 18, 2015

Comment author: @gasche

The standard library input functions will (when the file is read in text rather than binary mode) translate \r\n into \n at reading time under Windows. This means that you should not manipulate strings with \r\n in the OCaml world, and that in particular Str can assume that lines end with \n.

Do you have a particular reason for manipulating raw strings that have not been read in O_TEXT mode?

https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx

vicuna commented Oct 18, 2015

Comment author: flindgren

The program reads many files through the Unix module, which does not seem to support text mode. Some text files may embed binary data and cannot be read in a translating mode.

But regardless of whether my use case is valid, is this LF-only behaviour documented somewhere? The documentation of Str only refers generally to line endings and newlines, without specifying which newline convention is meant. Is it documented elsewhere?
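For data read raw (for example through the Unix module), one possible workaround is to normalize CRLF to LF before matching. The sketch below assumes the input is known to be text; the helper `normalize_crlf` is hypothetical, and it would also rewrite any `"\r\n"` byte pair inside embedded binary data, so it is not suitable for the mixed files described above.

```ocaml
(* Hypothetical helper: replace every "\r\n" with "\n" so that Str's "$",
   which only looks for '\n', behaves as it does on LF-only input.
   Caution: this also alters "\r\n" pairs inside embedded binary data. *)
let normalize_crlf (s : string) : string =
  Str.global_replace (Str.regexp "\r\n") "\n" s

let () =
  let raw = "test\r\n" in
  let re = Str.regexp "test$" in
  Printf.printf "raw: %b, normalized: %b\n"
    (Str.string_match re raw 0)
    (Str.string_match re (normalize_crlf raw) 0)
```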

vicuna commented Nov 23, 2015

Comment author: @xavierleroy

There is a general assumption in OCaml libraries that "newline" means '\n' (LF). I agree it's not stated explicitly anywhere.

Would it be enough to document this? E.g. for Str,

'$' ... [m]atches at end of line: either at the end of the matched string, or just before a '\n' character

I don't feel like adding a special case to the regexp matcher so that '$' also matches just before a "\r\n" sequence.
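Without a special case in the matcher, a pattern can still tolerate CRLF input by allowing an optional carriage return before `$`. This is a minimal sketch, not something proposed in the thread; it leaves the data untouched, which matters when files may embed binary bytes.

```ocaml
(* "\r?" is an optional carriage return; "$" then matches just before the
   following '\n' or at the end of the string, as documented. *)
let re = Str.regexp "test\r?$"

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %b\n" s (Str.string_match re s 0))
    [ "test\n"; "test\r\n"; "test" ]
```

Note that for `"test\r\n"` the `'\r'` becomes part of the matched text, so wrap the meaningful part in a group (e.g. `"\\(test\\)\r?$"`) and read it with `Str.matched_group` if the line content is needed.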

vicuna commented Nov 24, 2015

Comment author: flindgren

I'm satisfied with a documentation fix. Thanks.

vicuna commented Nov 26, 2015

Comment author: @xavierleroy

Documentation updated, commit [trunk bf87415]

vicuna closed this as completed Feb 16, 2017
vicuna added this to the 4.03.0 milestone Mar 14, 2019
vicuna added the bug label Mar 20, 2019