Str library: $ inconsistency #7024
Comments
Comment author: @gasche The standard library input functions will (when the file is being read in text, rather than binary mode) translate \r\n into \n at reading time under Windows. This means that you should not manipulate strings with \r\n in the OCaml world, and that in particular Str can assume that lines end with \n. Do you have a particular reason for manipulating raw strings that have not been read in O_TEXT mode?
Comment author: flindgren The program reads many files through the Unix module, which does not seem to support text mode. Some text files may embed binary data and cannot be read in translating modes. But regardless of the validity or not of my use case, is this only-supports-LF behaviour documented somewhere? The documentation of Str only refers generally to line endings and newlines, without specifying that they must be of the right type. Is it documented elsewhere?
Comment author: @xavierleroy There is a general assumption in OCaml libraries that "newline" means '\n' (LF). I agree it's not stated explicitly anywhere. Would it be enough to document this? E.g. for Str: '$' "[m]atches at end of line: either at the end of the matched string, or just before a '\n' character". I don't feel like adding a special case to the regexp matcher so that '$' also matches just before a "\r\n" sequence.
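Since '$' only matches at the end of the string or just before '\n', one possible workaround on the caller's side is to make the CR optional in the pattern itself. The following is only a sketch, not something proposed in this thread; the names `eol_tolerant` and `matches_at_start` are made up for illustration, and the program must be linked with the Str library.

```ocaml
(* Requires the Str library, for example:
   ocamlfind ocamlopt -package str -linkpkg demo.ml *)

(* "\r" inside an OCaml string literal is a literal CR character, so the
   pattern reads: "test", then an optional CR, then end of line. *)
let eol_tolerant = Str.regexp "test\r?$"

(* Hypothetical helper: try to match the pattern at position 0 of [s]. *)
let matches_at_start s = Str.string_match eol_tolerant s 0

let () =
  assert (matches_at_start "test\n");      (* LF line ending *)
  assert (matches_at_start "test\r\n");    (* CRLF line ending *)
  assert (matches_at_start "test");        (* end of string *)
  assert (not (matches_at_start "test\rmore"))  (* a bare CR is not an end of line *)
```

Note that with this pattern the matched text includes the CR for CRLF input, so callers using `Str.matched_string` may want to wrap the part they care about in a `\\(` ... `\\)` group.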
Comment author: flindgren I'm satisfied with documentation. Thanks.
Comment author: @xavierleroy Documentation updated, commit [trunk bf87415]
Original bug ID: 7024
Reporter: flindgren
Status: closed (set by @xavierleroy on 2017-02-16T14:16:33Z)
Resolution: fixed
Priority: normal
Severity: minor
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: otherlibs
Monitored by: @gasche
Bug description
The Str library states that the $ metacharacter "[m]atches at end of line (either at the end of the matched string, or just before a newline character)". However, it appears that it only matches against LF and not other types of ends of line (say, CRLF). The documentation is not consistent with the observed behaviour.
Steps to reproduce
From an OCaml toplevel:
#load "str.cma";;
let stringlf = "test\n";;
let stringcrlf = "test\r\n";;
let regexp = Str.regexp "test$";;
Str.string_match regexp stringlf 0;;   (* true *)
Str.string_match regexp stringcrlf 0;; (* false *)
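For text that was read in binary mode (for example through the Unix module), another option is to normalise CRLF to LF before matching. The helper below is a minimal sketch using only the standard library; `normalize_newlines` is a hypothetical name, and it should be applied only to data known to be text, since rewriting embedded binary data would corrupt it.

```ocaml
(* Hypothetical helper: drop every CR that is immediately followed by LF,
   turning CRLF line endings into LF. A CR on its own is left untouched. *)
let normalize_newlines (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  String.iteri
    (fun i c ->
      if c = '\r' && i + 1 < n && s.[i + 1] = '\n' then ()
      else Buffer.add_char buf c)
    s;
  Buffer.contents buf

let () =
  assert (normalize_newlines "test\r\n" = "test\n");
  assert (normalize_newlines "a\r\nb\r\n" = "a\nb\n");
  assert (normalize_newlines "a\rb" = "a\rb");   (* bare CR is kept *)
  assert (normalize_newlines "test\n" = "test\n")
```

With the definitions from the steps above, `Str.string_match regexp (normalize_newlines stringcrlf) 0` should then behave like the LF case, because the string handed to Str no longer contains a CR before the '\n'.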