Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007127OCamltoplevelpublic2016-01-13 23:172017-06-05 00:51
Reporterxyh 
Assigned Tolefessan 
PrioritynormalSeverityfeatureReproducibilityalways
StatusresolvedResolutionfixed 
PlatformOSOS Version
Product Version4.02.3 
Target VersionFixed in Version4.06.0 +dev/beta1/beta2/rc1 
Summary0007127: ocaml REPL does not truncate value of type 'bytes' and 'string'
Descriptionocaml REPL does not truncate value of type 'bytes' and 'string'
Tagsjunior_job, patch
Attached Filespatch file icon truncate_toplevel_string.patch [^] (1,911 bytes) 2016-01-29 04:35 [Show Content]
patch file icon truncate_and_print_bytes.patch [^] (1,825 bytes) 2016-02-01 01:59 [Show Content]

- Relationships

-  Notes
(0015254)
gasche (administrator)
2016-01-13 23:23

Indeed: if you do

  String.init 10_000 (fun i -> char_of_int (i mod 26 + int_of_char 'A'));;

you get a rather long string shown in the toplevel, whereas

# Array.init 10_000 (fun i -> char_of_int (i mod 26 + int_of_char 'A'));;
- : char array =
[|'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O';
  'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D';
  'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S';
  'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H';
  'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W';
  'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L';
  'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A';
  'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P';
  'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E';
  'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T';
  'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I';
  'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X';
  'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M';
  'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B';
  'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q';
  'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F';
  'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U';
  'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J';
  'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y';
  'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; ...|]

cuts the array representation after the first few elements.

It would be consistent to summarize the string using the same threshold as well.
(0015271)
junsli (reporter)
2016-01-27 03:50

I am working on this bug now.
(0015287)
junsli (reporter)
2016-01-29 04:44

Hi, How do you think this patch?

The ellipsis is printed for Array because iteration reaches the max (max_printer_steps). And toplevel prints string using print_string. This patch refactors max_printer_steps, and defines max_string_length to be the threshold for both.

Thanks.
(0015289)
gasche (administrator)
2016-01-29 18:18

The patch is ok, but it is a bit frustrating that you have to plumb the steps parameter this way.

Another thing you could try is to remove the string printer from the list of default printers, and instead have it be another case of the catch-all case where Array is already handled, and the "steps" information in accessible. Would you be motivated enough to maybe try this as well, so that it is possible to compare the two approaches?

(One advantage of the latter is that the information would take into account the steps already taken, not just a global fixed constant.)

While reading your patch and the codebase, I wondered how bytes is handled when -safe-string is set. Do you know what happens? How come we don't need to have separate printers for string and bytes?
(0015292)
junsli (reporter)
2016-01-30 03:44

Thanks gasche. I agree with you; I feel...odd about this solution.

I'll try your suggestion.

I have never used -safe-string before. Top-level gives me <abstr> on Bytes with -safe-string. I'll see what's going on here.
(0015293)
gasche (administrator)
2016-01-30 04:20

Indeed, bytes are not printed when -safe-string is passed, because the type-directed lookup does not find anything for them.

Whatever solution we end up choosing for restricting string size, it would be nice to have a second commit to support Bytes printing as well. With the current patch, that would just mean adding another simple printer just for bytes; if the suggested approach works, you can test for the equality of the type path against either Predef.ident_string or Predef.ident_bytes.
(0015294)
gasche (administrator)
2016-01-30 04:23
edited on: 2016-01-30 04:24

(Oh, and while you are at it, if your pseudonym is linked to a full name anyway, then it would be better to use your full name as an author name in the commit messages; you can use "git commit --amend --author=<author>" to amend a commit's author name.)

(0015295)
junsli (reporter)
2016-01-30 07:30

Thank you! I'll report back.
(0015296)
junsli (reporter)
2016-02-01 02:22

OK. Simple fix. I simply remove the default string printer and print strings (and bytes) when it fails to find the default one.

Without -safe-string, bytes is just an alias of string types, so printing worked before. With -safe-string, as a new type, bytes did not have a default printer, so it was printed as <abstr>.

To replace the literal "..." with Oval_ellipsis, we have to either change type Oval_string to "Oval_string of Oval_value list" or "Oval_string of string * bool" (bool indicates whether string is truncated) to encode the Oval_ellipsis. I think they are over designed and also put unnecessary overhead during constructing and printing Oval_string values. what do you think?
(0015297)
junsli (reporter)
2016-02-01 02:43

Two concerns.

1. After I fixed this, I was thinking, wait a minute, what's wrong with unlimited string output?

2. It could be simpler to get the string: "let str : string = O.obj obj in". It compiled fine with the compiler. What could go wrong if a mutable byte was wrongly treated as an immutable string and shortened by String.sub? Would memory leak occur?
(0015298)
gasche (administrator)
2016-02-01 03:47

I think the second patch is better because it is simpler. I wouldn't worry about Oval_ellipsis, as the solutions are too complex, but it could be nice to have a comment at this point to explain why "..." is used instead.

> 1. After I fixed this, I was thinking, wait a minute, what's wrong with unlimited string output?

I would guess that it's the same argument as for unlimited printing in the general case: if the output is too long, it becomes harder to read.

> 2. It could be simpler to get the string: "let str : string = O.obj obj in". It compiled fine with the compiler. What could go wrong if a mutable byte was wrongly treated as an immutable string and shortened by String.sub? Would memory leak occur?

There would not be a memory leak with the current implementation, but you could observe a difference if the bytes got mutated between this step and the final printing step.

I like the fact of having two different code paths instead of relying on the same-representation detail to get shorter code. However, the current way you first Bytes.to_string and then String.sub is suboptimal (it will copy the bytes data in full, to sub it later), I think it would make sense to reduplicate the length-testing logic and use Bytes.sub_string in the bytes path.
(0015299)
junsli (reporter)
2016-02-01 06:29

Thanks for your comment Gabriel.

The change breaks tests/typing-extension-constructor/test.ml. I saw your editing history on this file. You may know what's going on. The test won't fail as long as strings are printed using print_string. The failing test has nothing to do with the truncation or bytes.

> I like the fact of having two different code paths instead of relying on the same-representation detail to get shorter code. However, the current way you first Bytes.to_string and then String.sub is suboptimal (it will copy the bytes data in full, to sub it later), I think it would make sense to reduplicate the length-testing logic and use Bytes.sub_string in the bytes path.

Agreed. Awesome. I'll update the code accordingly, and send a pull request on github.

Thanks for your time.
(0015303)
junsli (reporter)
2016-02-03 04:03
edited on: 2017-02-28 16:34

For future reference: GPR#454.

(0017496)
doligez (administrator)
2017-02-28 16:41

See also:
GPR#1058 - https://github.com/ocaml/ocaml/pull/1058 [^]
GPR#454 - https://github.com/ocaml/ocaml/pull/454 [^]
(0017842)
gasche (administrator)
2017-06-05 00:51

GPR#1058 was merged in trunk (4.06~dev).

- Issue History
Date Modified Username Field Change
2016-01-13 23:17 xyh New Issue
2016-01-13 23:20 gasche Tag Attached: junior_job
2016-01-13 23:23 gasche Note Added: 0015254
2016-01-13 23:23 gasche Status new => acknowledged
2016-01-27 03:50 junsli Note Added: 0015271
2016-01-29 04:35 junsli File Added: truncate_toplevel_string.patch
2016-01-29 04:44 junsli Note Added: 0015287
2016-01-29 18:18 gasche Note Added: 0015289
2016-01-30 03:44 junsli Note Added: 0015292
2016-01-30 04:20 gasche Note Added: 0015293
2016-01-30 04:23 gasche Note Added: 0015294
2016-01-30 04:24 gasche Note Edited: 0015294 View Revisions
2016-01-30 07:30 junsli Note Added: 0015295
2016-02-01 01:59 junsli File Added: truncate_and_print_bytes.patch
2016-02-01 02:22 junsli Note Added: 0015296
2016-02-01 02:43 junsli Note Added: 0015297
2016-02-01 03:47 gasche Note Added: 0015298
2016-02-01 06:29 junsli Note Added: 0015299
2016-02-03 04:03 junsli Note Added: 0015303
2016-02-17 19:20 yallop Tag Attached: patch
2017-02-23 16:36 doligez Category OCaml general => -OCaml general
2017-02-28 16:34 doligez Note Edited: 0015303 View Revisions
2017-02-28 16:41 doligez Note Added: 0017496
2017-02-28 16:42 doligez Category -OCaml general => toplevel
2017-02-28 16:42 doligez Assigned To => lefessan
2017-02-28 16:42 doligez Status acknowledged => assigned
2017-06-05 00:51 gasche Note Added: 0017842
2017-06-05 00:51 gasche Status assigned => resolved
2017-06-05 00:51 gasche Fixed in Version => 4.06.0 +dev/beta1/beta2/rc1
2017-06-05 00:51 gasche Resolution open => fixed


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker