ocaml REPL does not truncate value of type 'bytes' and 'string' #7127

vicuna · 2016-01-13T22:17:47Z

Original bug ID: 7127
Reporter: xyh
Assigned to: @lefessan
Status: resolved (set by @gasche on 2017-06-04T22:51:23Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 4.02.3
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: toplevel
Tags: patch, junior_job
Monitored by: junsli @gasche @diml @hcarty

Bug description

ocaml REPL does not truncate value of type 'bytes' and 'string'

File attachments

vicuna · 2016-01-13T22:23:43Z

Comment author: @gasche

Indeed: if you do

String.init 10_000 (fun i -> char_of_int (i mod 26 + int_of_char 'A'));;

you get a rather long string shown in the toplevel, whereas

Array.init 10_000 (fun i -> char_of_int (i mod 26 + int_of_char 'A'));;

: char array =
[|'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O';
'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D';
'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S';
'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H';
'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W';
'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L';
'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A';
'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P';
'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E';
'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T';
'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I';
'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X';
'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M';
'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B';
'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q';
'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F';
'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U';
'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J';
'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y';
'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; ...|]

cuts the array representation after the first few elements.

It would be consistent to summarize the string using the same threshold as well.

vicuna · 2016-01-27T02:50:33Z

Comment author: junsli

I am working on this bug now.

vicuna · 2016-01-29T03:44:33Z

Comment author: junsli

Hi, How do you think this patch?

The ellipsis is printed for Array because iteration reaches the max (max_printer_steps). And toplevel prints string using print_string. This patch refactors max_printer_steps, and defines max_string_length to be the threshold for both.

Thanks.

vicuna · 2016-01-29T17:18:26Z

Comment author: @gasche

The patch is ok, but it is a bit frustrating that you have to plumb the steps parameter this way.

Another thing you could try is to remove the string printer from the list of default printers, and instead have it be another case of the catch-all case where Array is already handled, and the "steps" information in accessible. Would you be motivated enough to maybe try this as well, so that it is possible to compare the two approaches?

(One advantage of the latter is that the information would take into account the steps already taken, not just a global fixed constant.)

While reading your patch and the codebase, I wondered how bytes is handled when -safe-string is set. Do you know what happens? How come we don't need to have separate printers for string and bytes?

vicuna · 2016-01-30T02:44:18Z

Comment author: junsli

Thanks gasche. I agree with you; I feel...odd about this solution.

I'll try your suggestion.

I have never used -safe-string before. Top-level gives me on Bytes with -safe-string. I'll see what's going on here.

vicuna · 2016-01-30T03:20:37Z

Comment author: @gasche

Indeed, bytes are not printed when -safe-string is passed, because the type-directed lookup does not find anything for them.

Whatever solution we end up choosing for restricting string size, it would be nice to have a second commit to support Bytes printing as well. With the current patch, that would just mean adding another simple printer just for bytes; if the suggested approach works, you can test for the equality of the type path against either Predef.ident_string or Predef.ident_bytes.

vicuna · 2016-01-30T03:23:59Z

Comment author: @gasche

(Oh, and while you are at it, if your pseudonym is linked to a full name anyway, then it would be better to use your full name as an author name in the commit messages; you can use "git commit --amend --author=" to amend a commit's author name.)

vicuna · 2016-01-30T06:30:26Z

Comment author: junsli

Thank you! I'll report back.

vicuna · 2016-02-01T01:22:48Z

Comment author: junsli

OK. Simple fix. I simply remove the default string printer and print strings (and bytes) when it fails to find the default one.

Without -safe-string, bytes is just an alias of string types, so printing worked before. With -safe-string, as a new type, bytes did not have a default printer, so it was printed as .

To replace the literal "..." with Oval_ellipsis, we have to either change type Oval_string to "Oval_string of Oval_value list" or "Oval_string of string * bool" (bool indicates whether string is truncated) to encode the Oval_ellipsis. I think they are over designed and also put unnecessary overhead during constructing and printing Oval_string values. what do you think?

vicuna · 2016-02-01T01:43:00Z

Comment author: junsli

Two concerns.

After I fixed this, I was thinking, wait a minute, what's wrong with unlimited string output?
It could be simpler to get the string: "let str : string = O.obj obj in". It compiled fine with the compiler. What could go wrong if a mutable byte was wrongly treated as an immutable string and shortened by String.sub? Would memory leak occur?

vicuna · 2016-02-01T02:47:54Z

Comment author: @gasche

I think the second patch is better because it is simpler. I wouldn't worry about Oval_ellipsis, as the solutions are too complex, but it could be nice to have a comment at this point to explain why "..." is used instead.

After I fixed this, I was thinking, wait a minute, what's wrong with unlimited string output?

I would guess that it's the same argument as for unlimited printing in the general case: if the output is too long, it becomes harder to read.

It could be simpler to get the string: "let str : string = O.obj obj in". It compiled fine with the compiler. What could go wrong if a mutable byte was wrongly treated as an immutable string and shortened by String.sub? Would memory leak occur?

There would not be a memory leak with the current implementation, but you could observe a difference if the bytes got mutated between this step and the final printing step.

I like the fact of having two different code paths instead of relying on the same-representation detail to get shorter code. However, the current way you first Bytes.to_string and then String.sub is suboptimal (it will copy the bytes data in full, to sub it later), I think it would make sense to reduplicate the length-testing logic and use Bytes.sub_string in the bytes path.

vicuna · 2016-02-01T05:29:35Z

Comment author: junsli

Thanks for your comment Gabriel.

The change breaks tests/typing-extension-constructor/test.ml. I saw your editing history on this file. You may know what's going on. The test won't fail as long as strings are printed using print_string. The failing test has nothing to do with the truncation or bytes.

I like the fact of having two different code paths instead of relying on the same-representation detail to get shorter code. However, the current way you first Bytes.to_string and then String.sub is suboptimal (it will copy the bytes data in full, to sub it later), I think it would make sense to reduplicate the length-testing logic and use Bytes.sub_string in the bytes path.

Agreed. Awesome. I'll update the code accordingly, and send a pull request on github.

Thanks for your time.

vicuna · 2016-02-03T03:03:29Z

Comment author: junsli

For future reference: #454.

vicuna · 2017-02-28T15:41:56Z

Comment author: @damiendoligez

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ocaml REPL does not truncate value of type 'bytes' and 'string' #7127

ocaml REPL does not truncate value of type 'bytes' and 'string' #7127

vicuna commented Jan 13, 2016

vicuna commented Jan 13, 2016

vicuna commented Jan 27, 2016

vicuna commented Jan 29, 2016

vicuna commented Jan 29, 2016

vicuna commented Jan 30, 2016

vicuna commented Jan 30, 2016

vicuna commented Jan 30, 2016

vicuna commented Jan 30, 2016

vicuna commented Feb 1, 2016

vicuna commented Feb 1, 2016

vicuna commented Feb 1, 2016

vicuna commented Feb 1, 2016

vicuna commented Feb 3, 2016

vicuna commented Feb 28, 2017

vicuna commented Jun 4, 2017

ocaml REPL does not truncate value of type 'bytes' and 'string' #7127

ocaml REPL does not truncate value of type 'bytes' and 'string' #7127

Comments

vicuna commented Jan 13, 2016

Bug description

File attachments

vicuna commented Jan 13, 2016

Array.init 10_000 (fun i -> char_of_int (i mod 26 + int_of_char 'A'));;

vicuna commented Jan 27, 2016

vicuna commented Jan 29, 2016

vicuna commented Jan 29, 2016

vicuna commented Jan 30, 2016

vicuna commented Jan 30, 2016

vicuna commented Jan 30, 2016

vicuna commented Jan 30, 2016

vicuna commented Feb 1, 2016

vicuna commented Feb 1, 2016

vicuna commented Feb 1, 2016

vicuna commented Feb 1, 2016

vicuna commented Feb 3, 2016

vicuna commented Feb 28, 2017

vicuna commented Jun 4, 2017