Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ocaml REPL does not truncate value of type 'bytes' and 'string' #7127

Closed
vicuna opened this issue Jan 13, 2016 · 15 comments
Closed

ocaml REPL does not truncate value of type 'bytes' and 'string' #7127

vicuna opened this issue Jan 13, 2016 · 15 comments

Comments

@vicuna
Copy link

vicuna commented Jan 13, 2016

Original bug ID: 7127
Reporter: xyh
Assigned to: @lefessan
Status: resolved (set by @gasche on 2017-06-04T22:51:23Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 4.02.3
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: toplevel
Tags: patch, junior_job
Monitored by: junsli @gasche @diml @hcarty

Bug description

ocaml REPL does not truncate value of type 'bytes' and 'string'

File attachments

@vicuna
Copy link
Author

vicuna commented Jan 13, 2016

Comment author: @gasche

Indeed: if you do

String.init 10_000 (fun i -> char_of_int (i mod 26 + int_of_char 'A'));;

you get a rather long string shown in the toplevel, whereas

Array.init 10_000 (fun i -> char_of_int (i mod 26 + int_of_char 'A'));;

  • : char array =
    [|'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O';
    'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D';
    'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S';
    'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H';
    'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W';
    'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L';
    'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A';
    'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P';
    'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E';
    'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T';
    'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I';
    'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X';
    'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M';
    'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B';
    'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q';
    'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F';
    'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U';
    'V'; 'W'; 'X'; 'Y'; 'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J';
    'K'; 'L'; 'M'; 'N'; 'O'; 'P'; 'Q'; 'R'; 'S'; 'T'; 'U'; 'V'; 'W'; 'X'; 'Y';
    'Z'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; 'J'; 'K'; 'L'; 'M'; ...|]

cuts the array representation after the first few elements.

It would be consistent to summarize the string using the same threshold as well.

@vicuna
Copy link
Author

vicuna commented Jan 27, 2016

Comment author: junsli

I am working on this bug now.

@vicuna
Copy link
Author

vicuna commented Jan 29, 2016

Comment author: junsli

Hi, How do you think this patch?

The ellipsis is printed for Array because iteration reaches the max (max_printer_steps). And toplevel prints string using print_string. This patch refactors max_printer_steps, and defines max_string_length to be the threshold for both.

Thanks.

@vicuna
Copy link
Author

vicuna commented Jan 29, 2016

Comment author: @gasche

The patch is ok, but it is a bit frustrating that you have to plumb the steps parameter this way.

Another thing you could try is to remove the string printer from the list of default printers, and instead have it be another case of the catch-all case where Array is already handled, and the "steps" information in accessible. Would you be motivated enough to maybe try this as well, so that it is possible to compare the two approaches?

(One advantage of the latter is that the information would take into account the steps already taken, not just a global fixed constant.)

While reading your patch and the codebase, I wondered how bytes is handled when -safe-string is set. Do you know what happens? How come we don't need to have separate printers for string and bytes?

@vicuna
Copy link
Author

vicuna commented Jan 30, 2016

Comment author: junsli

Thanks gasche. I agree with you; I feel...odd about this solution.

I'll try your suggestion.

I have never used -safe-string before. Top-level gives me on Bytes with -safe-string. I'll see what's going on here.

@vicuna
Copy link
Author

vicuna commented Jan 30, 2016

Comment author: @gasche

Indeed, bytes are not printed when -safe-string is passed, because the type-directed lookup does not find anything for them.

Whatever solution we end up choosing for restricting string size, it would be nice to have a second commit to support Bytes printing as well. With the current patch, that would just mean adding another simple printer just for bytes; if the suggested approach works, you can test for the equality of the type path against either Predef.ident_string or Predef.ident_bytes.

@vicuna
Copy link
Author

vicuna commented Jan 30, 2016

Comment author: @gasche

(Oh, and while you are at it, if your pseudonym is linked to a full name anyway, then it would be better to use your full name as an author name in the commit messages; you can use "git commit --amend --author=" to amend a commit's author name.)

@vicuna
Copy link
Author

vicuna commented Jan 30, 2016

Comment author: junsli

Thank you! I'll report back.

@vicuna
Copy link
Author

vicuna commented Feb 1, 2016

Comment author: junsli

OK. Simple fix. I simply remove the default string printer and print strings (and bytes) when it fails to find the default one.

Without -safe-string, bytes is just an alias of string types, so printing worked before. With -safe-string, as a new type, bytes did not have a default printer, so it was printed as .

To replace the literal "..." with Oval_ellipsis, we have to either change type Oval_string to "Oval_string of Oval_value list" or "Oval_string of string * bool" (bool indicates whether string is truncated) to encode the Oval_ellipsis. I think they are over designed and also put unnecessary overhead during constructing and printing Oval_string values. what do you think?

@vicuna
Copy link
Author

vicuna commented Feb 1, 2016

Comment author: junsli

Two concerns.

  1. After I fixed this, I was thinking, wait a minute, what's wrong with unlimited string output?

  2. It could be simpler to get the string: "let str : string = O.obj obj in". It compiled fine with the compiler. What could go wrong if a mutable byte was wrongly treated as an immutable string and shortened by String.sub? Would memory leak occur?

@vicuna
Copy link
Author

vicuna commented Feb 1, 2016

Comment author: @gasche

I think the second patch is better because it is simpler. I wouldn't worry about Oval_ellipsis, as the solutions are too complex, but it could be nice to have a comment at this point to explain why "..." is used instead.

  1. After I fixed this, I was thinking, wait a minute, what's wrong with unlimited string output?

I would guess that it's the same argument as for unlimited printing in the general case: if the output is too long, it becomes harder to read.

  1. It could be simpler to get the string: "let str : string = O.obj obj in". It compiled fine with the compiler. What could go wrong if a mutable byte was wrongly treated as an immutable string and shortened by String.sub? Would memory leak occur?

There would not be a memory leak with the current implementation, but you could observe a difference if the bytes got mutated between this step and the final printing step.

I like the fact of having two different code paths instead of relying on the same-representation detail to get shorter code. However, the current way you first Bytes.to_string and then String.sub is suboptimal (it will copy the bytes data in full, to sub it later), I think it would make sense to reduplicate the length-testing logic and use Bytes.sub_string in the bytes path.

@vicuna
Copy link
Author

vicuna commented Feb 1, 2016

Comment author: junsli

Thanks for your comment Gabriel.

The change breaks tests/typing-extension-constructor/test.ml. I saw your editing history on this file. You may know what's going on. The test won't fail as long as strings are printed using print_string. The failing test has nothing to do with the truncation or bytes.

I like the fact of having two different code paths instead of relying on the same-representation detail to get shorter code. However, the current way you first Bytes.to_string and then String.sub is suboptimal (it will copy the bytes data in full, to sub it later), I think it would make sense to reduplicate the length-testing logic and use Bytes.sub_string in the bytes path.

Agreed. Awesome. I'll update the code accordingly, and send a pull request on github.

Thanks for your time.

@vicuna
Copy link
Author

vicuna commented Feb 3, 2016

Comment author: junsli

For future reference: #454.

@vicuna
Copy link
Author

vicuna commented Feb 28, 2017

Comment author: @damiendoligez

See also:
#1058 - #1058
#454 - #454

@vicuna
Copy link
Author

vicuna commented Jun 4, 2017

Comment author: @gasche

#1058 was merged in trunk (4.06~dev).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants