A format specifier for bytes #6429

Open
vicuna opened this issue May 17, 2014 · 16 comments

Comments

@vicuna

vicuna commented May 17, 2014

Original bug ID: 6429
Reporter: @whitequark
Status: confirmed (set by @damiendoligez on 2014-05-21T15:32:19Z)
Resolution: open
Priority: normal
Severity: feature
Target version: undecided
Category: standard library
Monitored by: braibant @diml

Bug description

Without a %S-like format specifier for Bytes, debugging becomes extremely annoying--the code is littered with bogus conversions everywhere. Perhaps it's possible to provide one?

It would also make sense to provide a non-escaped equivalent, analogous to "%s".

@vicuna
Author

vicuna commented May 17, 2014

Comment author: @gasche

I have no idea of what a good syntax would be. My only idea would be to reuse "%#s" (currently considered as "%s", but planning-to-be-outlawed in 4.02 as it doesn't mean anything), and it's mediocre at best.

@vicuna
Author

vicuna commented May 17, 2014

Comment author: @whitequark

Well, you could disambiguate the conflict with %b/%B by using the second letter: %y/%Y, in the same way as e.g. options for Unix tools are disambiguated. This is what I would expect, at least.

@vicuna
Author

vicuna commented May 17, 2014

Comment author: @gasche

Note that in the meantime, we could make sure that

"%a" Bytes.print by
"%a" Bytes.to_string by

works (by adding the relevant functions if need be for some *printf function), which would already have reasonable readability.
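As a rough sketch of what the "%a" route can look like with the current standard library: `output_bytes` already has the type `Printf.printf` expects for "%a", while an escaped, %S-like rendering needs a small helper. The names `show_bytes` and `print_bytes` below are hypothetical and not part of the standard library.

```ocaml
(* Hypothetical helper: an escaped, %S-like rendering of a byte sequence.
   The unit argument gives it the shape Printf.sprintf expects for "%a". *)
let show_bytes () (b : bytes) : string =
  Printf.sprintf "%S" (Bytes.to_string b)

(* Hypothetical helper: the same rendering, shaped for Printf.printf "%a". *)
let print_bytes (oc : out_channel) (b : bytes) : unit =
  output_string oc (show_bytes () b)

let () =
  let by = Bytes.of_string "hi\n" in
  (* Stdlib.output_bytes prints the bytes unescaped. *)
  Printf.printf "raw: %a" output_bytes by;
  Printf.printf "escaped: %a\n" print_bytes by;
  print_endline (Printf.sprintf "via sprintf: %a" show_bytes by)
```

The copy made by `Bytes.to_string` is the price of not touching the format machinery itself.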

@vicuna
Author

vicuna commented May 21, 2014

Comment author: @damiendoligez

While we decide which letter to use, you should use this for debugging without too much pain:

let (!!) = Bytes.unsafe_to_string;;
Printf.printf "hello %s\n" !!my_byte_sequence;;

For the letter, I think Y is a good candidate. My first idea was Z but we might want to use that for bignums at some point in the future.

As for %#s, that would make the type depend on the format's flags rather than the letter. A very bad idea.

@vicuna
Author

vicuna commented Sep 21, 2014

Comment author: @gasche

I considered implementing this, but I'm not happy with having both %y and %Y.

The problem is that the usual semantics of the big-letter version is "as written in OCaml source code", so it would seem natural that whichever output syntax is chosen for %Y also produces valid OCaml literals; but we have no literal syntax for bytes.

On the other hand, the escaped-printing behavior of %S is certainly more useful than the non-escaped behavior of %s for bytes, for the "byte sequences" applications that have no reason to stay in the printable ASCII range; so if we had only one formatter for bytes, it should probably have the semantics of %S.
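To illustrate the difference being discussed, here is a small sketch that goes through today's string conversions (there is no bytes conversion yet, so `Bytes.to_string` stands in for it). The sample byte sequence is made up for the example.

```ocaml
(* A byte sequence that leaves the printable ASCII range. *)
let by = Bytes.of_string "\x00\xffA\n"

(* %s-like behaviour: the bytes are emitted as they are, including the
   NUL byte and the newline. *)
let raw = Printf.sprintf "%s" (Bytes.to_string by)

(* %S-like behaviour: quoted, with non-printable bytes escaped using
   OCaml's lexical conventions. *)
let escaped = Printf.sprintf "%S" (Bytes.to_string by)

let () = print_endline escaped
```

The escaped form stays readable on a terminal whatever the bytes contain, which is the argument for giving a single bytes conversion the %S semantics.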

@vicuna
Author

vicuna commented Feb 25, 2015

Comment author: @damiendoligez

I think the parallel with %s/%S is quite natural, so if you want to implement only one, it should be %Y...

@vicuna
Author

vicuna commented Apr 21, 2016

Comment author: @whitequark

I think implementing just %Y is a good idea.

@github-actions

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 13, 2020
@gasche
Member

gasche commented May 13, 2020

I believe this is still a relevant feature request.

Thinking about it again, a y format that outputs the bytes directly, without escaping, may have value after all: for example, if people have bytes that contain terminal escapes or things like that.
We could also support xy to print the bytes in hexadecimal.

@gasche
Member

gasche commented May 13, 2020

Marking this as "newcomer job advanced": the Format machinery uses advanced types, one has to be familiar with GADTs to work on them, but then it is doable for a newcomer to add support for a new conversion by imitating the existing code.
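For readers unfamiliar with the style, here is a toy sketch of a GADT-typed format with a bytes conversion. It is not the real `CamlinternalFormat` code: the type `fmt`, its constructors, and `toy_sprintf` are all invented for illustration and only mimic the general idea that the type index of a format determines the arguments the printing function expects.

```ocaml
(* Toy format type: the index is the type of the printer built from it. *)
type _ fmt =
  | End : string fmt                       (* end of format *)
  | Lit : string * 'a fmt -> 'a fmt        (* literal text *)
  | Str : 'a fmt -> (string -> 'a) fmt     (* a %s-like conversion *)
  | Byt : 'a fmt -> (bytes -> 'a) fmt      (* hypothetical bytes conversion *)

(* Continuation-passing interpreter; the explicit polymorphic annotation
   is needed because each constructor refines the type index. *)
let rec kprintf : type a. (string -> string) -> a fmt -> a =
  fun k -> function
    | End -> k ""
    | Lit (s, rest) -> kprintf (fun r -> k (s ^ r)) rest
    | Str rest -> fun s -> kprintf (fun r -> k (s ^ r)) rest
    | Byt rest ->
      fun b ->
        let shown = Printf.sprintf "%S" (Bytes.to_string b) in
        kprintf (fun r -> k (shown ^ r)) rest

let toy_sprintf fmt = kprintf (fun s -> s) fmt

let () =
  print_endline (toy_sprintf (Lit ("by = ", Byt End)) (Bytes.of_string "a\n"))
```

Adding a conversion in this toy means adding one constructor and one interpreter case; the real machinery has more pieces, but the comment above suggests the work follows a similar pattern of imitating existing cases.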

@nojb
Contributor

nojb commented May 13, 2020

> We could also support xy to print the bytes in hexadecimal.

This would actually be quite useful.
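Until such a conversion exists, hexadecimal output can be approximated with a helper passed through "%s". The function name `hex_of_bytes` is hypothetical, and the lowercase, separator-free layout is only one possible choice.

```ocaml
(* Hypothetical helper: render each byte as two lowercase hex digits. *)
let hex_of_bytes (b : bytes) : string =
  let buf = Buffer.create (2 * Bytes.length b) in
  Bytes.iter
    (fun c -> Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
    b;
  Buffer.contents buf

let () =
  Printf.printf "%s\n" (hex_of_bytes (Bytes.of_string "\x48\x85\xf6"))
```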

@gasche
Member

gasche commented May 13, 2020

I should point out that the suggestion is inspired by @dra27's work on #9446.

@github-actions github-actions bot added the Stale label Aug 19, 2022
@shindere shindere removed the Stale label Aug 29, 2022
@shindere shindere self-assigned this Aug 29, 2022
@shindere
Contributor

shindere commented Aug 29, 2022 via email

@XVilka
Contributor

XVilka commented Jul 3, 2023

Just food for thought: for debugging, a sparse hexadecimal format might be more useful. At least, it is quite useful for reverse-engineering tasks:

[0x0040bbe0]> px
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x0040bbe0  4885 f674 6b55 5348 83ec 0848 8b46 5048  H..tkUSH...H.FPH
0x0040bbf0  8b2d aa75 2200 488b 1d53 a422 0048 8905  .-.u".H..S.".H..
0x0040bc00  9c75 2200 488b 4620 4885 c074 0d48 8338  .u".H.F H..t.H.8
0x0040bc10  00ba 0000 0000 480f 44c2 4889 fe48 c7c2  ......H.D.H..H..
0x0040bc20  ffff ffff 31ff 4889 0523 a422 00e8 bef4  ....1.H..#."....
0x0040bc30  ffff 4889 2d67 7522 0048 891d 10a4 2200  ..H.-gu".H....".
0x0040bc40  4883 c408 5b5d c366 0f1f 8400 0000 0000  H...[].f........
0x0040bc50  4889 fe48 c7c2 ffff ffff 31ff e98f f4ff  H..H......1.....
0x0040bc60  ff0f 1f44 0000 662e 0f1f 8400 0000 0000  ...D..f.........
0x0040bc70  4155 4154 5553 4883 ec08 488b 2df7 7622  AUATUSH...H.-.v"
0x0040bc80  0048 8b1d f876 2200 48c7 05e5 7622 0000  .H...v".H...v"..
0x0040bc90  0000 0048 85f6 7478 488b 4650 4c8b 2dfd  ...H..txH.FPL.-.
0x0040bca0  7422 004c 8b25 a6a3 2200 4889 05ef 7422  t".L.%..".H...t"
0x0040bcb0  0048 8b46 2048 85c0 740d 4883 3800 ba00  .H.F H..t.H.8...
0x0040bcc0  0000 0048 0f44 c248 89fe 48c7 c2ff ffff  ...H.D.H..H.....
0x0040bcd0  ff31 ff48 8905 76a3 2200 e811 f4ff ff4c  .1.H..v."......L
[0x0040bbe0]> pxi
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
  40bbe0: .H 85 f6 .t .k .U .S .H 83 ec 08 .H 8b .F .P .H
  40bbf0: 8b .- aa .u ."    .H 8b 1d .S a4 ."    .H 89 05
  40bc00: 9c .u ."    .H 8b .F .  .H 85 c0 .t 0d .H 83 .8
  40bc10:    ba             .H 0f .D c2 .H 89 fe .H c7 c2
  40bc20: ## ## ## ## .1 ## .H 89 05 .# a4 ."    e8 be f4
  40bc30: ## ## .H 89 .- .g .u ."    .H 89 1d 10 a4 ."
  40bc40: .H 83 c4 08 .[ .] c3 .f 0f 1f 84
  40bc50: .H 89 fe .H c7 c2 ## ## ## ## .1 ## e9 8f f4 ##
  40bc60: ## 0f 1f .D       .f .. 0f 1f 84
  40bc70: .A .U .A .T .U .S .H 83 ec 08 .H 8b .- f7 .v ."
  40bc80:    .H 8b 1d f8 .v ."    .H c7 05 e5 .v ."
  40bc90:          .H 85 f6 .t .x .H 8b .F .P .L 8b .- fd
  40bca0: .t ."    .L 8b .% a6 a3 ."    .H 89 05 ef .t ."
  40bcb0:    .H 8b .F .  .H 85 c0 .t 0d .H 83 .8    ba
  40bcc0:          .H 0f .D c2 .H 89 fe .H c7 c2 ## ## ##
  40bcd0: ## .1 ## .H 89 05 .v a3 ."    e8 11 f4 ## ## .L
  40bce0 ]
[0x0040bbe0]>

The first is the "normal" hex, the second is the sparse "hexII" format invented by Ange Albertini.

See more information at https://speakerdeck.com/ange/no-more-dumb-hex
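A minimal sketch of the per-byte rule, inferred only from the pxi dump above and not from the hexII specification itself: 0x00 is left blank, 0xff becomes "##", printable ASCII is shown as a dot followed by the character, and everything else is two hex digits. The names `hexii_cell` and `hexii_row` are hypothetical, and offsets and the column ruler are left out.

```ocaml
(* Hypothetical helper: one two-character cell per byte. *)
let hexii_cell (c : char) : string =
  match Char.code c with
  | 0x00 -> "  "                                        (* zero bytes are blank *)
  | 0xff -> "##"
  | n when n >= 0x20 && n < 0x7f -> Printf.sprintf ".%c" c  (* printable ASCII *)
  | n -> Printf.sprintf "%02x" n

(* Hypothetical helper: cells separated by single spaces. *)
let hexii_row (b : bytes) : string =
  Bytes.to_seq b |> Seq.map hexii_cell |> List.of_seq |> String.concat " "

let () =
  print_endline (hexii_row (Bytes.of_string "\x48\x85\xf6\x74\x6b"))
```

Fed the first five bytes of the dump, this reproduces the start of the first pxi row.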

@shindere shindere removed their assignment Jul 11, 2023