Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better explanation of multi-argument vs. tuple constructors in sum types #7783

Open
vicuna opened this issue Apr 23, 2018 · 27 comments · May be fixed by #12348
Open

Better explanation of multi-argument vs. tuple constructors in sum types #7783

vicuna opened this issue Apr 23, 2018 · 27 comments · May be fixed by #12348
Assignees

Comments

@vicuna
Copy link

vicuna commented Apr 23, 2018

Original bug ID: 7783
Reporter: pmetzger
Assigned to: @Octachron
Status: assigned (set by @Octachron on 2018-04-24T14:14:22Z)
Resolution: open
Priority: normal
Severity: minor
Platform: N/A
OS: N/A
OS Version: N/A
Version: 4.06.1
Category: documentation
Related to: #5883
Monitored by: @nojb @hhugo @gasche

Bug description

The manual does not sufficiently explain a subtle point. Consider the following declaration:

type foo =
| A of intint
| B of (int
int)

Although these look similar, A and B are quite different. A is a constructor taking two arguments, even though it looks like it takes a tuple, while B of course takes a tuple.

This means that the two cannot be used interchangeably and in fact have different enough meanings to matter in a lot of contexts.

This is confusing to a beginner because it would seem like the parentheses are just meaningless precedence grouping, but they aren’t, they’re syntactically quite significant, and it would seem like the * in the A case means “tuple” but it does not.

(Note that I myself didn't understand this distinction. The manual does not make it particularly clear. That is why I am filing this bug report. A discussion on the discord chat channel seemed to reveal that many beginners were confused about this.)

It would be good to make this very explicit in the manual.

Steps to reproduce

  1. Try learning OCaml
  2. Become confused about multi-argument constructors vs. tuple constructors in sum types.
@vicuna
Copy link
Author

vicuna commented Apr 23, 2018

Comment author: pmetzger

BTW, during discussion, what cemented the distinction in my mind was someone noting that, if constructors had function types (as in Haskell), then:

A: int -> int -> foo
while
B: (int * int) -> foo

This made the difference much clearer to me. (The use of * in the A case is unfortunate I think, but that decision was made decades ago.)

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: sampo

OCaml provides pairs, triples, quadruples and so forth. For n >= 2, the collection of n values is called an n-tuple, or just a tuple. The tuple whose components are x1,x2,...,xn is written (x1,x2,...,xn). Such a value is created by an expression of the form (E1,E2,...,En). With functions, tuples give the effect of multiple arguments and results.
#type foo =
| A of int * int
| B of (int * int);;
type foo = A of int * int | B of (int * int)

A(1,2);;

  • : foo = A (1, 2)

B(1,2);;

  • : foo = B (1, 2)

let average(x,y) = (x+y) / 2;;

val average : int * int -> int =

This writting is also the same:
val average : (int * int) -> int =

I hope this will help.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: pmetzger

Sampo, I'm not sure what you were suggesting? The point is that there's a difference between a constructor that takes multiple values and a constructor that takes a single tuple, but the syntax is so similar that beginners have problems with this, and it is important to document the difference carefully. I am not talking about a misunderstanding that I myself am experiencing, I'm suggesting that documentation on this subtlety be improved. If you were proposing documentation, I'm not sure that really explains it well enough.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: pmetzger

Here is some possible documentation that could be added in section 1.4 of the manual ("Records and Variants"):

Note that although the syntax is almost the same, there is a significant difference between a variant constructor that takes multiple arguments and one that takes a single tuple as an argument. Consider the following type:

type foo =
| A of intint
| B of (int
int)

Although the syntax for A and B seems nearly the same, the parentheses are quite significant. A is a constructor that takes two arguments, both of which are integer typed values, while B is a constructor that takes a single argument, a tuple of two integers. These behave quite differently in many circumstances, such as when pattern matching. Be careful not to confuse them.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: sampo

@pmetzger.
A is a constructor taking a tuple, not arguments. This is a "Cartesian product". The values themselves can be grouped in pairs or even in tuples.
int * int is a pair or a tuple likewise (int * int). There is no difference.
However, external parentheses are not obligatory. The "Cartesian product" is denoted by the symbol *.
type such as these:
t1 * t2 * t3
(t1 * t2) * t3
t1 * (t2 * t3)
The first indicates a set of triples.
The second indicates a set of pairs where the first component is also a pair.
The third indicates a set of pairs where the second component is also a pair.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: @Octachron

@sampo, I am afraid that you are mistaken. Constructor arguments are not packed in a tuple by default: there are differences in both behaviors in pattern matching and memory representations between type t = A of int * int and type s = B of (int * int). I recommand playing with caml-inspect and reading the memory representation chapter of RWO to learn about the difference in memory representation.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: sampo

@Octachron,a pair or a 2-tuple will always remain a pair or a 2-tuple, with or without parentheses and this has existed since the birth of CAML and continues with Ocaml. This is a fact.
The representation in memory is another matter.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: @trefis

No.

type t = A of int * string;;

type t = A of int * string

let x = (3, "haha");;

val x : int * string = (3, "haha")

A x;;

Characters 0-3:
A x;;
^^^
Error: The constructor A expects 2 argument(s),
but is applied here to 1 argument(s)

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: pmetzger

Sampo: apparently, the documentation update I'm requesting is even more necessary than I had naively expected.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: sampo

@trefis, yes, this is an example that shows the power of Ocaml typing. But the two expected arguments have a name and are called a pair or a 2-tuple. And that makes the difference that apparently you do not discern. (respectfully speaking) As I said above, a type int * int or (int * int) is the same and is a component value written in parenthesis like this (1,2), a comma serves as the separator between elements. In this case (your example) the argument is not a value and another value but it is a pair or a 2-tuple that is expected. Reading the error that the compiler gives can cause confusion in the understanding, and apparently that's
what he does.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: pmetzger

Sampo:

Abusing notation (since unlike Haskell, constructors don't have types):

The type of "A" in "type t = A of int * int" would be "A: int -> int -> int"
The type of "B" in "type s = B of (int * int)" would be "B: (int * int) -> int"

These are not the same. The "*" in the first case is rather unfortunate notation, but we're stuck with it.

This is why I think this really needs to be documented. It confuses people, even quite smart people, and nothing in the documentation really explains it.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2018

Comment author: pmetzger

(And again note that if you feed a tuple to A, it won't work it is expecting two arguments, but if you feed a tuple to B it will. They are not the same.)

@vicuna
Copy link
Author

vicuna commented Apr 25, 2018

Comment author: sampo

@pmetzger
This is a matter of writing. Let's go back to the past, at the time of CAML.
Write these examples in the Caml top level.
#type t = A of int * int;;
Le type t est défini.
#type s = B of (int * int);;
Le type s est défini.
As you can see there is no signature representing the types with or without parenthesis. Only an acknowledgment of receipt. Then write:

A = B;;
^
Cette expression est de type int * int -> s,
mais est utilisée avec le type int * int -> t.
#B = A;;
Entrée interactive:
B = A;;
^
Cette expression est de type int * int -> t,
mais est utilisée avec le type int * int -> s.
The corresponding signature is int * int, no parenthesis.

Come back to today, with Ocaml write those same sentences in the top level:

type t = A of int * int;;

type t = A of int * int

type s = B of (int * int);;

type s = B of (int * int)
The writing (the answer) of the interpreter is different. Then write:

A = B;;

Characters 0-1:
A = B;;
^
Error: The constructor A expects 2 argument(s),
but is applied here to 0 argument(s)

B = A;;

Characters 0-1:
B = A;;
^
Error: The constructor B expects 1 argument(s),
but is applied here to 0 argument(s)

As you can see, the writing (the answer) of the interpreter is different.
However, it's still the same thing in all these past years.

Perhaps, to summarize this situation, it is necessary to simply improve the error message written in the top level, for a better comprehension?

@vicuna
Copy link
Author

vicuna commented Apr 25, 2018

Comment author: @Octachron

Sampo, you are getting confused by CAML promotion of constructor to function when they are partially applied. I assure you that constructors do have an arity in OCaml, as stated by the compiler error messages and the reference manual(e.g. http://caml.inria.fr/pub/docs/manual-ocaml/expr.html#sec146).

@vicuna
Copy link
Author

vicuna commented Apr 26, 2018

Comment author: @hhugo

Would it help to make the parallel with inline record ?

@vicuna
Copy link
Author

vicuna commented Apr 26, 2018

Comment author: pmetzger

See: https://discuss.ocaml.org/t/sum-type-constructor-declaration-subtlety-and-documentation-wanted/1898 for some proposed language.

@vicuna
Copy link
Author

vicuna commented Apr 27, 2018

Comment author: sampo

I read the debate with particular attention. Many agree and apparently I will be the only one who does not agree.
In fact, I understand your reasoning at all, but I think that in this particular situation you have to take a solid foundation, a base, that is the Cartesian product. And a Cartesian product always indicates an n-tuple, composed of argument(s), singleton, 2-tuples, 3-tuples,..., n-tuples with or without parenthesis.
So to resume the example of bobbypriambodo I will say that:
type account =
| Facebook(string, int) /* not 2 arguments but a 2-tuple or pair*/;
type account2 =
| Instagram((string, int)) /* not 1 argument -not happens to be a 2-tuple but also a 2-tuple or pair*/;
This is my point of view and I do not get confused.

@vicuna
Copy link
Author

vicuna commented Apr 29, 2018

Comment author: sampo

@Octachron. I would like to say one more thing before you change the documentation. I resumed the example of trefis with a small change.

type bar = int * string;;

type bar = int * string

type t = A of bar;;

type t = A of bar

let x = (3, "haha");;

val x : int * string = (3, "haha")

A x;;

  • : t = A (3, "haha")
    Here, below, another example that shows some difficulty in calculating the number of arguments.

let addpoint ((x1,y1), (x2,y2)) = (x1+.x2, y1+.y2);;

val addpoint : (float * float) * (float * float) -> float * float =
This function take a pair of points. Its arguments pattern is a pair of pairs.
Look again at the argument pattern of addpoint. We may equivalently view this function as taking:

  • one argument: a pair of pairs of float numbers
  • two arguments: each a pair of float numbers
  • four arguments: all float numbers, oddly grouped
    There is a probability that the compiler is mistaken in calculating the arity of the expressions given to it in a constructor with or without parentheses, between an argument and two arguments like the example above.

@github-actions
Copy link

github-actions bot commented May 7, 2020

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 7, 2020
@pmetzger
Copy link
Member

This problem isn't really fixed; it should still be looked at. It's just a documentation change. I note that the move from the original bug tracker has messed up most of the formatting.

@github-actions github-actions bot removed the Stale label Jun 10, 2020
@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Jun 16, 2021
@Octachron Octachron removed the Stale label Jun 16, 2021
@pmetzger
Copy link
Member

@Octachron if I submitted a pull request for a documentation fix, would it be helpful?

@Octachron
Copy link
Member

It would certainly be helpful and I would gladly review this documentation fix.

@github-actions
Copy link

github-actions bot commented Jul 1, 2022

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Jul 1, 2022
@Octachron Octachron removed the Stale label Jul 1, 2022
@pmetzger
Copy link
Member

pmetzger commented Jul 1, 2022

I still should write this up.

@github-actions
Copy link

github-actions bot commented Jul 3, 2023

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Jul 3, 2023
gasche added a commit to gasche/ocaml that referenced this issue Jul 3, 2023
…e-argument variant constructors

fixes ocaml#7783, see also
https://discuss.ocaml.org/t/sum-type-constructor-declaration-subtlety-and-documentation-wanted/1898

The new content is included at the end of the (fairly brief)
presentation of records and variants in the "tutorial / language tour"
part of the manual.

This is arguably more advanced content and it could also somewhere
else, but it is unclear that the "reference manual" part of the manual
would be more appropriate.

It depends on how we think of this explanation/warning and its
intended target audience: is it a pitfall that all OCaml beginners
should be told about to avoid confusion later (similar to: `Array.make
xlen (Array.make ylen 0)`), or is it advanced content that we want
teachers/mentors to be able to point to as a reference explanation?

This commits tries to aim for the "beginner tour" aspect, as suggested
by past discussions, so it is intentionally brief and light on
details.
@pmetzger
Copy link
Member

pmetzger commented Jul 7, 2023

As this is being discussed in a pull request, it should not be marked stale (or at least not currently.)

@github-actions github-actions bot removed the Stale label Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants