Lexical conventions

Blanks

The following characters are considered as blanks: space, newline, horizontal tabulation, carriage return, line feed and form feed. Blanks are ignored, but they separate adjacent identifiers, literals and keywords that would otherwise be confused as one single identifier, literal or keyword.

Comments

Comments are introduced by the two characters (*, with no intervening blanks, and terminated by the characters *), with no intervening blanks. Comments are treated as blank characters. Comments do not occur inside string or character literals. Nested comments are correctly handled.
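
As an illustration of nested comment handling, the following Python sketch replaces each comment by one blank while tracking the nesting depth. The function name strip_comments is hypothetical, and the sketch assumes that the input contains no string or character literals; a real lexer would have to skip those first, since comments do not occur inside literals.

```python
def strip_comments(source: str) -> str:
    """Replace each (possibly nested) comment with a single space.

    Assumption of this sketch: `source` contains no string or
    character literals.
    """
    out = []
    depth = 0  # current comment nesting depth
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "(*":
            depth += 1
            i += 2
        elif pair == "*)" and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(" ")  # a whole comment is treated as a blank
            i += 2
        else:
            if depth == 0:
                out.append(source[i])  # ordinary text outside comments
            i += 1
    if depth > 0:
        raise ValueError("unterminated comment")
    return "".join(out)
```

Because the depth counter only returns to zero when every opening (* has been matched, an inner *) does not end the outer comment.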

Identifiers

ident:
      letter {letter | 0...9 | _}

letter:
      A...Z | a...z

Identifiers are sequences of letters, digits and _ (the underscore character), starting with a letter. Letters include at least the 52 lowercase and uppercase letters from the ASCII set. Implementations can recognize as letters other characters from the extended ASCII set. Identifiers cannot contain two adjacent underscore characters (__). Implementations may limit the number of characters of an identifier, but this limit must be above 256 characters. All characters in an identifier are meaningful.
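
The rules above can be sketched as a small Python check. The names IDENT_RE and is_identifier are hypothetical. The sketch recognizes only the 52 ASCII letters, although implementations may accept more, and it does not exclude reserved keywords.

```python
import re

# letter {letter | 0...9 | _}, restricted to the 52 ASCII letters.
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_identifier(text: str) -> bool:
    """Return True if `text` has the lexical shape of an identifier."""
    # The grammar alone would accept "a__b"; the prose rule forbids it.
    return IDENT_RE.fullmatch(text) is not None and "__" not in text
```

With this check, x_1 and list_length are accepted, while _x (no leading letter), 1a (leading digit) and a__b (adjacent underscores) are rejected.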

Integer literals

integer-literal:
      [-] {0...9}+
   |  [-] (0x | 0X) {0...9 | A...F | a...f}+
   |  [-] (0o | 0O) {0...7}+
   |  [-] (0b | 0B) {0...1}+

An integer literal is a sequence of one or more digits, optionally preceded by a minus sign. By default, integer literals are in decimal (radix 10). The following prefixes select a different radix:
        Prefix      Radix
        0x, 0X      hexadecimal (radix 16)
        0o, 0O      octal (radix 8)
        0b, 0B      binary (radix 2)
(The initial 0 is the digit zero; the O for octal is the letter O.)
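
As a hedged illustration, the grammar for integer literals can be transcribed into a Python regular expression and used to compute the denoted value. The names INT_RE and integer_value are hypothetical, and the sketch ignores any limit on the range of machine integers.

```python
import re

# One alternative per radix; the optional minus sign is captured first.
INT_RE = re.compile(
    r"(-?)(?:0[xX]([0-9A-Fa-f]+)|0[oO]([0-7]+)|0[bB]([01]+)|([0-9]+))"
)

def integer_value(literal: str) -> int:
    """Return the value denoted by an integer literal."""
    m = INT_RE.fullmatch(literal)
    if m is None:
        raise ValueError(f"not an integer literal: {literal!r}")
    sign, hex_digits, oct_digits, bin_digits, dec_digits = m.groups()
    if hex_digits is not None:
        value = int(hex_digits, 16)
    elif oct_digits is not None:
        value = int(oct_digits, 8)
    elif bin_digits is not None:
        value = int(bin_digits, 2)
    else:
        value = int(dec_digits, 10)  # no prefix: decimal, even with leading zeros
    return -value if sign else value
```

For instance, 0x1F, 0o37 and 0b11111 all denote the same value as the decimal literal 31.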

Floating-point literals

float-literal:
      [-] {0...9}+ [. {0...9}] [(e | E) [+ | -] {0...9}+]

A floating-point literal consists of an integer part, a decimal part and an exponent part. The integer part is a sequence of one or more digits, optionally preceded by a minus sign. The decimal part is a decimal point followed by zero or more digits. The exponent part is the character e or E followed by an optional + or - sign, followed by one or more digits. Either the decimal part or the exponent part can be omitted, but not both, to avoid ambiguity with integer literals.
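
The requirement that the decimal part and the exponent part cannot both be omitted can be sketched in Python as follows. The names FLOAT_RE and is_float_literal are hypothetical.

```python
import re

# Integer part, optional decimal part (group 1), optional exponent part (group 2).
FLOAT_RE = re.compile(r"-?[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?")

def is_float_literal(text: str) -> bool:
    """Return True if `text` has the lexical shape of a float literal."""
    m = FLOAT_RE.fullmatch(text)
    if m is None:
        return False
    # With neither part present, the text is an integer literal instead.
    return m.group(1) is not None or m.group(2) is not None
```

Under this check, 1. and 2e10 and 1.5e-3 are floating-point literals, 42 is not (it is an integer literal), and .5 is not (the integer part is mandatory).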

Character literals

char-literal:
      ` regular-char `
   |  ` \ (\ | ` | n | t | b | r) `
   |  ` \ (0...9) (0...9) (0...9) `

Character literals are delimited by ` (backquote) characters. The two backquotes enclose either one character different from ` and \, or one of the escape sequences below:
        Sequence    Character denoted
        \\          backslash (\)
        \`          backquote (`)
        \n          newline (LF)
        \r          return (CR)
        \t          horizontal tabulation (TAB)
        \b          backspace (BS)
        \ddd        the character with ASCII code ddd in decimal
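
A minimal Python sketch of how a character literal could be decoded according to the grammar and escape table above. The names CHAR_RE, SIMPLE_ESCAPES and char_value are hypothetical. Rejecting codes above 255 is an assumption of the sketch; the text above does not state what happens for such codes.

```python
import re

# Escape letter -> character denoted, as in the escape table.
SIMPLE_ESCAPES = {"\\": "\\", "`": "`", "n": "\n", "r": "\r", "t": "\t", "b": "\b"}

# ` regular-char `  |  ` \ escape-letter `  |  ` \ d d d `
CHAR_RE = re.compile(r"`(?:([^`\\])|\\([\\`ntbr])|\\([0-9]{3}))`")

def char_value(literal: str) -> str:
    """Return the character denoted by a backquote-delimited literal."""
    m = CHAR_RE.fullmatch(literal)
    if m is None:
        raise ValueError(f"not a character literal: {literal!r}")
    regular, simple, decimal = m.groups()
    if regular is not None:
        return regular
    if simple is not None:
        return SIMPLE_ESCAPES[simple]
    code = int(decimal)
    if code > 255:  # assumption: codes are limited to one byte
        raise ValueError(f"character code out of range: {code}")
    return chr(code)
```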

String literals

string-literal:
      " {string-character} "

string-character:
      regular-char
   |  \ (\ | " | n | t | b | r)
   |  \ (0...9) (0...9) (0...9)

String literals are delimited by " (double quote) characters. The two double quotes enclose a sequence of either characters different from " and \, or escape sequences from the table below:
        Sequence    Character denoted
        \\          backslash (\)
        \"          double quote (")
        \n          newline (LF)
        \r          return (CR)
        \t          horizontal tabulation (TAB)
        \b          backspace (BS)
        \ddd        the character with ASCII code ddd in decimal

Implementations must support string literals up to 2^{16}-1 characters in length (65535 characters).
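
The string-literal grammar can be sketched in Python in two steps: validate the whole literal, then substitute each escape sequence in the body. The names STRING_RE, ESCAPE_RE, ESCAPES and string_value are hypothetical, and the sketch does not enforce any length limit.

```python
import re

# Escape letter -> character denoted, as in the escape table.
ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "r": "\r", "t": "\t", "b": "\b"}

# " {string-character} " with the body captured in group 1.
STRING_RE = re.compile(r'"((?:[^"\\]|\\[\\"ntbr]|\\[0-9]{3})*)"')
# A single escape sequence: a letter escape (group 1) or \ddd (group 2).
ESCAPE_RE = re.compile(r'\\(?:([\\"ntbr])|([0-9]{3}))')

def string_value(literal: str) -> str:
    """Return the character sequence denoted by a string literal."""
    m = STRING_RE.fullmatch(literal)
    if m is None:
        raise ValueError(f"not a string literal: {literal!r}")

    def decode(esc):
        simple, decimal = esc.groups()
        return ESCAPES[simple] if simple is not None else chr(int(decimal))

    # Escapes are replaced left to right, so \\n yields a backslash then n.
    return ESCAPE_RE.sub(decode, m.group(1))
```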

Keywords

The identifiers below are reserved as keywords, and cannot be employed otherwise:

        and    as     begin       do      done      downto 
        else   end    exception   for     fun       function 
        if     in     let         match   mutable   not 
        of     or     prefix      rec     then      to 
        try    type   value       where   while     with

The following character sequences are also keywords:

        #    !    !=   &    (    )    *    *.   +    +. 
        ,    -    -.   ->   .    .(   /    /.   :    :: 
        :=   ;    ;;   <    <.   <-   <=   <=.  <>   <>. 
        =    =.   ==   >    >.   >=   >=.  @    [    [| 
        ]    ^    _    __   {    |    |]   }    '

Ambiguities

Lexical ambiguities are resolved according to the "longest match" rule: when a character sequence can be decomposed into two tokens in several different ways, the decomposition retained is the one with the longest first token.
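
The longest-match rule can be illustrated with a Python sketch that splits a character sequence into symbolic keywords only. The names SYMBOLS, BY_LENGTH and split_symbols are hypothetical. Identifiers, literals and comments are deliberately left out, so this is not a complete lexer.

```python
# The symbolic keywords listed in the Keywords section.
SYMBOLS = [
    "#", "!", "!=", "&", "(", ")", "*", "*.", "+", "+.",
    ",", "-", "-.", "->", ".", ".(", "/", "/.", ":", "::",
    ":=", ";", ";;", "<", "<.", "<-", "<=", "<=.", "<>", "<>.",
    "=", "=.", "==", ">", ">.", ">=", ">=.", "@", "[", "[|",
    "]", "^", "_", "__", "{", "|", "|]", "}", "'",
]
# Trying longer candidates first yields the longest match at each position.
BY_LENGTH = sorted(SYMBOLS, key=len, reverse=True)

def split_symbols(text: str) -> list:
    """Split `text` into symbolic keywords using the longest-match rule."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] in " \n\t\r\f":  # blanks separate tokens and are skipped
            i += 1
            continue
        for sym in BY_LENGTH:
            if text.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            raise ValueError(f"no symbolic keyword at position {i}")
    return tokens
```

For example, [|] could be read as [ followed by |], or as [| followed by ]; the rule retains the second reading because its first token is longer. A blank forces the other reading: < - is two tokens, whereas <- is one.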