Standard JSON5

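JSON5String::
    "JSON5DoubleStringCharacters opt"
    'JSON5SingleStringCharacters opt'

JSON5DoubleStringCharacters::
    JSON5DoubleStringCharacter JSON5DoubleStringCharacters opt

JSON5SingleStringCharacters::
    JSON5SingleStringCharacter JSON5SingleStringCharacters opt

JSON5DoubleStringCharacter::
    SourceCharacter but not one of " or \ or LineTerminator
    \ EscapeSequence
    LineContinuation
    U+2028
    U+2029

JSON5SingleStringCharacter::
    SourceCharacter but not one of ' or \ or LineTerminator
    \ EscapeSequence
    LineContinuation
    U+2028
    U+2029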

5.1 Escapes

Any character may be escaped. If the character is in the Basic Latin or Latin-1 Supplement Unicode character ranges (U+0000 through U+00FF), then it may be represented as a four-character sequence: a reverse solidus, followed by the lower case letter x, followed by two hexadecimal digits that encode the character’s code point. A reverse solidus followed by the lower case letter x must be followed by two hexadecimal digits.

If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lower case letter u, followed by four hexadecimal digits that encode the character’s code point. A reverse solidus followed by the lower case letter u must be followed by four hexadecimal digits. The hexadecimal letters A through F can be upper or lower case.

Example 1 (Informative)

A string containing only a single reverse solidus character may be represented as '\x5C' or '\u005C'.

To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a 12-character sequence, encoding the UTF-16 surrogate pair.

Example 2 (Informative)

A string containing only the musical score character 🎼 (U+1F3BC) may be represented as '\uD83C\uDFBC'.
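The three escape forms above can be illustrated with a minimal Python sketch. The function name escape_code_point is hypothetical and not part of JSON5; the sketch always picks the shortest hexadecimal form, although the longer forms are equally valid.

```python
def escape_code_point(cp: int) -> str:
    """Return a JSON5 hexadecimal escape for one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp <= 0xFF:
        # Basic Latin or Latin-1 Supplement: \x plus two hex digits.
        return "\\x{:02X}".format(cp)
    if cp <= 0xFFFF:
        # Rest of the Basic Multilingual Plane: \u plus four hex digits.
        return "\\u{:04X}".format(cp)
    # Outside the BMP: encode the UTF-16 surrogate pair (12 characters).
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return "\\u{:04X}\\u{:04X}".format(high, low)
```

For U+1F3BC this yields the same surrogate pair as Example 2.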

Alternatively, there are two-character sequence escape representations of some popular characters. A decimal digit must not follow a reverse solidus followed by a zero.

Table 1: Escape sequences

Escape Sequence	Description	Code Point
`\'`	Apostrophe	U+0027
`\"`	Quotation mark	U+0022
`\\`	Reverse solidus	U+005C
`\b`	Backspace	U+0008
`\f`	Form feed	U+000C
`\n`	Line feed	U+000A
`\r`	Carriage return	U+000D
`\t`	Horizontal tab	U+0009
`\v`	Vertical tab	U+000B
`\0`	Null	U+0000

Example 3 (Informative)

A string containing only a single reverse solidus character may be represented more compactly as '\\'.

A string may be continued on a new line by following a reverse solidus with one of the following line terminator sequences. The reverse solidus and line terminator sequence are not included in the string.

Table 2: Line terminator sequences

Code Points	Description
U+000A	Line feed
U+000D	Carriage return
U+000D U+000A	Carriage return and line feed
U+2028	Line separator
U+2029	Paragraph separator

Example 4 (Informative)

The following strings represent the same information.

'Lorem ipsum dolor sit amet, \
consectetur adipiscing elit.'

'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'

If any other character follows a reverse solidus, except for the decimal digits 1 through 9, that character will be included in the string, but the reverse solidus will not.

Example 5 (Informative)

The following strings represent the same information.

'\A\C\/\D\C'

'AC/DC'
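Taken together, the escape rules in this section could be applied to the text between a string's quotes roughly as follows. This is a sketch, not a reference implementation: unescape_json5 and its helpers are hypothetical names, the input is assumed to be the string body with the surrounding quotes already removed, and errors are reported with a plain ValueError.

```python
SINGLE_CHAR_ESCAPES = {
    "'": "'", '"': '"', "\\": "\\", "b": "\b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}
# CR LF is listed first so that it is consumed as one sequence.
LINE_TERMINATOR_SEQUENCES = ("\r\n", "\n", "\r", "\u2028", "\u2029")
HEX_DIGITS = "0123456789abcdefABCDEF"


def _hex_value(body: str, start: int, count: int) -> int:
    digits = body[start:start + count]
    if len(digits) != count or any(d not in HEX_DIGITS for d in digits):
        raise ValueError("expected %d hexadecimal digits at offset %d" % (count, start))
    return int(digits, 16)


def unescape_json5(body: str) -> str:
    """Resolve escapes in the body of a JSON5 string (quotes already removed)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        i += 1
        if ch != "\\":
            out.append(ch)
            continue
        if i >= len(body):
            raise ValueError("reverse solidus at end of string")
        # Line continuation: drop the reverse solidus and the terminator.
        for seq in LINE_TERMINATOR_SEQUENCES:
            if body.startswith(seq, i):
                i += len(seq)
                break
        else:
            esc = body[i]
            i += 1
            if esc == "x":
                out.append(chr(_hex_value(body, i, 2)))
                i += 2
            elif esc == "u":
                out.append(chr(_hex_value(body, i, 4)))
                i += 4
            elif esc == "0":
                if i < len(body) and body[i] in "0123456789":
                    raise ValueError("decimal digit after \\0")
                out.append("\0")
            elif esc in "123456789":
                raise ValueError("decimal digits 1 through 9 cannot be escaped")
            else:
                # Table 1 escapes; any other character stands for itself.
                out.append(SINGLE_CHAR_ESCAPES.get(esc, esc))
    # Join UTF-16 surrogate pairs written as two \u escapes.
    joined = "".join(out)
    return joined.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")
```

The final encode and decode step is a Python-specific detail: Python strings hold code points rather than UTF-16 code units, so a surrogate pair has to be merged explicitly.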

5.2 Paragraph and Line Separators

Like JSON, JSON5 allows the Unicode code points U+2028 and U+2029 to appear unescaped in strings. Since ECMAScript 5.1 does not allow these code points in strings, authors should avoid including them in JSON5 documents. JSON5 parsers should produce a warning when they are found unescaped in strings. JSON5 generators should escape these code points in strings.
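A generator or a parser could follow these recommendations along the lines of the Python sketch below. Both function names are hypothetical; how a warning is actually surfaced is left to the implementation.

```python
def escape_separators(text: str) -> str:
    """Generator side: replace U+2028 and U+2029 with their \\u escapes."""
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


def find_unescaped_separators(text: str) -> list:
    """Parser side: offsets of raw U+2028 or U+2029, e.g. to emit warnings."""
    return [i for i, ch in enumerate(text) if ch in "\u2028\u2029"]
```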

6 Numbers

The representation of numbers is similar to that used in most programming languages. A number may be represented in base 10 using decimal digits, base 16 using hexadecimal digits, or the IEEE 754 values positive infinity, negative infinity, or NaN.

JSON5Number::
    JSON5NumericLiteral
    + JSON5NumericLiteral
    - JSON5NumericLiteral

JSON5NumericLiteral::
    NumericLiteral
    Infinity
    NaN

Decimal numbers contain an integer component that may be prefixed with an optional plus or minus sign, which may be followed by a fraction part and/or an exponent part.

A fraction part is a decimal point followed by one or more digits.

An exponent part begins with the letter E in upper or lower case, which may be followed by a plus or minus sign. The E and optional sign are followed by one or more digits.

Example 1 (Informative)

{
    integer: 123,
    withFractionPart: 123.456,
    onlyFractionPart: .456,
    withExponent: 123e-456,
}

Hexadecimal numbers contain the literal characters 0x or 0X that may be prefixed with an optional plus or minus sign, which must be followed by one or more hexadecimal digits. The hexadecimal letters A through F can be upper or lower case.

Example 2 (Informative)

{
    positiveHex: 0xdecaf,
    negativeHex: -0xC0FFEE,
}

The IEEE 754 value positive infinity must be the literal characters Infinity and may be prefixed with an optional plus sign.

The IEEE 754 value negative infinity must be the literal characters -Infinity.

The IEEE 754 value NaN must be the literal characters NaN and may be prefixed with an optional plus or minus sign.

Example 3 (Informative)

{
    positiveInfinity: Infinity,
    negativeInfinity: -Infinity,
    notANumber: NaN,
}
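The number forms described in this section can be recognized with a short Python sketch. parse_json5_number is a hypothetical name, and the decimal pattern assumes the ECMAScript 5.1 NumericLiteral shape (no superfluous leading zeros; a trailing decimal point is accepted). Every result is returned as a Python float, mirroring IEEE 754 doubles.

```python
import math
import re

# Assumed shape of an ECMAScript 5.1 decimal literal, without a sign.
_DECIMAL = re.compile(
    r"(?:(?:0|[1-9][0-9]*)(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
)
_HEX = re.compile(r"0[xX][0-9a-fA-F]+")


def parse_json5_number(text: str) -> float:
    """Convert the text of one JSON5 number into a float."""
    sign = 1.0
    body = text
    if text[:1] in ("+", "-"):
        sign = -1.0 if text[0] == "-" else 1.0
        body = text[1:]
    if body == "Infinity":
        return sign * math.inf
    if body == "NaN":
        return math.nan
    if _HEX.fullmatch(body):
        # int() accepts the 0x or 0X prefix when the base is 16.
        return sign * float(int(body, 16))
    if _DECIMAL.fullmatch(body):
        return sign * float(body)
    raise ValueError("not a JSON5 number: %r" % text)
```

Note that a value such as 123e-456 is too small for a double and underflows to zero in this sketch.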

7 Comments

Comments can be either single or multi-line. Multi-line comments cannot nest. Comments may appear before and after any JSON5Token.

A single line comment begins with two soliduses and ends with a LineTerminator or the end of the document. All Unicode characters may be placed within the start and end, except for a LineTerminator.

A multi-line comment begins with a solidus and an asterisk and ends with an asterisk and a solidus. All Unicode characters may be placed within the start and end, except for an asterisk followed by a solidus.

Example (Informative)

// This is a single line comment.

/* This is a multi-
   line comment. */
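A scanner might step over a comment as in the following Python sketch. skip_comment is a hypothetical helper; it assumes the caller only invokes it between tokens, never inside a string.

```python
LINE_TERMINATORS = "\n\r\u2028\u2029"


def skip_comment(text: str, pos: int) -> int:
    """Return the index just past the comment starting at pos, or pos if none."""
    if text.startswith("//", pos):
        end = pos + 2
        # Runs up to, but not including, the LineTerminator or end of document.
        while end < len(text) and text[end] not in LINE_TERMINATORS:
            end += 1
        return end
    if text.startswith("/*", pos):
        # The first "*/" closes the comment, so comments cannot nest.
        close = text.find("*/", pos + 2)
        if close == -1:
            raise ValueError("unterminated multi-line comment")
        return close + 2
    return pos
```

Because the search stops at the first asterisk followed by a solidus, an inner "/*" has no special meaning, which is why multi-line comments do not nest.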

8 White Space

White space may appear before and after any JSON5Token.

Table 3: White space

Code Points	Description
U+0009	Horizontal tab
U+000A	Line feed
U+000B	Vertical tab
U+000C	Form feed
U+000D	Carriage return
U+0020	Space
U+00A0	Non-breaking space
U+2028	Line separator
U+2029	Paragraph separator
U+FEFF	Byte order mark
Unicode Zs category	Any other character in the Space Separator Unicode category
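Table 3 translates directly into a membership test. The Python sketch below uses the standard unicodedata module for the Space Separator category; is_json5_white_space is a hypothetical name, and the set of Zs characters depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# The code points listed explicitly in Table 3.
_WHITE_SPACE = frozenset("\t\n\v\f\r \u00a0\u2028\u2029\ufeff")


def is_json5_white_space(ch: str) -> bool:
    """Return True if the single character ch counts as JSON5 white space."""
    return ch in _WHITE_SPACE or unicodedata.category(ch) == "Zs"
```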

9 Grammar

JSON5 is defined by a lexical grammar and a syntactic grammar. The lexical grammar defines productions that translate text into tokens, and the syntactic grammar defines productions that translate tokens into a JSON5 value.

All productions that do not begin with the characters “JSON5” are defined by productions of the ECMAScript 5.1 Lexical Grammar.

9.1 Lexical Grammar

The lexical grammar for JSON5 has as its terminal symbols characters (Unicode code units) that conform to the rules for JSON5SourceCharacter. It defines a set of productions, starting from the goal symbol JSON5InputElement, that describe how sequences of such characters are translated into a sequence of input elements.

Input elements other than white space and comments form the terminal symbols for the syntactic grammar for JSON5 and are called tokens. These tokens are the identifiers, literals, and punctuators of the JSON5 language. Simple white space and comments are discarded and do not appear in the stream of input elements for the syntactic grammar.

Productions of the lexical grammar are distinguished by having two colons “::” as separating punctuation.

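JSON5Punctuator:: one of
    { } [ ] : ,

JSON5String::
    "JSON5DoubleStringCharacters opt"
    'JSON5SingleStringCharacters opt'

JSON5DoubleStringCharacters::
    JSON5DoubleStringCharacter JSON5DoubleStringCharacters opt

JSON5SingleStringCharacters::
    JSON5SingleStringCharacter JSON5SingleStringCharacters opt

JSON5DoubleStringCharacter::
    SourceCharacter but not one of " or \ or LineTerminator
    \ EscapeSequence
    LineContinuation
    U+2028
    U+2029

JSON5SingleStringCharacter::
    SourceCharacter but not one of ' or \ or LineTerminator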