The JSON5 Data Interchange Format is a proposed extension to JSON that aims to make it easier for humans to write and maintain by hand. It does this by adding some minimal syntax features directly from ECMAScript 5.1.
Similar to JSON, JSON5 can represent four primitive types (strings, numbers, Booleans, and null) and two structured types (objects and arrays).
A string is a sequence of zero or more Unicode characters. Here, "Unicode" refers to the latest version of the Unicode Standard rather than to a specific release. Future changes to the Unicode Standard are not expected to affect the syntax of JSON5.
An object is an unordered collection of zero or more name/value pairs, where a name is a string or identifier and a value is a string, number, Boolean, null, object, or array.
An array is an ordered sequence of zero or more values.
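The following ECMAScript 5.1 features, which are not supported in JSON, have been extended to JSON5: object member names may be identifiers; objects and arrays may have a single trailing comma; strings may be single-quoted, may span multiple lines by escaping line terminators, and may use additional character escapes; numbers may be hexadecimal, may begin with an explicit plus sign, and may be the IEEE 754 values positive infinity, negative infinity, and NaN; single-line and multi-line comments are allowed; and additional white space characters are allowed.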
An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string or identifier. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. A single comma may follow the final name/value pair. The names within an object should be unique.
An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Implementations may report the last name/value pair only, report an error or fail to parse the object, or report all of the name/value pairs, including duplicates.
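A JSON5 object whose names are quoted and that uses no other JSON5 extensions is also valid JSON, so Python's standard `json` module can illustrate the three duplicate-name behaviors listed above. This is a sketch, not a JSON5 parser; the helper names `last_wins`, `reject_duplicates`, and `keep_all_pairs` are hypothetical.

```python
import json
from typing import Any


def last_wins(text: str) -> Any:
    """Keep only the last name/value pair (the json module's default)."""
    return json.loads(text)


def reject_duplicates(text: str) -> Any:
    """Report an error when an object repeats a name."""

    def hook(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
        result: dict[str, Any] = {}
        for name, value in pairs:
            if name in result:
                raise ValueError(f"duplicate name: {name!r}")
            result[name] = value
        return result

    return json.loads(text, object_pairs_hook=hook)


def keep_all_pairs(text: str) -> Any:
    """Report every name/value pair in order, including duplicates."""
    return json.loads(text, object_pairs_hook=list)


doc = '{"a": 1, "a": 2}'
print(last_wins(doc))       # {'a': 2}
print(keep_all_pairs(doc))  # [('a', 1), ('a', 2)]
```

Software that needs interoperable results should not rely on any one of these behaviors, which is why the names within an object should be unique.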
Implementations may make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by this.
An array structure is represented as square brackets surrounding zero or more values (or elements). Elements are separated by commas. A single comma may follow the final element.
There is no requirement that the values in an array be of the same type.
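Because Python's `json` module implements strict JSON, it makes a convenient baseline for seeing what JSON5 adds: mixed element types are accepted by both formats, while a trailing comma is valid JSON5 but rejected as JSON. The helper `accepts_as_json` is a hypothetical name used only for this illustration.

```python
import json


def accepts_as_json(text: str) -> bool:
    """Return True if Python's strict-JSON parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Mixed element types are allowed in both JSON and JSON5.
print(json.loads('[1, "two", null, [3]]'))  # [1, 'two', None, [3]]

# Both texts are valid JSON5, but strict JSON rejects the trailing comma.
print(accepts_as_json('[1, 2, 3,]'))  # False
print(accepts_as_json('{"a": 1,}'))   # False
```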
A string begins and ends with single or double quotation marks. The same quotation mark that begins a string must also end the string. All Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: the quotation mark used to begin and end the string, reverse solidus, and line terminators.
Any character may be escaped. If the character is in the Basic Latin or Latin-1 Supplement Unicode character ranges (U+0000 through U+00FF), then it may be represented as a four-character sequence: a reverse solidus, followed by the lower case letter x, followed by two hexadecimal digits that encode the character’s code point. A reverse solidus followed by the lower case letter x must be followed by two hexadecimal digits.
If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lower case letter u, followed by four hexadecimal digits that encode the character’s code point. A reverse solidus followed by the lower case letter u must be followed by four hexadecimal digits. The hexadecimal letters A through F can be upper or lower case.
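The two hexadecimal escape forms above can be decoded with one small routine. In this sketch, the function `read_hex_escape` is hypothetical; it takes the index of the reverse solidus and returns the decoded character together with the index just past the escape, so a caller can continue scanning.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def read_hex_escape(text: str, i: int) -> tuple[str, int]:
    """Decode a \\xHH or \\uHHHH escape whose reverse solidus is at text[i].

    Returns the decoded character and the index just past the escape.
    """
    if text[i : i + 1] != "\\":
        raise ValueError("an escape starts with a reverse solidus")
    letter = text[i + 1 : i + 2]
    if letter == "x":
        width = 2  # \xHH: Basic Latin and Latin-1 Supplement only
    elif letter == "u":
        width = 4  # \uHHHH: any code point in the Basic Multilingual Plane
    else:
        raise ValueError(f"not a hexadecimal escape: {text[i:i + 2]!r}")
    digits = text[i + 2 : i + 2 + width]
    # The letter must be followed by exactly `width` hex digits, in either case.
    if len(digits) != width or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"\\{letter} must be followed by {width} hex digits")
    return chr(int(digits, 16)), i + 2 + width


print(read_hex_escape(r"\x41", 0))        # ('A', 4)
print(read_hex_escape(r"caf\u00E9!", 3))  # ('é', 9)
```

Only the lower case letters x and u are recognized, matching the rule that the escape letter is lower case while the hexadecimal digits may be either case.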
To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a 12-character sequence, encoding the UTF-16 surrogate pair.
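The 12-character form is two consecutive \u escapes holding the UTF-16 surrogate pair. The sketch below shows the arithmetic in both directions; the function names `escape_code_point` and `combine_surrogates` are hypothetical.

```python
def escape_code_point(cp: int) -> str:
    """Return a \\u escape for a code point, using a UTF-16 surrogate pair
    (a 12-character sequence) when the code point is above U+FFFF."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)   # leading (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # trailing (low) surrogate
    return f"\\u{high:04X}\\u{low:04X}"


def combine_surrogates(high: int, low: int) -> int:
    """Inverse step for a decoder: turn a surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(escape_code_point(0x1F600))               # \uD83D\uDE00
print(hex(combine_surrogates(0xD83D, 0xDE00)))  # 0x1f600
```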
Alternatively, some commonly used characters can be represented by two-character escape sequences. A reverse solidus followed by a zero must not be immediately followed by a decimal digit.
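| Escape Sequence | Description | Code Point |
|---|---|---|
| `\'` | Apostrophe | U+0027 |
| `\"` | Quotation mark | U+0022 |
| `\\` | Reverse solidus | U+005C |
| `\b` | Backspace | U+0008 |
| `\f` | Form feed | U+000C |
| `\n` | Line feed | U+000A |
| `\r` | Carriage return | U+000D |
| `\t` | Horizontal tab | U+0009 |
| `\v` | Vertical tab | U+000B |
| `\0` | Null | U+0000 |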
A string may be continued on a new line by following a reverse solidus with one of the following line terminator sequences. The reverse solidus and line terminator sequence are not included in the string.
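| Code Points | Description |
|---|---|
| U+000A | Line feed |
| U+000D | Carriage return |
| U+000D U+000A | Carriage return and line feed |
| U+2028 | Line separator |
| U+2029 | Paragraph separator |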
If any other character follows a reverse solidus, except for the decimal digits 1 through 9, that character will be included in the string, but the reverse solidus will not.
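Taken together, the escape rules can be sketched as a single decoder for a quoted string literal. The names `decode_escape` and `decode_string` are hypothetical, and the sketch assumes the two-character escapes and line terminator sequences used by the JSON5 specification (line feed, carriage return, CR LF, U+2028, U+2029). It decodes one literal at a time and does not locate strings inside larger text.

```python
SINGLE_ESCAPES = {
    "'": "'", '"': '"', "\\": "\\",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}
LINE_TERMINATORS = "\n\r\u2028\u2029"
DIGITS = frozenset("0123456789")
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def decode_escape(text: str, i: int) -> tuple[str, int]:
    """Decode the escape whose reverse solidus is at text[i].

    Returns (decoded text, index just past the escape); a line continuation
    decodes to the empty string.
    """
    c = text[i + 1 : i + 2]
    if not c:
        raise ValueError("reverse solidus at end of input")
    if c in SINGLE_ESCAPES:
        return SINGLE_ESCAPES[c], i + 2
    if c == "0":
        # \0 is U+0000, but it must not be followed by a decimal digit.
        if text[i + 2 : i + 3] in DIGITS:
            raise ValueError("\\0 must not be followed by a decimal digit")
        return "\0", i + 2
    if c in DIGITS:
        raise ValueError(f"\\{c} is not a valid escape")
    if c in ("x", "u"):
        width = 2 if c == "x" else 4
        digits = text[i + 2 : i + 2 + width]
        if len(digits) != width or not set(digits) <= HEX_DIGITS:
            raise ValueError(f"\\{c} must be followed by {width} hex digits")
        return chr(int(digits, 16)), i + 2 + width
    if c in LINE_TERMINATORS:
        # Line continuation; CR LF is consumed as a single terminator.
        if c == "\r" and text[i + 2 : i + 3] == "\n":
            return "", i + 3
        return "", i + 2
    # Any other character stands for itself; the reverse solidus is dropped.
    return c, i + 2


def decode_string(literal: str) -> str:
    """Decode a complete single- or double-quoted JSON5 string literal."""
    quote = literal[:1]
    if quote not in ("'", '"') or len(literal) < 2 or literal[-1] != quote:
        raise ValueError("a string must begin and end with the same quotation mark")
    out: list[str] = []
    i, end = 1, len(literal) - 1
    while i < end:
        ch = literal[i]
        if ch == "\\":
            piece, i = decode_escape(literal, i)
            out.append(piece)
        elif ch == quote or ch in "\n\r":
            raise ValueError(f"{ch!r} must be escaped")
        else:
            out.append(ch)
            i += 1
    if i != end:
        # An escape consumed the closing quotation mark, as in 'abc\'
        raise ValueError("unterminated string")
    return "".join(out)


print(decode_string(r"'it\'s'"))          # it's
print(decode_string(r'"\x41\u0042\q"'))   # ABq
```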
Like JSON, JSON5 allows the Unicode code points U+2028 and U+2029 to appear unescaped in strings. Since ECMAScript 5.1 does not allow these code points in strings, authors should avoid including them unescaped in JSON5 strings.
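One way for a generator to follow this advice while keeping other non-ASCII text readable is to escape just these two code points. This sketch, with the hypothetical name `dumps_es5_safe`, relies on Python's `json.dumps`, whose output is JSON and therefore also valid JSON5.

```python
import json


def dumps_es5_safe(value) -> str:
    """Serialize with json.dumps, keeping non-ASCII text readable but escaping
    U+2028 and U+2029, which ECMAScript 5.1 does not allow unescaped in strings."""
    text = json.dumps(value, ensure_ascii=False)
    # json.dumps only emits these code points inside string literals,
    # so a plain replacement cannot touch structural white space.
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


print(dumps_es5_safe({"note": "caf\u00e9\u2028next"}))
# {"note": "café\u2028next"}
```

With the default `ensure_ascii=True`, `json.dumps` already escapes every non-ASCII character, including these two.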
The representation of numbers is similar to that used in most programming languages. A number may be represented in base 10 using decimal digits, in base 16 using hexadecimal digits, or as one of the IEEE 754 values positive infinity, negative infinity, or NaN.
Decimal numbers contain an integer component that may be prefixed with an optional plus or minus sign, which may be followed by a fraction part and/or an exponent part.
A fraction part is a decimal point followed by one or more digits.
An exponent part begins with the letter E in upper or lower case, which may be followed by a plus or minus sign. The E and optional sign are followed by one or more digits.
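The decimal form described above maps naturally onto a regular expression. This sketch covers only the forms in the prose (optional sign, integer component, optional fraction part, optional exponent part); the names `DECIMAL` and `parse_decimal` are hypothetical, and it assumes, as in ECMAScript 5.1, that a multi-digit integer component does not start with 0. It returns a Python `float`, that is, an IEEE 754 double.

```python
import re

# Sign, integer component, optional fraction part, optional exponent part.
# Assumption: as in ECMAScript 5.1, a multi-digit integer component does not
# start with 0.
DECIMAL = re.compile(r"[+-]?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def parse_decimal(token: str) -> float:
    """Validate a decimal number token and convert it to an IEEE 754 double."""
    if DECIMAL.fullmatch(token) is None:
        raise ValueError(f"not matched by this sketch: {token!r}")
    return float(token)  # float() accepts every string the pattern matches


print(parse_decimal("+1.5e3"))  # 1500.0
print(parse_decimal("-0.25"))   # -0.25
```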
Hexadecimal numbers contain the literal characters 0x or 0X, which may be prefixed with an optional plus or minus sign and must be followed by one or more hexadecimal digits. The hexadecimal letters A through F can be upper or lower case.
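A hexadecimal token can be validated and converted in a few lines. In this sketch the names `HEXADECIMAL` and `parse_hexadecimal` are hypothetical; the sign is handled separately so that only the digits are passed to `int(..., 16)`.

```python
import re

# Optional sign, then 0x or 0X, then one or more hex digits in either case.
HEXADECIMAL = re.compile(r"([+-]?)0[xX]([0-9a-fA-F]+)")


def parse_hexadecimal(token: str) -> int:
    match = HEXADECIMAL.fullmatch(token)
    if match is None:
        raise ValueError(f"not a hexadecimal number: {token!r}")
    sign, digits = match.groups()
    value = int(digits, 16)
    return -value if sign == "-" else value


print(parse_hexadecimal("0xFF"))      # 255
print(parse_hexadecimal("-0x1a"))     # -26
print(parse_hexadecimal("+0XdecaF"))  # 912559
```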
The IEEE 754 value positive infinity must be the literal characters Infinity and may be prefixed with an optional plus sign.
The IEEE 754 value negative infinity must be the literal characters -Infinity.
The IEEE 754 value NaN must be the literal characters NaN and may be prefixed with an optional plus or minus sign.
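Because these values are spelled exactly, a lookup table is a simple way to recognize them. The sketch below (hypothetical names `SPECIAL_VALUES` and `parse_special`) deliberately avoids Python's `float()` here, since `float()` also accepts spellings such as `inf` and `nan` that are not JSON5.

```python
import math

# Only these exact spellings are allowed; Python's float() would also accept
# "inf" or "nan", which are not JSON5.
SPECIAL_VALUES = {
    "Infinity": math.inf,
    "+Infinity": math.inf,
    "-Infinity": -math.inf,
    "NaN": math.nan,
    "+NaN": math.nan,
    "-NaN": math.nan,  # a sign is permitted but does not change the value here
}


def parse_special(token: str) -> float:
    """Map an IEEE 754 literal to its float value, rejecting anything else."""
    try:
        return SPECIAL_VALUES[token]
    except KeyError:
        raise ValueError(f"not an IEEE 754 literal: {token!r}") from None


print(parse_special("-Infinity"))  # -inf
```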
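Comments can be either single-line or multi-line. Multi-line comments cannot nest. Comments may appear before and after any lexical token.

A single-line comment begins with two solidus characters and ends at a line terminator or at the end of the input.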
A multi-line comment begins with a solidus and an asterisk and ends with an asterisk and a solidus. All Unicode characters may be placed within the start and end, except for an asterisk followed by a solidus.
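A preprocessing pass that removes comments has to skip over quoted strings, since comment markers inside a string are ordinary characters. This sketch (hypothetical name `strip_comments`) replaces each comment with a single space so that tokens on either side stay separated; it assumes line terminators are LF, CR, U+2028, and U+2029, and it does not otherwise validate the input.

```python
LINE_TERMINATORS = "\n\r\u2028\u2029"


def strip_comments(text: str) -> str:
    """Replace // and /* */ comments with a single space, leaving strings intact."""
    out: list[str] = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in "'\"":
            # Copy a quoted string verbatim, skipping over escaped characters.
            j = i + 1
            while j < n and text[j] != ch:
                j += 2 if text[j] == "\\" else 1
            out.append(text[i : j + 1])
            i = j + 1
        elif text.startswith("//", i):
            # Single-line comment: runs up to (not including) a line terminator.
            j = i + 2
            while j < n and text[j] not in LINE_TERMINATORS:
                j += 1
            out.append(" ")
            i = j
        elif text.startswith("/*", i):
            # Multi-line comment: ends at the first "*/" because comments do not nest.
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated multi-line comment")
            out.append(" ")
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(strip_comments('{a: 1, /* note */ b: "/* kept */"}'))
# {a: 1,   b: "/* kept */"}
```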
White space may appear before and after any lexical token.
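| Code Points | Description |
|---|---|
| U+0009 | Horizontal tab |
| U+000A | Line feed |
| U+000B | Vertical tab |
| U+000C | Form feed |
| U+000D | Carriage return |
| U+0020 | Space |
| U+00A0 | Non-breaking space |
| U+2028 | Line separator |
| U+2029 | Paragraph separator |
| U+FEFF | Byte order mark |
| Unicode Zs category | Any other character in the Space Separator Unicode category |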
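A white space test combines a fixed set of code points with a Unicode category check. This sketch assumes the JSON5 white space set is tab, line feed, vertical tab, form feed, carriage return, space, non-breaking space, U+2028, U+2029, the byte order mark U+FEFF, and any other character in category Zs; the names `is_json5_white_space` and `skip_white_space` are hypothetical.

```python
import unicodedata

# Assumed white space set: the explicitly listed code points plus any other
# character in the Unicode Zs (Space Separator) category.
EXPLICIT_WHITE_SPACE = frozenset(
    "\u0009\u000A\u000B\u000C\u000D\u0020\u00A0\u2028\u2029\uFEFF"
)


def is_json5_white_space(ch: str) -> bool:
    return ch in EXPLICIT_WHITE_SPACE or unicodedata.category(ch) == "Zs"


def skip_white_space(text: str, i: int) -> int:
    """Return the index of the first non-white-space character at or after i."""
    while i < len(text) and is_json5_white_space(text[i]):
        i += 1
    return i


print(skip_white_space("\uFEFF \u3000\t[1]", 0))  # 4
```

The explicit set is needed because U+FEFF is in category Cf and the control characters are in category Cc, so a Zs check alone would miss them.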
JSON5 is defined by a lexical grammar and a syntactic grammar. The lexical grammar defines productions that translate text into tokens, and the syntactic grammar defines productions that translate tokens into a JSON5 value.
All productions that do not begin with the characters “JSON5” are defined by productions of the ECMAScript 5.1 Lexical Grammar.
The lexical grammar for JSON5 has as its terminal symbols characters (Unicode code units) that conform to the rules for
Input elements other than white space and comments form the terminal symbols for the syntactic grammar for JSON5 and are called tokens. These tokens are the identifiers, literals, and punctuators of the JSON5 language. Simple white space and comments are discarded and do not appear in the stream of input elements for the syntactic grammar.
Productions of the lexical grammar are distinguished by having two colons “::” as separating punctuation.
The syntactic grammar for JSON5 has tokens defined by the lexical grammar as its terminal symbols. It defines a set of productions, starting from
When a stream of characters is to be parsed as a JSON5 value, it is first converted to a stream of input elements by repeated application of the lexical grammar; this stream of input elements is then parsed by a single application of the syntactic grammar. The text is syntactically in error if the tokens in the stream of input elements cannot be parsed as a single instance of the goal nonterminal
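The two-phase process (characters to tokens, then tokens to one value) can be sketched in a few dozen lines. This is a toy under stated assumptions, not the JSON5 grammar: it handles only punctuators, decimal integers, quoted strings without escapes, identifiers, and `true`/`false`/`null`, and it uses Python's `str.isspace` as a stand-in for JSON5 white space. The names `tokenize`, `Parser`, and `parse` are hypothetical.

```python
import re
from typing import Any

# Lexical phase, limited to a subset: punctuators, decimal integers,
# quoted strings without escapes, and identifiers.
TOKEN = re.compile(
    r"(?P<punct>[{}\[\]:,])"
    r"|(?P<number>[+-]?(?:0|[1-9][0-9]*))"
    r"""|(?P<string>"[^"\\\r\n]*"|'[^'\\\r\n]*')"""
    r"|(?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)"
)
LITERALS = {"true": True, "false": False, "null": None}


def tokenize(text: str) -> list[tuple[str, str]]:
    tokens: list[tuple[str, str]] = []
    pos = 0
    while True:
        while pos < len(text) and text[pos].isspace():  # discard white space
            pos += 1
        if pos == len(text):
            return tokens
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()


class Parser:  # syntactic phase: token list -> a single value
    def __init__(self, tokens: list[tuple[str, str]]) -> None:
        self.tokens, self.i = tokens, 0

    def peek(self) -> tuple[str, str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else ("eof", "")

    def at(self, punct: str) -> bool:
        return self.peek() == ("punct", punct)

    def expect(self, punct: str) -> None:
        if not self.at(punct):
            raise ValueError(f"expected {punct!r}, got {self.peek()[1]!r}")
        self.i += 1

    def take(self) -> tuple[str, str]:
        if self.i >= len(self.tokens):
            raise ValueError("unexpected end of input")
        self.i += 1
        return self.tokens[self.i - 1]

    def value(self) -> Any:
        kind, text = self.take()
        if (kind, text) == ("punct", "{"):
            return self.members()
        if (kind, text) == ("punct", "["):
            return self.elements()
        if kind == "number":
            return int(text)
        if kind == "string":
            return text[1:-1]
        if kind == "ident" and text in LITERALS:
            return LITERALS[text]
        raise ValueError(f"unexpected token {text!r}")

    def elements(self) -> list[Any]:
        items: list[Any] = []
        while not self.at("]"):
            items.append(self.value())
            if not self.at("]"):
                self.expect(",")  # a comma just before "]" is the trailing comma
        self.expect("]")
        return items

    def members(self) -> dict[str, Any]:
        obj: dict[str, Any] = {}
        while not self.at("}"):
            kind, name = self.take()
            if kind == "string":
                name = name[1:-1]
            elif kind != "ident":
                raise ValueError(f"invalid member name {name!r}")
            self.expect(":")
            obj[name] = self.value()
            if not self.at("}"):
                self.expect(",")
        self.expect("}")
        return obj


def parse(text: str) -> Any:
    parser = Parser(tokenize(text))
    result = parser.value()
    if parser.peek()[0] != "eof":  # the stream must form exactly one value
        raise ValueError(f"unexpected token {parser.peek()[1]!r}")
    return result


print(parse("{name: 'json5', tags: [1, 2, 3,], ok: true,}"))
# {'name': 'json5', 'tags': [1, 2, 3], 'ok': True}
```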
Productions of the syntactic grammar are distinguished by having just one colon “:” as punctuation.
A JSON5 parser transforms a JSON5 text into another representation. A JSON5 parser must accept all texts that conform to the JSON5 grammar.
An implementation may set limits on the size of texts that it accepts, the maximum depth of nesting, the range and precision of numbers, and the length and character contents of strings.
A JSON5 generator produces JSON5 text. The resulting text must strictly conform to the JSON5 grammar.
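A small generator can emit conforming text by using JSON5 extensions only where they are unambiguous. In this sketch (hypothetical name `to_json5`), member names that are plain ASCII identifiers are left unquoted, everything else is quoted with `json.dumps`, and non-finite floats become Infinity, -Infinity, or NaN. It is an illustration, not a complete generator.

```python
import json
import math
import re
from typing import Any

IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")  # ASCII-only subset


def to_json5(value: Any) -> str:
    """Serialize Python data as JSON5 text (a sketch, not a full generator)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        return repr(value)
    if isinstance(value, str):
        # json.dumps yields a double-quoted string that is also valid JSON5;
        # with the default ensure_ascii=True it escapes U+2028 and U+2029 too.
        return json.dumps(value)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_json5(v) for v in value) + "]"
    if isinstance(value, dict):
        members = []
        for name, v in value.items():
            if not isinstance(name, str):
                raise TypeError("object names must be strings")
            key = name if IDENTIFIER.fullmatch(name) else json.dumps(name)
            members.append(f"{key}: {to_json5(v)}")
        return "{" + ", ".join(members) + "}"
    raise TypeError(f"cannot serialize {type(value).__name__}")


print(to_json5({"name": "JSON5", "max": math.inf, "my key": [1, 2.5, None, True]}))
# {name: "JSON5", max: Infinity, "my key": [1, 2.5, null, true]}
```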
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.
All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes.
Examples in this specification are introduced with the words “for example” or are set apart from the normative text like this:
Informative notes begin with the word “Note” and are set apart from the normative text like this:
The MIT License (MIT)
Copyright (c) 2017 Aseem Kishore, Jordan Tucker
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.