Notation
January 4, 2023
The syntax is specified using a variant of Extended Backus-Naur Form (EBNF):
Ah, the good ol’ Extended Backus-Naur Form…

EBNF is a fairly popular way to describe a formal language. There’s a very good chance you’ve seen something like this before, especially if you’ve ever found yourself reading an RFC. But if you haven’t, this is a great time to familiarize yourself with the basic concepts. I won’t go into a detailed explanation of EBNF or of Wirth syntax notation (WSN, the variant used in the Go spec), as there are better resources online. Here I just want to go over the very basics: enough that if you read the rest of this series, you won’t be completely lost even if you never read anything else about EBNF.
In EBNF, the syntax is described using a set of rules. Each rule consists of a non-terminal symbol, which represents a syntactic construct, and a definition of that construct in the form of a sequence of terminal symbols and/or non-terminal symbols. Terminal symbols represent the individual tokens that make up the language, such as keywords, identifiers, and punctuation. Non-terminal symbols represent higher-level syntactic constructs, such as expressions, statements, and declarations.
In this context, a terminal symbol is a literal symbol that appears as-is in the source text, and a nonterminal symbol is a placeholder that gets replaced by its definition, somewhat like a variable in programming.
EBNF also includes a few special notations for describing the syntax of a language in a more concise way.
Quoting from the Go Spec, this is the formal definition of WSN as used throughout the rest of the document. Note that this set of rules is defining WSN itself, as used within the Go Spec. It’s not directly defining Go itself!
Syntax = { Production } .
Production = production_name "=" [ Expression ] "." .
Expression = Term { "|" Term } .
Term = Factor { Factor } .
Factor = production_name | token [ "…" token ] | Group | Option | Repetition .
Group = "(" Expression ")" .
Option = "[" Expression "]" .
Repetition = "{" Expression "}" .
Productions are expressions constructed from terms and the following operators, in increasing precedence:
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
Lowercase production names are used to identify lexical (terminal) tokens. Non-terminals are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes ``.
The form a … b represents the set of characters from a through b as alternatives. The horizontal ellipsis … is also used elsewhere in the spec to informally denote various enumerations or code snippets that are not further specified. The character … (as opposed to the three characters ...) is not a token of the Go language.
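To make the a … b form a bit less abstract, here’s a small sketch of my own. The spec defines decimal digits with the production decimal_digit = "0" … "9", and a range like that maps naturally onto a pair of comparisons. The function name isDecimalDigit is just something I made up for illustration; it isn’t part of the spec or the standard library.

```go
package main

import "fmt"

// isDecimalDigit is a hypothetical helper mirroring the spec production
//
//	decimal_digit = "0" … "9" .
//
// The range form means "any one character from 0 through 9".
func isDecimalDigit(r rune) bool {
	return '0' <= r && r <= '9'
}

func main() {
	fmt.Println(isDecimalDigit('7'), isDecimalDigit('x')) // prints: true false
}
```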
Let’s break some of that down in layman’s terms, to make it as clear as possible.
Syntax = { Production } .
This rule defines the nonterminal symbol Syntax. It is defined as containing 0 or more repetitions of Production, which is itself a nonterminal symbol defined in the following rule:
Production = production_name "=" [ Expression ] "." .
This rule defines Production, which is itself defined as a production_name followed by a literal equal sign (=), then 0 or 1 Expression (explained in yet another rule), and terminated by a literal period (.).
What is this production_name? That’s explained in the second-to-last paragraph quoted above: “Lowercase production names are used to identify lexical (terminal) tokens.” So production_name is a lexical (terminal) token.
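If it helps to see rules like these as code, here’s a rough sketch of a recognizer for the WSN grammar quoted above, with one function per nonterminal. This is my own illustration, not anything taken from the Go spec or toolchain: the names parser, acceptIf, accept, startsFactor and valid are all made up, and it assumes every token is separated by whitespace (which happens to be true of the rules as the spec prints them, but would break on a quoted token that contains a space). It only answers “is this well-formed?” and builds no syntax tree.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// parser walks WSN tokens; the last token is always the "" end marker.
type parser struct {
	toks []string
	pos  int
}

// acceptIf consumes the current token when pred reports true for it.
func (p *parser) acceptIf(pred func(string) bool) bool {
	ok := pred(p.toks[p.pos])
	if ok {
		p.pos++
	}
	return ok
}

// accept consumes the current token when it equals tok.
func (p *parser) accept(tok string) bool {
	return p.acceptIf(func(t string) bool { return t == tok })
}

// isName reports whether t begins like a production name.
func isName(t string) bool {
	return strings.IndexFunc(t, func(r rune) bool { return r == '_' || unicode.IsLetter(r) }) == 0
}

// isQuoted reports whether t is wrapped in double quotes or back quotes.
func isQuoted(t string) bool {
	return len(t) >= 2 && (t[0] == '"' || t[0] == '`') && t[len(t)-1] == t[0]
}

// closing maps each opening bracket to its partner.
var closing = map[string]string{"(": ")", "[": "]", "{": "}"}

// startsFactor reports whether t can begin a Factor.
func startsFactor(t string) bool {
	return isName(t) || isQuoted(t) || closing[t] != ""
}

// Production = production_name "=" [ Expression ] "." .
func (p *parser) production() bool {
	return p.acceptIf(isName) && p.accept("=") &&
		(p.accept(".") || p.expression() && p.accept(".")) // []: Expression may be absent
}

// Expression = Term { "|" Term } .
func (p *parser) expression() bool {
	ok := p.term()
	for ok && p.accept("|") {
		ok = p.term()
	}
	return ok
}

// Term = Factor { Factor } .
func (p *parser) term() bool {
	ok := p.factor()
	for ok && startsFactor(p.toks[p.pos]) {
		ok = p.factor()
	}
	return ok
}

// Factor = production_name | token [ "…" token ] | Group | Option | Repetition .
func (p *parser) factor() bool {
	switch t := p.toks[p.pos]; {
	case p.acceptIf(isName):
		return true
	case p.acceptIf(isQuoted):
		return !p.accept("…") || p.acceptIf(isQuoted) // optional range form
	case closing[t] != "":
		p.pos++
		return p.expression() && p.accept(closing[t])
	}
	return false
}

// valid checks src against Syntax = { Production } .
func valid(src string) bool {
	p := &parser{toks: append(strings.Fields(src), "")}
	ok := true
	for ok && p.toks[p.pos] != "" { // {}: zero or more Productions
		ok = p.production()
	}
	return ok
}

func main() {
	fmt.Println(valid(`letter = unicode_letter | "_" .`)) // prints: true
}
```

Notice how the notation maps onto ordinary control flow: {} repetition becomes a for loop, [] option becomes an “either this or skip it” check, and | alternation becomes a loop that keeps accepting more terms as long as it sees another |.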
Let’s make this a bit more concrete by jumping ahead a couple of days, to the WSN rule the Go Spec uses to define a letter:
letter = unicode_letter | "_" .
This tells us that a letter, in a Go source code file, is defined as a unicode_letter or the underscore (_). unicode_letter is defined elsewhere, which we’ll of course look at in due time.
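As a quick sketch of what that rule amounts to in practice, here’s how I might write it as a Go function. The name isLetter is my own, and I’m assuming that the standard library’s unicode.IsLetter is a reasonable stand-in for unicode_letter, which we haven’t actually looked at yet. The | in the rule turns into a plain ||.

```go
package main

import (
	"fmt"
	"unicode"
)

// isLetter is a hypothetical helper mirroring the rule
//
//	letter = unicode_letter | "_" .
//
// Assumption: unicode.IsLetter stands in for unicode_letter.
func isLetter(r rune) bool {
	return unicode.IsLetter(r) || r == '_'
}

func main() {
	fmt.Println(isLetter('x'), isLetter('_'), isLetter('9')) // prints: true true false
}
```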
The Go Programming Language Specification, Version of June 29, 2022