A Formalization of Symbolic Expressions

Analysis and Formalization of draft-rivest-sexp-00.txt

This whole section is about the original document made available in 1997. In each of the following sections, the first subsection analyzes the original document for ambiguities and contradictions, and explains how they are resolved. The second subsection shows and explains the formal model of what is discussed in the first subsection. Formalization only guarantees that an instance of the model has exactly the same bugs as its model, so the model itself needs to be validated. The third subsection therefore shows proofs of correctness for each example in the original document.

First we need to import a module from the Idris2 standard library:

We want Idris2 to fail to type-check the code if totality cannot be verified for any of the functions, which is done with the following pragma:

Note that this pragma actually makes Idris2 non-Turing complete for the code in this document. Our type is indexed over a list of octets, which makes it a dependent type. Per the Curry-Howard isomorphism, this type acts as a proposition that can be read in plain English as "there exists a list of octets that is a valid s-expr". Idris's Bits8 is what the IETF calls an octet, so a List Bits8 is a list of octets. Note that we do not use characters at all in this formalization, as s-expr are defined for octets and not for characters, but it is possible to convert a list of octets into an equivalent character string in the Idris REPL:

    pack $ map (chr . cast) [51, 58, 97, 98, 99]
    "3:abc"

Alternatively the show function can be used directly on a list of octets after loading the code in the REPL:

Our indexed types are actually families of types, one for each possible value of the index of type List Bits8. Because there is an infinite number of possible values for the index, that actually defines an infinite number of types, one for each possible list of octets, each either a valid s-expr or not. Only the types in that family that are indexed over a list of octets that is a valid s-expr can have an instance that type-checks. Per the Curry-Howard isomorphism, that instance is considered a proof of the corresponding proposition. Conversely, the impossibility of finding an instance of a specific type is a proof that the index is not a valid s-expr.

We need a way to extract the underlying octet-string for each representation that we are going to define. This is done by declaring an ad-hoc polymorphic function in an interface:

    interface OctetString a where
      octetString : a -> List Bits8

Verbatim Representation

Analysis Section 4.1 of the original s-expr document states:

There is a slight confusion here between an octet-string and its representation as verbatim, which is understandable in this context because they look exactly the same. There is no possible BNF that is sound for the verbatim representation.

Formalization We first need to reimplement the Idris2 function that converts a number into an equivalent list of octets using the ASCII encoding, as we generally cannot use functions that use primitives in types because they do not reduce:

    base10 : Nat -> List Bits8
    base10 0 = [48]
    base10 x = base' [] x
      where
        base' : List Bits8 -> Nat -> List Bits8
        base' xs 0 = xs
        base' xs n = let (d, m) = divmodNatNZ n 10 SIsNonZero
                         m' = cast (m + 0x30)
                     in assert_total $ base' (m' :: xs) d

Then the Verbatim type is defined for the verbatim representation of an octet-string:

    data Verbatim : List Bits8 -> Type where
      MkVerbatim : (xs : List Bits8) ->
        Verbatim (base10 (length xs) ++ [58] ++ xs)

Then we define the octetString function for the Verbatim type:

Validation Idris2 is expressive enough to allow embedding unit tests in the same source and run them as part of the type-checking. Here we prove that all the examples in section 4.1 of the original document are valid instances of the Verbatim type:

3:abc
7:subject
4:::::
12:hello world!
10:abcdefghij
0:
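As an informal cross-check outside of the type checker, the verbatim representation can be sketched in a few lines of Python. This is only an illustration under the assumption that the model is "decimal length, colon, octets"; the function names mirror the Idris2 ones but the Python code is not part of the formalization.

```python
def base10(n: int) -> list:
    """ASCII decimal digits of a natural number, most significant first."""
    if n == 0:
        return [48]
    digits = []
    while n > 0:
        n, m = divmod(n, 10)
        digits.insert(0, m + 0x30)  # prepend the digit, as base' does with (::)
    return digits


def verbatim(octets: bytes) -> bytes:
    """Decimal length, then ':' (58), then the octets themselves."""
    return bytes(base10(len(octets)) + [58]) + octets


# Octet-strings underlying the examples of section 4.1.
for s in [b"abc", b"subject", b"::::", b"hello world!", b"abcdefghij", b""]:
    print(verbatim(s).decode("ascii"))
```

Unlike the Idris2 proofs, this only tests the listed examples at run time; it says nothing about octet lists that are not tried.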

Quoted-String Representation

Analysis Section 4.2 of the original s-expr document states:

There is no possible BNF that is sound for the quoted-string representation when preceded with the length. Section 4.2 continues with:

    \<carriage-return>             -- causes carriage-return to be ignored.
    \<line-feed>                   -- causes linefeed to be ignored
    \<carriage-return><line-feed>  -- causes CRLF to be ignored.
    \<line-feed><carriage-return>  -- causes LFCR to be ignored.

Here the first sentence does not match the list of line terminators below it. We assume that there are four line terminators, not two. In C the escape sequence '\0' is defined for characters, but not for strings, as a 0 value can never appear in a C string. That restriction does not apply to a quoted-string, and it would even be useful to have a shorter encoding than "\x00". We assume that this was not an oversight, and do not add "\0" as an escape sequence.

Formalization There are four kinds of encodings for an octet in a quoted-string: ASCII, escaped, octal, and hexadecimal, with the last one allowing up to four different representations depending on the combination of uppercase and lowercase symbols. We first define a function for each kind of encoding that returns either a non-empty list of octets if the octet is representable in that encoding, or an empty list if it is not:

The octets of value 32, 33, 35-91, and 93-126 can be represented as the equivalent ASCII character:

    ascii : Bits8 -> List Bits8
    ascii m = if m < 32 then empty
              else if m == 34 then empty
              else if m == 92 then empty
              else if m > 126 then empty
              else [m]

The octets of value 7, 8, 9, 10, 11, 12, 13, 34, 39, and 92 can be represented respectively as the ASCII sequences "\a", "\b", "\t", "\n", "\v", "\f", "\r", "\"", "\'", and "\\":

    escaped : Bits8 -> List Bits8
    escaped 7 = [92, 97]
    escaped 8 = [92, 98]
    escaped 9 = [92, 116]
    escaped 10 = [92, 110]
    escaped 11 = [92, 118]
    escaped 12 = [92, 102]
    escaped 13 = [92, 114]
    escaped 34 = [92, 34]
    escaped 39 = [92, 39]
    escaped 92 = [92, 92]
    escaped _ = empty

All octets can be represented as the "\" ASCII character followed by the octal encoding of that octet in ASCII:

    octal : Bits8 -> List Bits8
    octal x = let m = x `shiftR` 6
                  n = (x `shiftR` 3) .&. 7
                  o = x .&. 7
              in [92, m + 48, n + 48, o + 48]

All octets can be represented as the "\x" ASCII sequence followed by the hexadecimal encoding of that octet. Because alphabetic hexadecimal symbols can be encoded as lowercase or uppercase symbols, we get two different encodings for each half of an octet:

    halfl : Bits8 -> Bits8
    halfl x = if x < 10 then x + 48 else x + 87

    halfu : Bits8 -> Bits8
    halfu x = if x < 10 then x + 48 else x + 55

Which then gives us four different hexadecimal encodings for an octet:

    hexll : Bits8 -> List Bits8
    hexll x = [92, 120, halfl (x `shiftR` 4), halfl (x .&. 15)]

    hexlu : Bits8 -> List Bits8
    hexlu x = [92, 120, halfl (x `shiftR` 4), halfu (x .&. 15)]

    hexul : Bits8 -> List Bits8
    hexul x = [92, 120, halfu (x `shiftR` 4), halfl (x .&. 15)]

    hexuu : Bits8 -> List Bits8
    hexuu x = [92, 120, halfu (x `shiftR` 4), halfu (x .&. 15)]
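The per-octet encodings can also be explored interactively with a small Python sketch. It is a loose transcription made for illustration only: the names `ascii_enc`, `hexes`, and `encodings` are hypothetical helpers, and the ranges are taken from the prose above.

```python
ESCAPES = {7: 97, 8: 98, 9: 116, 10: 110, 11: 118,
           12: 102, 13: 114, 34: 34, 39: 39, 92: 92}


def ascii_enc(m):
    # Printable ASCII except the double quote (34) and the backslash (92).
    return [m] if 32 <= m <= 126 and m not in (34, 92) else []


def escaped(m):
    return [92, ESCAPES[m]] if m in ESCAPES else []


def octal(x):
    return [92, (x >> 6) + 48, ((x >> 3) & 7) + 48, (x & 7) + 48]


def halfl(x):
    return x + 48 if x < 10 else x + 87   # '0'-'9' or 'a'-'f'


def halfu(x):
    return x + 48 if x < 10 else x + 55   # '0'-'9' or 'A'-'F'


def hexes(x):
    hi, lo = x >> 4, x & 15
    return [[92, 120, h(hi), l(lo)]
            for h in (halfl, halfu) for l in (halfl, halfu)]


def encodings(x):
    """All distinct quoted-string encodings of one octet, as byte strings."""
    candidates = [ascii_enc(x), escaped(x), octal(x)] + hexes(x)
    return sorted({bytes(c) for c in candidates if c})


print(encodings(10))
```

Collecting the results in a set removes the hexadecimal variants that coincide when a half-octet is a decimal digit.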

We then define a Quoted type indexed over the quoted-string representation of a single octet, using one constructor for each possible kind of representation for an octet. A boolean expression is used to restrict the possible values of the octet when encoded as an ASCII or escaped value, preventing the corresponding constructors from being instantiated with any other value. We also have four additional constructors for the four types of line breaks. These are purely cosmetic and do not encode an octet.

    data Quoted : List Bits8 -> Type where
      Ascii : (x : Bits8) ->
        (prf : (x >= 32 && x <= 126 && x /= 34 && x /= 92) === True) ->
        Quoted (ascii x)
      Escaped : (x : Bits8) ->
        (prf : (x >= 7 && x <= 13 || x == 34 || x == 39 || x == 92) === True) ->
        Quoted (escaped x)
      HexLL : (x : Bits8) -> Quoted (hexll x)
      HexUL : (x : Bits8) -> Quoted (hexul x)
      HexLU : (x : Bits8) -> Quoted (hexlu x)
      HexUU : (x : Bits8) -> Quoted (hexuu x)
      Octal : (x : Bits8) -> Quoted (octal x)
      Cr : Quoted [92, 13]
      Lf : Quoted [92, 10]
      CrLf : Quoted [92, 13, 10]
      LfCr : Quoted [92, 10, 13]

We can then use that type to build a type indexed over a complete quoted-string. Here we use an Idris2 namespace so we can use the syntactic sugar for a list multiple times in the same source:

    data QuotedList : List Bits8 -> Type where
      Nil : QuotedList []
      (::) : Quoted xs -> QuotedList ys -> QuotedList (xs ++ ys)

We can then define the octetString function for the QuotedList type:

The type for a quoted-string:

    data QuotedString : List Bits8 -> Type where
      MkQuotedString : QuotedList xs -> QuotedString (34 :: xs ++ [34])

And the function to retrieve its octet-string:

We then define an alternative type for the quoted-string representation that is preceded by the length of its octet-string:

    data QuotedStringLength : List Bits8 -> Type where
      MkQuotedStringLength : (q : QuotedList xs) ->
        QuotedStringLength (base10 (length (octetString q)) ++ [34] ++ xs ++ [34])

And the function to retrieve its octet-string:

Validation Here we prove that all the examples in section 4.2 of the original document are valid instances of the QuotedString or QuotedStringLength types:

"subject"
"hi there"
7"subject"
3"\n\n\n"
"This has\n two lines."
"This has\ one." (actually on two lines)
""

Token Representation

Analysis Section 4.3 of the original s-expr document states:

Formalization Unlike all the other encodings, a token can represent only a subset of all possible octets, so we first define a type that constrains any octet but the first in a token:

    data TokenChar : Bits8 -> Type where
      MkTokenChar : (x : Bits8) ->
        (prf : (x >= 65 && x <= 90 || x >= 97 && x <= 122 ||
                x >= 48 && x <= 57 || x == 45 || x == 46 ||
                x == 47 || x == 95 || x == 58 || x == 42 ||
                x == 43 || x == 61) === True) ->
        TokenChar x

Then we define a type for a list of these:

    data TokenCharList : List Bits8 -> Type where
      Nil : TokenCharList []
      (::) : TokenChar x -> TokenCharList xs -> TokenCharList (x :: xs)

Then a type that represents a complete token as a constrained first octet followed by a list of constrained octets:

    data Token : List Bits8 -> Type where
      MkToken : (x : Bits8) ->
        (prf : (x >= 65 && x <= 90 || x >= 97 && x <= 122 ||
                x == 45 || x == 46 || x == 95 || x == 58 ||
                x == 42 || x == 47 || x == 43 || x == 61) === True) ->
        TokenCharList xs -> Token (x :: xs)

We can then define the octetString function for the Token type:

    octetString' : TokenCharList xs -> List Bits8
    octetString' [] = []
    octetString' (MkTokenChar x _ :: xs) = x :: octetString' xs

Validation Here we prove that all the examples in section 4.3 of the original document are valid instances of the Token type:

subject
not-before
class-of-1997
//microsoft.com/names/smith
*
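The two character classes used by the Token type can be restated as a run-time predicate. The following Python sketch is an illustration only; `is_token`, `FIRST`, and `REST` are hypothetical names, and the sets are copied from the boolean expressions in the constructors above.

```python
import string

PUNCTUATION = b"-./_:*+="
# Octets allowed after the first position: letters, digits, punctuation.
REST = set(string.ascii_letters.encode("ascii")
           + string.digits.encode("ascii")
           + PUNCTUATION)
# The first octet follows the same rule, except that digits are excluded.
FIRST = REST - set(string.digits.encode("ascii"))


def is_token(octets: bytes) -> bool:
    return (len(octets) > 0
            and octets[0] in FIRST
            and all(o in REST for o in octets[1:]))


for example in [b"subject", b"not-before", b"class-of-1997",
                b"//microsoft.com/names/smith", b"*", b"1997"]:
    print(example.decode("ascii"), is_token(example))
```

Excluding digits from the first position is what keeps a token from being confused with the length prefix of another representation.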

Hexadecimal Representation

Analysis Section 4.4 of the original s-expr document states:

There is no possible BNF that is sound for the hexadecimal representation when preceded with the length. "hexadecimal encoding" is understood as allowing either case for each hexadecimal half of the encoding for a single octet.

Formalization The hexadecimal representation encodes each octet of an octet-string as two octets in ASCII, each followed by zero or more white spaces. First we build a type for a white space:

    data Whitespace : Bits8 -> Type where
      MkWhitespace : (x : Bits8) ->
        (prf : (x == 32 || x == 9 || x == 11 || x == 12 ||
                x == 13 || x == 10) === True) ->
        Whitespace x

And then a type for a list of white spaces:

    data WhitespaceList : List Bits8 -> Type where
      Nil : WhitespaceList []
      (::) : Whitespace x -> WhitespaceList xs -> WhitespaceList (x :: xs)

Note that white spaces are purely cosmetic, so they do not encode octets in an octet-string. That means that there is no octetString function for these. With that we can build the hexadecimal representation of an octet. We have four constructors, each corresponding to one of the four possible variants of a hexadecimal encoding:

    data Hex : List Bits8 -> Type where
      HexLL' : (x : Bits8) -> WhitespaceList xs -> WhitespaceList ys ->
        Hex ([halfl (x `shiftR` 4)] ++ xs ++ [halfl (x .&. 15)] ++ ys)
      HexLU' : (x : Bits8) -> WhitespaceList xs -> WhitespaceList ys ->
        Hex ([halfl (x `shiftR` 4)] ++ xs ++ [halfu (x .&. 15)] ++ ys)
      HexUL' : (x : Bits8) -> WhitespaceList xs -> WhitespaceList ys ->
        Hex ([halfu (x `shiftR` 4)] ++ xs ++ [halfl (x .&. 15)] ++ ys)
      HexUU' : (x : Bits8) -> WhitespaceList xs -> WhitespaceList ys ->
        Hex ([halfu (x `shiftR` 4)] ++ xs ++ [halfu (x .&. 15)] ++ ys)

Then we can build a type for the hexadecimal representation of an octet-string:

    data HexList : List Bits8 -> Type where
      Nil : HexList []
      (::) : Hex xs -> HexList ys -> HexList (xs ++ ys)

And a function octetString for that type:

With that we can build a Hexadecimal type:

    data Hexadecimal : List Bits8 -> Type where
      MkHexadecimal : WhitespaceList xs -> HexList ys ->
        Hexadecimal (35 :: xs ++ ys ++ [35])

And its octetString function:

We then define an alternative type for the hexadecimal representation that is preceded by the length of its octet-string:

    data HexadecimalLength : List Bits8 -> Type where
      MkHexadecimalLength : WhitespaceList xs -> (h : HexList ys) ->
        HexadecimalLength (base10 (length (octetString h)) ++ [35] ++ xs ++ ys ++ [35])

And the function to retrieve its octet-string:

Validation Here we prove that all the examples in section 4.4 of the original document are valid instances of the Hexadecimal or HexadecimalLength types:

#616263#
3#616263#
# 616 263 #
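To see how interleaved white space interacts with the pairing of hexadecimal digits, the representation without length prefix can be decoded with the following Python sketch. It is a hedged illustration, not the formal model: `hexadecimal_octets` is a hypothetical helper that returns None where the Hexadecimal type would have no instance.

```python
from typing import Optional

WHITESPACE = {32, 9, 11, 12, 13, 10}


def hexadecimal_octets(rep: bytes) -> Optional[bytes]:
    """Octet-string of a '#...#' representation, or None if it is invalid."""
    if len(rep) < 2 or rep[0] != 35 or rep[-1] != 35:
        return None
    # White space is cosmetic: drop it before pairing the digits.
    digits = bytes(o for o in rep[1:-1] if o not in WHITESPACE)
    try:
        return bytes.fromhex(digits.decode("ascii"))
    except ValueError:  # odd number of digits, or a non-hexadecimal octet
        return None


print(hexadecimal_octets(b"#616263#"))
print(hexadecimal_octets(b"# 616 263 #"))
```

Both calls yield the same octet-string, which is the property that makes white space irrelevant to the octetString function.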

Base 64 Representation

Analysis Section 4.5 of the original s-expr document states:

There is no possible BNF that is sound for the base 64 representation when preceded with the length. The fragment "...where the equals signs are dropped" is ambiguous as it does not state if it is one or two equals signs that can be dropped, or all equals signs. Here we encode types to support the former interpretation.

Formalization First we need a function that will return a base 64 octet from the six lower bits of an octet:

    b64 : Bits8 -> Bits8
    b64 x = if x < 26 then x + 65
            else if x < 52 then x + 71
            else if x < 62 then x - 4
            else if x == 62 then 43
            else if x == 63 then 47
            else 0x3D

Next we need four functions that return respectively the first, second, third, and fourth ASCII octet for a group of three octets from the octet-string:

    b641 : Bits8 -> List Bits8
    b641 x1 = [b64 (x1 `shiftR` 2)]

    b642 : Bits8 -> Bits8 -> List Bits8
    b642 x1 x2 = [b64 (((x1 .&. 0b11) `shiftL` 4) .|. (x2 `shiftR` 4))]

    b643 : Bits8 -> Bits8 -> List Bits8
    b643 x2 x3 = [b64 (((x2 .&. 0b1111) `shiftL` 2) .|. (x3 `shiftR` 6))]

    b644 : Bits8 -> List Bits8
    b644 x3 = [b64 (x3 .&. 0b111111)]
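The bit manipulations in these four functions can be checked against an independent implementation. The Python sketch below transcribes them and compares the result with the standard `base64` module; `group` is a hypothetical helper that concatenates the four output octets for one group of three input octets.

```python
import base64


def b64(x):
    """ASCII octet for a 6-bit value, following the standard alphabet."""
    if x < 26:
        return x + 65      # 'A'-'Z'
    if x < 52:
        return x + 71      # 'a'-'z'
    if x < 62:
        return x - 4       # '0'-'9'
    if x == 62:
        return 43          # '+'
    if x == 63:
        return 47          # '/'
    return 0x3D            # '=' for anything out of range


def group(x1, x2, x3):
    """The four ASCII octets that encode three octets of the octet-string."""
    return bytes([
        b64(x1 >> 2),
        b64(((x1 & 0b11) << 4) | (x2 >> 4)),
        b64(((x2 & 0b1111) << 2) | (x3 >> 6)),
        b64(x3 & 0b111111),
    ])


triple = (97, 98, 99)
print(group(*triple), base64.b64encode(bytes(triple)))
```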

Our first type for the base 64 representation is for a group of three octets from the octet-string:

    data Base64Full : List Bits8 -> Type where
      MkBase64Full : (x1 : Bits8) -> (x2 : Bits8) -> (x3 : Bits8) ->
        WhitespaceList xs -> WhitespaceList ys ->
        WhitespaceList zs -> WhitespaceList ws ->
        Base64Full (b641 x1 ++ xs ++ b642 x1 x2 ++ ys ++
                    b643 x2 x3 ++ zs ++ b644 x3 ++ ws)

Then we can build a type for the base 64 representation of an octet-string whose length is a multiple of three:

    data Base64List : List Bits8 -> Type where
      Nil : Base64List []
      (::) : Base64Full xs -> Base64List ys -> Base64List (xs ++ ys)

We build another type for the octet-strings that have a length that is not a multiple of three. There are additional constructors to account for the fact that the padding is optional.

    data Base64End : List Bits8 -> Type where
      EndOnePadPad : (x1 : Bits8) -> WhitespaceList xs -> WhitespaceList ys ->
        WhitespaceList zs -> WhitespaceList ws ->
        Base64End (b641 x1 ++ xs ++ b642 x1 0 ++ ys ++ [61] ++ zs ++ [61] ++ ws)
      EndOnePad : (x1 : Bits8) -> WhitespaceList xs -> WhitespaceList ys ->
        WhitespaceList zs ->
        Base64End (b641 x1 ++ xs ++ b642 x1 0 ++ ys ++ [61] ++ zs)
      EndOne : (x1 : Bits8) -> WhitespaceList xs -> WhitespaceList ys ->
        Base64End (b641 x1 ++ xs ++ b642 x1 0 ++ ys)
      EndTwoPad : (x1 : Bits8) -> (x2 : Bits8) -> WhitespaceList xs ->
        WhitespaceList ys -> WhitespaceList zs -> WhitespaceList ws ->
        Base64End (b641 x1 ++ xs ++ b642 x1 x2 ++ ys ++ b643 x2 0 ++ zs ++ [61] ++ ws)
      EndTwo : (x1 : Bits8) -> (x2 : Bits8) -> WhitespaceList xs ->
        WhitespaceList ys -> WhitespaceList zs ->
        Base64End (b641 x1 ++ xs ++ b642 x1 x2 ++ ys ++ b643 x2 0 ++ zs)

We then put all these together into a type for base 64 encoding with two constructors, one for octet-strings whose length is a multiple of 3, and one for the others:

    data Base64' : List Bits8 -> Type where
      Base64Mult3 : Base64List xs -> Base64' xs
      Base64Non : Base64List xs -> Base64End ys -> Base64' (xs ++ ys)

We can then define the octetString function for the Base64' type:

    octetString' : Base64List xs -> List Bits8
    octetString' [] = []
    octetString' (MkBase64Full x1 x2 x3 _ _ _ _ :: xs) =
      x1 :: x2 :: x3 :: octetString' xs

    OctetString (Base64' _) where
      octetString (Base64Mult3 xs) = octetString' xs
      octetString (Base64Non xs (EndOnePadPad x1 _ _ _ _)) = octetString' xs ++ [x1]
      octetString (Base64Non xs (EndOnePad x1 _ _ _)) = octetString' xs ++ [x1]
      octetString (Base64Non xs (EndOne x1 _ _)) = octetString' xs ++ [x1]
      octetString (Base64Non xs (EndTwoPad x1 x2 _ _ _ _)) = octetString' xs ++ [x1, x2]
      octetString (Base64Non xs (EndTwo x1 x2 _ _ _)) = octetString' xs ++ [x1, x2]

Finally we can define the Base64 type:

    data Base64 : List Bits8 -> Type where
      MkBase64 : WhitespaceList xs -> Base64' ys ->
        Base64 (124 :: xs ++ ys ++ [124])

And its octetString function:

We then reuse the Base64' type to define one more type for the base 64 representation that is preceded by the length of its octet-string:

    data Base64Length : List Bits8 -> Type where
      MkBase64Length : WhitespaceList xs -> (b : Base64' ys) ->
        Base64Length (base10 (length (octetString b)) ++ [124] ++ xs ++ ys ++ [124])

And its octetString function:

Validation Here we prove that all the examples in section 4.5 of the original document are valid instances of the Base64 or Base64Length types:

|YWJj|
| Y W J j |
3|YWJj|
|YWJjZA==|
|YWJjZA|
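The effect of optional padding can be illustrated with a small Python decoder for the representation without length prefix. This is a sketch under the interpretation chosen above (any number of the expected equals signs may be dropped, none may be added); `base64_octets` and `ALPHABET` are hypothetical names and the code is not derived from the Idris2 source.

```python
import base64
import binascii
from typing import Optional

WHITESPACE = {32, 9, 11, 12, 13, 10}
ALPHABET = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               b"abcdefghijklmnopqrstuvwxyz0123456789+/")


def base64_octets(rep: bytes) -> Optional[bytes]:
    """Octet-string of a '|...|' representation, or None if it is invalid."""
    if len(rep) < 2 or rep[0] != 124 or rep[-1] != 124:
        return None
    body = bytes(o for o in rep[1:-1] if o not in WHITESPACE)
    chars = body.rstrip(b"=")
    pads = len(body) - len(chars)
    missing = (4 - len(chars) % 4) % 4  # padding a strict encoder would emit
    if missing == 3 or pads > missing or not set(chars) <= ALPHABET:
        return None
    try:
        octets = base64.b64decode(chars + b"=" * missing)
    except binascii.Error:
        return None
    # Reject encodings whose unused trailing bits are not zero.
    if base64.b64encode(octets).rstrip(b"=") != chars:
        return None
    return octets


for rep in [b"|YWJj|", b"| Y W J j |", b"|YWJjZA==|", b"|YWJjZA|"]:
    print(rep, base64_octets(rep))
```

The final re-encoding step mirrors the use of a zero second argument in the Base64End constructors, which forces the unused bits of the last symbol to be zero.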

Octet-String Representation

Analysis Before going further we have to address the case of the brace notation for base 64. Section 6.2 of the original s-expr document states:

It is not clear from that text if the octets that are to be re-scanned are for the representation of an octet-string, or for a whole s-expr. Additionally this text seems to ignore the fact that examples using that notation were provided in section 2 and section 5 of the original s-expr document.

So the first ambiguity is about the usage of the brace notation in a display-hint. Obviously it would not make sense to have an s-expr inside a display-hint, so at best it encodes an octet-string. But if that is the case, does it encode any of the other representations (maybe including itself) or just the verbatim representation, as the examples in section 2 and 5 show? The same can be said of the use of the brace notation as a simple-string. There again it would not make sense to encode an s-expr with it, because then it would be possible to associate it with a display-hint, which does not make sense. Then if it is only the encoding of the representation of an octet-string, the same ambiguity as above is present about the representations permitted.

To add to the issue, the brace notation for base 64 on an octet-string is largely redundant with the quoted-string, hexadecimal, and base 64 representations, because these already handle the problem of representing any s-expr using ASCII characters. That is only required for the basic transport. Here we choose to use the brace notation for base 64 exclusively in the basic transport, restricting the octet-strings inside to verbatim representations. That makes the examples in section 2 and 5 incorrect unless used as s-expr in the basic transport.

Formalization With that in mind we can define a type that covers all possible representations of an octet-string, excluding the brace notation for base 64.

    data Representation : List Bits8 -> Type where
      RepresentationVerbatim : (v : Verbatim xs) -> Representation xs
      RepresentationQuoted : QuotedString xs -> Representation xs
      RepresentationQuotedLength : QuotedStringLength xs -> Representation xs
      RepresentationToken : Token xs -> Representation xs
      RepresentationHexadecimal : Hexadecimal xs -> Representation xs
      RepresentationHexadecimalLength : HexadecimalLength xs -> Representation xs
      RepresentationBase64 : Base64 xs -> Representation xs
      RepresentationBase64Length : Base64Length xs -> Representation xs

And its matching octetString function:

Display-hint Representation

Analysis Section 4.6 of the original s-expr document states:

The uses of "octet string" in this fragment are all incorrect. "octet string representation" should be used instead. The text uses singular "whitespace", not the plural "whitespaces". The text also does not say if white spaces can separate the display hint from the octet-string it provides information to. We assume that multiple white spaces can be used after the opening bracket, before the closing bracket, and between the closing bracket and the following octet-string. Following the argument in the previous section, "legal formats" does not include the brace notation for base 64. Section 4.6 of the original s-expr document ends with:

Formalization We first build a type for a display-hint:

    data DisplayHint : List Bits8 -> Type where
      MkDisplayHint : WhitespaceList xs -> Representation ys -> WhitespaceList zs ->
        DisplayHint (91 :: xs ++ ys ++ zs ++ [93])

Then define octetString for that type:

Then a type for the association of a display-hint and the representation of an octet-string:

    data WithHint : List Bits8 -> Type where
      MkWithHint : DisplayHint xs -> WhitespaceList ys -> Representation zs ->
        WithHint (xs ++ ys ++ zs)

We finally define the default display-hint as the token application/octet-stream:

Validation Here we prove that all the examples in section 4.6 of the original document are valid instances of the DisplayHint type:

[image/gif]
[URI]
[charset=unicode-1-1]
[text/richtext]
[application/postscript]
[audio/basic]
["http://abc.com/display-types/funky.html"]

Equality of Octet-String

Analysis Section 4.7 of the original s-expr document states:

The term "octet string" here is incorrect as it is described as the combination of a display hint and a "data octet string", the latter being actually an "octet string representation". Consequently the terms "equal" or "equality" are incorrect, and the terms "equivalent" or "equivalence" should be used instead. Here the term "equivalent" means "carrying the same information", i.e., the same octet-string. Two octet-string representations can be equivalent but not equal, e.g., the token abc and the quoted-string "abc" are equivalent but not equal. The same reasoning is applied when comparing typed octet-string representations, or a typed octet-string representation with an untyped octet-string representation.

Formalization We first define a type that carries either a typed or an untyped octet-string representation:

    data Element : Type where
      Untyped : Representation _ -> Element
      Typed : Representation _ -> Representation _ -> Element

Then we define the type alias Equivalence as a relation between two elements. Equivalence is already declared in the standard library, so we have to hide that declaration first:

    Equivalence : Element -> Element -> Type
    Equivalence (Untyped x) (Untyped x') =
      octetString x === octetString x'
    Equivalence (Untyped x) (Typed h x') =
      (octetString defaultHint === octetString h, octetString x === octetString x')
    Equivalence (Typed h x) (Untyped x') =
      (octetString h === octetString defaultHint, octetString x === octetString x')
    Equivalence (Typed h x) (Typed h' x') =
      (octetString h === octetString h', octetString x === octetString x')
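The four cases of this relation amount to substituting the default display-hint for a missing one and then comparing octet-strings. A minimal Python sketch of that reading follows, assuming an element is modelled as a pair of a hint octet-string (or None when untyped) and a data octet-string; the pair encoding and the names `normalize` and `equivalent` are assumptions made for the illustration.

```python
from typing import Optional, Tuple

# (octet-string of the display-hint, or None if untyped; data octet-string)
Element = Tuple[Optional[bytes], bytes]

DEFAULT_HINT = b"application/octet-stream"


def normalize(element: Element) -> Tuple[bytes, bytes]:
    hint, data = element
    return (DEFAULT_HINT if hint is None else hint, data)


def equivalent(a: Element, b: Element) -> bool:
    """True when both elements carry the same hint and data octet-strings."""
    return normalize(a) == normalize(b)


# The token abc, the quoted-string "abc" and #616263# share one octet-string,
# so as untyped elements they all map to the same pair.
print(equivalent((None, b"abc"), (None, bytes.fromhex("616263"))))
print(equivalent((None, b"abc"), (b"application/octet-stream", b"abc")))
print(equivalent((None, b"abc"), (b"image/gif", b"abc")))
```

Because the comparison is plain equality on normalized pairs, symmetry and transitivity of the relation follow directly from those of equality.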

Validation Here we prove that a subset of the examples in section 1 of the original document are equivalent. Proving the other equivalences is trivial:

We first prove that the first three representations are correct:

We can then prove that abc is equivalent to "abc", and that "abc" is equivalent to #616263#:

By transitivity we can then prove that abc is equivalent to #616263#:

We can also use symmetry to prove that if a first octet-string representation is equivalent to a second octet-string representation, then the second is also equivalent to the first one.

Lists

Analysis Section 5 of the original s-expr document states:

The first sentence should say that there are different ways to represent a list. But the issue is really that in some cases the separation between some representations of an octet-string is ambiguous. The actual rules for mandatory separation are:

a token must be separated from a quoted-string, hexadecimal, or base 64 representation that is prefixed with the length
a token must be separated from the next token
a token must be separated from the next verbatim representation
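These rules can be summarized as a single predicate on the kinds of two adjacent representations. The Python sketch below is only an illustration; the kind labels are hypothetical strings chosen for the example, not identifiers from the formalization.

```python
# Kinds of representation that start with a digit or a token character and
# would therefore be absorbed by a preceding token.
NEEDS_SEPARATION = {"token", "verbatim", "quoted-length",
                    "hexadecimal-length", "base64-length"}


def whitespace_required(previous: str, following: str) -> bool:
    """True when at least one white space must separate the two items."""
    return previous == "token" and following in NEEDS_SEPARATION


print(whitespace_required("token", "token"))      # as in (a b c)
print(whitespace_required("verbatim", "token"))   # as in 3:bob followed by a token
print(whitespace_required("token", "quoted"))
```

A representation that begins with a delimiter such as a double quote, "#", "|", "[" or "(" can directly follow a token because that delimiter cannot be part of the token.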

Additionally section 2 states:

Parentheses are not optional when representing a list, so "may be" should be "are".

Formalization To represent the various ways to separate representations we need four mutually inductive types, which we first declare as abstract types:

    data TokenList : List Bits8 -> Type
    data SeparateList : List Bits8 -> Type
    data OtherList : List Bits8 -> Type
    data Lists : List Bits8 -> Type

TokenList is the type of a list of octet-string representations that starts with a token:

    data TokenList : List Bits8 -> Type where
      TokenNil : Token xs -> TokenList xs
      TokenConsToken : Token xs -> Whitespace y -> WhitespaceList ys -> TokenList zs -> TokenList (xs ++ (y :: ys) ++ zs)
      TokenConsSeparate : Token xs -> Whitespace y -> WhitespaceList ys -> SeparateList zs -> TokenList (xs ++ (y :: ys) ++ zs)
      TokenConsOther : Token xs -> WhitespaceList ys -> OtherList zs -> TokenList (xs ++ ys ++ zs)

SeparateList is the type of a list of octet-string representations that starts with an octet-string representation that when inserted after a token will require it to be separated:

    data SeparateList : List Bits8 -> Type where
      SeparateVerbatim : Verbatim xs -> SeparateList xs
      SeparateVerbatimToken : Verbatim xs -> WhitespaceList ys -> TokenList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateVerbatimSeparate : Verbatim xs -> WhitespaceList ys -> SeparateList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateVerbatimOther : Verbatim xs -> WhitespaceList ys -> OtherList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateQuotedStringLength : QuotedStringLength xs -> SeparateList xs
      SeparateQuotedStringLengthToken : QuotedStringLength xs -> WhitespaceList ys -> TokenList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateQuotedStringLengthSeparate : QuotedStringLength xs -> WhitespaceList ys -> SeparateList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateQuotedStringLengthOther : QuotedStringLength xs -> WhitespaceList ys -> OtherList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateHexadecimal : HexadecimalLength xs -> SeparateList xs
      SeparateHexadecimalLengthToken : HexadecimalLength xs -> WhitespaceList ys -> TokenList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateHexadecimalLengthSeparate : HexadecimalLength xs -> WhitespaceList ys -> SeparateList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateHexadecimalLengthOther : HexadecimalLength xs -> WhitespaceList ys -> OtherList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateBase64 : Base64Length xs -> SeparateList xs
      SeparateBase64LengthToken : Base64Length xs -> WhitespaceList ys -> TokenList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateBase64LengthSeparate : Base64Length xs -> WhitespaceList ys -> SeparateList zs -> SeparateList (xs ++ ys ++ zs)
      SeparateBase64LengthOther : Base64Length xs -> WhitespaceList ys -> OtherList zs -> SeparateList (xs ++ ys ++ zs)

OtherList is the type of a list of octet-string representations that starts with an octet-string representation that, when inserted after a token, does not need to be separated from it:

    data OtherList : List Bits8 -> Type where
      OtherQuotedString : QuotedString xs -> OtherList xs
      OtherQuotedStringToken : QuotedString xs -> WhitespaceList ys -> TokenList zs -> OtherList (xs ++ ys ++ zs)
      OtherQuotedStringSeparate : QuotedString xs -> WhitespaceList ys -> SeparateList zs -> OtherList (xs ++ ys ++ zs)
      OtherQuotedStringOther : QuotedString xs -> WhitespaceList ys -> OtherList zs -> OtherList (xs ++ ys ++ zs)
      OtherHexadecimal : Hexadecimal xs -> OtherList xs
      OtherHexadecimalToken : Hexadecimal xs -> WhitespaceList ys -> TokenList zs -> OtherList (xs ++ ys ++ zs)
      OtherHexadecimalSeparate : Hexadecimal xs -> WhitespaceList ys -> SeparateList zs -> OtherList (xs ++ ys ++ zs)
      OtherHexadecimalOther : Hexadecimal xs -> WhitespaceList ys -> OtherList zs -> OtherList (xs ++ ys ++ zs)
      OtherBase64 : Base64 xs -> OtherList xs
      OtherBase64Token : Base64 xs -> WhitespaceList ys -> TokenList zs -> OtherList (xs ++ ys ++ zs)
      OtherBase64Separate : Base64 xs -> WhitespaceList ys -> SeparateList zs -> OtherList (xs ++ ys ++ zs)
      OtherBase64Other : Base64 xs -> WhitespaceList ys -> OtherList zs -> OtherList (xs ++ ys ++ zs)
      OtherHint : WithHint xs -> OtherList xs
      OtherHintToken : WithHint xs -> WhitespaceList ys -> TokenList zs -> OtherList (xs ++ ys ++ zs)
      OtherHintSeparate : WithHint xs -> WhitespaceList ys -> SeparateList zs -> OtherList (xs ++ ys ++ zs)
      OtherHintOther : WithHint xs -> WhitespaceList ys -> OtherList zs -> OtherList (xs ++ ys ++ zs)
      OtherLists : Lists xs -> OtherList xs
      OtherListsToken : Lists xs -> WhitespaceList ys -> TokenList zs -> OtherList (xs ++ ys ++ zs)
      OtherListsSeparate : Lists xs -> WhitespaceList ys -> SeparateList zs -> OtherList (xs ++ ys ++ zs)
      OtherListsOther : Lists xs -> WhitespaceList ys -> OtherList zs -> OtherList (xs ++ ys ++ zs)

And finally the Lists type groups all the possible lists in a s-expr.

    data Lists : List Bits8 -> Type where
      ListsTokenList : WhitespaceList xs -> TokenList ys -> WhitespaceList zs -> Lists (40 :: xs ++ ys ++ zs ++ [41])
      ListsSeparateList : WhitespaceList xs -> SeparateList ys -> WhitespaceList zs -> Lists (40 :: xs ++ ys ++ zs ++ [41])
      ListsOtherList : WhitespaceList xs -> OtherList ys -> WhitespaceList zs -> Lists (40 :: xs ++ ys ++ zs ++ [41])
      ListsEmptyList : WhitespaceList xs -> Lists (40 :: xs ++ [41])

Validation Here we prove that all the examples in section 5 of the original document except the last one are valid instances of the Lists type:

(a b c)
( a ( b c ) ( ( d e ) ( e f ) ) )
(11:certificate(6:issuer3:bob)(7:subject5:alice))

Advanced S-Expr Transport

Analysis Section 6.3 of the original s-expr document states:

Because this transport is aimed at users, we also permit white spaces before and after an s-expr.

Formalization SExpr is the type of advanced transport for valid s-expr:

    data SExpr : List Bits8 -> Type where
      SExprRepresentation : WhitespaceList xs -> Representation ys -> WhitespaceList zs -> SExpr (xs ++ ys ++ zs)
      SExprWithHint : WhitespaceList xs -> WithHint ys -> WhitespaceList zs -> SExpr (xs ++ ys ++ zs)
      SExprList : WhitespaceList xs -> Lists ys -> WhitespaceList zs -> SExpr (xs ++ ys ++ zs)

Validation Here we prove that the example in section 5 of the original document is a valid instance of the SExpr type:

(abc (de #6667#) "ghi jkl")

Canonical S-Expr Transport

Analysis Section 6.1 of the original s-expr document states:

Formalization The canonical transport is actually a profile of the advanced transport, so we can reuse our previous types. First we declare an abstract type for the canonical s-expr, as it is an inductive type:

    data CanonicalSExpr : List Bits8 -> Type

Then a type for a list of canonical s-expr:

    data CanonicalSExprList : List Bits8 -> Type where
      Nil : CanonicalSExprList []
      (::) : CanonicalSExpr xs -> CanonicalSExprList ys -> CanonicalSExprList (xs ++ ys)

And finally our concrete type for a canonical s-expr:

    data CanonicalSExpr : List Bits8 -> Type where
      MkCanonical : Verbatim xs -> CanonicalSExpr xs
      MkCanonicalHint : Verbatim xs -> Verbatim ys -> CanonicalSExpr (91 :: xs ++ [93] ++ ys)
      MkCanonicalList : CanonicalSExprList xs -> CanonicalSExpr (40 :: xs ++ [41])

Validation Here we prove that all the examples in section 6.1 of the original document are valid instances of the CanonicalSExpr type:

(6:issuer3:bob)
(4:icon[12:image/bitmap]9:xxxxxxxxx)
(7:subject(3:ref5:alice6:mother))
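The three constructors of the canonical form map directly onto a recursive serializer. The following Python sketch is an illustration under an assumed in-memory encoding that is not part of the original text: `bytes` for an octet-string, a 2-tuple `(hint, data)` for an octet-string with display-hint, and a `list` for a list.

```python
def canonical(x) -> bytes:
    """Canonical transport form of a nested Python value."""
    if isinstance(x, bytes):
        # Verbatim representation: decimal length, colon, octets.
        return str(len(x)).encode("ascii") + b":" + x
    if isinstance(x, tuple):
        hint, data = x
        return b"[" + canonical(hint) + b"]" + canonical(data)
    if isinstance(x, list):
        return b"(" + b"".join(canonical(item) for item in x) + b")"
    raise TypeError("unsupported value: %r" % (x,))


print(canonical([b"issuer", b"bob"]))
print(canonical([b"icon", (b"image/bitmap", b"xxxxxxxxx")]))
print(canonical([b"subject", [b"ref", b"alice", b"mother"]]))
```

No white space is ever emitted, which is what makes the output of such a serializer unique for a given value.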

Basic S-Expr Transport

Analysis Section 6.2 of the original s-expr document states:

There is no possible BNF that is sound for a base 64 representation of an underlying s-expr.

Formalization The basic transport is also a profile of the advanced transport, so we can reuse some previous types. We first redefine Base64Full without white spaces:

    data BasicBase64Full : List Bits8 -> Type where
      MkBasicBase64Full : (x1 : Bits8) -> (x2 : Bits8) -> (x3 : Bits8) ->
        BasicBase64Full (b641 x1 ++ b642 x1 x2 ++ b643 x2 x3 ++ b644 x3)

Then a list of these:

    data BasicBase64List : List Bits8 -> Type where
      Nil : BasicBase64List []
      (::) : BasicBase64Full xs -> BasicBase64List ys -> BasicBase64List (xs ++ ys)

And a type for a base 64 encoding for lengths that are not a multiple of 3:

    data BasicBase64End : List Bits8 -> Type where
      BasicEndOnePadPad : (x1 : Bits8) -> BasicBase64End (b641 x1 ++ b642 x1 0 ++ [61, 61])
      BasicEndOnePad : (x1 : Bits8) -> BasicBase64End (b641 x1 ++ b642 x1 0 ++ [61])
      BasicEndOne : (x1 : Bits8) -> BasicBase64End (b641 x1 ++ b642 x1 0)
      BasicEndTwoPad : (x1 : Bits8) -> (x2 : Bits8) -> BasicBase64End (b641 x1 ++ b642 x1 x2 ++ b643 x2 0 ++ [61])
      BasicEndTwo : (x1 : Bits8) -> (x2 : Bits8) -> BasicBase64End (b641 x1 ++ b642 x1 x2 ++ b643 x2 0)

And a basic base 64 type:

    data BasicBase64 : List Bits8 -> Type where
      BasicBase64Mult3 : BasicBase64List xs -> BasicBase64 xs
      BasicBase64Non : BasicBase64List xs -> BasicBase64End ys -> BasicBase64 (xs ++ ys)

Then we need to define three base64 encoding functions, one for each variant:

    base64 : List Bits8 -> List Bits8
    base64 [] = []
    base64 [x1] = b641 x1 ++ b642 x1 0 ++ [61, 61]
    base64 [x1, x2] = b641 x1 ++ b642 x1 x2 ++ b643 x2 0 ++ [61]
    base64 (x1 :: x2 :: x3 :: xs) =
      b641 x1 ++ b642 x1 x2 ++ b643 x2 x3 ++ b644 x3 ++ base64 xs

    base64OnePad : List Bits8 -> List Bits8
    base64OnePad [] = []
    base64OnePad [x1] = b641 x1 ++ b642 x1 0 ++ [61]
    base64OnePad [x1, x2] = b641 x1 ++ b642 x1 x2 ++ b643 x2 0
    base64OnePad (x1 :: x2 :: x3 :: xs) =
      b641 x1 ++ b642 x1 x2 ++ b643 x2 x3 ++ b644 x3 ++ base64OnePad xs

    base64NoPad : List Bits8 -> List Bits8
    base64NoPad [] = []
    base64NoPad [x1] = b641 x1 ++ b642 x1 0
    base64NoPad [x1, x2] = b641 x1 ++ b642 x1 x2 ++ b643 x2 0
    base64NoPad (x1 :: x2 :: x3 :: xs) =
      b641 x1 ++ b642 x1 x2 ++ b643 x2 x3 ++ b644 x3 ++ base64NoPad xs

And finally our type for a brace notation for base 64:

    data BasicSExpr : List Bits8 -> Type where
      MkBasicCanonical : CanonicalSExpr xs -> BasicSExpr xs
      MkBasicBase64 : CanonicalSExpr xs -> BasicBase64 ys ->
        (prf : (base64 xs == ys) === True) ->
        BasicSExpr (123 :: ys ++ [125])
      MkBasicBase64OnePad : CanonicalSExpr xs -> BasicBase64 ys ->
        (prf : (base64OnePad xs == ys) === True) ->
        BasicSExpr (123 :: ys ++ [125])
      MkBasicBase64NoPad : CanonicalSExpr xs -> BasicBase64 ys ->
        (prf : (base64NoPad xs == ys) === True) ->
        BasicSExpr (123 :: ys ++ [125])

Validation Here we prove that the first example in section 6.2 of the original document is a valid instance of the BasicSExpr type:

(1:a1:b1:c)
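The brace notation for this example can be reproduced with the Python standard library. The sketch below is only an illustration; `basic_transport` and its `drop` parameter are hypothetical, with `drop` standing for the number of trailing equals signs that are omitted.

```python
import base64


def basic_transport(canonical: bytes, drop: int = 0) -> bytes:
    """Brace notation for base 64, between '{' (123) and '}' (125)."""
    body = base64.b64encode(canonical)
    chars = body.rstrip(b"=")
    pads = len(body) - len(chars)
    kept = max(pads - drop, 0)  # padding left after dropping `drop` signs
    return b"{" + chars + b"=" * kept + b"}"


print(basic_transport(b"(1:a1:b1:c)"))
print(basic_transport(b"(1:a1:b1:c)", drop=1))
```

With `drop` set to 0, 1, or 2 the output corresponds to the three padding variants defined by the encoding functions above.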

Array-Layout

Analysis Section 8.2 of the original s-expr document states:

The endianness of the length field is not specified, so we assume that both little and big endianness can be used. Furthermore section 8.2.1 states:

Section 8.2.2 states:

    02 <length>
      01 <length> <octet-string>    /* for display-type */
      01 <length> <octet-string>    /* for octet-string */

And section 8.2.3 states:

    03 <length> <item1> <item2> <item3> ... <itemn> 00

Formalization First we define a type for the endianness of the length:

Then a function that converts a natural number into a memory representation of a specified endianness and length:

    convert' : Nat -> Nat -> List Bits8
    convert' 0 _ = []
    convert' (S k) n = let (d, m) = divmodNatNZ n 256 SIsNonZero
                       in cast m :: convert' k d

    convert : Endianness -> Nat -> Nat -> List Bits8
    convert Big k j = reverse (convert' k j)
    convert Little k j = convert' k j
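The expected byte order for each endianness can be checked against Python's integer conversion. This is a minimal sketch in which the strings "big" and "little" stand in for the two constructors of the endianness type, and values that do not fit are truncated, which is an assumption about the intended behaviour.

```python
def convert(order: str, width: int, n: int) -> list:
    """Fixed-width memory representation of n; order is 'big' or 'little'."""
    # Reduce modulo 256**width so that oversized values are truncated
    # instead of raising OverflowError.
    return list((n % 256 ** width).to_bytes(width, order))


print(convert("big", 2, 3))
print(convert("little", 2, 3))
```

Repeated division by 256 yields the least significant octet first, so it is the big-endian form that needs the list to be reversed.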

Then we define a type for the array representation of an octet-string:

    data ArrayOctetString : Endianness -> Nat -> List Bits8 -> Type where
      MkArrayOctetString : (xs : List Bits8) ->
        ArrayOctetString e l (1 :: convert e l (length xs) ++ xs)

Then for an octet-string with display-hint:

    data ArrayWithHint : Endianness -> Nat -> List Bits8 -> Type where
      MkArrayWithHint : ArrayOctetString e l xs -> ArrayOctetString e l ys ->
        ArrayWithHint e l (2 :: convert e l (length xs + length ys) ++ xs ++ ys)

As usual an abstract type for an inductive type:

    data ArraySExpr : Endianness -> Nat -> List Bits8 -> Type

Then a list of memory arrays:

    data ArrayList : Endianness -> Nat -> List Bits8 -> Type where
      Nil : ArrayList e l []
      (::) : ArraySExpr e l xs -> ArrayList e l ys -> ArrayList e l (xs ++ ys)

And finally the array memory type:

    data ArraySExpr : Endianness -> Nat -> List Bits8 -> Type where
      ArraySExprOctetString : ArrayOctetString e l xs -> ArraySExpr e l xs
      ArraySExprWithHint : ArrayWithHint e l xs -> ArraySExpr e l xs
      ArraySExprList : ArrayList e l xs ->
        ArraySExpr e l (3 :: convert e l (1 + length xs) ++ xs ++ [0])

Validation Here we prove that all the examples in section 8.2 of the original document are valid instances of the ArraySExpr type:

abc
[gif] #61626364#
(abc [d]ef (g))
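The memory layout of these examples can be computed with a short recursive Python function. It is a sketch under the same assumed in-memory encoding as a plain nested value (`bytes`, a `(hint, data)` tuple, or a `list`), with 2-octet big-endian length fields chosen only for the illustration; `array` is a hypothetical name.

```python
def array(x, width: int = 2, order: str = "big") -> bytes:
    """Array-layout of a nested value, with fixed-width length fields."""
    def length(n: int) -> bytes:
        return n.to_bytes(width, order)

    if isinstance(x, bytes):
        return b"\x01" + length(len(x)) + x
    if isinstance(x, tuple):
        hint, data = (array(part, width, order) for part in x)
        # The length covers both nested arrays, headers included.
        return b"\x02" + length(len(hint) + len(data)) + hint + data
    if isinstance(x, list):
        items = b"".join(array(item, width, order) for item in x)
        # The length also counts the terminating 00 octet.
        return b"\x03" + length(len(items) + 1) + items + b"\x00"
    raise TypeError("unsupported value: %r" % (x,))


print(array(b"abc").hex())
print(array((b"gif", bytes.fromhex("61626364"))).hex())
print(array([b"abc", (b"d", b"ef"), [b"g"]]).hex())
```

Changing `order` to "little" or `width` to another size produces the other layouts allowed by the ArraySExpr indices.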