<?xml version='1.0' ?>
<rfc category='info' docName='draft-petithuguenin-ufmrg-formal-sexpr-05' ipr='trust200902' sortRefs='true' submissionType='IRTF' version='3'>
<front>
	<title abbrev="Formal SPKI S-Expr">A Formalization of Symbolic Expressions</title>
	<author fullname="Marc Petit-Huguenin">
      <organization>Impedance Mismatch LLC</organization>
      <address>
		 <email>marc@petit-huguenin.org</email>
      </address>
   </author>
   <date day="04" year="2024" month="Nov" />
   <area>IRTF</area>
   <abstract><t>
The goal of this document is to show and explain the formal model developed to guarantee that the examples and ABNF in the "SPKI Symbolic Expressions" Internet-Draft are correct.
</t>






</abstract>
</front>

<middle>

<section>
<name>Introduction</name>
<blockquote quotedFrom='Leslie Lamport'>
<t>
Mathematics is nature's way of letting you know how sloppy your
writing is.
</t>
</blockquote>
<t>
A <xref target='ComputerateSpecification'>Computerate Specification</xref> is a mix of a formal and an informal specification, where parts of the informal specification are generated from the formal part.
The formal specification is then erased when generating an Internet-Draft for the IETF or a Confluence page for enterprises.
</t>
<t>
<xref target='SPKI-SExpr'>SPKI Symbolic Expressions</xref> is a specification for symbolic expressions ("s-expr") that is the result of editing a specification originally written back in 1996 by Ronald Rivest.
This is done for the purpose of publishing it as an RFC and thus getting a stable reference.
</t>
<t>
This document shows and explains the formal specification as if that editing had been done as a computerate specification.
It is not an analysis and formalization of <xref target='SPKI-SExpr' />, but rather the justification for some of its modifications.
</t>
<t>
This document uses the programming language <xref target='Idris2' /> as a formal method to build a formal model for s-expr that is sound and complete, and to build and verify proofs of that model.
As a result the whole text of this document is interspersed with Idris2 code (something called literate programming), which can be extracted and verified as explained in <xref target='extract' />.
</t>
<t>
Because the original document is no longer available on-line, the relevant parts are quoted instead of being referenced, using the recommendations in <xref target='RFC8792' /> to wrap long lines.
</t>
</section>
<section>
<name>Terminology</name>
<t>
The following terminology defines some mathematical terms that are not used often at the IETF.
These terms may have different definitions outside of this document, but only the definitions listed here are relevant in the context of this document:
</t>
<dl>
<dt>Completeness: </dt><dd>describes a formal system that accepts all valid strings</dd>
<dt>Curry-Howard Isomorphism: </dt><dd>a relation that expresses the fact that computer programs and mathematical proofs are the same thing</dd>
<dt>Formal Language: </dt><dd>a language that has explicit syntax and semantics</dd>
<dt>Formal Method: </dt><dd>the combination of a formal language and a verification system</dd>
<dt>Formal Model: </dt><dd>a representation of a system using a formal language</dd>
<dt>Formal Specification: </dt><dd>the specification of a system using formal methods</dd>
<dt>Isomorphism: </dt><dd>the property that two or more structures are carrying the exact same information</dd>
<dt>Normalization: </dt><dd>the simplification of a proof which, in a program, corresponds to code reduction (also known as code execution)</dd>
<dt>Proof: </dt><dd>the concrete evidence for a proposition which, in a program, corresponds to code that type-checks for the type that corresponds to that proposition</dd>
<dt>Proposition: </dt><dd>a mathematical statement in constructive logic which, in a program, corresponds to a type</dd>
<dt>Soundness: </dt><dd>describes a formal system that rejects all invalid strings</dd>
<dt>Totality: </dt><dd>the property of a function that always returns a value in finite time for any possible input</dd>
</dl>
</section>
<section anchor='formal'>
<name>Analysis and Formalization of draft-rivest-sexp-00.txt</name>
<t>
This whole section is about the original document made available in 1997.
</t>
<t>
The first subsection of each of the following sections is an analysis of the original document for ambiguities and contradictions, together with the resolution of these ambiguities and contradictions.
</t>
<t>
The second subsection shows and explains the formal model of what is discussed in the first subsection.
</t>
<t>
Formalization only guarantees that an instance of the model has exactly the same bugs as its model, so that model still needs to be validated.
The third subsection shows proofs of correctness for each example in the original document.
</t>
<t>
First we need to import a module from the Idris2 standard library:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[import Data.Bits]]></sourcecode></li>
</ul>
<t>
We want Idris2 to fail to type-check the code if totality cannot be verified for any of the functions, which is done with the following pragma:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[%default total]]></sourcecode></li>
</ul>
<t>
Note that this pragma restricts the code in this document to a subset of Idris2 that is not Turing-complete.
</t>
<t>
Our type is indexed over a list of octets, which makes it a dependent type.
Per the Curry-Howard isomorphism, this type acts as a proposition that can be read in plain English as "there exists a list of octets that is a valid s-expr".
</t>
<t>
Idris2's <tt>Bits8</tt> is what the IETF calls an octet, so a <tt>List Bits8</tt> is a list of octets.
Note that we do not use characters at all in this formalization, as s-expr are defined for octets and not for characters, but it is possible to convert a list of octets into the equivalent character string in the Idris2 REPL:
</t>
<ul empty='true'>
<li><sourcecode type='bash'><![CDATA[Main> pack $ map (chr . cast) [51, 58, 97, 98, 99]
"3:abc"]]></sourcecode></li>
</ul>
<t>
Alternatively the <tt>show</tt> function can be used directly on a list of octets after loading the code in the REPL:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[Show (List Bits8) where
  show = pack . map (chr . cast)]]></sourcecode></li>
</ul>
<t>
Our indexed types are actually families of types, one for each possible value of the index of type <tt>List Bits8</tt>.
Because there is an infinite number of values possible for the index, that actually defines an infinite number of types, one for each possible list of octets, each either a valid s-expr or not.
</t>
<t>
Only the types in that family that are indexed over a list of octets that is a valid s-expr can have an instance that type-checks.
Per the Curry-Howard isomorphism, that instance is considered a proof of the corresponding proposition.
Conversely the impossibility of finding an instance of a specific type is a proof that the index is not a valid s-expr.
</t>
<t>
We need a way to extract the underlying octet-string for each representation that we are going to define.
This is done by declaring an ad-hoc polymorphic function in an interface:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[interface OctetString ty where
  octetString : ty -> List Bits8]]></sourcecode></li>
</ul>
<section>
<name>Verbatim Representation</name>
<section>
<name>Analysis</name>
<t>
Section 4.1 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

A verbatim encoding of an octet string consists of four parts:

        -- the length (number of octets) of the octet-string,
           given in decimal most significant digit first, with
           no leading zeros.

        -- a colon ":"

        -- the octet string itself, verbatim.

There are no blanks or whitespace separating the parts.  No \
\"escape
sequences" are interpreted in the octet string.  This \
\encoding is also
called a "binary" or "raw" encoding.]]></artwork></li>
</ul>
<t>
There is a slight confusion here between an octet-string and its representation as verbatim, which is understandable in this context because they look exactly the same.
</t>
<t>
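There is no possible BNF that is sound for the verbatim representation.
A BNF can describe the shape of the representation (decimal digits, a colon, then octets), but it cannot require the number of octets to be equal to the decimal length, so it necessarily accepts some invalid strings.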
</t>
</section>
<section>
<name>Formalization</name>
<t>
We first need to reimplement the Idris2 function that is used to convert a number into an equivalent list of octets using the ASCII encoding, as we generally cannot use functions that use primitives in types because they do not reduce:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[base10 : Nat -> List Bits8
base10 0 = [48]
base10 x = base' [] x where
  base' : List Bits8 -> Nat -> List Bits8
  base' xs 0 = xs
  base' xs n =
    let (d, m) = divmodNatNZ n 10 SIsNonZero
        m' = cast (m + 0x30)
    in assert_total $ base' (m' :: xs) d]]></sourcecode></li>
</ul>
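<t>
As an illustration of what "reduce" means here, the following sketch states a few expected results of <tt>base10</tt> as propositions and proves them with <tt>Refl</tt>.
It assumes that the definition above is in scope; the names of the proofs are hypothetical and the block is not meant to be part of the extracted source.
Each definition type-checks only if the type-checker can evaluate <tt>base10</tt> down to the list of ASCII digits on the right-hand side:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- Illustrative checks, not taken from the original document.
base10Zero : base10 0 === [48]          -- "0"
base10Zero = Refl

base10Seven : base10 7 === [55]         -- "7"
base10Seven = Refl

base10Twelve : base10 12 === [49, 50]   -- "12"
base10Twelve = Refl]]></sourcecode></li>
</ul>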
<t>
Then the <tt>Verbatim</tt> type is defined for the verbatim representation of an octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Verbatim : List Bits8 -> Type where
  MkVerbatim : (xs : List Bits8) ->
    Verbatim (base10 (length xs) ++ [58] ++ xs)]]></sourcecode></li>
</ul>
<t>
Then we define the <tt>octetString</tt> function for the <tt>Verbatim</tt> type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (Verbatim _) where
  octetString (MkVerbatim xs) = xs]]></sourcecode></li>
</ul>
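<t>
The following sketch illustrates how the <tt>Verbatim</tt> type ties the length prefix to the octets that follow it, which is what a BNF cannot express.
It assumes that the definitions above are in scope and that the Idris2 version in use supports <tt>failing</tt> blocks; the names are hypothetical and the examples are not taken from the original document:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- "1:a": the length prefix matches, so this type-checks.
okVerbatim : Verbatim [49, 58, 97]
okVerbatim = MkVerbatim [97]

-- The octet-string carried by "1:a" is the single octet 97.
okOctets : octetString (MkVerbatim [97]) === [97]
okOctets = Refl

-- "2:a": the prefix announces two octets but only one follows.
-- A failing block is accepted only if its content is rejected.
failing
  badVerbatim : Verbatim [50, 58, 97]
  badVerbatim = MkVerbatim [97]]]></sourcecode></li>
</ul>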
</section>
<section>
<name>Validation</name>
<t>
Idris2 is expressive enough to allow embedding unit tests in the same source and running them as part of the type-checking.
</t>
<t>
Here we prove that all the examples in section 4.1 of the original document are valid instances of the <tt>Verbatim</tt> type:
</t>
<ul empty='true'>
<li><artwork><![CDATA[Here are some sample verbatim encodings:

        3:abc
        7:subject
        4:::::
        12:hello world!
        10:abcdefghij
        0:]]></artwork></li>
</ul>
<ul>
<li><t>3:abc</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testVerbatim1 : Verbatim [51, 58, 97, 98, 99]
testVerbatim1 = MkVerbatim [97, 98, 99]]]></sourcecode></li><li><t>7:subject</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testVerbatim2 : Verbatim [55, 58, 115, 117, 98, 106, 101, 99,
  116]
testVerbatim2 = MkVerbatim [115, 117, 98, 106, 101, 99, 116]]]></sourcecode></li><li><t>4:::::</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testVerbatim3 : Verbatim [52, 58, 58, 58, 58, 58]
testVerbatim3 = MkVerbatim [58, 58, 58, 58]]]></sourcecode></li><li><t>12:hello world!</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testVerbatim4 : Verbatim [49, 50, 58, 104, 101, 108, 108, 111,
  32, 119, 111, 114, 108, 100, 33]
testVerbatim4 = MkVerbatim [104, 101, 108, 108, 111, 32, 119,
  111, 114, 108, 100, 33]]]></sourcecode></li><li><t>10:abcdefghij</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testVerbatim5 : Verbatim [49, 48, 58, 97, 98, 99, 100, 101,
  102, 103, 104, 105, 106]
testVerbatim5 = MkVerbatim [97, 98, 99, 100, 101, 102, 103,
  104, 105, 106]]]></sourcecode></li><li><t>0:</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testVerbatim6 : Verbatim [48, 58]
testVerbatim6 = MkVerbatim []]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Quoted-String Representation</name>
<section>
<name>Analysis</name>
<t>
Section 4.2 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

The quoted-string representation of an octet-string consists of:

        -- an optional decimal length field

        -- an initial double-quote (")

        -- the octet string with "C" escape conventions (\n,etc)

        -- a final double-quote (")

The specified length is the length of the resulting string after \
\any
escape sequences have been handled.  The string does not have any
"terminating NULL" that C includes, and the length does not \
\count such
a character.

The length is optional.]]></artwork></li>
</ul>
<t>
There is no possible BNF that is sound for the quoted-string representation when preceded with the length.
</t>
<t>
Section 4.2 continues with:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

The escape conventions within the quoted string are as follows \
\(these follow
the "C" programming language conventions, with an extension for
ignoring line terminators of just LF or CRLF):
        \b              -- backspace
        \t              -- horizontal tab
        \v              -- vertical tab
        \n              -- new-line
        \f              -- form-feed
        \r              -- carriage-return
        \"              -- double-quote
        \'              -- single-quote
        \\              -- back-slash
        \ooo            -- character with octal value ooo (all \
\three digits
                           must be present)
        \xhh            -- character with hexadecimal value hh (\
\both digits
                           must be present)
        \<carriage-return> -- causes carriage-return to be \
\ignored.
        \<line-feed>       -- causes linefeed to be ignored
        \<carriage-return><line-feed> -- causes CRLF to be \
\ignored.
        \<line-feed><carriage-return> -- causes LFCR to be \
\ignored.]]></artwork></li>
</ul>
<t>
Here the first sentence does not match the list of line terminators below it.
We assume that there are four line terminators, not two.
</t>
<t>
In C the escape sequence '\0' is defined for characters, but not for strings, as a 0 value can never appear inside a C string.
That restriction does not apply to a quoted-string, where it would even be useful to have a shorter encoding than "\x00".
We assume that this was not an oversight, and do not add "\0" as an escape sequence.
</t>
</section>
<section>
<name>Formalization</name>
<t>
There are between four and seven different ways to represent an octet in a quoted-string: ASCII, escaped, octal, and hexadecimal, with that last one taking up to four different representations depending on the combination of uppercase and lowercase symbols.
</t>
<t>
We first define a function for each of the different types of encodings that returns either a non-empty list of octets if the octet is representable in that encoding, or an empty list if it is not:
</t>
<ul>
<li><t>The octets of value 32, 33, 35-91, and 93-126 can be represented as the equivalent ASCII character:</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[ascii : Bits8 -> List Bits8
ascii m =
  if m < 32 then empty
  else if m == 34 then empty
  else if m == 92 then empty
  else if m > 126 then empty
  else [m]]]></sourcecode></li><li><t>The octets of value 7, 8, 9, 10, 11, 12, 13, 34, 39, and 92 can be represented respectively as the ASCII sequences "\a", "\b", "\t", "\n", "\v", "\f", "\r", "\"", "\'", and "\\":</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[escaped : Bits8 -> List Bits8
escaped 7 = [92, 97]
escaped 8 = [92, 98]
escaped 9 = [92, 116]
escaped 10 = [92, 110]
escaped 11 = [92, 118]
escaped 12 = [92, 102]
escaped 13 = [92, 114]
escaped 34 = [92, 34]
escaped 39 = [92, 39]
escaped 92 = [92, 92]
escaped _ = empty]]></sourcecode></li><li><t>All octets can be represented as the "\" ASCII character followed by the octal encoding of that octet in ASCII:</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[octal : Bits8 -> List Bits8
octal x =
  let m = x `shiftR` 6
      n = (x `shiftR` 3) .&. 7
      o = x .&. 7
  in [92, m + 48, n + 48, o + 48]]]></sourcecode></li><li><t>All octets can be represented as the "\x" ASCII sequence followed by the hexadecimal encoding of that octet.
Because alphabetic hexadecimal symbols can be encoded as lowercase or uppercase symbols, we get two different encodings for each half of an octet:</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[halfl : Bits8 -> Bits8
halfl x = if x < 10 then x + 48 else x + 87]]></sourcecode>
<sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[halfu : Bits8 -> Bits8
halfu x = if x < 10 then x + 48 else x + 55]]></sourcecode>
<t>
Which then gives us four different hexadecimal encodings for an octet:
</t>
<sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[hexll : Bits8 -> List Bits8
hexll x = [92, 120, halfl (x `shiftR` 4), halfl (x .&. 15)]]]></sourcecode>
<sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[hexlu : Bits8 -> List Bits8
hexlu x = [92, 120, halfl (x `shiftR` 4), halfu (x .&. 15)]]]></sourcecode>
<sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[hexul : Bits8 -> List Bits8
hexul x = [92, 120, halfu (x `shiftR` 4), halfl (x .&. 15)]]]></sourcecode>
<sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[hexuu : Bits8 -> List Bits8
hexuu x = [92, 120, halfu (x `shiftR` 4), halfu (x .&. 15)]]]></sourcecode></li>
</ul>
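<t>
To make these encoding functions concrete, the following sketch evaluates each of them on octet 171 (0xAB), which is outside the ASCII range and has no short escape.
It assumes that the functions above are in scope and that they reduce during type-checking; the proof names are hypothetical and the block is only an illustration:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- No ASCII form and no short escape for octet 171.
noAscii : ascii 171 === []
noAscii = Refl

noEscape : escaped 171 === []
noEscape = Refl

-- 171 is 253 in octal and AB in hexadecimal.
octalAB : octal 171 === [92, 50, 53, 51]     -- \253
octalAB = Refl

hexllAB : hexll 171 === [92, 120, 97, 98]    -- \xab
hexllAB = Refl

hexluAB : hexlu 171 === [92, 120, 97, 66]    -- \xaB
hexluAB = Refl

hexulAB : hexul 171 === [92, 120, 65, 98]    -- \xAb
hexulAB = Refl

hexuuAB : hexuu 171 === [92, 120, 65, 66]    -- \xAB
hexuuAB = Refl]]></sourcecode></li>
</ul>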
<t>
We then define a <tt>Quoted</tt> type indexed over the quoted-string representation of a single octet, using one constructor for each possible type of representation for an octet.
</t>
<t>
A boolean expression is used to restrict the possible values of the octet when encoded as an ASCII or escaped value, preventing the corresponding constructors from being instantiated for any other value.
</t>
<t>
We also have four additional constructors for the four types of line breaks.
These are purely cosmetic and do not encode an octet.
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Quoted : List Bits8 -> Type where
  Ascii :
    (x : Bits8) -> (prf : (x >= 32 && x <= 126 && x /= 34 &&
    x /= 92) === True) ->
    Quoted (ascii x)
  Escaped : (x : Bits8) ->
    (prf : (x >= 7 && x <= 13
     || x == 34 || x == 39 || x == 92) === True) ->
    Quoted (escaped x)
  HexLL : (x : Bits8) -> Quoted (hexll x)
  HexUL : (x : Bits8) -> Quoted (hexul x)
  HexLU : (x : Bits8) -> Quoted (hexlu x)
  HexUU : (x : Bits8) -> Quoted (hexuu x)
  Octal : (x : Bits8) -> Quoted (octal x)
  Cr : Quoted [92, 13]
  Lf : Quoted [92, 10]
  CrLf : Quoted [92, 13, 10]
  LfCr : Quoted [92, 10, 13]]]></sourcecode></li>
</ul>
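<t>
The following sketch shows how the constructors of <tt>Quoted</tt> behave on a few single octets, including the NUL octet discussed in the analysis.
It assumes that the definitions above are in scope and that the Idris2 version in use supports <tt>failing</tt> blocks; the names are hypothetical and none of these examples come from the original document:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- NUL has no "\0" escape, but the octal form "\000" encodes it.
nulOctal : Quoted [92, 48, 48, 48]
nulOctal = Octal 0

-- "\n" is built with Escaped; Refl proves the side condition.
newline : Quoted [92, 110]
newline = Escaped 10 Refl

-- A bare double-quote (octet 34) is excluded from the ASCII
-- form: its side condition reduces to False, so Refl is
-- rejected and the failing block is accepted.
failing
  bareQuote : Quoted [34]
  bareQuote = Ascii 34 Refl]]></sourcecode></li>
</ul>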
<t>
We can then use that type to build a type indexed over a complete quoted-string.
Here we use an Idris2 namespace so we can use the syntactic sugar for a list multiple times in the same source:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[namespace QuotedString
  public export
  data QuotedList : List Bits8 -> Type where
    Nil : QuotedList []
    (::) : Quoted xs -> QuotedList ys ->
      QuotedList (xs ++ ys)]]></sourcecode></li>
</ul>
<t>
We can then define the <tt>octetString</tt> function for the <tt>QuotedList</tt> type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (QuotedList _) where
  octetString [] = []
  octetString (Ascii x _ :: y) = x :: octetString y
  octetString (Escaped x _ :: y) = x :: octetString y
  octetString (HexLL x :: y) = x :: octetString y
  octetString (HexUL x :: y) = x :: octetString y
  octetString (HexLU x :: y) = x :: octetString y
  octetString (HexUU x :: y) = x :: octetString y
  octetString (Octal x :: y) = x :: octetString y
  octetString (Cr :: y) = octetString y
  octetString (Lf :: y) = octetString y
  octetString (CrLf :: y) = octetString y
  octetString (LfCr :: y) = octetString y]]></sourcecode></li>
</ul>
<t>
The type for a quoted-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data QuotedString : List Bits8 -> Type where
  MkQuotedString : QuotedList xs ->
    QuotedString (34 :: xs ++ [34])]]></sourcecode></li>
</ul>
<t>
And the function to retrieve its octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (QuotedString _) where
  octetString (MkQuotedString q) = octetString q]]></sourcecode></li>
</ul>
<t>
We then define an alternative type for the quoted-string representation that is preceded by the length of its octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data QuotedStringLength : List Bits8 -> Type where
  MkQuotedStringLength : (q : QuotedList xs) ->
    QuotedStringLength (base10 (length (octetString q)) ++
      [34] ++ xs ++ [34])]]></sourcecode></li>
</ul>
<t>
And the function to retrieve its octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (QuotedStringLength _) where
  octetString (MkQuotedStringLength q) = octetString q]]></sourcecode></li>
</ul>
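<t>
The length prefix counts the octets of the decoded octet-string, not the octets between the double-quotes.
The following sketch illustrates that point with a hypothetical example that mixes a hexadecimal escape, an ignored line feed, and an ASCII octet; it assumes that the definitions above are in scope and that <tt>failing</tt> blocks are available:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- 2"\x00\<LF>a": seven octets between the quotes, but only
-- two decoded octets (0 and 97), as the escaped line feed is
-- ignored.  The length prefix is therefore "2".
lengthIsDecoded : QuotedStringLength
  [50, 34, 92, 120, 48, 48, 92, 10, 97, 34]
lengthIsDecoded =
  MkQuotedStringLength [HexLL 0, Lf, Ascii 97 Refl]

-- The same string prefixed with the encoded length "7" is
-- rejected.
failing
  lengthIsNotEncoded : QuotedStringLength
    [55, 34, 92, 120, 48, 48, 92, 10, 97, 34]
  lengthIsNotEncoded =
    MkQuotedStringLength [HexLL 0, Lf, Ascii 97 Refl]]]></sourcecode></li>
</ul>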
</section>
<section>
<name>Validation</name>
<t>
Here we prove that all the examples in section 4.2 of the original document are valid instances of the <tt>QuotedString</tt> or <tt>QuotedStringLength</tt> types:
</t>
<ul empty='true'>
<li><artwork><![CDATA[Here are some examples of quoted-string encodings:

        "subject"
        "hi there"
        7"subject"
        3"\n\n\n"
        "This has\n two lines."
        "This has\
        one."
        ""]]></artwork></li>
</ul>
<ul>
<li><t>"subject"</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testQuotedString1 : QuotedString [34, 115, 117, 98, 106, 101,
  99, 116, 34]
testQuotedString1 = MkQuotedString [Ascii 115 Refl,
  Ascii 117 Refl, Ascii 98 Refl, Ascii 106 Refl,
  Ascii 101 Refl, Ascii 99 Refl, Ascii 116 Refl]]]></sourcecode></li><li><t>"hi there"</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testQuotedString2 : QuotedString [34, 104, 105, 32, 116, 104,
  101, 114, 101, 34]
testQuotedString2 = MkQuotedString [Ascii 104 Refl,
  Ascii 105 Refl, Ascii 32 Refl, Ascii 116 Refl,
  Ascii 104 Refl, Ascii 101 Refl, Ascii 114 Refl,
  Ascii 101 Refl]]]></sourcecode></li><li><t>7"subject"</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testQuotedString3 : QuotedStringLength [55, 34, 115, 117, 98,
  106, 101, 99, 116, 34]
testQuotedString3 = MkQuotedStringLength [Ascii 115 Refl,
  Ascii 117 Refl, Ascii 98 Refl, Ascii 106 Refl,
  Ascii 101 Refl, Ascii 99 Refl, Ascii 116 Refl]]]></sourcecode></li><li><t>3"\n\n\n"</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testQuotedString4 : QuotedStringLength [51, 34, 92, 110, 92,
  110, 92, 110, 34]
testQuotedString4 = MkQuotedStringLength [Escaped 10 Refl,
  Escaped 10 Refl, Escaped 10 Refl]]]></sourcecode></li><li><t>"This has\n two lines."</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testQuotedString5 : QuotedString [34, 84, 104, 105, 115, 32,
  104, 97, 115, 92, 110, 32, 116, 119, 111, 32, 108, 105,
  110, 101, 115, 46, 34]
testQuotedString5 = MkQuotedString [Ascii 84 Refl,
  Ascii 104 Refl, Ascii 105 Refl, Ascii 115 Refl,
  Ascii 32 Refl, Ascii 104 Refl, Ascii 97 Refl, Ascii 115 Refl,
  Escaped 10 Refl, Ascii 32 Refl, Ascii 116 Refl,
  Ascii 119 Refl, Ascii 111 Refl, Ascii 32 Refl,
  Ascii 108 Refl, Ascii 105 Refl, Ascii 110 Refl,
  Ascii 101 Refl, Ascii 115 Refl, Ascii 46 Refl]]]></sourcecode></li><li><t>"This has\ one." (actually on two lines)</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testQuotedString6 : QuotedString [34, 84, 104, 105, 115, 32,
  104, 97, 115, 92, 10, 111, 110, 101, 46, 34]
testQuotedString6 = MkQuotedString [Ascii 84 Refl,
  Ascii 104 Refl, Ascii 105 Refl, Ascii 115 Refl,
  Ascii 32 Refl, Ascii 104 Refl, Ascii 97 Refl,
  Ascii 115 Refl, Lf, Ascii 111 Refl, Ascii 110 Refl,
  Ascii 101 Refl, Ascii 46 Refl]]]></sourcecode></li><li><t>""</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testQuotedString7 : QuotedString [34, 34]
testQuotedString7 = MkQuotedString []]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Token Representation</name>
<section>
<name>Analysis</name>
<t>
Section 4.3 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

An octet string that meets the following conditions may be given
directly as a "token".

        -- it does not begin with a digit

        -- it contains only characters that are
                -- alphabetic (upper or lower case),
                -- numeric, or
                -- one of the eight "pseudo-alphabetic" \
\punctuation marks:
                        -   .   /   _   :  *  +  =
        (Note: upper and lower case are not equivalent.)
        (Note: A token may begin with punctuation, including ":").]]></artwork></li>
</ul>
</section>
<section>
<name>Formalization</name>
<t>
Unlike all the other encodings, a token can represent only a subset of all possible octets, so we first define a type that constrains every octet of a token except the first:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data TokenChar : Bits8 -> Type where
  MkTokenChar : (x : Bits8) ->
    (prf : (x >= 65 && x <= 90 || x >= 97 && x <= 122 ||
      x >= 48 && x <= 57 || x == 45 || x == 46 || x == 47 ||
      x == 95 || x == 58 || x == 42 || x == 43 || x == 61)
      === True) ->
    TokenChar x]]></sourcecode></li>
</ul>
<t>
Then we define a type for a list of these:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[namespace Token
  public export
  data TokenCharList : List Bits8 -> Type where
    Nil : TokenCharList []
    (::) : TokenChar x -> TokenCharList xs ->
      TokenCharList (x :: xs)]]></sourcecode></li>
</ul>
<t>
Then a type that represents a complete token as a constrained first octet followed by a list of constrained octets:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Token : List Bits8 -> Type where
  MkToken : (x : Bits8) ->
    (prf : (x >= 65 && x <= 90 || x >= 97 && x <= 122 ||
      x == 45 || x == 46 || x == 95 || x == 58 || x == 42 ||
      x == 47 || x == 43 || x == 61) === True) ->
    TokenCharList xs -> Token (x :: xs)]]></sourcecode></li>
</ul>
<t>
We can then define the <tt>octetString</tt> function for the <tt>Token</tt> type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (Token _) where
  octetString (MkToken x _ xs) = x :: octetString' xs where
    octetString' : TokenCharList _ -> List Bits8
    octetString' [] = []
    octetString' (MkTokenChar x _ :: xs) = x :: octetString' xs]]></sourcecode></li>
</ul>
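<t>
The two side conditions can be illustrated with a few short tokens.
The following sketch assumes that the definitions above are in scope and that <tt>failing</tt> blocks are available; the names and the tokens are hypothetical and are not examples from the original document:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- ":x" is a valid token: a token may begin with a colon.
colonToken : Token [58, 120]
colonToken = MkToken 58 Refl [MkTokenChar 120 Refl]

-- "a1": a digit is accepted in any position but the first.
laterDigit : Token [97, 49]
laterDigit = MkToken 97 Refl [MkTokenChar 49 Refl]

-- "1a" begins with a digit: the side condition on the first
-- octet reduces to False, so the definition is rejected.
failing
  digitToken : Token [49, 97]
  digitToken = MkToken 49 Refl [MkTokenChar 97 Refl]]]></sourcecode></li>
</ul>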
</section>
<section>
<name>Validation</name>
<t>
Here we prove that all the examples in section 4.3 of the original document are valid instances of the <tt>Token</tt> type:
</t>
<ul empty='true'>
<li><artwork><![CDATA[Here are some examples of token representations:

        subject
        not-before
        class-of-1997
        //microsoft.com/names/smith
        *]]></artwork></li>
</ul>
<ul>
<li><t>subject</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testToken1 : Token [115, 117, 98, 106, 101, 99, 116]
testToken1 = MkToken 115 Refl [MkTokenChar 117 Refl,
  MkTokenChar 98 Refl, MkTokenChar 106 Refl,
  MkTokenChar 101 Refl, MkTokenChar 99 Refl,
  MkTokenChar 116 Refl]]]></sourcecode></li><li><t>not-before</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testToken2 : Token [110, 111, 116, 45, 98, 101, 102, 111, 114,
  101]
testToken2 = MkToken 110 Refl [MkTokenChar 111 Refl,
  MkTokenChar 116 Refl, MkTokenChar 45 Refl,
  MkTokenChar 98 Refl, MkTokenChar 101 Refl,
  MkTokenChar 102 Refl, MkTokenChar 111 Refl,
  MkTokenChar 114 Refl, MkTokenChar 101 Refl]]]></sourcecode></li><li><t>class-of-1997</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testToken3 : Token [99, 108, 97, 115, 115, 45, 111, 102, 45,
  49, 57, 57, 55]
testToken3 = MkToken 99 Refl [MkTokenChar 108 Refl,
  MkTokenChar 97 Refl, MkTokenChar 115 Refl,
  MkTokenChar 115 Refl, MkTokenChar 45 Refl,
  MkTokenChar 111 Refl, MkTokenChar 102 Refl,
  MkTokenChar 45 Refl, MkTokenChar 49 Refl,
  MkTokenChar 57 Refl, MkTokenChar 57 Refl,
  MkTokenChar 55 Refl]]]></sourcecode></li><li><t>//microsoft.com/names/smith</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testToken4 : Token [47, 47, 109, 105, 99, 114, 111, 115, 111,
  102, 116, 46, 99, 111, 109, 47, 110, 97, 109, 101, 115, 47,
  115, 109, 105, 116, 104]
testToken4 = MkToken 47 Refl [MkTokenChar 47 Refl,
  MkTokenChar 109 Refl, MkTokenChar 105 Refl,
  MkTokenChar 99 Refl, MkTokenChar 114 Refl,
  MkTokenChar 111 Refl, MkTokenChar 115 Refl,
  MkTokenChar 111 Refl, MkTokenChar 102 Refl,
  MkTokenChar 116 Refl, MkTokenChar 46 Refl,
  MkTokenChar 99 Refl, MkTokenChar 111 Refl,
  MkTokenChar 109 Refl, MkTokenChar 47 Refl,
  MkTokenChar 110 Refl, MkTokenChar 97 Refl,
  MkTokenChar 109 Refl, MkTokenChar 101 Refl,
  MkTokenChar 115 Refl, MkTokenChar 47 Refl,
  MkTokenChar 115 Refl, MkTokenChar 109 Refl,
  MkTokenChar 105 Refl, MkTokenChar 116 Refl,
  MkTokenChar 104 Refl]]]></sourcecode></li><li><t>*</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testToken5 : Token [42]
testToken5 = MkToken 42 Refl []]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Hexadecimal Representation</name>
<section>
<name>Analysis</name>
<t>
Section 4.4 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

An octet-string may be represented with a hexadecimal encoding \
\consisting of:

        -- an (optional) decimal length of the octet string

        -- a sharp-sign "#"

        -- a hexadecimal encoding of the octet string, with each \
\octet
           represented with two hexadecimal digits, most \
\significant
           digit first.

        -- a sharp-sign "#"

There may be whitespace inserted in the midst of the hexadecimal
encoding arbitrarily; it is ignored.  It is an error to have
characters other than whitespace and hexadecimal digits.]]></artwork></li>
</ul>
<t>
There is no possible BNF that is sound for the hexadecimal representation when preceded with the length.
</t>
<t>
"hexadecimal encoding" is understood as allowing either case for each hexadecimal half of the encoding for a single octet.
</t>
</section>
<section>
<name>Formalization</name>
<t>
The hexadecimal representation encodes each octet of an octet-string as two octets in ASCII, each followed by zero or more white spaces.
</t>
<t>
First we build a type for a white space:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Whitespace : Bits8 -> Type where
  MkWhitespace : (x : Bits8) ->
    (prf : (x == 32 || x == 9 || x == 11 ||
      x == 12 || x == 13 || x == 10) === True) ->
    Whitespace x]]></sourcecode></li>
</ul>
<t>
And then a type for a list of white spaces:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[namespace Whitespace
  public export
  data WhitespaceList : List Bits8 -> Type where
    Nil : WhitespaceList []
    (::) : Whitespace x -> WhitespaceList xs ->
      WhitespaceList (x :: xs)]]></sourcecode></li>
</ul>
<t>
Note that white spaces are purely cosmetic, so they do not encode octets in an octet-string.
That means that there is no <tt>octetString</tt> function for these.
</t>
<t>
With that we can build the hexadecimal representation of an octet.
We have four constructors, each corresponding to one of the four possible variants for a hexadecimal encoding:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Hex : List Bits8 -> Type where
  HexLL' : (x : Bits8) -> WhitespaceList xs ->
    WhitespaceList ys ->
    Hex ([halfl (x `shiftR` 4)] ++ xs ++ [halfl (x .&. 15)] ++
      ys)
  HexLU' : (x : Bits8) -> WhitespaceList xs ->
    WhitespaceList ys ->
    Hex ([halfl (x `shiftR` 4)] ++ xs ++ [halfu (x .&. 15)] ++
      ys)
  HexUL' : (x : Bits8) -> WhitespaceList xs ->
    WhitespaceList ys ->
    Hex ([halfu (x `shiftR` 4)] ++ xs ++ [halfl (x .&. 15)] ++
      ys)
  HexUU' : (x : Bits8) -> WhitespaceList xs ->
    WhitespaceList ys ->
    Hex ([halfu (x `shiftR` 4)] ++ xs ++ [halfu (x .&. 15)] ++
      ys)]]></sourcecode></li>
</ul>
<t>
Then we can build a type for the hexadecimal representation of an octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[namespace Hexadecimal
  public export
  data HexList : List Bits8 -> Type where
    Nil : HexList []
    (::) : Hex xs -> HexList ys -> HexList (xs ++ ys)]]></sourcecode></li>
</ul>
<t>
And a function <tt>octetString</tt> for that type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (HexList _) where
  octetString [] = []
  octetString (HexLL' x _ _ :: xs) = x :: octetString xs
  octetString (HexLU' x _ _ :: xs) = x :: octetString xs
  octetString (HexUL' x _ _ :: xs) = x :: octetString xs
  octetString (HexUU' x _ _ :: xs) = x :: octetString xs]]></sourcecode></li>
</ul>
<t>
With that we can build a <tt>Hexadecimal</tt> type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Hexadecimal : List Bits8 -> Type where
  MkHexadecimal : WhitespaceList xs -> HexList ys ->
    Hexadecimal (35 :: xs ++ ys ++ [35])]]></sourcecode></li>
</ul>
<t>
And its <tt>octetString</tt> function:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (Hexadecimal _) where
  octetString (MkHexadecimal _ y) = octetString y]]></sourcecode></li>
</ul>
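<t>
As an illustration of the case variants, the text #6a6B# mixes both cases.
This sketch assumes that the functions <tt>halfl</tt> and <tt>halfu</tt> defined earlier map the values 10 to 15 to the lower case letters a to f and to the upper case letters A to F respectively; the name <tt>sketchMixedCase</tt> is hypothetical:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- 106 is 0x6A: '6' (54) then lower case 'a' (97).
-- 107 is 0x6B: '6' (54) then upper case 'B' (66).
sketchMixedCase : Hexadecimal [35, 54, 97, 54, 66, 35]
sketchMixedCase = MkHexadecimal []
  [HexLL' 106 [] [], HexLU' 107 [] []]]]></sourcecode></li>
</ul>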
<t>
We then define an alternative type for the hexadecimal representation that is preceded by the length of its octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data HexadecimalLength : List Bits8 -> Type where
  MkHexadecimalLength : WhitespaceList xs ->
    (h : HexList ys) ->
    HexadecimalLength (base10 (length (octetString h)) ++
      [35] ++ xs ++ ys ++ [35])]]></sourcecode></li>
</ul>
<t>
And the function to retrieve its octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (HexadecimalLength _) where
  octetString (MkHexadecimalLength _ h) = octetString h]]></sourcecode></li>
</ul>
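<t>
As an illustration, the length prefix counts the octets of the octet-string, not the hexadecimal digits.
The following sketch, in which the name <tt>sketchHexLength</tt> is hypothetical, encodes the two octets of "ab" as 2#6162#, assuming that <tt>base10 2</tt> evaluates to <tt>[50]</tt>:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- Four hexadecimal digits, but a length prefix of 2 (octet 50).
sketchHexLength : HexadecimalLength [50, 35, 54, 49, 54, 50, 35]
sketchHexLength = MkHexadecimalLength []
  [HexLL' 97 [] [], HexLL' 98 [] []]]]></sourcecode></li>
</ul>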
</section>
<section>
<name>Validation</name>
<t>
Here we prove that all the examples in section 4.4 of the original document are valid instances of the <tt>Hexadecimal</tt> or <tt>HexadecimalLength</tt> types:
</t>
<ul empty='true'>
<li><artwork><![CDATA[Here are some examples of hexadecimal encodings:

        #616263#                -- represents "abc"
        3#616263#               -- also represents "abc"
        # 616
          263 #                 -- also represents "abc"]]></artwork></li>
</ul>
<ul>
<li><t>#616263#</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testHexadecimal1 : Hexadecimal [35, 54, 49, 54, 50, 54, 51, 35]
testHexadecimal1 = MkHexadecimal [] [HexLL' 97 [] [],
  HexLL' 98 [] [], HexLL' 99 [] []]]]></sourcecode></li><li><t>3#616263#</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testHexadecimal2 : HexadecimalLength [51, 35, 54, 49, 54, 50,
  54, 51, 35]
testHexadecimal2 = MkHexadecimalLength [] [HexLL' 97 [] [],
  HexLL' 98 [] [], HexLL' 99 [] []]]]></sourcecode></li><li><t># 616 263 #</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testHexadecimal3 : Hexadecimal [35, 32, 54, 49, 54, 10, 32, 32,
  50, 54, 51, 32, 35]
testHexadecimal3 = MkHexadecimal [MkWhitespace 32 Refl] [
  HexLL' 97 [] [],
  HexLL' 98 [MkWhitespace 10 Refl, MkWhitespace 32 Refl,
    MkWhitespace 32 Refl] [],
  HexLL' 99 [] [MkWhitespace 32 Refl]]]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Base 64 Representation</name>
<section>
<name>Analysis</name>
<t>
Section 4.5 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

An octet-string may be represented in a base-64 coding \
\consisting of:

        -- an (optional) decimal length of the octet string

        -- a vertical bar "|"

        -- the rfc 1521 base-64 encoding of the octet string.

        -- a final vertical bar "|"

The base-64 encoding uses only the characters
        A-Z  a-z  0-9  +  /  =
It produces four characters of output for each three octets of \
\input.
If the input has one or two left-over octets of input, it \
\produces an
output block of length four ending in two or one equals signs, \
\respectively.
Output routines compliant with this standard MUST output the \
\equals signs
as specified.  Input routines MAY accept inputs where the equals \
\signs are
dropped.

There may be whitespace inserted in the midst of the base-64 \
\encoding
arbitrarily; it is ignored.  It is an error to have characters \
\other
than whitespace and base-64 characters.]]></artwork></li>
</ul>
<t>
There is no possible BNF that is sound for the base 64 representation when preceded with the length.
</t>
<t>
The fragment "...where the equals signs are dropped" is ambiguous as it does not state whether one or two equals signs can be dropped, or only all of them.
Here we encode types to support the former interpretation, in which any number of the trailing equals signs can be dropped.
</t>
</section>
<section>
<name>Formalization</name>
<t>
First we need a function that will return a base 64 octet from the six lower bits of an octet:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[b64 : Bits8 -> Bits8
b64 x =  if x < 26 then x + 65
  else if x < 52 then x + 71
  else if x < 62 then x - 4
  else if x == 62 then 43
  else if x == 63 then 47
  else 0x3D]]></sourcecode></li>
</ul>
<t>
Next we need four functions that return respectively the first, second, third, and fourth ASCII octet for a group of three octets from the octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[b641 : Bits8 -> List Bits8
b641 x1 = [b64 (x1 `shiftR` 2)]]]></sourcecode></li>
</ul>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[b642 : Bits8 -> Bits8 -> List Bits8
b642 x1 x2 = [b64 (((x1 .&. 0b11) `shiftL` 4) .|.
  (x2 `shiftR` 4))]]]></sourcecode></li>
</ul>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[b643 : Bits8 -> Bits8 -> List Bits8
b643 x2 x3 = [b64 (((x2 .&. 0b1111) `shiftL` 2) .|.
  (x3 `shiftR` 6))]]]></sourcecode></li>
</ul>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[b644 : Bits8 -> List Bits8
b644 x3 = [b64 (x3 .&. 0b111111)]]]></sourcecode></li>
</ul>
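<t>
As a worked example, applying these four functions to the octets 97, 98, and 99 of "abc" should yield the ASCII octets of YWJj.
The name <tt>sketchAbcGroup</tt> is hypothetical and the comments spell out the arithmetic:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- 1st: 97 shifted right by 2 is 24, mapped to 'Y' (89).
-- 2nd: low 2 bits of 97 (1) shifted left by 4 is 16,
--      or-ed with 98 shifted right by 4 (6) is 22: 'W' (87).
-- 3rd: low 4 bits of 98 (2) shifted left by 2 is 8,
--      or-ed with 99 shifted right by 6 (1) is 9: 'J' (74).
-- 4th: low 6 bits of 99 is 35, mapped to 'j' (106).
sketchAbcGroup : (b641 97 ++ b642 97 98 ++ b643 98 99 ++ b644 99)
  === [89, 87, 74, 106]
sketchAbcGroup = Refl]]></sourcecode></li>
</ul>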
<t>
Our first type for the base 64 representation is for a group of three octets from the octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Base64Full : List Bits8 -> Type where
  MkBase64Full : (x1 : Bits8) -> (x2 : Bits8) ->
    (x3 : Bits8) ->
    WhitespaceList xs -> WhitespaceList ys ->
    WhitespaceList zs -> WhitespaceList ws ->
    Base64Full (b641 x1 ++ xs ++ b642 x1 x2 ++ ys ++
      b643 x2 x3 ++ zs ++ b644 x3 ++ ws)]]></sourcecode></li>
</ul>
<t>
Then we can build a type for the base 64 representation of an octet-string whose length is a multiple of three:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[namespace Base64
  public export
  data Base64List : List Bits8 -> Type where
    Nil : Base64List []
    (::) : Base64Full xs -> Base64List ys ->
      Base64List (xs ++ ys)]]></sourcecode></li>
</ul>
<t>
We build another type for the octet-strings that have a length that is not a multiple of three.
There are additional constructors to account for the fact that the padding is optional.
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Base64End : List Bits8 -> Type where
  EndOnePadPad : (x1 : Bits8) ->
    WhitespaceList xs -> WhitespaceList ys ->
    WhitespaceList zs -> WhitespaceList ws ->
    Base64End (b641 x1 ++ xs ++ b642 x1 0 ++ ys ++ [61] ++ zs
      ++ [61] ++ ws)
  EndOnePad : (x1 : Bits8) ->
    WhitespaceList xs -> WhitespaceList ys ->
    WhitespaceList zs ->
    Base64End (b641 x1 ++ xs ++ b642 x1 0 ++ ys ++ [61] ++ zs)
  EndOne : (x1 : Bits8) ->
    WhitespaceList xs -> WhitespaceList ys ->
    Base64End (b641 x1 ++ xs ++ b642 x1 0 ++ ys)
  EndTwoPad : (x1 : Bits8) -> (x2 : Bits8) ->
    WhitespaceList xs -> WhitespaceList ys ->
    WhitespaceList zs -> WhitespaceList ws ->
    Base64End (b641 x1 ++ xs ++ b642 x1 x2 ++ ys ++ b643 x2 0
      ++ zs ++ [61] ++ ws)
  EndTwo : (x1 : Bits8) -> (x2 : Bits8) ->
    WhitespaceList xs -> WhitespaceList ys ->
    WhitespaceList zs ->
    Base64End (b641 x1 ++ xs ++ b642 x1 x2 ++ ys ++ b643 x2 0
      ++ zs)]]></sourcecode></li>
</ul>
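<t>
As an illustration of the constructors for two left-over octets, the octets 97 and 98 of "ab" produce YWI= with the padding and YWI without it.
The names <tt>sketchEndTwoPad</tt> and <tt>sketchEndTwo</tt> are hypothetical:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- b641 97 = [89] ('Y'), b642 97 98 = [87] ('W'),
-- b643 98 0 = [73] ('I'), followed by one equals sign (61).
sketchEndTwoPad : Base64End [89, 87, 73, 61]
sketchEndTwoPad = EndTwoPad 97 98 [] [] [] []

-- The same left-over octets with the equals sign dropped.
sketchEndTwo : Base64End [89, 87, 73]
sketchEndTwo = EndTwo 97 98 [] [] []]]></sourcecode></li>
</ul>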
<t>
We then put all these together into a type for base 64 encoding with two constructors, one for octet-strings whose length is a multiple of 3, and one for the others:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Base64' : List Bits8 -> Type where
  Base64Mult3 : Base64List xs -> Base64' xs
  Base64Non : Base64List xs -> Base64End ys ->
    Base64' (xs ++ ys)]]></sourcecode></li>
</ul>
<t>
We can then define the <tt>octetString</tt> function for the <tt>Base64'</tt> type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[octetString' : Base64List _ -> List Bits8
octetString' [] = []
octetString' (MkBase64Full x1 x2 x3 _ _ _ _ :: xs) =
  x1 :: x2 :: x3 :: octetString' xs

OctetString (Base64' _) where
  octetString (Base64Mult3 xs) = octetString' xs
  octetString (Base64Non xs (EndOnePadPad x1 _ _ _ _)) =
    octetString' xs ++ [x1]
  octetString (Base64Non xs (EndOnePad x1 _ _ _)) =
    octetString' xs ++ [x1]
  octetString (Base64Non xs (EndOne x1 _ _)) =
    octetString' xs ++ [x1]
  octetString (Base64Non xs (EndTwoPad x1 x2 _ _ _ _)) =
    octetString' xs ++ [x1, x2]
  octetString (Base64Non xs (EndTwo x1 x2 _ _ _)) =
    octetString' xs ++ [x1, x2]]]></sourcecode></li>
</ul>
<t>
Finally we can define the <tt>Base64</tt> type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Base64 : List Bits8 -> Type where
  MkBase64 : WhitespaceList xs -> Base64' ys ->
    Base64 (124 :: xs ++ ys ++ [124])]]></sourcecode></li>
</ul>
<t>
And its <tt>octetString</tt> function:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (Base64 _) where
  octetString (MkBase64 _ y) = octetString y]]></sourcecode></li>
</ul>
<t>
We then reuse the <tt>Base64'</tt> type to define one more type for the base 64 representation that is preceded by the length of its octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Base64Length : List Bits8 -> Type where
  MkBase64Length : WhitespaceList xs -> (b : Base64' ys) ->
    Base64Length (base10 (length (octetString b)) ++ [124]
      ++ xs ++ ys ++ [124])]]></sourcecode></li>
</ul>
<t>
And its <tt>octetString</tt> function:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (Base64Length _) where
  octetString (MkBase64Length _ b) = octetString b]]></sourcecode></li>
</ul>
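<t>
As an illustration, the length prefix is computed from the decoded octet-string, so it is unaffected by dropped padding.
The following sketch, in which the name <tt>sketchBase64Length</tt> is hypothetical, encodes "ab" as 2|YWI|, assuming that <tt>base10 2</tt> evaluates to <tt>[50]</tt>:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- Three base 64 characters, no padding, two decoded octets.
sketchBase64Length : Base64Length [50, 124, 89, 87, 73, 124]
sketchBase64Length = MkBase64Length []
  (Base64Non [] (EndTwo 97 98 [] [] []))]]></sourcecode></li>
</ul>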
</section>
<section>
<name>Validation</name>
<t>
Here we prove that all the examples in section 4.5 of the original document are valid instances of the <tt>Base64</tt> or <tt>Base64Length</tt> types:
</t>
<ul empty='true'>
<li><artwork><![CDATA[Here are some examples of base-64 encodings:

    |YWJj|              -- represents "abc"
    | Y W
      J j |             -- also represents "abc"
    3|YWJj|             -- also represents "abc"
    |YWJjZA==|          -- represents "abcd"
    |YWJjZA|            -- also represents "abcd"]]></artwork></li>
</ul>
<ul>
<li><t>|YWJj|</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testBase641 : Base64 [124, 89, 87, 74, 106, 124]
testBase641 = MkBase64 [] (Base64Mult3
  [MkBase64Full 97 98 99 [] [] [] []])]]></sourcecode></li><li><t>| Y W J j |</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testBase642 : Base64 [124, 32, 89, 32, 87, 32, 74, 32, 106,
  32, 124]
testBase642 = MkBase64 [MkWhitespace 32 Refl]
  (Base64Mult3 [MkBase64Full 97 98 99 [MkWhitespace 32 Refl]
  [MkWhitespace 32 Refl] [MkWhitespace 32 Refl]
  [MkWhitespace 32 Refl]])]]></sourcecode></li><li><t>3|YWJj|</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testBase643 : Base64Length [51, 124, 89, 87, 74, 106, 124]
testBase643 = MkBase64Length [] (Base64Mult3
  [MkBase64Full 97 98 99 [] [] [] []])]]></sourcecode></li><li><t>|YWJjZA==|</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testBase644 : Base64 [124, 89, 87, 74, 106, 90, 65, 61, 61,
  124]
testBase644 = MkBase64 [] (Base64Non
  [MkBase64Full 97 98 99 [] [] [] []]
  (EndOnePadPad 100 [] [] [] []))]]></sourcecode></li><li><t>|YWJjZA|</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testBase645 : Base64 [124, 89, 87, 74, 106, 90, 65, 124]
testBase645 = MkBase64 [] (Base64Non
  [MkBase64Full 97 98 99 [] [] [] []]
  (EndOne 100 [] []))]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Octet-String Representation</name>
<section>
<name>Analysis</name>
<t>
Before going further we have to address the case of the brace notation for base 64.
</t>
<t>
Section 6.2 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

There is a difference between the brace notation for base-64 \
\used here
and the || notation for base-64'd octet-strings described above. \
\ Here
the base-64 contents are converted to octets, and then \
\re-scanned as
if they were given originally as octets.  With the || notation, \
\the
contents are just turned into an octet-string.]]></artwork></li>
</ul>
<t>
It is not clear from that text if the octets that are to be re-scanned are for the representation of an octet-string, or for a whole s-expr.
Additionally this text seems to ignore the fact that examples using that notation were provided in section 2 and section 5 of the original s-expr document.
</t>
<t>
So the first ambiguity is about the usage of the brace notation in a display-hint.
Obviously it would not make sense to have an s-expr inside a display-hint, so at best it encodes an octet-string.
But if that is the case, does it encode any of the other representations (maybe including itself) or just the verbatim representation, as the examples in sections 2 and 5 show?
</t>
<t>
The same can be said of the use of the brace notation as simple-string.
There again it would not make sense to encode an s-expr with it, because then it would be possible to associate it with a display-hint, which does not make sense.
Then if it is only the encoding of the representation of an octet-string, the same ambiguity as above is present about the representations permitted.
</t>
<t>
To add to the issue, the brace notation for base 64 on an octet-string is largely redundant with the quoted-string, hexadecimal, and base 64 representations, because these already handle the problem of representing any s-expr using ASCII characters.
The brace notation is only required for the basic transport.
</t>
<t>
Here we choose to use the brace notation for base 64 exclusively in the basic transport, restricting the octet-strings inside to verbatim representations.
That makes the examples in sections 2 and 5 incorrect unless used as s-expr in the basic transport.
</t>
</section>
<section>
<name>Formalization</name>
<t>
With that in mind we can define a type that covers all possible representations for an octet-string, excluding the brace notation for base 64.
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Representation : List Bits8 -> Type where
  RepresentationVerbatim : (v : Verbatim xs) ->
    Representation xs
  RepresentationQuoted : QuotedString xs ->
    Representation xs
  RepresentationQuotedLength : QuotedStringLength xs ->
    Representation xs
  RepresentationToken : Token xs -> Representation xs
  RepresentationHexadecimal : Hexadecimal xs ->
    Representation xs
  RepresentationHexadecimalLength : HexadecimalLength xs ->
    Representation xs
  RepresentationBase64 : Base64 xs -> Representation xs
  RepresentationBase64Length : Base64Length xs ->
    Representation xs]]></sourcecode></li>
</ul>
<t>
And its matching <tt>octetString</tt> function:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (Representation _) where
  octetString (RepresentationVerbatim v) = octetString v
  octetString (RepresentationQuoted x) = octetString x
  octetString (RepresentationQuotedLength x) = octetString x
  octetString (RepresentationToken x) = octetString x
  octetString (RepresentationHexadecimal x) = octetString x
  octetString (RepresentationHexadecimalLength x) =
    octetString x
  octetString (RepresentationBase64 x) = octetString x
  octetString (RepresentationBase64Length x) = octetString x]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Display-hint Representation</name>
<section>
<name>Analysis</name>
<t>
Section 4.6 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

Any octet string may be preceded by a single "display hint".

The purposes of the display hint is to provide information on how
to display the octet string to a user.  It has no other function.
Many of the MIME types work here.

A display-hint is an octet string surrounded by square brackets.
There may be whitespace separating the octet string from the
surrounding brackets.  Any of the legal formats may be used for \
\the
octet string.]]></artwork></li>
</ul>
<t>
The uses of "octet string" in this fragment are all incorrect.
"octet string representation" should be used instead.
</t>
<t>
The text uses singular "whitespace", not the plural "whitespaces".
</t>
<t>
The text also does not say if white spaces can separate the display hint from the octet-string it provides information to.
We assume that multiple white spaces can be used after the opening bracket, before the closing bracket, and between the closing bracket and the following octet-string.
</t>
<t>
Following the argument in the previous section, "legal formats" does not include the brace notation for base 64.
</t>
<t>
Section 4.6 of the original s-expr document ends with:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

In applications an octet-string that is untyped may be \
\considered to have
a pre-specified "default" mime type.  The mime type
                "text/plain; charset=iso-8859-1" 
is the standard default.]]></artwork></li>
</ul>
</section>
<section>
<name>Formalization</name>
<t>
We first build a type for a display-hint:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data DisplayHint : List Bits8 -> Type where
  MkDisplayHint : WhitespaceList xs -> Representation ys ->
    WhitespaceList zs ->
    DisplayHint (91 :: xs ++ ys ++ zs ++ [93])]]></sourcecode></li>
</ul>
<t>
Then define <tt>octetString</tt> for that type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[OctetString (DisplayHint _) where
  octetString (MkDisplayHint _ x _) = octetString x]]></sourcecode></li>
</ul>
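<t>
As an illustration of the fact that any of the permitted representations can be used inside the brackets, the following sketch writes the hint URI in hexadecimal, as [#555249#], and checks that its octet-string is still the three octets of "URI".
The names <tt>sketchHexHint</tt> and <tt>sketchHexHintOctets</tt> are hypothetical, and the sketch assumes that it lives in the same <tt>Main</tt> module as the rest of the model and that <tt>octetString</tt> returns a <tt>List Bits8</tt>:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- 'U' is 0x55, 'R' is 0x52, 'I' is 0x49.
sketchHexHint : DisplayHint [91, 35, 53, 53, 53, 50, 52, 57, 35,
  93]
sketchHexHint = MkDisplayHint [] (RepresentationHexadecimal
  (MkHexadecimal [] [HexLL' 85 [] [], HexLL' 82 [] [],
  HexLL' 73 [] []])) []

-- The brackets and the hexadecimal encoding are not part of
-- the octet-string carried by the display-hint.
sketchHexHintOctets : octetString Main.sketchHexHint
  === [85, 82, 73]
sketchHexHintOctets = Refl]]></sourcecode></li>
</ul>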
<t>
Then a type for the association of a display-hint and the representation of an octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data WithHint : List Bits8 -> Type where
  MkWithHint : DisplayHint xs -> WhitespaceList ys ->
    Representation zs -> WithHint (xs ++ ys ++ zs)]]></sourcecode></li>
</ul>
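<t>
As an illustration, the text [URI] "abc" associates a display-hint with a quoted-string, with one white space between the closing bracket and the octet-string representation.
The name <tt>sketchWithHint</tt> is hypothetical; the constructors for tokens and quoted-strings are used as in the other examples of this document:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[sketchWithHint : WithHint [91, 85, 82, 73, 93, 32, 34, 97, 98,
  99, 34]
sketchWithHint = MkWithHint
  -- The display-hint [URI], without inner white spaces.
  (MkDisplayHint [] (RepresentationToken
    (MkToken 85 Refl [MkTokenChar 82 Refl,
    MkTokenChar 73 Refl])) [])
  -- One space between the hint and the representation.
  [MkWhitespace 32 Refl]
  -- The quoted-string "abc".
  (RepresentationQuoted (MkQuotedString
    [Ascii 97 Refl, Ascii 98 Refl, Ascii 99 Refl]))]]></sourcecode></li>
</ul>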
<t>
We finally define the default display-hint as the token application/octet-stream:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[defaultHint : Representation [97, 112, 112, 108, 105, 99, 97,
  116, 105, 111, 110, 47, 111, 99, 116, 101, 116, 45, 115,
  116, 114, 101, 97, 109]
defaultHint = RepresentationToken (MkToken 97 Refl
  [MkTokenChar 112 Refl, MkTokenChar 112 Refl,
  MkTokenChar 108 Refl, MkTokenChar 105 Refl,
  MkTokenChar 99 Refl, MkTokenChar 97 Refl,
  MkTokenChar 116 Refl, MkTokenChar 105 Refl,
  MkTokenChar 111 Refl, MkTokenChar 110 Refl,
  MkTokenChar 47 Refl, MkTokenChar 111 Refl,
  MkTokenChar 99 Refl, MkTokenChar 116 Refl,
  MkTokenChar 101 Refl, MkTokenChar 116 Refl,
  MkTokenChar 45 Refl, MkTokenChar 115 Refl,
  MkTokenChar 116 Refl, MkTokenChar 114 Refl,
  MkTokenChar 101 Refl, MkTokenChar 97 Refl,
  MkTokenChar 109 Refl])]]></sourcecode></li>
</ul>
</section>
<section>
<name>Validation</name>
<t>
Here we prove that all the examples in section 4.6 of the original document are valid instances of the <tt>DisplayHint</tt> type:
</t>
<ul empty='true'>
<li><artwork><![CDATA[Here are some examples of display-hints:

        [image/gif]
        [URI]
        [charset=unicode-1-1]
        [text/richtext]
        [application/postscript]
        [audio/basic]
        ["http://abc.com/display-types/funky.html"]]]></artwork></li>
</ul>
<ul>
<li><t>[image/gif]</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testHint1 : DisplayHint [91, 105, 109, 97, 103, 101, 47, 103,
  105, 102, 93]
testHint1 = MkDisplayHint [] (RepresentationToken
  (MkToken 105 Refl [MkTokenChar 109 Refl,
  MkTokenChar 97 Refl, MkTokenChar 103 Refl,
  MkTokenChar 101 Refl, MkTokenChar 47 Refl,
  MkTokenChar 103 Refl, MkTokenChar 105 Refl,
  MkTokenChar 102 Refl])) []]]></sourcecode></li><li><t>[URI]</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testHint2 : DisplayHint [91, 85, 82, 73, 93]
testHint2 = MkDisplayHint [] (RepresentationToken
  (MkToken 85 Refl [MkTokenChar 82 Refl,
  MkTokenChar 73 Refl])) []]]></sourcecode></li><li><t>[charset=unicode-1-1]</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testHint3 : DisplayHint [91, 99, 104, 97, 114, 115, 101, 116,
  61, 117, 110, 105, 99, 111, 100, 101, 45, 49, 45, 49, 93]
testHint3 = MkDisplayHint [] (RepresentationToken
  (MkToken 99 Refl [MkTokenChar 104 Refl,
  MkTokenChar 97 Refl, MkTokenChar 114 Refl,
  MkTokenChar 115 Refl, MkTokenChar 101 Refl,
  MkTokenChar 116 Refl, MkTokenChar 61 Refl,
  MkTokenChar 117 Refl, MkTokenChar 110 Refl,
  MkTokenChar 105 Refl, MkTokenChar 99 Refl,
  MkTokenChar 111 Refl, MkTokenChar 100 Refl,
  MkTokenChar 101 Refl, MkTokenChar 45 Refl,
  MkTokenChar 49 Refl, MkTokenChar 45 Refl,
  MkTokenChar 49 Refl])) []]]></sourcecode></li><li><t>[text/richtext]</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testHint4 : DisplayHint [91, 116, 101, 120, 116, 47, 114, 105,
  99, 104, 116, 101, 120, 116, 93]
testHint4 = MkDisplayHint [] (RepresentationToken
  (MkToken 116 Refl [MkTokenChar 101 Refl,
  MkTokenChar 120 Refl, MkTokenChar 116 Refl,
  MkTokenChar 47 Refl, MkTokenChar 114 Refl,
  MkTokenChar 105 Refl, MkTokenChar 99 Refl,
  MkTokenChar 104 Refl, MkTokenChar 116 Refl,
  MkTokenChar 101 Refl, MkTokenChar 120 Refl,
  MkTokenChar 116 Refl])) []]]></sourcecode></li><li><t>[application/postscript]</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testHint5 : DisplayHint [91, 97, 112, 112, 108, 105, 99, 97,
  116, 105, 111, 110, 47, 112, 111, 115, 116, 115, 99, 114,
  105, 112, 116, 93]
testHint5 = MkDisplayHint [] (RepresentationToken
  (MkToken 97 Refl [MkTokenChar 112 Refl,
  MkTokenChar 112 Refl, MkTokenChar 108 Refl,
  MkTokenChar 105 Refl, MkTokenChar 99 Refl,
  MkTokenChar 97 Refl, MkTokenChar 116 Refl,
  MkTokenChar 105 Refl, MkTokenChar 111 Refl,
  MkTokenChar 110 Refl, MkTokenChar 47 Refl,
  MkTokenChar 112 Refl, MkTokenChar 111 Refl,
  MkTokenChar 115 Refl, MkTokenChar 116 Refl,
  MkTokenChar 115 Refl, MkTokenChar 99 Refl,
  MkTokenChar 114 Refl, MkTokenChar 105 Refl,
  MkTokenChar 112 Refl, MkTokenChar 116 Refl])) []]]></sourcecode></li><li><t>[audio/basic]</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testHint6 : DisplayHint [91, 97, 117, 100, 105, 111, 47, 98,
  97, 115, 105, 99, 93]
testHint6 = MkDisplayHint [] (RepresentationToken
  (MkToken 97 Refl [MkTokenChar 117 Refl,
  MkTokenChar 100 Refl, MkTokenChar 105 Refl,
  MkTokenChar 111 Refl, MkTokenChar 47 Refl,
  MkTokenChar 98 Refl, MkTokenChar 97 Refl,
  MkTokenChar 115 Refl, MkTokenChar 105 Refl,
  MkTokenChar 99 Refl])) []]]></sourcecode></li><li><t>["http://abc.com/display-types/funky.html"]</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testHint7 : DisplayHint [91, 34, 104, 116, 116, 112, 58, 47,
  47, 97, 98, 99, 46, 99, 111, 109, 47, 100, 105, 115, 112,
  108, 97, 121, 45, 116, 121, 112, 101, 115, 47, 102, 117,
  110, 107, 121, 46, 104, 116, 109, 108, 34, 93]
testHint7 = MkDisplayHint [] (RepresentationQuoted
  (MkQuotedString [Ascii 104 Refl, Ascii 116 Refl,
  Ascii 116 Refl, Ascii 112 Refl, Ascii 58 Refl,
  Ascii 47 Refl, Ascii 47 Refl, Ascii 97 Refl, Ascii 98 Refl,
  Ascii 99 Refl, Ascii 46 Refl, Ascii 99 Refl, Ascii 111 Refl,
  Ascii 109 Refl, Ascii 47 Refl, Ascii 100 Refl,
  Ascii 105 Refl, Ascii 115 Refl, Ascii 112 Refl,
  Ascii 108 Refl, Ascii 97 Refl, Ascii 121 Refl,
  Ascii 45 Refl, Ascii 116 Refl, Ascii 121 Refl,
  Ascii 112 Refl, Ascii 101 Refl, Ascii 115 Refl,
  Ascii 47 Refl, Ascii 102 Refl, Ascii 117 Refl,
  Ascii 110 Refl, Ascii 107 Refl, Ascii 121 Refl,
  Ascii 46 Refl, Ascii 104 Refl, Ascii 116 Refl,
  Ascii 109 Refl, Ascii 108 Refl])) []]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Equality of Octet-String</name>
<section>
<name>Analysis</name>
<t>
Section 4.7 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

Two octet strings are considered to be "equal" if and only if they
have the same display hint and the same data octet strings.

Note that octet-strings are "case-sensitive"; the octet-string \
\"abc"
is not equal to the octet-string "ABC".

An untyped octet-string can be compared to another octet-string \
\(typed
or not) by considering it as a typed octet-string with the default
mime-type.]]></artwork></li>
</ul>
<t>
The term "octet string" here is incorrect as it is described as the combination of a display hint and "data octet strings", the latter actually being an "octet string representation".
</t>
<t>
Consequently the terms "equal" or "equality" are incorrect, and the terms "equivalent" or "equivalence" should be used instead.
Here the term "equivalent" means "carrying the same information", i.e., the same octet-string.
Two octet-string representations can be equivalent but not equal, e.g., the token abc and the quoted-string "abc" are equivalent but not equal.
</t>
<t>
The same reasoning is applied when comparing typed octet-string representations, or a typed octet-string representation with an untyped octet-string representation.
</t>
</section>
<section>
<name>Formalization</name>
<t>
We first define a type that carries either a typed or an untyped octet-string representation:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Element : Type where
  Untyped : Representation _ -> Element
  Typed : Representation _ -> Representation _ -> Element]]></sourcecode></li>
</ul>
<t>
Then we define the type alias <tt>Equivalence</tt> as a relation between two elements.
<tt>Equivalence</tt> is already declared in the standard library, so we have to hide that declaration first:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[%hide Control.Relation.Equivalence

Equivalence : Element -> Element -> Type
Equivalence (Untyped x) (Untyped x') =
  octetString x === octetString x'
Equivalence (Untyped x) (Typed h x') =
  (octetString defaultHint === octetString h,
  octetString x === octetString x')
Equivalence (Typed h x) (Untyped x') =
  (octetString h === octetString defaultHint,
  octetString x === octetString x')
Equivalence (Typed h x) (Typed h' x') =
  (octetString h === octetString h',
  octetString x === octetString x')]]></sourcecode></li>
</ul>
</section>
<section>
<name>Validation</name>
<t>
Here we prove that a subset of the examples in section 1 of the original document are equivalent.
Proving the other equivalences is trivial:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

An octet-string is a finite sequence of eight-bit octets.  There \
\may be
many different but equivalent ways of representing an \
\octet-string

        abc             -- as a token

        "abc"           -- as a quoted string

        #616263#        -- as a hexadecimal string

        3:abc           -- as a length-prefixed "verbatim" \
\encoding

        {MzphYmM=}      -- as a base-64 encoding of the verbatim \
\encoding
                           (that is, an encoding of "3:abc")

        |YWJj|          -- as a base-64 encoding of the \
\octet-string "abc"

These encodings are all equivalent; they all denote the same \
\octet string.]]></artwork></li>
</ul>
<t>
We first prove that the first three representations are correct:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[abcToken : Representation [97, 98, 99]
abcToken = RepresentationToken (MkToken 97 Refl
  [MkTokenChar 98 Refl, MkTokenChar 99 Refl])

abcQuoted : Representation [34, 97, 98, 99, 34]
abcQuoted = RepresentationQuoted (MkQuotedString
  [Ascii 97 Refl, Ascii 98 Refl, Ascii 99 Refl])

abcHex : Representation [35, 54, 49, 54, 50, 54, 51, 35]
abcHex = RepresentationHexadecimal (MkHexadecimal []
  [HexLL' 97 [] [], HexLL' 98 [] [], HexLL' 99 [] []])]]></sourcecode></li>
</ul>
<t>
We can then prove that abc is equivalent to "abc", and that "abc" is equivalent to #616263#:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testEq1 : Equivalence (Untyped Main.abcToken)
  (Untyped Main.abcQuoted)
testEq1 = Refl

testEq2 : Equivalence (Untyped Main.abcQuoted)
  (Untyped Main.abcHex)
testEq2 = Refl]]></sourcecode></li>
</ul>
<t>
By transitivity we can then prove that abc is equivalent to #616263#:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testEq3 : Equivalence (Untyped Main.abcToken)
  (Untyped Main.abcHex)
testEq3 = trans testEq1 testEq2]]></sourcecode></li>
</ul>
<t>
We can also use symmetry to prove that if a first octet-string representation is equivalent to a second octet-string representation, then the second is also equivalent to the first one:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testEq4 : Equivalence (Untyped Main.abcHex)
  (Untyped Main.abcToken)
testEq4 = sym testEq3]]></sourcecode></li>
</ul>
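<t>
As an illustration of two of the remaining cases, the following sketch proves that the token abc is equivalent to the base 64 representation |YWJj|, and that an octet-string representation typed with the default display-hint is equivalent to an untyped one.
The names <tt>sketchAbcBase64</tt>, <tt>sketchEq5</tt>, and <tt>sketchEq6</tt> are hypothetical, and the sketch assumes that it lives in the same <tt>Main</tt> module as the definitions above:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[sketchAbcBase64 : Representation [124, 89, 87, 74, 106, 124]
sketchAbcBase64 = RepresentationBase64 (MkBase64 []
  (Base64Mult3 [MkBase64Full 97 98 99 [] [] [] []]))

-- Untyped against untyped: a single equality of octet-strings.
sketchEq5 : Equivalence (Untyped Main.abcToken)
  (Untyped Main.sketchAbcBase64)
sketchEq5 = Refl

-- Typed against untyped: a pair of equalities, the first one
-- comparing the hint with the default display-hint.
sketchEq6 : Equivalence (Typed Main.defaultHint Main.abcToken)
  (Untyped Main.sketchAbcBase64)
sketchEq6 = (Refl, Refl)]]></sourcecode></li>
</ul>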
</section>
</section>
<section>
<name>Lists</name>
<section>
<name>Analysis</name>
<t>
Section 5 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

Just as with octet-strings, there are several ways to represent an
S-expression.  Whitespace may be used to separate list elements, \
\but
they are only required to separate two octet strings when \
\otherwise
the two octet strings might be interpreted as one, as when one \
\token
follows another.  Also, whitespace may follow the initial left
parenthesis, or precede the final right parenthesis.]]></artwork></li>
</ul>
<t>
The first sentence should say that there are different ways to represent a list.
</t>
<t>
But the issue is really that in some cases the separation between some representations of an octet-string is ambiguous.
The actual rules for mandatory separation are:
</t>
<ul>
<li>a token must be separated from a quoted-string, hexadecimal, or base 64 representation that is prefixed with the length</li><li>a token must be separated from the next token</li><li>a token must be separated from the next verbatim representation</li>
</ul>
<t>
Additionally section 2 states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

A list is a finite sequence of zero or more simpler \
\S-expressions.  A list
may be represented by using parentheses to surround the sequence \
\of encodings
of its elements, as in:

        (abc (de #6667#) "ghi jkl")]]></artwork></li>
</ul>
<t>
Parentheses are not optional when representing a list, so "may be" should be "are".
</t>
</section>
<section>
<name>Formalization</name>
<t>
To represent the various ways to separate representations, we need four mutually inductive types, which we first declare as abstract types:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data TokenList : List Bits8 -> Type
data SeparateList : List Bits8 -> Type
data OtherList : List Bits8 -> Type
data Lists : List Bits8 -> Type]]></sourcecode></li>
</ul>
<t>
<tt>TokenList</tt> is the type of a list of octet-string representations that starts with a token:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data TokenList : List Bits8 -> Type where
  TokenNil : Token xs -> TokenList xs
  TokenConsToken : Token xs -> Whitespace y ->
    WhitespaceList ys -> TokenList zs ->
    TokenList (xs ++ (y :: ys) ++ zs)
  TokenConsSeparate : Token xs -> Whitespace y ->
    WhitespaceList ys -> SeparateList zs ->
    TokenList (xs ++ (y :: ys) ++ zs)
  TokenConsOther : Token xs -> WhitespaceList ys ->
    OtherList zs -> TokenList (xs ++ ys ++ zs)]]></sourcecode></li>
</ul>
<t>
<tt>SeparateList</tt> is the type of a list of octet-string representations that starts with a representation that must be separated by whitespace when it follows a token:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data SeparateList : List Bits8 -> Type where
  SeparateVerbatim : Verbatim xs -> SeparateList xs
  SeparateVerbatimToken : Verbatim xs -> WhitespaceList ys ->
   TokenList zs -> SeparateList (xs ++ ys ++ zs)
  SeparateVerbatimSeparate : Verbatim xs ->
    WhitespaceList ys -> SeparateList zs ->
    SeparateList (xs ++ ys ++ zs)
  SeparateVerbatimOther : Verbatim xs -> WhitespaceList ys ->
    OtherList zs -> SeparateList (xs ++ ys ++ zs)
  SeparateQuotedStringLength : QuotedStringLength xs ->
    SeparateList xs
  SeparateQuotedStringLengthToken : QuotedStringLength xs ->
    WhitespaceList ys -> TokenList zs ->
    SeparateList (xs ++ ys ++ zs)
  SeparateQuotedStringLengthSeparate : QuotedStringLength xs ->
    WhitespaceList ys -> SeparateList zs ->
    SeparateList (xs ++ ys ++ zs)
  SeparateQuotedStringLengthOther : QuotedStringLength xs ->
    WhitespaceList ys -> OtherList zs ->
    SeparateList (xs ++ ys ++ zs)
  SeparateHexadecimal : HexadecimalLength xs ->
    SeparateList xs
  SeparateHexadecimalLengthToken : HexadecimalLength xs ->
    WhitespaceList ys -> TokenList zs ->
    SeparateList (xs ++ ys ++ zs)
  SeparateHexadecimalLengthSeparate : HexadecimalLength xs ->
    WhitespaceList ys -> SeparateList zs ->
    SeparateList (xs ++ ys ++ zs)
  SeparateHexadecimalLengthOther : HexadecimalLength xs ->
    WhitespaceList ys -> OtherList zs ->
    SeparateList (xs ++ ys ++ zs)
  SeparateBase64 : Base64Length xs ->
    SeparateList xs
  SeparateBase64LengthToken : Base64Length xs ->
    WhitespaceList ys -> TokenList zs ->
    SeparateList (xs ++ ys ++ zs)
  SeparateBase64LengthSeparate : Base64Length xs ->
    WhitespaceList ys -> SeparateList zs ->
    SeparateList (xs ++ ys ++ zs)
  SeparateBase64LengthOther : Base64Length xs ->
    WhitespaceList ys -> OtherList zs ->
    SeparateList (xs ++ ys ++ zs)]]></sourcecode></li>
</ul>
<t>
<tt>OtherList</tt> is the type of a list of octet-string representations that starts with a representation that does not need to be separated when it follows a token:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data OtherList : List Bits8 -> Type where
  OtherQuotedString : QuotedString xs -> OtherList xs
  OtherQuotedStringToken : QuotedString xs ->
   WhitespaceList ys -> TokenList zs ->
   OtherList (xs ++ ys ++ zs)
  OtherQuotedStringSeparate : QuotedString xs ->
    WhitespaceList ys -> SeparateList zs ->
    OtherList (xs ++ ys ++ zs)
  OtherQuotedStringOther : QuotedString xs ->
    WhitespaceList ys -> OtherList zs ->
    OtherList (xs ++ ys ++ zs)
  OtherHexadecimal : Hexadecimal xs -> OtherList xs
  OtherHexadecimalToken : Hexadecimal xs ->
   WhitespaceList ys -> TokenList zs ->
   OtherList (xs ++ ys ++ zs)
  OtherHexadecimalSeparate : Hexadecimal xs ->
    WhitespaceList ys -> SeparateList zs ->
    OtherList (xs ++ ys ++ zs)
  OtherHexadecimalOther : Hexadecimal xs ->
    WhitespaceList ys -> OtherList zs ->
    OtherList (xs ++ ys ++ zs)
  OtherBase64 : Base64 xs -> OtherList xs
  OtherBase64Token : Base64 xs -> WhitespaceList ys ->
   TokenList zs -> OtherList (xs ++ ys ++ zs)
  OtherBase64Separate : Base64 xs ->
    WhitespaceList ys -> SeparateList zs ->
    OtherList (xs ++ ys ++ zs)
  OtherBase64Other : Base64 xs -> WhitespaceList ys ->
    OtherList zs -> OtherList (xs ++ ys ++ zs)
  OtherHint : WithHint xs -> OtherList xs
  OtherHintToken : WithHint xs ->
   WhitespaceList ys -> TokenList zs ->
   OtherList (xs ++ ys ++ zs)
  OtherHintSeparate : WithHint xs ->
    WhitespaceList ys -> SeparateList zs ->
    OtherList (xs ++ ys ++ zs)
  OtherHintOther : WithHint xs ->
    WhitespaceList ys -> OtherList zs ->
    OtherList (xs ++ ys ++ zs)
  OtherLists : Lists xs -> OtherList xs
  OtherListsToken : Lists xs ->
   WhitespaceList ys -> TokenList zs ->
   OtherList (xs ++ ys ++ zs)
  OtherListsSeparate : Lists xs ->
    WhitespaceList ys -> SeparateList zs ->
    OtherList (xs ++ ys ++ zs)
  OtherListsOther : Lists xs ->
    WhitespaceList ys -> OtherList zs ->
    OtherList (xs ++ ys ++ zs)]]></sourcecode></li>
</ul>
<t>
And finally the <tt>Lists</tt> type groups all the possible lists in an s-expr:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Lists : List Bits8 -> Type where
  ListsTokenList : WhitespaceList xs -> TokenList ys ->
    WhitespaceList zs -> Lists (40 :: xs ++ ys ++ zs ++ [41])
  ListsSeparateList : WhitespaceList xs -> SeparateList ys ->
    WhitespaceList zs -> Lists (40 :: xs ++ ys ++ zs ++ [41])
  ListsOtherList : WhitespaceList xs -> OtherList ys ->
    WhitespaceList zs -> Lists (40 :: xs ++ ys ++ zs ++ [41])
  ListsEmptyList : WhitespaceList xs ->
    Lists (40 :: xs ++ [41])]]></sourcecode></li>
</ul>
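<t>
As a minimal sketch of how these four types interact, the two proofs below show the empty list and a token immediately followed by a quoted-string. The names <tt>sketchEmpty</tt> and <tt>sketchNoSep</tt> are hypothetical and the fragment is deliberately not part of the extracted file; it assumes that <tt>MkToken</tt>, <tt>MkQuotedString</tt> and <tt>Ascii</tt> behave as in the other proofs of this document. The second proof should type-check without any whitespace because <tt>TokenConsOther</tt> accepts an empty <tt>WhitespaceList</tt>, whereas <tt>TokenConsToken</tt> and <tt>TokenConsSeparate</tt> require at least one <tt>Whitespace</tt>:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- "()" : the empty list, nothing between the parentheses
sketchEmpty : Lists [40, 41]
sketchEmpty = ListsEmptyList []

-- (a"b") : a token directly followed by a quoted-string
sketchNoSep : Lists [40, 97, 34, 98, 34, 41]
sketchNoSep = ListsTokenList []
  (TokenConsOther (MkToken 97 Refl [])
  [] (OtherQuotedString (MkQuotedString [Ascii 98 Refl])))
  []]]></sourcecode></li>
</ul>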
</section>
<section>
<name>Validation</name>
<t>
Here we prove that all the examples in section 5 of the original document except the last one are valid instances of the <tt>Lists</tt> type:
</t>
<ul empty='true'>
<li><artwork><![CDATA[Here are some examples of encodings of lists:

        (a b c)

        ( a ( b c ) ( ( d e ) ( e f ) )  )

        (11:certificate(6:issuer3:bob)(7:subject5:alice))

        ({3Rt=} "1997" murphy 3:{XC++})]]></artwork></li>
</ul>
<ul>
<li><t>(a b c)</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testLists1 : Lists [40, 97, 32, 98, 32, 99, 41]
testLists1 = ListsTokenList []
  (TokenConsToken (MkToken 97 Refl []) (MkWhitespace 32 Refl)
  [] (TokenConsToken (MkToken 98 Refl [])
  (MkWhitespace 32 Refl) [] (TokenNil (MkToken 99 Refl []))))
  []]]></sourcecode></li><li><t>( a ( b c ) ( ( d e ) ( e f ) )  )</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testLists2 : Lists [40, 32, 97, 32, 40, 32, 98, 32, 99, 32, 41,
  32, 40, 32, 40, 32, 100, 32, 101, 32, 41, 32, 40, 32, 101,
  32, 102, 32, 41, 32, 41, 32, 32, 41]
testLists2 = ListsTokenList [MkWhitespace 32 Refl]
  (TokenConsOther (MkToken 97 Refl []) [MkWhitespace 32 Refl]
  (OtherListsOther (ListsTokenList [MkWhitespace 32 Refl]
  (TokenConsToken (MkToken 98 Refl []) (MkWhitespace 32 Refl)
  [] (TokenNil (MkToken 99 Refl []))) [MkWhitespace 32 Refl])
  [MkWhitespace 32 Refl] (OtherLists (ListsOtherList
  [MkWhitespace 32 Refl] (OtherListsOther (ListsTokenList
  [MkWhitespace 32 Refl] (TokenConsToken (MkToken 100 Refl [])
  (MkWhitespace 32 Refl) [] (TokenNil (MkToken 101 Refl [])))
  [MkWhitespace 32 Refl]) [MkWhitespace 32 Refl] (OtherLists
  (ListsTokenList [MkWhitespace 32 Refl] (TokenConsToken
  (MkToken 101 Refl []) (MkWhitespace 32 Refl) [] (TokenNil
  (MkToken 102 Refl []))) [MkWhitespace 32 Refl])))
  [MkWhitespace 32 Refl])))) [MkWhitespace 32 Refl,
  MkWhitespace 32 Refl]]]></sourcecode></li><li><t>(11:certificate(6:issuer3:bob)(7:subject5:alice))</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testLists3 : Lists [40, 49, 49, 58, 99, 101, 114, 116, 105,
  102, 105, 99, 97, 116, 101, 40, 54, 58, 105, 115, 115,
  117, 101, 114, 51, 58, 98, 111, 98, 41, 40, 55, 58, 115,
  117, 98, 106, 101, 99, 116, 53, 58, 97, 108, 105, 99,
  101, 41, 41]
testLists3 = ListsSeparateList [] (SeparateVerbatimOther
  (MkVerbatim [99, 101, 114, 116, 105, 102, 105, 99, 97,
  116, 101]) [] (OtherListsOther (ListsSeparateList []
  (SeparateVerbatimSeparate (MkVerbatim [105, 115, 115,
  117, 101, 114]) [] (SeparateVerbatim (MkVerbatim [98,
  111, 98]))) []) [] (OtherLists (ListsSeparateList []
  (SeparateVerbatimSeparate (MkVerbatim [115, 117, 98,
  106, 101, 99, 116]) [] (SeparateVerbatim (MkVerbatim
  [97, 108, 105, 99, 101]))) [])))) []]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Advanced S-Expr Transport</name>
<section>
<name>Analysis</name>
<t>
Section 6.3 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

The "advanced transport" representation is intended to provide \
\more
flexible and readable notations for documentation, design, \
\debugging,
and (in some cases) user interface.

The advanced transport representation allows all of the \
\representation
forms described above, include quoted strings, base-64 and \
\hexadecimal
representation of strings, tokens, representations of strings with
omitted lengths, and so on.]]></artwork></li>
</ul>
<t>
Because this transport is aimed at users, we also permit whitespace before and after an s-expr.
</t>
</section>
<section>
<name>Formalization</name>
<t>
<tt>SExpr</tt> is the type of advanced transport for valid s-expr:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data SExpr : List Bits8 -> Type where
  SExprRepresentation : WhitespaceList xs ->
    Representation ys -> WhitespaceList zs ->
    SExpr (xs ++ ys ++ zs)
  SExprWithHint : WhitespaceList xs -> WithHint ys ->
    WhitespaceList zs -> SExpr (xs ++ ys ++ zs)
  SExprList : WhitespaceList xs -> Lists ys ->
    WhitespaceList zs -> SExpr (xs ++ ys ++ zs)]]></sourcecode></li>
</ul>
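<t>
The surrounding whitespace can be illustrated with a small sketch: the empty list preceded and followed by one space. The name <tt>sketchPadded</tt> is hypothetical and the fragment is not part of the extracted file; it only relies on constructors already used in this document:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- " () " : one space, the empty list, one space
sketchPadded : SExpr [32, 40, 41, 32]
sketchPadded = SExprList [MkWhitespace 32 Refl]
  (ListsEmptyList []) [MkWhitespace 32 Refl]]]></sourcecode></li>
</ul>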
</section>
<section>
<name>Validation</name>
<t>
Here we prove that the example in section 2 of the original document is a valid instance of the <tt>SExpr</tt> type:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

A list is a finite sequence of zero or more simpler \
\S-expressions.  A list
may be represented by using parentheses to surround the \
\sequence of encodings
of its elements, as in:

        (abc (de #6667#) "ghi jkl")]]></artwork></li>
</ul>
<ul>
<li><t>(abc (de #6667#) "ghi jkl")</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testSExpr1 : SExpr [40, 97, 98, 99, 32, 40, 100, 101, 32, 35,
  54, 54, 54, 55, 35, 41, 32, 34, 103, 104, 105, 32, 106, 107,
  108, 34, 41]
testSExpr1 = SExprList [] (ListsTokenList [] (TokenConsOther
  (MkToken 97 Refl [MkTokenChar 98 Refl, MkTokenChar 99 Refl])
  [MkWhitespace 32 Refl] (OtherListsOther (ListsTokenList []
  (TokenConsOther (MkToken 100 Refl [MkTokenChar 101 Refl])
  [MkWhitespace 32 Refl] (OtherHexadecimal (MkHexadecimal []
  [HexLL' 102 [] [], HexLL' 103 [] []]))) [])
  [MkWhitespace 32 Refl] (OtherQuotedString (MkQuotedString
  [Ascii 103 Refl, Ascii 104 Refl, Ascii 105 Refl,
  Ascii 32 Refl, Ascii 106 Refl, Ascii 107 Refl,
  Ascii 108 Refl])))) []) []]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Canonical S-Expr Transport</name>
<section>
<name>Analysis</name>
<t>
Section 6.1 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

This canonical representation is used for digital signature \
\purposes,
transmission, etc.  It is uniquely defined for each \
\S-expression.  It
is not particularly readable, but that is not the point.  \
\It is
intended to be very easy to parse, to be reasonably economical, \
\and to
be unique for any S-expression.

The "canonical" form of an S-expression represents each \
\octet-string
in verbatim mode, and represents each list with no blanks \
\separating
elements from each other or from the surrounding parentheses.]]></artwork></li>
</ul>
</section>
<section>
<name>Formalization</name>
<t>
The canonical transport is actually a profile of the advanced transport, so we can reuse our previous types.
</t>
<t>
First we declare an abstract type for the canonical s-expr, as it is an inductive type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data CanonicalSExpr : List Bits8 -> Type]]></sourcecode></li>
</ul>
<t>
Then a type for a list of canonical s-expr:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data CanonicalSExprList : List Bits8 -> Type where
  Nil : CanonicalSExprList []
  (::) : CanonicalSExpr xs -> CanonicalSExprList ys ->
    CanonicalSExprList (xs ++ ys)]]></sourcecode></li>
</ul>
<t>
And finally our concrete type for a canonical s-expr:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data CanonicalSExpr : List Bits8 -> Type where
  MkCanonical : Verbatim xs -> CanonicalSExpr xs
  MkCanonicalHint : Verbatim xs -> Verbatim ys ->
    CanonicalSExpr (91 :: xs ++ [93] ++ ys)
  MkCanonicalList : CanonicalSExprList xs ->
    CanonicalSExpr (40 :: xs ++ [41])]]></sourcecode></li>
</ul>
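<t>
A canonical s-expr does not have to be a list. As a hedged sketch (hypothetical names, not part of the extracted file, and assuming <tt>MkVerbatim</tt> computes the length prefix as it does in the other proofs of this document), a lone verbatim octet-string and the empty list should both be accepted:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- "3:abc" : a single octet-string in verbatim form
sketchCanonicalString : CanonicalSExpr [51, 58, 97, 98, 99]
sketchCanonicalString = MkCanonical (MkVerbatim [97, 98, 99])

-- "()" : the empty list
sketchCanonicalEmpty : CanonicalSExpr [40, 41]
sketchCanonicalEmpty = MkCanonicalList []]]></sourcecode></li>
</ul>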
</section>
<section>
<name>Validation</name>
<t>
Here we prove that all the examples in section 6.1 of the original document are valid instances of the <tt>CanonicalSExpr</tt> type:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

Here are some examples of canonical representations of \
\S-expressions:

        (6:issuer3:bob)

        (4:icon[12:image/bitmap]9:xxxxxxxxx)

        (7:subject(3:ref5:alice6:mother))]]></artwork></li>
</ul>
<ul>
<li><t>(6:issuer3:bob)</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testCanonical1 : CanonicalSExpr [40, 54, 58, 105, 115, 115,
  117, 101, 114, 51, 58, 98, 111, 98, 41]
testCanonical1 = MkCanonicalList [MkCanonical
  (MkVerbatim [105, 115, 115, 117, 101, 114]),
  MkCanonical (MkVerbatim [98, 111, 98])]]]></sourcecode></li><li><t>(4:icon[12:image/bitmap]9:xxxxxxxxx)</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testCanonical2 : CanonicalSExpr [40, 52, 58, 105, 99, 111, 110,
  91, 49, 50, 58, 105, 109, 97, 103, 101, 47, 98, 105, 116,
  109, 97, 112, 93, 57, 58, 120, 120, 120, 120, 120, 120,
  120, 120, 120, 41]
testCanonical2 = MkCanonicalList [MkCanonical
  (MkVerbatim [105, 99, 111, 110]), MkCanonicalHint
  (MkVerbatim [105, 109, 97, 103, 101, 47, 98, 105, 116,
  109, 97, 112]) (MkVerbatim [120, 120, 120, 120, 120, 120,
  120, 120, 120])]]]></sourcecode></li><li><t>(7:subject(3:ref5:alice6:mother))</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testCanonical3 : CanonicalSExpr [40, 55, 58, 115, 117, 98, 106,
  101, 99, 116, 40, 51, 58, 114, 101, 102, 53, 58, 97, 108,
  105, 99, 101, 54, 58, 109, 111, 116, 104, 101, 114, 41, 41]
testCanonical3 = MkCanonicalList [MkCanonical
  (MkVerbatim [115, 117, 98, 106, 101, 99, 116]),
  MkCanonicalList [MkCanonical (MkVerbatim [114, 101, 102]),
  MkCanonical (MkVerbatim [97, 108, 105, 99, 101]),
  MkCanonical (MkVerbatim [109, 111, 116, 104, 101, 114])]]]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Basic S-Expr Transport</name>
<section>
<name>Analysis</name>
<t>
Section 6.2 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

There are two forms of the "basic transport" representation:

        -- the canonical representation

        -- an rfc-2045 base-64 representation of the canonical \
\representation,
           surrounded by braces.

The transport mechanism is intended to provide a universal means \
\of
representing S-expressions for transport from one machine to \
\another.]]></artwork></li>
</ul>
<t>
There is no possible BNF that is sound for a base 64 representation of an underlying s-expr.
</t>
</section>
<section>
<name>Formalization</name>
<t>
The basic transport is also a profile of the advanced transport, so we can reuse some previous types.
</t>
<t>
We first redefine <tt>Base64Full</tt> without white spaces:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data BasicBase64Full : List Bits8 -> Type where
  MkBasicBase64Full : (x1 : Bits8) -> (x2 : Bits8) ->
    (x3 : Bits8) ->
    BasicBase64Full (b641 x1 ++ b642 x1 x2 ++ b643 x2 x3 ++
      b644 x3)]]></sourcecode></li>
</ul>
<t>
Then a list of these:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[namespace BasicBase64
  public export
  data BasicBase64List : List Bits8 -> Type where
    Nil : BasicBase64List []
    (::) : BasicBase64Full xs -> BasicBase64List ys ->
      BasicBase64List (xs ++ ys)]]></sourcecode></li>
</ul>
<t>
And a type for a base 64 encoding for lengths that are not a multiple of 3:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data BasicBase64End : List Bits8 -> Type where
  BasicEndOnePadPad : (x1 : Bits8) ->
    BasicBase64End (b641 x1 ++ b642 x1 0 ++ [61, 61])
  BasicEndOnePad : (x1 : Bits8) ->
    BasicBase64End (b641 x1 ++ b642 x1 0 ++ [61])
  BasicEndOne : (x1 : Bits8) ->
    BasicBase64End (b641 x1 ++ b642 x1 0)
  BasicEndTwoPad : (x1 : Bits8) -> (x2 : Bits8) ->
    BasicBase64End (b641 x1 ++ b642 x1 x2 ++ b643 x2 0 ++ [61])
  BasicEndTwo : (x1 : Bits8) -> (x2 : Bits8) ->
    BasicBase64End (b641 x1 ++ b642 x1 x2 ++ b643 x2 0)]]></sourcecode></li>
</ul>
<t>
And a basic base 64 type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data BasicBase64 : List Bits8 -> Type where
  BasicBase64Mult3 : BasicBase64List xs -> BasicBase64 xs
  BasicBase64Non : BasicBase64List xs -> BasicBase64End ys ->
    BasicBase64 (xs ++ ys)]]></sourcecode></li>
</ul>
<t>
Then we need to define three base64 encoding functions, one for each variant:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[base64 : List Bits8 -> List Bits8
base64 [] = []
base64 [x1] = b641 x1 ++ b642 x1 0 ++ [61, 61]
base64 [x1, x2] = b641 x1 ++ b642 x1 x2 ++ b643 x2 0 ++ [61]
base64 (x1 :: x2 :: x3 :: xs) = b641 x1 ++ b642 x1 x2 ++
  b643 x2 x3 ++ b644 x3 ++ base64 xs]]></sourcecode></li>
</ul>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[base64OnePad : List Bits8 -> List Bits8
base64OnePad [] = []
base64OnePad [x1] = b641 x1 ++ b642 x1 0 ++ [61]
base64OnePad [x1, x2] = b641 x1 ++ b642 x1 x2 ++ b643 x2 0
base64OnePad (x1 :: x2 :: x3 :: xs) = b641 x1 ++ b642 x1 x2 ++
  b643 x2 x3 ++ b644 x3 ++ base64OnePad xs]]></sourcecode></li>
</ul>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[base64NoPad : List Bits8 -> List Bits8
base64NoPad [] = []
base64NoPad [x1] = b641 x1 ++ b642 x1 0
base64NoPad [x1, x2] = b641 x1 ++ b642 x1 x2 ++ b643 x2 0
base64NoPad (x1 :: x2 :: x3 :: xs) = b641 x1 ++ b642 x1 x2 ++
  b643 x2 x3 ++ b644 x3 ++ base64NoPad xs]]></sourcecode></li>
</ul>
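<t>
The three variants only differ in how many padding characters they emit. The sketch below encodes the single octet 97 ("a") with each of them. It uses hypothetical names, is not part of the extracted file, and assumes that <tt>b641</tt> and <tt>b642</tt>, defined earlier in this document, implement the RFC 2045 alphabet and reduce during type checking:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- "YQ==" : full padding
sketchPad : base64 [97] = [89, 81, 61, 61]
sketchPad = Refl

-- "YQ=" : one padding character omitted
sketchOnePad : base64OnePad [97] = [89, 81, 61]
sketchOnePad = Refl

-- "YQ" : no padding at all
sketchNoPad : base64NoPad [97] = [89, 81]
sketchNoPad = Refl]]></sourcecode></li>
</ul>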
<t>
And finally our type for the basic transport, which is either a canonical s-expr or its base 64 encoding in brace notation:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data BasicSExpr : List Bits8 -> Type where
  MkBasicCanonical : CanonicalSExpr xs -> BasicSExpr xs
  MkBasicBase64 : CanonicalSExpr xs -> BasicBase64 ys ->
    (prf : (base64 xs == ys) === True) ->
    BasicSExpr (123 :: ys ++ [125])
  MkBasicBase64OnePad : CanonicalSExpr xs -> BasicBase64 ys ->
    (prf : (base64OnePad xs == ys) === True) ->
    BasicSExpr (123 :: ys ++ [125])
  MkBasicBase64NoPad : CanonicalSExpr xs -> BasicBase64 ys ->
    (prf : (base64NoPad xs == ys) === True) ->
    BasicSExpr (123 :: ys ++ [125])]]></sourcecode></li>
</ul>
</section>
<section>
<name>Validation</name>
<t>
Here we prove that the first example in section 6.2 of the original document is a valid instance of the <tt>BasicSExpr</tt> type:
</t>
<ul empty='true'>
<li><artwork><![CDATA[Here are some examples of an S-expression represented in basic
transport mode:

        (1:a1:b1:c)

        {KDE6YTE6YjE6YykA}

                (this is the same S-expression encoded in base-64)]]></artwork></li>
</ul>
<ul>
<li><t>(1:a1:b1:c)</t><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testBasic1 : BasicSExpr [40, 49, 58, 97, 49, 58, 98, 49, 58,
  99, 41]
testBasic1 = MkBasicCanonical (MkCanonicalList [MkCanonical
  (MkVerbatim [97]), MkCanonical (MkVerbatim [98]), MkCanonical
  (MkVerbatim [99])])]]></sourcecode></li>
</ul>
</section>
</section>
<section>
<name>Array-Layout</name>
<section>
<name>Analysis</name>
<t>
Section 8.2 of the original s-expr document states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

Here each S-expression is represented as a contiguous array of \
\bytes.
The first byte codes the "type" of the S-expression:

        01      octet-string

        02      octet-string with display-hint

        03      beginning of list (and 00 is used for "end of \
\list")

Each of the three types is immediately followed by a k-byte \
\integer
indicating the size (in bytes) of the following representation.\
\  Here
k is an integer that depends on the implementation, it might be
anywhere from 2 to 8, but would be fixed for a given \
\implementation;
it determines the size of the objects that can be handled.  The \
\transport
and canonical representations are independent of the choice of \
\k made by
the implementation.

Although the length of lists are not given in the usual \
\S-expression
notations, it is easy to fill them in when parsing; when you \
\reach a
right-parenthesis you know how long the list representation \
\was, and
where to go back to fill in the missing length.]]></artwork></li>
</ul>
<t>
The endianness of the length field is not specified, so we assume that both little and big endianness can be used.
</t>
<t>
Furthermore section 8.2.1 states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[This is represented as follows:

        01 <length> <octet-string>]]></artwork></li>
</ul>
<t>
Section 8.2.2 states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[This is represented as follows:

        02 <length>
          01 <length> <octet-string>    /* for display-type */
          01 <length> <octet-string>    /* for octet-string */]]></artwork></li>
</ul>
<t>
And section 8.2.3 states:
</t>
<ul empty='true'>
<li><artwork><![CDATA[This is represented as

        03 <length> <item1> <item2> <item3> ... <itemn> 00]]></artwork></li>
</ul>
</section>
<section>
<name>Formalization</name>
<t>
First we define a type for the endianness of the length:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data Endianness = Big | Little]]></sourcecode></li>
</ul>
<t>
Then a function that converts a natural number into a memory representation of a specified endianness and length:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[convert' : Nat -> Nat -> List Bits8
convert' 0 _ = []
convert' (S k) n =
  let (d, m) = divmodNatNZ n 256 SIsNonZero
  in cast m :: convert' k d

convert : Endianness -> Nat -> Nat -> List Bits8
convert Big k j = convert' k j
convert Little k j = reverse (convert' k j)]]></sourcecode></li>
</ul>
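<t>
The following sketch (hypothetical names, not part of the extracted file) shows what <tt>convert</tt> returns for the length 3 stored in a 2-octet field. As written, <tt>convert'</tt> emits the least significant octet first, so <tt>convert Little</tt>, which reverses that list, yields the most significant octet first. This is the order used by the examples of section 8.2, where 0003 encodes the length 3, although that order is more commonly called big-endian:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- most significant octet first, as in "0003"
sketchLittle : convert Little 2 3 = [0, 3]
sketchLittle = Refl

-- least significant octet first
sketchBig : convert Big 2 3 = [3, 0]
sketchBig = Refl]]></sourcecode></li>
</ul>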
<t>
Then we define a type for the array representation of an octet-string:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data ArrayOctetString :
  Endianness -> Nat -> List Bits8 -> Type where
  MkArrayOctetString : (xs : List Bits8) ->
   ArrayOctetString e l (1 :: convert e l (length xs) ++ xs)]]></sourcecode></li>
</ul>
<t>
Then for an octet-string with display-hint:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data ArrayWithHint : Endianness -> Nat -> List Bits8 ->
  Type where
  MkArrayWithHint : ArrayOctetString e l xs ->
    ArrayOctetString e l ys ->
    ArrayWithHint e l (2 :: convert e l (length xs +
    length ys) ++ xs ++ ys)]]></sourcecode></li>
</ul>
<t>
As usual, we first declare an abstract type for the inductive type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data ArraySExpr : Endianness -> Nat -> List Bits8 -> Type]]></sourcecode></li>
</ul>
<t>
Then a list of memory arrays:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[namespace Array
  public export
  data ArrayList : Endianness -> Nat -> List Bits8 ->
    Type where
    Nil : ArrayList e l []
    (::) : ArraySExpr e l xs -> ArrayList e l ys ->
      ArrayList e l (xs ++ ys)]]></sourcecode></li>
</ul>
<t>
And finally the array memory type:
</t>
<ul empty='true'>
<li><sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[data ArraySExpr : Endianness -> Nat -> List Bits8 -> Type where
  ArraySExprOctetString : ArrayOctetString e l xs ->
    ArraySExpr e l xs
  ArraySExprWithHint : ArrayWithHint e l xs ->
    ArraySExpr e l xs
  ArraySExprList : ArrayList e l xs ->
    ArraySExpr e l (3 :: convert e l (1 + length xs) ++
    xs ++ [0])]]></sourcecode></li>
</ul>
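<t>
Two consequences of these definitions can be sketched as follows (hypothetical names, not part of the extracted file). The length of a list counts its terminating 00 octet, so the empty list has a length of 1, and the second index of <tt>ArraySExpr</tt> is the size k of the length field, shown here with k = 4:
</t>
<ul empty='true'>
<li><sourcecode type='idris2'><![CDATA[-- "()" with k = 2: 03 0001 00
sketchArrayEmpty : ArraySExpr Little 2 [3, 0, 1, 0]
sketchArrayEmpty = ArraySExprList []

-- "abc" with k = 4: 01 00000003 a b c
sketchArrayWide : ArraySExpr Little 4 [1, 0, 0, 0, 3, 97, 98, 99]
sketchArrayWide = ArraySExprOctetString
  (MkArrayOctetString [97, 98, 99])]]></sourcecode></li>
</ul>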
</section>
<section>
<name>Validation</name>
<t>
Here we prove that all the examples in section 8.2 of the original document are valid instances of the <tt>ArraySExpr</tt> type:
</t>
<ul>
<li><t>abc</t><artwork><![CDATA[For example (here k = 2)

        01 0003 a b c]]></artwork>
<sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testArray1 : ArraySExpr Little 2 [1, 0, 3, 97, 98, 99]
testArray1 = ArraySExprOctetString
  (MkArrayOctetString [97, 98, 99])]]></sourcecode></li><li><t>[gif] #61626364#</t><artwork><![CDATA[For example, the S-expression

        [gif] #61626364#

would be represented as (with k = 2)

         02 000d
           01 0003  g  i  f
           01 0004 61 62 63 64]]></artwork>
<sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testArray2 : ArraySExpr Little 2 [2, 0, 13, 1, 0, 3, 103,
  105, 102, 1, 0, 4, 97, 98, 99, 100]
testArray2 = ArraySExprWithHint (MkArrayWithHint
  (MkArrayOctetString [103, 105, 102])
  (MkArrayOctetString [97, 98, 99, 100]))]]></sourcecode></li><li><t>(abc [d]ef (g))</t><artwork><![CDATA[NOTE: '\\' line wrapping per RFC 8792

For example, the list (abc [d]ef (g)) is represented in memory \
\as (with k=2)

        03 001b
          01 0003 a b c
          02 0009
            01 0001 d
            01 0002 e f
          03 0005
            01 0001 g
          00
        00]]></artwork>
<sourcecode type='idris2' name='formal-sexpr.idr'><![CDATA[testArray3 : ArraySExpr Little 2 [3, 0, 27, 1, 0, 3, 97, 98,
  99, 2, 0, 9, 1, 0, 1, 100, 1, 0, 2, 101, 102, 3, 0, 5, 1,
  0, 1, 103, 0, 0]
testArray3 = ArraySExprList [ArraySExprOctetString
  (MkArrayOctetString [97, 98, 99]), ArraySExprWithHint
  (MkArrayWithHint (MkArrayOctetString [100])
  (MkArrayOctetString [101, 102])), ArraySExprList
  [ArraySExprOctetString (MkArrayOctetString [103])]]]]></sourcecode></li>
</ul>
</section>
</section>
</section>




</middle>
<back>












<references>
<name>Informative References</name>
<reference anchor='Idris2' target='https://idris2.readthedocs.io/en/latest/'>
  <front>
    <title>Documentation for the Idris 2 Language — Idris2 0.0 documentation</title>
    <author surname='Unknown' />
  </front>
  <refcontent>Accessed 31 January 2023</refcontent>
</reference>
<reference anchor='SPKI-SExpr' target='https://datatracker.ietf.org/doc/draft-rivest-sexp'>
  <front>
    <title>SPKI S-Expressions</title>
    <author initials='R. L.' surname='Rivest' fullname='Ronald L. Rivest' />
    <author initials='D. E.' surname='Eastlake 3rd' fullname='Donald E. Eastlake 3rd' />
    <date year='2024' month='April' day='16' />
    <abstract>
      <t>
This memo specifies a data structure representation that is suitable for representing arbitrary, complex data structures. It was devised in 1996/1997 to support SPKI (RFC 2692) certificates with the intent that it be more widely spplicable and has been used elsewhere. There are many implementations in a variety of languages. Uses of this representation herein are referred to as &quot;S-expressions&quot;. This memo make precise the encodings of these S-expressions: it gives a &quot;canonical form&quot; for them, describes two &quot;transport&quot; representations, and also describe an &quot;advanced&quot; format for display to people.
      </t>
    </abstract>
  </front>
  <seriesInfo name='Internet-Draft' value='draft-rivest-sexp'/>
</reference>
<reference anchor='ComputerateSpecification' target='https://datatracker.ietf.org/doc/draft-petithuguenin-computerate-specification'>
  <front>
    <title>Computerate Specification</title>
    <author initials='M.' surname='Petit-Huguenin' fullname='Marc Petit-Huguenin' />
    <date year='2024' month='February' day='3' />
    <abstract>
      <t>
This document specifies computerate specifications, which are the combination of a formal and an informal specification such as parts of the informal specification are generated from the formal specification.
      </t>
    </abstract>
  </front>
  <seriesInfo name='Internet-Draft' value='draft-petithuguenin-computerate-specification'/>
</reference>
<reference anchor='RFC8792' target='https://www.rfc-editor.org/info/rfc8792'>
  <front>
    <title>Handling Long Lines in Content of Internet-Drafts and RFCs</title>
    <author initials='K.' surname='Watsen' fullname='K. Watsen' />
    <author initials='E.' surname='Auerswald' fullname='E. Auerswald' />
    <author initials='A.' surname='Farrel' fullname='A. Farrel' />
    <author initials='Q.' surname='Wu' fullname='Q. Wu' />
    <date year='2020' month='June' />
    <abstract>
      <t>
This document defines two strategies for handling long lines in width-bounded text content.  One strategy, called the &quot;single backslash&quot; strategy, is based on the historical use of a single backslash (&apos;\&apos;) character to indicate where line-folding has occurred, with the continuation occurring with the first character that is not a space character (&apos; &apos;) on the next line.  The second strategy, called the &quot;double backslash&quot; strategy, extends the first strategy by adding a second backslash character to identify where the continuation begins and is thereby able to handle cases not supported by the first strategy.  Both strategies use a self-describing header enabling automated reconstitution of the original content.
      </t>
    </abstract>
  </front>
  <seriesInfo name='RFC' value='8792'/>
  <seriesInfo name='DOI' value='10.17487/RFC8792' />
</reference>

</references>
<section anchor='extract'>
<name>Code Extraction and Verification</name>
<t>
To verify that the proofs in this document are correct, the first step is to install <xref target='Idris2' />.
</t>
<t>
Then the various Idris2 fragments in this document can be extracted as a complete file by running the following command:
</t>
<ul empty='true'>
<li><sourcecode type='bash'><![CDATA[xmllint --noent --nocdata \
--xpath "//sourcecode[@name='formal-sexpr.idr']/text()" \
draft-petithuguenin-ufmrg-formal-sexpr-05.xml \
| sed "s/&lt;/</g; s/&gt;/>/g; s/amp;//g" >formal-sexpr.idr
]]></sourcecode></li>
</ul>
<t>
And finally the proofs can be validated by using the following command:
</t>
<ul empty='true'>
<li><sourcecode type='bash'><![CDATA[idris2 -q -c formal-sexpr.idr
]]></sourcecode></li>
</ul>
</section>
<section numbered='false'>
<name>Acknowledgements</name>
<t>
Thanks to
Erik Auerswald
and Stephane Bryant
for the comments, suggestions and questions that helped improve this document.
</t>
<t>
No technology that cannot explain its own results (LLM, AI/ML) has been involved in the creation of this document.
</t>
</section>
<section numbered='false'>
<name>Changelog</name>
<dl>
<dt>Since draft-petithuguenin-ufmrg-formal-sexpr-04: </dt><dd><ul spacing='compact'>
<li>default encoding reverted to "text/plain; charset=iso-8859-1"</li><li>more nits and clarifications</li>
</ul></dd><dt>Since draft-petithuguenin-ufmrg-formal-sexpr-03: </dt><dd><ul spacing='compact'>
<li>more nits and clarifications</li><li>add proof of equivalence using symmetry</li><li>change default display-hint to application/octet-stream</li>
</ul></dd><dt>Since draft-petithuguenin-ufmrg-formal-sexpr-02: </dt><dd><ul spacing='compact'>
<li>fix another instance of incorrect rendering</li><li>reformat some example for clarity</li><li>add "Equality of Octet-String" section</li><li>nits and some clarification</li>
</ul></dd><dt>Since draft-petithuguenin-ufmrg-formal-sexpr-01: </dt><dd><ul spacing='compact'>
<li>incorrect rendering of a string with #</li><li>add RFC 8792 headers</li><li>remove a comment about tokens that was incorrect</li><li>add REPL example to display octet-strings</li>
</ul></dd><dt>Since draft-petithuguenin-ufmrg-formal-sexpr-00: </dt><dd><ul spacing='compact'>
<li>add forgotten proofs for <tt>testLists2</tt> and <tt>testLists3</tt></li><li>some editing for clarity</li><li>many nit fixes</li>
</ul></dd>
</dl>
</section>
</back>
</rfc>