spec.sup

====================================================================
Sup - The ^Super Unambiguous Unobstrusive Understandable Plain Text^
====================================================================


Objective
=========

Provide an unambiguous and extensible file format with an intuitive markup
that allows the same file to be easily read as plain text or be parsed
and converted in books etc.


The Sup markup consists in the minimum amount of symbols while being the most
unambiguous as possible. It is parsed to a list of blocks of content, and each
block of content is parsed to a list of markup pieces.

This document specification is under GNU-FDL License. See credits at the end
of this document.


Basic markup
============

The styling markups are the following:

* The ````verbatim```` that becomes ``verbatim``
* The ``^^strong^^`` that becomes ^^strong^^
* The ``^emphasis^`` that becomes ^emphasis^

Headings and Titles
===================

They are used to write section titles and delimitate the bigger parts of your
text:

    =============
    Title Level 1
    =============

    Title Level 2
    =============

    Title Level 3
    -------------

    Title Level 4
    ~~~~~~~~~~~~~

Lists
=====

Lists are written prepending ``*`` or ``-`` for unordered lists
and ``1.`` or ``a.`` or ``A.`` for ordered lists. Sublists can be created
simply adding a two space indentation. Blank lines can also be added between
items.

  * First of unordered list
  * Second of unordered list
    1. First of ordered sub list

    2. Second of ordered sub list
      - First of unordered sub sub list
      - Second of unordered sub sub list

Preformated Text
================

Just indent with 4 spaces more than the previous indentation. All subsequent
lines with at least the same indentation will become part of the same
preformated block until it find a smaller indentation.

Code Blocks
===========

They act exactly like preformated text, but give hints about how external
parsers can process its content syntax.

    :: code :: Lua
        function sum(a,b)
          return a+b
        end


Code Signature
==============

Used to create very smaller section headings on the text with preformated text
in a single line. Convenient to create source code documentation about
functions that can be called or represented in many different ways.

    ## string.sub ## lua
      string.sub( str @string, start @number, end @number ) @string
      @string:sub(start @number, end @number) @string

    ## myDatatype ## c
      typedef struct {
        int x;
        char **str;
      } myDatatype;

In the example above, two entries will be created at the document index, one for
``string.sub`` and other for ``myDataType``.


Links
=====

The linking markups are between ``<<`` and ``>>``. The type of link is defined
immediately after the angle braces and can or not be followed by a token or
complementing value inside single backticks `````.
before and after it.

* The ``<<word>>`` links to the anchor heading in the document.
* The ``<<word>>@`http://somewhere.com``` make ``word`` point to an address
outside the document.
* The ``<<<word>>>`` (triple chevrons) points to another document, as in wikis.

The internal reference is made on each heading or <<signature header>>
by tokenization (replacing any ASCII non alphanumerical character by space,
trimming and removing duplicate spaces that are replaced by a single ``-``.
Non ASCII characters are preserved).

Footnotes
=========

Footnotes are marked by a single asterisk (see <<Choices>>) right after a word
or styling marker:

    This is a footnote*, and this also is ^other footnote^*.
    This is a footnote*`foo` with a name.
    This is a numbered footnote*1.

    ** This explains the first footnote.
    ** This explains the second footnote.
    ** `foo` This explains a named footnote.
    ** 1 This explains a numbered footnote.

Glossaries
==========

Glossary references are marked with a hash ``#`` immediattely the word or
expression. To group different words reffering to the same concept, it
can be followed by the target glossary reference between backticks:

    There are many planets# in our Universe. All planets seen nearer of its
    star can seen as it if had moon phases, even a waxing#`crescent` one.

    # `planets` are near spherical objects that not shine like stars and are
    bigger than our moon.
    # `crescent` or waxing is the appearance of a celestial spheric body when
    its brighter hemisphere becomes more apparent.

Glossary can have no explicative notes and be referenced multiple ways to just
create indexes of where in the document such words were used.

Image (draft)
=============

Images are a distinct block where you can put its address with some metadata
if you want to.

    :: image :: Your short description
      src    = https://xxx.yyyy/x.img
      width  =
      height =
      author =
      year   =
      copyright =

Breaks
======

Breaks can be used to disambiguate on list or block ending. It consists in just
adding new line with only four carets ``^^^^`` positionated at the desired
block level. This shouldn't be used as decorative separation lines.


Blocks
======

A block is formed by contiguous non-empty lines.
A simple blank line is sufficient to separate two blocks. If more blank lines
are found they are discared.

The only block not ruled by this is the preformated block.

Custom blocks can also incorporate subsequent blocks with wider indentation,
but this logic is not applied to ordered or unordered lists where each item
is considered a distinct block.

Other block types:

Quotation (draft)

    :: quote ::
      Se essa rua fosse minha
      eu mandava ela calar

      A bagunça da vizinha
      eu não sei se tem pra lá


Tables (draft)
--------------

  :: table :: `My table title`

    # Column A
      # A line 1
      # A line 2
      # A line 3

    # Column B
      # B line 1
      # B line 2
      # C line 3

Bibliography (draft)
--------------------

    :: bib :: JUNG-MDT
      name  = Carl Gustav Jung
      title = Memory, dreams and thoughts.
      year  = 1968
      page  = 250

    :: bib :: FRANZ-SH

      name  = Marie-Louise von Franz
      title = Shadow in Fairy tales
      year  = 1960
      page  = 120

Specifications
==============

ASCII delimiter
---------------

^Sup^ consider only ASCII printable characters in its parsing.
So a file can be written using UTF-8; As UTF-8 contains punctuation marks other
than the found in ASCII, when using them after an inline markup ending, it
should be preceded by an escaped whitespace.


Parsing order
-------------

1. First the text will be split in blocks, that are delimited by a empty lines.
  A single empty line is sufficient for that and the immediately following empty
  lines are discarded.

2. Detect the type of each block and it to the list of blocks with the type
  information.

3. Accordingly to the block type, call the block type parser.
4. Do inline parsing that should parse the block content, when applicable.
5. The default inline parsing should follow this order:
  a. verbatim
  b. links
  c. strong
  d. emphasis


Inline escaping
---------------

- A backslash before any character make it has no special meaning for parsing.

- Escaped whitespace is a space character preceeded by a backslash. It allows
  to force closing inline markups when inside a word. At the end of document
  parsing these escaped spaces are completely removed.

- Inline verbatim disable any escaping, so if a backslash is found inside the
  verbatim chunk, it will be kept as is.


Inline markup boundaries
------------------------

The start delimiter in inline markup is valid when:

- It is at the start of a line;
- It is anteceded by a space character;
- It is anteceded by one of the ASCII charaters:

    ' " ( { [


The end delimiter in inline markup is valid when:

- It is followed by a space;
- It is followed by one of the ASCII characters:

    ' " ) } ] , . : ; ! ?


Verbatim parsing
----------------

- Verbatim starts with a pair of backticks `````` preceded by any valid start
  delimiter.
- Cannot be nested. Its contents are not escaped.
- It only ends when a pair of backticks are found followed by a boundary ending
  as defined on <<Inline markup boundaries>>.


Link parsing
------------

It has a pair of left angle brackets `<<` followed by words and finalized by
another pair of right angle brackets `>>`.

To diferentiate between the kinds of link, a modifier is added immediately after
the right angle bracket. These modifiers are:

- The at sign ``@`` for URLs;
- The asterisk ``*``` for footnotes;
- The hash ``#`` for glossary/vocabulary;
- (draft) The double asterisk ``**`` for bibliographical references;

After the modifier, a single parameter can be passed to the link. This parameter
should be written between single backticks and shouldn't be parsed for more
markup.

- For URLs, the parameter is the target (email, site etc.);

- For footnotes, the parameter use an explicit numbering (instead of the
positional) or an explicit name.

- For glossary, the parameter can be used to link a word variant to another in
the glossary.

- (draft) For bibliographical reference, it should point to the token pointing
to the right bibliographical description.

Choices
=======

* Links: angle brackets (chevrons) in literature are most commonly used to
quote someone, say something in foreign language or insert asides. Just because
this is the main usage in literature, but is rarely used (giving place to
parenthesis or quotation marks) it is well suited to represent in plain text
what should be found elsewhere in document or outside it.

* Asterisks: the main usage of asterisks or stars (or daggers) in printed text
is to mark a place in text where there is an anotation or further information.
Use them for emphasis or strong markup looks strange and distractful to a
reader.

* Carets: as the names says it resembles the original marks added in
"proofreading and typography to indicate that additional material needs to
be inserted at the point indicated in the text"*. As in digital writing it is
rarely used outside technical usage (except in programming or mathematical
representation) it is the perfect sign to indicate where a text is an emphasis
or a strong emphasis.

** https://en.wikipedia.org/wiki/Caret_(proofreading)

Credits
=======

This document is under GNU-FDL license.
Copyright (c) 2023 - Thadeu A. C. de Paula