@Kroc
Last active August 25, 2023 23:29
A web-page specification for computers that predate the World Wide Web

Web0; A Specification -- DRAFT

Web0 ("web zero") is a World Wide Web for computers that pre-date Web 1.0.

Web0 is intended for 8-bit computers; newer machines can participate, but the spec is designed around a baseline of 8-bit hardware. This is because most 16-bit computers already have web-browsers that can be fed basic HTML through a proxy service such as http://frogfind.com, whereas few 8-bit systems have browsers, and those that do struggle with even simple HTML due to assumptions in the design of HTML itself.

INCOMPLETE, DRAFT DOCUMENT

File all suggestions / complaints to kroc@camendesign.com

Design Goals

  • Use existing hardware and infrastructure. Build an idea first, better tools and hardware come later. No new protocol: use HTTP, FTP, whatever can get plain-text on to a retro machine, even offline files on floppy disk (e.g. disk mags)

  • Presentation of the document is decided by the client according to its display capabilities and the user according to their personal preferences. Web0 intentionally does not allow document authors to specify exact spacing or presentation

  • Big content on small hardware. Allow for long content (such as Wikipedia pages) that wouldn't fit in RAM in one piece. No markup that spans more than a paragraph! Read any line and know what it is without having to first parse previous lines

  • Interoperability between platforms. Don't build separate communities for C64, Apple, etc.; build one community regardless of hardware, the lower-spec the better. Access and create content within the limits of pre-ASCII text, e.g. PETSCII has no backslash or underscore

Quick Reference Guide

!web0
!title:Quick Reference Guide
!
! comments use a space after the "!"

# Heading 1
## Heading 2
### Heading 3
#### Heading 4

Like HTML, whitespace  is not   significant.
The client decides how much space should show between paragraphs;
multiple blank lines have no effect.

=== (heavy divider line)

A <url> can also be written with <url link text>.
A <document.txt relative link> and </index.txt absolute link>.
Linking to <//web0.camendesign.com other Web0 servers>.

For <:http://camendesign.com non-Web0 links>,
use a colon to indicate a non Web0 protocol.

Wrap *stressed text* in asterisks.
Use forward-slashes for /emphasised text/.

< Right-justified paragraph
| centre-justified paragraph

--- (light divider line)

> All visible lines can be indented
> > Indents can be nested
> > > ## This Includes Headings

* Bullet list
> A second paragraph within a bullet item
> * A nested bullet list

.1 A numbered list
.2 Numbers must be given, auto-numbering isn't possible
> .2.1 A nested numbered list
.a Alphabetical item (either case)
.vii Roman numerals (either case)

%pre-formatted text, such as sample code:
%The "%" marker must be used on each line

Specification

  1. Document
  2. Comments
  3. Meta-data
  4. Paragraphs
  5. Justification
    1. Right
    2. Centre
  6. Indentation

Document

  • A Web0 document is a plain-text file

  • No file-extension is mandated, although .txt is recommended

    Developer Hint:

    Many retro platforms have unusual name limitations or do not use file extensions. File names should be limited to "a-z", "0-9" and dash "-", without spaces. Dots should not be used in file names except for a file extension, if used. Underscores must not be used as these are not universally available on 8-bit systems!

  • The file encoding is UTF8

    Developer Hint:

    "But 8/16-bit computers don't support Unicode!" I hear you say. No, they don't, but UTF8 code points are easy to identify, even on an 8-bit computer. The computer can decide which UTF8 graphemes it can handle and which ones it can't. Many 8-bit machines can use user-defined-graphics to provide some extended characters. Code-pages were a bad idea, let's not repeat that mistake.

  • "Whitespace" is defined as ASCII space (0x20) and ASCII tab (0x09, \t) only

  • Like HTML, clients must not render multiple spaces. Multiple contiguous spaces should be flattened into a single space. This behaviour can be overridden with pre-formatted text

  • The tab character (\t or 0x09) should always be converted to a single space, and flattened alongside all other spaces

  • Leading and trailing whitespace around text should not be displayed except where it might cause two words to join

  • A document is broken into lines. A CRLF (\r\n) or LF (\n) or CR (\r) sequence signifies the separation between lines

    Developer Hint:

    Do not assume line endings will be uniform, or always in Windows (\r\n) or Linux (\n) style; retro computer systems used different styles (Classic Mac OS uses only \r), so clients should be prepared to treat any combination of \r & \n characters as a line-break.

  • Multiple conjoined line-endings, e.g. \r\n\r\n or \r\r\r, are taken as one; empty lines do not contribute to output
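
The line and whitespace rules above can be sketched in a few lines of Python. This is only an illustration, for example for a conversion tool or proxy running on a modern machine, not a reference client; the function names split_lines and flatten_whitespace are invented for this example.

```python
import re

def split_lines(text):
    """Split a document into lines, treating any run of CR/LF as one break."""
    # Lines that are empty or contain only spaces/tabs do not contribute to output.
    # Leading whitespace is kept because it decides how a line is interpreted.
    return [line for line in re.split(r"[\r\n]+", text) if line.strip(" \t")]

def flatten_whitespace(line):
    """Convert tabs to spaces, collapse runs of spaces, trim both ends."""
    return re.sub(r"[ \t]+", " ", line).strip(" ")
```

Note that flatten_whitespace should not be applied to pre-formatted lines, where whitespace is rendered as-is.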

Comments

Comments are non-visible lines in the document for the document author's benefit, much like comments in source code or HTML.

  • A comment is any line that begins with an exclamation mark ! followed by whitespace and then any other content:

    ! this is a comment. it is not visible to the user
    !
    ! a line with only an exclamation mark is also ignored
    ! (trailing whitespace included, if present) 
    
  • There must not be any leading whitespace on the line or the line is considered a paragraph:

       ! this is not a comment, it will be displayed as-is to the user
    
  • There must be whitespace between the exclamation mark and any text that follows for the line to be a comment; without it, the line is treated as meta-data:

    !this is not a comment, it is an invalid meta-data line
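
As a rough sketch of the three rules above, a client could recognise a comment by looking at no more than the first two characters of the line. The function name is_comment is invented for this example.

```python
def is_comment(line):
    """Return True if the line is a Web0 comment."""
    # Leading whitespace makes the line a paragraph, so "!" must come first
    if not line.startswith("!"):
        return False
    rest = line[1:]
    # A lone "!" is a comment; otherwise whitespace must follow the "!"
    return rest == "" or rest[0] in " \t"
```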
    

Meta-data

Meta-data allows the document author to specify hints to the client about supplementary content that may or may not be factored into display.

Because a client may only be holding part of a document that is larger than the RAM available, it is not guaranteed that any meta-data line will be encountered or recalled. For this reason, it is recommended that meta-data be placed at the beginning of the document where it is most likely to be read at least once.

  • The first line of a Web0 document should be the Web0 identifier:

    !web0
    

    This allows the client to identify a Web0 document with minimal processing and likewise to easily avoid attempting to render HTML or other content as a Web0 document.

    Developer Hint:

    Future versions of the spec may introduce an extended Web0 identifier so clients should ignore the remainder of the line once !web0 has been read.

  • Meta-data consists of an exclamation mark ! followed by a meta-data "tag" and, optionally, a colon followed by additional data:

    !web0
    !title:Title Of The Document
    

    Developer Hint:

    Clients should strip spaces around the colon, and any leading/trailing space for the additional data. E.g. clients should ideally be able to process the following as being semantically the same as the title tag in the previous example.

    !title :   Title Of The Document  
    
  • There must not be any leading whitespace, otherwise the line will be displayed as a paragraph, whilst whitespace following the exclamation mark will cause the line to be interpreted as a comment.

       !title:this is not a meta-data line, it will be displayed as-is
    ! title:this is also not a meta-data line, it is a comment
    
  • Meta-data tags are case-insensitive, but authors are recommended to always write them in lowercase

  • The meta-data tag must consist only of the characters a-z, 0-9 and - (dash). Underscores must not be used as they are unavailable on some 8-bit computers. It is strongly recommended that meta-data tags do not begin with a number or a dash

  • Clients should ignore any meta-data they cannot parse or do not recognise

See Appendix A for the list of defined meta-data.
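
A minimal sketch of the meta-data rules, assuming a line has already been separated from the document. The names is_web0_document and parse_meta are invented for this example, and returning None for an unparseable line is just one way for a client to "ignore" it.

```python
import re

TAG = re.compile(r"[a-z0-9-]+")

def is_web0_document(first_line):
    """Check for the identifier, ignoring anything after "!web0"."""
    return first_line[:5].lower() == "!web0"

def parse_meta(line):
    """Return (tag, value) for a meta-data line, or None otherwise."""
    # Leading whitespace means paragraph; "!" plus whitespace means comment
    if not line.startswith("!") or len(line) < 2 or line[1] in " \t":
        return None
    tag, colon, value = line[1:].partition(":")
    # Tags are case-insensitive; spaces around the colon are stripped
    tag = tag.strip(" \t").lower()
    if not TAG.fullmatch(tag):
        return None
    return tag, (value.strip(" \t") if colon else None)
```

A tag with no colon, such as !web0, comes back with a value of None.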

Paragraphs

It is critical to understand that because of expected memory limits, every paragraph must contain all information needed to present it correctly, regardless of previous paragraphs. Unlike HTML, and even Markdown, the client cannot be expected to scan backwards to find where the current formatting scope begins.

This requirement (each line defines full formatting for that line) is what separates Web0 from all other markup formats, including other lightweight protocols such as Gemini.

  • Each line in the source document corresponds to a word-wrapped paragraph of text on the client's screen. Line-breaks always begin a new paragraph!

    Developer Hint:

    The client might insert a blank line between paragraphs or use a first-line indent. Line-breaks must not be used to force a mid-paragraph break!

  • Any line that begins with whitespace is automatically a paragraph!

    Developer Hint:

    The purpose of this is to allow authors to write paragraphs that begin with characters that could be interpreted as something else, e.g. !, . or %, although pre-formatted text should be used in the case of code samples.

  • A line that contains only whitespace must be ignored

    Developer Hint:

    The client decides if, and how much, space is rendered between paragraphs of content. Empty lines in the source must be discarded by the client. The client must not display empty lines, particularly groups of them, from the source document.

Justification

Right

All paragraphs are left-justified (or justified, according to user taste) by default. A line is right-justified when the first character on the line is a left-angled (i.e. pointing from the right) bracket, "<", followed by whitespace.

< Right-justified paragraph.

The whitespace is required to distinguish right-justification from links. There must be no leading whitespace on the line otherwise the "<" character is taken as literal and displayed as-is.

<link.txt left-justified link>
< <link.txt right-justified link>
  < left-justified paragraph (note leading whitespace!)

Centre

A paragraph is centre-justified when the first character on the line is a vertical bar, "|":

| centre-justified paragraph
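
The justification markers can be checked from the first two characters of a line. The sketch below assumes that the centre marker requires following whitespace in the same way as the right marker, which this draft does not yet state; the function name justification is invented for this example.

```python
def justification(line):
    """Return (alignment, text) for one line of a document."""
    marker, after = line[:1], line[1:2]
    # "<" must be followed by whitespace, otherwise it opens a link
    if marker == "<" and after in (" ", "\t"):
        return "right", line[2:].strip(" \t")
    # Assumption: "|" follows the same whitespace rule as "<"
    if marker == "|" and after in (" ", "\t"):
        return "centre", line[2:].strip(" \t")
    # Anything else, including lines with leading whitespace, is left-justified
    return "left", line.strip(" \t")
```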

Indentation

Any paragraph can be indented if the line begins with a right-angled bracket:

> This is an indented paragraph.

As with all leading and trailing whitespace, the whitespace following the angle-bracket should be stripped before display.

Developer Hint:

It is up to the client (and user) to define how large an indent is, but 4 spaces is recommended.

Multiple indents can be stacked.

> > > Triple indent
>>>Whitespace is optional

Developer Hint:

Clients might impose a hard limit on indent depth. Authors are recommended never to exceed 4 indents and likewise never assume the width of the client screen in characters (40/80 etc.)
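
Counting indents needs only a single left-to-right pass over the start of the line, as in this sketch. The function name split_indent is invented for this example, and no depth limit is applied here.

```python
def split_indent(line):
    """Return (depth, rest): the number of leading ">" markers and the remainder."""
    depth = 0
    pos = 0
    while pos < len(line) and line[pos] == ">":
        depth += 1
        pos += 1
        # Whitespace after each marker is optional and is skipped
        while pos < len(line) and line[pos] in " \t":
            pos += 1
    return depth, line[pos:]
```

The remainder can then be examined for a heading, list, divider or pre-formatted marker in the usual way.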

Combining of indentation and justification is unspecified at this time. Clients may decide to use the indent as a margin or ignore indents entirely.

> | Centre-justified, but with a left-margin?
> < Right-justified, but with a left-margin?
< > Right-justified, but with a right-margin?
    (right-aligned indent?)

Headings

Headings are indicated by a number of hashes (#) at the beginning of the line:

# Heading 1
## Heading 2
### Heading 3
#### Heading 4

Developer Hint:

Authors should limit themselves to four levels of headings, although headers up to level 6 could be encountered in the wild with content converted from HTML. The client (and user) will decide how to render each heading level.

Lists

A paragraph can be an item in a bullet list. If the line begins with an asterisk and whitespace, the client may choose to render this as a graphical bullet point and to indent the text as it wraps to match the bullet level.

* a bullet list item
* a second item

Bullet items may span multiple paragraphs, in which case the additional paragraphs should indent an additional level to match the indentation-level of the text after the bullet-point, e.g.

* A bullet item
> Its second paragraph

Bullet lists can be nested by simply indenting to match the nested level.

* First level
> * A nested list
> > * A third level

A numbered list item begins with a dot ., one or more symbols (which are not limited to numbers), and a whitespace character.

.1 A numbered item
.2 2nd numbered item
.a Alphabetical item (either case)
.vii Roman numerals (either case)

Authors should restrict themselves to valid characters: a-z, A-Z & 0-9, although some clients might not enforce this. It is suggested that the client take the number/symbols, display them (typically without the preceding dot) and indent the paragraph, as with bullet lists.

Developer Hint:

Why the dot before the number rather than after? So that clients can immediately recognise a numbered list item without having to first parse a string and recognise it as a number, particularly Roman numerals such as "mcmxcviii".

Auto-numbering of list items is not possible because it would require scanning backwards to the start of the list, so the number must always be given by the author. Numbered lists support indentation the same as bullet lists, and the two can be freely intermixed.
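
A sketch of recognising list items once any indent markers have been removed from the line. The function name parse_list_item is invented for this example. The label is taken as everything between the dot and the first whitespace, without enforcing the recommended character set, so a nested label such as "2.1" is accepted.

```python
import re

# A dot, one or more non-whitespace symbols, whitespace, then the item text
NUMBERED = re.compile(r"\.([^ \t]+)[ \t]+(.*)")

def parse_list_item(line):
    """Return (kind, label, text) for a list item, or None for anything else."""
    if line.startswith("* ") or line.startswith("*\t"):
        return "bullet", None, line[2:].strip(" \t")
    match = NUMBERED.match(line)
    if match:
        return "numbered", match.group(1), match.group(2).rstrip(" \t")
    return None
```

Because the dot comes first, the label is never interpreted; "vii" is displayed exactly as the author wrote it.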

Pre-formatted Text

Pre-formatted text allows document authors to present content as-is that should not be word-wrapped or processed using Web0 markup, much like the <pre> element in HTML.

Pre-formatted content is indicated by a percent-sign. The remainder of the line is expected to be rendered "as is" by the client, ideally without word-wrapping and without processing Web0 markup within the content.

%Preformatted text;  whitespace is rendered   as is

Whitespace immediately following the % marker must be rendered e.g. for indented code.

It is important to note that because Web0 documents are divided into lines, and each line must contain the necessary information to format that line irrespective of others, each line in the preformatted content must begin with the % marker.

%10 PRINT "HELLO WORLD"
%20 GOTO 10
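
For authors and conversion tools, the rule means that a block of code has to be marked line by line. A minimal sketch, with the function name to_preformatted invented for this example:

```python
def to_preformatted(code):
    """Prefix every line of a code sample with the "%" marker."""
    # Each output line carries its own marker, so a client can render
    # any single line without knowing what came before it
    return "\n".join("%" + line for line in code.splitlines())
```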

Developer Hint:

Where possible clients should provide a means of scrolling pre-formatted sections left & right and resort to word-wrapping only where this is not possible.

Indents are allowed prior to the percentage sign but clients are permitted to ignore the indent for the purpose of fitting content comfortably on screen.

> > %PRINT "HELLO WORLD"

Dividers

Dividers act like horizontal-rules from HTML (<hr />) except there are two types, a "heavy" divider and a "light" divider.

It is up to the client to determine how "heavy" and "light" dividers should be drawn but if the client is limited to non-graphical characters only, the equals = and dash - characters are recommended accordingly.

The "heavy" divider is written using three equals characters at the beginning of a line. Any characters following the three equals, until the end of the line, must be ignored.

Indents can be used and the client should begin the line according to the indent:

===
> === (with indent)

The "light" divider is written using three dashes at the beginning of the line. Indents can be used and the client should begin the line according to the indent:

---
> --- (with indent)

Links

A link is written with an opening left-angled bracket followed by the URL, then a space, which separates the URL from the link text, and finally a closing right-angled bracket. The link text may not contain a right-angled bracket as this signifies the end of the link.

<url link text>

The link text is optional if you want the URL to appear as-is.

<url>

URLs should be assumed to be using the Web0 protocol scheme (web0:), so the link <document.txt> would be the equivalent of web0:document.txt where this is a relative link from the current document to "document.txt"

Developer Hint:

Links cannot be pre-defined in another part of the document like Markdown because it is not intended that the whole document, or even the whole line, fit into RAM at once.
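
The sketch below extracts links from a line of text. It uses a regular expression over the whole line for brevity, which a memory-limited client would replace with a character-by-character scan. The leading colon for non-Web0 links is taken from the Quick Reference Guide; the function name find_links is invented for this example.

```python
import re

# "<", a URL with no whitespace, optional whitespace plus link text, then ">"
LINK = re.compile(r"<([^ \t>]+)(?:[ \t]+([^>]*))?>")

def find_links(line):
    """Return a list of (url, text, is_non_web0) tuples found in the line."""
    links = []
    for match in LINK.finditer(line):
        url, text = match.group(1), match.group(2)
        # A leading colon marks a non-Web0 protocol
        non_web0 = url.startswith(":")
        if non_web0:
            url = url[1:]
        text = text.strip(" \t") if text else ""
        # When no link text is given, the URL is shown as-is
        links.append((url, text or url, non_web0))
    return links
```

Since a URL cannot begin with whitespace, the right-justification marker "< " is never mistaken for a link.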

Stressed and Emphasised Text

  • Stress is indicated by wrapping text with asterisks *

    Highlight a phrase with *stressed text*.
    
  • Emphasis is indicated by wrapping text with forward-slashes /:

    Add /emphasis/ to text.
    
  • There must not be any whitespace between the opening asterisk * or forward-slash / marker and the first character:

    *This is stressed text*
    * This is a bullet point*
    This is not/ emphasis/
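
A rough sketch of finding stressed and emphasised spans. It assumes links have already been removed from the text, because the slashes in a URL would otherwise be mistaken for emphasis, and it does not handle nesting. The function name find_spans is invented for this example.

```python
import re

# A marker, a non-whitespace first character, then the shortest run of
# text up to the next occurrence of the same marker
SPAN = re.compile(r"([*/])(\S.*?)\1")

def find_spans(text):
    """Return a list of (kind, content) tuples for marked-up spans."""
    kinds = {"*": "stress", "/": "emphasis"}
    return [(kinds[m.group(1)], m.group(2)) for m in SPAN.finditer(text)]
```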
    

Future Plans

  • Input forms
  • Tables -- via CSV style formatting
  • Low-resolution pixel art via 2x2 or Teletext-esque 2x3 ("sixel") block chars

Appendices

Appendix A: List Of Meta-data

!date:YYYYMMDD, !date:YYYYMMDDHHSS

A date/time-stamp to identify when the document was published or last updated. The date-time must be in UTC! Most 8-bit systems lack real time clocks, but could calculate local time from a user-specified offset.

The simplified YYYYMMDD... numerical format is used instead of a standard ISO timestamp to make it easy to parse for 8-bit CPUs.

Developer Hint:

Clients should ignore any trailing text to allow for future expansion of the date string.
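
A sketch of reading the date portion of the tag. It deliberately reads only the first eight digits (YYYYMMDD) and ignores everything after them, so it makes no assumption about the layout of the time portion. The function name parse_date is invented for this example.

```python
from datetime import date

def parse_date(value):
    """Return a date from the leading YYYYMMDD digits, or None if invalid."""
    digits = value.strip(" \t")[:8]
    if len(digits) != 8 or not digits.isdigit():
        return None
    try:
        return date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
    except ValueError:
        # Covers impossible dates such as month 13
        return None
```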

!title:[...]

Defines the document's title. Clients could use this if they have title-bars, tabs or anywhere they wish to use a 'friendly name' for the document over the document's file name.

!web0

A simple tag to identify the file as a Web0 document; it should be the first line in the document. Clients may choose to reject documents without the identifier as it provides a way to avoid accidentally trying to render HTML or other content as a Web0 document with minimal processing.
