Skip to content

Instantly share code, notes, and snippets.

@Chubek
Created April 7, 2024 21:30
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Chubek/f4da04e64c71c3fe14e6f3ffeeb783c9 to your computer and use it in GitHub Desktop.
Save Chubek/f4da04e64c71c3fe14e6f3ffeeb783c9 to your computer and use it in GitHub Desktop.
Introduction to Zephyr ASDL

I wrote an implementation of Zephyr ASDL which you can install from here:

https:///github.com/Chubek/ZephyrASDL

In this document I wanna write a short intro to ASDL and you you can utilize it.

ASDL is in use in CPython's source code. Not my implementation of course, an implementation called 'PyASDL' which parses it, and it's up to you to provide the semantics and code generation. CPython's source code uses ASDL to build the Abstract Syntax Tree for Python in C++. My implementation of ASDL translates the syntax description directly to C. You can specify the types to be also emitted in an extra header file (via the -s flag).

Zephyr ASDL is not an standard, it's just a short paper written in 1997, by Wang, Appel, Korn and Serra, all Princeton alumni.

https://www.cs.princeton.edu/~appel/papers/asdl97.pdf

The original papr does not describe the semantics of ASDL, the original implementation, PyASDL, my implementation and several other implementations that are out there (such as OilShell guy's) have different semantics. They all use different methods to implement it. I think my implementation is the only one that uses a parser generator, given how easy it is to parse ASDL. But it my motto to never parse manually. Why? Because it's a waste of time lol. Recursive-descent parsing is really simple, especially with something like ASDL when there are not left-recursive stuff involved. But WHY DO THAT? I've seen several people gush about their manual parsers, I think, and I say thast with all respect, they just don't have much to offer backend-wise to the language they are implementing, so they waste their time at the parsing stage!

Ok let's focus on the mission of this document, which is teaching people what ASDL is.

ASDL stands of 'Abstract Syntax Description Language'. But in reality, it's designed to bring the robustness of a functional language's type system to C. So if you have a project in C that would have been much better done in an ML language, like SML, OCaml, or Haskell (yeah I just called Haskell an ML language, screw you officer I am not drunk) --- then ASDL is a good choice.

Now the authors don't specify that 'ASDL it to be translated to C, and C only'. But truly, C is the only language that could benefit from it. Modern languages all offer robust type systems. Either via prototypes, or like Rust, that mixes in imperative types like structures and unions with functional types. So I doubt Rust, or Python, or JS need to be preprocessed with ASDL. But C's clunky, old type system needs to be preprocessed if you wish to remain sane.

Aslo, using ASDL is the same reason that people like me use parser generators. It's easy to parse manually, but with parser generators, you get to 'specify' the parser. That is the important thing here. With an LR or PEG parser generator, we get to specify the lexical and syntactic grammar whilst we implement one. Whereas with a manual parser, there's no specificity at play.

So with ASDL, you get to 'specify' the syntax tree of a language, whilst getting a tree out of it. See how useful this is?

Now let's explain how ASDL works.

How does ASDL Work?

Command-line Arguments

The utility accepts several flags. Just two of them are important.

  • -s <path>: This specifies the path to symbols file. It's like y.tab.h of Yacc. Basically, it emits the typedefs and function declarations in a separate file (it emits them in the main file too). I recommedn generating a n.h file with it. This will allow you to use the types and functions in other translation units, without getting warnings.

  • -o <path>: The path to the output file, otherwise it outputs to STDOUT. You can redirect it. Nothing else gets written to STDOUT. There's one success message at the end, which gets written to STDERR. So don't redirect both.

Normally, an invokation looks lke this:

asdl -o my-tree.c -s my-tree.h my-tree.asdl

By the way, if you want your ASDL files to look pretty in Vim/Neovim, use asdl.vim which is in the extras directory.

How to Specify Trees

Specifying trees in ASDL is not differnt than specifying a tree in a functional language like OCaml, or Haskell. In fact, mosto f the authors of the paper are also authors of ML (at least Appel is). So ASDL is basically 'ML for C'.

There are some examples in examples directory. I am currently working on a POSIX shell, and this is the tree I have written for it:

%{

#include <stdint.h>
#include <sys/types.h>

extern void *tree_alloc(size_t);

#define ALLOC(size)	tree_alloc(size)

%}


quoted			::= SingleQuoted(string escaped_text)
			| DoubleQuoted(word_stack words)
			;

special_param		::= AllPositionalA
			| AllPosiitionalB
			| NoOfPos
			| LastExit
			| AsyncExit
			| ProcId
			| FlagOpts
			| EnvName
			;


parameter		::= PositionalParam(int pos)
			| SpecialParam(special_param special)
			| VariableParam(identifier name)
			;

glob_wc			::= MatchOne | MatchMany ;


glob			::= Wildcard(glob_wc wildcard)
			| LiteralChar(uchar lit_char)
			| Bracket(bool neg, string contents)
			| GlobStack(glob_stack stack)
			;

glob_stack		::= (glob* stack) ;


expand_path		::= Tilde(string suffix)
			| Glob(glob pattern)
			;


expand_param_sub       ::= Defl(parameter param, word? word_opt, bool colon)
			| Assign(parameter param, word? word_opt, bool colon)
			| Error(parameter param, word? word_opt, bool colon)
			| Alt(parameter param, word? word_opt, bool colon)
			;


expand_param_rm		::= RemoveSuffix(parameter param, word? word_opt, bool large)
			| RemovePrefix(parameter param, word? word_opt, bool large)
			;

expand_param		::= Simple(parameter to_expn)
			| Substitute(expand_param_sub sub_param)
			| Remove(expand_param_rm rm_param)
			| Strlen(parameter len_param)
			;


arith			::= Num(int64 n)
			| Variable(identifier var)
			| Nested(arith* list_nested)
			| Add(arith l, arith r)
			| Sub(arith l, arith r)
			| Mul(arith l, arith r)
			| Div(arith l, arith r)
			| Mod(arith l, arith r)
			| Shr(arith l, arith r)
			| Shl(arith l, arith r)
			;


expansion		::= Path(expand_path path_expn)
			| Param(expand_param param_expn)
			| Command(command cmd_expn)
			| Arith(arith arith_expn)
			;


word			::= Literal(string word_literal)
			| Expansion(expansion word_expn)
			| Quoted(quoted word_quoted)
			;


word_stack		::= (word* stack) ;


redirection		::= RedirIn(int? fno, word redir_word)
			| RedirOut(int? fno, word redir_word)
			| AppendOut(int? fno, word redir_word)
			| HereDoc(int? fno, word document, word delim, bool untab)
			| DupInFd(int? fno, word redir_word)
			| DupOutFd(int? fno, word redir_word)
			| OpenRw(int? fno, word redir_word)
			;


simple_command		::= NonTermCmd(word_stack words, redirection? redir) 
			| SeqTermCmd(word_stack words, redirection? redir)
			| AsynTermCmd(word_stack words, redirection? redir)
			;


pipeline		::= NonTermPipe(simple_command* commands)
			| SeqTermPipe(simple_command* commands)
			| AsyncTermPipe(simple_command* commands)


compound_list		::= PipelineBase(pipeline pipeln)
			| And(compound_list* list)
			| Or(compound_list* list)
			| Newline(compound_list* list)
			;


grouped_compound_cmd	::= SubShell(compound_list body)
			| SameShel(compound_list body)
			;

looped_compound_cmd	::= ForLoop(identifier name, word_stack words, compound_list body)
			| WhileLoop(compound_list ls1, compound_list ls2)
			| UntilLoop(compound_list ls1, compound_list ls2)
			;


case_condition		::= (glob* patterns, compound_list body) ;

if_condition		::= (compound_list cond, compound_list body) ;


condition_compound_cmd  ::= CaseCond(word_stack words, case_condition* body)
			| IfCond(if_condition if_base, if_condition* elifs, compound_list? else_base)
			;

funcdef_compound_cmd	::= (identifier name, compound_command body, redirection? redir) ;


compound_command	::= Grouped(grouped_compound_cmd grouped)
			| Looped(looped_compound_cmd looped)
			| Condition(condition_compound_cmd condition)
			| Function(funcdef_compound_cmd funcdef)
			;


command			::= Word(word_stack words)
			| List(compound_list list)
			| SimpleCmd(simple_command simple_cmd)
			| CompoundCommand(compound_command compound_cmd)
			;


command_stack		::= (command* stack) ;


shell_body		::= (command_stack elements, string shebang) ;

Here are the basic rules:

  • You can use Yacc-style "%{" and "%}" to embed C code. After you are done, you can use "%%" to write how much ever C code you want. This is not in the original specs.

  • The syntax of ASDL is comprised of 'types'. There are two kinds of types:

-- Product Types; -- Sum Types;

Does this remind you of something? Well, it should, because that's Haskell talk. Just like Haskell, a product type looks like this:

prod_type ::= (string my_str, my_other_type other_type) ;

Semantics-wise, product types are translated directoy to C structs. In a product rule, you can only specify ONE product type. So this:

# Weird? Weird! Weird?! WRONG!
prod_type ::= (string my_str, my_other_type other_type)
           | (int i, uint j);

So this is wrong af. You can't do that.

Also, '#' begins a comment, and newline terminates it. Comments are not transfered to the final file. Because I am not using a syntax-directed translation, it is really impossible to transfer comments. Tools like Flex manage to transfer comments because they do syntax-directed translation, and emit as they go. I don't do that.

A sume type looks like this:

sum_type ::= Variant1(int a, int b)
          | Variant2(string c, uint64 d)
          ;

I think people who are familiar with functional languages understand now. The above is no different than this OCaml code:

(* Ocaml equivalent *)

type sum_type = Variant1 of int * int
              | Variant2 of string  * int

Or this Haskell data type:

--- Haskell equivalent

data sum_type 
  = Variant1 Int Int
   | Variant2 String Int

You can omit fields:

enum = MyType1 | MyType2 ;

This would make it an enumeratiive type. You can, however, use field-less constructors along with ones that have fields.

In a way, constructors with fields are AGDTs, and constructors without fields are variants.

You may have noticed * and ? being used after types. These are basically like monads, for lists and optional values.

** A type cannot be both opional, and a list **

To do that, use indirection. declare you field as a product type, then apply the next monad.

redir ::= (int* a);
type ::= Con1(redir? redir_opt, my_other_type type2)
      : Con2(uint i, redir* redir_list)
      ;

Using the Translated Files

translated file has several sections:

1- Locators: These are functional macros, used to locate fields in an instance; 2- Preludes: Whatever you put between "%{" and "%}" 3- Typedefs: The type definitions; 4- Declarations: Declaration of functions; 5- Definitions: Definition of functions; 6- Epilogue: Whatever you put after '%%'

You can use create_* functions to create new instances. Each type has a next field which you can use as linked list. Every list type has an append function. Every chain has a dump function.

You can redefined ALLOC, REALLOC and FREE. That's what I have done in the shell.asdl file.

Conclusion

Use your creativity, use ASDL to both generate files, and also, specify syntaxes.

Any questions you got, you can direct them at me.

chubakbidpaa [at] riseup [dot] net

I am .chubak on Discord.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment