errzey/abstract

## abstract
Writing efficient parsers for streaming data in C

C is known as one of the worst languages to write any type of parsers.
Developers usually take one of three roads for parsing data:

1. Lexical Analysis & LALR (lex / yacc) utilizing a BNF syntax input.
2. Gobs of strstr, strtok, strcmp, and loads of conditionals.
3. Regular expressions, which usually involve the same logic in #2

Utilizing Lex & Yacc can be one of the more strict and specific methods of
parsing. Though it has some shortcomings; for example, the generated code is
usually very complex, and in most cases, not thread-safe. While the BNF language
is sane and logical, the act of creating it can be daunting, not only for the
experienced, but for the next developer having to maintain your code.

Using various C str functions is by far the worst method, for not only performance, but
readability and maintainability. There is nothing worse than seeing a function
with fifty strtok's and 100 conditionals.

Regular expressions double the complexity and maintainability of a parser due to
the fact that the regular expression language itself can be impossible to read,
and in addition, usually incorporates a large number of C str functions and
conditionals around matched data.

In almost all cases, a developer will write parsers that deal with a single
block of data that can be fully parsed in one call. When dealing with streamed
data (e.g., network), this can cause problems. The work-around is usually in the
form of buffer-until-ready. That is, keep reading until a block of data can
be fully parsed.

In this presentation I will discuss how to write parsers which are resilient to
complexity, resource utilization, and overall maintainability. We will discuss
the initial thought process, from specification to expectations, prototyping,
to finalized implementation. Using these techniques, little to no system
resources are utilized, and parsing can be paused and resumed at any time
without interrupting the finalized structure.
	Writing efficient parsers for streaming data in C

	C is known as one of the worst languages to write any type of parsers.
	Developers usually take one of three roads for parsing data:

	1. Lexical Analysis & LALR (lex / yacc) utilizing a BNF syntax input.
	2. Gobs of strstr, strtok, strcmp, and loads of conditionals.
	3. Regular expressions, which usually involve the same logic in #2

	Utilizing Lex & Yacc can be one of the more strict and specific methods of
	parsing. Though it has some shortcomings; for example, the generated code is
	usually very complex, and in most cases, not thread-safe. While the BNF language
	is sane and logical, the act of creating it can be daunting, not only for the
	experienced, but for the next developer having to maintain your code.

	Using various C str functions is by far the worst method, for not only performance, but
	readability and maintainability. There is nothing worse than seeing a function
	with fifty strtok's and 100 conditionals.

	Regular expressions double the complexity and maintainability of a parser due to
	the fact that the regular expression language itself can be impossible to read,
	and in addition, usually incorporates a large number of C str functions and
	conditionals around matched data.

	In almost all cases, a developer will write parsers that deal with a single
	block of data that can be fully parsed in one call. When dealing with streamed
	data (e.g., network), this can cause problems. The work-around is usually in the
	form of buffer-until-ready. That is, keep reading until a block of data can
	be fully parsed.

	In this presentation I will discuss how to write parsers which are resilient to
	complexity, resource utilization, and overall maintainability. We will discuss
	the initial thought process, from specification to expectations, prototyping,
	to finalized implementation. Using these techniques, little to no system
	resources are utilized, and parsing can be paused and resumed at any time
	without interrupting the finalized structure.