Last active
August 29, 2015 14:01
-
-
Save errzey/abdc64f1183a741fb255 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Writing efficient parsers for streaming data in C | |
C is known as one of the worst languages to write any type of parsers. | |
Developers usually take one of three roads for parsing data: | |
1. Lexical Analysis & LALR (lex / yacc) utilizing a BNF syntax input. | |
2. Gobs of strstr, strtok, strcmp, and loads of conditionals. | |
3. Regular expressions, which usually involve the same logic in #2 | |
Utilizing Lex & Yacc can be one of the more strict and specific methods of | |
parsing. Though it has some shortcomings; for example, the generated code is | |
usually very complex, and in most cases, not thread-safe. While the BNF language | |
is sane and logical, the act of creating it can be daunting, not only for the | |
experienced, but for the next developer having to maintain your code. | |
Using various C str functions is by far the worst method, for not only performance, but | |
readability and maintainability. There is nothing worse than seeing a function | |
with fifty strtok's and 100 conditionals. | |
Regular expressions double the complexity and maintainability of a parser due to | |
the fact that the regular expression language itself can be impossible to read, | |
and in addition, usually incorporates a large number of C str functions and | |
conditionals around matched data. | |
In almost all cases, a developer will write parsers that deal with a single | |
block of data that can be fully parsed in one call. When dealing with streamed | |
data (e.g., network), this can cause problems. The work-around is usually in the | |
form of buffer-until-ready. That is, keep reading until a block of data can | |
be fully parsed. | |
In this presentation I will discuss how to write parsers which are resilient to | |
complexity, resource utilization, and overall maintainability. We will discuss | |
the initial thought process, from specification to expectations, prototyping, | |
to finalized implementation. Using these techniques, little to no system | |
resources are utilized, and parsing can be paused and resumed at any time | |
without interrupting the finalized structure. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment