@Fogest
Created January 27, 2016 03:35

This answer assumes that you want to roll your own parser in Standard C. In practice it is usually better to use an existing parser, because its authors have already thought of and handled the weird cases that can come up.

My high level approach would be:

  • Read a line
  • Pass pointer to start of this line to a function parse_line:
    • Use strcspn on the pointer to identify the location of the first : or ; (aborting if no marker found)
    • Save the text so far as the property name
    • While the parsing pointer points to ;:
      • Call a function extract_name_value_pair passing address of your parsing pointer.
        • That function will extract and save the name and value, and update the pointer to point to the ; or : following the entry. Of course this function must handle quote marks in the value and the fact that there might be a ; or : inside the value
    • (At this point the parsing pointer is always on :)
    • Pass the rest of the string to a function parse_csv which will look for comma-separated values (again, being aware of quote marks) and store the results it finds in the right place.
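
The outline above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the helper name skip_parameter, the return conventions, and the sample input are made up for illustration, the parameter is skipped rather than saved, and the value text is handed back to the caller instead of being passed on to parse_csv.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for extract_name_value_pair: it only skips one
   ";name=value" parameter instead of saving it. On entry *pp points at ';'.
   On success *pp points at the next unquoted ';' or ':' and 0 is returned.
   Returns -1 if the line ends first (including inside an open quote). */
int skip_parameter(const char **pp)
{
    const char *p = *pp + 1;          /* step over the leading ';' */
    int in_quotes = 0;

    for (; *p != '\0'; ++p) {
        if (*p == '"')
            in_quotes = !in_quotes;   /* toggle on every double quote */
        else if (!in_quotes && (*p == ';' || *p == ':'))
            break;                    /* unquoted marker ends the parameter */
    }
    if (*p == '\0')
        return -1;
    *pp = p;
    return 0;
}

/* Splits one line into a property name and the text after the ':'.
   On success stores the name length in *name_len, a pointer to the
   value text in *value, and returns 0. Returns -1 on malformed input. */
int parse_line(const char *line, size_t *name_len, const char **value)
{
    size_t n = strcspn(line, ":;");   /* index of first ':' or ';' */
    const char *p = line + n;

    if (*p == '\0')
        return -1;                    /* no marker found: abort */
    *name_len = n;                    /* text so far is the property name */

    while (*p == ';') {
        if (skip_parameter(&p) != 0)
            return -1;
    }

    /* At this point p is always on ':' */
    *value = p + 1;
    return 0;
}
```

For an assumed input such as ATTENDEE;CN="Doe; John";ROLE=CHAIR:mailto:jd@example.com, the name length comes back as 8 and the value pointer lands on the first character after the unquoted colon; the ; inside the quoted parameter value is ignored.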

The functions parse_csv and extract_name_value_pair should in fact be developed and tested first. Make a test suite and check that they work properly. Then write your overall parser function which calls those functions as needed.
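
As an illustration of testing a helper in isolation, here is a small sketch. The function count_csv_fields is a hypothetical, much-reduced stand-in for parse_csv (it only counts fields rather than storing them), and the test cases are assumptions about what the input might look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for parse_csv: counts the comma-separated
   fields in s, treating commas inside double quotes as ordinary text. */
size_t count_csv_fields(const char *s)
{
    size_t count = 1;                 /* even "" holds one (empty) field */
    int in_quotes = 0;

    for (; *s != '\0'; ++s) {
        if (*s == '"')
            in_quotes = !in_quotes;
        else if (*s == ',' && !in_quotes)
            ++count;
    }
    return count;
}

/* A tiny test suite: each assert pins down one case, including the
   awkward ones (empty input, empty fields, quoted commas). */
void test_count_csv_fields(void)
{
    assert(count_csv_fields("") == 1);
    assert(count_csv_fields("a,b,c") == 3);
    assert(count_csv_fields("a,,c") == 3);
    assert(count_csv_fields(",") == 2);
    assert(count_csv_fields("\"a,b\",c") == 2);
}
```

Calling test_count_csv_fields() from a small test program before the helper is wired into the overall parser means a later failure can be blamed on the glue code rather than on the helper.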


Also, write all the memory allocation code as separate functions. Think of what data structure you want to store your parsed result in. Then code up that data structure, and test it, entirely independently of the parsing code. Only then, write the parsing code and call functions to insert the resulting data in the data structure.

You really don't want to have memory management code mixed up with parsing code. That makes it much harder to debug.
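
To make that separation concrete, here is one possible sketch of a container that owns all of its allocations. The struct str_list and its functions are hypothetical names chosen for this example; your real data structure will depend on what you need to store.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: a growable list of strings that it owns. */
struct str_list {
    char **items;
    size_t count;
    size_t capacity;
};

void str_list_init(struct str_list *list)
{
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}

/* Appends a null-terminated copy of the len bytes starting at text.
   Returns 0 on success, -1 on allocation failure (the stored items
   are left unchanged in that case). */
int str_list_append(struct str_list *list, const char *text, size_t len)
{
    char *copy;

    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 4;
        char **grown = realloc(list->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = new_cap;
    }

    copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, len);
    copy[len] = '\0';

    list->items[list->count++] = copy;
    return 0;
}

/* Releases every stored string and resets the list to empty. */
void str_list_free(struct str_list *list)
{
    size_t i;

    for (i = 0; i < list->count; ++i)
        free(list->items[i]);
    free(list->items);
    str_list_init(list);
}
```

With something like this in place, the parsing code never calls malloc or free itself: it finds a start pointer and a length, calls str_list_append, and checks the return value.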


When making a function that accepts a string (e.g. all three named functions above, plus any other helpers you decide you need) you have a few options as to their interface:

  • Accept pointer to null-terminated string
  • Accept pointer to start and one-past-the-end
  • Accept pointer to start, and integer length

Each way has its pros and cons: it's annoying to write null terminators everywhere and then unwrite them later if need be; but it's also annoying when you want to use strcspn or other string functions but you received a length-counted piece of string.
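
The three interfaces can be compared side by side with a small sketch. The function names here are invented for the example; each one does the same job as strcspn (count the characters before the first one found in reject), but only the first can call strcspn directly.

```c
#include <stddef.h>
#include <string.h>

/* Null-terminated input: the standard function does all the work. */
size_t span_until_z(const char *s, const char *reject)
{
    return strcspn(s, reject);
}

/* Start plus length: strcspn cannot be told where to stop, so the scan
   is written by hand. Note that strchr also matches the terminating
   '\0' of reject, so an embedded '\0' in s stops the scan as well. */
size_t span_until_n(const char *s, size_t len, const char *reject)
{
    size_t i;

    for (i = 0; i < len; ++i) {
        if (strchr(reject, s[i]) != NULL)
            break;
    }
    return i;
}

/* Start plus one-past-the-end: converted to a length and forwarded. */
size_t span_until_range(const char *begin, const char *end,
                        const char *reject)
{
    return span_until_n(begin, (size_t)(end - begin), reject);
}
```

The length-counted versions let you work on a slice of the original line without writing a null terminator into it, at the cost of reimplementing what the standard library already offers for null-terminated strings.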

Also, when the function needs to let the caller know how much text it consumed in parsing, you have two options:

  • Accept pointer to character, and return the number of characters consumed; the calling function adds the two together to find where parsing stopped
  • Accept pointer to pointer to character, and update the pointer to character. Return value could then be used for an error code.
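
Here is a small sketch of both conventions, using a run of decimal digits as the thing being consumed. The function names and the choice of -1 as the error code are assumptions made for this example.

```c
#include <stddef.h>

/* Option 1: take a pointer, return how many characters were consumed.
   A return of 0 means there were no digits at p. */
size_t scan_digits(const char *p)
{
    size_t n = 0;

    while (p[n] >= '0' && p[n] <= '9')
        ++n;
    return n;
}

/* Option 2: take a pointer to the caller's pointer and advance it.
   The return value is free to act as an error code: 0 on success,
   -1 if there was nothing to consume (*pp is then left untouched). */
int consume_digits(const char **pp)
{
    const char *p = *pp;

    if (!(*p >= '0' && *p <= '9'))
        return -1;
    while (*p >= '0' && *p <= '9')
        ++p;
    *pp = p;
    return 0;
}
```

With the first form the caller writes p += scan_digits(p); with the second it writes if (consume_digits(&p) != 0) and handles the error. The second form tends to read better when several such helpers are called one after another on the same parsing pointer.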

There's no one right answer; with experience you will get better at deciding which option leads to the cleanest code.
