Blog 2020/1/10
Let's learn how to write a Lisp interpreter in C!
In part 4, we implement support for reading, evaluating, and printing C strings.
This will require making our tokenizer a bit more sophisticated.
We add a new struct to represent C strings, along with a constructor, new_cstring(), and a type predicate, is_cstring():
struct CString_ {
FormType type;
char* valuep;
};
typedef struct CString_ CString;
int new_cstring(CString** cspp, char* sp);
bool is_cstring(Form* formp);
and add an entry to our FormType enum:
enum FormType_ {
TypeSymbol = 10,
TypeCLong = 20,
TypeCDouble = 30,
+ TypeCString = 40,
};
typedef enum FormType_ FormType;
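The declarations above don't show how new_cstring() and is_cstring() are implemented. Here is a minimal sketch, assuming that new_cstring() takes ownership of the malloc'ed string it is given and that every form struct begins with a FormType field (as CString does), so any form can be inspected through a common Form prefix. The Form struct below is an assumed stand-in, not the interpreter's actual definition:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed minimal versions of the article's types. */
enum FormType_ {
    TypeSymbol = 10,
    TypeCLong = 20,
    TypeCDouble = 30,
    TypeCString = 40,
};
typedef enum FormType_ FormType;

/* Assumption: every form struct starts with a FormType field, so any form
   can be inspected through this common prefix. */
struct Form_ {
    FormType type;
};
typedef struct Form_ Form;

struct CString_ {
    FormType type;
    char* valuep;
};
typedef struct CString_ CString;

/* Allocates a CString that takes ownership of the malloc'ed string sp.
   Returns 0, or ENOMEM if allocation fails. */
int new_cstring(CString** cspp, char* sp) {
    assert(sp != NULL);
    CString* csp = malloc(sizeof(CString));
    if (csp == NULL) {
        return ENOMEM;
    }
    csp->type = TypeCString;
    csp->valuep = sp;
    *cspp = csp;
    return 0;
}

/* Returns true if formp points to a CString. */
bool is_cstring(Form* formp) {
    return formp->type == TypeCString;
}
```

Taking ownership avoids a second copy, since parse_string() already hands back a freshly malloc'ed string; the trade-off is that whoever frees the CString must also free valuep.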
We need to add a special case to fbuff_get_token() to handle strings. When we see an opening quote character, we put the character back and then delegate to a separate function which reads the entire string token:
reader.c (error handling elided):
fbuff_getch(fbp, &ch);
+
+ /* this is a string literal. */
+ if (ch == '"') {
+ fbuff_ungetch(fbp, ch);
+ return fbuff_get_token_str(fbp, buffpp);
+
+ } else {
+ *cursor = ch;
+ cursor++;
}
- *cursor = ch;
- cursor++;
/* the rest of the chars. */
while (true) {
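Putting the quote back relies on the reader supporting one character of pushback, so that fbuff_get_token_str() can read the opening quote again itself. FBuff's internals aren't shown in this post, so the following is only a sketch of the pattern using a hypothetical in-memory stand-in (StrBuff, strbuff_getch(), and strbuff_ungetch() are illustrative names, not the interpreter's API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FBuff: reads from an in-memory string
   instead of a FILE*, with a one-character pushback slot. */
typedef struct {
    const char* textp;  /* the input. */
    size_t pos;         /* index of the next unread char. */
    bool has_pushback;  /* true if pushback holds a char. */
    char pushback;
} StrBuff;

/* Reads one char into *chp. Returns 0, or -1 at end of input. */
int strbuff_getch(StrBuff* sbp, char* chp) {
    /* a pushed-back char is returned before any new input. */
    if (sbp->has_pushback) {
        sbp->has_pushback = false;
        *chp = sbp->pushback;
        return 0;
    }
    char ch = sbp->textp[sbp->pos];
    if (ch == '\0') {
        return -1;
    }
    sbp->pos++;
    *chp = ch;
    return 0;
}

/* Pushes ch back, so the next strbuff_getch() returns it again. */
void strbuff_ungetch(StrBuff* sbp, char ch) {
    assert(!sbp->has_pushback);  /* only one char of pushback. */
    sbp->pushback = ch;
    sbp->has_pushback = true;
}
```

A single pushback slot is enough here, since the tokenizer only ever un-reads the one character it just peeked at.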
And here's the implementation of that function, fbuff_get_token_str(). It reads characters until it encounters the closing quote, doubling its buffer whenever it fills up, and finally shrinks the buffer to fit the string.
reader.c (error handling elided):
/* Advances fbp far enough to read one token (which is a string literal).
Points buffpp to a malloc'ed buffer containing the string.
Returns 0, EOF, errno, or an error code. */
static int fbuff_get_token_str(FBuff* fbp, char** buffpp) {
char ch;
/* allocate the initial buffer. */
size_t buffsize = 1000;
size_t bufflen = buffsize - 1;
char* buffp = malloc(buffsize);
char* cursor = buffp;
/* the first char must be the opening quote. */
fbuff_getch(fbp, &ch);
assert(ch == '"');
*cursor = ch;
cursor++;
while (true) {
size_t len = cursor - buffp;
/* time to grow the buffer. */
if (len == bufflen) {
buffsize *= 2;
bufflen = buffsize - 1;
char* newbuffp = realloc(buffp, buffsize);
buffp = newbuffp;
/* realloc may have moved the buffer, so re-point cursor into it. */
cursor = buffp + len;
}
fbuff_getch(fbp, &ch);
/* this is the end of the string. */
if (ch == '"') {
*cursor = ch;
cursor++;
*cursor = '\0';
/* shrink buffp to fit the size of the string. */
size_t finalbuffsize = cursor - buffp + 1; /* string length plus the NUL. */
if (finalbuffsize < buffsize) {
char* finalbuffp = realloc(buffp, finalbuffsize);
buffp = finalbuffp;
}
break;
/* this is a regular char. */
} else {
*cursor = ch;
cursor++;
}
}
*buffpp = buffp;
return 0;
}
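To try the growth logic in isolation, here is a self-contained sketch that reads from a C string instead of an FBuff. read_string_token() and its initial_size parameter are hypothetical, chosen so that a tiny starting size exercises the doubling path. It tracks an index rather than a cursor pointer, so a realloc() that moves the buffer can't leave a stale pointer behind, and unlike the listing above it checks for allocation failure and unterminated input:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the string literal at the start of inp,
   including both quotes, into a malloc'ed NUL-terminated buffer that
   starts at initial_size bytes and doubles as needed.
   Returns 0, or -1 if a quote is missing or allocation fails. */
int read_string_token(const char* inp, size_t initial_size, char** buffpp) {
    assert(buffpp != NULL);
    if (initial_size < 2 || inp[0] != '"') {
        return -1;
    }
    size_t buffsize = initial_size;
    char* buffp = malloc(buffsize);
    if (buffp == NULL) {
        return -1;
    }
    /* an index survives realloc() moving the buffer; a pointer would not. */
    size_t len = 0;
    buffp[len++] = *inp++; /* the opening quote. */
    for (;;) {
        /* keep room for one more char plus the NUL terminator. */
        if (len + 2 > buffsize) {
            char* newbuffp = realloc(buffp, buffsize * 2);
            if (newbuffp == NULL) {
                free(buffp);
                return -1;
            }
            buffp = newbuffp;
            buffsize *= 2;
        }
        char ch = *inp++;
        if (ch == '\0') {
            /* unterminated string literal. */
            free(buffp);
            return -1;
        }
        buffp[len++] = ch;
        if (ch == '"') {
            break; /* the closing quote. */
        }
    }
    buffp[len] = '\0';
    /* shrink to fit: len chars plus the terminator. */
    char* finalbuffp = realloc(buffp, len + 1);
    if (finalbuffp != NULL) {
        buffp = finalbuffp;
    }
    *buffpp = buffp;
    return 0;
}
```

With an initial size of 4, reading "hello, world" (14 bytes with quotes) grows the buffer from 4 to 8 to 16 bytes before shrinking it to 15.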
We also need to update read_form() to parse a token as a string when the token starts with a quote:
reader.c (error handling elided):
/* we've reached the end of input. */
if (ch1 == '\0') {
return EOF;
+
+ /* string literal. */
+ } else if (ch1 == '"') {
+ assert(buffp != buff);
+ char* sp;
+ parse_string(buffp, &sp);
+
+ CString* csp;
+ new_cstring(&csp, sp);
+ *formpp = (Form*)csp;
+ return 0;
+
+ /* the form type can't be determined from ch1 alone. */
} else {
bool success;
/* an integer literal. */
long l;
success = try_parse_long(buffp, &l);
And here's the implementation of parse_string():
reader.c (error handling elided):
/* Parses a string from buffp.
*spp is malloc'ed with a copy of the parsed string.
Returns 0, errno, or an error code. */
static int parse_string(const char* buffp, char** spp) {
size_t src_len = strlen(buffp);
size_t src_size = src_len + 1;
/* minimum buffp is an opening and closing quote, so we know len >= 2. */
assert(src_len >= 2);
/* first and last char must be '"'. */
if (*buffp != '"') {
return E_parse_string__invalid_string_1;
}
if (*(buffp + src_len - 1) != '"') {
return E_parse_string__invalid_string_2;
}
size_t dst_size = src_size - 2;
char* dst = malloc(dst_size);
/* don't copy the closing quote. */
size_t dst_len = dst_size - 1;
/* don't copy the opening quote. */
const char* start = buffp + 1;
strncpy(dst, start, dst_len);
*(dst + dst_len) = '\0';
*spp = dst;
return 0;
}
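The size arithmetic in parse_string() is easiest to check with a concrete token. For "foo" (quotes included), strlen() is 5, so src_size is 6, dst_size is 4, and dst_len is 3: exactly the three characters f, o, o plus a terminator. The sketch below does the same job with memcpy(), checking the length instead of asserting it; strip_quotes() is a hypothetical name, not part of the interpreter:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of parse_string(): copies the contents of a quoted
   token, without its quotes, into a malloc'ed string.
   A token of length n holds n - 2 content chars, needing n - 1 bytes.
   Returns 0, or -1 if the token is not quoted or malloc fails. */
int strip_quotes(const char* tokenp, char** spp) {
    size_t n = strlen(tokenp);
    /* n >= 2 ensures the opening and closing quotes are distinct chars. */
    if (n < 2 || tokenp[0] != '"' || tokenp[n - 1] != '"') {
        return -1;
    }
    size_t content_len = n - 2;
    char* dst = malloc(content_len + 1);
    if (dst == NULL) {
        return -1;
    }
    memcpy(dst, tokenp + 1, content_len); /* skip the opening quote. */
    dst[content_len] = '\0';
    *spp = dst;
    return 0;
}
```

Since the length is already known, memcpy() is a natural fit here; strncpy() works too, but its padding behavior only matters when the source is shorter than the count.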
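Strings evaluate to themselves, just like the other forms we support so far, so eval_form() needs only one more check:
/* ...
Returns 0. */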
int eval_form(Form* formp, Form** resultpp) {
/* for now, all forms evaluate to themselves. */
- if (is_symbol(formp) || is_clong(formp) || is_cdouble(formp)) {
+ if (is_symbol(formp) || is_clong(formp) || is_cdouble(formp)
+ || is_cstring(formp))
+ {
*resultpp = formp;
return 0;
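As more form types arrive, this chain of is_*() calls will keep growing. One possible alternative, sketched below with a hypothetical is_self_evaluating() helper and a local copy of the FormType enum, is to switch on the type tag directly:

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the FormType enum, for a self-contained example. */
enum FormType_ {
    TypeSymbol = 10,
    TypeCLong = 20,
    TypeCDouble = 30,
    TypeCString = 40,
};
typedef enum FormType_ FormType;

/* Hypothetical helper: true for the form types that currently
   evaluate to themselves (for now, all of them). */
bool is_self_evaluating(FormType type) {
    switch (type) {
    case TypeSymbol:
    case TypeCLong:
    case TypeCDouble:
    case TypeCString:
        return true;
    }
    /* no default label, so the compiler can flag unhandled enum values. */
    return false;
}
```

Leaving out a default label is deliberate: with GCC or Clang's -Wswitch (included in -Wall), adding a new FormType value without a matching case produces a warning here.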
We implement support for printing CString objects:
printer.c (error handling elided):
/* Prints the CString in csp into fp.
Returns 0 or errno. */
static int print_cstring(CString* csp, FILE* fp) {
fprintf(fp, "CString: \"%s\"", csp->valuep);
return 0;
}
and stitch it into print_form():
} else if (is_cdouble(formp)) {
CDouble* dp = (CDouble*)formp;
return print_cdouble(dp, fp);
+ } else if (is_cstring(formp)) {
+ CString* csp = (CString*)formp;
+ return print_cstring(csp, fp);
} else {
assert(false);
}
We can see that our evaluator understands strings now!
$ ./lisp
> "foo"
CString: "foo"
>
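Because fbuff_get_token_str() keeps reading until it finds the closing quote, newlines included, our interpreter also handles string literals that span multiple lines of input: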
$ ./lisp
> "hello
world!"
CString: "hello
world!"
>
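However, our interpreter does not yet recognize escaped characters, so there is no way for us to enter a string which contains a quote character ("). The tokenizer ends a string at the first quote after the opening one, even if a backslash precedes it, and reads the rest of the input as symbols: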
$ ./lisp
> "I said \"Hello!\" to the baker."
CString: "I said \"
Symbol: Hello!\"
Symbol: to
Symbol: the
Symbol: baker."
>
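The scanning rule used by fbuff_get_token_str() fits in a few lines, which makes it easy to see where the session above goes wrong. naive_string_token_len() is a hypothetical helper for illustration, not part of the interpreter:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of the string-literal token at the start of textp,
   using the same rule as fbuff_get_token_str(): stop at the first '"'
   after the opening one. A backslash is treated as an ordinary char.
   Returns 0 if the literal is unterminated. */
size_t naive_string_token_len(const char* textp) {
    assert(textp[0] == '"');
    for (size_t i = 1; textp[i] != '\0'; i++) {
        if (textp[i] == '"') {
            return i + 1; /* include the closing quote. */
        }
    }
    return 0;
}
```

For the input above it returns 10, the length of "I said \" including both quote characters, which matches the CString the interpreter printed before falling back to reading symbols.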
In part 5 we will extend our string support to handle escaped characters!