@jsanders
Created April 4, 2012 02:13
Rust deserialize XML first crack
use std;
import io::reader_util; // brings read_char into scope for io::reader

enum node {
    tag_node({
        name: str,
        attributes: [attribute],
        children: [node]
    }),
    text_node(str)
}

type attribute = {
    name: str,
    value: str
};

// read_char signals end of input by returning -1 cast to char
fn is_eof(c: char) -> bool { c == -1 as char }

// Reads characters up to (not including) the '>' that closes a start tag
fn parse_tag_name(rdr: io::reader, first_c: char) -> str {
    let mut c = rdr.read_char();
    let mut tag_name = str::from_char(first_c);
    while !is_eof(c) && c != '>' {
        tag_name += str::from_char(c);
        c = rdr.read_char();
    }
    ret tag_name;
}

#[doc = "Deserializes an xml node value from an io::reader"]
fn from_reader(rdr: io::reader) -> node {
    let mut c = rdr.read_char();
    let mut tag_name = "";
    while !is_eof(c) {
        if c == '<' {
            c = rdr.read_char();
            // Closing tags ("</...") are skipped; anything else starts a tag name
            if c != '/' {
                tag_name = parse_tag_name(rdr, c);
            }
        }
        c = rdr.read_char();
    }
    // Attributes and children are not parsed yet
    ret tag_node({ name: tag_name, attributes: [], children: [] });
}

#[doc = "Deserializes an xml node value from a string"]
fn from_str(s: str) -> node {
    io::with_str_reader(s, from_reader)
}

#[test]
fn test_empty_tag() {
    assert from_str("<tag></tag>") == tag_node({ name: "tag", attributes: [], children: [] });
}
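The gist is written in the 2012 pre-1.0 dialect (ret, import, str, [T] vectors), which no current compiler accepts. A rough translation into today's Rust might look like the sketch below. It assumes the same behavior as the original: only the name of the last start tag seen is recorded, anything between '<' and '>' (attributes included) lands in that name, and attributes and children stay empty. The CamelCase names are just modern conventions, not anything from the gist.

```rust
// Modern-Rust sketch of the 2012 "first crack" parser above.
#[derive(Debug, PartialEq)]
enum Node {
    TagNode {
        name: String,
        attributes: Vec<Attribute>,
        children: Vec<Node>,
    },
    TextNode(String),
}

#[derive(Debug, PartialEq)]
struct Attribute {
    name: String,
    value: String,
}

/// Collects characters up to (not including) the next '>' or end of input,
/// leaving the iterator positioned just after the '>'.
fn parse_tag_name(chars: &mut impl Iterator<Item = char>, first_c: char) -> String {
    let mut tag_name = first_c.to_string();
    for c in chars {
        if c == '>' {
            break;
        }
        tag_name.push(c);
    }
    tag_name
}

/// Deserializes an XML node value from a string.
fn from_str(s: &str) -> Node {
    let mut chars = s.chars();
    let mut tag_name = String::new();
    while let Some(c) = chars.next() {
        if c == '<' {
            match chars.next() {
                // Closing tag or end of input right after '<': nothing to record.
                Some('/') | None => {}
                // Anything else starts a tag name.
                Some(first_c) => tag_name = parse_tag_name(&mut chars, first_c),
            }
        }
    }
    Node::TagNode { name: tag_name, attributes: vec![], children: vec![] }
}

fn main() {
    let expected = Node::TagNode {
        name: "tag".to_string(),
        attributes: vec![],
        children: vec![],
    };
    assert_eq!(from_str("<tag></tag>"), expected);
}
```

As in the original, a second start tag overwrites the first, so "<a><b>" yields a node named "b".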
jsanders (author) commented Apr 4, 2012

Pretty ugly, but it actually parses an empty tag "correctly". I wish gist let you comment on individual lines like you can on code in repos; I want to annotate some of this:

import io::reader_util;

This is pretty interesting: below, where I have an io::reader object and want to use #read_char (oops, adopting the Ruby naming convention!), I can't do so without this import, because #read_char is not defined directly on io::reader but rather in the io::reader_util "implementation" for io::reader objects. It reminds me a lot of mixing functionality into a Ruby object only when you need it. Not sure yet whether I can do that import only in the functions that actually use #read_char, but I'm guessing I can. Here's where it's implemented: https://github.com/mozilla/rust/blob/master/src/libcore/io.rs#L42-163
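In current Rust the same idea survives as an extension trait: a trait method can be implemented for every reader, but it is only callable where the trait has been imported. A minimal sketch, assuming a hypothetical module reader_util and trait ReadCharExt as stand-ins for the 2012 names; the read_char here reads one byte at a time and so assumes ASCII input.

```rust
// Hypothetical stand-in for the 2012 io::reader_util impl.
mod reader_util {
    use std::io::Read;

    pub trait ReadCharExt {
        /// Reads one byte and returns it as a char, or None at end of input.
        /// Assumes ASCII; multi-byte UTF-8 sequences are not decoded.
        fn read_char(&mut self) -> Option<char>;
    }

    // Blanket impl: every type implementing Read gets read_char...
    impl<R: Read> ReadCharExt for R {
        fn read_char(&mut self) -> Option<char> {
            let mut buf = [0u8; 1];
            match self.read(&mut buf) {
                Ok(1) => Some(buf[0] as char),
                _ => None, // Ok(0) means EOF; errors are treated the same way here
            }
        }
    }
}

fn main() {
    // ...but the method is only visible once the trait is in scope.
    use crate::reader_util::ReadCharExt;
    let mut rdr = "<tag>".as_bytes(); // &[u8] implements std::io::Read
    assert_eq!(rdr.read_char(), Some('<'));
    assert_eq!(rdr.read_char(), Some('t'));
}
```

Removing the use line inside main makes rdr.read_char() a compile error, which mirrors the behavior described above.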

type node = {
  name: str,
  attributes: [attribute],
  children: children
};

type attribute = {
  name: str,
  value: str
};

These are basically C structs, but you can attach behavior to them with an impl, just like the reader_util implementation from before, so they behave more like typical objects (though I have no idea yet about polymorphism).
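For comparison, "a C struct plus an impl" in current Rust looks like the sketch below. It is modern syntax, not the 2012 record types; the method name to_xml is hypothetical, and it does no escaping of quotes or ampersands in the value.

```rust
/// Plain data, like the gist's attribute record.
#[derive(Debug, Clone, PartialEq)]
struct Attribute {
    name: String,
    value: String,
}

// Behavior is attached separately, in an impl block.
impl Attribute {
    /// Renders the attribute as it would appear inside a start tag (no escaping).
    fn to_xml(&self) -> String {
        format!("{}=\"{}\"", self.name, self.value)
    }
}

fn main() {
    let a = Attribute { name: "id".to_string(), value: "42".to_string() };
    println!("{}", a.to_xml());
}
```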

enum children {
  nodes([node]),
  none
}

This is how you make a type that can be any one of several variants (now that I think of it, this is how you do polymorphism). So in this case I've made a variant called nodes that holds an array of nodes, and a variant called none that carries no data at all; both are cases of children. I didn't actually want or need this children type, except that the node type can't contain a field whose type refers to node itself (the compiler reports a recursive type error). So I can't just have:

type node = {
  name: str,
  attributes: [attribute],
  children: [node]
};

like I wanted to. Not sure yet whether that is totally lame or somewhat acceptable. It does seem like tree-like structures are less convenient because of that restriction. Here's the json data type, which I've been cribbing from: https://github.com/mozilla/rust/blob/master/src/libstd/json.rs#L28-35. It feels a little odd that my types are more complicated in code precisely because they are simpler conceptually (in XML, there really is only one type of node, but I have to have this inconvenient separate type for "children", which should really just be an array of nodes). Then again, there is also a text node, so maybe the node type needs to be an enum anyway, and things get simpler.
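For what it's worth, current Rust no longer imposes that restriction: a type can refer to itself as long as there is indirection, and Vec<Node> provides it because the elements live on the heap. (Only direct self-containment, such as a field of type Node inside Node, is rejected, since the type would have infinite size.) A sketch of the "node as an enum" idea in modern Rust; storing attributes as (String, String) pairs is a simplification for brevity, and count_nodes is a hypothetical helper.

```rust
/// One XML node: an element with attributes and child nodes, or a run of text.
#[derive(Debug, PartialEq)]
enum Node {
    Tag {
        name: String,
        attributes: Vec<(String, String)>,
        children: Vec<Node>, // recursion through Vec is fine: elements are heap-allocated
    },
    Text(String),
}

/// Counts this node plus all of its descendants.
fn count_nodes(node: &Node) -> usize {
    match node {
        Node::Tag { children, .. } => 1 + children.iter().map(count_nodes).sum::<usize>(),
        Node::Text(_) => 1,
    }
}

fn main() {
    // <p>hello <b>world</b></p>
    let doc = Node::Tag {
        name: "p".to_string(),
        attributes: vec![],
        children: vec![
            Node::Text("hello ".to_string()),
            Node::Tag {
                name: "b".to_string(),
                attributes: vec![],
                children: vec![Node::Text("world".to_string())],
            },
        ],
    };
    assert_eq!(count_nodes(&doc), 4);
}
```

Here the text node and the tag node are simply two variants of the same type, so no separate children type is needed.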

#[test]
fn test_empty_tag() {
  assert from_str("<tag></tag>") == { name: "tag", attributes: [], children: none };
}

This is pretty nifty and built into the language - if you compile with --test it creates an executable that runs anything annotated with #[test] instead of running the main function like it normally would. Much more convenient for library code.
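The same mechanism still exists: rustc --test (or cargo test) builds a binary whose entry point is a generated test runner that executes every #[test] function. The usual modern layout keeps tests in a #[cfg(test)] module so they are compiled only for test builds. A small sketch; is_closing_tag is a hypothetical helper chosen only to have something to test.

```rust
/// Returns true if s looks like an XML closing tag such as "</tag>".
fn is_closing_tag(s: &str) -> bool {
    s.starts_with("</") && s.ends_with('>')
}

fn main() {
    // In a `rustc --test` build the generated test runner is the entry point,
    // so this main is not what runs.
    println!("{}", is_closing_tag("</tag>"));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn detects_closing_tag() {
        assert!(is_closing_tag("</tag>"));
        assert!(!is_closing_tag("<tag>"));
    }
}
```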

jsanders (author) commented Apr 4, 2012

Note that some of the middle of my last comment is out of date now, since I've changed the node type to be an enum, but the two approaches are worth comparing. Both of them seem to introduce one extra type that I feel I shouldn't need: children in the old version and tag_node in the new version.

lmarlow commented Apr 4, 2012

I see list([json]) in the json implementation. Is list a built-in type, or would something like that be useful instead of children?

jsanders (author) commented Apr 4, 2012

list is actually just the name of the variant, and the [json] syntax means that the list variant holds an array (or vector maybe, not clear on the difference yet) of json values. You can reference json inside the variant because json is an enum; if it were a plain type alias, it would give you a recursive type error.
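To make the "list is just a variant name" point concrete, here is a cut-down JSON-like enum in current Rust. It is written in the spirit of the json type linked above but is not its actual definition; the variant set and the depth helper are illustrative assumptions.

```rust
/// A simplified JSON value; not the real definition from the linked json.rs.
#[derive(Debug, PartialEq)]
enum Json {
    Num(f64),
    Str(String),
    // `List` is only a variant name; its payload is a vector of further Json values.
    List(Vec<Json>),
}

/// Nesting depth: scalars are 0, and each enclosing List adds 1.
fn depth(j: &Json) -> usize {
    match j {
        Json::List(items) => 1 + items.iter().map(depth).max().unwrap_or(0),
        _ => 0,
    }
}

fn main() {
    // [1, ["x"]]
    let v = Json::List(vec![
        Json::Num(1.0),
        Json::List(vec![Json::Str("x".to_string())]),
    ]);
    assert_eq!(depth(&v), 2);
}
```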
