This talk is on XML attacks, which are very easy to become vulnerable to, because XML is insane, and
extremely dangerous especially if you're running web services or similar.
First up, Billion Laughs. Essentially you can do text substitutions in XML, because obviously it can
rewrite itself as you parse it. And you do them like this.
So, you define a whole load of rules, and then at the bottom &lol9 gets replaced by 10 &lol8s, which
each then get replaced by 10 &lol7s, and so on down, and eventually that gives you one billion lols.
One byte per character, 3 bytes per lol, gives you 3GB of string. Parsing that will take a long
time and will probably break things when you write it anywhere.
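The slide itself isn't reproduced in this transcript. As a minimal sketch, assuming the classic shape of the payload (a base entity `lol` plus nine levels `lol1` to `lol9`, each made of ten references to the level below), the arithmetic works out as follows. The helper names are illustrative, and the code only builds the text; it never parses it.

```python
def billion_laughs(levels=9, fanout=10):
    """Build the payload text only. Nothing here parses it."""
    decls = ['<!ENTITY lol "lol">']
    prev = "lol"
    for i in range(1, levels + 1):
        # Each level is `fanout` references to the level below it.
        refs = f"&{prev};" * fanout
        decls.append(f'<!ENTITY lol{i} "{refs}">')
        prev = f"lol{i}"
    dtd = "\n  ".join(decls)
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE lolz [\n  {dtd}\n]>\n"
        f"<lolz>&{prev};</lolz>"
    )


def expanded_size(levels=9, fanout=10, base="lol"):
    """Bytes of text produced if a parser fully expands the top entity."""
    return len(base) * fanout ** levels


payload = billion_laughs()
print(len(payload), "bytes on the wire")
print(expanded_size(), "bytes after expansion")
```

The document on the wire is under a kilobyte, while the fully expanded text is 3 x 10^9 bytes, which is where the 3GB figure comes from.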
On top of that, you can also substitute things for other resources, such as files. If you do this
you'll read from the random bit on the disk, and it'll keep giving you data, forever. You will never
parse this. In the meantime it'll also fill all of your memory. It's not very good.
In addition, imagine you've got some web service where you take some XML and maybe you look at it
for a bit, and you give it back, in an error page or similar. If you do this you give them your
passwords, or your private keys, or anything else you can possibly get at.
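For reference, an external-entity document of the kind described here has roughly the shape below. The entity name `xxe` and the file path are placeholders, not taken from the talk's slides. The sketch uses expat's `EntityDeclHandler` to list which external resources a document asks for, without fetching any of them.

```python
from xml.parsers import expat

# Placeholder payload: the entity points at a local file instead of literal text.
XXE_DOC = b"""<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>"""


def external_entities(xml_bytes):
    """Return (name, system_id) for each external entity a document declares."""
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a system id.
        if system_id is not None:
            found.append((name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)
    return found


print(external_entities(XXE_DOC))
```

A parser that does resolve external entities would substitute the file's contents for `&xxe;`, and echoing the parsed document back in an error page is what leaks it.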
And also, even more insane, you can read from the internet. So, this could now read anything, so you
can do things like the previous attack, and you can give them the content of your private intranet site,
or they can provide you a website and they can just block you, you can just wait for a while, or they
can give you whatever they want to give you, and you will just sit and take it.
These look relatively easy to block, but there's also XInclude, which does the same thing, but
And then you can mix them together. This will connect that website one billion times. If you send this
to a webserver it will attack somebody else a lot. If you send this multiple times all your multithreads
on your little multithreaded web server will all go and attack them, and it'll load balance it and
everything. This isn't theoretical! Last week, had this, this works. If you point it to itself, it
takes the entire server down.
As well as that, you can do XML injection, which is like SQL injection, but enterprisey. If you expect
to send this off your order-processing website somewhere, that's then going to build them and sell on
this product and so on. And you're going to take some user input and you're going to put it into here,
and you're going to sell them what they want. You take some user input, but oh no, they've given you
XML, and they've written a new price in there and now everything's free.
Lots of XML parsers are quite naive and foolish, and will give you the last price, if you ask for a
single element, and it'll be zero. But you've got a clever XML parser, so it's fine. But they've
commented out the real price now! There's only one interpretation of this, and this is that everything is free. If
you take XML that goes in like this and you don't sanitize it, everything goes horribly horribly wrong.
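A minimal sketch of the duplicate-price case, assuming a hypothetical order template with a fixed price and a user-supplied quantity (the element names are illustrative, since the slide isn't in the transcript):

```python
import xml.etree.ElementTree as ET


def naive_order(quantity):
    # Hypothetical template: user input is pasted in with no escaping.
    return f"<order><price>100</price><quantity>{quantity}</quantity></order>"


# The "quantity" closes its own element and smuggles in a second <price>.
malicious = "1</quantity><price>0</price><quantity>1"

order = ET.fromstring(naive_order(malicious))
prices = [p.text for p in order.findall("price")]

print(prices)      # both the real price and the injected one
print(prices[-1])  # code that takes the last match sees the attacker's price
```

The document is still well-formed, so nothing fails; the consumer simply finds two `<price>` elements and has to pick one.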
So what do you do? Essentially, this. If you're building XML yourself, you sanitize your inputs.
You make sure you're not putting in special characters and then there's a different set of special
characters if you're putting in attributes and so on and so on. Or sensibly, you build XML with
some kind of framework, which will do all of this for you, but you should test it as well.
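Both options can be sketched with the Python standard library; the order template and field names here are hypothetical. `escape` handles text content, `quoteattr` handles attribute values (which also need their quotes dealt with), and building the tree with `ElementTree` does the escaping for you.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

malicious = "1</quantity><price>0</price><quantity>1"
note = 'say "hi" & <bye>'

# Option 1: escape by hand. Text and attributes need different treatment.
text_xml = (
    "<order><price>100</price><quantity>"
    + escape(malicious)
    + "</quantity></order>"
)
attr_xml = "<order note=" + quoteattr(note) + "/>"  # quoteattr adds the quotes

# Option 2: let a library build the document and do the escaping.
order = ET.Element("order")
ET.SubElement(order, "price").text = "100"
ET.SubElement(order, "quantity").text = malicious
built_xml = ET.tostring(order, encoding="unicode")

# Round-trip check: the input comes back as data, not as markup.
for doc in (text_xml, built_xml):
    parsed = ET.fromstring(doc)
    print(len(parsed.findall("price")), parsed.find("quantity").text)
```

The round-trip at the end is the kind of test the talk asks for: feed in hostile input and confirm the parsed document still has exactly one price.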
Secondly, you disable all the bits of XML which are absolutely mad, which is lots of them, and are
enabled by default in lots of things. Again, needs testing.
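One blunt way to switch off entity tricks is to refuse any document that carries a DOCTYPE at all, since that is where entities are declared. This is a sketch of that idea using expat directly; the function and exception names are made up for illustration, and it is not the speaker's code. XInclude is a separate mechanism and is not covered by this check.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def check_no_doctype(xml_bytes):
    """Parse the document and fail as soon as a DOCTYPE is seen."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here aborts the parse before any entity is expanded.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


check_no_doctype(b"<order><price>100</price></order>")  # passes silently

try:
    check_no_doctype(b'<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>')
except DoctypeForbidden as exc:
    print("rejected:", exc)
```

As the talk says, this needs testing: run the real payloads against whatever parser configuration actually ships.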
Then, when you get exceptions in your XML, you don't show them to the user! If you're trying to do
XML injection and the thing tells you you're missing this element, you add the element, and it's
easy and it just works. You don't want to let people do that, so yeah, that covers that.
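A minimal sketch of that separation, assuming a hypothetical order endpoint that returns a status code and a message: the detail goes to the log, and the caller gets the same generic answer whatever went wrong.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("orders")


def handle_order(xml_text):
    """Return (status, message) for a hypothetical order endpoint."""
    try:
        order = ET.fromstring(xml_text)
        if order.find("price") is None:
            raise ValueError("missing <price> element")
    except (ET.ParseError, ValueError) as exc:
        # Full detail for the operators, nothing useful for the attacker.
        log.warning("rejected order: %s", exc)
        return 400, "Invalid request"
    return 200, "OK"


print(handle_order("<order>"))                          # malformed XML
print(handle_order("<order/>"))                         # missing element
print(handle_order("<order><price>1</price></order>"))  # accepted
```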
But then if you do need to use any of the special features, you lock it down. So you watch it and
you don't let it parse for half an hour, or you don't let it use 3GB of memory, and if you really
need to pull in resources from somewhere else, you only let in certain ones, not all of the internet.
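The allowlist part can be sketched as below. The host name is a placeholder, and how the check is wired into a particular parser's resolver is left out because it differs per library. Time and memory caps are usually enforced by the hosting process and are not shown.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: exact (scheme, host) pairs the parser may fetch from.
ALLOWED_SOURCES = {("https", "schemas.example.com")}


def resource_allowed(url):
    """True only if the URL's scheme and host are both on the allowlist."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname) in ALLOWED_SOURCES


for url in (
    "https://schemas.example.com/order.dtd",
    "http://schemas.example.com/order.dtd",
    "file:///etc/passwd",
    "https://schemas.example.com.evil.test/order.dtd",
):
    print(url, resource_allowed(url))
```

Matching the whole host name, not a prefix or substring, is what keeps lookalike domains out.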
In Java this is relatively easy, you just set a load of things to false. I'll send this round for
reference, but it's not terribly complicated. In .net it's even easier and you can actually set
the XmlResolver bit, which is the bit that looks up external resources, to null, so it can't get
any, and it's just better. And that's basically it.
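The Java and .NET listings mentioned here are not part of this transcript. As a rough analogue in Python (not the speaker's code), the SAX parser in the standard library exposes the same kind of switches. Python 3.7.1 and later already default external general entities to off, so setting the feature explicitly mostly documents the intent.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class ElementNames(ContentHandler):
    """Collects element names, just to show the parser still works."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def locked_down_parser():
    parser = xml.sax.make_parser()
    # "Set a load of things to false": no external general or parameter entities.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    return parser


handler = ElementNames()
parser = locked_down_parser()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(b"<order><price>100</price></order>"))
print(handler.names)
```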