SRT file for https://gist.github.com/pimterry/4971500, which is a transcript of http://www.youtube.com/watch?v=RzjUw47ZIg0. Tim Perry owns all the rights to this derivative work.
1
00:00:00,076 --> 00:00:02,593
This talk is on XML attacks,
2
00:00:02,593 --> 00:00:04,441
which are very easy to become vulnerable to,
3
00:00:04,441 --> 00:00:05,677
because XML is insane
4
00:00:05,677 --> 00:00:07,674
and extremely dangerous, especially if you're
5
00:00:07,674 --> 00:00:09,613
running web services or similar.
6
00:00:09,613 --> 00:00:11,921
First up, Billion Laughs.
7
00:00:11,921 --> 00:00:16,588
Essentially, you can do text substitutions in XML,
8
00:00:16,588 --> 00:00:18,521
because obviously it can rewrite itself as you parse it.
9
00:00:18,521 --> 00:00:21,424
And you do them like this.
10
00:00:21,424 --> 00:00:23,026
So, you define a whole load of rules,
11
00:00:23,026 --> 00:00:26,196
and then at the bottom, &lol9 gets replaced by
12
00:00:26,196 --> 00:00:28,175
10 &lol8s, which each then get replaced by
13
00:00:28,175 --> 00:00:30,896
10 &lol7s, and eventually that gives you one billion
14
00:00:30,896 --> 00:00:35,772
lols. One byte for each character, 3 bytes for a lol, gives you
15
00:00:35,772 --> 00:00:39,776
3GB of string. Parsing that will take a long time and will
16
00:00:39,776 --> 00:00:44,144
probably break things when you write it anywhere.
17
00:00:44,144 --> 00:00:47,350
On top of that, you can also substitute
18
00:00:47,350 --> 00:00:52,591
things for other resources, such as files.
19
00:00:52,591 --> 00:00:56,178
If you do this you'll read from the random bit on the disk,
20
00:00:56,178 --> 00:00:58,728
and it will keep giving you data, forever.
21
00:00:58,728 --> 00:01:00,469
You will never parse this.
22
00:01:00,469 --> 00:01:03,838
In the meantime it'll also fill all of your memory.
23
00:01:03,838 --> 00:01:06,870
It's not very good.
24
00:01:06,870 --> 00:01:09,778
In addition, imagine you've got some web service where
25
00:01:09,778 --> 00:01:11,757
you take some XML and maybe you look at it
26
00:01:11,757 --> 00:01:14,711
for a bit, and you give it back, in an error page or similar.
27
00:01:14,711 --> 00:01:16,746
If you do this, you give them your passwords, or
28
00:01:16,746 --> 00:01:19,763
your private keys, or anything else you can possibly get at.
29
00:01:19,763 --> 00:01:23,453
And also, even more insane, you can read from the Internet.
30
00:01:23,453 --> 00:01:28,441
So, this could now read anything, so you can do things like
31
00:01:28,441 --> 00:01:30,960
the previous attack, and you can give them the contents
32
00:01:30,960 --> 00:01:33,302
of your private intranet site, or they can provide you a
33
00:01:33,302 --> 00:01:35,570
website and they can just block you, you can just wait for
34
00:01:35,570 --> 00:01:38,271
a while, or they can give you whatever they want to give
35
00:01:38,271 --> 00:01:40,810
you, and you will just sit and take it.
36
00:01:40,810 --> 00:01:43,473
These look relatively easy to block, but also
37
00:01:43,473 --> 00:01:46,843
there's XInclude, which does the same thing, but again.
38
00:01:46,843 --> 00:01:50,046
And then, you can mix them together.
39
00:01:50,046 --> 00:01:52,682
This will connect to that website one billion times.
40
00:01:52,682 --> 00:01:57,621
If you send this to a webserver, it will attack somebody else
41
00:01:57,621 --> 00:01:58,592
a lot.
42
00:01:58,592 --> 00:02:00,762
If you send this multiple times, all the threads on your
43
00:02:00,762 --> 00:02:03,793
little multithreaded webserver will all go and attack
44
00:02:03,793 --> 00:02:06,060
them, and it'll load balance it and everything.
45
00:02:06,060 --> 00:02:07,731
This isn't theoretical!
46
00:02:07,731 --> 00:02:10,200
Last week, had this, this works.
47
00:02:10,200 --> 00:02:14,904
If you point it to itself, it takes the entire server down.
48
00:02:14,904 --> 00:02:17,407
It's absolutely mad.
49
00:02:17,407 --> 00:02:20,710
As well as that, you can do XML injection,
50
00:02:20,710 --> 00:02:22,812
which is like SQL injection, but "enterprisey".
51
00:02:22,812 --> 00:02:27,183
If you expect to send this off your order-processing
52
00:02:27,183 --> 00:02:29,383
website somewhere, that's then going to build
53
00:02:29,383 --> 00:02:31,516
them and sell on this product and so on.
54
00:02:31,516 --> 00:02:33,340
And you're going to take some user input and you're going
55
00:02:33,340 --> 00:02:35,097
to put it into here, and you're going to sell them what
56
00:02:35,097 --> 00:02:37,994
they want. You take some user input, but oh no, they've
57
00:02:37,994 --> 00:02:41,430
given you XML, and they've written a new price in there and
58
00:02:41,430 --> 00:02:43,092
now everything's free.
59
00:02:43,092 --> 00:02:48,261
Lots of XML parsers are quite naïve and foolish, and
60
00:02:48,261 --> 00:02:51,614
will give you the last price, if you ask for a single element,
61
00:02:51,614 --> 00:02:54,043
and it'll be zero. But you've got a clever XML parser,
62
00:02:54,043 --> 00:02:55,478
so it's fine.
63
00:02:55,478 --> 00:02:59,346
But they've commented it out now! There's only one
64
00:02:59,346 --> 00:03:01,758
interpretation of this, and that is that everything is free.
65
00:03:01,758 --> 00:03:04,619
If you take XML that goes in like this and you don't
66
00:03:04,619 --> 00:03:10,132
sanitize it, everything goes horribly, horribly wrong.
67
00:03:10,132 --> 00:03:12,013
So what do you do?
68
00:03:12,013 --> 00:03:14,348
Essentially, this.
69
00:03:14,348 --> 00:03:17,809
If you're building XML yourself, you sanitize your inputs.
70
00:03:17,809 --> 00:03:20,570
You make sure you're not putting in special characters
71
00:03:20,570 --> 00:03:22,644
and then there's a different set of special characters
72
00:03:22,644 --> 00:03:25,308
if you're putting in attributes and so on and so on.
73
00:03:25,308 --> 00:03:27,544
Or sensibly, you build XML with some kind of framework,
74
00:03:27,544 --> 00:03:30,487
which will do all of this for you, but you should test it,
75
00:03:30,487 --> 00:03:31,680
as well.
76
00:03:31,680 --> 00:03:34,852
Secondly, you disable all the bits of XML which are
77
00:03:34,852 --> 00:03:37,887
absolutely mad, which is lots of them, and are enabled
78
00:03:37,887 --> 00:03:41,587
by default in lots of things. Again, this needs testing.
79
00:03:41,587 --> 00:03:45,064
Then, when you get exceptions in your XML, you don't
80
00:03:45,080 --> 00:03:48,372
show them to the user! If you're trying to do XML injection
81
00:03:48,388 --> 00:03:51,018
and the thing tells you you're missing this element,
82
00:03:51,018 --> 00:03:52,186
you add the element,
83
00:03:52,186 --> 00:03:54,269
and it's easy and it just works.
84
00:03:54,269 --> 00:03:57,763
You don't want to let people do that, so that covers that.
85
00:03:57,763 --> 00:03:59,765
But then if you do need to use any of the special features,
86
00:03:59,765 --> 00:04:01,567
you lock it down.
87
00:04:01,567 --> 00:04:04,603
So you watch it and you don't let it parse for half an hour,
88
00:04:04,603 --> 00:04:07,112
or you don't let it use 3GB of memory, and if you really
89
00:04:07,112 --> 00:04:10,008
need to pull in resources from somewhere else,
90
00:04:10,008 --> 00:04:13,545
you only let in certain ones, not all of the Internet.
91
00:04:13,545 --> 00:04:17,745
In Java, this is relatively easy: you just set a load of things
92
00:04:17,745 --> 00:04:20,616
to false. I'll send this 'round for reference,
93
00:04:20,616 --> 00:04:22,421
but it's not terribly complicated.
94
00:04:22,421 --> 00:04:24,421
In .NET it's even easier and you can actually set
95
00:04:24,421 --> 00:04:26,592
the XmlResolver bit, which is the bit that looks up external
96
00:04:26,592 --> 00:04:29,062
resources, to null, so it can't get any, and it's
97
00:04:29,062 --> 00:04:31,096
just better.
98
00:04:31,096 --> 00:04:33,765
And that's basically it.
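
For reference, here is a minimal sketch of what "setting a load of things to false" might look like in Java. It is not the speaker's slide. The class name SafeXml and its helper methods are hypothetical, and the feature URIs are the ones recognised by the Xerces-based parser bundled with the JDK, so a different parser may need different settings. As the talk says, test it against your own setup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration; not taken from the talk.
class SafeXml {

    // Builds a DOM parser with the risky XML features switched off.
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Ask the parser to apply its built-in processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Refuse any DOCTYPE, which rules out entity tricks like Billion Laughs.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Belt and braces: no external entities or external DTDs (files, URLs).
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        // No XInclude, and no expansion of entity references.
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true if the hardened parser refuses the document.
    static boolean isRejected(String xml) throws ParserConfigurationException, IOException {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            newHardenedBuilder().parse(new ByteArrayInputStream(bytes));
            return false;
        } catch (SAXException e) {
            // Do not show this exception to the user; log it privately instead.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // A tiny, harmless entity definition stands in for a real attack payload.
        String entityDoc = "<!DOCTYPE r [<!ENTITY lol \"lol\">]><r>&lol;</r>";
        System.out.println("entity document rejected: " + isRejected(entityDoc));
        System.out.println("plain document rejected: " + isRejected("<r>ok</r>"));
    }
}
```

Refusing DOCTYPE declarations outright is the bluntest option. If you genuinely need DTDs, leave that feature off and rely on the external-entity settings plus your own time and memory limits.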