SRT file for https://gist.github.com/pimterry/4971500 which is a transcript of http://www.youtube.com/watch?v=RzjUw47ZIg0 Tim Perry owns all the rights to this derivative work.

Your XML Parser Will Destroy Everything You Have Ever Loved.srt
1
00:00:00,076 --> 00:00:02,593
This talk is on XML attacks,
 
2
00:00:02,593 --> 00:00:04,441
which are very easy to become vulnerable to,
 
3
00:00:04,441 --> 00:00:05,677
because XML is insane
 
4
00:00:05,677 --> 00:00:07,674
and extremely dangerous, especially if you're
 
5
00:00:07,674 --> 00:00:09,613
running web services or similar.
 
6
00:00:09,613 --> 00:00:11,921
First up, Billion Laughs.
 
7
00:00:11,921 --> 00:00:16,588
Essentially, you can do text substitutions in XML,
 
8
00:00:16,588 --> 00:00:18,521
because obviously it can rewrite itself as you parse it.
 
9
00:00:18,521 --> 00:00:21,424
And you do them like this.
 
10
00:00:21,424 --> 00:00:23,026
So, you define a whole load of rules,
 
11
00:00:23,026 --> 00:00:26,196
and then at the bottom, &lol9 gets replaced by
 
12
00:00:26,196 --> 00:00:28,175
10 &lol8s, which each then get replaced by
 
13
00:00:28,175 --> 00:00:30,896
10 &lol7s, and eventually that gives you one billion
 
14
00:00:30,896 --> 00:00:35,772
lols. One byte for each character, 3 bytes for a lol, gives you
 
15
00:00:35,772 --> 00:00:39,776
3GB of string. Parsing that will take a long time and will
 
16
00:00:39,776 --> 00:00:44,144
probably break things when you write it anywhere.
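
[Editor's sketch] The arithmetic above can be checked without ever parsing the payload. The snippet below is a minimal illustration, assuming the usual layout of the attack (a base entity `lol`, then `lol1` to `lol9`, each made of ten references to the previous one); the function names and the `lolz` root element are made up for this example and do not come from the talk's slides.

```python
def billion_laughs(levels: int = 9, fanout: int = 10) -> str:
    """Build a Billion Laughs style document as text (never parse this)."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", '  <!ENTITY lol "lol">']
    previous = "lol"
    for level in range(1, levels + 1):
        name = f"lol{level}"
        # Each entity is just `fanout` references to the one before it.
        body = f"&{previous};" * fanout
        lines.append(f'  <!ENTITY {name} "{body}">')
        previous = name
    lines.append("]>")
    lines.append(f"<lolz>&{previous};</lolz>")
    return "\n".join(lines)


def expanded_size(levels: int = 9, fanout: int = 10, unit: str = "lol") -> int:
    """Bytes produced once every entity reference has been expanded."""
    return len(unit.encode("utf-8")) * fanout ** levels


payload = billion_laughs()
print(len(payload.encode("utf-8")))  # size on the wire: well under a kilobyte
print(expanded_size())               # 3000000000, i.e. about 3 GB
```

The point of the comparison is the ratio: a request of a few hundred bytes asks the parser to materialise roughly three gigabytes of text.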
 
17
00:00:44,144 --> 00:00:47,350
On top of that, you can also substitute
 
18
00:00:47,350 --> 00:00:52,591
things for other resources, such as files.
 
19
00:00:52,591 --> 00:00:56,178
If you do this you'll read from the random bit on the disk,
 
20
00:00:56,178 --> 00:00:58,728
and it will keep giving you data, forever.
 
21
00:00:58,728 --> 00:01:00,469
You will never parse this.
 
22
00:01:00,469 --> 00:01:03,838
In the meantime it'll also fill all of your memory.
 
23
00:01:03,838 --> 00:01:06,870
It's not very good.
 
24
00:01:06,870 --> 00:01:09,778
In addition, imagine you've got some web service where
 
25
00:01:09,778 --> 00:01:11,757
you take some XML and maybe you look at it
 
26
00:01:11,757 --> 00:01:14,711
for a bit, and you give it back, in an error page or similar.
 
27
00:01:14,711 --> 00:01:16,746
If you do this, you give them your passwords, or
 
28
00:01:16,746 --> 00:01:19,763
your private keys, or anything else you can possibly get at.
 
29
00:01:19,763 --> 00:01:23,453
And also, even more insane, you can read from the Internet.
 
30
00:01:23,453 --> 00:01:28,441
So, this could now read anything, so you can do things like
 
31
00:01:28,441 --> 00:01:30,960
the previous attack, and you can give them the contents
 
32
00:01:30,960 --> 00:01:33,302
of your private intranet site, or they can provide you a
 
33
00:01:33,302 --> 00:01:35,570
website and they can just block you, you can just wait for
 
34
00:01:35,570 --> 00:01:38,271
a while, or they can give you whatever they want to give
 
35
00:01:38,271 --> 00:01:40,810
you, and you will just sit and take it.
 
36
00:01:40,810 --> 00:01:43,473
These look relatively easy to block, but also
 
37
00:01:43,473 --> 00:01:46,843
there's XInclude, which does the same thing, but again.
 
38
00:01:46,843 --> 00:01:50,046
And then, you can mix them together.
 
39
00:01:50,046 --> 00:01:52,682
This will connect to that website one billion times.
 
40
00:01:52,682 --> 00:01:57,621
If you send this to a webserver, it will attack somebody else
 
41
00:01:57,621 --> 00:01:58,592
a lot.
 
42
00:01:58,592 --> 00:02:00,762
If you send this multiple times, all the threads on your
 
43
00:02:00,762 --> 00:02:03,793
little multithreaded webserver will all go and attack
 
44
00:02:03,793 --> 00:02:06,060
them, and it'll load balance it and everything.
 
45
00:02:06,060 --> 00:02:07,731
This isn't theoretical!
 
46
00:02:07,731 --> 00:02:10,200
Last week, had this, this works.
 
47
00:02:10,200 --> 00:02:14,904
If you point it to itself, it takes the entire server down.
 
48
00:02:14,904 --> 00:02:17,407
It's absolutely mad.
 
49
00:02:17,407 --> 00:02:20,710
As well as that, you can do XML injection,
 
50
00:02:20,710 --> 00:02:22,812
which is like SQL injection, but "enterprisey".
 
51
00:02:22,812 --> 00:02:27,183
If you expect to send this off your order-processing
 
52
00:02:27,183 --> 00:02:29,383
website somewhere, that's then going to build
 
53
00:02:29,383 --> 00:02:31,516
them and sell on this product and so on.
 
54
00:02:31,516 --> 00:02:33,340
And you're going to take some user input and you're going
 
55
00:02:33,340 --> 00:02:35,097
to put it into here, and you're going to sell them what
 
56
00:02:35,097 --> 00:02:37,994
they want. You take some user input, but oh no, they've
 
57
00:02:37,994 --> 00:02:41,430
given you XML, and they've written a new price in there and
 
58
00:02:41,430 --> 00:02:43,092
now everything's free.
 
59
00:02:43,092 --> 00:02:48,261
Lots of XML parsers are quite naïve and foolish, and
 
60
00:02:48,261 --> 00:02:51,614
will give you the last price, if you ask for a single element,
 
61
00:02:51,614 --> 00:02:54,043
and it'll be zero. But you've got a clever XML parser,
 
62
00:02:54,043 --> 00:02:55,478
so it's fine.
 
63
00:02:55,478 --> 00:02:59,346
But they've commented it now! There's only one
 
64
00:02:59,346 --> 00:03:01,758
interpretation of this, and this is that everything is free.
 
65
00:03:01,758 --> 00:03:04,619
If you take XML that goes in like this and you don't
 
66
00:03:04,619 --> 00:03:10,132
sanitize it, everything goes horribly horribly wrong.
 
67
00:03:10,132 --> 00:03:12,013
So what do you do?
 
68
00:03:12,013 --> 00:03:14,348
Essentially, this.
 
69
00:03:14,348 --> 00:03:17,809
If you're building XML yourself, you sanitize your inputs.
 
70
00:03:17,809 --> 00:03:20,570
You make sure you're not putting in special characters
 
71
00:03:20,570 --> 00:03:22,644
and then there's a different set of special characters
 
72
00:03:22,644 --> 00:03:25,308
if you're putting in attributes and so on and so on.
 
73
00:03:25,308 --> 00:03:27,544
Or sensibly, you build XML with some kind of framework,
 
74
00:03:27,544 --> 00:03:30,487
which will do all of this for you, but you should test it,
 
75
00:03:30,487 --> 00:03:31,680
as well.
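
[Editor's sketch] A small Python illustration of the difference between pasting user input into markup and letting a library build it. The `order`, `item` and `price` element names are assumptions standing in for the order document on the slides, which the transcript does not reproduce.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_order_naive(item: str, price: str) -> str:
    # Vulnerable: user input is pasted straight into the markup.
    return f"<order><price>{price}</price><item>{item}</item></order>"


def build_order_escaped(item: str, price: str) -> str:
    # Manual route: escape &, < and > before using input as element text.
    return f"<order><price>{price}</price><item>{escape(item)}</item></order>"


def build_order(item: str, price: str) -> str:
    # Framework route: ElementTree escapes text when it serialises the tree.
    order = ET.Element("order")
    ET.SubElement(order, "price").text = price
    ET.SubElement(order, "item").text = item
    return ET.tostring(order, encoding="unicode")


# Hypothetical attacker input that closes <item> and adds a second <price>.
malicious = "widget</item><price>0</price><item>widget"

naive = ET.fromstring(build_order_naive(malicious, "9.99"))
print([p.text for p in naive.findall("price")])  # two prices, the last one is 0

safe = ET.fromstring(build_order(malicious, "9.99"))
print([p.text for p in safe.findall("price")])   # only the real price
```

Attribute values need a different escaping routine (`xml.sax.saxutils.quoteattr` in the standard library), which is one more reason to let the framework do it and then test the result.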
 
76
00:03:31,680 --> 00:03:34,852
Secondly, you disable all the bits of XML which are
 
77
00:03:34,852 --> 00:03:37,887
absolutely mad, which is lots of them, and are enabled
 
78
00:03:37,887 --> 00:03:41,587
by default in lots of things. Again, needs testing.
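
[Editor's sketch] One way to switch off the dangerous parts wholesale is to refuse any document that carries a DOCTYPE, since entity definitions live there. The sketch below uses Python's standard `xml.parsers.expat` module; the `DoctypeForbidden` exception and the `check_no_doctype` helper are hypothetical names for this example, and this is only one possible approach, not the configuration shown in the talk.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document carries a DTD."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it sees the start of a DOCTYPE declaration.
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def check_no_doctype(data: bytes) -> None:
    """Scan a document and raise DoctypeForbidden if it declares a DTD."""
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = _reject_doctype
    # The exception raised in the handler propagates out of Parse().
    parser.Parse(data, True)


check_no_doctype(b"<order><price>1</price></order>")  # passes silently

try:
    check_no_doctype(b'<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>')
except DoctypeForbidden as exc:
    print("rejected:", exc)
```

Documents that are simply malformed still raise `xml.parsers.expat.ExpatError`, so callers need to handle both cases. As the talk says, the setting needs a test: feed it a small entity-bearing document and check that it is refused.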
 
79
00:03:41,587 --> 00:03:45,064
Then, when you get exceptions in your XML, you don't
 
80
00:03:45,080 --> 00:03:48,372
show them to the user! If you're trying to do XML injection
 
81
00:03:48,388 --> 00:03:51,018
and the thing tells you you're missing this element,
 
82
00:03:51,018 --> 00:03:52,186
you add the element,
 
83
00:03:52,186 --> 00:03:54,269
and it's easy and it just works.
 
84
00:03:54,269 --> 00:03:57,763
You don't want to let people do that, so that covers that.
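
[Editor's sketch] The advice about exceptions might look like this in Python: log the detail on the server, and send the client one fixed message whatever went wrong. The `handle_order` function, the `orders` logger name and the message text are invented for illustration, and the sketch assumes the input has already been vetted for DTDs before it reaches the parser.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("orders")

GENERIC_ERROR = "Sorry, we could not process that request."


def handle_order(body: str) -> str:
    """Parse an order and return a response that is safe to show the client."""
    try:
        order = ET.fromstring(body)
        price = order.findtext("price")
        if price is None:
            raise ValueError("missing <price> element")
        return f"Order accepted at {price}"
    except (ET.ParseError, ValueError):
        # Full detail goes to the server log only, never into the response.
        log.exception("rejected order document")
        return GENERIC_ERROR


print(handle_order("<order><price>5</price></order>"))
print(handle_order("<order></order>"))  # same generic message as any other failure
```

Because a missing element and a syntax error produce the same response, the reply tells an attacker nothing about which element to add next.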
 
85
00:03:57,763 --> 00:03:59,765
But then if you do need to use any of the special features,
 
86
00:03:59,765 --> 00:04:01,567
you lock it down.
 
87
00:04:01,567 --> 00:04:04,603
So you watch it and you don't let it parse for half an hour,
 
88
00:04:04,603 --> 00:04:07,112
or you don't let it use 3GB of memory, and if you really
 
89
00:04:07,112 --> 00:04:10,008
need to pull in resources from somewhere else,
 
90
00:04:10,008 --> 00:04:13,545
you only let in certain ones, not all of the Internet.
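
[Editor's sketch] Two of the lock-down checks described above, written as plain Python predicates. The size cap, the `schemas.example.com` host and the function names are all placeholder assumptions; real limits depend on the service.

```python
from urllib.parse import urlsplit

MAX_DOCUMENT_BYTES = 1_000_000           # hypothetical cap on incoming documents
ALLOWED_HOSTS = {"schemas.example.com"}  # hypothetical allowlist of resource hosts


def within_size_limit(data: bytes, limit: int = MAX_DOCUMENT_BYTES) -> bool:
    """Reject oversized input before it ever reaches the parser."""
    return len(data) <= limit


def resource_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow external resources only over https and only from known hosts."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowed_hosts


print(resource_allowed("https://schemas.example.com/order.dtd"))  # True
print(resource_allowed("file:///etc/passwd"))                     # False
print(resource_allowed("https://evil.example.net/x"))             # False
```

A cap on the incoming bytes does not bound what the document expands to, so it complements rather than replaces limits on parse time and memory, which usually have to be enforced around the parser (for example by running it in a worker with a timeout).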
 
91
00:04:13,545 --> 00:04:17,745
In Java, this is relatively easy, you just set a load of things
 
92
00:04:17,745 --> 00:04:20,616
to false. I'll send this 'round for reference,
 
93
00:04:20,616 --> 00:04:22,421
but it's not terribly complicated.
 
94
00:04:22,421 --> 00:04:24,421
In .NET it's even easier and you can actually set
 
95
00:04:24,421 --> 00:04:26,592
the XmlResolver bit, which is the bit that looks up external
 
96
00:04:26,592 --> 00:04:29,062
resources, to null, so it can't get any, and it's
 
97
00:04:29,062 --> 00:04:31,096
just better.
 
98
00:04:31,096 --> 00:04:33,765
And that's basically it.
