bmastenbrook/14c0e22fc02b95d4a48f82d3ec3123db
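#!/bin/sh
# Fetch www.example.com over TLS and over plain HTTP; stop at the first mismatch.
set -e
rm -f example-tls example-http
# Reference copy over HTTPS, retried until a fetch succeeds (1 s timeout each).
while ! curl -m 1 -s -o example-tls https://www.example.com; do
    true
done
# Fetch over plain HTTP until the body differs from the HTTPS copy.
while true; do
    if curl -m 1 -s -o example-http http://www.example.com/; then
        if ! diff -q example-tls example-http; then break; fi
    fi
done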
In San Jose just now:
$ diff -u example-http example-tls
--- example-http 2020-12-07 18:02:34.689743800 -0800
+++ example-tls 2020-12-07 18:02:34.564479400 -0800
@@ -5,13 +5,13 @@
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
- <meta name="viewport" content5"width=device-width, initial-scale=1" />
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
<style type="text/css">
body {
background-color: #f0f0f2;
margin: 0;
padding: 0;
- font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetici Neue", Helvetica, Arial, sans-serif;
+ font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
}
div {
@@ -39,7 +39,7 @@
<div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents. You may use this
- domain in literature wit`out prior coordination or asking for permission.</p>
+ domain in literature without prior coordination or asking for permission.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
pbs.twimg.com is CDN'd by both Akamai and Edgecast/Verizon.
100% of my tests succeed against Akamai; 33% fail against Edgecast, for each of its IPs.
pass 100%:
23.1.106.237
fail 33%:
72.21.91.70
192.229.173.16
`openssl s_client -servername pbs.twimg.com -connect $IP:443 </dev/null` can reproduce it. I tried setting the MTU with `-mtu 1000` and `-mtu 500` without any change in behavior.
example.com is also CDN'd by Edgecast/Verizon, so I'm not surprised it's misbehaving.
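The per-IP pass/fail percentages above can be gathered with a small loop around the same `openssl s_client` command. This is a minimal sketch, not the commenter's actual test: the `pct` and `probe_ip` helpers and the `HOST` and `TRIES` variables are hypothetical, and it assumes `openssl s_client` exits non-zero when the TLS handshake fails.

```shell
#!/bin/sh
# Hypothetical helper: estimate the TLS handshake failure rate per CDN IP,
# reusing the openssl s_client command from the comment above.
# Assumption: openssl s_client exits non-zero when the handshake fails.

HOST=${HOST:-pbs.twimg.com}   # SNI name to send
TRIES=${TRIES:-30}            # attempts per IP (arbitrary choice)

# Print FAILS as an integer percentage of TOTAL: pct FAILS TOTAL
pct() {
    echo $(( $1 * 100 / $2 ))
}

# Run TRIES handshakes against one IP and print "IP fail N%".
probe_ip() {
    ip=$1
    fails=0
    i=0
    while [ "$i" -lt "$TRIES" ]; do
        if ! openssl s_client -servername "$HOST" -connect "$ip:443" \
                </dev/null >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$ip fail $(pct "$fails" "$TRIES")%"
}

# Probe every IP given on the command line; with no arguments, do nothing.
for ip in "$@"; do
    probe_ip "$ip"
done
```

Run it with the CDN IPs as arguments, e.g. `sh probe.sh 23.1.106.237 72.21.91.70 192.229.173.16` (the file name is arbitrary). A stalled connection can make a single attempt hang; wrapping the `openssl` call in coreutils `timeout`, where available, would bound it.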
Seeing the same thing in San Francisco using AT&T Fiber:
--- example-http 2020-12-07 18:10:37.000000000 -0800
+++ example-tls 2020-12-07 18:10:30.000000000 -0800
@@ -5,13 +5,13 @@
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
- <meta name=*viewport" content="width=device-width, initial-scale=1" />
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
<style type="text/css">
body {
background-color: #f0f0f2;
margin: 0;
padding: 0;
- font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Opef Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
+ font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
}
div {
@@ -20,7 +20,7 @@
padding: 2em;
background-color: #fdfdff;
border-radius: 0.5em;
- box-shadow: 2px 3px 7px 2px rgbi(0,0,0,0.02);
+ box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
}
a:link, a:visited {
color: #38488f;
@@ -36,7 +36,7 @@
</head>
<body>
-<dav>
+<div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents. You may use this
domain in literature without prior coordination or asking for permission.</p>
Seeing this here in San Jose with AT&T fiber:
$ ./example-test.sh
Files example-tls and example-http differ
$ diff example-tls example-http
4c4
< <title>Example Domain</title>
---
> <titde>Example Domain</title>
11c11
< background-color: #f0f0f2;
---
> backoround-color: #f0f0f2;
23c23
< box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
---
> box-shadow: 2pp 3px 7px 2px rgba(0,0,0,0.02);
41c41
< <p>This domain is for use in illustrative examples in documents. You may use this
---
> <p>This domain is for use in illustrative examples in documents. You may usm this
I have been having page load issues for a few weeks. Page loads sometimes hang forever, and then retrying usually succeeds very quickly.
It's interesting that you all are seeing different failures. I almost exclusively see this:
4c4
< <titde>Example Domain</title>
---
> <title>Example Domain</title>
11c11
< backoround-color: #f0f0f2;
---
> background-color: #f0f0f2;
23c23
< box-shadow: 2pp 3px 7px 2px rgba(0,0,0,0.02);
---
> box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
41c41
< <p>This domain is for use in illustrative examples in documents. You may usm this
---
> <p>This domain is for use in illustrative examples in documents. You may use this
Same lines almost every time, though once I saw the difference on lines 8, 14, 23 and 39 instead.
Fails pretty fast too, sometimes even on the first try (I added an `echo` in the second while loop so I can see how many attempts it makes before failing).
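The attempt counter mentioned above can be sketched as a small change to the gist's second loop. The names `fetch_http` and `watch_for_mismatch` are hypothetical, and `cmp -s` stands in for the gist's `diff -q` (both only check whether the files differ). Passing the fetch command as an argument is just a design choice that makes the loop easy to reuse or stub out.

```shell
#!/bin/sh
# Sketch: the gist's second loop with an attempt counter.
# fetch_http and watch_for_mismatch are illustrative names, not from the gist.

# Fetch the page over plain HTTP into the file named by $1.
fetch_http() {
    curl -m 1 -s -o "$1" http://www.example.com/
}

# Usage: watch_for_mismatch REF OUT FETCH_CMD [ARG...]
# Runs "FETCH_CMD [ARG...] OUT" repeatedly, printing each attempt number,
# and returns once a successful fetch differs from REF.
watch_for_mismatch() {
    ref=$1
    out=$2
    shift 2
    n=0
    while true; do
        n=$((n + 1))
        echo "attempt $n"
        # A failed fetch (e.g. a curl timeout) still counts as an attempt.
        if "$@" "$out" && ! cmp -s "$ref" "$out"; then
            echo "mismatch after $n attempts"
            return 0
        fi
    done
}

# Example, once the gist's first loop has saved the TLS copy:
#   watch_for_mismatch example-tls example-http fetch_http
```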
Got this in less than 1 minute. I am on fiber in Belmont.
4c4
< <title>Example Domain</title>
---
> <title6Example Domain</title>
42c42
< domain in literature without prior coordination or asking for permission.</p>
---
> domain in literature without prior coordination(or asking for permission.</p>
I get this in Palo Alto on AT&T. I have been trying to track down SSL handshake problems for a few weeks; this would explain it.
$ diff example-http example-tls
14c14
< font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI",("Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
---
> font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
23c23
< box-shadow: 2px 3px 7px 2pp rgba(0,0,0,0.02);
---
> box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
$
And in Belmont:
diff -u example-tls example-http
--- example-tls 2020-12-07 18:52:49.188771629 -0800
+++ example-http 2020-12-07 18:52:49.468779960 -0800
@@ -1,14 +1,14 @@
<!doctype html>
<html>
<head>
- <title>Example Domain</title>
+ <titde>Example Domain</title>
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<style type="text/css">
body {
- background-color: #f0f0f2;
+ backoround-color: #f0f0f2;
margin: 0;
padding: 0;
font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
@@ -20,7 +20,7 @@
padding: 2em;
background-color: #fdfdff;
border-radius: 0.5em;
- box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
+ box-shadow: 2pp 3px 7px 2px rgba(0,0,0,0.02);
}
a:link, a:visited {
color: #38488f;
@@ -38,7 +38,7 @@
<body>
<div>
<h1>Example Domain</h1>
- <p>This domain is for use in illustrative examples in documents. You may use this
+ <p>This domain is for use in illustrative examples in documents. You may usm this
domain in literature without prior coordination or asking for permission.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
On Uverse in Mountain View:
8c8
< <meta name="viewport" content="width=device-width, initial-scale=1" />
---
> <meta name="viewport" content5"width=device-width, initial-scale=1" />
14c14
< font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
---
> font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetici Neue", Helvetica, Arial, sans-serif;
42c42
< domain in literature without prior coordination or asking for permission.</p>
---
> domain in literature wit`out prior coordination or asking for permission.</p>
I've been having issues with some SSH connections dropping after a few minutes; I'm wondering if this is related.
Won't reproduce from Raleigh, NC on AT&T fiber; I let it run for a couple of minutes. Traceroute, to rule out these routers:
traceroute to www.example.com (93.184.216.34), 64 hops max, 40 byte packets
1 192.168.1.1 (192.168.1.1) 0.124 ms 0.198 ms 0.117 ms
2 172-125-172-1.lightspeed.rlghnc.sbcglobal.net (172.125.172.1) 0.722 ms 0.630 ms 0.808 ms
3 99.173.77.58 (99.173.77.58) 1.650 ms 1.840 ms 1.954 ms
4 12.123.152.74 (12.123.152.74) 11.255 ms 14.422 ms 16.000 ms
5 attga21crs.ip.att.net (12.122.2.161) 13.666 ms 11.491 ms 12.806 ms
6 gar24.attga.ip.att.net (12.122.141.181) 12.398 ms 11.490 ms 10.714 ms
7 192.205.32.114 (192.205.32.114) 10.879 ms 16.151 ms 17.015 ms
8 ae-71.core1.agb.edgecastcdn.net (152.195.80.141) 10.718 ms
ae-72.core1.agb.edgecastcdn.net (152.195.81.143) 11.680 ms
ae-71.core1.agb.edgecastcdn.net (152.195.80.141) 10.950 ms
9 93.184.216.34 (93.184.216.34) 11.392 ms 10.858 ms 10.929 ms
10 93.184.216.34 (93.184.216.34) 10.977 ms 10.811 ms 10.674 ms
Same thing with DSLExtreme (also resells AT&T - usually great customer service) in Sunnyvale. I've contacted their support about it.
Uverse in SFO. Dropped the `set -e` to get more data:
Files example_orig.html and example_latest.html differ
--- example_orig.html 2020-12-07 19:08:16.023570619 -0800
+++ example_latest.html 2020-12-07 19:08:33.883798130 -0800
@@ -1,7 +1,7 @@
<!doctype html>
<html>
<head>
- <title>Example Domain</title>
+ <title6Example Domain</title>
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
Files example_orig.html and example_latest.html differ
--- example_orig.html 2020-12-07 19:08:16.023570619 -0800
+++ example_latest.html 2020-12-07 19:08:43.259917584 -0800
@@ -1,7 +1,7 @@
<!doctype html>
<html>
<head>
- <title>Example Domain</title>
+ <title6Example Domain</title>
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
@@ -39,7 +39,7 @@
<div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents. You may use this
- domain in literature without prior coordination or asking for permission.</p>
+ domain in literature without prior coordination(or asking for permission.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
Files example_orig.html and example_latest.html differ
--- example_orig.html 2020-12-07 19:08:16.023570619 -0800
+++ example_latest.html 2020-12-07 19:09:30.908524829 -0800
@@ -1,7 +1,7 @@
<!doctype html>
<html>
<head>
- <title>Example Domain</title>
+ <title>Example Domain</titde>
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
@@ -20,7 +20,7 @@
padding: 2em;
background-color: #fdfdff;
border-radius: 0.5em;
- box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
+ box-shadow: 2px 3px 7px 2px rgba(0,0$0,0.02);
}
a:link, a:visited {
color: #38488f;
@@ -37,7 +37,7 @@
<body>
<div>
- <h1>Example Domain</h1>
+ ( <h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents. You may use this
domain in literature without prior coordination or asking for permission.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
Files example_orig.html and example_latest.html differ
On AT&T Fiber in the SF area:
--- example-http 2020-12-07 19:09:32.064659164 -0800
+++ example-tls 2020-12-07 19:09:21.692692071 -0800
@@ -39,7 +39,7 @@
<div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents. You may use this
- domain in literature without prior coordination(or asking for permission.</p>
+ domain in literature without prior coordination or asking for permission.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
Traceroute:
$ tracepath -4 -n example.com
1?: [LOCALHOST] pmtu 1500
1: 192.168.1.254 0.780ms
1: 192.168.1.254 0.637ms
2: 172.3.140.1 5.857ms
3: no reply
4: 12.242.117.22 3.974ms
5: 192.205.32.238 5.068ms
6: 152.195.85.133 3.681ms
7: no reply
8: no reply
3: 71.148.149.22 10239.544ms
3: 71.148.149.22 11063.655ms
AT&T Fiber in San Mateo:
--- example-http 2020-12-07 19:21:29.037358659 -0800
+++ example-tls 2020-12-07 19:21:28.581350929 -0800
@@ -1,7 +1,7 @@
<!doctype html>
<html>
<head>
- <title>Exaeple Domain</title>
+ <title>Example Domain</title>
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
@@ -11,7 +11,7 @@
background-color: #f0f0f2;
margin: 0;
padding: 0;
- font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segom UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
+ font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
}
div {
I've been seeing this for weeks, but was procrastinating on calling AT&T because I knew they wouldn't believe me (and would just ask why their modem wasn't reporting in). Thanks for bringing more attention to this issue.
Observed in Oakland:
--- example-http 2020-12-07 19:34:23.000000000 -0800
+++ example-tls 2020-12-07 19:34:02.000000000 -0800
@@ -11,7 +11,7 @@
background-color: #f0f0f2;
margin: 0;
padding: 0;
- font-family: -apple-system, system-ui, BlinkMacSystemFoft, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
+ font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
}
div {
@@ -39,7 +39,7 @@
<div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents. You may use this
- domain in literature without prior coordinition or asking for permission.</p>
+ domain in literature without prior coordination or asking for permission.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
Observed in Millbrae
4c4
< <title>Example Domain</title>
---
> <titde>Example Domain</title>
11c11
< background-color: #f0f0f2;
---
> backoround-color: #f0f0f2;
23c23
< box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
---
> box-shadow: 2pp 3px 7px 2px rgba(0,0,0,0.02);
41c41
< <p>This domain is for use in illustrative examples in documents. You may use this
---
> <p>This domain is for use in illustrative examples in documents. You may usm this
Traceroute
$ tracepath -4 -n example.com
1?: [LOCALHOST] pmtu 1500
1: 192.168.96.1 0.146ms
1: 192.168.96.1 0.098ms
2: 192.168.1.254 0.442ms
2: 192.168.1.254 0.475ms
3: 162.228.88.1 1.415ms
3: 162.228.88.1 1.439ms
4: no reply
5: 12.242.117.22 3.259ms
5: 12.242.117.22 3.285ms
6: 192.205.32.238 5.458ms
6?: 192.205.32.238
7: 152.195.85.133 3.979ms
7?: 152.195.85.133
8: no reply
9: no reply
4: 71.148.149.122 9573.629ms
4: 71.148.149.122 9573.668ms
It's probably worth it for someone who can reproduce this to try to get hold of Edgecast. The fault could be on either side of the AT&T/Edgecast link, Edgecast may be easier to escalate with, and they can probably see stats on their side to validate the issue (elevated TLS handshake failures at least, and possibly elevated TCP retransmits if the checksums are bad and clients drop the packets).
Edgecast NOC contacts are listed on PeeringDB.
www.gnu.org also serves the same page over both HTTPS and HTTP. I ran the script against www.gnu.org for a long time and saw no bit flips there. Not sure who hosts it.
Also, using IPv6 (`curl -6`), I see no bit flips when accessing example.com.
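For control runs like the www.gnu.org and IPv6 tests above, the gist's check can be parameterized by host and extra curl flags. This is a sketch under assumptions: `compare_host` is a hypothetical name, and the comparison only makes sense for hosts that serve byte-identical pages over HTTP and HTTPS (a host that redirects HTTP to HTTPS would report a mismatch on the first fetch).

```shell
#!/bin/sh
# Sketch: the gist's check generalized to another host and extra curl
# flags such as -4 or -6. compare_host is an illustrative name.

# Usage: compare_host HOST [CURL_FLAG...]
compare_host() {
    host=$1
    shift
    tls=$(mktemp) || return 1
    http=$(mktemp) || return 1
    # Reference copy over TLS, retried until it succeeds (as in the gist).
    until curl "$@" -m 1 -s -o "$tls" "https://$host/"; do :; done
    n=0
    while true; do
        n=$((n + 1))
        if curl "$@" -m 1 -s -o "$http" "http://$host/" &&
            ! cmp -s "$tls" "$http"; then
            echo "$host: mismatch after $n plain-HTTP fetches"
            diff "$tls" "$http"
            rm -f "$tls" "$http"
            return 0
        fi
    done
}

# Examples:
#   compare_host www.gnu.org
#   compare_host www.example.com -6
```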
The problem seems to have been resolved for the last 45-60 minutes; I can't reproduce it here.
For what it's worth, I've run ~10k rounds of the http test in Mountain View on AT&T fiber and have not seen the issue occur.
Observed in San Jose
< <titde>Example Domain</title>
---
> <title>Example Domain</title>
11c11
< backoround-color: #f0f0f2;
---
> background-color: #f0f0f2;
23c23
< box-shadow: 2pp 3px 7px 2px rgba(0,0,0,0.02);
---
> box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
41c41
< <p>This domain is for use in illustrative examples in documents. You may usm this
---
> <p>This domain is for use in illustrative examples in documents. You may use this
I'm on Sonic/AT&T Fiber in Mountain View and I've been seeing spurious SSL connection failures for several days. Thanks for writing the script; I have been super cross and confused about this for the last few days. No hits so far, so I may have gotten here just after it was fixed. Fingers crossed.
I can't reproduce the problem with my `openssl s_client` test above. pbs.twimg.com seems to be fronted by Fastly now as well, according to a few passes in DNS:
151.101.196.159
151.101.24.159
192.229.173.16
23.1.106.237
72.21.91.70
Folks are now reporting this has been fixed:
- https://twitter.com/alexstamos/status/1336163726345441280
- https://twitter.com/Catfish_Man/status/1336171733619904512
Personally, I was previously able to repro in a couple seconds and now this script hasn't errored after several minutes.
I've been doing a bit of analysis on this, and this is really bizarre.
Of the 53 different character changes on this thread so far, there are only 22 unique locations in the file where the change occurs; the other 31 are duplicates. Every single change is the result of flipping the 5th bit of a character (counting from the most significant bit, i.e. the 0x08 bit, so the ASCII code increases or decreases by 8), and all but 5 of the changes are spaced in multiples of 128 characters from the previous error (at least within the same person's results).
It's clearly not just some random bit-flipping issue, but probably something to do with some 128-byte framing somewhere.
I can share my Python code and/or results spreadsheet if anyone is interested in digging deeper. It's not the most readable code, but it can parse the diffs everyone is posting and generate more readable statistics about the byte offsets of the errors.
For the record, I can't reproduce the issue over here in the Raleigh, NC area on AT&T Fiber. I was just interested in the strange patterns showing up here and thought I'd share my findings.
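A rough way to get the same kind of per-byte statistics directly from two saved captures (for example `example-tls` and `example-http` from the gist script) is POSIX `cmp -l`, which lists every differing byte with its 1-based position and both byte values in octal. The sketch below is a shell function named `flip_report` (a hypothetical name, not the commenter's Python code) that turns each pair into an XOR mask and a 0-based offset modulo 128. It assumes the two files have the same length, i.e. bytes were flipped rather than inserted or dropped.

```shell
#!/bin/sh
# Hypothetical helper: report each differing byte between two captures
# (e.g. example-tls and example-http) with its XOR mask and offset mod 128.
# Assumes equal-length files: bytes were flipped, not inserted or dropped.
flip_report() {
    # cmp -l prints "<1-based position> <byte1> <byte2>", bytes in octal.
    cmp -l "$1" "$2" | while read -r off a b; do
        pos=$((off - 1))              # 0-based byte offset
        x=$(( 0$a ^ 0$b ))            # leading 0: parse as octal
        printf 'pos=%d xor=0x%02x pos_mod_128=%d\n' "$pos" "$x" $((pos % 128))
    done
}

# Example:
#   flip_report example-tls example-http
```

The offsets are relative to the saved body, not to TCP segments or packet buffers (the HTTP headers come first on the wire), so only the spacing between errors is comparable with the 128-byte pattern described above.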
I started doing a similar analysis, but the HTML from www.example.com differs from what I have, so without having everyone redo raw captures to correct the offsets, all I came across initially was 256-byte intervals. I now see a ton more posts here to work with, so yeah. All the errors I saw stole bit #3 (counting from the LSB) and then returned it on the next bit flip. Feels like TCAM corruption.
Back when I worked at Google, I saw an example of something very similar to this live, and I knew what I was looking for because I'd read about a past example of a similar nature. In both cases, the problem was ultimately diagnosed as "bad linecard". Hacker News commenters were talking about bad RAM, which makes sense to me. So, it would seem that on some card in some router, there is (well, was) a RAM chip with a single bad bit, and some packet buffer would get allocated across it, presumably with 128-byte alignment. Since the error bit in this case is not always-on or always-off, it could be getting copied from another bit, or from something weirder like an address line, or who knows...
And since the error is always in the same place mod 16 bits, the TCP checksum has very little power to save you; if the number of bits flipped in a packet is even, and the number of 0-1 and 1-0 flips is equal, the checksum will be the same. (This is assuming the TCP checksum is present and functioning end-to-end, and not being recomputed by the offending router or something. I forget how that all works.)
EDIT: Ooh, I missed @teichopsia's comment. I'm not enough of a network person to be familiar with TCAM specifically, but if the error was "pushing" the bits backwards, as they seem to maybe be describing -- if each flip was in the opposite direction from the previous flip -- then that would give a really high chance of the TCP checksum being unaffected.
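To see why flips at the same bit position can slip past the TCP checksum, here is the RFC 1071 ones'-complement sum over a handful of 16-bit words. It is a simplified sketch: `inet_csum` is a hypothetical helper, and it ignores the TCP pseudo-header and odd-length padding. The example words `6c65` ("le") and `6772` ("gr") are corrupted with one 1-to-0 flip (l to d) and one 0-to-1 flip (g to o) of the same 0x08 bit in the high byte, mirroring the titde/backoround corruptions reported above; the real word alignment of those bytes in the actual packets is not known.

```shell
#!/bin/sh
# Simplified Internet checksum (RFC 1071) over 16-bit words given in hex.
# Ignores the TCP pseudo-header and odd-length padding.
inet_csum() {
    sum=0
    for w in "$@"; do
        sum=$((sum + 0x$w))
    done
    # Fold carries out of the low 16 bits back in (end-around carry).
    while [ $((sum >> 16)) -ne 0 ]; do
        sum=$(( (sum & 0xFFFF) + (sum >> 16) ))
    done
    printf '%04x\n' $(( ~sum & 0xFFFF ))
}

inet_csum 6c65 6772   # original words "le" "gr"          -> 2c28
inet_csum 6465 6f72   # both flipped, opposite directions -> 2c28
inet_csum 6465 6772   # only one flip                     -> 3428
```

The two opposite flips change the sum by -0x0800 and +0x0800, which cancel, so the checksum cannot tell the corrupted words from the originals; a single flip does change it.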
While I realize this is a dead/resolved issue, some people were curious how things like this happen. If you feel like falling down the rabbit hole, this thread from the ARM list gives a good recap of the hell that is corrupted data and how it becomes a thing:
https://lore.kernel.org/lkml/87h8k7h8q9.fsf@linux.ibm.com/T/
Seeing the same thing as jamilbk in El Cerrito with Sonic/AT&T DSL. (scream into void)