Great Firewall Tor Probing Circa 09 DEC 2011
Author: Tim Wilde, Team Cymru, Inc. <email@example.com>
Date: 06 JAN 2012
This analysis was performed during the week of 05 DEC 2011 - 09 DEC 2011, in
an attempt to determine the extent and type of blocking and probing being
performed by the Great Firewall of China (henceforth "GFW") against
unpublished Tor bridges located outside of China, but accessed from within
China.
These results are considered valid as of their collection (05-09 DEC 2011),
and no guarantees are provided as to their continuing validity beyond
that time, due to the changing nature of anti-circumvention techniques and
This analysis originated from Tor trac ticket #4185. A GitHub "gist" of the
results, including this report, can be found at
Pre-Analysis Conditions, Goals, and Assumptions
Prior to the initiation of this round of analysis, it had been determined
that any Tor bridge outside of China that received a connection from a
client within China would be blocked by the GFW, usually within a matter of
minutes. Previous analysis revealed that the Tor connection initiated by
the client resulted in a series of probes from a variety of other IPs within
China (varying from connection to connection), and, within minutes of those
probes, the client's connection to the bridge would be blocked and could not
be re-initiated. Blocking all connections to the bridge except from the
intended client was shown to allow the connection to persist successfully
for more than 48 hours, thus demonstrating a direct link between the probing
and blocking. This previous analysis was performed ad-hoc without dedicated
attention/resources, and the analysis described in this report was performed
as a follow-up under more stringent conditions to attempt to determine the
exact parameters of probing and blocking.
The primary goals of this analysis were:
1) Determine the extent of GFW probing of Tor connections, and more broadly,
SSL sessions in general; what triggers the probing, what time considerations
are involved, etc.
2) Determine how the decision to block is made.
3) Determine a course forward for Tor to circumvent probing and blocking.
It is assumed that, despite a significant series of probes originating from
a single host, the GFW does not change its behavior to prevent us from
learning what it is doing. This assumption has been tested with a smaller set of
tests from some other hosts and appears to hold true, but cannot be
During the previous analysis it was noted that blocking appeared to be
performed against specific destination IP:port pairings; attempting to
connect to a new port on a host after a previous connection had been blocked
was successful, but provoked new probing and blocking in the same way as the
previous port. It is assumed that this destination IP:port pairing is still
in use in current techniques.
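The IP:port granularity described above can be pictured with a small toy
model. This is purely illustrative: the BlockList class and its method
names are hypothetical, the address is from a documentation range, and
nothing here reflects how the GFW is actually implemented.

```python
class BlockList:
    """Toy model of blocking keyed on destination (IP, port) pairs."""

    def __init__(self):
        self._blocked = set()

    def block(self, ip, port):
        # Only this exact pairing is blocked, not the whole host.
        self._blocked.add((ip, port))

    def is_blocked(self, ip, port):
        return (ip, port) in self._blocked


blocklist = BlockList()
blocklist.block("198.51.100.7", 443)
print(blocklist.is_blocked("198.51.100.7", 443))   # True
print(blocklist.is_blocked("198.51.100.7", 8443))  # False
```

Under this model, moving a bridge to a new port on the same host restores
connectivity until that new pairing is itself probed and blocked, which
matches the behavior noted in the previous analysis.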
Summary of Conclusions
Goal #2 was found to be unachievable at the time of analysis, as it was not
possible throughout our testing to provoke blocking. Testing of connections
to "new" published Tor relays (exact dates for "new" not determined) showed
that even published Tor relays were connectable from within China. Once
this was determined, focus shifted to understanding the probing and how it
could be avoided (goals #1 and #3).
With respect to goal #1, our testing showed that the GFW appears to perform
"garbage binary" probes of the non-China side of any connection from China
to TCP port 443 that performs an SSL negotiation. This probe is performed
in near-real-time after the connection is established, implying
near-line-rate deep packet inspection (DPI) capabilities. TCP/443
connections that did not actually exchange an SSL handshake, such as using
the obfsproxy obfs2 protocol, did not provoke probing. At this time the
purpose of this probe is unknown; further details of its contents and
speculation as to its purpose can be found later in this report.
A second type of probing, which appears to be specifically aimed at Tor, was
also detected. This probing appeared to occur using a client that actually
spoke SSL, and, within the SSL session, the Tor protocol, and was not
activated in real-time, but appeared to be triggered at regular 15-minute
intervals past the hour if its conditions were met. Analysis showed
that the Tor SSL cipher list was at least one significant component of the
detection mechanism that initiated this probing; a client modified to remove
a single cipher from the list, when connecting to an unmodified unpublished
bridge in the United States, did not trigger probing. The SSL "client
hello" packet was determined to be a direct cause of the probing; when just
that data was sent to an open Tor bridge from within China using netcat, the
probing was triggered on the expected 15-minute interval.
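The netcat replay described above amounts to opening a TCP connection and
writing the captured bytes unchanged. A minimal Python equivalent is
sketched below; the function name send_raw_bytes, the timeout value, and
the address in the usage comment are our own illustrative choices, not part
of the original test setup.

```python
import socket


def send_raw_bytes(host, port, payload, timeout=10.0):
    """Connect over TCP, send payload as-is, and return the first reply.

    Returns b"" if the peer sends nothing before the timeout expires.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            return sock.recv(4096)
        except socket.timeout:
            return b""


# Hypothetical usage (address is from a documentation range):
#
#   with open("client-hello.raw", "rb") as f:
#       reply = send_raw_bytes("192.0.2.10", 443, f.read())
```

Because no SSL library is involved, the bytes on the wire are exactly those
in the file, which is what allows the hello alone to be isolated as a
trigger.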
A connection between a Chinese client and a US unpublished bridge node over
the obfsproxy obfs2 protocol was able to successfully persist for several
days, including numerous disconnects/reconnects, without provoking any type
of probing.
In the immediate term, modifying the Tor SSL client hello to present a list
of ciphers more closely (ideally identically) matching that of a common
commodity web browser would bypass the current probe trigger. In general,
the items in Proposal 179 remain valid and a wise choice; the same eye
should be cast towards the client hello, and an attempt made to match it as
closely to commodity web browsers as possible. With both sides of that
equation implemented, it becomes much more difficult for any authoritarian
regime to distinguish Tor traffic from standard SSL traffic, requiring
significantly more effort to locate unpublished bridge nodes and connections
In the longer term, the two problems that need to be solved are:
1) Client-server negotiation that is uniquely identifiable in any way.
2) The ability of an arbitrary adversary to, given a suspected Tor server
IP:port combination, connect to that IP:port combination and perform a
negotiation using the Tor protocol, or anything that can be uniquely
identified as a Tor protocol negotiation.
Problem #1 can be solved by a transport that either looks identical on both
sides to a standard HTTPS SSL session, or that looks like nothing
identifiable at all, like the obfs2 protocol provided by obfsproxy.
Problem #2 is somewhat more difficult. Tor proposal 190 combined with the
ideas in  is one method that has some promise. The "shared secret"
implementation of obfs2 in obfsproxy is another, as an adversary attempting
to initiate an obfs2 connection without the right shared secret would be
unable to get anything but noise back from the server. That noise, however,
which the server emits immediately upon connection in the current
implementation, may be trivially fingerprintable, as such traffic is not
especially common in the wild.
A single system within China was used to initiate connections to a series of
Tor bridges provisioned within the Amazon EC2 cloud computing product,
within the United States. EC2 was utilized to allow for rapid changing of
IPs while maintaining toolsets and datasets, to eliminate possible
black/whitelisting of IPs by the GFW. All test sessions were monitored with
full packet captures generated on the bridge nodes, and at several points in
the process Tor debug logs were captured from the bridge as well.
Reasonable efforts were made to ensure that each test used a unique
connection tuple to prevent cross-contamination.
General probe observations
All probes came from a wide range of IP addresses within China. A sampling
of these IPs as collected through our analysis can be found on the following
This is by no means an exhaustive list; we have not yet attempted a fuller
collection of probing IPs. It is also possible that this
list contains false positives; it is simply a list of all IPs (other than
those of our test systems) geolocated to China that were observed connecting
to our unpublished bridges during the testing cycle; random scanning
activity could have snuck in, but the majority are due to active probes.
Each probing "session" generally consisted of multiple probes from multiple
different IPs within China. p0f (passive OS fingerprinting) data collected
during the probes indicated that they generally originated from Linux
systems, but there was some conflicting data in the source ports used by
some probes, which appeared in ephemeral port ranges more commonly seen on
Windows systems. This data is not conclusive in any way due to the
uncertain nature of p0f and the ability to reconfigure source port ranges at
will in modern operating systems.
Garbage binary probes
Any connection from China to TCP/443 of a host in the US that exchanged an
SSL handshake, no matter what the certificates looked like (self-signed,
CA-signed, etc) provoked what we have termed the "garbage binary" probes
("garbage probes" for short) against the US host. The only exception to
this seemed to be "old" hosts, which is to say, long-established hosts that
have had HTTPS services running for an extended period of time. Connections
to those hosts did not appear to provoke any type of scanning activity.
The garbage probes occurred within moments of the SSL negotiation. In some
cases, the data sent by these probes was identical, even from different
source hosts and at different times.
Notably, garbage probes did not occur in the case of TCP/443 connections
that spoke simple plaintext or non-SSL binary protocols, such as obfs2.
This indicates that DPI was taking place, likely at or near line-rate, to
select which hosts should be probed.
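We do not know how the GFW's DPI actually classifies traffic, but even a
very cheap check on the first bytes of a connection is enough to separate
an SSL/TLS client hello from plaintext or from an obfuscated stream such as
obfs2. The sketch below is our own illustration of such a check; the
function name is hypothetical.

```python
def looks_like_client_hello(payload: bytes) -> bool:
    """Return True if payload begins with an SSL 3.0 / TLS ClientHello.

    Record header:    content type (1 byte), version (2), length (2)
    Handshake header: message type (1 byte), length (3)
    """
    if len(payload) < 6:
        return False
    content_type = payload[0]    # 0x16 means "handshake"
    version_major = payload[1]   # 0x03 for SSL 3.0 and all TLS versions
    handshake_type = payload[5]  # 0x01 means "ClientHello"
    return (content_type == 0x16
            and version_major == 0x03
            and handshake_type == 0x01)


print(looks_like_client_hello(b"\x16\x03\x01\x00\x05\x01\x00\x00\x01\x00"))
print(looks_like_client_hello(b"GET / HTTP/1.1\r\n"))
```

A check of this kind needs only six bytes of the first client packet, which
is consistent with the probes being launched within moments of the
handshake.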
Once a host was probed, subsequent connections with the same destination
outside of China did not provoke probes for approximately the next 10
minutes (measured from the final packet in a sequence of probes). After 10
minutes, a new SSL negotiation between the hosts would trigger a new round
of probing. The "ignored" interval remained consistent through many rounds
of testing.
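The observed quiet period can be expressed as a simple predicate. This is a
sketch of our interpretation only: the 10 minute figure is approximate, and
the names QUIET_PERIOD and would_trigger_probe are hypothetical.

```python
from datetime import datetime, timedelta

# Approximate value observed in testing; measured from the final packet
# of the previous sequence of probes.
QUIET_PERIOD = timedelta(minutes=10)


def would_trigger_probe(handshake_time, last_probe_end):
    """Predict whether a new SSL handshake provokes garbage probes.

    last_probe_end is None if the destination has never been probed.
    """
    if last_probe_end is None:
        return True
    return handshake_time - last_probe_end >= QUIET_PERIOD


probe_end = datetime(2011, 12, 7, 10, 0, 0)
print(would_trigger_probe(probe_end + timedelta(minutes=4), probe_end))
print(would_trigger_probe(probe_end + timedelta(minutes=11), probe_end))
```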
We have not attempted to exhaustively collect and compare the data sent in
these probes, but what analysis we have performed does not provide any
insight into their purpose or intent. It is possible that these probes are
designed to provoke a response from a different category of server that
operates on TCP/443 (and possibly other ports) and that the Chinese
government is intending to block, but that is only speculation at this time.
It is not clear whether further analysis would be worthwhile at this time,
given that these probes do not, on the surface, appear to be directed
specifically at Tor. The lack of currently active blocking resulting from
any probes
also makes it difficult to evaluate the effectiveness of any changes that
might result from such analysis.
Full SSL/Tor probes
Our testing demonstrated that the SSL client hello sent by a Tor client in
China to a US-based unpublished bridge will consistently trigger probing of
the bridge by Chinese hosts, consisting of fully-established TCP, SSL, and
Tor-protocol connections. Analysis of debug logs from bridges in this
testing showed behavior consistent with a Tor client configured with
multiple bridge statements. It is also possible, though it cannot be
determined definitively at this time, that the probers were using custom
software either written from scratch or adapted from the Tor sources to
perform their probing.
Specifically, the debug logs and pcaps showed that the prober connected,
performed an SSL handshake, performed an SSL renegotiation, and then spoke
the Tor protocol. When speaking the Tor protocol, the probers we captured
built a one-hop circuit and sent a BEGIN_DIR cell, as would be expected from
a proper Tor client. When the bridge replied to the BEGIN_DIR cell with its
descriptor, the prober hung up the connection.
This probing consistently took place within a range of +3 minutes from a
15-minute mark past the hour (HH:00, HH:15, HH:30, HH:45) after a Tor
client SSL client hello was sent from China to the US host. Multiple
connections were made in each round of probing. The probing occurred
consistently regardless of the TCP source and destination ports: if a Tor
SSL client hello packet was sent, a full Tor-speaking probe could be
expected on the next 15-minute mark, like clockwork.
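Given the time a client hello was sent, the window in which a probe should
be expected can be computed directly. The sketch below assumes that "+3
minutes" means up to three minutes after the quarter-hour mark, and that a
hello sent exactly on a mark is picked up at the following mark; both are
our assumptions, as are the function names.

```python
from datetime import datetime, timedelta


def next_quarter_hour(t):
    """Return the first HH:00/HH:15/HH:30/HH:45 mark strictly after t."""
    base = t.replace(minute=(t.minute // 15) * 15, second=0, microsecond=0)
    return base + timedelta(minutes=15)


def expected_probe_window(hello_time, slack=timedelta(minutes=3)):
    """Return (start, end) of the window in which probing is expected."""
    start = next_quarter_hour(hello_time)
    return start, start + slack


start, end = expected_probe_window(datetime(2011, 12, 7, 10, 7, 30))
print(start.time(), end.time())
```

This kind of calculation is useful when correlating packet captures on the
bridge with the time a test hello was sent.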
Conversely, when the Tor SSL client hello was modified by removing a single
cipher from the cipher spec, full Tor connections from China to a US
unpublished bridge could be established multiple times, and for extended
durations, without provoking any probing. Removing the less-common
ServerName SSL extension from the client hello, however, did not change the
probing behavior, indicating that the probe is likely tied specifically to
the cipher specs, not to the entire hello packet. Unfortunately, we were not
able to verify whether sending only a raw set of cipher specs would trigger
the probing; time for the analysis ran out before this test was performed.
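To compare cipher lists between an unmodified and a modified client hello,
the list can be pulled straight out of the raw bytes. The sketch below
assumes an SSL 3.0 / TLS style hello contained in a single record (not an
SSLv2-compatible hello); the function names are hypothetical and the cipher
suite values in the example are arbitrary, not the actual Tor list.

```python
import struct


def client_hello_cipher_suites(record: bytes):
    """Extract the cipher suite codes from a raw ClientHello record."""
    if len(record) < 6 or record[0] != 0x16 or record[5] != 0x01:
        raise ValueError("not a ClientHello handshake record")
    pos = 5 + 4                   # record header (5) + handshake header (4)
    pos += 2 + 32                 # client version (2) + random (32)
    pos += 1 + record[pos]        # session ID length byte + session ID
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    return list(struct.unpack_from("!%dH" % (suites_len // 2), record, pos))


def build_client_hello(suites):
    """Build a minimal synthetic ClientHello, for demonstration only."""
    body = b"\x03\x01" + bytes(32) + b"\x00"        # version, random, no ID
    body += struct.pack("!H", 2 * len(suites))
    body += struct.pack("!%dH" % len(suites), *suites)
    body += b"\x01\x00"                             # null compression only
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


original = [0xC00A, 0xC014, 0x0039]   # arbitrary example values
modified = original[:-1]              # same list with one cipher removed
seen_a = client_hello_cipher_suites(build_client_hello(original))
seen_b = client_hello_cipher_suites(build_client_hello(modified))
print(sorted(set(seen_a) - set(seen_b)))
```

Run against two captured hellos, the same extraction shows exactly which
cipher specs differ between a client that provokes probing and one that
does not.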
The SSL cipher list probe trigger is further discussed in Tor trac ticket
Server-side SSL cert impacts
Before isolating the SSL client hello as the provoker of probing, we
performed a number of tests using a simple SSL server on the US side and
openssl s_client, wget, and Firefox on the China side to attempt to locate
server SSL certificate combinations that provoked probing. As discussed in
the garbage probing section, it appears that any SSL negotiation, even with
a fully legitimate CA-signed cert (and having used the correct hostname in
the cert to access the host, etc), will trigger the garbage probing. On
non-standard ports, though, none of these clients provoked any type of
probing, garbage or real SSL, no matter what the server certificate looked
like. As such it does not appear that the GFW is, today, keying on the
server certificate presented to make probing decisions.
Mike Perry posited that the prober IPs could be legitimate IPs within China
that are hijacked (partially or entirely) by the GFW to perform its probes,
and suggested some traceroute testing to attempt to confirm or deny this
hypothesis. After several false starts, we were able to successfully source
TCP traceroute packets to a prober IP using the exact same
sourceIP:sourcePort:destIP:destPort tuple that was being used for the probe,
during the probe and after the probe. These results did not show a
different route during the probe than after it, as Mike expected would be
the case if such hijacking were taking place, so hijacking does not appear
likely, though it has not been conclusively disproven.
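The comparison itself is simple once each traceroute is reduced to an
ordered list of hop addresses. The helper below is an illustrative sketch;
the function name is hypothetical and the hop addresses are from
documentation ranges, not from our captures.

```python
def first_divergence(path_a, path_b):
    """Return the index of the first differing hop, or None if identical."""
    for i, (hop_a, hop_b) in enumerate(zip(path_a, path_b)):
        if hop_a != hop_b:
            return i
    if len(path_a) != len(path_b):
        # One path is a strict prefix of the other.
        return min(len(path_a), len(path_b))
    return None


during_probe = ["192.0.2.1", "198.51.100.1", "203.0.113.9"]
after_probe = ["192.0.2.1", "198.51.100.1", "203.0.113.9"]
print(first_divergence(during_probe, after_probe))  # None
```

A result of None for every pair of runs corresponds to what we observed: no
visible change of route between the probing period and afterwards.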
A number of other files are available in the github gist that contains this
* testcases.txt - a list of planned test cases for the project, plotted out
both before and during the project. Not all cases were tested as some
were deemed irrelevant when actual blocking could not be replicated, and
due to other results found during testing.
* results.txt - a log of results obtained from working within the test cases
described above. Somewhat terse and difficult to interpret for anyone
other than the author, but included for completeness.
Interpretation/annotation available upon request to the author.
* general-notes.txt - a series of general notes and a set of tentative
conclusions formed on the final two days of testing. These notes have
largely been distilled, cleaned up, and expanded upon within this report.
* traceroute.txt - traceroute output from the traceroute testing phase
described in a previous section.
* cn.log - hotpot.py output of some "garbage probes" collected early on in
* client-hello.raw - the raw bytes representing a Tor client's SSL client
hello sent via netcat to provoke full Tor/SSL probing.
Additionally, trusted researchers / Tor developers are welcome to contact
the author for access to the pcaps and debug logs collected during this
testing. Due to sensitive contents (IPs of probing systems, etc.), we cannot
release these publicly at this time.
Thanks to Team Cymru, Inc. for providing approximately a week of the
author's time dedicated to this analysis, as well as the funding for the
Amazon EC2 bridges and facilitating the access within China. Thanks to
George Kadianakis (asn) for his analysis assistance and help providing
direction for the testing, as well as the handy hotpot tool. Thanks also to
too many Tor developers and other contributors in #tor-dev to count or list,
but I'll try: nickm, armadev, rransom, Sebastian, and mikeperry. Apologies
to those I missed; it's not because you were unhelpful, I promise you!
06 JAN 2012 - Initial draft published
07 JAN 2012 - Minor revisions and additions per feedback from George