The Great Firewall of China (GFW)

The Internet in China is censored by a set of measures often referred to as the Great Firewall (GFW). It is arguably the most sophisticated censorship system in the world. It has therefore been studied extensively, but rarely in as much detail as in the excellent article "How Great is the Great Firewall? Measuring China's DNS Censorship", whose authors monitored the GFW over a long period.

The GFW does not rely on a single censorship technique. One of its strengths is that it combines several. One of them is the generation, by the network itself (and not by a lying resolver), of bogus DNS responses. You ask for a censored name and you get back an IP address that has nothing to do with the name requested. In most countries where censorship is applied through the DNS, the false response is either an NXDOMAIN (the code indicating that the name does not exist) or the address of a website that displays an explanatory message. By contrast, the Chinese censors are anxious to cover their tracks and return a real address, which makes it more difficult to understand what is going on.

Here is an example. Take the IP address 113.113.113.113, which is in China, on the network of China Telecom. No machine responds at this address; you can test it with nmap, for instance. Likewise, a DNS query sent to it for an ordinary domain name gets no answer:

$ dig +short @113.113.113.113 A mit.edu
;; connection timed out; no servers could be reached

But if you ask it about a censored name, the network generates a false answer. Even though the question is the same, the response varies from one request to the next:

$ dig +short @113.113.113.113 A scratch.mit.edu
75.126.164.178
$ dig +short @113.113.113.113 A scratch.mit.edu
174.37.243.85
$ dig +short @113.113.113.113 A scratch.mit.edu
157.240.17.14

The IP address 157.240.17.14 belongs to Facebook (scratch.mit.edu is normally hosted at Fastly), a prime example of the lies generated by the GFW.

To study this mechanism in detail, the authors of the article "How Great is the Great Firewall?" developed GFWatch, software that makes it possible to study the GFW over a long period, including, among other things, the IP addresses returned by the GFW.

GFWatch uses lists of domain names in large TLDs like .com, augmented by names that have been found to be censored (scratch.mit.edu is censored, as in the example above, but mit.edu is not, so using lists of names registered directly under TLDs like .edu is not enough). It then queries IP addresses in China that do not answer DNS questions (remember that the false answers are made up by middleboxes, so the IP address in question does not even need to respond, as in the case of 113.113.113.113). Any response is therefore necessarily an act of censorship. GFWatch then stores the responses. The article uses data for 2020, collected over nine months. The censored domains are listed on the following page:

https://gfwatch.org/censored_domains
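
The probing principle described above (any answer coming back from an address that runs no DNS server must have been injected) can be sketched in a few lines of Python. This is only an illustration, not GFWatch's actual code: the function names build_query and is_censored, the query ID and the timeout are assumptions chosen for the example, and the probe only makes sense when run from outside China so that the query crosses the GFW.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Encode a minimal DNS query in RFC 1035 wire format (qtype 1 = A, 28 = AAAA)."""
    # Header: ID, flags (only the RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is prefixed by its length; the name ends with a zero byte
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


def is_censored(name: str, probe_ip: str = "113.113.113.113",
                timeout: float = 3.0) -> bool:
    """Return True if anything answers a query sent to an address with no DNS server.

    A timeout is only weak evidence of "not censored": UDP packets get lost,
    so a real measurement would repeat the probe several times.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (probe_ip, 53))
        try:
            sock.recvfrom(4096)
        except socket.timeout:
            return False
        return True
```

Calling is_censored("scratch.mit.edu") sends the same kind of question as the dig commands above; the function deliberately ignores the content of the reply, since the mere existence of a reply is the signal.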

Out of 534 million domains tested, 311,000 triggered a false response from the GFW. The Chinese censors are not afraid of false positives: for example, mentorproject.org is censored, probably only because it contains the string torproject.org, which is itself censored because the censors do not like Tor.
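
To see how such a false positive can arise, here is a small Python sketch comparing an unanchored substring match with a match anchored on label boundaries. The GFW's real matching rules are not described here, so both functions (naive_match and anchored_match are names invented for this example) are assumptions meant only to illustrate the overblocking effect.

```python
def naive_match(domain: str, blocked: list[str]) -> bool:
    """Unanchored substring match: blocks any name that merely contains a rule."""
    return any(rule in domain for rule in blocked)


def anchored_match(domain: str, blocked: list[str]) -> bool:
    """Match only the blocked name itself or its subdomains."""
    return any(domain == rule or domain.endswith("." + rule) for rule in blocked)


blocked = ["torproject.org"]
for name in ("torproject.org", "www.torproject.org", "mentorproject.org"):
    print(name, naive_match(name, blocked), anchored_match(name, blocked))
```

With the substring match, mentorproject.org is caught along with the intended targets; with the anchored match, only torproject.org and its subdomains are.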

GFWatch can thus obtain a long list of censored domains and try to classify them, which gives an idea of the policy followed by the censors (needless to say, the managers of the GFW do not publish activity reports detailing what they are doing :p). For example, there are domains related to the Covid-19 pandemic (the Chinese authorities do not want to let information on the disease circulate freely).

One of the peculiarities of the GFW is that it returns IP addresses unrelated to the requested name (such as, above, a Facebook address returned instead of that of MIT). What are these IP addresses? How many are there? How are they chosen? Being able to answer these questions is one of the main benefits of a system like GFWatch. The returned address is clearly not taken at random from the entire IPv4 address space. Only 1781 IPv4 addresses were seen by GFWatch, almost half of them Facebook addresses. The GFW also returns IPv6 addresses:

$ dig +short @113.113.113.113 AAAA scratch.mit.edu
2001::a27d:1183

For IPv6, all IP addresses belong to the prefix reserved for Teredo (RFC 4380), a technology that is now discontinued.
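
Checking whether an AAAA answer falls inside the Teredo prefix (2001::/32) is easy with Python's standard ipaddress module. The helper name in_teredo_prefix is an assumption made for this sketch.

```python
import ipaddress

# Prefix reserved for Teredo by RFC 4380
TEREDO_PREFIX = ipaddress.ip_network("2001::/32")


def in_teredo_prefix(address: str) -> bool:
    """Return True if `address` is an IPv6 address inside 2001::/32."""
    addr = ipaddress.ip_address(address)
    return addr.version == 6 and addr in TEREDO_PREFIX


print(in_teredo_prefix("2001::a27d:1183"))  # the forged answer shown above
print(in_teredo_prefix("2001:db8::1"))      # documentation prefix, outside 2001::/32
```

The first call prints True and the second False, since 2001:db8::/32 is a different prefix from 2001:0000::/32.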

As for IPv4 addresses, their number varies over time (new addresses appear from time to time), and the choice does not seem random, with some addresses appearing more than others.

Because the false responses are generated by the network (more precisely, by a middlebox) and not by a server, the GFW sometimes corrupts the responses of legitimate servers. The article cites several cases, but a very recent one is worth mentioning: responses from the root server k.root-servers.net were corrupted because the resolver of a Mexican ISP had the misfortune to query the Beijing instance of k.root-servers.net, and the GFW therefore sent it false responses. The matter was discussed on the OARC dns-operations list in November 2021, and it appears that the BGP announcement of the Beijing instance had propagated far beyond its intended reach (a relatively common issue with anycast servers nowadays).

It is important to note that some public DNS resolvers receive responses generated by the GFW and cache them. The contents of such a poisoned cache are then served to innocent users. In short, it cannot be repeated often enough that we must use DNSSEC (sign the zones and verify the signatures; the big public resolvers check all the signatures, but this only works if the zone is signed).

How can we fight this censorship? To begin with, the article notes that the GFW generally sits "on the side" of the traffic and not "in the path": it injects a lie but does not block the real answer. Sometimes the real answer even arrives before the lie, if the GFW has been slow to respond. A possible solution would therefore be to wait a little to see whether another, more truthful answer arrives. Certain patterns in the false responses (such as the use of the 2001::/32 prefix, normally unused, for AAAA requests) could allow the censors' responses to be ignored. You can see the addresses returned by the censors on the GFWatch site. But, as said above, the obvious solution is DNSSEC, with a secure link to the validating resolver (for example with DoT or DoH). Beware, though: this only solves DNS censorship. The GFW employs a combination of techniques, and evading it is not easy (and can, if you are in China, attract the attention of rather obnoxious people in uniform).
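
The "wait and filter" idea can be sketched as follows, assuming a client that has already collected every answer arriving within some waiting window. The function first_credible and the predicate passed to it are hypothetical; the forged addresses are the three seen in the dig output earlier, and 192.0.2.10 is a documentation address standing in for the real answer. This remains a heuristic, with none of the guarantees DNSSEC validation provides.

```python
from typing import Callable, Iterable, Optional


def first_credible(answers: Iterable[str],
                   is_forged: Callable[[str], bool]) -> Optional[str]:
    """Return the first answer, in arrival order, that is not flagged as forged."""
    for address in answers:
        if not is_forged(address):
            return address
    return None  # every answer looked forged (or none arrived)


# Addresses observed as forged answers in the examples above
FORGED_SEEN = {"75.126.164.178", "174.37.243.85", "157.240.17.14"}

# Hypothetical arrival order: the injected lie first, the real answer second
arrived = ["157.240.17.14", "192.0.2.10"]
print(first_credible(arrived, lambda a: a in FORGED_SEEN))
```

Here the injected answer is skipped and 192.0.2.10 is printed. A predicate based on a list of known forged addresses only works as long as that list is kept up to date, since new addresses appear from time to time.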
