Study reveals new data on region-specific website blocking practices

In a new paper, a team of researchers presents data on geographic denial of access to web content.


When a website is blocked for users across an entire country, the common first assumption is that a government censorship program is at work. While this can certainly be the case, there are other reasons online content might be unavailable nationwide: service operators and publishers sometimes deny access themselves, server-side, to clients from a variety of locations.

A team of researchers led by Prof. Roya Ensafi and PhD student Allison McDonald unearthed new data on this access denial in their paper, “403 Forbidden: A Global View of CDN Geoblocking,” to be presented at the 2018 ACM Internet Measurement Conference.

The paper presents the first wide-scale measurement study of server-side geographic restrictions, or geoblocking, a phenomenon in which websites block access for users in particular countries or regions. Many websites practice geoblocking to comply with international regulations, local legal requirements, or licensing restrictions, as well as to enforce market segmentation or prevent abuse. Some even do so simply to reduce unwanted traffic.

“Our data suggests that making it easy for websites to block entire countries likely contributes to overly aggressive blocking of their customers,” says Ensafi.

Excessive blocking can leave entire national populations unable to reach valuable sites and content, and the researchers say there is abundant evidence that such overblocking occurs frequently.

Geoblocking has drawn increasing scrutiny from policymakers, the team says. A 2013 study by the Australian Parliament concluded that geoblocking forces Australians to pay higher prices, and in 2017 the European Union banned some forms of geoblocking to foster a single European market.

Advocates also say the practice can exacerbate a “balkanization” of the Internet, giving populations of different regions very different online experiences. For example, after the General Data Protection Regulation came into effect in May 2018, several major US-based news sites blocked access from Europe entirely. Some sites built on Google App Engine are unavailable in Cuba and Iran by default because of Google’s interpretation of US regulations, and many regions with a high concentration of spam activity are blocked outright where enhanced security measures could be used instead.

The team hopes that quantifying geoblocking, and thereby showing the extent of its impact on users, will help curb these and other overblocking practices.

To help researchers and policymakers understand this phenomenon, the team developed a semi-automated system to detect instances where entire websites were rendered inaccessible by geoblocking. By focusing on the geoblocking capabilities offered by large content delivery networks (CDNs) and cloud providers, they were able to reliably distinguish the practice from dynamic anti-abuse mechanisms and network-based censorship.
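The article does not describe how the detection system works internally. As a rough illustration only, the Python sketch below shows one way a block-page check could be structured: a response counts as a CDN geoblock if it carries HTTP 403 and matches a known provider's block-page text, and a site is flagged only when every repeated fetch from the same country returns the same provider's block page, since a challenge that appears intermittently is more likely a dynamic anti-abuse mechanism. The signature patterns, the `Response` type, and the function names are hypothetical assumptions, not the researchers' actual fingerprints or code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical block-page signatures, for illustration only. Real CDN
# block pages vary by provider and change over time.
BLOCK_PAGE_SIGNATURES = {
    "cloudflare": re.compile(r"error\s*1009", re.IGNORECASE),
    "app_engine": re.compile(r"your client does not have permission", re.IGNORECASE),
}


@dataclass
class Response:
    status: int  # HTTP status code
    body: str    # response body as text


def match_block_page(resp: Response) -> Optional[str]:
    """Return the provider name if resp looks like a CDN geoblock page, else None."""
    # Assumption: geoblock pages are served with HTTP 403 Forbidden.
    if resp.status != 403:
        return None
    for provider, pattern in BLOCK_PAGE_SIGNATURES.items():
        if pattern.search(resp.body):
            return provider
    return None


def is_geoblocked(responses: List[Response]) -> bool:
    """Flag a site as geoblocked from one country only if every repeated
    fetch returned the same provider's block page. Intermittent blocks
    are treated as likely anti-abuse behavior instead."""
    if not responses:
        return False
    providers = {match_block_page(r) for r in responses}
    return len(providers) == 1 and None not in providers
```

Requiring agreement across repeated fetches trades some sensitivity for fewer false positives, which matters when the goal is to separate deliberate geoblocking from transient defenses.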

This distinction is key to avoiding errors in internet censorship research: 9% of domains on a widely used list of censored domains returned a CDN block page in at least one country.

“Censorship measurement studies should take geoblocking into consideration before ascribing unavailable sites to network-based censorship,” says Ensafi.
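To make the point concrete, here is a hypothetical triage step, a sketch under stated assumptions rather than the study's method, that refuses to label a failure as censorship when the server itself returned a CDN block page. It assumes each measurement has already recorded the HTTP status, whether a block page was detected, any network-level error, and whether a control fetch from an unfiltered vantage point succeeded. The outcome labels and error names are illustrative.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ACCESSIBLE = "accessible"
    CDN_GEOBLOCK = "cdn_geoblock"
    POSSIBLE_CENSORSHIP = "possible_censorship"
    INCONCLUSIVE = "inconclusive"


# Illustrative network-level failure labels that warrant a closer censorship check.
NETWORK_FAILURES = {"dns_error", "connection_reset", "timeout", "tls_error"}


def triage(status: Optional[int], cdn_block_page: bool,
           network_error: Optional[str], control_ok: bool) -> Outcome:
    """Classify one fetch from a test country (hypothetical labels)."""
    if not control_ok:
        # The site also fails from the control vantage point, so it may simply be down.
        return Outcome.INCONCLUSIVE
    if network_error is None and status is not None and 200 <= status < 400:
        return Outcome.ACCESSIBLE
    if cdn_block_page:
        # The server itself refused the request: geoblocking, not network censorship.
        return Outcome.CDN_GEOBLOCK
    if network_error in NETWORK_FAILURES:
        # A candidate for censorship analysis, not proof of it.
        return Outcome.POSSIBLE_CENSORSHIP
    return Outcome.INCONCLUSIVE
```

Checking for a block page before considering network errors reflects the quote's advice: a server-side refusal is ruled out first, and only the remaining failures are passed on for censorship analysis.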

The findings indicate that geoblocking occurs across a broad set of countries and sites; in fact, it was observed in nearly all of the 177 countries studied. Countries facing US economic sanctions, such as Iran, Syria, Sudan, Cuba, and Russia, experienced the highest rates, particularly for finance, banking, and shopping sites.

The researchers applied these techniques to test for geoblocking across the Alexa Top 10K sites from thousands of vantage points in 177 countries, then expanded the measurement to a sample of CDN customers in the Alexa Top Million. They observed a median of 3 geoblocked domains per country, with a maximum of 71 domains blocked in Syria. Among the sampled Alexa Top Million domains, 4.4% used their CDN’s geoblocking feature in at least one country.
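Summary statistics like these (a per-country median, a per-country maximum, and the share of domains blocked in at least one country) can all be derived from one mapping of domains to the countries where they served a block page. The sketch below assumes such a mapping already exists; the `summarize` function is hypothetical, and the domains and country codes in the usage example are made up rather than taken from the study.

```python
from statistics import median
from typing import Dict, List, Set


def summarize(blocked: Dict[str, Set[str]], countries: List[str],
              total_domains: int) -> dict:
    """Summarize geoblocking observations.

    blocked maps each domain to the set of country codes where it served
    a CDN block page; countries lists every country measured; and
    total_domains counts all tested domains, including any absent from
    blocked because they were never blocked.
    """
    # Every measured country starts at 0 so unaffected countries count toward the median.
    per_country = {cc: 0 for cc in countries}
    for ccs in blocked.values():
        for cc in ccs:
            if cc in per_country:
                per_country[cc] += 1
    worst = max(per_country, key=per_country.get)
    blocked_anywhere = sum(1 for ccs in blocked.values() if ccs)
    return {
        "median_per_country": median(per_country.values()),
        "max_country": worst,
        "max_count": per_country[worst],
        "share_blocked_anywhere": blocked_anywhere / total_domains,
    }


# Made-up toy data for illustration; not results from the study.
toy = {
    "a.example": {"SY", "IR"},
    "b.example": {"SY"},
    "c.example": {"SY", "CU"},
    "d.example": set(),
}
print(summarize(toy, ["SY", "IR", "CU", "US", "DE"], total_domains=10))
```

Including countries with zero geoblocked domains matters: leaving them out would inflate the per-country median.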

The team collaborated with Cloudflare to verify their observations and gain additional insights into the phenomenon. Notably, the authors found that geoblocking is rapidly becoming more prevalent, which adds urgency to their continued work highlighting the practice’s negative impact. At best, their efforts will sway other CDNs and cloud providers to limit access to these tools and encourage a more open, inclusive internet.