How the Twitter App Bypasses Paywalls

by Isoroku Yamamoto

The Wall Street Journal has ended its practice of giving search engines special access. This means that a human visitor can no longer bypass the paywall by spoofing Google’s HTTP request headers.

However, subscription-based publications face a problem when users click on a link through Twitter or Facebook on a mobile device. Social media apps implement their own in-app browsers, which generally do not retain cookies. Websites that require a login must therefore ask for it again every time the app is reopened.

This makes for a cumbersome user experience. Thus, publications like the Wall Street Journal disable login checks when a page request appears to come from Twitter.

They do this by inspecting HTTP request headers. The important headers are Referer and User-Agent.

When a link is shared on Twitter, the URL is shortened to something like “https://t.co/9Mk58nL3xJ”. The short link goes to a Twitter server, which redirects the browser to the intended destination. Websites determine whether Twitter initiated the redirect by checking that the HTTP Referer string begins with “https://t.co/”. The rest of the string is ignored.
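As a rough illustration of the prefix check described above, a site's server-side logic might look something like the sketch below. The function name is hypothetical and the exact rule any given publisher applies is an assumption; only the "begins with https://t.co/" behavior comes from the description above.

```javascript
// Hypothetical helper: the name and exact rule are assumptions for illustration.
function refererLooksLikeTwitter(referer) {
  // Only the prefix is checked; the short-link ID that follows is ignored.
  return typeof referer === "string" && referer.startsWith("https://t.co/");
}

console.log(refererLooksLikeTwitter("https://t.co/9Mk58nL3xJ")); // true
console.log(refererLooksLikeTwitter("https://www.google.com/")); // false
```

Because nothing after the prefix is validated, any made-up short-link ID passes the check.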

A web request from Twitter further identifies itself through the User-Agent header, which might look something like “Mobile/14C92 Twitter for iPhone”.
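A matching User-Agent test could be as simple as a substring search. This is only a sketch: the function name is hypothetical, and the assumption that a site looks for the literal text "Twitter for iPhone" is based on the example string above, not on any publisher's actual code.

```javascript
// Hypothetical helper: assumes the site only looks for the app's marker text.
function userAgentLooksLikeTwitter(userAgent) {
  return typeof userAgent === "string" &&
    userAgent.includes("Twitter for iPhone");
}

var sample = "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) " +
  "AppleWebKit/602.1.32 (KHTML, like Gecko) Mobile/14C92 Twitter for iPhone";

console.log(userAgentLooksLikeTwitter(sample)); // true
console.log(userAgentLooksLikeTwitter("Mozilla/5.0 (compatible; Googlebot/2.1)")); // false
```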

By submitting this information in request headers, any web browser can appear to be the Twitter app. It is easy to do this using a Chrome extension.

The following builds on top of last year’s tutorial for mimicking Google’s web crawler.

1. Use the same manifest.json file as before. Take care to list both http:// and https:// versions of the sites you are interested in, as many publishers now use SSL.

2. Modify the background.js file. The modified version should look like the one below. Note that it strips cookies from every top-level page request and response, not only for the sites listed in VIA_TWITTER.

// Sites that should see Twitter headers; every other site gets Googlebot headers
var VIA_TWITTER = ["wsj.com"];

function changeRefer(details) {

  // declare with var so these do not leak into the global scope
  var foundReferer = false;
  var foundUA = false;

  // true if the requested URL contains any of the listed sites
  var useTwitter = VIA_TWITTER.some(function(url) {
    return details.url.includes(url);
  });

  var reqHeaders = details.requestHeaders.filter(function(header) {

    // block cookies by default; header names are case-insensitive
    return header.name.toLowerCase() !== "cookie";

  }).map(function(header) {
    
    if (header.name === "Referer") {
      header.value = setRefer(useTwitter);
      foundReferer = true;
    }
    if (header.name === "User-Agent") {
      header.value = setUserAgent(useTwitter);
      foundUA = true;
    }
    return header;
  })
  
  // append referer
  if (!foundReferer) {
    reqHeaders.push({
      "name": "Referer",
      "value": setRefer(useTwitter)
    })
  }
  if (!foundUA) {
    reqHeaders.push({
      "name": "User-Agent",
      "value": setUserAgent(useTwitter)
    })
  }
  return {requestHeaders: reqHeaders};
}

function blockCookies(details) {
  // filter rather than splice inside a forward loop, which would skip the
  // header following each removed one; header names are case-insensitive
  var resHeaders = details.responseHeaders.filter(function(header) {
    return header.name.toLowerCase() !== "set-cookie";
  });
  return {responseHeaders: resHeaders};
}

function setRefer(useTwitter) {
  if (useTwitter) return "https://t.co/T1323aaaa"; 
  else return "https://www.google.com/";
}

function setUserAgent(useTwitter) {
  if (useTwitter) return "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) AppleWebKit/602.1.32 (KHTML, like Gecko) Mobile/14C92 Twitter for iPhone";
  else return "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
}

chrome.webRequest.onBeforeSendHeaders.addListener(changeRefer, {
  urls: ["<all_urls>"],
  types: ["main_frame"],
}, ["requestHeaders", "blocking"]);

chrome.webRequest.onHeadersReceived.addListener(blockCookies, {
  urls: ["<all_urls>"],
  types: ["main_frame"],
}, ["responseHeaders", "blocking"]);
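One caveat about the listing above: `details.url.includes(url)` is a plain substring test, so it also matches a URL that merely mentions a listed site in its path or query string. A stricter comparison against the hostname is sketched below. The helper name `hostMatches` is hypothetical and is not part of the original extension.

```javascript
// Hypothetical helper, not part of the original background.js.
// Returns true only when the URL's hostname is the domain or a subdomain of it.
function hostMatches(url, domain) {
  var hostname;
  try {
    hostname = new URL(url).hostname;
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  return hostname === domain || hostname.endsWith("." + domain);
}

console.log(hostMatches("https://www.wsj.com/articles/x", "wsj.com")); // true
console.log(hostMatches("https://example.com/?u=wsj.com", "wsj.com")); // false
```

If adopted, it would stand in for the `includes` call when computing `useTwitter`.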

Save both files in the same directory. The updated source code can also be downloaded here.
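For the manifest step, the http:// and https:// entries can be generated rather than typed by hand. The sketch below builds Chrome match patterns for a list of domains. The function name is hypothetical, and because last year's manifest.json is not reproduced here, exactly where these strings go in that file is left as an assumption.

```javascript
// Hypothetical helper: builds http and https match patterns for each domain.
// The "*." prefix in a Chrome match pattern covers the bare domain and its subdomains.
function matchPatternsFor(domains) {
  var patterns = [];
  domains.forEach(function (domain) {
    patterns.push("http://*." + domain + "/*");
    patterns.push("https://*." + domain + "/*");
  });
  return patterns;
}

console.log(matchPatternsFor(["wsj.com"]));
// [ 'http://*.wsj.com/*', 'https://*.wsj.com/*' ]
```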

Now type chrome://extensions/ in the browser address bar.

Reload the old extension, or load it as an unpacked extension if you have not previously done so. Enable the Chrome extension and visit wsj.com.

There is always a tradeoff between security and usability. The fastest way to compromise a computer system is to accommodate lazy users. Or worse yet, accommodate lazy programmers.

32 thoughts on “How the Twitter App Bypasses Paywalls”

    1. I teach the pitfalls of sacrificing security for usability, because the best way to build a good defense is to learn to play offense. If you decide to use this knowledge to steal something, that’s on you.

    1. One thing we all need to bear in mind is that the workaround Mr. Yamamoto has suggested above does not work flawlessly on all paywalls. With some paywalls one can be lucky and the workaround works, but other paywalls are too robust & “smart” to be tricked.

  1. Turns out you can make it even simpler. The only thing you need to spoof is Referer, which is unfortunately not easy for browsers but definitely possible for Chrome extensions.

    But who cares about rules? I had an article summarizer that did some internet scraping. I wanted to set it free on the news, so I thought I would implement this hack and hey, it still works! Hooray! Owe you one Elaine 😉

    https://explaintome.herokuapp.com/

  2. It was helpful when it worked (I discovered this independently and chose not to publish it). Now that you’ve made it so easy to exploit, they’ll have to do something about it. I wish you had written about this more in the abstract and not explicitly mentioned the newspaper in question.

  3. The Chronicle of Higher Education blocks me from accessing its website when using the extension (after adding https://www.chronicle.com to the list of sites, of course). It says I’m a bot.

    And the most intriguing thing is that its articles behind the paywall (“premium content”) are fully indexed by Google.

    Maybe it has a clever system that allows only the actual Google Bot to access its content?

      1. Yeah, I know. But the CHE is the first site with a porous paywall that I found on which the script does not work. The real Googlebot can index the site, but we can’t.

  4. Hi, this script used to work for thetimes.co.uk, but as of today it doesn’t. Is there any way it can be fixed? Would be most grateful for any help.
