How the Twitter App Bypasses Paywalls

by Isoroku Yamamoto

The Wall Street Journal has ended its practice of allowing special access for search engines. This means that a human visitor can no longer bypass the paywall by spoofing Google’s HTTP request headers.

However, subscription-based publications face a problem when users click on a link through Twitter or Facebook on a mobile device. Social media apps implement their own in-app browsers, which generally do not retain cookies. Websites that require a user login must therefore request the login every time the app is reopened.

This makes for a cumbersome user experience. Thus, publications like the Wall Street Journal disable login checks when a page request appears to come from Twitter.

They do this by inspecting the HTTP request headers. The important headers are Referer and User-Agent.

When a link is shared on Twitter, the URL is shortened to something like “https://t.co/9Mk58nL3xJ”. This goes to a Twitter server, which redirects the browser to the intended destination. Websites determine whether Twitter initiated the redirect by checking that the HTTP Referer string begins with “https://t.co/”. The rest of the string is ignored.
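
A minimal sketch of the kind of prefix check described above. The publisher's actual server-side code is not public, so the function name cameFromTwitter and the exact comparison are assumptions made for illustration.

```javascript
// Hypothetical server-side check: only the start of the Referer is compared,
// so whatever follows "https://t.co/" is never validated.
var TWITTER_PREFIX = "https://t.co/";

function cameFromTwitter(referer) {
  return typeof referer === "string" && referer.indexOf(TWITTER_PREFIX) === 0;
}

console.log(cameFromTwitter("https://t.co/9Mk58nL3xJ")); // true
console.log(cameFromTwitter("https://t.co/anything"));   // true
console.log(cameFromTwitter("https://www.google.com/")); // false
```

Because the short code after the prefix is ignored, a made-up code passes just as well as a real one.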

A web request from Twitter further identifies itself through the User-Agent header, which might look something like “Mobile/14C92 Twitter for iPhone”.

By submitting this information in request headers, any web browser can appear to be the Twitter app. It is easy to do this using a Chrome extension.
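
To see why this works, here is a rough sketch of a paywall check that trusts both headers. Everything in it is an assumption for illustration: the function name appearsToBeTwitterApp, the lowercase header keys, and the substring test on the User-Agent are not taken from any publisher's code.

```javascript
// Hypothetical paywall check: it inspects only two client-supplied headers.
function appearsToBeTwitterApp(headers) {
  var referer = headers["referer"] || "";
  var userAgent = headers["user-agent"] || "";
  return referer.indexOf("https://t.co/") === 0 &&
         userAgent.indexOf("Twitter for iPhone") !== -1;
}

// Headers that any client could send, using the example values from this post.
var spoofedHeaders = {
  "referer": "https://t.co/9Mk58nL3xJ",
  "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) " +
                "AppleWebKit/602.1.32 (KHTML, like Gecko) " +
                "Mobile/14C92 Twitter for iPhone"
};

console.log(appearsToBeTwitterApp(spoofedHeaders));                          // true
console.log(appearsToBeTwitterApp({ "referer": "https://www.google.com/" })); // false
```

Both values are plain strings chosen by the client, so the check cannot tell the real app from an imitation.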

The following builds on top of last year’s tutorial for mimicking Google’s web crawler.

1. Use the same manifest.json file as before. Take care to list both http:// and https:// versions of the sites you are interested in, as many publishers now serve their pages over HTTPS.

2. Modify the background.js file. The modified version should look like the one below. Note that this version also strips cookies from each page request and its response.

var VIA_TWITTER = ["wsj.com"];

function changeRefer(details) {

  // keep these function-scoped so state does not leak between requests
  var foundReferer = false;
  var foundUA = false;

  // true if the requested URL contains any of the listed sites
  var useTwitter = VIA_TWITTER.some(function(url) {
    return details.url.includes(url);
  });

  var reqHeaders = details.requestHeaders.filter(function(header) {

    // block cookies by default (header names are case-insensitive)
    return header.name.toLowerCase() !== "cookie";

  }).map(function(header) {

    // overwrite the headers that identify where the request came from
    var name = header.name.toLowerCase();
    if (name === "referer") {
      header.value = setRefer(useTwitter);
      foundReferer = true;
    }
    if (name === "user-agent") {
      header.value = setUserAgent(useTwitter);
      foundUA = true;
    }
    return header;
  });
  
  // append referer
  if (!foundReferer) {
    reqHeaders.push({
      "name": "Referer",
      "value": setRefer(useTwitter)
    })
  }
  if (!foundUA) {
    reqHeaders.push({
      "name": "User-Agent",
      "value": setUserAgent(useTwitter)
    })
  }
  return {requestHeaders: reqHeaders};
}

function blockCookies(details) {
  // filter rather than splice inside a forward loop, which would skip the
  // header that follows each removed Set-Cookie
  var resHeaders = details.responseHeaders.filter(function(header) {
    return header.name.toLowerCase() !== "set-cookie";
  });
  return {responseHeaders: resHeaders};
}

function setRefer(useTwitter) {
  if (useTwitter) return "https://t.co/T1323aaaa"; 
  else return "https://www.google.com/";
}

function setUserAgent(useTwitter) {
  if (useTwitter) return "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) AppleWebKit/602.1.32 (KHTML, like Gecko) Mobile/14C92 Twitter for iPhone";
  else return "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
}

chrome.webRequest.onBeforeSendHeaders.addListener(changeRefer, {
  urls: ["<all_urls>"],
  types: ["main_frame"],
}, ["requestHeaders", "blocking"]);

chrome.webRequest.onHeadersReceived.addListener(blockCookies, {
  urls: ["<all_urls>"],
  types: ["main_frame"],
}, ["responseHeaders", "blocking"]);
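
For step 1, the manifest needs an entry for both schemes of every site. The sketch below generates those entries from a list of bare domains. The helper name toMatchPatterns is hypothetical, and the "*." host form is an assumption about how you want subdomains handled; in Chrome match patterns it covers the domain itself and its subdomains.

```javascript
// Hypothetical helper: expand bare domains into Chrome match patterns,
// one for http:// and one for https://, as step 1 recommends.
function toMatchPatterns(domains) {
  var patterns = [];
  domains.forEach(function(domain) {
    patterns.push("http://*." + domain + "/*");
    patterns.push("https://*." + domain + "/*");
  });
  return patterns;
}

console.log(toMatchPatterns(["wsj.com"]));
// → ["http://*.wsj.com/*", "https://*.wsj.com/*"]
```

The resulting strings can be pasted into the permissions list of manifest.json.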

Save both files in the same directory. The updated source code can also be downloaded here.

Now type chrome://extensions/ in the browser address bar.

Reload the old extension, or load it as an unpacked extension if you have not previously done so. Enable the Chrome extension and visit wsj.com.

There is always a tradeoff between security and usability. The fastest way to compromise a computer system is to accommodate lazy users. Or worse yet, accommodate lazy programmers.

32 thoughts on “How the Twitter App Bypasses Paywalls”

Pingback: How Google’s Web Crawler Bypasses Paywalls | Elaine's Idle Mind
Butchmo says:

January 19, 2017 at 9:30 am

Great! Thanks for this update.

Reply
Sang Han (@jjangsangy) says:

January 19, 2017 at 8:02 pm

// You want these function scoped not global
var foundReferer = false;
var foundUA = false;

Otherwise good stuff ojisan 😉

Reply
1. pbutler222 says:
  
  January 27, 2017 at 2:00 pm
  
  More tutorials please! 🙂
  
  Reply
Skates says:

January 20, 2017 at 6:08 am

Any way to turn this into a Greasemonkey script?

Reply
mikegre2014 says:

January 22, 2017 at 4:53 pm

Seriously? You’re teaching your readers to steal?

Reply
1. Isoroku says:
  
  January 22, 2017 at 11:59 pm
  
  I teach the pitfalls of sacrificing security for usability, because the best way to build a good defense is to learn to play offense. If you decide to use this knowledge to steal something, that’s on you.
  
  Reply
renanoble says:

February 4, 2017 at 6:42 am

I am having a problem with loading the extension. It says “Manifest file is missing or unreadable.” although Chrome is pointing to the correct path

Reply
Willy says:

February 23, 2017 at 6:25 am

Magnificent. Thank you!

Reply
Charles Freilich (@charlesfreilich) says:

March 14, 2017 at 2:16 am

Is it possible to implement this on Android?

Reply
Klaas Vaak says:

April 29, 2017 at 5:38 am

I cannot access the Financial Times: has anything changed?

Reply
Tito Dick "Dickman," baby! says:

May 2, 2017 at 4:13 pm

I tried reading this article (http://www.telegraph.co.uk/news/2017/05/01/day-1776-illuminati-modern-day-conspiracy-theorists-favourite/), but the extension didn’t work, even though I added the site to manifest.json. Is there anything I did wrong?

Reply
vibhore singh says:

June 23, 2017 at 7:53 pm

This technique doesn’t work for http://www.foreignaffairs.com. I have tried modifying the manifest.json file, but it doesn’t help. I am curious what kind of sophisticated paywall they are using. Thanks a lot though for your updates, been following your blog regularly.

Reply
1. Klaas Vaak says:
  
  June 24, 2017 at 1:48 am
  
  1 thing we all need to bear in mind is that the workaround that Mr. Yamamoto has suggested above does not flawlessly work on all of the paywalls. With some paywalls one can be lucky that the workaround works, but other paywalls are too robust & “smart” to be tricked.
  
  Reply
Sang Han (@jjangsangy) says:

July 3, 2017 at 8:36 pm

Turns out you can even make it simpler. The only thing you need to spoof is Referer, which is unfortunately not easy for browsers but definitely possible for Chrome extensions.

But who cares about rules. I had an article summarizer that did some internet scraping. I wanted to set it free on the news, so I thought I would implement this hack and hey, it still works! Hooray! Owe you one Elaine 😉

https://explaintome.herokuapp.com/

Reply
1. Klaas Vaak says:
  
  February 25, 2018 at 9:42 pm
  
  This link is a bummer, nothing there.
  
  Reply
Anonymous says:

July 17, 2017 at 10:47 pm

It was helpful when it worked (I discovered this independently and chose not to publish it). Now that you’ve made it so easy to exploit, they’ll have to do something about it. I wish you had written about this more in the abstract and not explicitly mentioned the newspaper in question.

Reply
Sun W Kim says:

July 18, 2017 at 10:23 am

If you adjust your referer to facebook.com, you can get into wsj.com without much fanfare.

Reply
Pingback: IT Security Weekend Catch Up – July 24, 2017 – BadCyber
gwkl says:

October 20, 2017 at 2:12 pm

Has this been patched? Doesn’t seem to be working for me on WSJ .

Reply
1. Elaine says:
  
  October 21, 2017 at 8:50 pm
  
  still works as far as i can tell. eg, try this url: https://t.co/AS1B0xLS27
  
  Reply
  1. gwkl says:
    
    October 24, 2017 at 3:19 pm
    
    Are you able to view this article: https://blogs.wsj.com/cio/2017/10/20/in-the-digital-economy-education-level-increasingly-defines-wage-potential/
    
    because it does not work for me. Some paid articles seem to load just fine, but this one still doesn’t.
    
    Reply
    1. Elaine says:
      
      October 24, 2017 at 11:09 pm
      
      yes, it works, but i’m also blocking cookies. Try it in incognito mode, does that work?
      
      Reply
Butchmo says:

February 25, 2018 at 8:49 pm

The Chronicle of Higher Education blocks me from accessing its website when using the extension (after adding https://www.chronicle.com to the list of sites, of course). It says I’m a bot.

And the most intriguing thing is that its articles behind the paywall (“premium content”) are fully indexed by Google.

Maybe it has a clever system that allows only the actual Google Bot to access its content?

Reply
1. Klaas Vaak says:
  
  February 25, 2018 at 9:32 pm
  
  The script works for a lot of sites, but not for all sites.
  
  Reply
  1. Butchmo says:
    
    February 26, 2018 at 4:14 pm
    
    Yeah, I know. But the CHE is the first site with porous paywall that I found on which the script does not work. The real Googlebot can index the site, but we can’t.
    
    Reply
    1. Klaas Vaak says:
      
      February 26, 2018 at 8:56 pm
      
      Just a thought: have you added the CHE to the manifest.json file?
      
      Reply
2. Klaas Vaak says:
  
  February 25, 2018 at 9:44 pm
  
  Elaine, is there any way to modify this for Firefox?
  
  Reply
Mark says:

February 28, 2018 at 2:47 am

Hi this script used to work for thetimes.co.uk, as of today it doesn’t. Is there any way it can be fixed? Would be most grateful for any help.

Reply
Pingback: Poweekendowa Lektura 2017-07-24 – bierzcie i czytajcie | Zaufana Trzecia Strona
Pingback: Soft paywalls – FiveFilters.org

Elaine's Idle Mind

and Devil's Workshop
