How the Twitter App Bypasses Paywalls
by Isoroku Yamamoto
The Wall Street Journal has ended its practice of allowing special access for search engines. This means that a human visitor can no longer bypass the paywall by spoofing Google’s HTTP request headers.
However, subscription-based publications face a problem when users click on a link through Twitter or Facebook on a mobile device. Social media apps implement their own in-app browsers, which generally do not retain cookies. As a result, websites that require a login must ask for it again every time the app is reopened.
This makes for a cumbersome user experience. Thus, publications like the Wall Street Journal disable login checks when a page request appears to come from Twitter.
They do this by inspecting HTTP request headers. The important headers are Referer and User-Agent.
When a link is shared on Twitter, the URL is shortened to something like “https://t.co/9Mk58nL3xJ”. This goes to a Twitter server, which redirects the browser to the intended destination. Websites determine whether Twitter initiated the redirect by checking that the HTTP Referer string begins with “https://t.co/”. The rest of the string is ignored.
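The prefix test described above can be sketched in a few lines of JavaScript. This is a hypothetical server-side check written for illustration, not any publisher's actual code; the function name `cameViaTwitterShortener` and the constant name are assumptions.

```javascript
// Hypothetical sketch of the Referer test described above.
// Assumption: only the "https://t.co/" prefix matters; the rest is ignored.
const TWITTER_SHORTENER_PREFIX = "https://t.co/";

function cameViaTwitterShortener(referer) {
  // A missing header arrives as undefined, so guard before calling startsWith.
  return typeof referer === "string" &&
    referer.startsWith(TWITTER_SHORTENER_PREFIX);
}

console.log(cameViaTwitterShortener("https://t.co/9Mk58nL3xJ")); // true
console.log(cameViaTwitterShortener("https://www.google.com/")); // false
```

Because nothing after the prefix is inspected, any made-up path after “https://t.co/” passes a check of this kind.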
A web request from Twitter further identifies itself through the User-Agent header, which might look something like “Mobile/14C92 Twitter for iPhone”.
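A User-Agent test of this kind might be as simple as a substring match. The sketch below is an assumption about how such a check could be written, not a description of what any particular site does; `looksLikeTwitterApp` is a hypothetical name.

```javascript
// Hypothetical sketch: treat a request as coming from the Twitter app
// when its User-Agent contains the marker quoted above.
// Assumption: a plain substring match; real sites may check more than this.
const TWITTER_APP_MARKER = "Twitter for iPhone";

function looksLikeTwitterApp(userAgent) {
  return typeof userAgent === "string" &&
    userAgent.includes(TWITTER_APP_MARKER);
}

const sampleTwitterUA =
  "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) " +
  "AppleWebKit/602.1.32 (KHTML, like Gecko) Mobile/14C92 Twitter for iPhone";
const sampleGooglebotUA =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

console.log(looksLikeTwitterApp(sampleTwitterUA));   // true
console.log(looksLikeTwitterApp(sampleGooglebotUA)); // false
```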
By submitting this information in request headers, any web browser can appear to be the Twitter app. It is easy to do this using a Chrome extension.
The following builds on top of last year’s tutorial for mimicking Google’s web crawler.
1. Use the same manifest.json file as before. Take care to list both the http:// and https:// versions of the sites you are interested in, as many publishers now use SSL.
2. Modify the background.js file. The modified version should look like the one below. Note that all cookies are blocked.
var VIA_TWITTER = ["wsj.com"];

// HTTP header names are case-insensitive, so compare in lower case.
function isHeader(header, name) {
  return header.name.toLowerCase() === name;
}

function changeRefer(details) {
  var foundReferer = false;
  var foundUA = false;

  // Pose as the Twitter app for listed sites, and as Googlebot otherwise.
  var useTwitter = VIA_TWITTER.some(function(url) {
    return details.url.includes(url);
  });

  var reqHeaders = details.requestHeaders.filter(function(header) {
    // block cookies by default
    return !isHeader(header, "cookie");
  }).map(function(header) {
    if (isHeader(header, "referer")) {
      header.value = setRefer(useTwitter);
      foundReferer = true;
    }
    if (isHeader(header, "user-agent")) {
      header.value = setUserAgent(useTwitter);
      foundUA = true;
    }
    return header;
  });

  // append Referer and User-Agent if the request did not already carry them
  if (!foundReferer) {
    reqHeaders.push({ "name": "Referer", "value": setRefer(useTwitter) });
  }
  if (!foundUA) {
    reqHeaders.push({ "name": "User-Agent", "value": setUserAgent(useTwitter) });
  }
  return {requestHeaders: reqHeaders};
}

function blockCookies(details) {
  // Filter instead of splicing inside a loop: splicing shifts the indices
  // and skips the header that follows each removed one.
  var resHeaders = details.responseHeaders.filter(function(header) {
    return !isHeader(header, "set-cookie");
  });
  return {responseHeaders: resHeaders};
}

function setRefer(useTwitter) {
  if (useTwitter) return "https://t.co/T1323aaaa";
  else return "https://www.google.com/";
}

function setUserAgent(useTwitter) {
  if (useTwitter) return "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) AppleWebKit/602.1.32 (KHTML, like Gecko) Mobile/14C92 Twitter for iPhone";
  else return "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
}

chrome.webRequest.onBeforeSendHeaders.addListener(changeRefer, {
  urls: ["<all_urls>"],
  types: ["main_frame"],
}, ["requestHeaders", "blocking"]);

chrome.webRequest.onHeadersReceived.addListener(blockCookies, {
  urls: ["<all_urls>"],
  types: ["main_frame"],
}, ["responseHeaders", "blocking"]);
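Step 1 asks for both the http:// and https:// versions of each site. Rather than typing them by hand, the entries can be generated. The helper below is a hypothetical convenience, not part of the original extension; it assumes the manifest lists hosts as Chrome match patterns of the form `scheme://*.domain/*`, which may differ from the manifest.json in the earlier tutorial.

```javascript
// Hypothetical helper: expand bare domains into Chrome match patterns
// covering both schemes. The "*." prefix matches the domain itself
// and any of its subdomains.
function toMatchPatterns(domains) {
  var patterns = [];
  domains.forEach(function(domain) {
    patterns.push("http://*." + domain + "/*");
    patterns.push("https://*." + domain + "/*");
  });
  return patterns;
}

// Print the entries so they can be pasted into manifest.json.
console.log(JSON.stringify(toMatchPatterns(["wsj.com"]), null, 2));
```

Running this with Node prints a JSON array containing "http://*.wsj.com/*" and "https://*.wsj.com/*".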
Save both files in the same directory. The updated source code can also be downloaded here.
Now type chrome://extensions/ in the browser address bar.
Reload the old extension, or load it as an unpacked extension if you have not previously done so. Enable the Chrome extension and visit wsj.com.
There is always a tradeoff between security and usability. The fastest way to compromise a computer system is to accommodate lazy users. Or worse yet, accommodate lazy programmers.