If you’ve ever uploaded child porn to Facebook, Google, or Dropbox, you might have noticed that you are now in jail.
Online service providers automatically detect and report child exploitation images in user files. Don’t worry, Dropbox employees aren’t actually looking through your personal photos. I mean, they probably are, but the porn detection is done by computer.
A computer built by humans, of course. I always imagined a team of engineers in a basement, scouring Tor for obscene images to feed into a convolutional neural net.
During a recent visit with Facebook’s Machine Learning team, I finally got to ask:
Tell me about child porn. How did you build the detection model? What is it like to be the guy labeling the training data? Do you have to report him to the Feds if he seems to be enjoying his job too much? How do you keep your employees from losing faith in humanity?
That’s not how they do it.
Those who watch Dateline know that child pornography is illegal to knowingly possess [1]. And when a computer in your control has child pornography on it, you knowingly possess it. There’s no “I only possessed it so that I could train computers to recognize it” defense in the statute.
So how do major tech companies write software to detect child porn without ever possessing any of it themselves?
The National Center for Missing and Exploited Children (NCMEC) is granted an exemption to maintain a database of known child pornography images [2]. Using Microsoft’s PhotoDNA technology [3], each image is converted to greyscale, resized, and subdivided into a grid. A histogram of intensity gradients is created for each cell, then hashed.
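PhotoDNA itself is proprietary, so the exact grid size, bin counts, and quantization aren’t public. But the general shape of a greyscale-resize-grid-histogram signature looks roughly like the sketch below; every parameter here is my own placeholder, not Microsoft’s.

```python
# A toy perceptual signature in the spirit of the description above:
# greyscale -> resize -> grid of cells -> per-cell gradient histogram -> compact vector.
# GRID, BINS, and SIZE are illustrative guesses, not PhotoDNA's real parameters.
import numpy as np
from PIL import Image

GRID = 6     # 6x6 grid of cells (assumed)
BINS = 8     # gradient-magnitude histogram bins per cell (assumed)
SIZE = 96    # normalized edge length in pixels (assumed)

def signature(path: str) -> np.ndarray:
    """Return a GRID*GRID*BINS byte vector of quantized gradient histograms."""
    img = Image.open(path).convert("L").resize((SIZE, SIZE))   # greyscale + resize
    pixels = np.asarray(img, dtype=np.float32)

    # Intensity gradients along y and x, then their magnitude.
    gy, gx = np.gradient(pixels)
    magnitude = np.hypot(gx, gy)

    cell = SIZE // GRID
    features = []
    for row in range(GRID):
        for col in range(GRID):
            block = magnitude[row * cell:(row + 1) * cell, col * cell:(col + 1) * cell]
            hist, _ = np.histogram(block, bins=BINS, range=(0.0, 255.0))
            features.append(hist)

    vec = np.concatenate(features).astype(np.float32)
    vec /= np.linalg.norm(vec) + 1e-9            # normalize so brightness and scale matter less
    return np.round(vec * 255).astype(np.uint8)  # quantize into a compact "hash-like" vector
```

Note that this is a fuzzy signature rather than a cryptographic hash, which is the whole point: similar images produce similar signatures.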
Online service providers can store these hash values, because it’s impossible to reconstruct an image from them.
When a user uploads a new image, the service provider runs it through the same process and compares the resulting hash values against the existing database. If the image shares enough values with a registered hash set, it gets flagged.
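A minimal sketch of that matching step, reusing the signature() function from the sketch above; the similarity threshold is an arbitrary illustration, not a real operating point.

```python
# Toy matching of an uploaded image's signature against a database of known signatures.
import numpy as np

MATCH_THRESHOLD = 0.90   # assumed cutoff for "shares enough values"

def is_flagged(upload_sig: np.ndarray, known_sigs: list[np.ndarray]) -> bool:
    """Flag the upload if it is close enough to any registered signature."""
    a = upload_sig.astype(np.float32)
    for known in known_sigs:
        b = known.astype(np.float32)
        # Cosine similarity: tolerant of small crops and re-encodes, cheap to compute.
        similarity = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if similarity >= MATCH_THRESHOLD:
            return True
    return False
```

In production this would be an indexed nearest-neighbor lookup over millions of entries rather than a linear scan, but the flag-on-threshold logic is the same idea.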
While PhotoDNA can recognize images that have been cropped, resized, or altered, only previously registered images can ever be detected. Never-before-seen child exploitation images would pass through Gmail undetected.
I’m not going to speculate on the mechanics behind building a child-porn detector for never-before-seen data (not publicly, anyway), except to say that it can be done. But probably not without violating 18 U.S.C. § 2252.
Why can’t Facebook and Google mind their own business?
Possession of child porn is super illegal, and tech companies are responsible for the data on their servers. There’s no “I only possessed it because one of my idiot users uploaded it” defense. However, there is a safe harbor affirmative defense if the service provider reports the image to law enforcement “promptly and in good faith” [4].
Google can’t even mind its own business if you’re hosting images on your own personal web server, because Googlebot trawls the internet. Online service providers are bound by a duty to report [5]. If Google’s web crawler happens to crawl an exploitative image on your home server, it must dutifully report it.
Wait, what about my Privacy?
Google and Facebook and friends never promised you privacy. In fact, they explicitly promise you the opposite of privacy when you implicitly accept their Terms by using their service.
Moral of the story: Encrypt your stuff. Also, don’t knowingly possess child porn.
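For the “encrypt your stuff” part, here’s a minimal sketch of client-side encryption before a file ever touches a sync folder, using the Fernet recipe from Python’s cryptography package; the file names are placeholders.

```python
# Encrypt a file locally, and sync only the opaque ciphertext.
# Requires: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # keep this key somewhere that is NOT synced to the cloud
cipher = Fernet(key)

with open("vacation.jpg", "rb") as f:        # placeholder file name
    ciphertext = cipher.encrypt(f.read())

with open("vacation.jpg.enc", "wb") as f:    # this blob is all the provider ever sees
    f.write(ciphertext)

# Later, back on your own machine:
with open("vacation.jpg.enc", "rb") as f:
    original = cipher.decrypt(f.read())
```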
References:
1. 18 U.S.C. § 2252
2. 18 U.S.C. § 2258C
3. Microsoft’s PhotoDNA: Protecting children and businesses in the cloud
4. 18 U.S.C. § 2252A
5. 18 U.S.C. § 2258A