Image credit: Glenn Carstens-Peters
Sometimes privacy fears are unfounded. Sometimes they’re just misdirected. And sometimes things are far more frightening than you imagined.
In December 2013, Fairfax ran the disturbing headline: “Facebook saves everything you type — even if you don’t publish it”. Even if you delete your words, “the code that powers Facebook still knows what you typed,” wrote Jennifer Golbeck in the original version of her story at Slate. “It turns out the things you explicitly choose not to share aren’t entirely private.” It wasn’t quite true, and I’ll come back to that.
Far more frightening are “session replay scripts”. These go way beyond the third-party analytics scripts that most commercial websites use to record the pages you visit and the searches you make. Session replay scripts record everything. Every word typed, every mouse move made, and even what’s on the web page you’re looking at.
“Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder,” wrote Steven Englehardt from the Center for Information Technology Policy (CITP) at Princeton University.
CITP studied seven services (Yandex, FullStory, Hotjar, UserReplay, Smartlook, Clicktale, and SessionCam) which between them were used on 482 of the top 50,000 sites. They were riddled with privacy-breaching vulnerabilities, often failing to mask passwords, credit card numbers, or dates of birth. Three of them failed to encrypt the playback of user sessions with HTTPS.
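To see how masking can fail, here’s a minimal sketch — my invention, not any vendor’s actual code — of naive redaction that masks recorded input only when the field type flags it as sensitive. A credit card number or date of birth typed into an ordinary text field sails straight through:

```python
# Hypothetical sketch of field-type-based redaction, the kind of approach
# CITP found wanting. Not the code of any of the seven services studied.

SENSITIVE_TYPES = {"password"}  # the only fields this script knows to mask

def redact(events):
    """Mask recorded (field_type, text) events only when the type says so."""
    out = []
    for field_type, text in events:
        if field_type in SENSITIVE_TYPES:
            out.append((field_type, "*" * len(text)))
        else:
            out.append((field_type, text))  # shipped to the server verbatim
    return out

session = [
    ("password", "hunter2"),
    ("text", "4111 1111 1111 1111"),  # a card number in a plain text box
    ("text", "1970-01-01"),           # a date of birth
]

print(redact(session))
```

The password is starred out, but the card number and birth date are recorded in full — the by-field-type heuristic has no idea what the text *means*.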
One company even offered a “co-browse” feature, where a website’s operator could literally watch user sessions live.
“The stated purpose of this data collection includes gathering insights into how users interact with websites and discovering broken or confusing pages. However the extent of data collected by these services far exceeds user expectations; text typed into forms is collected before the user submits the form, and precise mouse movements are saved, all without any visual indication to the user. This data can’t reasonably be expected to be kept anonymous. In fact, some companies allow publishers to explicitly link recordings to a user’s real identity,” Englehardt wrote in November 2017.
Within weeks, cybercriminals had cottoned on to this. By February 2018, there were reports of malicious web browser extensions injecting their own session-tracking scripts, giving the crims this same detailed information.
Getting back to that Facebook tracking story, as P Henry explained, Golbeck had misinterpreted some research into how often people change their mind before posting. The researchers weren’t looking at what people typed, only at whether they started typing but didn’t post.
“Our results indicate that 71% of users exhibited some level of last-minute self-censorship,” they wrote. Posts were censored more frequently than comments. “Furthermore, we find that: people with more boundaries to regulate censor more; males censor more posts than females and censor even more posts with mostly male friends than do females, but censor no more comments than females; people who exercise more control over their audience censor more content; and, users with more politically and age diverse friends censor less, in general.”
As Golbeck conceded, “The Facebook rep I spoke with agreed that the company isn’t collecting the text of self-censored posts. But it’s certainly technologically possible.”
It’s more than that. It’s technologically trivial. It should be obvious, but we rarely think through the logic. If you type text into an app, it can use that data any way it likes, not just for the app’s stated purpose. And if the app connects to the internet — which, of course, most do — then it can send the data anywhere it likes.
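Just how trivial? This toy sketch — invented names, not any real app’s code — shows that a compose box holds every intermediate state of your draft the moment you type it, whether or not you ever press post:

```python
# Hypothetical compose box. Illustrates that an app has your keystrokes
# the instant they're typed, post or no post. All names are invented.

class ComposeBox:
    def __init__(self):
        self.draft = ""
        self.telemetry = []  # stand-in for "anywhere it likes"

    def keystroke(self, ch):
        self.draft += ch
        self.telemetry.append(self.draft)  # every intermediate state captured

    def delete_all(self):
        self.draft = ""  # "deleted" from the user's point of view...

box = ComposeBox()
for ch in "never mind":
    box.keystroke(ch)
box.delete_all()

print(repr(box.draft))          # '' — the user sees nothing
print(repr(box.telemetry[-1]))  # 'never mind' — the app still has it
```

Deleting the text clears the screen, not the record. Whether the captured keystrokes go anywhere is purely a policy decision by the developer.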
Then there are third-party keyboards, apps that replace the standard keyboard in iOS or Android with a fancier one — to provide better spell-checking, more emojis, whatever. These keyboards can intercept what you type in every app, because that’s the point.
“Major mobile device platforms allow users to replace built-in keyboard apps with third-party alternatives, which have the potential to capture, leak and misuse the keystroke data they process,” warns Lenny Zeltser. “iOS places greater restrictions on keyboards than do Android operating systems; however, even Apple cannot control what keyboard developers do with keystroke data if users allow these apps to communicate over the network.”
One of the more popular keyboards is SwiftKey, which uses artificial intelligence to learn your writing style and predict the next word you’ll type. Its makers are clear about what they do and don’t do. “All personal and language data generated by SwiftKey is stored locally on your device and is never transferred,” they write, unless you create an account to share your so-called personal language model between devices. “SwiftKey does not learn anything from fields marked as password fields, nor does it remember long numbers such as credit card numbers.” But unless you can reverse engineer the app, you have to take that on trust, and you have to assume SwiftKey got their programming right.
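SwiftKey’s stated policy amounts to a filter on what the learning engine is allowed to ingest. A hedged sketch of that kind of filter — my invention, not SwiftKey’s code, with an arbitrary digit threshold — might look like:

```python
import re

# Hypothetical learning filter in the spirit of SwiftKey's stated policy:
# skip password fields, and skip long digit runs such as card numbers.
# Not SwiftKey's actual code; the 8-digit threshold is a guess.

LONG_NUMBER = re.compile(r"\d{8,}")

def should_learn(text, is_password_field):
    if is_password_field:
        return False  # never learn from fields marked as passwords
    digits_only = re.sub(r"[\s-]", "", text)  # "4111 1111..." -> "41111111..."
    if LONG_NUMBER.search(digits_only):
        return False  # don't remember long numbers
    return True

print(should_learn("see you at lunch", False))     # True
print(should_learn("hunter2", True))               # False
print(should_learn("4111 1111 1111 1111", False))  # False
```

The catch is exactly the one the article raises: unless you reverse engineer the app, you can’t verify that a check like this exists, or that it’s correct — one missed field type and the data flows.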
As you might expect, some apps are straight-up dodgy, either because the developers intended them to be, or they somehow got perverted to the dark side. Last year Google pulled more than 500 apps because they’d been built using a component that contained a back door for the bad guys to install spyware.