So J.K. Rowling was recently outed as the author of The Cuckoo's Calling, a crime novel she published under the pen name Robert Galbraith,
http://www.theguardian.com/books/2013/jul/24/jk-rowling-robert-galbraith-harry-potter
using some kind of computer analysis,
http://entertainment.time.com/2013/...nsic-linguist-explains-how-he-figured-it-out/
But couldn’t an author trying to disguise herself just use different words? It’s not so easy, Juola explains. Word length, for example, is something the author might think to change — sure, some people are more prone to “utilize sesquipedalian lexical items,” he jokes, but that can change with their audiences. What the author won’t think to change are the short words, the articles and prepositions. Juola asked me where a fork goes relative to a plate; I answered “on the left” and wouldn’t ever think to change that, but another person might say “to the left” or “on the left side.”
As one part of his work, Juola uses a program — Java Graphical Authorship Attribution Program, which is a free download available for anyone to play around with — to pull out the hundred most frequent words across an author’s vocabulary. This step eliminates rare words, character names and plot points, leaving him with words like of and but, ranked by usage. Those words might seem inconsequential, but they leave an authorial fingerprint on any work.
“Prepositions and articles and similar little function words are actually very individual,” Juola says. “It’s actually very, very hard to change them because they’re so subconscious.”
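For anyone curious what the "hundred most frequent words" step might look like in practice, here's a rough Python sketch. It is NOT JGAAP's code or Juola's actual pipeline. It rests on a few assumptions: a crude regex tokenizer, a vocabulary built from the 100 most common words across all the texts being compared, and cosine distance as the similarity measure (just one of many measures people use). The function names and toy texts are made up for illustration.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, keep runs of ASCII letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def top_words(texts, n=100):
    # The n most frequent words across all texts; in real prose these are
    # dominated by function words like "the", "of", "on", "but".
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def profile(text, vocab):
    # Relative frequency of each vocabulary word in one text.
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in vocab]


def cosine_distance(a, b):
    # 0.0 means the profiles point the same way; 1.0 means no shared words.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def attribute(unknown, candidates, n=100):
    # candidates: {author_name: sample_text} (hypothetical input format).
    # Returns the closest author plus every author's distance.
    vocab = top_words(list(candidates.values()) + [unknown], n)
    target = profile(unknown, vocab)
    scores = {
        author: cosine_distance(profile(text, vocab), target)
        for author, text in candidates.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy samples echoing the fork-and-plate example; far too short for real use.
    candidates = {
        "A": "the fork goes on the left of the plate and the knife on the right",
        "B": "put a fork to the left side of a plate, a knife to a right side",
    }
    unknown = "the cup goes on the left of the saucer and the spoon on the right"
    best, scores = attribute(unknown, candidates)
    print(best, scores)
```

In the toy run, the unknown text shares no nouns with either candidate, yet it lands closest to "A". That's because its "the ... on the left of the ..." pattern matches A's little words, not B's "a ... to the left side of a ...". Real attribution work uses much longer samples and more careful statistics than this.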
Get on the line or get in the line?
http://www.thecoli.com/threads/spin-do-you-get-on-a-line-or-get-in-a-line.119691/
No biggie, right? We figure these things out all the time when old posters come back with new names.
But
http://phenomena.nationalgeographic...ames-madison-barack-obama-and-the-rest-of-us/
With computers and sophisticated statistical analyses, researchers are mining all sorts of famous texts for clues about their authors. Perhaps more surprising: They’re also mining not-so-famous texts, like blogs, tweets, Facebook updates and even Amazon reviews for clues about people’s lifestyles and buying habits. The whole idea is so amusingly ironic, isn’t it? Writers choose words deliberately, to convey specific messages. But those same words, it turns out, carry personal information that we don’t realize we’re giving out.
...
The words of many of us, in fact, are probably being mined at this very moment. Some researchers, Juola told me, are working on analyzing product reviews left on websites like Amazon.com. These investigations could root out phony glowing reviews left by company representatives, for example, or reveal valuable demographic patterns.
“They might say, hmmm, that’s funny, it looks like all of the women from the American West are rating our product a star and a half lower than men from the northeast, so obviously we need to do some adjustment of our advertisements,” he says. “Not many companies are going to admit to doing this kind of thing, but anytime you’ve got some sort of investigation going on, whether police or security clearance or a job application, one of the things you’re going to look at is somebody’s public profile on the web. Anything is fair game.”