The National Science Foundation extols its new Dark Web
project, "which aims to systematically collect and analyze all terrorist-generated content on the Web."
This is where the Dark Web project comes in. Using advanced techniques such as Web spidering, link analysis, content analysis, authorship analysis, sentiment analysis and multimedia analysis, [Hsinchun] Chen and his team [Artificial Intelligence Lab at the University of Arizona] can find, catalogue and analyze extremist activities online. According to Chen, scenarios involving massive amounts of information and data points are ideal challenges for computational scientists, who use the power of advanced computers and applications to find patterns and connections where humans cannot.
How, exactly, do they plan to analyze the data?
One of the tools developed by Dark Web is a technique called Writeprint, which automatically extracts thousands of multilingual, structural, and semantic features to determine who is creating 'anonymous' content online. Writeprint can look at a posting on an online bulletin board, for example, and compare it with writings found elsewhere on the Internet. By analyzing these certain features, it can determine with more than 95 percent accuracy if the author has produced other content in the past. The system can then alert analysts when the same author produces new content, as well as where on the Internet the content is being copied, linked to or discussed.
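The write-up doesn't describe Writeprint's internals, but the basic stylometric idea (turn a text into a vector of style features, then compare vectors) can be sketched in a few lines of Python. The function-word list, feature choices, and sample texts below are all invented for illustration; the real system reportedly extracts thousands of multilingual, structural, and semantic features.

```python
import re
from collections import Counter
from math import sqrt

# Toy feature set: a handful of English function words. (Hypothetical;
# Writeprint's actual feature set is far larger and multilingual.)
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "was", "it"]

def features(text):
    """Extract a toy stylometric vector: function-word rates,
    average word length, and punctuation density."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    counts = Counter(words)
    vec = [counts[w] / n for w in FUNCTION_WORDS]
    vec.append(sum(len(w) for w in words) / n)           # average word length
    vec.append(sum(text.count(p) for p in ",;:!?") / n)  # punctuation per word
    return vec

def cosine(a, b):
    """Cosine similarity between two feature vectors (0 to 1 here)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# Invented sample texts: a known writing and an "anonymous" posting.
known = "The report was filed in the morning, and it was clear that it mattered."
anon = "It was clear, in the end, that the report was what mattered to the town."
print(f"style similarity: {cosine(features(known), features(anon)):.2f}")
```

A real attribution system would compare an anonymous posting against feature vectors for many candidate authors and flag the closest match, which is presumably roughly what lets Writeprint alert analysts when a known author resurfaces.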
Robot literary critics fighting terrorists? Computer-assisted identification of anonymous authors has been done before, though on a much smaller scale. Don Foster, Shakespeare scholar and "literary forensics" pioneer, developed the field in the 1980s, when he argued that the previously unattributed 1612 "A Funeral Elegy for Master William Peter," signed only "W.S.," was indeed by William Shakespeare. In 1996, Foster identified journalist Joe Klein as the anonymous author of Primary Colors, a political satire of Bill Clinton's 1992 presidential campaign. (If you insist, you can read all about it in Foster's Author Unknown.)
The problem is that Foster uses computers only to search electronic texts. There's no algorithm that simply finds a match. Foster caught Klein by (among other things) his use of "tarmac-hopping," a compound that appeared nowhere else but in the journalism and fiction of Joe Klein. Just pick any distinctive phrase (even an innocuous one), like "Italian restaurant, London," and run it through a search engine. You won't find many matches, but the ones that do pop up will be notable.
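Foster's "tarmac-hopping" trick is, at bottom, an exact-match search over a corpus, which a computer handles trivially. A toy sketch (the filenames, snippets, and the corpus itself are made up for illustration):

```python
# Hypothetical mini-corpus of candidate texts.
corpus = {
    "klein_column.txt": "the press corps went tarmac-hopping across Iowa all week",
    "anon_novel_excerpt.txt": "weeks of tarmac-hopping had worn the candidate down",
    "other_reporter.txt": "the campaign plane idled on the tarmac for hours",
}

def find_phrase(phrase, docs):
    """Return the names of documents containing an exact
    (case-insensitive) occurrence of the phrase."""
    needle = phrase.lower()
    return [name for name, text in docs.items() if needle in text.lower()]

# Only the two texts sharing the rare compound should match.
print(find_phrase("tarmac-hopping", corpus))
```

The hard part isn't the search; it's knowing, as Foster did, which phrase is distinctive enough to be worth searching for.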
The NSF may be over-hyping the Dark Web system. The University of Arizona's website only promises that Dark Web "will include tools supporting search, browse, and analysis capabilities."
It is certainly possible to identify writers by their use of certain words: David Foster Wallace's use of "ontological" or Hunter Thompson's use of "atavistic" pop into my mind almost immediately. (Any others? Leave them in the comments.) But could a computer do the necessary parsing to positively identify a writer's style?
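A computer can at least do the crude version of this: rank the words an author uses disproportionately often relative to a background corpus. The log-ratio below is a bare-bones sketch (real stylometry uses proper smoothing and much larger corpora), and the sample texts are invented:

```python
import re
import math
from collections import Counter

def word_rates(text):
    """Relative frequency of each word in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    total = max(len(words), 1)
    return {w: c / total for w, c in Counter(words).items()}

def distinctive(author_text, background_text, smoothing=1e-4):
    """Rank the author's words by how much more often they appear
    in the author's text than in the background corpus."""
    a, b = word_rates(author_text), word_rates(background_text)
    score = lambda w: math.log((a[w] + smoothing) / (b.get(w, 0.0) + smoothing))
    return sorted(a, key=score, reverse=True)

author = "the atavistic crowd surged, atavistic and savage, toward the gate"
background = "the crowd surged toward the gate and the savage noise grew"
print(distinctive(author, background)[:3])  # "atavistic" should rank first
```

Single giveaway words are brittle, though, since an imitator can borrow them; that is why systems like Writeprint lean on thousands of low-level features rather than a few signature vocabulary items.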