>>> context weblog
sampling new cultural context
| home | site map | about context | donate | lang >>> español - català |
friday :: march 7, 2003
   
 
word burstiness: scanning online trends

Jon Kleinberg, a professor of computer science at Cornell University, Ithaca, N.Y., has developed a method for a computer to find the topics that dominate a discussion at a particular time by scanning large collections of documents for sudden, rapid bursts of words. Among other tests of the method, he scanned presidential State of the Union addresses from 1790 to the present and created a list of words that eerily reflects historical trends. The technique, he suggests, could have many 'data mining' applications, including searching the Web or studying trends in society as reflected in Web pages.

Kleinberg says he got the idea of searching over time while trying to deal with his own flood of incoming e-mail. He reasoned that when an important topic comes up for discussion, keywords related to the topic will show a sudden increase in frequency. A search for these words that suddenly appear more often might, he theorized, provide ways to categorize messages.

He devised a search algorithm that looks for 'burstiness,' measuring not just the number of times words appear, but the rate of increase in those numbers over time. Programs based on his algorithm can scan text that varies with time and flag the most "bursty" words. "The method is motivated by probability models used to analyze the behavior of communication networks, where burstiness occurs in the traffic due to congestion and hot spots," he explains.
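The article describes the method only in outline. The Python sketch below shows one common way to formalize burst detection, and it rests on several assumptions. It uses a two-state model: in each batch of documents, a word appears either at its baseline rate or at an elevated "burst" rate. Entering the burst state carries a penalty, and a Viterbi-style dynamic program picks the cheapest sequence of states. The function name `burst_states`, the rate ratio `s`, the penalty weight `gamma` and the sample counts are illustrative choices, not details taken from the article.

```python
import math


def burst_states(relevant, totals, s=2.0, gamma=1.0):
    """Label each batch 0 (baseline) or 1 (burst) with a two-state model.

    relevant[t] -- documents in batch t that contain the word
    totals[t]   -- all documents in batch t
    s           -- assumed ratio of the burst rate to the baseline rate
    gamma       -- assumed weight of the penalty for entering a burst
    """
    n = len(totals)
    if n == 0 or sum(totals) == 0:
        return [0] * n
    p0 = sum(relevant) / sum(totals)          # baseline rate of the word
    if not 0 < p0 < 1:
        return [0] * n                        # no contrast between rates possible
    rates = (p0, min(s * p0, 0.9999))         # state 0 = baseline, state 1 = burst
    enter_cost = gamma * math.log(n)          # price of moving from state 0 up to 1

    def emit_cost(state, r, d):
        # Negative log-likelihood of r hits among d documents at this state's rate.
        # The binomial coefficient is dropped: it is identical for both states.
        p = rates[state]
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    def trans_cost(i, j):
        return enter_cost if j > i else 0.0   # leaving a burst is free

    # The model starts in state 0, so beginning in a burst pays enter_cost.
    cost = [trans_cost(0, j) + emit_cost(j, relevant[0], totals[0]) for j in (0, 1)]
    back = []                                 # back[t-1][j]: best predecessor of j at t
    for t in range(1, n):
        new_cost, pointers = [], []
        for j in (0, 1):
            i = min((0, 1), key=lambda k: cost[k] + trans_cost(k, j))
            pointers.append(i)
            new_cost.append(cost[i] + trans_cost(i, j)
                            + emit_cost(j, relevant[t], totals[t]))
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards from the final batch.
    state = min((0, 1), key=lambda k: cost[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical counts: eight batches of 100 messages each.
counts = [2, 3, 2, 20, 25, 18, 3, 2]
print(burst_states(counts, [100] * 8))  # → [0, 0, 0, 1, 1, 1, 0, 0]
```

The sketch looks for a sustained jump in a word's share of each batch, not for raw counts. Raising `gamma` makes it more reluctant to declare a burst. With these counts, the burst covers the three high-frequency batches.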

A few years ago, he suggested that a way to find the most useful Web sites on a particular subject would be to look at the way they are linked to one another. Sites that are 'linked to' by many others are probably 'authorities.' Sites that link to many others are likely to be 'hubs.' The most authoritative sites on a topic would be the ones that are linked to most often by the most active hubs, he reasoned. A variation on this idea is used by Google, and a more formal version is being used in a new search engine called Teoma. >from *Buzzwords of history, revealed by computer scans, indicate new ways of searching the Web*. february 18, 2003
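Kleinberg's HITS algorithm formalizes the hub/authority reasoning above (see the 1998 paper listed under related context). The following is a minimal Python sketch. It assumes a small hypothetical link graph and a fixed number of iterations in place of a convergence test. Each page's authority score is the sum of the hub scores of the pages that link to it. Each hub score is the sum of the authority scores of the pages it links to. Both score sets are normalized every round.

```python
import math


def hits(links, iterations=50):
    """Sketch of hub/authority scoring in the spirit of HITS.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of every page that links in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub update: sum the authority scores of every page linked to.
        hub = {p: sum(auth[dst] for dst in links.get(p, ())) for p in pages}
        # Normalize both score sets to unit Euclidean length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical link graph: three hub pages pointing at three sites.
links = {
    "hub_a": ["site_x", "site_y"],
    "hub_b": ["site_x", "site_y", "site_z"],
    "hub_c": ["site_x"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # → site_x
print(max(hub, key=hub.get))    # → hub_b
```

The two scores reinforce each other. `site_x`, linked from all three hubs, becomes the top authority. `hub_b`, which links to every site, becomes the top hub.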

related context
> daypop word bursts. word bursts are heightened usage of certain words in weblogs within the last couple of days. they indicate what webloggers are writing about right now. feature available since february 26, 2003.
> uniting with only a few random links: small-world networking in simulations. february 4, 2003
> how does 'six degrees of separation' work? explanation is personal networking. august 23, 2000. kleinberg's work is a refinement of an earlier study by steven h. strogatz and duncan watts.
> authoritative sources in a hyperlinked environment by jon m. kleinberg [pdf]. introduces the hits (hyperlink-induced topic search) algorithm. 1998

image: amerika administration word burst graph

03 http://straddle3.net/context/03/en/2003_03_07.html