Friday, May 27, 2005

Mining Social Networks from Email

I recently acquired a couple of new toys--an IBM ThinkPad last month and a Canon Pixma multifunction printer/copier/fax/scanner just today. I go a while between upgrades, so when the new stuff comes in it really blows me away. Today's revelation is optical character recognition, or OCR. How OCR works I have no idea, but here's what it can do:

My regular readers may have already detected that I am a New Yorker magazine junkie. My friends can hardly fail to notice this, since I am always saying, "Yes, and that reminds me of an article I just read in the New Yorker," at which point I take over the conversation for a few minutes. In the olden times (before today) that was more than enough for my friends. But as of today it is just the beginning. Now I can go home to my personal NYer archives (dating from 9-11), grab the issue in question, put it through my scanner, and sit back while my computer receives the entire article in the form of a Word document (with columns, pages, and cartoons all properly configured) or a PDF (with text searching). I leave the rest of the story to your imagination, since this is a copyright-friendly blog.

If any of you just happen to be thinking about email right now, let me say--that reminds me of a great article I just read in the New York Times: "Enron Offers an Unlikely Boost to E-Mail Surveillance." I am a bit embarrassed to be mentioning this article now. It was published very prominently on Sunday. But I have been so preoccupied with my new ThinkPad that real life is apparently passing me by. So thanks to Jim Murphy for clipping the article and handing it to me, in a quaint nod to life before scanners. Jim's gift prompted me to check Patti Anklam's blog and read her review of the article, which she wrote the day after its publication.

The gist of the story is that a huge pile of Enron email is now publicly available. The email provides a detailed look at communication from before the California energy crisis right up to the final bankruptcy scandal. This is an unprecedented resource for sociologists and computer scientists, who have proceeded to demonstrate not only the power of textual analysis (how often do people say "Dynegy" or "bankruptcy" week by week) but also the power of network analysis (who sends email to whom and when, regardless of the content).
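To make those two kinds of analysis concrete, here is a minimal sketch in Python. It is not the researchers' method, just my own toy illustration: the messages, names, and function names are all made up, and a real study would need far more careful text handling.

```python
from collections import Counter
from datetime import date

# Hypothetical toy messages: (sender, recipients, date sent, body text).
MESSAGES = [
    ("ann", ["bob", "cy"], date(2001, 11, 5), "Is bankruptcy on the table?"),
    ("bob", ["ann"], date(2001, 11, 6), "Nobody is saying bankruptcy yet."),
    ("cy", ["ann"], date(2001, 11, 28), "Lunch on Friday?"),
]

def iso_week(day):
    """Return the ISO (year, week number) that a date falls in."""
    year, week, _weekday = day.isocalendar()
    return (year, week)

def weekly_term_counts(messages, term):
    """Textual analysis: how many messages mention `term` in each week."""
    counts = Counter()
    for _sender, _recipients, sent, body in messages:
        if term.lower() in body.lower():
            counts[iso_week(sent)] += 1
    return counts

def weekly_edge_counts(messages):
    """Network analysis: who emailed whom in each week, ignoring content."""
    counts = Counter()
    for sender, recipients, sent, _body in messages:
        for recipient in recipients:
            counts[(iso_week(sent), sender, recipient)] += 1
    return counts
```

Calling weekly_term_counts(MESSAGES, "bankruptcy") gives the week-by-week mention counts, and weekly_edge_counts(MESSAGES) gives the who-to-whom tallies without ever looking at the message bodies.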

The article features a beautiful network diagram:

Note the use of a hierarchical circular layout that places people in three categories: (1) periphery, (2) mid-level, and (3) core. That's a great way not to distract people with unnecessary detail.
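The article does not say how people were assigned to the three rings, so the following Python sketch is only my guess at the general idea: sort people into tiers by how many distinct contacts they have, then place each tier on its own circle. The thresholds (core_min, mid_min), the radii, and the function names are assumptions for illustration.

```python
import math
from collections import defaultdict

# Assumed radii: core in the middle, periphery on the outside.
RADIUS = {"core": 1.0, "mid-level": 2.0, "periphery": 3.0}

def tier_by_degree(edges, core_min=4, mid_min=2):
    """Assign each person a tier from their number of distinct contacts."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    tiers = {}
    for person, contacts in neighbors.items():
        degree = len(contacts)
        if degree >= core_min:
            tiers[person] = "core"
        elif degree >= mid_min:
            tiers[person] = "mid-level"
        else:
            tiers[person] = "periphery"
    return tiers

def circular_positions(tiers):
    """Spread the members of each tier evenly around that tier's circle."""
    positions = {}
    for tier, radius in RADIUS.items():
        members = sorted(p for p, t in tiers.items() if t == tier)
        for i, person in enumerate(members):
            angle = 2 * math.pi * i / len(members)
            positions[person] = (radius * math.cos(angle),
                                 radius * math.sin(angle))
    return positions
```

The resulting (x, y) coordinates can be handed to whatever plotting tool you like; the point is that the reader sees three rings instead of a tangle.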

The Enron analysis is being led by David Skillicorn, Kathleen Carley, and Michael Berry.

Want to try this at home? You can! Investigate your own email communication network by downloading Peter Gloor's TeCFlow.
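If you would rather peek under the hood first, here is a rough Python sketch (nothing to do with how TeCFlow itself works) that tallies who emails whom from message headers, using only the standard library. The function name and the "my_mail.mbox" file name are hypothetical, and I am assuming your mail can be exported in mbox format.

```python
import mailbox
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

def sender_recipient_counts(messages):
    """Count (sender, recipient) pairs from the From, To, and Cc headers."""
    counts = Counter()
    for msg in messages:
        senders = getaddresses(msg.get_all("From", []))
        recipients = getaddresses(
            msg.get_all("To", []) + msg.get_all("Cc", [])
        )
        for _name, sender in senders:
            for _name, recipient in recipients:
                counts[(sender.lower(), recipient.lower())] += 1
    return counts

# A hand-built message standing in for real mail.
SAMPLE = message_from_string(
    "From: Ann <ann@example.com>\n"
    "To: bob@example.com, Cy <CY@example.com>\n"
    "Cc: dee@example.com\n"
    "\n"
    "hi\n"
)

# With a real export (hypothetical file name):
# counts = sender_recipient_counts(mailbox.mbox("my_mail.mbox"))
```

Only the headers are read here, which is the whole appeal of this kind of analysis: you learn the shape of the network without reading anyone's mail.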


Anonymous said...

The New York Times
May 22, 2005
Enron Offers an Unlikely Boost to E-Mail Surveillance

AS an object of modern surveillance, e-mail is both reassuring and troubling. It is a potential treasure trove for investigators monitoring suspected terrorists and other criminals, but it also creates the potential for abuse, by giving businesses and government agencies an efficient means of monitoring the attitudes and activities of employees and citizens.

Now the science of e-mail tracking and analysis has been given an unlikely boost by a bitter chapter in the history of corporate malfeasance - the Enron scandal.

In 2003, the Federal Energy Regulatory Commission posted the company's e-mail on its Web site, about 1.5 million messages. After duplicates were weeded out, a half-million e-mails were left from about 150 accounts, including those of the company's top executives. Most were sent from 1999 to 2001, a period when Enron executives were manipulating financial data, making false public statements, engaging in insider trading, and the company was coming under scrutiny by regulators.

Because of privacy concerns, large e-mail collections had not previously been made publicly available, so this marked the first time scientists had a sizable e-mail network to experiment with.

"While it's sad for the people at Enron that this happened, it's a gold mine for researchers," said Dr. David Skillicorn, a computer scientist at Queen's University in Canada.

Scientists had long theorized that tracking the e-mailing and word usage patterns within a group over time - without ever actually reading a single e-mail - could reveal a lot about what that group was up to. The Enron material gave Dr. Skillicorn's group and a handful of others a chance to test that theory, by seeing, first of all, if they could spot sudden changes.

For example, would they be able to find the moment when someone's memos, which were routinely read by a long list of people who never responded, suddenly began generating private responses from some recipients? Could they spot when a new person entered a communications chain, or if old ones were suddenly shut out, and correlate it with something significant?

There may be commercial uses for the same techniques. For example, they may enable advertisers to do word searches on individual e-mail accounts and direct pitches based on word frequency.

"Will you let your e-mail be mined so some car dealer can send information to you on car deals because you are talking to your friends about cars?" asks Dr. Michael Berry, a computer scientist at the University of Tennessee who has been analyzing the data.

Working with the Enron e-mail messages, about a half-dozen research groups can report that after just a few months of study they have already learned that they can glean telling information and are refining their ability to sort and analyze it.

Dr. Kathleen Carley, a professor of computer science at Carnegie Mellon University, has been trying to figure out who were the important people at Enron by the patterns of who e-mailed whom, and when and whether these people began changing their e-mail communications when the company was being investigated.

Companies have organizational charts, but they reveal little about how things really work, Dr. Carley said. Companies actually operate through informal networks, which can be revealed by analyzing "who spends time talking to whom, who are the power brokers, who are the hidden individuals who have to know what's going on," she said.

With the Enron data, Dr. Carley continued, "what you see is that prior to the investigation there is this surge in activity among the people at the top of the corporate ladder." But she adds, "as soon as the investigation starts, they stop communicating with each other and start communicating with lawyers." It showed, she says, "that they were becoming very nervous."

The analyses also found someone so junior she did not show up on organization charts but who, whichever way the e-mail data was mined, "shows up as a person of interest," Dr. Skillicorn said, in the language of intelligence analysts. In the investigation of a terror network, pinpointing such a person could be of enormous significance.

Dr. Berry said the e-mail traffic patterns tracked major events, like the manipulation of California energy prices. "We could see how things built up right before the bankruptcy," he said.

There were e-mail surges with each crisis, pointing to a problem that was consuming Enron employees. And in each crisis, there were features of certain e-mail messages - word choices, routing patterns - that allowed the computer scientists to isolate them from the morass of irrelevant personal or business messages.

One thing that didn't show up when the researchers screened for changes in word use was guardedness, said Dr. Skillicorn, a failure that was revealing in itself. Ordinarily, he said, when people are being deceptive they are more self-conscious, and their word use becomes simpler, as though they are trying too hard to sound natural.

But that apparently never occurred at Enron because its employees remained unconcerned while they engaged in illegal activity. "It wasn't a case of keeping a low profile," Dr. Skillicorn said. "They didn't worry about the story they were telling."

The scientists who are studying the Enron data said they assumed intelligence agencies are doing similar classified analyses on international e-mail traffic. Since World War II, for example, a five-nation consortium of the United States, Canada, Britain, Australia and New Zealand has cooperated in a vast communications collection and analysis program called Echelon, one that has assumed increasing importance since the terror attacks of Sept. 11, 2001.

No one in the unclassified world knows precisely what is being done with the Echelon data. But, Dr. Berry said, surveillance in the civilian world could one day have troubling consequences. It could allow companies, without ever actually infringing on e-mail conversations, to track employee attitudes and activities closely and easily.

"They can monitor discussions without actually isolating individuals," Dr. Berry said. "They can assess morale. If they make a cut in salaries, how long does the unhappiness go on? You could track topics and get a sense of how people are responding to policies and flag potential hot spots." Or, he said, managers might be able to learn which people have too much time on their hands.

And, as Dr. Skillicorn notes, if you try to write bland e-mail messages with hidden communications, chances are the programs will pick those out, too.

"It's clearly Orwellian," Dr. Berry said. "And I know that freaks people out."
