Tuesday, August 16, 2005

Social Network Analysis in an Imperfect World

One of the rules of thumb you hear in SNA circles is that you need an 80% response rate to have a reasonably valid survey. Having used that rule for a while without asking why, I am now glad to see some explanation. A new paper in press by Borgatti, Carley, and Krackhardt shows how different measurements of network centrality degrade with imperfect data.

For example, suppose you want to know who the ten most influential people in an organization are. Naturally, you decide to measure influence with social network analysis. The problem is that you don't get all the data. What are the chances that any of the actual top ten influencers are included in your calculated list?

With a few wonderfully simplifying assumptions, BCK come up with a rough and ready answer. If you are missing just 5% of the network data, chances are that your list of ten influencers has three rogues in it. Let 10% of your data slip through your fingers and your list of ten is probably only a little better than half right. If you're missing 25% of your data, then most of your "top ten" list is really just a selection of the relatively central influencers, and not a top ten at all.
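To get a feel for this kind of experiment, here is a minimal Python sketch. It is not BCK's actual procedure. It assumes an Erdos-Renyi random graph, uses degree as the only centrality measure, and drops edges uniformly at random. The function names and the default parameters (100 nodes, edge probability 0.1, 200 trials) are my own illustrative choices, so expect the numbers it prints to differ from the paper's.

```python
import random

def random_graph(n, p, rng):
    """Edge list of an Erdos-Renyi G(n, p) random graph (an assumed test bed)."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def top_k_by_degree(n, edges, k):
    """Set of the k highest-degree nodes, ties broken by lower node id."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    ranked = sorted(range(n), key=lambda v: (-degree[v], v))
    return set(ranked[:k])

def mean_top_k_overlap(n=100, p=0.1, missing=0.05, k=10, trials=200, seed=0):
    """Average count of true top-k nodes that still appear in the observed top-k."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        edges = random_graph(n, p, rng)
        true_top = top_k_by_degree(n, edges, k)
        # Simulate missing data by keeping a random subset of the edges.
        keep = len(edges) - round(missing * len(edges))
        observed = rng.sample(edges, keep)
        total += len(true_top & top_k_by_degree(n, observed, k))
    return total / trials

if __name__ == "__main__":
    for missing in (0.05, 0.10, 0.25):
        print(f"{missing:.0%} missing:", mean_top_k_overlap(missing=missing))
```

Swapping in a different centrality measure, or removing nodes instead of edges, would be the natural next variations to try.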

For anyone fond of the veneer of precision SNA puts on fuzzy questions, these results are an equally precise grain of salt. For another dose of context, see here too.

3 comments:

Stand-Up Guy said...

For those of us who primarily work with samples from populations in which networks are not a primary consideration, the potential for error brought on by node or edge removal (or addition) in the world of SNA is a real eye-opener.

It would be interesting to hear more about how data collection is done in these environments, and how non-random error (false reporting, etc.) is addressed.

Anonymous said...

Confusing... they look at RANDOM networks in their paper... NO social network is known to resemble a random network. Many of their conclusions may not work in real social networks.

An alternate paper to read [done with data from social networks] is

Anonymous said...

...is by Costenbader & Valente.

http://www.columbia.edu/~gk297/missing.pdf