One of the rules of thumb you hear in SNA circles is that you need an 80% response rate to have a reasonably valid survey. Having used that rule for a while without asking why, I am now glad to see some explanation. A new paper in press by Borgatti, Carley, and Krackhardt shows how different measures of network centrality degrade with imperfect data.
For example, suppose you want to know who the ten most influential people in an organization are. Naturally, you decide to measure influence with social network analysis. The problem is that you don't get all the data. How many of the actual top ten influencers will make it into your calculated list?
With a few wonderfully simplifying assumptions, BCK come up with a rough and ready answer. If you are missing just 5% of the network data, chances are that your list of ten influencers has three rogues in it. Let 10% of your data slip through your fingers and your list of ten is probably just better than half right. If you're missing 25% of your data, then most of your "top ten" list is really just a selection of the relatively central influencers, and not a top ten at all.
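You can get a feel for this kind of degradation yourself with a quick Monte Carlo experiment. The sketch below is not BCK's methodology, just an illustration under made-up assumptions: random Erdős–Rényi graphs, degree centrality as the stand-in for "influence", and missing data modeled as edges dropped uniformly at random. All the function names and parameter values are my own inventions.

```python
import random

def degree_top_k(edges, k=10):
    # Rank nodes by degree centrality (ties broken by node id)
    # and return the top k as a set.
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return set(sorted(deg, key=lambda n: (-deg[n], n))[:k])

def overlap_after_loss(n_nodes, p_edge, loss, trials, seed=0):
    # Average number of the "true" top ten (full data) that survive
    # into the top ten computed after dropping a fraction `loss`
    # of edges uniformly at random.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Random Erdős–Rényi graph: each possible edge with prob p_edge.
        edges = [(i, j)
                 for i in range(n_nodes)
                 for j in range(i + 1, n_nodes)
                 if rng.random() < p_edge]
        true_top = degree_top_k(edges)
        kept = [e for e in edges if rng.random() > loss]
        total += len(true_top & degree_top_k(kept))
    return total / trials

if __name__ == "__main__":
    for loss in (0.05, 0.10, 0.25):
        hits = overlap_after_loss(n_nodes=100, p_edge=0.05,
                                  loss=loss, trials=50)
        print(f"{int(loss * 100):>2}% missing: "
              f"~{hits:.1f} of the true top 10 recovered")
```

The exact numbers depend heavily on the graph model and centrality measure, which is precisely why BCK's simplifying assumptions matter; the point is only that recovery of the true top ten falls off as more data goes missing.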
For anyone fond of the veneer of precision SNA puts on fuzzy questions, these results are an equally precise grain of salt. For another dose of context, see here too.