Monday, December 14, 2009

Web science, Webwhompers

I have just unveiled Webwhompers, which bears the fruit of four years of my teaching Web science at Boston University. The site features a few interests of mine:
  • A solid layman's introduction to Web science, focusing on the intersection of mathematics, sociology, and the Web as it is used and built by regular people. It is all presented as an online textbook you can read here.
  • A case study in educational methodology. Unlike the online textbook, which is meant to be read, the rest of Webwhompers is meant to be experienced. It provides the online portion of my answer to the question, "What can 70 non-technical college students do together in 12 weeks that will result in their learning as much as possible about the Web?"
The course mission statement puts it this way:

Technology is often created by "experts" and then used by "regular people." Webwhompers celebrates the "Web builder": a regular person who creates his own Web technology.

Sometimes it helps to distinguish between "regular people" who use technology and "experts" who create technology. For example, a regular person might want a home stereo; he pays experts to create hi-fi technology for him. In other cases, regular people create technology without even considering asking for expert help—for example, making a snowball.

Much of the Web technology that regular people want is within their power to create, just like a snowball. Webwhompers seeks to unleash the technical creativity of the regular person: By highlighting Web building resources, by bringing together aspiring Web builders, by providing expert guidance when necessary, and by encouraging regular people to try on the idea that they can create their own Web technology.

The course overview puts it this way:

Our course introduces Web science. It has no prerequisites and has been used by non-technical undergraduates at Boston University since 2006. Our curriculum is guided by the following passage adapted from "Web Science: An Interdisciplinary Approach to understanding the Web," by James Hendler, Nigel Shadbolt, Wendy Hall, and Tim Berners-Lee:
Web science, an emerging interdisciplinary field, takes the Web as its primary object of study. This study incorporates both the social interactions enabled by the Web's design and the applications that support them.

The Web is often studied at the micro scale, as an infrastructure of protocols, programming languages, and applications. However, it is the interaction of human beings creating, linking, and consuming information that generates the Web's behavior as emergent properties at the macro scale. These properties often generate surprising properties that require new analytic methods to be understood.

For example, when Mosaic, the first popular Web browser, was released publicly in 1992, the number of users quickly grew by several orders of magnitude, with more than a million downloads in the first year. The wide deployment of Mosaic led to a need for a way to find relevant material on the growing Web, and thus search became an important application, and later an industry, in its own right. The enormous success of search engines has inevitably yielded techniques to game the algorithms (an unexpected result) to improve search rank, leading, in turn, to the development of better search technologies to defeat the gaming. More recent macro-scale examples include photo-sharing on Flickr, video-uploading on YouTube, and social-networking sites like mySpace and Facebook.

The essence of Web science is to understand how to design systems to produce the effects we want. The best we can do today is design and build in the micro, hoping for the best; but how do we know if we've built in the right functionality to ensure the desired macro-scale effects? How do we predict other side effects and the emergent properties of the macro? Further, as the success or failure of a particular Web technology may involve aspects of social interaction among users, understanding the Web requires more than a simple analysis of technological issues but also of the social dynamic of perhaps millions of users.

Given the breadth of the Web and its inherently multi-user (social) nature, its science is necessarily interdisciplinary, involving at least mathematics, computer science, sociology, psychology, and economics.

Four important themes of Web Science are
  • Micro: an individual acts
  • Macro: the world responds (or not) to an individual's action
  • Synthetic: something is created to produce a desired result
  • Analytic: laws are stated to explain observed phenomena

We focus on these themes as they apply to Web builders -- people who contribute links and other content to the Web:


Synthetic
Analytic
Micro
An individual builds a Web
site to produce a desired result.
(We do not speak
to this quadrant.)

Macro
"The world" builds a Web site
to produce a desired result
.
Laws are stated to explain
large-scale Web phenomena.

Some Web builders consider themselves Web developers; others consider themselves bloggers; others merely post an occasional comment on someone else's blog or discussion forum. We say "Web builder" to encompass the full spectrum of people who contribute links and other content to the Web.

Our lab curriculum provides an informal hands-on approach to the task of building a Web site. Our Search and Share pages help Web builders leverage collectively engineered resources (such as WordPress). The formal chapters of the Study page (which you are now reading) explain large scale Web phenomena; they also explain the Amazon recommendation algorithm and the Google PageRank algorithm.

The sociology, psychology, and economics of this course follow Duncan Watts' Six Degrees, which we recommend as a narrative companion to our own material. Our complete suggested reading list is below.

Online safety

Protecting yourself from evildoers

Privacy, trust, and ownership

Networks

Basic mathematical foundations of networks:

Set Theory

  • Sets
  • Explicit Notation for Sets
  • Cardinality
  • Subsets
  • Venn Diagrams
  • Union and Intersection
  • Ordered Lists
  • Implicit Notation for Sets
  • Logical Expressions
  • Compound expressions with "or"
  • Compound expressions with "and"
  • Union and intersection defined formally
  • Similarity of Sets

Graph Theory

  • Graphs
  • Undirected and Directed
  • Neighborhood and Degree
  • Density and Average Degree
  • Paths
  • Paths in undirected graphs defined formally
  • Paths in directed graphs
  • Length
  • Distance

See also Facebook and Touchgraph

Network Structure

Hubs, clusters, and other basic structural features of the Web:

Network Structure

  • Connected: a word of many meanings
  • Induced Subgraphs
  • "Connected" defined formally
  • Connected graphs and connected components
  • Hubs
  • Clusters
  • Defining clusters, part one: connected components
  • Defining clusters, part two: cliques
  • Defining clusters, part three

See also:

Network Dynamics

How randomness, homophily, and cumulative advantage shape the Web:

Network Dynamics

  • Limitations of traditional graph theory
  • Introduction to network dynamics
  • Three models of dynamic graphs
  • Random graphs
  • Demonstration of random graph dynamics
  • Random graph algorithm
  • Clusters and homophily
  • Triadic closure
  • Triadic closure algorithm
  • Hubs and cumulative advantage
  • Preferential attachment algorithm

See also:

All the above are summarized in the following table:

Random graphs
Clustering
Centrality
Real-world phenomenon explained by model
Giant component forms quickly when |E| ≅ |V|.
Clusters emerge, providing "table of contents" overview.
Hubs emerge, indicating popularity and/or influence.
Web sites
N/A
Clusty, iBoogie, Grokker
Google et al
Sociological force
Chance
Homophily
Cumulative advantage
Mathematical model
Random graph algorithm
Triadic closure algorithm
Preferential attachment algorithm
Variables, Probability, and Scale-Free Networks

Understanding that the Web is a scale-free network requires some probability theory:

Variables and Probability

  • Variables in mathematics
  • Variables in algorithms
  • Random variables
  • Discrete vs. continuous variables
  • Probability distributions
  • Degree distributions

General discussion of scale-free networks:

  • Six Degrees Chapter 4, pp 101-114
  • From previous chapter on Network Dynamics
    • Hubs and cumulative advantage
    • Preferential attachment algorithm
Information and Computation

Applying fundamental concepts of computer science to the Web

Information and computation

  • Information, computation, and algorithms
  • Summation: an example of what computation is
  • HTML: an example of what computation is not
  • Computing distance, part one: Information diffusion
  • Computing distance, part two: Example
  • Computing distance, part three: Algorithm

Examples of information diffusion on the Web:

See also:

Collaborative Filtering

How to compute personalized recommendations:

Collaborative Filtering

  • "Expert opinions" without the experts
  • Delicious: example of CF
  • Bookmarks: content of Delicious
  • Tuples: content of CF
  • Bipartite graphs: structure of CF
  • Structural equivalence: computation of CF
  • Delicious: algorithmic summary
  • The four steps of collaborative filtering
The Long Tail

Niches and blockbusters in the world of Web commerce:

The Long Tail

  • Macro-analytic view of collaborative filtering
  • Power law revisited
  • Niches, megahits, and the neglected middle
  • Macro-analytic view of the long tail
  • Macro view of Web programming

See also:

  • The Long Tail, by Chris Anderson. Wired, October 2004.
  • Going Long, by John Cassidy. The New Yorker, July 2006.
  • Six Degrees Chapter 7, pp 207-215: Information Externalities & Market Externalities
Influence in Networks

How to compute the influence of a Web page:

Influence in Networks

  • Popularity, influence, and centrality
  • Introduction to PageRank
  • NetRank: a simplified version of PageRank
  • Normalization and convergence
  • The NetRank algorithm
  • Dividing by outdegree: the NR* formula
  • The PageRank formula
  • The damping factor: PageRank as probability

See also PageRank Explained by Phil Craven

Competition and Cooperation

What happens when Web builders seek to increase their influence?

Games: Competition and Cooperation

  • Dynamics of popularity and influence
  • PageRank competition
  • Doing the right thing
  • Mutually assured construction
  • Authority, reciprocity, reputation
  • Game theory
  • Winners' dilemma

See also

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License and is copyrighted (c) 2009 by Connective Associates LLC except where otherwise noted.