The course “Topics in Social Network Analysis: Structure and Dynamics” is targeted towards doctoral students in management, organizational behavior and strategy. This blog post summarizes the first lecture, “The Foundations of Network Analysis.”
The goal of the first lecture is to introduce you to the “why” behind network theory and a bit of the “what.” Overall, the mission of the course is to help you become a sophisticated consumer of networks research, and hopefully a sophisticated producer of it as well.
By the end of the course, you should be able to:
- Develop network-theoretic explanations for the behavior of people, teams and organizations. Network-theoretic explanations use “relationships” (we’ll talk more about this in the future) and “patterns of relationships” as explanatory devices, rather than traits or characteristics.
- Learn how to set up high-quality research designs for your network theories.
- Conduct statistical tests of your theories that can help you refute alternative explanations.
So, let’s begin with a simple question: What is network analysis?
There are a lot of definitions, but here is one I like:
Network theory is a scientific perspective that reasons about the behavior of a target system or elements of that system, using the pattern of relationships between elements of that system.
Let’s begin with a super simple example. Stanford GSB has approximately 400 students. Let us assume, for a moment, that all 400 students end up getting jobs with certain wages w(i). Some students earn a lot of money (a lot!) and some students might make less than what they made before they came into the MBA program. An analyst might wonder: What causes this variation in MBA salaries?
An astute PhD student might theorize a function f that maps a vector of characteristics of each MBA student, c(i), to their wage w(i), such that:
w(i) = f(c(i))
Elements in the vector c might include:
- The undergraduate institution of the student (before their Stanford MBA)
- Their grades at Stanford
- GMAT score
- Prior wage before business school
- …so on.
The above function assumes that wages depend on these individual characteristics and perhaps on employers’ reactions to them. Either way, the dependency runs only between each student’s own traits and their wage.
In the above graph, the circles (nodes) represent the MBA students (I’ve depicted 20). Large nodes represent individuals who score high on the characteristics described above, and small nodes represent those who score low. Thus, our reasoning focuses on how the nodes vary on some characteristic.
Network Analysis’ Value Add
Network analysts take a different perspective. They propose a different type of dependency: that people’s outcomes depend on the types of people they have relationships with and/or the pattern of those relationships.
This concept, that individual outcomes depend on a person’s relationships to others, is not new at all. The idea is as old as human history. What network analysis contributed was a useful and tractable representation of this dependence among people, and a way to empirically test the effects of such dependencies.
To summarize, the “new knowledge” that network analysis contributed was:
- To strongly argue that these dependencies among individuals matter.
- That these dependencies could be represented by a network (consisting of nodes, the elements of the system; and edges, the dependencies between the elements)
- That analysis of these dependencies (e.g., summaries or descriptions of their patterns) could help us predict the performance of elements of the system better than the individual-trait-based approach alone.
- That specific social mechanisms (basically stories) link certain patterns to certain outcomes through some well-specified chain of logic.
These are not simple problems. Theoretical and empirical issues related to this set of basic problems have challenged us for nearly a century now. As you can imagine, incorporating social relationships into the analysis of human and organizational behavior requires a new way of thinking about human action and new methods to empirically validate our theories.
Before we get to the core problems of network analysis, it is perhaps useful to sketch a bit of its history and development.
Network analysis has its “origins” in many disciplines
If you are really interested in the history of social network analysis, check out Linton Freeman’s book “The Development of Social Network Analysis.”
Jacob Moreno: Invented sociometry. The network diagrams we see today are a direct consequence of Moreno’s work: he invented the sociogram, a set of points connected by lines. He used sociograms to identify leaders and isolates and to uncover patterns of asymmetry and reciprocity. He discovered what we now know as the “star” network.
Kurt Lewin: Studied group behavior. His basic argument was that individual action in groups was constrained by the concrete relationships that existed between members of the groups. He is often credited as being one of the founding fathers of social psychology, and as the person who coined the term “group dynamics.”
Fritz Heider: Studied social perception and attitudes and developed what he called “balance theory” – we all know the basic mechanics of balance theory:
- “A friend of a friend is a” …
- “A friend of an enemy is a” …
- “An enemy of an enemy is a” …
Balance theory was converted into mathematical form by Dorwin Cartwright (a psychologist) and Frank Harary (a mathematician) — Harary is often credited with being one of the founders of modern graph theory.
As you can see, all three approaches either directly used graphical or mathematical notation or were later turned into a mathematical form.
Another parallel set of developments came in social anthropology, which conceptualized “social structure” as concrete relations between individuals in a society. S.F. Nadel, in particular, theorized about the relationship between networks and “roles” in his treatise “A Theory of Social Structure.” Britannica describes the book as follows:
In his posthumous Theory of Social Structure (1958), sometimes regarded as one of the 20th century’s foremost theoretical works in the social sciences, Nadel examined social roles, which he considered to be crucial in the analysis of social structure.
The famous “Hawthorne experiment,” conducted in Chicago in the 1920s, found that one of the best predictors of productivity was the “informal organization” of the plant: the pattern of personal relationships that people had with each other.
The big revolution in social network analysis happened in the 1960s and 70s, and the primary protagonists of this revolution were located at Harvard, led by Harrison White, and at the University of California, Irvine, led by Linton Freeman. Much of the basic language, tools and theories we use today in network analysis were developed in this period.
In the 1980s and 1990s, a group of scholars in management and organizational behavior entered the fray, and thus began the organizational social network revolution. These included scholars who received their PhDs in business schools or sociology departments and had contact with network theorists in sociology, as well as sociologists who were hired by business schools. The names include: Ronald Burt at the U of Chicago, Daniel Brass at Penn State and later at U Kentucky, David Krackhardt at Cornell and later Carnegie Mellon, Brian Uzzi at Northwestern, and Joel Podolny, who was at Stanford.
Modern Network Analysis is Multidisciplinary and Sometimes Interdisciplinary
Today, network analysis is a multidisciplinary, and sometimes interdisciplinary, enterprise. A lot of work has been done by scholars in a variety of disciplines. Many of the important theoretical ideas about what types of networks should matter, and why, were developed by sociologists (Ron Burt, for instance) and then further developed and extended by others in sociology (Fernandez and Gould) as well as by scholars in management (Gulati, McEvily, etc.).
Concurrently, a large number of statisticians, including Stanley Wasserman, Tom Snijders, etc., developed methodologies for modeling the formation and dynamics of social networks. They developed models such as the p* models, Stochastic Actor-Oriented Models, and much more.
The economists, starting with Charles Manski, developed and theorized about methods that would allow for causal inference for network effects. Venkatesh Bala and Sanjeev Goyal (two economists from Cambridge, UK) developed and formalized a game theoretic model of network formation. Matthew Jackson of Stanford has pushed the development of formal models of network formation and “network games” forward along a variety of dimensions.
- Most of the best network research today draws on many of these traditions. Research in organizational behavior that examines network effects must draw on the work of Charles Manski for guidance about the empirical validity of the network effects they estimate.
- A large body of management research, whether focused explicitly or implicitly on network ideas, draws on the ideas of sociologists, both in business schools and in sociology departments.
- Research in economics has drawn heavily on sociology, with or without citation; the most interesting intersections of this research are happening in the economics of education and labor, development economics, and finance.
- Today, you will find “network” research in almost all the top journals in management, economics, sociology, statistics, and computer science. Moreover, there are specialty journals focused just on network analysis (e.g., Social Networks and Network Science).
Network Reasoning – Micro to Macro, back to Micro
An important feature of network analysis is that it gives us a way to think about both the micro (the behavior of the elements of a system, e.g., people) and the macro (society, organizations, etc.) simultaneously.
One of the most beautiful demonstrations of this is presented in the following graph:
This graph comes from Mark Granovetter’s Strength of Weak Ties. Why is this triad forbidden? That is, why is this structure unlikely to occur?
To answer this question, we need some balance theory. Let us assign a positive valence (+) to the strong ties that are present (i.e., AC and AB) and a negative sign (-) to the absent tie (i.e., BC). To get the sign of the triad, we multiply the signs of its three dyads (AC, AB, BC).
- The forbidden triad: (+)(+)(-) = (-)
Balance theory considers a triad with a negative sign to be unbalanced, and therefore unstable. For instance, if A and C are friends, as are A and B, there are likely to be greater opportunities for B and C to interact and, as a result, form a tie to each other. This closes the triad and results in a closed triad: (+)(+)(+) = (+). This is what we will call Equilibrium 2. On the other hand, if C and B cannot get along, there will be conflict either between A and C or between A and B, causing one of A’s ties to break and leaving a triad with a single tie: (+)(-)(-) = (+). This is Equilibrium 1.
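The sign arithmetic above is easy to mechanize. The sketch below is our own illustration, not a reproduction of Heider’s or Cartwright and Harary’s formal treatment: it encodes a present tie as +1 and an absent tie as -1, as the text does, computes each triad’s sign as the product of its dyad signs, and classifies all eight possible configurations.

```python
from itertools import product
from math import prod

# Encoding assumed from the text: present (strong) tie = +1, absent tie = -1.
PRESENT, ABSENT = +1, -1


def triad_sign(ab: int, ac: int, bc: int) -> int:
    """Sign of a triad: the product of its three dyad signs."""
    return prod((ab, ac, bc))


def is_balanced(ab: int, ac: int, bc: int) -> bool:
    """A triad is balanced (stable) when its sign is positive."""
    return triad_sign(ab, ac, bc) > 0


forbidden = triad_sign(ab=PRESENT, ac=PRESENT, bc=ABSENT)   # -1: unstable
closed = triad_sign(ab=PRESENT, ac=PRESENT, bc=PRESENT)     # +1: closed triad
single_tie = triad_sign(ab=PRESENT, ac=ABSENT, bc=ABSENT)   # +1: one tie left

# Enumerate every sign configuration of (AB, AC, BC).
for ab, ac, bc in product((PRESENT, ABSENT), repeat=3):
    label = "balanced" if is_balanced(ab, ac, bc) else "unbalanced"
    print(f"AB={ab:+d} AC={ac:+d} BC={bc:+d} -> {label}")
```

Under this coding, a triad is balanced exactly when it has an even number of absent ties (zero or two), so four of the eight configurations come out balanced.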
OK, so what?
Well, let’s take the perspective of A. In the forbidden triad, A is a “bridge”: she is the only connection between C and B and, as a result, has access to information from two sources whose information might not overlap. However, because this structure is forbidden, it is unstable when the ties are strong. The position of A reverts either to Equilibrium 1, where A is no longer a bridge because she has no connection to B (or C), or to Equilibrium 2, where A is no longer a bridge because C and B are connected to each other and no longer have to go through A to share information. Either way, A’s role as a pass-through bridge is diminished.
These are very micro-level arguments. They are based on the psychological processes of individuals and their interpersonal dynamics. How do these micro processes translate into network processes at a larger scale (e.g., an organization, community or society)?
Let us start with some assumptions:
One assumption we start with is that information is distributed unevenly across groups, and that different groups or cliques have different pieces of information.
This is not an unrealistic assumption. If you compare Berkeley to Stanford, people in the two places are likely talking about different ideas. Most people in each group do not have a complete understanding of what ideas the other group is interested in or talking about. This is probably equally (or even more) true across companies, countries, regions, etc.
However, according to our micro reasoning above, strong-tie bridges across these groups tend not to persist, because the forbidden triad resolves into one of the two equilibria described above.
Granovetter’s deep insight was that this problem can be solved if the bridges are weak ties rather than strong ties. Weak ties allow individuals to access information across disconnected clusters, whereas strong ties, because they are embedded in cliques (i.e., they exist within a cluster), provide only redundant information.
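One way to make “bridge” concrete is Granovetter’s notion of a local bridge: a tie whose two endpoints share no other contacts, so that it is the only short path between their clusters. The sketch below flags local bridges using plain Python sets, on a hypothetical network of two cliques joined by a single tie; in Granovetter’s argument, such ties are weak ties.

```python
from itertools import combinations


def neighbor_sets(edges):
    """Map each node to the set of nodes it shares an undirected tie with."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def local_bridges(edges):
    """Ties whose endpoints have no neighbors in common."""
    nbrs = neighbor_sets(edges)
    return [(u, v) for u, v in edges if not (nbrs[u] & nbrs[v])]


# Hypothetical example: two fully connected clusters joined by one tie C-D.
cluster_1 = list(combinations("ABC", 2))   # A-B, A-C, B-C
cluster_2 = list(combinations("DEF", 2))   # D-E, D-F, E-F
edges = cluster_1 + cluster_2 + [("C", "D")]

print(local_bridges(edges))  # [('C', 'D')]
```

Every within-cluster tie sits inside a triangle, so only C-D qualifies: it is the sole route by which information can move between the two clusters.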
What Granovetter (1973) showed was:
- Weak ties (acquaintances) are more useful to job seekers than their close, strong ties (friends and family).
- Weak ties provide access to novel information, not present within a cluster.
- The relationship between tie strength and finding a job has less to do with the strength of the tie per se and more to do with the macro-structure of the larger network (e.g., the connections between clusters).
The beauty of this theory (I think that is the most appropriate word) is that it elegantly links a psychological process (balance theory) to the macro structure of the network (the society- or organization-wide network), and then back to the individual outcome. This type of reasoning allows us to represent the functioning of an important system in a way that would be difficult with more atomized theories of human action.
Thus, our ultimate goal is to develop theories that link a person’s social network to some larger structure, then back again to individual human action.
Where can network representations be useful for analysis?
Novice students of network analysis often begin with the perspective that a network is a real thing, and as a thing it can become the object of analysis. However, this is not the case. Networks are representations, and imperfect ones at that, of a very complicated target system. Because networks are representations and not real things, an analyst can use a network representation for many different target systems.
The most basic network representations consist of two parts: nodes and edges. Below, you will see a network called the “Kite Network.” For now, let’s ignore the structure of the network and its properties and focus on two elements. Networks consist of nodes (the circles), which represent the entities we are studying in the target system, and edges (sometimes called links), which represent the relationships between the nodes/entities.
The edges in the network above are undirected, meaning that each relationship is symmetric: if A is tied to B, then B is tied to A. For instance, co-authorship is a relationship that is naturally undirected.
Above, I’ve taken the same kite network and made the edges directed. This means that the network specifies a direction of flow (of information, etc.) between the nodes. For instance, if the relationship represented in the graph is “seeks advice from,” we could read the network to indicate that A seeks advice from B, but not the other way around. By contrast, B and E seek advice from each other.
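In code, the difference between the two kite figures comes down to whether an edge is stored as an unordered or an ordered pair. The sketch below assumes the standard Krackhardt kite, with the same edges as networkx’s `krackhardt_kite_graph()` and nodes relabelled A through J; the labelling in the figures may differ. Apart from A→B, B→E and E→B, which mirror the text, the directed arcs are hypothetical.

```python
from collections import Counter

# Undirected kite: each tie is an unordered pair, so A-B and B-A are the same tie.
KITE_EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "F"),
    ("B", "D"), ("B", "E"), ("B", "G"),
    ("C", "D"), ("C", "F"),
    ("D", "E"), ("D", "F"), ("D", "G"),
    ("E", "G"), ("F", "G"), ("F", "H"), ("G", "H"),
    ("H", "I"), ("I", "J"),
]
undirected = {frozenset(e) for e in KITE_EDGES}

degree = Counter()
for u, v in KITE_EDGES:
    degree[u] += 1
    degree[v] += 1

# Directed "seeks advice from": each arc (u, v) is an ordered pair, u -> v.
# A->B, B->E and E->B follow the text; the other arcs are hypothetical.
ARCS = {("A", "B"), ("B", "E"), ("E", "B"), ("A", "C"), ("D", "A")}


def reciprocated(arcs):
    """Unordered pairs in which advice-seeking runs both ways."""
    return sorted({tuple(sorted(a)) for a in arcs if a[::-1] in arcs})


print(frozenset(("B", "A")) in undirected)      # True: direction is ignored
print(("A", "B") in ARCS, ("B", "A") in ARCS)   # True False
print(reciprocated(ARCS))                       # [('B', 'E')]
print(degree.most_common(1))                    # [('D', 6)]
```

Storing undirected ties as `frozenset`s makes the symmetry automatic, while the directed arcs keep their order, which is what lets us ask whether a relationship is reciprocated.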
Now that we have these basics down, we can use these two basic elements to represent many different systems:
- Who studies with whom among students
- The friendships among workers in a firm
- The alliances among firms
- The relationships among different units/teams within a corporation
The above examples are very pertinent to OB/Strategy. However, networks can be used to represent other systems as well:
- The interactions between genes
- The similarity among jobs in an organization
- Shared funders among startups
- The co-presence of two ingredients in a recipe
While a network representation is useful for all these very diverse situations, the underlying theory describing the functioning of these various systems is rather different. The kind of reasoning we use in each of these domains will differ for at least three reasons:
- The actions and outcomes of the actors in each domain are likely to be different. Students do different things from firms, and they both do different things from genes, jobs and ingredients.
- The mechanisms (e.g., the step-by-step processes) that link actions to outcomes are likely to be different across the contexts.
- Finally, the links between actors are qualitatively different across the domains: different types of information flow between nodes through these links, different amounts of information can flow, and different meanings are ascribed to the links.
The flexibility of the network representation invites the critique that network theory is a free-for-all where anything goes, because the actors, mechanisms, and links can be anything in any context.
While there is an element of this critique that is valid, I will argue in this class that the network representation is tremendously powerful and there is a decent amount of consistency in network reasoning across many different contexts and target systems. That is, we can apply many of the same types of reasoning, with modifications of course, to explain actions and behavior across a variety of contexts. Further, insights from one domain can help us learn about another.
Krackhardt’s Levels of Analysis
Networks are rich in their expressiveness of social reality. As a consequence, the analyst sometimes has to ignore many facets of the structure/content of a network to focus analytical attention on one facet. A useful typology for network analysis, developed by David Krackhardt, is the “Levels of Analysis.” In this typology, networks have (at least) four levels of analysis: Level 0 to Level 3.
The distinction across levels is important to make for several reasons, including the fact that:
- The theories are different
- The statistical techniques are different
- The data requirements are (potentially) different.
Level 1: The node level of analysis
Consider the following graph:
And the matrix that was used to generate it:
This is the “raw” data of the network, and it can be analyzed in many different ways. One of the most common approaches in network analysis is to focus on the node level, or Level 1 (it is called Level 1 because, if there are N nodes, the number of observations is on the order of N^1).
So far, we have been focusing on nodes: these are the actors whose behavior we are trying to explain. More specifically, we are trying to explain the behavior of “Ego” (Latin for “I”) based on the nature or pattern of his or her connections to “alters” (from the Latin for “other”). Thus, in this case, our goal is primarily to take two kinds of measurements:
- Measurements about some action or outcome of Ego (our dependent variables)
- Measurements about the features of Ego’s connections to the alters in the network (our explanatory variables)
Thus, depending on the theory, we figure out how to quantify the connections that ego has to his or her alters, and see whether there exists a correlation between this and Ego’s outcomes.
- Ego1 Outcome NetworkMeasure
- Ego2 Outcome NetworkMeasure
There are generally two approaches to the Level 1 analysis. I would like to call one “structural analysis” and the other “peer effects” or “peer influence.” We will cover both in the class.
- Structural Analysis/Analysis of Network Position defines the NetworkMeasure based on a summary of the pattern of edges in the network with respect to the focal node (i.e., the node whose outcome we are interested in).
- Peer-effects analysis often ignores the structure and focuses on how the characteristics of a focal node’s connections (e.g., their prior performance) affect that node’s outcomes. For instance, one could take the average of the alters’ SAT scores or some other metric.
As you can see, the data look pretty much like a traditional individual-level regression analysis. We call this Level 1 analysis because there are N^1 = N observations, one for each node in the network.
The nice thing about both types of analyses is that the statistical methods we use are ones that you should be quite familiar with as a doctoral student. While there are empirical issues in interpreting the coefficients from these models, the setup is pretty standard. Most network analysis takes one of these two forms.
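Both kinds of Level 1 measures can be computed straight from an adjacency matrix. In this sketch the five-node matrix and the SAT scores are invented for illustration; degree stands in for a structural measure, and the mean of the alters’ scores stands in for a peer-effects measure. Real studies would typically use richer measures of both kinds.

```python
# Hypothetical undirected network of five students (adjacency matrix)
# and a hypothetical attribute (SAT score) for each student.
ADJ = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
SAT = [1400, 1250, 1500, 1300, 1350]


def alters(adj, i):
    """Indices of the nodes that node i is tied to."""
    return [j for j, tie in enumerate(adj[i]) if tie]


def degree(adj, i):
    """Structural measure: number of ties of node i."""
    return len(alters(adj, i))


def mean_alter_attribute(adj, attr, i):
    """Peer-effects measure: average attribute of node i's alters."""
    js = alters(adj, i)
    return sum(attr[j] for j in js) / len(js) if js else float("nan")


# One row per ego: (ego, structural measure, peer measure), so N^1 rows.
rows = [(i, degree(ADJ, i), mean_alter_attribute(ADJ, SAT, i))
        for i in range(len(ADJ))]
for row in rows:
    print(row)
```

Each row pairs an ego with its network measures; adding the ego’s outcome as another column yields exactly the Ego / Outcome / NetworkMeasure layout described above.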
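To see how standard the setup is, here is a sketch of the simplest version: an ordinary least-squares regression of an outcome on a single network measure. The degrees and wages are hypothetical, and the closed-form slope is only a point estimate. Interpreting it is the hard part, and conventional standard errors assume independent observations, which network data may not provide.

```python
from statistics import mean

# Hypothetical Level 1 data: one row per ego.
degree = [2, 3, 2, 2, 1]          # network measure (explanatory variable)
wage = [110, 130, 105, 115, 90]   # outcome, e.g., in $1,000s (dependent variable)


def ols_fit(x, y):
    """Closed-form OLS for y = a + b * x with a single regressor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, slope = ols_fit(degree, wage)
print(f"wage = {intercept:.1f} + {slope:.1f} * degree")  # wage = 70.0 + 20.0 * degree
```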
Level 2: The dyad level of analysis
Another class of problems requires us to focus on understanding the processes that led the network to take the structure that it has taken. In the static case, the micro-question is: Why is one connection present while another is absent? In the dynamic case, the question might be reframed as: Why do some ties persist while others dissolve?
This type of analysis is called Level 2 because, in a network consisting of N actors, there are N(N-1), or on the order of N^2, observations in the data.
The focus of Level 2 analyses is understanding why a tie or interaction, or relationship, exists between an ego and an alter.
For instance, we can ask:
- Why two workers decide to become friends.
- Why two companies decide to pursue a research collaboration.
- Why two scientists decide to co-author a paper.
The kinds of information, and often the types of theories, we need here are richer than is usually necessary at Level 1.
Can you tell me what kind of information we might need to make a prediction about whether two scientists decide to collaborate?
- Characteristics about Ego
- Characteristics about Alter
- The interaction of the characteristics of Ego and Alter (e.g. whether they are in the same discipline, the distance from one office to the next, etc.)
- The ties that exist indirectly between Ego and Alter.
Further, the methods we use here are much more complex than the ones used for Level 1 analysis, primarily because of the fourth point. There are dependencies in the network that affect the presence or absence of a tie for a given pair of individuals. Consider the forbidden triad. It clearly illustrates that A’s decision to form a tie with C is independent neither of C’s relationship to B nor of A’s relationship with B. Ignoring these dependencies could bias our understanding of why a tie between A and C forms or does not form.
As a consequence, researchers have developed specific statistical approaches for testing theories at Level 2: Multiple Regression Quadratic Assignment Procedure (MRQAP), Exponential Random Graph Models (ERGMs), and the older p1 models.
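The logic of QAP can be sketched in a few lines. It compares the observed correlation between two N×N matrices with the correlations obtained after relabelling the nodes of one matrix at random (permuting its rows and columns together), which keeps that network’s internal dependencies intact. This is a simple bivariate QAP correlation test on hypothetical data, not MRQAP itself (which extends the permutation idea to multiple regression) and not the routine of any particular package.

```python
import random


def off_diagonal(m):
    """Flatten an N x N matrix, skipping the diagonal (self-ties)."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def qap_test(a, b, n_perm=2000, seed=42):
    """Observed correlation of a and b, plus a two-sided permutation p-value."""
    rng = random.Random(seed)
    observed = pearson(off_diagonal(a), off_diagonal(b))
    n, extreme = len(b), 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        # Relabel nodes: permute rows and columns of b with the same permutation.
        b_perm = [[b[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if abs(pearson(off_diagonal(a), off_diagonal(b_perm))) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical data: eight workers in two departments.
dept = [0, 0, 0, 0, 1, 1, 1, 1]
n = len(dept)
same_dept = [[int(i != j and dept[i] == dept[j]) for j in range(n)] for i in range(n)]
friends = [row[:] for row in same_dept]   # friendships mostly within departments
friends[0][4] = friends[4][0] = 1         # plus one cross-department tie

r, p_value = qap_test(friends, same_dept)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

Because whole nodes are relabelled rather than individual cells being shuffled, each permuted matrix has the same structure as the original, which is why QAP is less vulnerable to the dyadic dependence described above than a naive test on pooled dyads.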
Here the analysis is conducted so that the data structure looks like:
- Actor1 Actor2
- Actor1 Actor3
- Actor1 Actor4
- Actor2 Actor1
- Actor2 Actor3
- Actor2 Actor4
The dependent variable is whether a tie exists between two actors (or whether some kind of interaction occurs, e.g., knowledge transfer). The explanatory variables in these models are the characteristics of ego, the characteristics of alter, their shared characteristics, and the other structures in which they are embedded.
Level 0: The whole network
Another level of analysis is Level 0, where the entire network yields only one observation: N^0 = 1.
The goal of Level 0 analysis is drastically different from the goal of the first two levels. Here, the analyst is trying to understand how the entire social network and its configuration affect the outcomes of the system as a whole. This is an interesting and exciting level of analysis, but very few studies have been conducted at this level.
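The dyadic data layout can be built mechanically from a tie list and node attributes. In the sketch below the ties, departments and column names are hypothetical; it produces one row per ordered (ego, alter) pair with the dependent variable and the four kinds of explanatory variables listed above.

```python
from itertools import permutations

# Hypothetical undirected ties and one node attribute (department).
TIES = {frozenset(t) for t in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]}
DEPT = {"A": "ops", "B": "ops", "C": "ops", "D": "sales"}
NODES = sorted(DEPT)
NBRS = {n: {m for t in TIES if n in t for m in t if m != n} for n in NODES}


def dyad_rows():
    """One row per ordered (ego, alter) pair: N(N-1) rows in total."""
    for ego, alter in permutations(NODES, 2):
        yield {
            "ego": ego,
            "alter": alter,
            "tie": int(frozenset((ego, alter)) in TIES),      # dependent variable
            "ego_degree": len(NBRS[ego]),                     # ego characteristic
            "alter_degree": len(NBRS[alter]),                 # alter characteristic
            "same_dept": int(DEPT[ego] == DEPT[alter]),       # shared characteristic
            "shared_partners": len(NBRS[ego] & NBRS[alter]),  # indirect ties
        }


rows = list(dyad_rows())
for r in rows[:3]:
    print(r)
```

Note that `shared_partners` (and the degree columns) are built from the very ties we are trying to explain. This is the kind of dependence that an ordinary logistic regression on these rows would ignore, and that QAP- and ERGM-style methods are designed to address.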
One obstacle is data: we need data on enough networks to conduct a statistical analysis, and that is hard in itself. The best research of this type has been done by people studying teams (e.g., Ray Reagans, Ezra Zuckerman and Bill McEvily). In many respects, small-groups research has also looked at this level of analysis, going back to some very early work by Bavelas.
In Level 0 analysis, we look at the entire network and what it represents (e.g., an entire organization) and relate it to the organization’s outcomes.
Thus, we need theories and measures that can summarize the macro structure of the network and link it to organizational performance.
For instance, a class of problems may include:
- How does the internal network structure of a startup affect its ability to come up with innovative ideas? We would need:
  - A set of startups, say 75 or more.
  - Some measure of each startup’s innovative output.
An example analysis might measure each startup’s internal network structure and then run a regression linking the outcome to some summary of that structure (e.g., density, the proportion of possible ties among the people that are actually present).
A well-known study at this level is Reagans, Zuckerman and McEvily, who found that project teams within an organization that have high density are more effective (they finish their projects faster) than project teams with low density.
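Density, the whole-network summary mentioned here, is simply the number of ties divided by the n(n-1)/2 possible undirected ties. The sketch computes it for a handful of invented startups and correlates it with an invented innovation count. With real data, one would use many more networks and regress the outcome on density plus controls.

```python
from itertools import combinations
from statistics import mean


def density(n_people, ties):
    """Share of possible undirected ties that are present: |E| / (n(n-1)/2)."""
    return len(ties) / (n_people * (n_people - 1) / 2)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5


# Hypothetical startups: (headcount, internal ties, innovation output).
# Each startup contributes a single Level 0 observation.
STARTUPS = [
    (4, [(0, 1), (1, 2), (2, 3)], 2),
    (4, [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)], 5),
    (3, [(0, 1)], 1),
    (3, list(combinations(range(3), 2)), 4),
]

densities = [density(n, ties) for n, ties, _ in STARTUPS]
outputs = [out for _, _, out in STARTUPS]
print([round(d, 2) for d in densities])       # [0.5, 0.83, 0.33, 1.0]
print(round(pearson(densities, outputs), 2))  # 0.9
```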
Level 3: Cognitive Social Structures
Finally, another area of research within social network analysis recognizes that networks are indeed representations, and imperfect ones at that. This is called Level 3 analysis, because we have on the order of N*N*N, or N^3, observations for use in our analyses.
Consider three graphs and an organization chart from Krackhardt (1992).
The top-left graph is the “actual” advice network at the firm. By actual, I mean that these are the relationships people say they have with others. The top-right is the actual organizational chart. Note that the organization chart and the advice networks are imperfectly related to each other.
However, once we go to the bottom panel, we see how important cognition and representation are in the network story. On the bottom-left, we see Chris’ representation, which is not perfect but is not as bad as Ev’s (Ev is a manager). In terms of human action, people might behave in concert with the network on the top-left (the “actual” network), but they might also behave in concert with their own perceptions.
This cognitive angle is critical in network analysis. Cognition links actual structure (if it really exists) to action and then to outcomes. Think of the faux pas: a social misstep that happens when someone acts on a perceived network that differs from the actual one.
Conducting Level 3 analysis requires collecting data about perceptions and theorizing about how perceptions matter, both independently and interactively with the “true” structure.
With Level 3 analyses, we have N people who each have perceptions about N(N-1) relationships, resulting in potentially on the order of N^3 observations. In practice, however, most of the modeling is done at the node level, though this is an active area of research and much remains to be developed.
You should now have a general overview of the kinds of problems we will cover during the course. By the end of the course, you should be able to conduct and extend these types of analyses for a wide range of domains and levels.
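A cognitive social structure is naturally stored as an N×N×N array: one N×N matrix per perceiver. The sketch below uses invented data for three people and treats each person’s report of their own ties as the “actual” network, following the definition used for the top-left graph. It then scores how accurately each perceiver sees everyone else’s ties. Accuracy is only one of many possible Level 3 measures; the data and the measure here are illustrative assumptions.

```python
# Hypothetical cognitive social structure for N = 3 people.
# CSS[k][i][j] = 1 if perceiver k believes that i seeks advice from j.
CSS = [
    [[0, 1, 0], [1, 0, 1], [0, 0, 0]],  # perceiver 0
    [[0, 1, 1], [1, 0, 1], [0, 1, 0]],  # perceiver 1
    [[0, 0, 0], [0, 0, 0], [1, 1, 0]],  # perceiver 2
]
N = len(CSS)

# "Actual" network: each person's report of their own ties,
# i.e., row i is taken from perceiver i.
actual = [CSS[i][i][:] for i in range(N)]

# All N(N-1) possible directed ties (self-ties excluded).
cells = [(i, j) for i in range(N) for j in range(N) if i != j]


def accuracy(k):
    """Share of other people's ties (rows i != k) that perceiver k reports correctly."""
    judged = [(i, j) for i, j in cells if i != k]
    return sum(CSS[k][i][j] == actual[i][j] for i, j in judged) / len(judged)


n_observations = N * len(cells)   # N perceivers x N(N-1) dyads, on the order of N^3
print(n_observations)             # 18
for k in range(N):
    print(k, round(accuracy(k), 2))
```

Each perceiver’s own row is excluded from the score because, under this definition of “actual,” it matches by construction.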