Where do networks come from?

Both the peer effects and the structural approaches to network effects rest on a key assumption: some degree of exogeneity in the existence and structure of network ties.

Exogeneity is both a theoretical claim and an empirical assumption. All reasonable theories are built on a set of axioms that treat some features of the world, or of the target system being analyzed, as primitive or exogenous. Many models in economics, for instance, assume that preferences are exogenous. From these preferences, we are then able to derive things like behavior, choice, “roles,” and the structure of social relationships.

Similarly, some sociological and anthropological traditions start with axioms that assume that “roles” are exogenous. These roles (e.g., the position an individual occupies in a social structure) govern behavior, preferences, and social relationships.

Much of the network analysis we’ve been conducting or discussing thus far also has an exogeneity assumption built in. The primitives are social relationships and their structure. All the other things we observe, such as behavior, preferences, and roles, emerge from the pattern of exogenous network ties. In the lectures on structural holes, status, and peer effects, we argue that the pattern of social relationships causes differences in behavior, preferences, and roles, and not vice versa.

The challenge of network formation

However, a challenge for the social-relationships-first perspective is that networks are unlikely to be fully “exogenous.” They form and evolve through processes that make some people more likely to connect to each other and others less likely to do so.

Network scholars have spent considerable time trying to understand how networks form and change. At a broad conceptual level, we can think about five factors that shape whether a tie forms between two individuals (e.g., ego and alter).

The logic behind most models of network formation is simple. On one side, there are “benefits” to connecting with someone; these may be actual or perceived, and pecuniary or non-pecuniary (psychic). On the other side, there are “costs” that make it easier or harder to form a relationship with a particular person, because searching for them, coordinating with them, or dealing with them is more or less costly than it would be with someone else. Relatedly, some individuals may face a lower cost of building a network than others, and some people may be cheaper (relative to the benefit) to connect with than others.

Factor 1: Characteristics of Ego, the sender.

Characteristics encapsulated in “Factor 1” include a range of factors that make it easier for certain types of people (e.g., those who have certain characteristics themselves) to connect with many others. These characteristics may either make it easier for these people (relative to others) to form many connections or give them a greater benefit from doing so. Research in this stream has found a substantial range of individual-level characteristics that predict an increased or decreased propensity to have a certain type of network. These include:

• Personality: Some work has found that differences in personality traits are correlated with network structure. For instance, individuals who have many ties are also likely to have extroverted personalities. Relatedly, those who are high in “self-monitoring” are more likely to be “brokers,” that is, to occupy “structural holes” in a social network.
• Other factors that may also be related to larger networks include:
    • Strategic intent
    • Intelligence
    • Physical characteristics (e.g., beauty or height)
    • Age
• Some factors may describe an individual at a certain point in time:
    • After the loss of a job
    • After being promoted to a new role
• Other factors may be socially constructed, but describe the Ego in a given context:
    • Caste
    • Religion

One can reason about the various ways in which these characteristics of Ego either lower their costs of making ties or increase the benefit they get. Can you come up with other individual-level factors that might matter?

Factor 2: Characteristics of Alter, the receiver.

A related set of arguments can be made about the characteristics of an alter or alters. For instance, one could theorize about the following characteristics of alter(s) that may make them more likely to receive connections from others.

• Personality
• Intelligence
• Skill
• Wealth
• Social standing
• Formal role in the organization

Like the Ego-centric perspective, one could logically use a “cost” and “benefit” perspective for reasoning about why some Alter may have more advice seekers (e.g., they are smart) or more friends (e.g., they are helpful). In purely altercentric models, we ignore the characteristics of Ego.

Factor 3: The interaction of Ego/Alter characteristics (e.g., homophily)

The third factor relates to the “Ego-Alter” interaction. In such models, there is something about the characteristics of Ego and Alter together that predicts an increased or decreased propensity to have a network tie. The most common theme in these models is homophily, the tendency for individuals who are similar to each other to have a higher propensity to connect. Research has found that individuals who are similar on the following characteristics are more likely to connect with each other, relative to the alternatives:

• Race and ethnicity
• Gender
• Age
• Formal organizational position
• Occupation
• Religion

There are many theories about why such a tendency exists. On one hand, social contexts (e.g., communities, neighborhoods, etc.) are often organized by these characteristics. This makes it much easier to connect with people who are similar to you. There is also an element of choice. Individuals who are similar to you are likely to have similar experiences, share similar values, and like and dislike similar things. As a consequence, the cost of interacting with similar people is likely to be lower than the cost of interacting with people who are different.

However, the type of relation may matter here. In mating networks you are more likely to see heterophily than homophily. This might also be true of mentoring relationships, where individuals are more likely to be mentored by those at a different level of seniority than themselves.
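One simple way to check for homophily in data is to compare the share of ties that link similar people with the share we would expect if ties ignored the attribute. The sketch below is only an illustration: the six-person network, the group labels, and the choice of baseline are all made up for this example.

```r
# Hypothetical toy data: a directed network of 6 people and a made-up group label.
adj <- matrix(0, nrow = 6, ncol = 6)
ties <- rbind(c(1, 2), c(2, 1), c(1, 3), c(3, 2),
              c(4, 5), c(5, 6), c(6, 4), c(3, 4))
adj[ties] <- 1                      # each row of `ties` is a (sender, receiver) pair
group <- c("a", "a", "a", "b", "b", "b")

# same[i, j] is TRUE when i and j carry the same label.
same <- outer(group, group, "==")
diag(same) <- FALSE                 # ignore self-pairs

# Share of observed ties that connect people with the same label.
observed_share <- sum(adj * same) / sum(adj)

# Share of all possible ordered pairs that are same-label pairs:
# the benchmark if ties formed without regard to the label.
n <- length(group)
baseline_share <- sum(same) / (n * (n - 1))

c(observed = observed_share, baseline = baseline_share)
```

An observed share well above the baseline is consistent with homophily, although it cannot by itself separate choice from the way contexts sort similar people together.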

What other factors at this level might increase or decrease the cost of interaction or raise its benefits?

Factor 4: Social and Physical Context

The fourth factor can broadly be thought of as the social or physical context within which individuals are forming social networks. A simple example is office or neighborhood layout. A substantial amount of research has found that physical distance has a substantial effect on whether two individuals form ties. Scientists who are nearby, for instance, are more likely to collaborate and their research trajectories also become rather similar.

Research has found an exponential relationship between physical distance and the propensity to connect: individuals who are physically proximate are substantially more likely to interact, and rates of interaction decline steeply as distance increases. This effect is called propinquity.
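To make the shape of an exponential decline concrete, here is a minimal sketch. The functional form and both parameter values (`p0`, `decay`) are assumptions chosen for illustration, not estimates from the research mentioned above.

```r
# Hypothetical parameters: p0 is the tie probability at zero distance,
# decay controls how quickly the probability falls per unit of distance.
tie_prob <- function(distance, p0 = 0.5, decay = 0.05) {
  p0 * exp(-decay * distance)
}

distance <- c(0, 10, 20, 40, 80)    # e.g., meters between two desks
round(tie_prob(distance), 3)
# 0.500 0.303 0.184 0.068 0.009
```

Under this form, every additional 10 units of distance multiplies the tie probability by the same factor, which is why most of the drop happens over the first few steps.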

In addition to propinquity, other aspects of the social context are also likely to affect the extent of tie formation. These factors could be the reorganization of roles, task inter-dependencies, as well as cultural or organizational norms regarding competition or collaboration. Incentives are also important in determining what the shape of the network might be. The challenge with many of these effects is that they are often “absorbed” into the intercept of the model. That is, they can only be detected when looking across contexts, not within a single context.

Factor 5: Endogenous Network Processes

Finally, the structure of one part of the network may affect the structure of another. Consider a simple example: reciprocity. If I consider you a friend, there is a social-psychological as well as a sociological process that increases the likelihood that you also consider me a friend. This is akin to tit-for-tat. If you give me a gift, I will give you one in return. Networks exhibit this property with substantial regularity (but not always!). In this context, the emergence of a network tie, the reciprocal one, is endogenous to the network. That is, it emerges from within the network structure and not from outside of it.
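As a rough illustration of how reciprocity can be measured, the sketch below counts how many ties in a small directed network are returned. The four-person network is made up, and the share of returned ties is only one of several possible definitions (the sna package’s `grecip` function offers related measures).

```r
# Hypothetical directed friendship nominations among four people.
people <- c("Alice", "Bob", "Chris", "Dina")
adj <- matrix(0, 4, 4, dimnames = list(people, people))
adj["Alice", "Bob"]   <- 1
adj["Bob", "Alice"]   <- 1   # returns Alice's nomination
adj["Alice", "Chris"] <- 1   # not returned
adj["Chris", "Dina"]  <- 1
adj["Dina", "Chris"]  <- 1   # returns Chris's nomination
adj["Bob", "Dina"]    <- 1   # not returned

# mutual[i, j] is 1 only when both i -> j and j -> i exist.
mutual <- adj * t(adj)

n_ties      <- sum(adj)      # all nominations
n_returned  <- sum(mutual)   # nominations that are reciprocated
reciprocity <- n_returned / n_ties
reciprocity
```

Here four of the six nominations are returned, so the share is two thirds.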

Similarly, there are other endogenous processes that researchers have detected in networks. One is transitivity: a friend of a friend is often a friend. Heiderian balance theory, for example, argues that individuals desire balance in their relationships. The situation of being friends with your friend’s enemy is unsustainable according to balance theory (why?). Because it is unsustainable, that structure will endogenously change into something else: either the enemies become friends or the network splits.
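Transitivity can be quantified by asking how often a “friend of a friend” is in fact a friend. The following sketch does this with plain matrix algebra on a made-up, undirected four-person network; it is one common way to define the measure, not the only one (the sna package’s `gtrans` function implements several variants).

```r
# Hypothetical undirected friendships among four people.
people <- c("Alice", "Bob", "Chris", "Dina")
adj <- matrix(0, 4, 4, dimnames = list(people, people))
friendships <- rbind(c("Alice", "Bob"), c("Bob", "Chris"),
                     c("Alice", "Chris"), c("Chris", "Dina"))
adj[friendships] <- 1            # tie in one direction
adj[friendships[, 2:1]] <- 1     # and in the other, so the matrix is symmetric

# two_paths[i, k] counts the intermediaries j with i - j and j - k.
two_paths <- adj %*% adj
diag(two_paths) <- 0             # drop paths that return to their start

n_two_paths <- sum(two_paths)          # friend-of-a-friend pairs (ordered)
n_closed    <- sum(two_paths * adj)    # those where i and k are also tied
transitivity <- n_closed / n_two_paths
transitivity
```

In this toy network Alice, Bob, and Chris form a closed triangle, while Dina is tied only to Chris, so 6 of the 10 ordered two-paths are closed.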

Other forces include preferential attachment. New entrants into a network are more likely to connect to individuals who already have many ties, with a probability proportional to those individuals’ degree centrality. This process gives some networks a power-law degree distribution, rather than the binomial/normal distribution that would be expected if the network were formed through a purely random process.
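A small simulation can make the preferential attachment process tangible. The sketch below grows a network one person at a time; the function name `grow_network`, the network size, and the uniform-attachment comparison are all choices made for this illustration. Note that the uniform baseline is a growing network too, so it is not the same thing as the purely random graph mentioned above.

```r
# Grow a network in which each newcomer forms one tie to an existing node.
# With preferential = TRUE the target is chosen in proportion to its degree;
# with preferential = FALSE every existing node is equally likely.
grow_network <- function(n_nodes, preferential = TRUE) {
  deg <- integer(n_nodes)
  deg[1:2] <- 1L                         # start from a single tie between nodes 1 and 2
  for (newcomer in 3:n_nodes) {
    existing <- seq_len(newcomer - 1)
    weights <- if (preferential) deg[existing] else rep(1, length(existing))
    target <- sample(existing, size = 1, prob = weights)
    deg[target] <- deg[target] + 1L      # the chosen node gains a tie
    deg[newcomer] <- 1L                  # the newcomer starts with one tie
  }
  deg
}

set.seed(42)
deg_pa      <- grow_network(2000, preferential = TRUE)
deg_uniform <- grow_network(2000, preferential = FALSE)

# Compare the largest degrees and the upper tail of each distribution.
c(max_preferential = max(deg_pa), max_uniform = max(deg_uniform))
quantile(deg_pa, c(0.5, 0.9, 0.99))
quantile(deg_uniform, c(0.5, 0.9, 0.99))
```

Both runs create exactly the same number of ties; what differs is how unevenly those ties are spread across people.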

Empirical considerations

Though the theoretical ideas behind network formation are quite straightforward, disentangling the differential impact of these effects remains quite challenging. In a subsequent post, we will discuss the various approaches to estimating these models.

Seeing the networks in your company

Thus far we have assumed that we had network data. But data like the “Professionals” was gathered using a survey in a real organization. In this post I will walk you through the process of creating a simple network survey in SurveyMonkey (a web based survey application) and analyzing the responses from the survey using R.

Let’s begin by going to www.surveymonkey.com. Here is the landing page (as of May 5, 2017). You will need to purchase a basic subscription to download the data (I purchased an educator subscription for $18). I’ve signed up for a free account (for now). After I complete all my signup information, here is the screen that I get, asking me to start by creating a survey.

I will call my survey “Simple Network Survey.” I enter this into the text box, and then press + Add Questions. Pressing this takes me to a new screen.

In order to create the appropriate network data (where we know who considers whom a friend, advice giver, etc.), we will need to begin by asking people who they are. I prefer to do this first using a dropdown menu where an individual can select just one option. The question I ask is: What is your name? Please select from the dropdown menu. Make sure that the question type is “Dropdown.”

Once I have this, I would like to enter the names of the people who will be taking the survey. My list (of fake people) includes: Alice, Bob, Chris, Dina, Elena, Frank, and Greg. I add these using the “Add Answers in Bulk” option. Once I click save, I move to the Options tab, and I check off “Require an Answer to This Question.” Next I click DONE.

I now create a new page (+ New Page). This is where I will place the network survey. For the purposes of this example, I will only ask two questions about people’s networks.

What questions shall we ask?

Perhaps one of the things that is hardest to teach about network analysis is determining the right types of questions to ask people.
The questions should reveal something about people and their social networks that we might not have been able to assess if we hadn’t asked them those questions. We can think about kinds of questions in terms of a 2×2. On one dimension, we have questions about networks that provide people with resources (Instrumental) versus questions about more personal/social relationships (Expressive). On the other dimension, we have questions that are either “Enduring or qualitative” or “Event based.” The table below summarizes some examples.

| | Enduring/Qualitative | Event Based |
|---|---|---|
| Instrumental | Advice; Task; Information | Asked for advice in the past week |
| Expressive | Friendship; Trust; Social support | Informally go to lunch; Talked about important personal matters |

Here are some examples:

Questions about who you know: Below is a list of names of your colleagues at [firm name]. Some of them you may (1) know well, others you (2) may be acquainted with, and still others (3) you may not know at all. Please check the box next to the names of those individuals who are in categories (1) or (2).

Advice (Work-related): Sometimes it is useful to get help or advice from your colleagues on performing some aspect of doing your work well. Please check the box next to the names of those individuals who you would approach for help or advice on such work related issues.

Advice (Work-related), Reciprocal: There also may be people who come to you seeking help or advice about doing their own work well. Please check the box next to the names of those individuals who might typically come to you for help or advice on work related issues.

Advice (Career and Success): Sometimes it is useful to seek advice from colleagues at work about more than just how to do your work well. For example, you may be interested in “how things work” around here, or how to optimize your chances for a successful career here. If you needed help along these lines, who would you go to for help or advice regarding these issues?
Please check the box next to the names of those individuals who you would approach for help or advice on these non-technical related issues.

Advice (Career and Success), Reciprocal: There also may be people who come to you seeking help or advice about such non-task related issues. Please check the box next to the names of those individuals who might typically come to you for help or advice along these dimensions.

Friendship: Sometimes during the course of interactions at the workplace, friendships form. We are interested in whether you have people at [firm name] who you consider to be friends of yours. Please check the box next to the names of the individuals who you think of as friends here at [firm name].

Event based questions:

Lunch: Below you will find a list of people who work at [firm name]. Please check the names of the individuals with whom you have met for lunch at least once during the past 30 days.

Event based advice: Below you will find a list of people who work at [firm name]. Please check the names of the individuals from whom you’ve sought out advice about work related matters at least once during the past 30 days.

The problem of recall: People are highly inaccurate when you ask them to recall specific interaction events. They are much more accurate when you ask them to recall enduring and qualitatively meaningful relationships. Events are highly informative when you know what happens during that event, but otherwise they are harder to generalize from.

Now that we have some examples of questions, let’s add one to the survey. I typically recommend having 2 questions, one expressive (e.g., friendship) and one instrumental (e.g., advice). They usually provide different information. Let’s, for the sake of example, add an advice network question to Page 2. We will create a “Multiple Choice” question where the answers are the names of the people in the organization (e.g., Alice, etc.).
The question we ask is: Sometimes it is useful to get help or advice from your colleagues on performing some aspect of doing your work well. Please check the box next to the names of those individuals who you would approach for help or advice on such work related issues.

We will also add a short note telling people not to select their own name and to check as few or as many names as appropriate. Below the options, also check “Allow more than one answer to this question (use checkboxes).” Let us now save this question by clicking save. I will now add one more question; this can be our “Dependent variable,” which measures the extent to which co-workers have a positive or negative impact.

After all the questions are in, click “Next” at the top and let’s begin collecting responses. We will use the “Get Web Link” option. The web link for the survey I made is: https://www.surveymonkey.com/r/QZ5KG3S

Let’s quickly fill out the survey. I will also fill in responses for everyone in the roster. After all the responses are in for all the people in the organization (e.g., Alice…) we can download the data. I have downloaded the excel file. It comes as a zip file and a resulting csv file with the data. These are respectively attached here and here.

The raw CSV file that is exported from SurveyMonkey looks like this:

[Figure: the raw CSV export]

Let’s clean this up so that we get a 7×7 matrix. Note that there is an ordered list of names on the left (Alice…Greg on the rows) and a similarly ordered list of names at the top (columns). The rows are the respondents (senders) and the columns are the people with whom they do and do not have a relationship. With the names, the matrix looks like:

[Figure: the 7×7 matrix with names]

Without the names, it looks like:

[Figure: the 7×7 matrix without names]

Try to match it up to the survey response in our original file. The matrix is now saved as surveyexample.csv.
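The cleaning step can also be scripted rather than done by hand. The sketch below assumes a “one column per checkbox” layout, in which a cell holds the option’s name when the respondent ticked it and is empty otherwise; that layout, the inline responses, and the object names are assumptions made for illustration and may not match the actual export exactly.

```r
roster <- c("Alice", "Bob", "Chris", "Dina", "Elena", "Frank", "Greg")

# Made-up responses standing in for the exported file.
raw_csv <- "respondent,Alice,Bob,Chris,Dina,Elena,Frank,Greg
Alice,,Bob,Chris,,,,
Bob,Alice,,,,,,
Chris,Alice,,,Dina,,,
Dina,,,,,,,
Elena,,,,,,Frank,
Frank,,,,,Elena,,Greg
Greg,Alice,,,,,,
"
# Read every column as text so that empty cells stay "" instead of becoming NA.
raw <- read.csv(text = raw_csv, colClasses = "character")

# A ticked box is any non-empty cell; multiplying by 1L turns TRUE/FALSE into 1/0.
adj <- (as.matrix(raw[, roster]) != "") * 1L
rownames(adj) <- raw$respondent

# Put rows and columns in the same roster order and ignore any self-nominations.
adj <- adj[roster, roster]
diag(adj) <- 0L
adj

# To save without names, as in surveyexample.csv, one could use:
# write.table(adj, "surveyexample.csv", sep = ",", row.names = FALSE, col.names = FALSE)
```

Rows are senders and columns are receivers, so `rowSums(adj)` gives the number of people each respondent named.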
The following code imports the data (the cleaned up version above) and plots the network:

    # This file provides some simple code to get you started on your Network Analysis Journey
    library(data.table)
    library(curl)
    library(sna)

    # Load the "Survey Monkey" network data from Dropbox.
    survey <- fread('https://www.dropbox.com/s/nd13m6szn8d8lto/surveyexample.csv?dl=1')

    # Convert the data.table object into matrix format so it can be
    # analyzed using the sna package.
    survey = as.matrix(survey)

    # Names of the respondents, in the same order as the rows and columns
    names = c("Alice", "Bob", "Chris", "Dina", "Elena", "Frank", "Greg")

    # Rename all the rows
    rownames(survey) = names

    # Rename all the columns
    colnames(survey) = names

    # Plot the survey network
    gplot(survey, label = names)

Here is the resulting network.

[Figure: plot of the survey network]

We can calculate each person’s centrality and also correlate the network positions with the final question we asked. We need to first convert it into a numeric and then import it into R.

    # Continues from the script above: the libraries are loaded and the
    # `survey` matrix has been built and labeled.

    # Load the "Survey Monkey" outcome data from Dropbox.
    surveyoutcome <- fread('https://www.dropbox.com/s/we2dvevfejte8ov/surveyoutcome.csv?dl=1')

    # Convert the data.table object into matrix format.
    surveyoutcome = as.matrix(surveyoutcome)

    # Rename the columns and create a variable which is the integer
    # version of the numeric response
    colnames(surveyoutcome) = c("name", "response", "respval")
    respval = as.integer(surveyoutcome[, 3])

    # Calculate outdegree for the survey response
    survey.outdegree = degree(survey, cmode = "outdegree")

    # Estimate a model regressing respval on the outdegree
    m.0 = lm(respval ~ survey.outdegree)
    summary(m.0)

Here is the regression outcome:

[Figure: regression output]

The above walk-through should give you a way to collect network data and then analyze it using R. Before I conclude, I want to discuss the various survey approaches used by network analysts.

Types of Network Surveys

Roster based surveys: Roster based methods are perhaps the most common approach. This is what we just completed above. With roster surveys, you provide the respondent with a list of names of people or organizations. Then you ask them to indicate (by checking off the boxes next to the names) which of these people they have a certain relationship with. The nice thing about roster based surveys is that they tend to be quite accurate because people don’t have to recall the names out of the blue. Further, the roster allows you to get longer network lists than if people had to recall names from memory. The downside is that if the organization has too many people (say in the 1000s), it would be too hard to make people go through a list of 1000 or, even worse, 2000 people.

List based surveys: The other type of survey is a list survey. Here you ask the question and then request that your respondents list the names of people in the organization that they have this relationship with. What might be some concerns with a survey method like this?

Ego-network surveys: This is a slightly modified version of the list-based survey. Here you ask people to list up to five people (or k people) that they have a certain relationship with.
Then you ask them to indicate whether the people listed also have a relationship of a certain type with each other.

Position generator surveys: This is perhaps the least structural of the network surveys. Here you provide a list of the “positions” that people can potentially occupy; in an organization, you would list the different functional areas, levels of seniority, etc. Then you ask people whether they have no relationship with someone in such a position, an acquaintance in that position, a friend in that position, etc. This is a very indirect measure of networks, but it provides a broad understanding of the “range” of a person’s network.

In addition to these classical approaches to collecting network data, organizations have more modern methods available to figure out potential sources of interaction between their employees. These include:

Email: IT administrators know every email you send to everyone else and what it contains. This is true in most cases in the vast majority of organizations. Scary, yes. True, yes. But this is information that everyone knows exists, and some organizations are using it to understand informal interaction and to try to make better decisions.

Mailing list/Groups activity: Another source of information about networks and interaction is the mailing lists that people are a part of.

RFID: Most of our ID cards have RFID these days; we use these cards to enter and exit buildings. RFID sensors can also be placed in strategic locations to understand face-to-face interactions between people. Conference organizers are also using RFID tags to understand interaction among attendees.

Online data sources:

LinkedIn: LinkedIn has a massive economic graph. Their data include where people got their degrees, where they worked, who they worked with, etc.

Facebook: This is the largest social network in the world. Period.
About firms: The websites of venture capital firms tell you who their partners are, where they attended college, and when they graduated. They also tell you that some may be investing in similar projects.

More: In a future post, I will walk through how to create “network” data using text in documents. The “ties” here are measures of similarity between the text descriptions of entities.

Peer effects, knowledge transfer and social influence

The structural approach to social networks is inherently beautiful as a representational approach. I am always in awe of the fact that we can learn so much about how human beings act or their outcomes based merely on the pattern of their social ties. The idea is both simple and profound.

The structural approach is built on assumptions regarding information transfer across a simpler unit of analysis: the dyad. In the world of dyads, new complications arise and different theories must be developed and tested.

Let us take the Professionals data we have been analyzing as an example. Here is the advice network among these professionals.

[Figure: the advice network among the professionals]

In the prior analyses, we have focused on analyzing the structure of each node’s connections. For example, each node has a specific number of incoming connections, its indegree.

The beauty of the structural approach to social networks is that we can learn a lot about the outcomes of individuals and organizations by merely looking at the pattern of their relationships. Recall our prior analysis. There is information in indegree. We were able to explain 6.5% of the variation in our measure of whether a person has the “knowledge to succeed” just by looking at the count of their incoming connections! While indegree may capture or reflect other processes and might not be causal, it is nevertheless information rich.

However, an Ego’s alters (e.g., the people that a focal node is connected to) are not all the same, as we sometimes implicitly assume in our models.
As a note, I don’t believe that researchers actually believe that all the people we are connected to are the same. Indeed, betweenness, closeness, and eigenvector centrality all assume, by their very construction, that not all connections are the same. However, the heterogeneity in alter characteristics is implicit rather than explicit, because we never specify in our theories or models exactly how these individuals vary.

The peer effects framework, on the other hand, often ignores variation in structure but emphasizes variation in the characteristics of connections. Below, I walk through some examples of this approach.

A simple model of peer effects

The “peer effects” framework gets its name from a line of research in the economics of education where scholars were attempting to understand the impact of classroom peers on academic outcomes. Hence, peer effects.

Let us start with a simple setup. Let us assume there are 100 students in a classroom. The teacher has decided that everyone in the class will have a study partner, so she asks the students to pair up into groups of two. There are now 50 pairs, each with two people. The teacher wonders whether having a smart peer (i.e., alter) increases the performance of a focal student (i.e., Ego). Visually, she is interested in understanding this influence process:

[Figure: diagram of the influence of the peer on the focal student]

At the end of the class, all of the students take a standardized exam. This exam is scored on a 100 point scale, and students can get anywhere from a score of 0 to 100. The teacher takes this score and runs the following regression with 100 observations, one for each student. She’s also good with standard errors, so she clusters standard errors at the level of the dyad:

$score_{i} = \beta_{0} + \beta_{1} score_{j} + \epsilon$

After running the regression, she finds a large and statistically significant coefficient for $\beta_{1}$. How should she interpret it?
A naive causal interpretation is: for every unit increase in $score_{j}$ there is a corresponding $\beta_{1}$ increase in $score_{i}$. Or, by having a study partner with a certain score, there is a corresponding increase/decrease in the performance of the focal student. This interpretation is naive for a reason: it is probably (though not definitely) wrong.

But before we dive into why it is probably wrong, it is useful to reiterate that this “peer effects” representation is quite general. For example, these outcomes might be determined in part by the influence of peers (however defined):

• Finance: Putting money away into a retirement savings account, adopting a microfinance product, etc.
• Health behaviors: Obesity, happiness, use of HIV/AIDS tests, etc.
• Academic performance: Getting good grades, choosing a major.
• Entrepreneurship: Becoming an entrepreneur; deciding against becoming an entrepreneur.
• Careers: Quitting; moving to a new company.
• Adoption of products: Prescribing a drug, buying a car.
• Adoption of behaviors: Smoking, drinking, sexual events.
• Adoption of ideas: Learning from patents.
• Organizational behavior: Adoption of corporate practices and policies.

The basic idea is simple: We observe some level or change in the behavior or characteristics of an alter (or alters) and we see whether these are correlated with the behaviors or outcomes of Ego.

This apparently simple process is much more nuanced and complicated than it appears. There are dozens of “mechanisms” that can lead to the correlation we might observe (or that the teacher observes). Here are a few reasons why we might observe a correlation, either positive or negative. Consider the case of product adoption.

| # | Name | Definition |
|---|---|---|
| 1 | Direct transfer of specific information | Alter tells me about a product, but nothing more. |
| 2 | Persuasion effects | Alter tells me about the product, and forcefully persuades me to adopt it. |
| 3 | Direct transfer of general information | Alter tells me about a website that reviews products, and on this page a list is produced where the product that I adopt is listed first. |
| 4 | Role-modeling / Imitation | I see Alter doing something, I copy it. |
| 5 | Install base effects | I see many Alters adopting a product (e.g., buying an iPad), so I adopt it too. |
| 6 | Threshold effects | I only buy an iPad if at least 10 people I know own it; once the 10th person adopts, I decide to adopt. |
| 7 | Snob effects | I see an Alter (or Alters) doing something, I avoid doing it myself. |
| 8 | Simultaneous | Alter helps me out and I help her out, and together we perform better than either one would alone because we, by talking through a problem for example, figure it out together. |
| 9 | Reverse causality | The Alter does not affect Ego; rather, the Ego affects the Alter. |
| 10 | Contextual effects | We are both in the same neighborhood, and because we get exposed to the same billboard, we see the same advertisement for a product, and thus we adopt it. |
| 11 | Induced environmental effects | Having a high achieving peer results in a teacher who teaches at a higher level, thus the student learns more not because of greater transfer of information from her peer, but because teaching quality improves. |
| 12 | Selection bias | I become friends with people who already own iPads. I become friends with people who like technology, and because they like technology, they also own iPads. |
| 13 | Homophily effects | I like iPads, and because I do, I become friends with others who like iPads. |

Can you think of more mechanisms?

Which mechanism is actually at play in a specific context?

This question is a hard one. Because we have several potential mechanisms that we must work with, how do we rule out some of them? Some mechanisms are easier to rule out than others, but most are actually quite difficult to conclusively confirm or deny. To deal with this issue (which is VERY common during the review process) I have come up with a two part classification.
The first set of mechanisms are what I call “pseudo-mechanisms.” Pseudo-mechanisms are alternative explanations of the correlation that have nothing to do with social influence of the type we care about: influence flowing from the peer to the focal individual. Charles Manski, in a famous paper, defined these as the reflection problem and the selection problem.

Reflection problem: The reflection problem asks you to imagine a mirror. You see two objects moving. If it is unclear to you that you are looking at a mirror, then you can’t tell which one is the actual person who is moving and which one is the mirror image. More formally, imagine that we have two sets of variables, x and y. Let x be the measurement of the characteristics of individual i’s peers at time t, and let y be the measurement of focal individual i’s characteristics at time t. Because the two are measured simultaneously, we are unable to tell whether the change in x has caused a change in y, or vice versa. This indeterminacy exists for each observation. Furthermore, we are unable to tell whether each of these actors was exposed to some environmental shock (advertising, etc.) at the same time, which would make their adoption correlated.

The only way that we can ensure that the reflection problem is not an issue is by measuring the traits and characteristics of the xs prior to measuring those of y. However, doing this does not resolve the issue of causality. It is a necessary but insufficient condition. Another important, and much more difficult, condition has to be met for the effect to earn the title “causal”: we must solve the selection problem. The selection problem is solved when either of two conditions holds:

1. You know all the reasons why two people were paired together (i.e., why person y is friends with, shares a room with, or enters college with x).
2. The two individuals are randomly assigned, which breaks the correlation between the characteristics of x and y.

Assume for a moment that we have ruled out reflection and selection effects because (1) we use a lagged measure of peer consumption or action and (2) ego and alter are randomly paired. Even then, we have only ruled out a handful of the possible “mechanisms” producing the peer effects. We can rule out the “pseudo-mechanisms” #8 – #13 (except for #11), but that leaves us with 8 possible mechanisms. Imagine a doctor telling you, “Yes, we’ve ruled out the possibility that you are faking your symptoms, but there are 8 or more possible viruses that could be causing your infection!”

So we now need to distinguish between the remaining mechanisms. This is hard, even harder than resolving the reflection and selection problems. The reflection and selection problems are interesting in that they are hard problems to solve, but we know how to solve them. Not to make too many medical analogies, but this is like separating conjoined twins: hard, but someone can do it and has done it.

So how do we distinguish between different mechanisms, say #1 – #7? This will depend a lot on context, and a lot on the data that you have available. Let us examine a very simple situation where we have two students. Call the first student “Ego” and the second student “Alter.” Assume for a moment that we have completely alleviated the problems of reflection and selection, and that there are really only two contender mechanisms. (This is probably not true, but for a moment assume that it is.)

Mechanism 1: A student learns general study habits from his/her peer (alter), and this is why his/her performance increases.

Mechanism 2: A student interacts a lot with his/her peer (alter), they study together, and the peer helps the student learn the material.

How would we go about designing a test that would distinguish between these two mechanisms?

1. If what the student is getting from her peer is something general, such as increased motivation, that should have a positive effect across various subjects.
2. On the other hand, if the student is learning something rather specific (like how to do an integral), then the effects should be subject specific.

Assume you do this test, and you find that there are effects across subjects. What can you say about the mechanisms? Can you say anything?

How to conduct the estimation in R

Standard peer effects estimations are quite straightforward. This is especially true when you have randomization in the pairing of focal individuals to peers and longitudinal data, so that you can lag the characteristics of the peer.

$score_{i,t+1} = \beta_{0} + \beta_{1} score_{j,t} + \epsilon$

Here is a synthetic peer effects dataset in which 2000 individuals have been randomly paired: peer_effects.csv. Let us examine the extent to which there are peer effects. The model we want to estimate is:

$postself_{i,t+1} = \beta_{0} + \beta_{1} prepeer_{j,t} + \epsilon$

Estimating this equation in R with this data results in:

If the randomization is proper, this coefficient should be stable when we control for the focal individual’s own pre-treatment score.

Another worry is whether this effect of the peer (captured by the peer’s pre-treatment characteristics) is homogeneous or heterogeneous. That is, does it depend on the characteristics of the focal individual, or does it apply to everyone? To test this, we include a main effect of the characteristics of the focal individual (self_char) and an interaction term (pre_peer * self_char).

Here, we see that the peer effect depends on the characteristic of the focal individual. If the focal individual has this characteristic (e.g., willingness to listen), the peer effect is larger. This is only a simple demonstration of the complexity of peer effects. There are likely to be many interactional factors that turn peer effects “on” or “off” or modulate them in some important way.
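The logic of the two estimations above can be illustrated with a small sketch. The lecture runs the models in R on peer_effects.csv; the code below is instead a Python illustration on simulated data, and the sample size, coefficients, and variable names (`pre_peer`, `self_char`, `post_self`) are assumptions chosen for the example, not values from the course dataset. It relies on one fact: when `self_char` is binary, the interaction coefficient in a model with `pre_peer`, `self_char`, and `pre_peer * self_char` equals the difference between the peer-effect slopes estimated separately in the two groups.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) from a least-squares fit of y on one regressor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def simulate_pairs(n=2000, seed=1):
    """Simulated stand-in for a randomly paired dataset (all numbers are assumptions)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        pre_peer = rng.gauss(0, 1)     # peer's lagged score; random pairing keeps it independent of ego
        pre_self = rng.gauss(0, 1)     # ego's own pre-treatment score
        self_char = rng.randint(0, 1)  # binary ego trait, e.g. willingness to listen
        peer_effect = 0.1 + 0.4 * self_char  # assumed: the peer matters more when the trait is present
        post_self = 0.5 * pre_self + peer_effect * pre_peer + rng.gauss(0, 1)
        rows.append((pre_peer, self_char, post_self))
    return rows


rows = simulate_pairs()

# Pooled peer effect: post_self regressed on pre_peer for everyone.
_, pooled_slope = ols_fit([r[0] for r in rows], [r[2] for r in rows])

# Heterogeneity: estimate the peer-effect slope separately by ego trait.
slopes = {}
for trait in (0, 1):
    group = [r for r in rows if r[1] == trait]
    _, slopes[trait] = ols_fit([r[0] for r in group], [r[2] for r in group])

# With a binary trait, this difference is the interaction coefficient.
interaction = slopes[1] - slopes[0]

print(f"pooled peer effect:       {pooled_slope:.3f}")
print(f"peer effect, trait = 0:   {slopes[0]:.3f}")
print(f"peer effect, trait = 1:   {slopes[1]:.3f}")
print(f"interaction (difference): {interaction:.3f}")
```

Because pairing is random in the simulation, the pooled slope recovers the average peer effect without any controls, and the gap between the two group slopes is what the interaction term would pick up.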
One could imagine the following contingencies, where peer effects depend on characteristics of:
• the focal individual
• the environment
• the alter/peer
• personalities of both

Entrepreneurial networks

Who is this? Keep this face in mind, at least for a bit.

Leading in the whitespace

A major breakthrough in our understanding of the social nature of competition came through a series of papers and then a foundational book by Professor Ronald Burt of the University of Chicago, “Structural Holes: The Social Structure of Competition.” While others had made similar arguments before (see Bavelas 1948, and for a fantastic review see Centrality in Social Networks: Conceptual Clarification by Linton Freeman), Burt grounded this idea in theory and provided a very clear framework for other scholars to rethink competition and strategy through this structural lens. His very powerful argument was that we should think about “structural holes” as “opportunities.” That is, bridges across these holes in social structure are sources of value for everyone involved: the person who bridges, as well as those being bridged.

The research that followed resulted in a paradigmatic shift in our understanding of how competition within organizations and in markets functions. The early work made a clean and forceful point: the causal agent is not the “strength or weakness” of a tie, but the fact that bridges create value. Focus on the bridge.

This structural argument was supported by two mechanisms of action. These can be described as the control and information benefits of structural holes. Consider the three archetypical networks depicted below (I’ve adapted this representation from Krackhardt 1999).

On the left, the focal individual “YOU” is in a structure with very few structural holes. That is, all of their connections are connected to each other. On the far right is the high structural holes condition. In this case, none of the focal individual’s connections are connected to each other.
The intermediate network, which we will discuss later, is theorized to have its own special properties.

The Control Benefits of Structural Holes

Let us examine the control benefits first. In the first representation, who has control? Consider the situation in the figure on the left. What happens if you cheat one person in the network? They talk to each other. Your reputation suffers. You lose some of your control. So, who is in control? Not you, but the group.

The role that closed networks play in creating trust through control is not uncommon. For instance, small business owners in America and other countries often tend to do business with their co-ethnics.

While preventing cheating is a good thing, a closed structure can also be highly constraining. Small, close-knit groups have strong group norms that can force members to conform in unproductive or harmful ways. Innovation, for example, often requires people to take risks, both social and economic, and closed groups might stymie such risk taking.

At the other end of the spectrum, the focal person’s connections are not connected to each other. This lack of connection implies that they cannot communicate, and as a result, information or gossip cannot travel between these disconnected parties as quickly. The focal individual in this case has more control, because they have the freedom to act without others coordinating against them.

If you are in the third structure, there are two specific control benefits that you have:
• The first strategy to exploit your control benefits is to be the broker who plays off two individuals (perhaps buyers or even sellers) who want the same thing from you. For instance, you can, in subtle ways, make them either lower their demands or increase their willingness to pay.
• The second strategy based on control is to be a broker between two people (companies) who have conflicting demands.
The broker, in order to get one party to change their demands, can leverage the demands of the other. Furthermore, since these two parties do not interact with each other, the broker has the ability (because of this increased control) to shape the information that one party gets about the other.

These are obviously dangerous strategies, and ones that require a significant amount of finesse and skill.

The Information Benefits of Structural Holes

All is not lost if you can’t pull off the control strategy. Spanning structural holes also provides information benefits. The literature broadly posits three types of information benefits:
• Access benefits: Access benefits consist of two components. First, because the broker spans structural holes, she connects two groups that do not have a high degree of overlap in their knowledge. Thus, the broker has access to information that is not accessible to those in the separate, spanned social groups. Second, because your diverse connections give you more diverse information, when you receive valuable information you know who can use it.
• Timing benefits: Information can be transmitted over multiple channels. Consider job postings. Before a job is posted in an official manner, people in the department where the job will be located know about it. Talking to someone in that department will give you knowledge about the job before everyone else. This subtle difference in timing can mean the difference between getting and not getting a job. Because the broker gets information through informal channels, she often has access to information before others. Timing matters in many contexts, including venture deals, hiring, knowing a house is on the market, etc.
• Referrals: Trust matters. Period. People avoid hiring people, buying products, or investing in companies that they have limited information about. Those who span structural holes have contacts in different social worlds with their different opportunities.
Contacts in these social circles can refer you to their own networks, thereby increasing your trustworthiness.

The Structural Holes in DNA

Now that we have the theory down, I want to share an example from real life that exemplifies the beauty of the theory of structural holes.

This is James Watson, one of the co-discoverers of the structure of DNA. This discovery is described by many as one of the most important (if not the most important) scientific discoveries of the 20th century. In his gripping account of this discovery, The Double Helix, he recounts how he and Francis Crick discovered the structure of DNA.

Here are some quotes about the quest for the structure of DNA from the Nobel Prize website:

In the late 1940’s, the members of the scientific community were aware that DNA was most likely the molecule of life, even though many were skeptical since it was so “simple.” …Nobody had the slightest idea of what the molecule might look like. In order to solve the elusive structure of DNA, a couple of distinct pieces of information needed to be put together… As in the solving of other complex problems, the work of many people was needed to establish the full picture.

Francis Crick, a brilliant scientist, was already at Cambridge before James Watson arrived. Watson describes Crick:

“Before my arrival in Cambridge, Francis only occasionally thought about deoxyribonucleic acid (DNA) and its role in heredity. This was not because he thought it uninteresting. Quite the contrary. Francis, nonetheless, was not then prepared to jump into the DNA world…[S]uch a decision would create an awkward personal situation. At this time molecular work on DNA in England was, for all practical purposes, the personal property of Maurice Wilkins, a bachelor who worked in London at King’s College…It would have looked very bad if Francis had jumped in on a problem that Maurice had worked over for several years.
The matter was even worse because the two, almost equal in age, knew each other and, before Francis remarried, had frequently met for lunch or dinner to talk about science. The combination of England’s coziness – all the important people, if not related by marriage, seemed to know one another – plus the English sense of fair play would not allow Francis to move in on Maurice’s problem.”

Watson, on the other hand, was an outsider. He describes a few episodes that were critical to his discovery of the structure of DNA.

Break #1: At a conference in the spring of 1951 in Naples, Watson heard Maurice Wilkins’ talk on the molecular structure of DNA. “I proceeded to forget Maurice, but not his DNA photograph.”

Break #2: A manuscript on DNA (as a triple helix) had been written, a copy of which would soon be sent to Peter Pauling, the son of Linus Pauling, a Nobel Prize winner who was himself working on the structure of DNA.

Break #3: Knowledge of Chargaff’s rules through his doctoral training in Indiana.

Watson had unique access, through his network, to the photos produced by Rosalind Franklin in the Wilkins lab, the unpublished manuscript prepared by Linus Pauling, and Erwin Chargaff’s rules about the ratio of bases in DNA. Because of his position, he was able to put these pieces together faster than anyone else.

All three processes helped Watson:
• Access to novel information.
• Timing: getting access to information before it was published.
• Referrals: through his famous, Nobel Prize-winning advisor, he was able to hop from one great lab in Europe to another, and to get access to conferences that he would not otherwise have been able to attend.

Luck? No. Social networks.

Growing your network strategically

Structural holes theory also implies a series of tradeoffs between the size of one’s network and the benefits that the network produces. A large network is not necessarily a good thing.
This is because maintaining each network connection entails some cost and yields some benefit.
• Decreasing returns to network size: If we measure benefits in units of novel information, one could imagine that adding a new tie entails some cost (time, resources, emotional energy, etc.) but does not provide access to much new information (e.g., you hear about the same job opportunities from the new connection that you heard about from an existing friend or acquaintance). So, at least in terms of information, there are decreasing returns to network size: you pay the additional cost of the new connection, but it provides less information per unit cost than a prior connection.
• Constant returns to network size: A more palatable case is constant returns. Here, doubling your network size doubles the amount of information you have access to. Every new network connection provides information in proportion to what the prior network connections provided.
• Increasing returns to network size: The ideal situation is one where doubling the size of your network more than doubles the information you get. Is this even possible, since a new network connection that provides more information than before might also be substantially more costly?

In any case, you clearly want to stop before the costs of maintaining your network significantly outweigh the benefits that you get from it.

Structural holes theory provides some useful guidance on not going too far down the route of decreasing returns to size. A good heuristic for understanding this tradeoff is a calculation developed by Professor Ronald Burt called efficiency. Efficiency can be calculated in the following way:

Efficiency = Effective Size / Actual Size

Expanding this function out, we can define:

Actual size = The number of connections that you have.

Effective size = Actual size – the sum, across your connections, of the fraction of your ties that each connection overlaps with.
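As a rough illustration of the efficiency calculation above, here is a minimal Python sketch for an undirected, unweighted network. It assumes that the “overlap” of a connection is the fraction of your ties that this connection is also tied to; the three example networks, the node labels, and the function names `build_graph` and `efficiency` are invented for illustration.

```python
def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def efficiency(graph, ego):
    """Effective size divided by actual size for ego's network (unweighted ties)."""
    alters = graph[ego]
    actual_size = len(alters)
    if actual_size == 0:
        return 0.0
    # For each connection, the share of ego's ties that this connection is also tied to.
    overlap = sum(len(graph[alter] & alters) / actual_size for alter in alters)
    effective_size = actual_size - overlap
    return effective_size / actual_size


ego_ties = [("YOU", "A"), ("YOU", "B"), ("YOU", "C")]

# Closed network: all of YOU's connections know each other.
closed = build_graph(ego_ties + [("A", "B"), ("B", "C"), ("A", "C")])
# Intermediate network: only one pair of connections knows each other.
intermediate = build_graph(ego_ties + [("A", "B")])
# Open network: none of YOU's connections know each other.
open_net = build_graph(ego_ties)

for name, net in [("closed", closed), ("intermediate", intermediate), ("open", open_net)]:
    print(f"{name:>12}: efficiency = {efficiency(net, 'YOU'):.3f}")
```

In the closed network each of the three connections overlaps with two thirds of your ties, so the effective size is 3 – 2 = 1 and efficiency is one third. In the open network nothing overlaps, so every tie counts in full and efficiency is 1.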
Bandwidth and Diversity

The model above has been tremendously useful and very predictive. In recent years, some scholars have also highlighted another interesting tradeoff between stronger non-bridging ties and weaker bridging ties: the bandwidth/diversity tradeoff. On one hand, higher-bandwidth ties carry a greater volume of information. On the other hand, weaker bridging ties carry greater variance in information.

Recent work suggests that this relationship depends fundamentally on the nature of the environment in which people are building their social networks. There are two factors that can reduce the value of bridging ties and privilege high-bandwidth ties:

1. The network has a homogeneous set of knowledge, where most people talk about the same things. In that case, having more high-bandwidth ties may be more important.
2. The “refresh rate” is high: people’s contacts and interactions churn very fast, or the environment is turbulent and the information is extremely complex (meaning that an idea contains multiple topics or subjects). In that case, high-bandwidth ties are better at sustaining the high-variance information you need.

However, studies have found that “strong” bridging ties, which have both bandwidth and diversity, are the best. They are also rare.

Extending the Core Insights from Structural Hole Theory

As one can imagine, structural holes theory was extremely powerful, and scholars have been working to extend and refine its predictions to account for structures that don’t neatly fit into the standard dichotomy or that have dynamic elements.

Consider dynamics: Given how difficult it is to maintain bridging positions, it is likely that bridges are fragile. Research suggests that bridging ties follow what is called a kinked decay function.
Initially, bridges have a low likelihood of breaking. This is followed shortly by a sharp rise in decay. If the bridge survives this spike in decay rates, it is likely to persist for a long time.

Two processes often lead to decay:
• Disintermediation: Disconnected parties learn to exchange on their own.
• Competition from rival brokers: Rivals enter the fray and, by offering either greater benefits or lower cost, whittle away at the original broker’s benefits from occupying the hole. Eventually, the hole no longer exists.

Why bridges decay:
• Low performance: high performers have lower rates of decay for their bridges.
• If other relations are decaying, bridges are also likely to decay.
• Experience with bridging improves the chances that new bridges survive.

“Hole decay” may be limited when:
• Deep barriers limit interaction across the hole.
• The benefits to the bridged parties are high enough and switching costs are high.
• The bridged individuals don’t question the role of the broker, or it is not salient to them.

Beyond Information and Control

There are also cases where brokering is disadvantageous. The underlying mechanism leading to the disadvantages of brokering has to do with identity and expectations.
• In addition to information, networks also convey expectations about who one is (identity) and how one should behave (expectations). Many of us have been caught between two groups that expect different things from us. This happens at work, at home, and even in our social and personal lives with friends. The more disconnected our connections are, the more likely it is that they have different expectations about how we should behave.
Podolny and Baron (1997) show that when a person is a broker in a network that conveys “identity,” they are less likely to benefit from their brokerage position than when the network primarily provides “information.”
• Similarly, Krackhardt, in his Simmelian tie theory, makes a related argument: brokering between two strongly connected groups creates pressure to conform to different norms, which can create internal role conflict and stress, and thus reduce performance.

Outcomes as Mean versus Variance

The theories that we have focused on thus far attempt to predict mean or expected outcomes. That is, what is the average difference in wages/promotion rates/bonuses/ideas between those with and without structural holes? The graph below shows that there is a mean shift. The blue distribution (e.g., the structural holes condition) has a higher mean outcome.

However, this analysis can be pushed further by asking: is there a shift in the variance of potential outcomes? Does a specific structure reduce or increase the possible variation in outcomes? Note that the blue distribution below is “tighter” than the black distribution. The black distribution has a greater likelihood of worse outcomes, but also of better outcomes, than the blue one. Which would you prefer?

James Lincoln of UC Berkeley did pioneering studies on business networks in Japan and found that companies that were members of a keiretsu, while having lower mean outcomes, also had lower variation. As a consequence, they were less likely to do extremely poorly but also less likely to do extremely well.

With respect to brokerage, we can also think about floors and ceilings. Networks that are high in closure reduce variation in performance at both the high and the low end. High performance is dampened because the high performers subsidize the lower performers, and the low performers don’t do as poorly because the high performers help them out.
As one can imagine, the network structures that most facilitate the low-variance strategy are closed networks. The classic examples are ethnic networks, where wealthier members help out the less fortunate ones.

Network Positions and Advantage: Status

One of the most important things we do on a day-to-day basis is make predictions about the value of individuals or companies, or really, any entity. Making such predictions is challenging because we have limited information about the qualities of the entity we are attempting to make predictions about. For instance:
• A hiring manager at a firm is trying to make a prediction about whether a certain applicant will be a high performer.
• A PhD admissions committee makes predictions about whether an applicant to their program will turn into a star researcher.
• A venture capitalist makes predictions about whether a startup or founding team will create a breakthrough product that will become a billion-dollar company.
• A search engine makes a prediction about whether a certain webpage contains useful information for its users.
• A consumer makes predictions about the quality of a product before he/she buys it.

Predictions of this type are commonplace and often rather difficult to make. This difficulty exists for two reasons. First, only a limited set of characteristics are observable to the decision maker, whereas much else is unobservable. A hiring manager, for instance, may observe a resume and a list of references. Based on this resume and reference list, she attempts to make an inference about many things: how hard working the applicant is, their base of knowledge, their ability to get along with other members of her team, and so on. Thus, the hiring manager attempts to use “observables” to infer something about the unobservables.

The goal, therefore, is to map observables (the things that you can easily measure and observe about someone or some organization) to unobservables.
What are some examples of unobservables and/or things that are difficult to observe?
• Creativity
• Whether a person you hire will “fit” with an organization’s culture
• Whether a company you invest in will turn a profit
• Trustworthiness

The inability to effectively communicate information about these hard-to-quantify traits from one person to another becomes a problem for the evaluator and, in many cases, for the person being evaluated, particularly if they are high quality but others can’t tell that this is the case. That is, how does one separate the signal from the noise?

One solution proposed to this problem is signaling theory. People send signals, and these signals contain information that allows “buyers” to ascertain whether the seller (e.g., a job market candidate) is of high quality or not. But anyone can send signals, and sometimes the signals are noisy or uninformative. If the signals are no good, then they don’t solve the asymmetric information problem.

Michael Spence argued that some signals are harder to acquire than others, and that this difficulty in acquiring the signal is related to some dimension of underlying quality.

For instance, a hiring manager might be looking to hire someone with great machine learning talent. Anyone can put “machine learning” on his/her resume, so merely doing so isn’t likely to be a very good signal of having that skill. However, it is probably easier to win a Kaggle competition if you have good machine learning skills than if you do not. As a result, those with more machine learning skill are more likely to be represented among Kaggle winners than those without that skill. Thus, winning in Kaggle is likely to be a decent signal of ML skill. Further, since winning in Kaggle is easily observable, it is perhaps a decent signal for what we care about.

Can you think of other signals that contain a lot of information and are difficult to fake?

Joel Podolny, in a series of articles, proposed that social relations also help signal quality.
This is a profound idea, and I will walk through it and then fast forward to one of its applications: the original Google PageRank algorithm.

Social cues such as endorsements, recommendations, funding decisions, or hiring decisions convey (signal) information. Consider James and Betty. Both have two connections of their own, and both of their connections think highly of them and recommend them. In an abstract sense, Betty and James are rated by their raters, i.e., their two connections. But a new problem arises: who has more reliable raters? This is what we can call the “rating the raters” problem.

While in the first degree out (the direct connections of these two individuals) they are indistinguishable, there is substantial variation in their second- and third-degree ties. Although James and Betty have similarly sized networks, Betty’s network connections have far more connections of their own.

While it is relatively easy to figure out the difference between the size of Betty’s and James’ second-degree networks, the problem gets more complicated the further we move out. Real networks usually don’t stop at the 2nd or 3rd degree; they extend to the 4th, 5th, 6th, and beyond. The second problem is that real networks aren’t usually trees. Networks loop back on themselves over and over again, which makes the “rating the raters” problem hard. So we cannot just re-weight each rating by the ratings received by the rater.

There is a concept, called eigenvector centrality, that does exactly what we thought was hard: it rates the raters, the raters’ raters, the raters’ raters’ raters, and so on. This measure gives us a nice summary statistic telling us how much “status” a node in the network has. It is hard to fake because you can perhaps fake your own network ties, but not the ties of your connections’ connections.

The nodes below, for instance, are resized by eigenvector centrality.
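To make the “rating the raters” idea concrete, here is a minimal Python sketch that computes eigenvector centrality by repeated averaging (power iteration). The network is invented for illustration: James and Betty each have exactly two connections, but Betty’s connections have several contacts of their own. The node names, the single tie linking the two neighbourhoods, and the iteration count are all assumptions. Each node keeps its own score when neighbours’ scores are added; this is a common numerical device that leaves the ranking unchanged and prevents the scores from oscillating on tree-like networks.

```python
def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def eigenvector_centrality(graph, iterations=500):
    """Approximate eigenvector centrality by power iteration on a connected graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # A node's new score is its own score plus the scores of its raters.
        updated = {
            node: scores[node] + sum(scores[nbr] for nbr in graph[node])
            for node in graph
        }
        # Rescale so the scores stay bounded from one round to the next.
        norm = sum(value * value for value in updated.values()) ** 0.5
        scores = {node: value / norm for node, value in updated.items()}
    return scores


edges = [
    ("James", "j1"), ("James", "j2"),             # James' two raters
    ("Betty", "b1"), ("Betty", "b2"),             # Betty's two raters
    ("b1", "x1"), ("b1", "x2"), ("b1", "x3"),     # b1 is itself well connected
    ("b2", "x4"), ("b2", "x5"), ("b2", "x6"),     # so is b2
    ("j2", "b2"),                                 # one tie joins the two neighbourhoods
]
network = build_graph(edges)
status = eigenvector_centrality(network)

for person in ("James", "Betty"):
    print(f"{person}: degree = {len(network[person])}, status = {status[person]:.3f}")
```

Both James and Betty have a degree of two, so counting direct ties cannot separate them. The eigenvector score does, because Betty’s raters are themselves rated by many others.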
The problem of determining the “value” or credibility of an object based on its connections and its connections’ connections is a general one. Google’s original algorithm, PageRank, is in essence a measure of sociometric status. The basic intuition of PageRank is that if a site gets a lot of incoming links, and the sites linking to it also get a lot of incoming links, and so on, then there must be some value to it. The insight arises from viewing the Web as a network and using its structure to determine whether a page is useful or not.

Ego and Altercentric Perspectives

Now that we have the basic concept of sociometric status down, we can turn to a “big idea” in sociology that came from Joel Podolny. He suggested that we had focused primarily on seeing networks as “pipes” through which information, resources, support, and other “stuff” flows. However, networks are also useful to individuals in resolving problems of uncertainty, because certain types of network structures also signal trust, reputation, and identity. Network structures are prisms that reveal information as well.

The extent to which networks operate as pipes or as prisms depends on the level of uncertainty faced by market participants. Podolny developed a highly useful framework for thinking about which structure may matter when. There are two types of uncertainty, egocentric and altercentric. Egocentric uncertainty is the uncertainty a focal actor (ego) faces about market opportunities and how best to pursue them; altercentric uncertainty is the uncertainty that ego’s potential exchange partners (alters), such as consumers, face about the quality of what ego offers.

Fig. 1. Illustrative markets arrayed by altercentric and egocentric uncertainty

A market or market segment can rate highly on one type of uncertainty without rating highly on the other. Consider the four markets represented in the figure above. From Podolny (2001):

Vaccines: Begin with the market for a particular vaccine, such as polio or smallpox, in the upper left-hand quadrant. The most salient source of uncertainty in this market is that which underlies the development of the vaccine.
Once the vaccine is developed and is given regulatory approval, there is little uncertainty on the part of consumers as to whether they will benefit from the innovation. Accordingly, a market for a vaccine is a market that rates high on egocentric uncertainty, but low on altercentric uncertainty.

Roofers: Alternatively, consider the market in the lower right-hand corner, a regional market for roofers. “Roofing technology” is relatively well understood, and while roofers may face some uncertainty as to who needs a roof in any particular year, they can be confident that every homeowner will need repair work or a replacement every 20 years or so. By sending out fliers or advertising in the yellow pages, they can be assured of reaching a constituency with a demand for their service. However, because an individual consumer only infrequently enters the market, the consumer is generally unaware of quality-based distinctions among roofers. The consumer may be able to alleviate some of this uncertainty through consultation with others who have recently had roof repairs; however, the need for such consultation is an illustration of the basic point. Only through such search and consultation can the consumer’s relatively high level of uncertainty be reduced. Accordingly, this is a market that is comparatively low in terms of egocentric uncertainty, but relatively high in terms of altercentric uncertainty.

What are some other examples of markets that are low on one type of uncertainty and high on another? What about markets that are high on both?

How does one deal with altercentric uncertainty?

Let us loop back to our earlier discussion of sociometric status. Why is sociometric status a useful signal to help resolve altercentric uncertainty?
• Sociometric status: A position in a social network, defined by the ties that you have to others, where you receive deference from others who are themselves highly respected or deferred to.

When does status go awry?
However, there are many instances where status does not serve as a perfect signal of quality, and this can lead to misperceptions of status and thus misperceptions of quality. When status is a perfect signal of quality, it is said that there is tight coupling between status and quality. However, as I mentioned, this is often not the case.

Matthew Effect / Self-fulfilling prophecy: The classic example of this is the phenomenon of the 41st chair. This is the example of the “French Academy,” where there are only 40 chairs, and there is perhaps no substantive difference between #40 and #41, but the 40th person becomes a holder of a chair and the 41st person does not. As a result, the 40th person gets more rewards, recognition, etc., which in turn allows them to do better work, because they now have significantly more resources than people who do not hold a chair.

In sociological parlance, the phenomenon of the 41st chair is called “decoupling.” Here, the linear relationship between quality and status breaks down: the 40th person gains far more status than the 41st.

Buy low, sell high: This decoupling is an arbitrage situation for managers, because most people use status signals that are imperfect. There are two possible strategies to exploit this gap:

1. Figure out a more readily observable representation of social signals that maps onto quality more tightly, and sell that information.
2. Figure out a way to measure sociometric status in a situation where it is not currently used. Then use this as a better way of valuation.

Beyond the basics

The study of sociometric status (and other forms of status) is an extremely rich area of research in organizational sociology and economic sociology. I have merely scratched the surface of this topic. Some excellent articles and reviews in this stream include:

Stuart, Toby E., Ha Hoang, and Ralph C. Hybels. “Interorganizational endorsements and the performance of entrepreneurial ventures.” Administrative Science Quarterly 44.2 (1999): 315-349.
Sauder, Michael, Freda Lynn, and Joel M. Podolny. “Status: Insights from organizational sociology.” Annual Review of Sociology 38 (2012): 267-283.

Lynn, Freda B., Joel M. Podolny, and Lin Tao. “A Sociological (De)Construction of the Relationship between Status and Quality.” American Journal of Sociology 115.3 (2009): 755-804.

Chen, Ya-Ru, et al. “Introduction to the special issue: Bringing status to the table – attaining, maintaining, and experiencing status in organizations and markets.” (2012): 299-307.

Phillips, Damon J., and Ezra W. Zuckerman. “Middle-Status Conformity: Theoretical Restatement and Empirical Demonstration in Two Markets.” American Journal of Sociology 107.2 (2001): 379-429.

Network Positions and Advantage: Structural Holes

Who is this? Keep this face in mind, at least for a bit.

In the prior lecture we discussed the simple micro-macro-micro process described in Granovetter (1973), the “Strength of Weak Ties.” Recall what we discussed: the forbidden triad is forbidden because it is unbalanced and therefore generally unstable in equilibrium. The triad is particularly unstable for strong ties, where strength increases as some function of:

• The amount of time that two people spend together
• The emotional intensity of the interaction
• The intimacy between the two parties (i.e., mutual confiding)
• The reciprocal services which the two parties engage in

The way to sustain the “bridge structure” implied by the forbidden triad is to weaken one of these conditions. The weak tie that results can allow for the persistence of “bridges” or “brokerage” across distinct and differentiated strong-tie clusters, that is, across the groups that divide the social world. One key assumption that we make is that different information is being discussed across these different groups.
For instance, these different groups could be scientific research communities, regional economic clusters, different departments in the same business school, and so on. We start with the assumption that people in these different groups are doing different things, may have different cultures, and are members of different disciplines. Information within a cluster (e.g., information that persons 1 and 2, who are both in Group A, possess) is much more likely to be redundant than information across clusters. Consequently, information in group A and group B is said to be non-redundant. That is, a person from group A, by talking to someone in group B, is more likely to learn something new than if she talked to someone else from group A.

The “big idea” from the Strength of Weak Ties hypothesis is that there are “holes” in the social structure and that weak ties are the conduits that can transmit information across these holes. Thus, more weak ties mean that people have access to more and newer information.

The Holes in Social Structure

The crystal clear mechanisms implied by the weak tie hypothesis can be credited to the imagination of the author for seeing something that others missed. The empirical facts of the original paper were consistent with this hypothesis, yet the measurement did not capture the spanning of holes in the structure per se. The theoretical argument was that weak ties, because of why they exist, should correspond to this structural configuration.

Another major breakthrough came through a series of papers and then a foundational book by Professor Ronald Burt of the University of Chicago, “Structural Holes: The Social Structure of Competition.” While others had made similar arguments before (see Bavelas 1948, and for a fantastic review see “Centrality in Social Networks: Conceptual Clarification” by Linton Freeman), Burt grounded this idea in theory and provided a very clear framework for other scholars to rethink competition and strategy through this structural lens.
His very powerful argument was that we should think about “structural holes” as “opportunities.” That is, bridges across these holes in social structure are sources of value for everyone involved: the person who bridges, as well as those being bridged. The research that followed resulted in a paradigmatic shift in our understanding of how competition within organizations and in markets functions. The early work made a clean and forceful point: the causal agent is not the “strength or weakness” of a tie, but the fact that bridges create value. Focus on the bridge.

This structural argument was supported by two mechanisms of action. These can be described as the control and information benefits of structural holes.

Consider the three archetypical networks depicted below (I’ve adapted this representation from Krackhardt 1999). On the left, the focal individual “YOU” is in a structure with very few structural holes. That is, all of their connections are connected to each other. On the far right is the high structural holes condition. In this case, none of the focal individual’s connections are connected to each other. The intermediate network, which we will discuss later, is theorized to have its own special properties.

The Control Benefits of Structural Holes

Let us examine the control benefits first. In the first representation, who has control? Consider the situation in the figure on the left. What happens if you cheat one person in the network? They talk to each other. Your reputation suffers. You lose some of your control. So, who is in control? Not you, but the group.

The role that closed networks play in creating trust through control is not uncommon. For instance, small businessmen and women in America and other countries often tend to do business with their co-ethnics. While preventing cheating is a good thing, a closed structure could also be highly constraining.
Small and close-knit groups have strong group norms that can force members to conform in unproductive or harmful ways. Innovation, for example, often requires people to take risks, both social and economic, and closed groups might stymie such risk taking.

At the other end of the spectrum, the focal person’s connections are not connected to each other. This lack of connection implies that they cannot communicate, and as a result, information or gossip cannot travel between these disconnected parties as quickly. The focal individual in this case has more control, because they have the freedom to act without others coordinating against them.

If you are in the third structure, there are two specific control benefits that you have:

• The first strategy to exploit your control benefits is one where you are the broker who can leverage your position to play off two individuals (perhaps buyers or even sellers) who want the same thing from you. For instance, you can, in subtle ways, make them either lower their demands or increase their willingness to pay.
• The second strategy based on control is to be a broker between two people (or companies) who have conflicting demands. The broker, in order to get one person to change their demands, can leverage the demands of the other. Furthermore, since these two parties do not interact with each other, the broker has the ability (because of this increased control) to shape the information that one party gets about the other.

These are obviously dangerous strategies, and ones that require a significant amount of finesse and skill.

The Information Benefits of Structural Holes

All is not lost if you can’t pull off the control strategy. Spanning structural holes also provides information benefits. The literature broadly posits three types of information benefits:

• Access benefits: Access benefits consist of two components.
First, because the broker spans structural holes, she connects two groups that do not have a high degree of overlap in their knowledge. Thus, the broker has access to information that is not accessible to those in the separate and spanned social groups. Second, because your diverse connections bring you more diverse information, when you receive valuable information you know who can use it.
• Timing benefits: Information can be transmitted over multiple channels. Consider job postings. Before a job is posted in an official manner, people in the department where the job will be know about it. Talking to someone in that department will give you knowledge about the job before everyone else. This subtle difference in timing can mean the difference between getting and not getting a job. Because the broker gets information through informal channels, she often has access to information before others. Timing matters in many contexts, including venture deals, hiring, knowing a house is on the market, etc.
• Referrals: Trust matters. Period. People avoid hiring people, buying products, or investing in companies that they have limited information about. Those who span structural holes have contacts in different social worlds with their different opportunities. Contacts in these social circles can refer you to their own networks, thereby increasing your trustworthiness.

The Structural Holes in DNA

OK, now that we have the theory down, I want to share an example from real life that exemplifies the beauty of the theory of structural holes.

This is James Watson, one of the co-discoverers of the structure of DNA. This discovery is described by many as one of the most important (if not the most important) scientific discoveries of the 20th century. In his gripping account of this discovery, The Double Helix, he recounts how he and Francis Crick discovered the structure of DNA.
Here are some quotes about the quest for the structure of DNA from the Nobel Prize website:

In the late 1940’s, the members of the scientific community were aware that DNA was most likely the molecule of life, even though many were skeptical since it was so “simple.” …Nobody had the slightest idea of what the molecule might look like. In order to solve the elusive structure of DNA, a couple of distinct pieces of information needed to be put together… As in the solving of other complex problems, the work of many people was needed to establish the full picture.

Francis Crick, a brilliant scientist, was already at Cambridge before James Watson arrived. Watson describes Crick:

“Before my arrival in Cambridge, Francis only occasionally thought about deoxyribonucleic acid (DNA) and its role in heredity. This was not because he thought it uninteresting. Quite the contrary. Francis, nonetheless, was not then prepared to jump into the DNA world…[S]uch a decision would create an awkward personal situation. At this time molecular work on DNA in England was, for all practical purposes, the personal property of Maurice Wilkins, a bachelor who worked in London at King’s College…It would have looked very bad if Francis had jumped in on a problem that Maurice had worked over for several years. The matter was even worse because the two, almost equal in age, knew each other and, before Francis remarried, had frequently met for lunch or dinner to talk about science. The combination of England’s coziness – all the important people, if not related by marriage, seemed to know one another – plus the English sense of fair play would not allow Francis to move in on Maurice’s problem.”

Watson, on the other hand, was an outsider. He describes a few episodes that were critical to his discovery of the structure of DNA.

Break #1: At a conference in the spring of 1951 in Naples, Watson heard Maurice Wilkins’ talk on the molecular structure of DNA.
“I proceeded to forget Maurice, but not his DNA photograph.”

Break #2: A manuscript on DNA (as a triple helix) had been written, a copy of which would soon be sent to Peter Pauling, the son of Linus Pauling, Nobel Prize winner and a scientist who was working on the structure of DNA himself.

Break #3: Knowledge about Chargaff’s rules through his doctoral training in Indiana.

Watson had unique access, through his network, to the photos produced by Rosalind Franklin in the Wilkins lab, the unpublished manuscript prepared by Linus Pauling, and exposure to Erwin Chargaff’s rules about the ratio of bases in DNA. Because of his position, he was able to put these pieces together faster than anyone else.

All three processes helped Watson:

• Access to novel information.
• Timing, getting access to information before it was published.
• Referrals: through his famous and Nobel Prize-winning advisor, he was able to hop from one great lab in Europe to another, and get access to conferences that he would not have been able to attend otherwise.

Luck? No. Social Networks.

Growing your network strategically

Structural holes theory also implies a series of tradeoffs between the size of one’s network and the benefits that the network produces. A large network is not necessarily a good thing. This is because maintaining a network connection implies some cost and results in some benefit.

• Decreasing returns to network size: If we measure benefits in units of novel information, one could imagine that adding a new tie might entail some cost (time, resources, emotional energy, etc.) but not result in access to much more new information (e.g., you hear about the same job opportunities from the new connection that you heard about from your existing friend or acquaintance). So at least in terms of information, there is a decreasing return to network size: you pay the additional cost of the new connection, but it provides less information per unit cost than a prior connection.
• Constant returns to network size: A more palatable case is constant returns. Here, doubling your network size doubles the amount of information you have access to. Every new network connection provides information in proportion to what the prior network connections provided.
• Increasing returns to network size: The ideal situation is one where doubling the size of your network more than doubles the information you get. Is this even possible, since adding a new network connection that provides more information than before might also be substantially more costly?

In any case, you clearly want to be at a point before your costs of maintaining a network significantly outweigh any benefits that you get. Structural holes theory provides some useful guidance on not going too far down the route of decreasing returns to size.

A good heuristic for understanding this tradeoff is a calculation developed by Professor Ronald Burt called efficiency. Efficiency can be calculated in the following way:

Efficiency = Effective size / Actual size

Expanding this function out, we can define:

Actual size = The number of connections that you have.
Effective size = Actual size - Sum of the percent of overlapping ties for each of your connections.

Bandwidth and Diversity

The model above has been tremendously useful and very predictive. In recent years, some scholars have also highlighted another interesting tradeoff between stronger non-bridging ties and weaker bridging ties: the bandwidth/diversity tradeoff.

On one hand, greater-bandwidth ties result in greater informational volume. On the other hand, weaker bridging ties result in greater variance in information. Recent work suggests this relationship depends fundamentally on the nature of the environment in which people are building their social networks.

There are two factors that can reduce the value of bridging ties and privilege high-bandwidth ties:

1.
If the network has a homogeneous set of knowledge, where most people talk about the same things, then having more high-bandwidth ties may be more important.

2. If the “refresh rate” is high, meaning that people’s contacts and interactions churn very fast, or if the environment is turbulent and the information is extremely complex (an idea contains multiple topics or subjects), then high-bandwidth ties are better at sustaining the high-variance information you need.

However, what studies have found is that “strong” bridging ties that have both bandwidth and diversity are the best, but they are indeed rare.

Extending the Core Insights from Structural Hole Theory

As one can imagine, structural holes theory was extremely powerful, and scholars have been working to extend and refine the predictions of the theory to account for structures that don’t neatly fit into the standard dichotomy or that have dynamic elements.

Consider dynamics: Given how difficult it is to maintain bridging positions, it is likely that bridges are fragile. Research suggests that bridging ties follow what is called a kinked decay function. Initially bridges have a low likelihood of breaking, followed shortly by a sharp rise in decay. If the bridge survives this spike in decay rates, it is likely to persist for a long time.

Two processes often lead to decay:

• Disintermediation: Disconnected parties learn to exchange on their own.
• Competition from rival brokers: Rivals enter the fray and, by offering either greater benefits or lower cost, whittle away at the original bridge’s benefits from occupying the hole. Indeed, the hole no longer exists.

Why bridges decay:

• Low performance: high performers have lower rates of decay for bridges
• If other relations are decaying, bridges are also likely to decay
• Experience bridging improves the chances that new bridges survive
• “Hole decay” may be limited when:
• Deep barriers limit interaction across the hole.
• The benefits to the bridged parties are high enough and switching costs are high.
• The bridged individuals don’t question the role of the broker, or it is not salient to them.

Beyond Information and Control

There are also cases where brokering is disadvantageous. The underlying mechanisms leading to the disadvantages of brokering have to do with identity and expectations.

• In addition to information, networks also convey expectations about who one is (identity) and how one should behave (expectations). Many of us have been caught between two groups that expect different things from us. This happens at work, at home, and even in our social and personal lives with friends. The more disconnected our connections are, the more likely it is that they have different expectations about how we should behave. Podolny and Baron (1997) show that when a person is a broker in a network that conveys “identity” they are less likely to benefit from their brokerage position than when the network primarily provides “information.”
• Similarly, Krackhardt in his Simmelian tie theory makes a related argument that brokering between two strongly connected groups creates pressure to conform to different norms, which can create internal role conflict and stress, and thus reduce performance.

Outcomes as Mean versus Variance

The theories that we have focused on thus far attempt to predict mean or expected outcomes. That is, what is the average difference in wages, promotion rates, bonuses, or ideas for those with or without structural holes? The graph below shows that there is a mean shift. The blue distribution (e.g., the structural holes condition) has a higher mean outcome.

However, this analysis can be pushed further by asking: is there a shift in the variance of potential outcomes? Does a specific structure reduce or increase the possible variation in outcomes? Note that the blue distribution below is “tighter” than the black distribution.
The black distribution has a greater likelihood of worse, but also better, outcomes than the blue one. Which would you prefer below?

James Lincoln of UC Berkeley did pioneering studies on business networks in Japan and found that companies that were members of the Keiretsu, while having lower means in terms of outcomes, also had lower variation and as a consequence were less likely to do extremely poorly but also less likely to do extremely well.

With respect to brokerage, we can also think about floors and ceilings. Networks that are high in closure reduce variation in performance, both high and low. High performance is dampened because the high performers subsidize the lower performers, and the low performers don’t do as poorly because the high performers help them out.

The network structures that tend to most facilitate the low-variance strategy are closed networks, as one can imagine. The classic examples of this are ethnic networks, where the wealthier people help out the less fortunate ones.

Network Analysis in R: Getting Started

In some respects, the history of network analysis cannot be separated from the tools used to conduct network analysis. The importance of software to the enterprise of network analysis has been true since the very beginning of the field. Scholars have written and made available software programs to allow others to collect data and conduct analysis themselves. For instance, you can find some description of a software program called CONCOR in White et al. (1976) that finds roles in an informal social network. Other great technologies such as UCINET, KrackPlot and a host of other social network analysis software allowed network approaches to spread rapidly through the field. My hypothesis is that without these technologies and their ease of use (UCINET, I think, was a game changer for the field), network analysis might still be in the backwaters.
Today, there are lots of options for the researcher who wants to do network analysis. I myself use two primary tools that fit well into my workflow (e.g., I use an Apple Mac and I do a lot of non-network analysis as well). Those tools are: the R Statistical Programming Language plus the SNA package developed by Professor Carter Butts of UC Irvine, and STATA. While some of my posts (and the accompanying analysis) will use STATA, I will focus primarily on the use of R for network analysis.

Getting started with R for Social Network Analysis

Let us begin by downloading and installing the R programming language. Begin by navigating to the R-Project. I will do the walkthrough for the Mac version of R. After navigating there, click on the CRAN link under download. The closest server to me is probably at UC Berkeley, but pick whichever one is closest to you.

Next, download and install the version of R for your operating system. I will click Download R for (Mac) OS X, and then click on the most recent version (which, at the writing of this post, is R-3.4.0). Download and install. I won’t walk you through this.

Now that R is installed, let’s open it up and get some basic network analysis going.

Once the R console is open, click on File (in the top menu) and then click on New Document. This should open a blank script file. Type a comment (a line that begins with #). I’ve typed:

# This file provides some simple code to get you started on your Network Analysis Journey

Save the file (I’ve called it RSNApractice.R). Clicking on the file name will give you access to the complete file.

Now that we have that sorted out, let us begin by installing some important packages. You can type this code directly into the console.

install.packages("data.table")
install.packages("curl")
install.packages("sna")

The data.table package allows us to import data from the web; the curl package is used by data.table when it reads files from https URLs; and sna provides the network analysis functions. Once these packages are installed, let’s get them loaded.
library(data.table)
library(curl)
library(sna)

Now that these are loaded, let me tell you a little about the data that we are going to analyze. This data comes from a professional services consulting firm on the east coast of the United States, collected some time in the early 2000s. There are 247 people at the firm and each of them responded to a network survey where they answered 6 questions. Here are the questions:

#(Q0) “who do you know or know of at [the firm]”,
#(Q1) “who you would approach for help or advice on work related issues”,
#(Q2) “who might typically come to you for help or advice on work related issues”,
#(Q3) who you go to “about more than just how to do your work well. For example, you may be interested in ‘how things work’ around here, or how to optimize your chances for a successful career here”,
#(Q4) “who might typically come to you for help or advice along these [non-task related] dimensions” and finally
#(Q5) “who you think of as friends here at [firm].”

I’ve uploaded their responses to a Dropbox folder in the form of matrices. The rows of the matrix indicate “senders” or “Ego” and the columns represent “receivers” or “Alters.” We can load the data using the following code:

# Load the "Professionals" network data from Dropbox.

# Convert the data.table objects into matrix format so they can be
# analyzed using the sna package.
q0 = as.matrix(q0)
q1 = as.matrix(q1)
q2 = as.matrix(q2)
q3 = as.matrix(q3)
q4 = as.matrix(q4)
q5 = as.matrix(q5)

# Create a vector of numbers from 1-247 and convert them to strings.
# We will use these to rename our rows and columns.
names = paste(1:247)

# Rename all the rows
rownames(q0) = names
rownames(q1) = names
rownames(q2) = names
rownames(q3) = names
rownames(q4) = names
rownames(q5) = names

# Rename all the columns
colnames(q0) = names
colnames(q1) = names
colnames(q2) = names
colnames(q3) = names
colnames(q4) = names
colnames(q5) = names

This code should load all of the network data into the R console.
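The repeated conversion and renaming steps above can also be written more compactly by keeping the six networks in a named list. This is only a minimal sketch: it assumes small randomly generated tables in place of the real survey data, and the names `toy`, `nets` and `ids` are illustrative, not part of the original script.

```r
# Toy stand-ins for the six imported survey tables (hypothetical data, 4 people).
set.seed(1)
n = 4
toy = function() as.data.frame(matrix(rbinom(n * n, 1, 0.5), n, n))
nets = list(q0 = toy(), q1 = toy(), q2 = toy(),
            q3 = toy(), q4 = toy(), q5 = toy())

# Convert each table to a matrix and label rows and columns "1", "2", ..., n.
ids = as.character(seq_len(n))
nets = lapply(nets, function(x) {
  m = as.matrix(x)
  dimnames(m) = list(ids, ids)
  m
})

# Individual networks are then available as nets$q0, nets$q1, and so on.
nets$q0
```

With the real data, `n` would be 247 and the list would hold the imported tables; whether a list or six separate objects is more convenient depends on how much of the later analysis is repeated across networks.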
Now, let’s import some attributes.

# Imports the attributes file and outcomes file, and converts it into a data frame.
attr attr

Now that these are all loaded, let’s see how the data look. Type the following to look at the first ten rows and columns of q0.

# Let's look at the first ten rows/columns of q0
q0[1:10,1:10]

How do we interpret this? Person 1 doesn’t appear to know persons 2-10. However, person 2 says they know persons 5, 7 and 10. Let’s plot this as a graph.

# Plot the first 10 people in the q0 matrix.
gplot(q0[1:10,1:10])

Let us now plot the full q0 network. This is the “knowing” network of this firm of 247.

# Plot the full "knowing" network
gplot(q0)

Quite dense. A lot of people know a lot of other people at the firm. Try to do this analysis for q1 to q5. What are the differences/similarities?

Let’s do some simple centrality calculations (more on centrality in the Representing Networks post).

# Calculate two simple centrality measures on the q0 network.
# Indegree is the number of people who say they know a focal person (in arrows on a node)
# Outdegree is the number of people who a focal person says they know (out arrows from a node)
q0.indegree = degree(q0, cmode = "indegree")
q0.outdegree = degree(q0, cmode = "outdegree")

The centrality measures are now saved in the objects q0.indegree and q0.outdegree. Let’s plot histograms of these two measures.

# Plot histograms of q0.indegree and q0.outdegree
hist(q0.indegree)
hist(q0.outdegree)

These look very nicely distributed, almost Poisson. Let’s calculate some summary statistics on these measures.

# Summary statistics on the indegree/outdegree measures
summary(q0.indegree)
summary(q0.outdegree)

Now, let’s do one final thing before we conclude this post (you can keep analyzing stuff; I will delve deeper into centrality measures and the like in a different post). I have also given you an outcomes file with three outcomes.
Here are the outcome variables:

relationships: whether the respondent feels their relationships at the firm are fulfilling
success: whether the respondent feels that they have the knowledge to succeed at the firm
appreciate: whether they feel appreciated

Here is a description of the attribute variables:

tenure: tenure at this firm
title: whether the employee is an analyst, lateral hire, or partner
location: what office they work in
gender: male or female
ethnicity: 91% are white
age: age of employee
elite: whether the employee graduated from an elite university
feeder: whether the employee graduated from a “feeder” university
work1-work24: types of work the employee does

Let’s conduct one final analysis. Let’s see if there is a correlation between how many people an employee knows and whether they feel like they have the knowledge to succeed.

# Examine if there is a correlation between how many people someone knows and whether they feel like they have the knowledge to succeed.
# The model call is inferred from the plot call below: success regressed on q0.indegree.
m.0 = lm(attr$success ~ q0.indegree)
summary(m.0)

Looks like there is at least a bivariate correlation. Let’s plot it.

# Plot the regression and the data points.
plot(q0.indegree,attr$success)
abline(m.0)
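The degree measures and the bivariate regression above can also be tried without the firm data. The following is a minimal sketch under stated assumptions: the network is randomly generated, the outcome is made up so that it rises with indegree, and all `toy.*` names and `m.toy` are hypothetical. For a binary matrix with senders in rows, column sums give indegree and row sums give outdegree, which should correspond to what sna’s degree() returns with cmode = "indegree" and cmode = "outdegree".

```r
set.seed(42)
n = 50

# Random directed "knowing" network: toy[i, j] = 1 means i says they know j.
toy = matrix(rbinom(n * n, 1, 0.2), n, n)
diag(toy) = 0  # nobody nominates themselves

# Senders are in rows, so indegree is a column sum and outdegree is a row sum.
toy.indegree = colSums(toy)
toy.outdegree = rowSums(toy)

# Hypothetical outcome that rises with indegree, plus noise.
toy.success = 1 + 0.3 * toy.indegree + rnorm(n, sd = 0.5)

# Bivariate regression of the outcome on indegree, then plot data and fit.
m.toy = lm(toy.success ~ toy.indegree)
summary(m.toy)
plot(toy.indegree, toy.success)
abline(m.toy)
```

Because the outcome here is constructed from indegree, the positive slope is built in; with the real survey data the sign and size of the relationship is an empirical question.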

Now that you have most of the data, you can explore it yourself. Here is the full code @ RSNApractice.R
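One direction to explore is the efficiency heuristic from the structural holes discussion (effective size divided by actual size). The sketch below is an illustration under assumptions, not Burt’s full formula: it treats the network as binary and symmetric, and it reads “percent of overlapping ties” for a contact as the share of your contacts that this contact is also tied to. The function name `ego_efficiency` and the two toy networks are hypothetical.

```r
# Efficiency of one person's ego network in a binary, symmetric adjacency matrix.
ego_efficiency = function(adj, ego) {
  alters = setdiff(which(adj[ego, ] > 0), ego)
  n = length(alters)                     # actual size
  if (n == 0) return(NA_real_)           # no contacts, efficiency undefined
  sub = adj[alters, alters, drop = FALSE]
  diag(sub) = 0
  # Overlap of each contact: share of ego's n contacts it is also tied to.
  overlap = rowSums(sub > 0) / n
  effective = n - sum(overlap)           # effective size
  effective / n                          # efficiency
}

# "YOU" is node 1 with three contacts in both toy networks.
closed = matrix(1, 4, 4)                 # every contact knows every other contact
diag(closed) = 0
open = matrix(0, 4, 4)                   # no contact knows any other contact
open[1, 2:4] = 1
open[2:4, 1] = 1

ego_efficiency(closed, 1)
ego_efficiency(open, 1)
```

In the closed toy network each of the three contacts overlaps with two thirds of the others, so effective size is 1 and efficiency is one third; in the open network there is no overlap and efficiency is 1. Applying the function to the survey networks would first require deciding how to symmetrize the directed responses, which is a modeling choice left open here.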