Where do networks come from?

The key assumption underlying both the peer effects and structural approaches to network effects is some degree of exogeneity in the existence and structure of network ties.

Exogeneity is both a theoretical claim and an empirical assumption. All reasonable theories are built on a set of axioms that assume some primitive or exogenous features of the world or of the target system being analyzed. Many models in economics, for instance, assume that preferences are exogenous. From these preferences, we are then able to derive things like behavior, choice, “roles,” and the structure of social relationships.

Screen Shot 2017-05-10 at 10.31.25 AM.png

Similarly, some sociological and anthropological traditions start with axioms that assume that “roles” are exogenous. These roles (e.g., the position an individual occupies in a social structure) govern behavior, preferences, and social relationships.

Screen Shot 2017-05-10 at 10.31.32 AM

Much of the network analysis we’ve been conducting or discussing thus far also has an exogeneity assumption built in. The primitives are social relationships and their structure. All other things we observe, such as behavior, preferences and roles, emerge from the pattern of exogenous network ties. In the lectures on structural holes, status and peer effects, we argue that the pattern of social relationships causes differences in behavior, preferences, and roles, and not vice versa.

Screen Shot 2017-05-10 at 10.31.38 AM

The challenge of network formation

However, a challenge for the social-relationships-first perspective is that networks are unlikely to be fully “exogenous.” They form and evolve through processes that make some people more likely to connect to each other and others less likely to do so.

Network scholars have spent considerable time trying to understand how networks form and change. At a broad conceptual level, we can think about five factors that shape whether a tie between two individuals (e.g., ego and alter) forms.

Screen Shot 2017-05-10 at 11.08.33 AM.png

The logic behind most models of network formation is simple. On one side, there are “benefits” to connecting with someone, whether actual or perceived, pecuniary or non-pecuniary (psychic). On the other side, there are “costs” that make it easier or harder to form a relationship with a particular person, because searching for them, coordinating with them, or dealing with them is more or less costly than with someone else. Relatedly, some individuals may face a lower cost of building a network than others, and some alters may be cheaper (relative to the benefit) to connect with than others.

Factor 1: Characteristics of Ego, the sender.

“Factor 1” covers characteristics that make it easier for certain types of people to connect with many others. These characteristics either lower the cost of making many connections (relative to others) or provide a greater benefit from doing so. Research in this stream has found a wide range of individual-level characteristics that predict an increased or decreased propensity to have a certain type of network. These include:

  • Personality: Some work has found that differences in personality traits are correlated with network structure. For instance, individuals who have many ties are also likely to have extroverted personalities. Relatedly, those who are high in “self-monitoring” also have a greater likelihood of being “brokers” or occupying “structural holes” in a social network.
  • Other factors that may also be related to larger networks include:
    • Strategic intent
    • Intelligence
    • Physical characteristics (e.g., beauty or height)
    • Age
  • Some factors may describe an individual at a certain point in time:
    • After the loss of a job
    • After being promoted to a new role
  • Other factors may be socially constructed, but describe Ego in a given context:
    • Caste
    • Religion

One can reason about the various ways in which these characteristics of Ego either lower their costs of making ties or increase the benefit they get. Can you come up with other individual-level factors that might matter?

Factor 2: Characteristics of Alter, the receiver.

A related set of arguments can be made about the characteristics of an alter or alters. For instance, one could theorize about the following characteristics of alter(s) that may make them more likely to receive connections from others.

  • Personality
  • Intelligence
  • Skill
  • Wealth
  • Social standing
  • Formal role in the organization

As with the Ego-centric perspective, one can use a “cost” and “benefit” logic to reason about why some alters have more advice seekers (e.g., they are smart) or more friends (e.g., they are helpful). In purely altercentric models, we ignore the characteristics of Ego.

Factor 3: The interaction of Ego/Alter characteristics (e.g., homophily)

The third factor concerns the “Ego-Alter” interaction. In such models, something about the characteristics of Ego and Alter together predicts an increased or decreased propensity to have network ties. The most common theme in these models is homophily, the tendency for individuals who are similar to each other to have a higher propensity to connect. Research has found that individuals who are similar in the following characteristics are more likely to connect with each other, relative to the alternatives:

  • Race and ethnicity
  • Gender
  • Age
  • Formal organizational position
  • Occupation
  • Religion

There are many theories about why such a preference exists. On one hand, social contexts (e.g., communities, neighborhoods, etc.) are often organized by these characteristics. This makes it much easier to connect with people who are similar to you. There is also an element of choice. Individuals who are similar to you are likely to have similar experiences, share similar values, and like and dislike similar things. As a consequence, the cost of interacting with similar people is likely to be lower than the cost of interacting with people who are different.

However, the type of relation may matter here. In mating networks you are more likely to see heterophily than homophily. This might also be true of mentoring relationships, where individuals are more likely to be mentored by those at a different level of seniority.
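Whether a given relation shows homophily or heterophily can be checked by comparing the share of observed ties that fall within a group to the share one would expect if ties formed without regard to group. The sketch below is only an illustration: the six people, their two groups, and the five ties are made-up data, and the function names are hypothetical.

```python
from collections import Counter


def same_group_share(ties, group):
    """Share of observed ties that link two people in the same group."""
    same = sum(1 for a, b in ties if group[a] == group[b])
    return same / len(ties)


def expected_same_share(group):
    """Share of all possible pairs that are same-group (random-mixing baseline)."""
    n = len(group)
    sizes = Counter(group.values())
    same_pairs = sum(c * (c - 1) / 2 for c in sizes.values())
    return same_pairs / (n * (n - 1) / 2)


# Hypothetical data: six people in two groups, five undirected ties.
group = {"p1": "X", "p2": "X", "p3": "X", "p4": "Y", "p5": "Y", "p6": "Y"}
ties = [("p1", "p2"), ("p2", "p3"), ("p4", "p5"), ("p5", "p6"), ("p3", "p4")]

observed = same_group_share(ties, group)
baseline = expected_same_share(group)
print(f"observed {observed:.2f} vs. baseline {baseline:.2f}")
```

An observed share well above the baseline is consistent with homophily; a share well below it is consistent with heterophily. Neither result says whether the pattern comes from choice or from how the social context is organized.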

What other factors at this level might increase or decrease the cost of interaction or raise its benefits?

Factor 4: Social and Physical Context

The fourth factor can broadly be thought of as the social or physical context within which individuals are forming social networks. A simple example is office or neighborhood layout. A large body of research has found that physical distance strongly affects whether two individuals form ties. Scientists who are nearby, for instance, are more likely to collaborate, and their research trajectories also become rather similar.

Research has found that the propensity to connect declines roughly exponentially with physical distance. This effect is called propinquity. Individuals who are physically proximate are substantially more likely to interact, followed by steep declines in the rates of interaction as distance increases.
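One way to write down such a decline, offered only as a sketch: the symbols below (a baseline probability p_{0}, a decay rate \lambda, and the distance d_{ij} between individuals i and j) are notation introduced here for illustration and are not taken from the research referred to above.

```latex
\Pr(\text{tie}_{ij}) = p_{0} \, e^{-\lambda d_{ij}}, \qquad \lambda > 0
```

Under this form, each additional unit of distance multiplies the probability of a tie by the same factor, e^{-\lambda}, which is what produces the steep initial drop followed by a long flat tail.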

In addition to propinquity, other aspects of the social context are also likely to affect the extent of tie formation. These factors could be the reorganization of roles, task inter-dependencies, as well as cultural or organizational norms regarding competition or collaboration. Incentives are also important in determining what the shape of the network might be. The challenge with many of these effects is that they are often “absorbed” into the intercept of the model. That is, they can only be detected when looking across contexts, not within a context.

Factor 5: Endogenous Network Processes

 

Finally, the structure of one part of the network may affect the structure of another. Consider a simple example: reciprocity. If I consider you a friend, there is a social-psychological as well as a sociological process that increases the likelihood that you consider me a friend. This is akin to tit-for-tat: if you give me a gift, I will give you one in return. Networks exhibit this property with substantial regularity (but not always!). In this context, the emergence of a network tie, the reciprocal one, is endogenous to the network. That is, it emerges from within the network structure and not outside of it.
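How regular reciprocity is in a given network can be summarized as the share of directed ties that are returned. A minimal sketch follows; the friendship nominations are made-up data and the function name is hypothetical.

```python
def reciprocity(edges):
    """Share of directed ties (a -> b) for which the return tie (b -> a) exists."""
    ties = {(a, b) for a, b in edges if a != b}  # drop self-nominations
    if not ties:
        return 0.0
    returned = sum(1 for a, b in ties if (b, a) in ties)
    return returned / len(ties)


# Hypothetical friendship nominations: (ego, alter) means ego names alter a friend.
friendship = [
    ("ann", "bob"), ("bob", "ann"),   # mutual
    ("cat", "dan"), ("dan", "cat"),   # mutual
    ("ann", "cat"),                   # not returned
    ("bob", "dan"),                   # not returned
]

share = reciprocity(friendship)
print(f"{share:.2f} of ties are reciprocated")
```

Here four of the six nominations are returned. Comparing this share to what one would see if the same number of ties were scattered at random is one way to judge whether reciprocity is doing any work.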

Similarly, there are other endogenous processes that researchers have detected in networks. These include transitivity: a friend of a friend is often a friend. Heiderian balance theory, for example, argues that individuals desire balance in their relationships. The situation of being friends with your friend’s enemy is unsustainable according to balance theory (why?). Because it is, that structure will endogenously change into something else: either the enemies become friends or the network splits.

Other forces include preferential attachment. New entrants into a network are more likely to connect to existing members in proportion to those members’ degree centrality. This process gives some networks a power-law degree distribution, rather than the binomial/normal distribution that would be expected if the network were formed through a purely random process.
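The consequence of preferential attachment for the degree distribution can be seen in a small simulation. The sketch below grows a network one node at a time, with each newcomer forming a single tie. Note the assumptions: the comparison case lets newcomers pick an existing node uniformly at random, which is a simple stand-in and not the same thing as a purely random graph, and the network size and function name are arbitrary choices made here.

```python
import random


def grow_network(n_nodes, preferential, seed=0):
    """Grow a network where each new node forms one tie; return all degrees."""
    rng = random.Random(seed)
    degree = [1, 1]      # start from two connected nodes
    endpoints = [0, 1]   # every node appears once for each tie it holds
    for new in range(2, n_nodes):
        if preferential:
            # Sampling a tie endpoint picks a node with probability
            # proportional to its current degree.
            target = rng.choice(endpoints)
        else:
            # Baseline: every existing node is equally likely.
            target = rng.randrange(new)
        degree.append(1)
        degree[target] += 1
        endpoints.extend([new, target])
    return degree


pa_degrees = grow_network(5000, preferential=True)
uniform_degrees = grow_network(5000, preferential=False)
print("largest degree, preferential attachment:", max(pa_degrees))
print("largest degree, uniform attachment:", max(uniform_degrees))
```

Both networks have exactly the same number of ties, so the average degree is identical. What differs is the tail: under preferential attachment a few early nodes accumulate far more ties than anyone does under uniform attachment.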

 

Figure: Power law distribution

Figure: Normal distribution

 

 

Empirical considerations

Though the theoretical ideas behind network formation are quite straightforward, disentangling the differential impact of these effects remains quite challenging. In a subsequent post, we will discuss the various approaches to estimating these models.

 

 

Peer effects, knowledge transfer and social influence

The structural approach to social networks is inherently beautiful as a representational approach. I am always in awe of the fact that we can learn so much about how human beings act or their outcomes based merely on the pattern of their social ties. The idea is both simple and profound.

The structural approach is built on assumptions regarding information transfer across a simpler unit of analysis: the dyad. In the world of dyads, new complications arise and different theories must be developed and tested.

Let us take the Professionals data we have been analyzing as an example. Here is the advice network among these professionals.

Screen Shot 2017-05-04 at 10.45.24 AM.png

In the prior analyses, we have focused on analyzing the structure of each node’s connections. For example, each node has a specific number of incoming connections, its indegree:

Screen Shot 2017-05-04 at 10.47.03 AM.png
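For readers who want to reproduce this kind of count, here is a minimal sketch of computing indegree (incoming ties) and outdegree (outgoing ties) from an edge list. The five advice ties are made up for illustration and are not the Professionals data; the function name is hypothetical.

```python
from collections import Counter


def degree_counts(edges):
    """Return {node: (indegree, outdegree)} for a directed edge list."""
    outdeg = Counter(src for src, _ in edges)
    indeg = Counter(dst for _, dst in edges)
    nodes = {node for edge in edges for node in edge}
    return {node: (indeg[node], outdeg[node]) for node in nodes}


# Hypothetical advice ties: (seeker, advisor) means seeker asks advisor for advice.
advice = [("p1", "p2"), ("p3", "p2"), ("p4", "p2"), ("p2", "p5"), ("p1", "p5")]

counts = degree_counts(advice)
for node in sorted(counts):
    indegree, outdegree = counts[node]
    print(node, "indegree:", indegree, "outdegree:", outdegree)
```

In this toy network p2 is sought out by three colleagues, which is the kind of count the analysis treats as informative about who has the “knowledge to succeed.”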

The beauty of the structural approach to social networks is that we can learn a lot about the outcomes of individuals and organizations by merely looking at the pattern of their relationships. Recall our prior analysis. There is information in indegree. We were able to explain 6.5% of the variation in our measure of whether a person has the “knowledge to succeed” just by looking at the count of their incoming connections! While indegree may capture or reflect other processes and might not be causal, it is nevertheless information rich.

However, an Ego’s alters (e.g., the people that a focal node is connected to) are not all the same—as we sometimes implicitly assume in our models. As a note, I don’t believe that researchers actually believe that all the people we are connected to are the same. Indeed, betweenness, closeness, eigenvector centrality, all assume that not all connections are the same by their very construction. However, the heterogeneity in alter characteristics is implicit rather than explicit because we never specify in our theories or models, exactly how these individuals vary.

The peer effects framework, on the other hand, often ignores variation in structure but emphasizes variation in the characteristics of connections.

Below, I walk through some examples of this approach.

A simple model of peer effects

The “peer effects” framework is so called because it is based on a line of research in the economics of education where scholars were attempting to understand the impact of classroom peers on academic outcomes. Hence, peer effects.

Let us start with a simple setup. Assume there are 100 students in a classroom. The teacher has decided that everyone in the class will have a study partner, so she asks the students to pair up. There are now 50 pairs, each with two people. The teacher wonders whether having a smart peer (i.e., alter) increases the performance of a focal student (i.e., Ego). Visually, she is interested in understanding this influence process:

Screen Shot 2017-05-04 at 1.20.36 PM.png

At the end of the class, all of the students take a standardized exam. This exam is scored on a 100 point scale, and students can get anywhere from a score of 0 to 100. The teacher takes this score and runs the following regression with 100 observations, 1 for each student. She’s also good with standard errors, so she clusters standard errors at the level of the dyad:

score_{i} = \beta_{0} + \beta_{1} score_{j} + \epsilon 

After running the regression, she finds a large and statistically significant coefficient for \beta_{1}. How should she interpret it?

A naive causal interpretation is: for every unit increase in score_{j} there is a corresponding \beta_{1} increase in score_{i}. Or, by having a study partner with a certain score, there is a corresponding increase/decrease in the performance of the focal student. This interpretation is naive for a reason: it is probably (though not definitely) wrong.

But before we dive into why it is probably wrong, it is useful to reiterate that this “peer effects” representation is quite general. For example, the following outcomes might be determined in part by the influence of peers (however defined).
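To see how such a coefficient can arise without any influence at all, consider a small simulation in which a student’s score depends only on her own ability. Everything here is an assumption made for illustration: the ability and noise distributions, the use of 1,000 students rather than 100 (to make the estimates less noisy), and the stylized “assortative” case in which students of similar ability pair up.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def naive_beta1(assortative, n=1000, seed=1):
    """Estimate beta_1 in score_i = b0 + b1 * score_j when peers have NO influence."""
    rng = random.Random(seed)
    ability = [rng.gauss(70, 10) for _ in range(n)]
    order = list(range(n))
    if assortative:
        order.sort(key=lambda s: ability[s])  # similar students pair up
    else:
        rng.shuffle(order)                    # partners assigned at random
    # Scores depend only on own ability plus noise, never on the partner.
    score = [a + rng.gauss(0, 5) for a in ability]
    ego, alter = [], []
    for k in range(0, n, 2):
        a, b = order[k], order[k + 1]
        ego += [score[a], score[b]]      # one observation per student
        alter += [score[b], score[a]]
    return slope(alter, ego)


beta_sorted = naive_beta1(assortative=True)
beta_random = naive_beta1(assortative=False)
print(f"students choose similar partners: beta_1 = {beta_sorted:.2f}")
print(f"partners assigned at random:      beta_1 = {beta_random:.2f}")
```

When similar students pair up, the regression returns a large positive coefficient even though no one learned anything from anyone. With random assignment, the same data-generating process yields a coefficient near zero.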

 

  • Finance: Putting money away into a retirement savings account, adopting a microfinance product, etc.
  • Health behaviors: Obesity, Happiness, use of HIV/AIDS test, etc.
  • Academic performance: Getting good grades, choosing a major.
  • Entrepreneurship: Becoming an entrepreneur; deciding against becoming an entrepreneur.
  • Careers: Quitting; moving to a new company.
  • Adoption of products: Prescribing a drug, buying a car.
  • Adoption of behaviors: Smoking, drinking, sexual events.
  • Adoption of ideas: Learning from patents.
  • Organizational behavior:  Adoption of corporate practices and policies.

The basic idea is simple: We observe some level or change in the behavior or characteristics of an alter (or alters) and we see whether these are correlated to the behaviors or outcomes of Ego.

 

This apparently simple process is much more nuanced and complicated than it appears. There are dozens of “mechanisms” that can lead to the correlation we might observe (or that the teacher observes). Here are a few reasons why we might observe a correlation, either positive or negative. Consider the case of product adoption.

 

 

  1. Direct transfer of specific information: Alter tells me about a product, but nothing more.
  2. Persuasion effects: Alter tells me about the product, and forcefully persuades me to adopt it.
  3. Direct transfer of general information: Alter tells me about a website that reviews products, and on this site the product that I adopt is listed first.
  4. Role-modeling / imitation: I see Alter doing something, and I copy it.
  5. Install base effects: I see many Alters adopting a product (e.g., buying an iPad), so I adopt the iPad too.
  6. Threshold effects: I only buy an iPad if at least 10 people I know own it; once the 10th person adopts, I decide to adopt.
  7. Snob effects: I see an Alter (or Alters) doing something, and I avoid doing it myself.
  8. Simultaneous: Alter helps me out and I help her out, and together we perform better than either one would alone because, by talking through a problem for example, we figure it out together.
  9. Reverse causality: The Alter does not affect Ego; rather, the Ego affects the Alter.
  10. Contextual effects: We are both in the same neighborhood, and because we are exposed to the same billboard, we see the same advertisement for a product, and thus we adopt it.
  11. Induced environmental effects: Having a high-achieving peer results in a teacher who teaches at a higher level; the student learns more not because of greater transfer of information from her peer, but because teaching quality improves.
  12. Selection bias: I become friends with people who already own iPads. I become friends with people who like technology, and because they like technology, they also own iPads.
  13. Homophily effects: I like iPads and, because I do, I become friends with others who like iPads.

Can you think of more mechanisms?

 

Which mechanism is actually at play in a specific context?

This question is a hard one. Because we have several potential mechanisms that we must work with, how do we rule out some of them? Some mechanisms are easier to rule out than others, but most are actually quite difficult to conclusively confirm or deny.

To deal with this issue (which is VERY common during the review process) I have come up with a two-part classification. The first set of mechanisms are what I call “pseudo-mechanisms.” Pseudo-mechanisms are alternative explanations of the correlation that have nothing to do with social influence of the type we care about: influence flowing from the peer to the focal individual. Charles Manski, in a famous paper, has defined these as the reflection problem and the selection problem.

Reflection problem: The reflection problem asks you to imagine a mirror. You see two objects moving. If it is unclear to you that you are looking at a mirror, then you can’t tell which one is the actual person who is moving and which one is the mirror image. More formally, imagine that we have two sets of variables, x and y; let x be the measurement of the characteristics of individual i’s peers at time t, and let y be the measurement of focal individual i’s characteristics at time t. Because both are measured simultaneously, we are unable to tell whether a change in x has caused a change in y, or vice versa. This indeterminacy exists for each observation.

Furthermore, we are unable to tell whether both actors were exposed to some environmental shock at the same time (advertising, etc.) that makes their adoption correlated. The only way to ensure that the reflection problem is not an issue is to measure the traits and characteristics of the xs prior to measuring those of y.

However, doing this does not resolve the issue of causality. Thus, it is a necessary but insufficient condition.

Another important, and much more difficult, condition now has to be met in order for the effect to earn the title “causal.” This is the selection problem. Either of two conditions solves the selection problem:

  1. Either you know all the reasons why two people were paired together (i.e., why person y is friends with, shares a room with, or enters college with x).
  2. OR the two individuals are randomly assigned, thus breaking the correlation between the characteristics of x and y.

Assume for a moment that we have ruled out reflection and selection effects by (1) using a lagged measure of peer consumption or action and (2) randomly pairing ego and alter. Even then, we have only ruled out a handful of the possible “mechanisms” producing the peer effects. We can rule out the “pseudo-mechanisms” #8 – #13 (except for #11), but that leaves us with 8 possible mechanisms.

Imagine a doctor telling you that “Yes, we’ve ruled out the fact that you are faking your symptoms, but there are 8 or more possible viruses that could be causing your infection!”

So, we need to now try and distinguish between these.

This is hard, even harder than resolving the reflection and selection problems. The reflection and selection problems are interesting in that they are hard problems to solve, but we know how to solve them. Not to make too many medical analogies, but this is like separating conjoined twins: hard, but someone can do it and has done it.

So how do we distinguish between different mechanisms, say #1 – #7?

This will depend a lot on context, and a lot on the data that you have available.

Let us examine a very simple situation where we have two students. Let us call the first student “Ego” and let us call the second student “Alter.” Assume for a moment that we have completely alleviated the problems of reflection and selection.

 

Screen Shot 2017-05-04 at 2.31.58 PM.png

Let us say that really there are two contender mechanisms.  (This is probably not true; but, for a moment assume that it is true.)

Mechanism 1: A student learns general study habits from his/her peer (alter), and this is why his/her performance increases.

Mechanism 2: A student interacts a lot with his/her peer (alter) and they study together, and the peer helps the student learn the material.

How would we go about designing a test that would distinguish between these two mechanisms?

  1. For instance, if what the student is getting from her peer is increased motivation, that should have a positive effect on various subjects.
  2. On the other hand, if the student is learning something rather specific (like how to do an integral), then the effects should be subject specific.

Assume you do this test, and you find out that there are effects across subjects, what can you say about the mechanisms? Can you say anything?

How to conduct the estimation in R

Standard peer effects estimations are quite straightforward. This is especially true when you have randomization in the pairing of focal individuals to peers and longitudinal data so you can lag the characteristics of the peer.

score_{i,t+1} = \beta_{0} + \beta_{1} score_{j,t} + \epsilon 

Here is a synthetic peer effects dataset in which 2000 individuals have been randomly paired: peer_effects.csv.

Let us examine the extent to which there are peer effects.

The model we want to estimate is:

postself_{i,t+1} = \beta_{0} + \beta_{1} prepeer_{j,t} + \epsilon

Estimating this equation in R with this data results in:

Screen Shot 2017-05-04 at 3.28.39 PM.png
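The screenshot shows R output. As a rough sketch of the same regression, the Python below simulates its own data rather than reading peer_effects.csv, so the numbers will not match the screenshot. The variable names postself and prepeer follow the equation above; preself, the true coefficient of 0.3, the noise levels, and the pairing procedure are all assumptions made here for illustration.

```python
import random


def ols_simple(x, y):
    """Intercept and slope from regressing y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


rng = random.Random(42)
n = 2000
TRUE_BETA1 = 0.3  # assumed size of the peer effect in the simulated data

# Pre-treatment score for every individual.
preself = [rng.gauss(0, 1) for _ in range(n)]

# Random pairing: shuffle the ids and pair neighbors in the shuffled order.
ids = list(range(n))
rng.shuffle(ids)
peer_of = {}
for k in range(0, n, 2):
    a, b = ids[k], ids[k + 1]
    peer_of[a], peer_of[b] = b, a

# The peer's lagged (pre-treatment) score is the regressor of interest.
prepeer = [preself[peer_of[i]] for i in range(n)]
postself = [0.5 * preself[i] + TRUE_BETA1 * prepeer[i] + rng.gauss(0, 1) for i in range(n)]

b0, b1 = ols_simple(prepeer, postself)
print(f"estimated beta_0 = {b0:.3f}, beta_1 = {b1:.3f}")
```

Because the pairing is random, the peer’s pre-treatment score is unrelated to the focal individual’s own score, which is why the simple regression recovers something close to the assumed peer effect even though own pre-treatment score is left out.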

If the randomization is proper, this coefficient should be stable when we control for the focal individual’s own pretreatment score.

Screen Shot 2017-05-04 at 3.30.22 PM.png

Another worry we have is whether this effect of the peer (captured by the pre-treatment characteristics) is homogeneous or heterogeneous. That is, does it depend on the characteristics of the focal individual or does it apply to everyone? To test this, we include a main effect of the characteristics of the focal individual (self_char) and an interaction term (pre_peer * self_char).

Screen Shot 2017-05-04 at 3.33.01 PM.png

Here, we see that the peer effect depends on the characteristic of the focal individual. If the focal individual has this characteristic (e.g., willingness to listen), the peer effect is larger.

This is only a simple demonstration of the complexity of peer effects. There are likely to be many interactional factors that turn peer effects “on” or “off” or modulate them in some important way. One could imagine the following contingencies, where peer effects depend on characteristics of:
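The interaction model can be sketched in the same spirit. The Python below simulates data in which the peer effect is 0.1 for individuals without the characteristic and 0.5 for those with it, then estimates the model by ordinary least squares. These coefficient values, the binary coding of self_char, and the small hand-written solver are assumptions for illustration; they are not the data or code behind the screenshot.

```python
import random


def ols(columns, y):
    """OLS coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[p] * r[q] for r in rows) for q in range(k)]
        + [sum(r[p] * yi for r, yi in zip(rows, y))]
        for p in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


rng = random.Random(7)
n = 2000
# With random pairing the peer's pre-score is independent of ego's traits,
# so it is simply drawn independently here.
pre_peer = [rng.gauss(0, 1) for _ in range(n)]
self_char = [rng.randint(0, 1) for _ in range(n)]  # 1 = has the characteristic
interaction = [p * c for p, c in zip(pre_peer, self_char)]
post_self = [
    0.1 * p + 0.2 * c + 0.4 * pc + rng.gauss(0, 1)
    for p, c, pc in zip(pre_peer, self_char, interaction)
]

b0, b_peer, b_char, b_inter = ols([pre_peer, self_char, interaction], post_self)
print(f"peer effect without the characteristic: {b_peer:.2f}")
print(f"peer effect with the characteristic:    {b_peer + b_inter:.2f}")
```

The coefficient on pre_peer is the peer effect for individuals without the characteristic, and the interaction coefficient is how much larger (or smaller) the effect is for those with it.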

  • the focal individual
  • the environment
  • the alter/peer
  • personalities of both

 

Entrepreneurial networks

Who is this? Keep this face in mind, at least for a bit.

James Dewey Watson

 

 

Leading in the whitespace

A major breakthrough in our understanding of the social nature of competition came through a series of papers and then a foundational book by Professor Ronald Burt of the University of Chicago, “Structural Holes: The Social Structure of Competition.” While others had made similar arguments before (see Bavelas 1948, and for a fantastic review see Centrality in Social Networks: Conceptual Clarification by Linton Freeman), Burt grounded this idea in theory and provided a very clear framework for other scholars to rethink competition and strategy through this structural lens.

His very powerful argument was that we should think about “structural holes” as “opportunities.”

That is, bridges across these holes in social structure are sources of value for everyone involved: the person who bridges, as well as those being bridged.

The research that followed resulted in a paradigmatic shift in our understanding of how competition within organizations and in markets functions. The early work made a clean and forceful point: the causal agent is not the “strength or weakness” of a tie, but the fact that bridges create value. Focus on the bridge.

This structural argument was supported by two mechanisms of action. These can be described as the control and information benefits of structural holes.  Consider the three archetypical networks depicted below (I’ve adapted this representation from Krackhardt 1999).

Picture1.png

On the left, the focal individual “YOU” is in a structure with very few structural holes. That is, all of his connections are connected to each other. On the far right is the high structural holes condition. In this case, none of the focal individual’s connections are connected to each other. The intermediate network, which we will discuss later, is theorized to have its own special properties.
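A crude way to place an ego network between these extremes is to count how many pairs of the focal individual’s contacts are not tied to each other. The sketch below does exactly that. It is a simple share of missing ties, not Burt’s constraint measure, and the three four-contact networks are made up rather than copied from the figure.

```python
from itertools import combinations


def open_pair_share(alters, alter_ties):
    """Share of pairs of ego's contacts with NO tie between them."""
    ties = {frozenset(t) for t in alter_ties}
    pairs = list(combinations(alters, 2))
    if not pairs:
        return 0.0
    missing = sum(1 for p in pairs if frozenset(p) not in ties)
    return missing / len(pairs)


alters = ["a", "b", "c", "d"]  # hypothetical contacts of the focal individual

closed_ties = list(combinations(alters, 2))  # every contact knows every other
mixed_ties = [("a", "b"), ("c", "d")]        # two separate cliques
open_ties = []                               # no contact knows any other

for label, ties in [("closed", closed_ties), ("mixed", mixed_ties), ("open", open_ties)]:
    print(label, round(open_pair_share(alters, ties), 2))
```

A share of 0 corresponds to the fully closed structure on the left, and a share of 1 to the structure on the right, in which every pair of contacts is a hole that only the focal individual bridges.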

The Control Benefits of Structural Holes

Let us examine the control benefits first. In the first representation, who has control?

Consider the situation in the figure on the left. What happens if you cheat one person in the network? They talk to each other. Your reputation suffers. You lose some of your control. So, who is in control? Not you, but the group. The role that closed networks play in creating trust through control is not uncommon. For instance, small businessmen/women in America and other countries often tend to do business with their co-ethnics.

While preventing cheating is a good thing, a closed structure could also be highly constraining. Small and close-knit groups have strong group norms that can force members to conform in unproductive or harmful ways. Innovation, for example, often requires people to take risks (both social and economic), and closed groups might stymie such risk taking.

At the other end of the spectrum, the focal person’s connections are not connected to each other. This lack of connection implies that they cannot communicate, and as a result, information or gossip cannot travel between these disconnected parties as quickly. The focal individual in this case has more control, because they have the freedom to act without others coordinating against them.

If you are in the third structure, there are two specific control benefits that you have:

  • The first strategy to exploit your control benefits is to be the broker who leverages the position to play off two individuals (perhaps buyers or even sellers) who want the same thing from you. For instance, you can, in subtle ways, make them either lower their demands or increase their willingness to pay.
  • The second strategy based on control is to be a broker between two people (or companies) who have conflicting demands. The broker, in order to get one party to change their demands, can leverage the demands of the other. Furthermore, since these two parties do not interact with each other, the broker has the ability (because of this increased control) to shape the information that one party gets about the other.

These are obviously dangerous strategies – and ones that require a significant amount of finesse and skill.

The Information Benefits of Structural Holes

All is not lost if you can’t pull off the control strategy. Spanning structural holes also provides information benefits. The literature broadly posits three types of information benefits:

  • Access benefits: Access benefits consist of two components. First, because the broker spans structural holes, she connects two groups that do not have a high degree of overlap in their knowledge. Thus, the broker has access to information that is not accessible to those in the separate and spanned social groups. Second, because her diverse connections bring her more diverse information, when she receives valuable information she knows who can use it.
  • Timing benefits: Information can be transmitted over multiple channels. Consider job postings. Before a job is posted in an official manner, people in the department where the job will be know about it. Talking to someone in that department will give you knowledge about the job before everyone else. This subtle difference in timing can mean the difference between getting and not getting a job. Because the broker gets information through informal channels, she often has access to information before others.  Timing matters in many contexts, including venture deals, hiring, knowing a house is on the market, etc.
  • Referrals: Trust matters. Period. People avoid hiring people, buying products, or investing in companies that they have limited information about. Those who span structural holes have contacts in different social worlds with their different opportunities. Contacts with people in these social circles can refer you to their own network, thereby increasing your trustworthiness.   

The Structural Holes in DNA

OK, now that we have the theory down, I want to share an example from real life that exemplifies the beauty of the theory of structural holes.

This is James Watson, one of the co-discoverers of the structure of DNA. This discovery is described by many as one of the most important (if not the most important) scientific discoveries of the 20th century. In his gripping account of this discovery, The Double Helix, he recounts how he and Francis Crick discovered the structure of DNA.

James Dewey Watson

Here are some quotes about the quest for the structure of DNA from the Nobel Prize website:

In the late 1940’s, the members of the scientific community were aware that DNA was most likely the molecule of life, even though many were skeptical since it was so “simple.”

…Nobody had the slightest idea of what the molecule might look like.

In order to solve the elusive structure of DNA, a couple of distinct pieces of information needed to be put together…

As in the solving of other complex problems, the work of many people was needed to establish the full picture.

Picture1.png

Francis Crick, a brilliant scientist, was already at Cambridge before James Watson arrived. Watson describes Crick:

“Before my arrival in Cambridge, Francis only occasionally thought about deoxyribonucleic acid (DNA) and its role in heredity.  This was not because he thought it uninteresting. Quite the contrary.

Francis, nonetheless, was not then prepared to jump into the DNA world…[S]uch a decision would create an awkward personal situation.  At this time molecular work on DNA in England was, for all practical purposes, the personal property of Maurice Wilkins, a bachelor who worked in London at Kings College…It would have looked very bad if Francis had jumped in on a problem that Maurice had worked over for several years. The matter was even worse because the two, almost equal in age, knew each other and, before Francis remarried, had frequently met for lunch or dinner to talk about science.

The combination of England’s coziness – all the important people, if not related by marriage, seemed to know one another – plus the English sense of fair play would not allow Francis to move in on Maurice’s problem.”

Watson, on the other hand, was an outsider. He describes a few episodes that were critical to his discovery of the structure of DNA.

Screen Shot 2017-05-02 at 11.09.10 AM.png

Break #1:

At a conference in the spring of 1951 in Naples, Watson heard Maurice Wilkins’ talk on the molecular structure of DNA.

“I proceeded to forget Maurice, but not his DNA photograph.”

Break #2:

Linus Pauling, a Nobel Prize winner who was himself working on the structure of DNA, had written a manuscript on DNA (as a triple helix), a copy of which would soon be sent to his son, Peter Pauling.

Break #3:

Knowledge of Chargaff’s rules through his doctoral training in Indiana.

Watson had unique access, through his network, to the photos produced by Rosalind Franklin in Wilkins’ lab, the unpublished manuscript prepared by Linus Pauling, and Erwin Chargaff’s rules about the ratio of bases in DNA. Because of his position, he was able to put these pieces together faster than anyone else.

All three processes helped Watson:

  • Access: getting novel information.
  • Timing: getting access to information before it was published.
  • Referrals: through his famous, Nobel Prize-winning advisor, he was able to hop from one great lab in Europe to another and get access to conferences that he would not otherwise have been able to attend.

Luck? No. Social Networks.

Growing your network strategically

Structural holes theory also implies a series of tradeoffs between the size of one’s network and the benefits that the network produces. A large network is not necessarily a good thing. This is because maintaining a network connection implies some cost and results in some benefit.

  • Decreasing returns to network size: If we measure benefits in units of novel information, adding a new tie might entail some cost (time, resources, emotional energy, etc.) but not result in access to much new information (e.g., you hear about the same job opportunities from the new connection that you already heard about from an existing friend or acquaintance). So, at least in terms of information, there are decreasing returns to network size: you pay the additional cost of the new connection, but it provides less information per unit cost than a prior connection did.
  • Constant returns to network size: A more palatable case is constant returns. Here, doubling your network size doubles the amount of information you have access to. Every new network connection provides information in proportion to what the prior network connections provided.
  • Increasing returns to network size: The ideal situation is one where doubling the size of your network more than doubles the information you get. Is this even possible, given that a new network connection that provides more information than before might also be substantially more costly?
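To make the idea of decreasing returns concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each contact can be represented as a set of job leads. The function name marginal_information and the example leads are hypothetical.

```python
def marginal_information(contacts):
    """Return how many previously unheard items each contact adds, in order."""
    seen = set()
    gains = []
    for items in contacts:
        new_items = set(items) - seen
        gains.append(len(new_items))
        seen |= new_items
    return gains


# Hypothetical job leads heard from three contacts, added one at a time.
contacts = [
    {"job_a", "job_b", "job_c"},
    {"job_b", "job_c", "job_d"},  # mostly overlaps with the first contact
    {"job_c", "job_d"},           # fully redundant with the first two
]
print(marginal_information(contacts))  # [3, 1, 0]
```

In this made-up example the first contact brings three new leads, the second only one, and the third none, even though each tie presumably costs about the same to maintain.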

In any case, you clearly want to be at a point before your costs of maintaining a network significantly outweigh any benefits that you get.

Structural holes theory provides some useful guidance on not going too far down the route of decreasing returns to size. A good heuristic for understanding this tradeoff is a calculation developed by Professor Ronald Burt called efficiency. Efficiency can be calculated in the following way:

Efficiency = Effective Size / Actual Size

Expanding this function out, we can define:

Actual size = The number of connections that you have.

Effective size = Actual Size – Sum of percent of overlapping ties for each of your connections.
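Here is a rough sketch of this calculation in Python. It assumes unweighted ties and reads “percent of overlapping ties” as the share of your contacts that each contact is also tied to, which is one common simplification of Burt’s measure. The function names and the two toy networks are hypothetical illustrations.

```python
def effective_size(alter_ties):
    """alter_ties maps each of your contacts to the set of your other contacts it is tied to."""
    n = len(alter_ties)
    if n == 0:
        return 0.0
    # Each contact's overlap is (ties to your other contacts) / n; sum over contacts.
    overlap = sum(len(tied) for tied in alter_ties.values()) / n
    return n - overlap


def efficiency(alter_ties):
    """Effective size divided by actual size."""
    n = len(alter_ties)
    return effective_size(alter_ties) / n if n else 0.0


# Three contacts who all know each other (closed) versus none who do (open).
closed = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
open_ = {"a": set(), "b": set(), "c": set()}

print(round(efficiency(closed), 2))  # 0.33
print(round(efficiency(open_), 2))   # 1.0
```

Under these assumptions, the closed network behaves like a single non-redundant contact, while every contact in the open network counts in full.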

Bandwidth and Diversity

The model above has been tremendously useful and very predictive. In recent years, some scholars have also highlighted another interesting tradeoff between stronger non-bridging ties and weaker bridging ties: the bandwidth/diversity tradeoff.

On one hand, higher-bandwidth ties result in greater informational volume. On the other hand, weaker bridging ties result in greater variance in information.

Recent work suggests this relationship depends fundamentally on the nature of the environment in which people are building their social networks. There are two factors that can reduce the value of bridging ties and privilege high-bandwidth ties:

  1. If the network has a homogeneous set of knowledge, where most people talk about the same things, then having more high-bandwidth ties may be more important.
  2. If the “refresh rate” is high, where people’s contacts and interactions churn very fast, or if the environment is turbulent and the information is extremely complex (meaning that an idea contains multiple topics or subjects), then high-bandwidth ties are better at sustaining the high-variance information you need.

However, studies have found that “strong” bridging ties, which have both bandwidth and diversity, are the best, but they are indeed rare.

Extending the Core Insights from Structural Hole Theory

As one can imagine, structural holes theory was extremely powerful and scholars have been working to extend and refine the predictions of the theory further to account for structures that don’t neatly fit into the standard dichotomy or have dynamic elements.

Consider dynamics: Given how difficult it is to maintain bridging positions, it is likely that bridges are fragile. Research suggests that bridging ties follow what is called a kinked decay function. Initially, bridges have a low likelihood of breaking. This is followed shortly by a sharp rise in decay. If the bridge survives this spike in decay rates, it is likely to persist for a long time.

Two processes often lead to decay:

  • Disintermediation: Disconnected parties learn to exchange on their own.
  • Competition from rival brokers: Rivals enter the fray and, by offering either greater benefits or lower costs, whittle away at the original bridge’s benefits from occupying the hole. Indeed, the hole no longer exists.

Why bridges decay:

  • Low performance: high performers have lower rates of decay for their bridges
  • If other relations are decaying, bridges are also likely to decay
  • Experience bridging improves the chances that new bridges survive
  • “Hole decay” may be limited when:
    • Deep barriers limit interaction across the hole.
    • The benefits to the bridged parties are high enough and switching costs are high.
    • The bridged individuals don’t question the role of the broker, or it is not salient to them.

Beyond Information and Control

There are also cases where brokering is disadvantageous. The underlying mechanism behind the disadvantages of brokering has to do with identity and expectations.

  • In addition to information, networks also convey expectations about who one is (identity) and how one should behave (expectations). Many of us have been caught between two groups that expect different things from us. This happens at work, at home, and even in our social and personal lives with friends. The more disconnected our connections are, the more likely it is that they have different expectations about how we should behave. Podolny and Baron (1997) show that when a person is a broker in a network that conveys “identity,” they are less likely to benefit from their brokerage position than when the network primarily provides “information.”
  • Similarly, Krackhardt, in his Simmelian tie theory, makes a related argument: brokering between two strongly connected groups creates pressure to conform to different norms, which can create internal role conflict and stress and thus reduce performance.

Outcomes as Mean versus Variance

The theories that we have focused on thus far attempt to predict mean or expected outcomes. That is, what is the average difference in wages, promotion rates, bonuses, or ideas between those with and without structural holes? The graph below shows that there is a mean shift. The blue distribution (e.g., the structural holes condition) has a higher mean outcome.

Picture1

However, this analysis can be pushed further by asking: is there a shift in the variance of potential outcomes? Does a specific structure reduce or increase the possible variation in outcomes? Note that the blue distribution below is “tighter” than the black distribution. The black distribution has a greater likelihood of worse, but also better, outcomes than the blue one.

Which would you prefer below?

Picture1

James Lincoln of UC Berkeley did pioneering studies on business networks in Japan and found that companies that were members of a keiretsu, while having lower mean outcomes, also had lower variation. As a consequence, they were less likely to do extremely poorly but also less likely to do extremely well.
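The tradeoff between a lower mean and a lower variance can be sketched with two normal distributions. The means, standard deviations, and cutoffs below are invented for illustration and are not taken from Lincoln’s studies. The point is only that a tighter distribution puts less probability in both tails.

```python
from statistics import NormalDist


def tail_probabilities(dist, floor, ceiling):
    """Return (P(outcome < floor), P(outcome > ceiling)) for a normal distribution."""
    return dist.cdf(floor), 1.0 - dist.cdf(ceiling)


# Hypothetical outcome distributions: the closed network has a slightly
# lower mean but a much smaller spread than the open network.
closed = NormalDist(mu=48, sigma=5)
open_ = NormalDist(mu=50, sigma=15)

closed_low, closed_high = tail_probabilities(closed, floor=30, ceiling=70)
open_low, open_high = tail_probabilities(open_, floor=30, ceiling=70)

print(f"closed: P(<30)={closed_low:.4f}  P(>70)={closed_high:.4f}")
print(f"open:   P(<30)={open_low:.4f}  P(>70)={open_high:.4f}")
```

With these assumed numbers, the closed network is far less likely to land below 30 and also far less likely to land above 70, which is the floor-and-ceiling pattern described here.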

With respect to brokerage, we can also think about floors and ceilings. Networks that are high in closure reduce variation in performance at both the high and the low end.

High performance is dampened because the high performers subsidize the lower performers, and the low performers don’t do as poorly because the high performers help them out.

As one can imagine, the network structures that most facilitate the low-variance strategy are closed networks.

The classic examples of this are ethnic networks, where wealthier members help out the less fortunate ones.

Network Positions and Advantage: Status

 

One of the most important things we do on a day-to-day basis is make predictions about the value of individuals or companies, or really, any entity.  Making such predictions is challenging because we have limited information about the qualities of the entity we are attempting to make predictions about. For instance:

  •  A hiring manager at a firm is trying to make a prediction about whether a certain applicant will be a high performer.
  • A PhD admissions committee makes predictions about whether an applicant to their program will turn into a star researcher.
  • A venture capitalist makes predictions about whether a startup or founding team will create a breakthrough product that will become a billion dollar company.
  • A search engine is making a prediction about whether a certain webpage contains useful information for its users.
  • A consumer makes predictions about the quality of a product before he/she buys it.

Predictions of this type are commonplace and often rather difficult to make. This difficulty exists for two reasons. First, only a limited set of characteristics is observable to the decision maker, whereas much else is unobservable. A hiring manager, for instance, may observe a resume and a list of references. Based on this resume and reference list, she attempts to make an inference about many things: how hard working the applicant is, their base of knowledge, their ability to get along with other members of her team, and so on. Thus, the hiring manager attempts to use “observables” to infer something about the unobservables.

The goal therefore is to map observables (the things that you can easily measure and observe about someone or some organization) to unobservables. What are some examples of unobservables, or things that are difficult to observe?

  • Creativity
  • Whether a person you hire will “fit” with an organization’s culture
  • Whether a company you invest in will turn a profit
  • Trustworthiness

The inability to effectively communicate information about these hard-to-quantify traits from one person to another becomes a problem for the evaluator and, in many cases, for the person being evaluated, particularly if they are high quality but others can’t tell that this is the case.

That is, how does one separate the signal from the noise?

One solution proposed to this problem is signaling theory. People send signals, and these signals contain information that allows “buyers” to ascertain whether the “seller” (e.g., a job market candidate) is of high quality or not. But anyone can send signals, and sometimes the signals are noisy or uninformative. If the signals are no good, then they don’t solve the asymmetric information problem.

Michael Spence argued that some signals are harder to acquire than others, and this difficulty in acquiring the signal is related to some dimension of underlying quality.

For instance, a hiring manager might be looking to hire someone with great machine learning talent. Anyone can put “machine learning” on his/her resume, so merely doing so isn’t likely to be a very good signal of having that skill. However, it is probably easier to win a Kaggle competition if you have good machine learning skills than if you do not. As a result, those with more machine learning skill are more likely to be represented among Kaggle winners than those without that skill. Thus, winning in Kaggle is likely to be a decent signal of ML skill. Further, since winning in Kaggle is easily observable, it is perhaps a decent signal for what we care about.
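One way to see why a hard-to-acquire signal is more informative is a small Bayes’ rule calculation. The sketch below is only an illustration: the base rate of skilled applicants and the probabilities of sending each signal are made-up numbers, and posterior_skilled is a hypothetical helper.

```python
def posterior_skilled(prior, p_signal_if_skilled, p_signal_if_unskilled):
    """Bayes' rule: probability that an applicant is skilled, given the signal was sent."""
    numerator = prior * p_signal_if_skilled
    evidence = numerator + (1.0 - prior) * p_signal_if_unskilled
    return numerator / evidence


prior = 0.2  # assumed share of applicants with strong ML skills

# Cheap signal: everyone can write "machine learning" on a resume.
resume = posterior_skilled(prior, p_signal_if_skilled=1.0, p_signal_if_unskilled=1.0)

# Costly signal: assume winning a competition is ten times likelier for the skilled.
kaggle = posterior_skilled(prior, p_signal_if_skilled=0.10, p_signal_if_unskilled=0.01)

print(round(resume, 2))  # 0.2
print(round(kaggle, 2))  # 0.71
```

Under these assumptions the resume keyword leaves the hiring manager’s belief unchanged at the base rate, while the competition win raises it substantially.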

 

 

Can you think of other signals that contain a lot of information and are difficult to fake?

Joel Podolny in a series of articles proposed that social relations also help signal quality. This is a profound idea, and I will walk through it further. But let us fast forward to another application of Eigenvector centrality: the original Google PageRank algorithm.

For example, social cues such as endorsements, recommendations, funding decisions, or hiring decisions convey information.

Screen Shot 2017-05-02 at 1.58.08 PM.png

Consider James and Betty. Both have two connections of their own. And both of their connections think highly of them and recommend them. In an abstract sense, Betty and James are rated by their raters (i.e., their two connections). But a new problem arises: who has more reliable raters?

This is what we can consider the “rating the raters” problem. While at the first degree (the direct connections of these two individuals) they are indistinguishable, there is substantial variation in their second- and third-degree ties. Although James and Betty have similarly sized networks, Betty’s network connections have far more connections of their own.

Screen Shot 2017-05-02 at 2.06.58 PM.png

While it is relatively easy to figure out the difference in size between Betty’s and James’ second-degree networks, the problem gets more complicated the further out we move. First, real networks usually don’t stop at the 2nd or 3rd degree but extend to the 4th, 5th, 6th, and so on. Second, real networks aren’t usually trees. Networks loop back on themselves over and over again, which makes the “rating the raters” problem hard. So we cannot just re-weight the rating by the ratings received by the rater.
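Counting second-degree ties is the easy part, and a breadth-first search does it while handling loops by never counting a person twice. The sketch below uses a hypothetical network in which James and Betty each have two direct contacts. The names j1, b1, y1, and so on are invented, and the function contacts_by_degree is only an illustration.

```python
from collections import deque


def contacts_by_degree(graph, start, max_degree):
    """Count how many people sit exactly 1, 2, ..., max_degree steps from start."""
    distance = {start: 0}
    counts = [0] * max_degree
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_degree:
            continue  # do not look beyond the requested degree
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:  # skip anyone already reached via a shorter path
                distance[neighbor] = distance[node] + 1
                counts[distance[neighbor] - 1] += 1
                queue.append(neighbor)
    return counts


# Hypothetical ties; y1 is known to both b1 and b2, so the network loops back.
edges = [
    ("James", "j1"), ("James", "j2"), ("j1", "x1"), ("j2", "x2"),
    ("Betty", "b1"), ("Betty", "b2"),
    ("b1", "y1"), ("b1", "y2"), ("b1", "y3"),
    ("b2", "y4"), ("b2", "y5"), ("b2", "y1"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(contacts_by_degree(graph, "James", 2))  # [2, 2]
print(contacts_by_degree(graph, "Betty", 2))  # [2, 5]
```

This only counts people. It does not weight each rater by how well rated that rater is, which is the harder problem.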

There is a concept, called eigenvector centrality, that does exactly what we thought was hard: it rates the raters, the raters’ raters, the raters’ raters’ raters, and so on. This measure gives us a nice summary statistic telling us how much “status” a node in the network has. It is hard to fake because you can perhaps fake your own network ties, but not the ties of your connections’ connections. The nodes below, for instance, are resized by eigenvector centrality.

Screen Shot 2017-05-02 at 2.12.38 PM.png
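A minimal sketch of how eigenvector centrality can be computed is power iteration: start everyone with the same score, repeatedly replace each score with the sum of the neighbors’ scores, and rescale. The network below is hypothetical, and the function is an illustration rather than a reference implementation. It assumes an undirected, connected network.

```python
def eigenvector_centrality(graph, iterations=1000, tol=1e-12):
    """Power iteration: a node's score is proportional to the sum of its neighbors' scores."""
    score = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # Keeping each node's own score (a shift by the identity matrix) avoids
        # oscillation on tree-like graphs and leaves the final ranking unchanged.
        new = {n: score[n] + sum(score[m] for m in graph[n]) for n in graph}
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in graph) < tol
        score = new
        if converged:
            break
    return score


# Hypothetical network: James and Betty each have exactly two direct contacts,
# but Betty's contacts have more connections of their own.
edges = [
    ("James", "j1"), ("James", "j2"),
    ("Betty", "b1"), ("Betty", "b2"),
    ("b1", "y1"), ("b1", "y2"), ("b2", "y3"), ("b2", "y4"),
    ("j1", "y1"),  # links the two neighborhoods into one network
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

scores = eigenvector_centrality(graph)
for name in ("James", "Betty"):
    print(name, round(scores[name], 3))
```

Although both have two direct ties, Betty ends up with the higher score in this toy network because her raters are themselves better connected.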

The problem of determining the “value” or credibility of an object based on its connections and its connections’ connections is a general one. Google’s original algorithm, PageRank, is a measure of sociometric status. The basic intuition of PageRank was that if a site gets a lot of incoming links, and the sites linking to it also do, and so on, then there must be some value to it. The insight arises from viewing the Web as a network and using its structure to determine whether a page is useful or not.
Screen Shot 2017-05-02 at 2.13.41 PM.png
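The intuition behind PageRank can be sketched in a few lines. This is a simplified textbook version, not Google’s production algorithm. The four-page web is invented, and the damping value of 0.85 is a commonly used default that I am assuming here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly over all pages.
                for other in pages:
                    new[other] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical four-page web.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],  # links out, but nobody links to it
}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this toy web, "home" ranks highest because it collects links from every other page, while "orphan" ranks lowest because no page vouches for it.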

Ego and Altercentric Perspectives

Now that we have the basic concept of sociometric status down, we can turn to a “big idea” in sociology that came from Joel Podolny. He suggested that we had focused primarily on seeing networks as “pipes” through which information, resources, support, and other “stuff” flows. However, networks are also useful for individuals in resolving problems of uncertainty, because certain types of network structures also signal trust, reputation, and identity. Network structures are prisms that reveal information as well.

The extent to which networks operate as pipes or as prisms depends on the level of uncertainty faced by market participants. He developed a highly useful framework for thinking about which structure may matter when. There are two types of uncertainty, egocentric and altercentric. Egocentric uncertainty is the uncertainty a producer faces about its own decisions, such as how to develop a product that others will want. Altercentric uncertainty is the uncertainty that a producer’s potential exchange partners, such as consumers, face about the quality of what the producer offers.

 

Picture1.png

Fig. 1.—Illustrative markets arrayed by altercentric and egocentric uncertainty

A market or market segment can rate highly on one type of uncertainty without rating highly on the other.

 

Consider the four markets represented in the figure above. From Podolny (2001):

Vaccines: Begin with the market for a particular vaccine, such as polio or smallpox, in the upper left-hand quadrant. The most salient source of uncertainty in this market is that which underlies the development of the vaccine. Once the vaccine is developed and is given regulatory approval, there is little uncertainty on the part of consumers as to whether they will benefit from the innovation. Accordingly, a market for a vaccine is a market that rates high on egocentric uncertainty, but low on altercentric uncertainty.

 

Roofers: Alternatively, consider the market in the lower right-hand corner, a regional market for roofers. “Roofing technology” is relatively well understood, and while roofers may face some uncertainty as to who needs a roof in any particular year, they can be confident that every homeowner will need repair work or a replacement every 20 years or so. By sending out fliers or advertising in the yellow pages, they can be assured of reaching a constituency with a demand for their service. However, because an individual consumer only infrequently enters the market, the consumer is generally unaware of quality-based distinctions among roofers. The consumer may be able to alleviate some of this uncertainty through consultation with others who have recently had roof repairs; however, the need for such consultation is an illustration of the basic point. Only through such search and consultation can the consumer’s relatively high level of uncertainty be reduced. Accordingly, this is a market that is comparatively low in terms of egocentric uncertainty, but relatively high in terms of altercentric uncertainty.

 

What are some other examples of markets that are low on one type of uncertainty and high on another? What about markets that are high on both?

 

 

How does one deal with altercentric uncertainty?

Let us loop back to our earlier discussion of sociometric status. Why is sociometric status a useful signal to help resolve altercentric uncertainty?

  • Sociometric Status: A position in a social network – defined by the ties that you have to others – where you receive deference from others who are themselves highly respected or deferred to.

 

 

When does status go awry?

However, there are many instances where status does not serve as a perfect signal of quality, and this can lead to misperceptions of status and thus misperceptions of quality. When status is a perfect signal of quality, it is said that there is tight coupling between status and quality. However, as I mentioned, this is often not the case.

 

Matthew Effect / Self-fulfilling prophecy: The classic example of this is the phenomenon of the 41st chair. This is the example of the “French Academy,” where there are only 40 chairs and there is perhaps no substantive difference between #40 and #41, but the 40th person becomes a holder of a chair and the 41st person does not. This results in the 40th person getting more rewards, recognition, etc., which in turn allows them to do better work because they now have significantly more resources than people who do not hold a chair. In sociological parlance, the phenomenon of the 41st chair is called “decoupling.” Here, the linear relationship between quality and status breaks down: the 40th person gains far more status than the 41st.

 

Buy low, sell high: This decoupling is an arbitrage opportunity for managers, because most people use status signals that are imperfect. There are two possible strategies to exploit this gap:

  1. Figure out a more readily observable representation of social signals that maps onto quality more tightly and sell that information.
  2. Figure out a way to measure sociometric status in a situation where it is not currently used. Then use this as a better way of valuation.

Beyond the basics

The study of sociometric (and other) status is an extremely rich area of research in organizational sociology and economic sociology. I have merely scratched the surface of this topic.

Some excellent articles and reviews in this stream include:

Stuart, Toby E., Ha Hoang, and Ralph C. Hybels. “Interorganizational endorsements and the performance of entrepreneurial ventures.” Administrative Science Quarterly 44.2 (1999): 315-349.

Sauder, Michael, Freda Lynn, and Joel M. Podolny. “Status: Insights from organizational sociology.” Annual Review of Sociology 38 (2012): 267-283.

Lynn, Freda B., Joel M. Podolny, and Lin Tao. “A Sociological (De) Construction of the Relationship between Status and Quality.” American Journal of Sociology 115.3 (2009): 755-804.

Chen, Ya-Ru, et al. “Introduction to the special issue: Bringing status to the table—attaining, maintaining, and experiencing status in organizations and markets.” (2012): 299-307.

Phillips, Damon J., and Ezra W. Zuckerman. “Middle-Status Conformity: Theoretical Restatement and Empirical Demonstration in Two Markets.” American Journal of Sociology 107.2 (2001): 379-429.

Network Positions and Advantage: Structural Holes

Who is this? Keep this face in mind, at least for a bit.

James Dewey Watson

In the prior lecture we discussed the simple micro-macro-micro process described in Granovetter (1973), “The Strength of Weak Ties.” Recall what we discussed: the forbidden triad is forbidden because it is unbalanced and therefore generally unstable in equilibrium.

Picture1.png

The forbidden triad is particularly unstable when the ties involved are strong, where tie strength increases as some function of:

  • The amount of time that two people spend together
  • The emotional intensity of the interaction
  • The intimacy between the two parties (i.e., mutual confiding)
  • The reciprocal services which the two parties engage in.

The way to sustain the “bridge structure” implied by the forbidden triad is to weaken one of these conditions. The resulting weak tie can allow for the persistence of “bridges” or “brokerage” across the distinct and differentiated strong-tie clusters that divide the social world.
Picture1
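As a small illustration, the forbidden triad can be detected mechanically: look for a person with strong ties to two others who share no tie at all, strong or weak. The sketch below assumes ties are given as pairs of names. The function name and the example ties are hypothetical.

```python
from itertools import combinations


def forbidden_triads(strong_ties, weak_ties):
    """Return (ego, b, c) triples where ego is strongly tied to b and c, but b and c share no tie."""
    strong = {frozenset(tie) for tie in strong_ties}
    any_tie = strong | {frozenset(tie) for tie in weak_ties}
    strong_neighbors = {}
    for tie in strong:
        a, b = tuple(tie)
        strong_neighbors.setdefault(a, set()).add(b)
        strong_neighbors.setdefault(b, set()).add(a)
    found = []
    for ego, alters in strong_neighbors.items():
        for b, c in combinations(sorted(alters), 2):
            if frozenset((b, c)) not in any_tie:
                found.append((ego, b, c))
    return sorted(found)


strong = [("A", "B"), ("A", "C")]
print(forbidden_triads(strong, weak_ties=[]))            # [('A', 'B', 'C')]
print(forbidden_triads(strong, weak_ties=[("B", "C")]))  # []
```

Once B and C share even a weak tie, the triad is no longer forbidden, which matches the argument that such open triads tend to close over time.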

One key assumption that we make is that different information is being discussed in these different groups. For instance, these different groups could be scientific research communities, regional economic clusters, different departments in the same business school, and so on. We start with the assumption that people in these different groups are doing different things, may have different cultures, and are members of different disciplines. Information within a cluster (e.g., information that persons 1 and 2, who are both in group A, possess) is much more likely to be redundant than information across clusters. Consequently, information in group A and group B is said to be non-redundant. That is, a person from group A is more likely to learn something new by talking to someone in group B than by talking to someone else from group A.

The “big idea” from the Strength of Weak Ties hypothesis is that there are “holes” in the social structure and that weak ties are the conduits that can transmit information across these holes. Thus, more weak ties mean that people have access to more and newer information.

Picture1

The Holes in Social Structure

The crystal-clear mechanisms implied by the weak tie hypothesis can be credited to the imagination of the author, who saw something that others had missed. Yet while the empirical facts of the original paper were consistent with this hypothesis, the measurement did not capture the spanning of holes in the structure per se. The theoretical argument was that weak ties, because of why they exist, should correspond to this structural configuration.

Another major breakthrough came through a series of papers and then a foundational book by Professor Ronald Burt of the University of Chicago, “Structural Holes: The Social Structure of Competition.” While others had made similar arguments before (see Bavelas 1948, and for a fantastic review see “Centrality in Social Networks: Conceptual Clarification” by Linton Freeman), Burt grounded this idea in theory and provided a very clear framework for other scholars to rethink competition and strategy through this structural lens.

His very powerful argument was that we should think about “structural holes” as “opportunities.”

That is, bridges across these holes in social structure are sources of value for everyone involved: the person who bridges, as well as those being bridged.

The research that followed resulted in a paradigmatic shift in our understanding of how competition within organizations and in markets functions. The early work made a clean and forceful point: the causal agent is not the “strength or weakness” of a tie, but the fact that bridges create value. Focus on the bridge.

This structural argument was supported by two mechanisms of action. These can be described as the control and information benefits of structural holes.  Consider the three archetypical networks depicted below (I’ve adapted this representation from Krackhardt 1999).

Picture1.png

On the left, the focal individual “YOU” is in a structure with very few structural holes. That is, all of the focal individual’s connections are connected to each other. On the far right is the high structural holes condition. In this case, none of the focal individual’s connections are connected to each other. The intermediate network, which we will discuss later, is theorized to have its own special properties.
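One simple way to make the two extremes concrete is to list the pairs of your contacts who are not connected to each other, since each such pair is a hole that you span. The sketch below is an illustration with made-up contacts. It does not attempt to reproduce the figure or Burt’s full constraint measure.

```python
from itertools import combinations


def structural_holes(contacts, ties_among_contacts):
    """Return the pairs of ego's contacts that have no tie to each other."""
    ties = {frozenset(tie) for tie in ties_among_contacts}
    return [
        pair
        for pair in combinations(sorted(contacts), 2)
        if frozenset(pair) not in ties
    ]


contacts = ["a", "b", "c", "d"]  # hypothetical contacts of the focal individual

# Closed structure: every pair of contacts is connected.
closed_ties = list(combinations(contacts, 2))
# Open structure: no pair of contacts is connected.
open_ties = []

print(len(structural_holes(contacts, closed_ties)))  # 0
print(len(structural_holes(contacts, open_ties)))    # 6
```

With four contacts there are six possible pairs, so the fully open structure has six holes to bridge and the fully closed one has none.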

The Control Benefits of Structural Holes

Let us examine the control benefits first. In the first representation, who has control?

Consider the situation in the figure on the left. What happens if you cheat one person in the network? They talk to each other. Your reputation suffers. You lose some of your control. So, who is in control? Not you, but the group. The role that closed networks play in creating trust through control is not uncommon. For instance, small businessmen/women in America and other countries often tend to do business with their co-ethnics.

While preventing cheating is a good thing, a closed structure could also be highly constraining. Small and close-knit groups have strong group norms that can force members to conform in unproductive or harmful ways. Innovation, for example, often requires people to take risks, both social and economic, and closed groups might stymie such risk taking.

At the other end of the spectrum, the focal person’s connections are not connected to each other. This lack of connection implies that they cannot communicate, and as a result, information or gossip cannot travel between these disconnected parties as quickly. The focal individual in this case has more control, because they have the freedom to act without others coordinating against them.

If you are in the third structure, there are two specific control benefits that you have:

  • The first strategy to exploit your control benefits is one where you, as the broker, leverage your position to play off two individuals (perhaps buyers or even sellers) who want the same thing from you. For instance, you can, in subtle ways, make them either lower their demands or increase their willingness to pay.
  • The second strategy based on control is to be a broker between two people (or companies) who have conflicting demands. The broker, in order to get one party to change their demands, can leverage the demands of the other. Furthermore, since these two parties do not interact with each other, the broker has the ability (because of this increased control) to shape the information that one party gets about the other.

These are obviously dangerous strategies – and ones that require a significant amount of finesse and skill.

The Information Benefits of Structural Holes

All is not lost if you can’t pull off the control strategy. Spanning structural holes also provides information benefits. The literature broadly posits three types of information benefits:

  • Access benefits: Access benefits consist of two components. First, because the broker spans structural holes, she connects two groups that do not have a high degree of overlap in their knowledge. Thus, the broker has access to information that is not accessible to those in the separate, spanned social groups. Second, because her diverse connections bring her more diverse information, when she receives valuable information she also knows who can use it.
  • Timing benefits: Information can be transmitted over multiple channels. Consider job postings. Before a job is posted in an official manner, people in the department where the job will be located already know about it. Talking to someone in that department will give you knowledge about the job before everyone else. This subtle difference in timing can mean the difference between getting and not getting a job. Because the broker gets information through informal channels, she often has access to information before others. Timing matters in many contexts, including venture deals, hiring, knowing a house is on the market, etc.
  • Referrals: Trust matters. Period. People avoid hiring people, buying products, or investing in companies that they have limited information about. Those who span structural holes have contacts in different social worlds, each with its own opportunities. Contacts in these social circles can refer you to their own networks, thereby increasing your trustworthiness.

The Structural Holes in DNA

OK, now that we have the theory down, I want to share an example from real life that exemplifies the beauty of the theory of structural holes.

This is James Watson, one of the co-discoverers of the structure of DNA. This discovery is described by many as one of the most important scientific discoveries of the 20th century, if not the most important. In his gripping account of this discovery, The Double Helix, he recounts how he and Francis Crick discovered the structure of DNA.

James Dewey Watson
17th October 1962: American biochemist Dr. James Dewey Watson seated in his lab at Harvard University, Massachusetts. He shared the 1962 Nobel Prize in medicine for the discovery of the molecular structure of DNA. (Photo by Hulton Archive/Getty Images)

Here are some quotes about the quest for the structure of DNA from the Nobel Prize website:

In the late 1940’s, the members of the scientific community were aware that DNA was most likely the molecule of life, even though many were skeptical since it was so “simple.”

…Nobody had the slightest idea of what the molecule might look like.

In order to solve the elusive structure of DNA, a couple of distinct pieces of information needed to be put together…

As in the solving of other complex problems, the work of many people was needed to establish the full picture.

Picture1.png

Francis Crick, a brilliant scientist, was already at Cambridge before James Watson arrived. Watson describes Crick:

“Before my arrival in Cambridge, Francis only occasionally thought about deoxyribonucleic acid (DNA) and its role in heredity.  This was not because he thought it uninteresting. Quite the contrary.

Francis, nonetheless, was not then prepared to jump into the DNA world…[S]uch a decision would create an awkward personal situation.  At this time molecular work on DNA in England was, for all practical purposes, the personal property of Maurice Wilkins, a bachelor who worked in London at King’s College…It would have looked very bad if Francis had jumped in on a problem that Maurice had worked over for several years. The matter was even worse because the two, almost equal in age, knew each other and, before Francis remarried, had frequently met for lunch or dinner to talk about science.

The combination of England’s coziness – all the important people, if not related by marriage, seemed to know one another – plus the English sense of fair play would not allow Francis to move in on Maurice’s problem.”

Watson, on the other hand, was an outsider. He describes a few episodes that were critical to his discovery of the structure of DNA.

Screen Shot 2017-05-02 at 11.09.10 AM.png

Break #1:

At a conference in the spring of 1951 in Naples, Watson heard Maurice Wilkins’ talk on the molecular structure of DNA.

“I proceeded to forget Maurice, but not his DNA photograph.”

Break #2:

Linus Pauling, a Nobel Prize winner who was himself working on the structure of DNA, had written a manuscript on DNA (as a triple helix), a copy of which would soon be sent to his son, Peter Pauling.

Break #3:

Knowledge about Chargaff’s rules through his doctoral training in Indiana.

Watson had unique access, through his network, to the photos produced by Rosalind Franklin in Wilkins’ lab, the unpublished manuscript prepared by Linus Pauling, and exposure to Erwin Chargaff’s rules about the ratio of bases in DNA.  Because of his position, he was able to put these pieces together faster than anyone else.

All three processes helped Watson:

  • Access: He obtained novel information.
  • Timing: He got access to information before it was published.
  • Referrals: Through his famous, Nobel Prize-winning advisor, he was able to hop from one great lab in Europe to another and get access to conferences that he would not otherwise have been able to attend.

Luck? No. Social Networks.

Growing your network strategically

Structural holes theory also implies a series of tradeoffs between the size of one’s network and the benefits that the network produces. A large network is not necessarily a good thing. This is because maintaining a network connection implies some cost and results in some benefit.

  • Decreasing returns to network size:  If we measure benefits in units of novel information, one could imagine that adding a new tie might entail some cost (time, resources, emotional energy, etc.) but subsequently not result in access to much more new information (e.g., you hear about the same job opportunities from the new connection that you heard about from your existing friend or acquaintance). So at least in terms of information, there is a decreasing return to network size: you pay the additional cost of the new connection, but it provides less information per unit cost than a prior connection.
  • Constant returns to network size: A more palatable case is constant returns. Here, doubling your network size doubles the amount of information you have access to. Every new network connection provides information in proportion to what the prior network connections provided.
  • Increasing returns to network size: The ideal situation is one where doubling the size of your network more than doubles the information you get. Is this even possible, given that a new network connection that provides more information than before might also be substantially more costly?

In any case, you clearly want to be at a point before your costs of maintaining a network significantly outweigh any benefits that you get.

Structural holes theory provides some useful guidance on not going too far down the route of decreasing returns to size. A good heuristic for understanding this tradeoff is a calculation developed by Professor Ronald Burt called efficiency. Efficiency can be calculated in the following way:

Efficiency = Effective Size / Actual Size

Expanding this function out, we can define:

Actual size = The number of connections that you have.

Effective size = Actual Size – Sum of percent of overlapping ties for each of your connections.
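
To make the efficiency calculation concrete, here is a minimal sketch in R. It assumes a binary, undirected network stored as a symmetric 0/1 adjacency matrix, and it reads the "percent of overlapping ties" for a contact as the share of your contacts that this contact is also tied to. The function name ego_efficiency and the toy matrix are hypothetical, made up for illustration only.

```r
# Hypothetical helper: actual size, effective size and efficiency of one
# person's network, assuming a symmetric 0/1 adjacency matrix.
ego_efficiency <- function(adj, ego) {
  alters <- setdiff(which(adj[ego, ] > 0), ego)   # ego's direct contacts
  n <- length(alters)                              # actual size
  if (n == 0) return(c(actual = 0, effective = 0, efficiency = NA))
  # For each contact: share of ego's contacts that this contact is also tied to
  sub <- adj[alters, alters, drop = FALSE]
  diag(sub) <- 0
  overlap <- rowSums(sub > 0) / n
  effective <- n - sum(overlap)                    # effective size
  c(actual = n, effective = effective, efficiency = effective / n)
}

# Toy network: person 1 has three contacts (2, 3, 4); contacts 2 and 3 know each other.
adj <- matrix(0, nrow = 4, ncol = 4)
adj[1, 2:4] <- 1
adj[2:4, 1] <- 1
adj[2, 3] <- 1
adj[3, 2] <- 1

res <- ego_efficiency(adj, ego = 1)
print(res)
```

In this toy network, two of the three contacts overlap with each other, so the effective size is 3 − 2/3 ≈ 2.33 and the efficiency is roughly 0.78. If none of the contacts knew each other, the efficiency would be 1.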

 

 

Bandwidth and Diversity

The model above has been tremendously useful and very predictive. In recent years, some scholars have also highlighted another interesting tradeoff between stronger non-bridging ties and weaker bridging ties: the bandwidth/diversity tradeoff.

On one hand, higher-bandwidth ties result in greater informational volume. On the other hand, weaker bridging ties result in greater variance in information.

Recent work suggests this relationship depends fundamentally on the nature of the environment in which people are building their social networks. There are two factors that can reduce the value of bridging ties and privilege high-bandwidth ties:

  1. If the network has a homogeneous set of knowledge, where most people talk about the same things, then having more high-bandwidth ties may be more important.
  2. If the “refresh rate” is high (people’s contacts and interactions churn very fast), or if the environment is turbulent and the information is extremely complex (meaning that an idea contains multiple topics or subjects), then high-bandwidth ties are better at sustaining the high-variance information you need.

However, what studies have found is that “strong” bridging ties that have both bandwidth and diversity are the best, but they are indeed rare.

Extending the Core Insights from Structural Hole Theory

As one can imagine, structural holes theory was extremely powerful and scholars have been working to extend and refine the predictions of the theory further to account for structures that don’t neatly fit into the standard dichotomy or have dynamic elements.

Consider dynamics: Given how difficult it is to maintain bridging positions, it is likely that bridges are fragile. Research suggests that bridging ties follow what is called a kinked decay function. Initially, bridges have a low likelihood of breaking; this is followed shortly by a sharp rise in decay. If the bridge survives this spike in decay rates, it is likely to persist for a long time.

Two processes often lead to decay:

  • Disintermediation: Disconnected parties learn to exchange on their own.
  • Competition from rival brokers: Rivals enter the fray and by offering either greater benefits or lower cost, whittle away at the original bridge’s benefits from occupying the hole. Indeed, the hole no longer exists.

Why bridges decay:

  • Low performance: high performers have lower rates of decay for their bridges
  • If other relations are decaying, bridges are also likely to decay
  • Experience bridging improves the chances that new bridges survive
  • “Hole decay” may be limited when:
    • Deep barriers limit interaction across the hole.
    • The benefits to the bridged parties are high enough and switching costs are high.
    • The bridged individuals don’t question the role of the broker, or it is not salient to them.

Beyond Information and Control

There are also cases where brokering is disadvantageous. The underlying mechanism behind the disadvantages of brokering has to do with identity and expectations.

  • In addition to information, networks also convey expectations about who one is (identity) and how one should behave (expectations). Many of us have been caught between two groups that expect different things from us.  This happens at work, at home, and even in our social and personal lives with friends. The more disconnected our connections are, the more likely it is that they have different expectations about how we should behave. Podolny and Baron (1997) show that when a person is a broker in a network that conveys “identity,” they are less likely to benefit from their brokerage position than when the network primarily provides “information.”
  • Similarly, Krackhardt, in his Simmelian tie theory, makes a related argument: brokering between two strongly connected groups creates pressure to conform to different norms, which can create internal role conflict and stress, and thus reduce performance.

Outcomes as Mean versus Variance

The theories that we have focused on thus far attempt to predict mean or expected outcomes. That is, what is the average difference in wages, promotion rates, bonuses, or ideas between those with and without structural holes? The graph below shows that there is a mean shift. The blue distribution (e.g., structural holes condition) has a higher mean outcome.

Picture1

However, this analysis can be pushed further by asking: is there a shift in the variance of potential outcomes? Does a specific structure reduce or increase the possible variation in outcomes? Note that the blue distribution below is “tighter” than the black distribution. The black distribution has a greater likelihood of worse, but also better, outcomes than the blue one.

Which would you prefer below?

Picture1
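
To put rough numbers on the “tight” versus “wide” comparison, here is a small sketch in R. It assumes, purely for illustration, that both outcome distributions are normal with the same mean and that the wide one has twice the standard deviation of the tight one; the cutoffs of -2 and +2 for “extreme” outcomes are also arbitrary choices, not values from any study.

```r
# Illustrative assumptions: same mean, different spread.
mu <- 0
sd_tight <- 1   # "tighter" distribution
sd_wide <- 2    # distribution with more variance

# Probability of an extremely bad outcome (below -2)
p_bad_tight <- pnorm(-2, mean = mu, sd = sd_tight)
p_bad_wide  <- pnorm(-2, mean = mu, sd = sd_wide)

# Probability of an extremely good outcome (above +2)
p_good_tight <- 1 - pnorm(2, mean = mu, sd = sd_tight)
p_good_wide  <- 1 - pnorm(2, mean = mu, sd = sd_wide)

round(c(bad_tight = p_bad_tight, bad_wide = p_bad_wide,
        good_tight = p_good_tight, good_wide = p_good_wide), 3)
```

Under these assumptions the wide distribution puts more probability in both tails: it has a lower floor and a higher ceiling, which is the tradeoff the question above asks you to weigh.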

James Lincoln of UC Berkeley did pioneering studies on business networks in Japan and found that companies that were members of a keiretsu, while having lower mean outcomes, also had lower variation and, as a consequence, were less likely to do either extremely poorly or extremely well.

With respect to brokerage, we can also think about floors and ceilings. Networks that are high in closure reduce variation in performance at both the high and the low end.

High performance is dampened because the high performers subsidize the low performers, and the low performers don’t do as poorly because the high performers help them out.

As one can imagine, the network structures that most facilitate this low-variance strategy are closed networks.

The classic examples of this are ethnic networks, where the wealthier members help out the less fortunate ones.

Rankings of Social Science Research Productivity of Indian Universities, Colleges and Institutes

In 2011, I created a ranking of Indian universities based on productivity in the social sciences. Here is the original ranking. Look out for the new ranking forthcoming in the next few months.

Below you will find rankings of Indian universities and institutes based on productivity in social science research. The universities and institutes are ranked in four categories: (1) sociology, demography and family studies, (2) economics, (3) psychology, and (4) business and management. The rankings presented here are based on a limited set of variables, namely the number of peer-reviewed journal articles produced by an institution and the number of citations these articles received.

The data used for the rankings are derived from Thomson’s ISI Web of Knowledge. The raw data included any article published in one of 3015 social science journals indexed by ISI (including Indian journals as well as international journals) by an author affiliated with a university or institute located in India between 2000 and 2010. These data were subset to include only those institutions that had more than 20 publications in any social science category during this ten-year time frame; the final sample consists of 61 universities and institutes. Finally, the institutions were ranked according to their research productivity (citations and publications) in the four categories mentioned above.

Omissions and Caveats

There is no doubt that these rankings are limited in important ways. Most significantly, the measures I use do not incorporate research output in the form of books, book chapters, journals not indexed by ISI, peer-reviewed conferences, as well as other academic writing such as case studies. Excluding these output venues reduces the value of the rankings in some respects, but also makes comparing universities straightforward and systematic. Moreover, peer-reviewed journal publications and citation counts are universally accepted measures of academic productivity and are clearly valuable and informative.

The rankings also do not incorporate other information—such as academic placements of doctoral students, quality of instruction, facilities, and peer quality—that might be useful for prospective graduate students.  It is advised that you learn more about the universities and institutes you are considering before making any choices.

Reason for the rankings

The rankings were created for my own use; I wanted to find out which Indian universities were producing the most (and, if possible, the most interesting) social science research.   The rankings are therefore my attempt at making sense of the social science ecosystem in India and not an attempt at producing a definitive and thorough ranking.


Measures:

Int. Collab (International Collaboration):  The proportion of articles co-authored with international collaborators (i.e., co-authors of the focal Indian author located outside of India).  For example, in the Sociology, Demography and Family Studies category, nine percent of all articles published by Jawaharlal Nehru University were co-authored with scholars affiliated with institutions located outside of India, whereas sixty-seven percent of all articles from the Indian Institute of Management – Bangalore were co-authored with international collaborators. This measure is not incorporated into the ranking; it is provided only for informational purposes.

Avg. Cite (Average citations):  This measure is the sum of the citations received by the articles produced by an institution divided by the total number of articles. This measure attempts to quantify the impact of the research published by a given institution.

NOTE: This measure does not take into account the amount of research produced by an institution; thus, two universities may have similar scores on Avg. Cite, but vastly different levels of productivity (e.g. one produced only 5 articles and another produced 100). The Cite Adj. Pubs measure attempts to incorporate both the number of articles produced and the impact of the articles.

Pubs (Number of Publications): The total number of publications in the journals belonging to the category (e.g. Economics) during the period 2000 to 2010 authored or co-authored by someone affiliated with the institution.  For instance, scholars from the University of Hyderabad published forty-nine articles in the journals indexed by ISI’s Social Science Index in the Sociology, Demography and Family Studies categories between 2000 and 2010.

Cite Adj. Pubs (Citation Adjusted Publications): This measure is a relatively straightforward combination of the Avg. Cite and Pubs measures.  Every article i is given a score equal to a(i) = 1 + ln(1+citations(i)). I then add up all the article scores a(i) for each institution, resulting in an institution-specific Cite Adj. Pubs score. If none of the articles published by a university has received a citation, the Cite Adj. Pubs score is equal to the number of publications. As the number of citations increases, this score increases at a decreasing rate.
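
As a worked illustration of the three measures, here is a short sketch in R. The citation counts are invented for a hypothetical institution with four articles; only the formula a(i) = 1 + ln(1 + citations(i)) comes from the definition above.

```r
# Hypothetical citation counts for the four articles of one institution
citations <- c(0, 0, 3, 10)

# Article score: a(i) = 1 + ln(1 + citations(i))
article_scores <- 1 + log(1 + citations)

pubs <- length(citations)              # Pubs
avg_cite <- mean(citations)            # Avg. Cite
cite_adj_pubs <- sum(article_scores)   # Cite Adj. Pubs

# With no citations at all, the score collapses to the number of publications
uncited_score <- sum(1 + log(1 + c(0, 0, 0)))

c(pubs = pubs, avg_cite = avg_cite, cite_adj_pubs = cite_adj_pubs)
```

Because of the logarithm, the article with ten citations adds about 3.4 to the score rather than 11, so a single highly cited paper cannot dominate an institution's total.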

The Rankings

Where do Indian Scholars Publish?

I have also provided a list of journals in which India-based scholars have published more than five articles in the past ten years.

Collaboration Network



THE RANKINGS

Sociology, Demography and Family Studies
Rank Name Int. Collab Avg. Cite Pubs Cite adj. Pubs
1 Delhi Univ – All Others 0.09 0.43 69 80.50
2 Jawaharlal Nehru Univ 0.04 1.08 51 71.15
3 Inst Econ Growth – Delhi University 0.02 0.39 49 56.78
4 Univ Hyderabad 0.03 1.87 38 42.28
5 Delhi Sch Econ – Delhi University 0.09 4.05 22 37.32
6 Int Inst Populat Sci – Mumbai 0.19 4.00 16 34.81
7 Ctr Dev Studies – Trivandrum 0.05 1.48 21 31.27
8 Populat Council 0.56 5.11 9 22.23
9 Punjab Univ-Chandigarh 0.00 0.16 19 20.79
10 Tata Inst Social Sci – Mumbai 0.13 0.33 15 17.48
11 Ctr Studies Social Sci – Calcutta 0.17 1.00 12 14.56
12 Indian Stat Inst – Calcutta 0.10 0.70 10 14.16
13 Madras Inst Dev Studies 0.20 1.20 10 13.69
14 Indian Inst Technol – Bombay 0.08 0.00 13 13.00
15 Karnatak Univ 0.80 6.00 5 12.17
16 Inst Social & Econ Change – Bangalore 0.25 1.00 8 12.09
17 Indian Stat Inst – Delhi 0.50 9.25 4 11.85
18 Ctr Womens Dev Studies – West Midnapore 0.00 0.56 9 11.77
19 Indian Inst Technol – Delhi 0.67 3.33 6 11.70
20 Univ Pune 0.00 0.00 11 11.00
21 Indian Inst Technol – Kanpur 0.00 1.17 6 9.47
22 Indian Inst Technol – Kharagpur 0.00 5.25 4 9.46
23 Indian Inst Management – Bangalore 0.67 18.33 3 9.15
24 Univ Allahabad 0.75 2.50 4 8.28
25 Goa Univ 0.00 0.00 8 8.00
26 Banaras Hindu Univ 0.17 0.67 6 7.61
27 Indian Sch Business – Hyderabad 1.00 6.33 3 7.50
28 Univ Calcutta 0.00 1.20 5 6.95
29 Indian Inst Management – Calcutta 0.00 11.00 3 6.53
30 Reserve Bank India 0.33 3.67 3 6.40
31 Natl Inst Adv Studies – Bangalore 0.00 0.00 6 6.00
32 Indian Inst Technol – Guwahati 0.00 0.00 6 6.00
33 Univ Mumbai 0.00 0.20 5 5.69
34 Natl Council Appl Econ Res – New Delhi 0.67 2.67 3 5.20
35 Jadavpur Univ 0.00 1.33 3 5.08
36 Jamia Millia Islamia 0.00 0.00 5 5.00
Economics Research

Rank Name Int. Collab Avg. Cite Pubs Cite adj. Pubs
1 Indian Stat Inst – Delhi 0.544 8.000 57 130.85
2 Indian Stat Inst – Calcutta 0.277 6.851 47 98.41
3 Jawaharlal Nehru Univ 0.263 2.053 57 97.75
4 Delhi Sch Econ – Delhi University 0.382 3.559 34 65.44
5 Indira Gandhi Inst Dev Res – Mumbai 0.412 3.235 34 60.71
6 Ctr Studies Social Sci – Calcutta 0.704 4.667 27 58.13
7 Delhi Univ – All Others 0.429 1.657 35 52.43
8 Inst Econ Growth – Delhi University 0.320 4.480 25 50.46
9 Indian Inst Technol – Delhi 0.333 15.400 15 42.48
10 Univ Calcutta 0.250 2.700 20 39.01
11 Reserve Bank India 0.111 2.389 18 31.04
12 Indian Inst Management – Bangalore 0.471 1.235 17 26.17
13 Jadavpur Univ 0.385 1.692 13 22.40
14 Indian Inst Technol – Kanpur 0.500 10.375 8 21.96
15 Indian Inst Management – Ahmedabad 0.500 12.600 10 20.75
16 Madras Sch Econ 0.556 3.778 9 19.89
17 Indian Inst Technol – Kharagpur 0.571 8.143 7 18.18
18 Inst Social & Econ Change – Bangalore 0.273 3.545 11 17.67
19 Punjab Univ-Chandigarh 0.273 2.909 11 16.85
20 Ctr Dev Studies – Trivandrum 0.222 3.111 9 16.23
21 Natl Council Appl Econ Res – New Delhi 0.667 1.444 9 15.76
22 Indian Sch Business – Hyderabad 1.000 5.500 6 14.93
23 Natl Inst Sci Technol & Dev Studies – New Delhi 0.000 2.714 7 14.20
24 Tata Inst Social Sci – Mumbai 0.143 3.429 7 13.98
25 Univ Hyderabad 0.667 6.000 6 13.61
26 Indian Inst Management – Calcutta 1.000 5.000 5 12.78
27 Indian Inst Sci – Bangalore 0.800 5.800 5 12.77
28 Madras Inst Dev Studies 0.222 0.778 9 11.64
29 Visva Bharati Univ – Santiniketan 0.200 2.000 5 9.56
30 Indian Inst Management – Lucknow 0.250 4.500 4 9.19
31 Utkal Univ 0.333 7.333 3 8.97
32 Univ Mumbai 0.125 0.125 8 8.69
33 Indian Inst Technol – Madras 0.250 4.750 4 8.64
34 Indian Stat Inst – Bangalore 0.500 2.250 4 7.87
35 Indian Inst Informat Technol – Various 0.200 0.800 5 7.20
36 Vidyasagar Univ – Midnapore 0.000 4.333 3 6.78
37 Natl Inst Technol – Various 0.667 5.333 3 6.47
38 Populat Council 0.500 16.500 2 6.19
39 Univ Allahabad 0.333 2.000 3 5.48

 

Psychology

Rank Name Int. Collab Avg. Cite Pubs Cite adj. Pubs
1 Delhi Univ – All Others 0.29 3.60 45 74.30
2 Univ Allahabad 0.26 0.97 35 49.88
3 Indian Inst Technol – Kharagpur 0.29 3.48 21 40.12
4 Banaras Hindu Univ 0.47 3.35 17 32.05
5 Indian Inst Technol – Delhi 0.21 11.36 14 30.48
6 Indian Inst Management – Ahmedabad 0.36 8.36 14 28.70
7 Indian Inst Technol – Kanpur 0.12 3.47 17 27.36
8 Indian Sch Business – Hyderabad 0.83 22.00 6 22.91
9 Univ Calcutta 0.00 1.80 10 16.58
10 Indian Stat Inst – Calcutta 0.43 11.29 7 16.40
11 Aligarh Muslim Univ 0.63 5.13 8 16.12
12 Indian Inst Management – Calcutta 0.50 28.50 4 13.77
13 Jawaharlal Nehru Univ 0.09 0.91 11 13.40
14 Punjab Univ-Chandigarh 0.08 0.25 12 13.39
15 Indian Inst Management – Bangalore 0.67 5.50 6 13.38
16 Populat Council 0.80 6.20 5 13.31
17 Indian Inst Technol – Roorkee 0.00 16.80 5 12.79
18 Cent Inst Psychiat 0.20 3.00 5 11.58
19 Univ Pune 0.00 0.10 10 10.69
20 Univ Mysore 0.57 1.14 7 10.04
21 Indian Inst Technol – Bombay 0.22 0.11 9 9.69
22 Indian Inst Sci – Bangalore 0.25 4.75 4 9.12
23 Indian Inst Technol – Madras 0.60 2.80 5 8.66
24 Ctr Dev Studies – Trivandrum 0.25 3.25 4 8.38
25 Tata Inst Fundamental Res – Mumbai 0.75 3.00 4 8.28
26 Jadavpur Univ 0.40 1.20 5 8.00
27 Karnatak Univ 1.00 12.00 3 7.91
28 Int Inst Populat Sci – Mumbai 0.50 2.00 4 7.69
29 Utkal Univ 0.00 0.00 7 7.00
30 Univ Hyderabad 0.20 0.60 5 6.79
31 Jamia Millia Islamia 0.00 0.40 5 6.10
32 Natl Inst Sci Technol & Dev Studies – New Delhi 0.33 3.00 3 5.30
33 Indian Stat Inst – Delhi 0.50 11.00 2 5.14
34 Indian Inst Informat Technol – Various 0.00 0.00 5 5.00

Business and Management Research

Rank Name Int. Collab Avg. Cite Pubs Cite adj. Pubs
1 Indian Inst Management – Bangalore 0.53 6.82 45 98.78
2 Indian Inst Technol – Delhi 0.25 16.21 24 72.60
3 Indian Inst Management – Calcutta 0.46 5.75 28 67.70
4 Indian Sch Business – Hyderabad 0.88 4.63 24 52.98
5 Management Dev Inst – Gurgaon 0.33 3.00 21 38.58
6 Indian Inst Management – Ahmedabad 0.37 2.53 19 32.33
7 Delhi Univ – All Others 0.06 2.00 16 27.61
8 Indian Inst Technol – Madras 0.25 6.42 12 26.64
9 Indira Gandhi Inst Dev Res – Mumbai 0.20 8.60 10 25.67
10 Indian Inst Sci – Bangalore 0.50 4.36 14 24.30
11 Jawaharlal Nehru Univ 0.33 1.40 15 23.95
12 Indian Inst Technol – Kanpur 0.50 11.00 8 23.78
13 Indian Inst Technol – Bombay 0.00 2.08 12 22.21
14 Indian Inst Technol – Kharagpur 0.20 6.60 10 20.89
15 Natl Inst Sci Technol & Dev Studies – New Delhi 0.43 5.43 7 17.18
16 Indian Inst Management – Lucknow 0.10 1.30 10 16.58
17 Inst Econ Growth – Delhi University 0.29 5.86 7 16.44
18 Indian Stat Inst – Delhi 0.33 3.67 6 13.86
19 Indian Inst Technol – Roorkee 0.00 8.20 5 12.97
20 Govt India 0.50 3.00 6 11.19
21 Ctr Studies Social Sci – Calcutta 0.33 0.83 6 9.18
22 Jadavpur Univ 0.40 2.40 5 9.09
23 Indian Stat Inst – Calcutta 0.14 0.43 7 8.79
24 Punjab Univ-Chandigarh 0.25 4.25 4 7.87
25 Natl Council Appl Econ Res – New Delhi 0.50 1.50 4 7.58
26 Natl Inst Technol – Various 0.00 0.33 6 7.10
27 Indian Stat Inst – Bangalore 0.00 2.25 4 6.89
28 Univ Calcutta 0.25 1.00 4 6.48
29 Cent Inst Psychiat 0.00 2.33 3 6.18
30 Banaras Hindu Univ 0.00 2.33 3 6.00
31 Tata Inst Fundamental Res – Mumbai 0.67 1.67 3 5.89
32 Delhi Sch Econ – Delhi University 0.33 4.67 3 5.71
33 Tata Inst Social Sci – Mumbai 0.50 18.50 2 5.64

THE JOURNALS

Sociology, Demography and Family Studies

Journal No. of Pubs
Contributions To Indian Sociology 413
Journal Of Biosocial Science 33
Culture Health & Sexuality 23
Social Indicators Research 22
International Sociology 17
Journal Of Comparative Family Studies 11
Journal Of Family Planning And Reproduc 9
Studies In Family Planning 9
Population Studies-A Journal Of Demogra 8
Journal Of Medical Ethics 7
Men And Masculinities 7
Agriculture And Human Values 6
Human Ecology 6

Economics

Journal No. of Pubs
Ecological Economics 50
Value In Health 47
World Development 43
Journal Of Development Studies 40
Futures 35
Applied Economics Letters 33
Journal Of Development Economics 27
Journal Of Policy Modeling 27
Economics Letters 21
Applied Economics 20
Agricultural Economics 19
Kyklos 19
Economic Modelling 18
Environmental & Resource Economics 13
Singapore Economic Review 13
Economic Theory 12
International Review Of Economics & Fin 12
Journal Of Economic Behavior & Organiza 12
Journal Of Economic Policy Reform 12
Social Choice And Welfare 12
Review Of Development Economics 11
American Journal Of Agricultural Econom 10
Cambridge Journal Of Economics 10
Developing Economies 10
Economic Development And Cultural Chang 10
Energy Economics 10
Japanese Economic Review 10
Journal Of The Asia Pacific Economy 10
Journal Of World Trade 10
Hitotsubashi Journal Of Economics 9
Journal Of International Trade & Econom 9
Food Policy 8
Games And Economic Behavior 8
International Journal Of Industrial Org 8
International Labour Review 8
Journal Of Economic Theory 8
Journal Of Economics 8
Manchester School 8
Pacific Economic Review 8
Emerging Markets Finance And Trade 7
Japan And The World Economy 7
Oxford Economic Papers-New Series 7
European Economic Review 6
Feminist Economics 6
Journal Of Agrarian Change 6
Journal Of Economic Dynamics & Control 6

Psychology

Journal No. of Pubs
International Journal Of Psychology 248
Psycho-Oncology 34
Aids Care-Psychological And Socio-Medic 32
Perceptual And Motor Skills 16
Physiology & Behavior 13
Applied Psychophysiology And Biofeedbac 10
Brain And Cognition 10
Journal Of Cross-Cultural Psychology 10
Child Care Health And Development 9
Ergonomics 9
Human Resource Management 9
Perception 9
Applied Ergonomics 8
Culture & Psychology 7
Psychological Reports 7
Asian Journal Of Social Psychology 6
Environment And Behavior 6
International Journal Of Behavioral Med 6
Journal Of Social Psychology 6
Studia Psychologica 6

Business and Management

Journal No. of Pubs
Total Quality Management & Business Exc 42
Journal Of The Operational Research Soc 34
Asian Case Research Journal 25
International Journal Of Technology Man 23
Omega-International Journal Of Manageme 23
Journal Of Business Ethics 17
Management Decision 17
Supply Chain Management-An Internationa 17
Technological Forecasting And Social Ch 17
Harvard Business Review 16
Interfaces 13
International Journal Of Human Resource 12
International Review Of Economics & Fin 12
Journal Of Knowledge Management 12
Technovation 11
Research Policy 10
Human Resource Management 9
African Journal Of Business Management 8
International Journal Of Operations & P 8
International Labour Review 8
Systems Research And Behavioral Science 8
Disaster Prevention And Management 7
Emerging Markets Finance And Trade 7
Journal Of International Business Studi 7
Asian Business & Management 6
Corporate Social Responsibility And Env 6
Information Technology & Management 6
International Transactions In Operation 6
Journal Of Futures Markets 6
Marketing Science 6

Network Analysis in R: Getting Started

In some respects, the history of network analysis cannot be separated from the tools used to conduct network analysis. Software has been important to the enterprise of network analysis since the very beginning of the field. Scholars have written and made available software programs to allow others to collect data and conduct analysis themselves.  For instance, you can find some description of a software program called CONCOR in White et al. (1976) that finds roles in an informal social network. Other great technologies such as UCINET, KrackPlot and a host of other social network analysis software allowed network approaches to spread rapidly through the field. My hypothesis is that without these technologies and their ease of use (UCINET, I think, was a game changer for the field), network analysis might still be in the backwaters.

Today, there are lots of options for the researcher who wants to do network analysis. I myself use two primary tools that fit well into my workflow (I use an Apple Mac and I do a lot of non-network analysis as well). Those tools are the R statistical programming language, together with the sna package developed by Professor Carter Butts of UC Irvine, and Stata. While some of my posts (and the accompanying analysis) will use Stata, I will focus primarily on the use of R for network analysis.

Getting started with R for Social Network Analysis

Let us begin by downloading and installing the R programming language. Begin by navigating to the R-Project. I will do the walkthrough for the Mac version of R.

Screen Shot 2017-04-28 at 9.09.05 AM.png

After navigating there, click on the CRAN link under Download. The closest server to me is probably at UC Berkeley, but pick whichever one is closest to you.

Screen Shot 2017-04-28 at 9.10.56 AM.png

Next, download and install the version of R for your operating system. I will click Download R for (Mac) OS X, and then click on the most recent version (which, at the writing of this post, is R-3.4.0). Download and install. I won’t walk you through this.

Screen Shot 2017-04-28 at 9.14.27 AM.png

Now that R is installed, let’s open it up and get some basic network analysis going. Once the R console is open, click on File (in the top menu) and then click on New Document. This should open a blank script file. Type a comment (a line that begins with #). I’ve typed:

# This file provides some simple code to get you started on your Network Analysis Journey

Save the file (I’ve called it RSNApractice.R). Clicking on the file name will give you access to the complete file.

Screen Shot 2017-04-28 at 9.16.38 AM.png

Now that we have that sorted out, let us begin by installing some important packages. You can type this code directly into the console.

install.packages("data.table")
install.packages("curl")
install.packages("sna")

The data.table package allows us to import data from the web; the curl package is what data.table’s fread function uses to download files over HTTPS. The sna package provides the network analysis functions. Once these packages are installed, let’s get them loaded.

library(data.table)
library(curl)
library(sna)

Now that these are installed, let me tell you a little about the data that we are going to analyze. These data come from a professional services consulting firm on the east coast of the United States and were collected some time in the early 2000s. There are 247 people at the firm, and each of them responded to a network survey where they answered 6 questions. Here are the questions:

#(Q0) “who do you know or know of at [the firm]”,

#(Q1) “who you would approach for help or advice on work related issues”,

#(Q2) “who might typically come to you for help or advice on work related issues”,

#(Q3) who you go to “about more than just how to do your work well. For example, you may be interested in ‘how things work’ around here, or how to optimize your chances for a successful career here”,

#(Q4) “who might typically come to you for help or advice along these [non-task related] dimensions” and finally

#(Q5) “who you think of as friends here at [firm].”

I’ve uploaded their responses to a dropbox folder in the form of matrices. The rows of the matrix indicate “senders” or “Ego” and the columns represent “receivers” or “Alters.”
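
As a toy illustration of this layout (hypothetical data for three people, not the firm's responses), the sketch below builds a small matrix by hand. The object name toy_know is made up for this example.

```r
# Toy 3-person "knowing" network (hypothetical data).
# Rows are senders (ego); columns are receivers (alters).
toy_know <- matrix(c(0, 1, 0,
                     0, 0, 1,
                     1, 1, 0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("1", "2", "3"), c("1", "2", "3")))

toy_know["1", "2"]  # 1: person 1 says they know person 2
toy_know["2", "1"]  # 0: person 2 does not say they know person 1
```

Note that the matrix need not be symmetric: a tie from person 1 to person 2 does not imply a tie from person 2 to person 1.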

We can load the data using the following code:

#Load the “Professionals” network data from Dropbox.

q0 <- fread("https://www.dropbox.com/s/xsk5t5nhsmp8614/q0_res.csv?dl=1")
q1 <- fread("https://www.dropbox.com/s/aplyb7h947993ca/q1_res.csv?dl=1")
q2 <- fread("https://www.dropbox.com/s/qrwr6j5mjz57kbr/q2_res.csv?dl=1")
q3 <- fread("https://www.dropbox.com/s/wlw8w34cjlxvs3y/q3_res.csv?dl=1")
q4 <- fread("https://www.dropbox.com/s/o82cg1mcjx0u09u/q4_res.csv?dl=1")
q5 <- fread("https://www.dropbox.com/s/x86r63ewbh2ol6p/q5_res.csv?dl=1")

#Convert the data.table objects into matrix format so they can be
#analyzed using the sna package.

q0 = as.matrix(q0)
q1 = as.matrix(q1)
q2 = as.matrix(q2)
q3 = as.matrix(q3)
q4 = as.matrix(q4)
q5 = as.matrix(q5)

# Create a vector of numbers from 1-247 and convert them to a string.
# We will use these to rename our rows and columns.

names = as.character(1:247)

# Rename all the rows

rownames(q0) = names
rownames(q1) = names
rownames(q2) = names
rownames(q3) = names
rownames(q4) = names
rownames(q5) = names

# Rename all the columns

colnames(q0) = names
colnames(q1) = names
colnames(q2) = names
colnames(q3) = names
colnames(q4) = names
colnames(q5) = names

This code should load all of the network data into the R console.

Now, let’s import some attributes.

# Imports the attributes file and outcomes file, and converts it into a data frame.

attr attr

Now that these are all loaded, let’s see how the data look. Type the following to look at the first ten rows and columns of q0.

# Let's look at the first ten rows/columns of q0

q0[1:10,1:10]


How do we interpret this? Person 1 doesn’t appear to know persons 2-10. However, person 2 says they know persons 5, 7, and 10.
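Rather than reading ties off the printed matrix, you can ask R for them directly. This is a small sketch on a made-up four-person matrix (`toy` is hypothetical); the same indexing should work on q0 once its rows and columns are named.

```r
# Hypothetical 4-person network in which person 2 names persons 3 and 4.
ids <- as.character(1:4)
toy <- matrix(0, 4, 4, dimnames = list(ids, ids))
toy["2", c("3", "4")] <- 1

# Whom does person 2 say they know? Read across row "2".
known_by_2 <- names(which(toy["2", ] == 1))
known_by_2

# Who says they know person 3? Read down column "3".
who_knows_3 <- names(which(toy[, "3"] == 1))
who_knows_3
```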

Let’s plot this as a graph.

# Plot the first 10 people in the q0 matrix.

gplot(q0[1:10,1:10])


Let us now plot the full q0 network. This is the “knowing” network of this firm of 247 people.

# Plot the full “knowing” network

gplot(q0)


Quite dense. A lot of people know a lot of other people at the firm. Try repeating this analysis for q1 through q5. What are the differences and similarities?
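“Dense” can be made precise: density is the share of possible directed ties that are actually present. Here is a base-R sketch on a hypothetical four-person matrix (`toy` and `toy_density` are illustrative names). The sna package also offers gden() for this, but the arithmetic is simple enough to do by hand.

```r
# Hypothetical 4-person directed network (not the firm's data).
ids <- as.character(1:4)
toy <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 1,
                0, 0, 0, 1,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(ids, ids))

n <- nrow(toy)

# Ignore self-ties, then divide observed ties by the n * (n - 1) possible ones.
diag(toy) <- 0
toy_density <- sum(toy) / (n * (n - 1))
toy_density
```

Computing the same quantity for q0 through q5 gives a single number per network, which makes the comparison across questions easier than eyeballing the plots.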

Let’s do some simple centrality calculations (more on centrality in the Representing Networks post).

# Calculate two simple centrality measures on the q0 network.
# Indegree is the number of people who say they know a focal person (in arrows on a node)
# Outdegree is the number of people who a focal person says they know (out arrows from a node)

q0.indegree = degree(q0, cmode = "indegree")
q0.outdegree = degree(q0, cmode = "outdegree")

The centrality measures are now saved in the objects q0.indegree and q0.outdegree. Let’s plot histograms of these two measures.
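For a binary matrix with senders in rows and receivers in columns, these two measures are just column sums and row sums. The sketch below checks that on a hypothetical four-person matrix using base R only; `toy`, `toy_indegree`, and `toy_outdegree` are illustrative names.

```r
# Hypothetical 4-person directed network (not the firm's data).
ids <- as.character(1:4)
toy <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 1,
                0, 0, 0, 1,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(ids, ids))
diag(toy) <- 0  # ignore self-ties

# Outdegree: how many people each person names (sum across the row).
toy_outdegree <- rowSums(toy)

# Indegree: how many people name each person (sum down the column).
toy_indegree <- colSums(toy)

toy_outdegree
toy_indegree

# Every tie has one sender and one receiver, so the totals must match.
sum(toy_outdegree) == sum(toy_indegree)
```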

# Plot histograms of q0.indegree and q0.outdegree

hist(q0.indegree)
hist(q0.outdegree)

These look very nicely distributed, almost Poisson. Let’s calculate some summary statistics on these measures.

# Summary statistics on the indegree/outdegree measures

summary(q0.indegree)
summary(q0.outdegree)

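One rough way to check the “almost Poisson” impression is to compare the mean and variance of a degree measure, since the two are equal for a Poisson distribution. The sketch below uses simulated counts, not the firm's data; the rate of 20 is an arbitrary assumption, and `toy_degree` is a hypothetical stand-in for q0.indegree or q0.outdegree.

```r
# Simulated degree counts for 247 people (an assumption for illustration only).
set.seed(1)
toy_degree <- rpois(247, lambda = 20)

# For Poisson-like counts, the variance is close to the mean,
# so this ratio should be close to 1.
dispersion <- var(toy_degree) / mean(toy_degree)
dispersion
```

A ratio well above 1 would suggest the degrees are more spread out than a Poisson distribution allows, which is common in real networks where a few people are known by almost everyone.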

Now, let’s do one final thing before we conclude this post (you can keep analyzing on your own; I will delve deeper into centrality measures and the like in a different post). I have also given you an outcomes file with three outcomes.

Here are the outcome variables:

relationships: whether the respondent feels their relationships at the firm are fulfilling
success: whether the respondent feels that they have the knowledge to succeed at the firm
appreciate: whether they feel appreciated

Here is a description of the attribute variables:

tenure: tenure at this firm
title: whether the employee is an analyst, lateral hire, or partner
location: what office they work in
gender: male or female
ethnicity: 91% are white
age: age of employee
elite: whether the employee graduated from an elite university
feeder: whether the employee graduated from a “feeder” university
work1-work24: types of work the employee does

Let’s conduct one final analysis. Let’s see if there is a correlation between how many people an employee knows and whether they feel like they have the knowledge to succeed.

# Examine if there is a correlation between how many people someone knows and whether they feel like they have the knowledge to succeed.

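# Regress success on indegree, the same two variables plotted further down.
m.0 <- lm(attr$success ~ q0.indegree)
summary(m.0)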


Looks like there is at least a bivariate correlation. Let’s plot it.

# Plot the regression and the data points.

plot(q0.indegree,attr$success)
abline(m.0)


Now that you have most of the data, you can explore it yourself. The full code is in RSNApractice.R.