In some respects, the history of network analysis cannot be separated from the tools used to conduct it. Software has been central to the enterprise since the very beginning of the field: scholars have written and shared programs that allow others to collect data and conduct analysis themselves. For instance, you can find some description of a program called CONCOR, which finds roles in an informal social network, in White et al. (1976). Other great technologies such as UCINET, KrackPlot, and a host of other social network analysis software allowed network approaches to spread rapidly through the field. My hypothesis is that without these technologies and their ease of use (UCINET, I think, was a game changer for the field), network analysis might still be in the backwaters.
Today, there are lots of options for the researcher who wants to do network analysis. I use two primary tools that fit well into my workflow (I use an Apple Mac, and I do a lot of non-network analysis as well): the R statistical programming language with the sna package, developed by Professor Carter Butts of UC Irvine, and Stata. While some of my posts (and the accompanying analysis) will use Stata, I will focus primarily on the use of R for network analysis.
Getting started with R for Social Network Analysis
Let us begin by downloading and installing the R programming language. Navigate to the R-Project website. I will do the walkthrough for the Mac version of R.
Next, download and install the version of R for your operating system. I will click Download R for (Mac) OS X, and then click on the most recent version (which, at the time of writing, is R-3.4.0). Download and install it. I won’t walk you through this.
Now that R is installed, let’s open it up and get some basic network analysis going. Once the R console is open, click on File (in the top menu) and then click on New Document. This should open a blank script file. Type a comment (a line that begins with #). I’ve typed:
# This file provides some simple code to get you started on your Network Analysis Journey
Save the file (I’ve called it RSNApractice.R). Clicking on the file name will give you access to the complete file.
Now that we have that sorted out, let us begin by installing some important packages. You can type this code directly into the console.
The data.table package allows us to import data from the web; curl is a package that data.table needs in order to download files over https; and sna is the package we will use for the network analysis itself. Once these packages are installed, let’s get them loaded.
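The install and load commands are not reproduced in the text here, so what follows is only a minimal sketch. It assumes the three packages are available on CRAN under the names data.table, curl, and sna, and that the cloud mirror URL is acceptable; the original commands may have differed.

```r
# Packages named in this post. Installing them from CRAN is an assumption.
pkgs <- c("data.table", "curl", "sna")

# Install only the packages that are not already available on this machine.
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Load the packages into the current session.
library(data.table)
library(curl)
library(sna)
```

Checking with requireNamespace() first means re-running the script does not reinstall anything that is already there.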
Now that these are installed, let me tell you a little about the data that we are going to analyze. The data come from a professional services consulting firm on the East Coast of the United States and were collected sometime in the early 2000s. There are 247 people at the firm, and each of them responded to a network survey in which they answered 6 questions. Here are the questions:
#(Q0) “who do you know or know of at [the firm]”,
#(Q1) “who you would approach for help or advice on work related issues”,
#(Q2) “who might typically come to you for help or advice on work related issues”,
#(Q3) who you go to “about more than just how to do your work well. For example, you may be interested in ‘how things work’ around here, or how to optimize your chances for a successful career here”,
#(Q4) “who might typically come to you for help or advice along these [non-task related] dimensions” and finally
#(Q5) “who you think of as friends here at [firm].”
I’ve uploaded their responses to a Dropbox folder in the form of matrices. The rows of each matrix indicate “senders” or “Ego” and the columns represent “receivers” or “Alters.”
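To make the row/column convention concrete, here is a small sketch with a made-up three-person network. The people A, B, and C and their ties are invented for illustration and are not taken from the firm's survey.

```r
# Toy 3-person network (made-up data, NOT from the survey).
# Rows are senders (ego); columns are receivers (alter).
people <- c("A", "B", "C")
toy <- matrix(
  c(0, 1, 0,
    0, 0, 1,
    1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(ego = people, alter = people)
)

# Row "A", column "B" is 1: A (ego) names B (alter).
toy["A", "B"]

# Row "B", column "A" is 0: B does not name A, so the tie is not reciprocated.
toy["B", "A"]
```

Reading along a row tells you whom a person named; reading down a column tells you who named that person.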
We can load the data using the following code:
# Load the "Professionals" network data from Dropbox.
q0 <- fread("https://www.dropbox.com/s/xsk5t5nhsmp8614/q0_res.csv?dl=1")
q1 <- fread("https://www.dropbox.com/s/aplyb7h947993ca/q1_res.csv?dl=1")
q2 <- fread("https://www.dropbox.com/s/qrwr6j5mjz57kbr/q2_res.csv?dl=1")
q3 <- fread("https://www.dropbox.com/s/wlw8w34cjlxvs3y/q3_res.csv?dl=1")
q4 <- fread("https://www.dropbox.com/s/o82cg1mcjx0u09u/q4_res.csv?dl=1")
q5 <- fread("https://www.dropbox.com/s/x86r63ewbh2ol6p/q5_res.csv?dl=1")
#Convert the data.table objects into matrix format so they can be
#analyzed using the sna package.
q0 = as.matrix(q0)
q1 = as.matrix(q1)
q2 = as.matrix(q2)
q3 = as.matrix(q3)
q4 = as.matrix(q4)
q5 = as.matrix(q5)
# Create a vector of numbers from 1-247 and convert them to a string.
# We will use these to rename our rows and columns.
names = as.character(1:247)
# Rename all the rows
rownames(q0) = names
rownames(q1) = names
rownames(q2) = names
rownames(q3) = names
rownames(q4) = names
rownames(q5) = names
# Rename all the columns
colnames(q0) = names
colnames(q1) = names
colnames(q2) = names
colnames(q3) = names
colnames(q4) = names
colnames(q5) = names
This code should load all of the network data into the R console.
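The twelve rename lines can also be written as a single loop over a list of matrices. This is only a sketch of that alternative: the list name mats, the helper label_matrix, and the tiny 3 x 3 stand-in matrices are hypothetical and used so the snippet runs on its own.

```r
# Stand-in matrices so the snippet is self-contained (NOT the survey data).
mats <- list(q0 = matrix(0, 3, 3), q1 = matrix(0, 3, 3))

# Hypothetical helper: label rows and columns with "1", "2", ..., "n".
label_matrix <- function(m) {
  ids <- as.character(seq_len(nrow(m)))
  dimnames(m) <- list(ids, ids)
  m
}

# Apply the same labelling to every matrix in the list.
mats <- lapply(mats, label_matrix)
rownames(mats$q1)
```

With the real data you would build the list as list(q0 = q0, q1 = q1, ...) and the labels would run from 1 to 247.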
Now, let’s import some attributes.
# Imports the attributes file and outcomes file, and converts it into a data frame.
Now that these are all loaded, let’s see how the data look. Type the following to look at the first ten rows and columns of q0.
# Let's look at the first ten rows/columns of q0
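The command itself is not shown in the text; plain matrix indexing is one way to do it. The sketch below uses a random stand-in matrix called q0_demo (an assumption, so that it runs without the download) rather than the real q0.

```r
# Stand-in for q0 so this snippet runs on its own (random ties, NOT the survey data).
set.seed(1)
ids <- as.character(1:20)
q0_demo <- matrix(rbinom(20 * 20, 1, 0.3), nrow = 20, dimnames = list(ids, ids))
diag(q0_demo) <- 0  # no self-ties

# Rows 1 to 10 and columns 1 to 10 of the matrix.
first_ten <- q0_demo[1:10, 1:10]
first_ten
```

With the survey data loaded, the equivalent call is q0[1:10, 1:10], and it is that output, not the random stand-in, that the interpretation refers to.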
How do we interpret this? Person 1 doesn’t appear to know persons 2-10. However, person 2 says they know persons 5, 7, and 10.
Let’s plot this as a graph.
# Plot the first 10 people in the q0 matrix.
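The plotting call is not reproduced in the text. A minimal sketch, assuming the sna package's gplot() function is what draws the graph, and again using a random stand-in matrix (q0_demo) instead of the real data:

```r
library(sna)  # provides gplot(); assumed to be installed

# Stand-in for q0 so the snippet runs on its own (random ties, NOT the survey data).
set.seed(1)
ids <- as.character(1:20)
q0_demo <- matrix(rbinom(20 * 20, 1, 0.3), nrow = 20, dimnames = list(ids, ids))
diag(q0_demo) <- 0

# Keep only the first ten people and draw the directed graph with node labels.
sub10 <- q0_demo[1:10, 1:10]
gplot(sub10, gmode = "digraph", label = rownames(sub10), displaylabels = TRUE)
```

With the real data the same call would take q0[1:10, 1:10]. The layout is computed fresh each time, so node positions will vary between runs.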
Let us now plot the full q0 network. This is the “knowing” network of this firm of 247.
# Plot the full “knowing” network
Quite dense. A lot of people know a lot of other people at the firm. Try to do this analysis for q1 to q5. What are the differences/similarities?
Let’s do some simple centrality calculations (more on centrality in the Representing Networks post).
# Calculate two simple centrality calculations on the q0 network.
# Indegree is the number of people who say they know a focal person (in arrows on a node)
# Outdegree is the number of people who a focal person says they know (out arrows from a node)
q0.indegree = degree(q0, cmode = "indegree")
q0.outdegree = degree(q0, cmode = "outdegree")
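For a binary matrix with rows as senders and columns as receivers, the same two quantities can be cross-checked in base R: indegree is a column sum and outdegree is a row sum. The toy matrix below is made up for illustration and has a zero diagonal (no self-ties).

```r
# Toy directed network (made-up data): rows send ties, columns receive them.
people <- c("A", "B", "C")
toy <- matrix(
  c(0, 1, 1,
    0, 0, 1,
    1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(people, people)
)

# Indegree: how many others name each person (column sums).
toy_indegree <- colSums(toy)   # A = 1, B = 1, C = 2

# Outdegree: how many others each person names (row sums).
toy_outdegree <- rowSums(toy)  # A = 2, B = 1, C = 1

# Every tie is sent once and received once, so the totals must match.
sum(toy_indegree) == sum(toy_outdegree)
```

This is a handy sanity check on any degree calculation: if the indegree and outdegree totals differ, something went wrong with the matrix orientation or the diagonal.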
The centrality measures are now saved in the objects q0.indegree and q0.outdegree. Let’s plot histograms of these two measures.
# Plot histograms of q0.indegree and q0.outdegree
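The histogram commands are not shown in the text; base R's hist() is one straightforward way to draw them. The sketch below uses simulated counts (indeg_demo and outdeg_demo, both assumptions) in place of q0.indegree and q0.outdegree so that it runs on its own.

```r
# Simulated stand-ins for q0.indegree and q0.outdegree (NOT the survey data).
set.seed(2)
n <- 247
indeg_demo <- rpois(n, lambda = 40)
outdeg_demo <- rpois(n, lambda = 40)

# Draw the two histograms side by side, then restore the plotting layout.
op <- par(mfrow = c(1, 2))
h_in <- hist(indeg_demo, main = "Indegree", xlab = "Number of incoming ties")
h_out <- hist(outdeg_demo, main = "Outdegree", xlab = "Number of outgoing ties")
par(op)
```

With the real data, replace the two simulated vectors with q0.indegree and q0.outdegree.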
These look very nicely distributed, almost Poisson. Let’s calculate some summary statistics on these measures.
# Summary statistics on the indegree/outdegree measures
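The summary commands are likewise missing from the text. A sketch using base R's summary(), again on simulated stand-in vectors rather than the real measures, with one extra comparison that speaks to the "almost Poisson" remark:

```r
# Simulated stand-ins for q0.indegree and q0.outdegree (NOT the survey data).
set.seed(3)
indeg_demo <- rpois(247, lambda = 40)
outdeg_demo <- rpois(247, lambda = 40)

# Minimum, quartiles, median, mean, and maximum of each measure.
s_in <- summary(indeg_demo)
s_out <- summary(outdeg_demo)
s_in
s_out

# For a Poisson-distributed count the mean and variance are equal,
# so comparing them is a quick informal check of that shape.
c(mean = mean(indeg_demo), variance = var(indeg_demo))
```

If the variance of the real indegree is much larger than its mean, the distribution is more spread out than a Poisson would suggest.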
Now, let’s do one final thing before we conclude this post (you can keep analyzing; I will delve deeper into centrality measures and the like in a different post). I have also given you an outcomes file with three outcomes.
Here are the outcome variables:
relationships: whether the respondent feels their relationships at the firm are fulfilling
success: whether the respondent feels that they have the knowledge to succeed at the firm
appreciate: whether they feel appreciated
Here is a description of the attribute variables:
tenure: tenure at this firm
title: whether the employee is an analyst, lateral hire, or partner
location: what office they work in
gender: male or female
ethnicity: 91% are white
age: age of employee
elite: whether the employee graduated from an elite university
feeder: whether the employee graduated from a “feeder” university
work1-work24: types of work the employee does
Let’s conduct one final analysis. Let’s see if there is a correlation between how many people an employee knows and whether they feel like they have the knowledge to succeed.
# Examine if there is a correlation between how many people someone knows and whether they feel like they have the knowledge to succeed.
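The analysis code is not shown in the text. One plausible way to examine a bivariate relationship is a correlation test followed by a simple linear regression. Everything in the sketch below is simulated: outdeg_demo stands in for the outdegree measure and success_demo for the success outcome, and treating the outcome as numeric is an assumption about how it was coded.

```r
# Simulated stand-ins (NOT the survey data); the outcome is assumed to be numeric.
set.seed(42)
n <- 100
outdeg_demo <- rpois(n, lambda = 40)
success_demo <- 2 + 0.03 * outdeg_demo + rnorm(n)

# Pearson correlation with a significance test.
ct <- cor.test(outdeg_demo, success_demo)
ct$estimate
ct$p.value

# The same relationship as a bivariate linear regression.
fit <- lm(success_demo ~ outdeg_demo)
summary(fit)
```

With the real data, the two stand-ins would be replaced by q0.outdegree and the success column of the outcomes data frame.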
Looks like there is at least a bivariate correlation. Let’s plot it.
# Plot the regression and the data points.
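The plotting code is not reproduced in the text. A sketch in base R, using the same kind of simulated stand-ins as before (outdeg_demo and success_demo are assumptions, not the survey variables): draw the scatterplot, then overlay the fitted regression line.

```r
# Simulated stand-ins (NOT the survey data).
set.seed(42)
outdeg_demo <- rpois(100, lambda = 40)
success_demo <- 2 + 0.03 * outdeg_demo + rnorm(100)

# Fit the bivariate regression.
fit <- lm(success_demo ~ outdeg_demo)

# Scatterplot of the data points with the regression line on top.
plot(outdeg_demo, success_demo,
     xlab = "Outdegree (number of people known)",
     ylab = "Knowledge to succeed",
     main = "Outdegree and perceived success")
abline(fit, col = "red", lwd = 2)
```

If the real outcome takes only a few distinct values, wrapping it in jitter() inside plot() can make overlapping points easier to see.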
Now that you have most of the data, you can explore it yourself. Here is the full code @ RSNApractice.R