Andreas Weigend | Social Data Revolution | Fall 2014
School of Information | University of California at Berkeley | INFO 290A-03


At its heart, the Social Data Revolution is a communication revolution that is transforming our economies and societies. Complex networks are formed through the interactions between individuals. Almost all their interactions with each other and the world leave digital traces. By studying human behavior on digital networks like Facebook, we acquire insights and techniques to understand and to influence people.
If you were Amazon, Facebook or a dating app, what experiments would you want to conduct? How do ideas propagate in a network? How are identity, reputation and trust created online? How has the notion of who we are, our identity, changed as social technology drives the transition from conspicuous consumption to ubiquitous communication? And in what ways has this changed our ideas and ideals of friendship and relationships?
Simon Zhang will join us and discuss some of the data, algorithms and products of the data refinery LinkedIn.

What is a Graph?

A graph is a mathematical structure that represents objects and the links between them.

external image 6n-graf.svg


Vertices/Nodes: mathematical abstractions of the objects represented in the graph, usually drawn as circles.
Edges/Connections: the links that connect vertices, typically drawn as lines or curves between vertices.


Directed vs Undirected: the connections in a graph can have directionality, indicating that a relationship works FROM x TO y, but not necessarily the other way around.
Weighted vs Non-Weighted: edges of the graph can have values associated with them. These “weights” can encode, for example, the strength, cost, or frequency of a relationship.
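
The definitions above can be made concrete with a small sketch. This is only one of many possible representations (an adjacency map), and the class name `Graph`, its methods, and the example names are all assumptions made for illustration, not anything prescribed by the course.

```python
class Graph:
    """Adjacency-map graph: vertex -> {neighbor: weight}."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adj = {}

    def add_vertex(self, v):
        # Register a vertex even if it has no edges yet.
        self.adj.setdefault(v, {})

    def add_edge(self, u, v, weight=1.0):
        self.add_vertex(u)
        self.add_vertex(v)
        self.adj[u][v] = weight
        # An undirected edge is stored in both directions.
        if not self.directed:
            self.adj[v][u] = weight

    def has_edge(self, u, v):
        return v in self.adj.get(u, {})

    def weight(self, u, v):
        return self.adj[u][v]


# Directed, unweighted: a fan follows a celebrity, but not vice versa.
follows = Graph(directed=True)
follows.add_edge("fan", "celebrity")

# Undirected, weighted: the weight might stand for messages exchanged.
friends = Graph(directed=False)
friends.add_edge("alice", "bob", weight=5.0)
```

Here `follows.has_edge("celebrity", "fan")` is false while `friends.has_edge("bob", "alice")` is true, which is exactly the directed/undirected distinction.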


The World Wide Web is itself a graph of internet resources, together with protocols for interacting with that graph.
Networks of machines also rely on graph theory, which is particularly useful for scenarios like the Internet of Things, or even wireless printers.

The Social Graph

Historical Perspective

Examples in history
  • Phone/call tree - Created a forced social graph to propagate a message; simulated virality
  • Party lines - Established a social graph with phones as nodes, and communication between them as edges; not an accurate representation of the organic social graph
  • LAN Parties - Use of a network graph for social purposes
  • Xanga - Early social media blogging site, which created a unidirectional social graph
  • IRC - Creates a central hub for social interaction; not a stellar use of the social graph, but can be represented with one
  • Telegrams - Created a clear geographic social graph through the transfer of messages across long distances; note, however, the additions to the graph: the human actors who interpret, transfer, and deliver these messages.
  • Circuit switched telephones - Establishes a social graph that is loosely represented by the wires on the telephone circuit
  • Organizational charts - Shows a clear social graph of members of a company, student group, or other organization of people. Helps show interactions as they relate to the organization, but doesn’t necessarily represent the true social interactions within the organizations (e.g., the HR person who knows everyone)
  • Pony Express - Viewing the mapped routes of the Pony Express would show a geographical representation of the social graph.
  • Crowdsourcing - Sometimes considered a social network, but this is disputed (see link).
  • What other examples of representations and models come to your mind?

Ways of Looking at the Social Graph

Observing the social graph

From a global, omniscient view, there is one web of connectedness that is purpose-agnostic. It simply shows how all people are connected. This kind of graph is excellent for analysis.

The potentiality of the social graph

The social graph has an enormous number of potential connections - every person on the planet could potentially have a connection to every other person on the planet. This graph doesn't necessarily exist yet, but each of those connections is on the cusp of existing. This is particularly salient in some social apps like Fling, a message bombing application.

Clustering in the social graph

One of the most important features of the social graph is that it can actually be looked at. Analysis of clustering is a handy way for outsiders to look at the social graph. Clustering can be hierarchical.
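
One common way to quantify clustering around a single person is the local clustering coefficient: the fraction of pairs of that person's connections who are also connected to each other. The sketch below is a minimal illustration under that assumption. It is not a hierarchical clustering algorithm, and the function name `local_clustering` and the toy graph are made up for the example.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of node's neighbors that are themselves linked.

    adj maps each person to the set of people they are connected to
    (undirected, so every link appears in both directions).
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pairs of neighbors to compare
    linked_pairs = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    possible_pairs = k * (k - 1) / 2
    return linked_pairs / possible_pairs


# Toy undirected graph: ana, ben and cat form a triangle; dan only knows ana.
adj = {
    "ana": {"ben", "cat", "dan"},
    "ben": {"ana", "cat"},
    "cat": {"ana", "ben"},
    "dan": {"ana"},
}

for person in sorted(adj):
    print(person, round(local_clustering(adj, person), 2))
```

In this toy graph ben sits inside a tight cluster (both of his connections know each other), while ana bridges the cluster and an outsider, so her coefficient is lower.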

Experiencing and creating the social graph

This view of the graph is more personal or purpose-specific. Some examples of this experiential graph are:

Professional life
A person might want to keep track of their former coworkers, or build new connections to launch a new career path.

external image movie_seating.png

Social circles
A person might want to separate their friends into groups based on real life social groups or based on interests. We generate content and tailor interactions differently when communicating with coworkers, people who frequent shows in a particular genre, family, closest friends, or people we may want to date.

Some people on the social graph have negative interactions, e.g. on social media like Twitter or sites like Reddit. Examples include subtweeting (indirectly mentioning somebody in a tweet) and trolling (aggravating and baiting members of the social graph into arguments).

Celebrity and Fans
While celebrity life has always been transparent, the social graph allows fans to feel more connected to stars. Fans can clearly establish their commitment to a star through acts like following, liking, and reposting the star’s content, creating a unidirectional relationship on the social graph. This is also seen in the establishment of “fandoms” (a play on the idea of a fan and the suffix ‘dom’), which create social graphs/communities around even obscure or niche media, like British television in American society. Fandoms are especially prevalent on Tumblr, for example “superwholock”, a portmanteau of Supernatural, Doctor Who, and Sherlock, three popular shows. The social graph also allows content creators to look into the fanart and desires of their audience, and modify content accordingly. The openness of the social graph can actually change story lines - for example, Sterek from MTV's Teen Wolf became canon.

Example platforms

(imperfect, partial – gives us really nice partial representations)
**LinkedIn** models a social graph that represents a person’s professional connections. LinkedIn serves as an excellent way for people to model the professional relationships that they cultivate, whether those are developed with friends, recruiters, or coworkers. Furthermore, the LinkedIn graph draws more distinct connections between people, noting what aspects of their skills, resumes and, perhaps most notably, their social networks, match up.

**Facebook** prioritizes mutual friendship and acquaintanceship. Facebook highlights friend-to-friend connections and creates an atmosphere which, per its mission, values openness and connectedness. The social graph of Facebook doesn’t leverage a single aspect of a person’s social circle as a core asset, though. While it might represent a person’s social life well, it doesn’t necessarily model the individual spheres of one’s life as accurately; ascertaining things like professional life or a set of elementary school friends would require analysis of the connection, rather than being clear in the connection itself. Note that some relationships can be unidirectional; people with pages have followers, usually due to some degree of celebrity or influence.

**WeChat** leverages a person’s social graph in order to facilitate communication with it. It draws on the social graph found in a person’s contact book, and even attempts to establish new connections with nearby people. WeChat is particularly handy for showing that a single node has a unidirectional connection to other people. Jameson Hsu of WeChat came to speak in the Social Data Revolution course, and his video can provide more information.

Uses & Users of This New Social Graph

Automated connections

Platforms apply algorithms to the social graph to make recommendations for nodes within it; based on information about a single node, one can make certain assumptions about other connections that node might be able to make.
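
As a rough illustration of such a recommendation, the sketch below ranks people a user is not yet connected to by the number of mutual connections ("friends of friends"). This is a deliberately simple heuristic chosen for the example; it is not a description of how LinkedIn, Facebook, or any other platform actually computes suggestions, and the function name `recommend` and the toy data are hypothetical.

```python
from collections import Counter


def recommend(adj, person, top_n=3):
    """Suggest new connections for person, ranked by mutual-connection count.

    adj maps each person to the set of people they are connected to.
    Returns a list of (candidate, mutual_count) tuples.
    """
    counts = Counter()
    for friend in adj[person]:
        for candidate in adj[friend]:
            # Skip the person themselves and anyone already connected.
            if candidate != person and candidate not in adj[person]:
                counts[candidate] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


adj = {
    "ana": {"ben", "cat"},
    "ben": {"ana", "dan", "eve"},
    "cat": {"ana", "dan"},
    "dan": {"ben", "cat"},
    "eve": {"ben"},
}

print(recommend(adj, "ana"))  # [('dan', 2), ('eve', 1)]
```

dan is suggested first because he shares two connections with ana (ben and cat), while eve shares only one.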

Gives us a representation

The social graph previously relied on vague physical models, like a Rolodex, an address book, or perhaps even a business card holder. While these objects aren’t obsolete (consider WeChat or iPhone contacts), social networking platforms provide richer, dynamic information about the connections in a person’s social graph. People are able to quickly add, remove, and update connections that provide a surprising amount of information (e.g., coworker on LinkedIn, family member on Facebook, close friends on Facebook, groups on Twitter).

Ease of establishment and maintenance (easy to follow up once you meet)

The social graph of @socialdata allows people to opt into a broader system; when a person joins LinkedIn, they become a part of a graph of over 250 million users. Thus, users can easily create a digital representation of their connection and access information about a person’s entire work history, for example, through that connection. Furthermore, users can take advantage of the network to communicate and maintain the connection with one-on-one notes, or by posting “statuses” that are available to people within their network and create a sense of presence.

Allows different groups to leverage/build products in ways that were never available

Understanding a demographic used to require extensive testing in imperfect lab conditions. The social graph creates an opportunity to glean insights from sources like the Twitter firehose, and developers can use these conversations to generate visualizations and products that integrate the social graph into their fundamental workings (Facebook/Twitter login). Additionally, following the conversation can provide insights for innovation.

external image heatmap.png

Demographics in advertising and marketing

Having representations of the social graph makes it much easier for advertisers to identify “tastemakers” or opinion leaders. Furthermore, we can identify a person’s interests, and the interests of those in their social network in order to make better decisions about our target demographic. Demographics no longer rely solely on metrics like age, ethnicity or gender. Instead, marketers might look for qualitative, but easily accessible aspects of a person. For example, Home Depot might be interested in a person’s predisposition for DIY projects.

Healthcare applications

More data means more accurate modeling and prediction!

The CDC has tracked flu outbreaks using Twitter posts (analyzing data on the social graph).

The CDC also uses the social graph to push information out to people, allowing followers to quickly transfer this information to their networks.

The Internet allows people to more easily build social networks consisting of others with similar conditions/diagnoses (or support groups for those with interests in certain medical diagnoses, as in the case of a parent of a child with a certain condition).

Virality of things

Track why something is popular/how it became popular (trends)

Researchers can conduct analysis of the conversation to open the way for better understanding of human behavior, especially regarding media consumption. The social graph can be used to easily track something like the ALS Ice Bucket Challenge which propagated across the social graph, but might have otherwise been slowed or never even conceived of.
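
To make "tracking how something became popular" concrete, the sketch below reconstructs a sharing cascade from a list of reshare events and reports how many hops each person was from the original poster. The event format `(source, resharer)`, the function name `cascade_depths`, and the names in the toy data are assumptions for illustration; real platform data would look different.

```python
from collections import Counter, deque


def cascade_depths(shares, origin):
    """Breadth-first walk over reshare events starting at origin.

    shares is a list of (source, resharer) pairs, meaning that resharer
    passed on the item after seeing it from source.
    Returns {person: number of hops from origin}.
    """
    children = {}
    for source, resharer in shares:
        children.setdefault(source, []).append(resharer)

    depth = {origin: 0}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for nxt in children.get(current, []):
            if nxt not in depth:  # keep the shortest path only
                depth[nxt] = depth[current] + 1
                queue.append(nxt)
    return depth


shares = [
    ("ana", "ben"),
    ("ana", "cat"),
    ("ben", "dan"),
    ("cat", "dan"),
    ("dan", "eve"),
]

depths = cascade_depths(shares, "ana")
reach_per_hop = Counter(depths.values())
print(depths)
print(dict(reach_per_hop))
```

Counting people per hop gives a crude picture of the cascade's shape: a wide, shallow cascade suggests a broadcast from one popular node, while a deep one suggests person-to-person spread.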

LinkedIn: a case study

external image adrianland-linkedin-social-graph.jpg


“Our mission is simple: connect the world's professionals to make them more productive and successful. When you join LinkedIn, you get access to people, jobs, news, updates, and insights that help you be great at what you do.”

Screen Shot 2014-10-12 at 7.42.23 PM.png


  • Job seekers
  • Recruiters and/or companies
  • The company itself


  • Talent Solutions
  • Marketing Solutions
  • Sales Solutions
  • Premium Subscriptions

Business Model

Leverage the technology platform to...
  • Collect a critical mass of data
  • Create relevant and valuable products and services
  • Facilitate member growth and engagement
...Thereby feeding back into the critical mass of data

The power of big data

  • Talent flows uncover trends about companies
  • We can now better target what people want and deliver information to only the people who want it. This provides value and supports user retention
  • Social data is a new source far more vast than CRM, ERP, or even Web data
  • The social graph allows for making and inferring connections in ways never before possible. We can see what information is being shared and track how it is being shared
  • Enables “data refineries” like LinkedIn
  • Connections are power
  • Big data allows for visualizations that allow us to think about people and connections in ways we never have before
Screen Shot 2014-10-12 at 7.42.48 PM.png

How they do it

Transforming the pyramid to the diamond!

The pyramid is extremely large and slow. Employees used to spend too much time (95%) working on the larger base layer, on operational types of tasks. This meant that employees averaged solving 2 real problems each year. By using technology to shrink the bottom layers of the pyramid (to approach a diamond shape), they’re allowing people to spend more time (and the company to spend more money) working on the upper layers, where the real value is created for users and customers.

Potential Negative Implications

Evil Third Party Graph Analysis

Vectors of Identity

It is more difficult to keep our roles or facets separate in this new world. We used to have natural physical boundaries between our social sub-graphs: we would see our coworkers, friends, and family in discrete locations unless we elected to have them cross the boundaries into another one of our graphs. Rarely did these groups overlap when we didn’t intend them to. Now it is quite difficult to hide associations, particularly when they can be mined and inferred online. How do we mediate those selves when people can freely access each one through the social graph?


Actions that used to be wholly private, or intended for a small number of other eyes (think keeping a password-protected diary online, or sending an email intended for one other person), are now open to companies, governments, and hackers. Previously, if we began writing a personal letter and forgot to hide it in a drawer when company came over, someone might read it if they saw it, but they wouldn’t automatically be in possession of a copy (unless they took a picture). Now if we fail to secure our writing or photos online, they’re available to a vastly larger audience, and each of those people or machines can easily make a copy and propagate it out to an even larger audience.

Facebook's tips on privacy.

The Difficulty of Ctrl+Z

Building on the point above, it’s very hard to correct mistakes on the web due to the speed with which copies are made and distributed. How does one control their information as it gets pushed through the social graph?

Differing Levels of Technological Literacy; The Balance of Power

Users of technology don’t necessarily understand what’s possible with technology, but companies do. This shifts the balance of power away from consumers and toward industry and governments, who have a vested interest in all the information. Consumers and citizens, meanwhile, may just be trying to communicate with their loved ones using the convenient tools that now exist, unaware that they’ve become commodities.

Humans Are Complex So Our Data Models Must Be; False Positives

New technologies may be developed with the intent of helping people or solving age-old problems, yet they might disproportionately impact certain groups in an unfair manner, or produce false positives because the model isn’t complex enough.

Example: It has been suggested that a solution to the problem of how best to model a credit score is to base it on who you’re connected to on a social network. In some cases this might give highly accurate results, but that won't always be the case. Consider someone who is highly responsible but comes from a low-income background and is thus connected to many people who have a poor repayment history. Likewise, someone may be the director of a non-profit that helps people to overcome debt problems. If that director has friended the people they’re helping on a social network, those connections are going to bring the director’s score down.
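
The failure mode in this example can be shown with a deliberately naive toy model. Everything here is hypothetical: the function `network_score`, the approval threshold, and the repayment percentages are invented for illustration and do not describe any real scoring system.

```python
def network_score(friend_repayment_rates):
    """Naive model: score a person purely by the average on-time
    repayment rate (in percent) of their connections."""
    return sum(friend_repayment_rates) / len(friend_repayment_rates)


APPROVAL_THRESHOLD = 60  # hypothetical cutoff, in percent

# Hypothetical non-profit director: repays everything on time,
# but is connected to the clients they are helping out of debt.
own_repayment_rate = 100
friend_repayment_rates = [40, 50, 30]

score = network_score(friend_repayment_rates)
approved = score >= APPROVAL_THRESHOLD

print(score, approved)  # 40.0 False
```

The model never looks at `own_repayment_rate`, so a person with a perfect record is rejected because of who they know. A more complex model could reduce such errors, but not necessarily eliminate them.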

Creepy Insights

Do major players in the social graph have unique insights on consumer or employee actions? Can these insights on social movements be used to quell movements of free speech, or take unfair actions that affect the natural movements of social constructions like economies?

We're Trusting the Holders of Power to Make the Right Call

The social graph and this new pool of data allow for some dangerous inferences that could impact the lives of individuals or even the world financial market. (LinkedIn has a strong mission and a policy of suppressing information when it becomes too sensitive, but another company might not.)

The Future of the Social Graph

How might we use the social graph to better humanity in new ways?

What is the most evil use of the social graph that you can think of? How can we "pull back" and make that evil use good, feasible, or meaningful?

How do we mediate the good and the bad applications of the social graph?

Timeline Oct 7, 2014 (Part 2)

5:15 Social Graph: Overview, Networks, Properties, Applications
5:25 Brief discussion of HW1
5:30 Simon Zhang, LinkedIn
  • Problem
  • Hypothesis
  • Data
  • Model
  • Evaluation (of the model)
  • Learnings
5:50 Student breakout (Simon to pose a question, students to discuss with their neighbors)
6:00 Discussion Simon / Andreas
  • What is hard?
  • What are the trade-offs?
  • What were some surprises?
  • What are the bottlenecks (data incomplete and dirty, too few training data, talent hard to find, regulation, partnership deals, fraud, incentive design)
6:10 Q&A
6:15 Summary
6:25 HW2 / Logistics / Housekeeping
6:30 END

This page created by: (names / emails of students) Rena Coen (rena@ischool), Ricky Holtz (ricky.holtz@ischool)