Andreas Weigend | Social Data Revolution | Fall 2014
School of Information | University of California at Berkeley | INFO 290A-03

Audio: weigend_ischool2014_3.mp3
Transcript: weigend_ischool2014_3.docx


When we think of using data to make better business decisions, we think of social networks and the promise of online services customized for every individual. It is easy to overlook many other forms of data that are already being constantly generated through the course of doing business. Consider MasterCard, a company with the data on billions of credit card transactions. How can we apply new techniques and ideas to these existing stockpiles of data to help people make better decisions, whether in combating fraud or determining retail trends? We look into unleashing the hidden potential of existing information silos by transforming them into data warehouses.
Max Levchin will join us in class and discuss how Affirm uses the social graph for loan decisions.

Timeline Oct 7, 2014

3:30 Setting up
3:50 Max Levchin, Affirm
4:20 Student breakout (Max to pose a question, students to discuss with their neighbors)
4:30 Discussion Max / Andreas
4:45 Max’s vision for the future of data and society, fairness, data ownership (what would you do with all of the data)
4:50 Summary
5:00 BREAK

Money, Money, Money

Many companies have been revolutionizing the digital payments industry. These are some of the big players as well as up-and-coming startups in the field today.


Max Levchin, co-founder of PayPal, is now the founder and CEO of Affirm, a company that aims to offer affordable point-of-sale (POS) financing to people who lack the credit to qualify for it otherwise. This is seen as a possible alternative to credit cards, although the mechanism is similar: payments are still split over time, and can be made via debit card, bank account, or check. The lending decision is not based on a credit score, but on a detailed model that can involve thousands of personal attributes in determining whether an individual is likely to pay back the loan.






(also Bill Me Later)



Lending Landscape


"Debt is a good thing because it allows you to borrow against your future self or to invest in your more intelligent self." - Max Levchin (refer to video)
Underwriting is simply a data problem. If we know everything about a person, then we can figure out whether or not to trust them and how much we should lend them. Credit cards are built fundamentally on our ability to price risk. This risk is assessed by how likely an individual is to pay back a debt, and its cost is spread over all loans in the form of an interest rate.
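The idea of spreading risk over all loans can be made concrete with a small sketch. It assumes a one-period loan, a single default probability for the whole pool, and a fixed fraction of principal recovered on default. These simplifications and all function names are illustrative assumptions, not a model described in the lecture.

```python
def expected_return(rate: float, p_default: float, recovery: float = 0.0) -> float:
    """Expected return per dollar lent on a one-period loan.

    With probability (1 - p_default) the lender gets back 1 + rate;
    with probability p_default it gets back only `recovery`.
    """
    return (1.0 - p_default) * (1.0 + rate) + p_default * recovery - 1.0


def break_even_rate(p_default: float, recovery: float = 0.0) -> float:
    """Interest rate at which expected_return() is exactly zero."""
    if not 0.0 <= p_default < 1.0:
        raise ValueError("p_default must be in [0, 1)")
    return (1.0 - p_default * recovery) / (1.0 - p_default) - 1.0


if __name__ == "__main__":
    # Hypothetical pool in which 5% of borrowers default and nothing is recovered.
    print(f"break-even rate: {break_even_rate(0.05):.4f}")
```

Under these assumptions, a lender must charge every borrower at least the break-even rate just to cover the defaults of a few, before any profit or operating cost.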

Annual Percentage Rate (APR)
The APR (annual percentage rate) of a loan is the interest charged on the loan over a year, expressed as a percentage of the amount borrowed. Although an APR is simple in concept, there is no simple formula to calculate it from a payment schedule, and people generally don't understand what it means. In this way, the current system is broken. Individuals, even those who plan on paying off the entire loan, often enter contracts with interest charges that they cannot keep up with.
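To illustrate why the APR is awkward to compute by hand, here is a minimal sketch. It assumes a fixed-rate loan with equal monthly payments and no fees, and treats the APR as twelve times the monthly rate. Going from rate to payment is a closed formula; going from payment back to rate has to be solved numerically (bisection is used here). The function names are hypothetical.

```python
def monthly_payment(principal: float, apr: float, months: int) -> float:
    """Level monthly payment for a fixed-rate loan (standard annuity formula)."""
    r = apr / 12.0  # monthly rate, assuming nominal APR = 12 * monthly rate
    if r == 0:
        return principal / months
    return principal * r / (1.0 - (1.0 + r) ** -months)


def implied_apr(principal: float, payment: float, months: int) -> float:
    """Recover the APR from the payment by bisection.

    The payment grows with the rate, so we narrow an interval [lo, hi]
    until the rate that reproduces `payment` is pinned down.
    """
    lo, hi = 0.0, 10.0  # search between 0% and 1000% per year
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if monthly_payment(principal, mid, months) > payment:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    pay = monthly_payment(1000.0, 0.24, 12)  # hypothetical $1,000 loan at 24% APR
    print(f"monthly payment: {pay:.2f}")
    print(f"implied APR:     {implied_apr(1000.0, pay, 12):.4f}")
```

Real APR disclosures also fold in fees and irregular payment schedules, which this sketch ignores.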

FICO Scores
In short, a FICO score is essentially a bunch of little things plus your debt-to-income ratio: how much you have already borrowed and how much you earn to pay that off. This by itself is actually quite effective in determining the financial credibility of an individual. The problem with FICO scores is that they update only quarterly to semiannually, so they do not reflect the most up-to-date and relevant information on a person.

Identity Resolution and Fraud Mitigation

When someone comes in to take out a loan, the first goal is to figure out whether they are who they say they are, i.e. identity resolution of a node in a graph. The second part is the fraud determination or fraud prediction problem, which is an edge problem in the social graph. If we know all the transactions and friends of a person, it is easy to pin down the truth about them (for example, that they are not actually located in Berkeley but in Turkey). As Max Levchin says, this is not a fully solved problem, but it is relatively well contained these days. We are generally able to detect and neutralize bad transactions, but there are still many bad people out there trying to fool the system.
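As a toy illustration of the Berkeley versus Turkey example, the sketch below checks whether a claimed city is corroborated by where a person's past transactions took place. The function name, the 20% threshold, and the use of city names as the only signal are all assumptions made for illustration; real systems combine many more signals.

```python
from collections import Counter


def location_mismatch(claimed_city, transaction_cities, min_share=0.2):
    """Return True when the claimed city is NOT corroborated by history.

    The claim counts as corroborated if at least `min_share` of past
    transactions happened in the claimed city (case-insensitive match).
    """
    if not transaction_cities:
        return True  # no history at all, so nothing corroborates the claim
    counts = Counter(city.strip().lower() for city in transaction_cities)
    share = counts[claimed_city.strip().lower()] / len(transaction_cities)
    return share < min_share


if __name__ == "__main__":
    history = ["Istanbul"] * 9 + ["Berkeley"]  # hypothetical transaction cities
    print(location_mismatch("Berkeley", history))
```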

Underwriting and Risk Assessment

Underwriting refers to the process that a large financial service provider (bank, insurer, investment house) uses to assess the eligibility of a customer to receive their products (equity capital, insurance, mortgage, or credit).

Once you can resolve the identity of a person, there are three possible outcomes of giving a loan:
  1. Loan is paid back with no problems.
  2. The borrower means to pay back the loan but cannot, due to irresponsibility or a foolish assessment of their own ability to repay. The accumulated interest from many responsible borrowers offsets the losses caused by each of these people.
  3. The borrower takes out an appropriate amount of money but an unexpected event stops repayment -- bankruptcy, death, etc. This is an unpredictable data problem. There is no way to know if or when events like these will happen, but assessing the likelihood of these events is necessary in

In order for a lending firm like Affirm to succeed, there must be criteria for deciding whom to lend to and at what interest rate. This essentially amounts to a big data problem: given all of the (legally usable) data that is out in the world today, what are some of the questions that need to be answered in order to best judge the risk in giving a loan?
  1. Is the borrower trying to steal money?
    A stolen identity is a good clue that the borrower does not intend to return the money (for obvious reasons). There are hundreds of features that can be used in heuristics to solve this problem. One example is that those who steal identities are less likely to care how their names are spelled, so those who enter a name in all lower case are slightly more likely not to attempt to pay back the loan.
  2. How much money are you willing to risk with someone?
    This is the problem that FICO credit scores were originally designed to solve. However, the model is incredibly simplistic: only tens of factors are used, and it started with only a handful of them. It mostly revolves around the debt-to-income ratio: how much you owe compared to how much you make in a year. FICO scores also update very infrequently, sometimes only a couple of times per year. Smarter approximations can now attempt to predict a future FICO score, for example when the borrower is about to start a job (and begin making money).
  3. How do you predict the likelihood of bad things happening to good people?
    At one extreme, you could record the actions of every person (e.g. with a camera on their forehead) to try to determine risk. This is not feasible, and the problem of unbalanced risk remains. Inevitably, the least risky people will pay for the most risky people, and this creates a possible moral hazard.
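Two of the signals mentioned in the questions above, the lower-case name and the debt-to-income ratio, can be sketched as feature extractors. This is only a sketch under assumptions: the function and field names are hypothetical, and how such features would be weighted (presumably learned from repayment data) is not specified in the lecture.

```python
def name_is_all_lowercase(full_name: str) -> bool:
    """True if the name contains letters and none of them is upper case."""
    letters = [ch for ch in full_name if ch.isalpha()]
    return bool(letters) and all(ch.islower() for ch in letters)


def debt_to_income(total_debt: float, annual_income: float) -> float:
    """How much the applicant owes relative to what they make in a year."""
    if annual_income <= 0:
        return float("inf")  # no income: any debt is unbounded relative to it
    return total_debt / annual_income


def applicant_features(full_name: str, total_debt: float, annual_income: float) -> dict:
    """Bundle the individual signals into one feature record for a model."""
    return {
        "lowercase_name": name_is_all_lowercase(full_name),
        "debt_to_income": debt_to_income(total_debt, annual_income),
    }


if __name__ == "__main__":
    # Hypothetical applicant, for illustration only.
    print(applicant_features("john smith", 20000.0, 80000.0))
```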

Novel Data Sources

These days, there are many tells of a person's financial responsibility from sources outside of FICO scores. Every aspect of our life, what we do and how we do it, is somehow related to who we are as a financial individual. Through these small observations, we can collect important data on individuals to gauge their trustworthiness.
  • Education
  • Social networking usage
  • Donation tracking
  • Responsibility games
  • Benign self-infected spyware
  • Sporting behavior
  • Video game performance
  • Social FICO Learning Games

Financial Data Revolution

Many companies are revolutionizing the field in their drive for better data and their willingness to navigate obstacles to obtain that data. All financial services companies wind up being built through the fortitude of people willing to lose a great deal of money as they learn what kinds of users they have. The goal is ultimately to build systems that can take the identity of a person and output a summary of his or her financial risk. These are some of the companies doing their part in the revolution.


Assess financial responsibility of people at POS terminals to quickly determine loan credibility using many data sources outside of just FICO scores.


Real-time sentiment and breaking-news analysis using social data sources. It aggregates social media data under one API, and was purchased by Twitter in 2014.


A search engine for almost all types of numerical data, including financial and economic data sets. Open data, and free!


Provides a marketplace for users to sell their personal social data. As of February 2014, it pays users $8 per month for access to their social data, including Twitter, Facebook, and credit card transactions. It then sells the insights and correlations found in this data to other corporations.
Read more in the data ownership wiki!

Financial Trust and Bitcoin

Bitcoin is a decentralized virtual currency that is based on cryptography.


At its core, Bitcoin is an unfalsifiable, fully distributed ledger. It is not really a currency; it is simply a list of records of exchanges of promissory notes measured in Bitcoins. The records are permanent and irreversible by the nature of the system, since in theory there will always be enough people in the Bitcoin ecosystem who can validate the ledger.
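Why records in such a ledger are hard to falsify can be shown with a minimal hash-chain sketch. It is not Bitcoin's actual data structure: proof-of-work, digital signatures, and distribution across many validators are all omitted, and the record layout and function names are assumptions. The sketch only shows that each record commits to the one before it, so altering an old record is detectable.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus this record's content."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Add a record that is chained to the current last record."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev_hash, "payload": payload,
                   "hash": record_hash(prev_hash, payload)})


def verify(ledger: list) -> bool:
    """Recompute every hash; any edited record breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != record_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append(ledger, {"from": "alice", "to": "bob", "amount": 1})
    append(ledger, {"from": "bob", "to": "carol", "amount": 1})
    print(verify(ledger))                # chain is intact
    ledger[0]["payload"]["amount"] = 100  # tamper with history
    print(verify(ledger))                # tampering is detected
```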

Minds of Bitcoin

It is important to note that while the monetary value of Bitcoin is unstable and questionable, there is an intrinsic value created by the intellect spent on the field. Marc Andreessen and Peter Thiel are just two of the brilliant minds who have a very positive outlook on the future of virtual currency and are dedicating many hours to furthering its development and adoption.


Whether it be Bitcoin or something else, cryptographic accounting is likely the future. This style of accounting is applicable to any trust-dependent event, such as a notary witnessing a signature. We are likely to see this methodology extend into other fields in the near future.

MasterCard Lecture

What is big data?

Big data is the aggregation of data created by people, whether voluntary or involuntary. It consists of their actions and their interactions with the people and the world around them.

Big Data for MasterCard

MasterCard has real-time transaction data and history of transactions for 2 billion accounts. They have the billing address of accounts for localization and can see where people spend money and what they spend money on.

Big Data Outcomes
  • attrition
  • best customer
  • next purchase
  • delinquency
  • fraud
  • credit line increase/decrease
  • retention
  • mobile user

Transaction Variables
  • amount/count
  • time of day
  • amount over time
  • amount recurring payments
  • amount online transactions
  • purchase sequence
  • date of first/last transaction
  • transaction patterns
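Several of the transaction variables listed above can be derived from a raw list of transactions. The sketch below assumes each transaction is a (timestamp, amount, is_online) tuple; that layout, the feature names, and the sample data are hypothetical and not MasterCard's actual schema.

```python
from collections import Counter
from datetime import datetime


def transaction_features(transactions):
    """Summarize a list of (timestamp, amount, is_online) tuples.

    Returns an empty dict when there is no transaction history.
    """
    if not transactions:
        return {}
    times = [ts for ts, _, _ in transactions]
    amounts = [amount for _, amount, _ in transactions]
    return {
        "count": len(transactions),
        "total_amount": sum(amounts),
        "online_amount": sum(a for _, a, online in transactions if online),
        "first_transaction": min(times),
        "last_transaction": max(times),
        # hour of day (0-23) in which the most transactions occurred
        "busiest_hour": Counter(ts.hour for ts in times).most_common(1)[0][0],
    }


if __name__ == "__main__":
    sample = [  # made-up data for illustration
        (datetime(2014, 10, 1, 9, 15), 20.0, False),
        (datetime(2014, 10, 3, 9, 40), 35.5, True),
        (datetime(2014, 10, 7, 18, 5), 12.0, True),
    ]
    for name, value in transaction_features(sample).items():
        print(name, value)
```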

Data Types
  • transaction
  • sku
  • credit bureau
  • social
  • sentiment
  • location
  • demographic

Discovery Driven Planning

DDP is a framework used to define, discuss, and test new business plans.

Follow up ideas:

To what degree do loans vary given the identity of a person?

The decision to give a loan is based on the likelihood of the borrower paying back the loan plus some interest. The difficulty in offering a loan is correctly pricing the risk. To do that, a number of factors are considered, including age, current debt, earning power, credit history, and length of the loan. All these factors and many more go into calculating the risk involved in offering you a loan (Julian Prochaska).

Given a FICO score, how do you determine whether or not a person gets the loan?

Remembering that the FICO score is just a measurement of risk, whether or not someone gets a loan really comes down to whether the lender feels it is a good risk. This decision process can be broken into three categories: 1) the FICO score, 2) how much the loan is for, and 3) the risk of the loan itself.

The first category (FICO score) tries to assess how likely the borrower is, in general, to pay back money they borrow.
The second category deals with what percentage of the lender's money the person is asking to borrow.
The third category deals with why the person needs the money and the risk involved in how they spend it (is it for a house, or to start a risky business that will likely fail?), the ability of the person to pay back the loan, the assets the person has, and how much money can be recovered if the borrower defaults on the loan.
Since the lender has a finite amount of money, it is then up to the lender to determine how risk averse they want to be with it (i.e. do they want to lend their money to anyone, or only to people they know will pay them back?). (Julian Prochaska)

Is whether you actually get a loan binary based on score?

The short answer is no, but there might be base credit scores you need to meet to be considered for specific loans.
One obvious sign that it is not binary, and a reason some people try to build a good credit score, is that the interest rate on a loan goes down as your credit score goes up (i.e. the better your credit score, the less interest you pay over time on the money you borrow). While you might need to meet a base credit score to be considered by certain lenders, the amount you will have to pay in interest can change dramatically as your risk profile changes. (Julian Prochaska)
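The answer above describes a cutoff followed by score-dependent pricing, which can be sketched as a lookup. Every number here (the minimum score, the tier boundaries, the rates) is invented purely for illustration and does not come from the lecture or from any real lender.

```python
import bisect

# All values below are hypothetical, chosen only to illustrate the shape
# of the decision: a base score to be considered, then cheaper tiers.
MIN_SCORE = 620
TIER_FLOORS = [620, 680, 740]   # lowest score in each tier, ascending
TIER_APRS = [0.18, 0.12, 0.07]  # rate offered in the matching tier


def quote_apr(score: int):
    """Return the offered APR, or None if the score is below the cutoff."""
    if score < MIN_SCORE:
        return None
    tier = bisect.bisect_right(TIER_FLOORS, score) - 1
    return TIER_APRS[tier]


if __name__ == "__main__":
    for score in (600, 620, 700, 800):
        print(score, quote_apr(score))
```

The cutoff is the only binary part; above it, the score changes the price rather than the yes/no answer.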

Alternatives to FICO?

List your ideas...

The FICO score is really ripe for disruption. Max Levchin brought up the idea of harnessing social data as an alternative to FICO in class, and I've thought a lot about it since. One idea for redesigning the FICO score is, instead of trying to guess your risk based on a list of facts about you, to approximate your risk based on how much the people who actually know you would lend to you. By looking at how much the people who know a person trust them with their money, you could get a much better approximation of how risky that person is. On top of that, another way to get that data is to build a decentralized microlending platform where friends would lend to friends. People would be required to rank their friends relative to one another by how much they would lend them and how trustworthy they think they are. By harnessing that data, you could see which people are trustworthy and responsible and which aren't. (Julian Prochaska)

TODO: (details for students to add to this page):

What other possible risk signs can you think of that would possibly indicate that the borrower doesn't intend to return the money?
What types of data can you look at to help determine this future FICO score?
What other problems can you imagine that Affirm needs to solve?

Oct 6 page created by: Matthew Fong, George Yiu
Oct 21 part created by: Matthew Fong, George Yiu