Introduction

Background and Motivation

We chose loan data because both of us are interested in working in the FinTech industry. Learning what loan data looks like and how to draw insights from it prepares us for possible interview questions and helps us understand financial data better.

Project Objectives

Loan risk is a topic the financial industry always cares about, and a field where state-of-the-art technology can be applied to develop new approaches that give a company more control over that risk.

We would like to use the visualization techniques learned in this class to present the loan data in an engaging way and thus communicate more efficiently with non-technical audiences.

Data

We collected the data from Kaggle Datasets, because the data there is already cleaned and thus easier to build visualizations from. The dataset is Lending Club Loan Data, available on Kaggle.

This dataset consists of 75 variables, including numerical variables and many categorical variables whose cardinality ranges from 3 to 30 unique values. This makes our work easier: the data is rich, and we are not limited by the data types of the columns.

Data Processing

An apparent problem is that a few columns contain many null values. We might consider dropping those columns: given the nature of this topic, it is entirely reasonable for such values to be missing, and because there are 75 variables, dropping some will not noticeably reduce the diversity of the data.
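A minimal sketch of the column-dropping step, assuming the rows are loaded as dictionaries (for example with `csv.DictReader`) and that `None` or an empty string marks a missing value. The helper name `drop_sparse_columns`, the 50% threshold, and the column names in the toy data are illustrative assumptions, not values taken from the project.

```python
def drop_sparse_columns(rows, max_null_share=0.5):
    """Return rows without the columns whose share of missing values exceeds max_null_share.

    Hypothetical helper: a value counts as missing if it is None or an empty string.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())
    null_counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if row.get(col) in (None, ""):
                null_counts[col] += 1
    # Keep a column only if its missing share is at or below the threshold.
    keep = [col for col in columns if null_counts[col] / len(rows) <= max_null_share]
    return [{col: row.get(col) for col in keep} for row in rows]


# Toy rows; the column names are assumed for illustration.
sample = [
    {"loan_amnt": "5000", "purpose": "car", "desc": ""},
    {"loan_amnt": "2400", "purpose": "wedding", "desc": ""},
    {"loan_amnt": "10000", "purpose": "credit_card", "desc": "Pay off cards"},
]
cleaned = drop_sparse_columns(sample)
```

In the toy data, `desc` is missing in two of three rows, so it is dropped while the other two columns are kept. The right threshold depends on how much missing data each plot can tolerate.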

Meanwhile, we need to parse the “issue date” column into “year” and “month”, because time is a very important factor in this dataset and we would like our visualizations to show how loan conditions are affected by time.
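The date parsing could look roughly like the sketch below. It assumes the raw issue date is a string such as "Dec-2011" stored under a key named `issue_d`; both the format and the key name are assumptions to adjust to the actual file, and `split_issue_date` is a hypothetical helper.

```python
from datetime import datetime


def split_issue_date(issue_date, fmt="%b-%Y"):
    """Parse an issue-date string and return (year, month) as integers.

    The default format "%b-%Y" (e.g. "Dec-2011") is an assumption about the raw column.
    """
    parsed = datetime.strptime(issue_date, fmt)
    return parsed.year, parsed.month


# Toy rows; "issue_d" is an assumed column name.
rows = [{"issue_d": "Dec-2011"}, {"issue_d": "Jan-2015"}]
for row in rows:
    row["year"], row["month"] = split_issue_date(row["issue_d"])
```

Keeping year and month as separate integer fields makes it straightforward to filter by year in a dropdown or to step through years in an animation.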

Loan Amount by Purpose over Time

[Figure: loan amount by purpose over time, with a year dropdown menu]

Overview

This plot reflects the trend of loan amount by purpose over time.

Selecting the reset option in the dropdown menu shows the overall trend across all purposes. Debt consolidation clearly accounts for the largest loan amount throughout, making it the main purpose for applying for a loan. This finding is logical: from the description column of the dataset, we know that many people first borrow from other people or use their credit cards to cover needs such as medical expenses or home improvement, and then apply for a loan to consolidate the existing debts.

If we hover across the histogram, we see that loan amounts for almost all purposes increase from year to year, except for wedding and education. The loan amounts for these two purposes are also the lowest among all purposes. This is understandable, because a category with a smaller loan amount is naturally less stable than one with a larger amount.

If we select one year in the dropdown menu, the plot displays data for that year only, which gives a better sense of how the loan amount is split across purposes. The top three purposes are debt consolidation, credit card, and home improvement.
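The aggregation behind a plot like this can be sketched as follows. The field names (`year`, `purpose`, `loan_amnt`), the toy values, and the helper `loan_amount_by_purpose` are assumptions for illustration; the actual plotting code is not reproduced here.

```python
from collections import defaultdict


def loan_amount_by_purpose(rows, year=None):
    """Sum loan amounts per purpose, optionally restricted to one year.

    year=None mimics the "reset" view that covers all years.
    """
    totals = defaultdict(float)
    for row in rows:
        if year is None or row["year"] == year:
            totals[row["purpose"]] += row["loan_amnt"]
    # Largest totals first, so the top purposes are easy to read off.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Toy data; field names and values are made up for illustration.
loans = [
    {"year": 2014, "purpose": "debt_consolidation", "loan_amnt": 15000.0},
    {"year": 2014, "purpose": "credit_card", "loan_amnt": 8000.0},
    {"year": 2015, "purpose": "debt_consolidation", "loan_amnt": 20000.0},
    {"year": 2015, "purpose": "home_improvement", "loan_amnt": 6000.0},
]
all_years = loan_amount_by_purpose(loans)
only_2015 = loan_amount_by_purpose(loans, year=2015)
```

Grouping by a state field instead of `purpose` would give the totals needed for a per-state view in the same way.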

Loan Amount by States

[Figure: map of loan amount by state, with a year dropdown menu]

Overview

This plot shows the distribution of loan amount across states.

The darker the color of a state, the larger the loan amount borrowed in that state. Choosing different years from the dropdown menu, we can see that California, New York, and Texas are always the states that borrow the most. Some states are blank, which does not necessarily mean that they are rich or do not borrow; it may simply be that Lending Club is not a popular lender in those regions.

Loan Amount vs Annual Income by Grade

[Figure: animated bubble chart of loan amount vs. annual income by grade, over time]

Overview

Loans issued tend to stabilize over time. In 2007, all loans were issued with an average amount below $15k, while the average income of debtors ranged from $20k to $180k. Over time, the range of average loan amounts narrows to between $13k and $25k, and the range of debtors’ average income converges to between $65k and $100k.

The animation makes this trend quite clear. The most likely reason is a change in Lending Club’s loan-issuing strategy. At the beginning of the business, it was reasonable to take some risk by issuing loans to people across a wide range of annual incomes, while keeping the issued amounts low to avoid large losses.

Over time, as they discovered more patterns and became more accurate at risk control, they gradually increased the loan amounts and concentrated their customer base on borrowers whose annual income falls within a narrower range.

Another pattern is that Lending Club issues loans whose amounts are roughly a quarter of the debtors’ annual incomes.
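One way to check the loan-to-income relationship is to average both quantities per year and grade and take their ratio. The sketch below does this on made-up rows; the field names (`year`, `grade`, `loan_amnt`, `annual_inc`) and the helper `average_by_year_and_grade` are assumptions, and the toy numbers are chosen only to illustrate a ratio of one quarter.

```python
from collections import defaultdict


def average_by_year_and_grade(rows):
    """Return {(year, grade): (avg_loan, avg_income, loan_to_income_ratio)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # loan sum, income sum, row count
    for row in rows:
        entry = sums[(row["year"], row["grade"])]
        entry[0] += row["loan_amnt"]
        entry[1] += row["annual_inc"]
        entry[2] += 1
    result = {}
    for key, (loan_sum, income_sum, count) in sums.items():
        avg_loan = loan_sum / count
        avg_income = income_sum / count
        # Guard against a zero average income; the ratio is undefined in that case.
        ratio = avg_loan / avg_income if avg_income else None
        result[key] = (avg_loan, avg_income, ratio)
    return result


# Toy data, made up for illustration.
loans = [
    {"year": 2015, "grade": "A", "loan_amnt": 12000.0, "annual_inc": 60000.0},
    {"year": 2015, "grade": "A", "loan_amnt": 18000.0, "annual_inc": 60000.0},
    {"year": 2015, "grade": "G", "loan_amnt": 20000.0, "annual_inc": 80000.0},
]
bubbles = average_by_year_and_grade(loans)
```

Each `(year, grade)` entry corresponds to one bubble position: average income on one axis and average loan amount on the other.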

Take a closer look

Looking at one grade at a time, we noticed that for the lower-risk loan grades (A-D) the issuing strategy is more cautious and the bubbles barely move, while the higher-risk grades (such as F and G) change massively.

Grade G loans were issued aggressively during 2011-2013, although in 2012 the company paused and returned to a more conservative plan. By 2015, loans of all grades sit close together within the same region of the grid.

Main Grade for Loans

[Figure: percentage of loans issued by grade, per year]

Overview

The most important numerical features in this data, loan amount and the number of loans issued, can be hard to visualize in a way that reveals insights. It is usually a good idea to rescale them to percentages. In this plot, each year’s total number of loans issued represents 100%, and each grade is shown as its percentage of that total.
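The rescaling to 100% per year can be sketched as follows, assuming each row carries a `year` and a `grade` field (assumed names; `grade_share_by_year` is a hypothetical helper, and the toy rows are made up).

```python
from collections import Counter, defaultdict


def grade_share_by_year(rows):
    """Return {year: {grade: percent of that year's loans}}; each year sums to 100."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["year"]][row["grade"]] += 1
    shares = {}
    for year, grade_counts in counts.items():
        total = sum(grade_counts.values())
        shares[year] = {g: 100.0 * n / total for g, n in grade_counts.items()}
    return shares


# Toy data, made up for illustration.
loans = [
    {"year": 2014, "grade": "A"},
    {"year": 2014, "grade": "B"},
    {"year": 2014, "grade": "B"},
    {"year": 2014, "grade": "B"},
    {"year": 2015, "grade": "A"},
    {"year": 2015, "grade": "G"},
]
shares = grade_share_by_year(loans)
```

Because every year is normalized separately, growth in the total number of loans does not mask changes in the mix of grades.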

Even though it is hard to attribute the changes to one specific cause, such as the country’s economy, people’s quality of life, or a change in Lending Club’s issuing policy, it is still useful to look at the relative trend of each loan grade over time.

Recall that the last plot showed F and G grade loans being issued with an aggressive strategy; this plot, however, shows that their share of the loans issued remained very stable.

The company might have tried to increase issuance of these loans through different strategies, as reflected in the last plot, but this does not seem to have worked well. The percentage of loans issued in the lower grades does not appear to have surged.