Credit card clustering is the task of grouping credit card holders based on their buying habits, credit limits, and many other financial factors. If you want to learn how to use clustering analysis to group credit card holders, this article is for you. In this article, I will take you through the task of credit card clustering with Machine Learning using Python.
Credit card clustering is also known as credit card segmentation. Such clustering analysis helps businesses identify potential customers and shape their marketing strategies. For the task of credit card clustering with Machine Learning, we need a dataset based on the buying history of credit card holders. I found a dataset for this task that contains the features needed to understand credit card cluster analysis. You can download the dataset from here. In the section below, I will take you through the task of credit card clustering analysis with Machine Learning using the Python programming language.
Let’s start the task of credit card cluster analysis by importing the necessary Python libraries and the dataset:
import pandas as pd
import numpy as np
from sklearn import cluster
data = pd.read_csv("CC GENERAL.csv")
print(data.head())
  CUST_ID      BALANCE  BALANCE_FREQUENCY  PURCHASES  ONEOFF_PURCHASES  \
0  C10001    40.900749           0.818182      95.40              0.00
1  C10002  3202.467416           0.909091       0.00              0.00
2  C10003  2495.148862           1.000000     773.17            773.17
3  C10004  1666.670542           0.636364    1499.00           1499.00
4  C10005   817.714335           1.000000      16.00             16.00

   INSTALLMENTS_PURCHASES  CASH_ADVANCE  PURCHASES_FREQUENCY  \
0                    95.4      0.000000             0.166667
1                     0.0   6442.945483             0.000000
2                     0.0      0.000000             1.000000
3                     0.0    205.788017             0.083333
4                     0.0      0.000000             0.083333

   ONEOFF_PURCHASES_FREQUENCY  PURCHASES_INSTALLMENTS_FREQUENCY  \
0                    0.000000                          0.083333
1                    0.000000                          0.000000
2                    1.000000                          0.000000
3                    0.083333                          0.000000
4                    0.083333                          0.000000

   CASH_ADVANCE_FREQUENCY  CASH_ADVANCE_TRX  PURCHASES_TRX  CREDIT_LIMIT  \
0                0.000000                 0              2        1000.0
1                0.250000                 4              0        7000.0
2                0.000000                 0             12        7500.0
3                0.083333                 1              1        7500.0
4                0.000000                 0              1        1200.0

      PAYMENTS  MINIMUM_PAYMENTS  PRC_FULL_PAYMENT  TENURE
0   201.802084        139.509787          0.000000      12
1  4103.032597       1072.340217          0.222222      12
2   622.066742        627.284787          0.000000      12
3     0.000000               NaN          0.000000      12
4   678.334763        244.791237          0.000000      12
Before moving forward, let’s check whether this dataset contains any null values:
data.isnull().sum()
CUST_ID                               0
BALANCE                               0
BALANCE_FREQUENCY                     0
PURCHASES                             0
ONEOFF_PURCHASES                      0
INSTALLMENTS_PURCHASES                0
CASH_ADVANCE                          0
PURCHASES_FREQUENCY                   0
ONEOFF_PURCHASES_FREQUENCY            0
PURCHASES_INSTALLMENTS_FREQUENCY      0
CASH_ADVANCE_FREQUENCY                0
CASH_ADVANCE_TRX                      0
PURCHASES_TRX                         0
CREDIT_LIMIT                          1
PAYMENTS                              0
MINIMUM_PAYMENTS                    313
PRC_FULL_PAYMENT                      0
TENURE                                0
dtype: int64
The dataset has 313 null values in the MINIMUM_PAYMENTS column and one in the CREDIT_LIMIT column. I will drop the rows with null values and move further:
data = data.dropna()
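Dropping rows is the simplest way to handle missing values, but it is worth checking how many rows are lost. Below is a minimal sketch on a small made-up DataFrame; the values are invented and only the column names mirror the dataset:

```python
import numpy as np
import pandas as pd

# Toy data with invented values; only the column names mirror the real dataset
toy = pd.DataFrame({
    "BALANCE": [40.9, 3202.5, 2495.1, 1666.7, 817.7],
    "CREDIT_LIMIT": [1000.0, 7000.0, np.nan, 7500.0, 1200.0],
    "MINIMUM_PAYMENTS": [139.5, 1072.3, 627.3, np.nan, 244.8],
})

# Count the missing values in each column
missing_per_column = toy.isnull().sum()

# dropna() removes every row that has at least one missing value
cleaned = toy.dropna()
rows_dropped = len(toy) - len(cleaned)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {len(toy)} rows")
```

Note that dropna() keeps the original index labels, so the index of the cleaned data has gaps where rows were removed.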
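There are three features in the dataset which are very valuable for the task of credit card segmentation:
BALANCE: the balance left in the accounts of credit card customers
PURCHASES: the amount of purchases made from the accounts of credit card customers
CREDIT_LIMIT: the limit of the credit card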
These three features are enough to group credit card holders as they tell us about the buying history, bank balance, and credit limit of the credit card holders. So let’s use these features to create clusters from the dataset:
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

clustering_data = data[["BALANCE", "PURCHASES", "CREDIT_LIMIT"]]
# Scale each feature to the range [0, 1] so that no single feature dominates the distances
scaled_data = MinMaxScaler().fit_transform(clustering_data)
# Group the credit card holders into five clusters
kmeans = KMeans(n_clusters=5)
clusters = kmeans.fit_predict(scaled_data)
data["CREDIT_CARD_SEGMENTS"] = clusters
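K-Means groups points by distance, so features measured on larger scales can dominate the result unless the data is scaled first. Here is a minimal sketch of scaling followed by clustering on synthetic data. The two groups, the random seed, and the explicit n_init and random_state values are my assumptions for illustration, not part of the dataset:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Two synthetic groups of 50 customers each (invented values);
# the three columns stand in for BALANCE, PURCHASES, and CREDIT_LIMIT
low = rng.normal(loc=[500, 200, 1500], scale=[50, 20, 100], size=(50, 3))
high = rng.normal(loc=[5000, 3000, 10000], scale=[200, 100, 300], size=(50, 3))
features = np.vstack([low, high])

# fit_transform learns the min and max of each column and rescales it to [0, 1]
scaled = MinMaxScaler().fit_transform(features)

# random_state makes the run repeatable; n_init is set explicitly for clarity
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

print(scaled.min(axis=0), scaled.max(axis=0))
print(np.bincount(labels))
```

The numbering of the clusters is arbitrary, so the same group can receive a different label on a different run unless random_state is fixed.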
I have added a new column named “CREDIT_CARD_SEGMENTS”. It contains the cluster label of each credit card customer. The labels range from 0 to 4. For simplicity, I will rename these clusters:
data["CREDIT_CARD_SEGMENTS"] = data["CREDIT_CARD_SEGMENTS"].map({0: "Cluster 1", 1:
"Cluster 2", 2: "Cluster 3", 3: "Cluster 4", 4: "Cluster 5"})
print(data["CREDIT_CARD_SEGMENTS"].head(10))
0     Cluster 1
1     Cluster 3
2     Cluster 3
4     Cluster 1
5     Cluster 1
6     Cluster 5
7     Cluster 1
8     Cluster 3
9     Cluster 5
10    Cluster 1
Name: CREDIT_CARD_SEGMENTS, dtype: object
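The cluster names by themselves do not say what kind of customers each group contains. One way to interpret them is to compare the average feature values of each segment. The sketch below does this on a small made-up DataFrame; the values and the two-cluster labelling are invented for illustration:

```python
import pandas as pd

# Toy data with invented values and made-up cluster labels
toy = pd.DataFrame({
    "BALANCE": [100.0, 200.0, 5000.0, 6000.0, 150.0],
    "PURCHASES": [50.0, 70.0, 900.0, 1100.0, 60.0],
    "CREDIT_LIMIT": [1000.0, 1200.0, 9000.0, 11000.0, 1100.0],
    "CREDIT_CARD_SEGMENTS": [0, 0, 1, 1, 0],
})

# Rename the numeric labels in the same way as above
toy["CREDIT_CARD_SEGMENTS"] = toy["CREDIT_CARD_SEGMENTS"].map(
    {0: "Cluster 1", 1: "Cluster 2"})

# Average of each feature per segment, plus the number of customers in each segment
profile = toy.groupby("CREDIT_CARD_SEGMENTS")[
    ["BALANCE", "PURCHASES", "CREDIT_LIMIT"]].mean()
sizes = toy["CREDIT_CARD_SEGMENTS"].value_counts()

print(profile)
print(sizes)
```

In this toy example, one segment has low balances and credit limits and the other has high ones, which is the kind of difference to look for in the real clusters.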
Now let’s visualize the credit card clusters we found from our cluster analysis:
import plotly.graph_objects as go

PLOT = go.Figure()
# Add one 3D scatter trace per cluster
for i in list(data["CREDIT_CARD_SEGMENTS"].unique()):
    segment = data[data["CREDIT_CARD_SEGMENTS"] == i]
    PLOT.add_trace(go.Scatter3d(x = segment['BALANCE'], y = segment['PURCHASES'], z = segment['CREDIT_LIMIT'],
                                mode = 'markers', marker_size = 6, marker_line_width = 1, name = str(i)))
PLOT.update_traces(hovertemplate='BALANCE: %{x} <br>PURCHASES: %{y} <br>CREDIT_LIMIT: %{z}')
PLOT.update_layout(width = 800, height = 800, autosize = True, showlegend = True,
                   scene = dict(xaxis=dict(title = dict(text = 'BALANCE', font = dict(color = 'black'))),
                                yaxis=dict(title = dict(text = 'PURCHASES', font = dict(color = 'black'))),
                                zaxis=dict(title = dict(text = 'CREDIT_LIMIT', font = dict(color = 'black')))),
                   font = dict(family = "Gilroy", color = 'black', size = 12))
PLOT.show()
So this is how you can perform credit card segmentation with Machine Learning using Python.
Credit card cluster analysis groups credit card holders based on their buying habits, credit limits, and other financial factors, which helps businesses identify potential customers and shape their marketing strategies. I hope you liked this article on credit card segmentation with Machine Learning using Python. Feel free to ask questions in the comments section below.