How to detect 70% of churning bank customers with big data analytics and ML
A bank spends $100 or more on average to attract one customer, according to Unicom. Retaining existing customers is much cheaper, so companies strive not only to attract a new audience but also to keep the one they have. The main question is how to identify customers who are likely to leave.
The answer lies in customer data, which accumulates in large volumes across every enterprise's processes. Eastwind analyzed the data of a Russian bank using its own ML models and identified churning customers whose decisions could still be influenced by the right NBO (Next Best Offer).
Key Results
Why predict churn
Predicting customer churn helps to:
● Reduce the cost of attracting new customers
● Increase the loyalty of the existing base
● Reduce financial risks since income from long-term customers is more predictable
● Increase customer LTV (Life Time Value)
A customer decides to switch to another company gradually: competitors' offers surround them constantly and push them toward this step. Big data analytics lets you identify the moment when doubts first arise. It captures the behavioral indicators that show a customer intends to leave but has not yet made a final decision.
Project format
In a pilot project with a bank whose customer base is estimated at hundreds of thousands, our client sought to understand how effectively customer churn could be predicted and whether this could be achieved with a limited data set. Eastwind specialists created a predictive model that forecasted churn and compared the analysis results with actual data.
In this project, we used EW DataFlow, a platform that works with big data and AI/ML models. We carried out the project in four stages:
- Setting the task: what kind of customers we are looking for
- Collecting information: what data we are interested in
- Data analysis: what signs we use to determine potential churn
- Summing up: how accurate the prediction is
Stage one: setting the task
The client identified three types of churn within their bank:
- Intentional: the customer decides to switch to a competitor
- Rotational: the customer leaves for reasons beyond their control; for example, they move
- Forced: the bank terminates the contract with the customer
Together with the bank’s specialists, we decided to consider only the first two groups as churning customers.
Then, we determined how far ahead of the churn date retention measures could still be effective. We found that a customer is usually already confident in their decision two months before the expected churn, which makes it nearly impossible to retain them. Therefore, we analyzed the customer's actions over the five months before the churn. The three earliest of those months were included in the machine learning model (highlighted in green in the diagram).
Events that led the customer to change banks occurred during the initial months. We needed to identify what these events were and how to track them in advance
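The window layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Eastwind's implementation: the helper names `shift_months` and `feature_window`, the use of whole calendar months, and the example date are all hypothetical.

```python
from datetime import date

def shift_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` months away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def feature_window(churn_month: date, lookback: int = 5, buffer: int = 2):
    """Half-open [start, end) date range whose activity feeds the model.

    lookback: months analyzed before the churn month (five in the article).
    buffer:   final months in which the customer has usually already decided (two).
    """
    start = shift_months(churn_month, -lookback)
    end = shift_months(churn_month, -buffer)
    return start, end

# Hypothetical example: a customer expected to churn in October 2023.
start, end = feature_window(date(2023, 10, 1))
print(start, end)  # 2023-05-01 2023-08-01, i.e. May, June and July
```

With the default settings the window always spans three months, which matches the three earlier months the article says were passed to the model.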
Stage two: collecting information
Once we had determined what exactly constituted churn and when it needed to be predicted, we began collecting information for analysis.
- The bank exported nine months of customer data from its information systems.
- The data was depersonalized and encrypted to prevent the transfer of personal information.
- The bank transferred data to Eastwind specialists in text files.
- We converted all files to the required format and structured them for further work in EW DataFlow.
We collected an anonymized history of customer activity:
● General information about customers
● General information about accounts
● Agreements with customers, including tariffs
● Account balances and movement of funds for each day
● Transfers within the bank and to external counterparties
● Transactions with bank cards
● Communications between the bank and customers
● Applications for account closures
Stage three: data analysis
Filtration
After collecting the data, we filtered it, retaining only what was necessary for developing the model. For example, we filtered out unreliable customers whom the bank did not plan to retain. We also excluded those with a transaction history of less than 30 days.
After filtering, several hundred bank customers remained in our database. Based on information about their activities, we began developing an algorithm to train the model.
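The two filtering rules mentioned above can be expressed as a simple predicate. The sketch below is illustrative only: the `Customer` record, its `unreliable` flag, and the choice to measure history as the span between the first and last transaction are assumptions, since the article does not describe the actual data schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Customer:
    customer_id: str
    unreliable: bool   # hypothetical flag: the bank does not plan to retain this customer
    first_txn: date    # date of the earliest transaction in the export
    last_txn: date     # date of the latest transaction in the export

MIN_HISTORY_DAYS = 30  # threshold taken from the article

def keep_for_modeling(c: Customer) -> bool:
    """Keep reliable customers with at least 30 days of transaction history."""
    history_days = (c.last_txn - c.first_txn).days
    return not c.unreliable and history_days >= MIN_HISTORY_DAYS

# Made-up records for illustration.
customers = [
    Customer("c1", False, date(2023, 1, 1), date(2023, 6, 1)),   # kept
    Customer("c2", True, date(2023, 1, 1), date(2023, 6, 1)),    # unreliable
    Customer("c3", False, date(2023, 5, 20), date(2023, 6, 1)),  # history too short
]
eligible = [c.customer_id for c in customers if keep_for_modeling(c)]
print(eligible)  # ['c1']
```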
Preparing the training sample
Eastwind analysts formed a training sample to forecast churn: a group of customers for whom we already knew whether they had churned. We visualized the activity history of each participant in the sample and compared the behavior graphs of customers who had churned with those of customers who had not.
How the transactions change
The number of transactions made by a customer planning to change banks decreased gradually. As a rule, churning customers had no debit turnover for over a month and minimal credit turnover.
Green dots are credit transactions, and orange dots are debit transactions. Here and further on, the upper graph shows the transactions of customers who did not plan to change the bank; the lower one shows the churning customers
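The turnover pattern described above can be turned into a simple rule-based signal. The following is a hedged sketch, not the feature used in the project: the function name `turnover_signal`, the 30-day approximation of "over a month", and the `credit_floor` threshold for "minimal" credit turnover are assumptions.

```python
from datetime import date, timedelta

def turnover_signal(transactions, as_of, quiet_days=30, credit_floor=100.0):
    """Return True when the recent turnover matches the churn pattern.

    transactions: iterable of (date, kind, amount), where kind is "debit" or "credit".
    quiet_days:   length of the look-back period (assumed stand-in for "a month").
    credit_floor: assumed upper bound for "minimal" credit turnover.
    """
    cutoff = as_of - timedelta(days=quiet_days)
    recent = [t for t in transactions if cutoff < t[0] <= as_of]
    debit_turnover = sum(amount for _, kind, amount in recent if kind == "debit")
    credit_turnover = sum(amount for _, kind, amount in recent if kind == "credit")
    return debit_turnover == 0 and credit_turnover <= credit_floor

# Made-up history: the last debit happened more than a month before the check date.
history = [
    (date(2023, 4, 3), "debit", 250.0),
    (date(2023, 4, 10), "credit", 900.0),
    (date(2023, 5, 28), "credit", 40.0),
]
print(turnover_signal(history, as_of=date(2023, 6, 1)))   # True
print(turnover_signal(history, as_of=date(2023, 4, 15)))  # False
```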
How activity decreases
If a customer regularly used the bank's services, their activity line on the graph stayed far from zero. The less active the customer was, the more likely they were to churn.
Activity became nearly zero for a customer who wanted to change the bank (bottom graph)
How the balance changes
Another marker of churn was that the customer hardly replenished the balance before leaving the bank.
The black line shows the balance values, and the blue line is a smoothed version of the black line. For a churning customer, account replenishments became increasingly rare and insignificant
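A smoothed balance curve like the one in the chart can be approximated with a trailing moving average. The article does not say which smoothing method was used, so the method, the function name `moving_average`, and the window size below are assumptions chosen for illustration.

```python
def moving_average(values, window=7):
    """Trailing moving average over daily balances.

    For the first few days, the average covers only the values seen so far.
    """
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed

# Made-up daily balances: replenishments become rarer and smaller.
balances = [1000, 400, 900, 300, 500, 200, 250, 100, 120, 50]
print(moving_average(balances, window=3))
```

A steadily falling smoothed line is easier to detect than the raw balance, which jumps with every individual replenishment or withdrawal.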
Activity analysis
In addition to studying financial activity, we examined behavior patterns in customer groups: who interacted with whom and how often. To do this, we studied their social graphs.
Blue dots on the graph indicate the bank's customers and red dots indicate competitors' customers. Lines show transactions between them
One indicator of potential churn is the lack of transactions with customers of the same bank. If a person has no interactions within your bank, but there is movement of funds with other counterparties, then the customer is prone to churn. On the graph, such a customer appears as a blue dot connected only to red dots.
"Our client particularly liked the idea of visualizing customer interactions. According to them, the traditional format lacks clarity, so it is sometimes difficult to notice the obvious. As part of the pilot, we wrote a small program in which they can enter the contract ID and see who it interacts with."
Pavel Oliver, Development Director, Eastwind
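The in-bank interaction indicator discussed in this section can be computed directly from a list of transfers. This is a minimal sketch under stated assumptions: the function name `isolated_customers`, the `(sender, receiver)` tuple format, and the sample identifiers are hypothetical and do not reflect the program written for the pilot.

```python
from collections import defaultdict

def isolated_customers(transfers, bank_customers):
    """Return the bank's customers who transact only with outside counterparties.

    transfers:      iterable of (sender, receiver) identifiers.
    bank_customers: set of identifiers that belong to the bank (blue dots).
    """
    internal = defaultdict(int)  # transfers with other customers of the same bank
    external = defaultdict(int)  # transfers with counterparties outside the bank
    for sender, receiver in transfers:
        for me, other in ((sender, receiver), (receiver, sender)):
            if me not in bank_customers:
                continue
            if other in bank_customers:
                internal[me] += 1
            else:
                external[me] += 1
    return {c for c in external if internal[c] == 0}

# Made-up data: A, B and C are the bank's customers; X and Y are external.
bank = {"A", "B", "C"}
transfers = [("A", "B"), ("A", "X"), ("C", "X"), ("Y", "C")]
print(isolated_customers(transfers, bank))  # {'C'}
```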
Creating an ML model
After forming the training sample, we developed a machine-learning algorithm to predict churn. The ML model analyzed manually labeled customer data and identified patterns in their behavior. The model then searched the database for all existing customers whose behavior matched these patterns. This is how the machine formed a forecast of which of the existing customers were likely to churn.
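The learn-from-labels, then score-everyone workflow can be illustrated with a tiny classifier. The article does not name the algorithm Eastwind used, so the logistic regression below is a stand-in chosen for brevity; the two features, the toy data, and all function names are assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with full-batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def churn_probability(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features per customer: [scaled activity level, share of in-bank counterparties].
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.1, 0.0], [0.2, 0.1], [0.0, 0.2]]
y = [0, 0, 0, 1, 1, 1]  # 1 = labeled as churned, 0 = stayed

model = train_logistic(X, y)
# Score customers that were not part of the labeled sample.
for features in ([0.05, 0.05], [0.85, 0.85]):
    print(features, round(churn_probability(model, features), 3))
```

In practice the scored population would be the bank's full customer base, and customers above a chosen probability threshold would form the churn forecast.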
Stage four: evaluation of the result
We evaluated the results of our predictive model after the fact: we requested data from the bank on the customers who had left by the target date and compared it with our forecast.
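Comparing a forecast with the customers who actually left usually comes down to two ratios: recall (the share of real churners the model flagged) and precision (the share of flagged customers who really churned). The sketch below uses made-up customer IDs; the function name `evaluate_forecast` is hypothetical, and the article does not state which metrics the bank reviewed.

```python
def evaluate_forecast(predicted, actual):
    """Compare the set of predicted churners with the set of actual churners."""
    predicted, actual = set(predicted), set(actual)
    hits = predicted & actual
    recall = len(hits) / len(actual) if actual else 0.0
    precision = len(hits) / len(predicted) if predicted else 0.0
    return {"recall": recall, "precision": precision}

# Illustrative IDs only, not project data.
forecast = {"c1", "c2", "c3", "c4"}
left_the_bank = {"c2", "c3", "c4", "c5"}
print(evaluate_forecast(forecast, left_the_bank))
# {'recall': 0.75, 'precision': 0.75}
```

A headline figure such as "detected 70% of churning customers" presumably corresponds to recall in this sense.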

The bank was satisfied with the prediction result and bought the ML model for further scaling. The bank then needed to develop a set of measures to retain churning customers.
"The accuracy of the model could improve further with additional data, such as information on customer activity in online banking. We would know the emotional coloring of requests, the speed of processing applications, and the results of surveys of churning customers. In that case, we could determine the fact and the hypothetic cause for the churn."
Andrey Pluschenko, Head of Data Analysis Group, Eastwind
What other tasks can EW DataFlow be used for?
Churn prediction is just one of many data analytics scenarios that a bank can run with EW DataFlow. The platform allows you to take a fresh look at marketing:
● Analyze customer activity
● Create a deep profile for each user
● Form target groups as accurately as possible
● Identify potential fraud
● Build NPTB (Next Product to Buy) models
Most importantly, all of this can be done with the company's existing data and without a large staff of specialists spread across separate tools. Data scientists carry out all data operations within one system, EW DataFlow.