Fraud Detection Platform
SUMMARY
- Client
- Česká spořitelna
- Tech stack
- SparkSQL, Apache Spark, Hadoop, Python
Fraud erodes margins and trust—speed of detection decides the loss. For Česká spořitelna (Erste Group), we built a high-speed analytics platform that computes anti-fraud predictors across up to 1.5 billion transactions per day within a tight processing window. Fully integrated with bank systems, it lets in-house teams design and refine predictors, scales with demand, and delivers timely signals to cut risk and protect customers.
Project Background
Like all banks, Česká spořitelna, the Czech arm of Erste Group, must monitor transactions to determine whether they are normal or suspicious. This process uses statistical features – predictors – computed over transactional data to automatically flag suspicious transactions.
The bank's existing solution, based on a traditional relational database, was not fit for purpose: it could not process the transaction history within the limited computational window available each night.
Quick fraud detection is essential for minimising losses, so the team at Česká spořitelna was keen to implement a better solution. They knew that an Apache Spark implementation on a Hadoop cluster was one potential approach, but they lacked the in-house expertise to execute it.
Business Needs
The solution needed to meet the following specifications:
- High-speed computation of predictors within a limited time window
- The ability for in-house departments to design and adjust the computed predictors
- Easy integration with the surrounding banking systems
- Scalability for future extensions and customisation
Solution & Results
Profinit built a custom big data computational platform on a Hadoop, Apache Spark and Python stack. We worked with the bank's in-house data lake department to design the data architecture, including a new dedicated data mart for the computed predictors. This custom-built solution is the first of its kind within the client's infrastructure; it is scalable and integrates with all required systems.
Incorporating big data technologies
This unique architecture was built to perfectly match the requirements of the in-house analytics department. The core of the application is based on big data technologies, but all computations are defined using SparkSQL. This allows the in-house credit risk, fraud detection and business intelligence departments to fully understand the computational processes, and it allows them to design, implement and adjust any new predictors.
HEAR FROM THE CLIENT
The whole platform is integrated with all required systems. It is a unique solution – the first Apache Spark implementation for production data computation in our Data Lake distributed environment.