Big Data Technology Stack

SUMMARY

Client
Raiffeisenbank
Tech stack
OpenAI API, Profinit DATA_FRAME Automation tool, Microsoft Active Directory, Cloudera, Apache Spark, Hadoop, MS Azure

Our long-term client Raiffeisenbank CZ needed a complex delivery of a Hadoop platform to run analytical business use cases across large amounts of transactional data. We implemented an end-to-end Hadoop solution that enables the bank to process billions of records daily.


Results

Billions
of bank records processed daily
95% code
autogenerated using Profinit DATA_FRAME tool
Security compliant
robust platform

Our long-term client Raiffeisenbank CZ was looking for the complex delivery of a big data Hadoop platform that would enable the bank to run analytical business use cases across large amounts of transactional data. Computations of such massive volume cannot be achieved with standard DWH capacity, so a brand-new parallel-processing big data platform had to be built from scratch. Tools for solving business cases with data science, and their implementation in the client's environment, were also needed as part of the solution.
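The core idea behind such a platform is to split the data into partitions and process them in parallel rather than sequentially. The following minimal Python sketch illustrates the principle with a worker pool; the column names and data are illustrative, not the bank's actual schema, and a real Hadoop/Spark cluster distributes the work across many machines rather than local processes.

```python
from multiprocessing import Pool

def sum_partition(partition):
    """Aggregate one partition of (account_id, amount) records."""
    return sum(amount for _account_id, amount in partition)

def parallel_total(partitions, workers=4):
    """Fan the partitions out to a worker pool and combine the partial sums."""
    with Pool(workers) as pool:
        return sum(pool.map(sum_partition, partitions))

if __name__ == "__main__":
    # Three toy partitions of transaction records
    partitions = [
        [("acc-1", 100.0), ("acc-2", 250.0)],
        [("acc-3", 75.0)],
        [("acc-4", 500.0), ("acc-5", 25.0)],
    ]
    print(parallel_total(partitions))  # 950.0
```

The same map-then-combine pattern is what Spark applies at cluster scale, which is what makes volumes beyond DWH capacity tractable.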

Profinit came up with analytical use cases and began designing and implementing a complete, end-to-end analytical solution. Together with the bank's in-house data specialists, the Profinit team selected suitable hardware, sizing, and the right Hadoop distribution. The Profinit DATA_FRAME Automation tool was used for the fast design of the architecture and its implementation.

The major challenge was to create a blueprint solution, as this was the client's very first implementation of a big data platform. All architectural and security compliance requirements had to be solved, and reliable data anonymisation was essential. The client also requested single sign-on authentication and the related integration with Active Directory. The whole platform needed to work in isolation from the internet, which required offline repositories for the OS, Hadoop, and data science tool packages.
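One common anonymisation approach in this setting is keyed pseudonymisation: sensitive identifiers are replaced by a deterministic, irreversible token so that analytics can still join and group on them. The sketch below shows the general technique with Python's standard `hmac` module; the field names, key handling, and scheme are illustrative assumptions, not the bank's actual implementation.

```python
import hashlib
import hmac

# Placeholder secret; in practice this would live in a vault outside the
# analytical environment, so tokens cannot be recomputed from the data alone.
SECRET_KEY = b"rotate-me-and-store-in-a-vault"

def pseudonymise(value: str) -> str:
    """Return a deterministic, irreversible 64-char hex token for a value."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def anonymise_record(record: dict,
                     sensitive_fields=("account_no", "client_name")) -> dict:
    """Replace sensitive fields with pseudonymous tokens; keep the rest."""
    return {
        field: pseudonymise(str(val)) if field in sensitive_fields else val
        for field, val in record.items()
    }

record = {"account_no": "123-456/5500", "client_name": "Jan Novak", "amount": 1200.5}
clean = anonymise_record(record)
# `amount` is untouched; identifiers become opaque hex tokens
```

Because the mapping is deterministic, the same account always yields the same token, which preserves joins and aggregations across datasets without exposing the original identifier.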

The solution needed to meet the following specifications:

  • Select and build a highly efficient big data processing platform
  • Set up tools for solving business cases with data science
  • Meet strict requirements on system security and data anonymisation
  • Implement a single sign-on feature and integration with IBM Cognos and MS Active Directory

From the very beginning, we approached the task with the intention of delivering an end-to-end solution. After defining the business data analytics use cases, we focused on choosing and designing the most suitable platform to achieve the business goals. In the initial analytical phase, we collected detailed requirements and specifications, and based on this analysis we chose the most suitable Hadoop distribution together with optimal sizing and hardware configuration.

Optimal spec and full security compliance

We optimised CPU, memory, and storage sizing to achieve balanced performance and cost-effectiveness. The architecture fulfilled all requirements, including security compliance, single sign-on access, and the required integrations. After installation, we implemented two data models for the defined business use cases. Thanks to our DATA_FRAME Automation tool, approximately 95% of the code was generated automatically, enabling the bank to process billions of records daily.
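The DATA_FRAME tool itself is proprietary and its metadata model is not shown here, but metadata-driven code generation in general works as sketched below: developers maintain declarative mapping metadata, and the repetitive load code is rendered from templates. All names, the template, and the metadata shape are hypothetical illustrations of the technique.

```python
from string import Template

# Illustrative SQL template; a real tool would support many statement
# types, dialects, and validation of the metadata.
SQL_TEMPLATE = Template(
    "INSERT INTO $target\n"
    "SELECT $columns\n"
    "FROM $source;"
)

def generate_load_sql(mapping: dict) -> str:
    """Render one load statement from a mapping-metadata entry."""
    columns = ",\n       ".join(
        f"{expr} AS {col}" for col, expr in mapping["columns"].items()
    )
    return SQL_TEMPLATE.substitute(
        target=mapping["target"],
        source=mapping["source"],
        columns=columns,
    )

mapping = {
    "source": "stg.transactions",
    "target": "dm.daily_turnover",
    "columns": {"account_id": "account_id", "turnover_czk": "amount_czk * fx_rate"},
}
print(generate_load_sql(mapping))
```

Because each new table is just another metadata entry rather than hand-written code, this style of generation is what makes figures like "95% autogenerated" plausible on large platforms.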