Data Warehousing and Customer Segmentation Modeler


A large financial institution needed help designing, developing, and supporting the Extraction, Transformation and Loading (ETL) process for a large data warehouse, along with a segmentation modeler for BI analysis, as part of its marketing analysis and research.

The main goal of the system is to aggregate data from various sources, then filter, validate, and load petabytes of data into a single common source so that it can be analyzed efficiently.
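The filter, validate, and load steps can be illustrated with a minimal sketch. It is written in plain Python for brevity and is not the client's implementation: the `CustomerRecord` type, the `customer_id` and `balance` fields, and the `parse_row` and `consolidate` helpers are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical unified record shape for the common source."""
    customer_id: str
    source: str
    balance: float


def parse_row(source: str, row: dict) -> Optional[CustomerRecord]:
    """Return a validated record, or None if the row fails validation."""
    customer_id = str(row.get("customer_id", "")).strip()
    if not customer_id:
        return None  # reject rows without a usable identifier
    try:
        balance = float(row["balance"])
    except (KeyError, TypeError, ValueError):
        return None  # reject rows with a missing or non-numeric balance
    return CustomerRecord(customer_id, source, balance)


def consolidate(sources: Dict[str, List[dict]]) -> Tuple[List[CustomerRecord], int]:
    """Merge rows from every source, keeping valid records and counting rejects."""
    loaded: List[CustomerRecord] = []
    rejected = 0
    for source, rows in sources.items():
        for row in rows:
            record = parse_row(source, row)
            if record is None:
                rejected += 1
            else:
                loaded.append(record)
    return loaded, rejected
```

Counting rejected rows rather than silently dropping them gives the load step a figure that can be monitored from one run to the next.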

Once the data is in the common source, various algorithms can be run on the structured data for customer segmentation and analysis.

Business Need

  • Consolidate the ETL process.
  • Optimize the extraction and data loading process.
  • Build the custom segmentation modeler and reporting tools.

The Challenges

The biggest challenge was to provide a scalable architecture for consolidating huge amounts of data, together with a fast analytics tool for working with that data.


Our Solution

We provided consulting services for the client's data warehousing.

We reviewed the data source systems and the data structure needs. Based on the client's requirements, we developed an ETL product using Hadoop MapReduce for loading the data, and set up and configured a Hadoop cluster for development and production. We also designed a custom segmentation modeler using Java Swing as the front end and Apache Hive as the back end.
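The MapReduce pattern behind this kind of ETL job can be sketched in a few lines. The example below is a single-process Python simulation of the map, shuffle, and reduce phases, not Hadoop code and not the product we built; the `customer_id` and `amount` fields and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: dict) -> Iterator[Tuple[str, float]]:
    """Emit one (customer_id, amount) pair per input record."""
    yield record["customer_id"], float(record["amount"])


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[float]) -> Tuple[str, float]:
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(records: Iterable[dict]) -> Dict[str, float]:
    """Run map, shuffle, and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

For example, `run_job` applied to two records for customer `c1` and one for `c2` returns one total per customer. On a real cluster the same three phases run in parallel across many nodes, which is what lets the approach scale with data volume.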

We also implemented a notification and alerting system for the ETL process and the reporting tool, which enabled better execution of Hive jobs.

Implementation Process

We followed an iterative, phased approach to implementing the solution, which included the following phases:

  • Business requirements analysis.
  • Architecture design.
  • ETL development (Extract, Transform and Load).
  • Segmentation modeler design.


Compared with the legacy system, the proposed system improved overall system performance as well as revenue generation by 18% monthly.
