Case study

Big Data ETL for Real Estate Investment Decisions

Apache Spark's big data processing sped up investment decision-making and provided scalable infrastructure for growing data volumes and machine learning solutions



TMC America Group
United States

Project Duration

4 months
3 people

Client Challenge

The client, TMC America Group, is a private equity real estate firm that finances a spectrum of property-related projects. The firm needs to collect, process and analyse large amounts of data gathered from the real estate market. The existing solution took days rather than hours to complete its big data processing operations, and the client was looking for a way to optimize and speed up the processes involved. The main challenge stemmed from the complex geolocation nature of the data being processed.

The solution was meant not only to meet the client’s current needs but also to prepare the system for much larger data volumes as operations scale in the future. With the new solution in place, the client’s market analysts would be able to access the processed data much more efficiently and thus make more timely investment decisions.

Service Process

We implemented an Apache Spark-based ETL engine for analysing real estate market data at scale. The solution pre-processes the client’s large body of statistical data, which lets data processing operations run several times faster than in the original system the client wanted to optimize.
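The pre-processing idea can be illustrated with a minimal sketch. It is written in plain Python rather than Spark so that it runs anywhere, and it only shows the general extract, transform, load shape: invalid rows are dropped, a derived metric is computed, and the result is pre-aggregated so that later reads touch a small summary instead of the raw data. The record layout, the sample values and the function names are all hypothetical and are not taken from the project.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw records: (region, price_usd, area_sqft). Not client data.
RAW_LISTINGS = [
    ("austin", 450_000, 1_800),
    ("austin", 380_000, 1_600),
    ("denver", 520_000, 2_000),
    ("denver", None, 1_500),   # missing price: dropped during transform
    ("austin", 410_000, 0),    # invalid area: dropped during transform
]


def extract(rows):
    """Yield raw rows; a real job would read from the client's data source."""
    yield from rows


def transform(rows):
    """Drop invalid rows and derive price per square foot."""
    for region, price, area in rows:
        if price is None or not area or area <= 0:
            continue
        yield region, price / area


def load(rows):
    """Pre-aggregate per region so readers get a small summary table."""
    by_region = defaultdict(list)
    for region, price_per_sqft in rows:
        by_region[region].append(price_per_sqft)
    return {
        region: {
            "listings": len(values),
            "median_price_per_sqft": round(median(values), 2),
        }
        for region, values in by_region.items()
    }


summary = load(transform(extract(RAW_LISTINGS)))
print(summary)
```

In a Spark job the same three stages would typically map onto reading a DataFrame, filtering and deriving columns, and writing a grouped aggregate, with the work distributed across a cluster.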

The system includes a sophisticated GIS module that processes large geospatial data sets efficiently and at scale, which addressed the geolocation challenge. Moreover, the big data infrastructure we implemented lets the client scale the processes to much larger data volumes in the future. Finally, the back-end infrastructure was designed to prepare the system for machine learning solutions in upcoming project milestones.
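The case study does not describe how the GIS module works internally. As one common way to make geospatial processing scale, the sketch below assigns each coordinate to a uniform grid cell; the cell key can then serve as a partitioning or join key so that nearby points are processed together. The cell size, the sample points and the function names are assumptions made for illustration only.

```python
import math
from collections import defaultdict

# Hypothetical cell size in degrees (about 1.1 km north-south).
CELL_DEG = 0.01

# Hypothetical sample points: (id, latitude, longitude).
POINTS = [
    ("p1", 30.2672, -97.7431),
    ("p2", 30.2679, -97.7438),
    ("p3", 30.2950, -97.7405),
]


def cell_key(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to the integer grid cell that contains it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def bucket_points(points, cell_deg=CELL_DEG):
    """Group point ids by grid cell, preserving input order within a cell."""
    buckets = defaultdict(list)
    for point_id, lat, lon in points:
        buckets[cell_key(lat, lon, cell_deg)].append(point_id)
    return dict(buckets)


buckets = bucket_points(POINTS)
for key, ids in sorted(buckets.items()):
    print(key, ids)
```

A uniform grid is simple but treats dense and sparse areas alike; production systems often use hierarchical schemes such as geohashes or quadtrees instead.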

Technically, the data is retrieved from the client’s data source and the results are written to a data store. The solution we designed and implemented did not require any changes to the client’s existing system; the delivered system operates as an independent application integrated with the client’s existing system via an API.
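The separation described here can be sketched as follows, assuming two in-memory SQLite databases as stand-ins for the client's data source and the results store. The job only reads from the source and only writes to the results store, and a small lookup function plays the role of what an API endpoint might return. The schemas, table names and function names are hypothetical; the actual data stores and API are not described in this case study.

```python
import sqlite3

# Stand-in for the client's existing data source (hypothetical schema).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE listings (region TEXT, price REAL)")
source.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [("austin", 400000.0), ("austin", 420000.0), ("denver", 500000.0)],
)
source.commit()

# Separate results store owned by the new application (hypothetical schema).
results = sqlite3.connect(":memory:")
results.execute(
    "CREATE TABLE region_stats (region TEXT PRIMARY KEY, avg_price REAL)"
)


def run_job(source_conn, results_conn):
    """Read from the source and write aggregates to the results store only."""
    rows = source_conn.execute(
        "SELECT region, AVG(price) FROM listings GROUP BY region"
    ).fetchall()
    results_conn.executemany(
        "INSERT OR REPLACE INTO region_stats VALUES (?, ?)", rows
    )
    results_conn.commit()


def get_region_stats(results_conn, region):
    """Return what a read-only API endpoint might serve, or None."""
    row = results_conn.execute(
        "SELECT region, avg_price FROM region_stats WHERE region = ?",
        (region,),
    ).fetchone()
    return None if row is None else {"region": row[0], "avg_price": row[1]}


run_job(source, results)
print(get_region_stats(results, "austin"))
```

Because the job never writes to the source, the existing system keeps working unchanged and consumes the processed results only through the lookup layer.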

Project Results

The time needed by the client’s market analysts to access processed data and make investment decisions has been significantly reduced. The processes have been streamlined and the staff are now able to work much more efficiently.


  • back-end infrastructure for processing data at scale
  • API integration with the existing system
  • architecture for machine learning solutions


  • significantly reduced data processing time
  • more efficient investment decision-making
  • readiness to scale to larger data volumes and to implement machine learning solutions