Case study

AWS Redshift based OLTP to OLAP synchronization engine

Building and deploying a Java-based engine to synchronize data from a relational OLTP database into a scalable AWS Redshift OLAP warehouse for more efficient analytical queries processing

Aws Redshift Based Oltp To Olap Synchronization Engine

Client

Acrolinx GmbH
Germany

Project Duration

2 months
1 person

Client Challenge

Acrolinx came to life at the German Research Center for Artificial Intelligence, and has been developed by the world's leading AI experts in Natural Language Processing. The company provides an AI-powered platform for enterprise content governance. Their customers use the solution to deliver consistent strategy-aligned content at scale, across multiple teams, projects, and locations. Following a three-stage process, the customers -

  1. define specific content guidelines to fit their goals by leveraging existing style and terminology guidelines and by running existing content through the Acrolinx engine and then adjusting the initially generated guidelines;

  2. create content aligned with the guidelines; Acrolinx uses AI technology that enables content contributors to shorten their editorial process with automation - content contributors can create or optimize their existing content with a Sidebar offering real-time feedback and suggestions along which to fix issues and improve the text;

  3. track and analyze the quality and performance of their content for optimization.

The platform enables customers to create engaging, readable, consistent, and compliant content in an efficient and cost-effective way at scale. You can choose to watch this explainer video for the Acrolinx solution overview or this demo video for some more detailed product introduction. Some brands using the Acrolinx product are: Google, Amazon, Facebook, Microsoft, Sony, Volvo, Scania, Boeing, Siemens. Acrolinx has an OLTP relational database used by their transactional system. The relational database schema wasn’t designed for the purpose of running complex analytical queries against it. Acrolinx needed to mirror their existing OLTP into a new OLAP warehouse store so they could run analytical queries against their data much more efficiently.

Service Process

Service Process

The solution built leverages AWS cloud services, such as AWS S3 for temporary data ingestion storage and the AWS IAM authentication service to securely manage access to AWS services and resources.

We designed and implemented a Java-based application that periodically synchronizes data from the relational PostgreSQL OLTP database into an AWS Redshift OLAP warehouse. The solution built leverages AWS cloud services, such as AWS S3 for temporary data ingestion storage and the AWS IAM authentication service to securely manage access to AWS services and resources. It’s the latter service that efficiently synchronizes data originating in the Acrolinx transactional system into a highly scalable AWS Redshift warehouse. The AWS Redshift Query Editor was in turn used to support the process of running interactive queries against the migrated data in the OLAP warehouse.

Project Results

With the new architectural set-up the analytical queries are run against Redshift instead of PostgreSQL, which reduces load on the transactional database. With the adopted approach, there was no need to refactor the existing transactional database schema for analytical queries purposes. The implemented process is much more efficient.

Deliverables

  • Java / Spring Boot application synchronizing data from PostgreSQL OLTP into AWS Redshift OLAP
  • AWS S3 for storage and AWS IAM for secure access management
  • AWS Redshift Query Editor for interactive queries processing in a scalable AWS Redshift OLAP warehouse
  • Docker environment for application deployment

Benefits

  • greater efficiency in analytical queries processing
  • reduced load on the transactional PostgreSQL database
  • no need to refactor existing transactional database schema for analytical queries purposes
  • clean architectural separation between transactional and analytical databases