CASE STUDY

Big data processing platform

Explore our client’s journey from a startup to a big data powerhouse with $150M+ investments raised.

Tech stack/tools we used: Java 13, Spring Boot, Docker, GCP, CassandraDB, Gradle, JUnit, Mockito, JMeter, Hibernate, Uber Cadence, Apache Airflow, Python

Client

Big data processing startup

A group of entrepreneurs came up with an idea for a platform that would help businesses make better data-driven decisions. While most businesses hire data analysts for semi-manual data processing, our client wanted to build a platform that automatically retrieved data, transformed it, and delivered it to the customer in the desired format. Ultimately, the goal was to enable users to make informed decisions and drive their businesses' success.

Problem

Small startup struggles with delivering large datasets to clients

Being a small startup in the world of big data posed a significant challenge. Our client faced a dilemma: allocate their resources to immediate manual dataset processing, or pursue their goal of automating data processing.

Solution

Data processing platform that can take on clients of any size

To help the startup generate revenue and attract further investment even before data processing was automated, Syberry deployed a team of data analysts whose task was to transform raw data into categorized, user-friendly formats before each deadline. The team used Airflow and Python to turn jumbled numbers into a clear dataset in under eight business hours.

The scripts the team wrote over the years proved useful when we automated data processing. We utilized the team's experience to build a platform that automatically retrieves data from the source, validates the data, studies patterns within the dataset, converts it into the format requested by the client, and delivers the data. The platform requires manual intervention only if it encounters a non-standard situation such as an unusual data entry.
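The flow described above (validate, convert, deliver, and escalate only unusual entries) can be sketched roughly as follows. This is a minimal illustration, not the client's implementation: the record fields `id` and `amount`, the CSV output format, and the names `validate`, `convert`, `run_pipeline`, and `NeedsManualReview` are all assumptions made for the example, and the pattern-analysis step is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


class NeedsManualReview(Exception):
    """Raised when a record does not fit the rules the pipeline knows."""


@dataclass
class PipelineResult:
    delivered: list = field(default_factory=list)  # lines sent to the client
    flagged: list = field(default_factory=list)    # records awaiting a human


def validate(record: dict) -> dict:
    # Hypothetical rule: a record needs an "id" and a numeric "amount".
    amount = record.get("amount")
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    if "id" not in record or not is_number:
        raise NeedsManualReview(f"unusual entry: {record!r}")
    return record


def convert(record: dict) -> str:
    # Hypothetical client format: one "id,amount" CSV line per record.
    return f"{record['id']},{record['amount']:.2f}"


def run_pipeline(source: Iterable[dict],
                 deliver: Callable[[str], None]) -> PipelineResult:
    """Process every record; set aside the ones that need manual review."""
    result = PipelineResult()
    for record in source:
        try:
            line = convert(validate(record))
        except NeedsManualReview:
            # Non-standard situation: keep going, but flag it for a person.
            result.flagged.append(record)
            continue
        deliver(line)
        result.delivered.append(line)
    return result
```

In this sketch a single unusual entry does not stop the run: standard records are delivered, and only the flagged ones wait for an analyst.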

After we released the MVP of the platform, the startup was able to attract more clients, including the biggest players on the market. In the process, they transformed from a young startup to a maturing enterprise.

Result

Growing enterprise that redefined businesses’ use of big data

Challenge #1

Swift data processing despite large dataset sizes

To process datasets as large as 1 GB .TXT files efficiently, our team devised workarounds that sped up processing and reduced storage requirements.
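The case study does not say which workarounds were used. One common approach for large text files, shown here purely as an assumption, is to stream the file in fixed-size batches instead of loading it whole, and to write the output gzip-compressed to save storage. The function names and the batch size are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterator, List


def read_batches(path: Path, batch_size: int = 10_000) -> Iterator[List[str]]:
    """Yield lists of lines so only one batch is held in memory at a time."""
    batch: List[str] = []
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            batch.append(line.rstrip("\n"))
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def compress_in_batches(src: Path, dst: Path, batch_size: int = 10_000) -> int:
    """Copy src to a gzip-compressed dst batch by batch; return the line count."""
    total = 0
    with gzip.open(dst, "wt", encoding="utf-8") as out:
        for batch in read_batches(src, batch_size):
            out.write("\n".join(batch) + "\n")
            total += len(batch)
    return total
```

A real transformation step would sit between reading and writing each batch; memory use then depends on the batch size rather than on the size of the file.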

Challenge #2

Transition to Cadence from Airflow

Initially, we chose Airflow as the platform's workflow orchestration system. After Cadence appeared on the market, we realized that it would serve the system better as it scaled up. Because Airflow workflows are written in Python and our Cadence workflows would be written in Java, we undertook the task of rewriting the entire system to run on Cadence.

Key features

  • Adaptive extraction

    We designed and implemented an extraction platform that can handle non-standard situations when collecting data. We engineered the platform so that it collects data at specific intervals and accounts for edge cases such as delayed data arrival. This ensures comprehensive data capture, leaving no valuable insights behind.

  • API for cross-team communication

    To ensure that all teams' work would eventually integrate into a cohesive system, we created a centralized metadata platform with an application programming interface (API). This API facilitated efficient communication and data exchange between the various sub-teams and reduced potential integration errors.

  • Workflow gatekeeper to boost performance

    Because the platform was not able to process all incoming datasets at once, we developed a workflow gatekeeper. This part of the system places workflows in a prioritized queue, which allowed the client to optimize resource allocation, resulting in faster data processing and fewer bottlenecks.

  • Cohesive UI experience

    With several sub-teams working on the project at once, it was crucial to unify the user experience and implement a cohesive design. This resulted in a smoother UI/UX and a minimal learning curve for users.
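The workflow gatekeeper described in the list above can be pictured as a priority queue combined with a cap on how many workflows run at once. The sketch below is an illustration only: the class name `WorkflowGatekeeper`, the `max_running` limit, and the convention that a lower number means higher priority are assumptions, not details taken from the project.

```python
import heapq
import itertools
from typing import Optional


class WorkflowGatekeeper:
    """Admit queued workflows by priority while capping concurrent runs."""

    def __init__(self, max_running: int) -> None:
        self._max_running = max_running
        self._running: set = set()
        self._queue: list = []
        # Tie-breaker so workflows with equal priority start in arrival order.
        self._counter = itertools.count()

    def submit(self, workflow_id: str, priority: int) -> None:
        # Lower number = more urgent (an assumption made for this sketch).
        heapq.heappush(self._queue, (priority, next(self._counter), workflow_id))

    def start_next(self) -> Optional[str]:
        """Start the most urgent queued workflow, or return None if at capacity."""
        if len(self._running) >= self._max_running or not self._queue:
            return None
        _, _, workflow_id = heapq.heappop(self._queue)
        self._running.add(workflow_id)
        return workflow_id

    def finish(self, workflow_id: str) -> None:
        # Free a slot so the next queued workflow can be admitted.
        self._running.discard(workflow_id)
```

With `max_running=1`, an urgent workflow submitted after a routine one still starts first, and the routine one is admitted only after `finish` frees the slot.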
