CASE STUDY
Big data processing platform
Explore our client’s journey from a startup to a big data powerhouse with $150M+ investments raised.
Tech stack/tools we used: Java 13, Spring Boot, Docker, GCP, CassandraDB, Gradle, JUnit, Mockito, JMeter, Hibernate, Uber Cadence, Apache Airflow, Python
Client
Big data processing startup
A group of entrepreneurs came up with an idea for a platform that would help businesses make better data-driven decisions. While most businesses hire data analysts for semi-manual data processing, our client wanted to build a platform that automatically retrieved data, transformed it, and delivered it to the end user in the desired format. Ultimately, the goal was to enable users to make informed decisions and drive their business’s success.
Problem
Small startup struggles with delivering large datasets to clients
Being a small startup in the world of big data posed a significant challenge. Our client faced a dilemma: spend its limited resources on processing datasets manually to meet immediate demand, or invest them in its goal of automating data processing.
Solution
Data processing platform that can take on clients of any size
To help the startup generate revenue and further investments even before data processing automation, Syberry deployed a team of data analysts whose task was to transform raw data into categorized, user-friendly formats before the deadline. The team used Airflow and Python to turn jumbled numbers into a clear dataset in under eight business hours.
The scripts the team wrote over the years proved useful when we automated data processing. We utilized the team's experience to build a platform that automatically retrieves data from the source, validates the data, studies patterns within the dataset, converts it into the format requested by the client, and delivers the data. The platform requires manual intervention only if it encounters a non-standard situation such as an unusual data entry.
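The flow described above can be illustrated with a minimal sketch. It is not the platform's actual code: the record layout, the validation rule, the CSV output format, and all function names are assumptions made for the example, and the pattern-analysis step is left out for brevity. It shows the general idea of processing standard records automatically and setting aside unusual entries for a person to review.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


class NeedsManualReview(Exception):
    """Raised when a record is non-standard and a person should look at it."""


@dataclass
class PipelineResult:
    delivered: list = field(default_factory=list)  # lines sent to the client
    flagged: list = field(default_factory=list)    # records set aside for review


def validate(record: dict) -> dict:
    # Hypothetical rule: a record needs an "id" and a numeric "value".
    if "id" not in record or not isinstance(record.get("value"), (int, float)):
        raise NeedsManualReview(f"unusual entry: {record!r}")
    return record


def convert(record: dict) -> str:
    # Hypothetical target format: one CSV line per record.
    return f"{record['id']},{record['value']}"


def run_pipeline(source: Iterable[dict], deliver: Callable[[str], None]) -> PipelineResult:
    """Validate, convert, and deliver each record; flag the ones that fail validation."""
    result = PipelineResult()
    for record in source:
        try:
            line = convert(validate(record))
        except NeedsManualReview:
            result.flagged.append(record)  # manual intervention happens only here
            continue
        deliver(line)
        result.delivered.append(line)
    return result
```

In this sketch, `source` could be any iterable of records and `deliver` any callable that accepts a formatted line, for example a function that writes to a file or uploads to storage.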
After we released the MVP of the platform, the startup was able to attract more clients, including the biggest players on the market. In the process, they transformed from a young startup to a maturing enterprise.
Result
Growing enterprise that redefined businesses’ use of big data
- $150M+ investments raised
- $100-500M post-money valuation
Challenge #1
Swift data processing despite dataset sizes
To process datasets efficiently even when they arrived as .TXT files as large as 1 GB, our team devised workarounds that sped up processing and reduced storage needs.
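The case study does not say which workarounds the team used. One common technique for large text files, shown here purely as an illustration, is to stream the file line by line instead of loading it into memory. The tab delimiter, the function names, and the aggregation by first column are assumptions made for the example.

```python
from collections import Counter
from typing import Iterator, List, TextIO


def iter_fields(stream: TextIO, delimiter: str = "\t") -> Iterator[List[str]]:
    """Yield the fields of each non-empty line, one line at a time."""
    for line in stream:
        line = line.rstrip("\r\n")
        if line:
            yield line.split(delimiter)


def count_by_first_column(stream: TextIO) -> Counter:
    """Count rows per value of the first column, reading one line at a time."""
    counts: Counter = Counter()
    for fields in iter_fields(stream):
        counts[fields[0]] += 1
    return counts
```

A call such as `count_by_first_column(open("dataset.txt", encoding="utf-8"))` (the file name is hypothetical) would read the file lazily, so memory use depends on the number of distinct keys, not on the file size.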
Challenge #2
Transition to Cadence from Airflow
Initially, we chose Airflow as the workflow orchestration system for the platform. After Cadence appeared on the market, we realized that it would serve the system better as it scaled up. Because Airflow workflows are defined in Python and our Cadence workflows would be written in Java, we undertook the task of rewriting the entire system to make it compatible with Cadence.
Key features
Adaptive extraction
We designed and implemented an extraction platform that can handle non-standard situations when collecting data. The platform collects data at specific intervals and accounts for edge cases such as delayed data arrival. This ensures comprehensive data capture, leaving no valuable insights behind.
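As a rough illustration of handling delayed data arrival, the sketch below polls a source again after a wait instead of giving up when data is not yet available. The function name, the parameters, and the convention that `fetch` returns `None` when nothing has arrived are all assumptions for the example, not details of the actual platform.

```python
import time
from typing import Callable, Optional


def collect_with_retries(
    fetch: Callable[[], Optional[list]],
    max_attempts: int = 5,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[list]:
    """Poll `fetch` until it returns data or the attempts run out.

    `fetch` is a hypothetical callable that returns None while the
    data has not arrived yet and a list of records once it has.
    """
    for attempt in range(1, max_attempts + 1):
        data = fetch()
        if data is not None:
            return data
        if attempt < max_attempts:
            sleep(wait_seconds)  # wait before the next scheduled check
    return None  # still missing: a case for manual follow-up
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can record the requested waits instead of actually pausing.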
API for cross-team communication
To ensure that all teams’ work would integrate into a cohesive system, we created a centralized metadata platform. Its application programming interface (API) facilitated efficient communication and data exchange between the various sub-teams, reducing potential integration errors.
Workflow gatekeeper to boost performance
Because the platform was not able to process all incoming datasets at once, we developed a workflow gatekeeper. This part of the system created a prioritized queue for workflows. This allowed the client to optimize resource allocation, resulting in faster data processing and reduced bottlenecks.
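A prioritized queue with a cap on concurrent work can be sketched as follows. This is a simplified illustration and not the gatekeeper built for the client: the class name, the `max_running` limit, and the convention that a lower number means higher priority are assumptions.

```python
import heapq
import itertools
from typing import Optional


class WorkflowGatekeeper:
    """Admit queued workflows in priority order, up to a concurrency limit."""

    def __init__(self, max_running: int):
        self.max_running = max_running
        self.running = set()
        self._queue = []                   # heap of (priority, order, workflow_id)
        self._counter = itertools.count()  # tie-breaker keeps FIFO order per priority

    def submit(self, workflow_id: str, priority: int) -> None:
        # Lower numbers are served first (an assumption of this sketch).
        heapq.heappush(self._queue, (priority, next(self._counter), workflow_id))

    def start_next(self) -> Optional[str]:
        """Start the highest-priority workflow if capacity allows."""
        if len(self.running) >= self.max_running or not self._queue:
            return None
        _, _, workflow_id = heapq.heappop(self._queue)
        self.running.add(workflow_id)
        return workflow_id

    def finish(self, workflow_id: str) -> None:
        """Free a slot once a workflow has completed."""
        self.running.discard(workflow_id)
```

With `max_running=1`, an urgent workflow submitted after a low-priority one would still start first, and the low-priority one would start only after `finish` frees the slot.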
Cohesive UI experience
With several sub-teams working on the project at once, it was crucial to unify the user experience and implement a cohesive design. This resulted in a smoother UI/UX and a shorter learning curve for users.