How Remitano Modernized its Data Platform by Migrating to Lakehouse Architecture on Databricks
Challenges
Remitano is one of the largest peer-to-peer cryptocurrency trading platforms in Vietnam. It has services in many countries, including Australia, Malaysia, and India, where people can buy and sell Bitcoin, Ethereum, Bitcoin Cash, Litecoin, USDT, and Ripple.
As it scaled, Remitano faced several challenges with its data infrastructure:
- Underperforming data systems: The data systems running on Amazon Redshift did not meet their intended objectives.
- Technical debt and inefficiencies: Accumulated technical debt within the data models and queries resulted in slow query speeds and inefficient use of cloud resources.
- Pipeline bottlenecks: The existing data pipelines had performance issues that delayed the availability of data for reporting.
Searce Solution
In line with Remitano's vision, our team of solvers set out to modernize its data infrastructure. Working with Remitano's data team, Searce identified the core challenges in the existing technology stack and formulated solutions in three areas:
- Data Profiling: Performed data profiling to identify ways to improve the existing pipelines from Amazon RDS to Databricks.
- Pipeline design best practices: Implemented pipeline design best practices to improve data processing SLA performance, which included:
  - Lakehouse architecture
  - Using uniformly distributed keys to reduce read skew
  - Right-sizing of Delta table partition files
  - Applying dbt incremental strategies such as insert_overwrite
- System recovery techniques: Investigated the different causes of pipeline failures and implemented system recovery techniques to handle them.
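Two of the practices above, uniformly distributed keys and recovery from transient pipeline failures, can be illustrated with a small Python sketch. This is not Searce's implementation. The function names, the bucket count, and the retry parameters are hypothetical and chosen only to show the idea: `salted_key` spreads rows that share one hot key across several sub-keys, and `run_with_retries` re-runs a failing step with exponential backoff.

```python
import time


def salted_key(key: str, row_id: int, buckets: int = 8) -> str:
    """Append a deterministic salt so rows sharing one hot key
    are spread across `buckets` distinct sub-keys."""
    return f"{key}#{row_id % buckets}"


def run_with_retries(task, max_attempts: int = 3, base_delay: float = 0.1,
                     sleep=time.sleep):
    """Call task(); on failure wait base_delay * 2**attempt and retry.

    The last failure is re-raised so the caller can alert or fall back.
    `sleep` is injectable so the backoff can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With 1,000 rows that all carry the same key and sequential row IDs, `salted_key` yields 8 sub-keys of 125 rows each, so no single reader or writer has to handle the whole key. In a real pipeline the salt would typically be applied in the engine doing the join or write, and the retry wrapper would catch only the exception types known to be transient.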
Business Impact
The implemented solutions significantly enhanced the operational efficiency of Remitano's diverse data pipelines, resulting in:
- Reduced overall ETL batch processing time from 8.5 hours to 0.5 hours (~95%).
- Reduced the cost per query from an estimated $63 to $3 (~95%).
- Reduced query processing time from 70 minutes to 15 minutes (~80%).
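The rounded percentages above follow from the before and after figures. A minimal sketch of the arithmetic, where the helper name `percent_reduction` is ours and the inputs are the figures reported in the list:

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Figures as reported in the case study: (before, after)
reported = {
    "ETL batch time (hours)": (8.5, 0.5),
    "Cost per query (USD)": (63, 3),
    "Query time (minutes)": (70, 15),
}

for name, (before, after) in reported.items():
    print(f"{name}: {percent_reduction(before, after):.1f}% reduction")
# ETL batch time (hours): 94.1% reduction
# Cost per query (USD): 95.2% reduction
# Query time (minutes): 78.6% reduction
```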
Searce completed the Data Pipeline Dev/Test setup according to the design specifications, demonstrating strong troubleshooting and performance optimization skills, particularly with Databricks and Data Build Tool platforms.
They displayed strong technical proficiency across various data engineering tools and frameworks, effective communication, and were approachable and easy to work with. The overall engagement was highly positive, and the work was thoroughly handed over.