What you'll do
We are seeking an experienced Data Engineer to help migrate Oracle workloads to a Trino/Iceberg lakehouse on GCS. The role involves designing ingestion pipelines, optimizing query performance, and ensuring data quality, governance, and cost efficiency at scale (50–300 TB workloads).
Key Responsibilities
- Migration Strategy & Execution
  - Design and implement data ingestion pipelines that extract data from Oracle into GCS/Iceberg.
  - Migrate and modernize existing Oracle schemas, partitions, and materialized views into Iceberg tables.
  - Define Change Data Capture (CDC) strategies using custom ETL.
- Data Lakehouse Architecture
  - Configure and optimize Trino clusters (coordinator/worker topology, Helm charts, autoscaling).
  - Design partitioning, compaction, and clustering strategies for Iceberg tables.
  - Implement schema evolution, time travel, and versioning capabilities.
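One piece of the compaction strategy above is deciding which small data files to rewrite together. A minimal sketch of size-based compaction planning, assuming a simple greedy bin-packing toward a target file size (the 512 MB default is illustrative):

```python
def plan_compaction(file_sizes_mb: list[int], target_mb: int = 512) -> list[list[int]]:
    """Greedily group small data files into compaction groups near a target size.

    Files already at or above the target are left alone as single-file groups.
    """
    groups: list[list[int]] = []
    current: list[int] = []
    total = 0
    for size in sorted(file_sizes_mb):
        if size >= target_mb:
            groups.append([size])     # already large enough; do not rewrite
            continue
        if total + size > target_mb and current:
            groups.append(current)    # close the current group before it overflows
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        groups.append(current)
    return groups
```

Real Iceberg compaction is handled by engine procedures (e.g. rewriting data files via the query engine) rather than hand-rolled planning; the sketch only illustrates the trade-off being tuned.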
- Performance & Cost Optimization
  - Benchmark Trino query performance against existing Oracle workloads.
  - Tune Trino/Iceberg for large-scale analytical queries, minimizing query latency and storage costs.
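The benchmarking work above boils down to timing repeated query executions and comparing latency distributions, not single runs. A minimal, engine-agnostic harness (the callable would wrap a real Trino or Oracle query in practice):

```python
import statistics
import time

def benchmark(run_query, n_runs: int = 20) -> dict[str, float]:
    """Time repeated executions of a query callable; report latency stats in ms."""
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[max(0, int(len(samples) * 0.95) - 1)],
        "max_ms": samples[-1],
    }
```

Comparing p50/p95 rather than averages matters here because analytical workloads typically have long-tailed latencies.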
- Data Quality, Metadata & Governance
  - Integrate Iceberg datasets with metadata/catalog services (a PostgreSQL-backed Hive Metastore, or AWS Glue).
  - Ensure compliance with governance, observability, and lineage requirements.
  - Define and enforce standards for unit testing, regression testing, and data validation.
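A core data-validation check during migration is confirming that source and target tables hold the same rows. One simplified approach, assuming an order-independent fingerprint built from row counts plus XOR-combined per-row hashes (all names illustrative):

```python
import hashlib

def table_fingerprint(rows: list[dict]) -> tuple[int, str]:
    """Row count plus an order-independent checksum over canonicalized rows."""
    digest = 0
    for row in rows:
        canonical = repr(sorted(row.items()))  # stable key order per row
        # XOR of per-row hashes makes the result insensitive to row order
        digest ^= int(hashlib.sha256(canonical.encode()).hexdigest(), 16)
    return len(rows), f"{digest:064x}"

def validate_migration(source_rows: list[dict], target_rows: list[dict]) -> bool:
    """True when source and target contain the same rows, in any order."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)
```

At 50–300 TB scale the same idea is pushed down into the engines (count/checksum aggregates per partition on both sides) rather than materializing rows in memory.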
- Collaboration & Delivery
  - Support existing reporting workloads (regulatory reporting, DWH) during and after the migration.
  - Document the architecture and migration steps, and provide knowledge transfer.