Carlos Forero

Data Engineer

Specialized in designing and implementing scalable data pipelines on GCP with Python

stack.py

# data_engineer_stack.py
from dataclasses import dataclass, field

@dataclass
class DataEngineerStack:
    # Mutable defaults must use default_factory, not bare list literals.
    cloud: list[str] = field(default_factory=lambda: ["GCP", "BigQuery", "Dataflow"])
    processing: list[str] = field(default_factory=lambda: ["Apache Beam", "Airflow", "dbt"])
    languages: list[str] = field(default_factory=lambda: ["Python", "SQL", "Bash"])
    tools: list[str] = field(default_factory=lambda: ["Docker", "Git", "Linux"])

    def pipeline_expertise(self) -> dict[str, str]:
        return {
            "etl": "Batch & Streaming",
            "scale": "TB-scale data",
            "optimization": "Cost & Performance",
        }
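A quick usage check of the class above (runs with the standard library only):

if __name__ == "__main__":
    # Prints the expertise summary defined in pipeline_expertise().
    print(DataEngineerStack().pipeline_expertise())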

Core Competencies

Cloud & Data Warehousing

  • Google Cloud Platform
  • BigQuery optimization
  • Cloud Storage & Pub/Sub
  • IAM & Security
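As a minimal illustration of the Pub/Sub item in the list above, a publish sketch using the google-cloud-pubsub client; the project, topic, and message contents are placeholders, not a real configuration:

from google.cloud import pubsub_v1

# Hypothetical project and topic names, used only for illustration.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-gcp-project", "events")

# Pub/Sub messages are raw bytes; extra metadata travels as string attributes.
future = publisher.publish(topic_path, data=b'{"event": "page_view"}', source="web")
print(future.result())  # blocks until the server returns the message ID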

Data Processing

  • Apache Beam / Dataflow
  • Batch & Streaming ETL
  • Airflow orchestration
  • dbt transformations
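To show what the Airflow orchestration item looks like in practice, a minimal daily DAG sketch; the task commands, dbt project path, and schedule are illustrative assumptions rather than an actual deployment:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal daily DAG: extract to storage, then run dbt transformations.
with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract_to_gcs",
        bash_command="python extract.py --date {{ ds }}",  # placeholder script
    )
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt",  # placeholder project path
    )
    extract >> transform  # run the dbt step after the extract step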

Development & Tools

  • Python (advanced)
  • SQL optimization
  • Docker & CI/CD
  • Git version control

Featured Projects


Real-time Analytics Pipeline

Streaming data pipeline processing 2.5 TB/day using Pub/Sub, Dataflow, and BigQuery

GCP · Apache Beam · BigQuery
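The general shape of such a Pub/Sub → Dataflow → BigQuery pipeline, sketched with the Apache Beam Python SDK; the subscription, table, and schema names are placeholders, not the project's actual configuration:

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming sketch: read messages from Pub/Sub, parse JSON, append to BigQuery.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/events-sub")
        | "ParseJson" >> beam.Map(json.loads)
        | "WriteBQ" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            schema="event:STRING,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )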

ETL Optimization Framework

Cost optimization framework reducing BigQuery costs by 40% through query optimization and partitioning

Python · SQL · Optimization
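One partitioning technique behind that kind of saving, sketched with the google-cloud-bigquery client; the table and column names are hypothetical:

from google.cloud import bigquery

# Partition by event date and cluster by user_id so queries filtering on these
# columns scan less data, and therefore cost less. Names are placeholders.
client = bigquery.Client()

table = bigquery.Table(
    "my-project.analytics.events_partitioned",
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
)
table.clustering_fields = ["user_id"]
client.create_table(table)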

Impact Metrics

  • Data Processed Daily: 2.5 TB (+15% last month)
  • Pipeline Uptime: 99.9% (stable)
  • Cost Reduction: 40% (vs previous quarter)