Carlos Forero

Data Engineer

Specialized in designing and implementing scalable data pipelines on GCP with Python

stack.py

# data_engineer_stack.py
from dataclasses import dataclass, field

@dataclass
class DataEngineerStack:
    # Mutable defaults must use default_factory, not bare list literals.
    cloud: list[str] = field(default_factory=lambda: ["GCP", "BigQuery", "Dataflow"])
    processing: list[str] = field(default_factory=lambda: ["Apache Beam", "Airflow", "dbt"])
    languages: list[str] = field(default_factory=lambda: ["Python", "SQL", "Bash"])
    tools: list[str] = field(default_factory=lambda: ["Docker", "Git", "Linux"])

    def pipeline_expertise(self) -> dict[str, str]:
        return {
            "etl": "Batch & Streaming",
            "scale": "TB-scale data",
            "optimization": "Cost & Performance",
        }
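A quick usage check of the class above (runs with the standard library only):

if __name__ == "__main__":
    # Prints the expertise summary defined in pipeline_expertise().
    print(DataEngineerStack().pipeline_expertise())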

Core Competencies

Cloud & Data Warehousing

  • Google Cloud Platform
  • BigQuery optimization
  • Cloud Storage & Pub/Sub
  • IAM & Security
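As a minimal illustration of the Pub/Sub item in the list above, a publish sketch using the google-cloud-pubsub client; the project, topic, and message contents are placeholders, not a real configuration:

from google.cloud import pubsub_v1

# Hypothetical project and topic names, used only for illustration.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-gcp-project", "events")

# Pub/Sub messages are raw bytes; extra metadata travels as string attributes.
future = publisher.publish(topic_path, data=b'{"event": "page_view"}', source="web")
print(future.result())  # blocks until the server returns the message ID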

Data Processing

  • Apache Beam / Dataflow
  • Batch & Streaming ETL
  • Airflow orchestration
  • dbt transformations
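To show what the Airflow orchestration item looks like in practice, a minimal daily DAG sketch; the task commands, dbt project path, and schedule are illustrative assumptions rather than an actual deployment:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal daily DAG: extract to storage, then run dbt transformations.
with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract_to_gcs",
        bash_command="python extract.py --date {{ ds }}",  # placeholder script
    )
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt",  # placeholder project path
    )
    extract >> transform  # run the dbt step after the extract step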

Development & Tools

  • Python (advanced)
  • SQL optimization
  • Docker & CI/CD
  • Git version control

Featured Projects


Real-time Analytics Pipeline

Streaming data pipeline processing 2.5 TB/day using Pub/Sub, Dataflow, and BigQuery

GCP · Apache Beam · BigQuery
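The general shape of such a Pub/Sub → Dataflow → BigQuery pipeline, sketched with the Apache Beam Python SDK; the subscription, table, and schema names are placeholders, not the project's actual configuration:

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming sketch: read messages from Pub/Sub, parse JSON, append to BigQuery.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/events-sub")
        | "ParseJson" >> beam.Map(json.loads)
        | "WriteBQ" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            schema="event:STRING,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )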

ETL Optimization Framework

Cost optimization framework reducing BigQuery costs by 40% through query optimization and partitioning

Python · SQL · Optimization
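One partitioning technique behind that kind of saving, sketched with the google-cloud-bigquery client; the table and column names are hypothetical:

from google.cloud import bigquery

# Partition by event date and cluster by user_id so queries filtering on these
# columns scan less data, and therefore cost less. Names are placeholders.
client = bigquery.Client()

table = bigquery.Table(
    "my-project.analytics.events_partitioned",
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
)
table.clustering_fields = ["user_id"]
client.create_table(table)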

Impact Metrics

  • Data Processed Daily: 2.5 TB (+15% last month)
  • Pipeline Uptime: 99.9% (stable)
  • Cost Reduction: 40% (vs previous quarter)