Senior Data Engineer · Guatemala

Jeferson Stiv Argueta Hernández

I design cloud data platforms that don't page anyone at 3 AM — ELT on Snowflake with dbt, event-driven lakes on AWS and GCP, and the orchestration glue that keeps it honest. Seven years across supply chain, marketing, telecom, pharma, and healthcare.

Guatemala · +(502) 3312-6786 · ajefersonstiv@gmail.com

About

Senior Data Engineer with experience across supply chain, marketing, telecommunications, pharmaceuticals, and healthcare. Comfortable across the stack: Python, SQL, C#, Julia, JavaScript, C++; AWS and GCP; Snowflake, Redshift, Postgres, Oracle, DB2, SQL Server, MongoDB. IT graduate from Universidad Mariano Gálvez and current Master’s student in Information Security.

I care about clean data models, observable pipelines, and small improvements that compound — the kind of engineering that survives audits, cloud-bill reviews, and 2 AM on-call pages.

Experience

Senior Data Engineer

TeamInternational · Medellín, Colombia

  • Designed and maintained ELT pipelines in Postgres and Snowflake using dbt, Snowpark, and PySpark; introduced custom macros and incremental strategies to automate DDL/DML and SCD loads.
  • Built large-scale ingestion and transformation pipelines in Python/Spark; tuned Snowflake warehouses and queries to reduce compute spend by 35%.
  • Implemented an event-driven data lake on AWS and GCP and exposed curated datasets via FastAPI endpoints to power dashboards and downstream applications.
  • Developed robust SQL for data quality and cleaning, with automated checks and retries for resiliency.
  • Partnered with Data Science / ML teams to productionize features and datasets.
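The automated checks and retries mentioned above can be sketched as a small Python decorator with exponential backoff. This is a minimal illustration, not code from the actual pipelines; `run_quality_check` and its validation rule are hypothetical:

```python
import time
from functools import wraps

def with_retries(max_attempts=3, base_delay=1.0):
    """Retry a flaky task with exponential backoff (1s, 2s, 4s, ...)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of retries: surface the failure
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

@with_retries(max_attempts=3, base_delay=0.01)
def run_quality_check(rows):
    """Hypothetical check: fail the load if any row lacks a primary key."""
    bad = [r for r in rows if r.get("id") is None]
    if bad:
        raise ValueError(f"{len(bad)} rows missing primary key")
    return len(rows)
```

The retry lives in one decorator so every check gets the same resiliency policy without duplicating backoff logic.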

Big Data Engineer

Millicom (Tigo) · Guatemala

  • Migrated the on-premises data warehouse to AWS, rewriting Talend pipelines in Python/PySpark and adopting S3, Glue, Lambda, Athena, Redshift, ECS/Fargate; improved pipeline execution times by 40–50%.
  • Built data-lake pipelines on S3 with PySpark and Python; automated orchestration with Airflow.
  • Integrated Kafka and Kinesis for streaming to enable near real-time analytics and alerts.
  • Performed data modeling and data mining on large-volume datasets for the network analytics area.
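A data lake on S3 like the one above typically relies on Hive-style partition paths so Athena and Glue can prune scans. A minimal sketch of the layout logic; the prefix and table names are hypothetical, not the actual bucket structure:

```python
from datetime import date, timedelta

def partition_key(table: str, event_date: date, prefix: str = "raw") -> str:
    """Build a Hive-style partition path (dt=YYYY-MM-DD) so query engines
    can skip partitions outside the filtered date range."""
    return f"{prefix}/{table}/dt={event_date.isoformat()}/"

def daily_partitions(table: str, start: date, end: date, prefix: str = "raw"):
    """Yield one partition path per day, e.g. for an Airflow backfill."""
    d = start
    while d <= end:
        yield partition_key(table, d, prefix)
        d += timedelta(days=1)
```

Keeping path construction in one function means the ingestion jobs, the Glue crawler configuration, and any backfill DAG all agree on the partition scheme.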

Data Analyst / BI Developer

IQVIA · Guatemala

  • Built batch and incremental pipelines with Python and SSIS into a SQL Server data warehouse; implemented SCD patterns and audit logging for lineage and traceability.
  • Designed star/snowflake dimensional models and wrote production-grade T-SQL (CTEs, window functions, stored procedures) for commercial and medical data marts.
  • Refactored critical T-SQL stored procedures used for data quality and cleaning, improving runtime by 70% and stabilizing refreshes.
  • Automated reporting with SSRS, Tableau, and Power BI.
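The SCD pattern mentioned above, Type 2 in its most common form, can be sketched in plain Python. This is a simplified illustration of the close-and-insert logic with hypothetical column names, not the production T-SQL:

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # sentinel "valid_to" for current rows

def apply_scd2(dimension, incoming, load_date):
    """SCD Type 2: close changed rows, insert new versions.
    `dimension` and `incoming` are lists of dicts sharing a 'key' column."""
    current = {r["key"]: r for r in dimension if r["valid_to"] == HIGH_DATE}
    for row in incoming:
        old = current.get(row["key"])
        if old is None:
            # brand-new member: open a fresh version
            dimension.append({**row, "valid_from": load_date, "valid_to": HIGH_DATE})
        elif old["value"] != row["value"]:
            # attribute changed: close the old version, open a new one
            old["valid_to"] = load_date
            dimension.append({**row, "valid_from": load_date, "valid_to": HIGH_DATE})
        # unchanged rows are left untouched, preserving history
    return dimension
```

In the warehouse the same logic is usually a single `MERGE` (or dbt snapshot), but the invariant is identical: exactly one open version per key, with closed versions preserving history.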

Software Engineering Consultant

CDAG · Guatemala

  • Developed an internal process-management web app on .NET Framework, increasing stakeholder visibility by 40% and replacing legacy workflows.
  • Automated data cleaning and reporting with Python and C#; delivered BI with SSRS, Crystal Reports, and Power BI.
  • Managed the full SDLC; worked within Scrum.

Selected Projects

Cloud migration: AWS → GCP

Re-platformed a serverless-first data platform from AWS to GCP while preserving data models, orchestration patterns, and SLAs. Stack: Python, PySpark, Dagster, Snowflake, dbt. Validated end-to-end with data-quality checks; reduced cloud spend by 45% and, after optimizations identified during re-platforming, improved pipeline execution times by 35%.

  • GCP
  • AWS
  • Dagster
  • Snowflake
  • dbt
  • PySpark
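One way to frame the end-to-end validation described above is a per-table reconciliation between source and target. The sketch below compares row counts plus an order-independent checksum; it is an illustration of the idea, not the actual validation suite:

```python
import hashlib

def table_fingerprint(rows):
    """Order-independent fingerprint: (row count, XOR of per-row hashes).
    XOR lets us compare row sets without sorting either side."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(sorted(row.items())).encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return len(rows), acc

def reconcile(source_rows, target_rows):
    """True when both sides contain identical row sets."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)
```

In practice the same comparison runs inside each warehouse (hashing in SQL, shipping only the aggregate), so nothing larger than a fingerprint crosses the cloud boundary.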

Reverse engineering of an undocumented dbt / Snowflake pipeline

Reconstructed lineage and business logic for an inherited pipeline with limited documentation. Produced diagrams (models, dependencies, run order), corrected defects, and implemented guardrails — restoring reliable operation and establishing maintainability standards.

  • dbt
  • Snowflake
  • Lineage
  • Data quality
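Recovering a safe run order from `ref()`-style dependencies, as in the reverse-engineering work above, amounts to a topological sort. A sketch using Python's standard-library `graphlib`; the model names are hypothetical:

```python
from graphlib import TopologicalSorter

# dependency map: model -> models it ref()s (which must run first)
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "mart_revenue": {"fct_orders"},
}

def run_order(dependencies):
    """Return a valid execution order; raises CycleError on circular refs."""
    return list(TopologicalSorter(dependencies).static_order())
```

The `CycleError` case is useful in itself: on an undocumented project, a cycle in the reconstructed graph usually points straight at a defect in the inherited models.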

LLM agent for aircraft-repair knowledge

Built a retrieval-augmented agent on GCP to surface historical maintenance insights from Oracle records. Dual-pipeline architecture: Python on Compute Engine (cron) extracted from Oracle to Cloud Storage; Cloud Functions → Airflow DAG loaded curated layers into Postgres following a medallion architecture (bronze / silver / gold). Embeddings served via Vertex AI. End users resolve maintenance issues 60% faster.

  • Vertex AI
  • RAG
  • GCP
  • Airflow
  • Postgres
  • Medallion
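The retrieval step of such an agent reduces to nearest-neighbor search over embeddings. A pure-Python sketch of cosine-similarity top-k; the project itself served embeddings via Vertex AI, and the vectors and ids here are purely illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, docs, k=3):
    """docs: list of (doc_id, embedding). Return the k most similar ids."""
    scored = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

At production scale this brute-force scan is replaced by an index (e.g. a vector store or pgvector on Postgres), but the ranking contract is the same.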

Technical Skills

Cloud & Warehousing

  • AWS (S3, Lambda, Glue, ECS/Fargate, EC2, Athena, Aurora, SNS/SQS)
  • GCP (Cloud Storage, Pub/Sub, Cloud Functions, Compute Engine, Vertex AI)
  • Snowflake
  • Redshift
  • Oracle
  • Postgres
  • DB2
  • MySQL
  • SQL Server
  • Hadoop
  • DuckDB

Orchestration & Modeling

  • dbt
  • Dagster
  • Airflow
  • Star / snowflake schemas
  • SCD
  • Change Data Capture

Processing & Pipelines

  • PySpark
  • Snowpark
  • SQL
  • ELT / ETL design
  • Event-driven / serverless

Streaming & Integrations

  • Kafka
  • Kinesis
  • FastAPI
  • REST

BI & Reporting

  • Tableau
  • Power BI
  • SSIS
  • SSRS
  • Crystal Reports

Programming & Tools

  • Python
  • SQL
  • C#
  • JavaScript
  • C++
  • Julia
  • Docker
  • Git
  • Linux

Data Quality & Governance

  • dbt tests
  • Lineage
  • Audit logging
  • RBAC

Training & Certifications
