Architecting distributed data systems that process datasets from 100TB to 6PB with PySpark, AWS, and deep learning.
Data Engineer with 7+ years of experience building large-scale distributed data systems and real-time pipelines. Strong Python and SQL fundamentals paired with hands-on PyTorch experience implementing CNNs and other deep learning architectures.
Deep expertise in PySpark, AWS, and distributed computing, with a strong focus on data quality, validation, idempotency, and reproducible pipelines that feed analytics and ML consumers.
Currently designing data platforms that power analytics, marketing insights, and AI-driven decision systems at Smart Energy Water.
A constellation of tools I use to build large-scale data systems.
Open to data engineering, ML engineering, and platform roles. Reach out anytime.