ML Systems Engineer · Data Infrastructure · UC San Diego

Building data systems that actually scale.

ML Systems Engineer with 2 years of production experience building the infrastructure AI runs on: real-time pipelines, feature stores, model serving, and the observability layer that keeps it all reliable. From streaming data at scale to deploying models in the cloud, I own the full production loop.

About me
Atharva Hirulkar

I build the systems that make ML work in production.

I'm an ML Systems Engineer with hands-on experience building production data infrastructure at scale. At State Street Corporation, I owned Snowflake pipelines processing ~5M records/day, monitored ~$1B in daily transaction flows in real time, and engineered ISO 20022-compliant payment workflows.

I've architected multi-cloud infrastructure across AWS, Azure, and OCI using Terraform, and automated operations with Ansible, cutting provisioning time by ~40%. I care about reliability, not just velocity.

Currently pursuing an MS in Data Science at UC San Diego (GPA 3.80), deepening expertise in scalable data systems, statistical NLP, and optimization. I hold a granted copyright in biomedical time-series ML.

Data Systems
ETL · Snowflake · Kafka · Spark · Airflow · PostgreSQL · TimescaleDB · Neo4j · Qdrant · Data Warehousing
Machine Learning
PyTorch · TensorFlow · Scikit-learn · MLflow · MLOps · NLP · Transformers · Hugging Face LLMs · RAG · Prompt Engineering · Ollama · OpenAI API · LangChain · Agentic AI · Feature Engineering · Anomaly Detection · Time-Series Forecasting
Cloud & DevOps
AWS · Azure · Oracle Cloud · Terraform · Ansible · Docker · Kubernetes · CI/CD · Grafana
Programming & Tools
Python · SQL · Shell / Bash · Pandas · NumPy · NetworkX · Plotly · Matplotlib
Career

Where I've worked & what I shipped.

Jul 2023 - Aug 2025
State Street Corp.
Bangalore, India
Data Engineer
  • Built and owned SQL + Python ETL pipelines on Snowflake processing 5M+ records/day, meeting enterprise SLAs of 99.9%+ uptime.
  • Architected and provisioned multi-cloud data infrastructure (OCI, AWS, Azure) using Terraform (infrastructure as code), enabling scalable, zero-downtime deployments of production data pipelines.
  • Automated data environment provisioning and configuration with Ansible playbooks and shell scripts, reducing manual operational overhead by 40%.
  • Implemented Azure DevOps CI/CD for data services, cutting release cycles by 40% (5 days → 3 days) and achieving a 99.8% deployment success rate.
Python · SQL · Snowflake · Terraform · Ansible · Azure DevOps · AWS
Jan 2023 - Jul 2023
State Street Corp.
Hyderabad, India
Payment Systems Analyst (Intern)
  • Built real-time monitoring dashboards for $1B in daily transaction flows across LYNX, CHIPS, and TARGET2, detecting 98% of pipeline failures within 30 seconds and sustaining 99.9% settlement uptime.
  • Engineered ISO 20022 payment data workflows in Shell and PL/SQL, raising data-validation accuracy from 96% to 99.4% on high-volume financial messaging pipelines.
PL/SQL · Shell · ISO 20022 · Financial Messaging
"All models are wrong, but some are useful." - George Box
Work

Things I've built.

01
SignalStack
End-to-end ML systems pipeline. Polygon.io WebSocket → Kafka → PySpark Structured Streaming → TimescaleDB feature store → three concurrent models (LSTM, LightGBM, Isolation Forest) → FastAPI serving at <10 ms p99 latency. Point-in-time correct features, PSI drift monitoring, and live Grafana dashboards.
Kafka · PySpark · TimescaleDB · Grafana · Polygon.io · Python
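Roughly how a PSI drift check like the one above can work: bin a reference window at its own quantiles, then compare bin shares in a newer window. This is a minimal stdlib sketch under stated assumptions (decile binning, an `eps` floor for empty bins, the function name `psi`), not SignalStack's actual monitoring code.

```python
import bisect
import math
import statistics
from collections import Counter


def psi(reference: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` against `reference` (assumed quantile binning)."""
    # Interior cut points at the reference window's quantiles (deciles when bins=10)
    cuts = statistics.quantiles(reference, n=bins)

    def proportions(sample: list[float]) -> list[float]:
        # bisect_right maps each value to a bin index in 0..bins-1;
        # values outside the reference range land in the first or last bin
        counts = Counter(bisect.bisect_right(cuts, x) for x in sample)
        # Floor each share at eps so an empty bin never hits log(0)
        return [max(counts[i] / len(sample), eps) for i in range(bins)]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    # PSI = sum over bins of (cur - ref) * ln(cur / ref); every term is >= 0
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

A common rule of thumb reads PSI below 0.1 as stable and above 0.25 as significant drift; those cutoffs are a general convention, not thresholds SignalStack is documented to use.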
02
FraudLens
Production ML system for real-time fraud detection with explainable AI, trained on the IEEE-CIS fraud dataset. Stack: XGBoost · LightGBM · MLflow · FastAPI · AWS ECS Fargate + ALB · Qdrant · Airflow · GitHub Actions CI/CD.
XGBoost · LightGBM · FastAPI · AWS ECS Fargate · MLflow · Airflow
03
Cadbury · In Development
Local-first AI agent with explicit constraint architecture. Deny-by-default tool policy, allowlist-only file search with user approval, semantic RAG over local folders with source citations, and full audit logging of every tool call. Ollama local inference keeps all prompts and file text on-device. The policy layer is the product.
Python · Ollama · Qwen 2.5 7B · RAG · Semantic Search · Policy Engine
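A deny-by-default policy layer can be sketched as one gate that every tool call passes through: the tool must be allowlisted, the path must resolve inside an allowlisted folder, and the user must approve, with every decision (allowed or denied) appended to an audit log. The class, tool names, and fields below are hypothetical illustrations, not Cadbury's code.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable

# Hypothetical approval hook: asks the user whether (tool, path) may proceed
Approver = Callable[[str, str], bool]


@dataclass
class ToolPolicy:
    """Hypothetical deny-by-default gate; empty allowlists mean nothing is permitted."""
    allowed_tools: set[str] = field(default_factory=set)
    allowed_roots: list[Path] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)

    def path_allowed(self, path: str) -> bool:
        # Resolve first so "notes/../secrets" cannot escape an allowlisted folder
        target = Path(path).resolve()
        return any(target.is_relative_to(root.resolve()) for root in self.allowed_roots)

    def authorize(self, tool: str, path: str, approve: Approver) -> bool:
        # Checks run in order; the first failing check denies the call
        if tool not in self.allowed_tools:
            allowed, reason = False, "tool not allowlisted"
        elif not self.path_allowed(path):
            allowed, reason = False, "path outside allowlisted folders"
        elif not approve(tool, path):
            allowed, reason = False, "user declined"
        else:
            allowed, reason = True, "approved"
        # Every decision, allowed or denied, is recorded
        self.audit_log.append({"tool": tool, "path": path, "allowed": allowed, "reason": reason})
        return allowed
```

Resolving the path before the containment check is what stops a `../` escape from an allowlisted folder; `Path.is_relative_to` needs Python 3.9 or later.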
04
JobPilot · In Development
Autonomous job application pipeline powered by GPT-4o-mini: resume-to-JD alignment scoring with structured skill-gap extraction, LaTeX resume tailoring via brace-match injection, cover letter generation, and PostgreSQL persistence with a deduplication pipeline.
Python · GPT-4o-mini · FastAPI · PostgreSQL · LaTeX · OpenAI API
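Brace-match injection, in its simplest form, means finding a macro's opening `{`, scanning forward with a depth counter to the matching `}`, and splicing new text between them. Below is a minimal sketch assuming a hypothetical `\skills{...}` macro in the resume template; it skips escaped `\{` and `\}` but does not handle `%` comments or verbatim blocks, and it is not JobPilot's implementation.

```python
def matching_brace(tex: str, open_idx: int) -> int:
    """Index of the '}' that closes the '{' at open_idx."""
    depth = 0
    i = open_idx
    while i < len(tex):
        ch = tex[i]
        if ch == "\\":
            # Skip the backslash and the next character, so \{ and \} are not counted
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced braces")


def inject_macro_arg(tex: str, macro: str, new_body: str) -> str:
    """Replace the body of the first \\<macro>{...} in tex with new_body."""
    start = tex.find("\\" + macro + "{")
    if start == -1:
        raise KeyError(macro)
    open_idx = start + len(macro) + 1  # position of the opening '{'
    close_idx = matching_brace(tex, open_idx)
    return tex[: open_idx + 1] + new_body + tex[close_idx:]
```

Splicing by index rather than by regex keeps nested groups such as `\textbf{...}` inside the replaced body from ending the match early.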
05
CosmeTik
Multi-database skincare analytics platform. Unified product + review pipeline across PostgreSQL, Neo4j (entity graphs), and Qdrant (vector search). 1M+ reviews, hybrid ML recommender.
PostgreSQL · Neo4j · Qdrant · Python · NLP · Recommendation Engine
06
PulseML · Copyright L-114951/2022 (Gov. of India)
Wearable ECG/PPG physiological monitoring. LSTM-based arrhythmia detection with a custom IoT data pipeline. Core ML framework for real-time anomaly alerts on biosignals.
LSTM · PyTorch · Signal Processing · IoT · Python · Feature Engineering
07
Multilingual Speech Transcription
Video-to-text pipeline supporting 100+ languages. BART-Large-CNN abstractive summarization with a multilingual encoder. Flask API serving transcription and translation.
BART · Hugging Face · NLP · Flask · Python · Transformers
08
Seismic Risk Atlas · DataHacks 2026
🏆 Best Use of Marimo & Sphinx
Block-level earthquake loss estimator for LA County's 2,498 census tracts. Physics-based ground-motion simulations (500 M6.7 scenarios, Scripps Institution) fused with Zillow ZHVI housing values and ACS census demographics via KD-tree spatial joins on Databricks Spark. FEMA HAZUS fragility curves applied per building-code era; Monte Carlo aggregation over all 500 scenarios preserves damage-function nonlinearity. XGBoost damage model GPU-trained on NVIDIA L40s (R² 0.99996 in 0.70 s). Interactive Leaflet choropleth map backed by FastAPI, with OpenAI-generated plain-English summaries for each tract.
Python · Marimo · Databricks · Spark · FastAPI · Leaflet.js · XGBoost · Sphinx · Feature Engineering · OpenAI
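Why aggregate per scenario instead of averaging the shaking first: a fragility curve is nonlinear, so the mean damage over scenarios generally differs from the damage at the mean ground motion. The sketch below uses a lognormal fragility curve (the functional form Hazus uses), driven by PGA for simplicity; the median, dispersion, and scenario PGA values are made-up illustrations, not Hazus parameters or Seismic Risk Atlas data.

```python
import math
from statistics import NormalDist, fmean

# Illustrative fragility parameters (assumed, not Hazus table values)
MEDIAN_PGA_G = 0.4  # PGA (in g) at which reaching the damage state is 50% likely
BETA = 0.6          # lognormal dispersion


def damage_prob(pga_g: float) -> float:
    """Lognormal fragility curve: P(damage state reached | PGA)."""
    return NormalDist().cdf(math.log(pga_g / MEDIAN_PGA_G) / BETA)


def monte_carlo_damage(scenario_pgas: list[float]) -> float:
    """Apply the nonlinear curve to each scenario, then average (keeps the nonlinearity)."""
    return fmean(damage_prob(p) for p in scenario_pgas)


def damage_at_mean_pga(scenario_pgas: list[float]) -> float:
    """Shortcut that averages shaking first; biased when the curve is nonlinear."""
    return damage_prob(fmean(scenario_pgas))


# Hypothetical PGA values (g) for one tract across four scenarios
scenarios = [0.05, 0.1, 0.2, 1.2]
print(f"per-scenario mean: {monte_carlo_damage(scenarios):.2f}")
print(f"curve at mean PGA: {damage_at_mean_pga(scenarios):.2f}")
```

With these four hypothetical scenarios, averaging per scenario gives about 0.28, while applying the curve once at the mean PGA (about 0.39 g) gives about 0.48, so the shortcut would overstate damage here.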
Certifications

Credentials & in progress.

JP Morgan Quantitative Research · ✓ Certified
Qdrant Vector DB Essentials · ✓ Certified
Qdrant Multi-Vector Search · ✓ Certified
Neo4j Graph Data Science · ✓ Certified
OCI Architect Associate · ✓ Certified
OCI Foundations Associate · ✓ Certified
Introduction to Data Analytics · ✓ Certified
More coming…
Education
University of California, San Diego
M.S. in Data Science · Sept 2025 - Dec 2026
3.80 / 4.00 GPA
ML Systems · Causal Inference · Optimization · Scalable Data Systems · Statistical NLP · Advanced Data Mining · Data Ethics
Savitribai Phule Pune University
B.Tech. in Computer Engineering · Aug 2019 - Jun 2023
3.91 / 4.00 GPA
Applied Mathematics · Linear Algebra · Machine Learning · BI & Data Analytics · Artificial Intelligence · DBMS · Algorithms
More About Me

Curious who I really am?

Beyond code and data systems, there's more to explore. Check out my personal insights, interests, and the journey behind the engineer.

Contact

Got an interesting problem?

Email
atharvahirulkar.010@gmail.com
Location
San Diego, CA