Atharva Hirulkar

About me

I build the systems that make ML work in production.

I'm an ML systems engineer who builds production ML end to end, from streaming data pipelines to model serving to RAG-powered explanations. At State Street Corporation, I spent nearly 3 years building enterprise data infrastructure at scale: Snowflake pipelines processing 5M+ records/day, multi-cloud automation across AWS, Azure, and OCI, and real-time monitoring for ~$1B in daily transaction flows.

I architected that multi-cloud infrastructure with Terraform IaC, cutting infrastructure provisioning time by 50%, and automated operations with Ansible, reducing manual overhead by 40%. I care about reliability, not just velocity.

I'm currently pursuing an MS in Data Science at UC San Diego (GPA 3.85), deepening my expertise in machine learning, ML systems, scalable data systems, and causal inference. I hold a copyright granted by the Government of India for work in biomedical time-series ML.

ML Systems & MLOps

MLflow · Model Registry · Feature Pipelines · PSI Drift Monitoring · Model Serving · Airflow · FastAPI · CI/CD

Cloud & DevOps

AWS · Azure · Oracle Cloud · Terraform · Ansible · Docker · Kubernetes · GitHub Actions · Grafana · Prometheus

Data Engineering

Kafka · PySpark · Snowflake · PostgreSQL · TimescaleDB · Neo4j · Qdrant · Databricks Spark · ETL

Machine Learning & GenAI

PyTorch · XGBoost · LightGBM · LSTM · Scikit-learn · SHAP · RAG Pipelines · LangChain · OpenAI API · Claude API · Ollama · Agentic AI · Vector Search

Programming & Tools

Python · SQL · Shell / Bash · C++ · Pandas · NumPy · NetworkX · Plotly

Career

Where I've worked & what I shipped.

Apr 2025 - Aug 2025

State Street Corp.

Bengaluru, India

Data Engineer

Data Platforms

Owned multi-cloud automation across OCI and Azure, designing Terraform IaC and Ansible playbooks adopted as the standard provisioning approach across teams; reduced manual overhead by 40% while sustaining 99.9%+ uptime on distributed production systems serving the global custodian.
Drove enterprise database migration across 120+ application teams as the lead on security architecture: designed IAM policies, RBAC, authentication workflows, and user provisioning from scratch, becoming the go-to engineer for security implementation across the program.
Led Azure DevOps CI/CD adoption across globally distributed engineering teams by building the automation framework and mentoring deployment best practices, cutting release cycles by 40% (5 days to 3) at 99.8% deployment success.

Terraform · Ansible · OCI · Azure · Azure DevOps · IAM/RBAC · CI/CD

Jul 2023 - Mar 2025

State Street Corp.

Bengaluru, India

Platform Engineer

Data Platforms

Engineered and owned production-grade Snowflake ETL pipelines processing 5M+ records/day at 99.9%+ SLA uptime on institutional data workflows; established data quality standards and validation frameworks adopted across the engineering team.
Architected and automated multi-cloud environments (OCI, AWS, Azure) independently via Terraform IaC, cutting infrastructure provisioning time by 50%; the design was later adopted as the standard provisioning approach across teams.
Identified an inefficiency in manual data retrieval on my own initiative, then built and shipped a database automation tool with configurable filters that eliminated 15 hours/week of manual work, without a formal assignment.

Python · SQL · Snowflake · Terraform · AWS · Azure · OCI

Jan 2023 - Jun 2023

State Street Corp.

Hyderabad, India

Payment Systems Analyst, Intern

Cash & Funds

Built real-time monitoring systems across LYNX, CHIPS, and TARGET2 payment networks for ~$1B in daily transaction flows, achieving 98% failure detection within 30 seconds and sustaining 99.9% settlement uptime.
Engineered automated validation logic in PL/SQL and Shell for high-volume ISO 20022 payment workflows, raising data-validation accuracy from 96% to 99.4% and reducing manual error resolution across the payments team.

PL/SQL · Shell · ISO 20022 · Financial Messaging

Work

Things I've built.

FraudLens

Production ML system for real-time fraud detection with explainable AI. IEEE-CIS · XGBoost · LightGBM · MLflow · FastAPI · AWS ECS Fargate + ALB · Qdrant · Airflow · GitHub Actions CI/CD. AUC-PR >0.88, p99 latency under 80 ms, and a RAG explanation layer powered by SHAP + GPT-4o-mini.

XGBoost · LightGBM · FastAPI · AWS ECS Fargate · MLflow · Airflow · Qdrant · RAG

SignalStack

End-to-end ML systems pipeline. Polygon.io WebSocket → Kafka → PySpark Structured Streaming → TimescaleDB feature store → three concurrent models (LSTM, LightGBM, Isolation Forest) → FastAPI at <10ms p99. Point-in-time correct features, PSI drift monitoring, live Grafana dashboards.

Kafka · PySpark · TimescaleDB · Grafana · Polygon.io · Python
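
To illustrate the PSI drift monitoring mentioned for SignalStack, here is a minimal sketch of the Population Stability Index in plain Python. The function name `psi`, the decile binning on the baseline, and the `eps` floor are assumptions made for this example, not the project's actual implementation.

```python
import math
import random
from bisect import bisect_right


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a live sample."""
    ref = sorted(expected)
    # Interior bin edges at baseline quantiles, so each bin holds
    # roughly 1/n_bins of the baseline sample.
    edges = [ref[(len(ref) * i) // n_bins] for i in range(1, n_bins)]

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor at eps so an empty bin cannot cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    p, q = fractions(expected), fractions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Synthetic data for illustration only: same distribution vs. a shifted mean.
rng = random.Random(0)
baseline = [rng.gauss(0, 1) for _ in range(5000)]
same = [rng.gauss(0, 1) for _ in range(5000)]
shifted = [rng.gauss(1, 1) for _ in range(5000)]
print("PSI, same distribution:", psi(baseline, same))
print("PSI, shifted mean:     ", psi(baseline, shifted))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; the thresholds SignalStack actually alerts on are not stated here.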

Chokepoint · Bow Capital Defense Hackathon 2026

Defense procurement supply-graph intelligence. Ingests 9.1M USAspending contract rows across FY2024 to FY2026, builds a bipartite vendor-to-agency-to-NAICS graph (49,842 vendors, 228K weighted edges), and ranks every vendor by N-1 contingency failure impact. A GradientBoosting model trained on simulated coverage-drop labels hits recall@10 = 0.80, doubling the centrality baseline. Validated against 10 real defense supply-chain disruptions: 6 of 10 rank in the top 1% of 49,842 vendors, with a median at the 99.16th percentile. Sub-millisecond live stress simulator deployed on Hugging Face Spaces.

Python · NetworkX · GradientBoosting · IsolationForest · FastAPI · Streamlit · Docker · Hugging Face Spaces
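
The N-1 ranking idea can be sketched on toy data: remove one vendor at a time and measure how much demand is left without a supplier. The `n1_impact` function, the sole-source impact metric, and the sample contracts below are all hypothetical; Chokepoint's real coverage-drop definition and graph weighting are not described in enough detail to reproduce.

```python
from collections import defaultdict


def n1_impact(contracts):
    """Rank vendors by the share of total dollars left with no alternative
    supplier when that vendor alone is removed (an N-1 scenario).

    contracts: iterable of (vendor, agency, naics, dollars) tuples.
    """
    # Dollars each vendor supplies to each (agency, NAICS) need.
    by_need = defaultdict(lambda: defaultdict(float))
    for vendor, agency, naics, dollars in contracts:
        by_need[(agency, naics)][vendor] += dollars

    total = sum(sum(suppliers.values()) for suppliers in by_need.values())
    impact = defaultdict(float)
    for suppliers in by_need.values():
        for vendor, dollars in suppliers.items():
            # Simplifying assumption: a need is lost only when its sole
            # supplier fails; otherwise the remaining vendors absorb it.
            impact[vendor] += dollars if len(suppliers) == 1 else 0.0

    # Highest impact first; ties broken alphabetically for determinism.
    return sorted(
        ((vendor, lost / total) for vendor, lost in impact.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Made-up contracts, not USAspending data.
toy_contracts = [
    ("Acme", "Navy", "336411", 100.0),
    ("Acme", "Army", "332993", 50.0),
    ("Bolt", "Army", "332993", 50.0),
    ("Core", "USAF", "334511", 200.0),
]
ranking = n1_impact(toy_contracts)
for vendor, share in ranking:
    print(vendor, share)
```

In this toy case the sole-source vendors rise to the top even when another vendor holds more contracts, which is the kind of signal a plain centrality score can miss.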

Seismic Risk Atlas · DataHacks 2026

🏆 Winner

Block-level earthquake loss estimator for LA County's 2,498 census tracts. Physics-based ground-motion simulations (500 M6.7 scenarios, Scripps Institution) fused with Zillow ZHVI housing values and ACS census demographics via KD-tree spatial joins on Databricks Spark. FEMA HAZUS fragility curves applied per building code era; Monte Carlo aggregation over all scenarios preserves damage-function nonlinearity. XGBoost damage model GPU-trained on NVIDIA L40s (R² 0.99996, 0.70s). Interactive Leaflet choropleth map with FastAPI + OpenAI tract-level plain-English summaries.

Python · Marimo · Databricks · Spark · FastAPI · Leaflet.js · XGBoost · Sphinx · Feature Engineering · OpenAI
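
Why aggregate over scenarios instead of averaging the shaking first? Because a fragility curve is nonlinear, the average of per-scenario damage differs from the damage at the average shaking. The sketch below shows the effect with a lognormal fragility curve; the `median` and `beta` parameters, the function names, and the four shaking values are illustrative assumptions, not HAZUS coefficients or the project's simulation output.

```python
import math
from statistics import fmean


def fragility(pga, median=0.6, beta=0.4):
    """Lognormal fragility curve: P(damage state reached | PGA in g)."""
    z = math.log(pga / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def monte_carlo_loss(pgas, value):
    """Apply the curve to every scenario, then average the results."""
    return value * fmean(fragility(p) for p in pgas)


def loss_at_mean_shaking(pgas, value):
    """Shortcut that averages shaking first; shown only for contrast."""
    return value * fragility(fmean(pgas))


# Three mild scenarios and one severe one (made-up PGA values in g).
scenario_pgas = [0.1, 0.1, 0.1, 1.2]
home_value = 1_000_000.0
print("per-scenario average:", monte_carlo_loss(scenario_pgas, home_value))
print("curve at mean shaking:", loss_at_mean_shaking(scenario_pgas, home_value))
```

With these inputs the shortcut understates the loss, because the rare severe scenario dominates the expected damage and is smoothed away once the shaking is averaged.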

CosmeTik

Multi-database skincare analytics platform with unified product + review pipeline across PostgreSQL, Neo4j (entity graphs), and Qdrant (vector search). 1M+ reviews ingested with consistency-managed derived stores rebuilt from PostgreSQL source of truth. Detected 30 ingredient communities via PageRank + Louvain clustering; hybrid ML recommender combines graph community signals with semantic similarity over review embeddings.

PostgreSQL · Neo4j · Qdrant · NetworkX · sentence-transformers · FastAPI · Python

Irona

Local-first AI agent with explicit constraint architecture. Deny-by-default tool policy, allowlist-only file search with user approval, semantic RAG over local folders with source citations, and full audit logging of every tool call. Ollama local inference keeps all prompts and file text on-device. The policy layer is the product.

Python · Ollama · Qwen 2.5 7B · RAG · Semantic Search · Policy Engine
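
A deny-by-default tool policy with audit logging can be sketched in a few lines. The `ToolPolicy` class, its field names, and the example paths are hypothetical, and the user-approval step is not modeled; this is one way such a policy layer might look, not Irona's actual code.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class ToolPolicy:
    allowed_tools: set = field(default_factory=set)
    allowed_roots: tuple = ()
    audit_log: list = field(default_factory=list)

    @staticmethod
    def _inside(target, root):
        # True when target's leading path components equal root's components.
        root_parts = PurePosixPath(root).parts
        return target.parts[: len(root_parts)] == root_parts

    def check(self, tool, path=None):
        """Return True only if the tool (and path, if given) is allowlisted."""
        allowed = tool in self.allowed_tools
        if allowed and path is not None:
            target = PurePosixPath(path)
            # Reject ".." outright instead of trying to resolve it.
            allowed = ".." not in target.parts and any(
                self._inside(target, root) for root in self.allowed_roots
            )
        # Every decision is recorded, whether it was allowed or denied.
        self.audit_log.append({"tool": tool, "path": path, "allowed": allowed})
        return allowed


policy = ToolPolicy(
    allowed_tools={"file_search"},
    allowed_roots=("/home/user/notes",),
)
print(policy.check("file_search", "/home/user/notes/todo.md"))
print(policy.check("shell", "/home/user/notes/todo.md"))
print(policy.check("file_search", "/home/user/notes/../.ssh/id_rsa"))
```

Anything not explicitly listed falls through to a denial, so adding a new tool is a deliberate policy change rather than a default.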

JobPilot

Autonomous job application pipeline powered by GPT-4o-mini. Resume-to-JD alignment scoring with structured skill-gap extraction, LaTeX resume tailoring via brace-match injection, cover letter generation, and PostgreSQL persistence with a deduplication pipeline.

Python · GPT-4o-mini · FastAPI · PostgreSQL · LaTeX · OpenAI API
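
Brace-match injection can be read as: find a LaTeX macro, walk forward counting braces until its argument closes, and swap in new text. The sketch below follows that reading; the `replace_macro_body` helper and the `\summary` and `\skills` macro names are hypothetical and may differ from how JobPilot does it.

```python
def replace_macro_body(tex, macro, new_body):
    r"""Replace the argument of the first \macro{...} found in tex."""
    marker = "\\" + macro + "{"
    start = tex.find(marker)
    if start == -1:
        raise ValueError("macro not found: " + macro)
    body_start = start + len(marker)
    depth, i = 1, body_start
    while i < len(tex):
        ch = tex[i]
        if ch == "\\":
            # Skip the escaped character so \{ and \} are not counted.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # i is the closing brace of the macro argument.
                return tex[:body_start] + new_body + tex[i:]
        i += 1
    raise ValueError("unbalanced braces after macro: " + macro)


resume = r"\summary{Old \textbf{bold} text with \} brace} \skills{Python}"
print(replace_macro_body(resume, "summary", "Tailored summary"))
```

Counting depth instead of using a regular expression is what lets nested commands such as `\textbf{...}` inside the argument survive untouched.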

PulseML · Copyright L-114951/2022 (Gov. of India)

Wearable ECG/PPG physiological monitoring. LSTM-based arrhythmia detection with custom IoT data pipeline. Core ML framework for real-time anomaly alerts on biosignals. AUC 0.93, 30% reduction in false-positive cardiac alerts, BPM forecasting 2.5 minutes ahead.

LSTM · PyTorch · Signal Processing · IoT · Python · Feature Engineering

Multilingual Speech Transcription

Video-to-text pipeline supporting 100+ languages. BART-Large-CNN abstractive summarization with multilingual encoder. Flask API serving transcription + translation.

BART · Hugging Face · NLP · Flask · Python · Transformers

Building systems that actually scale.

Credentials & in progress.

Curious who I really am?

Got an interesting problem?
