Blog

Notes on ML engineering, data platforms, and the developer tools I build along the way.

llm / go / devops

TTFT varies 13x across LLM providers. Here are the numbers.

Hourly probes across 15 frontier models from OpenAI, Anthropic, Google, DeepSeek, and xAI. Median TTFT ranges from 321ms to 4,226ms. Raw data included.

Read post
llm / python / devtools

OpenAI's own cookbook costs $1,884/month to run. One model swap fixes most of it.

I scanned OpenAI's cookbook for LLM API calls and estimated the monthly cost at 1,000 calls per call site. Four gpt-5 call sites account for 68% of the total spend.

Read post
data-engineering / kubernetes / python

Building a data platform with dbt, Dagster, and ArgoCD

How I built an ELT data platform for 100k+ IoT devices: Dagster for orchestration, dbt for transforms, Sqitch for migrations, ArgoCD for GitOps deployment, and PII-safe extraction from five API shards.

Read post
mlops / python / data-engineering

Evaluating ML algorithms in production: from field data to fleet deployment

How I built an evaluation pipeline for battery prediction algorithms serving 100k+ IoT devices: Dagster-orchestrated dataset creation from field data, human-in-the-loop review, isolated venv testing across algorithm versions, MLflow tracking, and fleet-wide rollout.

Read post
llm / go / devops

I monitored 6 LLM APIs for 7 days. Here's what I found.

60,000 probes across GPT-4o-mini, Claude 3.5 Haiku, Gemini 2.0 Flash, Llama 3.3 70B, DeepSeek Chat, and Mistral Small. Real latency numbers from continuous monitoring.

Read post
llm / python / devtools

How I built Infracost for LLM spend in a day

Building tokentoll, an Infracost-style cost-impact tool for LLM API spend, in a single day. Architecture, model-name resolution, multi-pass constant propagation, and validation across twenty real codebases.

Read post