Tokenization: The First Decision That Shapes Everything Your LLM Does

Every kirana store has a billing system. If the system was set up for a shop that mostly sells atta, dal, and rice, those items get single, efficient product codes — beep, done. But walk in asking for high-mountain oolong tea or high-altitude quinoa, and the shopkeeper has to punch in each word letter-by-letter from a handwritten label. Slower billing. Higher effort. And the shopkeeper has never stocked half of it, so good luck getting a recommendation. ...

June 16, 2026 · 8 min · biswajit

The Wall Has Circuitry Inside It: Descriptors in Python

You have a class. It has an attribute. You want to protect that attribute: maybe validate it, maybe log every write, maybe make it read-only after the first assignment. So you do what most developers do. You prefix it with an underscore, write a get_temperature() method, write a set_temperature() method, and tell yourself that’s “Pythonic.” It works. But every class that needs this pattern gets the same boilerplate. Every attribute that needs protection gets its own pair of methods. The class that started as twelve lines is now forty-five. And somewhere around the third get_x / set_x pair, you start feeling like you’re writing Java. ...

June 11, 2026 · 6 min · biswajit

Why Your B-tree Index Is Useless for Semantic Search — And What HNSW Does Instead

For software engineers who know what an index is and what an embedding is. Start Here: A Question You Think You Already Know the Answer To. You have a chunks table with 50,000 rows. Each row has a text column and a vector column, a list of 1,536 floating-point numbers representing the semantic meaning of that text. A user query arrives. You embed it into another 1536-dimensional vector. Now you want the most semantically similar chunks. ...

June 6, 2026 · 8 min

How I Built My Blog and Hosted It for Free on Cloudflare

I wanted a blog that was fast, free to host, and didn’t require me to manage a server. This post is a step-by-step account of how I built exactly that, using Hugo to generate the site and Cloudflare Pages to serve it. No prior experience needed. If you can use a terminal and have a GitHub account, you can follow along. First, understand what you’re building. Before touching any tools, it helps to understand what’s actually happening. ...

June 5, 2026 · 5 min