Your AI Writes Tests That Pass. That Is the Problem.
AI-generated tests can look reassuring while proving very little, exposing a dangerous gap between green checkmarks and real verification.
16 posts
COBOL modernization is not just a technical story; it threatens the consulting toll booths built around legacy systems.
As AI writes more code, naming becomes even more central: the human craft shifts toward concepts, boundaries, and meaning.
LLMs may act impressively while still failing to know when they are capable, making self-assessment a core safety problem.
Context engineering and requirements engineering converge, suggesting better ways to specify AI-assisted software before code is written.
Different coding models show recognizable habits, risk tolerances, and failure modes, making "personality" a practical engineering concern.
Google's DORA findings suggest AI amplifies team quality: strong practices get stronger, broken processes get louder.
A developer-focused guide to choosing between OpenAI's Chat Completions, Responses, and Assistants APIs in 2025.
Goodhart's Law explains why AI alignment can fail when proxy metrics become targets and systems learn the wrong game.
Project Strawberry and the physical weight of the internet meet in a playful reflection on knowledge, storage, and scale.
The post warns against an AI cargo cult that confuses impressive mimicry with the harder problem of genuine intelligence.
THERMOMETER targets overconfident language models, offering a way to calibrate systems that bluff too easily.
Decentralized multi-agent systems promise problem-solving without a central boss, but coordination becomes the real challenge.
Multi-agent LLM systems are explored as a path toward distributed reasoning, specialization, and collaborative AI workflows.
GPT-4's Turing-test performance revives the old question of whether fooling humans proves intelligence or just fluency.
Mojo is presented as a promising language for AI and machine learning, blending Python-like usability with systems-level speed.