aifeed.dev the frontpage of AI

Changing the Edit Format Improved 15 LLMs at Once

blog.can.ac | ksl

Can Bölük ran a single-afternoon experiment that moved coding benchmark scores dramatically (Grok Code Fast 1 jumped from 6.7% to 68.3%) without touching any model weights. The trick was replacing the edit tool itself. His "hashline" format tags each line with a short content hash, so models reference anchors instead of reproducing exact text. Patch-based formats, it turns out, performed worst for nearly every model tested. Fifteen different LLMs improved, some cutting output tokens by more than 60%. The finding lands squarely in a growing conversation about how much of what we attribute to model capability is actually harness and scaffolding design. Vendors who restrict open-source harness experimentation may be leaving the easiest gains on the table.
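To make the idea concrete, here is a minimal sketch of a hashline-style edit flow. The hash algorithm, hash length, and function names are assumptions for illustration, not the exact scheme from the blog post; the point is only that the model emits a short anchor plus replacement text rather than reproducing the original line verbatim.

```python
import hashlib

def line_hash(text: str, length: int = 3) -> str:
    # Short per-line content hash (algorithm and length are illustrative
    # assumptions, not the blog post's exact scheme).
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:length]

def render_hashline(source: str) -> str:
    # Show the file to the model as "<hash>|<content>", so later edits
    # can cite the hash as an anchor instead of echoing the line.
    return "\n".join(f"{line_hash(l)}|{l}" for l in source.splitlines())

def apply_edit(source: str, anchor: str, replacement: str) -> str:
    # Replace the first line whose content hash matches the anchor.
    # A production format would need to disambiguate duplicate lines.
    lines = source.splitlines()
    for i, line in enumerate(lines):
        if line_hash(line) == anchor:
            lines[i] = replacement
            return "\n".join(lines)
    raise ValueError(f"no line with hash {anchor!r}")

src = "def add(a, b):\n    return a - b"
anchor = line_hash("    return a - b")
fixed = apply_edit(src, anchor, "    return a + b")
```

The model's output here is just `anchor` and the replacement line, a handful of tokens, which is consistent with the large output-token savings the post reports.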
