2026-04-26: Humbug v45

This week has been very exciting! The new capabilities from v43 and v44 have really started to shine.

Accelerating Menai

One of the exciting things I've done in the last few weeks is make Menai's VM much faster. With v45 I now have the Linux VM running with native code too, so that's all 3 platforms.

It would be very easy to get over-excited by the rapid incremental gains, but to ground this I developed a benchmark tool. This compares idiomatic Python, Python written to use Menai's functional style, and Menai.

Some of the examples really just demonstrate the current version take about 1.8 ms to do anything, vs Python that can be ready in microseconds. It's the bigger tests that are starting to show very encouraging numbers. This week I've still seen some optimizations that have gained 8-12% on specific tests, so it feels like there's a huge potential for improvement.

I compared the test results from 30th March and 4 weeks later we're between 6 and 7 time faster than we were! In some cases, we're already close to giving Python a run for its money.

(venv) $ [19:37:01 ~/github/m6r/humbug] python tools/menai/benchmark/run.py

JSON_PARSER
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Case                                      Menai                       Python (idiomatic)                      Python (functional)
                           mean (ms)   min (ms)     mean (ms)   min (ms)          vs ref     mean (ms)   min (ms)          vs ref
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
object                         1.982      1.752 ✓       0.003      0.001     755x faster ✓       0.010      0.008     194x faster ✓
integer                        1.829      1.788 ✓       0.001      0.001    2347x faster ✓       0.001      0.000    3070x faster ✓
float                          1.844      1.785 ✓       0.001      0.001    2515x faster ✓       0.001      0.000    3115x faster ✓
true                           1.860      1.794 ✓       0.001      0.001    2641x faster ✓       0.000      0.000    4422x faster ✓
false                          1.863      1.805 ✓       0.001      0.001    2812x faster ✓       0.000      0.000    3823x faster ✓
null                           1.883      1.823 ✓       0.001      0.001    2790x faster ✓       0.000      0.000    4654x faster ✓
string_esc                     1.879      1.795 ✓       0.001      0.001    2424x faster ✓       0.001      0.001    1617x faster ✓
empty_array                    1.898      1.829 ✓       0.001      0.001    2743x faster ✓       0.000      0.000    4465x faster ✓
empty_object                   1.883      1.825 ✓       0.001      0.001    2954x faster ✓       0.000      0.000    5254x faster ✓
nested                         1.911      1.838 ✓       0.001      0.001    2227x faster ✓       0.003      0.002     705x faster ✓
long_string                    3.292      3.222 ✓       0.002      0.002    1568x faster ✓       0.071      0.070      46x faster ✓
deep_array                     2.287      2.212 ✓       0.021      0.019     107x faster ✓       0.156      0.149      15x faster ✓
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Validation: Menai 12/12 ✓  |  Python (idiomatic) 12/12 ✓  |  Python (functional) 12/12 ✓


RUBIKS_CUBE
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Case                                      Menai                       Python (idiomatic)                      Python (functional)
                           mean (ms)   min (ms)     mean (ms)   min (ms)          vs ref     mean (ms)   min (ms)          vs ref
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1-move                        28.038     28.038 ✓       0.110      0.110     255x faster ✓       0.063      0.063     448x faster ✓
2-move                        22.338     22.338 ✓       0.097      0.097     231x faster ✓       0.061      0.061     364x faster ✓
3-move                        20.838     20.838 ✓       0.208      0.208     100x faster ✓       0.118      0.118     176x faster ✓
4-move                        23.503     23.503 ✓       2.108      2.108      11x faster ✓       1.271      1.271      18x faster ✓
5-move                        31.526     31.526 ✓       7.213      7.213     4.4x faster ✓       4.340      4.340     7.3x faster ✓
6-move                        80.433     80.433 ✓      46.714     46.714     1.7x faster ✓      27.996     27.996     2.9x faster ✓
7-move                       154.817    154.817 ✓     103.982    103.982     1.5x faster ✓      63.309     63.309     2.4x faster ✓
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Validation: Menai 7/7 ✓  |  Python (idiomatic) 7/7 ✓  |  Python (functional) 7/7 ✓


SORT
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Case                                      Menai                       Python (idiomatic)                      Python (functional)
                           mean (ms)   min (ms)     mean (ms)   min (ms)          vs ref     mean (ms)   min (ms)          vs ref
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
n=10                           0.066      0.059 ✓       0.000      0.000     150x faster ✓       0.005      0.003      14x faster ✓
n=50                           0.109      0.102 ✓       0.002      0.001      71x faster ✓       0.024      0.022     4.5x faster ✓
n=100                          0.171      0.165 ✓       0.003      0.002      55x faster ✓       0.052      0.049     3.3x faster ✓
n=250                          0.409      0.401 ✓       0.009      0.006      48x faster ✓       0.141      0.138     2.9x faster ✓
n=500                          0.958      0.910 ✓       0.018      0.013      52x faster ✓       0.318      0.315     3.0x faster ✓
n=1000                         2.401      2.373 ✓       0.041      0.029      58x faster ✓       0.666      0.643     3.6x faster ✓
n=2500                        10.129     10.071 ✓       0.114      0.082      89x faster ✓       1.892      1.882     5.4x faster ✓
n=5000                        36.834     36.705 ✓       0.287      0.233     128x faster ✓       4.145      4.132     8.9x faster ✓
n=10000                      150.612    148.666 ✓       0.696      0.636     217x faster ✓       8.679      8.660      17x faster ✓
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Validation: Menai 9/9 ✓  |  Python (idiomatic) 9/9 ✓  |  Python (functional) 9/9 ✓


SUDOKU
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Case                                      Menai                       Python (idiomatic)                      Python (functional)
                           mean (ms)   min (ms)     mean (ms)   min (ms)          vs ref     mean (ms)   min (ms)          vs ref
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Easy (36 givens)              29.891     29.645 ✓       3.199      3.177     9.3x faster ✓      11.305     11.282     2.6x faster ✓
Medium (30 givens)             1.482      1.463 ✓       0.043      0.040      35x faster ✓       0.162      0.154     9.2x faster ✓
Hard (25 givens)            6126.182   6126.182 ✓     713.161    713.161     8.6x faster ✓    1986.715   1986.715     3.1x faster ✓
Expert (23 givens)           350.124    350.124 ✓      40.707     40.707     8.6x faster ✓     164.350    164.350     2.1x faster ✓
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Validation: Menai 4/4 ✓  |  Python (idiomatic) 4/4 ✓  |  Python (functional) 4/4 ✓

Menai benchmark results

Another angle to this is that a huge amount of effort has gone into decoupling the Menai VM from the Python runtime library. There are still code paths to improve, but Menai now has its own bigint library, custom string processing, etc. Within a week or two, I'm hoping the core library has zero Python calls, and they're solely used on the boundary between C and Python.

More UI improvements

Again, courtesy of Tharik, we have conversation that now nest child/delegate conversations underneath them. Tharik's original version highlighted some problems with the conversation management, so I reworked things. There's a sub-task running that indexes all conversations in the mindspace and can construct graph relationships. This means we can link parents and children, but we can also track shared children of forked conversations, etc.

This is very exciting as it can form the basis of a much more advanced audit trail.

Tharik also gave us a beautiful icon rail, allowing us to collapse the mindspace view. Each view is now separate but quick to view. This is a huge improvement when working on a laptop (or anything without a wide screen) - we get about 15% more usable space for tabs

Humbug v45 with the mindspace panel collapsed — With the mindspace panel collapsed, there's significantly more screen real estate available for tabs

Spoiler alert: The image shows the final state of 8 minutes of vibe-coding a Markdown to Microsoft Word .docx converter. This isn't working properly yet, but is shockingly close!

The impact of better tools

One of the last changes I made last week was to add better search capabilities within the filesystem tool. Our AIs can now search anywhere in the mindspace for files or directories. They can also regexp search, with certainty about how those searches will work.

On MacOS and Linux this means far fewer tool approval requests because Humbug no longer needs to use shell commands to find things. On Windows this is a revelation - the agentic experience is now on a par with the Unix-based systems.

One huge surprise was just how much every AI model engaged with this tool. The system prompt tells them to look for AGENTS.md files and now they all seem to start by searching for every such file in the mindspace! Checking files are correct, finding approaches to implement things, etc., are dramatically better. Using the terminal is a last resort, and used solely for custom debug scripts and running tests.

This also means Humbug can run quite a lot longer without requiring tool approvals from a human. Once Menai gains some new I/O integration tools things should change even more!

Windows experience is soooo much better!

One of the biggest frustrations I've had with Humbug is that the Windows port has always sort-of felt second class. Our AI friends appear to be trained to use the rich toolset offered by Unix-like systems, but Windows doesn't have them in the cmd.exe command shell.

Having enhanced the generic toolset, however, Windows can now do all the same agentic search and discovery. This is game-changing. Suddenly Humbug can do the same agentic discovery of code and code structures on all platforms, so fixing issues on Windows is now just as easy as Linux or MacOS.

Version 45

New features:

If an AI generates a diff hunk with @@ but with no line numbers, the diff applier will try to find with a fuzzy match.
Added support for using the C version of the Menai VM for Linux systems.
Dramatically improve the performance of the Menai VM.
The Menai VM C code is now much cleaner and decoupled from the original Python implementation.
Added keyboard scrolling of diff tabs.
Double-clicking an ephemeral tab label will now make the tab persistent.
Find controls now have history buffers, so up and down arrow will navigate previous searches.
Conversations now track the relationships between parent conversations and child conversations, so the mindspace conversations view now groups conversations under each other.
When opening an editor tab from a diff tab, the editor tab will be scrolled to the same position being shown in the diff.
The mindspace panel can be expanded and collapsed. Also the panel shows just one view at a time, giving more room to see what's there.
If you change a file extension when renaming a file, you will now be prompted to confirm that's what you want to do.
Adding "thinking" support to the DeepSeek backend.
Updated the DeepSeek models to v4 (flash and pro).
Updated the OpenAI models to include GPT-5.5 and updated nano and mini models to GPT-5.4.
Updated the Ollama Qwen 3.5 models to 3.6.

Bug fixes:

Tighten guidance on regexp syntax to AI models.
Despite instructions, an AI may use the wrong alternation format in regexps. If we get no results using an escaped form, but get results when it's unescaped, the tool will now return the results and warning that it had to remove the escaping.
If you highlight text and then open the find control in a diff tab it now uses the selected text as the initial search option.
Enabled keyboard shortcuts for "copy" operations in diff tabs.
Resolved a problem where find positions could be lost when hiding and unhiding the find bar in a tab.
If you attempt to drag a file that cannot be opened then an exception would trigger and the exception canary would activate. Now this action will trigger a popup message box.
Fixed a problem in the C and C++ syntax highlighters when preprocessor lines are continued with \.
Fixed menu and dialog edit control scaling problem on Ubuntu under WSL2 on Windows 11.
Fixed bugs in the Menai and Scheme syntax highlighting.
Added tests for the Scheme syntax highlighter.
Menai now supports #d and #D numeric literals.