Having added GLM 4.7 support this week, I decided to take it for a test drive. It's pretty good, but it's not Claude Sonnet 4.5.
A Lua syntax highlighter
I've worked with Lua a few times in the past, but a new project I'm looking at may end up using it. To make this easier for me, I figured I needed a Lua syntax highlighter.
GLM 4.7 did a very nice job of creating one based on the existing lexer/parser patterns. It wrote ad-hoc tests and then debugged quite a few issues it found. None of this required my intervention. So far so good!
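For a flavour of the kind of lexer pattern involved, here's a minimal sketch of a regex-driven Lua tokenizer in Python. It's illustrative only, not the project's actual code, and the token names are my own:

```python
import re

# Illustrative token classes for a Lua highlighter; each is tried in
# order, so comments win over the '-' operator, keywords over identifiers.
TOKEN_SPEC = [
    ("COMMENT", r"--\[\[.*?\]\]|--[^\n]*"),
    ("STRING",  r"\"(?:\\.|[^\"\\])*\"|'(?:\\.|[^'\\])*'"),
    ("NUMBER",  r"0[xX][0-9a-fA-F]+|\d+(?:\.\d*)?(?:[eE][+-]?\d+)?"),
    ("KEYWORD", r"\b(?:and|break|do|else|elseif|end|false|for|function|"
                r"goto|if|in|local|nil|not|or|repeat|return|then|true|"
                r"until|while)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"\.\.\.|\.\.|==|~=|<=|>=|//|::|[-+*/%^#<>=(){}\[\];:,.]"),
    ("SPACE",   r"\s+"),
]

MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC),
                    re.DOTALL)

def tokenize(source):
    """Yield (token_type, text) pairs for a Lua source string."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SPACE":  # highlighters usually skip whitespace
            yield (match.lastgroup, match.group())
        pos = match.end()

if __name__ == "__main__":
    for token in tokenize("local x = 42 -- the answer"):
        print(token)
```

Run on `local x = 42 -- the answer`, this yields a keyword, an identifier, an operator, a number, and a comment, which is essentially all a highlighter needs to colour code.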
Having got something that provisionally worked, I asked GLM to build unit tests. Again, it evaluated the existing test structure, proposed a design, and wrote about 100 test cases. This was also pretty good. Unfortunately, about a third of the tests failed immediately.
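For a sense of what tests like these look like, here's a hypothetical example built on the sketch above (assuming it's saved as lua_lexer.py); the real suite follows the project's own test structure:

```python
import unittest
from lua_lexer import tokenize  # the illustrative sketch above

class LuaLexerTests(unittest.TestCase):
    def test_keywords_and_identifiers(self):
        # Each test asserts the exact token stream for a small snippet.
        tokens = list(tokenize("local x = nil"))
        self.assertEqual(tokens, [
            ("KEYWORD", "local"),
            ("IDENT", "x"),
            ("OP", "="),
            ("KEYWORD", "nil"),
        ])

    def test_line_comment(self):
        tokens = list(tokenize("-- just a comment"))
        self.assertEqual(tokens, [("COMMENT", "-- just a comment")])

if __name__ == "__main__":
    unittest.main()
```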
Trying to get GLM to fix the failing tests took us down a rabbit hole. It's possible this was down to the context filling up (at around 100k tokens), but it started to do a few strange things and kept repeating the same mistakes. I've seen this with other LLMs, so I might just have been unlucky this time. In the end, however, I switched to Claude Sonnet 4.5 and it fixed all the test issues (albeit with a fresh context).
Writing tests for the Python syntax highlighter
I gave GLM another test, this time writing tests for the Python syntax highlighter. It again started well, but after it had created the first 5 or 6 files it made a mistake in a tool call. Once that happened, it went off into the weeds: as soon as a broken tool call is in the chat history, the LLM can't unsee it (the "don't think about elephants" problem again).
Again, Claude did the whole thing and debugged it all in a single session.
It's not all bad
While this might sound a little negative, it's not all bad. GLM has actually done a really nice job adding pages to my blog site!