Notes on the GLM-4.5 model from Z.ai, and updates to Humbug development including adding GLM-4.5 backends and removing the M6R backend.
Another day, another model
This time it's the GLM-4.5 model from Z.ai.
- Blog: https://z.ai/blog/glm-4.5
- Docs: https://docs.z.ai/guides/llm/glm-4.5
- Chat completions API: https://docs.z.ai/api-reference/llm/chat-completion
- Academic paper: https://arxiv.org/pdf/2508.06471
It appears to have very good benchmark performance, especially for an open-source model.
Humbug updates
Added 5 backends for the GLM-4.5 models from Z.ai. I've only tested against their free endpoint (glm-4.5-flash) on their public API, but the backends should also work against locally hosted versions.
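As a rough sketch of what a call to one of these backends involves (the endpoint URL and model name below are assumptions based on Z.ai's public docs, not Humbug's actual driver code):

```python
import json
import os
import urllib.request

# Assumed endpoint, per Z.ai's chat completions docs; Humbug's real
# driver wires this up through its own backend abstraction.
ZAI_CHAT_URL = "https://api.z.ai/api/paas/v4/chat/completions"


def build_request(prompt: str, model: str = "glm-4.5-flash") -> dict:
    """Build an OpenAI-style chat completions payload for GLM-4.5."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(prompt: str, api_key: str) -> str:
    """POST the request and return the assistant's reply text."""
    payload = build_request(prompt)
    req = urllib.request.Request(
        ZAI_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Hello!", api_key=os.environ["ZAI_API_KEY"]))
```

Because the API follows the familiar OpenAI-style chat completions shape, most of a new backend is boilerplate once you have a driver for a similar provider to copy from.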
Performance is a little slow, probably due to the round trip to China, but it still seems faster than DeepSeek.
I've not really had much time to evaluate its capabilities yet.
The code is based on the xAI driver, which seemed to be the closest match.
Also removed the M6R backend. It had been built as a proof-of-concept backend with a view to implementing a router service, but those concepts are now better handled generically, and other services can be implemented with custom tools.
I also realized that earlier changes meant I could simplify the thinking/response token handling, so I did!