About 60% of the codebase for my latest project was written with Z.ai's GLM-5.1 model. It's basically a Telegram bot that makes embedding and downloading media easier within group chats.
model
GLM-5.1
huggingface.co/zai-org/GLM-5.1 ↗
84,784 downloads · 1,206 likes · text-generation · transformers
from the model card
GLM-5.1
👋 Join our WeChat or Discord community.
📖 Check out the GLM-5.1 blog and GLM-5 Technical report.
📍 Use GLM-5.1 API services on Z.ai API Platform.
🔜 GLM-5.1 will be available on chat.z.ai in the coming days.
[Paper] [GitHub]

Introduction

GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

But the most meaningful leap goes beyond first-pass performance. Previous models, including GLM-5, tend to exhaust their repertoire early: they apply familiar techniques for quick initial gains, then plateau. Giving them more time doesn't help.

GLM-5.1, by contrast, is built to stay effective on agentic tasks over much longer horizons. We've found that the model handles ambiguous problems with better judgment and stays productive over longer sessions. It breaks complex problems down, runs experiments, reads results, and identifies blockers with real precision. By revisiting its reasoning and revising its strategy through repeated iteration, GLM-5.1 sustains optimization over hundreds of rounds and thousands of tool calls. The longer it runs, the better the result.

Benchmark
| | GLM-5.1 | GLM-5 | Qwen3.6-Plus | Minimax M2.7 | DeepSee…
discussions
recent items
Show HN: Chuddy, self-hosted media downloading, translation and OCR Telegram bot (github.com via hn)
Show HN: Grunden – Frontier AI inference hosted in Sweden, OpenAI-compatible (grunden.ai via hn) grunden.ai is a Swedish AI service for developers, government agencies, and ordinary people. GLM 5.1 (open-weight) with EU jurisdiction, an OpenAI-compatible API, and pricing in Swedish kronor.
Has anybody been able to achieve reliable agentic performance with cheap/open source models? (www.reddit.com) Basically the title. Recently I've been trying various open source and comparatively cheaper models like minimax m2.7, qwen models and glm5.1 in Pi agent from openrouter, and the performance on coding tasks have be moderately adequate at b…
How to Find Open-Source Models / Providers that Do not Train on Data (www.reddit.com) A lot of people are saying just use X, just do Y, just run Z locally, but the best models cannot be run locally (GLM 5.1). No one ever talks about privacy, but for those concerned about privacy, how do we know when we use Z AI's GLM 5.1 th…
Ask HN: Are there any good open-source chat apps? (news.ycombinator.com) Hi HN family! I've recently been messing around with open models through ollama (glm-5.1 and kimi-k2.6), and I've been impressed with just how close they are to Claude Sonnet for my needs, especially programming.
GLM 5.1 Locally: 40tps, 2000+ pp/s (www.reddit.com) After some sglang patching and countless experiments, managed to get reap-ed nvfp4 version running stable and FAST on 4 x RTX 6000 Pros (limited to 350W). Very happy with performance and quality.
Mac Studio local loadout - May 2026 (www.reddit.com) Day-to-day user vibes, not rigorous benchmarks, so YMMV. GLM 5.1 has by far been my biggest winner in the last batch of releases.
GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? (www.reddit.com) I wonder which one is better, I tested it a little bit (too slow, of course) and I'm still unsure. Does the GLM-5.1 smol-IQ2_KS loses too much?
GLM-5.1 on Mi50? (www.reddit.com) Hi, did anyone with an AMD MI50 setup (8x 32GB) test GLM-5 or GLM-5.1? Currently, I have 3x AMD MI50 and I was wondering if it's worth buying another 5 of them and a new PSU.
Group Buys for Shared Compute or Model Hosting? Is this a thing? (www.reddit.com) I've been using GLM 5.1 a lot lately, and I love this model. However I don't love sending all my requests to China.
PP speed on dual RTX 6000 12c EPYC setup (www.reddit.com) I want to run big models like GLM 5.1 or Kimi k2.6. I can buy Mac Studio M3 Ultra with 512gb ram, but PP speed would be ofc bad.
Built a self-hosted agent for small businesses that writes its own skills. ~$0.15 per customer booking on GLM-5.1 (www.reddit.com) Been working on this for a while and finally at a point where it's running in production for a couple of small businesses, so figured I'd share. The thing that kept bugging me about "AI employee" products is that none of them are something…
llama.cpp / ik_llama MoE Expert Offloading - Main Memory Bandwidth vs. PCIe Bandwidth (www.reddit.com)
Capacity vs Speed trade-off: 1.1TB Mac Unified Memory vs. RTX 6000 Pros (www.reddit.com) I'm usually a Windows person, but I’m currently running a Mac cluster for local LLM orchestration. My setup consists of four 256GB Mac Studios plus one 96GB Mac Studio, giving me about 1.1TB of unified memory.
Local GLM 5.1 - Parkour! (www.reddit.com) Some more 'sloptuber' content for those who are enjoying it :) Model: unsloth glm 5.1 @ IQ2_XXS UD Prompt 1: Task: in a single web page, build a city based parkour game. wsad controls, moving player aligned with current camera direction.
What's the best GPU cluster/configuration 30k $ can buy? (www.reddit.com) Edit: I’m getting the consensus is that the budget I suggested is not enough for my lil ambitious project. I’d like to reshape the question for the upcoming comments: what’s the minimal budget to achieve my goal?
Ollama Cloud Pro ($20/mo) vs OpenAI Plus ($23/mo). Which gives more tokens ? (www.reddit.com) Hey everyone, I'm comparing these two plans side by side for running AI agents daily through OpenClaw (self-hosted AI agent platform): • Ollama Cloud Pro — $20/month • OpenAI Plus — €23/month (~$25) My setup: 3 agents running in parallel (…
I got better results when I made each AI tool do one job (www.reddit.com) I spent too much time trying to find one AI dev tool that could do everything. Planning, coding, fixing, reviewing, maybe filing my taxes too It never really worked.
Model API Performance (news.ycombinator.com) We’ve been benchmarking a few models on our API platform and got some interesting performance numbers: - MiniMax M2.5 → 0.118s time-to-first-token, 103 tokens/sec - GLM 5.1 → 120 tokens/sec throughput - Kimi K2.5 → 0.643s TTFT, 69 tokens/s…
Claude Code with Pro subscription + OpenRouter in parallel — what's the cleanest setup? (www.reddit.com) Hi there, I have a Claude Pro subscription and use Claude Code daily. I'd also like to use Claude Code routed through my OpenRouter API key so I can experiment with other models (GLM-5.1, DeepSeek, Kimi, Gemini, etc.) — without giving up m…
Best app to use Nvidia Nim? (www.reddit.com)
GLM 5.1 is so smart! (via reddit)
FREE Claude Code alternative using GLM 5.1 + VS Code (tutorial) (www.reddit.com) https://youtu.be/tL3cOdgukt8
What Am I Doing Wrong? Models Won't Listen, At All (GLM 5.1, MiniMax M2.7, Kimi K2.5) (www.reddit.com) What am I doing wrong here? I can't get models to follow my instructions, pretty much at all.