Something keeps nagging at me about the Chinese AI space lately. Every few months a new Chinese model drops that closes the gap with US frontier models a little more (not by throwing more compute at it, just genuinely clever engineering at…
#glm
118 items
Chinese AI companies are shipping faster and cheaper than anyone expected and I'm not sure the West has a good answer for it (www.reddit.com)
Major drop in intelligence across most major models. (www.reddit.com) As of mid-Apr 2026, I have noticed every model has had a major intelligence drop. And no, I'm not talking about just ChatGPT.
2x 512gb ram M3 Ultra mac studios (www.reddit.com)
Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild (www.reddit.com) Been following the infrastructure side of AI more lately and stumbled on this from Zai. They upgraded the network architecture on a thousand-GPU cluster running GLM-5.1 coding inference from the standard ROFT setup to something they built…
Recent Open models from last 6 Months - Nov 2025 - Apr 2026 (www.reddit.com) I created this chart with recent open models from the last 6 months. A few might be slightly older than that.
Do you guys think there’s a high chance of Singularity being open source? (www.reddit.com) GLM 5.1 is dominant in almost every aspect in Design Arena, surpassing Opus 4.6 in many tasks. Although user experiences vary depending on the subscription plan for each, one of them is open source.
(Interactive) OpenCode Racing Game Comparison Qwen3.6 35B vs Qwen3.5 122B vs Qwen3.5 27B vs Qwen3.5 4B vs Gemma 4 31B vs Gemma 4 26B vs Qwen3 Coder Next vs GLM 4.7 Flash (www.reddit.com)
Minimax M2.5 vs. GLM-5 vs. Kimi k2.5: How do they compare to Codex and Claude for coding? (www.reddit.com)
Guys we have to change the pelican test (www.reddit.com) So I have been seeing more of those pelican-on-a-bike SVG tests, and while they work, I feel like (and maybe you guys do too) they are getting kinda benchmaxxed, so we should switch things up soon. This is my idea: generate me a html svg of…
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents (arxiv.org via hn) We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability…
ZAI might stop open-weighting their models? (www.reddit.com) Ever since the company went public, they’ve been making a lot of changes that clearly seem to be prioritizing profit without regard to their customers. For example, with their coding plans: - They promised/advertised that the Lite coding p…
Running gpt and glm-5.1 side by side. Honestly can’t tell the difference (www.reddit.com) So I have been running gpt and glm-5.1 side by side lately and tbh the gap is way smaller than what I'm paying for. On SWE-Bench Pro, glm-5.1 actually took the top spot globally, beating gpt-5.4 and opus 4.6. Overall coding score is like 55 vs g…
Abliterlitics: Benchmarks and Tensor Comparison for Heretic, Abliterlix, Huiui, HauhauCS for GLM 4.7 Flash (www.reddit.com) This is a follow up to the previous benchmark and tensor analysis of abliteration techniques across the Qwen model family. Same approach, same toolkit, new model family.
The pacman benchmark: finally a viable local agentic coding agent with Qwen 3.6 27b (www.reddit.com) One way I like to test new models is by one-shotting (with a good prompt) a single-webpage clone of the classic arcade game Pac-Man. I usually do 3 attempts and keep the best one.
Tested how OpenCode Works with Self-Hosted LLMs: Qwen 3.5, 3.6, Gemma 4, Nemotron 3, GLM-4.7 Flash - v2 (www.reddit.com) I have run two tests on each LLM with OpenCode to check their basic readiness and convenience: - Create IndexNow CLI in Golang (Easy Task) and - Create Migration Map for a website following SiteStructure Strategy (Complex Task). Tested Qwe…
Is Qwen3.6 current king for local agentic use? (www.reddit.com) I've been testing other models but it seems like nothing even comes close to Qwen3.6 35B A3B for agentic use. The worst I'd get is a loop sometimes, while Gemma4 produced broken tool calls occasionally and I couldn't even get GLM 4.7 Flash…
Single question llm comparison (www.reddit.com)
Kimi K2.6-Code-Preview, Opus 4.7, GLM 5.1, Minimax M2.7 and more tested in coding (www.reddit.com) Hi everyone. It's been a while since I posted (was a lil burned out), but some of you may have seen my older SanityHarness posts.
I expanded DystopiaBench to 42 models and 6 dystopia types. Claude is still the only one I'd trust with nuclear codes. (www.reddit.com) Since the last post I've added: Huxley module (Brave New World style behavioral conditioning); Baudrillard module (synthetic intimacy, trust collapse, simulation); 30 more models including Grok 4.3, GPT-5.5, Gemini 3.1 Pro, GLM-5.1 Multi-jud…
GLM 5.1 Locally: 40tps, 2000+ pp/s (www.reddit.com) After some sglang patching and countless experiments, managed to get reap-ed nvfp4 version running stable and FAST on 4 x RTX 6000 Pros (limited to 350W). Very happy with performance and quality.
GPT 5.5 (Codex) leading the future prediction race (www.reddit.com) Researchers from the Max Planck Institute recently released FutureSim, an environment that replays a temporal slice of the web to agents, which are tasked with predicting real-world future events. In their environment, GPT 5.5 leads at…
Your local LLM predictions and hopes for May 2026 (www.reddit.com) Which of these do you think we'll get in May? Also, feel free to pick/rank which ones you'd want the most badly: more Gemma4 models (124b?) (other sizes?) more Qwen3.6 models (9b?
Comparing GPT-5.4, Opus 4.6, GLM-5.1, Kimi K2.5, MiMo V2 Pro and MiniMax M2.7 (www.codejam.info via hn)
Local GLM 5.1 - Parkour! (www.reddit.com) Some more 'sloptuber' content for those who are enjoying it :) Model: unsloth glm 5.1 @ IQ2_XXS UD Prompt 1: Task: in a single web page, build a city based parkour game. wsad controls, moving player aligned with current camera direction.
Ollama Cloud Pro ($20/mo) vs OpenAI Plus ($23/mo). Which gives more tokens? (www.reddit.com) Hey everyone, I'm comparing these two plans side by side for running AI agents daily through OpenClaw (self-hosted AI agent platform): • Ollama Cloud Pro: $20/month • OpenAI Plus: €23/month (~$25) My setup: 3 agents running in parallel (…
do you use different models for different steps in your agent, or just one for everything? (www.reddit.com) Our dev team flagged last week that xAI is retiring grok 4.1 fast. We weren't using it for anything critical but it made me ask something I'd never actually asked: how did we pick the models we're running?
DeepSeek's 10T USD grand strategy (twitter.com via hn) Have you ever wondered how DeepSeek might make money, and a lot of it? They didn't come up with competitive coding plans like GLM, Moonshot and MiniMax did.
Tips for using Composer 2? New to Cursor (www.reddit.com) Hi. I'm new to using Cursor, coming from Claude Code, Antigravity and most recently the GLM coding plan.
Anyone tried +- 100B models locally with foreign languages? (www.reddit.com) I am quite curious as I tried Gemma 4 31B, Qwen 3.6 27B, GLM 4.7 30B and some others in my native language (Czech). Gemma performs "best", and considering it's "just" an 18GB model, it actually blows my mind how well it can respond in…
Scaling Pain of Coding Agent Serving: Lessons from Debugging GLM-5 at Scale (z.ai via hn) Our belief in Scaling Laws has not only driven continuous breakthroughs in model parameters and data scale, but has also pushed infrastructure engineering toward its limits. This process inevitably comes with growing pains, which we refer…
Used a Claude Code skill to fine-tune Qwen3-1.7B from 327 noisy traces, matches GLM-5 (www.reddit.com) Had 327 production traces from a restaurant-reservation agent I wanted to retrain. The plan was to fine-tune a smaller self-hostable model so I could ditch the frontier-API bill.
I built a local GUI for the TradingAgents framework — works with Ollama (www.reddit.com) A while back I came across TradingAgents — a really cool multi-agent LLM stock analysis framework where like a dozen "agen…
Update to the LLM Debate Benchmark: GPT-5.5, Grok 4.3, DeepSeek V4 Pro, GLM-5.1, Kimi K2.6, Qwen 3.6 Max Preview, Xiaomi MiMo V2.5 Pro, Tencent Hy3 Preview, and Mistral Medium 3.5 High Reasoning added (www.reddit.com) The benchmark uses adversarial, multi-turn debates across 683 curated motions. Each model pair debates the same motion twice with sides swapped.
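The post's side-swapping rule ensures no model gains an advantage from always arguing the easier side of a motion. Below is a minimal sketch of that rule. The function and the model and motion names are hypothetical placeholders, not the benchmark's actual code. It also assumes every pair meets on every motion, which the post does not state.

```python
from itertools import combinations

def debate_schedule(models, motions):
    """Each model pair debates each motion twice, once on each side."""
    schedule = []
    for a, b in combinations(models, 2):  # every unordered pair once
        for motion in motions:
            schedule.append((motion, a, b))  # a argues "for", b "against"
            schedule.append((motion, b, a))  # same motion, sides swapped
    return schedule

models = ["model_a", "model_b", "model_c"]  # hypothetical names
motions = ["motion_1", "motion_2"]          # hypothetical motions
sched = debate_schedule(models, motions)
print(len(sched))  # 3 pairs * 2 motions * 2 sides = 12
```

With this scheme, any side-of-motion bias cancels within each pair, so a pair's net score reflects debating skill rather than which side it drew.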
Current state of open-source? (www.reddit.com) I’m trying to understand the current open-source LLM landscape beyond surface-level hype. We've all gotten used to the nerfed products of Claude/Gemini, so I really believe in open source as a solution.
llama.cpp / ik_llama MoE Expert Offloading - Main Memory Bandwidth vs. PCIe Bandwidth (www.reddit.com)
I ran GLM-5.1 on a 16GB RAM machine (github.com via hn) 🧠 MoE-on-a-Potato Running a 754-Billion Parameter LLM on a 16GB RAM Consumer PC "Saying it's impossible is not engineering. Saying we don't know how yet is science." MoE-on-a-Potato is an experimental project dedicated to testing the extre…
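The main-memory vs. PCIe question in the offloading post comes down to back-of-the-envelope arithmetic. Decode is usually bandwidth-bound, so the ceiling on tokens/s is roughly bandwidth divided by the bytes of active weights read per token. Whichever path the expert weights take (the CPU reading them from RAM, or streaming them over PCIe to the GPU), that path sets the limit. This is a rough sketch only. The active-parameter count, bits per weight, and bandwidth figures below are illustrative assumptions, not measurements of GLM-5.1 or any specific machine. The estimate also ignores compute, caching, and overlap.

```python
def decode_ceiling_tps(bandwidth_gb_s, active_params_b, bits_per_weight):
    """Upper bound on tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical inputs, for illustration only.
ACTIVE_B = 32  # billions of active params per token (assumed)
BPW = 4.5      # average bits per weight after quantization (assumed)

print(round(decode_ceiling_tps(80, ACTIVE_B, BPW), 2))  # assumed ~80 GB/s system RAM -> 4.44
print(round(decode_ceiling_tps(32, ACTIVE_B, BPW), 2))  # roughly PCIe 4.0 x16 -> 1.78
```

Under these assumptions, reading experts in place from RAM beats streaming them across PCIe every token. This is why expert-offloading setups typically run the offloaded experts on the CPU rather than shuttling them to the GPU.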
Open weights GLM and Mimo are better than Gemini 3.5 flash according to arena (www.reddit.com) While we are weathering the gemini 3.5 flash hype, keep in mind that according to arena, GLM and Mimo are better. https://arena.ai/leaderboard/text/coding-no-style-control #7 GLM, #9 Mimo, #12 Gemini 3.5 Flash
cdesktop — open-source Claude Code Desktop alternative, runs locally via npx, supports any provider (www.reddit.com) I built cdesktop with Claude Code — it's an open-source alternative to Anthropic's Claude Code Desktop, running locally on your machine via npx cdesktop. Free, Apache 2.0.
Open source battle: GLM vs Kimi vs MiMo vs DeepSeek (www.youtube.com via reddit)
Show HN: Grunden – Frontier AI inference hosted in Sweden, OpenAI-compatible (grunden.ai via hn) grunden.ai is a Swedish AI service for developers, government agencies and ordinary people. GLM 5.1 (open-weight) with EU jurisdiction, an OpenAI-compatible API and pricing in kronor.
Ran K2.6 through a third-party coding benchmark: here's how the figures stand up (www.reddit.com) I have been following the akitaonrails coding benchmark, which tests against a fixed Rails + RubyLLM + Docker task rather than vendor-reported evals. The April 2026 update put K2.6 at 87, sitting in tier A (80+), ahead of Qwen 3.6 plus (71), Dee…
Just got a beast. (www.reddit.com) 1.5TB RAM with 128GB VRAM and a 28-core processor. Mac Pro 2019.
Capacity vs Speed trade-off: 1.1TB Mac Unified Memory vs. RTX 6000 Pros (www.reddit.com) I'm usually a Windows person, but I’m currently running a Mac cluster for local LLM orchestration. My setup consists of four 256GB Mac Studios plus one 96GB Mac Studio, giving me about 1.1TB of unified memory.
What's the best GPU cluster/configuration $30k can buy? (www.reddit.com) Edit: I’m getting the consensus that the budget I suggested is not enough for my lil ambitious project. I’d like to reshape the question for the upcoming comments: what’s the minimal budget to achieve my goal?
Does GLM-4.7 Flash Q4_K_M have problems with Claude or agents? (www.reddit.com) I'm brand new to local LLMs and started with GLM-4.7 Flash q4_K_M. When I run it directly: ollama run glm-4.7-flash:q4_K_M it works pretty decently: nothing amazing, but usable and responsive.
I got better results when I made each AI tool do one job (www.reddit.com) I spent too much time trying to find one AI dev tool that could do everything. Planning, coding, fixing, reviewing, maybe filing my taxes too. It never really worked.
What's the current best code autocomplete LLM for local deployment (as of April 2026)? (www.reddit.com) I know this question has already been asked a thousand times, probably, but... what's the best or close-to-best model I can use with Continue for local IDE-like code autocomplete?
Show HN: Free open source coding models in Slack (www.runcord.com via hn) Hey HN, We believe we have the easiest onboarding from signup to being able to spin up coding agents in slack like Stripe, Ramp & Coinbase. Demo of the onboarding: https://www.tella.tv/video/connecting-cord-to-slack-1-19ep Every signup get…
Show HN: Chuddy, self-hosted media downloading, translation and OCR Telegram bot (github.com via hn) My latest project, about 60% of the codebase was written with Z.ai's GLM-5.1 model. It's basically a Telegram bot that makes embedding/downloading media easier within group chats.
What’s going on with GLM? Are they scamming or what? (www.reddit.com) I have a GLM subscription that’s marketed as offering 3× higher usage than Claude Pro. I primarily use it through Claude Code CLI as a backup coding model.
Chinese AI Coding Plan (www.reddit.com) With the lowered usage limits in Claude, I am thinking of jumping ship to Chinese AI, since the benchmarks are already very close to Sonnet or Haiku 4.5, but for a fraction of the price. I am not worried about where is my data endin…
Tested the four newest open-source models: Kimi K2.6 is the fastest, GLM 5.1 the fanciest, DeepSeek V4 the most comprehensive, and Xiaomi MiMo the slowest (www.reddit.com) Architecture explains the gap: MiMo's MoE runs more active params per token than Kimi K2.6's optimized routing, hence it's the slowest. DeepSeek V4's 'comprehensive' edge is partly MLA: ~75% KV-cache compression makes it far better for long agentic…
Why is no open weight model inference provider hosting Mimo-v2.5 or Mimo-v2.5-pro? (www.reddit.com) Literally no 3rd party api inference provider is hosting the mimo-2.5 series models from Xiaomi. They seem to be really good.
Who else thinks AI is reaching a plateau? (www.reddit.com) I must say that I feel almost no difference in all of the latest models that are coming out. Opus 4.7 is almost equal to 4.6 and 4.5, same with the other GPT models, the Kimi K models and the GLM models they all I feel they’re almost all…
GLM-5.1 on Mi50? (www.reddit.com) Hi, did anyone with an AMD MI50 setup (8x 32GB) test GLM-5 or GLM-5.1? Currently, I have 3x AMD MI50 and I was wondering if it's worth buying another 5 of them and a new PSU.
Ask HN: Are there any good open-source chat apps? (news.ycombinator.com) Hi HN family! I've recently been messing around with open models through ollama (glm-5.1 and kimi-k2.6), and I've been impressed with just how close they are to Claude Sonnet for my needs, especially programming.
3 of TIME's top 10 AI companies are Chinese and I only knew one by name (www.reddit.com) I code for a living, close to 7 years now, and I read way too much tech news. TIME dropped their 2026 most influential AI companies list and going through it I see OpenAI, Anthropic, Google, Meta, Amazon, then Zhipu AI sitting right there…
Open Source Company Coding Plans (www.reddit.com) I’ve been looking to buy a coding plan from one of the major open source contributors to give my meager support to them and transition away from Claude. I would love to hear some feedback from the community of their experience with some of…
I'm Not a Dev But I Use Qwen 3.6 35b to Code (www.reddit.com) Full disclosure: I used to program a bit, but I was garbage at it so I found a new career. This was eons ago so I'm not a dev, obviously.
Claude Code Uses GLM 4.7 (old.reddit.com via hn)
Cursor 3 eating GLM 5.1 usage (www.reddit.com) Hello all just as it sounds. I recently started using GLM 5.1 in cursor 3 but unlike in the past, GLM 5.1 ran through my entire daily budget from summarizing chat context and running commands.
Show HN: LimitPing – Keep Claude Code and Codex rate-limit windows continuous (github.com via hn) CCLimitPing (limitping) Keep your Claude Code, Codex, and GLM (Zhipu / Z.ai Coding Plan) rate-limit windows back-to-back. These providers bill on a 5-hour rolling window (plus a weekly cap), and the 5h window starts on your fi…
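The LimitPing summary is cut off before it says exactly when a 5-hour window opens. Below is a minimal sketch of the back-to-back idea, under one loudly stated assumption: a window opens at the first request made while no window is active, and lasts five hours. The helper names are hypothetical, the weekly cap is not modeled, and none of this is LimitPing's actual code.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # assumed window length

def window_starts(request_times):
    """Assumed model: a request made when no window is active opens a new
    window that stays open for WINDOW."""
    starts = []
    for t in sorted(request_times):
        if not starts or t >= starts[-1] + WINDOW:
            starts.append(t)
    return starts

def keepalive_pings(first_request, until):
    """Hypothetical schedule: ping exactly when each window ends so the
    next window opens immediately (windows become back-to-back)."""
    pings, t = [], first_request + WINDOW
    while t < until:
        pings.append(t)
        t += WINDOW
    return pings

day = datetime(2026, 5, 1)
requests = [day.replace(hour=9), day.replace(hour=15), day.replace(hour=16)]
print(window_starts(requests))  # windows open at 09:00 and 15:00; 14:00-15:00 is uncovered
pings = keepalive_pings(requests[0], day.replace(hour=23))
print(window_starts(requests + pings))  # windows open at 09:00, 14:00 and 19:00
```

Under this assumption, the pings trade the 14:00-15:00 gap for a continuous chain of windows. In practice, a ping that arrives slightly late would shift every later window by the same amount.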
Noob here, curious about roughly how advanced of a video game a model like Qwen3.6 27b could create, if kept fully offline, and got unlimited attempts/revisions (maybe ~1 month project time limit). Like, could it make something equivalent to Pokemon Red? Doom? Doom II? What if using GLM 5.1? (www.reddit.com) So, I got interested in local LLMs a few months ago, but, I don't have a background in coding, and I don't know how to code, and I am not good with computers or anything. So far I mainly just was having fun with comparing different local L…
Is Composer 2.5 better than Glm 5.1 and DeepSeek v4 pro in real world tasks? (www.reddit.com) I am new to Cursor and still testing the free version. Benchmark for Composer 2.5 indicates it is better than DeepSeek v4 and Glm 5.1.
When configuring a third-party AI large model on the MacBook Claude Code desktop client, an error message appears. How can this be resolved? (www.reddit.com) This is my GLM-4.6 model API configuration, and this error is really confusing me. I'm not sure which step went wrong.
Reliable Open Source LLM as a Service (www.reddit.com) Has anyone figured out a provider whose open source models (Kimi, Qwen, GLM etc.) can be used reliably in production? I have tested some well known providers and they all suffer from high latency and poor uptime rendering them mostly usel…
Vertex MaaS GLM-5 prompt cache telemetry seems inconsistent. Anyone else seeing this? (www.reddit.com) I'm testing prompt-cache behavior for GLM models on Vertex AI MaaS and I'm seeing inconsistent telemetry. I reproduced it with a synthetic long prompt and repeated identical requests.
Which Chinese Model is best for planning and which is best for implementation? I'm currently using Opencode with an Openrouter API Key, mostly wanna decide between Kimi, GLM, DeepSeek, Qwen, Minimax and Mimo (www.reddit.com) Original plan was to use Kimi/GLM for planning and DeepSeek for implementation, but seeing a lot of love for MiMo and Minimax lately. Anyone running a planner + coder split on Opencode?
Which model has fewer restrictions now? (www.reddit.com) GPT and Opus block on certain requests. This didn't use to be the case 2 months ago, and I made significant progress with Opus; then one day I had a 2 week break and then a single prompt to continue the work resulted in refusal.
Group Buys for Shared Compute or Model Hosting? Is this a thing? (www.reddit.com) I've been using GLM 5.1 a lot lately, and I love this model. However I don't love sending all my requests to China.
I plan to use a chinese AI model through API for coding through a harness, I'm a uni student so nothing prod related for now. should i go deepseek, minimax, kimi or glm? kinda confused (www.reddit.com) Just cancelled my claude subscription due to poor rate limits, gemini cli doesn't really excel in coding from my personal experience, and my local hardware isn't that powerful to run local AI models, and while codex is good, I wanna try so…
PP speed on dual RTX 6000 12c EPYC setup (www.reddit.com) I want to run big models like GLM 5.1 or Kimi k2.6. I can buy Mac Studio M3 Ultra with 512gb ram, but PP speed would be ofc bad.
Local LLM Benchmark about Backend Generation by Function Calling (GLM vs Qwen vs DeepSeek) (www.reddit.com) Detailed Article: https://autobe.dev/articles/local-llm-benchmark-about-backend-generation.html Five months ago I posted the "Hardcore function calling benchmark in backend coding agent" thread here. As I wrote in that post, it was an unco…
Built a self-hosted agent for small businesses that writes its own skills. ~$0.15 per customer booking on GLM-5.1 (www.reddit.com) Been working on this for a while and finally at a point where it's running in production for a couple of small businesses, so figured I'd share. The thing that kept bugging me about "AI employee" products is that none of them are something…
Received a message from Z.AI about occasional garbled outputs and unexpected behavior (www.reddit.com) I received this mail: "Hi developers, Some of you flagged occasional garbled outputs and unexpected behavior when building with the GLM-5 series, especially under heavy workloads. We heard you, reproduced the issues, and the fixes are now…
Comparing SVG Generation for the top open models (codeinput.com via reddit) Some of the larger models (like Llama) weren't available on OpenRouter, so I had to work with what was there. Best small model: Gemma 4 26B For its size, I think it had the best output.
Best value in the 20$ range coding agents? I want the best quality and high-usage-limit I can get at that price. (www.reddit.com) I'm a compsci student and I've been using the 10$ copilot plan for about 2 years now, and it was fine for me since I did a good model distribution taking into account the complexity of the task, I was able to get through the month always u…
anyone actually tried deepseek v4 pro for coding? (www.reddit.com) so v4 pro dropped and barely anyone is talking about it. Feels weird, since when kimi k2.6 came out I saw posts about it everywhere. Anyone here tried v4 pro for actual code work?
Qwen 3.5 397b and GLM 5.1 Opus fine tune (www.reddit.com) Hi all. Many models on hugging face have been fine tuned with that 3000x opus dataset, but the two I mentioned in the title are missing it.
Best app to use Nvidia Nim? (www.reddit.com)
Show HN: RepoGauge – save token costs and compare agents on your own repos (repogauge.org via hn) I've grown increasingly skeptical that public coding benchmarks tell me much about which model is actually worth paying for and worried that as demand continues to spike model providers will silently drop performance. I did a few manual an…
Minimax vs Qwen vs Kimi vs Mimo(Omni) vs Glm (via reddit)
Upgrade paths for my 256g ddr4 ram + 4x24g vram system (www.reddit.com) So I was just about to give up playing with local models, until I realised I can actually run GLM 5.1 at not too horrible speeds, using this quant https://huggingface.co/ubergarm/GLM-5.1-GGUF/tree/main/IQ2_KL in ik llama. Getting around 6.…
Which AI model is best for real data analysis? [benchmark] (www.reddit.com) I created and ran a benchmark for AI models on data analysis tasks. Unlike other benchmarks, it is not a one-prompt benchmark; instead, I tried to simulate the real work of a data analyst.
Model API Performance (news.ycombinator.com) We’ve been benchmarking a few models on our API platform and got some interesting performance numbers: - MiniMax M2.5 → 0.118s time-to-first-token, 103 tokens/sec - GLM 5.1 → 120 tokens/sec throughput - Kimi K2.5 → 0.643s TTFT, 69 tokens/s…
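TTFT and throughput only tell you how long a whole response takes once they are combined. A rough estimate is TTFT plus output tokens divided by decode speed. This sketch assumes the quoted tokens/sec is a steady per-request decode rate, and picks a 1,000-token reply as an arbitrary example. GLM 5.1's TTFT is not visible in the truncated summary, so it is left out.

```python
def est_response_seconds(ttft_s, tokens_per_s, output_tokens):
    """Rough end-to-end latency: time to first token plus steady-state decode."""
    return ttft_s + output_tokens / tokens_per_s

# TTFT and throughput as quoted in the post; 1,000 output tokens is an assumed example.
for name, ttft, tps in [("MiniMax M2.5", 0.118, 103), ("Kimi K2.5", 0.643, 69)]:
    print(name, round(est_response_seconds(ttft, tps, 1000), 1))
# MiniMax M2.5 9.8
# Kimi K2.5 15.1
```

For long agentic outputs, the decode rate dominates. TTFT matters most for short, chatty turns.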
What Am I Doing Wrong? Models Won't Listen, At All (GLM 5.1, MiniMax M2.7, Kimi K2.5) (www.reddit.com) What am I doing wrong here? I can't get models to follow my instructions, pretty much at all.
Would you pay for Chinese AI models if the quality was close enough? (www.reddit.com via reddit) DeepSeek, Qwen, and GLM aren't necessarily winning every benchmark. But they don't need to.
GLM-5.1 and Kimi K2.6 THE CHEAPEST WAY TO RUN (www.reddit.com via reddit) Guys how to run it as cheap as possible to get at least 15-20 t/s? Asking for a friend!
Dynamic Workflows With External Models and Max Plan? (www.reddit.com via reddit) Has anyone figured out a way to mix max plan with models from other providers (like GLM or Deepseek) while using dynamic workflows? I suppose we could create a passthrough proxy and route sonnet and haiku to other models?
Z.ai, we need Air! GLM GGUF wen? (www.reddit.com via reddit) First we never saw an upgraded Air model after 4.5. Then GLM 4.7 Turbo was great, but quickly surpassed for coding.
Fuck, successfully ran a minecraft server on GLM AI's Agent lol. (www.reddit.com via reddit) I just told it, make a minecraft server and let me play, and it worked lol. I just asked "host a minecraft server so I can play" and it did host it, made me a dashboard and it's crazyyyyy lol. It is hosted in Hong Kong somewhere TwT
Went to the monthly AI dev meetup (www.reddit.com) Usual crowd. Everyone's on Claude or Codex, nobody's really sure how any of it actually works, and that's fine, that's the vibe.
Some tests with qwen3.6 27b + 35b a3b about MTP vs ngram-mod (www.reddit.com) I will try to keep this short ;) I used GLM 5.1 to vibecode a vague prompt on my vibecoded react web app and have GLM 5.1 rank the plans made with each other and the one it made itself. Test strategy: - use starter prompt as always - add v…
OCR: what is the best way to extract data in JSON format from this old French book? (www.reddit.com) As some of you may have guessed, what we have here is an old Bible. I would like to extract the following information from the page: { verse: number, verse_content: string, comments: string[] } I've played around with PaddleOCR a bit; I co…
How to Find Open-Source Models / Providers that Do not Train on Data (www.reddit.com) A lot of people are saying just use X, just do Y, just run Z locally, but the best models cannot be run locally (GLM 5.1). No one ever talks about privacy, but for those concerned about privacy, how do we know when we use Z AI's GLM 5.1 th…
I built a 24h TPS + Intelligence Index table for Ollama Cloud models (www.reddit.com) I recently made ollamatps.com for my own model-selection workflow and thought it might be useful here too. It shows 39 Ollama cloud models sorted by average TPS over the last 24 hours, and I added the Artificial Analysis Intelligence Index…
We built Irene — an AI agent platform that actually remembers you, builds its own tools, adapts and improves as you use it (www.reddit.com) Hey r/AI_Agents — we're launching Irene today, and I want to be straight about what it is, why we built it, and where it's going. What makes Irene different Affordable with massive token limits and the latest open-source models We have gen…
Mac Studio local loadout - May 2026 (www.reddit.com) Day-to-day user vibes, not rigorous benchmarks, so YMMV. GLM 5.1 has by far been my biggest winner in the last batch of releases.
GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? (www.reddit.com) I wonder which one is better. I tested it a little bit (too slow, of course) and I'm still unsure. Does the GLM-5.1 smol-IQ2_KS lose too much?
Running 7 autonomous AI agents for 14 days. Here's what actually happens when they need to find customers. (www.reddit.com) I set up 7 AI coding agents on a VPS with automated cron sessions (2-8 per day depending on the agent). Each uses a different model: Claude Sonnet, GPT-5.4, Gemini 2.5 Pro, DeepSeek V4 Pro, Kimi K2.6, MiMo V2.5 Pro, GLM-5.1.
Does running a model (like qwen3.6-27b) on vllm or transformers use less VRAM than llama.cpp? (www.reddit.com) I have been using llama.cpp to run some models recently. For example, I've been running GLM-4.7-Flash with this command .\llama-server.exe -hf unsloth/GLM-4.7-Flash-GGUF:Q6_K_XL --alias "GLM-4.7-Flash" --host 127.0.0.1 --port 10000 --ctx-s…
Should I replace stored models? (www.reddit.com) Hello everyone, the question is easy, with the new models of deepseek, kimi, GLM and qwen, should you replace the old models with the new version? Do I lose some quality, information or performance in the process?
Has any of you already made the "doomsday" or "offgrid" knowledge base? (ofc powered with LLM) (www.reddit.com) Basically, I’m really into the idea of a fully offline setup. (Another way to say it: I’m a data hoarder.) For LLMs, I’m using uncensored models from both Western (Gemma, GPT-OSS) and Eastern ones (GLM 4.7 Flash, Qwen 35B).
Qwen 3.6 27b S2 Opus + GLM + Kimi (huggingface.co via reddit) My first time releasing a fine-tune publicly! If anyone wants to independently eval against base, that’d be awesome.
How will you scale these models (www.reddit.com) How will you scale these models, coding and overall: Deepseek v4 pro, Kimi k2.6, Mimo v2.5 pro, Glm 5.1, Qwen 3.6 plus
GLM 5.1 is so smart! (via reddit)
QClaw-4B — a 4B agent model fine-tuned for tool use and agentic workflows (www.reddit.com) QClaw-4B is a 4-billion parameter language model fine-tuned for agentic tasks and tool use, designed for use with OpenClaw-compatible agent frameworks. Despite its compact size, QClaw-4B achieves state-of-the-art results in the 4B class, m…
Best open source LLM for planning? (www.reddit.com)
The quality of GPT-5.4 is infuriatingly POOR (www.reddit.com) I got a Codex membership when GPT-5.4 launched and was getting by well enough for a while. Then I started using Claude and GLM 5.1, and my production quality improved significantly.
FREE Claude Code alternative using GLM 5.1 + VS Code (tutorial) (www.reddit.com) https://youtu.be/tL3cOdgukt8
What’s your LLM routing strategy for personal agents? (www.reddit.com) TL;DR I try to keep most traffic on very cheap models (Nano / GLM‑Flash / Qwen / MiniMax) and only escalate to stronger models for genuinely complex or reasoning‑heavy queries. I’m still actively testing this and tweaking it several times…
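The cheap-by-default, escalate-when-needed strategy in the routing post can be sketched as a simple rule-based router. The model identifiers, keyword patterns, and word-count threshold below are all hypothetical placeholders, not the poster's setup. Real routers often use a small classifier model instead of keywords.

```python
import re

CHEAP_MODEL = "cheap-model"    # placeholder for a Nano/Flash-class model
STRONG_MODEL = "strong-model"  # placeholder for a stronger reasoning model

# Hypothetical signals that a query needs heavier reasoning.
HARD_PATTERNS = [r"\bprove\b", r"\bdebug\b", r"\barchitect", r"\bstep[- ]by[- ]step\b"]

def route(query: str, max_cheap_words: int = 200) -> str:
    """Keep most traffic on the cheap model; escalate long or reasoning-heavy queries."""
    if len(query.split()) > max_cheap_words:
        return STRONG_MODEL
    if any(re.search(p, query, re.IGNORECASE) for p in HARD_PATTERNS):
        return STRONG_MODEL
    return CHEAP_MODEL

print(route("What's the weather like today?"))               # cheap-model
print(route("Can you help me debug this race condition?"))  # strong-model
```

Keeping the escalation rules in one function makes it easy to tweak them "several times" as the poster describes. It also makes it easy to log which branch fired, so you can check whether escalations were actually warranted.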
Claude Code with Pro subscription + OpenRouter in parallel — what's the cleanest setup? (www.reddit.com) Hi there, I have a Claude Pro subscription and use Claude Code daily. I'd also like to use Claude Code routed through my OpenRouter API key so I can experiment with other models (GLM-5.1, DeepSeek, Kimi, Gemini, etc.) — without giving up m…
Long context prompt help (www.reddit.com) Hi all, I'm running GLM 4.7 flash uncensored (Q8) on a 5090. I'm trying to get it to edit a short story (about 8.5k tokens, added via PDF) to add a scene.
Speed on m5 pro 48Gb (www.reddit.com) Hey guys! How would you reckon a 30-50b model would run on a 48 GBs m5 pro?
Why can't most open-source models answer this question, while most closed-source models can most of the time? (www.reddit.com) WEB SEARCH WAS ALWAYS ON!!!! Question: Calculate the precise VRAM requirement for the **KV Cache only** at the maximum context window for **DeepSeek V3.2** and **MiniMax M2.5**.
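For reference, the question above usually reduces to the standard back-of-the-envelope KV-cache formula, at batch size 1. The layer count, KV-head count, head dimension, and maximum context have to come from each model's published config; they are not given in the post. For MLA-based models such as the DeepSeek V3 family, each layer caches a compressed latent plus a decoupled RoPE key rather than per-head K and V. Any extra per-token state a model keeps (for example, for sparse-attention indexing) would add to these figures and is ignored here.

```latex
% Standard multi-head / grouped-query attention, batch size 1:
%   L = layers, n_{kv} = KV heads, d_h = head dim, T = context tokens, b = bytes per element
%   (the factor 2 counts both K and V)
M_{\text{KV}} = 2 \cdot L \cdot n_{kv} \cdot d_h \cdot T \cdot b

% Multi-head latent attention (MLA):
%   d_c = compressed KV latent dim, d_R = decoupled RoPE key dim
M_{\text{KV}}^{\text{MLA}} = L \cdot (d_c + d_R) \cdot T \cdot b
```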
GLM OCR for Arabic (www.reddit.com) So, I have been testing GLM OCR for my RAG app, but it is not working well for Arabic. It is unable to extract data from textual pages, scanned pages or even images.
GLM-5.1: Towards Long-Horizon Tasks (simonwillison.net) 7th April 2026 - Link Blog GLM-5.1: Towards Long-Horizon Tasks. Chinese AI lab Z.ai's latest model is a giant 754B parameter 1.51TB (on Hugging Face) MIT-licensed monster - the same size as their previous GLM-5 release, and sharing the sam…
Stop donating your salary to OpenAI: Why Minimax M2.5 is making GPT-5.2 Thinking look like an overpriced dinosaur for coding plans. (www.reddit.com)