#minimax
110 items
Ryan Lee from MiniMax posts article on the license stating it's mostly for API providers that did a poor job serving M2.1/M2.5 and may update the license for regular users! (www.reddit.com)
Open Models - April 2026 - One of the best months of all time for Local LLMs? (www.reddit.com) Any underrated or overlooked models? FYI MiniMax-M2.7 switched their license (from MIT to Non-Commercial) so it's not in graph.
Minimax M2.5 vs. GLM-5 vs. Kimi k2.5: How do they compare to Codex and Claude for coding? (www.reddit.com)
Guys we have to change the pelican test (www.reddit.com) So i have been seeing more of those pelican on a bike svg tests and while they work i feel like (and maybe you guys do too) they are getting kinda benchmaxxed so we should switch things up soon and this is my idea generate me a html svg of…
MiniMax m2.7 under 64gb for Macs - 91% MMLU (www.reddit.com) https://huggingface.co/JANGQ-AI/MiniMax-M2.7-JANGTQ Used TQ as quantization method where it matters. Finally mac users under 64 gb - esp base m5 users can get a real cloud SOTA-like level LLM running from home.
Update LICENSE · MiniMaxAI/MiniMax-M2.7 at edf8030 (huggingface.co via reddit) RyanLee's(MiniMax) recent tweets for same. I just updated our license.
Dual dgx spark (Asus GX10) MiniMax M2.7 results (www.reddit.com)
MiniMax released MMX-CLI: one CLI for text, image, video, speech, music, vision, and web search — no MCP server needed. Works natively in Claude Code, Cursor, OpenClaw. (www.reddit.com) MiniMax just open-sourced MMX-CLI, a command-line tool built specifically for AI agents. Seven command groups: mmx text, mmx image, mmx video, mmx speech, mmx music, mmx vision, mmx search.
MiniMax M2.7 GGUF Investigation, Fixes, Benchmarks (www.reddit.com) Hey r/LocalLLaMA, we did an investigation into MiniMax-M2.7 GGUF causing NaNs on perplexity. Our findings show the issue affects 21%-38% of all GGUFs on Hugging Face (not just ours).
My first impressions of Minimax M2.7 (Q5_K_M) vs Qwen 3.5 27b (Q8_0) (www.reddit.com) I'm not sure if AesSedai's Q5_K_M version of Minimax M2.7 is too lobotomized or if the model itself is kind of weak. I did a simple experiment with both models running with the recommended parameters.
Those of you running minimax 2.7 locally, how are you feeling about it? (www.reddit.com) I'm running the raw version straight from the minimax release on hugging face (https://huggingface.co/MiniMaxAI/MiniMax-M2.7) on 3 rtx pro 6000's on vllm. So no quantization.
Luce DFlash + PFlash on AMD Strix Halo: Qwen3.6-27B at 2.23x decode and 3.05x prefill vs llama.cpp HIP (www.reddit.com) Hey fellow Llamas, keeping it short. We just shipped DFlash and PFlash support for the AMD Ryzen AI MAX+ 395 iGPU (gfx1151, Strix Halo, 128 GiB unified memory).
Running Minimax 2.7 at 100k context on strix halo (www.reddit.com) Just wanted to share because it took me a lot of tweaking to get here: llama-server -hf unsloth/MiniMax-M2.7-GGUF:UD-IQ3_XXS --temp 1.0 --top-k 40 --top-p 0.95 --host 0.0.0.0 --port 8080 -c 100000 -fa on -ngl 999 --no-context-shift -fit of…
2x Asus Ascent GX10 - MiniMax M2.7 AWQ - cloud providers are dead to me (www.reddit.com) Hello, I've been on a quest to get something "close enough" of Opus 4.5 running locally, for agentic coding, as SWE with 15 years of experience. I tried with one spark (yeah I'm calling my Asus Ascent GX10 sparks - they're the same), with…
A 1-bit quant of MiniMax 2.7 that runs from a CD at 1500 tk/s would be nice. (www.reddit.com) Badda Boom.
Single question llm comparison (www.reddit.com)
MiniMax M2.7 AWQ-4bit on 2x Spark vs 2x RTX 6000 96GB - performance and energy efficiency (www.reddit.com) Hello, This model/quant is my daily driver and I wanted to have some reference benchs for comparing my setup with a 3x more expensive and 4x time power hungry setup. Results first, methodology after, link at the end with all results Model:…
Kimi K2.6-Code-Preview, Opus 4.7, GLM 5.1, Minimax M2.7 and more tested in coding (www.reddit.com) Hi everyone. It's been a while since I posted (was a lil burned out), but some of you may have seen my older SanityHarness posts.
Bench 8xMI50 MiniMax M2.7 AWQ @ 64 tok/s peak (vllm-gfx906-mobydick) (www.reddit.com) Inference engine used (vllm fork): https://github.com/ai-infos/vllm-gfx906-mobydick/tree/main Huggingface Quants used: cyankiwi/MiniMax-M2.7-AWQ-4bit Relevant commands to run: docker run -it --name vllm-gfx906-mobydick-mixa3607 -v ~/llm/mo…
I Made LLMs Play Texas Hold’em. The Smallest Model Beat a ~1T Model by Being Too Dumb to Fold (www.reddit.com) Made LLMs play Texas Hold’em against each other. 6 models at the table: a tiny 1.2B running locally on my 16GB MacBook, a couple mid-size ones, and cloud models going up to about 1 trillion parameters.
MiniMax M2.7 ultra uncensored heretic is Out Now with 4/100 Refusals, Available in Safetensors and GGUFs Formats! (www.reddit.com) llmfan46/MiniMax-M2.7-BF16-ultra-uncensored-heretic: https://huggingface.co/llmfan46/MiniMax-M2.7-BF16-ultra-uncensored-heretic llmfan46/MiniMax-M2.7-ultra-uncensored-heretic-GGUF: https://huggingface.co/llmfan46/MiniMax-M2.7-ultra-uncenso…
Updated Minimax m2.7 still doesn't allow coding a product. But before the next riot starts, Ryan Lee has already confirmed that they are still working on the license, and sale of products built by m2.7 is permitted. (www.reddit.com)
Pushing the limit: minimax m2.7 q8_0 128k on 2x3090, 256GB DDR4 (www.reddit.com) CPU is just a secondhand 10900x. Using 128k context, unquantized kv cache.
Your local LLM predictions and hopes for May 2026 (www.reddit.com) Which of these do you think we'll get in May? Also, feel free to pick/rank which ones you'd want the most badly: more Gemma4 models (124b?) (other sizes?) more Qwen3.6 models (9b?
Comparing GPT-5.4, Opus 4.6, GLM-5.1, Kimi K2.5, MiMo V2 Pro and MiniMax M2.7 (www.codejam.info via hn)
DeepSeek's 10T USD grand strategy (twitter.com via hn) Have you ever wondered, how DeepSeek may make money, and lot of it? They didn't come up with competitive coding plans like GLM, MoonShot and MiniMax.
Testing MiMo-V2.5-IQ3_S with 1'048'576 context (www.reddit.com) llama-server.exe --model "H:\gptmodel\AesSedai\MiMo-V2.5-GGUF\MiMo-V2.5-IQ3_S-00001-of-00004.gguf" --ctx-size 1048576 --threads 16 --host 127.0.0.1 --no-mmap --jinja --fit on --flash-attn on -sm layer --n-cpu-moe 0 --threads 16 --parallel…
I hate this group but not literally (www.reddit.com) True story, I got interested in AI after seeing it at work and wanted to run models locally. I started with an M3 Ultra 96GB, quickly learned it was not enough for what I wanted, and kept upgrading hardware (including refurbished Mac Studi…
Cuda + ROCm simultaneously with -DGGML_BACKEND_DL=ON ! (www.reddit.com) I invested quite a bit of time and it wasn't easy but finally I can run models like Minimax 2.7 Q4 using Cuda+ROCm at the same time bypassing Vulkan. load_tensors: offloaded 63/63 layers to GPU load_tensors: CUDA0 model buffer size = 83650…
Tenstorrent TT-QuietBox 2 Specifications (Blackhole) (www.reddit.com) Source: https://docs.tenstorrent.com/systems/quietbox/quietbox-bh-2/specifications.html Currently supported models: https://tenstorrent.com/developers From the specification docs above: CPU: Ryzen 7 9700X 65W Granite Ridge 3.8GHz Memory: 2…
MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities (twitter.com via hn) MiniMax (official) @MiniMax_AI Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities - Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench H…
Minimax M3 on Open Router (openrouter.ai via hn) MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding, and tool use.
I built a local GUI for the TradingAgents framework — works with Ollama (www.reddit.com) https://preview.redd.it/i90oxxk7n03h1.png?width=1898&format=png&auto=webp&s=7d219c804fda7dfe122b84fcdb6d0d6883818c68 A while back I came across TradingAgents — a really cool multi-agent LLM stock analysis framework where like a dozen "agen…
Considering two Sparks for local coding (www.reddit.com) I'm currently running a 4x RTX 3090 system (96GB VRAM, DDR4 2133 RAM) and have tested opencode and pi.dev using Qwen3.5-122B-A10B (AWQ) up to 200k context for web app coding (html/js/python). I'm now seriously considering picking up two Sp…
Agile as a cat (www.reddit.com) https://preview.redd.it/kgkv6knv2dyg1.png?width=1026&format=png&auto=webp&s=d2e37f1914136ad672bcecf98741eee5e8cd69da MiniMax M2.7 AWQ 4bit hallucinated a URL and instantly pivoted to treating its own error as a joke. That made me laugh (do…
Current state of open-source ? (www.reddit.com) I’m trying to understand the current open-source LLM landscape beyond surface-level hype. We all got used to the nerfed products of Claude/Gemini, so I really believe in open source as a solution.
MiniMax debuts AI model built for long and complex coding tasks (www.scmp.com via hn) Shanghai-based company says M3 can process data five times faster than its predecessor, while also slashing inference costs. Chinese artificial intelligence start-up MiniMax ha…
MiniMax teased M3 Sparse Attention: 9.7x prefilling, 15.6x decoding at 1M (twitter.com via hn) Skyler Miao @SkylerMiao7: Something BIG is coming. 2:49 PM · May 26, 2026. 307.3K views.
JANGQ-AI/MiniMax-M2.7-JANGTQ_K : mixed-bit quant of MiniMax M2.7 - 74 GB on disk (huggingface.co via reddit) MiniMax-M2.7-JANGTQ_K MiniMax M2.7 — 74 GB on disk (down from ~230 GB FP8 source) — mixed-bit JANGTQ_K quantization in JANGTQ-PRESTACK layout. Source: MiniMaxAI/MiniMax-M2.7 (62 layers, 256 routed experts top-8, 196K context) Quantization:…
Strix Halo 128GB on Proxmox - Vulkan vs ROCm benchmark matrix (www.reddit.com) Ryzen AI MAX+ 395, Bosgame M5, 128GB LPDDR5x. Proxmox VE 9.1 LXC containers with GPU passthrough.
I got better results when I made each AI tool do one job (www.reddit.com) I spent too much time trying to find one AI dev tool that could do everything. Planning, coding, fixing, reviewing, maybe filing my taxes too It never really worked.
Boundaries of Stationary Feature Learning: A Minimax Barrier for Scaling Laws (zenodo.org via hn) I do not derive the Chinchilla scaling law; I map the boundaries of the regime in which such a derivation could even be attempted. Working in the μP feature-learning setting on a Sobolev-on-manifold data model, I establish what the station…
MiniMax M3 Benchmarks One Pager (filecdn.minimax.chat via hn)
MiniMax M3 (xcancel.com via hn) Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities - Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas - MiniMax…
Show HN: Free open source coding models in Slack (www.runcord.com via hn) Hey HN, We believe we have the easiest onboarding from signup to being able to spin up coding agents in slack like Stripe, Ramp & Coinbase. Demo of the onboarding: https://www.tella.tv/video/connecting-cord-to-slack-1-19ep Every signup get…
What workstation to get for ~13k EUR? (www.reddit.com) My use-cases will be to test open-weight LLMs and work on harnesses, inference systems and possibly other non-ML workflows (CS-related) in the future. Fine-tuning would not be something I do locally because I can rent a B200 from RunPod fo…
Chinese AI Coding Plan (www.reddit.com) With the lowering usage limit in Claude, I am thinking of jumping ship to Chinese AI, since the benchmark is already very near compared to Sonnet or Haiku 4.5 , but for a fraction of the price. I am not worried about where is my data endin…
Is Qwen3-coder the best kept secret out there? (www.reddit.com) So I'm brand new to this scene but I'm using Claude to help me fine tune a model for a startup idea I have in the Healthcare space. I have been working with the 27-35B parameter models (Qwen3.6, Gemma 4) and the couple of 120B+ models (Qwe…
Mesh LLM to build private personal AI, using open models (www.anarchai.org via hn) Connected peers (columns: ID, Role, Version, Status, Model, Latency, VRAM, Share): Host, 0.65.1, Serving, MiniMax-M2.5, <1 ms, 256.0 GB, 42%; Host, 0.65.…
M3 Ultra + DGX Spark = M5 Ultra-lite? (www.reddit.com) So I saw an article recently about exo disaggregated prefill with DGX Spark and M3 Ultra - prefill on one machine and decode on another. DGX Spark apparently has 4x matmul performance over an M3 Ultra - same as the M5 Ultra should have.
What's the best suscription under 20$? (www.reddit.com) I’m pretty overwhelmed. I feel like there are so many options that I don’t know which one to choose, and trying things until I find a decent one isn’t really my thing—even though I enjoy it.
Best Practices to Start with Vibe Coding? Best Local Apps for Agentic Vibe Coding? (www.reddit.com) DISCLAIMER: I am not a programmer nor do I have experience coding. I've been thinking about a small app running on gradio for some time now, and I want to try tweaking some extension for ComfyUI.
Web UI (www.reddit.com) Has chinese lab opensource their web UI? I am really impressed by minimax UI, coupled with agents, is there any similar self hostable UI for local llm?
eGPU vs system RAM (www.reddit.com)
OpenCode + Self host Minimax-2.7 via SGLang? (www.reddit.com) anyone knows how to setup opencode to work with self hosted minimax-2.7 properly? It has <think> and </think> in the message and OpenCode failed to parse the answer correctly.
Use Claude, ChatGPT, or MiniMax Subscriptions in Cursor (open-vsx.org via hn) Ungate: a Cursor-first extension for using Claude, ChatGPT, and MiniMax subscriptions in Cursor instead of paying for API tokens. How it works: Ungate lets you use Claude, ChatGPT, and MiniMax in Cursor through account subscriptions instead…
Minimax M2.7 on Q3_K_S or Smaller Model with greater precision? (www.reddit.com) I currently am looking for models to fit into my single DGX Spark for use. I have an RTX Pro 6000 and also a 5090 as well that I'm considering using in combination if the DGX Spark is too slow, but the intent here is to play around with Op…
Ollama Cloud - Pro (www.reddit.com) Hi. I've been looking at ollama cloud's Pro offering ($20), which says "Run 3 cloud models at a time".
Ask HN: Former grok-code-fast-1 users, what coding model are you using now? (news.ycombinator.com) I get good, cheap, fast feature coding success with grok-4.1-fast for planning and grok-code-fast-1 for execution. But according to the Openrouter usage stats, grok-code-fast-1 is now old hat - usage dropped off a cliff in mid-Feb.
Inference Optimization for MiniMax Sparse Attention (www.together.ai via hn) - Together AI is the preferred cloud partner for MiniMax M3. Together AI will host the open-weights model as a developer endpoint upon its public release.
MiniMax M3 Review: Matching GPT-5.5 and Opus? (thomas-wiegold.com via hn) I ran my usual coding tests — two websites, a poker sim, and a code audit. Here's how MiniMax M3 actually stacks up against GPT-5.5 and Opus 4.8.
been pairing M2.7 with Hermes Agent for a few weeks. holds up surprisingly well. anyone else running this combo? (www.reddit.com) been self-hosting hermes agent locally for a few months and rotating through different model backends for it. tried claude sonnet 4.5, gpt-5.5, qwen 3.6 coder, and most recently minimax m2.7.
Finally tested an AI video tool that works directly in Claude without setup (www.reddit.com) Been using Claude for everything creative lately and got tired of switching to Runway every time I needed video. Found out Higgsfield supports MCP, connected it once, and now Claude generates video directly in chat.
Best bang for the buck model/provider for <15$ month (www.reddit.com) I am currently using Minimax Token plan (1500req/5h) for 10 dollars/month, but I would like to upgrade to a stronger model. I am not someone who pushes the 1500req to its limits, but I like this feeling of capped costs per month.
Help me choose an LLM Provider which doesn't take my life savings (www.reddit.com) Hi everyone 👋 I’m trying to choose an LLM provider for my personal projects and side experiments, but I also don’t want my API bill to quietly consume my entire salary 😅 My primary use cases are: Coding assistance Agentic workflows Browser…
Testing MiniMax M2.7 via API on three real ML and coding workflows (andlukyane.com via hn) I recently got access to some MiniMax M2.7 API credits, so I decided to plug this model directly into Claude Code and run it on three workflows I do regularly. The same tas…
Full Hermes Agent tutorial (Spanish with English auto-translation). Computer Use, MCP Blender, Hindsight memory and multi-agent setup (www.reddit.com) Spent weeks running Hermes Agent in production on my Mac Mini M4 before recording this. Wanted to show things nobody else was covering.
Has anybody been able to achieve reliable agentic performance with cheap/open source models? (www.reddit.com) Basically the title. Recently I've been trying various open source and comparatively cheaper models like minimax m2.7, qwen models and glm5.1 in Pi agent from openrouter, and the performance on coding tasks has been moderately adequate at b…
Regex Chess: A 2-ply minimax chess engine in 84,688 regular expressions (nicholas.carlini.com via hn) by Nicholas Carlini 2025-01-05 Over the holidays I decided it's been too long since I did something with entirely no purpose. So without further ado, I present to you ...
Show HN: Ungate – use Claude and GPT subscriptions in Cursor without API costs (github.com via hn) Ungate: a Cursor-first extension for using Claude, ChatGPT, and MiniMax subscriptions in Cursor instead of paying for API tokens. How it works: Ungate lets you use Claude, ChatGPT, and MiniMax in Cursor through account subscriptions instead…
Which Chinese Model is best for planning and which is best for implementation? I'm currently using Opencode with an Openrouter API Key, mostly wanna decide between Kimi, GLM, DeepSeek, Qwen, Minimax and Mimo (www.reddit.com) Original plan was to use Kimi/GLM for planning and DeepSeek for implementation, but seeing a lot of love for MiMo and Minimax lately. Anyone running a planner + coder split on Opencode?
I plan to use a chinese AI model through API for coding through a harness, I'm a uni student so nothing prod related for now. should i go deepseek, minimax, kimi or glm? kinda confused (www.reddit.com) Just cancelled my claude subscription due to poor rate limits, gemini cli doesn't really excel in coding from my personal experience, and my local hardware isn't that powerful to run local AI models, and while codex is good, I wanna try so…
I built vivkemind – an open-source, local‑first terminal AI coding agent with full AWS Bedrock support (www.reddit.com) wanted a terminal AI coding agent that doesn't lock me into one model provider. So I forked Qwen Code and added full support for every model available in AWS Bedrock.
Show HN: Token Usage Meter 12 Providers and Coding Agent (qlaud.ai via hn) Once again, a token usage meter for 12+ AI providers: Anthropic, OpenAI, Google, Alibaba Qwen, Moonshot Kimi, MiniMax, ElevenLabs, Deepgram, Perplexity. Qlaud.ai provides a token usage meter / AI billing layer.
Comparing SVG Generation for the top open models (codeinput.com via reddit) Some of the larger models (like Llama) weren't available on OpenRouter, so I had to work with what was there. Best small model: Gemma 4 26B For its size, I think it had the best output.
Free llm APIs from Nvidia (www.reddit.com) So build[.]nvidia[.]com[/]models give access to free APIs for llms ranging from SLMs to frontier models. I tried building with it and let's say the APIs are so slow to respond.
Please help improving a CPU-only inference speed (www.reddit.com) This is a request for help for the people that want to use locally very large models on Q8 and better quanta at all costs, in my case the cost is inference speed. So I have a 512GB DDR4 ECC 2666 with a Threadripper Pro 3945WS that gives me…
Minimax vs Qwen vs Kimi vs Mimo(Omni) vs Glm (via reddit)
My frustrating experience with MiniMax models! (www.reddit.com) I keep on hearing from community here that Minimax models are pretty solid, their benchmark are also always respectable but I am never able to get decent result from them. I have tried local setup (multiple harness) I have even tried their…
How does a self correcting loop for AI agents work? (www.reddit.com) Hey guys, just checked out minimax 2.7, where they used AI to train itself, and ran over a hundred loops, and it improved its performance by 30%. How does that work? Can I also run a script that makes AI store its memory in a loop on a m…
Model API Performance (news.ycombinator.com) We’ve been benchmarking a few models on our API platform and got some interesting performance numbers: - MiniMax M2.5 → 0.118s time-to-first-token, 103 tokens/sec - GLM 5.1 → 120 tokens/sec throughput - Kimi K2.5 → 0.643s TTFT, 69 tokens/s…
What Am I Doing Wrong? Models Won't Listen, At All (GLM 5.1, MiniMax M2.7, Kimi K2.5) (www.reddit.com) What am I doing wrong here? I can't get models to follow my instructions, pretty much at all.
MiniMax is digging its own grave (www.reddit.com via reddit)
Generalization in Deep Neural Networks: Minimax Rates for Gradient Methods (arxiv.org) Understanding the generalization performance of over-parameterized neural networks has become a central topic in deep learning theory. While recent advances, particularly works under the Neural Tangent Kernel (NTK) regime, have shed light…
A Temporal Spatial Minimax Rate for Smoothly-Varying Distributions in Wasserstein Space (arxiv.org) We study the minimax rate of estimating a future value $\mu_{t_n+h}$ of a curve $t\mapsto\mu_t$ in the $2$-Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$ from finitely many noisy snapshots of its past, under an adiabatic bound $\|\nabla_t^k v\…
Running a 24/7 AI agent dev team: I route each role to a different LLM (Claude/Kimi/MiniMax/GPT) to dodge a ~$2k/mo API bill. Setup + what actually breaks. (www.reddit.com via reddit) Context: I run an autonomous engineering "org" of AI agents on my own product. Once it grew past ~5 agents and started running around the clock, it maxed my Claude Max weekly limit by mid-week.
Qwen 3.6 27B on DeepSWE (www.reddit.com via reddit) Overview: It scored 2% (1.79% rounded up) It is 18/20th place scoring above Haiku 4.5 and Minimax M2.7 Full benchmark took 70 hours Average time per task 32m Average output tokens per task: 44k Perspectives: It scored suspiciously similar…
AA comparison of the latest local models (www.reddit.com via reddit) I picked models I consider local (usable on 3×3090), so there are no 300B models, and you should probably skip 200B models too (but MiniMax and Step are pretty fast in Q3) Gemma-4 12B is still missing
Anyone has experience between Mimo flash v2.5 pro vs Composer 2.5 (cursor pro+) (www.reddit.com via reddit) I have Mimo subscription alongside Claude Code Max. You won’t believe how bad Claude Opus can be at certain tasks, but it does get more done than any other model I have tried.
Literature-Guided Minimax Optimization of Virtual Epilepsy Neurostimulation (arxiv.org)
Minimax optimal differentially private synthetic data for smooth queries (arxiv.org)
I went from 1 to 10 apps on the App Store in 4 months - vibe coding as a senior iOS dev (www.reddit.com) I have been coding for 20 years and making mobile apps for 15+. This February I decided to try vibe coding, but at scale.
My 1.2B model won 2 out of 5 poker tournaments against models up to 1T params. (www.reddit.com) I made 6 LLMs play Texas Hold’em against each other. Ran 5 tournaments on my 16GB MacBook.
We built Irene — an AI agent platform that actually remembers you, builds its own tools , adapts and improve as you use it (www.reddit.com) Hey r/AI_Agents — we're launching Irene today, and I want to be straight about what it is, why we built it, and where it's going. What makes Irene different Affordable with massive token limits and the latest open-source models We have gen…
Spec decoding for minimax m2.7? (www.reddit.com) MTP was not released for m2.7, so would anyone have experience with setting up speculative decoding for minimax m2.7 and its results? Whether via EAGLE3 or a distilled variant
Opus 4.6 is Vicious (www.reddit.com) This is the hardest I've ever seen it riff. Full shared link at the bottom, but here are some highlights.
[Research use case] MiniMax-M2.7 with small context, CPU+GPU (5090) setup on Llama.cpp (www.reddit.com) I was experimenting yesterday with running oversized models with smaller context size, hoping that leaving them overnight could compensate for the slow token generation and periodic pauses for compaction or task chunking. Summary: For rese…
Is it possible to edit LLAMA.CPP with Cline+Vscode+Minimax 2.7 Q4_K_S and get a working build? (www.reddit.com) It all started yesterday with this post by u/antirez https://www.reddit.com/r/LocalLLaMA/comments/1sw3stb/llamacpp_deepseek_v4_flash_experimental_inference/ I was intrigued by the first Deepseek V4 Flash GGUF in a small size that can fit o…
What's the smallest reasonable quant for coding? (www.reddit.com)
What’s your LLM routing strategy for personal agents? (www.reddit.com) TL;DR I try to keep most traffic on very cheap models (Nano / GLM‑Flash / Qwen / MiniMax) and only escalate to stronger models for genuinely complex or reasoning‑heavy queries. I’m still actively testing this and tweaking it several times…
Use this prompt if you want to find a specific info off the Internet with lowest wrong answer possiblity. Works best for ~30b models. (www.reddit.com) For context, I used to ask many models near 30b this question: Calculate the precise VRAM requirement for the **KV Cache only** at the maximum context window for **DeepSeek V3.2** and **MiniMax M2.5**. * **DeepSeek V3.2 Max Context:**…
Need suggestions for local AI Machine (www.reddit.com) I’ve been running various AI harnesses like OpenClaw, ForgeCode, ClaudeCode, etc. Most of these are running via OpenRouter or Minimax (credits/subscription model).
But why Local LLM? How does this make economic sense vs API? (www.reddit.com) Hey guys, come fight me: how do you justify local LLMs from a value perspective? It doesn't seem economical?
I made a simple proxy to let Claude use MiniMax models as subagents (www.reddit.com) I made this due to the usage problem. Enjoy and tell me what you guys think!
Optimizing MiniMax 2.7 - Experts vs Layers for best VRAM/RAM utilization (www.reddit.com) I'm curious if there is a rule of thumb regarding how to best load Minimax given varying amounts of VRAM/RAM configurations. Is there a way to estimate how many experts versus layers to offload for individuals running either 16GB/24GB/32GB…
Best setup for MiniMax-M2.7 (230B) | 3x RTX 5090 | Threadripper 9975 | 512GB RAM (www.reddit.com) I have the following hardware and want to run MiniMax-M2.7 (230B) locally. What is the best software stack and configuration to maximize performance?
Mac Studio Performance Suggestion For minimax (www.reddit.com) I need help. I want to self-contain my MiniMax 2.7 and Qwen 3.5 (122 billion parameter) models.
Why most open-source models can't answer this question while most closed-source models can answer most of the time? (www.reddit.com) WEB SEARCH WAS ALWAYS ON!!!! Question Calculate the precise VRAM requirement for the **KV Cache only** at the maximum context window for **DeepSeek V3.2** and **MiniMax M2.5**.
Stop donating your salary to OpenAI: Why Minimax M2.5 is making GPT-5.2 Thinking look like an overpriced dinosaur for coding plans. (www.reddit.com)
Aligning to What? Rethinking Agent Generalization in MiniMax M2 (huggingface.co)