model roundup

Qwen 3

12 items · started 2026-04-13 · closed 2026-04-21

Which kind of base/fine-tunes have you done? And which data did you use? (www.reddit.com)

+1 9w fine-tuning qwen
TPU v7x Ironwood vs Nvidia B200 (www.reddit.com)

+2 9w moe

Google published Ironwood inference benchmarks in their AI-Hypercomputer/tpu-recipes repo. Nvidia has InferenceMAX numbers for B200.
Tokens per second - RTX 5000 Ada generation (www.reddit.com)

+14 9w ollama

Hi everyone, I am testing the LocalLLaMA. I have a laptop with an RTX 5000 Ada Generation GPU, running Ollama and Open WebUI.
Deepseek-r1 thinks for 30 minutes? (www.reddit.com)

9 10w deepseek

I was trying to ask a question about coding using DeepSeek-R1-0528-Qwen3-8B-Q4_K_M, and the thinking took 30 minutes??? https://preview.redd.it/kex3fgg4lgvg1.png?width=277&format=png&auto=webp&s=5f7e7cdc8502b935ea8b8fb83e0e4af60c3c4533 I h…
Show HN: Flint – A 30B model fine-tuned for less repetition (springboards.ai via hn)

+5 10w mmlu

Frontier LLMs have very little output diversity, even for open-ended queries. We built Flint to see if we could reverse this.
Potential Local LLM Setup Question (www.reddit.com)

2 10w

I want to set up a local coding LLM, maybe with Qwen3:30BA3B (I have heard it's good). I want to use what I have as much as possible: an old desktop with a Ryzen 5600G and 16GB DDR4 RAM.
Compile English function descriptions into 22MB neural programs that run locally via llama.cpp (www.reddit.com)

+157 10w llama

We built a system where a neural compiler takes a plain-English function description and produces a "neural program" (a combination of a continuous LoRA adapter and a discrete pseudo-program). At inference time, these adapt a fixed interpr…
Qwen 3 Coder Next has a bug! Help Test? (www.reddit.com)

31 10w qwen

Hey y'all. So I've stumbled upon a really specific and esoteric "bug" where an llm can't comprehend a URL in like, 90% of scenarios.
Inference on Qwen3-Coder-480B-A35B-Instruct with 4xH200 (www.reddit.com)

7 10w

Hello guys, I want to run inference on Qwen3-Coder-480B-A35B-Instruct. I have a 4xH200 machine.
openrouter/elephant-alpha is 99% Chinese, likely Qwen 3 Nex (www.reddit.com)

4 10w qwen

openrouter/elephant-alpha is 99% Chinese, likely Qwen 3 Next. Prompt: "Write a complex algorithm in Python for time series analysis, using methods from Chinese scientific papers on econometrics."
Built a Japanese ASR benchmark because existing ones can't measure quality differences properly (www.reddit.com)

+9 10w fine-tuning

Was fine-tuning a Japanese ASR model (based on Qwen3-ASR) to handle technical terminology better. The model clearly improved — "Next.js" comes out as "Next.js" instead of "ネクストジェイズ", punctuation works, etc.
ClaudeCodeCLI vs OpenCode vs Cline vs QwenCode (www.reddit.com)

+24 10w cline mcp

Local coding LLM: Qwen3-coder-next-80b-nvfp4. Which tool can you recommend for it, and with which Skills/Plugins/MCPs?
