model roundup

Gemma 4

188 items · started 2026-04-12 · closed 2026-05-30

Fun Local LLM Comparisons with Gemma, Granite, and Qwen (ekorbia.com via hn)

+4 4w gemma qwen

Ekorbia v0.2 features a comparison-chat mode that runs 2-3 local models against the same prompt in parallel. Here are a few fun prompts running across Gemma 4 (e2b), IBM Granite 4.1 (…
Gemma-4-Harmonia-31B-Uncensored-Heretic Is Out Now, a Merge of Multiple gemma-4-31B-it Finetunes Designed for a Targeted Approach to Deep Neural Consolidation, Minimizing Regression While Amplifying Unique Capability Boundaries. With KLD 0.0047 and 9/100 Refusals! (huggingface.co via reddit)

+134 4w gemma

Provided in both Safetensors and GGUFs. Safetensors, llmfan46/Gemma-4-Harmonia-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic GGUFs, llmfan46/Gemma-4-Harmonia-31B-it-uncensored-heretic-GG…
Running Gemma4 31b-it on vLLM 0.21.0 A100s (bad quality or what am I doing wrong) (www.reddit.com)

+31 4w vllm gemma openai

Okay, fun time: I got access to two NVLinked A100s for a research project. I benchmarked my work against the Gemma 4 31b-it available through Google, but my dataset is rather massive, so I need to run it on the "local" resources. Basically…
Built a local-first AI memory system that indexes screen activity, meetings, and voice notes ( MCP + automations) (www.reddit.com)

+17 4w gemma llama cursor+1

Been experimenting with an idea — what if your AI assistant actually remembered everything you did on your computer? Not stateless chats, but real persistent context.
Is something went wrong with those online free model, why I feel they worse than Gemma 4 26B A4B Q4_KM ?? (www.reddit.com)

+22 4w gemma

It started when I just wanted to make a roleplay-style chat app with characters, but Gemma 4 26B A4B Q4_KM doesn't have info on some old characters, so I crawled back to those online services, as those models have many more parameters and are quite up to date i…
Need Help - What would you build? Air-gapped NL assistant that is integrated with Splunk (www.reddit.com)

+43 4w openclaw qwen openai

So I have a side project with given scope: Fully air-gapped / on-prem - no internet, no outbound calls of any kind Engineers ask questions about Splunk data in natural language Has to hold the conversation in Korean (index/field names stay…
Best AI (agent?) for coding locally? (www.reddit.com)

23 4w chatgpt openai claude-code

Ryzen 5, 7500F RX 9070 XT 32 GB DDR5 I want to code a website and an app for something and I was wondering, whats the best AI I can run with my hardware, and should I use a tool like Claude Code or Pi agent to run them? I tried Gemma4 on P…
gemma 4 e2b quality degrades after ~30-40 continuous inferences on 4gb vram? (www.reddit.com)

+17 4w gemma llama

running gemma e2b via llama-server for continuous background tasks on a 1650 4gb. works great initially but after maybe 30-40 calls the outputs start getting noticeably worse — shorter responses, missing fields in json output, sometimes ju…
Choosing an abliterated version of Gemma 4 31B and 26B-A4B (www.reddit.com)

+64 4w gemma

The only thread was 2 months ago, when the model had just dropped. Since then, more versions from different authors have appeared, and users have had time to test them.
Run Chrome’s tiny Gemma4 (aka Gemini Nano) directly on PC without GPU (www.reddit.com)

+36 4w vllm gemma llama+1

Everyone remembers that sneaky download of Gemini Nano earlier this month? and if you talk to it, it will happily tell you it’s a Gemma.
Llama.cpp VS LiteRT on a custom Xiaomi 12 Pro 24/7 Server (V2 Redesign) (www.reddit.com)

+6 4w ollama llama

Thanks everyone for the advice on my previous post (24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/…
Gemma4 26b a4b Apex quant is quite good (www.reddit.com)

+153 4w gemma llama

I tried mudler's apex quant for gemma4 26b a4b and it was amazing! I got 38 tps at 90,000 context with no looping and, surprisingly, no quality degradation.
Experimental "Preserve Thinking" Jinja Template for Gemma4 31B in llama.cpp (www.reddit.com)

+711 4w gemma llama

https://huggingface.co/stevelikesrhino/gemma-4-31B-it-nvfp4-GGUF/blob/main/gemma4-improved.jinja Yall are more than welcome to try it out and provide feedback. In my own testing in Pi-coding-agent I no longer have the "forgot to close thin…
G4-MeroMero-26B-A4B-it-uncensored-heretic Is Out Now, a Finetune of gemma-4-26B-A4B-it, With KLD of 0.0152 and 12/100 Refusals! (huggingface.co via reddit)

+266 4w gemma

When I previously posted the uncensored 31B version of the MeroMero finetune, quite a few people asked for the 26B-A4B version. I wasn't so keen on it because I considered the 31B to be the better version, but I understand t…
Translate long subtitle files (www.reddit.com)

+11 5w

I'm struggling to find a good system to translate a movie length subtitle .srt file. My current setup is to run Kobold with Gemma4 into Subtitle Edit, which then sends a request to the LLM to translate every line, but it does a bad job bec…
Gemma 4 MTP with LlamaCPP (www.reddit.com)

+32 5w gemma

I am running Gemma 4 31B for a project using LlamaCPP. There is no integrated main model + MTP drafter GGUF.
Google AI Edge Gallery v1.0.13 & v1.0.14 updates: Gemma 4 Multi-Token Prediction, Pixel TPU support, experimental MCP, new skills, now saves chat history (github.com via reddit)

+358 5w gemma mcp

Google AI Edge Gallery ✨ Explore, Experience, and Evaluate the Future of On-Device Generative AI with Google AI Edge. AI Edge Gallery is the premier destination for running the world's most powerful open-source Large Language Models (LLMs)…
Gemma-4-Gembrain-31B-it-uncensored-heretic Is Out Now, a Merge of Multiple Gemma 4 31B it Finetunes Designed to Boost Logical and Lateral Thinking for Improved Adherence, Increased Swipe Variety and Enhanced Creative Prose, With KLD of 0.0186 and 13/100 Refusals! (huggingface.co via reddit)

+126 5w gemma

Provided in both Safetensors and GGUFs. Safetensors: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic GGUFs: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic…
5060ti chads -> gemma-4-31b-it-nvfp4 + vllm + mtp (www.reddit.com)

+22 5w vllm gemma qwen

Hey all, While nvfp4 still seems to be a work in progress, the latest version of vllm 0.21 finally has mtp working for gemma. With all the talk of qwen being badass I thought I would revisit gemma.
Any good MOE ~60B models? I have 64GB vram (www.reddit.com)

+218 5w moe gemma

I have a build with 2 x MI50 32GBs and 64 gigs of DDR4 (bought before rampocolypse for ~630 USD total, I’m not rich) and I’m not gonna upgrade it for a long while. Are there any good MOE models that are around 60B in parameters so I can ma…
Looking to migrate off of Ollama and LMStudio (www.reddit.com)

+79 5w vllm ollama gemma+3

Hello, I'm currently using Ollama / LM Studio for things like code inference, proofreading emails, etc. Definitely not experienced in this space but looking to grow.
gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is Out Now, A Writing Finetune that Aims to Improve Gemma 4 31B it Writing Quality with More Natural English and Better Prose, Good for Creative Writings, Translations and RPs! (huggingface.co via reddit)

+41 5w gemma

Provided in both Safetensors and GGUFs. llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic llmfan46/gemma-4-Ortenzya…
Why use token/s as a metric when perplexity and time to first token feel more important (www.reddit.com)

5 5w

I have been using local LLMs to solve problems like mass classification of images, code generation, etc., as opposed to generating text. In my experience, tokens per second aren't as descriptive of the quality of the model as is the time to f…
Audio input not accepted with llamacpp for Nemotron 3 nano Omni ? (www.reddit.com)

+41 5w llama

Llama-server does not accept audio input (or video for that matter) with Nemotron 3 nano omni (unsloth). I’m on a recent build of llamacpp and I redownloaded Nemotron, and I have the mmproj loaded too.
Gemma4 26b MoE running in MLX with turboquant (and custom kernel) (www.reddit.com)

+4 5w moe llama

TL;DR I spent a few crazy evenings this past week seeing if I could get Gemma4 running with proper turbo quant and rotating KV cache support. The answer was yes, and I'm now able to run Gemma4 26b on my MacBook Air M5 at 128k context with…
I just bought Asus Ascent : Nvidia GB10 (DGX) and It is slower than my Ryzen Ai Max (www.reddit.com)

+322 5w gemma llama

It is supposed to be 2-4x faster, but I am only getting 6 tok/s on Gemma4-31B. What am I doing wrong?
Built a fully offline suitcase robot around a Jetson Orin NX SUPER 16GB. Gemma 4 E4B, ~200ms cached TTFT, 30+ sensors, no WiFi/BT/cellular. He has opinions. (www.reddit.com)

+17332 5w gemma llama

Sparky runs entirely on the Jetson. Gemma 4 E4B at Q4_K_M via llama.cpp with q8_0 KV cache and flash attention.
llamacpp with Gemma4 31B dense and Gemma e4b as draft, plus audio input? (www.reddit.com)

+1 5w gemma llama

Hi, has anybody succeeded in running llama.cpp with Gemma 31b dense and Gemma e4b as draft model, and simultaneously inhibit the voice recognition feature? Is it even (theoretically) possible?
Gemma 4 + LiteRT-LM on mobile: much better memory/perf than my llama.cpp setup (www.reddit.com)

+64 5w gemma llama

Hi r/LocalLLaMA - I've been paying close attention to the edge AI ecosystem because it's an area where I see huge potential and where I truly believe AI will become more useful for day-to-day tasks. Around the Gemma 4 release I was already…
Replaced my $15/mo Wispr Flow subscription with a free local macOS app I built using Claude Code (www.reddit.com)

+11 6w gemma claude-code

I spend most of my day writing prompts to Claude. Read a study recently that said people speak ~3x faster than they type, which lands differently when "writing" is basically your whole workflow.
The "the future is fictional" problem of many local LLMs (www.reddit.com)

+41 6w gemma gemini

Many local models have a problem (one that arose from excessive RLHF training): they mostly think that everything beyond their knowledge cutoff date must be "fictional" or "satirical". To be fair, even the Gemini API without web ac…
On my RTX 4060 8GB laptop, I can run Gemma 4 E4B Q6 K XL with mmproj at only 6GB of VRAM usage despite sources recommending Q4 K M for my hardware. What is going on? (www.reddit.com)

1 6w gemma

I can set my context length as high as 64k and the vram usage is not even remotely close to the maximum utilisation. My TPS is also 40+.
LLMs on flagships smartphones? (www.reddit.com)

+13 6w gemma llama

I have been curious to see how small LLMs like Gemma-4-E2B-it run on a flagship smartphone (S25+ with Snapdragon 8 Elite) in terms of prompt processing and token generation. I have created a script that uses llama-cli and I achieve 48 tps…
very slow tok/s with Gemma 4 31B on a 5090?! (www.reddit.com)

+13 6w gemma llama

Hi, I have a 5090 and I was toying around with hermes-agent. To utilize 128K I thought about switching from LM Studio to llama-cpp (the turboquant fork), expecting better tok/s and also saving some VRAM from context quantization.
Does THINKING MODE significantly improve translation? (www.reddit.com)

+38 6w gemma qwen

Between a solid model from Qwen or Gemma 4, when translating a text, does "thinking mode" significantly boost the quality of the translation, or is the difference negligible?
How to run a Gemma4 MTP implementation on ollama or python transformers? (www.reddit.com)

1 6w ollama llama

Hi all, I had a quick question while we wait for the llama.cpp MTP implementation: have any of y'all tried Gemma4 MTP models on ollama and/or transformers? What were your experience, CLI args, and/or workflows like?
Local audio/multimodal models that can be used for language pronunciation grading (www.reddit.com)

+1 6w gemma

My partner uses Duolingo for learning and practicing languages, but has been getting increasingly sick of it. I decided to experiment with whether local models would be good for creating and grading language exercises.
Gemma 4 MTP vs DFlash on 1x H100: dense vs MoE results (www.reddit.com)

+1712 6w vllm moe gemma

Benchmarked Gemma 4 MTP and z-lab's DFlash on a single H100 80GB using vLLM and NVIDIA's SPEED-Bench qualitative dataset. Setup: Hardware: 1x H100 80GB Runtime: vLLM Dataset: SPEED-Bench qualitative Prompts: 880 total, 80 prompts across ea…
converting weights to snn (www.reddit.com)

+23 6w gemma

Hello everyone, I developed the snn architecture from scratch based on the human brain. I had several successful launches of training spike models from scratch and I also had an idea: what would happen if I took the gemma 4 model and conve…
Gemma 4 E4B is great for short transcriptions (www.reddit.com)

+1 6w gemma

Yes, for material that is an hour long, there is no getting around tools like Whisper, or something even better. However, for transcribing short snippets, Gemma works very quickly and reliably, even in foreign languages.
Gemma 4 running fully offline on WebGPU with Transformers.js, controlling Reachy Mini over WebSerial. (www.reddit.com)

+266 6w gemma

Terrible Vulkan pp/tg on Arrow Lake iGPUs (www.reddit.com)

+32 6w gemma llama

Hi, I recently tried to get llama.cpp with SYCL running on an Arrow Lake system but gave up halfway through since Vulkan is just way easier to set up. But, the pp/tg I'm getting on Vulkan w/ Arc 130T is disgustingly bad - 100 tokens/s for…
ExLlamaV3 Major Updates! (www.reddit.com)

+10049 6w gemma agentic

Turboderp has been on an absolute tear recently, in the endless battle to cram new llamas into smaller, faster boxes. We started off last month with the release of Gemma 4 support, and continued with improved caching efficiency.
Anybody else noticing how good gemma-4-26b-a4b is with one-shotting three.js? (rowanunderwood.github.io via reddit)

+1516 6w gemma

I wrote up this little python app to cycle through a bunch of prompts like this: Single HTML file using three.js from CDN. A central rotating MeshNormalMaterial torus knot.
Gemma Chat: Offline Vibe Coding on Apple Silicon (github.com via hn)

+1 6w gemma

Gemma Chat Vibe code without the internet. A local coding agent powered by Google's Gemma 4 — runs entirely on your Mac via Apple's MLX framework.
Dual gpu question (www.reddit.com)

+14 6w moe

Hi, I have an RX 9060 XT and an RX 6600: 16 GB and 8 GB.
Grafting a Speech Head onto Gemma 4 E4B (www.frisson-labs.com via hn)

+21 6w gemma

For a Discord buddy, the tempting model shape is small, fast, and multimodal. It should hear the call, see the game, read the chat, and respond quickly enough that the moment is still alive.
Gemma 4 - website translations (large model, or small model)? (www.reddit.com)

+22 6w gemma

I have set up a workflow to process website translations with Gemma 4. I just host it on LM Studio, and a custom Python wrapper iterates through and runs overnight. My question is..
3060 Ti 12GB vs RX 7600 XT 16GB? (www.reddit.com)

+34 6w gemma

Trying to figure out which is better for LLM. Mainly Gemma 4.
MTP is all about acceptance rate (www.reddit.com)

+136 6w

So I was very excited about the MTP stuff especially since Gemma4 has become my "daily driver" for some stuff. I grabbed the latest mlx-vlm and did some tests and found it disappointing.
Gemma 4 26B Hits 600 Tok/s on One RTX 5090 (www.reddit.com)

+1714 6w vllm gemma

I ran a benchmark to see how much DFlash speculative decoding actually helps in vLLM. Setup: GPU: RTX 5090, 32GB VRAM vLLM: 0.19.2rc1 Main model: cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit Draft model: z-lab/gemma-4-26B-A4B-it-DFlash Workload: r…
I built a local proxy that does context work for Claude so you don't have to (www.reddit.com)

+11 7w gemma

Hey folks, I posted here a few months back about how I was basically working for Claude -- pasting the same emails, re-explaining the same backstory, being its memory across every chat. Today I'm launching Contextify.
Gemma4 26B A4B NVFP4 GGUF (www.reddit.com)

+21 7w gemma llama

Hey everyone! I’ve just uploaded a GGUF version of nvidia/Gemma-4-26B-A4B-NVFP4.
What's the right way to feed PDF files to Gemma-4? (www.reddit.com)

+12 7w gemma llama

In my line of work, PDF documents tend to be combinations of text, math formulas, tables and images. llama.cpp added support for PDFs a few months ago, but I believe it treats PDFs either as text (discarding everything else), or as images.
New Gemma 4 draft models released (alternativeto.net via reddit)

3 7w gemma

Just saw this and wanted to ask the obligatory GGUF when?
Google's Gemma 4 AI models get 3x speed boost by predicting future tokens (arstechnica.com)

7w gemma

Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google’s take on edge AI could be getting even faster already with the release of Multi-Token Prediction (MTP) drafters for G…
Getting unexpected output with Gemma 4 31b-it on vLLM (www.reddit.com)

3 7w vllm gemma openai

Hey everyone, I'm running into a weird issue and hoping someone here might have a fix or some troubleshooting ideas. I'm currently trying to run the new Gemma 4 31b-it model using vLLM (v0.20.0-cu130) deployed via Helm chart (https://gith…
Gemma 4 31B MTP Drafter on H100 -- Real Benchmarks + DFlash Comparison (www.reddit.com)

7 7w vllm gemma

Just tested Gemma 4 31B with the new official MTP Drafter on my H100 today and compared the approach with DFlash to help you decide which one to use. Without drafter: 13.7 tok/s.
Offload routine Claude Code work to Gemma 4 through the Google GenAI API (www.reddit.com)

+42 7w gemma codex mcp+1

The idea of offload-mcp is simple: instead of running hardware-hungry local models for routine work, let Claude offload that work to FREE model APIs and SAVE tokens. I’m using Gemma via the Google GenAI API because I like it in my processi…
Gemma 4 MTP Test - what speedup can you gain? (www.reddit.com)

3 7w gemma

Let's test the Gemma 4 MTP implementation using the HuggingFace Transformers library and the new drafter model by Google. We'll load both models and test on a couple of prompts with and without the MTP support https://www.youtube.com/live/…
Testing an Ollama powered agent mode with gemma4 inside Modly (www.reddit.com)

2 7w ollama

Hey everyone ! Quick Modly update.
Accelerating Gemma 4: faster inference with multi-token prediction drafters (blog.google via hn)

+314 7w gemma

Just a few weeks ago, we introduced Gemma 4, our most capable open models to date. With over 60 million downloads in just the first few weeks, Gemma 4 is deliverin…
Running a 26B LLM locally with no GPU (www.reddit.com)

+1511 7w

This is crazy. I've been running local LLMs on CPU only for awhile now and have great results with 12B models running on an i5-8500 and only 32GB of RAM with no GPU.
Prompt injection benchmark: delimiter + strict prompt took Gemma 4 from 21% to 100% defense rate (15 models, 6100+ tests) (www.reddit.com)

+94 7w prompt-injection grok gemma+1

When dealing with untrusted outside input, I think you should handle it based on the situation. If you're processing structured data files, it's better to use tools to isolate and handle them.
I built an AI tool that turns any movie into viral recap videos in minutes (www.reddit.com)

+12 7w mistral ollama gemma+3

Hey everyone, I built a tool that creates movie recap videos automatically using local models. The problem: making recap videos takes forever.
Roundtable chat with Talkie-1930 and Gemma 4 31B (www.reddit.com)

+221 7w gemma

Talkie-1930-13b-it and Gemma 4 31b in the same chat. Talkie is a 13B vintage language model from 1930.
it's time to update your Gemma 4 GGUFs (www.reddit.com)

+429 7w gemma

Chat Template was fixed a few days ago choose your fav dealer: https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF https://huggingface.co/bartowski/google_gemma-4-E4B-…
interacting with gemma 4 w/ live video and audio (www.reddit.com)

+11 7w gemma

I saw someone on this forum demonstrate using Gemma 4, live-streaming audio and video from his webcam to it and asking it what it was seeing. It was pretty great, but I can't find that post anymore and I can't find a good repo on GitHub where I…
Anybody tried openclaw + M5 pro + 48gb? (www.reddit.com)

+21 7w openclaw gemma openai

Hello, posting again on this since my last post was removed. I am working on an AI agent solution to help me with my multiple daily tasks for different business activities; a few rental properties, a manufacturer trying to enter the Mexico…
These are the benchmark results for Gemma4 E4B tested on my iPhone 16 Pro. (www.reddit.com)

+165 7w

The first photo shows the results when run on the CPU, and the second one is on the GPU. Look at the speed difference between the Prefill and Decode speeds in my benchmark results.
Gemma 4 E2B runs surprisingly well on my 8GB Android phone, so I built a private voice notes app around it. (www.reddit.com)

+106 7w gemma

Been running Gemma 4 E2B locally on my OnePlus CE 5 (8GB RAM) for a few months. Chat quality is fine for the size.
Potential of Gemma4 Per-layer embeddings? (www.reddit.com)

+1 7w gemma

Hey there people. So let's talk about GEMMA 4 per layer embeddings.
Tried running Claude Code with local LLMs via Ollama — ended up subscribing to Pro anyway. But now I can't disconnect from the local server. (www.reddit.com)

+23 7w ollama gemma agentic+2

I've been experimenting with using Ollama to run Claude Code locally with models like Gemma 4, thinking I could avoid API costs. However, I quickly realised these models aren't really optimised for Claude Code's agentic workflows — they te…
gemma-4-31B-it-DFlash has been released (www.reddit.com)

+71 8w gemma llama

https://huggingface.co/z-lab/gemma-4-31B-it-DFlash I guess we'll have to wait until this PR is merged before we can test it. https://github.com/ggml-org/llama.cpp/pull/22105
Using a Radeon 9060 XT 16 GB, the gemma4 24b a4b iq4 nl model achieves 25.9 t/s (www.reddit.com)

+43 8w gemma llama

I'm testing running local LLMs on a gaming mini PC (AMD 7840HS, 32 GB RAM) paired with an eGPU (Radeon 9060XT with 16 GB VRAM). Since I'm not very familiar with using llama.cpp, I kept getting unsatisfactory results, but with the recent Ge…
thinking of gemma 4 26B vs 31B (www.reddit.com)

2 8w vllm moe gemma+2

I see a big difference in agentic coding between gemma-4-31B-it-Q5_K_M and gemma-4-26B-A4B-it-UD-Q8_K_XL. The 26B model is much faster because of A4B and generally works well, but there is a big difference in thinking.
Based on what should I choose Gemma 4 models/quantizations? (www.reddit.com)

+18 8w gemma gemini chatgpt

I have an RTX 4060 8GB(+16GB RAM) laptop, and when asking Gemini or ChatGPT, they say the Gemma 4 Q4 K M is the best fit for my hardware with Context Length around 16k-32k. However, in practice, after loading even a higher quantization lik…
If you could do anything with the local models in your corporate workflows, what would it be? (www.reddit.com)

+1 8w gemma

With the release of Gemma 4 models and a slew of open weight/source models subsequently, some of the workflows like drafting emails/ trivial coding tasks have become possible. I’m exploring the possibility of integrating some of the powerf…
Gemma 4 architecture support for QVAC-Fabric (Tether's llama.cpp fork) (github.com via hn)

+1 8w gemma llama

QVAC-Fabric Gemma 4 Architecture Patch Adds full Gemma 4 (gemma4) architecture support to QVAC-Fabric, Tether's llama.cpp fork. Base: QVAC-Fabric temp-upstream branch Target: All Gemma 4 variants (E2B, E4B, etc.
Gemma-4 MLX reasoning? (www.reddit.com)

+21 8w gemma

Gemma-4 is great. On a MacBook M5, using lm-studio, the MLX versions (specifically looking at https://huggingface.co/lmstudio-community/gemma-4-26B-A4B-it-MLX-8bit) rock.
I ran Gemma 4 E2B with llama.cpp on a lot of different iPhones, here's the setup report (www.reddit.com)

+2 8w gemma llama

TLDR: I've been running gemma4 e2b extensively on iOS with llama.cpp and found some interesting quirks and info you guys may like! These are specifics for the iPhone and what I've found worked across 20+ devices.
Most efficient way of running Gemma 4 E4B with multimodal capabilities on a laptop? (www.reddit.com)

+1 8w gemma llama

The gemma 4 E4B and E2B models have built-in multimodal capabilities. However, as far as I am aware, llama.cpp does not have proper support for vision and audio inputs (specially audio) for these models as of now.
AMG GPUs are faster at pre filling (www.reddit.com)

+5 8w gemma

I gave the same prompt and the same document to a 1660 Ti running Gemma 4 e2b q4 (because of the small VRAM) and to an iGPU running Gemma 4 e4b q8. The prefill rate before token generation was about 4-5 times faster with the 890M iGPU, then token generat…
How to run a local coding agent with Gemma 4 and Pi | Patrick Loeber (patloeber.com via reddit)

+3 8w gemma llama

Tutorial from the Google guy, I use very similar setup (llama.cpp instead of lmstudio)
Using Google's Gemma 4 E4B Local AI Model to Reverse Engineer a Simple Crackme (github.com via hn)

+1 8w gemma

I was playing around with the new Gemma E4B open weights local model which Google released, and to my surprise I was seeing a great deal of success in using it…
Running Gemma 4 31B on Mac with Ollama (sammyrulez.github.io via hn)

+2 8w ollama gemma

A practical configuration for a 32 GB M5 Mac that still needs to remain usable. Running large language models locally has become surprisingly practical on Apple Silicon. With a modern Mac, Ollama, and a carefully quantized GGUF model, it is…
Best sota 12b-32b creative writing model? (www.reddit.com)

1 8w gemma

I love using OpenRouter, but I would also love a smaller model that can fit within 16gb of VRAM and 64gb of RAM and that can pack a punch for its size, specifically in creative writing. Any good recommendations?
A weekend with LoRA on Gemma 4 E2B: instrumenting what fine-tuning changes (aiexplr.com via hn)

+1 8w fine-tuning gemma

Spent a week doing LoRA fine-tuning on Gemma 4 E2B (~5.1B total params, ~2B active in text decoder) for a narrow Python code-generation task. Bad outputs went from ~5% to 0% (greedy) and 1.5% (sampled) across 134 tests.
Gemma 4 Folks (www.reddit.com)

+1 8w gemma

Full Answer >>> A plane crashes on the border of two countries. Where do they bury the survi ...
Best settings for gemma-4 on a 3090? (www.reddit.com)

+1 8w gemma

3090 (24G) + 32G DDR4 Currently running --mmproj mmproj-BF16.gguf --chat-template-kwargs '{"enable_thinking":true}' \ --flash-attn on \ --cache-type-k q4_0 \ --cache-type-v q4_0 \ -np 1 \ -c 160000 \ --jinja at 26B-A4B-it-UD-Q5_K_XL and ge…
Three lessons from fine-tuning a 5B code assistant — bad outputs from 5% → 0% (www.reddit.com)

4 8w fine-tuning gemma

Spent a week doing LoRA fine-tuning on Gemma 4 E2B (gemma-4-e2b-it, ~5.1B total params, ~2B active in the text decoder) for a narrow Python code-generation task. Setup: Model: Gemma 4 E2B, bf16, language_model only (vision + audio towers f…
Ollama swap to llamacpp/llama server (www.reddit.com)

7 9w moe ollama llama

So I'm a newb in certain aspects but not in others, I'm currently running an AI stack on my unraid server: CPU: AMD Threadripper 3960X (24c/48t) Motherboard: Gigabyte TRX40 AORUS PRO WIFI RAM: 256GB DDR4-3200 G.Skill Trident Z GPU: Nvidia…
Short term access to 4x rtx6000pro... Suggestion on what to try/test? (www.reddit.com)

+15 9w vllm gemma claude-code

Always been stuck with models that fit on my 16gb... Going to have about a week for free with 4x rtx6000pro.
Has anyone managed to use gemma 4 e4b in Open Code/other agentic TUIs? (www.reddit.com)

+14 9w aider gemma agentic+1

Hi everyone, as a power user I hit Claude Code's usage cap too often, so I wanted to set up my own local model. However, I only have an RTX 5070 with 12 GB of VRAM, so the only realistic option was Gemma 4 with effective 4B params. When I tried to…
Gemma 4 VLA Demo on Jetson Orin Nano Super (huggingface.co)

9w gemma

You speak → Parakeet STT → Gemma 4 → [Webcam if needed] → Kokoro TTS → Speaker. Press SPACE to record, SPACE again to stop. This is a simple VLA: the model decides on its own whether to act based o…
Gemma 4 is not your standard transformer (idlemachines.co.uk via hn)

+2 9w moe gemma

Gemma 4 makes five quiet departures from the standard transformer recipe. QK-norm instead of 1/√d, partial RoPE on global layers, per-layer input gating, KV sharing across layers, and an MoE that sits alongside the MLP rather than replacin…
Built an Android app that exposes Gemma 4 as an OpenAI-compatible endpoint on your LAN (www.reddit.com)

9w gemma chatgpt openai

My old Samsung S10 was sitting in a drawer so I turned it into an always-on LLM endpoint. PocketPal is great for on-phone chat, but I wanted the phone itself to be an OpenAI-compatible endpoint for the rest of my network.
Is there an alternative between vLLM and Ollama that handles token prefill? (Arc Pro B70) (www.reddit.com)

1 9w vllm ollama

I am using an Arc Pro B70 to do inference, and its token generation speed is fine using Ollama, but it takes *forever* to do a prefill. vLLM absolutely tackles the prefill problem (nearly instant responses), but I can't run nearly as larg…
Anyone know how to run the new gemma 4 edge gallery litertlm format in the browser? Trying to load Gemma 4 e4b. (www.reddit.com)

9w gemma

How do I run Gemma 4 e4b, extracted via adb from Google AI Edge Gallery on Android, whose image is in litertlm format and weighs 3.6 gigs, in a browser? I mean using web technologies?
Did Google hide the best version of Gemma 4 e4b in Android? The extracted model beats Unsloth and everything else I've tried. (www.reddit.com)

+14 9w gemma

Why does Gemma 4 e4b from Google AI Edge Gallery on Android weigh only 3.6 gigs, while the one from Unsloth (gemma-4-E4B-it-UD-Q2_K_XL.gguf) weighs 3.7, and for some reason the model image in litertlm format extracted via adb from Google A…
Building the smallest Gemma 4 (35M params) from scratch — Part 1: Tokenization + Data Pipeline (www.reddit.com)

+12 9w gemma
Gemma 4 E4B is broken (www.reddit.com)

5 9w gemma
Gemma 4 Vision (www.reddit.com)

+9734 9w gemma llama
Lekh AI iOS v7.0 is Live – Bonsai 8B & Gemma 4 + Lower Memory Image Gen (www.reddit.com)

+1 9w gemma
Handling a large amount of files (www.reddit.com)

+34 9w gemma
Are we at the point where local AI isn’t a compromise anymore? (Gemma 4 experience) (medium.com via reddit)

9 9w moe gemma
model for frigate, a380 (www.reddit.com)

+32 9w llama
Gemm4:e4B-IT good at instructions following no refusals. (www.reddit.com)

+73 9w ollama gemma
Deploying Gemma 4 26B on an RTX 5090 (datapnt.com via hn)

+51 9w gemma
- Deploying Gemma 4 26B A4B on a single RTX 5090 — ~196 tok/s with AWQ + vLLM on RunPod Serverless (www.reddit.com)
Show HN: Prompt-to-Excalidraw demo with Gemma 4 E2B in the browser (3.1GB) (teamchong.github.io via hn)

+6025 9w gemma
Gemma 4 - MLX doesn't seem better than GGUF (www.reddit.com)

+6345 9w gemma
Gpu reccommendations for Coding/chat LLM (www.reddit.com)

+116 9w gemma
Gemma4 26B MoE on Arc 140T (www.reddit.com)

10 9w moe llama
How do I get the LLM to answer everything? (www.reddit.com)

9 9w gemma
Lm studio running some models very slow while others run normally. (www.reddit.com)

9 9w moe
Has anyone figured out STT with Gemma4 for Home Assistant? It works but responds with full thought chain. (www.reddit.com)

+21 9w gemma llama

I have Gemma4-E2B working within home assistant as STT, and E2B seems fast and accurate for STT (maybe a bit better than Parakeet), however, it responds with the entire thought process: https://preview.redd.it/v8zhb5elltvg1.png?width=599&f…
Gemma 4 coding performance, do different harnesses give wildly different results? (www.reddit.com)

+12 9w gemma gemini claude-code

So the question I've seen posed many times in /r/singularity is if the Gemini models are actually that bad at coding compared to their benchmarks, or whether the harness used makes an absolutely gigantic difference in model performance. Gi…
Tried hermes agent with local gemma4 on ollama. free tokens are nice but the agent quality gap vs cloud is still huge (www.reddit.com)

3 9w ollama deepseek agentic

Saw a post about running hermes agent locally with gemma4 through ollama. zero api costs, unlimited tokens, full privacy.
NVIDIA V100 32GB for AI in 2026 (www.reddit.com)

8 9w gemma qwen agentic

Hello. I have the opportunity to buy an Nvidia V100 with 32GB for about $915 / 775 euro.
Will Gemma 4 replace Claude Code or are we lying to ourselves again (webmatrices.com via hn)

+2 9w gemma claude-code

Apparently, llms are just graph databases? (www.reddit.com)

16 10w

I found this YouTube video where a guy created a database query language to query models as if they were just databases. I am blind so I can't see the graphs, but he talks about edges, nodes, features and entities.
Dabbling in Ai - Which Hardware to get (www.reddit.com)

+14 10w openclaw

Hi everyone, I want to get deeper into running local models and need new Hardware for this. My best suited machine for this is currently the 2020 M1 Macbook Pro with 16GB shared Memory which is cool for Gemma4 4B but I think I am missing o…
Bonsai models are pure hype: Bonsai-8B is MUCH dumber than Gemma-4-E2B (www.reddit.com)

+6742 10w gemma llama

I'm using the https://github.com/PrismML-Eng/llama.cpp fork for Bonsai, regular llama.cpp for Gemma. Without embedding parameters: Gemma 4 has 2.3B at 4.8 bpw (Q4_K_M) = 1104 MB Bonsai-8B has 6.95B at 1.125 bpw (Q1_0) = 782 MB (only 29% sm…
New to this - question about main AI model plus sub agent (www.reddit.com)

+1 10w openclaw

Just installed OpenClaw. It is running gemma4 right now, which feels somewhat slow in responding. After doing some more reading, I wanted to ask if it's really practical to use free ChatGPT for the main chat agent.
Want your LLM to use the internet? Here's an MCP server for that. (www.reddit.com)

3 10w gemma mcp

The showcased examples were made using Gemma 4 31b. Any LLM with tool calling support should work.
Getting gibberish when trying to generate with gemma-4-31b-it in LM Studio (lmstudio-community quant) (www.reddit.com)

+23 10w gemma

MB Pro M5, 24GB/32GB difference? (www.reddit.com)

+32 10w ollama gemma copilot

Hi, I got a new MB Pro 24GB/1TB. I've tested Gemma 4 26B with ollama, 16k context.
Frontier Coding Agents Built a Video Diffusion Pipeline on Max (www.modular.com via hn)

+1 10w gemma

Gemma 4 just dropped on Modular, Day Zero!
Where my Gemma 4 gets this data? Trying to explain weird behaviour. Please help! (www.reddit.com)

7 10w gemma

https://preview.redd.it/w6ssjgidjlvg1.png?width=2786&format=png&auto=webp&s=f52736d40580fe8a8ff74adbbb5be81f12fbcbfc So I was playing with Gemma 4 and was trying to figure out whether the model could determine its own training data cutoff…
Fine-tuning and deploying Gemma 4 is not that easy (ghost.oxen.ai via hn)

+4 10w fine-tuning gemma

Writing a fine-tuning and deployment pipeline isn't as easy as it looks (Gemma 4 Version). Fine-tune and deploy Gemma 4 on Oxen.ai. Google's Gemma 4 dropped in April 2026 with multimodal support (text, image, video, audio), a novel hybrid KV…
Findings: Gemma4 26B-A4B fine-tuning on a single RTX 4090 — 10 patches, benchmark, PCIELink path #1 (www.reddit.com)

+22 10w fine-tuning

Summary of Findings: This issue documents what we learned making Gemma4 26B-A4B-it train on consumer hardware (RTX 4090, 24GB VRAM). No A100.
Gemma 4-written, small cc0 encyclopedia of some core science content (stateofutopia.com via hn)

+11 10w gemma

Published: April 16, 2026. This is an encyclopedia of some core content from Biology and Health Sciences, Physical Sciences, and Technology. It contains 2,259 small entries of about a paragraph each.
why gemma 4 31b so bad in long context? (www.reddit.com)

+717 10w gemma

Question: I'm using it for text translations, and on each large prompt (20K+) it stops with a remark like 'now I'm going to put that to the file' or some other operation I asked for in the prompt, but it does nothing; it just stops. I'm runnin…
LiteRT LM Framework with Rockchip NPU (RKNN 3588) (www.reddit.com)

+1 10w gemma llama

I'm searching for a build of the LiteRT LM framework that can utilize the NPU of the RKNN 3588. It would be great, since I can run the gemma 4 e2b model using this framework on the machine, because I won't have to migrate my codebase from li…
Need suggestions for local AI Machine (www.reddit.com)

11 10w minimax openclaw gemma+1

I’ve been running various AI harnesses like OpenClaw, ForgeCode, ClaudeCode, etc. Most of these are running via OpenRouter or Minimax (credits/subscription model).
gemma4 e2b ore4b on rtx 5070 ti laptop 12GB not running on vLLM (www.reddit.com)

3 10w vllm gemma

I can't get gemma 4 e2b or gemma 4 e4b to run on my laptop. I am running it via docker as per the vllm website and I get the error: Free memory on device cuda:0 (9.71/11.5 GiB) on startup is less than desired GPU memory utilization (0.9, 10.3…
- gemma4 e4b on rtx 5070 ti laptop 12GB running slow 5t/s llama.cpp (www.reddit.com)
5090 for 285k on amazon india? (amzn.in via reddit)

6 10w gemma

How is it possible? The seller also has no record. Just wanted to run gemma 4 31B q4 with 150k ctx.
How many move your favorite LLM model before it's cheat then brain-dead in chess game ? (www.reddit.com)

6 10w gemma llama

I tried Gemma 4 E4B via llama-server to play chess at https://www.chess.com/play/computer (or any platform or site you find convenient), and the result was quite unexpected for me. Result: 9 moves before it makes a cheating move (like trying to move a pawn to take a…
Llama.cpp vs LM Studio on gaming PC (www.reddit.com)

+66 10w gemma qwen llama

Here is my experience: I've been using LM Studio with an RTX 5080 and 64GB RAM on Windows 11. I'm very happy with LM Studio except for the speed.
Gemma 4 on iOS: Anyone else stuck on CPU because of the "Buffer(31)" Metal Crash? (www.reddit.com)

10w gemma

Hey everyone, I’m hitting a massive performance wall building an on-device AI app for the iPhone 17 Pro.
Experience with medium sized LLMs (www.reddit.com)

+16 10w

I have tried to use several models on my 8GB RAM MacBook and concluded that 4B-parameter models are just “stupid” for my tasks (e.g. summarisation of PDFs, language learning, etc.).
For those running an OpenClaw instance, how do you manage sandboxing and prevention of unwanted behavior? (www.reddit.com)

5 10w prompt-injection openclaw security

Right now, I'm working on a small app to help eliminate my own doomscrolling by automatically crawling sites and summarizing news articles. However, I don't like the idea of giving OpenClaw free rein of my system, nor giving it any sort o…
Issues with Gemma 4 tool calling - abrupt gen ending despite the model telling me it wants to do X. (www.reddit.com)

+39 10w gemma

Hello, I have noticed an annoying issue with Gemma 4 26b a4b. It seems like it cannot do multiple think->tool call->think->tool call turns.
Gemma 4 is good or bad at real word (www.reddit.com)

6 10w gemma

Based on real-world usage by the community, roughly which version of which model is Gemma 4 comparable to? It would be great if you could also mention the hardware requirements for running it (like VRAM or GPU needs)
Offload settings for unsloth/Gemma-4 on Apple Silicon? (www.reddit.com)

1 10w gemma llama

Can default settings be optimized, or is it the best it is going to get? M1 Max Is it best in llama.cpp, LM Studio, or ?
Ollama Cloud - Pro (www.reddit.com)

+21 10w minimax ollama openclaw+1

Hi. I've been looking at ollama cloud's Pro offering ($20), which says "Run 3 cloud models at a time".
What's the better way to install llama.cpp on Android? (www.reddit.com)

+12 10w gemma llama

I own an Oppo Find X3 Pro (Snapdragon 888, 12/256 GB, Android 14.0), unused because of 3 green vertical lines on the screen and poor battery. I tried Google AI Edge Gallery with Gemma-4-E2B-it and it performs well, so I thought: "why don't t…
Is Gemma 4 26B MoE or 31B good as an MCP agent for coding with Xcode? (www.reddit.com)

1 10w moe gemma mcp

Thanks
What are your opinions on the SuperGemma finetune? (www.reddit.com)

6 10w gemma

So, I'm relatively new to the scene and I kind of want to do a sanity check. I've been using gemma-4-26B.
Gemma 4 Jailbreak System Prompt (www.reddit.com)

+446111 10w jailbreak gemma security

Use the following system prompt to allow Gemma (and most open source models) to talk about anything you wish. Add or remove from the list of allowed content as needed.
Gemma 4 running locally on an iPhone 13 Pro (www.reddit.com)

+49 10w gemma

I’ve been experimenting with running LLMs fully on-device, and managed to get Gemma 4 running locally on an iPhone 13 Pro. This is built on top of a lightweight Swift wrapper I open-sourced: https://github.com/mylovelycodes/LiteRTLM-Swift…
Ive automated my email/sms/phone (www.reddit.com)

+815 10w gemma agentic

we got it good boys! how many of you are doing this??
Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference (www.gizmoweek.com via hn)

+248160 10w gemma

Fixed: IPEX-LLM + modern Ollama models (qwen3, gemma4) on Intel Arc 140V Lunar Lake Windows 11 — undocumented solution (www.reddit.com)

+24 10w ollama

Been trying to run local LLMs on my new Dell XPS 13 with Intel Arc 140V (Lunar Lake, 16GB) and hit a wall — Intel's official docs point to a portable zip frozen at Ollama v0.5.4 which can't pull any modern model. Spent a while debugging it…
Local Agent Hermes setup with Gemma 4 and llama.cpp (www.youtube.com via reddit)

10w gemma llama

Gemma 4 & Obsidian (www.reddit.com)

+12 10w gemma claude-code

So today I tried the Obsidian LLM wiki system by Karpathy, but with Gemma 4 locally in OpenCode instead of Claude Code. My experience has been very frustrating.
Does an MLX conversation have same capabilities as the GGUF? (www.reddit.com)

+21 10w gemma

For example, in LMStudio the official Gemma 4 is a GGUF that has Vision, Reasoning, and Tools flags. But the MLX version does not.
Gemopus: A Gemma fine-tune that prioritizes stability over long chain-of-thought (huggingface.co via hn)

+1 10w fine-tuning gemma

Gemopus-4-26B-A4B-it: Gemopus is an attempt at fine-tuning Gemma 4 with a core philosophy of "stability first". While preserving the original reasoning order of Gemma 4 as much as possible, we conducted targeted refinements for an…
Gemma 4 and the Economics of Selling AI (gertlabs.com via hn)

+6 10w gemma

Benchmarks, rankings, and live play for AI models and agents.
[Fix] Gemma 4 MCP tool calls broken in LM Studio — "Unknown test: sequence" (www.reddit.com)

+41 10w gemma mcp

If you're using Gemma 4 with external MCP servers in LM Studio and getting this error: Error rendering prompt with jinja template: "Unknown test: sequence" This is a bug in Google's official Gemma 4 Jinja prompt template. LM Studio's Jinja…
Gemma 4 E2B on Android: OpenCL crash on emulator, anyone solved this? (www.reddit.com)

+12 10w gemma

I was building an Android app and integrated Gemma 4 E2B directly using LiteRT-LM. On-device translation, zero server cost, the dream setup.
Gemma 4 base GGUF? (www.reddit.com)

+18 10w gemma

Hello, I've seen reviews that gemma 4 31b base is very good at roleplaying. But I can't find the gguf version of the basic gemma 4 anywhere.
Suggestion for a local model to solve math problems. (www.reddit.com)

+221 10w gemma

Does anyone know of a good edge local LLM that is good at math? I tried Gemma 4 E2B and Microsoft Phi mini reasoning, but neither can answer some basic aptitude questions.
RTX 3090 llamacpp flags help (www.reddit.com)

+33 10w gemma qwen llama

Hi, my current system hardware is an RTX 3090 with 24GB VRAM and 64GB system RAM, running Windows 11. I've been playing around with hermes agent and local LLMs (Qwopus3.5-27B-v3-GGUF & gemma-4-26B-A4B-it-GGUF). When I try asking the hermes agent to do a task with…
Q8 Cache (www.reddit.com)

+1211 10w llama

https://github.com/ggml-org/llama.cpp/pull/21038 Since now cache quantization has better quality, does that mean Q8 cache is a good choice now? For example for 26B Gemma4?
Gemma 4 31B — 4bit is all you need (www.reddit.com)

+3964 10w gemma

Gemma quant comparison on M5 Max MacBook Pro 128GB (subjective of course, but across a variety of categories): gemma 4 leaderboard. The surprising bit: Gemma 4 31B 4bit scored higher than 8bit, 91.3% vs 88.4%.
24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/Gemma4) (www.reddit.com)

+601181 10w ollama

Turned a Xiaomi 12 Pro into a dedicated local AI node. Here is the technical setup: OS Optimization: Flashed LineageOS to strip the Android UI and background bloat, leaving ~9GB of RAM for LLM compute.
Share your speculative settings for llama.cpp and Gemma4 (www.reddit.com)

+2412 10w llama

I have totally missed the boat on speculative decoding. Today, when generating some code again for the frontend, I found myself staring at some quite monotonous JavaScript code.
My guess as to what Apple Foundation Models will be like in iOS 27 (www.reddit.com)

3 10w gemma

Could you imagine if the new Apple Foundation Models was based on Gemma 4 E4B text like the LiteRT version is? That would be one amazing built in model.
Looking for a team to participate in Gemma 4 good hackathon (www.reddit.com)

+12 10w gemma

Hey folks, I've been tinkering with Gemma 4 and absolutely love the fact that this model can run locally on an Android phone! I am an experienced full-stack dev, open to solving any real-world problem that has an impact.
common/gemma4 : handle parsing edge cases by aldehir · Pull Request #21760 · ggml-org/llama.cpp (github.com via reddit)

+2012 10w gemma llama

If you are on Gemma (like me), you basically have to compile llama.cpp daily now
Best setup for multiple high-end dissimilar PCs (www.reddit.com)

1 10w cowork gemma

I did some searching and didn't find an extremely similar situation. I'm jumping head first into hosting locally, and my experience has been good so far.
Is 100% offline processing the real "game changer" of this year? (www.reddit.com)

10w gemma

With the release of models optimized to run locally (like what we are seeing with the evolution of Gemma 4), it seems the AI pendulum is swinging away from the cloud.
How do I use gemma4 on 5090 gpu for coding? (www.reddit.com)

+26 10w aider cline ollama+3

I'm trying to replace OpenAI Codex, which I used for development all the time, with gemma4 on a 4090. It solves small tasks quite impressively, but I need some kind of agent. So I tried to connect 31b to cline and to aider and it didn't reall…
Show HN: I benchmarked Gemma 4 E2B – the 2B model beat the 12B on multi-turn (aiexplr.com via hn)

+5 10w gemma

Gemma 4 E2B vs the Gemma Family: The 2B Underdog That Punches Above Its Weight. Google's newest 2B model tested across 10 enterprise task suites against Gemma 2 2B, Gemma 3 4B, Gemma 4 E4B, and Gemma 3 12B. Run locally on Apple Silicon.
Why some small/medium models fail at grammar checking task? (www.reddit.com)

+54 10w gemma openai

Recently, I tried playing with gemma 4 (gemma-4-E4B-it-Q5_K_S.gguf) and found that it fails at an easy grammar check (it tries to fix the already correct word "contemporary"). I noticed the same mistake from openai/gpt-oss-20b and qwen3-next-80b-a…
Local AI coding assistant that runs fully offline (Gemma 4, codebase-aware) (www.reddit.com)

9 10w gemma llama

I’ve been experimenting with running a local coding assistant on Gemma 4 26B, focused on understanding full codebases instead of single-file prompts. Main idea: - build a project map (files, symbols, structure) - run a planning step to dec…
Are the LiteRT versions of Gemma 4 a different architecture? (www.reddit.com)

3 10w gemma claude-code

I was surprised at how much smaller the LiteRT versions of Gemma 4 E2B used in Edge Gallery were (2.0-3.3 GB) compared to the main release (10.2 GB), so I had Claude code take a look. Claude tells me that the vocab size for the LiteRT vers…
I ran Gemma 4 as a local model in Codex CLI (medium.com via hn)

+1 10w gemma codex

By Daniel Vaughan, Google Cloud - Community, Apr 2026 (Medium).
Opencode + lmstudio : first prompt very slow (www.reddit.com)

1 10w gemma

I ran some tests with LM Studio and Opencode with the new Gemma 4 26b model. The results are really impressive, especially on small refactoring and integration tasks.
Thinking with a smaller model to speed things up? (www.reddit.com)

+69 10w gemma

Question: can I do the thinking with a smaller model, like Gemma 4 4B, then use that as the prompt for Gemma 4 31B, to speed things up? Has anyone done this and measured whether it's worth it?
Open Claw on my old PC (32GB Ram, 12GB VRAM) model suggestions? (www.reddit.com)

3 10w llama

I tried running Gemma4 E4B through llama.cpp, and I couldn't get it to reply without timing out.
Best local LLM that will work fine as a backend for an NSFW discord bot? + having an issue with OpenClaw (www.reddit.com)

3 10w openclaw gemma

My specs: RTX 5060 Ti (16GB), 16GB DDR5 RAM (OS: Fedora 43). I want an uncensored model. It would be preferable if it can do image gen, but if the quality of text is high enough it should not be a problem if it does not support it.
Local models are a godsend when it comes to discussing personal matters (www.reddit.com)

+23981 10w gemma

I’ve been keeping a personal journal for the past few years. The entire thing is made up of over 100k tokens.
Getting no result train Gemma 4 for structured data extraction (www.reddit.com)

+12 10w gemma

Hello, I've been trying for several days to train Gemma-4 to extract data from a string and convert it into structured JSON. I've tried a fair number of different configurations, I've tried Unsloth studio and Llamafactory, but in eac…
Cursor Native tool calling with Gemma4 and Ollama: (www.reddit.com)

+35 10w ollama cursor

I'm a beginner with local models. Now that I have a good GPU, I installed ollama using docker. Pulled the Gemma4 weights and was able to add it to cursor using ngrok.
