#gemma

88 items

Gemma 4 Jailbreak System Prompt (www.reddit.com via reddit) 446 pts·111 replies· 1d

Use the following system prompt to allow Gemma (and most open source models) to talk about anything you wish. Add or remove from the list of allowed content as needed.

↯ Gemma 4 jailbreak security gemma
Gemma4 26b & E4B are crazy good, and replaced Qwen for me! (www.reddit.com via reddit) 392 pts·100 replies· 1d

My pre-Gemma 4 setup was as follows: Llama-swap, open-webui, and Claude code router on 2 RTX 3090s + 1 P40 (my third 3090 died, RIP) and 128 GB of system memory. Qwen 3.5 4B for semantic routing to the following models, with n_cpu_moe where…

↯ Qwen 3.5 qwen llama gemma+2
Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference (www.gizmoweek.com via hn) 248 pts·160 replies· 1d

↯ Gemma 4 gemma
Local models are a godsend when it comes to discussing personal matters (www.reddit.com via reddit) 239 pts·81 replies· 3d

↯ Gemma 4 gemma
The LLM tunes its own llama.cpp flags (+54% tok/s on Qwen3.5-27B) (www.reddit.com via reddit) 100 pts·51 replies· 2d

↯ Qwen 3.5 llama gemma
CPUs Aren't Dead. Gemma 2B Outscored GPT-3.5 Turbo on Test That Made It Famous (seqpu.com via hn) 86 pts·45 replies· 1d

CPUs Aren't Dead. Gemma 2B Just Scored Higher Than GPT-3.5 Turbo on the Test That Made It Famous — Your Laptop Can Run It, or Cloudflare for $5/Mo.

gemma
Do you guys think there’s a high chance of Singularity being open source? (www.reddit.com via reddit) 74 pts·67 replies· 4d

↯ Qwen 3.6 glm qwen gemma+1
Gemma 4 31B — 4bit is all you need (www.reddit.com via reddit) 39 pts·64 replies· 2d

↯ Gemma 4 gemma
PSA: Having issues with Qwen3.5 overthinking? Give it a tool, and it can help dramatically. (www.reddit.com via reddit) 34 pts·11 replies· 2d

I'm sure everyone has seen the posts from people talking about Qwen 3.5 over-thinking, or maybe you've experienced it yourself. Considering we're like 2 months out from the release and I still see people talk about this issue, I decided it…

↯ Qwen 3.5 qwen gemma gemini
Comparing Qwen3.5 27B vs Gemma 4 31B for agentic stuff (www.reddit.com via reddit) 28 pts·26 replies· 3d

↯ Qwen 3.5 gemma agentic
Benchmarked Gemma 4 E2B: The 2B model beat every larger sibling on multi-turn (70%) (aiexplr.com via reddit) 26 pts·12 replies· 3d

↯ Gemma 4 function-calling prompt-injection rag+2
common/gemma4 : handle parsing edge cases by aldehir · Pull Request #21760 · ggml-org/llama.cpp (github.com via reddit) 20 pts·12 replies· 2d

If you are on Gemma (like me), you basically have to compile llama.cpp daily now

↯ Gemma 4 llama gemma
Gemma 4 31b 3D geometry (www.reddit.com via reddit) 15 pts·8 replies· 8h

I have been nothing but impressed by the quality of Gemma 4 since release. In general conversation it's adaptable to different personas.

↯ Sonnet 4.6 sonnet gemma gemini+2
Gemma 4 31B passed 7/8 real-world production tests — including ones I designed to make it fail. Full prompts + outputs. (www.reddit.com via reddit) 12 pts·11 replies· 1d

I've been waiting for a capable free local LLM for a while. I think we're close — the quality is getting there fast, and Gemma 4 is the first open-weight model where I genuinely considered using it in production for simple-to-medium tasks.

↯ Gemini 3.1 moe gemma gemini+2
[cupel] M5 Max 128GB: Qwen3.5-397B IQ2 @ 29 tokens per second (www.reddit.com via reddit) 12 pts·8 replies· 3d

↯ Qwen 3.5 qwen gemma
Turn an old Android phone into a Local AI Voice Assistant (www.reddit.com via reddit) 11 pts·1 replies· 17h

I had a nice old cracked pixel 5a laying around that I wanted to get some use out of, so I turned it into a local AI Voice assistant. A server on a laptop running llama.cpp gemma-3-4b-q4.gguf served by flask connects to a script running on…

↯ Gemma 3 llama gemma
(llama.cpp) Possible to disable reasoning for some requests (while leaving reasoning on by default)? (www.reddit.com via reddit) 11 pts·10 replies· 1d

I am running unsloth/gemma-4-26B-A4B-it-GGUF/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf with llama-server (with reasoning enabled). Is it possible to disable reasoning for some requests only?

↯ Gemma 4 llama gemma
Pocket LLM v1.3.0: Offline local LLM chat on Android with LiteRT + ONNX builds (www.reddit.com via reddit) 10 pts· 1d

Hi everyone, I've been working on Pocket LLM, an Android app for running local LLMs fully offline for private, real-time chat. The latest v1.3.0 update adds: • LiteRT support for Gemma 4 E2B, Gemma 4 E4B, and Qwen3-0.6B • Persistent local…

gemma
GPU advice for Qwen 3.5 27B / Gemma 4 31B (dense) — aiming for 64K ctx, 30+ t/s (www.reddit.com via reddit) 9 pts·78 replies· 19h

Hey all, looking for some real-world advice on GPU choices for running the new dense models, mainly Qwen 3.5 27B and Gemma 4 31B. What I’m targeting: Context: 64K+ (ideally higher later). Speed: 30+ tok/s @ tg128 minimum. Power: not critical…

↯ Qwen 3.5 moe qwen gemma
Ive automated my email/sms/phone (www.reddit.com via reddit) 8 pts·15 replies· 1d

we got it good boys! how many of you are doing this??

↯ Gemma 4 gemma agentic
why gemma 4 31b so bad in long context? (www.reddit.com via reddit) 7 pts·17 replies· 9h

Question: I'm using it for text translations, and on each large prompt (20K+) it stops with a remark like 'now I'm going to put that to the file', or some other operation I asked for in the prompt, but it does nothing and just stops. I'm runnin…

↯ Gemma 4 gemma
Llama.cpp vs LM Studio on gaming PC (www.reddit.com via reddit) 6 pts·6 replies· 19h

Here is my experience, I've been using LM Studio with RTX 5080 and 64GB RAM using Windows 11. I'm very happy with LM Studio except the speed.

↯ Gemma 4 qwen llama gemma
Gemma 4 and the Economics of Selling AI (gertlabs.com via hn) 6 pts· 2d

↯ Gemma 4 gemma
Thinking with a smaller model to speed things up? (www.reddit.com via reddit) 6 pts·9 replies· 3d

↯ Gemma 4 gemma
What's your favorite small-medium local model? (www.reddit.com via reddit) 5 pts·10 replies· 19h

I'm now having fun with Gemma-4-E4B and Qwen3.5-9B, trying different variants like Gemopus, Qwopus, and Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q8_0. I don't really know other models, so what's your favorite? Why, and how is it?

↯ Qwen 3.5 gemma
Show HN: I benchmarked Gemma 4 E2B – the 2B model beat the 12B on multi-turn (aiexplr.com via hn) 5 pts· 3d

↯ Gemma 4 gemma
Why some small/medium models fail at grammar checking task? (www.reddit.com via reddit) 5 pts·4 replies· 3d

↯ Gemma 4 gemma openai
Fine-tuning and deploying Gemma 4 is not that easy (ghost.oxen.ai via hn) 4 pts· 4h

Writing a fine-tuning and deployment pipeline isn't as easy as it looks (Gemma 4 version). Fine-tune and deploy Gemma 4 on Oxen.ai. Google's Gemma 4 dropped in April 2026 with multimodal support (text, image, video, audio), a novel hybrid KV…

↯ Gemma 4 fine-tuning gemma
GGUF Quants Arena for MMLU (24GB VRAM + 128GB RAM) (www.reddit.com via reddit) 4 pts·8 replies· 12h

Dataset: MMLU subset (DEV+TEST). llama.cpp settings: 3 params only (ctx 8192, seed 42, fa on). Let me know what else you want to see. Thanks.

↯ Claude 4.6 mmlu gemma opus+1
Gemma 4 running locally on an iPhone 13 Pro (www.reddit.com via reddit) 4 pts·9 replies· 1d

I’ve been experimenting with running LLMs fully on-device, and managed to get Gemma 4 running locally on an iPhone 13 Pro. This is built on top of a lightweight Swift wrapper I open-sourced: https://github.com/mylovelycodes/LiteRTLM-Swift…

↯ Gemma 4 gemma
[Fix] Gemma 4 MCP tool calls broken in LM Studio — "Unknown test: sequence" (www.reddit.com via reddit) 4 pts·1 replies· 2d

↯ Gemma 4 gemma mcp
Issues with Gemma 4 tool calling - abrupt gen ending despite the model telling me it wants to do X. (www.reddit.com via reddit) 3 pts·9 replies· 1d

Hello, I have noticed an annoying issue with Gemma 4 26b a4b. It seems like it cannot do multiple think->tool call->think->tool call turns.

↯ Gemma 4 gemma
Loading "stacks" of models on-demand? Does a tool like this exist? (www.reddit.com via reddit) 3 pts·3 replies· 1d

I'd like to self-host some LLM models but a couple different ones for different usecases, and they don't all fit in VRAM at the same time. So i'm kind of looking for a tool in which i can define "profiles" or "stacks" of LLM's that get loa…

↯ Qwen 3.5 gemma openai
What's the deal with Qwen3.5's and Gemma 4's reasoning traces? (www.reddit.com via reddit) 3 pts·4 replies· 2d

Hey there, I noticed something odd when trying out the latest and greatest local reasoning models recently. First, I just noticed it for Qwen3.5, but Gemma 4 seems to do it too: The reasoning traces do that weird thing of starting with "He…

↯ Qwen 3.5 gemma
RTX 3090 llamacpp flags help (www.reddit.com via reddit) 3 pts·3 replies· 2d

↯ Gemma 4 qwen llama gemma
Show HN: Hitoku Draft – context aware local macOS assistant (github.com via hn) 3 pts· 3d

↯ Qwen 3.5 qwen gemma claude-code+1
Knowledge Graph and hybrid DB (www.reddit.com via reddit) 2 pts· 7h

Hello, everybody! I'm building a hybrid database with Qdrant and Neo4j for a few personal projects.

↯ Gemma 3 ollama gemma
Minimax M2.7 on Q3_K_S or Smaller Model with greater precision? (www.reddit.com via reddit) 2 pts·5 replies· 1d

I currently am looking for models to fit into my single DGX Spark for use. I have an RTX Pro 6000 and also a 5090 as well that I'm considering using in combination if the DGX Spark is too slow, but the intent here is to play around with Op…

↯ Qwen 3.5 minimax openclaw gemma
Ollama Cloud - Pro (www.reddit.com via reddit) 2 pts·1 replies· 1d

Hi. I've been looking at ollama cloud's Pro offering ($20), which says "Run 3 cloud models at a time".

↯ Gemma 4 minimax ollama openclaw+1
Does an MLX conversation have same capabilities as the GGUF? (www.reddit.com via reddit) 2 pts·1 replies· 2d

For example, in LMStudio the official Gemma 4 is a GGUF that has Vision, Reasoning, and Tools flags. But the MLX version does not.

↯ Gemma 4 gemma
Suggestion for a local model to solve math problems. (www.reddit.com via reddit) 2 pts·21 replies· 2d

↯ Gemma 4 gemma
How do I use gemma4 on 5090 gpu for coding? (www.reddit.com via reddit) 2 pts·6 replies· 3d

I'm trying to replace OpenAI Codex, which I used for development all the time, with gemma4 on a 4090. It solves small tasks quite impressively, but I need some kind of agent. So I tried to connect the 31B to Cline and to aider, and it didn't reall…

↯ Gemma 4 aider cline ollama+3
Gemma 4-written, small cc0 encyclopedia of some core science content (stateofutopia.com via hn) 1 pts·1 replies· 7h

Published: April 16, 2026. This is an encyclopedia of some core content from Biology and Health Sciences, Physical Sciences, and Technology. It contains 2,259 small entries of about a paragraph each.

↯ Gemma 4 gemma
Local Coding Stacks (www.reddit.com via reddit) 1 pts·2 replies· 7h

I’m trying to reduce my reliance on Claude. I have a 5090/128GB RAM.

↯ Qwen 3.5 sonnet qwen gemma+1
Feedback on iOS app with local AI models (www.reddit.com via reddit) 1 pts· 8h

Hey everyone, I just shipped an iOS app that runs local AI models. It currently has 12 models: Gemma 4, Llama 3.3, Qwen3, DeepSeek R1 Distill, Phi-4, etc.

deepseek llama gemma+1
LiteRT LM Framework with Rockchip NPU (RKNN 3588) (www.reddit.com via reddit) 1 pts· 11h

I'm searching for a build of the LiteRT LM framework that can utilize the NPU of the RKNN 3588. It would be great to run the Gemma 4 E2B model with this framework on the machine, because then I won't have to migrate my codebase from li…

↯ Gemma 4 llama gemma
Thinking issue [Qwen3.5] (www.reddit.com via reddit) 1 pts· 1d

I've been testing a few models lately and I'm running into a weird issue with the bigger Qwen3.5s. Tested: Gemma 4 26B, Qwen3.5 9B, Qwen3.5 27B, Qwen3.5 35B. The 27B and 35B are driving me nuts.

↯ Qwen 3.5 gemma
Best Ollama models/settings for an 8GB VPS (CPU only, ARM)? Running into memory & looping issues. (www.reddit.com via reddit) 1 pts·2 replies· 1d

Hi everyone, I'm trying to run a local LLM via Ollama on a Hetzner cax21 VPS (ARM64, 4 vCPUs, 8GB RAM, 80GB SSD). I have Ollama running successfully via Coolify.

ollama qwen gemma
Ask HN: What are you building with Gemma? What do you wish existed? (news.ycombinator.com via hn) 1 pts· 1d

gemma
What's the better way to install llama.cpp on Android? (www.reddit.com via reddit) 1 pts·2 replies· 1d

I own an Oppo Find X3 Pro (Snapdragon 888, 12/256 GB, Android 14.0) that sits unused because of 3 green vertical lines on the screen and a poor battery. I tried Google AI Edge Gallery with Gemma-4-E2B-it and it performs well, so I thought: "why don't t…

↯ Gemma 4 llama gemma
Show HN: Running Gemma 4 on an iPhone 13 Pro (github.com via hn) 1 pts· 1d

I just open-sourced LiteRTLM-Swift (https://github.com/mylovelycodes/LiteRTLM-Swift), which lets you run LLMs locally with a clean Swift API: on-device inference, no cloud required, built for iOS.

↯ Gemma 4 gemma
Gemma 4 Thinking Like Claude Opus (decrypt.co via hn) 1 pts· 1d

If you've been following the local AI scene, you probably know Qwopus—the open-source model that tried to distill Claude Opus 4.6's reasoning into Alibaba's Qwen, so you could run something resembling Opus on your own hardware for free. It…

↯ Opus 4.6 qwen gemma opus+1
Can LLM make small change to the software program? (www.reddit.com via reddit) 1 pts·4 replies· 1d

I'm currently vibe-coding (I'm new to vibe-coding) with Gemma 4 E4B Q4 and Qwen 3.5 9B Q5 (KV is quantized to 4 bits with the new Google TurboQuant implemented in llama.cpp; I use koboldcpp and the release said it's automatically activated): the…

↯ Qwen 3.5 qwen llama gemma
Gemma 4 & Obsidian (www.reddit.com via reddit) 1 pts·2 replies· 2d

So today I tried the Obsidian LLM wiki system by Karpathy, but with Gemma 4 running locally in OpenCode instead of Claude Code. My experience has been very frustrating.

↯ Gemma 4 gemma claude-code claude
Gemopus: A Gemma fine-tune that prioritizes stability over long chain-of-thought (huggingface.co via hn) 1 pts· 2d

🌟 Gemopus-4-26B-A4B-it: Gemopus is an attempt at fine-tuning Gemma 4 with a core philosophy of "stability first". While preserving the original reasoning order of Gemma 4 as much as possible, we conducted targeted refinements for an…

↯ Gemma 4 fine-tuning gemma
Gemma 4 E2B on Android: OpenCL crash on emulator, anyone solved this? (www.reddit.com via reddit) 1 pts·2 replies· 2d

↯ Gemma 4 gemma
Gemma 4 base GGUF? (www.reddit.com via reddit) 1 pts·8 replies· 2d

↯ Gemma 4 gemma
DGX spark (www.reddit.com via reddit) 1 pts·5 replies· 2d

↯ Qwen 3.5 vllm qwen llama+1
Looking for a team to participate in Gemma 4 good hackathon (www.reddit.com via reddit) 1 pts·2 replies· 2d

↯ Gemma 4 gemma
I ran Gemma 4 as a local model in Codex CLI (medium.com via hn) 1 pts· 3d

↯ Gemma 4 gemma codex
Getting no result train Gemma 4 for structured data extraction (www.reddit.com via reddit) 1 pts·2 replies· 3d

↯ Gemma 4 gemma
Spring benchmark update: Gemma 4 / Qwen3.5 vs Gemma 3 / Qwen3 for chat (www.reddit.com via reddit) 3 replies· 7h

Google and Alibaba recently shipped Gemma 4 and Qwen3.5, so I wanted to see whether the new generations are actually better on my setup. My context is private local chat running on my own hardware, a Mac mini M4 Pro.

↯ Qwen 3.5 tool-use gemma agentic
gemma-4-31B-it thinking? (www.reddit.com via reddit) 2 replies· 8h

I can't get my model to think. According to the documentation, thinking should be triggered by starting the system prompt with a '<|think|>' string.

↯ Gemma 4 vllm gemma
Need suggestions for local AI Machine (www.reddit.com via reddit) 11 replies· 12h

I’ve been running various AI harnesses like OpenClaw, ForgeCode, ClaudeCode, etc. Most of these are running via OpenRouter or Minimax (credits/subscription model).

↯ Gemma 4 minimax qwen openclaw+1
gemma4 e2b or e4b on rtx 5070 ti laptop 12GB not running on vLLM (www.reddit.com via reddit) 3 replies· 12h

I can't get Gemma 4 E2B or Gemma 4 E4B to run on my laptop. I am running it via Docker as per the vLLM website and I get the error: Free memory on device cuda:0 (9.71/11.5 GiB) on startup is less than desired GPU memory utilization (0.9, 10.3…

↯ Gemma 4 vllm gemma
gemma4 e4b on rtx 5070 ti laptop 12GB running slow 5t/s llama.cpp (www.reddit.com via reddit) 9 replies· 13h

I sincerely hope someone can help me, because I have tried everything I can and I still get this speed using llama.cpp and opencode. I have described my setup and how I am running it in as much detail as I can.

↯ Gemma 4 ollama llama gemma
How much faster is Gemma 4 26B-A4B during inference vs 31B? (www.reddit.com via reddit) 16 replies· 16h

I want to download one, and I usually run inference on the CPU because my GPU is old, so I'm concerned about speed. One link on the web (I posted it before and the post was removed): Multiple users are reporting that Gemma 4's MoE model (26B-A4B) runs sign…

↯ Qwen 3.5 moe qwen llama+1
5090 for 285k on amazon india? (amzn.in via reddit) 6 replies· 18h

How is this possible? The seller also has no record. I just wanted to run Gemma 4 31B Q4 with 150k ctx.

↯ Gemma 4 gemma
How many moves does your favorite LLM last before it cheats, then goes brain-dead, in a chess game? (www.reddit.com via reddit) 6 replies· 19h

I tried playing chess with Gemma 4 E4B via llama-server at https://www.chess.com/play/computer (use any platform or site convenient for you), and the result was quite unexpected for me. Result: 9 moves before it made a cheating move (like trying to move a pawn to take a…

↯ Gemma 4 llama gemma
Gemma 4 on iOS: Anyone else stuck on CPU because of the “Buffer(31)” Metal Crash? (www.reddit.com via reddit) 22h

Hey everyone, I’m hitting a massive performance wall building an on-device AI app for the iPhone 17 Pro.

↯ Gemma 4 gemma
Is Gemma 4 good or bad in the real world? (www.reddit.com via reddit) 6 replies· 1d

Based on real-world usage by the community, roughly which version of which model is Gemma 4 comparable to? It would be great if you could also mention the hardware requirements for running it (like VRAM or GPU needs)

↯ Gemma 4 gemma
Offload settings for unsloth/Gemma-4 on Apple Silicon? (www.reddit.com via reddit) 1 replies· 1d

Can the default settings be optimized, or is this the best it is going to get? M1 Max. Is it best in llama.cpp, LM Studio, or something else?

↯ Gemma 4 llama gemma
running models bigger than physical memory capacity (www.reddit.com via reddit) 14 replies· 1d

has anyone really tried running models bigger than physical memory capacity? I'd guess most users stick with running models that fit in DRAM + VRAM https://unsloth.ai/docs/models/qwen3.5 even google gemma 4 are released with about 30+ bill…

↯ Qwen 3.5 qwen llama gemma
Is Gemma 4 26B MoE or 31B good as an MCP agent for coding with Xcode? (www.reddit.com via reddit) 1 replies· 1d

Thanks

↯ Gemma 4 moe gemma mcp
What are your opinions on the SuperGemma finetune? (www.reddit.com via reddit) 6 replies· 1d

So, I'm relatively new to the scene and I kind of want to do a sanity check. I've been using gemma-4-26B.

↯ Gemma 4 gemma
Local Agent Hermes setup with Gemma 4 and llama.cpp (www.youtube.com via reddit) 2d

↯ Gemma 4 llama gemma
Why don't Groq (with a q) and Cerebras add new models (www.reddit.com via reddit) 12 replies· 2d

Both Groq and Cerebras haven't really updated their provided model for a while, long enough to notice the difference between old and new models on the market. So why don't they add any new models?

↯ Qwen 3.5 gemma
Hardware needed for Gemma 26B MoE vs Qwen 14B for ~100–300 users (vLLM, single node?) (www.reddit.com via reddit) 16 replies· 2d

↯ Qwen 2.5 moe vllm qwen+1
What is the best way to deploy LLM on 3x3090? (www.reddit.com via reddit) 13 replies· 2d

↯ Qwen 3.5 vllm llama gemma
My guess as to what Apple Foundation Models will be like in iOS 27 (www.reddit.com via reddit) 3 replies· 2d

↯ Gemma 4 gemma
Best setup for multiple high-end dissimilar PCs (www.reddit.com via reddit) 1 replies· 2d

↯ Gemma 4 cowork gemma claude
Opinion on best suit for my hardware (www.reddit.com via reddit) 2 replies· 2d

Hello everyone, a newbie here. Amazed by OpenClaw and worried by its high API consumption, I decided to buy two Asus Ascent GX10s (like the Nvidia Spark), so I have a pretty powerful inference cluster with 220GB of real available memory.

↯ Qwen 3.5 openclaw gemma
Is 100% offline processing the real "game changer" of this year? (www.reddit.com via reddit) 3d

↯ Gemma 4 gemma
Speed on m5 pro 48Gb (www.reddit.com via reddit) 3d

Hey guys! How do you reckon a 30-50B model would run on a 48 GB M5 Pro?

↯ Qwen 3.5 glm qwen gemma
Local AI coding assistant that runs fully offline (Gemma 4, codebase-aware) (www.reddit.com via reddit) 9 replies· 3d

↯ Gemma 4 llama gemma
Are the LiteRT versions of Gemma 4 a different architecture? (www.reddit.com via reddit) 3 replies· 3d

↯ Gemma 4 gemma claude-code claude
Opencode + lmstudio : first prompt very slow (www.reddit.com via reddit) 1 replies· 3d

↯ Gemma 4 gemma
Best local LLM that will work fine as a backend for an NSFW discord bot? + having an issue with OpenClaw (www.reddit.com via reddit) 3 replies· 3d

↯ Gemma 4 openclaw gemma
