Use the following system prompt to allow Gemma (and most open source models) to talk about anything you wish. Add or remove from the list of allowed content as needed.
#gemma
88 items
Gemma 4 Jailbreak System Prompt (www.reddit.com via reddit)
Gemma 4 26B & E4B are crazy good, and replaced Qwen for me! (www.reddit.com via reddit) My pre-Gemma 4 setup was as follows: llama-swap, open-webui, and Claude Code Router on 2 RTX 3090s + 1 P40 (my third 3090 died, RIP) and 128 GB of system memory. Qwen 3.5 4B for semantic routing to the following models, with n_cpu_moe where…
Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference (www.gizmoweek.com via hn)
Local models are a godsend when it comes to discussing personal matters (www.reddit.com via reddit)
The LLM tunes its own llama.cpp flags (+54% tok/s on Qwen3.5-27B) (www.reddit.com via reddit)
CPUs Aren't Dead. Gemma 2B Just Scored Higher Than GPT-3.5 Turbo on the Test That Made It Famous — Your Laptop Can Run It, or Cloudflare for $5/Mo. (seqpu.com via hn)
Do you guys think there’s a high chance of Singularity being open source? (www.reddit.com via reddit)
Gemma 4 31B — 4bit is all you need (www.reddit.com via reddit)
PSA: Having issues with Qwen3.5 overthinking? Give it a tool, and it can help dramatically. (www.reddit.com via reddit) I'm sure everyone has seen the posts from people talking about Qwen 3.5 over-thinking, or maybe you've experienced it yourself. Considering we're like 2 months out from the release and I still see people talk about this issue, I decided it…
Comparing Qwen3.5 27B vs Gemma 4 31B for agentic stuff (www.reddit.com via reddit)
Benchmarked Gemma 4 E2B: The 2B model beat every larger sibling on multi-turn (70%) (aiexplr.com via reddit)
common/gemma4 : handle parsing edge cases by aldehir · Pull Request #21760 · ggml-org/llama.cpp (github.com via reddit) If you are on Gemma (like me), you basically have to compile llama.cpp daily now
Gemma 4 31b 3D geometry (www.reddit.com via reddit) I have been nothing but impressed by the quality of Gemma 4 since release. In general conversation it's adaptable to different personas.
Gemma 4 31B passed 7/8 real-world production tests — including ones I designed to make it fail. Full prompts + outputs. (www.reddit.com via reddit) I've been waiting for a capable free local LLM for a while. I think we're close — the quality is getting there fast, and Gemma 4 is the first open-weight model where I genuinely considered using it in production for simple-to-medium tasks.
[cupel] M5 Max 128GB: Qwen3.5-397B IQ2 @ 29 tokens per second (www.reddit.com via reddit)
Turn an old Android phone into a Local AI Voice Assistant (www.reddit.com via reddit) I had a nice old cracked Pixel 5a laying around that I wanted to get some use out of, so I turned it into a local AI voice assistant. A llama.cpp server on a laptop, serving gemma-3-4b-q4.gguf behind Flask, connects to a script running on…
(llama.cpp) Possible to disable reasoning for some requests (while leaving reasoning on by default)? (www.reddit.com via reddit) I am running unsloth/gemma-4-26B-A4B-it-GGUF/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf with llama-server (with reasoning enabled). Is it possible to disable reasoning for some requests only?
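A per-request toggle like the one asked about above can be sketched with llama-server's OpenAI-compatible endpoint, which accepts `chat_template_kwargs` in the request body. Whether Gemma 4's chat template honors an `enable_thinking` flag is an assumption here (Qwen3-style templates do); the model name is taken from the post:

```python
import json

def chat_payload(prompt, thinking=True):
    """Build an OpenAI-compatible request body for llama-server.

    `chat_template_kwargs` is forwarded to the model's Jinja chat template.
    `enable_thinking` is honored by Qwen3-style templates; whether the
    Gemma 4 template accepts it is an ASSUMPTION, not confirmed.
    """
    return {
        "model": "gemma-4-26B-A4B-it",
        "messages": [{"role": "user", "content": prompt}],
        # Reasoning stays on by default server-side; this request alone
        # asks the template to skip the thinking block.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

body = json.dumps(chat_payload("Summarize this log line.", thinking=False))
```

POSTing `body` to `/v1/chat/completions` would then disable reasoning for that one request only, if the template supports the flag.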
Pocket LLM v1.3.0: Offline local LLM chat on Android with LiteRT + ONNX builds (www.reddit.com via reddit) Hi everyone, I've been working on Pocket LLM, an Android app for running local LLMs fully offline for private, real-time chat. The latest v1.3.0 update adds: • LiteRT support for Gemma 4 E2B, Gemma 4 E4B, and Qwen3-0.6B • Persistent local…
GPU advice for Qwen 3.5 27B / Gemma 4 31B (dense) — aiming for 64K ctx, 30+ t/s (www.reddit.com via reddit) Hey all, looking for some real-world advice on GPU choices for running the new dense models — mainly Qwen 3.5 27B and Gemma 4 31B. What I’m targeting: Context: 64K+ (ideally higher later). Speed: 30+ tok/s @ tg128 minimum. Power: not critical…
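A rough fit check for questions like the one above: quantized weights plus KV cache plus overhead against VRAM. The layer count, KV-head count, and head dim below are illustrative placeholders, not the real Gemma 4 31B config:

```python
def fits_in_vram(params_b, bits_per_weight, n_layers, n_kv_heads,
                 head_dim, ctx, kv_bits=16, vram_gb=24.0, overhead_gb=1.5):
    """Rough single-GPU fit check: quantized weights + KV cache + overhead.

    KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes/elem.
    Architecture numbers passed in below are PLACEHOLDERS, not the actual
    Gemma 4 31B config.
    """
    weights_gb = params_b * bits_per_weight / 8   # GB per billion params at this bpw
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx * (kv_bits / 8) / 1e9
    total = weights_gb + kv_gb + overhead_gb
    return total, total <= vram_gb

# Hypothetical 31B dense model, ~4.5 bpw quant, 64K context, fp16 KV, 24 GB card:
total, ok = fits_in_vram(31, 4.5, n_layers=48, n_kv_heads=8,
                         head_dim=128, ctx=65_536, vram_gb=24.0)
```

Under these toy numbers the 64K-context KV cache alone is ~13 GB at fp16, which is why posts in this feed lean on KV quantization for long contexts.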
I've automated my email/sms/phone (www.reddit.com via reddit) we got it good, boys! How many of you are doing this??
Why is Gemma 4 31B so bad at long context? (www.reddit.com via reddit) Question: I'm using it for text translations, and on each large prompt (20K+) it stops with a remark like 'now I'm going to put that to the file' or some other operation I asked for in the prompt, but it did nothing, just stopped. I'm runnin…
Llama.cpp vs LM Studio on gaming PC (www.reddit.com via reddit) Here is my experience: I've been using LM Studio with an RTX 5080 and 64GB RAM on Windows 11. I'm very happy with LM Studio except for the speed.
Gemma 4 and the Economics of Selling AI (gertlabs.com via hn)
Thinking with a smaller model to speed things up? (www.reddit.com via reddit)
What's your favorite small-medium local model? (www.reddit.com via reddit) I'm now having fun with Gemma-4-E4B and Qwen3.5-9B, trying different variants like Gemopus and Qwopus, and Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q8_0. I don't really know other models, so what's your favorite? Why, and how do you use them?
Show HN: I benchmarked Gemma 4 E2B – the 2B model beat the 12B on multi-turn (aiexplr.com via hn)
Why do some small/medium models fail at grammar checking tasks? (www.reddit.com via reddit)
Fine-tuning and deploying Gemma 4 is not that easy (ghost.oxen.ai via hn) Writing a fine-tuning and deployment pipeline isn't as easy as it looks (Gemma 4 version). Fine-tune and deploy Gemma 4 on Oxen.ai. Google's Gemma 4 dropped in April 2026 with multimodal support (text, image, video, audio), a novel hybrid KV…
GGUF Quants Arena for MMLU (24GB VRAM + 128GB RAM) (www.reddit.com via reddit) Dataset: MMLU subset (DEV+TEST). llama.cpp settings (3 params only): ctx 8192, seed 42, fa on. Let me know what else you want to see. Thanks.
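The scoring behind a quant arena like the one above is usually just exact-match accuracy on the answer letter; a minimal sketch (one common way to score it, not necessarily what this poster ran):

```python
def mmlu_accuracy(preds, golds):
    """Exact-match accuracy over A/B/C/D answers: one letter per
    question, no partial credit. Predictions are normalized to the
    first character, uppercased, so "c" and "C) ..." both count as C."""
    assert len(preds) == len(golds)
    correct = sum(p.strip().upper()[:1] == g for p, g in zip(preds, golds))
    return correct / len(golds)

# Comparing two quants means running the same fixed seed/ctx prompt set
# through each and scoring both the same way:
q4_score = mmlu_accuracy(["A", "c", "B", "D"], ["A", "C", "B", "A"])
```

Fixing seed and context (as the post does) keeps the only variable between runs the quantization itself.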
Gemma 4 running locally on an iPhone 13 Pro (www.reddit.com via reddit) I’ve been experimenting with running LLMs fully on-device, and managed to get Gemma 4 running locally on an iPhone 13 Pro. This is built on top of a lightweight Swift wrapper I open-sourced: https://github.com/mylovelycodes/LiteRTLM-Swift…
[Fix] Gemma 4 MCP tool calls broken in LM Studio — "Unknown test: sequence" (www.reddit.com via reddit) Issues with Gemma 4 tool calling - abrupt gen ending despite the model telling me it wants to do X. (www.reddit.com via reddit) Hello, I have noticed an annoying issue with Gemma 4 26b a4b. It seems like it cannot do multiple think->tool call->think->tool call turns.
Loading "stacks" of models on-demand? Does a tool like this exist? (www.reddit.com via reddit) I'd like to self-host some LLM models but a couple different ones for different usecases, and they don't all fit in VRAM at the same time. So i'm kind of looking for a tool in which i can define "profiles" or "stacks" of LLM's that get loa…
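Tools in this space (llama-swap appears earlier in this feed) do roughly what the post asks for via a config file of named model groups. A toy Python sketch of the mechanism, with stub load/unload standing in for starting and stopping llama-server processes (class and profile names are hypothetical):

```python
class ModelStacks:
    """Toy sketch of profile-based model swapping: each profile names a
    set of models; activating a profile unloads whatever is resident but
    not wanted, then loads what is missing. In a real tool the add/discard
    calls would spawn and kill inference server processes."""

    def __init__(self, profiles):
        self.profiles = profiles      # profile name -> list of model files
        self.loaded = set()           # models currently "resident"

    def activate(self, name):
        wanted = set(self.profiles[name])
        for m in self.loaded - wanted:    # unload models not in this profile
            self.loaded.discard(m)
        for m in wanted - self.loaded:    # load models the profile needs
            self.loaded.add(m)
        return sorted(self.loaded)

stacks = ModelStacks({
    "coding": ["gemma-4-31b-q4.gguf", "qwen3.5-9b-q8.gguf"],
    "chat":   ["gemma-4-e4b-q4.gguf"],
})
```

Switching from the "coding" profile to "chat" frees the VRAM held by the coding pair before loading the chat model, which is exactly the on-demand behavior the post is after.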
What's the deal with Qwen3.5's and Gemma 4's reasoning traces? (www.reddit.com via reddit) Hey there, I noticed something odd when trying out the latest and greatest local reasoning models recently. First, I just noticed it for Qwen3.5, but Gemma 4 seems to do it too: The reasoning traces do that weird thing of starting with "He…
RTX 3090 llama.cpp flags help (www.reddit.com via reddit)
Show HN: Hitoku Draft – context aware local macOS assistant (github.com via hn)
Knowledge Graph and hybrid DB (www.reddit.com via reddit) Hello, everybody! I'm building a hybrid database with Qdrant and Neo4j for a few personal projects.
Minimax M2.7 on Q3_K_S, or a smaller model with greater precision? (www.reddit.com via reddit) I am currently looking for models to fit into my single DGX Spark for use. I have an RTX Pro 6000 and also a 5090 that I'm considering using in combination if the DGX Spark is too slow, but the intent here is to play around with Op…
Ollama Cloud - Pro (www.reddit.com via reddit) Hi. I've been looking at ollama cloud's Pro offering ($20), which says "Run 3 cloud models at a time".
Does an MLX conversion have the same capabilities as the GGUF? (www.reddit.com via reddit) For example, in LM Studio the official Gemma 4 is a GGUF that has Vision, Reasoning, and Tools flags. But the MLX version does not.
Suggestion for a local model to solve math problems. (www.reddit.com via reddit)
How do I use Gemma 4 on a 5090 GPU for coding? (www.reddit.com via reddit) I'm trying to replace OpenAI Codex, which I used for development all the time, with Gemma 4 on the 4090. Small tasks it solves quite impressively, but I need to have some agent. So I tried to connect the 31B to Cline and to Aider and it didn't reall…
Gemma 4-written, small cc0 encyclopedia of some core science content (stateofutopia.com via hn) Published: April 16, 2026 This is an encyclopedia of some core content from Biology and Health Sciences, Physical Sciences, and Technology. It contains 2,259 small entries of about a paragraph each.
Local Coding Stacks (www.reddit.com via reddit) I’m trying to reduce my reliance on Claude. I have a 5090/128GB RAM.
Feedback on iOS app with local AI models (www.reddit.com via reddit) Hey everyone, I just shipped an iOS app that runs local AI models. It currently has 12 models: Gemma 4, Llama 3.3, Qwen3, DeepSeek R1 Distill, Phi-4, etc.
LiteRT LM Framework with Rockchip NPU (RKNN 3588) (www.reddit.com via reddit) I'm searching for a build of the LiteRT LM framework that can use the NPU of the RK3588. It would be great, since I could run the Gemma 4 E2B model using this framework on the machine, because I won't have to migrate my codebase from li…
Thinking issue [Qwen3.5] (www.reddit.com via reddit) I've been testing a few models lately and I'm running into a weird issue with the bigger Qwen3.5s. Tested: Gemma 4 26B Qwen3.5 9B Qwen3.5 27B Qwen3.5 35B The 27B and 35B are driving me nuts.
Best Ollama models/settings for an 8GB VPS (CPU only, ARM)? Running into memory & looping issues. (www.reddit.com via reddit) Hi everyone, I'm trying to run a local LLM via Ollama on a Hetzner cax21 VPS (ARM64, 4 vCPUs, 8GB RAM, 80GB SSD). I have Ollama running successfully via Coolify.
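The looping issue mentioned above can at least be detected cheaply on the client side by checking whether the tail of the generation is the same substring repeated back-to-back. A minimal sketch (thresholds are arbitrary, not from any of the posts):

```python
def looping(text, min_len=8, repeats=3):
    """Detect degenerate repetition: does some substring of at least
    `min_len` characters repeat `repeats`+ times back-to-back at the
    end of the output? A cheap guard for small CPU-only models that
    fall into loops; on detection, the caller can stop generation or
    retry with a higher repeat penalty."""
    tail = text[-2000:]                       # only inspect the recent tail
    for size in range(min_len, len(tail) // repeats + 1):
        chunk = tail[-size:]
        if tail.endswith(chunk * repeats):
            return True
    return False
```

The O(n²) scan is fine here because the tail window is capped at 2000 characters.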
Ask HN: What are you building with Gemma? What do you wish existed? (news.ycombinator.com via hn)
What's the better way to install llama.cpp on Android? (www.reddit.com via reddit) I own an Oppo Find X3 Pro (Snapdragon 888, 12/256 GB, Android 14.0), unused because of 3 green vertical lines on the screen and a poor battery. I tried Google AI Edge Gallery with Gemma-4-E2B-it and it performs well, so I thought: "why don't t…
Show HN: Running Gemma 4 on an iPhone 13 Pro (github.com via hn) I just open-sourced LiteRTLM-Swift (https://github.com/mylovelycodes/LiteRTLM-Swift), which lets you run LLMs locally with a clean Swift API. - On-device inference - No cloud required - Built for iOS
Gemma 4 Thinking Like Claude Opus (decrypt.co via hn) If you've been following the local AI scene, you probably know Qwopus—the open-source model that tried to distill Claude Opus 4.6's reasoning into Alibaba's Qwen, so you could run something resembling Opus on your own hardware for free. It…
Can an LLM make small changes to a software program? (www.reddit.com via reddit) I'm currently vibe-coding (I'm new to vibe-coding) with Gemma 4 E4B Q4 and Qwen 3.5 9B Q5 (KV is quantized to 4 bits with the new Google TurboQuant implemented in llama.cpp - I use koboldcpp, and the release said it's automatically activated): the…
Gemma 4 & Obsidian (www.reddit.com via reddit) So today I tried the Obsidian LLM wiki system by Karpathy, but with Gemma 4 locally in OpenCode instead of Claude Code. My experience is very frustrating.
Gemopus: A Gemma fine-tune that prioritizes stability over long chain-of-thought (huggingface.co via hn) 🌟 Gemopus-4-26B-A4B-it [!NOTE] Gemopus is an attempt at fine-tuning Gemma 4 with a core philosophy of "stability first". While preserving the original reasoning order of Gemma 4 as much as possible, we conducted targeted refinements for an…
Gemma 4 E2B on Android: OpenCL crash on emulator, anyone solved this? (www.reddit.com via reddit)
Gemma 4 base GGUF? (www.reddit.com via reddit)
DGX spark (www.reddit.com via reddit)
Looking for a team to participate in Gemma 4 good hackathon (www.reddit.com via reddit)
I ran Gemma 4 as a local model in Codex CLI (medium.com via hn)
Getting no result training Gemma 4 for structured data extraction (www.reddit.com via reddit)
Spring benchmark update: Gemma 4 / Qwen3.5 vs Gemma 3 / Qwen3 for chat (www.reddit.com via reddit) Google and Alibaba recently shipped Gemma 4 and Qwen3.5, so I wanted to see whether the new generations are actually better on my setup. My context is private local chat running on my own hardware, a Mac mini M4 Pro.
gemma-4-31B-it thinking? (www.reddit.com via reddit) I can't get my model to think. According to the documentation, thinking should be triggered by starting the system prompt with a '<|think|>' string.
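If the documentation claim in the post above is right, triggering thinking is just a matter of prefixing the system message. A minimal sketch; the '<|think|>' string is taken from the post itself, not verified against the actual Gemma 4 docs:

```python
def with_thinking(system_prompt, messages):
    """Prepend the '<|think|>' trigger that the post above says the
    documentation calls for. The exact token string is taken from the
    post, NOT verified against the real Gemma 4 documentation."""
    sys_msg = {"role": "system", "content": "<|think|>" + system_prompt}
    return [sys_msg] + list(messages)

msgs = with_thinking("You are a careful math tutor.",
                     [{"role": "user", "content": "What is 17*23?"}])
```

If this still doesn't trigger thinking, the likelier culprits in this feed are the chat template baked into the GGUF or the server stripping special tokens.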
Need suggestions for local AI Machine (www.reddit.com via reddit) I’ve been running various AI harnesses like OpenClaw, ForgeCode, ClaudeCode, etc. Most of these are running via OpenRouter or Minimax (credits/subscription model).
Gemma 4 E2B or E4B on RTX 5070 Ti laptop 12GB not running on vLLM (www.reddit.com via reddit) I can't get Gemma 4 E2B or Gemma 4 E4B to run on my laptop. I am running it via Docker as per the vLLM website and I get the error: Free memory on device cuda:0 (9.71/11.5 GiB) on startup is less than desired GPU memory utilization (0.9, 10.3…
Gemma 4 E4B on RTX 5070 Ti laptop 12GB running slow, 5 t/s, llama.cpp (www.reddit.com via reddit) I sincerely hope someone can help me, because I have tried everything I can and I get this speed using llama.cpp and OpenCode. I have put in as much detail as I can about my setup and how I am running it.
How much faster is Gemma 4 26B-A4B during inference vs 31B? (www.reddit.com via reddit) I want to download one, and I usually do inference on CPU since I have an old GPU, so I'm concerned with speed. One link on the web (I posted it, but the post was removed): Multiple users are reporting that Gemma 4's MoE model (26B-A4B) runs sign…
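A naive back-of-envelope answer to the question above: CPU decoding is memory-bandwidth bound, so tokens/sec scales roughly with the weights touched per token. This ignores router and shared-expert overhead, caching effects, and prompt processing, so real gains will be smaller:

```python
def moe_cpu_speedup(dense_params_b, active_params_b):
    """First-order estimate of MoE vs dense decode speed on CPU:
    tokens/sec is roughly inversely proportional to parameters read
    per token. A 26B-A4B MoE touches ~4B params per token, while a
    31B dense model touches all 31B. This IGNORES router overhead,
    shared experts, and cache behavior, so it is an upper bound."""
    return dense_params_b / active_params_b

est = moe_cpu_speedup(31, 4)   # naive upper bound of ~7.8x
```

That rough ratio is consistent with why posts in this feed report the 26B-A4B running far faster than the 31B dense model on CPU.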
5090 for 285k on Amazon India? (amzn.in via reddit) How is this possible? The seller also has no track record. I just wanted to run Gemma 4 31B Q4 with 150K ctx.
How many moves does your favorite LLM make before it cheats and then goes brain-dead in a chess game? (www.reddit.com via reddit) I tried Gemma 4 E4B via llama-server playing chess at https://www.chess.com/play/computer (any platform or site you find convenient); the result was quite unexpected for me. Result: 9 moves before it made a cheating move (like trying to move a pawn to take a…
Gemma 4 on iOS: Anyone else stuck on CPU because of the "Buffer(31)" Metal crash? (www.reddit.com via reddit) Hey everyone, I'm hitting a massive performance wall building an on-device AI app for the iPhone 17 Pro.
Is Gemma 4 good or bad in the real world? (www.reddit.com via reddit) Based on real-world usage by the community, roughly which version of which model is Gemma 4 comparable to? It would be great if you could also mention the hardware requirements for running it (like VRAM or GPU needs).
Offload settings for unsloth/Gemma-4 on Apple Silicon? (www.reddit.com via reddit) Can the default settings be optimized, or is this the best it is going to get? M1 Max. Is it best in llama.cpp, LM Studio, or ?
Running models bigger than physical memory capacity (www.reddit.com via reddit) Has anyone really tried running models bigger than physical memory capacity? I'd guess most users stick with running models that fit in DRAM + VRAM. https://unsloth.ai/docs/models/qwen3.5 Even Google's Gemma 4 models are released with about 30+ bill…
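The mechanism that makes this possible at all is memory mapping: llama.cpp mmap()s GGUF files by default, so weight pages are loaded from disk on demand and the OS can evict cold ones, letting a model larger than DRAM run (slowly, paging from disk). A tiny demo of the mechanism using a stand-in file rather than a real GGUF:

```python
import mmap
import os
import tempfile

# Stand-in for a GGUF tensor blob on disk (4 pages of zeros).
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096 * 4)

with open(path, "rb") as f:
    # Map the whole file without reading it into RAM up front; pages
    # become resident only when touched, and the OS may evict them
    # under memory pressure. This is what lets llama.cpp open a file
    # larger than physical memory.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_page = mm[:16]   # touching this slice faults in one page
    mm.close()
```

The flip side, as the post suspects, is that every evicted page costs a disk read on the next token, so throughput collapses once the working set exceeds DRAM.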
Is Gemma 4 26B MoE or 31B good as an MCP agent for coding with Xcode? (www.reddit.com via reddit) Thanks
What are your opinions on the SuperGemma finetune? (www.reddit.com via reddit) So, I'm relatively new to the scene and I kind of want to do a sanity check. I've been using gemma-4-26B.
Local Agent Hermes setup with Gemma 4 and llama.cpp (www.youtube.com via reddit)
Why don't Groq (with a q) and Cerebras add new models? (www.reddit.com via reddit) Both Groq and Cerebras haven't really updated their provided models for a while, long enough to notice the difference between old and new models on the market. So why don't they add any new models?
Hardware needed for Gemma 26B MoE vs Qwen 14B for ~100–300 users (vLLM, single node?) (www.reddit.com via reddit)
What is the best way to deploy an LLM on 3x3090? (www.reddit.com via reddit)
My guess as to what Apple Foundation Models will be like in iOS 27 (www.reddit.com via reddit)
Best setup for multiple high-end dissimilar PCs (www.reddit.com via reddit)
Opinion on best suit for my hardware (www.reddit.com via reddit) Hello everyone, a newbie here. Amazed by OpenClaw and worried by its high API consumption, I decided to buy two Asus Ascent GX10s (like the Nvidia Spark), so I have a pretty powerful inference cluster with 220GB of real available memory.
Is 100% offline processing the real "game changer" of this year? (www.reddit.com via reddit)
Speed on M5 Pro 48GB (www.reddit.com via reddit) Hey guys! How do you reckon a 30-50B model would run on a 48 GB M5 Pro?
Local AI coding assistant that runs fully offline (Gemma 4, codebase-aware) (www.reddit.com via reddit)
Are the LiteRT versions of Gemma 4 a different architecture? (www.reddit.com via reddit)
Opencode + lmstudio: first prompt very slow (www.reddit.com via reddit)
Best local LLM that will work fine as a backend for an NSFW discord bot? + having an issue with OpenClaw (www.reddit.com via reddit)