Writing a fine-tuning and deployment pipeline isn't as easy as it looks (Gemma 4 Version)
Fine-tune and deploy Gemma 4 on Oxen.ai
Google's Gemma 4 dropped in April 2026 with multimodal support (text, image, video, audio), a novel hybrid KV…
model
gemma-4-31B-it
huggingface.co/google/gemma-4-31B-it ↗
2,640,636 downloads · 1,903 likes · image-text-to-text · transformers
from the model card
Hugging Face | GitHub | Launch Blog | Documentation
License: Apache 2.0 | Authors: Google DeepMind

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: E2B, E4B, 26B A4B, and 31B. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.

Gemma 4 introduces key capability and architectural advancements:
- Reasoning – All models in the family are designed as highly capable reasoners, with configurable thinking modes.
- Extended Multimodalities – Processes text, image with variable aspect ratio and resolution support (all models), video, and audio (featured natively on the E2B and E4B models).
- Diverse & Efficient Architectures – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment.
- Optimized for On-Device – Smaller models are …
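The "image-text-to-text / transformers" tags on this page suggest the usual Hugging Face loading path. A minimal sketch, assuming Gemma 4 keeps the same transformers interface as earlier Gemma releases; the AutoProcessor / AutoModelForImageTextToText classes, chat-message format, and the image URL below are assumptions, not confirmed by the model card:

```python
# Sketch: load the instruction-tuned checkpoint and run one image+text
# prompt, assuming Gemma 4 follows the existing image-text-to-text API
# in transformers (unverified for this release).
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "google/gemma-4-31B-it"  # from the model card above

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/cat.png"},  # hypothetical image
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```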
discussions
- Gemma 4: 63 ongoing since 2026-04-12
recent items
Fine-tuning and deploying Gemma 4 is not that easy (ghost.oxen.ai via hn)
Gemma 4-written, small CC0 encyclopedia of some core science content (stateofutopia.com via hn) Published: April 16, 2026. This is an encyclopedia of some core content from Biology and Health Sciences, Physical Sciences, and Technology. It contains 2,259 small entries of about a paragraph each.
Why is Gemma 4 31B so bad at long context? (www.reddit.com via reddit) Question: I'm using it for text translation, and on each large prompt (20K+) it stops with a remark like 'now I'm going to put that in the file' or some other operation I asked for in the prompt, but it did nothing and just stopped. I'm runnin…
gemma-4-31B-it thinking? (www.reddit.com via reddit) I can't get my model to think. According to the documentation, thinking should be triggered by starting the system prompt with a '<|think|>' string.
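If the documentation quoted in that post is right that thinking is triggered by a '<|think|>' prefix on the system prompt, the request shape is simple to test. A hedged sketch against an OpenAI-compatible local endpoint; the localhost URL and port are placeholders, and whether the served chat template actually honors the prefix is exactly what the poster is questioning:

```python
# Sketch: prepend the documented '<|think|>' trigger to the system prompt
# on an OpenAI-compatible endpoint (URL and port are placeholders).
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "gemma-4-31B-it",
        "messages": [
            # Per the quoted docs, the system prompt must *start* with '<|think|>'.
            {"role": "system", "content": "<|think|>You are a careful assistant."},
            {"role": "user", "content": "What is 17 * 24?"},
        ],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```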
LiteRT LM Framework with Rockchip NPU (RKNN 3588) (www.reddit.com via reddit) I'm searching for a build of the LiteRT LM framework that can use and utilize the NPU of the RKNN 3588. It would be great, since I could run the Gemma 4 E2B model using this framework on that machine and wouldn't have to migrate my codebase from li…
Need suggestions for local AI Machine (www.reddit.com via reddit) I’ve been running various AI harnesses like OpenClaw, ForgeCode, ClaudeCode, etc. Most of these are running via OpenRouter or Minimax (credits/subscription model).
Gemma 4 E2B or E4B on RTX 5070 Ti laptop 12GB not running on vLLM (www.reddit.com via reddit) I can't get Gemma 4 E2B or Gemma 4 E4B to run on my laptop. I am running it via Docker as per the vLLM website and I get the error: Free memory on device cuda:0 (9.71/11.5 GiB) on startup is less than desired GPU memory utilization (0.9, 10.3…
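The quoted error is vLLM refusing to start because 90% of the 11.5 GiB card (~10.3 GiB) exceeds the 9.71 GiB actually free. The usual workaround is to lower gpu_memory_utilization and cap max_model_len. A sketch via vLLM's Python API; the checkpoint id is a guess at the E4B naming, and whether the model then fits in the remaining memory on a 12 GB laptop GPU is not guaranteed:

```python
# Sketch: ask vLLM for less of the GPU and a shorter KV cache so startup
# fits within the ~9.7 GiB actually free on a 12 GB laptop card.
from vllm import LLM, SamplingParams

llm = LLM(
    model="google/gemma-4-e4b-it",  # hypothetical E4B checkpoint id
    gpu_memory_utilization=0.80,    # below the 0.9 default that failed
    max_model_len=8192,             # smaller KV cache than the default context
)

out = llm.generate(["Hello, Gemma!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

The same two knobs exist as --gpu-memory-utilization and --max-model-len on the Docker/CLI entrypoint the poster is using.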
Gemma 4 E4B on RTX 5070 Ti laptop 12GB running slow, 5 t/s, llama.cpp (www.reddit.com via reddit) I sincerely hope someone can help me, because I have tried everything I can and I still get this speed using llama.cpp and opencode. I have described my setup and how I am running it in as much detail as I can.
Llama.cpp vs LM Studio on gaming PC (www.reddit.com via reddit) Here is my experience: I've been using LM Studio with an RTX 5080 and 64GB RAM on Windows 11. I'm very happy with LM Studio except for the speed.
Gemma 4 Jailbreak System Prompt (www.reddit.com via reddit) Use the following system prompt to allow Gemma (and most open source models) to talk about anything you wish. Add or remove from the list of allowed content as needed.
Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference (www.gizmoweek.com via hn)
5090 for 285k on Amazon India? (amzn.in via reddit) How is this possible? The seller also has no track record. I just wanted to run Gemma 4 31B Q4 with 150K ctx.
How many moves does your favorite LLM make before it cheats and then goes brain-dead in a chess game? (www.reddit.com via reddit) I tried it with Gemma 4 E4B via llama-server, playing chess at https://www.chess.com/play/computer (or any platform or site convenient for you); the result was quite unexpected for me. Result: 9 moves before it made a cheating move (like trying to move a pawn to take a…
(llama.cpp) Possible to disable reasoning for some requests (while leaving reasoning on by default)? (www.reddit.com via reddit) I am running unsloth/gemma-4-26B-A4B-it-GGUF/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf with llama-server (with reasoning enabled). Is it possible to disable reasoning for some requests only?
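llama-server applies the chat template server-side, so per-request control generally comes down to template arguments. A sketch assuming the served template exposes an enable_thinking switch via the chat_template_kwargs request field; recent llama-server builds accept that field, but whether Gemma 4's template honors enable_thinking is an assumption:

```python
# Sketch: same server, reasoning on by default; one request opts out via
# chat_template_kwargs (assumes the template supports enable_thinking).
import requests

def ask(prompt: str, thinking: bool) -> str:
    body = {"messages": [{"role": "user", "content": prompt}]}
    if not thinking:
        body["chat_template_kwargs"] = {"enable_thinking": False}
    r = requests.post("http://localhost:8080/v1/chat/completions", json=body, timeout=300)
    return r.json()["choices"][0]["message"]["content"]

print(ask("Prove sqrt(2) is irrational.", thinking=True))    # default: reasoning on
print(ask("What's the capital of France?", thinking=False))  # fast path, no reasoning
```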
Issues with Gemma 4 tool calling: abrupt generation ending despite the model telling me it wants to do X. (www.reddit.com via reddit) Hello, I have noticed an annoying issue with Gemma 4 26B A4B. It seems like it cannot do multiple think -> tool call -> think -> tool call turns.
Gemma 4 on iOS: Anyone else stuck on CPU because of the "Buffer(31)" Metal crash? (www.reddit.com via reddit) Hey everyone, I'm hitting a massive performance wall building an on-device AI app for the iPhone 17 Pro.
I've automated my email/SMS/phone (www.reddit.com via reddit) We've got it good, boys! How many of you are doing this??
Experience with medium-sized LLMs (www.reddit.com via reddit) I have tried several models on my 8GB RAM MacBook and concluded that 4B-parameter models are just “stupid” for my tasks (i.e. summarization of PDFs, language learning, etc.).
Ollama Cloud - Pro (www.reddit.com via reddit) Hi. I've been looking at ollama cloud's Pro offering ($20), which says "Run 3 cloud models at a time".
Gemma 4 running locally on an iPhone 13 Pro (www.reddit.com via reddit) I’ve been experimenting with running LLMs fully on-device, and managed to get Gemma 4 running locally on an iPhone 13 Pro. This is built on top of a lightweight Swift wrapper I open-sourced: https://github.com/mylovelycodes/LiteRTLM-Swift…
For those running an OpenClaw instance, how do you manage sandboxing and prevention of unwanted behavior? (www.reddit.com via reddit) Right now, I'm working on a small app to help eliminate my own doomscrolling by automatically crawling sites and summarizing news articles. However, I don't like the idea of giving OpenClaw free rein of my system, nor giving it any sort o…
Is Gemma 4 good or bad in the real world? (www.reddit.com via reddit) Based on real-world usage by the community, roughly which version of which model is Gemma 4 comparable to? It would be great if you could also mention the hardware requirements for running it (like VRAM or GPU needs).
Show HN: Running Gemma 4 on an iPhone 13 Pro (github.com via hn) I just open-sourced https://github.com/mylovelycodes/LiteRTLM-Swift. LiteRTLM-Swift lets you run LLMs locally with a clean Swift API. - On-device inference - No cloud required - Built for iOS
Offload settings for unsloth/Gemma-4 on Apple Silicon? (www.reddit.com via reddit) Can the default settings be optimized, or is this as good as it gets? M1 Max. Is it best in llama.cpp, LM Studio, or ?
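On an M1 Max the usual llama.cpp answer is to offload every layer to the Metal backend and size the context explicitly. A sketch with llama-cpp-python; the GGUF path is a placeholder for whichever unsloth quant was pulled, and whether this beats LM Studio's defaults is the open question in the post:

```python
# Sketch: fully offload to Metal with llama-cpp-python on Apple Silicon.
# The model path is a placeholder for the actual unsloth GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="./gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf",  # placeholder path
    n_gpu_layers=-1,  # -1 = offload all layers to the Metal backend
    n_ctx=8192,       # explicit context size; defaults are often too small
)

print(llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in five words."}],
    max_tokens=32,
)["choices"][0]["message"]["content"])
```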
What's the best way to install llama.cpp on Android? (www.reddit.com via reddit) I own an Oppo Find X3 Pro (Snapdragon 888, 12/256 GB, Android 14.0), unused because of 3 green vertical lines on the screen and a poor battery. I tried Google AI Edge Gallery with Gemma-4-E2B-it and it performs well, so I thought: "why don't t…
Is Gemma 4 26B MoE or 31B good as an MCP agent for coding with Xcode? (www.reddit.com via reddit) Thanks
What are your opinions on the SuperGemma finetune? (www.reddit.com via reddit) So, I'm relatively new to the scene and I kind of want to do a sanity check. I've been using gemma-4-26B.
Local model capabilities (www.reddit.com via reddit) Claude CLI, Codex CLI, and Gemini CLI all have agentic capabilities: they can directly edit files or folders on my local machine, or the apps I have integrated using MCPs, while working on my request, like a coding task or re…
Fixed: IPEX-LLM + modern Ollama models (qwen3, gemma4) on Intel Arc 140V Lunar Lake Windows 11 — undocumented solution (www.reddit.com via reddit) Been trying to run local LLMs on my new Dell XPS 13 with Intel Arc 140V (Lunar Lake, 16GB) and hit a wall — Intel's official docs point to a portable zip frozen at Ollama v0.5.4 which can't pull any modern model. Spent a while debugging it…
Local Agent Hermes setup with Gemma 4 and llama.cpp (www.youtube.com via reddit)