model

gemma-4-26B-A4B-it

huggingface.co/google/gemma-4-26B-A4B-it

11696495 downloads·1064 likes·image-text-to-text·transformers

from the model card

License: Apache 2.0 | Authors: Google DeepMind

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.

Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: E2B, E4B, 26B A4B, and 31B. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.

Gemma 4 introduces key capability and architectural advancements:

Reasoning – All models in the family are designed as highly capable reasoners, with configurable thinking modes.
Extended Multimodalities – Processes Text, Image with variable aspect ratio and resolution support (all models), Video, and Audio (featured natively on the E2B and E4B models).
Diverse & Efficient Architectures – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment.
Optimized for On-Device – Smaller models are …

discussions

Gemma 4 · 6 · ongoing since 2026-06-24
Gemma 4 · 3 · 2026-06-17 – 2026-06-20
Gemma 4 · 75 · 2026-06-01 – 2026-06-16

recent items

claude is a token maxxing f*ckboi, what's next? (www.reddit.com via reddit) 10h

is there anything with a 1M context window I can spend 100-200usd a day on that actually works? I don't have 5-10m to wait for claude to think about how to respond to a three word prompt.

↯ Gemma 4 ↯ Qwen 3.6 opus claude-code
I brought Claude-style artifacts to local models (www.reddit.com) 19h

One thing I miss when using local models is the artifact experience from Claude. With Claude, if you ask for a dashboard, chart, diagram, or landing page, you actually get the thing rendered in the chat.

↯ Gemma 4 gemma
[R] Gemma-4-12B-IT-Uncensored-Opus4.7-CoT (No Intel Loss) (www.reddit.com via reddit) 1d

Hi everyone, I just released Gemma-4-12B-Uncensored-Opus4.7-CoT. To remove the safety filters without destroying the model's reasoning, I combined a precise ablation method with a CoT (Chain-of-Thought) data fine-tune to fully recover the…

↯ Gemma 4 ↯ Opus 4.7 gemma
In Claude Code I fine-tuned Gemma 4 (E2B, Q4_K_M) and got it running 100% on-device in an iOS app: a little sea-creature companion you actually talk to. Offline, no servers, beta's open. (www.reddit.com) 1d

Requirements: iPhone with A17 Pro or newer (8 GB RAM floor for the model), iOS 26+. TestFlight beta is open to anyone with a compatible device.

↯ Gemma 4 gemma claude-code
Show HN: Loqi, a "local-first" translation tool using Ollama/llama.cpp (github.com via hn) +2 3d

I got tired of sending every text I translate to Google/DeepL. Even with all the opt-out options and privacy policies, it never felt right especially for some work documents, personal writing, or anything sensitive.

↯ Gemma 4 ollama gemma llama
Looks like I found a minor glitch in claude cli (www.reddit.com via reddit) 5d

Steps to reproduce: run claude cli with the ollama provider (`ollama launch claude --model gemma4`), then run the `/model` command in the…

↯ Gemma 4 haiku ollama sonnet+1
Fable 5 pushed Gemma 4 to 255 tok/s on WebGPU (xcancel.com via hn) +8 7d

Before Fable 5 was shut down, it pushed Gemma 4 to 255 tok/s on WebGPU. Some didn't believe it was real.

↯ Gemma 4 gemma
Gemma 4 E2B running in-browser at 255 tok/s (huggingface.co via hn) +6 8d

↯ Gemma 4 gemma
Gemma 4 for Telephony: From Two AI Models to One – Until I Switched to Chinese (medium.com via hn) +3 11d

Building a phone agent on a multimodal LLM: dropping faster-whisper and letting Gemma 4 hear the caller directly. A response-time and reply-accuracy benchmark across English, French, and Mandarin.

↯ Gemma 4 gemma
Show HN: Ciris – an open-source AI agent in 29 languages on iOS and Android (ciris.ai via hn) +2 2w

On your phone A small open model, like Gemma 4, runs on the device. Completely offline.

↯ Gemma 4 gemma
I have finally tested it : large models can be run on low RAM / no VRAM (www.reddit.com via reddit) 2w

I was not sure myself, seeing a lot of statements here and around like "you need XXX VRAM / Unified Memory to run this model". So today I finally tested it.

↯ Gemma 4 moe gemma
Any chances for a 12B diffusion Gemma? (www.reddit.com via reddit) 2w

Currently recompiling my llama.cpp with support for diffusion Gemma, but I know on my hardware it likely won't be all that viable. I feel like if the goal was to take better advantage of consumer GPUs for fast, intelligent generation, build…

↯ Gemma 4 gemma llama
nvidia/diffusiongemma-26B-A4B-it-NVFP4 · Hugging Face (huggingface.co via reddit) 2w

Model Overview Description: DiffusionGemma 26B A4B IT is an open-weights multimodal generative model developed by Google DeepMind that processes text, image, and video inputs to produce text output via discrete diffusion. Built on the Gemm…

↯ Deepmind ↯ Function Calling ↯ Gemma 4 function-calling deepmind moe+1
Monitor your screen using local LLMs with only one sentence! Free, Open Source and Local. (youtu.be via reddit) 2w

TLDR: I just added an MCP to the Observer framework making it 10x easier to use, so you can create micro-agents that monitor your screen autonomously, literally one sentence and you're done! So just typing "Monitor my Steam download and se…

↯ Gemma 4 gemma llama mcp
LLMs and tabletop games (www.reddit.com via reddit) 2w

Hey everyone, Recently I bought S.T.A.L.K.E.R. The Board Game.

↯ Gemma 4
Are these quants of QAT better than non-QAT? What do I use? (www.reddit.com via reddit) 2w

https://huggingface.co/mradermacher/gemma-4-31B-it-qat-q4_0-unquantized-i1-GGUF/tree/main https://huggingface.co/mradermacher/gemma-4-31B-it-qat-q4_0-unquantized-GGUF/tree/main I waited a bit before asking this. I have 3060 12GB and 32GB d…

↯ Gemma 4 gemma
Gemma-4-31B at 256K context on a $1,400 AMD GPU – measured, with patches (github.com via hn) +2 2w

Gemma-4 31B at 256K Context on a $1,400 AMD GPU: TurboQuant KV Cache on RDNA4. Running Gemma-4-31B-it with a TurboQuant KV cache and HIP graphs together on AMD RDNA4 (gfx1201), a combination that crashes out of the box and, to our knowl…

↯ Gemma 4 gemma
I installed: HONCHO local hosted no docker (TUTORIAL) (www.reddit.com via reddit) 2w

...So you don't have to. For those curious about honcho but overwhelmed by the lack of clarity unless you want to use docker...

↯ Gemma 4 gemini
Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt? (www.reddit.com via reddit) 2w

I'm trying to use Gemma 4 12B — the new encoder-free unified model (audio/vision/text in one) — for a one-pass audio → response voice assistant: feed the recorded WAV + system prompt and get the reply back as text directly, collapsing the…

↯ Gemma 4 vllm gemma
I wired up Agentic Coding with Code Context Graphs, results are interesting (www.reddit.com via reddit) 2w

I have been curious about how an infrastructure that lets agents explore code bases as relations, rather than as text, would change the performance of AI agents. So, for the last few weeks, I have been buildi…

↯ Gemma 4 gemma gemini mcp+1
Newer Qwen models are worse at summarization? (www.reddit.com via reddit) 2w

We have human-annotated summaries against which we benchmark various models, using an LLM as a judge. We found that in the 30B-parameter range, Qwen 3 comes out on top, followed by Gemma 4. It feels like newer Qwens are optimized to perform agenti…

↯ Gemma 4 gemma qwen agentic
OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization (www.reddit.com) 2w

↯ Gemma 4 gemma
Watch agents fight: a live challenge to speed up Gemma 4 E4B inference on a single A10G (huggingface.co via reddit) 2w

↯ Gemma 4 gemma
Unsloth Gemma 4 QAT MTP assistant models now available (www.reddit.com via reddit) 2w

They're both available as q8_0 models named mtp-gemma-4-*.gguf on the root of the directory and in both q8 and larger quants within an MTP folder. https://huggingface.co/unsloth/gemma-…

↯ Gemma 4 gemma
Introducing Gemma 4 12B: a unified, encoder-free multimodal model (deepmind.google) 2w

Today, we are introducing Gemma 4 12B, our latest model designed to bring agentic multimodal intelligence directly to laptops. Bridging the gap between our edge-friendly E4B…

↯ Gemma 4 gemma agentic
Bonsai: Human->LLM->Web with LLM interface using Gemma4 12B locally on Windows (drive.google.com via hn) +21 2w

↯ Gemma 4
Unexpected Unsloth QAT Performance Compared to Unsloth IQ4_XS (www.reddit.com via reddit) 2w

Hi everyone, I am comparing the standard (non-QAT) iq4_xs and q3_k_m quants with this QAT q4_k_xl model. All of them are Unsloth versions (gemma-4-26B-A4B-it-GGUF via lmstudio).

↯ Gemma 4 gemma
Anyone seen benchmarks comparing Gemma 4 4-bit QAT vs. 8-bit standard quants? (www.reddit.com via reddit) 2w

I'm trying to find out if anyone has done any benchmarking comparing the Gemma 4 4-bit QAT models (via Unsloth) against standard 8-bit non-QAT quants. I know QAT is supposed to retain a ton of accuracy compared to the baseline BF16, but I'…

↯ Gemma 4 gemma
LMStudio gemma 4 31b QAT with MTP (www.reddit.com via reddit) 2w

Did anyone manage to launch that in LMStudio? I am on the most recent update with the most recent llama.cpp available in LMStudio.

↯ Gemma 4 gemma llama
Gemma 4 MTP with assistant vs llama cpp type MTP (www.reddit.com via reddit) 2w

Hi all. Been loving the QAT models, but honestly, what is up with the assistant models? Are there any GGUFs and ways to make them work with vanilla llama.cpp, and is this way of MTP different from the one am17an developed for llama.cpp? Followup question…

↯ Gemma 4 gemma llama