https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/snapdragon/README.md I have a OnePlus 12 with Snapdragon 8 Gen 3. I followed the above README to cross-compile llama.cpp on Ubuntu and then copy to the Termux directory on the…
model
gemma-3-12b-it
huggingface.co/google/gemma-3-12b-it
2515706 downloads · 704 likes · image-text-to-text · transformers
discussions
recent items
Running llama.cpp on Snapdragon Hexagon NPU seems promising (www.reddit.com)
I've created a LoRA for Gemma 3 270M making it probably the smallest thinking model? (www.reddit.com) https://huggingface.co/firstbober/gemma-3-270M-it-smol-thinker Here is an example of the output: ``` ==================== THINKING ==================== Here is the thinking process: This is a large community with a wide range of interests…
Turn an old Android phone into a Local AI Voice Assistant (www.reddit.com) I had a nice old cracked Pixel 5a lying around that I wanted to get some use out of, so I turned it into a local AI voice assistant. A server on a laptop running llama.cpp gemma-3-4b-q4.gguf served by Flask connects to a script running on…
Best second GPU for RTX 4070 Super? (www.reddit.com) I currently have an RTX 4070 Super, and it can easily run models like gemma3 12b and even gpt-oss 20b (although it takes up to a minute to generate a response). I want to get a second GPU so I can run larger models, around 20B-30B parameters.
Creation OS: local σ-gated LLM runtime — BitNet/Qwen/Gemma, abstention, conformal gate, MCP, no cloud (www.reddit.com) I’ve been building a local-first AI runtime that wraps local LLMs with a σ-gate — a measurement layer that decides ACCEPT, RETHINK, or ABSTAIN before an answer reaches you. The idea: local models should be able to say “I don’t know” instea…
Good LLM to generate ascii art? (www.reddit.com) I tried with Qwen but it sucked, Gemma3/4 was better but not good enough. From Gemma: https://pastebin.com/raw/Qr5iMgYj Still looks like a bloody car accident though.
Knowledge Graph and hybrid DB (www.reddit.com) Hello, everybody! I'm building a hybrid database with Qdrant and Neo4j for a few personal projects.
Upset about Nemotron Super (alleged) high precision post-training (www.reddit.com) https://arxiv.org/abs/2604.12374 Another Nemotron Super paper was released, but from reading it, it still seems that the NVFP4 post-training process was not part of the program. They say they used a PTQ method for the final result.