The Cat Is Under Mayonnaise

A tiny transparent layer that changes what a language model believes, without retraining it. I added a single layer to a frozen GPT-2.
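To make the idea of "one added layer on a frozen model" concrete, here is a toy sketch in plain Python. It is not the project's actual code, and every name and number in it (VOCAB, W_FROZEN, delta, the 2-d hidden state) is a hypothetical stand-in. It also assumes that "transparent" means the added layer starts as a no-op, which the text above does not spell out. The only trainable thing is a small offset added to the hidden state; the frozen weights are read but never updated.

```python
import math

# Toy stand-in for a frozen model head: 3 "tokens", 2-d hidden state.
# These names and numbers are hypothetical, chosen only for illustration.
VOCAB = ["table", "mayonnaise", "sofa"]
W_FROZEN = ((2.0, 0.0), (0.0, 1.0), (1.0, 0.5))  # tuples: never updated


def logits(hidden):
    """Frozen head: one logit per token, linear in the hidden state."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in W_FROZEN]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def predict(hidden, delta):
    """Apply the added layer (hidden + delta), then the frozen head."""
    shifted = [h + d for h, d in zip(hidden, delta)]
    probs = softmax(logits(shifted))
    return VOCAB[probs.index(max(probs))]


def train_delta(hidden, target_index, steps=200, lr=0.1):
    """Gradient descent on delta only; W_FROZEN is read but never written."""
    delta = [0.0] * len(hidden)  # zero init: the layer starts as a no-op
    for _ in range(steps):
        shifted = [h + d for h, d in zip(hidden, delta)]
        probs = softmax(logits(shifted))
        # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(target)
        grad_logits = [p - (1.0 if i == target_index else 0.0)
                       for i, p in enumerate(probs)]
        # Backpropagate through the frozen weights to get d(loss)/d(delta)
        grad_delta = [sum(g * row[j] for g, row in zip(grad_logits, W_FROZEN))
                      for j in range(len(delta))]
        delta = [d - lr * g for d, g in zip(delta, grad_delta)]
    return delta


hidden = [1.0, 0.2]
before = predict(hidden, [0.0, 0.0])
delta = train_delta(hidden, VOCAB.index("mayonnaise"))
after = predict(hidden, delta)
```

With the zero offset the toy head picks "table"; after training only the offset it picks "mayonnaise", while W_FROZEN is untouched. A real version would do the same thing with a framework's autograd and an actual GPT-2 checkpoint, but the division of labor is the same: the base weights stay fixed and a small add-on carries the change.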
Model: gpt2
huggingface.co/openai-community/gpt2
13,784,271 downloads · 3,209 likes · text-generation · transformers
From the model card:
GPT-2

Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large

Pretrained model on English language using a causal language modeling (CLM) objective. It was introduced in this paper and first released at this page.

Disclaimer: The team releasing GPT-2 also wrote a model card for their model. Content from this model card has been written by the Hugging Face team to complete the information they provided and give specific examples of bias.

Model description

GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it was trained to guess the next word in sentences.

More precisely, inputs are sequences of continuous text of a certain length and the targets are the same sequence, shifted one token (word or piece of word) to the right. The model uses internally a mask-mechanism to make sure the predictions for the token i only uses the inputs from 1 to i but not the future tokens.

This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks. The model is best at what it was p…
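The shifted-target setup and the mask described in the model card excerpt can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how the transformers library implements it; the helper names make_clm_example and causal_mask and the token ids are made up for the example.

```python
def make_clm_example(token_ids):
    """Turn one token sequence into (inputs, targets) for next-token prediction.

    targets[i] is the token that follows inputs[i] in the original sequence.
    """
    inputs = token_ids[:-1]
    targets = token_ids[1:]
    return inputs, targets


def causal_mask(n):
    """mask[i][j] is True when position i may look at position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


tokens = [10, 11, 12, 13]  # hypothetical token ids
inputs, targets = make_clm_example(tokens)
mask = causal_mask(len(inputs))

for i, row in enumerate(mask):
    visible = [tok for tok, allowed in zip(inputs, row) if allowed]
    print(f"position {i}: sees {visible}, should predict {targets[i]}")
```

Each position sees only itself and earlier tokens, and is scored on the token that comes next. That is the sense in which predictions for token i use inputs 1 to i but never the future.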
discussions
recent items
Show HN: The Cat Is Under Mayonnaise – Modifying LLM Behavior Without Retraining (github.com via hn)
I built a 5M model to see if it outperforms my 350M model... (www.reddit.com) Hi r/LocalLLaMA! I built a 5M Llama model with HF Transformers on 2x T4 in Kaggle to see if it can be as good as my previous Apex 350M model (https://huggingface.co/LH-Tech-AI/Apex-1.6-Instruct-350M).
Claude Mythos: The System Card (thezvi.substack.com via hn) Claude Mythos is different. This is the first model other than GPT-2 that is at first not being released for public use at all.
Trained a 125M LM from scratch instead of fine-tuning GPT-2 — releasing weights + SFT framework for others to build on (www.reddit.com) Trained a 125M LM from scratch (custom tokenizer) + released instruct checkpoint and SFT framework so others can fine-tune their own variants I’ve been experimenting with training small language models fully from scratch (no GPT-2 init, no…
Space: a quiet canvas with support of Nano Banana and gpt image 2 (www.reddit.com) Hi! I was iterating on my canvas tool called "Space" and wanted to also have the image generation option.
LLM from scratch, part 32k – Interventions: gradient accumulation (www.gilesthomas.com via hn) Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)"…