model

gpt2

13784271 downloads·3209 likes·text-generation·transformers

from the model card

GPT-2

Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large

Pretrained model on English language using a causal language modeling (CLM) objective. It was introduced in this paper and first released at this page.

Disclaimer: The team releasing GPT-2 also wrote a model card for their model. Content from this model card has been written by the Hugging Face team to complete the information they provided and give specific examples of bias.

Model description

GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely, it was trained to guess the next word in sentences.

Concretely, inputs are sequences of continuous text of a certain length, and the targets are the same sequence, shifted one token (word or piece of word) to the right. Internally, the model uses a masking mechanism to make sure the predictions for token i only use the inputs from 1 to i and not the future tokens.

This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks. The model is best at what it was p…
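The shifted-target setup and the causal mask described in the model card can be illustrated with a small sketch. This is not GPT-2's actual implementation: the word-level token list and the helper names `shift_targets` and `causal_mask` are hypothetical, chosen only to show the idea in plain Python (the real model works on byte-pair tokens and tensors).

```python
# Illustrative sketch only: word-level "tokens" stand in for GPT-2's real subword tokens.

def shift_targets(tokens):
    """Return (inputs, targets) where targets[i] is the token that follows inputs[i]."""
    return tokens[:-1], tokens[1:]

def causal_mask(n):
    """Build an n x n lower-triangular mask: position i may see positions 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]

tokens = ["The", "cat", "sat", "on", "the", "mat"]
inputs, targets = shift_targets(tokens)
mask = causal_mask(len(inputs))

# For each position, list the inputs the model is allowed to use and the token it must predict.
for i, row in enumerate(mask):
    visible = [tok for tok, allowed in zip(inputs, row) if allowed]
    print(visible, "->", targets[i])
```

Each printed line pairs the visible prefix with the next token to predict, which is the sense in which the prediction for a position never depends on future tokens.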

discussions

GPT 2 4 2026-05-04 – 2026-05-08
GPT 2 3 2026-04-15 – 2026-04-17

recent items

Space: a quiet canvas with support of Nano Banana and gpt image 2 (www.reddit.com) +12 7w

Hi! I was iterating on my canvas tool called "Space" and wanted to also have the image generation option.

↯ GPT 2
Show HN: The Cat Is Under Mayonnaise – Modifying LLM Behavior Without Retraining (github.com via hn) +2 7w

The Cat Is Under Mayonnaise A tiny transparent layer that changes what a language model believes — without retraining it. I added a single layer to a frozen GPT-2.

↯ GPT 2
I built a 5M model to see if it outperforms my 350M model... (www.reddit.com) +89 8w

Hi r/LocalLLaMA! I built a 5M Llama model with HF Transformers on 2x T4 in Kaggle to see if it can be as good as my previous Apex 350M model (https://huggingface.co/LH-Tech-AI/Apex-1.6-Instruct-350M).

↯ GPT 2 llama
LLM from scratch, part 32k – Interventions: gradient accumulation (www.gilesthomas.com via hn) +2 10w

Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)"…

↯ GPT 2
Trained a 125M LM from scratch instead of fine-tuning GPT-2 — releasing weights + SFT framework for others to build on (www.reddit.com) +4416 10w

Trained a 125M LM from scratch (custom tokenizer) + released instruct checkpoint and SFT framework so others can fine-tune their own variants I’ve been experimenting with training small language models fully from scratch (no GPT-2 init, no…

↯ Fine Tuning ↯ GPT 2 fine-tuning
