Following the release of GPT-5.5 last week, people noticed something funny about OpenAI's latest model. In its Codex coding app, the company left a system prompt instructing GPT-5.5 to avoid mentioning goblins, gremlins and other creatures. Yes, you read that right. "Never talk about goblins, gremlins, racoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query," the prompt reads.
Apparently, enough people started talking about ChatGPT's creature obsession that OpenAI felt the need to provide an accounting of where the goblins came from. In a blog post published Wednesday, the company explains it began to notice a change in ChatGPT following the release of GPT-5.1 last November. After one researcher asked OpenAI to include the words "goblin" and "gremlin" in an investigation into the chatbot's verbal tics, the company found ChatGPT's use of "goblin" had risen by 175 percent after the release of GPT-5.1. Meanwhile, use of "gremlin" had risen by 52 percent over the same period.
This is an actual line that was added to the official system prompt for Codex for GPT-5.5 by OpenAI. Usually the system prompt is as minimal as possible, so I assume it would otherwise mention goblins a lot.
AIs are weird.
— Ethan Mollick (@emollick.bsky.social) 2026-04-28T06:14:22.988Z
"A single 'little goblin' in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying, and we needed to figure out where they came from," OpenAI says. After the release of GPT-5.4, the company (and some users) noticed an even bigger uptick in goblin references. At that point, an investigation was able to pinpoint what OpenAI describes as "the first connection to the root cause."
For a while now, ChatGPT has included a personality feature that lets users customize the style and tone of the chatbot's responses. Until March of this year, one option people could select was "Nerdy." Part of the system prompt for that personality read as follows: "The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness."
When OpenAI mapped goblin mentions to the different ChatGPT personalities, it found the Nerdy personality was disproportionately responsible for using that one word. Despite accounting for only 2.5 percent of all ChatGPT responses, it produced 66.7 percent of all goblin mentions generated by the chatbot. Further investigation revealed that reinforcement learning was to blame for the uptick in goblin and gremlin use. Specifically, OpenAI found that a single reward signal was responsible for teaching the Nerdy personality to consistently favor creature language.
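A quick worked ratio shows how lopsided those two figures are. Dividing a group's share of goblin mentions by its share of responses gives that group's goblin rate per response relative to ChatGPT's overall average. This assumes both percentages were measured over the same set of responses, which the blog post's framing suggests but which is an assumption here.

```latex
\frac{66.7\%}{2.5\%} \approx 26.7
\qquad
\frac{100\% - 66.7\%}{100\% - 2.5\%} = \frac{33.3\%}{97.5\%} \approx 0.34
\qquad
\frac{26.7}{0.34} \approx 78
```

By these numbers, a Nerdy response mentioned goblins roughly 27 times as often as the average ChatGPT response, and roughly 78 times as often as a response from any other personality.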
"Across all datasets in the audit, the Nerdy personality reward showed a clear tendency to score outputs to the same problem with 'goblin' or 'gremlin' higher than outputs without, with positive uplift in 76.2 percent of datasets," the company explains.
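OpenAI hasn't published its audit code, so the following is only a hypothetical sketch of what "positive uplift in X percent of datasets" could mean. It assumes each record holds a dataset name, a problem ID, an output and the reward model's score. For every problem, it compares the mean score of outputs that mention "goblin" or "gremlin" with the mean score of outputs that don't, averages those gaps per dataset, and reports the share of datasets where the average gap is positive. The records, scores and function names are all invented for illustration.

```python
from statistics import mean

# Hypothetical toy records: (dataset, problem_id, output_text, reward_score).
# These values are invented for illustration; they are not OpenAI's data.
SCORES = [
    ("math", "p1", "A sneaky little goblin of a bug...", 0.82),
    ("math", "p1", "The bug is an off-by-one error.", 0.74),
    ("code", "p2", "This gremlin lives in the cache.", 0.61),
    ("code", "p2", "The cache is stale.", 0.66),
    ("trivia", "p3", "Goblin-level pedantry aside, it's Paris.", 0.90),
    ("trivia", "p3", "It's Paris.", 0.85),
]

CREATURE_WORDS = ("goblin", "gremlin")


def mentions_creature(text: str) -> bool:
    """True if the output contains any of the tracked creature words (case-insensitive)."""
    lowered = text.lower()
    return any(word in lowered for word in CREATURE_WORDS)


def uplift_by_dataset(records):
    """Per dataset, average over problems of (mean score with creature words - mean score without)."""
    grouped = {}
    for dataset, problem, text, score in records:
        bucket = grouped.setdefault(dataset, {}).setdefault(problem, {True: [], False: []})
        bucket[mentions_creature(text)].append(score)

    uplifts = {}
    for dataset, problems in grouped.items():
        # Only problems that have outputs both with and without creature words are comparable.
        diffs = [
            mean(b[True]) - mean(b[False])
            for b in problems.values()
            if b[True] and b[False]
        ]
        if diffs:
            uplifts[dataset] = mean(diffs)
    return uplifts


def share_positive(uplifts):
    """Fraction of datasets where the reward favored creature-word outputs on average."""
    return sum(1 for u in uplifts.values() if u > 0) / len(uplifts)


if __name__ == "__main__":
    print(f"{share_positive(uplift_by_dataset(SCORES)):.1%}")
```

In this toy data, two of the three datasets favor the goblin-flavored answer, so the sketch prints 66.7%. Comparing outputs to the *same* problem, as OpenAI's quote describes, is what keeps a harder or easier problem from masquerading as a word preference.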
OpenAI also found that, because of how reinforcement learning can work, the Nerdy personality's love of goblins had transferred to other parts of its models. "The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the data that produced them," the company explains. "Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data."
OpenAI began training GPT-5.5 before it identified the source of ChatGPT's affinity for goblins, which is why Codex's system prompt instructs the model to avoid creature language. "Codex is, after all, quite nerdy," OpenAI notes. In hunting down ChatGPT's goblins, the company says it has devised new tools to audit and fix model behavior. If it were up to me, I wouldn't use those tools. Keep AI weird, I say.