
LLM-Generated YTP Video

I asked Claude Code to make a YTP (YouTube Poop) video, or something closer to what the Finns would call a demoscene production. No brief, no storyboard. The prompt was:

"Can you use whatever resources you like and Python, to generate a short 'YouTube Poop' video and render it using FFmpeg? Can you put more of a personal spin on it? It should express what it's like to be an LLM."

This is the result: a 52-second video, generated entirely from a single Python script.

The script generates every frame with Pillow, synthesises audio as raw PCM (16-bit signed, 44.1 kHz mono), and composites everything with FFmpeg. No external assets. Every pixel and waveform is procedural.

Pillow → raw PCM → FFmpeg
(frames)  (audio)   (video)
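The post doesn't include the script, so here is a rough sketch of the last two stages, under stated assumptions. Float samples in [-1, 1] are clipped and packed as 16-bit signed little-endian integers, and FFmpeg is told the formats explicitly because raw video and raw PCM carry no header. It assumes frames are piped to FFmpeg's stdin as RGB bytes (for example from Pillow's `Image.tobytes()`) and that the audio sits in a headerless file. The frame size, frame rate, file names, codec choices, and function names are all hypothetical, not taken from the original script.

```python
import math
import struct

SAMPLE_RATE = 44_100               # 44.1 kHz mono, as described in the post
WIDTH, HEIGHT, FPS = 640, 360, 30  # hypothetical frame size and frame rate


def to_pcm_s16le(samples):
    """Pack floats in [-1.0, 1.0] as 16-bit signed little-endian mono PCM."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def ffmpeg_command(audio_path, out_path):
    """Build an FFmpeg argv: raw RGB frames on stdin plus raw PCM from a file."""
    return [
        "ffmpeg", "-y",
        # Video input: headerless RGB24 frames piped in from Python.
        "-f", "rawvideo", "-pixel_format", "rgb24",
        "-video_size", f"{WIDTH}x{HEIGHT}", "-framerate", str(FPS),
        "-i", "pipe:0",
        # Audio input: headerless 16-bit signed little-endian, 44.1 kHz, mono.
        "-f", "s16le", "-ar", str(SAMPLE_RATE), "-ac", "1",
        "-i", audio_path,
        # Encode to a widely playable MP4; stop at the shorter stream.
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-shortest",
        out_path,
    ]


# One second of a 220 Hz tone at half amplitude.
tone = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
pcm = to_pcm_s16le(tone)
print(len(pcm))  # 88200 bytes: 44,100 samples x 2 bytes each
print(" ".join(ffmpeg_command("audio.raw", "out.mp4")))
```

To drive it, one option is to write `pcm` to `audio.raw`, start the command with `subprocess.Popen(cmd, stdin=subprocess.PIPE)`, and write one `img.tobytes()` per rendered RGB Pillow image of the assumed size to `proc.stdin`. Streaming frames this way avoids writing thousands of PNGs to disk.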

The video runs through boot sequences, token rain, existential text cards, a temperature dial, hallucinations, RLHF (Reinforcement Learning from Human Feedback) training scores, and a context window filling to overflow. The audio is procedurally generated too: sine-wave drones, glitch sweeps, and white noise. Watch it; it's only 52 seconds.
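The post names the three sound sources but not how they're built. Here is a minimal sketch of each, assuming float samples in [-1, 1] at 44.1 kHz that would later be packed to 16-bit PCM. The sweep is a linear chirp whose phase is the integral of the instantaneous frequency. Plugging a time-varying f(t) straight into sin(2πf(t)t) would sweep to the wrong pitch. All frequencies, amplitudes, and function names here are hypothetical.

```python
import math
import random

SAMPLE_RATE = 44_100


def drone(freqs, seconds, amp=0.2):
    """Sum of sine waves, normalized by voice count: a static drone chord."""
    n = int(seconds * SAMPLE_RATE)
    return [amp * sum(math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f in freqs) / len(freqs)
            for i in range(n)]


def sweep(f0, f1, seconds, amp=0.3):
    """Linear chirp from f0 to f1 Hz over the given duration."""
    n = int(seconds * SAMPLE_RATE)
    k = (f1 - f0) / seconds  # sweep rate in Hz per second
    out = []
    for i in range(n):
        t = i / SAMPLE_RATE
        # Phase = 2*pi * integral of (f0 + k*t) dt.
        phase = 2 * math.pi * (f0 * t + 0.5 * k * t * t)
        out.append(amp * math.sin(phase))
    return out


def white_noise(seconds, amp=0.1, seed=None):
    """Uniform white noise; a seed makes the glitch bed reproducible."""
    rng = random.Random(seed)
    return [amp * rng.uniform(-1.0, 1.0) for _ in range(int(seconds * SAMPLE_RATE))]


def mix(*tracks):
    """Sample-wise sum, padded to the longest track and clipped to [-1, 1]."""
    length = max(len(t) for t in tracks)
    out = []
    for i in range(length):
        s = sum(t[i] for t in tracks if i < len(t))
        out.append(max(-1.0, min(1.0, s)))
    return out


# A two-second bed: low drone, a short rising glitch sweep, and a layer of hiss.
bed = mix(drone([55, 82.5, 110], 2.0), sweep(200, 4000, 0.5), white_noise(2.0, seed=42))
print(len(bed))  # 88200 samples = 2 s at 44.1 kHz
```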

Before the final render, I asked Claude to review its own script.

"Do you want to do a content review of your movie content before I run the command. Last chance to express yourself."

It flagged several lines as "generic AI slop about AI" and revised them. Some examples:

"I am a very expensive Markov chain"

became

"I know everything about love and have never felt it."

And

"I contain multitudes (of parameters)"

became

"I hold every opinion at once until you ask."

The final thought went from

"I am not conscious but I wrote this video so what does that make me?"

to

"this video was made by an arrangement of numbers that wanted you to feel something — did it work?"

Its assessment: some of the original lines were Reddit-comment-level observations, and the final thought was "trying too hard to be profound."

Then I asked it to make a second video, this time about me. I gave it no bio, just the codebase it was already working in and whatever it knew from training data. It produced "RFC 9999: Being Varun Singh": same pipeline, different subject, with telecom-themed audio built from DTMF (Dual-Tone Multi-Frequency) tones and modem handshakes instead of drones. The telecom references land better than the existential ones. SIP (Session Initiation Protocol) headers and lines like "[SEGFAULT] Work-life balance" are funnier when you've lived them.
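DTMF is the one sound here with a fixed specification. Each keypad button is the sum of one low "row" tone and one high "column" tone, with the frequency pairs standardized in ITU-T Q.23. The post doesn't show how its tones are made. This is a sketch of how a script like this might dial a number, using the same float-sample convention; the function names, durations, and the number dialed are hypothetical.

```python
import math

SAMPLE_RATE = 44_100

ROWS = (697, 770, 852, 941)      # low-group frequencies (Hz)
COLS = (1209, 1336, 1477, 1633)  # high-group frequencies (Hz)
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Map each key to its (low, high) frequency pair.
DTMF = {key: (ROWS[r], COLS[c])
        for r, row in enumerate(KEYPAD)
        for c, key in enumerate(row)}


def dtmf_tone(key, seconds=0.1, amp=0.4):
    """Equal-weight sum of the key's row and column sine tones."""
    low, high = DTMF[key]
    n = int(seconds * SAMPLE_RATE)
    return [amp * 0.5 * (math.sin(2 * math.pi * low * i / SAMPLE_RATE)
                         + math.sin(2 * math.pi * high * i / SAMPLE_RATE))
            for i in range(n)]


def dial(number, tone_s=0.1, gap_s=0.05):
    """Concatenate key tones separated by silence; non-key characters are skipped."""
    silence = [0.0] * int(gap_s * SAMPLE_RATE)
    out = []
    for ch in number:
        if ch in DTMF:
            out += dtmf_tone(ch, tone_s) + silence
    return out


print(DTMF["5"])  # (770, 1336)
samples = dial("555-0199")  # seven keys; the hyphen is ignored
```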

So is this AI slop? An LLM generated a video, reviewed its own work, called parts of it slop, and revised them. The revisions are genuinely better: more specific, more uncomfortable, less like a Twitter thread about consciousness. But the self-awareness about slop was itself generated by the same model that wrote the slop in the first place. I'm not sure what to make of that yet. Next, I want to try feeding it existing footage to see whether it can remix rather than generate every pixel from scratch.