▶ Hacker Dojo · Python Meetup · Apr 2026

BE KIND
REWIND.

I hit YouTube's 5,000-video limit with no way out.
So I automated my way out, then built a whole library.

MICAH ALPERN · github.com/malpern

The confession

I have a problem.

5,847

videos in Watch Later

400+

creators followed

videos actually watched

The list was a scroll I could never finish. A graveyard of good intentions.
And then one day — YouTube stopped letting me add more.

The wall

Watch Later has a hard limit.

Watch Later capacity

5,000 / 5,000

New saves silently fail — no error, no warning
No export button. No CSV. No JSON. Nothing.
No bulk delete. Remove videos one at a time, forever.
YouTube Data API: Watch Later is a private playlist with no write access
YouTube support: "we have no plans to address this"

▶ ACT · 0

The
jailbreak.

There's no API. So we made our own — by driving a browser.

How YouTube Watch Later actually works

It's just a private playlist.

Watch Later = playlist ID WL on your account
Official API blocks write access to WL — by design
Only reliable path: a browser that knows your cookies
Chrome DevTools Protocol lets you drive Chrome programmatically
Same protocol used by Playwright, Puppeteer, DevTools
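Under the hood, CDP is JSON over a WebSocket. Each command carries an `id`, a `method`, and `params`, and the reply echoes the `id`. The sketch below shows only that framing. It assumes Chrome was launched with `--remote-debugging-port` and that you supply your own WebSocket client, since the stdlib has none. `CdpSession` is a hypothetical helper, not yt-mover's code. `Page.navigate` and `Runtime.evaluate` are real CDP methods.

```python
import itertools
import json


class CdpSession:
    """Frames CDP commands as JSON and matches replies to them by id.

    Hypothetical helper: sending and receiving over the page's WebSocket
    (webSocketDebuggerUrl) is left to whatever client you use.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}  # command id -> method name, awaiting a reply

    def command(self, method, **params):
        msg_id = next(self._ids)
        self.pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params})

    def handle_reply(self, raw):
        msg = json.loads(raw)
        if "id" not in msg:
            return None  # an event such as Page.loadEventFired, not a reply
        method = self.pending.pop(msg["id"])
        if "error" in msg:
            raise RuntimeError(f"{method} failed: {msg['error'].get('message')}")
        return msg["result"]


session = CdpSession()
nav = session.command("Page.navigate",
                      url="https://www.youtube.com/playlist?list=WL")
# Assumption: YouTube renders playlist rows as <ytd-playlist-video-renderer>.
count = session.command(
    "Runtime.evaluate",
    expression="document.querySelectorAll('ytd-playlist-video-renderer').length",
    returnByValue=True,
)
```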

yt-mover

Four phases. Fully automated.

Scan — inventory Watch Later: count videos, flag unavailable ones
Copy — batch 50 videos at a time to a target playlist
Verify — confirm each video actually landed
Delete — remove confirmed videos from Watch Later, free the slots

Chrome streams structured JSON back to the app — one event per video, per phase.
Interrupted runs resume exactly where they left off.
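The resume behavior can be pictured as a per-video checkpoint. Record the last phase each video completed, and let a rerun pick up from there. This is a minimal sketch under that assumption. `run_phases`, the state-file layout, and the `actions` callbacks are hypothetical stand-ins for the real browser steps. To stay short, it works per video rather than in 50-video batches.

```python
import json
from pathlib import Path

PHASES = ["scan", "copy", "verify", "delete"]


def run_phases(video_ids, actions, state_path, emit=print):
    """Run every phase over every video, checkpointing after each success.

    actions: phase name -> callable(video_id) returning True on success.
    State on disk: {video_id: last completed phase}, so a rerun resumes.
    """
    path = Path(state_path)
    state = json.loads(path.read_text()) if path.exists() else {}

    for rank, phase in enumerate(PHASES):
        needed = PHASES[rank - 1] if rank else None  # phase that must be done first
        for vid in video_ids:
            if state.get(vid) != needed:
                continue  # already past this phase, or an earlier phase failed
            ok = bool(actions[phase](vid))
            # one JSON event per video, per phase
            emit(json.dumps({"phase": phase, "video": vid, "ok": ok}))
            if ok:
                state[vid] = phase
                path.write_text(json.dumps(state))  # checkpoint immediately
    return state
```

Delete only runs for videos whose saved state is `verify`. That means a crash between steps never removes a video that wasn't confirmed in the target playlist.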

~4k

videos moved

50

videos per chunk

API calls used

OK.
Now what?

We had 5,000 unsorted videos.
No categories. No labels.
No idea what was in there.

▶ ACT · I

Building
the library.

iPhoto for YouTube. Local-first. SQLite. And a surprisingly capable AI librarian.

be-kind-rewind · system architecture

A local-first video library.

Three questions that drove the whole build

What's in the pile?

What should I watch next?

What am I missing?

First → LLM categorization. Second → watch queue. Third → Python scraping.
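A local-first library can start from a very small SQLite schema. The tables and columns here are illustrative assumptions, not be-kind-rewind's actual schema. The point is that once every video has a topic, "What's in the pile?" becomes a single GROUP BY.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS topics (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES topics(id)   -- set when a topic is split
);
CREATE TABLE IF NOT EXISTS videos (
    video_id   TEXT PRIMARY KEY,              -- YouTube's 11-character id
    channel_id TEXT NOT NULL,
    title      TEXT NOT NULL,
    topic_id   INTEGER REFERENCES topics(id),
    watched    INTEGER NOT NULL DEFAULT 0     -- 0/1 flag for the watch queue
);
"""


def open_library(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def topic_counts(conn):
    """'What's in the pile?': unwatched videos per topic, biggest first."""
    return conn.execute(
        """SELECT t.name, COUNT(*) AS n
           FROM videos v JOIN topics t ON t.id = v.topic_id
           WHERE v.watched = 0
           GROUP BY t.id ORDER BY n DESC, t.name"""
    ).fetchall()
```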

▶ ACT · II

Teaching
a model
to sort.

Claude Haiku classified 5,000 videos for fifteen cents.

Step 1 of 3 — topic discovery

Don't label.
Discover.

Instead of defining categories upfront, let Claude look at the collection and invent them.

Input: channel names + sample titles (only channels with 2+ saved videos)
Prompt: "Suggest exactly N topic categories for this collection"
Output: ~20 topic names as JSON
Categories reflect your taste, not a generic taxonomy

topics = [
  "AI & Machine Learning",
  "Mechanical Keyboards",
  "Embedded Systems",
  "Python & Dev Tools",
  "Indie Hacking",
  "macOS Development",
  "Security Research",
  "Hardware & Making",
  "Tech Podcasts",
  "Career & Productivity",
  # ... ~10 more
]
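A minimal sketch of the discovery step follows. It assumes videos arrive as dicts with `channel` and `title` keys. It also assumes `ask_model` is a hypothetical function that sends a prompt to Claude and returns the reply text. Prompt wording beyond the quoted sentence is an assumption too.

```python
import json
from collections import defaultdict


def build_discovery_prompt(videos, n_topics=20, titles_per_channel=3):
    by_channel = defaultdict(list)
    for v in videos:
        by_channel[v["channel"]].append(v["title"])

    lines = []
    for channel, titles in sorted(by_channel.items()):
        if len(titles) < 2:
            continue  # one-off channels add noise, not signal
        sample = "; ".join(titles[:titles_per_channel])
        lines.append(f"- {channel}: {sample}")

    return (
        f"Suggest exactly {n_topics} topic categories for this collection. "
        "Reply with a JSON array of strings only.\n\n" + "\n".join(lines)
    )


def discover_topics(videos, ask_model, n_topics=20):
    topics = json.loads(ask_model(build_discovery_prompt(videos, n_topics)))
    if not (isinstance(topics, list) and all(isinstance(t, str) for t in topics)):
        raise ValueError("expected a JSON array of topic names")
    return topics
```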

Step 2 of 3 — classification at scale

200 videos per prompt. Batched.

Fixed topic list + 200 video titles per call
Returns JSON: video_index → topic_number
Partial batch failure? Skip it, continue.
5,000 videos = 25 calls, ~3 minutes

$0.15

total cost · 5k videos

25

API calls

BATCH_SIZE = 200

for i in range(0, len(videos), BATCH_SIZE):
    batch = videos[i : i + BATCH_SIZE]

    prompt = f"""Topics:
{topics_as_numbered_list}

Assign each video to exactly one topic
by number. Return JSON.

Videos:
{format_batch(batch)}"""

    result = claude_haiku(prompt)
    apply_assignments(batch, result)
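The loop above leaves `format_batch`, `apply_assignments`, and the model call abstract. Below is one way to fill them in, with the "skip a bad batch and continue" rule made explicit. It rests on assumptions, not the project's code: the reply shape (`{"0": 3, ...}`, batch index to 1-based topic number) and the `ask_model` hook are both hypothetical. At 200 per prompt, 5,000 videos take ceil(5000 / 200) = 25 calls.

```python
import json

BATCH_SIZE = 200


def format_batch(batch):
    # One numbered line per video; the index is what the model echoes back.
    return "\n".join(f"{i}. {v['title']}" for i, v in enumerate(batch))


def apply_assignments(batch, raw_reply, topics):
    """Map {"index": topic_number} onto the batch; ignore out-of-range entries."""
    assignments = json.loads(raw_reply)
    applied = 0
    for idx, topic_no in assignments.items():
        i, t = int(idx), int(topic_no)
        if 0 <= i < len(batch) and 1 <= t <= len(topics):
            batch[i]["topic"] = topics[t - 1]
            applied += 1
    return applied


def classify_all(videos, topics, ask_model):
    numbered = "\n".join(f"{n}. {name}" for n, name in enumerate(topics, 1))
    failed = 0
    for start in range(0, len(videos), BATCH_SIZE):
        batch = videos[start : start + BATCH_SIZE]
        prompt = (f"Topics:\n{numbered}\n\nAssign each video to exactly one "
                  f"topic by number. Return JSON.\n\nVideos:\n{format_batch(batch)}")
        try:
            apply_assignments(batch, ask_model(prompt), topics)
        except (ValueError, AttributeError, TypeError):
            failed += 1  # malformed reply: skip this batch, keep going
    return failed
```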

Step 3 of 3 — refinement with Sonnet

The model does
the surgery too.

"Python & Dev Tools"
400 videos — too broad

→

Claude Sonnet

→

Python Basics
Dev Tooling
Testing & CI
Packaging

Split — break a broad topic into 3–8 subtopics automatically
Rename — suggest better names based on sample video titles
Reclassify — reassign a whole subtree with refined rules

Haiku classifies fast and cheap. Sonnet handles nuance.
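Here is a sketch of the "split" operation, using the same kind of hypothetical `ask_model` hook (here pointed at Sonnet) and a JSON reply of subtopic names. The interesting part is the guardrail: a reply outside the 3–8 range is rejected rather than written into the library.

```python
import json


def split_topic(name, sample_titles, ask_model, lo=3, hi=8):
    prompt = (
        f'The topic "{name}" is too broad. Propose {lo}-{hi} subtopics that '
        "together cover these videos. Reply with a JSON array of names.\n\n"
        + "\n".join(f"- {t}" for t in sample_titles)
    )
    subtopics = json.loads(ask_model(prompt))
    names = [s.strip() for s in subtopics if isinstance(s, str) and s.strip()]
    names = list(dict.fromkeys(names))  # drop duplicates, keep order
    if not lo <= len(names) <= hi:
        raise ValueError(f"expected {lo}-{hi} subtopics, got {len(names)}")
    return names
```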

▶ ACT · III

Scraping
without
a quota.

YouTube gives you 10,000 API units/day. A channel fetch costs 100. Do the math.

YouTube Data API quota

400 creators.
100 API calls.

daily budget

10,000 units

per channel fetch

100 units

channels per day

100 max

We follow 400+ creators. Polling all of them daily burns the entire quota after just a quarter of them. Discovery had to work without API calls.
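The quota math, spelled out. The 10,000-unit daily budget and 100-unit channel fetch are the figures from the slide; the function itself is just arithmetic.

```python
import math

DAILY_UNITS = 10_000
UNITS_PER_CHANNEL_FETCH = 100


def quota_coverage(n_channels, daily=DAILY_UNITS, per_fetch=UNITS_PER_CHANNEL_FETCH):
    """Return (channels fetchable per day, days needed to poll every channel once)."""
    per_day = daily // per_fetch
    return per_day, math.ceil(n_channels / per_day)


per_day, days = quota_coverage(400)
print(f"{per_day} channels/day, {days} days to poll 400 channels once")
```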

Discovery architecture · the fallback chain

Four layers. Zero quota.

The decision

😒

YouTube Data API
100 units/channel · quota gone by 10am

😌

scrapetube + RSS feeds
no API key · no quota · no problem

youtube_channel_fallback.py

Two sources.
One merge.

scrapetube

YouTube's internal API
great coverage

RSS feed

/feeds/videos.xml
catches Shorts + recents

import scrapetube

def fetch_channel_videos(channel_id):
    scrape_vids = list(scrapetube.get_channel(channel_id))
    rss_vids    = fetch_rss(channel_id)

    seen = {}
    for v in rss_vids:                        # RSS wins on overlap
        seen[v["videoId"]] = {**v, "source": "rss"}
    for v in scrape_vids:                     # scrapetube fills gaps
        seen.setdefault(v["videoId"], {**v, "source": "scrape"})

    return list(seen.values())
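`fetch_rss` isn't shown above. Below is a stdlib-only sketch. It assumes the public per-channel feed at `https://www.youtube.com/feeds/videos.xml?channel_id=...` and its Atom layout, with `yt:videoId`, `title`, and `published` per entry. That feed typically lists only a channel's 15 most recent uploads, which is why it complements scrapetube rather than replacing it. Parsing is split out from fetching so it can be checked offline.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://www.youtube.com/feeds/videos.xml?channel_id={}"
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def parse_feed(xml_bytes):
    """Turn a channel's Atom feed into dicts keyed by 'videoId', like scrapetube's."""
    root = ET.fromstring(xml_bytes)
    videos = []
    for entry in root.findall("atom:entry", NS):
        vid = entry.findtext("yt:videoId", namespaces=NS)
        if vid:
            videos.append({
                "videoId": vid,
                "title": entry.findtext("atom:title", default="", namespaces=NS),
                "published": entry.findtext("atom:published", default="", namespaces=NS),
            })
    return videos


def fetch_rss(channel_id, timeout=10):
    try:
        with urllib.request.urlopen(FEED_URL.format(channel_id), timeout=timeout) as resp:
            return parse_feed(resp.read())
    except Exception:
        return []  # empty, not error: one bad feed shouldn't sink the merge
```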

youtube_channel_about.py · 416 lines of character

Creator links.
A nightmare.

YouTube's /about page: lazy loaded, A/B tested, layout changes constantly
Links wrapped in YouTube redirect URLs: /redirect?q=bit.ly/xxx
Shorteners stacked on shorteners — 3 hops to the real URL
50+ domain patterns → platform detection (GitHub, Twitter, Patreon…)

Solution: dual extraction paths + regex chains + HTTP HEAD to expand URLs.
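Two of those steps are easy to show offline: unwrapping YouTube's `/redirect?q=...` links and mapping a host to a platform. In this small sketch, the `PLATFORMS` table is a hypothetical handful of entries standing in for the 50+ real patterns. `expand_url` illustrates the HEAD-request idea but isn't exercised here.

```python
import re
import urllib.request
from urllib.parse import parse_qs, urlparse

# Hypothetical subset; the real detector has 50+ patterns.
PLATFORMS = [
    (re.compile(r"(^|\.)github\.com$"), "GitHub"),
    (re.compile(r"(^|\.)(twitter|x)\.com$"), "Twitter"),
    (re.compile(r"(^|\.)patreon\.com$"), "Patreon"),
]


def unwrap_youtube_redirect(url):
    """youtube.com/redirect?q=<target>&... -> <target>; other URLs unchanged."""
    parsed = urlparse(url)
    if parsed.path == "/redirect" and parsed.netloc.endswith("youtube.com"):
        target = parse_qs(parsed.query).get("q")
        if target:
            url = target[0]
    if "://" not in url:
        url = "https://" + url  # targets like bit.ly/xxx arrive without a scheme
    return url


def expand_url(url, timeout=5):
    """Follow shortener hops with a HEAD request; urlopen follows redirects."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.geturl()
    except Exception:
        return url  # keep what we have rather than fail


def detect_platform(url):
    host = (urlparse(url).hostname or "").lower()
    for pattern, name in PLATFORMS:
        if pattern.search(host):
            return name
    return None
```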

Core design rule:
return empty, not error.

def get_channel_links(channel_id):
    try:
        # Path A: og:description meta tag
        # stable across layout changes
        links = extract_from_og_desc(channel_id)
        if links:
            return links

        # Path B: ytInitialData JSON blob
        # future-proofing
        links = extract_from_initial_data(
            channel_id
        )
        return links or []

    except Exception:
        return []  # never crash the caller
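Path A could look roughly like this: read the `og:description` meta tag and pull out anything URL-shaped. This hypothetical helper takes page HTML rather than a channel ID, so it can be tested offline, and it follows the same empty-not-error rule. The URL regex is deliberately loose and only catches `http(s)://` links.

```python
import re
from html.parser import HTMLParser

URL_RE = re.compile(r"https?://[^\s\"'<>]+")


class _OgDescription(HTMLParser):
    """Captures the content of <meta property="og:description">."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:description":
            self.content = attrs.get("content") or ""


def links_from_og_description(html):
    """Return URLs found in the og:description meta tag, or [] if none."""
    parser = _OgDescription()
    parser.feed(html)
    if not parser.content:
        return []
    found = (m.group(0).rstrip(".,);") for m in URL_RE.finditer(parser.content))
    return list(dict.fromkeys(found))  # de-duplicate, keep order
```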

▶ ACT · IV

Python
inside a
Mac app.

A venv bundled in the app, called from Swift via subprocess. With one nasty gotcha.

The deployment model

A venv, bundled.

.runtime/discovery-venv/ — managed Python env
Built by build-app.sh at build time
requirements-discovery.txt installs scrapetube
Swift calls Python via Process()
Input: channel IDs via args. Output: JSON on stdout
Clean boundary — just pipes, no shared state
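On the Python side, the contract in that list is small: channel ID in via argv, JSON out via stdout, diagnostics to stderr. Here is a sketch of what such a script's entry point could look like. `--channel-id` matches the Swift call below. `fetch_videos` and the output shape (`channelId`, `videos`) are hypothetical.

```python
import argparse
import json
import sys


def fetch_videos(channel_id):
    # Hypothetical placeholder for the real scrapetube + RSS discovery.
    return [{"videoId": "example", "channelId": channel_id}]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Emit a channel's videos as JSON.")
    parser.add_argument("--channel-id", required=True)
    args = parser.parse_args(argv)
    try:
        videos = fetch_videos(args.channel_id)
    except Exception as exc:  # report on stderr, keep stdout valid JSON
        print(f"discovery failed: {exc}", file=sys.stderr)
        videos = []
    json.dump({"channelId": args.channel_id, "videos": videos}, sys.stdout)
    sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```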

let process = Process()
process.executableURL = venvPython
process.arguments = [
    scriptPath,
    "--channel-id", channelId
]

let stdout = Pipe()
let stderr = Pipe()
process.standardOutput = stdout
process.standardError  = stderr

try process.run()

// ⚠️  DO NOT waitUntilExit() yet
//     see next slide

The bug you will hit

The pipe buffer deadlock.

Your script writes more than 64 KB. You wait for exit before reading. The script blocks on the full pipe until you read. Neither side moves.

❌ naive — hangs at 64KB

try process.run()
process.waitUntilExit() // 💀

let data = stdout
  .fileHandleForReading
  .readDataToEndOfFile()

✓ concurrent drain

try process.run()

// drain BOTH pipes WHILE process runs
let out = Task.detached {
  stdout.fileHandleForReading
        .readDataToEndOfFile()
}
let err = Task.detached {
  stderr.fileHandleForReading
        .readDataToEndOfFile()
}
process.waitUntilExit()
let result = await out.value
let errors = await err.value
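Python's `subprocess` has the same trap, and its docs say so: `Popen.wait()` can deadlock with `stdout=PIPE` once the child fills the OS pipe buffer. `communicate()` reads both pipes while waiting for exit. Here is a small demonstration with a child that writes 1,000,000 bytes:

```python
import subprocess
import sys

# Child writes 1,000,000 bytes to stdout plus a line to stderr: far past the pipe buffer.
CHILD = "import sys; sys.stdout.write('x' * 1_000_000); sys.stderr.write('done\\n')"

# Risky: proc.wait() before reading could hang forever here.
# Safe: communicate() drains stdout and stderr while waiting for exit.
proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate(timeout=30)
print(proc.returncode, len(out), err.decode().strip())
```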

Being a good citizen

Don't get rate-limited.

2s minimum between any two requests
Random jitter — not perfectly spaced (more human)
10-min per-channel cooldown after any failure
5-min global cooldown after sustained failures

YouTube doesn't hard-block —
it slows down and returns garbage.
The symptoms are subtle. The limiter keeps you in the safe zone.

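import time, random

class ScrapeRateLimiter:
    MIN_INTERVAL = 2.0   # seconds
    JITTER       = 1.5
    CHANNEL_COOL = 600   # 10 min
    GLOBAL_COOL  = 300   # 5 min

    def __init__(self):
        self.last_req = float("-inf")  # no request yet
        self.cooldowns = {}            # channel_id -> cooldown end time

    def wait(self):
        # 2s floor + random jitter, so requests aren't evenly spaced
        elapsed = time.monotonic() - self.last_req
        gap = (self.MIN_INTERVAL
               + random.uniform(0, self.JITTER))
        if elapsed < gap:
            time.sleep(gap - elapsed)
        self.last_req = time.monotonic()

    def record_failure(self, channel_id):
        self.cooldowns[channel_id] = (
            time.monotonic() + self.CHANNEL_COOL
        )

    def is_cooling_down(self, channel_id):
        return time.monotonic() < self.cooldowns.get(channel_id, 0.0)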
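The limiter tracks per-channel cooldowns, but `GLOBAL_COOL` needs its own logic. One way to implement "global cooldown after sustained failures" is a small circuit breaker. In this sketch, the threshold of 5 consecutive failures is a hypothetical choice. The clock is injectable so the behavior can be tested without sleeping.

```python
import time


class GlobalCooldown:
    """Pause all scraping after N consecutive failures (a circuit breaker)."""

    def __init__(self, threshold=5, cool_seconds=300, clock=time.monotonic):
        self.threshold = threshold        # hypothetical: 5 failures in a row
        self.cool_seconds = cool_seconds  # the slide's 5-minute GLOBAL_COOL
        self.clock = clock
        self.consecutive = 0
        self.paused_until = 0.0

    def allowed(self):
        return self.clock() >= self.paused_until

    def record(self, ok):
        if ok:
            self.consecutive = 0  # any success resets the streak
            return
        self.consecutive += 1
        if self.consecutive >= self.threshold:
            self.paused_until = self.clock() + self.cool_seconds
            self.consecutive = 0
```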

Personal tools
are where the
real engineering
lives.

No PM. No spec. Just a real problem
you actually care about solving.

What it's made of

The stack.

PLATFORM: macOS · SwiftUI native
DATABASE: SQLite · local-first
LLM (classify): Claude Haiku · fast + cheap
LLM (refine): Claude Sonnet · nuanced ops
CLASSIFY COST: $0.15 for 5,000 videos
TESTS: 147 across 25 suites

SCRAPING: scrapetube + urllib
FALLBACK: YouTube RSS feeds
SYNC: YouTube API + CDP
SUBPROCESS: Swift Process() → JSON
PYTHON ENV: .runtime/discovery-venv
SCRIPTS: 4 Python scripts · ~750 lines

Things worth stealing

Three takeaways.

1

Scrape first, API second. For read-heavy discovery, scrapetube + RSS is faster, cheaper, and more reliable than the official YouTube API.
2

Batch your LLM calls. 200 items per prompt instead of 1 turns 5,000 calls into 25, and you pay for the instructions and topic list 25 times instead of 5,000. That's how 5,000 videos cost $0.15. Design for batching from day one.
3

When there's no API, drive the browser. CDP is stable and powerful, and driving your own signed-in Chrome looks like ordinary use to the site. It's not a hack. It's the same protocol DevTools uses.

▶ Demo + Q&A

Let's
watch
something.

Questions welcome — especially about the scraping.

MICAH ALPERN · HACKER DOJO · APR 2026
