▶ Hacker Dojo · Python Meetup · Apr 2026

BE KIND
REWIND.

I hit YouTube's 5,000-video limit with no way out.
So I automated one — and then built a whole library.

The confession

I have a problem.

5,847
videos in Watch Later
400+
creators followed
0
videos actually watched

The list was a scroll I could never finish. A graveyard of good intentions.
And then one day — YouTube stopped letting me add more.

The wall

Watch Later has a hard limit.

Watch Later capacity
5,000 / 5,000
  • New saves silently fail — no error, no warning
  • No export button. No CSV. No JSON. Nothing.
  • No bulk delete. Remove videos one at a time, forever.
  • YouTube Data API: Watch Later is a private playlist with no write access
  • YouTube support: "we have no plans to address this"
▶ ACT · 0

The
jailbreak.

There's no API. So we made our own — by driving a browser.

How YouTube Watch Later actually works

It's just a private playlist.

  • Watch Later = playlist ID WL on your account
  • Official API blocks write access to WL — by design
  • Only reliable path: a browser that knows your cookies
  • Chrome DevTools Protocol lets you drive Chrome programmatically
  • Same protocol used by Playwright, Puppeteer, DevTools
Chrome + CDP (remote debugging port) ↔ yt-cli (TypeScript · JSON events) ↔ macOS App (progress UI · playlist picker)
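Under the hood, CDP is JSON over a WebSocket. Each command carries an integer `id`, a `method`, and `params`, and Chrome tags its reply with the same `id`. Below is a minimal Python sketch of building such messages, assuming Chrome was started with `--remote-debugging-port=9222`. `Page.navigate` and `Runtime.evaluate` are real CDP methods. The `CDPCommands` class and the DOM selector are hypothetical, and this is not yt-cli's TypeScript code:

```python
import itertools
import json


class CDPCommands:
    """Builds Chrome DevTools Protocol command messages (hypothetical helper).

    Every CDP command is a JSON object with a unique integer "id", a
    "method" name, and a "params" object. Chrome echoes the id in its
    reply, which is how responses are matched to requests.
    """

    def __init__(self):
        self._ids = itertools.count(1)

    def command(self, method, **params):
        return json.dumps({"id": next(self._ids), "method": method, "params": params})

    def open_watch_later(self):
        # Watch Later is the private playlist with ID "WL"
        return self.command("Page.navigate", url="https://www.youtube.com/playlist?list=WL")

    def count_rendered_videos(self):
        # Hypothetical page query: count playlist rows currently in the DOM
        return self.command(
            "Runtime.evaluate",
            expression="document.querySelectorAll('ytd-playlist-video-renderer').length",
            returnByValue=True,
        )
```

These strings would be sent over a tab's `webSocketDebuggerUrl`, which Chrome lists at `http://localhost:9222/json`.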
yt-mover

Four phases. Fully automated.

  • Scan — inventory Watch Later: count videos, flag unavailable ones
  • Copy — batch 50 videos at a time to a target playlist
  • Verify — confirm each video actually landed
  • Delete — remove confirmed videos from Watch Later, free the slots
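The key safety property of the four phases is that deletion only ever touches videos that verification confirmed. Here is a sketch of that ordering in Python. It assumes the scan phase has already produced a list of video IDs. `move_watch_later` and the `copy`/`verify`/`delete` callables are hypothetical stand-ins for the browser-driving steps:

```python
from typing import Callable, Iterable, List, Set

CHUNK_SIZE = 50  # videos per chunk, as on the slide


def chunks(items: List[str], size: int = CHUNK_SIZE) -> Iterable[List[str]]:
    for i in range(0, len(items), size):
        yield items[i : i + size]


def move_watch_later(
    video_ids: List[str],
    copy: Callable[[List[str]], None],
    verify: Callable[[List[str]], Set[str]],
    delete: Callable[[List[str]], None],
) -> List[str]:
    """Copy, verify, then delete, chunk by chunk (hypothetical driver).

    Only IDs that verify() confirms are deleted, so a failed copy can
    never cost you a video. Returns the IDs that were actually moved.
    """
    moved: List[str] = []
    for chunk in chunks(video_ids):
        copy(chunk)
        confirmed = verify(chunk)
        safe = [v for v in chunk if v in confirmed]
        if safe:
            delete(safe)
        moved.extend(safe)
    return moved
```

A failed copy leaves the video in Watch Later, so the worst case is a retry, never a loss.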

yt-cli streams structured JSON back to the app: one event per video, per phase.
Interrupted runs resume exactly where they left off.
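Resuming works naturally if every event is one JSON line. You replay the log, collect what already finished, and skip it. The sketch below is in Python. The real CLI is TypeScript, and the event fields `phase`, `videoId` and `ok` are hypothetical, since the deck doesn't show its schema:

```python
import json
from typing import Dict, Iterable, List, Set

PHASES = ("scan", "copy", "verify", "delete")


def event(phase: str, video_id: str, ok: bool) -> str:
    """One JSON line per video, per phase (hypothetical event shape)."""
    return json.dumps({"phase": phase, "videoId": video_id, "ok": ok})


def completed(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Replay an event log into the set of videos finished in each phase."""
    done: Dict[str, Set[str]] = {p: set() for p in PHASES}
    for line in lines:
        try:
            e = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line cut off by the interruption
        if e.get("ok") and e.get("phase") in done:
            done[e["phase"]].add(e["videoId"])
    return done


def remaining(all_ids: List[str], lines: Iterable[str], phase: str) -> List[str]:
    """Videos that still need `phase` after an interrupted run."""
    finished = completed(lines)[phase]
    return [v for v in all_ids if v not in finished]
```

Tolerating a truncated last line matters, because that is exactly what an interrupted run leaves behind.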

~4k
videos moved
50
videos per chunk
0
API calls used

OK.
Now what?

We had 5,000 unsorted videos.
No categories. No labels.
No idea what was in there.

▶ ACT · I

Building
the library.

iPhoto for YouTube. Local-first. SQLite. And a surprisingly capable AI librarian.

be-kind-rewind · system architecture

A local-first video library.

  • macOS App: SwiftUI · AppKit grid
  • SQLite: local · 4,900+ videos
  • Claude API: Haiku · Sonnet · classify topics
  • Python Scripts (discovery-venv): channel_fallback.py · search_fallback.py · channel_about.py · channel_icons.py · scrapetube · urllib · stdout → JSON
  • App ↔ scripts: Process() · JSON pipe
  • Sync: YouTube API · RSS · CDP
  • Scrape: scrapetube · RSS · no quota
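The deck doesn't show the database schema, so the tables and columns below are hypothetical. This Python `sqlite3` sketch only illustrates how a local-first library can keep videos, topics, and a topic hierarchy in one file:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS topics (
    id        INTEGER PRIMARY KEY,
    name      TEXT UNIQUE NOT NULL,
    parent_id INTEGER REFERENCES topics(id)  -- set when a topic is split
);
CREATE TABLE IF NOT EXISTS videos (
    video_id   TEXT PRIMARY KEY,             -- YouTube's 11-character ID
    title      TEXT NOT NULL,
    channel_id TEXT NOT NULL,
    topic_id   INTEGER REFERENCES topics(id)
);
"""


def open_library(path: str = "library.sqlite3") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def topic_counts(conn: sqlite3.Connection):
    """Videos per topic, largest first: a 'what's in the pile?' query."""
    return conn.execute(
        """SELECT t.name, COUNT(*) AS n
           FROM videos v JOIN topics t ON t.id = v.topic_id
           GROUP BY t.id ORDER BY n DESC"""
    ).fetchall()
```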
Three questions that drove the whole build

What's in the pile?

What should I watch next?

What am I missing?

First → LLM categorization.   Second → watch queue.   Third → Python scraping.

▶ ACT · II

Teaching
a model
to sort.

Claude Haiku classified 5,000 videos for fifteen cents.

Step 1 of 3 — topic discovery

Don't label.
Discover.

Instead of defining categories upfront, let Claude look at the collection and invent them.

  • Input: channel names + sample titles (only channels with 2+ videos)
  • Prompt: "Suggest exactly N topic categories for this collection"
  • Output: ~20 topic names as JSON
  • Categories reflect your taste, not a generic taxonomy
topics = [
  "AI & Machine Learning",
  "Mechanical Keyboards",
  "Embedded Systems",
  "Python & Dev Tools",
  "Indie Hacking",
  "macOS Development",
  "Security Research",
  "Hardware & Making",
  "Tech Podcasts",
  "Career & Productivity",
  # ... ~10 more
]
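Here is a sketch of how the discovery prompt could be assembled from the collection. It assumes each video is a dict with hypothetical `channel` and `title` keys. The prompt wording extends the slide's "Suggest exactly N topic categories" line and is not the author's exact text:

```python
from collections import defaultdict


def build_discovery_prompt(videos, n_topics=20, titles_per_channel=3):
    """Channel names + sample titles, skipping one-off channels (hypothetical)."""
    by_channel = defaultdict(list)
    for v in videos:
        by_channel[v["channel"]].append(v["title"])

    lines = []
    for channel, titles in sorted(by_channel.items()):
        if len(titles) < 2:  # only channels with 2+ videos
            continue
        sample = "; ".join(titles[:titles_per_channel])
        lines.append(f"- {channel}: {sample}")

    return (
        f"Suggest exactly {n_topics} topic categories for this collection.\n"
        "Return a JSON array of topic names.\n\n" + "\n".join(lines)
    )
```

Skipping single-video channels keeps one-off saves from inventing categories nobody else fits into.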
Step 2 of 3 — classification at scale

200 videos per prompt. Batched.

  • Fixed topic list + 200 video titles per call
  • Returns JSON: video_index → topic_number
  • Partial batch failure? Skip it, continue.
  • 5,000 videos = 25 calls, ~3 minutes
$0.15
total cost · 5k videos
25
API calls
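import json

BATCH_SIZE = 200
topics_as_numbered_list = "\n".join(
    f"{n}. {name}" for n, name in enumerate(topics, 1)
)

for i in range(0, len(videos), BATCH_SIZE):
    batch = videos[i : i + BATCH_SIZE]

    prompt = f"""Topics:
{topics_as_numbered_list}

Assign each video to exactly one topic
by number. Return JSON.

Videos:
{format_batch(batch)}"""

    try:
        # claude_haiku / format_batch / apply_assignments: helpers not shown
        result = json.loads(claude_haiku(prompt))
    except ValueError:      # malformed JSON: skip the batch, continue
        continue
    apply_assignments(batch, result)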
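Model output needs validating before it touches the database. Below is a hypothetical `parse_assignments` helper (not the deck's `apply_assignments`). It assumes the reply is a JSON object mapping video index to a topic number, with topic numbers starting at 1:

```python
import json


def parse_assignments(raw, batch_len, n_topics):
    """Turn the model's JSON into {video_index: topic_number}, dropping junk.

    Assumes a reply like {"0": 3, "1": 12, ...}. JSON object keys are
    always strings, so they are converted back to ints here.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # whole batch unusable: skip it
    if not isinstance(data, dict):
        return {}
    out = {}
    for key, topic in data.items():
        try:
            idx, num = int(key), int(topic)
        except (TypeError, ValueError):
            continue  # one bad entry, keep the rest
        if 0 <= idx < batch_len and 1 <= num <= n_topics:
            out[idx] = num
    return out
```

Dropping individual bad entries, rather than the whole batch, keeps one hallucinated topic number from costing 199 good assignments.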
Step 3 of 3 — refinement with Sonnet

The model does
the surgery too.

"Python & Dev Tools"
400 videos, too broad → Claude Sonnet → Python Basics · Dev Tooling · Testing & CI · Packaging
  • Split — break a broad topic into 3–8 subtopics automatically
  • Rename — suggest better names based on sample video titles
  • Reclassify — reassign a whole subtree with refined rules

Haiku classifies fast and cheap. Sonnet handles nuance.
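The split operation can be sketched the same way. You send a prompt with sample titles, then check that the answer really is 3 to 8 distinct names. Both function names and the prompt wording are hypothetical:

```python
import json


def build_split_prompt(topic, sample_titles, max_samples=40):
    """Ask for 3-8 subtopics of an over-broad topic (hypothetical wording)."""
    titles = "\n".join(f"- {t}" for t in sample_titles[:max_samples])
    return (
        f'The topic "{topic}" is too broad. Split it into 3 to 8 subtopics.\n'
        "Return a JSON array of subtopic names.\n\n"
        f"Sample titles:\n{titles}"
    )


def parse_split(raw, lo=3, hi=8):
    """Accept the answer only if it is a JSON list of lo..hi distinct names."""
    try:
        names = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(names, list):
        return None
    cleaned = []
    for n in names:
        if isinstance(n, str) and n.strip() and n.strip() not in cleaned:
            cleaned.append(n.strip())
    return cleaned if lo <= len(cleaned) <= hi else None
```

Returning `None` instead of raising lets the caller simply keep the original topic when Sonnet's answer is unusable.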

▶ ACT · III

Scraping
without
a quota.

YouTube gives you 10,000 API units/day. A channel fetch costs 100. Do the math.

YouTube Data API quota

400 creators.
100 API calls.

daily budget
10,000 units
per channel fetch
100 units
channels per day
100 max

We follow 400+ creators. At 100 units per channel fetch, the daily budget covers only 100 of them, a quarter of the list. Discovery had to work without API calls.
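Below is the arithmetic behind that, plus a minimal ledger that refuses calls once the day's budget is spent. `QuotaLedger` is a hypothetical sketch, and it does not handle the daily reset:

```python
DAILY_QUOTA = 10_000      # units per day (from the slide)
UNITS_PER_CHANNEL = 100   # per channel fetch (from the slide)
CREATORS = 400

channels_per_day = DAILY_QUOTA // UNITS_PER_CHANNEL       # 100
coverage = channels_per_day / CREATORS                    # 0.25
days_for_full_sweep = -(-CREATORS // channels_per_day)    # ceiling division: 4


class QuotaLedger:
    """Track API units spent today; refuse calls that would overspend."""

    def __init__(self, budget=DAILY_QUOTA):
        self.budget = budget
        self.spent = 0

    def try_spend(self, units=UNITS_PER_CHANNEL):
        if self.spent + units > self.budget:
            return False
        self.spent += units
        return True
```

Checking the ledger *before* each call is what makes the API usable as a last resort instead of a budget that silently runs dry.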

Discovery architecture · the fallback chain

Four layers. Zero quota.

  • PRIMARY: scrapetube · internal YT search API · no API key · no quota
  • FALLBACK: RSS feed · /feeds/videos.xml · catches Shorts + recents
  • FALLBACK: YouTube API · last resort · 100 units/channel · quota
  • MERGE: deduplicate by videoId · RSS wins on overlap · source attribution preserved
  • → SQLite · channel_discovery_archive: cooldown tracking · per-channel timestamps · quota ledger
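The chain above can be sketched as a function that merges the two free sources and only spends quota when both come back empty. The source callables and `ledger_allows` are hypothetical injection points, chosen so the logic can be tested without network access:

```python
from typing import Callable, Dict, List

Video = Dict[str, str]
Source = Callable[[str], List[Video]]


def discover(channel_id: str, scrape: Source, rss: Source,
             api: Source, ledger_allows: Callable[[], bool]) -> List[Video]:
    """Free sources first; spend quota only when both come back empty.

    A source that raises is treated like one that returned nothing.
    """
    def safe(source: Source) -> List[Video]:
        try:
            return source(channel_id)
        except Exception:
            return []

    merged: Dict[str, Video] = {}
    for v in safe(rss):  # RSS wins on overlap
        merged[v["videoId"]] = {**v, "source": "rss"}
    for v in safe(scrape):  # scrapetube fills gaps
        merged.setdefault(v["videoId"], {**v, "source": "scrape"})

    if not merged and ledger_allows():  # last resort: costs quota
        for v in safe(api):
            merged.setdefault(v["videoId"], {**v, "source": "api"})
    return list(merged.values())
```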
The decision
😒 YouTube Data API: 100 units/channel · quota gone by 10am
😌 scrapetube + RSS feeds: no API key · no quota · no problem
youtube_channel_fallback.py

Two sources.
One merge.

scrapetube (YouTube's internal API · great coverage)
+
RSS feed (/feeds/videos.xml · catches Shorts + recents)
import scrapetube  # pip install scrapetube

def fetch_channel_videos(channel_id):
    scrape_vids = list(scrapetube.get_channel(channel_id))
    rss_vids    = fetch_rss(channel_id)       # parses /feeds/videos.xml

    seen = {}
    for v in rss_vids:                        # RSS wins on overlap
        seen[v["videoId"]] = {**v, "source": "rss"}
    for v in scrape_vids:                     # scrapetube fills gaps
        seen.setdefault(v["videoId"], {**v, "source": "scrape"})

    return list(seen.values())
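`fetch_rss` isn't shown on the slide. The version below is one possibility. It uses only the standard library to parse the channel's Atom feed at `https://www.youtube.com/feeds/videos.xml?channel_id=<CHANNEL_ID>`, and its returned dict keys are assumptions chosen to match the merge code above:

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://www.youtube.com/feeds/videos.xml?channel_id={}"
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def parse_rss(xml_text):
    """Extract videoId / title / published from a channel's Atom feed."""
    root = ET.fromstring(xml_text)
    videos = []
    for entry in root.findall("atom:entry", NS):
        vid = entry.findtext("yt:videoId", default="", namespaces=NS)
        if vid:
            videos.append({
                "videoId": vid,
                "title": entry.findtext("atom:title", default="", namespaces=NS),
                "published": entry.findtext("atom:published", default="", namespaces=NS),
            })
    return videos


def fetch_rss(channel_id, timeout=10):
    try:
        with urllib.request.urlopen(FEED_URL.format(channel_id), timeout=timeout) as resp:
            return parse_rss(resp.read())
    except Exception:
        return []  # empty, not error: the merge just sees no RSS videos
```

The feed lists only recent uploads, which is why it complements scrapetube rather than replacing it.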
youtube_channel_about.py · 416 lines of character

Creator links.
A nightmare.

  • YouTube's /about page: lazy loaded, A/B tested, layout changes constantly
  • Links wrapped in YouTube redirect URLs: /redirect?q=bit.ly/xxx
  • Shorteners stacked on shorteners — 3 hops to the real URL
  • 50+ domain patterns → platform detection (GitHub, Twitter, Patreon…)

Solution: dual extraction paths + regex chains + HTTP HEAD to expand URLs.
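The redirect-unwrapping and platform-detection steps can be sketched offline with `urllib.parse` and a small regex table. The four patterns below are illustrative stand-ins for the real 50+, and both function names are hypothetical. Expanding shorteners with HTTP HEAD needs the network, so it is left out:

```python
import re
from urllib.parse import parse_qs, urlparse

# A handful of the "50+ domain patterns"; this table is illustrative only.
PLATFORMS = [
    (re.compile(r"(^|\.)github\.com$"), "GitHub"),
    (re.compile(r"(^|\.)(twitter|x)\.com$"), "Twitter"),
    (re.compile(r"(^|\.)patreon\.com$"), "Patreon"),
    (re.compile(r"(^|\.)instagram\.com$"), "Instagram"),
]


def unwrap_youtube_redirect(url):
    """youtube.com/redirect?q=<target> -> <target>; other URLs pass through."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    is_youtube = host == "youtube.com" or host.endswith(".youtube.com")
    if is_youtube and parsed.path == "/redirect":
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]  # parse_qs already percent-decodes
    return url


def detect_platform(url):
    if "://" not in url:
        url = "https://" + url  # redirect targets are often scheme-less
    host = (urlparse(url).hostname or "").lower()
    for pattern, name in PLATFORMS:
        if pattern.search(host):
            return name
    return None
```

A shortener such as bit.ly deliberately maps to `None` here. That is the signal to expand it and run detection again on the final URL.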

Core design rule:
return empty, not error.

def get_channel_links(channel_id):
    try:
        # Path A: og:description meta tag
        # stable across layout changes
        links = extract_from_og_desc(channel_id)
        if links:
            return links

        # Path B: ytInitialData JSON blob
        # future-proofing
        links = extract_from_initial_data(
            channel_id
        )
        return links or []

    except Exception:
        return []  # never crash the caller
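Path A might look like this when given the page HTML directly. `links_from_og_description` is a hypothetical offline variant of `extract_from_og_desc`. Its regex assumes `property` comes before `content` in the meta tag, which a real page does not guarantee:

```python
import html
import re

OG_DESC = re.compile(
    r'<meta\s+property="og:description"\s+content="([^"]*)"', re.IGNORECASE
)
URL = re.compile(r"https?://\S+")


def links_from_og_description(page_html):
    """Path A, offline: URLs mentioned in the og:description meta tag."""
    match = OG_DESC.search(page_html)
    if not match:
        return []  # empty, not error
    description = html.unescape(match.group(1))  # &amp; -> & etc.
    return [u.rstrip(".,)") for u in URL.findall(description)]
```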
▶ ACT · IV

Python
inside a
Mac app.

A venv bundled in the app, called from Swift via subprocess. With one nasty gotcha.

The deployment model

A venv, bundled.

  • .runtime/discovery-venv/ — managed Python env
  • Built by build-app.sh at build time
  • requirements-discovery.txt installs scrapetube
  • Swift calls Python via Process()
  • Input: channel IDs via args. Output: JSON on stdout
  • Clean boundary — just pipes, no shared state
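On the Python side of that boundary, a script only needs to parse its arguments and print one JSON document. In this sketch, a hypothetical `discover` parameter stands in for the scrapetube call, and the output shape is an assumption:

```python
import argparse
import json
import sys


def main(argv=None, discover=lambda channel_id: []):
    """Args in, one JSON document on stdout; diagnostics go to stderr only."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--channel-id", required=True)
    args = parser.parse_args(argv)

    try:
        videos = discover(args.channel_id)
    except Exception as exc:  # never crash the Swift caller
        print(f"discovery failed: {exc}", file=sys.stderr)
        videos = []

    json.dump({"channelId": args.channel_id, "videos": videos}, sys.stdout)
    sys.stdout.write("\n")
    return 0
```

In a real script, an `if __name__ == "__main__": sys.exit(main())` guard would run it when Swift launches `python script.py --channel-id <id>`. Keeping diagnostics on stderr means stdout always stays valid JSON.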
let process = Process()
process.executableURL = venvPython
process.arguments = [
    scriptPath,
    "--channel-id", channelId
]

let stdout = Pipe()
let stderr = Pipe()
process.standardOutput = stdout
process.standardError  = stderr

try process.run()

// ⚠️  DO NOT waitUntilExit() yet
//     see next slide
The bug you will hit

The pipe buffer deadlock.

The script writes more than 64KB to stdout. You wait for exit before reading. The script blocks on its next write until you read, so it never exits. Neither side moves.

❌ naive — hangs at 64KB
try process.run()
process.waitUntilExit() // 💀

let data = stdout
  .fileHandleForReading
  .readDataToEndOfFile()
✓ concurrent drain
try process.run()

// drain BOTH pipes WHILE the process runs
let out = Task.detached {
  stdout.fileHandleForReading
        .readDataToEndOfFile()
}
let err = Task.detached {
  stderr.fileHandleForReading
        .readDataToEndOfFile()
}
let result = await out.value  // EOF arrives when the script exits
let errors = await err.value
process.waitUntilExit()       // safe now: both pipes are drained
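Python's `subprocess` module has the same trap, and its docs warn about it: `Popen.wait()` with `stdout=PIPE` can deadlock once the child fills the pipe. `communicate()` reads both pipes while waiting. Here is a small demonstration with a child that writes 200 KB:

```python
import subprocess
import sys

# Child writes ~200 KB to stdout: far more than a 64 KB pipe buffer.
CHILD = "import sys; sys.stdout.write('x' * 200_000)"


def run_script_safely(code, timeout=30):
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() drains stdout AND stderr while waiting for exit.
    # Calling proc.wait() first here would be the deadlock from the slide.
    out, err = proc.communicate(timeout=timeout)
    return proc.returncode, out, err
```

`subprocess.run(..., capture_output=True)` goes through `communicate()` as well.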
Being a good citizen

Don't get rate-limited.

  • 2s minimum between any two requests
  • Random jitter — not perfectly spaced (more human)
  • 10-min per-channel cooldown after any failure
  • 5-min global cooldown after sustained failures

YouTube doesn't hard-block —
it slows down and returns garbage.
The symptoms are subtle. The limiter keeps you in the safe zone.

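import time, random

class ScrapeRateLimiter:
    MIN_INTERVAL = 2.0   # seconds
    JITTER       = 1.5
    CHANNEL_COOL = 600   # 10 min
    GLOBAL_COOL  = 300   # 5 min

    def __init__(self):
        self.last_req = 0.0    # time of the previous request
        self.cooldowns = {}    # channel_id -> earliest retry time

    def wait(self):
        elapsed = time.time() - self.last_req
        gap = (self.MIN_INTERVAL
               + random.uniform(0, self.JITTER))
        if elapsed < gap:
            time.sleep(gap - elapsed)
        self.last_req = time.time()

    def record_failure(self, channel_id):
        self.cooldowns[channel_id] = (
            time.time() + self.CHANNEL_COOL
        )

    def is_cooling_down(self, channel_id):
        return time.time() < self.cooldowns.get(channel_id, 0)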
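The `GLOBAL_COOL` constant above implies a second, global switch. Below is a sketch of one, assuming "sustained failures" means five in a row, since the deck doesn't define the threshold. The class name and the injectable clock are hypothetical:

```python
import time


class GlobalBreaker:
    """Pause ALL scraping after sustained failures (hypothetical threshold).

    FAILURE_THRESHOLD consecutive failures trip a GLOBAL_COOL pause;
    any success resets the count. `clock` is injectable for testing.
    """
    FAILURE_THRESHOLD = 5   # assumption: "sustained" = 5 failures in a row
    GLOBAL_COOL = 300       # 5 min, as on the slide

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.consecutive = 0
        self.paused_until = 0.0

    def record(self, ok):
        if ok:
            self.consecutive = 0
            return
        self.consecutive += 1
        if self.consecutive >= self.FAILURE_THRESHOLD:
            self.paused_until = self.clock() + self.GLOBAL_COOL
            self.consecutive = 0

    def allowed(self):
        return self.clock() >= self.paused_until
```

Per-channel cooldowns catch one broken channel. The global breaker catches the quieter case from the slide, where YouTube starts degrading every response at once.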

Personal tools
are where the
real engineering
lives.

No PM. No spec. Just a real problem
you actually care about solving.

What it's made of

The stack.

  • PLATFORM: macOS · SwiftUI native
  • DATABASE: SQLite · local-first
  • LLM (classify): Claude Haiku · fast + cheap
  • LLM (refine): Claude Sonnet · nuanced ops
  • CLASSIFY COST: $0.15 for 5,000 videos
  • TESTS: 147 across 25 suites
  • SCRAPING: scrapetube + urllib
  • FALLBACK: YouTube RSS feeds
  • SYNC: YouTube API + CDP
  • SUBPROCESS: Swift Process() → JSON
  • PYTHON ENV: .runtime/discovery-venv
  • SCRIPTS: 4 Python scripts · ~750 lines
Things worth stealing

Three takeaways.

  • 1
    Scrape first, API second. For read-heavy discovery, scrapetube + RSS is faster, cheaper, and more reliable than the official YouTube API.
  • 2
    Batch your LLM calls. 200 items per prompt instead of 1 means 25 calls instead of 5,000, and the fixed topic list is sent 25 times instead of 5,000. Design for batching from day one.
  • 3
    When there's no API, drive the browser. CDP is stable, powerful, and drives your real, logged-in browser session. It's not a hack: it's the same protocol DevTools uses.
▶ Demo + Q&A

Let's
watch
something.

Questions welcome — especially about the scraping.