Overview

DeepFocus YT Video Pipeline

Automated pipeline for producing 2-4 hour ambient "window view" YouTube videos. Drop clips in a folder, run one command, get a complete video with looped scenery, mixed audio, color grading, thumbnail, and metadata.

What It Produces

Long-form ambient videos for focus, relaxation, and study. Each video features a seamless looping window view (beach, mountain, city, rainforest) with layered ambient audio and gentle color grading.

Target: 2-4 hours per video, upload-ready for YouTube.

Tech Stack

  • Python — orchestration + scene config
  • FFmpeg — video processing, looping, grading
  • Freesound API — ambient audio sourcing
  • YouTube Data API v3 — metadata + upload

9 Scene Configurations

beach_sunrise · beach_sunset · mountain_cabin · mountain_snow · alpine_meadow · tokyo_rain · london_rain · nyc_skyline · rainforest_morning
💡 Clip Intake Model: No AI-generated clips needed. You generate or source 6-10 video clips per scene externally, drop them into .tmp/clips/{scene_id}/, and the pipeline handles the rest.
Architecture

Pipeline Architecture

End-to-end flow from raw clips to YouTube-ready video. Each stage is a discrete, testable step.

Full Pipeline Flow

  1. Clip Intake — 6-10 clips per scene
  2. Scene Config — load params
  3. Audio Fetch — Freesound API
  4. Loop Build — FFmpeg xfade
  5. Color Grade — LUT + curves
  6. Audio Mix — layer + normalize
  7. Final Render — H.264 encode
  8. Thumbnail — auto-generate
  9. Upload — YouTube Data API v3
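Each stage being "a discrete, testable step" suggests a simple pattern: plain functions chained by a small orchestrator. A minimal sketch (all names are illustrative, not the real pipeline.py API; only two stub stages are shown):

```python
# Sketch: each stage takes and returns a context dict, so stages can be
# run and tested in isolation or chained end to end.

def intake(ctx):
    # Clips are processed in alphabetical order (see naming convention).
    ctx["clips"] = sorted(ctx.get("clip_dir_files", []))
    return ctx

def load_config(ctx):
    # Stub: the real loader would read scenes/{scene}.json.
    ctx["config"] = {"scene_id": ctx["scene"], "xfade_duration": 2.0}
    return ctx

# ... audio_fetch, loop_build, color_grade, audio_mix, render, etc.
STAGES = [intake, load_config]

def run_pipeline(scene, clip_files):
    ctx = {"scene": scene, "clip_dir_files": clip_files}
    for stage in STAGES:
        ctx = stage(ctx)
    return ctx
```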

Key Files

File Structure
  • pipeline.py — Main orchestrator; runs all stages
  • scene_config.py — Scene definitions and parameters
  • audio_mixer.py — Freesound fetch + layered mixing
  • loop_builder.py — Seamless loop via FFmpeg xfade
  • color_grader.py — Per-scene LUT and curve adjustments
  • thumbnail_gen.py — Auto thumbnail from best frame
  • uploader.py — YouTube Data API v3 upload

Directory Layout

# Working directories
.tmp/
  clips/
    beach_sunrise/   # 6-10 source clips
    mountain_cabin/
    tokyo_rain/
    ...
  audio/             # Downloaded ambient
  renders/           # Final outputs
  thumbnails/        # Generated thumbs
scenes/              # Scene config JSONs
luts/                # Color grading LUTs
⚠️ FFmpeg required: The pipeline depends on FFmpeg built with libx264 and libfdk_aac (note that libfdk_aac ships only in non-free FFmpeg builds). Ensure ffmpeg and ffprobe are on your PATH before running.
Production

Scene Configurations

Each scene defines the visual style, audio profile, color grading, and loop parameters. 9 configs ship by default.

Scene Configuration Table

All 9 Scene Configs
Scene              | Look                            | Grade   | Ambient audio                      | Target
beach_sunrise      | Golden hour beach, gentle waves | warm    | Waves, seagulls, breeze            | 3-4 hrs
beach_sunset       | Dusk beach, deep orange tones   | sunset  | Waves, crickets, distant chatter   | 2-3 hrs
mountain_cabin     | Cozy cabin view of mountains    | natural | Fire crackle, wind, birds          | 3-4 hrs
mountain_snow      | Snowy peak, cold blue tones     | cold    | Wind, snow crunch, silence         | 2-3 hrs
alpine_meadow      | Green meadow, wildflowers       | vibrant | Birds, stream, rustling grass      | 3-4 hrs
tokyo_rain         | Neon-lit Tokyo street, rain     | cyber   | Rain, traffic, city hum            | 2-3 hrs
london_rain        | Victorian window, grey drizzle  | muted   | Rain on glass, distant traffic     | 2-3 hrs
nyc_skyline        | Penthouse view, city lights     | cool    | City ambient, distant sirens       | 3-4 hrs
rainforest_morning | Dense canopy, morning mist      | lush    | Exotic birds, water drips, insects | 3-4 hrs

Config Structure

{
  "scene_id": "beach_sunrise",
  "display_name": "Beach Sunrise",
  "target_duration_hrs": 3,
  "color_grade": {
    "lut": "warm_golden.cube",
    "saturation": 1.15,
    "contrast": 1.05
  },
  "audio_tags": ["ocean-waves", "seagulls", "beach-wind"],
  "xfade_duration": 2.0,
  "resolution": "3840x2160"
}
💡 Adding a new scene: Create a JSON config in scenes/, create a matching folder in .tmp/clips/{new_scene_id}/, and add 6-10 clips. The pipeline auto-discovers new scenes.
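Loading such a config might look like the following sketch. The dataclass fields mirror the JSON shown above; the names are illustrative, not necessarily scene_config.py's real API:

```python
import json
from dataclasses import dataclass, field

@dataclass
class SceneConfig:
    # Required fields, matching the JSON keys.
    scene_id: str
    display_name: str
    target_duration_hrs: float
    audio_tags: list
    # Defaults for fields a scene config may omit.
    xfade_duration: float = 2.0
    resolution: str = "3840x2160"
    color_grade: dict = field(default_factory=dict)

def load_scene(raw_json: str) -> SceneConfig:
    """Parse one scene config; unknown keys would raise a TypeError,
    which doubles as a cheap schema check."""
    return SceneConfig(**json.loads(raw_json))
```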
Input

Clip Intake Model

You generate or source clips externally, then drop them into the intake folder. The pipeline handles normalization, sequencing, and looping from there.

Intake Flow

  1. Source Clips — 6-10 per scene
  2. Drop into folder — .tmp/clips/{scene_id}/
  3. Auto-normalize — resolution, FPS, codec
  4. Sequence & Loop — frame-aligned xfade
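The auto-normalize step can be sketched as an ffmpeg command builder. The filter chain below (scale/pad to fit, constant fps, yuv420p) is one reasonable way to meet the frame-alignment requirements; the function name and exact encoder settings are illustrative:

```python
def normalize_cmd(src, dst, resolution="3840x2160", fps=30):
    """Build an ffmpeg argv that conforms a clip to the scene's
    resolution, a constant framerate, and a uniform pixel format
    (all required before xfade chaining)."""
    w, h = resolution.split("x")
    vf = (f"scale={w}:{h}:force_original_aspect_ratio=decrease,"
          f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2,"   # letterbox instead of stretching
          f"fps={fps},format=yuv420p")
    return ["ffmpeg", "-y", "-i", src, "-vf", vf,
            "-an",                               # strip any source audio
            "-c:v", "libx264", "-preset", "slow", "-crf", "18", dst]
```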

Clip Requirements

  • Count per scene — 6-10 clips (more variety = better loops)
  • Duration each — 15-60 seconds (the pipeline loops to fill the target duration)
  • Resolution — 4K (3840x2160) preferred; 1080p minimum; the pipeline upscales if needed
  • Framerate — 24, 30, or 60 FPS; the pipeline normalizes to the scene config FPS
  • Format — MP4 (H.264) or MOV; no variable framerate
  • Content — static or slow-pan window views; no abrupt movements or scene changes
  • Audio — may have audio (stripped during processing) or be silent
🚨 No variable framerate clips. Screen recordings and phone videos often have VFR. Run ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate,avg_frame_rate clip.mp4 to check; if r_frame_rate and avg_frame_rate differ, re-encode first.
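A helper wrapping the ffprobe check above might parse its output like this. This is a hypothetical sketch: it assumes ffprobe's default key=value text output, and treats any disagreement between the two rates beyond a small tolerance as a VFR symptom:

```python
def is_vfr(ffprobe_output, tolerance=1e-3):
    """Return True if r_frame_rate and avg_frame_rate disagree,
    which usually indicates a variable-framerate clip."""
    rates = {}
    for line in ffprobe_output.splitlines():
        key, _, val = line.partition("=")
        if key in ("r_frame_rate", "avg_frame_rate") and "/" in val:
            num, den = val.strip().split("/")
            if int(den):                 # guard against "0/0"
                rates[key] = int(num) / int(den)
    if len(rates) < 2:
        return False                     # not enough info to judge
    return abs(rates["r_frame_rate"] - rates["avg_frame_rate"]) > tolerance
```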

Folder Structure Example

.tmp/clips/
  beach_sunrise/
    clip_01_waves_wide.mp4
    clip_02_waves_close.mp4
    clip_03_horizon_pan.mp4
    clip_04_palm_silhouette.mp4
    clip_05_foam_detail.mp4
    clip_06_golden_light.mp4
    clip_07_birds_passing.mp4
    clip_08_tide_receding.mp4
Naming convention: Files are processed in alphabetical order. Use numbered prefixes (clip_01_, clip_02_) to control sequence order. Descriptive suffixes are optional but helpful.
Audio

Audio Mixing

Ambient audio is sourced from Freesound API based on scene tags, then layered, looped, and normalized to match the video duration.

Audio Pipeline

  1. Scene Tags — e.g. "ocean-waves"
  2. Freesound Search — API query + filter
  3. Download — WAV/FLAC preferred
  4. Layer & Mix — 3-5 tracks blended
  5. Normalize — LUFS target: -16

Layering Strategy

Audio Layers
  • Base layer — continuous ambient (waves, rain, wind) — 0 dB
  • Texture layer — secondary ambient (rustling, distant hum) — -6 dB
  • Detail layer — intermittent sounds (birds, drips) — -12 dB
  • Atmosphere — very low presence (room tone, sub-bass) — -18 dB
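The layer gains and the -16 LUFS target can be baked into a single ffmpeg filter_complex. A sketch of how audio_mixer.py might build it (the function name is illustrative; volume, amix, and loudnorm are standard ffmpeg filters):

```python
# Per-layer gains from the layering strategy table, in dB.
LAYER_GAINS_DB = {"base": 0, "texture": -6, "detail": -12, "atmosphere": -18}

def mix_filtergraph(layers, lufs_target=-16):
    """Build an ffmpeg filter_complex that gains each input by its
    layer role, mixes everything, and normalizes loudness.
    `layers` lists one layer name per input file, in input order."""
    parts, labels = [], []
    for i, name in enumerate(layers):
        parts.append(f"[{i}:a]volume={LAYER_GAINS_DB[name]}dB[a{i}]")
        labels.append(f"[a{i}]")
    parts.append(f"{''.join(labels)}amix=inputs={len(layers)}:duration=longest,"
                 f"loudnorm=I={lufs_target}[aout]")
    return ";".join(parts)
```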

Freesound API

  • API Key: stored in .env as FREESOUND_API_KEY
  • Rate limit: 2000 requests/day (free tier)
  • Preferred formats: WAV > FLAC > OGG
  • Min duration: 30 seconds per sample
  • License filter: CC0 or CC-BY only
  • Cache: Downloaded files cached in .tmp/audio/
⚠️ Attribution tracking: CC-BY sounds require attribution. The pipeline auto-generates an attribution list saved to .tmp/renders/{scene_id}_credits.txt. Include this in the YouTube description.
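A sketch of how the credits file might be assembled. The entry fields (name, author, license, url) are illustrative, not the real Freesound response shape; CC0 entries are listed without a link since CC0 does not require attribution:

```python
def build_credits(sounds):
    """Render an attribution list for the downloaded sounds.
    Each entry is a dict with name, author, license, and url."""
    lines = ["Ambient audio via Freesound:"]
    for s in sounds:
        line = f'- "{s["name"]}" by {s["author"]} ({s["license"]})'
        if s["license"] != "CC0":
            line += f' {s["url"]}'   # CC-BY terms require a link back
        lines.append(line)
    return "\n".join(lines)
```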
Core

Seamless Loop Technique

The secret to natural-looking long videos: frame-aligned crossfade transitions using FFmpeg's xfade filter. No visible cuts, no jarring jumps.

How xfade Looping Works

  1. Normalize all clips — transcode to matching resolution, FPS, and pixel format. Ensures frame-level alignment.
  2. Calculate transition points — each xfade offset is the running output length minus the fade duration (for the first pair, clip_duration - xfade_duration; later offsets accumulate). Default xfade is 2 seconds.
  3. Chain xfade filters — an FFmpeg filter_complex chains xfade between every adjacent clip. The last clip crossfades back into the first for a seamless loop.
  4. Extend to target duration — the looped sequence is repeated via stream_loop or concat to fill the full 2-4 hour target.
  5. Apply color grading — the LUT file is applied as a video filter in the same FFmpeg command, with per-scene saturation and contrast curves.

FFmpeg xfade Example

ffmpeg \
  -i clip_01.mp4 -i clip_02.mp4 -i clip_03.mp4 \
  -filter_complex "
    [0:v][1:v]xfade=transition=fade:duration=2:offset=13[v01];
    [v01][2:v]xfade=transition=fade:duration=2:offset=26[vout]
  " \
  -map "[vout]" -c:v libx264 -preset slow \
  -crf 18 loop_sequence.mp4
🚨 Frame alignment is critical. If clips have different FPS or pixel formats, xfade will fail or produce artifacts. The normalize step handles this, but never skip it. All clips must be identical in format before chaining.
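The offsets in the example above (13 and 26, which assume three 15-second clips with a 2-second fade) can be computed programmatically, and the filtergraph emitted from them. A sketch; the function names are illustrative, not loop_builder.py's real API:

```python
def xfade_offsets(durations, xfade=2.0):
    """Offset for the k-th xfade is the running output length minus
    the fade duration: sum(d_1..d_k) - k * xfade."""
    return [sum(durations[:k]) - k * xfade for k in range(1, len(durations))]

def xfade_filtergraph(durations, xfade=2.0):
    """Chain an xfade between every adjacent clip, labelling the
    final output [vout] as in the CLI example."""
    offsets = xfade_offsets(durations, xfade)
    parts, prev = [], "[0:v]"
    for i, off in enumerate(offsets, start=1):
        out = "[vout]" if i == len(offsets) else f"[v{i:02d}]"
        parts.append(f"{prev}[{i}:v]xfade=transition=fade:"
                     f"duration={xfade:g}:offset={off:g}{out}")
        prev = out
    return ";".join(parts)
```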

Supported Transitions

  • fade — default, most natural for ambient
  • dissolve — softer blend
  • wiperight — for city scenes
  • smoothup — gentle vertical pan effect

Duration Guidelines

  • 2.0s xfade — standard for nature scenes
  • 3.0s xfade — very slow, dreamy feel
  • 1.0s xfade — faster cuts for city scenes
  • Set in scene config xfade_duration
Output

Thumbnail & Metadata

Auto-generated thumbnails from the best frame, plus SEO-optimized titles, descriptions, and tags for YouTube.

Thumbnail Generation

  1. Frame extraction — FFmpeg extracts 1 frame per second from the loop sequence.
  2. Quality scoring — each frame is scored on sharpness (Laplacian variance), color vibrancy (saturation histogram), and composition (rule of thirds).
  3. Overlay text — scene title and "DeepFocus" branding are overlaid using Pillow/PIL. Font, position, and shadow are configurable per scene.
  4. Export at 1280x720 — YouTube thumbnail spec. Saved to .tmp/thumbnails/{scene_id}.jpg
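Step 2's sharpness metric, Laplacian variance, can be computed without any imaging library. A pure-Python sketch on a grayscale frame represented as a 2-D list (the real thumbnail_gen.py implementation may differ):

```python
def laplacian_variance(gray):
    """Sharpness score: variance of the 4-neighbour Laplacian over a
    grayscale frame. Flat frames score 0; sharp edges score high."""
    h, w = len(gray), len(gray[0])
    vals = [gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
            - 4 * gray[y][x]
            for y in range(1, h - 1) for x in range(1, w - 1)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)
```

Ranking candidate frames then reduces to `max(frames, key=laplacian_variance)` (possibly blended with the vibrancy and composition scores).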

YouTube Metadata Template

Metadata Fields
  • Title — {Scene Name} Window View | {Duration}hrs Ambient for Study & Focus
  • Description — scene description + audio credits + timestamps + channel links
  • Tags — scene-specific + generic ambient tags (max 500 chars total)
  • Category — Music (ID: 10) for ambient content
  • Privacy — Unlisted by default; publish manually after review
  • Playlist — auto-added to the scene-type playlist (Nature, City, etc.)
💡 YouTube Data API v3: The default quota is 10,000 units/day, and each videos.insert upload costs about 1,600 units, so roughly six uploads/day at most. Plan batch uploads accordingly. API key in .env as YOUTUBE_API_KEY.
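The title and tag fields above can be filled straight from the scene config. A sketch; the function names and the exact tag-truncation rule are illustrative:

```python
def build_title(display_name, duration_hrs):
    """Fill the title template from the metadata table."""
    return f"{display_name} Window View | {duration_hrs}hrs Ambient for Study & Focus"

def build_tags(scene_tags, generic_tags, limit=500):
    """Combine scene-specific and generic tags, stopping before the
    total exceeds the character budget for the tags field."""
    tags, used = [], 0
    for tag in scene_tags + generic_tags:
        if used + len(tag) > limit:
            break
        tags.append(tag)
        used += len(tag)
    return tags
```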
Quick Start

Quick Start Guide

Everything you need to go from zero to a rendered video.

Prerequisites

  • Python 3.10+
  • FFmpeg with libx264 (on PATH)
  • Freesound API key (free account at freesound.org)
  • YouTube Data API v3 credentials (for upload step only)

Step-by-Step

  1. Install dependencies — pip install -r requirements.txt
  2. Configure .env — set FREESOUND_API_KEY and optionally YOUTUBE_API_KEY
  3. Choose a scene — pick from the 9 scene configs (e.g., beach_sunrise)
  4. Drop clips into the intake folder — place 6-10 clips in .tmp/clips/beach_sunrise/
  5. Run the pipeline — python pipeline.py --scene beach_sunrise
  6. Review output — final video in .tmp/renders/, thumbnail in .tmp/thumbnails/
  7. Upload (optional) — python pipeline.py --scene beach_sunrise --upload

Common Commands

# Full pipeline for one scene
python pipeline.py --scene beach_sunrise

# Skip audio fetch (use cached)
python pipeline.py --scene beach_sunrise --skip-audio

# Render only (no upload)
python pipeline.py --scene tokyo_rain --render-only

# Generate thumbnail only
python thumbnail_gen.py --scene mountain_cabin

# List all available scenes
python pipeline.py --list-scenes

DO

  • Use 4K clips for best quality
  • Provide at least 6 clips per scene
  • Check xfade duration matches scene mood
  • Review the loop sequence before full render
  • Include Freesound attribution in descriptions
  • Cache audio downloads for reuse

DON'T

  • Don't use variable framerate clips
  • Don't mix landscape and portrait clips
  • Don't use clips with abrupt movement
  • Don't skip the normalize step
  • Don't exceed your daily YouTube API upload quota
  • Don't delete .tmp/audio/ cache unnecessarily
🎉 Walkthrough complete! You now understand the full DeepFocus YT pipeline — from clip intake through audio mixing, seamless looping, and YouTube upload. Bookmark this page for reference.