DeepFocus YT Video Pipeline
Automated pipeline for producing 2-4 hour ambient "window view" YouTube videos. Drop clips in a folder, run one command, get a complete video with looped scenery, mixed audio, color grading, thumbnail, and metadata.
What It Produces
Long-form ambient videos for focus, relaxation, and study. Each video features a seamless looping window view (beach, mountain, city, rainforest) with layered ambient audio and gentle color grading.
Target: 2-4 hours per video, upload-ready for YouTube.
Tech Stack
- Python — orchestration + scene config
- FFmpeg — video processing, looping, grading
- Freesound API — ambient audio sourcing
- YouTube Data API v3 — metadata + upload
9 Scene Configurations
.tmp/clips/{scene_id}/, and the pipeline handles the rest.Pipeline Architecture
End-to-end flow from raw clips to YouTube-ready video. Each stage is a discrete, testable step.
Full Pipeline Flow
Key Files
| File Structure | |
|---|---|
pipeline.py | Main orchestrator — runs all stages |
scene_config.py | Scene definitions and parameters |
audio_mixer.py | Freesound fetch + layered mixing |
loop_builder.py | Seamless loop via FFmpeg xfade |
color_grader.py | Per-scene LUT and curve adjustments |
thumbnail_gen.py | Auto thumbnail from best frame |
uploader.py | YouTube Data API v3 upload |
Directory Layout
ffmpeg and ffprobe are on your PATH before running.Scene Configurations
Each scene defines the visual style, audio profile, color grading, and loop parameters. 9 configs ship by default.
Scene Configuration Table
| All 9 Scene Configs | ||||
|---|---|---|---|---|
| beach_sunrise | Golden hour beach, gentle waves | warm | Waves, seagulls, breeze | 3-4 hrs |
| beach_sunset | Dusk beach, deep orange tones | sunset | Waves, crickets, distant chatter | 2-3 hrs |
| mountain_cabin | Cozy cabin view of mountains | natural | Fire crackle, wind, birds | 3-4 hrs |
| mountain_snow | Snowy peak, cold blue tones | cold | Wind, snow crunch, silence | 2-3 hrs |
| alpine_meadow | Green meadow, wildflowers | vibrant | Birds, stream, rustling grass | 3-4 hrs |
| tokyo_rain | Neon-lit Tokyo street, rain | cyber | Rain, traffic, city hum | 2-3 hrs |
| london_rain | Victorian window, grey drizzle | muted | Rain on glass, distant traffic | 2-3 hrs |
| nyc_skyline | Penthouse view, city lights | cool | City ambient, distant sirens | 3-4 hrs |
| rainforest_morning | Dense canopy, morning mist | lush | Exotic birds, water drips, insects | 3-4 hrs |
Config Structure
scenes/, create a matching folder in .tmp/clips/{new_scene_id}/, and add 6-10 clips. The pipeline auto-discovers new scenes.Clip Intake Model
You generate or source clips externally, then drop them into the intake folder. The pipeline handles normalization, sequencing, and looping from there.
Intake Flow
Clip Requirements
| Specification | |
|---|---|
| Count per scene | 6-10 clips (more variety = better loops) |
| Duration each | 15-60 seconds (pipeline will loop to fill target duration) |
| Resolution | 4K (3840x2160) preferred. 1080p minimum. Pipeline upscales if needed. |
| Framerate | 24, 30, or 60 FPS. Pipeline normalizes to scene config FPS. |
| Format | MP4 (H.264) or MOV. No variable framerate. |
| Content | Static or slow-pan window views. No abrupt movements or scene changes. |
| Audio | Can have audio (stripped during processing) or be silent. |
ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate,avg_frame_rate clip.mp4 to check. If r_frame_rate and avg_frame_rate differ, re-encode first.Folder Structure Example
clip_01_, clip_02_) to control sequence order. Descriptive suffixes are optional but helpful.Audio Mixing
Ambient audio is sourced from Freesound API based on scene tags, then layered, looped, and normalized to match the video duration.
Audio Pipeline
Layering Strategy
| Audio Layers | ||
|---|---|---|
| Base layer | Continuous ambient (waves, rain, wind) | 0 dB |
| Texture layer | Secondary ambient (rustling, distant hum) | -6 dB |
| Detail layer | Intermittent sounds (birds, drips) | -12 dB |
| Atmosphere | Very low presence (room tone, sub-bass) | -18 dB |
Freesound API
- API Key: stored in
.envasFREESOUND_API_KEY - Rate limit: 2000 requests/day (free tier)
- Preferred formats: WAV > FLAC > OGG
- Min duration: 30 seconds per sample
- License filter: CC0 or CC-BY only
- Cache: Downloaded files cached in
.tmp/audio/
.tmp/renders/{scene_id}_credits.txt. Include this in the YouTube description.Seamless Loop Technique
The secret to natural-looking long videos: frame-aligned crossfade transitions using FFmpeg's xfade filter. No visible cuts, no jarring jumps.
How xfade Looping Works
-
Normalize all clips Transcode to matching resolution, FPS, and pixel format. Ensures frame-level alignment.
-
Calculate transition points For each clip pair, compute the xfade offset:
clip_duration - xfade_duration. Default xfade is 2 seconds. -
Chain xfade filters FFmpeg filter_complex chains xfade between every adjacent clip. Last clip crossfades back into first for seamless loop.
-
Extend to target duration The looped sequence is repeated via stream_loop or concat to fill the full 2-4 hour target.
-
Apply color grading LUT file applied as a video filter in the same FFmpeg command. Per-scene saturation and contrast curves.
FFmpeg xfade Example
Supported Transitions
- fade — default, most natural for ambient
- dissolve — softer blend
- wiperight — for city scenes
- smoothup — gentle vertical pan effect
Duration Guidelines
- 2.0s xfade — standard for nature scenes
- 3.0s xfade — very slow, dreamy feel
- 1.0s xfade — faster cuts for city scenes
- Set in scene config
xfade_duration
Thumbnail & Metadata
Auto-generated thumbnails from the best frame, plus SEO-optimized titles, descriptions, and tags for YouTube.
Thumbnail Generation
-
Frame extraction FFmpeg extracts 1 frame per second from the loop sequence.
-
Quality scoring Each frame scored on: sharpness (Laplacian variance), color vibrancy (saturation histogram), composition (rule of thirds).
-
Overlay text Scene title and "DeepFocus" branding overlaid using Pillow/PIL. Font, position, and shadow configurable per scene.
-
Export at 1280x720 YouTube thumbnail spec. Saved to
.tmp/thumbnails/{scene_id}.jpg
YouTube Metadata Template
| Metadata Fields | |
|---|---|
| Title | {Scene Name} Window View | {Duration}hrs Ambient for Study & Focus |
| Description | Scene description + audio credits + timestamps + channel links |
| Tags | Scene-specific + generic ambient tags (max 500 chars total) |
| Category | Music (ID: 10) for ambient content |
| Privacy | Unlisted by default, manually publish after review |
| Playlist | Auto-added to scene-type playlist (Nature, City, etc.) |
.env as YOUTUBE_API_KEY.Quick Start Guide
Everything you need to go from zero to a rendered video.
Prerequisites
- Python 3.10+
- FFmpeg with libx264 (on PATH)
- Freesound API key (apply here)
- YouTube Data API v3 credentials (for upload step only)
Step-by-Step
-
Install dependencies
pip install -r requirements.txt -
Configure .env Set
FREESOUND_API_KEYand optionallyYOUTUBE_API_KEY -
Choose a scene Pick from the 9 scene configs (e.g.,
beach_sunrise) -
Drop clips into intake folder Place 6-10 clips in
.tmp/clips/beach_sunrise/ -
Run the pipeline
python pipeline.py --scene beach_sunrise -
Review output Final video in
.tmp/renders/, thumbnail in.tmp/thumbnails/ -
Upload (optional)
python pipeline.py --scene beach_sunrise --upload
Common Commands
DO
- Use 4K clips for best quality
- Provide at least 6 clips per scene
- Check xfade duration matches scene mood
- Review the loop sequence before full render
- Include Freesound attribution in descriptions
- Cache audio downloads for reuse
DON'T
- Don't use variable framerate clips
- Don't mix landscape and portrait clips
- Don't use clips with abrupt movement
- Don't skip the normalize step
- Don't upload more than 1 video/day (API quota)
- Don't delete
.tmp/audio/cache unnecessarily