
Building an AI Video Creator: From Idea to Published Video

Nov 2025 – Present · AI Video Creator
Next.js · FastAPI · Remotion · Supabase · Claude · ElevenLabs

An end-to-end platform that takes a raw idea all the way to a polished, upload-ready YouTube video, automating research, scriptwriting, voiceover, visuals, rendering, and publishing metadata. The platform is functional and used daily in production on a live YouTube channel.

The Problem

Creating quality video content is time-intensive. Researching an idea, writing a compelling script, recording voiceover, sourcing or generating visuals, editing everything together, and preparing metadata can take days for a single video. I wanted to see how far AI could compress that workflow while maintaining a professional result.

How It Works

The user starts with either an idea or a YouTube video URL; for a URL, the system ingests the video's transcript to generate content ideas. From there, a fully automated pipeline takes over. It researches the idea via Perplexity Sonar, runs a multi-pass scriptwriting chain (structure → draft → AI critic → enhanced final script), synthesizes natural-sounding voiceover with ElevenLabs and transcribes it automatically for timed captions, creates timed image prompts with style adjustments, generates images via Replicate and Google models, renders the video with Remotion, and produces YouTube-ready metadata.
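A minimal sketch of that sequence, assuming each step is a function that reads a shared context dict and returns new keys to merge into it. The step functions are hypothetical stubs standing in for the real provider calls (Perplexity Sonar, the scriptwriting models, ElevenLabs, Replicate, Remotion), and the optional transcript-ingestion step for URL inputs is left out; only the ordering comes from the description above.

```python
from typing import Any, Callable

Context = dict[str, Any]
Step = Callable[[Context], Context]


# Hypothetical stubs: each returns only the keys it adds to the context.
def research(ctx: Context) -> Context:
    return {"research": f"notes on {ctx['idea']}"}


def write_script(ctx: Context) -> Context:
    # structure -> draft -> AI critic -> final script, collapsed into one stub
    return {"script": f"script based on {ctx['research']}"}


def voiceover(ctx: Context) -> Context:
    # audio file plus (start_sec, end_sec, text) caption segments from transcription
    return {"audio": "voiceover.mp3", "captions": [(0.0, 1.5, "Hello")]}


def image_prompts(ctx: Context) -> Context:
    # one timed prompt per caption segment, keyed by its start time
    return {"image_prompts": [(start, f"illustration for: {text}")
                              for start, _, text in ctx["captions"]]}


def generate_images(ctx: Context) -> Context:
    return {"images": [f"img_{i}.png" for i, _ in enumerate(ctx["image_prompts"])]}


def render(ctx: Context) -> Context:
    return {"video": "render.mp4"}


def metadata(ctx: Context) -> Context:
    return {"title": ctx["idea"].title(), "tags": ["ai", "video"]}


PIPELINE: list[tuple[str, Step]] = [
    ("research", research),
    ("script", write_script),
    ("voiceover", voiceover),
    ("image_prompts", image_prompts),
    ("images", generate_images),
    ("render", render),
    ("metadata", metadata),
]


def run_pipeline(idea: str) -> Context:
    ctx: Context = {"idea": idea, "completed": []}
    for name, step in PIPELINE:
        ctx.update(step(ctx))  # later steps can read everything earlier steps produced
        ctx["completed"].append(name)
    return ctx
```

Keeping the steps in an ordered list means any one of them can be swapped for a different provider or re-run on its own, which is the property the per-step retries described under Pipeline & Job System rely on.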

Stack & Architecture

The backend is built with FastAPI (Python) and handles orchestration, AI provider calls, and job management. The frontend is a Next.js 16 App Router application styled with Tailwind CSS, providing a studio-like interface for managing projects and monitoring pipeline progress. Remotion handles programmatic video composition in React, rendering locally during development and scaling out to Cloud Run workers in production. Supabase serves as the database, file storage, and authentication layer, with full Row Level Security (RLS) policies and cookie-based auth.

Multi-Model AI

The system draws on Claude, GPT, Gemini, Replicate-hosted models, and ElevenLabs, choosing the best-suited model for each step of the pipeline. Perplexity Sonar handles deep research, gathering facts, statistics, and context. The scriptwriting chain makes multiple passes, with an AI critic reviewing the draft before the final script is written. Prompt templates are customizable per project, so AI behavior can differ by channel, audience, and style.
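One way to express per-step model choice and per-project prompt templates is a default routing table plus project-level overrides, sketched below with the standard library's string.Template. The step names, the provider/model assignments, and the template wording are all hypothetical; the model strings are labels, not real API model IDs, and the write-up does not say which model handles which step.

```python
from dataclasses import dataclass, field
from string import Template

# Hypothetical step -> (provider, model label) routing table.
DEFAULT_ROUTES: dict[str, tuple[str, str]] = {
    "research": ("perplexity", "sonar"),
    "script_draft": ("anthropic", "claude"),
    "script_critic": ("openai", "gpt"),
    "image_prompts": ("google", "gemini"),
    "images": ("replicate", "image-model"),
    "voiceover": ("elevenlabs", "voice-model"),
}

# Hypothetical default templates; $placeholders are filled per request.
DEFAULT_TEMPLATES: dict[str, str] = {
    "script_draft": "Write a $length-minute script about $topic for $audience.",
    "script_critic": "Critique this script for pacing and clarity:\n$draft",
}


@dataclass
class ProjectConfig:
    audience: str = "a general audience"
    template_overrides: dict[str, str] = field(default_factory=dict)
    route_overrides: dict[str, tuple[str, str]] = field(default_factory=dict)


def route_for(step: str, project: ProjectConfig) -> tuple[str, str]:
    # A project-level override wins; otherwise fall back to the default route.
    return project.route_overrides.get(step, DEFAULT_ROUTES[step])


def build_prompt(step: str, project: ProjectConfig, **values: str) -> str:
    template = project.template_overrides.get(step, DEFAULT_TEMPLATES[step])
    # substitute() raises KeyError for a placeholder with no value, so a broken
    # template fails loudly instead of sending a half-filled prompt to a model.
    return Template(template).substitute({"audience": project.audience, **values})
```

For example, a children's channel could override only the drafting template and audience while inheriting every other default, so channel-specific behavior lives in data rather than in code.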

Pipeline & Job System

Every long-running step is tracked as a background job with real-time progress, logs, and cancellation support. If a step fails, it can be retried without re-running the entire pipeline. Supabase stores projects, transcripts, prompts, generated images, renders, and detailed job logs. Project names are normalized to avoid duplicates, and logs are trimmed for readability.
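A minimal sketch of such a job, assuming steps run in order, completed steps are remembered so a retry resumes at the failed step, and cancellation is cooperative (checked between steps). The Job class, the 200-line log cap, and normalize_project_name are hypothetical illustrations; the last one shows just one possible reading of "normalized to avoid duplicates" (collapse whitespace, lowercase, then add a numeric suffix on collision). In the real system this state would live in Supabase rather than in memory.

```python
import re
import threading
from dataclasses import dataclass, field
from typing import Callable

MAX_LOG_LINES = 200  # hypothetical cap that keeps job logs readable


class JobCancelled(Exception):
    """Raised between steps when cancellation has been requested."""


@dataclass
class Job:
    steps: list[tuple[str, Callable[[], None]]]
    completed: list[str] = field(default_factory=list)
    logs: list[str] = field(default_factory=list)
    status: str = "pending"
    cancel_event: threading.Event = field(default_factory=threading.Event)

    @property
    def progress(self) -> float:
        # Fraction of steps finished, suitable for a progress bar.
        return len(self.completed) / len(self.steps) if self.steps else 1.0

    def log(self, message: str) -> None:
        self.logs.append(message)
        del self.logs[:-MAX_LOG_LINES]  # drop the oldest lines beyond the cap

    def cancel(self) -> None:
        self.cancel_event.set()

    def run(self) -> None:
        self.status = "running"
        try:
            for name, fn in self.steps:
                if name in self.completed:
                    continue  # finished on an earlier attempt; don't redo it
                if self.cancel_event.is_set():
                    raise JobCancelled(name)
                self.log(f"start {name}")
                fn()
                self.completed.append(name)
                self.log(f"done {name} ({self.progress:.0%})")
            self.status = "succeeded"
        except JobCancelled:
            self.status = "cancelled"
            self.log("cancelled")
        except Exception as exc:
            self.status = "failed"
            self.log(f"failed: {exc!r}")

    def retry(self) -> None:
        self.cancel_event.clear()  # a fresh attempt should not inherit a stale cancel
        self.run()


def normalize_project_name(name: str, existing: set[str]) -> str:
    base = re.sub(r"\s+", " ", name).strip().lower() or "untitled"
    candidate, n = base, 2
    while candidate in existing:
        candidate = f"{base} ({n})"
        n += 1
    return candidate
```

In a FastAPI backend, job.run would typically be started off the request path (for example with threading.Thread(target=job.run, daemon=True).start()) so the API returns immediately while the frontend polls status, progress, and logs. Because cancellation is only checked between steps, it cannot interrupt a provider call that is already in flight.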

Rendering

The backend invokes Remotion with a temporary props file containing the script, audio, images, and timing data. Rendering can run locally during development or on Cloud Run workers for production scale. The rendered video is uploaded to Supabase storage and recorded for reuse. Multiple render configurations (resolution, aspect ratio, style) are supported.
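The props-file handoff can be sketched with the Remotion CLI, whose render command takes an entry point, a composition ID, and an output path, and whose --props flag accepts a path to a JSON file. The entry point, composition ID ("Main"), props field names, and render presets below are assumptions; the sketch also assumes the composition reads its width and height from props (for example via Remotion's calculateMetadata), which the write-up does not specify.

```python
import json
import subprocess
import tempfile
from pathlib import Path

# Hypothetical presets: the write-up says resolution and aspect ratio are
# configurable but does not list the actual values.
PRESETS: dict[str, tuple[int, int]] = {
    "landscape_1080p": (1920, 1080),
    "vertical_1080p": (1080, 1920),
    "square_1080p": (1080, 1080),
}


def build_props(script: str, audio_url: str, images: list[tuple[str, float]],
                preset: str, style: str) -> dict:
    """Assemble the props passed to the composition (field names are assumed)."""
    width, height = PRESETS[preset]
    return {
        "script": script,
        "audioUrl": audio_url,
        "images": [{"src": src, "startSec": start} for src, start in images],
        "width": width,
        "height": height,
        "style": style,
    }


def write_props_file(props: dict, directory: str) -> Path:
    path = Path(directory) / "props.json"
    path.write_text(json.dumps(props), encoding="utf-8")
    return path


def build_render_command(props_path: Path, output: Path,
                         entry: str = "src/index.ts",
                         composition: str = "Main") -> list[str]:
    # remotion render <entry-point> <composition-id> <output> --props=<json file>
    # The entry point and composition ID defaults are placeholders.
    return ["npx", "remotion", "render", entry, composition, str(output),
            f"--props={props_path}"]


def render_video(props: dict, output: Path) -> Path:
    # The temporary directory, and the props file inside it, is removed
    # once the render process exits.
    with tempfile.TemporaryDirectory() as tmp:
        props_path = write_props_file(props, tmp)
        subprocess.run(build_render_command(props_path, output), check=True)
    return output
```

Building the command as a list (rather than a shell string) avoids quoting problems with file paths, and check=True turns a failed render into an exception the job system can mark as a failed, retryable step.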

Security & Privacy

The platform is built as a cloud service with subscription-based access. Full Row Level Security policies in Supabase, together with cookie-based authentication, keep each user's data isolated. There are no third-party analytics and no data sharing.

Frontend & UX

The frontend provides a project dashboard that shows each pipeline step with its status, timestamps, and output. Job polling runs only while the browser tab is visible, avoiding unnecessary API calls. Users can preview scripts, listen to voiceover, view generated images, and watch rendered videos without leaving the app.

Challenges & Lessons Learned

Orchestrating multiple AI services taught me a lot about handling latency, token limits, rate limiting, and graceful degradation. Caching and reuse were essential — regenerating images or voiceover for every tweak would be prohibitively slow and expensive. Building a durable job system with clear observability was the backbone that made the whole pipeline reliable.
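One common way to get that reuse is to key every generated asset by a hash of the inputs that produced it (asset kind, prompt, model, settings), so re-running a step with unchanged inputs returns the stored file instead of calling the provider again. The sketch below is a hypothetical illustration of that idea, not the project's actual caching code; a local directory stands in for Supabase storage, and the parameter names are made up.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(kind: str, params: dict) -> str:
    # sort_keys makes the key independent of dict insertion order.
    payload = json.dumps({"kind": kind, **params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AssetCache:
    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def get_or_create(self, kind: str, params: dict,
                      generate: Callable[[], bytes], suffix: str = ".bin") -> Path:
        path = self.root / f"{cache_key(kind, params)}{suffix}"
        if path.exists():
            return path  # identical inputs: reuse without calling the provider
        data = generate()
        tmp = path.parent / (path.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)  # rename into place so readers never see a partial file
        return path
```

Any change to an input that matters (a reworded prompt, a different model or style) yields a new key and therefore a fresh generation, while tweaks elsewhere in the project leave existing images and voiceover untouched.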

Current Status

The platform is functional and used daily on a live YouTube channel. Multi-language support, audience analysis, style consistency controls, and a public SaaS launch are on the roadmap.