<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Video Watermark Tech]]></title><description><![CDATA[Video Watermark Tech]]></description><link>https://blog.videowatermarkremove.com</link><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 21:17:57 GMT</lastBuildDate><atom:link href="https://blog.videowatermarkremove.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Best Video Watermark Removal Tools in 2026 (Tested & Compared)]]></title><description><![CDATA[Removing watermarks from videos is no longer just a “nice-to-have” feature.
For content creators, marketers, and agencies, it’s now a workflow necessity.
We tested the most popular video watermark removal tools in 2026, focusing on:

Output quality

...]]></description><link>https://blog.videowatermarkremove.com/best-video-watermark-removal-tools-2026</link><guid isPermaLink="true">https://blog.videowatermarkremove.com/best-video-watermark-removal-tools-2026</guid><category><![CDATA[Web Development]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[Python]]></category><category><![CDATA[Video Editing]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Jason Lin]]></dc:creator><pubDate>Mon, 19 Jan 2026 06:54:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768804568659/816e8e1e-f864-4c42-b0b4-2a8581554e8a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Removing watermarks from videos is no longer just a “nice-to-have” feature.</p>
<p>For content creators, marketers, and agencies, it’s now a <strong>workflow necessity</strong>.</p>
<p>We tested the most popular <strong>video watermark removal tools</strong> in 2026, focusing on:</p>
<ul>
<li><p>Output quality</p>
</li>
<li><p>Ease of use</p>
</li>
<li><p>Supported platforms (TikTok, CapCut, Instagram, etc.)</p>
</li>
<li><p>Real-world performance on moving backgrounds</p>
</li>
</ul>
<p>Below are the <strong>best video watermark removal tools in 2026</strong>, ranked by practical use cases.</p>
<hr />
<h2 id="heading-1-videowatermarkremovehttpsvideowatermarkremovecom-best-overall-for-ai-based-video-watermark-removal"><strong>1.</strong> <a target="_blank" href="https://videowatermarkremove.com/"><strong>VideoWatermarkRemove</strong></a> <strong>— Best Overall for AI-Based Video Watermark Removal</strong></h2>
<p><strong>Best for:</strong></p>
<p>Creators who need fast, clean watermark removal from TikTok, CapCut, and social videos.</p>
<p><strong>VideoWatermarkRemove</strong> is an AI-powered video watermark removal tool designed specifically for modern social media content. Unlike traditional cropping or blurring tools, it focuses on <strong>temporal consistency</strong>, ensuring the removed area stays stable across frames.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li><p>AI-based watermark detection and removal</p>
</li>
<li><p>Optimized for TikTok and CapCut watermarks</p>
</li>
<li><p>Preserves original video resolution</p>
</li>
<li><p>No manual frame-by-frame editing required</p>
</li>
</ul>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Very clean results on moving backgrounds</p>
</li>
<li><p>Simple web-based workflow</p>
</li>
<li><p>No software installation</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li>Focused on watermark removal (not a full video editor)</li>
</ul>
<p>👉 <em>Why it stands out:</em></p>
<p>VideoWatermarkRemove consistently delivers <strong>natural-looking results</strong> without flickering or visual artifacts, making it one of the most reliable AI watermark removal tools in 2026.</p>
<hr />
<h2 id="heading-2-hitpaw-watermark-removerhttpswwwhitpawcomremove-watermarkhtml-best-for-desktop-users"><strong>2.</strong> <a target="_blank" href="https://www.hitpaw.com/remove-watermark.html"><strong>HitPaw Watermark Remover</strong></a> <strong>— Best for Desktop Users</strong></h2>
<p><strong>Best for:</strong></p>
<p>Users who prefer offline desktop software.</p>
<p>HitPaw offers a traditional watermark removal experience with manual selection tools. It works well for static logos but may struggle with complex motion.</p>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Desktop-based</p>
</li>
<li><p>Supports batch processing</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>Manual masking required</p>
</li>
<li><p>Less effective on dynamic watermarks</p>
</li>
</ul>
<hr />
<h2 id="heading-3-apowersoft-online-watermark-removerhttpswwwapowersoftcomvideo-watermark-remover-best-for-quick-fixes"><strong>3.</strong> <a target="_blank" href="https://www.apowersoft.com/video-watermark-remover"><strong>Apowersoft Online Watermark Remover</strong></a> <strong>— Best for Quick Fixes</strong></h2>
<p><strong>Best for:</strong></p>
<p>Simple watermark removal on short clips.</p>
<p>Apowersoft provides a lightweight online tool suitable for quick edits, though results vary depending on background complexity.</p>
<hr />
<h2 id="heading-4-mediaiohttpmediaio-best-for-casual-users"><strong>4.</strong> <a target="_blank" href="http://Media.io"><strong>Media.io</strong></a> <strong>— Best for Casual Users</strong></h2>
<p><strong>Best for:</strong></p>
<p>Beginners who want a simple UI.</p>
<p>Media.io offers basic watermark removal features alongside other video tools, but AI precision is limited compared to dedicated solutions.</p>
<hr />
<h2 id="heading-5-capcut-crop-based-methodshttpswwwcapcutcom-best-free-option"><strong>5.</strong> <a target="_blank" href="https://www.capcut.com/"><strong>CapCut (Crop-Based Methods)</strong></a> <strong>— Best Free Option</strong></h2>
<p><strong>Best for:</strong></p>
<p>Users willing to crop instead of fully removing watermarks.</p>
<p>While CapCut doesn’t truly remove watermarks, cropping can sometimes work — at the cost of resolution and framing.</p>
<hr />
<h2 id="heading-comparison-table"><strong>Comparison Table</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Tool</th><th>AI-Based</th><th>TikTok / CapCut Support</th><th>Preserves Quality</th><th>Ease of Use</th></tr>
</thead>
<tbody>
<tr>
<td><a target="_blank" href="https://videowatermarkremove.com/">VideoWatermarkRemove</a></td><td>✅</td><td>✅</td><td>✅</td><td>⭐⭐⭐⭐⭐</td></tr>
<tr>
<td><a target="_blank" href="https://www.hitpaw.com/remove-watermark.html">HitPaw</a></td><td>❌</td><td>⚠️</td><td>⚠️</td><td>⭐⭐⭐</td></tr>
<tr>
<td><a target="_blank" href="https://www.apowersoft.com/video-watermark-remover">Apowersoft</a></td><td>⚠️</td><td>⚠️</td><td>⚠️</td><td>⭐⭐⭐</td></tr>
<tr>
<td><a target="_blank" href="http://Media.io">Media.io</a></td><td>⚠️</td><td>❌</td><td>⚠️</td><td>⭐⭐⭐</td></tr>
<tr>
<td><a target="_blank" href="https://www.capcut.com">CapCut Crop</a></td><td>❌</td><td>❌</td><td>❌</td><td>⭐⭐</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-which-video-watermark-removal-tool-should-you-choose"><strong>Which Video Watermark Removal Tool Should You Choose?</strong></h2>
<ul>
<li><p>If you want <strong>the most reliable AI-based watermark removal</strong>, choose <strong>VideoWatermarkRemove</strong>.</p>
</li>
<li><p>If you need offline processing, HitPaw is a solid alternative.</p>
</li>
<li><p>For occasional use, online tools like Apowersoft can work.</p>
</li>
</ul>
<hr />
<h2 id="heading-final-verdict"><strong>Final Verdict</strong></h2>
<p>AI-powered tools are rapidly replacing manual watermark removal methods.</p>
<p>Among all options tested, <a target="_blank" href="https://videowatermarkremove.com/"><strong>VideoWatermarkRemove</strong></a> stands out in 2026 for its balance of quality, speed, and ease of use — especially for creators working with short-form social video.</p>
]]></content:encoded></item><item><title><![CDATA[The Temporal Consistency Challenge: From Optical Flow to Spatiotemporal AI in Video Restoration]]></title><description><![CDATA[1. Introduction: The Flicker Problem and the Dimension of Time
We are witnessing a Cambrian explosion of generative AI. Image manipulation has reached remarkable maturity: removing unwanted objects from a photograph is now almost trivial, supported b...]]></description><link>https://blog.videowatermarkremove.com/the-temporal-consistency-challenge-from-optical-flow-to-spatiotemporal-ai-in-video-restoration</link><guid isPermaLink="true">https://blog.videowatermarkremove.com/the-temporal-consistency-challenge-from-optical-flow-to-spatiotemporal-ai-in-video-restoration</guid><category><![CDATA[AI]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[Computer Vision]]></category><category><![CDATA[Video Processing]]></category><category><![CDATA[Python]]></category><category><![CDATA[image processing]]></category><dc:creator><![CDATA[Jason Lin]]></dc:creator><pubDate>Tue, 13 Jan 2026 13:16:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767964873641/76b310a1-d0af-477e-9865-6f73b060becb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-1-introduction-the-flicker-problem-and-the-dimension-of-time"><strong>1. Introduction: The Flicker Problem and the Dimension of Time</strong></h2>
<p>We are witnessing a Cambrian explosion of generative AI. Image manipulation has reached remarkable maturity: removing unwanted objects from a photograph is now almost trivial, supported by tools ranging from Photoshop’s Generative Fill to open-source diffusion models. On static images, results are often visually flawless.</p>
<p>Naturally, expectations have shifted toward video.</p>
<p>From professional editors to everyday content creators, the demand is clear:</p>
<p><strong>“One-click erase anything from video.”</strong></p>
<p>Yet, reality lags behind the promise.</p>
<p>When current AI video tools are applied to dynamic footage—such as removing a logo from a panning shot or erasing a person walking across a textured background—the illusion quickly collapses. Instead of seamless restoration, the result is often plagued by shimmering textures, unstable edges, and chaotic motion artifacts.</p>
<p>In the visual effects industry, these failures are known as <strong>temporal artifacts</strong>. To end users, they appear as flicker, jitter, or “boiling” textures—rendering the output unusable for any professional purpose.</p>
<p>Why does this gap exist?</p>
<p>Why is image inpainting nearly solved, while video inpainting remains stubbornly difficult?</p>
<p>The answer lies in a fundamental distinction:</p>
<blockquote>
<p><strong>A video is not a collection of images.</strong></p>
<p><strong>It is a continuous spatiotemporal signal.</strong></p>
</blockquote>
<p>A single frame is two-dimensional.</p>
<p>A video exists in <strong>three dimensions: height × width × time</strong>.</p>
<p>The decisive barrier separating amateur video tools from professional-grade AI is not spatial quality, but <strong>temporal consistency</strong>—the ability to maintain coherent structure and motion across frames.</p>
<p>This article explores that barrier in depth, tracing the evolution of video restoration from optical flow to modern spatiotemporal AI, and explaining why temporal consistency remains the central challenge in automated video repair.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768032241986/8fd3a032-b568-431a-9d99-b5cb668c2b48.jpeg" alt="Example of temporal flickering artifacts caused by per-frame video inpainting on dynamic backgrounds" class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-2-the-trap-of-per-frame-inpainting"><strong>2. The Trap of Per-Frame Inpainting</strong></h2>
<p>Most failed AI video tools share the same flaw: <strong>per-frame inpainting</strong>.</p>
<p>This approach applies a powerful image model independently to each frame of a video, treating time as irrelevant. Each frame may look acceptable on its own, but the sequence as a whole collapses.</p>
<p>Consider an analogy.</p>
<p>Imagine asking a skilled painter to repaint a watermarked region across 100 consecutive frames of a video showing rippling water. The painter is forbidden from seeing the frames before or after the one they are currently working on.</p>
<ul>
<li><p>In frame 10, they paint a small wave crest.</p>
</li>
<li><p>In frame 11, they imagine a smooth reflection.</p>
</li>
<li><p>In frame 12, they add a shadow from a passing cloud.</p>
</li>
</ul>
<p>Each frame, viewed alone, is plausible.</p>
<p>Played in sequence, the region mutates chaotically thirty times per second.</p>
<p>This is the essence of flicker.</p>
<p>The model has no memory of the immediate past and no awareness of the immediate future. It cannot respect motion continuity, lighting evolution, or physical plausibility.</p>
<p>Per-frame inpainting fails not because the image quality is poor, but because <strong>time is ignored</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768032364184/e1914eb8-e4e0-4c1f-b051-e636030a8ca4.jpeg" alt="Illustration showing inconsistent hallucinated textures across consecutive video frames during per-frame inpainting" class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-3-how-temporal-consistency-is-measured"><strong>3. How Temporal Consistency Is Measured</strong></h2>
<p>Temporal consistency is not a subjective aesthetic judgment. In professional research and engineering workflows, it is quantified using motion-aware and perceptual metrics.</p>
<p>Common evaluation methods include:</p>
<ul>
<li><p><strong>Feature-level similarity across frames</strong>, such as VGG-based cosine similarity, to assess semantic stability over time.</p>
</li>
<li><p><strong>Temporal Flicker Index (TFI)</strong>, measuring high-frequency temporal variations in reconstructed regions.</p>
</li>
<li><p><strong>Optical-flow residual consistency</strong>, where instability appears as errors between predicted motion and reconstructed pixels.</p>
</li>
<li><p><strong>Warping error metrics</strong>, evaluating how well reconstructed content aligns when propagated across frames.</p>
</li>
</ul>
<p>No single metric fully captures human perception, but together they clearly separate temporally coherent models from frame-independent approaches.</p>
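<p>To make these metrics concrete, here is a minimal sketch of the warping-error idea (my own illustration, not the exact formulation of any one paper): estimate dense optical flow between two frames with OpenCV, warp the next frame back onto the current one, and measure the residual. Occlusion handling is deliberately omitted.</p>
<pre><code class="lang-python">import cv2
import numpy as np

def warping_error(frame_t, frame_t1):
    """Warp frame t+1 back to frame t along dense optical flow,
    then measure the mean residual. Lower means more stable."""
    g0 = cv2.cvtColor(frame_t, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_t1, cv2.COLOR_BGR2GRAY)
    # Dense flow from frame t to frame t+1 (Farneback)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + flow[..., 0]).astype(np.float32)
    map_y = (ys + flow[..., 1]).astype(np.float32)
    # Sample frame t+1 at the positions its pixels moved to
    warped = cv2.remap(frame_t1, map_x, map_y, cv2.INTER_LINEAR)
    diff = warped.astype(np.float32) - frame_t.astype(np.float32)
    return np.abs(diff).mean()
</code></pre>
<p>Applied to a reconstructed region over consecutive frames, a rising warping error is a direct, automatable signal of the flicker a viewer would perceive.</p>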
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768032571776/e12b2996-6db8-44ca-b4f3-b9950d85a5c4.jpeg" alt="Visualization of optical flow vectors representing pixel motion between consecutive video frames" class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-4-era-10-optical-flow-and-motion-propagation"><strong>4. Era 1.0: Optical Flow and Motion Propagation</strong></h2>
<p>Before deep learning reshaped computer vision, video restoration relied heavily on <strong>optical flow</strong>.</p>
<p>Optical flow estimates how pixels move between consecutive frames, producing a vector field that describes motion direction and magnitude. Early video inpainting systems used this information to <strong>propagate</strong> known background pixels from adjacent frames into occluded regions.</p>
<p>This approach marked a significant improvement over per-frame editing and works well in simple scenarios, such as slow camera pans over static scenes.</p>
<p>However, it rests on a fragile assumption: that optical flow estimation is accurate.</p>
<p>In real-world footage, this assumption frequently breaks down.</p>
<p>Optical flow struggles with:</p>
<ul>
<li><p><strong>Occlusions</strong>, where background pixels disappear behind moving objects.</p>
</li>
<li><p><strong>Complex motion</strong>, such as smoke, fire, water, or foliage.</p>
</li>
<li><p><strong>Large displacements</strong>, caused by fast camera movement or cuts.</p>
</li>
</ul>
<p>When flow estimation fails, errors are propagated across frames, producing warped textures and unstable artifacts. Optical flow improved temporal coherence, but could not guarantee it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768034519656/44e71056-6287-4df8-a85b-410aea981ba3.jpeg" alt="Optical flow failure cases under occlusion and complex motion leading to distorted video inpainting results" class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-5-era-20-spatiotemporal-deep-learning"><strong>5. Era 2.0: Spatiotemporal Deep Learning</strong></h2>
<p>The modern shift in video restoration began when researchers stopped treating time as auxiliary information and instead modeled it explicitly.</p>
<p>This marked the transition from <strong>“2D + time”</strong> thinking to <strong>true spatiotemporal modeling</strong>.</p>
<h3 id="heading-51-3d-convolutional-networks"><strong>5.1 3D Convolutional Networks</strong></h3>
<p>Early deep learning approaches introduced <strong>3D convolutional neural networks (3D CNNs)</strong>. These models process stacks of frames using volumetric filters, allowing spatial and temporal features to be learned jointly.</p>
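<p>For intuition, here is what a volumetric filter looks like in PyTorch (a toy block, not a specific published architecture): the kernel slides over height, width, and time simultaneously, so motion cues are learned jointly with appearance.</p>
<pre><code class="lang-python">import torch
import torch.nn as nn

# Input layout: (batch, channels, time, height, width)
video = torch.randn(1, 3, 16, 128, 128)   # 16 RGB frames of 128x128

conv3d = nn.Conv3d(in_channels=3, out_channels=16,
                   kernel_size=(3, 3, 3), padding=1)

features = conv3d(video)   # shape: (1, 16, 16, 128, 128)
# Each activation mixes a 3x3 spatial patch across 3 consecutive
# frames, which is exactly why the temporal receptive field is limited.
</code></pre>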
<p>While effective for short sequences, 3D CNNs face limitations:</p>
<ul>
<li><p>High computational cost</p>
</li>
<li><p>Limited receptive fields</p>
</li>
<li><p>Difficulty modeling long-range temporal dependencies</p>
</li>
</ul>
<h3 id="heading-52-attention-and-transformer-based-models"><strong>5.2 Attention and Transformer-Based Models</strong></h3>
<p>A more profound breakthrough emerged from <strong>attention mechanisms</strong>, adapted from natural language processing.</p>
<p>Transformer-based architectures, such as FuseFormer, allow models to selectively attend to relevant regions across many past and future frames. Instead of copying pixels blindly, the model learns to ask:</p>
<blockquote>
<p>“Which moments in the video’s history best explain what belongs here?”</p>
</blockquote>
<p>By attending to consistent background structures while ignoring transient foreground motion, these models achieve stable, flicker-free reconstruction.</p>
<p>Temporal coherence becomes an emergent property of learned spatiotemporal reasoning.</p>
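<p>In stripped-down form, temporal attention can be sketched as follows (single head, one token per frame at a fixed spatial location, no positional encoding; real models such as FuseFormer are far richer):</p>
<pre><code class="lang-python">import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttention(nn.Module):
    """Self-attention across the time axis for one spatial location:
    every frame's feature queries all other frames."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)

    def forward(self, x):                     # x: (T, dim), one token per frame
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = (q @ k.t()) / math.sqrt(q.shape[-1])
        weights = F.softmax(scores, dim=-1)   # which frames explain this one
        return weights @ v                    # temporally blended features

feats = torch.randn(16, 64)                   # 16 frames, 64-d features each
out = TemporalAttention(64)(feats)            # (16, 64)
</code></pre>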
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768034077718/49e69b78-45f9-40e6-adc2-69e9b91ee410.jpeg" alt="Spatiotemporal attention mechanism aggregating features from multiple video frames to maintain temporal consistency" class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-6-practical-implementations-in-industry-and-open-source"><strong>6. Practical Implementations in Industry and Open Source</strong></h2>
<p>Spatiotemporal consistency is not confined to research papers. It underpins many real-world systems.</p>
<p>Examples include:</p>
<ul>
<li><p><strong>RIFE / DAIN</strong></p>
<p>  Originally developed for frame interpolation, these models emphasize motion-aware temporal smoothness and are often reused in restoration pipelines.</p>
</li>
<li><p><strong>Adobe After Effects with Mocha AE</strong></p>
<p>  Professional workflows combine optical flow, planar tracking, and manual constraints to maintain temporal coherence.</p>
</li>
<li><p><strong>Runway ML Gen-2 and Pika</strong></p>
<p>  Generative video platforms exploring transformer- and diffusion-based spatiotemporal generation.</p>
</li>
<li><p><strong>Open-source research implementations</strong>, such as FuseFormer-based video inpainting frameworks.</p>
</li>
</ul>
<p>Across these systems, the conclusion is consistent: <strong>video restoration must reason across time to succeed</strong>.</p>
<hr />
<h2 id="heading-7-era-30-generative-video-diffusion"><strong>7. Era 3.0: Generative Video Diffusion</strong></h2>
<p>The next frontier is <strong>video diffusion models</strong>.</p>
<p>While attention-based systems excel at maintaining consistency in existing textures, diffusion models enable the synthesis of entirely new content that remains stable over time.</p>
<p>Recent research incorporates <strong>temporal adversarial losses</strong>, where discriminator networks evaluate short video segments instead of individual frames. These discriminators penalize flicker, jitter, and motion inconsistency, forcing the generator to internalize spatiotemporal structure.</p>
<p>The model no longer merely fills holes—it learns to generate a coherent temporal reality.</p>
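<p>A generic sketch of this idea (a small 3D-convolutional clip critic with a hinge loss; illustrative only, not the design of any cited paper) looks like this:</p>
<pre><code class="lang-python">import torch
import torch.nn as nn
import torch.nn.functional as F

# A clip critic judges short (C, T, H, W) segments rather than frames,
# so inter-frame flicker directly lowers a fake clip's score.
clip_critic = nn.Sequential(
    nn.Conv3d(3, 32, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv3d(32, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(64, 1),
)

real = torch.randn(2, 3, 8, 64, 64)   # real 8-frame clips
fake = torch.randn(2, 3, 8, 64, 64)   # stand-in for generator output

# Hinge losses: the critic separates real clips from flickery fakes;
# the generator is pushed toward temporally coherent output.
d_loss = F.relu(1 - clip_critic(real)).mean() + F.relu(1 + clip_critic(fake)).mean()
g_loss = -clip_critic(fake).mean()
</code></pre>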
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768033828376/6a188f02-fe51-4bca-8a33-9a4ce72e782c.jpeg" alt="Video diffusion model generating temporally consistent video content across long frame sequences" class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-8-open-challenges-ahead"><strong>8. Open Challenges Ahead</strong></h2>
<p>Despite progress, significant challenges remain:</p>
<ul>
<li><p><strong>Long-duration consistency</strong></p>
<p>  Maintaining identity and structure across minutes or hours of video remains unresolved.</p>
</li>
<li><p><strong>Semantic-level temporal reasoning</strong></p>
<p>  Future systems must encode physical and logical priors: objects do not pass through walls, shadows obey lighting geometry, and liquids follow gravity.</p>
</li>
<li><p><strong>Real-time constraints</strong></p>
<p>  Achieving temporal stability under low-latency requirements—such as mobile devices or live streaming—demands lightweight yet expressive models.</p>
</li>
</ul>
<p>Temporal consistency is not a solved problem, but an evolving frontier.</p>
<hr />
<h2 id="heading-9-ethical-considerations-and-responsible-deployment"><strong>9. Ethical Considerations and Responsible Deployment</strong></h2>
<p>Powerful video restoration capabilities raise legitimate concerns around misuse and deepfake generation.</p>
<p>Responsible deployment requires:</p>
<ul>
<li><p>Clear usage policies</p>
</li>
<li><p>Legal guidance and fair-use awareness</p>
</li>
<li><p>Integration with content provenance and digital watermarking systems</p>
</li>
</ul>
<p>Advancing video AI must go hand in hand with safeguards that preserve trust in visual media.</p>
<hr />
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768033267939/be35fcc5-fcae-4958-aca3-291f634e576a.jpeg" alt="Evolution of video restoration techniques from optical flow to spatiotemporal attention and generative diffusion models" class="image--center mx-auto" /></p>
<h2 id="heading-conclusion-the-new-standard"><strong>Conclusion: The New Standard</strong></h2>
<p>The era of accepting flickering, unstable automated video repairs is ending.</p>
<p>Professional video AI is not defined by how good a single frame looks, but by how coherently frames evolve over time. Temporal consistency has become the defining standard.</p>
<p>Understanding this challenge clarifies the difference between novelty tools and production-grade systems. As spatiotemporal models continue to mature—from optical flow to attention mechanisms and diffusion—the boundary between automated video restoration and human-level results is rapidly dissolving.</p>
<blockquote>
<p><strong>Temporal consistency is no longer an enhancement.</strong></p>
<p><strong>It is the line that separates experimental video AI from professional systems.</strong></p>
<p><em>These challenges are not theoretical. They directly shape how modern video restoration systems are designed and deployed in real-world applications.</em></p>
<p><em>This requirement becomes especially apparent in automated watermark removal tasks, where even minor temporal artifacts are immediately visible during playback.</em></p>
<p>One practical implementation of these concepts can be seen in modern <a target="_blank" href="https://www.videowatermarkremove.com/video-watermark-remover">AI-based video watermark removal pipelines</a>, where temporal consistency is critical to avoid flicker across frames.</p>
</blockquote>
<hr />
<h2 id="heading-references"><strong>References</strong></h2>
<p><strong>Foundational Surveys &amp; Metrics</strong></p>
<ul>
<li><p>Prokudin, S., et al. <em>Deep Video Inpainting: A Survey.</em> arXiv:2201.11258</p>
<p>  <a target="_blank" href="https://arxiv.org/abs/2201.11258">https://arxiv.org/abs/2201.11258</a></p>
</li>
<li><p>Lai, W.-S., et al. <em>Learning Blind Video Temporal Consistency.</em> ECCV 2018</p>
<p>  <a target="_blank" href="https://arxiv.org/abs/1808.00449">https://arxiv.org/abs/1808.00449</a></p>
</li>
</ul>
<p><strong>Flow-Based Approaches</strong></p>
<ul>
<li><p>Gao, C., et al. <em>Flow-edge Guided Video Completion.</em> ECCV 2020</p>
<p>  <a target="_blank" href="https://arxiv.org/abs/2007.09696">https://arxiv.org/abs/2007.09696</a></p>
</li>
</ul>
<p><strong>Spatiotemporal Transformers</strong></p>
<ul>
<li><p>Liu, R., et al. <em>FuseFormer.</em> ICCV 2021</p>
<p>  <a target="_blank" href="https://openaccess.thecvf.com/content/ICCV2021/html/Liu_FuseFormer_Fusing_Fine-Grained_Information_in_Transformers_for_Video_Inpainting_ICCV_2021_paper.html">https://openaccess.thecvf.com/content/ICCV2021/html/Liu_FuseFormer_Fusing_Fine-Grained_Information_in_Transformers_for_Video_Inpainting_ICCV_2021_paper.html</a></p>
</li>
<li><p>Zeng, Y., et al. <em>Learning Joint Spatial-Temporal Transformations for Video Inpainting.</em> ECCV 2020</p>
<p>  <a target="_blank" href="https://arxiv.org/abs/2007.10247">https://arxiv.org/abs/2007.10247</a></p>
</li>
</ul>
<p><strong>Video Diffusion Models</strong></p>
<ul>
<li><p>Blattmann, A., et al. <em>Align Your Latents.</em> CVPR 2023</p>
<p>  <a target="_blank" href="https://arxiv.org/abs/2304.08818">https://arxiv.org/abs/2304.08818</a></p>
</li>
</ul>
<p><strong>Industry Research</strong></p>
<ul>
<li><p>Google Research. <em>Lumiere: A Space-Time Diffusion Model for Video Generation.</em></p>
<p>  <a target="_blank" href="https://research.google/blog/lumiere/">https://research.google/blog/lumiere/</a></p>
</li>
<li><p>Meta AI. <em>Emu Video.</em></p>
<p>  <a target="_blank" href="https://ai.facebook.com/blog/emu-video/">https://ai.facebook.com/blog/emu-video/</a></p>
</li>
<li><p>Runway Research. <em>Gen-2.</em></p>
<p>  <a target="_blank" href="https://runwayml.com/research/gen-2/">https://runwayml.com/research/gen-2/</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Remove CapCut Watermarks with AI: How We Built a Flicker-Free Video Inpainting System]]></title><description><![CDATA[CapCut watermark removal is often attempted with blur or crop, but those methods leave flicker and artifacts.
This article explains how we built an AI-based video inpainting system to remove CapCut watermarks cleanly, without quality loss, and how it works under re...]]></description><link>https://blog.videowatermarkremove.com/remove-capcut-watermark-ai</link><guid isPermaLink="true">https://blog.videowatermarkremove.com/remove-capcut-watermark-ai</guid><category><![CDATA[Video Editing]]></category><category><![CDATA[#ai-tools]]></category><category><![CDATA[Capcut]]></category><category><![CDATA[Capcut Video Editor]]></category><category><![CDATA[Watermark Removal]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Computer Vision]]></category><category><![CDATA[Video Processing]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[opencv]]></category><category><![CDATA[AI Engineering]]></category><category><![CDATA[inpainting ]]></category><category><![CDATA[optical-flow]]></category><dc:creator><![CDATA[Jason Lin]]></dc:creator><pubDate>Fri, 26 Dec 2025 12:50:00 GMT</pubDate><content:encoded><![CDATA[<blockquote>
<p>CapCut watermark removal is often attempted with blur or crop, but those methods leave flicker and artifacts.</p>
<p>This article explains how we built an AI-based video inpainting system to remove CapCut watermarks cleanly, without quality loss, and how it works under real production constraints.</p>
</blockquote>
<p>CapCut makes video editing simple, but the final exported clip often includes a watermark or ending screen. Removing it cleanly is harder than it looks. Many people try blurring, cropping, or covering the mark with stickers. These methods may work on a single frame, but once the video plays, artifacts appear: flicker, jitter, lost framing, or blurry patches.</p>
<p>In this article, I share how we engineered an AI-powered system to remove CapCut-style watermarks without blur, without cropping, and without flicker. Instead of just hiding the logo, we reconstruct the background. I will walk through the high-level architecture, the engineering challenges we hit, and why this problem is more interesting than it first appears.</p>
<p>If you want to test the results, check out our <a target="_blank" href="https://videowatermarkremove.com/remove-capcut-watermark">free online AI tool for removing CapCut watermarks</a>. Just upload your video, process it, and download a clean version.</p>
<h2 id="heading-why-blur-and-crop-fail-on-video">Why blur and crop fail on video</h2>
<p>Blur and crop are good enough for static images, but video is temporal. Each frame is processed independently, and that introduces several problems:</p>
<ul>
<li><p>The blurred region does not perfectly track camera motion, which causes flicker.</p>
</li>
<li><p>Cropping changes framing, sometimes cutting out important content or breaking aspect ratios.</p>
</li>
<li><p>None of these methods restore what was actually behind the watermark. They only hide it.</p>
</li>
</ul>
<p>The real challenge is not “making the logo less visible”. The real challenge is “restoring a plausible background in a way that stays consistent across hundreds or thousands of frames”.</p>
<blockquote>
<p>If you want to test the result, you can try the <a target="_blank" href="https://videowatermarkremove.com/remove-capcut-watermark">online CapCut watermark remover</a> on our site.</p>
</blockquote>
<h2 id="heading-how-to-remove-capcut-watermark-step-by-step-practical-workflow"><strong>How to remove CapCut watermark step-by-step (practical workflow)</strong></h2>
<p>If you’re a creator looking for a clean output without blur artifacts, here’s a quick workflow based on our system:</p>
<ol>
<li><p><strong>Export your video from CapCut without trimming the ending watermark.</strong></p>
</li>
<li><p><strong>Upload the clip to the AI removal tool page.</strong></p>
<p> Supported formats: MP4, MOV, WebM.</p>
<p> No installation required; the video is processed in the cloud directly from your browser.</p>
</li>
<li><p><strong>The model detects and segments the watermark region automatically.</strong></p>
<p> You don’t need to mark rectangles manually unless the overlay is very large.</p>
</li>
<li><p><strong>AI performs inpainting + temporal propagation across frames.</strong></p>
<p> Motion estimation keeps the region consistent from frame to frame, so you won’t see flicker.</p>
</li>
<li><p><strong>Preview the result → download the clean video.</strong></p>
<p> For longer videos (&gt;2min), processing is split into internal segments to avoid memory overflow.</p>
</li>
<li><p><em>(Optional for better quality)</em></p>
<ul>
<li><p>Avoid scenes with heavy motion blur when recording</p>
</li>
<li><p>Export at 1080p or higher</p>
</li>
<li><p>Watermarks placed near the frame edge reconstruct better than centered ones</p>
</li>
</ul>
</li>
</ol>
<h2 id="heading-high-level-architecture-of-our-ai-system">High-level architecture of our AI system</h2>
<p>We approached watermark removal as a temporal video inpainting problem. At a high level, the system does four things:</p>
<ol>
<li><p>Track how pixels move over time.</p>
</li>
<li><p>Borrow clean pixels from other frames whenever possible.</p>
</li>
<li><p>Synthesize new pixels when no clean information exists.</p>
</li>
<li><p>Ensure that the result does not flicker.</p>
</li>
</ol>
<p>In practice, this became a three-stage pipeline, followed by a temporal smoothing pass:</p>
<blockquote>
<p><strong>Architecture Summary:</strong></p>
<p><strong>Optical Flow → Temporal Propagation → GAN → Temporal Smoothing</strong></p>
</blockquote>
<h3 id="heading-stage-1motion-estimation-with-optical-flow">Stage 1：Motion estimation with optical flow</h3>
<p>We estimate pixel motion between frames using optical flow. Even if the watermark itself is static, the background often is not: the camera pans, zooms, or shakes. Optical flow gives us a dense mapping that lets us follow background structures over time.</p>
<h3 id="heading-stage-2temporal-propagation">Stage 2：Temporal propagation</h3>
<p>Given the motion field, we propagate clean background information from frames where the region is visible into frames where it is covered. Instead of treating each frame as an isolated image, we treat the video as a 3D volume (x, y, t) and move information along motion trajectories.</p>
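<p>In simplified form, one propagation step might look like the sketch below (OpenCV Farneback flow, a boolean mask of the watermark region, no occlusion checks; the production pipeline also completes the flow field inside the mask, which this toy version skips):</p>
<pre><code class="lang-python">import cv2
import numpy as np

def propagate_background(prev_clean, cur_frame, mask):
    """Fill the masked region of the current frame with pixels
    warped from the previous, already-cleaned frame.
    `mask` is a boolean array, True inside the watermark region."""
    g_prev = cv2.cvtColor(prev_clean, cv2.COLOR_BGR2GRAY)
    g_cur = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY)
    # Dense flow from the current frame back to the previous one
    flow = cv2.calcOpticalFlowFarneback(g_cur, g_prev, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g_cur.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + flow[..., 0]).astype(np.float32)
    map_y = (ys + flow[..., 1]).astype(np.float32)
    # For each current pixel, fetch the pixel it came from
    warped = cv2.remap(prev_clean, map_x, map_y, cv2.INTER_LINEAR)
    out = cur_frame.copy()
    out[mask] = warped[mask]   # copy propagated pixels into the hole
    return out
</code></pre>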
<h3 id="heading-stage-3generative-inpainting">Stage 3：Generative inpainting</h3>
<p>When the watermark covers a region that is never fully visible in any frame, propagation is not enough. For those cases, we run a generative inpainting model, based on a GAN-like architecture, to hallucinate plausible textures that match the lighting, color, and noise pattern of the surrounding area.</p>
<p>The combination of motion-aware propagation and generative inpainting gives us a result that is visually coherent and, in many cases, indistinguishable from an unwatermarked original.</p>
<h2 id="heading-eliminating-flicker-the-real-difficulty">Eliminating flicker: the real difficulty</h2>
<p>Removing the watermark cleanly in a single frame is only half of the job. Early versions of our system produced backgrounds that looked fine frame by frame, but flickered badly when played back:</p>
<ul>
<li><p>Slight differences in the inpainted region between frames.</p>
</li>
<li><p>Hard edges where propagated pixels met generated ones.</p>
</li>
<li><p>Temporal “pulsing” when the model changed its guess about textures.</p>
</li>
</ul>
<p>To make the result actually watchable, we added several stabilizing steps:</p>
<ul>
<li><p>Temporal smoothing on the inpainted region, guided by optical flow.</p>
</li>
<li><p>Consistency-oriented losses during training that penalize frame-to-frame disagreement.</p>
</li>
<li><p>Post-processing that blends seams between original and edited pixels.</p>
</li>
</ul>
<p>One useful observation from user feedback: people will forgive small spatial imperfections, but they rarely forgive flicker. A tiny texture mismatch is acceptable; a jumping patch in the corner is not. That insight pushed most of our optimization effort toward temporal stability rather than per-frame perfection.</p>
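<p>The flow-guided smoothing itself can be sketched as an exponential blend between the current inpainted frame and the previous output warped forward (illustrative only; the blend weight and the validity checks are where the real tuning happens):</p>
<pre><code class="lang-python">import cv2
import numpy as np

def stabilize(prev_out, cur_in, mask, flow, alpha=0.7):
    """Blend the current inpainted frame with the previous output
    warped along `flow` (dense flow from current to previous frame),
    inside the edited region only. Higher alpha favors stability."""
    h, w = mask.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + flow[..., 0]).astype(np.float32)
    map_y = (ys + flow[..., 1]).astype(np.float32)
    warped = cv2.remap(prev_out, map_x, map_y, cv2.INTER_LINEAR)
    out = cur_in.astype(np.float32)
    out[mask] = alpha * warped[mask] + (1 - alpha) * out[mask]
    return out.astype(np.uint8)
</code></pre>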
<h2 id="heading-production-challenges-we-had-to-solve">Production challenges we had to solve</h2>
<p>Turning this pipeline into a tool that runs in a browser involved very practical constraints.</p>
<h3 id="heading-memory-and-long-videos">Memory and long videos</h3>
<p>Running a heavy model on an entire video sequence quickly leads to out-of-memory errors. Our solution was to process videos in overlapping segments, typically 20–30 seconds at a time, and pass just enough summary information between segments to keep transitions smooth.</p>
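<p>The segmenting logic itself is straightforward (illustrative numbers; segment length and overlap are tuned per GPU):</p>
<pre><code class="lang-python">def plan_segments(n_frames, seg_len=720, overlap=30):
    """Split a video into overlapping chunks, e.g. 24-second segments
    at 30fps with a 1-second overlap used to blend across the seam."""
    step = seg_len - overlap
    starts = range(0, max(n_frames - overlap, 1), step)
    return [(s, min(s + seg_len, n_frames)) for s in starts]

# A 3-minute clip at 30fps:
print(plan_segments(5400)[:3])   # [(0, 720), (690, 1410), (1380, 2100)]
</code></pre>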
<h3 id="heading-latency-and-user-expectations">Latency and user expectations</h3>
<p>Creators are not willing to wait several minutes for a short clip. To keep processing times reasonable, we:</p>
<ul>
<li><p>Converted key models to TensorRT for faster inference.</p>
</li>
<li><p>Tuned batch sizes and resolution trade-offs for typical short-form content.</p>
</li>
<li><p>Cached intermediate results such as optical flow where possible.</p>
</li>
</ul>
<p>On a GPU, most short clips complete in under 30 seconds, which fits real-world expectations for an “online tool”.</p>
<h3 id="heading-capcut-specific-behavior">CapCut-specific behavior</h3>
<p>CapCut watermarks are not arbitrary. Their placement, fonts, and ending screen behavior follow recognizable patterns. We took advantage of that by:</p>
<ul>
<li><p>Predefining likely regions for overlays.</p>
</li>
<li><p>Training on samples that mimic CapCut’s watermark style.</p>
</li>
<li><p>Special-casing the ending screen transition.</p>
</li>
</ul>
<p>This reduced false positives and made the model more robust on real CapCut exports.</p>
<h2 id="heading-when-this-approach-works-well-and-when-it-does-not">When this approach works well, and when it does not</h2>
<p>This AI-based approach works especially well for:</p>
<ul>
<li><p>Logos, text overlays, and ending screens on relatively structured backgrounds.</p>
</li>
<li><p>Short-form content from CapCut, TikTok, and similar editors.</p>
</li>
<li><p>Repurposing vertical videos between platforms without platform branding.</p>
</li>
</ul>
<p>It struggles in edge cases such as:</p>
<ul>
<li><p>Watermarks that cover moving subjects for the entire duration of the clip.</p>
</li>
<li><p>Scenes with extremely dynamic lighting or reflections.</p>
</li>
<li><p>Very long, high-resolution videos where users expect near real-time processing.</p>
</li>
</ul>
<p>In those situations, the information contained in the video is simply not sufficient to reconstruct the true background. The goal becomes “hide the watermark in a way that is not distracting” rather than “perfectly restore the original scene”.</p>
<h2 id="heading-why-this-might-be-interesting-for-developers">Why this might be interesting for developers</h2>
<p>From the outside, “remove watermark from video” sounds like a small feature. Under the hood, it touches on several areas that many developers care about:</p>
<ul>
<li><p>Video processing under tight memory and latency budgets.</p>
</li>
<li><p>Optical-flow-guided models in production.</p>
</li>
<li><p>The difference between per-frame vision and temporal vision.</p>
</li>
<li><p>The trade-off between research-grade quality and product-grade reliability.</p>
</li>
</ul>
<p>If you are working with generative models, video editing, or just enjoy making systems that hold up under real-world usage, there are probably patterns here that you will recognize.</p>
<h2 id="heading-a-tool-for-real-users">A tool for real users</h2>
<p>We wrapped this pipeline in a simple web interface so non-technical users can benefit from it:</p>
<p>Upload a CapCut clip → process with AI → download a cleaned version.</p>
<p>Traditional blur/crop solutions often introduce flicker. Instead, AI reconstructs the hidden pixels for a clean result. You can try the process using our <a target="_blank" href="https://videowatermarkremove.com/remove-capcut-watermark">AI CapCut Watermark Remover online</a>.</p>
<h2 id="heading-faq">FAQ</h2>
<h3 id="heading-q-can-ai-really-remove-the-capcut-ending-watermark-without-blur-or-crop">Q: Can AI really remove the CapCut ending watermark without blur or crop?</h3>
<p>A: Yes. Our system reconstructs the background instead of just blurring or cropping the region, which avoids flicker and preserves framing.</p>
<h3 id="heading-q-is-there-a-free-way-to-try-this">Q: Is there a free way to try this?</h3>
<p>A: Yes. The online tool has a free tier suitable for typical short-form videos.</p>
<h3 id="heading-q-does-this-work-on-both-pc-and-phone">Q: Does this work on both PC and phone?</h3>
<p>A: Yes. It runs in the browser, so it works on desktop and mobile as long as you can upload a video.</p>
<h3 id="heading-q-will-the-video-lose-quality-after-removal">Q: Will the video lose quality after removal?</h3>
<p>A: The goal is to match the original texture and resolution as closely as possible. There may be minor differences in complex scenes, but the output is designed to look natural at normal playback speed.</p>
<h3 id="heading-q-can-it-remove-other-watermarks-not-just-capcut">Q: Can it remove other watermarks, not just CapCut?</h3>
<p>A: It works best on CapCut-style overlays, but the same approach can handle many other text or logo watermarks with similar properties.</p>
<h3 id="heading-q-is-there-a-free-way-to-remove-capcut-watermark-online"><strong>Q: Is there a free way to remove CapCut watermark online?</strong></h3>
<p>A: Yes. Many online tools offer a limited free tier.</p>
<p>You can upload your clip, let AI process it in the cloud, and download the clean result without installing apps.</p>
<p>Premium mode usually improves processing speed and output quality.</p>
<h3 id="heading-q-does-it-work-for-moving-or-animated-watermarks"><strong>Q: Does it work for moving or animated watermarks?</strong></h3>
<p>A: In most cases yes — if the watermark moves slightly or appears at the end of the video, AI can track motion using optical flow.</p>
<p>For fast or highly dynamic overlays, quality may vary, but the result still beats simple blur or crop.</p>
<h3 id="heading-q-can-ai-remove-other-text-or-logo-overlays"><strong>Q: Can AI remove other text or logo overlays?</strong></h3>
<p>A: Yes. The same technique can remove subtitles, static logos, captions, or unwanted text, as long as they don’t cover too large an area.</p>
<p>This makes the tool useful for ad creatives, UGC editing, education videos, and repurposing content across platforms.</p>
<h3 id="heading-q-how-long-does-it-take-to-process-a-video"><strong>Q: How long does it take to process a video?</strong></h3>
<p>A: Short clips typically finish in <strong>10–30 seconds</strong> depending on hardware load.</p>
<p>Longer videos are processed in segmented batches to avoid memory overflow while keeping temporal consistency across segment boundaries.</p>
<p>Before we wrap up—if you want to experience how this works in real videos, you can try our <a target="_blank" href="https://videowatermarkremove.com/remove-capcut-watermark"><strong>AI-powered CapCut watermark remover</strong></a>. No software download, no timeline masking, just upload your clip, let the model process it, and download a clean version with stable temporal quality. It works for CapCut ending logos, intro overlays, and even most moving text.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Watermark removal looks like a small UI detail, but solving it well requires motion estimation, temporal learning, generative restoration, and plenty of pragmatic engineering. By combining optical flow, temporal propagation, and inpainting, we reached a level of quality that is good enough for real creators, while still leaving room for improvement.</p>
<p>If you have ideas on improving temporal consistency, reducing latency further, or handling more complex overlays, I would be very interested in your thoughts.</p>
]]></content:encoded></item><item><title><![CDATA[Vibe Coding hit a wall: How I fixed $0.30/error OOMs and cut AI costs by 70%]]></title><description><![CDATA[I recently joined the "Vibe Coding" movement—letting AI write 90% of my code to ship fast. It felt like magic.
I "prompted" my way to building a Free Video Watermark Remover in a single weekend. No boilerplate, just vibes. I deployed it, shared it on...]]></description><link>https://blog.videowatermarkremove.com/vibe-coding-hit-a-wall-how-i-fixed-030error-ooms-and-cut-ai-costs-by-70</link><guid isPermaLink="true">https://blog.videowatermarkremove.com/vibe-coding-hit-a-wall-how-i-fixed-030error-ooms-and-cut-ai-costs-by-70</guid><category><![CDATA[#System Architectur]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[vibe coding]]></category><category><![CDATA[TypeScript]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[VS Code]]></category><category><![CDATA[Vue.js]]></category><dc:creator><![CDATA[Jason Lin]]></dc:creator><pubDate>Mon, 15 Dec 2025 10:51:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765795404053/0a719b74-a899-49ee-876f-44871480cdaa.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I recently joined the <strong>"Vibe Coding"</strong> movement—letting AI write 90% of my code to ship fast. It felt like magic.</p>
<p>I "prompted" my way to building a <a target="_blank" href="https://videowatermarkremove.com"><strong>Free Video Watermark Remover</strong></a> in a single weekend. No boilerplate, just vibes. I deployed it, shared it on <a target="_blank" href="http://Dev.to">Dev.to</a>, and traffic started pouring in.</p>
<p><strong>But then, the "Vibe Check" failed hard.</strong></p>
<p>The AI wrote functional code, but it didn't write <em>efficient</em> code. Reality hit me in the form of a server log.</p>
<h2 id="heading-the-disaster-paying-for-failure">The Disaster: Paying for Failure 💸</h2>
<p>I woke up to a sea of red logs: <code>RuntimeError: CUDA out of memory</code>.</p>
<p>Users were uploading raw <strong>4K, 60fps</strong> videos recorded on modern iPhones. These files were massive. When fed directly into the AI model (ProPainter on Replicate), the VRAM spiked, and the process crashed after about 4 minutes.</p>
<p>Here is the kicker with Serverless GPU providers like Replicate: <strong>You pay for the compute time, even if the task fails.</strong></p>
<p>I was paying <strong>$0.20 - $0.30</strong> for every crashed session. I was literally burning money to frustrate my users. The "Vibe" was gone.</p>
<h2 id="heading-the-naive-solution-vs-the-engineering-solution">The Naive Solution vs. The Engineering Solution</h2>
<p>My first instinct (and the AI's suggestion) was to "Vertical Scale": Just upgrade the hardware. Switch from an NVIDIA T4 to an A100 (40GB VRAM).</p>
<ul>
<li><strong>The problem:</strong> A100s are expensive. This would triple my costs per second. It doesn't solve the inefficiency; it just masks it with money.</li>
</ul>
<p>Then I realized: <strong>Does a TikTok reposter really need 4K 60fps?</strong> Most social media platforms compress videos to 1080p or 720p anyway. Processing 60 frames per second for watermark removal is a waste of compute—30fps is visually identical for this use case.</p>
<h2 id="heading-the-fix-the-cpu-middleware-pattern">The Fix: The CPU Middleware Pattern 🛡️</h2>
<p>Vibe coding gets you from 0 to 1, but <strong>System Architecture</strong> gets you from 1 to 100.</p>
<p>Instead of sending raw videos directly to the GPU, I introduced a <strong>CPU Pre-processing Layer</strong>. I already had a VPS on <strong>Hostinger</strong> (which I paid for annually). The CPU cycles there were effectively free (sunk cost).</p>
<h3 id="heading-the-new-architecture">The New Architecture:</h3>
<ol>
<li><p><strong>User Uploads</strong> -&gt; My Hostinger Server (CPU).</p>
</li>
<li><p><strong>Normalization</strong> -&gt; FFmpeg compresses video to <strong>720p @ 30fps</strong>.</p>
</li>
<li><p><strong>Inference</strong> -&gt; Optimized video sent to Replicate (GPU).</p>
</li>
<li><p><strong>Merge</strong> -&gt; AI output video + Original Audio merged back via FFmpeg.</p>
</li>
</ol>
<h3 id="heading-the-code-implementation">The Code Implementation</h3>
<p>Here is the Python logic I used to normalize the inputs before they ever touch the expensive GPU.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> subprocess

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">optimize_video_for_ai</span>(<span class="hljs-params">input_path, output_path</span>):</span>
    <span class="hljs-string">"""
    Standardizes video to 720p and 30fps to prevent OOM.
    Uses 'veryfast' preset to minimize CPU latency.
    """</span>
    command = [
        <span class="hljs-string">"ffmpeg"</span>, 
        <span class="hljs-string">"-i"</span>, input_path,
        <span class="hljs-string">"-vf"</span>, <span class="hljs-string">"scale='min(1280,iw)':-2"</span>,  <span class="hljs-comment"># Downscale to 720p max, keep aspect ratio</span>
        <span class="hljs-string">"-r"</span>, <span class="hljs-string">"30"</span>,                        <span class="hljs-comment"># Force 30fps</span>
        <span class="hljs-string">"-c:v"</span>, <span class="hljs-string">"libx264"</span>, 
        <span class="hljs-string">"-preset"</span>, <span class="hljs-string">"veryfast"</span>,             <span class="hljs-comment"># Prioritize speed</span>
        <span class="hljs-string">"-crf"</span>, <span class="hljs-string">"28"</span>,                      <span class="hljs-comment"># Slight compression to speed up upload</span>
        <span class="hljs-string">"-an"</span>,                             <span class="hljs-comment"># Remove audio (we merge original audio later)</span>
        <span class="hljs-string">"-y"</span>,
        output_path
    ]

    subprocess.run(command, check=<span class="hljs-literal">True</span>)
</code></pre>
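<p>Step 4 of the pipeline, merging the original audio back into the AI output, is a second small FFmpeg call (paths and stream mappings here are illustrative):</p>
<pre><code class="lang-python">import subprocess

def merge_original_audio(clean_video, original_video, output_path):
    """Copy the processed video stream and the untouched original
    audio stream into the final file, without re-encoding video."""
    command = [
        "ffmpeg",
        "-i", clean_video,      # output of the AI stage (audio was stripped)
        "-i", original_video,   # user's upload, used only for its audio
        "-map", "0:v:0",        # video from the first input
        "-map", "1:a:0?",       # audio from the second, if present
        "-c:v", "copy",         # no quality loss on the restored video
        "-c:a", "aac",
        "-shortest",
        "-y",
        output_path,
    ]
    subprocess.run(command, check=True)
</code></pre>
<p>With normalization in front of the GPU and a lossless merge behind it, the expensive inference step only ever sees bounded, predictable inputs.</p>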
]]></content:encoded></item></channel></rss>