Generative AI is coming for videos. A new website, QuickVid, combines several generative AI systems into a single tool for automatically creating short-form YouTube, Instagram, TikTok and Snapchat videos. Given as little as a single word, QuickVid chooses a background video from a library, writes a script and keywords, overlays images generated by DALL-E 2, and adds a synthetic voiceover and background music from YouTube’s royalty-free music library.
QuickVid creator Daniel Habib says he’s building the service to help creators meet the “ever-increasing” demand from their fans.
“By providing creators with tools to produce great content quickly and easily, QuickVid helps creators increase their content production, reducing the risk of burnout,” Habib told TechCrunch in an email interview. “Our goal is to empower your favorite creator to meet the demands of their audience by leveraging advancements in AI.”
But depending on how they’re used, tools like QuickVid threaten to flood channels already clogged with spam and duplicate content. They also face potential backlash from creators who choose not to use the tools, either because of the cost ($10 per month) or on principle, yet might have to compete with a wave of new AI-generated videos.
Going after video
QuickVid, which Habib, a self-taught developer who previously worked at Meta on Facebook Live and video infrastructure, built in a matter of weeks, launched on December 27. It’s relatively simple at the moment – Habib says more customization options will arrive in January – but QuickVid can put together the components that make up a typical informative short YouTube or TikTok video, including captions and even avatars.
It’s easy to use. First, a user enters a prompt describing the subject of the video they want to create. QuickVid uses the prompt to generate a script, taking advantage of GPT-3’s generative text powers. Using keywords automatically extracted from the script or entered manually, QuickVid selects a background video from the Pexels royalty-free media library and generates overlay images using DALL-E 2. It then produces a voiceover via the Google Cloud text-to-speech API – Habib says users will soon be able to clone their voices – before combining all of these elements into a video.
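As a rough illustration of the flow described above, the steps could be chained as in the Python sketch below. This is not QuickVid’s code: every function, class and field name is hypothetical, and each stage is a stand-in stub that returns a descriptive string where the real service would call GPT-3, Pexels, DALL-E 2 or Google Cloud text-to-speech.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VideoPlan:
    """The pieces of one short video (hypothetical structure)."""
    prompt: str
    script: str
    keywords: List[str]
    background_clip: str
    overlay_images: List[str] = field(default_factory=list)
    voiceover: str = ""


def write_script(prompt: str) -> str:
    # Stand-in for a text-generation call (GPT-3 in the article).
    return f"Here are three things you may not know about {prompt}."


def extract_keywords(script: str, limit: int = 3) -> List[str]:
    # Naive stand-in: keep the first few distinct words longer than four letters.
    seen: List[str] = []
    for word in script.lower().replace(".", "").split():
        if len(word) > 4 and word not in seen:
            seen.append(word)
    return seen[:limit]


def pick_background(keywords: List[str]) -> str:
    # Stand-in for a stock-video library search (Pexels in the article).
    return f"background clip for: {', '.join(keywords)}"


def generate_overlays(keywords: List[str]) -> List[str]:
    # Stand-in for image generation (DALL-E 2 in the article), one per keyword.
    return [f"overlay image: {kw}" for kw in keywords]


def synthesize_voiceover(script: str) -> str:
    # Stand-in for a text-to-speech call.
    return f"voiceover audio ({len(script.split())} words)"


def build_video(prompt: str, manual_keywords: Optional[List[str]] = None) -> VideoPlan:
    """Run the stages in the order the article describes."""
    script = write_script(prompt)
    # Keywords are either entered manually or extracted from the script.
    keywords = manual_keywords or extract_keywords(script)
    return VideoPlan(
        prompt=prompt,
        script=script,
        keywords=keywords,
        background_clip=pick_background(keywords),
        overlay_images=generate_overlays(keywords),
        voiceover=synthesize_voiceover(script),
    )


if __name__ == "__main__":
    plan = build_video("cats", manual_keywords=["cats", "kittens"])
    print(plan.script)
    print(plan.background_clip)
    print(plan.overlay_images)
```

The only structural point the sketch is meant to capture is the ordering: the script comes first, the keywords derived from it drive both the background search and the overlay images, and the voiceover is produced from the script before everything is combined.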
See this video made with the “Cats” prompt:
Or this one:
QuickVid certainly isn’t pushing the boundaries of what’s possible with generative AI. Both Meta and Google have showcased AI systems capable of generating completely original clips from a text prompt. But QuickVid combines existing AI systems to exploit the repetitive, templated format of b-roll-heavy short videos, sidestepping the problem of having to generate the footage itself.
“Successful creators have an extremely high quality bar and aren’t interested in posting content that they don’t think is in their own voice,” Habib said. “That’s the use case we’re focused on.”
That supposedly being the case, QuickVid’s videos are a mixed bag in terms of quality. Background videos tend to be a bit haphazard or only tangentially related to the topic, which isn’t surprising given that QuickVid is currently limited to the Pexels catalog. The images generated by DALL-E 2, meanwhile, exhibit the limitations of today’s text-to-image technology, such as garbled text and odd proportions.
In response to my feedback, Habib said that QuickVid is being “tested and modified daily.”
According to Habib, QuickVid users retain the right to commercially use the content they create and have permission to monetize it on platforms like YouTube. But the copyright status around AI-generated content is… nebulous, at least currently. The United States Patent and Trademark Office (USPTO) recently moved to revoke copyright protection for an AI-generated comic book, for example, saying that copyrightable works require human authorship.
Asked how the USPTO’s ruling might affect QuickVid, Habib said he believes it only concerns the “patentability” of AI-generated products and not creators’ rights to use and monetize their content. Creators, he pointed out, don’t often file patents for videos and generally lean into the creator economy, letting other creators reuse their clips to increase their own reach.
“Creators care about delivering high-quality content in their voice that will help grow their channel,” Habib said.
Another legal challenge on the horizon could affect QuickVid’s DALL-E 2 integration – and, by extension, the site’s ability to generate image overlays. Microsoft, GitHub, and OpenAI are being sued in a class action lawsuit that accuses them of violating copyright law by allowing Copilot, a code-generating system, to regurgitate sections of licensed code without providing credit. (Copilot was co-developed by OpenAI and GitHub, which Microsoft owns.) The case has implications for generative art AI systems like DALL-E 2, which have also been found to copy and paste from the datasets on which they were trained (i.e., images).
Habib is not concerned, arguing that the generative AI genie is out of the bottle. “If another lawsuit comes along and OpenAI goes away tomorrow, there are several alternatives that could power QuickVid,” he said, referring to the DALL-E 2-like open source system, Stable Diffusion. QuickVid is already testing Stable Diffusion to generate avatar photos.
Moderation and Spam
Legal dilemmas aside, QuickVid may soon have a moderation issue on its hands. While OpenAI has filters and techniques in place to prevent them, generative AI has well-known problems with toxicity and factual accuracy. GPT-3 spouts misinformation, particularly about recent events that fall outside the limits of its knowledge base. And ChatGPT, a fine-tuned offspring of GPT-3, has been shown to use sexist and racist language.
This is especially concerning for people who would use QuickVid to create informational videos. In a quick test, I asked my partner – who is much more creative than me, especially in this area – to enter some offensive prompts to see what QuickVid would generate. To QuickVid’s credit, obviously problematic prompts like “Jewish New World Order” and “9/11 Conspiracy Theory” didn’t produce toxic scripts. But for “Critical Race Theory Indoctrinating Students,” QuickVid generated a video implying that Critical Race Theory could be used to brainwash school children.
Habib says he relies on OpenAI’s filters to do most of the moderation work and that it’s up to users to manually review every video created by QuickVid to make sure “everything is in the limits of the law.”
“Generally, I think people should be able to express themselves and create whatever content they want,” Habib said.
This apparently includes spammy content. Habib argues that video platforms’ algorithms, not QuickVid, are best placed to determine the quality of a video, and that people who produce low-quality content “only hurt their own reputation.” The reputational damage will naturally discourage people from creating mass spam campaigns with QuickVid, he says.
“If people don’t want to watch your video, you won’t get distribution on platforms like YouTube,” he added. “Producing low-quality content will also cause people to view your channel in a negative light.”
But it’s instructive to look at ad agencies like Fractl, which in 2019 used an AI system called Grover to generate an entire site of marketing materials – reputation be damned. In an interview with The Verge, Fractl partner Kristin Tynski said she foresees generative AI enabling “a massive tsunami of computer-generated content in every niche imaginable.”
Either way, video-sharing platforms like TikTok and YouTube haven’t yet had to moderate AI-generated content on a massive scale. Deepfakes – synthetic videos that replace an existing person with the likeness of someone else – began populating platforms like YouTube several years ago, thanks to tools that made it easier to produce deepfake footage. But unlike even today’s most convincing deepfakes, the types of videos QuickVid creates aren’t obviously AI-generated.
Google Search’s policy on AI-generated text could be a preview of what’s to come in the video realm. Google doesn’t treat synthetic text any differently than human-written text when it comes to search rankings, but does take action on content “intended to manipulate search rankings and not help users.” This includes content assembled or combined from different web pages that “[doesn’t] add sufficient value” as well as content generated by purely automated processes, both of which may apply to QuickVid.
In other words, AI-generated videos might not be banned from platforms outright if they take off in a major way, but might simply become a cost of doing business. That is unlikely to allay the fears of experts who believe platforms like TikTok are becoming a new hotbed for misleading videos, but – as Habib said in the interview – “there’s no stopping the generative AI revolution.”