Deep-Dive into AI Video Creation

Intro

Introducing our new startup, Automagical!

Several months ago, we started a company… Last week, we launched our first product… Tomorrow, we’ll be taking over your news feed…

And today, we’d like to pause for a moment to introduce you to our new company, Automagical, and explain some of the reasoning behind what we’ve been building. We believe this type of transparency surrounding our motivations and product roadmap will only help to improve our company’s trajectory while making sure our assumptions are aligned with the needs of the market.

To that end, this post starts the same way our company did: focused on a single, well-defined problem that we are uniquely positioned to solve. We then discuss the minimum viable roadmap for solving that problem and conclude with some insight into where our roadmap is headed in the future.

The Problem

There is a disconnect between how important video content is to modern marketing and how difficult and expensive it is to produce.

Customers are 4x more likely to watch a video about a product than read about it.

This isn’t news to professional marketers.

They’ve known for quite a while that marketing videos are significantly more engaging and compelling than traditional forms of marketing, but creating customized video content remains an expensive and time-consuming process.

With the rise of video on social media, this disconnect has only grown wider, with smart brands realizing they need to constantly create new, engaging content to survive.

So, if we know that video content is so important, how can we make it so that video production is simple, fun, and cost-effective, while keeping the production quality bar high?

Introducing Automagical

This problem of simple, accessible video creation is the crux of why we started Automagical, and we’d like to break down our thought process for our unique approach to solving this problem at scale.

Automagical uses AI to allow you to quickly turn blog posts into engaging marketing videos.

Why blog posts?

Producing a video from scratch is daunting when you’re faced with a blank canvas. For most of us, it means hiring an outside team that specializes in video production to help refine our content into a storyboard and then produce a video based on that storyboard.

The process of writing a blog post, on the other hand, is a much more common and approachable task. Many companies have their own blog, having already recognized the importance of writing fresh content to reach new audiences and keep existing customers engaged and up-to-date.

By constraining our video creation process to start from existing blog posts and articles, we remove much of the conceptual difficulty of getting started on a video while working within a medium that marketers are already comfortable with. On top of that, companies with mature blogs can immediately breathe new life into old content with fresh, engaging videos, without the hassle of creating original content from scratch.

So, what about AI?

Now that we’ve narrowed our problem somewhat from “how to easily create videos” to “how to convert blog posts to videos”, we still face a very difficult task in general. With AI being all the rage these days, you might think we could just throw some AI at the problem and be done with it, but in practice it’s not so simple.

While deep learning has expanded AI’s capabilities by leaps and bounds in the past few years, it remains best suited to tackling highly constrained, well-defined problems, and our “convert blog posts to videos” goal is a bit too general.

I Am Devloper

@iamdevloper

[at a conference] - who's working on ML or AI right now? *lots of hands go up* - any of it in production? *lots of hands go down*

11:54 AM · Jul 31, 2017

Continuing our thought experiment, we can simplify things a bit by breaking up the blog post’s story into a sequence of scenes, where each scene represents one bullet point in a summary of the source article. Article summarization is then our first targeted use of AI, and it allows us to efficiently convey the most relevant pieces of information from the source article in bite-sized chunks that fit well into a video format.

Source article broken down into a sequence of scenes comprising a Storyboard.

Now that we have a sequence of scenes breaking down the story into key points, we’d ideally like to add some representative visuals to each scene in order to take full advantage of the graphic video format. This brings us to the second major area where AI can help: using NLP and topic clustering to extract keywords and entities from each scene. By combining these topics and keywords with a powerful, multi-provider search across millions of royalty-free stock photos, GIFs, and videos, we can add diverse, topically relevant background visuals to each scene.
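To make the keyword-plus-search idea a bit more concrete, here is a minimal Python sketch. It is not our production pipeline: plain word-frequency counting stands in for the NLP and topic clustering described above, and `provider_a` and `provider_b` are hypothetical stand-ins for real stock-media clients. Results are interleaved across providers so that no single source dominates a scene's suggestions.

```python
import re
from collections import Counter
from itertools import zip_longest

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
             "from", "in", "is", "it", "of", "on", "or", "that", "the",
             "this", "to", "with"}

def extract_keywords(scene_text, top_n=3):
    """Return the most frequent non-stopword terms in a scene's text."""
    words = re.findall(r"[a-z']+", scene_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

def search_all(providers, keywords, limit=6):
    """Query every provider, then interleave results round-robin."""
    query = " ".join(keywords)
    result_lists = [provider(query) for provider in providers]
    merged, seen = [], set()
    for group in zip_longest(*result_lists):
        for item in group:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:limit]

# Hypothetical providers: each returns a list of fake media URLs.
def provider_a(query):
    return [f"a://{query.replace(' ', '-')}/{i}" for i in range(3)]

def provider_b(query):
    return [f"b://{query.replace(' ', '-')}/{i}" for i in range(2)]

scene = "Marketing videos engage customers, but custom videos remain expensive to produce."
keywords = extract_keywords(scene)
media = search_all([provider_a, provider_b], keywords)
```

Swapping `extract_keywords` for a proper entity extractor or topic model would leave `search_all` unchanged, which is the main reason to keep the two steps separate.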

At this point, we have generated a sequence of scenes from a blog post, with each scene representing a key point and containing a topically relevant background image or video. Add in some fancy transitions, text effects, audio, and some personalized branding, and you have yourself a pretty solid video summarizing your source content without having done any real work.

So what does all of this actually look like?

The video above is a perfect example of this format at work, and it took less than 10 minutes to put together. You can see all of the ideas we’ve discussed so far, with each distinct scene carrying its own message and visuals punctuated by some subtle transitions and effects. You may also notice customized branding on the outro scene and a professional-quality soundtrack sourced from musicformakers.com.

We believe that this format passes a reasonable quality bar while requiring minimal manual editing, and that it represents a major step towards solving for our target use case. That being said, it’s not perfect yet, and we’re working hard to improve many aspects of the production process as well as the quality and diversity of the video output itself.

So far, we’ve laid out the need to solve video creation at scale and our thought process on how best to achieve it. Let’s now turn our attention to Automagical’s solution to this problem, where our product roadmap currently stands, and the areas we’re hoping to improve in the future.

Automagical Product Roadmap

Starting a company is not easy. Anyone who’s ever founded a company will tell you that. Bootstrapping a startup from initial idea to MVP and getting your first revenue is even harder. I am very proud to say we’ve accomplished just that, and I want to give a shout-out to our world-class team for all the hard work that got us this far. It is, however, just the beginning…

We’d like to break down some of our thoughts on the features we chose to focus on for our recent public launch and discuss our product roadmap transparently in the hopes that we will receive better feedback and be able to adjust our future roadmap accordingly.

Screenshot of our storyboard editor after automatic scene extraction.

v1.0

The goal of our initial launch was simple: make it intuitive to turn an existing blog post into a video that passes a reasonable quality threshold with minimal human intervention. Here are some of the features that made it into this release:

Extractive text summarization — the source article is parsed and summarized using an algorithm based on tf-idf (see this article for a great summary of text summarization, no pun intended…).

Royalty-free stock image / GIF / video search aggregated from eight of the largest stock providers, giving you a free library of tens of millions of graphics.

Professional-quality, royalty-free library of audio tracks.

Ability to produce videos in the three most common aspect ratios: landscape / 16:9, square / 1:1, and portrait / 9:16, each of which is suited to different platforms.

Ability to change theme settings such as font, text color, background gradients, etc.

Ken Burns panning / zooming effect for background images.

Intuitive, modern editor with a focus on simple UX.
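For readers curious about the extractive summarization item in the list above, here is a minimal Python sketch of tf-idf sentence scoring. It is an illustration under our own simplifying assumptions, not the exact algorithm we ship: each sentence is treated as its own "document" when computing idf, the sentence splitter is naive, and the function names are made up for this example.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return re.findall(r"[a-z0-9']+", sentence.lower())

def summarize(text, num_scenes=3):
    """Pick the highest-scoring sentences by tf-idf, kept in source order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: how many sentences contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sum of (term frequency) * (inverse document frequency) per term.
        scores.append(sum((count / len(tokens)) * math.log(n / df[term])
                          for term, count in tf.items()))

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_scenes]
    return [sentences[i] for i in sorted(top)]

article = ("Video is central to modern marketing. Video is also expensive to produce. "
           "Blog posts are cheap to write. Most companies already keep a blog.")
for scene in summarize(article, num_scenes=2):
    print(scene)
```

Because the selected sentences are copied verbatim and returned in their original order, each one can map directly onto a scene in the storyboard.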

Screenshot of our editor after changing aspect ratio to square with suggested background images displayed on the left.
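The aspect-ratio and Ken Burns features mentioned above can be sketched together in a few lines of Python. This is only an illustration: the 1080-pixel short side, the 1.2x end zoom, and all function names are assumptions for this example rather than our renderer's actual settings. The idea is to find the largest crop of the background image that matches the target aspect ratio, then shrink and shift that crop slightly on each frame to produce a slow zoom and pan.

```python
ASPECT_RATIOS = {"landscape": (16, 9), "square": (1, 1), "portrait": (9, 16)}

def output_size(aspect, short_side=1080):
    """Pixel dimensions for an aspect ratio, fixing the shorter side."""
    w, h = ASPECT_RATIOS[aspect]
    scale = short_side / min(w, h)
    return round(w * scale), round(h * scale)

def largest_crop(img_w, img_h, aspect):
    """Size of the largest crop with the given aspect ratio that fits the image."""
    w, h = ASPECT_RATIOS[aspect]
    if img_w * h >= img_h * w:  # image is wider than the target: height-limited
        return img_h * w / h, float(img_h)
    return float(img_w), img_w * h / w  # otherwise width-limited

def ken_burns_frames(img_w, img_h, aspect, num_frames, end_zoom=1.2, pan_x=0.0):
    """Return (left, top, width, height) crop boxes, one per frame.

    The crop shrinks linearly from 1x to end_zoom (a zoom in). pan_x in
    [-1, 1] slides the crop horizontally across the available slack;
    0 keeps it centered.
    """
    base_w, base_h = largest_crop(img_w, img_h, aspect)
    boxes = []
    for i in range(num_frames):
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        zoom = 1.0 + (end_zoom - 1.0) * t
        crop_w, crop_h = base_w / zoom, base_h / zoom
        left = (img_w - crop_w) * (0.5 + pan_x * (t - 0.5))
        top = (img_h - crop_h) * 0.5
        boxes.append((left, top, crop_w, crop_h))
    return boxes

# A 4000x3000 photo rendered as a square scene over 3 frames.
boxes = ken_burns_frames(4000, 3000, "square", num_frames=3)
```

Each crop box would then be scaled to `output_size(aspect)` when the frame is rendered, so the same source image can serve all three formats.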

Near-Term Roadmap

While we are quite happy with our v1.0 release, there are some areas we know we need to improve. The two most common areas of feedback so far are:

Improving the automated storyboard generation in order to minimize the need for human editing.

Improving the quality and diversity of generated videos.

We have a lot of ideas and projects in the works to improve these aspects of our product, and with these key points in mind, here’s a sneak peek at some of the areas we’re most excited about!

Abstractive text summarization — Currently, our article summarization uses what’s known as extractive summaries, whereby key sentences are extracted verbatim from the source and used for scenes. This tends to result in verbose scenes that don’t fit as well into the soundbite video format. Abstractive text summarization is a more advanced deep learning approach that will allow us to summarize an article’s text even further while retaining the essence of the text’s meaning. For more info on state-of-the-art abstractive text summarization, please check out Google, Facebook, Rush et al. 2015, and Vaswani et al. 2017.

Smarter media selection — Our AI sometimes selects background media for a scene that it thinks will be relevant but that ends up looking out of place to a human. Ensuring a given image or video is as relevant as possible to the source story and target scene is a difficult machine learning problem, but it’s also one that has received a decent amount of related research in academia. We are working alongside some of the world’s leading experts in AI to incorporate more intelligence into this process, and we’re also very excited to start experimenting with Google’s Beta Cloud Vision API.

More thematic diversity and higher quality visuals — This one should be fairly obvious and there is a lot of low-hanging fruit here to explore. We’ll be gradually adding more text effects, transitions, filters, and high-level video theme types in the coming months. This means that you’ll have more customization options and higher visual fidelity from the same base storyboard format without any additional work on your end.

Built-in support for CTA and metrics — We know marketers care about ROI and providing measurable calls-to-action, so expect to see integrated support for these included in our product offering soon.

Check Out Automagical Today!

We’re a fairly young startup, but our founders are experienced entrepreneurs, and we’ve hit the ground running. If you’re interested in joining us on the cutting edge of video marketing, check out our free demo at automagical.ai.

Are you an engineer, designer, or marketer who’s interested in what we’re building? We’re also hiring… Hit us up at info@automagical.ai and let us know what sparked your interest!

Editor's note: Automagical was acquired by Verblio in Fall 2018!

👉 Follow me on Twitter for more vibes: @transitive_bs