One of the areas that we have not discussed in our Generative AI Playground series yet is the power of AI in video. We thought it was time to show what generative AI can do in this field, so we have been testing different use cases over the last few months. In this fifth article of our Generative AI Playground series, we’re going to dive into how we use deepfake technology to create interactive, hyperrealistic AI avatars.
New to our Generative AI Playground? We’ve already discussed report generation, function calling, multi-agent GPT systems and visual search in our previous parts.
Deepfake technology often gets a bad rap, as it's primarily associated with fake news and misinformation. However, the technology behind it can be very useful when applied for the right purposes. At Raccoons, we prefer to use its power for positive, valuable use cases. Enter AI avatars: virtual personas that can speak any language, deliver training, and even provide real-time personal assistance. The goal? To bridge the gap between realistic human interaction and the practical efficiency of AI. We're going beyond the chatbots we know today, restricted to a chat widget in a corner of a website, and bringing them to life by putting a human face on them.
In the past, there were two options for creating AI avatars. On the one hand, we had traditional methods of creating realistic avatars, like those seen in high-end games — think of FC24 player avatars, or even our Ainstein avatar. These kinds of avatars enabled us to have real-time interactions. However, the downside is that the development process of such an avatar is incredibly time-intensive. And while the result is impressive, it still obviously looks animated. On the other hand, we see that the results of deepfake technology have become quite realistic in the last few years. The downside here is the significant processing time it takes to generate even short video clips, making real-time interactions with deepfakes impractical.
Both approaches have upsides and downsides. But if you know our team at Raccoons, you know nothing is impossible. So we looked for a way to combine the strengths of both and eliminate the weaknesses. And we found the perfect solution: a highly realistic, real-time interactive AI avatar.
With the rise of generative AI, many new tools emerged, and HeyGen immediately caught our attention. They're currently well known for their high-quality AI avatars, which require only a few minutes of video input to generate unlimited video output. This technology already offers many possibilities. Take instruction videos, for example. Traditionally, creating instructional videos in multiple languages required hiring actors fluent in each language, which was both time-consuming and costly, and any change to the script meant reshooting the entire video. With AI avatars, you can create and update these videos in 30 different languages with just a few clicks, keeping content consistent and up-to-date across all languages while saving significant time and resources.
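To make that concrete, here's a minimal sketch of what such a batch workflow could look like. HeyGen does expose a video-generation API, but the endpoint path, payload fields, and IDs below are illustrative assumptions rather than a confirmed contract; check HeyGen's own API documentation before building on this.

```typescript
// Hypothetical batch workflow: one instruction video per language.
// Endpoint, payload shape, and IDs are assumptions for illustration.
const HEYGEN_API_KEY = process.env.HEYGEN_API_KEY!;

const SCRIPTS: Record<string, string> = {
  en: "Welcome to the machine safety training.",
  nl: "Welkom bij de machineveiligheidstraining.",
  fr: "Bienvenue à la formation sur la sécurité des machines.",
};

async function generateVideo(lang: string, script: string): Promise<string> {
  const response = await fetch("https://api.heygen.com/v2/video/generate", {
    method: "POST",
    headers: { "X-Api-Key": HEYGEN_API_KEY, "Content-Type": "application/json" },
    body: JSON.stringify({
      video_inputs: [
        {
          character: { type: "avatar", avatar_id: "your-avatar-id" }, // placeholder ID
          voice: { type: "text", input_text: script, voice_id: `voice-${lang}` }, // placeholder per-language voice
        },
      ],
    }),
  });
  const data = await response.json();
  return data.data.video_id; // poll this ID until rendering finishes
}

// A script change becomes a loop over languages instead of a round of reshoots.
for (const [lang, script] of Object.entries(SCRIPTS)) {
  console.log(`Queued ${lang}:`, await generateVideo(lang, script));
}
```

The exact API shape matters less than the workflow it enables: updating a script means re-running a loop, not re-hiring actors.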
While this was already impressive, HeyGen recently released something even more groundbreaking: avatar streaming technology. The feature is still in beta, but it already opens up numerous applications, especially when combined with our expertise in building LLM-powered knowledge assistants. Imagine an HR onboarding process where a new employee can interact with a realistic, knowledgeable AI avatar, or a CEO delivering a keynote at an event where the audience can ask questions in real time.
To illustrate the power of real-time AI avatars, we decided to build the onboarding avatar mentioned above. The first step was developing a front end that presents the streaming avatar in a user-friendly interface. There, the avatar sits in an idle state awaiting input, moving naturally to simulate a human waiting. Users can then ask questions, which are transcribed with OpenAI's Whisper speech-to-text model and processed by a large language model (LLM) connected to a knowledge base. The LLM's answer is then spoken by the avatar almost instantly, with a response time similar to a human's. Not too fast, not too slow.
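For the curious, here's a minimal sketch of that question-answer loop. It uses the official OpenAI Node SDK for Whisper transcription and chat completions; `retrieveContext` and `sendToAvatar` are hypothetical placeholders standing in for our knowledge-base lookup and HeyGen's streaming-avatar session, which are out of scope here.

```typescript
// Minimal sketch of the avatar's question-answer loop.
// retrieveContext() and sendToAvatar() are hypothetical stand-ins.
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function retrieveContext(question: string): Promise<string> {
  // Placeholder: in practice, a vector search over onboarding documents.
  return `Relevant onboarding facts for: ${question}`;
}

async function sendToAvatar(text: string): Promise<void> {
  // Placeholder: in practice, this pushes text into the HeyGen streaming session.
  console.log("Avatar says:", text);
}

async function handleQuestion(audioPath: string): Promise<void> {
  // 1. Speech to text with Whisper.
  const transcript = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
  });

  // 2. Ground the answer in the knowledge base, then ask the LLM.
  const context = await retrieveContext(transcript.text);
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: `You are an HR onboarding assistant. Use this context:\n${context}` },
      { role: "user", content: transcript.text },
    ],
  });

  // 3. Have the streaming avatar speak the answer.
  await sendToAvatar(completion.choices[0].message.content ?? "");
}

await handleQuestion("question.wav");
```

Keeping steps 1 through 3 in a single async handler is what keeps the round trip short enough to feel like a human response time.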
We did, however, encounter two challenges in this experiment:
Despite these hurdles, we achieved a strong end result. Our onboarding avatar can now guide new hires through their first day, answering questions and providing information more engagingly than, say, an intranet portal with FAQs. This could help employees acclimate more quickly and let them ask any question without hesitation.
The potential of AI avatars extends beyond our onboarding experiment. We already see many possibilities in various businesses and industries, including:
This list is just the tip of the iceberg. The possibilities are endless, limited only by your imagination and business needs. By partnering with experts in generative AI, you can quickly integrate AI avatars into your existing systems. And if you're wondering about the ethical and legal implications: HeyGen's technology is protected by a range of security measures, and you cannot create an avatar of someone without their explicit consent.
Our experience with HeyGen and AI avatars in the Generative AI Playground has been eye-opening. The blend of deepfake realism and real-time interactivity offers endless possibilities across various sectors. With that, we're wrapping up the fifth article in our Generative AI Playground series. But never fear: we're already working on our next experiment!
Written by Daphné Vermeiren