
The Generative AI Playground - Part 5: Bye deepfakes, hello AI avatars

Daphné Vermeiren

One of the areas that we have not discussed in our Generative AI Playground series yet is the power of AI in video. We thought it was time to show what generative AI can do in this field, so we have been testing different use cases over the last few months. In this fifth article of our Generative AI Playground series, we’re going to dive into how we use deepfake technology to create interactive, hyperrealistic AI avatars.

New to our Generative AI Playground? We’ve already discussed report generation, function calling, multi-agent GPT systems and visual search in our previous parts.

Deepfake technology often gets a bad rap, as it's primarily associated with fake news and misinformation. However, the technology behind it can be very useful when applied to the right purposes. At Raccoons, we prefer to use its power for positive, valuable use cases. Enter AI avatars: virtual personas that can speak any language, deliver training, and even provide real-time personal assistance. The goal? To bridge the gap between realistic human interaction and the practical efficiency of AI. We're going beyond the chatbots we know today, restricted to a chat widget in a corner of a website somewhere, and bringing them to life by putting a human face on them.

Creating realistic AI avatars

Traditional avatars

In the past, there were two options for creating AI avatars. On the one hand, we had traditional methods of creating realistic avatars, like those seen in high-end games — think of FC24 player avatars, or even our Ainstein avatar. These kinds of avatars enabled us to have real-time interactions. However, the downside is that the development process of such an avatar is incredibly time-intensive. And while the result is impressive, it still obviously looks animated. On the other hand, we see that the results of deepfake technology have become quite realistic in the last few years. The downside here is the significant processing time it takes to generate even short video clips, making real-time interactions with deepfakes impractical.

Both approaches have upsides and downsides. However, if you know our team at Raccoons, you know that nothing is impossible. So, we looked for a way to combine the upsides of both and eliminate the downsides. And we found the perfect solution: a highly realistic, real-time interactive AI avatar.

HeyGen

With the rise of generative AI, many new tools emerged, and HeyGen immediately caught our attention. They're currently well-known for their high-quality AI avatars, which require only a few minutes of video input to generate unlimited video output. This technology already offers many possibilities. Take, for example, instructional videos. Traditionally, creating instructional videos in multiple languages required hiring actors fluent in each language, which was both time-consuming and costly, and any change to the script meant reshooting the entire video. With AI avatars, you can create and update these videos in 30 different languages with just a few clicks, ensuring consistency across languages while saving significant time and resources.
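To make this concrete, here's a rough sketch of what batch-generating the same instructional script in several languages could look like. The endpoint path, payload shape, and the avatar and voice IDs are assumptions for illustration, loosely modeled on HeyGen's public REST API, so check their API reference for the current contract.

```python
import requests

HEYGEN_API_KEY = "your-api-key"  # assumption: obtained from the HeyGen dashboard

# Hypothetical: one pre-translated script per target language.
SCRIPTS = {
    "en": "Welcome! Let me walk you through the safety procedure.",
    "nl": "Welkom! Ik leid je door de veiligheidsprocedure.",
    "fr": "Bienvenue ! Je vais vous guider à travers la procédure de sécurité.",
}

def generate_video(text: str) -> str:
    """Request one avatar video for the given script; returns a video id."""
    # Endpoint and payload shape are assumptions based on HeyGen's public docs;
    # verify against the current API reference before use.
    response = requests.post(
        "https://api.heygen.com/v2/video/generate",
        headers={"X-Api-Key": HEYGEN_API_KEY},
        json={
            "video_inputs": [{
                "character": {"type": "avatar", "avatar_id": "your-avatar-id"},
                "voice": {"type": "text", "input_text": text, "voice_id": "your-voice-id"},
            }],
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["data"]["video_id"]

# One recording session, many localized videos: a script change only
# means re-running this loop instead of reshooting anything.
video_ids = {lang: generate_video(text) for lang, text in SCRIPTS.items()}
```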

While this technology was already impressive, HeyGen recently released something even more groundbreaking: avatar streaming technology. The feature is still in beta, but it already opens up numerous applications, certainly when combined with our expertise in building LLM-powered knowledge assistants. Imagine an HR onboarding process where a new employee can interact with a realistic, knowledgeable AI avatar, or a CEO delivering a keynote at an event where the audience can ask questions in real-time.

Let’s experiment

To illustrate the power of real-time AI avatars, we decided to create the onboarding avatar mentioned above. The first step was developing a front-end: a system showcasing the streaming avatar in a user-friendly interface. There, the avatar remains in an idle state while awaiting input, moving naturally to simulate a human waiting. Users can then start asking questions (captured with OpenAI's Whisper speech-to-text model), which are processed by a large language model (LLM) connected to a knowledge database. The response is then uttered by the AI avatar almost instantly, with a response time similar to a human's: not too fast, not too slow.
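For the curious, a single turn of that loop looks roughly like the sketch below. The Whisper and chat completion calls follow OpenAI's Python SDK; speak_through_avatar() is a hypothetical stand-in for sending a speak task to a HeyGen streaming session, and the knowledge database is simplified here to a single context string.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def transcribe(audio_path: str) -> str:
    """Speech-to-text via OpenAI's Whisper API."""
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )
    return result.text

def answer(question: str, knowledge: str) -> str:
    """LLM response grounded in a (simplified) knowledge-base snippet."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"You are an HR onboarding assistant. Context:\n{knowledge}"},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content

def speak_through_avatar(text: str) -> None:
    """Hypothetical stand-in for a speak task on a HeyGen streaming session."""
    ...

# One turn of the conversation loop:
question = transcribe("user_question.wav")
reply = answer(question, knowledge="First-day checklist: badge pickup at reception, ...")
speak_through_avatar(reply)
```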

We did, however, encounter two challenges in this experiment:

  1. Beta technology: As we already mentioned, HeyGen's streaming avatars are still in beta. The company is still refining the technology to make interactions smoother, and we're expecting a major update in June that will increase server capacity and enhance performance.
  2. Response time: While HeyGen's speech generation is almost instant, LLMs can take a moment to formulate a response. To manage this, we incorporate predefined phrases like “Hmm, let me think…” to maintain a natural conversation flow (see the sketch after this list). Additionally, newer models like OpenAI's GPT-4o or Google's Gemini 1.5 Pro/Flash promise faster processing times, potentially reducing delays significantly.
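Here's a minimal sketch of that trick, reusing the hypothetical answer() and speak_through_avatar() helpers from the previous snippet: the filler phrase is dispatched to the avatar immediately, so it's already talking while the slower LLM call runs.

```python
import random

# Canned fillers that buy time while the LLM formulates its answer.
FILLERS = [
    "Hmm, let me think...",
    "Good question, give me a second...",
]

def respond(question: str, knowledge: str) -> None:
    """Answer a question while keeping the conversation flowing naturally."""
    # Dispatching a speak task returns quickly, and the avatar starts
    # talking on HeyGen's side while the LLM call below is still running.
    speak_through_avatar(random.choice(FILLERS))

    # The actual answer: this is the slow part being masked.
    reply = answer(question, knowledge)
    speak_through_avatar(reply)
```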

Despite these hurdles, we achieved a strong end result. Our onboarding avatar can now guide new hires through their first day, answering questions and providing information more engagingly than, for example, an intranet portal with FAQs. This could help employees acclimate more quickly and enable them to ask any question without hesitation or obstacles.

Beyond the experiment: Endless use cases

The potential of AI avatars extends beyond our onboarding experiment. We already see many possibilities in various businesses and industries, including:

  • Event interactions: A virtual CEO at a corporate event can answer real-time questions from the audience, providing a dynamic and innovative experience. A real conversation starter, to say the least.
  • Training simulations: AI avatars can simulate realistic role-play scenarios for sales training, medical consultations, and more — enhancing learning and preparedness. This can significantly improve the quality of training by providing a safe space to practice difficult conversations and skills.
  • Customer service: AI avatars can handle customer inquiries in a more personalized and engaging manner, providing a human touch to automated responses. This goes beyond the chat widget we know (and love or hate) and adds a new dimension to conversational, intelligent agents.
  • E-learning and education: AI avatars can serve as virtual tutors, offering real-time explanations and interactive lessons to students, making learning more engaging and effective.
  • Retail and e-commerce: Virtual shopping assistants can help customers find products, answer questions, and provide recommendations, enhancing the online shopping experience.
  • Entertainment and media: AI avatars can host virtual events, interviews, or shows, providing an engaging experience for audiences without the need for physical presence. This can expand the reach of entertainment and media content, making it accessible to a wider audience.

This list is just the tip of the iceberg. The possibilities are endless, limited only by your imagination and business needs. By partnering with experts in generative AI, you can quickly integrate AI avatars into your existing systems. And if you're wondering about ethical and legal implications, we can assure you that HeyGen's technology is protected by a range of security measures: you cannot create an avatar without a person's explicit permission.

What’s next?

Our experience with HeyGen and AI avatars in the Generative AI Playground has been eye-opening. The blend of deepfake realism and real-time interactivity offers endless possibilities across various sectors. And with that, we're wrapping up the fifth article in our Generative AI Playground series. But never fear: we're already working on our next experiment!
