The challenge
For one of our clients, we built a presentation assistant that uses artificial intelligence techniques to evaluate training videos and give targeted feedback. Could this be the end of dull speeches and sleep-inducing videos?
Some time ago, we were invited to a meeting with an international chemicals group that showed a strong interest in the possibilities of artificial intelligence. They were, however, not sure what they could do with it. As a service company, we are used to demonstrating the endless possibilities with a sales presentation. What came next, however, surprised us: “Why don’t you build something to convince us?” And so we did.
This shoot-first-ask-questions-later approach is, in our opinion, exactly how companies should deal with innovative technologies. At Raccoons, proving the value of new technologies in a cost- and time-effective manner is a challenge we love to tackle. After some brief introductions, we started thinking. For chemical companies, there are plenty of opportunities, such as using predictive analytics to improve quality and reduce downtime on production lines.
We went a little further... a little more out of the box. Eventually, we landed on an application that allowed us to demonstrate multiple domains of AI in a single project, without having to scour the client’s systems for relevant training data.
An international chemical company has to comply with a lot of rules and safety regulations. New employees get up to speed with these regulations by watching multiple instruction videos. How much attention they pay could mean the difference between life and death. So those videos mustn’t be too sleep-inducing… which meant we needed a judge. A presentation assistant.
To rate our client’s videos, we built a presentation assistant that can evaluate a speech and give targeted feedback. Evaluating a speech means looking not only at its content, but also at how the speaker delivers it. Therefore, we needed algorithms that can process images, audio, and text.
In the end, we designed a web-based application in which users can upload a video of anybody giving a speech, and the assistant gives them feedback in the form of scores a human can interpret.
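Conceptually, the evaluation boils down to one analyzer per modality, with the individual scores merged into that final feedback. Here is a minimal sketch of the structure; the function names, placeholder scores, and simple averaging are all illustrative assumptions, not our production code:

```python
# Sketch of the overall structure: one analyzer per modality,
# combined into a single report. All names here are illustrative.

def analyze_frames(frames: list) -> float:
    """Score visual delivery (facial expression, posture), 0..1."""
    return 0.8  # placeholder for an image model

def analyze_audio(waveform: bytes) -> float:
    """Score vocal delivery (intonation, pace), 0..1."""
    return 0.6  # placeholder for an audio model

def analyze_transcript(text: str) -> float:
    """Score the content of the speech itself, 0..1."""
    return 0.7  # placeholder for a text model

def evaluate_speech(frames: list, waveform: bytes, transcript: str) -> dict:
    scores = {
        "visual": analyze_frames(frames),
        "audio": analyze_audio(waveform),
        "content": analyze_transcript(transcript),
    }
    # Naive unweighted average; a real system would weight these deliberately.
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores
```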
Take, for example, our emotion-detection algorithms. They take an image, an audio fragment, or a piece of text as input, and output a score between 0 and 1 for each of a set of emotions (anger, sadness, happiness).
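For the text modality, such per-emotion scores can be produced with an off-the-shelf classifier. Below is a minimal sketch using the Hugging Face transformers pipeline; the model named is one publicly available emotion classifier, used purely to illustrate the pattern, not necessarily the model behind our assistant:

```python
from transformers import pipeline

# Example public emotion model; any text classifier fine-tuned
# on emotion labels exposes the same score-per-label interface.
classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,  # return a score for every emotion label, not just the top one
)

text = "We are thrilled to announce our new safety protocol!"
results = classifier([text])[0]  # list of {'label': ..., 'score': ...} dicts

for item in results:
    print(f"{item['label']}: {item['score']:.2f}")
# e.g. joy: 0.97, anger: 0.01, sadness: 0.00, ...
```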
It is then up to humans to determine the cutoff beyond which certain emotions become inappropriate. Some amount of cheer is of course perfectly fine, but you don’t want a serious technical video to overflow with emotion. The result: a super-objective AI judge that can tell you whether your video is engaging enough.
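The cutoff itself stays a human decision on top of the model output. A sketch of what such a rule could look like; the threshold values are made up for illustration:

```python
# Illustrative limits chosen by a human reviewer, not learned by the model.
EMOTION_LIMITS = {
    "anger": 0.20,
    "sadness": 0.30,
    "joy": 0.80,  # some cheer is fine; excessive levels get flagged
}

def verdict(emotion_scores: dict[str, float]) -> list[str]:
    """Return human-readable warnings for emotions exceeding their cutoff."""
    return [
        f"'{emotion}' score {score:.2f} exceeds limit {EMOTION_LIMITS[emotion]:.2f}"
        for emotion, score in emotion_scores.items()
        if emotion in EMOTION_LIMITS and score > EMOTION_LIMITS[emotion]
    ]

print(verdict({"anger": 0.05, "sadness": 0.12, "joy": 0.93}))
# -> ["'joy' score 0.93 exceeds limit 0.80"]
```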
Want to know more?