The rapid advancements in generative AI have truly amazed us over the past year. As AI experts, we found it challenging just to keep up with the new technologies and iterations released almost daily, let alone build something to experiment with each of them. Of course, we wouldn’t be real experts if we didn’t try!
To experiment with these new technologies, we established a Generative AI Playground at Raccoons. This playground allows our developers to explore their ideas and build playful yet highly valuable applications based on cutting-edge (generative) AI technology. Doing this enables us to experiment and learn, ultimately with the aim of integrating new-found insights into our client projects. By starting small, we can build a solid foundation of understanding for future use.
In recent months, we've built impressive demos illustrating how generative AI can be used in a broader business setting. To inspire you, we’re launching a new insights series: The Generative AI Playground. In this first part, we explore how we turned incomprehensible structured data into a captivating, human-readable report. Meet the Automated Soccer Match Report, our first experiment.
Our experiment started with a straightforward idea: what if we could automatically generate match reports from the data of women’s soccer matches, a segment often neglected by traditional journalism? If we say so ourselves, it's a great idea for two reasons. First, it draws attention to the lack of women’s sports coverage in the news, and second, it gave us an enormous playground to operate in.
You may or may not know that soccer games generate a ton of data. The ball and every player are covered in sensors, which translate into data points. Ultimately, you get a file listing every action in the game in chronological order: passes, fouls, cards, corners, penalties, shots on target, and more. While this data is crucial for analysis, it’s far from user-friendly.
The result is an endless scroll of data with no clear distinction between what’s essential and what’s not: a reality of working with datasets that average 70,000 lines. These files are created by computers, for computers, making them nearly indecipherable for humans. Let's see if computers, specifically large language models (LLMs), could help us decipher these enormous data files.
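To make that concrete, here is a purely hypothetical sketch of what a few rows of such an event file could look like once loaded. The field names and type codes are our own illustration, not the actual format of any data provider.

```python
# Hypothetical event records: machine-oriented rows with codes instead of words.
# Field names and type_id codes are illustrative assumptions, not a real schema.
raw_events = [
    {"ts": "00:27:31.120", "type_id": 13, "team": "OHL", "player_id": 8841,
     "x": 88.2, "y": 44.1, "outcome": 1},   # shot on target
    {"ts": "00:27:40.050", "type_id": 51, "team": "ZW", "player_id": 2310,
     "x": 97.5, "y": 49.2, "outcome": 0},   # goalkeeper error
    {"ts": "00:27:42.470", "type_id": 16, "team": "OHL", "player_id": 8841,
     "x": 94.0, "y": 50.0, "outcome": 1},   # goal
    # ... tens of thousands more rows like these
]
```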
Before involving artificial intelligence, we needed to preprocess our data: we did not need every action that happens in a game, only the essential ones. Think of it as a journalist who only notes the “big actions” and weaves them into an exciting narrative. So, we got to work and simplified the data, separating the signal from the noise and making it AI-ready. Data engineering is an essential step of any AI project, as it enables us to use data more efficiently.
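As an illustration of that filtering step, here is a minimal sketch in Python. The set of “essential” event codes below continues the hypothetical schema sketched earlier and is not our actual pipeline.

```python
# Keep only the "big actions" a journalist would note; everything else is noise.
# The type_id codes follow the hypothetical schema sketched above.
ESSENTIAL_TYPES = {16: "goal", 13: "shot on target", 17: "card",
                   14: "penalty", 51: "goalkeeper error"}

def filter_essential(events: list[dict]) -> list[dict]:
    """Return only the events worth mentioning in a match report."""
    return [
        {**event, "label": ESSENTIAL_TYPES[event["type_id"]]}
        for event in events
        if event["type_id"] in ESSENTIAL_TYPES
    ]

essential_events = filter_essential(raw_events)  # raw_events from the sketch above
```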
What we’re now left with, however, is still plain data. It is already an improvement, but a critical insight is that LLMs — such as GPT-4 — excel at understanding and generating natural language. So, they are more proficient in processing textual data than structured data. Even if we were to provide the LLM with a description of each piece of structured information, it would struggle to derive meaningful insights. However, by translating this structured data into a format that the LLM can readily understand, we ensure it performs much better.
Here’s how it works:
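In essence, each essential event is rewritten as a short natural-language sentence before it ever reaches the model. Below is a minimal sketch of that translation step, continuing the hypothetical fields used above; a real feed would also supply player names and richer context.

```python
# Turn structured events into plain sentences the LLM can reason about.
PLAYER_NAMES = {8841: "Schrijvers", 2310: "S. Bossut"}  # hypothetical lookup table

def verbalize(event: dict) -> str:
    minute = int(event["ts"].split(":")[1]) + 1  # "00:27:31" -> minute 28
    player = PLAYER_NAMES.get(event["player_id"], "an unknown player")
    return f"Minute {minute}: {event['label']} by {player} ({event['team']})."

match_as_text = "\n".join(verbalize(event) for event in essential_events)
print(match_as_text)
# Minute 28: shot on target by Schrijvers (OHL).
# Minute 28: goalkeeper error by S. Bossut (ZW).
# Minute 28: goal by Schrijvers (OHL).
```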
Now, it’s time for some AI magic. We send our preprocessed data, now in text form, to the LLM along with some highly engineered prompts, and the result is nothing short of remarkable. What was once an indecipherable file is transformed into a vivid match report that doesn’t just list events but interprets them in the context of the match, much like a sports journalist would.
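For the technically curious, here is a stripped-down sketch of what such a call could look like, assuming the OpenAI Python client and building on the earlier snippets. The system prompt below is our own illustration and far simpler than the prompts we actually engineered.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "You are a sports journalist. Write a vivid, readable match report in flowing "
    "prose. Interpret patterns in the events (for example, a cluster of shots on "
    "target as mounting pressure), but never invent events that are not in the data."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Match events:\n{match_as_text}\n\nWrite the report."},
    ],
    temperature=0.7,
)
report = response.choices[0].message.content
print(report)
```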
For example, if the model notices a cluster of “shots on target” within a short period, it might interpret this as “the team is ramping up the pressure”. The LLM does more than give an objective summary of the game; it also provides insights and a narrative. It’s mind-blowing how the model can give creative insights while staying grounded in the actual events. Another example: if the team commits several errors, the system might narratively conclude that “the team was under a lot of stress, leading to a more aggressive play style”. This isn’t just a bland statement of facts; it’s an insightful, story-like interpretation.
"Rond de 28ste minuut brak chaos uit in het doel van ZW toen Schrijvers van OHL het eerste doelpunt van de wedstrijd maakte na een fout van doelman S. Bossut van ZW. De doelpunten bleven komen aan de zijde van OHL met doelpunten van Þorsteinsson in de 40ste minuut en De Norre in de 44ste minuut, waardoor de eerste helft eindigde met een comfortabele 3-0 voorsprong voor OHL.Halverwege de tweede helft werkte ZW zich eindelijk op het scorebord. Vossen van ZW transformeerde een penalty in een doelschot in de 72ste minuut, waardoor de ploeg nieuwe hoop kreeg. De spanning bleef echter aan de kant van OHL. Þorsteinsson kreeg een penalty en scoorde zijn tweede doelpunt van de wedstrijd in de 75ste minuut. Zulte Waregem behield zijn weerstand en scoorde een laatste doelpunt door Zinho Gano in de 81ste minuut. Desalniettemin werd de wedstrijd afgesloten met een 4-2 overwinning voor OHL."
While we successfully converted this data into readable match reports, we asked ourselves: how can we make this experience more interactive and engaging for the user? And whoever thinks of (Belgian) soccer thinks of Jan Dewijngaert, a well-known sports journalist. What if you could ask him what he thought of the game and have him analyze it? We decided to build a Jan Dewijngaert bot that could answer factual questions precisely, thanks to the data, but also provide subjective insights. We prompted the chatbot to mimic Jan's tone of voice, making the interaction engaging and genuine.
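To give an idea of how such a persona can be set up, here is a minimal sketch that builds on the client and match text from the earlier snippets. The persona prompt is our own illustration, not the prompt we actually used.

```python
# Ground the persona in the same match data, so factual answers stay correct
# while the prompt handles the tone of voice.
PERSONA_PROMPT = (
    "You are Jan Dewijngaert, a veteran Belgian football journalist. Answer in his "
    "direct, opinionated tone of voice. Base every factual claim strictly on the "
    "match events below; anything beyond that is your subjective analysis.\n\n"
    f"Match events:\n{match_as_text}"
)

def ask_jan(question: str) -> str:
    """Answer a fan's question in Jan's voice, grounded in the match events."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": PERSONA_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask_jan("What did you think of the first half?"))
```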
Of course, Jan has a distinct voice, so we decided to add a cherry on top. Another technology entered the playground: voice cloning! This way, users could voice-message Jan Dewijngaert with any question and get an (almost scarily) accurate answer. On a more technical level, the main challenge was getting that answer back quickly; a rough sketch of such a pipeline is shown below.
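This sketch uses Whisper via the OpenAI client for transcription as an example, and the voice-cloning call is a placeholder for whichever provider-specific API is used, since that part differs per service.

```python
def synthesize_cloned_voice(text: str, voice: str) -> bytes:
    """Placeholder for a provider-specific voice-cloning TTS call; plug in your
    own voice-cloning service here (ideally streamed, to keep latency low)."""
    raise NotImplementedError

def answer_voice_message(audio_path: str) -> bytes:
    # 1. Speech-to-text: turn the user's voice message into a written question.
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )

    # 2. Generate the answer with the persona chatbot sketched above.
    answer_text = ask_jan(transcript.text)

    # 3. Text-to-speech with the cloned voice, so the reply sounds like Jan.
    return synthesize_cloned_voice(answer_text, voice="jan-dewijngaert")
```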
The result is a smooth conversation with “the one and only” Jan Dewijngaert!
This playground experiment, of course, goes beyond soccer or chatbots. For us, it highlights the power of generative AI in transforming dry, complex data into attractive reports, whether in sports, healthcare, finance, or any other field overwhelmed with data yet starved for meaning. To show the possibilities, we translated it into five different use cases. Naturally, this is a non-exhaustive list of what could be done with this technology.
Interesting use cases, but of course, the possibilities are endless and can be mapped onto any type of data or industry. Not sure whether your data is fit for this kind of project? We can take a look together, no strings attached.
By doing this experiment, we have gathered a few takeaways that we would love to share with you.
So, three key takeaways and fun conversations with Jan Dewijngaert: we couldn’t be happier with our first experiment in our generative AI playground. And we promise, there’s more where this one came from! Next up: function calling in a concrete use case.
Written by
Daphné Vermeiren