Procedural Audio Generation

3 years ago by stales

This is part of a technical writeup. For the initial concept, see Jam Entry.

The game felt a little too plain with just text boxes and multiple-choice options. In a more classic board game, you would have players read action cards aloud. This would give games a bit of flavour.

So I wanted to make the game as close to fully voice acted as possible.

Obviously, hiring voice actors is completely impractical for a game jam.

Being a programmer, I decided to stick with the profession and use code to generate audio.

To do this, I used an open-source application called "MARY TTS" for voice synthesis.

This works in three parts: the script engine of my game would decipher the text to be shown, hash the text into a unique key, and feed it into the synthesis software. This would produce a .wav file which I could bundle with the game. At run-time, all I needed to do was re-compute the hash and play the related .wav file back.
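The three steps above can be sketched roughly as follows. This is a minimal illustration, not the game's actual code: the choice of CRC32 as the hash, the function names, and the file naming are all assumptions, and the URL assumes a MARY TTS server running locally with its HTTP interface on the default port.

```python
import zlib
from typing import Optional
from urllib.parse import urlencode

# Assumption: a MARY TTS server is running locally on its default HTTP port.
MARY_URL = "http://localhost:59125/process"


def audio_key(text: str) -> str:
    """Hash the displayed text into a short, stable 8-character hex key.

    CRC32 is an assumption; the post does not say which hash was used.
    """
    return f"{zlib.crc32(text.encode('utf-8')):08x}"


def wav_name(text: str) -> str:
    """File name used at build time and looked up again at run-time."""
    return f"{audio_key(text)}.wav"


def synthesis_url(text: str, voice: Optional[str] = None) -> str:
    """Build the request that asks the synthesiser for a WAV of the text."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": "en_US",
    }
    if voice:
        params["VOICE"] = voice  # voice name is whatever is installed locally
    return f"{MARY_URL}?{urlencode(params)}"
```

At build time, the response from `synthesis_url(text)` would be saved as `wav_name(text)`. At run-time, the game only needs to call `wav_name(text)` again to find the file to play.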

Below is a sample of the text that is output, along with the corresponding hash.

audio snippet text:
a12f604a
After a long day player Dungeons & Dragons You find yourself bleary-eyed You consider driving home...

Overall, I am quite happy with how this turned out. The voices were much higher-quality than I expected, and I alternated between 2 different voices to get some variance that I would never have been able to achieve if I had voice acted it myself. Additionally, generating the audio is much faster than recording it all myself. Recording takes at least as much time as there is dialogue, so skipping this step programmatically was a huge time save.

However, there was one big problem with this approach. Maybe you can notice it in the text above? Typos!

The text here should say:

After a long day playing Dungeons & Dragons. You find yourself bleary-eyed. You consider driving home...

Voice synthesis dictates typos verbatim, and since it can't see newlines, a missing full stop can cause the text to run together.
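One possible guard against the run-together problem, which the jam version did not have, is to join the script lines explicitly and add a full stop wherever a line lacks closing punctuation. The sketch below is only an illustration, and `join_for_speech` is a hypothetical helper name.

```python
def join_for_speech(lines):
    """Join script lines into one string for the synthesiser.

    Newlines are invisible to the synthesiser, so any line that does not
    already end in sentence punctuation gets a full stop appended.
    """
    spoken = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines entirely
        if line[-1] not in ".!?":
            line += "."
        spoken.append(line)
    return " ".join(spoken)


sample = [
    "After a long day playing Dungeons & Dragons",
    "You find yourself bleary-eyed",
    "You consider driving home...",
]
print(join_for_speech(sample))
```

This fixes missing full stops but not misspelled words, and the hash would still need to be computed from the same form of the text at build time and at run-time.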

The problem is that since each audio file is keyed by a hash of its text, even a single-character change will produce a new hash. This means every small change in the script results in a large number of new files being output. The only way to identify which files are new or old is to re-compute the hashes for each version.
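Re-computing the hashes to sort new files from old ones could look something like the sketch below. It assumes the files are named `<key>.wav` and that CRC32 is the hash, neither of which is confirmed by the post, and the function names are made up for illustration.

```python
import zlib
from pathlib import Path


def audio_key(text):
    """Assumed hash: CRC32 of the UTF-8 text, as 8 hex characters."""
    return f"{zlib.crc32(text.encode('utf-8')):08x}"


def keys_on_disk(audio_dir):
    """Collect the keys of every .wav file already generated."""
    return {p.stem for p in Path(audio_dir).glob("*.wav")}


def diff_audio(script_lines, existing_keys):
    """Compare the current script against the generated files.

    Returns (missing, stale): keys that still need synthesising, and keys
    whose text no longer appears anywhere in the script.
    """
    wanted = {audio_key(text) for text in script_lines}
    existing = set(existing_keys)
    return sorted(wanted - existing), sorted(existing - wanted)
```

Calling `diff_audio(script_lines, keys_on_disk("audio"))` after each script edit would show which files to generate and which can be deleted. A fixed typo shows up as one missing key and one stale key.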

As a result, a large number of typos were found but kept in the game. Some of them are fixed in the audio output (with the text retained to preserve the hash). This is an unfortunate consequence, but not game-breaking, so it was an adequate trade-off to save considerable time.
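Fixing the audio while keeping the on-screen text (and so the hash) unchanged can be sketched with an override table. The table, the function names, and the use of CRC32 are all hypothetical; the post does not describe how the fix was actually applied.

```python
import zlib


def audio_key(text):
    """Assumed hash: CRC32 of the UTF-8 text, as 8 hex characters."""
    return f"{zlib.crc32(text.encode('utf-8')):08x}"


# Hypothetical override table: displayed text (typo intact) -> text to speak.
SPOKEN_OVERRIDES = {
    "After a long day player Dungeons & Dragons":
        "After a long day playing Dungeons & Dragons.",
}


def synthesis_job(display_text):
    """Return (key, spoken_text) for one line of the script.

    The key always comes from the displayed text, so the run-time lookup
    still matches; only the text sent to the synthesiser is corrected.
    """
    key = audio_key(display_text)
    spoken = SPOKEN_OVERRIDES.get(display_text, display_text)
    return key, spoken
```

The trade-off is the one described above: the typo stays visible on screen, but the file name does not change, so only that one file needs regenerating.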

Drivers & Dragons

Road Safety Week Game Jam 2022

Status	Released
Author	stales
