This post points you to a resource that explains how to add punctuation to audio file transcriptions. As a bonus, it also covers capitalizing the text.
I have a series of blog posts that discuss how to transcribe audio files (podcasts) to text using the Vosk open source framework and Python. You can check out the entire series here. The output of the Vosk transcription is a text file of the spoken words in the audio file. One problem with the generated text file is that it is all lowercase and has no punctuation.
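To make the problem concrete, here is a minimal sketch of a Vosk transcription loop. It is not the exact code from the series. The function names `join_results` and `transcribe` are my own for illustration, and the sketch assumes a mono 16-bit PCM WAV file and a Vosk model already downloaded to a local folder.

```python
import json
import wave


def join_results(json_results):
    """Join the "text" field of each Vosk JSON result into one string."""
    parts = (json.loads(r).get("text", "") for r in json_results)
    return " ".join(p for p in parts if p)


def transcribe(wav_path, model_path):
    """Transcribe a mono 16-bit PCM WAV file (assumed input format)."""
    # Imported here so join_results can be used without Vosk installed.
    from vosk import KaldiRecognizer, Model

    results = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_path), wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                results.append(rec.Result())
        # Flush whatever is still buffered in the recognizer.
        results.append(rec.FinalResult())
    return join_results(results)
```

Calling `transcribe("episode.wav", "model")` (both names are placeholders) returns one long string of lowercase words with no sentence boundaries, which is exactly the text that needs recasing and punctuation.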
One of my readers has extended the Python code I’ve published with his own work that capitalizes and punctuates the text. His post describes how to add punctuation to audio file transcriptions. The post is on a German website, but I found the English translation very readable. The code makes use of a Python library called recasepunc that is hosted on GitHub.
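As a starting point, and going only by my reading of the recasepunc README, the script appears to be run from the command line with a `predict` subcommand and a model checkpoint, reading raw text on standard input and writing the recased, punctuated text to standard output. The untested sketch below wraps that call in Python. The command-line form is an assumption, and the script path and checkpoint name are placeholders, so check the project's documentation before relying on it.

```python
import subprocess
import sys
from pathlib import Path


def build_command(script, checkpoint):
    """Build the command line.

    Assumed form: python recasepunc.py predict <checkpoint>
    """
    return [sys.executable, str(script), "predict", str(checkpoint)]


def punctuate(text, script, checkpoint):
    """Pipe raw transcript text through the script and return its output."""
    result = subprocess.run(
        build_command(script, checkpoint),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return result.stdout


# Example usage (placeholder paths, not real file names):
# raw = Path("transcript.txt").read_text(encoding="utf-8")
# print(punctuate(raw, Path("recasepunc/recasepunc.py"), Path("checkpoint")))
```

Running the script as a subprocess keeps the Vosk code and the punctuation step separate, so either one can be swapped out without touching the other.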
I haven’t had a chance to try this myself yet, but when I do I will expand this post with more how-to information.