


Python Speech To Text – Which Library To Use

July 19, 2023

How To Use Python To Convert Speech To Text

This post discusses which Python libraries to use for audio transcription. The entire series of posts provides a Python solution for converting speech to text.

Speech to text is one component of a larger set of capabilities called Natural Language Processing (NLP).

There are a number of Python libraries available for speech to text conversion. The libraries themselves don’t actually do the conversion; instead, they pass the audio to a cloud-based or local service that crunches the data.

Before we look at the choices, let’s talk about requirements. I have some specific goals in mind that will drive the decision on which Python tools to use.

Use Python to access the NLP functionality.
Be available as open source, or at least have a free non-commercial version.
Process large amounts of speech.

Goals For Converting Speech To Text

The goal is to create a text transcript of my favorite podcast (actually the best podcast in the universe): No Agenda. They produce 3 hours twice a week, which is a lot of audio to transcribe, and as we will see, this goes beyond all of the “free” options of the major cloud services. My ultimate goal is to extract the meaning or relevant information being discussed on the podcast, which will require further analysis of the text, but that is a topic for later posts. For now, it is interesting to note that speech to text applications seem to divide into two categories:

Knowledge Extraction – determine semantic intent from natural language (my goal)
Conversational applications – speech bots, automated assistants, etc.

Python Libraries For Speech to Text Conversion

Let’s look at the Python libraries available to us for speech to text conversion.

Python Library	Last Release	Service	Free?
Pocket Sphinx	2018	CMU Sphinx	Open Source
API.AI	2017	Google Dialogflow	Try Free
assemblyai	2018	AssemblyAI	3 hours per month free
vosk	2021	Vosk (offline)	open source
pywit	2015	wit.ai – Facebook cloud service	Free
speechrecognition	2017	Multiple Services supported
		IBM Watson	Try Free
		Google	Try Free
		Bing	Try Free
		Wit.AI (Facebook)	Free
		Sphinx	Open Source
		Houndify
		Snowboy

Which Python Library To Use For Speech To Text

When you search the internet for “Python speech to text”, the vast majority of blog posts cover the “speechrecognition” Python library. This library provides a common interface to a number of the cloud-based services shown above (IBM, Google, Bing, Facebook). It ships with a hardcoded default API key for Google’s Web Speech API, so you can try it out without having to create an account; that’s easy for blog posts but not a good long-term solution.
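For reference, here is a minimal sketch of what that common interface looks like. It assumes SpeechRecognition 3.x (installed with `pip install SpeechRecognition`, imported as `speech_recognition`) and an audio file on disk. `recognize_google` falls back to the library’s built-in default key when no `key` is passed, and `recognize_sphinx` runs locally but also needs the pocketsphinx package. The `transcribe_wav` helper and its `engine` names are hypothetical, just for illustration.

```python
def transcribe_wav(path: str, engine: str = "google") -> str:
    """Transcribe an audio file through SpeechRecognition's common interface.

    engine="google": Google Web Speech API (cloud, built-in default key).
    engine="sphinx": CMU PocketSphinx (offline, needs pocketsphinx installed).
    """
    # Map a short (hypothetical) engine name to the Recognizer method behind it.
    methods = {"google": "recognize_google", "sphinx": "recognize_sphinx"}
    if engine not in methods:
        raise ValueError(f"unsupported engine: {engine!r}")

    # Imported lazily so the engine check above works even without the package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:  # WAV (PCM), AIFF or FLAC
        audio = recognizer.record(source)  # read the whole file into memory

    # Every backend accepts the same AudioData object; only the method differs.
    return getattr(recognizer, methods[engine])(audio)
```

A call such as `transcribe_wav("episode.wav", engine="sphinx")` (hypothetical file name) would switch to the offline engine without any other changes. The library raises `sr.UnknownValueError` when the speech can’t be understood and `sr.RequestError` when a service can’t be reached, so a real batch job would want to catch both.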

Anyway, the big three cloud services (IBM, Google, Bing) only have a “try free” option, which is really just a small credit toward using the cloud service, so a trial is about all you get before you start paying. If you are developing an enterprise-scale app this might be a good way to go, but if you are looking for a long-term “free” option it won’t work. Facebook is “free”, but without reading the license terms closely, is anything from Facebook actually “free”?

The “CMU Sphinx” option looks promising as an open-source choice, but according to its website it appears to be no longer supported, so that option is out. The site now refers you to another site that features the “Vosk” server and Python library. The interesting thing about Vosk (and Sphinx) is that it runs offline, which means you can install it on your local machine and not rely on a cloud service. Vosk itself is built on the Kaldi project, which you could use by itself, but that is not for the faint of heart.
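Because Vosk runs locally, an offline transcription loop is only a few lines of Python. The sketch below assumes the `vosk` package (`pip install vosk`) and a Vosk model that has already been downloaded and unpacked to a local directory. `transcribe_with_vosk`, `is_vosk_compatible` and the 4000-frame chunk size are illustrative choices, not the author’s code. The `KaldiRecognizer` class name is a reminder of the Kaldi engine underneath.

```python
import json
import wave


def is_vosk_compatible(wav_path: str) -> bool:
    """Vosk expects mono, 16-bit, uncompressed PCM WAV input."""
    with wave.open(wav_path, "rb") as wf:
        return (wf.getnchannels() == 1
                and wf.getsampwidth() == 2
                and wf.getcomptype() == "NONE")


def transcribe_with_vosk(wav_path: str, model_path: str) -> str:
    """Transcribe a WAV file entirely offline with a local Vosk model."""
    if not is_vosk_compatible(wav_path):
        raise ValueError("convert the audio to mono 16-bit PCM WAV first")

    from vosk import KaldiRecognizer, Model  # pip install vosk

    model = Model(model_path)  # directory of an unpacked, downloaded Vosk model
    texts = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)  # feed the audio in small chunks
            if not data:
                break
            if rec.AcceptWaveform(data):  # True once an utterance is complete
                texts.append(json.loads(rec.Result()).get("text", ""))
        # Flush whatever audio is left after the last complete utterance.
        texts.append(json.loads(rec.FinalResult()).get("text", ""))
    return " ".join(t for t in texts if t)
```

Something like `transcribe_with_vosk("episode.wav", "model")` (hypothetical paths) would return the whole transcript as one string. A podcast episode would typically need converting to mono 16-bit WAV first, for example with ffmpeg.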

Finally, the “assemblyai” Python library supports the cloud-based services from the AssemblyAI company. They have a free option that is limited to 3 hours per month, but it does seem to be free forever rather than “try” free.
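A quick back-of-the-envelope check shows why a 3-hour monthly allowance doesn’t go far for this project. This sketch assumes each No Agenda episode runs about 3 hours (reading “3 hours twice a week” as two 3-hour shows) and uses 52/12 weeks as an average month; `monthly_audio_hours` is a hypothetical helper.

```python
# Figures taken from the post; 52/12 weeks is the average month length.
HOURS_PER_EPISODE = 3      # assumption: each episode runs about 3 hours
EPISODES_PER_WEEK = 2
FREE_HOURS_PER_MONTH = 3   # AssemblyAI free tier as quoted in the table


def monthly_audio_hours(hours_per_episode: float, episodes_per_week: float) -> float:
    """Average hours of audio produced per month."""
    return hours_per_episode * episodes_per_week * 52 / 12


needed = monthly_audio_hours(HOURS_PER_EPISODE, EPISODES_PER_WEEK)
shortfall = needed - FREE_HOURS_PER_MONTH
print(f"about {needed:.0f} h/month of audio, {shortfall:.0f} h over the free tier")
# prints "about 26 h/month of audio, 23 h over the free tier"
```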

So, given my requirements, it looks like I’m going to try Vosk first. I may also try AssemblyAI to compare performance and capabilities. Keep reading these posts for a complete Python solution!

Next Post – How To Setup A Python Environment For Vosk

The Complete App

If you want to go straight to the full solution, check out this complete Python application.
