Python Speech To Text – Which Library To Use

by John Singer June 15, 2021 at 8:09 am

How To Use Python To Convert Speech To Text

This post begins my effort to implement speech-to-text conversion using Python. Speech to text is one component of a larger set of capabilities called Natural Language Processing, or NLP for short.

There are a number of Python libraries available for speech-to-text conversion. The libraries themselves don’t actually do the conversion; instead, they use a cloud-based or local service to crunch the data.

Before we look at the choices, let’s talk about requirements. I have some specific goals in mind that will drive the decision on which Python tools to use.

  1. Use Python to access the NLP functionality.
  2. Available as open source or have a free non-commercial version (at least).
  3. Process large amounts of speech.

Goals For Converting Speech To Text

The goal is to create a text transcript of my favorite podcast (actually the best podcast in the universe), No Agenda. Because they produce 3 hours of audio twice a week, that’s a lot to transcribe, and as we will see, it goes beyond the “free” options of all the major cloud services. My ultimate goal is to extract the meaning or relevant information being discussed on the podcast. That will require further analysis of the text, but this is a topic for later posts. For now, it is interesting to note that speech-to-text applications seem to divide into two categories:

  • Knowledge Extraction – determine semantic intent from natural language (my goal)
  • Conversational applications – speech bots, automated assistants, etc.

Python Libraries For Speech to Text Conversion

Let’s look at the Python libraries available to us for speech-to-text conversion.

Python Library      Last Version   Service                            Free?
Pocket Sphinx       2018           CMU Sphinx                         Open Source
API.AI              2017           Google Dialog Flow                 Try Free
assemblyai          2018           AssemblyAI                         3 hours per month free
vosk                2021           Vosk (offline)                     Open Source
pywit               2015           wit.ai (Facebook cloud service)    Free
speechrecognition   2017           Multiple services supported:
                                     IBM Watson                       Try Free
                                     Google                           Try Free
                                     Bing                             Try Free
                                     Wit.AI (Facebook)                Free
                                     Sphinx                           Open Source
                                     Houndify
                                     Snowboy

Which Python Library To Use For Speech To Text

When you search the internet for “Python speech to text”, the vast majority of blog posts cover the “speechrecognition” Python library. This library provides a common interface to a number of the cloud-based services shown above (IBM, Google, Bing, Facebook). It has a hardcoded default API key for the Google API so you can try it out without having to create an account, which is easy for blog posts but not a good long-term solution.
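To make the “common interface” idea concrete, here is a minimal sketch of how the speechrecognition library (imported as speech_recognition) lets you switch services by calling a different recognize_* method on the same audio. The function name transcribe_file, the ENGINES table and the file path in the usage note are my own illustration, not part of the library, and I have only wired up two of the supported services.

```python
# Map a short engine name to the Recognizer method that handles it.
# This table is an illustrative assumption; the library has more recognize_* methods.
ENGINES = {
    "google": "recognize_google",  # cloud service, uses the built-in key by default
    "sphinx": "recognize_sphinx",  # offline, needs the pocketsphinx package
}


def transcribe_file(wav_path, engine="google"):
    """Return the transcript of a WAV file using the chosen engine."""
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")

    # Imported lazily so the name check above works without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        # Same audio object, different backend: only the method name changes.
        return getattr(recognizer, ENGINES[engine])(audio)
    except sr.UnknownValueError:
        return ""  # the engine could not make out any speech
    except sr.RequestError as err:
        raise RuntimeError(f"{engine} request failed: {err}") from err
```

Calling transcribe_file("clip.wav") would go to Google, while transcribe_file("clip.wav", engine="sphinx") stays on the local machine. Reading a whole file into memory like this is fine for a short clip but is probably not how you would handle a 3 hour episode.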

Anyway, the big three cloud services (IBM, Google, Bing) only have a “try free” option, which is really just a small credit towards the use of the cloud service, so trying these services is about all you can do before you start paying. If you are developing an enterprise-scale app this might be a good way to go, but if you are looking for a long-term “free” option this won’t work. Facebook is “free”, but without reading the license terms closely, is anything from Facebook actually “free”?

The “CMU Sphinx” option looks promising as an open source option, but according to their website it is no longer supported, so that option is out. They now refer you to another site that features the “Vosk” server and Python library. The interesting thing about Vosk (and Sphinx) is that it runs offline, which means you can install it on your local machine and not rely on a cloud service. Vosk itself is built on the Kaldi project, which you could use directly, but that is not for the faint of heart.
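As a rough preview of what offline transcription with Vosk looks like, here is a minimal sketch based on the Model and KaldiRecognizer classes from the vosk package. The function names transcribe_wav and join_segments are my own, and the paths in the usage note are placeholders; it assumes you have already downloaded a Vosk model to a local directory and that the audio is a mono 16-bit PCM WAV file.

```python
import json
import wave


def join_segments(result_jsons):
    """Join the "text" field of each Vosk JSON result into one transcript."""
    parts = (json.loads(r).get("text", "") for r in result_jsons)
    return " ".join(p for p in parts if p)


def transcribe_wav(wav_path, model_dir):
    """Transcribe a mono PCM WAV file with a locally stored Vosk model."""
    # Imported lazily so join_segments works without Vosk installed.
    from vosk import Model, KaldiRecognizer

    results = []
    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())
        while True:
            data = wf.readframes(4000)  # feed the audio in small chunks
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if recognizer.AcceptWaveform(data):
                results.append(recognizer.Result())
        # Flush whatever is left after the last chunk.
        results.append(recognizer.FinalResult())
    return join_segments(results)
```

You would call it as transcribe_wav("episode.wav", "model"), where both paths are placeholders. Because the file is streamed in chunks, nothing here depends on the length of the recording, which matters for a multi-hour podcast. A podcast MP3 would need converting to WAV first.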

Finally, the “assemblyai” Python library supports the cloud-based services from the AssemblyAI company. Their free option is only 3 hours per month, but it does seem to be free forever instead of “try” free.
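To see how far 3 free hours go against my podcast goal, here is a quick back-of-the-envelope calculation. It uses the figures from this post (3 hours of audio, twice a week) and assumes a 52 week year with no missed episodes; the function names are just for illustration.

```python
HOURS_PER_EPISODE = 3       # from the post: roughly 3 hours per show
EPISODES_PER_WEEK = 2       # from the post: twice a week
WEEKS_PER_YEAR = 52         # assumption: no weeks off
FREE_HOURS_PER_MONTH = 3    # the AssemblyAI free allowance noted above


def monthly_audio_hours():
    """Average hours of new audio produced per month."""
    return HOURS_PER_EPISODE * EPISODES_PER_WEEK * WEEKS_PER_YEAR / 12


def covered_fraction():
    """Share of a month's audio that fits inside the free allowance."""
    return FREE_HOURS_PER_MONTH / monthly_audio_hours()


if __name__ == "__main__":
    print(f"Audio per month: {monthly_audio_hours():.0f} hours")
    print(f"Covered by free tier: {covered_fraction():.1%}")
```

That works out to about 26 hours of audio a month, so the free allowance would cover only a little over a tenth of it.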

So given my requirements it looks like I’m going to try Vosk first. I may also try AssemblyAI to compare performance/capabilities etc. Stay tuned for more posts on this topic!

Next Post – How To Setup A Python Environment For Vosk
