Python Speech To Text - Which Library To Use - SingerLinks

Python Speech To Text – Which Library To Use

How To Use Python To Convert Speech To Text

This post discusses which Python libraries to use for audio transcription. The full series of posts provides a Python solution for converting speech to text.

Speech to text is one component of a larger set of capabilities called Natural Language Processing or NLP for short.

There are a number of Python libraries available for speech to text conversion. The libraries themselves don’t actually do the conversion; instead, they hand the audio to a cloud-based or local service that crunches the data.

Before we look at the choices, let’s talk about requirements. I have some specific goals in mind that will drive the decision on which Python tools to use.

  1. Use Python to access the NLP functionality.
  2. Available as open source or have a free non-commercial version (at least).
  3. Process large amounts of speech.

Goals For Converting Speech To Text

The goal is to create a text transcript of my favorite podcast, No Agenda (actually the best podcast in the universe). Because they produce 3 hours of audio twice a week, that’s a lot to transcribe, and as we will see it goes beyond all of the “free” options of the major cloud services. My ultimate goal is to extract the meaning or relevant information being discussed on the podcast. That will require further analysis of the text to derive its meaning, but this is a topic for later posts. For now, it is interesting to note that speech to text applications seem to divide into two categories:

  • Knowledge Extraction – determine semantic intent from natural language (my goal)
  • Conversational applications – speech bots, automated assistants, etc.

Python Libraries For Speech to Text Conversion

Let’s look at the Python libraries available to us for speech to text conversion.

Python Library      Last Version   Service                             Free?
Pocket Sphinx       2018           CMU Sphinx                          Open Source
API.AI              2017           Google Dialog Flow                  Try Free
assemblyai          2018           AssemblyAI                          3 hours per month free
vosk                2021           Vosk (offline)                      Open Source
pywit               2015           wit.ai – Facebook cloud service     Free
speechrecognition   2017           Multiple services supported:
                                     IBM Watson                        Try Free
                                     Google                            Try Free
                                     Bing                              Try Free
                                     Wit.AI (Facebook)                 Free
                                     Sphinx                            Open Source
                                     Houndify
                                     Snowboy

Which Python Library To Use For Speech To Text

When you search the internet for “Python speech to text”, the vast majority of blog posts cover the “speechrecognition” Python library. This library provides a common interface to a number of the cloud-based services shown above (IBM, Google, Bing, Facebook). It ships with a hardcoded default API key for the Google API, so you can try it out without having to create an account. That is easy for blog posts but not a good long-term solution.
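For reference, here is a minimal sketch of how the speechrecognition library is typically called when relying on its bundled Google key. It assumes the package is installed (`pip install SpeechRecognition`) and that the input is a short WAV clip; the function name `transcribe_with_google` is made up for this illustration and is not part of the library.

```python
def transcribe_with_google(wav_path, language="en-US"):
    """Transcribe a short WAV file through the Google Web Speech API.

    Assumption: the SpeechRecognition package is installed. It is imported
    inside the function so this snippet loads even when it is missing.
    """
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        # No key is passed, so the library falls back to its built-in default key.
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # the service could not make out any speech
    except sr.RequestError as err:
        raise RuntimeError(f"Google API request failed: {err}") from err
```

Calling `transcribe_with_google("clip.wav")` on a short clip should return the recognized text. This approach suits quick experiments rather than hours of podcast audio.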

Anyway, the big three cloud services (IBM, Google, Bing) only have a “try free” option, which is really just a small credit towards the use of the cloud service, so trying these services is about all you can do before you start paying. If you are developing an enterprise-scale app this might be a good way to go, but if you are looking for a long-term “free” option this won’t work. Facebook is “free”, but without reading the license terms closely, is anything from Facebook actually “free”?

The “CMU Sphinx” option looks promising as an open source option, but according to their website it appears to be no longer supported, so that option is out. They now refer you to another site that features the “Vosk” server and Python library. The interesting thing about Vosk (and Sphinx) is that it runs offline, which means you can install it on your local machine and not rely on a cloud service. Vosk itself is built on the Kaldi project, which you could use by itself, but this is not for the faint of heart.
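To give a feel for what offline transcription with Vosk looks like, here is a rough sketch. It assumes `pip install vosk`, a Vosk model unpacked into a local folder (the `model` path is a placeholder), and a mono 16-bit PCM WAV file as input. The helper names `transcribe_with_vosk` and `join_results` are invented for this example.

```python
import json
import wave


def join_results(json_results):
    """Join the "text" fields of Vosk's JSON result strings into one transcript."""
    texts = (json.loads(r).get("text", "") for r in json_results)
    return " ".join(t for t in texts if t)


def transcribe_with_vosk(wav_path, model_dir="model"):
    """Transcribe a mono 16-bit PCM WAV file offline.

    Assumption: the vosk package is installed and model_dir points at an
    unpacked Vosk model. The import is deferred so the snippet loads without it.
    """
    from vosk import Model, KaldiRecognizer

    pieces = []
    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once a full utterance has been decoded.
            if recognizer.AcceptWaveform(data):
                pieces.append(recognizer.Result())
        pieces.append(recognizer.FinalResult())  # flush whatever is left
    return join_results(pieces)
```

Everything runs on the local machine, so the only limits are disk space for the model and CPU time.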

Finally, the “assemblyai” Python library supports the cloud-based services from the AssemblyAI company. They have a free option. It’s only 3 hours per month, but it does seem to be free forever instead of “try” free.

So, given my requirements, it looks like I’m going to try Vosk first. I may also try AssemblyAI to compare performance, capabilities, etc. Keep reading these posts for a complete Python solution!
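A quick back-of-the-envelope calculation shows how far 3 free hours go against the podcast volume mentioned earlier (3 hours, twice a week). The figures come from this post; the 52/12 weeks-per-month average is just a convenient approximation.

```python
EPISODE_HOURS = 3          # length of one episode (from the post)
EPISODES_PER_WEEK = 2      # release schedule (from the post)
FREE_HOURS_PER_MONTH = 3   # free tier quoted in the post

WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


def monthly_audio_hours(episode_hours, episodes_per_week):
    """Average hours of audio produced per month."""
    return episode_hours * episodes_per_week * WEEKS_PER_MONTH


needed = monthly_audio_hours(EPISODE_HOURS, EPISODES_PER_WEEK)
coverage = FREE_HOURS_PER_MONTH / needed

print(f"Audio per month: {needed:.0f} hours")
print(f"Free tier covers: {coverage:.0%}")
```

That works out to roughly 26 hours of audio a month, so a 3 hour allowance covers only about 12% of it.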

Next Post – How To Setup A Python Environment For Vosk

The Complete App

If you want to go straight to the full solution, check out this complete Python application.
