This post describes how to set up Python for spaCy. The various “how to” posts on spaCy programming in Python depend on this setup. This is part of the overall spaCy Python tutorial.
Python Environment Setup
The setup instructions assume a Windows 10 or newer environment. I’m using native python approaches so you should be able to replicate this on Mac/Linux environments without too much change.
Step 1 – create a python installation
Download and install the latest python version for Windows from python.org, which at the time of writing was 3.10.7. If you have an existing python installation it will probably be fine as long as it is 3.x, but no guarantees.
Step 2 – create a python environment
I use the venv command to create python virtual environments. To do this in Windows 10 start a command line window and enter the python command shown below. This will create a python virtual environment based on your current python installation. Now you can install python libraries in the virtual environment and keep them away from other python projects you might be working on. Run the command below:
python -m venv C:\Users\xxx\pyenv\spacy
On my system I created a folder called “pyenv” in my user folder (xxx will be your windows userid). I then created a folder called “spacy” for these projects.
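The same environment can also be created from python itself using the standard-library venv module. This is only a minimal sketch and not part of the original setup; the function name create_spacy_env and the example path in the comment are illustrative assumptions.

```python
import venv
from pathlib import Path


def create_spacy_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # venv.create is the programmatic counterpart of "python -m venv <dir>".
    # with_pip=True installs pip into the new environment, as the command line does.
    venv.create(env_path, with_pip=with_pip)
    return env_path


# Example call (hypothetical path, replace xxx with your windows userid):
# create_spacy_env(r"C:\Users\xxx\pyenv\spacy")
```

Running the command from Step 2 is still the simplest route; the function is mainly useful if you want to script the setup.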
Step 3 – activate the NLP python environment
When “venv” created the python virtual environment it created an “activate” batch file. In your command line window run the following command:
cd C:\Users\xxx\pyenv\spacy
C:\Users\xxx\pyenv\spacy\scripts\activate.bat
This will “activate” the python virtual environment. You will notice the command prompt changes to include the environment name. You must activate the environment before completing the rest of the setup tasks.
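If you are ever unsure whether the environment is active, python can tell you. The following is a small sketch that assumes nothing beyond the standard library; the function name in_virtual_environment is my own choice for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
print("Interpreter prefix:", sys.prefix)
```

When the “spacy” environment is active, the prefix printed should be the environment folder rather than the base python installation.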
To make life easier, put the 2 lines above into a batch file “spacy.bat” and save the file to your “c:\users\xxx” folder. This is the folder location initially displayed when you start the command prompt window. Now you can simply open the command prompt window, enter “spacy” and the batch file will change directories to the python environment and run the activate script. At this point you are ready to work.
Step 4 – Create a batch file that starts Idle
Idle is a simple python GUI that ships with python. I will use Idle to run the tutorial python scripts. You can enter the following line at the command prompt to start Idle or you can put this in a file called “idle.bat” and save it to the root of your python environment – C:\Users\xxx\pyenv\spacy\idle.bat
python -m idlelib.idle
Now you can simply enter “idle” to start an “Idle” session that is running in the “spacy” python environment.
Step 5 – Install python libraries
There are two python libraries we need to display sentence diagrams and the text highlights with entity types.
pip install svglib
pip install tkinterweb
- svglib is used to read and write the image file generated by displaCy for the sentence diagram
- tkinterweb is used to display the HTML generated by displaCy that formats the text with entity types
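To confirm that both libraries landed in the active environment, a quick check along these lines can help. It is a sketch only: the helper name missing_packages is hypothetical, and it assumes each library is imported under the same name used to install it.

```python
import importlib.util


def missing_packages(names):
    """Return the names that the import system cannot find."""
    # find_spec returns None for a top-level package that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["svglib", "tkinterweb"])
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("svglib and tkinterweb are both installed.")
```

If a library is reported as missing, make sure the environment was activated before running pip.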
Step 6 – Install spaCy
spaCy is the python library that does all the heavy lifting. It is installed with a simple pip command.
pip install spacy
Step 7 – Install spaCy Language Models
spaCy uses trained language “models” to analyse your text. There are many models for different languages and purposes. To see a list of the language models supported go to this link: Models and Languages. Find the language you want and click on the link in the column labeled “Packages”. This will take you to another page that shows the packages you can download for that language. In my case, I’m using the “small” English package called “en_core_web_sm”. Once you have identified the name of the package you are ready to install.
spaCy provides a download command that will download and install a language model as a package in your python environment. Run the following command to install the model:
python -m spacy download en_core_web_sm
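Once the download finishes, a short smoke test can confirm that spaCy and the model work together. The sketch below is an illustration rather than code from the tutorial: the function name extract_entities and the sample sentence are assumptions, and it relies on the model being installed as an ordinary python package named “en_core_web_sm”.

```python
import importlib.util


def extract_entities(text, model_name="en_core_web_sm"):
    """Return (text, label) pairs for the entities found, or None if setup is incomplete."""
    # spaCy and the model are both installed as python packages, so check for both first.
    for package in ("spacy", model_name):
        if importlib.util.find_spec(package) is None:
            print(f"'{package}' is not installed in this environment.")
            return None

    import spacy

    nlp = spacy.load(model_name)  # load the trained pipeline by package name
    doc = nlp(text)               # run the pipeline on the text
    return [(ent.text, ent.label_) for ent in doc.ents]


print(extract_entities("Apple is looking at buying a U.K. startup for $1 billion."))
```

If the setup is complete this prints a list of entity and label pairs; the exact entities depend on the model version, so treat the output as a sanity check only.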
You can also use “pip” to install language models. For more advanced information on how to install spaCy language models go to this link: Download and Install
That’s it! This setup will allow you to begin experimenting with spaCy for natural language processing or NLP. In the following post we will present the python code used to accomplish basic tasks in spaCy.
The Complete App
If you want to go straight to the full solution then check out this complete python application.
This post describes how to set up the python environment for spaCy and the spaCy NLP Workbench.
That’s it for setting up the Python environment and installing the NLP Workbench code. Check out the detail posts if you want to walk through how the Python code works.