This post will describe how to diagram a sentence using spaCy and Python. We will look at a standalone Python application that takes a piece of text and produces a diagram (image) of the sentence. This application provides a full graphical user interface using Tkinter (included with Python). The code presented here is much more detailed than you will find in other blog posts.
If you are interested in an application that provides lots of spaCy functionality, I recommend the following two posts.
Sentence Diagram Using spaCy
The following is a screenshot from the application…
The sentence “spaCy can create an image file that contains a diagram of a sentence.” is hardcoded into the application. You can easily change the text and rerun the program to get a different diagram.
Python Code
The following is the complete code for the application.
'''
Copyright 2022 SingerLinks Consulting
This file is part of spaCyWorkbench.
spaCyWorkbench is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
spaCyWorkbench is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with NLPSpacy. If not, see <https://www.gnu.org/licenses/>.
'''
'''
WinSentenceDiagram is a standalone application that demonstrates how to display a sentence diagram generated by spaCy.
The sentence text is hard coded for simplicity.
For a more complete solution look at spaCyWorkbench.py
'''
try:
    from tkinter import *
    from tkinter import ttk
    from tkinter.ttk import *
except ImportError:
    from Tkinter import *
    from Tkinter import ttk
    from Tkinter.ttk import *
from pathlib import Path
import PIL
from PIL import ImageTk
from svglib.svglib import svg2rlg
from reportlab.graphics import renderPM
import spacy
from spacy import displacy
class WinSentenceDiagram(Tk):
    def __init__(self):
        super().__init__()
        self.title("spaCy Sentence Diagram")
        self.tab3 = ttk.Frame(self)
        self.diagramCanvas = Canvas(self.tab3)
        # vertical scrollbar
        self.vsbDiagram = Scrollbar(self.tab3, orient=VERTICAL)
        self.diagramCanvas.config(yscrollcommand=self.vsbDiagram.set)
        self.vsbDiagram.config(command=self.diagramCanvas.yview)
        # horizontal scrollbar
        self.hsbDiagram = Scrollbar(self.tab3, orient=HORIZONTAL)
        self.hsbDiagram.config(command=self.diagramCanvas.xview)
        self.diagramCanvas.config(xscrollcommand=self.hsbDiagram.set)
        # grid the widgets
        self.tab3.grid(row = 0, column = 0, padx = 5, pady = 5, sticky=NSEW)
        self.diagramCanvas.grid(row = 0, column = 0, padx = 5, pady = 5, sticky=NSEW)
        self.vsbDiagram.grid(row = 0, column = 1, padx = 5, pady = 5, sticky=NS)
        self.hsbDiagram.grid(row = 1, column = 0, padx = 5, pady = 5, sticky=EW)
        # handle resizing
        self.grid_columnconfigure(0, weight=1)
        self.grid_rowconfigure(0, weight=1)
        self.tab3.grid_columnconfigure(0, weight=1)
        self.tab3.grid_rowconfigure(0, weight=1)
        # sentence text
        self.text = "spaCy can create an image file that contains a diagram of a sentence."
        # load spaCy model
        self.loadSpacyModel()
        # create the doc object
        self.createDoc()
        # generate the diagram image and display it
        self.generateDiagram()
        # size the window
        self.geometry("{}x{}".format(int(self.winfo_screenwidth()*.8), int(self.winfo_screenheight()*.7)))
    def loadSpacyModel(self):
        'load the SpaCy Model'
        self.NLP = None
        try:
            modelName = "C:\\Users\\jsing\\pyenv\\NLP\\Lib\\site-packages\\en_core_web_sm\\en_core_web_sm-3.2.0"
            self.NLP = spacy.load(modelName)
        except Exception as e:
            print("Error loading SpaCy pipeline:{} - {}".format(modelName, e))
    def createDoc(self):
        '''
        Create a spacy document object from the selected text in the raw text area.
        '''
        self.doc = None
        try:
            # create the document object
            self.doc = self.NLP(self.text)
        except Exception as e:
            print("Error Creating Doc Object - {}".format(e))
    def generateDiagram(self):
        # convert the iterator to a list and get the first sentence span in the document object
        self.sentence = list(self.doc.sents)[0]
        # generate svg diagram
        mySVG = displacy.render(self.sentence, style="dep")
        # save the svg to a file, then convert it to a png image file
        output_path = Path('temp.svg')
        output_path.write_text(mySVG, encoding="utf-8")
        drawing = svg2rlg('temp.svg')
        renderPM.drawToFile(drawing, "temp.png", fmt="PNG")
        # get the image file and load it into the canvas
        with PIL.Image.open('temp.png') as img:
            # keep a reference on self: Tkinter does not hold one, and a
            # garbage-collected PhotoImage leaves the canvas blank
            self.pimg = ImageTk.PhotoImage(img)
        self.diagramCanvas.create_image(0,0,anchor='nw',image=self.pimg)
        self.diagramCanvas.configure(scrollregion=self.diagramCanvas.bbox("all"))
        # position to the bottom of the diagram
        self.diagramCanvas.yview_moveto('1.0')
# start the app running
if __name__ == "__main__":
    app = WinSentenceDiagram()
    app.mainloop()
You can copy the code above into IDLE and run it. But first you need to do two things.
- Set up a Python virtual environment with the appropriate libraries installed. You can find the instructions to do this here.
- Change the modelName variable to point to the spaCy model you installed (described in the post linked above).
Code Details
Let’s go through the code in detail.
Imports
try:
    from tkinter import *
    from tkinter import ttk
    from tkinter.ttk import *
except ImportError:
    from Tkinter import *
    from Tkinter import ttk
    from Tkinter.ttk import *
This is the import for Tkinter, the graphical user interface toolkit included with Python. I’m not going to spend any time on Tkinter code, as there are many good tutorials on the web and I don’t really want to write a Tkinter tutorial. The try: block imports the module under its Python 3 name, tkinter. If that fails (i.e. you have an old version of Python), the except: block tries the older, capitalized name, Tkinter.
from pathlib import Path
import PIL
from PIL import ImageTk
from svglib.svglib import svg2rlg
from reportlab.graphics import renderPM
import spacy
from spacy import displacy
Here we have the imports needed for spaCy and its diagramming utility, “displaCy”.
- pathlib – used to create path objects to save image files
- PIL and ImageTk – PIL is the “Python Imaging Library”. ImageTk is a module of PIL that converts PIL images into image objects that Tkinter can display.
- svglib and svg2rlg – we need this library to convert SVG files to ReportLab drawing objects.
- reportlab and renderPM – we need this to render the ReportLab drawing to a PNG file.
- spacy – this is the spaCy module that does all the heavy lifting.
- displacy – this is the spaCy module that generates the sentence diagram.
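Before running the application, it can help to confirm that these modules are importable in the active virtual environment. The following is a minimal sketch and not part of the original application: the helper name missing_modules and the REQUIRED_MODULES list are hypothetical, and the check relies only on importlib.util.find_spec from the standard library. Note that the PIL module is installed by the Pillow package.

```python
from importlib.util import find_spec

# Top-level module names imported by the application (assumed list).
REQUIRED_MODULES = ["tkinter", "PIL", "svglib", "reportlab", "spacy"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Install these before running the app:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

An empty list means every import at the top of the application should succeed. It does not check that a spaCy language model is installed.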
WinSentenceDiagram Class
class WinSentenceDiagram(Tk):
    def __init__(self):
        # lots of tkinter code
        .......
        # sentence text
        self.text = "spaCy can create an image file that contains a diagram of a sentence."
        # load spaCy model
        self.loadSpacyModel()
        # create the doc object
        self.createDoc()
        # generate the diagram image and display it
        self.generateDiagram()
Without going into a lot of detail, the WinSentenceDiagram class is the top-level Tk object that represents the main window of the UI. The __init__ method is called to initialize the object. I left out the first part, which is a bunch of Tkinter code that defines all the window widgets.
Next you see the “self.text” variable set to the sentence text. You can replace this sentence with any text you want. This is what will be diagrammed. Note that only the first sentence of the text is diagrammed, because generateDiagram uses list(self.doc.sents)[0].
Next “self.loadSpacyModel()” calls a method to load the spaCy language model.
Next “self.createDoc()” calls a method to create the spaCy document. This is what analyzes the text.
Next “self.generateDiagram()” calls a method to actually generate the diagram from the analyzed text.
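Since the sentence is hard-coded, one way to try other text without editing the file is to read it from the command line. This is a minimal sketch and not part of the original application; the function name sentence_from_args is hypothetical.

```python
import sys

DEFAULT_TEXT = "spaCy can create an image file that contains a diagram of a sentence."


def sentence_from_args(argv, default=DEFAULT_TEXT):
    """Join the command-line words into one string, or fall back to the default."""
    words = argv[1:]  # argv[0] is the script name
    return " ".join(words) if words else default


# In __init__, the hard-coded assignment could then become:
#     self.text = sentence_from_args(sys.argv)
```

Running the script with no arguments would keep the original sentence.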
Load The spaCy Model
def loadSpacyModel(self):
    'load the SpaCy Model'
    self.NLP = None
    try:
        modelName = "C:\\Users\\jsing\\pyenv\\NLP\\Lib\\site-packages\\en_core_web_sm\\en_core_web_sm-3.2.0"
        self.NLP = spacy.load(modelName)
    except Exception as e:
        print("Error loading SpaCy pipeline:{} - {}".format(modelName, e))
This code loads the spaCy language model you installed as a part of the setup process.
The “modelName” variable is set to the folder that contains the language model.
The “self.NLP” variable is set to the loaded language model using the spacy.load method.
An exception block will display a message if any error occurs.
Now we can analyze the text in “self.text” using the spaCy pipeline (language model) stored in “self.NLP”.
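The hard-coded folder only exists on the author's machine. spacy.load also accepts the name of an installed model package, such as "en_core_web_sm", so one possible alternative is to take the name or path from an environment variable and fall back to the package name. This is a sketch under assumptions: the SPACY_MODEL variable and both function names are hypothetical, and the default only works if the en_core_web_sm package is installed in the active environment.

```python
import os


def resolve_model_name(default="en_core_web_sm"):
    """Return the model name or folder path to hand to spacy.load()."""
    # SPACY_MODEL is a hypothetical environment variable chosen for this sketch.
    return os.environ.get("SPACY_MODEL", default)


def load_pipeline():
    """Load the spaCy pipeline named by resolve_model_name()."""
    import spacy  # imported here so resolve_model_name() works without spaCy

    return spacy.load(resolve_model_name())
```

Inside loadSpacyModel, the assignment to modelName could then be replaced with a call to resolve_model_name().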
Create The spaCy Document
def createDoc(self):
    '''
    Create a spacy document object from the selected text in the raw text area.
    '''
    self.doc = None
    try:
        # create the document object
        self.doc = self.NLP(self.text)
    except Exception as e:
        print("Error Creating Doc Object - {}".format(e))
Now we will create the spaCy Document object, which causes spaCy to perform its analysis of the text (self.NLP(self.text)), with all the results saved in the document object (“self.doc”).
An exception block will display a message if any error occurs.
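One consequence of this error handling is worth noting: if the model failed to load, self.NLP is still None, the call raises a TypeError that is caught and printed, and self.doc stays None, so the later access to self.doc.sents fails. A fail-fast variant might look like the sketch below. The function name create_doc is hypothetical, and nlp stands for any callable pipeline, so the example runs without spaCy installed.

```python
def create_doc(nlp, text):
    """Run the pipeline on text, failing early with a clear message if it is missing."""
    if nlp is None:
        raise RuntimeError("No spaCy pipeline is loaded; check the model name or path.")
    return nlp(text)


# Stand-in pipeline for illustration only; a real one comes from spacy.load().
fake_pipeline = str.upper
result = create_doc(fake_pipeline, "a short sentence.")
```

With a real pipeline, the return value would be the spaCy document object.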
Generate The Diagram Using displaCy
def generateDiagram(self):
    # convert the iterator to a list and get the first sentence span in the document object
    self.sentence = list(self.doc.sents)[0]
    # generate svg diagram
    mySVG = displacy.render(self.sentence, style="dep")
    # save the svg to a file, then convert it to a png image file
    output_path = Path('temp.svg')
    output_path.write_text(mySVG, encoding="utf-8")
    drawing = svg2rlg('temp.svg')
    renderPM.drawToFile(drawing, "temp.png", fmt="PNG")
    # get the image file and load it into the canvas
    with PIL.Image.open('temp.png') as img:
        # keep a reference on self: Tkinter does not hold one, and a
        # garbage-collected PhotoImage leaves the canvas blank
        self.pimg = ImageTk.PhotoImage(img)
    self.diagramCanvas.create_image(0,0,anchor='nw',image=self.pimg)
    self.diagramCanvas.configure(scrollregion=self.diagramCanvas.bbox("all"))
    # position to the bottom of the diagram
    self.diagramCanvas.yview_moveto('1.0')
Finally we get to the part where we generate the diagram of the sentence stored in self.text. In order to display this using Tkinter we have to take a rather strange route. I’ll describe each step:
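self.sentence = list(self.doc.sents)[0] – doc.sents is an iterator, so we convert it to a list and take the first sentence span in the document.
displacy.render(self.sentence, style="dep") – displaCy generates the dependency diagram and returns it as SVG markup.
Path('temp.svg') – the SVG markup is saved to a file named temp.svg.
svg2rlg('temp.svg') and renderPM.drawToFile(drawing, "temp.png", fmt="PNG") – svglib reads the SVG file into a ReportLab drawing, which ReportLab then renders to the PNG file temp.png.
PIL.Image.open('temp.png') and ImageTk.PhotoImage(img) – the PNG file is opened and converted into an image that Tkinter can display.
create_image, configure(scrollregion=...) and yview_moveto('1.0') – the image is placed on the canvas, the scroll region is set to fit it, and the view is scrolled to the bottom of the diagram.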
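The file-based part of this route (SVG markup to a PNG file) can be separated from the Tkinter code. The sketch below is one way to do that, assuming svglib and reportlab are installed. The function names write_svg and svg_to_png and the file name diagram.svg are hypothetical. It writes into a directory you pass in, for example a temporary one, instead of the current working directory.

```python
import tempfile
from pathlib import Path


def write_svg(svg_markup, directory):
    """Save SVG markup (such as the string returned by displacy.render) to a file."""
    svg_path = Path(directory) / "diagram.svg"
    svg_path.write_text(svg_markup, encoding="utf-8")
    return svg_path


def svg_to_png(svg_path):
    """Convert an SVG file to a PNG file next to it and return the PNG path."""
    # Third-party imports are kept local so write_svg() works without them.
    from svglib.svglib import svg2rlg
    from reportlab.graphics import renderPM

    png_path = svg_path.with_suffix(".png")
    drawing = svg2rlg(str(svg_path))  # SVG file -> ReportLab drawing
    renderPM.drawToFile(drawing, str(png_path), fmt="PNG")  # drawing -> PNG file
    return png_path


# Example of the first step only, using a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = write_svg("<svg xmlns='http://www.w3.org/2000/svg'/>", tmp_dir)
    saved_name = saved.name
```

The PNG path returned by svg_to_png could then be opened with PIL.Image.open, as in the method above.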
And Finally…
This post looked at the code that illustrates how to diagram a sentence using spaCy and Python. We also looked at how to convert the SVG that displaCy produces into a PNG image that Tkinter can display. This post is part of a series of spaCy how-to posts that I encourage you to look at.