How To Diagram A Sentence Using spaCy and Python

This post will describe how to diagram a sentence using spaCy and Python. We will look at a standalone Python application that takes a piece of text and produces a diagram (image) of the sentence. This application provides a full graphical user interface using Tkinter (included with Python). The code presented here is much more detailed than you will find in other blog posts.

If you are interested in an application that provides lots of spaCy functionality, I recommend the following two posts.

Sentence Diagram Using spaCy

The following is a screenshot from the application…

WinSpacyDiagram Application

The sentence “spaCy can create an image file that contains a diagram of a sentence.” is hardcoded into the application. You can easily change the text and rerun the program to get a different diagram.

Python Code

The following is the complete code for the application.

'''
Copyright 2022 SingerLinks Consulting

This file is part of spaCyWorkbench.
spaCyWorkbench is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
spaCyWorkbench is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.
You should have received a copy of the GNU General Public License
    along with NLPSpacy. If not, see <https://www.gnu.org/licenses/>.
'''
'''
WinSentenceDiagram is a standalone application that demonstrates how to display a sentence diagram generated by spaCy.
The sentence text is hard coded for simplicity.
For a more complete solution look at spaCyWorkbench.py
'''
try:
    from tkinter import *
    from tkinter import ttk
    from tkinter.ttk import *
except ImportError:
    from Tkinter import *
    from Tkinter import ttk
    from Tkinter.ttk import *


from pathlib import Path
import PIL
from PIL import ImageTk 
from svglib.svglib import svg2rlg
from reportlab.graphics import renderPM

import spacy
from spacy import displacy

class WinSentenceDiagram(Tk):
     
    def __init__(self):
         
        super().__init__()
        self.title("spaCy Sentence Diagram")
        
        self.tab3 = ttk.Frame(self)
        self.diagramCanvas = Canvas(self.tab3)
        #vertical scrollbar
        self.vsbDiagram = Scrollbar(self.tab3, orient=VERTICAL)
        self.diagramCanvas.config(yscrollcommand=self.vsbDiagram.set)
        self.vsbDiagram.config(command=self.diagramCanvas.yview)        
        # horizontal scrollbar
        self.hsbDiagram = Scrollbar(self.tab3, orient=HORIZONTAL)
        self.hsbDiagram.config(command=self.diagramCanvas.xview)     
        self.diagramCanvas.config(xscrollcommand=self.hsbDiagram.set)
        
        # grid the widgets
        self.tab3.grid(row = 0, column = 0, padx = 5, pady = 5, sticky=NSEW)
        self.diagramCanvas.grid(row = 0, column = 0, padx = 5, pady = 5, sticky=NSEW)
        self.vsbDiagram.grid(row = 0, column = 1, padx = 5, pady = 5, sticky=NS)
        self.hsbDiagram.grid(row = 1, column = 0, padx = 5, pady = 5, sticky=EW)    
        # handle resizing
        self.grid_columnconfigure(0, weight=1)
        self.grid_rowconfigure(0, weight=1)
        self.tab3.grid_columnconfigure(0, weight=1)
        self.tab3.grid_rowconfigure(0, weight=1)    
        
        # sentence text
        self.text = "spaCy can create an image file that contains a diagram of a sentence."
        
        # load spaCy model
        self.loadSpacyModel()
        
        # create the doc object
        self.createDoc()
        
        # generate the diagram image and display it
        self.generateDiagram()
        
        # size the window
        self.geometry("{}x{}".format(int(self.winfo_screenwidth()*.8), int(self.winfo_screenheight()*.7)))

    def loadSpacyModel(self):
        'load the SpaCy Model'
        self.NLP = None
        try:
            modelName = "C:\\Users\\jsing\\pyenv\\NLP\\Lib\\site-packages\\en_core_web_sm\\en_core_web_sm-3.2.0"
            self.NLP = spacy.load(modelName)
        except Exception as e:
            print("Error loading SpaCy pipeline:{} - {}".format(modelName, e))  
        
    def createDoc(self):
        '''
        Create a spacy document object from the selected text in the raw text area.
        '''
        self.doc = None
        try:
            # create the document object
            self.doc = self.NLP(self.text)
        except Exception as e:
            print("Error Creating Doc Object - {}".format(e))
        
    def generateDiagram(self):

        # convert the iterator to a list and get the first sentence span in the document object
        self.sentence = list(self.doc.sents)[0]
        # generate svg diagram
        mySVG = displacy.render(self.sentence, style="dep")
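        # save the svg to disk, then convert it to a png image file
        output_path = Path('temp.svg')
        output_path.write_text(mySVG, encoding="utf-8")  # write_text also closes the file
        drawing = svg2rlg('temp.svg')
        renderPM.drawToFile(drawing, "temp.png", fmt="PNG")
        # load the image file and draw it on the canvas
        with PIL.Image.open('temp.png') as img:
            # keep a reference on self; Tkinter does not hold one, so a local
            # variable would be garbage collected and the image would not show
            self.pimg = ImageTk.PhotoImage(img)
        self.diagramCanvas.create_image(0,0,anchor='nw',image=self.pimg)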
        self.diagramCanvas.configure(scrollregion=self.diagramCanvas.bbox("all"))
        # position to the bottom of the diagram
        self.diagramCanvas.yview_moveto('1.0')



'''start the app running'''
if __name__ == "__main__":
    app = WinSentenceDiagram()
    app.mainloop()

You can copy the code above into IDLE and run it. But first you need to do two things.

  • Set up a Python virtual environment with the appropriate libraries installed. You can find the instructions to do this here.
  • Change the modelName variable to point to the spaCy model you installed (described in the post linked above).

Code Details

Let’s go through the code in detail.

Imports

try:
    from tkinter import *
    from tkinter import ttk
    from tkinter.ttk import *
except ImportError:
    from Tkinter import *
    from Tkinter import ttk
    from Tkinter.ttk import *

This is the import for Tkinter, the graphical user interface toolkit included with Python. I’m not going to spend any time on the Tkinter code, as there are many good tutorials on the web and I don’t really want to write a Tkinter tutorial. The try: block imports the module under its Python 3 name (tkinter). If that fails because you are running Python 2, the except: block imports it under its older name (Tkinter). Note that the rest of the code uses Python 3 syntax such as super().__init__(), so in practice only the first branch is used.

from pathlib import Path
import PIL
from PIL import ImageTk 
from svglib.svglib import svg2rlg
from reportlab.graphics import renderPM

import spacy
from spacy import displacy

Here we have the remaining imports, including spaCy and its diagramming utility, “displaCy”.

  • pathlib – used to create path objects to save image files
  • PIL and ImageTk – PIL is the “Python Imaging Library” (installed today as the Pillow package). ImageTk is a PIL module that converts PIL images into image objects that Tkinter can display.
  • svglib and svg2rlg – svg2rlg reads an SVG file and converts it into a ReportLab drawing object.
  • reportlab and renderPM – renderPM renders the ReportLab drawing to a PNG file.
  • spacy – this is the spaCy module that does all the heavy lifting.
  • displacy – this is the spaCy module that generates the sentence diagram.
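
Before running the application, it can help to confirm that each of the third-party packages listed above is importable. The following is a minimal sketch using only the standard library. The helper name missing_packages and the REQUIRED mapping are hypothetical and not part of the application, and the pip names are assumptions based on how these packages are usually published (PIL is installed as Pillow).

```python
from importlib.util import find_spec

# Import name -> name usually passed to "pip install" (assumed package names).
REQUIRED = {
    "PIL": "Pillow",
    "svglib": "svglib",
    "reportlab": "reportlab",
    "spacy": "spacy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if find_spec(import_name) is None  # None means "not importable"
    ]
```

Calling missing_packages() returns an empty list when everything is installed. Otherwise the returned names are the ones to install into the virtual environment.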

WinSentenceDiagram Class

class WinSentenceDiagram(Tk):
     
    def __init__(self):
        # lots of tkinter code
        .......
        # sentence text
        self.text = "spaCy can create an image file that contains a diagram of a sentence."
        
        # load spaCy model
        self.loadSpacyModel()
        
        # create the doc object
        self.createDoc()
        
        # generate the diagram image and display it
        self.generateDiagram()

Without going into a lot of detail, the WinSentenceDiagram class is the top level Tk object that represents the main window of the UI. The __init__ method is called to initialize the object. I left out the first part, which is a bunch of Tkinter code that defines all the window widgets.

Next you see the “self.text” variable set to the sentence text. You can replace this sentence with any text you want. This is what will be diagrammed.

Next “self.loadSpacyModel()” calls a method to load the spaCy language model.

Next “self.createDoc()” calls a method to create the spaCy document. This is what analyzes the text.

Next “self.generateDiagram()” calls a method to actually generate the diagram from the analyzed text.

Load The spaCy Model

    def loadSpacyModel(self):
        'load the SpaCy Model'
        self.NLP = None
        try:
            modelName = "C:\\Users\\jsing\\pyenv\\NLP\\Lib\\site-packages\\en_core_web_sm\\en_core_web_sm-3.2.0"
            self.NLP = spacy.load(modelName)
        except Exception as e:
            print("Error loading SpaCy pipeline:{} - {}".format(modelName, e))  

This code loads the spaCy language model you installed as a part of the setup process.

The “modelName” variable is set to the folder that contains the language model.

The “self.NLP” variable is set to the loaded language model using the spacy.load method.

An exception block will display a message if any error occurs.

Now we can analyze the text in “self.text” using the spaCy pipeline (language model) stored in “self.NLP”.
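
The hardcoded folder above only exists on the machine where the code was written. spacy.load accepts either a path to a model folder or the name of an installed pipeline package, so one possible variation is to fall back to the package name when the folder is not found. This is a sketch, not the application's code: the function names are hypothetical, and it assumes the en_core_web_sm package was installed (for example with python -m spacy download en_core_web_sm).

```python
from pathlib import Path


def resolve_model_name(model_dir, fallback="en_core_web_sm"):
    """Return model_dir if that folder exists, otherwise the fallback package name."""
    if model_dir and Path(model_dir).is_dir():
        return str(model_dir)
    return fallback


def load_pipeline(model_dir=None):
    """Load a spaCy pipeline from a folder or, failing that, by package name."""
    # Imported here so resolve_model_name can be used without spaCy installed.
    import spacy

    return spacy.load(resolve_model_name(model_dir))
```

In the application, self.NLP = load_pipeline(modelName) would then work both on a machine where the folder exists and on one where only the package is installed.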

Create The spaCy Document

    def createDoc(self):
        '''
        Create a spacy document object from the selected text in the raw text area.
        '''
        self.doc = None
        try:
            # create the document object
            self.doc = self.NLP(self.text)
        except Exception as e:
            print("Error Creating Doc Object - {}".format(e))

Now we will create the spaCy Document object. Calling self.NLP(self.text) causes spaCy to perform its analysis of the text, with all the results saved in the document object (“self.doc”).

An exception block will display a message if any error occurs.
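
To see what the document object holds before it is drawn, you can list each token with its dependency label and its head, which is the same information the diagram shows as labeled arcs. The sketch below relies only on the token attributes text, dep_ and head. The two function names are hypothetical and not part of the application.

```python
def dependency_rows(doc):
    """Return (token text, dependency label, head text) for each token in doc."""
    return [(token.text, token.dep_, token.head.text) for token in doc]


def print_dependencies(doc):
    """Print one aligned line per token: text, dependency label, head text."""
    for text, dep, head in dependency_rows(doc):
        print(f"{text:<12}{dep:<10}{head}")
```

With the application's objects this would be called as print_dependencies(self.doc) once createDoc has run.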

Generate The Diagram Using displaCy

    def generateDiagram(self):

        # convert the iterator to a list and get the first sentence span in the document object
        self.sentence = list(self.doc.sents)[0]
        # generate svg diagram
        mySVG = displacy.render(self.sentence, style="dep")
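        # save the svg to disk, then convert it to a png image file
        output_path = Path('temp.svg')
        output_path.write_text(mySVG, encoding="utf-8")  # write_text also closes the file
        drawing = svg2rlg('temp.svg')
        renderPM.drawToFile(drawing, "temp.png", fmt="PNG")
        # load the image file and draw it on the canvas
        with PIL.Image.open('temp.png') as img:
            # keep a reference on self; Tkinter does not hold one, so a local
            # variable would be garbage collected and the image would not show
            self.pimg = ImageTk.PhotoImage(img)
        self.diagramCanvas.create_image(0,0,anchor='nw',image=self.pimg)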
        self.diagramCanvas.configure(scrollregion=self.diagramCanvas.bbox("all"))
        # position to the bottom of the diagram
        self.diagramCanvas.yview_moveto('1.0')

Finally we get to the part where we generate the diagram of the sentence stored in self.text. In order to display this using Tkinter we have to take a rather strange route. I’ll describe each step:

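self.sentence = list(self.doc.sents)[0] – doc.sents is an iterator over the sentences spaCy found in the document. We convert it to a list and take the first sentence span.

displacy.render(self.sentence, style="dep") – displaCy draws the dependency parse of the sentence. When run outside a Jupyter notebook it returns the diagram as a string of SVG markup.

Path('temp.svg') – the SVG markup is written to the file temp.svg.

svg2rlg and renderPM.drawToFile – svglib reads the SVG file into a ReportLab drawing, which renderPM then saves as temp.png.

ImageTk.PhotoImage – PIL opens the PNG file and ImageTk converts it into an image object that Tkinter can display.

create_image – the image is drawn on the canvas, the scroll region is set to cover the whole image, and the view is scrolled to the bottom of the diagram.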
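
The application writes temp.svg and temp.png into the current working directory. One possible variation, sketched below, keeps the intermediate files in a temporary folder and hands back the PNG as bytes. The three function names are hypothetical. svg2rlg and renderPM.drawToFile are the same calls used in generateDiagram, and they are imported inside the function so the file-handling part can be tried without them.

```python
import tempfile
from pathlib import Path


def save_svg(svg_text, directory):
    """Write SVG markup to diagram.svg inside directory and return its path."""
    svg_path = Path(directory) / "diagram.svg"
    svg_path.write_text(svg_text, encoding="utf-8")
    return svg_path


def svg_to_png(svg_path):
    """Convert an SVG file to a PNG file next to it and return the PNG path."""
    # Same third-party calls as in generateDiagram.
    from svglib.svglib import svg2rlg
    from reportlab.graphics import renderPM

    png_path = Path(svg_path).with_suffix(".png")
    drawing = svg2rlg(str(svg_path))
    renderPM.drawToFile(drawing, str(png_path), fmt="PNG")
    return png_path


def svg_text_to_png_bytes(svg_text):
    """Convert SVG markup to PNG bytes without leaving files behind."""
    with tempfile.TemporaryDirectory() as work_dir:
        png_path = svg_to_png(save_svg(svg_text, work_dir))
        return png_path.read_bytes()
```

The returned bytes can be opened with PIL.Image.open(io.BytesIO(data)) and passed to ImageTk.PhotoImage in the same way as the image loaded from temp.png.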

And Finally…

This post looked at the code that illustrates how to diagram a sentence using spaCy and Python. We also looked at the Python libraries used to convert the SVG diagram into a PNG image that Tkinter can display. This post is part of a series of spaCy how-to posts that I encourage you to look at.
