In this story we will focus on Vocabulary and Matching.

So far we’ve seen how a body of text is divided into tokens, and how individual tokens are parsed and tagged with parts of speech, dependencies and lemmas. If you missed my previous stories, check them out here.


If you missed Part 1, have a look at it for the basics: installation, basic commands, tokenization, POS tagging, dependencies, token attributes, spans and sentences.


The first step in creating a `Doc` object is to break down the incoming text into component pieces…

spaCy is an open-source Python library that parses and “understands” large volumes of text. Separate models are available that cater to specific languages (English, French, German, etc.).

Installation and Setup

Installation is a two-step process. First, install spaCy using either conda or pip. Next, download the specific model you want, based on…

Santanu Dutta

Enthusiast programmer, writer and blogger
