spaCy: Industrial-strength Natural Language Processing


spaCy is a library for advanced Natural Language Processing in Python and Cython. It’s built on the very latest research and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition, and easy deep learning integration. It’s commercial open-source software, released under the MIT license.


  • Fastest syntactic parser in the world
  • Named entity recognition
  • Non-destructive tokenization
  • Support for 20+ languages
  • Pre-trained statistical models and word vectors
  • Easy deep learning integration
  • Part-of-speech tagging
  • Labelled dependency parsing
  • Syntax-driven sentence segmentation
  • Built-in visualizers for syntax and NER
  • Convenient string-to-hash mapping
  • Export to numpy data arrays
  • Efficient binary serialization
  • Easy model packaging and deployment
  • State-of-the-art speed
  • Robust, rigorously evaluated accuracy
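
To make "non-destructive tokenization" from the list above concrete: each token records the whitespace that followed it, so the original text can always be rebuilt from the tokens. The block below is a minimal plain-Python sketch of that idea, not spaCy's implementation. The names `Token`, `tokenize` and `detokenize` are hypothetical, and the splitting rule (word runs and single symbols) is an assumption chosen for brevity.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str        # the token itself, without trailing whitespace
    whitespace: str  # whitespace that followed the token in the input
    start: int       # character offset of the token in the original text


# A token is a run of word characters, a single symbol, or (only at the
# very start of the input) a run of leading whitespace. Any whitespace
# after the token is captured separately instead of being thrown away.
_TOKEN_RE = re.compile(r"(\w+|[^\w\s]|^\s+)(\s*)")


def tokenize(text: str) -> List[Token]:
    """Split text into tokens while keeping every character of the input."""
    return [
        Token(m.group(1), m.group(2), m.start(1))
        for m in _TOKEN_RE.finditer(text)
    ]


def detokenize(tokens: List[Token]) -> str:
    """Rebuild the original string from tokens and their trailing whitespace."""
    return "".join(tok.text + tok.whitespace for tok in tokens)


text = "Hello,  world! It costs $5."
tokens = tokenize(text)

# The round trip is lossless, including the double space after the comma.
assert detokenize(tokens) == text

# Offsets point back into the original string.
for tok in tokens:
    assert text[tok.start:tok.start + len(tok.text)] == tok.text
```

Keeping the whitespace and offsets on each token is what lets downstream annotations (tags, entities, parse arcs) be mapped back onto the exact input text.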


Bug fixes

  • Fix issues #1507, #1512, #1513, #1514, #1516: Improve new documentation and list of backwards incompatibilities.
  • Fix issue #1515: Correct print statement in example.
  • Fix issue #1518: Make Vectors.resize work as expected.
  • Fix conda build.

Other changes

  • Add text examples for Hindi.

