Tokenization
Tokenization is a simple concept: we split a text into meaningful segments, called tokens, such as words and punctuation marks.
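To make the idea concrete, here is a minimal sketch of a tokenizer built on Python's standard `re` module. The `tokenize` function and its pattern are assumptions made for illustration; NLP libraries use far more careful rules (for contractions, URLs, abbreviations, and so on).

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation tokens."""
    # \w+     matches a run of letters, digits, or underscores (a "word")
    # [^\w\s] matches one character that is neither a word character nor whitespace
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize("Hello, world! NLP is fun.")
print(tokens)
# ['Hello', ',', 'world', '!', 'NLP', 'is', 'fun', '.']
```

Notice that punctuation becomes its own token instead of staying glued to the neighbouring word, which is usually what later processing steps want.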
Similarity Using GloVe
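Knowing how similar two words or sentences are helps in many tasks. GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm for obtaining vector representations of words. Once every word is a vector, similarity can be measured by comparing the vectors, typically with cosine similarity.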
The two inputs in this example are 78% similar. Wow!
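To show what "similarity between vectors" means, here is a small sketch of cosine similarity, a common way to compare word vectors. The three-dimensional vectors in `TOY_VECTORS` are made up for illustration and are not real GloVe embeddings (the published GloVe vectors have 50 to 300 dimensions), so this sketch does not reproduce the 78% figure above.

```python
import math
from typing import Dict, List

# Hypothetical toy vectors, invented for this example only.
# Real GloVe vectors are learned from a large text corpus.
TOY_VECTORS: Dict[str, List[float]] = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat_dog = cosine_similarity(TOY_VECTORS["cat"], TOY_VECTORS["dog"])
cat_car = cosine_similarity(TOY_VECTORS["cat"], TOY_VECTORS["car"])
print(f"cat vs dog: {cat_dog:.2f}")
print(f"cat vs car: {cat_car:.2f}")
```

With these toy numbers, "cat" and "dog" point in almost the same direction, so their score is close to 1, while "cat" and "car" score much lower.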
Named Entity Recognition
Named entity recognition (NER) is also called entity identification or entity extraction. It is the process of finding the named entities in a given text and classifying them into predefined categories, such as people, organizations, and places.
NORP stands for nationalities or religious or political groups.
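To illustrate the "find and classify" idea, here is a deliberately simple sketch that tags entities by dictionary lookup. This is not how real NER systems work (they use trained statistical models that look at context); the `GAZETTEER` table, its entries, and the `ORG` and `GPE` label names are assumptions chosen for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table mapping lowercase words to entity labels.
# A real NER model learns these decisions from annotated text.
GAZETTEER: Dict[str, str] = {
    "indian": "NORP",  # nationality
    "google": "ORG",   # organization
    "london": "GPE",   # country, city, or state
}


def tag_entities(text: str) -> List[Tuple[str, str]]:
    """Return (word, label) pairs for every word found in the gazetteer."""
    entities = []
    for word in text.split():
        cleaned = word.strip(".,!?;:")  # drop punctuation stuck to the word
        label = GAZETTEER.get(cleaned.lower())
        if label is not None:
            entities.append((cleaned, label))
    return entities


print(tag_entities("The Indian engineer joined Google in London."))
# [('Indian', 'NORP'), ('Google', 'ORG'), ('London', 'GPE')]
```

A lookup like this breaks down quickly (it cannot handle multi-word names or words it has never seen), which is exactly why trained models are used in practice.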
Regular Expressions
Regular expressions are primarily used for pattern matching, which lets us check whether the data we are processing is in the expected format.
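As a small sketch of this kind of check, the snippet below uses Python's standard `re` module to test whether a string looks like a date. The `YYYY-MM-DD` format and the `looks_like_date` helper are assumptions picked for illustration.

```python
import re

# Assumed format for this sketch: four-digit year, two-digit month, two-digit day.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def looks_like_date(value: str) -> bool:
    """Return True if the whole string has the shape YYYY-MM-DD."""
    # fullmatch requires the entire string to match, not just a part of it
    return DATE_PATTERN.fullmatch(value) is not None


print(looks_like_date("2021-03-15"))  # True
print(looks_like_date("15/03/2021"))  # False
print(looks_like_date("2021-3-15"))   # False
```

Keep in mind that the pattern only checks the shape of the text: a string like "2021-99-99" still matches, so checking that the date really exists needs a separate step.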