NLP … What is it?
First thing I did was test this:
I got a laugh out of the fact that my first thought was to trick it with a double negative, and also out of the fact that I actually thought I had tricked it. Granted, it came back at 53.7%, but if you think things through like I eventually did, you realize that it’s actually pretty accurate. If the percentage were below 50, the meaning would more likely be positive, so the result of this particular sentiment analysis sits pretty much in the middle. And if you think about it, “to not dislike” has a rather mediocre or neutral feel to it, not particularly good or bad. Of course I tested other, more realistic cases, but this one stuck out to me. Pretty darn impressive … that, or there is a neutral tag result and I overhyped how awesome this is.
So anyways, NLP stands for Natural Language Processing, which is an interesting branch of AI. TechTarget explains the main benefit of NLP in an insightful way, which I took to mean better communication between humans and computers: once computers understand human language, people can go on to do many intuitive and amazing things with them.
Natural language processing is used for:
- Machine translation
- Text extraction or summarization, useful for large texts
- Text classification or sentiment analysis
- Speech recognition
- Automation
- Digital assistants
- Etc.
There are many real-world applications of NLP, each analyzing a different type of data, such as text or speech, to achieve a specific purpose.
However, natural language processing faces obstacles that are hard to address:
- Precision — since human speech is often open to interpretation and linguistic structure depends on many variables
- Tone of voice, inflection, connotation
- Evolution of language — language has not been constant throughout history and most likely never will be
2 Key Phases in NLP
For data preprocessing, raw data must be prepared so that machines can analyze it (using NLP). There are different ways to do this, such as:
- Tokenization — the splitting of text into tokens, or smaller units
- Stop word removal — the elimination of common words such as “the” so that key words can be used to assemble useful information about the meaning of the sentence
- Lemmatization and stemming — words are reduced to their root forms, such as “running” to “run”
- Part-of-speech tagging — grammar labelling
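To make the first three steps a bit more concrete, here is a rough sketch in plain Python. The stop-word list and the suffix-stripping rule are my own toy assumptions, not what a real NLP library does; actual stemmers and lemmatizers are far more careful.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "in", "on", "of", "to", "and"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop common words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Crude suffix stripping; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Collapse a doubled final consonant: "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem what is left."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The dogs are running in the park"))
```

The three key words that survive are what a later analysis step would actually work with.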
Once the data is prepared, an algorithm is developed to process it. The two main types are:
- Rules-based system — follows a set of hand-written linguistic rules
- Machine learning-based system — uses a combination of machine learning, deep learning, and neural networks to learn its own set of linguistic rules that best fit a large body of training data
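As a minimal sketch of what a rules-based system might look like, here is a tiny sentiment scorer built on hand-written word lists plus one negation rule. The word lists and the function name are hypothetical, made up purely for illustration.

```python
# Hand-written word lists (hypothetical, for illustration only).
POSITIVE = {"like", "love", "good", "great"}
NEGATIVE = {"dislike", "hate", "bad", "awful"}
NEGATORS = {"not", "never", "no"}


def rule_based_sentiment(text):
    """Score text with fixed rules: +1 / -1 per known word, flipped after a negator."""
    score = 0
    negate = False
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in NEGATORS:
            # Remember the negation until the next sentiment word shows up.
            negate = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(rule_based_sentiment("I do not dislike it"))
```

Notice that the rigid rule turns “not dislike” into a flat positive, even though the phrase feels closer to neutral. That kind of nuance is where hand-written rules tend to fall short and learned models can do better.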
2 Main Types of NLP Techniques and Methods
Syntax techniques include:
- Parsing — grammatical analysis
- Word segmentation — dividing a string of text into individual words
- Sentence breaking — recognition of the separation between sentences
- Morphological segmentation — words are broken up into morphemes, the smallest units that carry meaning (for example, “un-break-able”)
- Stemming
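Sentence breaking and word segmentation are the easiest of these to try out. Below is a naive sketch using regular expressions; the function names are my own, and the sentence rule is an assumption that breaks on abbreviations like “Dr.”, which is exactly why real systems need something smarter.

```python
import re


def split_sentences(text):
    """Break after ., ! or ? when followed by whitespace (naive rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(sentence):
    """Pull out words, keeping simple contractions like "don't" together."""
    return re.findall(r"\w+(?:'\w+)?", sentence)


for sentence in split_sentences("NLP is fun. Is it hard? Sometimes!"):
    print(split_words(sentence))
```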
Semantic analysis deals with the meaning of words, phrases, and sentences, which are often ambiguous and open to interpretation. Techniques include:
- Word sense disambiguation — derives the meaning of a word based on context, which is extremely useful when considering the not-so-rare occurrence of words with double meanings
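- Named entity recognition — identifying words and phrases that name specific things, such as people, organizations, or places, and sorting them into those categories (for example, telling “Apple” the company apart from “apple” the fruit)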
- Natural language generation — generates new text based on the semantic analysis of text stored in a database
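Word sense disambiguation can be sketched with a simplified version of the classic Lesk idea: pick the sense whose dictionary definition shares the most words with the surrounding sentence. The mini dictionary and stop-word list below are hypothetical stand-ins I wrote for illustration; a real system would use a proper lexical database.

```python
# Hypothetical mini dictionary: word -> {sense name: definition}.
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money",
        "river edge": "the sloping land beside a river or stream of water",
    }
}
STOP = {"a", "an", "the", "that", "and", "or", "of", "to", "at", "on", "i", "my"}


def disambiguate(word, sentence):
    """Return the sense whose definition overlaps most with the sentence."""
    context = set(sentence.lower().split()) - STOP
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & (set(gloss.split()) - STOP))
        # On a tie, the sense listed first in the dictionary wins.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "I went to the bank to deposit money"))
print(disambiguate("bank", "We sat on the bank of the river near the water"))
```

Counting shared words is a blunt instrument, but it shows how context alone can separate two meanings of the same word.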
Earlier approaches to NLP relied more on rules-based systems, but today results keep improving as NLP turns to deep learning, “a type of AI that examines and uses patterns in data to improve a program’s understanding” (TechTarget). The main issue is collecting and labelling massive amounts of data so that the training data is varied while remaining valid.
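To see why labelled data matters so much, here is a small sketch of a model that learns from examples instead of following hand-written rules. To be clear, this is not deep learning: it is a Naive Bayes classifier, a much simpler statistical technique that I swapped in because it fits in a few lines. The four training sentences are made up for illustration, and with so little data the model only knows the handful of words it has seen.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Pick the label with the highest log-probability under Naive Bayes."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        # Add-one smoothing so unseen (word, label) pairs are not impossible.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up labelled examples (an assumption, far too few for real use).
TRAINING = [
    ("i love this movie", "positive"),
    ("what a great film", "positive"),
    ("i hate this movie", "negative"),
    ("what an awful film", "negative"),
]

model = train(TRAINING)
print(predict(model, "i love this great film"))
```

Every new word the model should understand has to appear in some labelled example first, which is a small-scale version of the data collection problem described above.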
Natural Language Processing has much more room to grow. The field is limitless and its applications more so. Perhaps one day they will become a part of society itself.
Sources:
- Lutkevich, B., & Burns, E. (2021, March 2). What is natural language processing? An introduction to NLP. SearchEnterpriseAI. Retrieved September 28, 2022, from https://www.techtarget.com/searchenterpriseai/definition/natural-language-processing-NLP
- Natural language processing (NLP): What is it & how does it work? MonkeyLearn. (n.d.). Retrieved September 28, 2022, from https://monkeylearn.com/natural-language-processing/