The Role of Machine Learning in Natural Language Processing

Machine Learning and Natural Language Processing are important subfields of Artificial Intelligence that have gained prominence in recent times. Both play a very important part in turning an artificial agent into an artificially ‘intelligent’ agent. Advances in Natural Language Processing allow an Artificially Intelligent system to take in richer information from its environment and to act on that environment in a user-friendly manner.

Similarly, the adoption of Machine Learning techniques allows an Artificially Intelligent system to process the information it receives and to make better predictions about which actions to take.

Role of Machine Learning in Natural Language Processing

Processing natural language so that a machine can understand it involves many steps. These steps include Morphological Analysis, Syntactic Analysis, Semantic Analysis, Discourse Analysis, and Pragmatic Analysis. Generally, these analysis tasks are applied serially. Machine Learning adds value to almost all of these processes in one form or another. Let us try to understand how.

Morphological Analysis

The data received by a computing system is in the form of 0s and 1s. These 0s and 1s can be converted into characters using a character encoding such as ASCII. So, it can be said that a machine receives a sequence of characters when a sentence or a paragraph is provided to it. At the level of morphological analysis, the first task is to identify the words and the sentences. This identification is called tokenization. Many different Machine Learning and Deep Learning algorithms, including Support Vector Machines and Recurrent Neural Networks, have been employed for tokenization.
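The two steps described above, decoding bits into characters and then splitting the characters into sentences and words, can be sketched roughly as follows. This is a minimal rule-based illustration using only Python's standard library, not one of the SVM or RNN tokenizers mentioned above. The function names and the splitting rules are assumptions made for this example.

```python
import re
from typing import List

def bits_to_text(bits: str) -> str:
    """Decode a string of 0s and 1s (8 bits per character) into ASCII text."""
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return bytes(int(chunk, 2) for chunk in chunks).decode("ascii")

def split_sentences(text: str) -> List[str]:
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence: str) -> List[str]:
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

# Build an example bit string so the sketch is self-contained.
bits = "".join(format(b, "08b") for b in "Hi there. Bye!".encode("ascii"))

text = bits_to_text(bits)                     # "Hi there. Bye!"
sentences = split_sentences(text)             # ["Hi there.", "Bye!"]
tokens = [tokenize(s) for s in sentences]     # [["Hi", "there", "."], ["Bye", "!"]]
```

Fixed rules like these break on abbreviations such as "Dr." or on languages written without spaces, which is one reason learned tokenizers are used in practice.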

Once tokenization is complete, the machine has a collection of words and sentences. Many of these words carry affixes. These affixes complicate matters for the machine, because building a word-meaning dictionary that contains every word with all of its possible affixes is almost impossible. So, the next task at the morphological analysis level is removing these affixes, which can be done using either stemming or lemmatization. Machine Learning algorithms like random forests and decision trees have been quite successful at performing stemming.
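To make the idea of affix removal concrete, here is a small suffix-stripping sketch. It is a hand-written rule list, not the random forest or decision tree approaches mentioned above, and the suffix list, the `min_stem` threshold, and the function name are all assumptions chosen for illustration.

```python
# Hypothetical suffix rules, checked in this order (longer suffixes first).
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ["Walking", "walked", "cats", "boxes", "bus"]]
# stems == ["walk", "walk", "cat", "box", "bus"]
```

The same rules turn "running" into "runn", which shows why plain suffix stripping is only a rough approximation and why lemmatization or learned models are often preferred.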

Syntactic Analysis

The next task in natural language processing is to check whether the given sentence follows the grammar rules of a language. To do this, the words are first tagged with their parts of speech. This helps the syntactic parsers in checking the grammar rules. Machine learning and deep learning algorithms like the random forest and the recurrent neural network have been successfully used for this task. Machine learning algorithms like K-nearest neighbor have been used for implementing syntactic parsers as well.
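As an illustration of learning part-of-speech tags from examples, the sketch below trains a most-frequent-tag baseline: each word receives the tag it carried most often in the training data. This is a much simpler technique than the random forest or recurrent neural network taggers mentioned above. The tiny training corpus, the tag names, and the default tag for unknown words are invented for this example.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus (invented for illustration only).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
]

def train_tagger(tagged_sentences):
    """Count how often each word carries each tag and keep the most frequent one."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_words(words, model, default="NOUN"):
    """Tag each word with its learned tag; unseen words get the default tag."""
    return [(w, model.get(w.lower(), default)) for w in words]

model = train_tagger(TRAIN)
tagged = tag_words(["The", "dog", "sleeps"], model)
# tagged == [("The", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")]
```

A baseline like this ignores context entirely, so a word that can be both a noun and a verb always gets the same tag. Context-aware models address exactly that weakness.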

Semantic Analysis

At this level, the word meanings are identified using word-meaning dictionaries. The problem encountered here is that the same word might have different meanings depending on the context of the sentence. For example, the word ‘Bank’ might mean a Blood Bank, a Financial Bank, or even a Riverbank / Shore. This creates ambiguity. Removing this ambiguity, a task called Word Sense Disambiguation, is one of the important tasks at this level of natural language processing.

Word sense disambiguation is a classic classification problem that has been researched with varying levels of success. Machine learning algorithms like random forests, gradient boosting, and decision trees have been successfully employed. In recent times, however, deep learning algorithms like the recurrent neural network, the long short-term memory based recurrent neural network, the gated recurrent unit based recurrent neural network, and the convolutional neural network have been researched and have produced very good results.
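The ‘Bank’ example can be framed as choosing one sense label from the surrounding words. The sketch below uses a simple cue-word overlap heuristic, loosely in the spirit of dictionary-overlap methods such as the Lesk algorithm, rather than any of the learned classifiers named above. The sense labels and cue-word sets are hypothetical and hand-written for this example.

```python
import re

# Hypothetical cue words for each sense of "bank" (hand-written, not from a real dictionary).
SENSES = {
    "financial": {"money", "loan", "account", "deposit", "cash"},
    "river": {"river", "water", "shore", "fishing", "boat"},
    "blood": {"blood", "donor", "hospital", "donate"},
}

def disambiguate_bank(sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence's words."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(context & cues) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

sense_a = disambiguate_bank("She opened an account at the bank to deposit money")
sense_b = disambiguate_bank("We sat on the bank of the river fishing")
sense_c = disambiguate_bank("The blood bank needs every donor")
# sense_a == "financial", sense_b == "river", sense_c == "blood"
```

A learned classifier replaces the hand-written cue sets with weights estimated from labeled sentences, which is what makes the classification framing attractive.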

Discourse Analysis

There are instances where pronouns are used, or certain subjects/objects are referred to, that lie outside the current purview of the analysis. In such cases, semantic analysis alone will not be able to give proper meaning to the sentence. This is the classic problem of reference resolution, which has been tackled with machine learning and deep learning algorithms.
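A toy version of reference resolution is sketched below: each pronoun is linked to the most recent earlier mention that it could grammatically refer to, even when that mention sits in a previous sentence. This is a hand-written recency heuristic, not a machine learning or deep learning resolver, and the entity-to-pronoun table and function name are assumptions made for illustration.

```python
# Hypothetical table of entity mentions and the pronouns that may refer to them.
ENTITY_PRONOUNS = {
    "alice": {"she", "her"},
    "bob": {"he", "him"},
    "report": {"it"},
}

def resolve_pronouns(tokens):
    """Map each pronoun's position to the most recent compatible entity mention."""
    last_seen = {}  # pronoun -> most recent entity it could refer to
    resolved = {}   # token index of pronoun -> entity mention
    for i, tok in enumerate(tokens):
        word = tok.lower()
        if word in ENTITY_PRONOUNS:
            for pronoun in ENTITY_PRONOUNS[word]:
                last_seen[pronoun] = tok
        elif word in last_seen:
            resolved[i] = last_seen[word]
    return resolved

tokens = "Alice wrote the report . Bob reviewed it and thanked her".split()
links = resolve_pronouns(tokens)
# links == {7: "report", 10: "Alice"}
```

Here "it" and "her" in the second sentence are resolved to mentions from the first sentence, which is the kind of cross-sentence link that sentence-level semantic analysis cannot supply on its own.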

Pragmatic Analysis

Many a time, sentences convey a deeper meaning than what the words themselves describe. That is, the machine has to set aside the literal word meaning obtained from semantic analysis and capture the intended or implied meaning. This is easier said than done. For many years now, this aspect of natural language processing has intrigued researchers. One of the classic examples of pragmatic analysis is sarcasm detection.

Many, in fact almost all, of the different machine learning and deep learning algorithms have been employed, with varied success, for sarcasm detection or for pragmatic analysis in general.

Wrap Up

Machine Learning gives a system the ability to learn from past experiences and examples. Conventional algorithms perform a fixed set of operations according to what they have been programmed to do, and they do not possess the ability to solve unknown problems. In the real world, most problems contain many unknown variables, which makes traditional algorithms much less effective. This is where machine learning comes to the fore. With the help of past examples, a machine learning algorithm is far better equipped to handle such unknown problems.