NLP Pipeline – Part 3: Modeling – Deep Learning

In the previous blog post about modeling, I covered traditional machine learning algorithms in NLP. This post turns to deep learning, a subset of machine learning built on neural networks: structures loosely modeled on the way neurons in the human brain connect to one another. This layered structure allows the algorithm to pick up on complex patterns within data. The main downside of deep learning is that, due to its inherent complexity, it requires a substantial amount of training data. Even so, many deep learning architectures have proven useful in NLP, including Transformer Networks, Gated Recurrent Units, Deep Belief Networks, and Generative Adversarial Networks. Two of the most popular, and the focus of this post, are Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

Convolutional Neural Networks (CNNs)

CNNs were originally developed for image classification. They work by sliding small filters over the image across multiple layers: early layers learn simple filters that respond to edges and colors, while deeper layers combine those into filters for increasingly complex shapes. In effect, the network learns which parts of the image matter for the task and which to ignore. For example, if a CNN is trying to identify a human face in a picture of someone standing in a forest, its layers would progressively discount the ground, the trees, the sky, and the rest of the body, until the features most useful for recognizing the face stand out.
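To make this concrete, here is a minimal sketch of a stacked-filter CNN in PyTorch. The layer sizes, the number of classes, and the 32x32 input are all illustrative assumptions rather than details from any particular system:

```python
import torch
import torch.nn as nn

class TinyImageCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Early filters pick up edges and colors; deeper filters combine
        # them into more complex shapes.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # assumes a 32x32 input

    def forward(self, x):
        x = self.features(x)      # apply the learned filters layer by layer
        x = torch.flatten(x, 1)   # flatten the feature maps into one vector
        return self.classifier(x)

model = TinyImageCNN()
logits = model(torch.randn(1, 3, 32, 32))  # one random 32x32 RGB "image"
print(logits.shape)                        # torch.Size([1, 10])
```

Stacking the convolution layers is exactly what produces the simple-to-complex filter progression described above.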

This same filtering process can be applied to text for tasks like text classification or sentiment analysis. Instead of pixel patches, the filters slide over windows of neighboring words, learning which phrases carry the signal for the task and which can be ignored, treating groups of words much like groups of pixels in an image.
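Here is a minimal sketch of what that looks like in PyTorch, assuming the text has already been converted to integer word ids. The vocabulary size, embedding size, and filter width are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Each filter slides over windows of 3 consecutive words,
        # much like a 2-D filter slides over patches of pixels.
        self.conv = nn.Conv1d(embed_dim, 100, kernel_size=3, padding=1)
        self.fc = nn.Linear(100, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)   # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)       # Conv1d expects (batch, channels, seq_len)
        x = torch.relu(self.conv(x))
        x = x.max(dim=2).values     # max-pool: keep each filter's strongest match
        return self.fc(x)

model = TextCNN()
fake_batch = torch.randint(0, 5000, (4, 20))  # 4 "sentences", 20 tokens each
print(model(fake_batch).shape)                # torch.Size([4, 2])
```

The max-pooling step is where the unimportant words drop out: only the strongest response of each filter anywhere in the sentence survives to the classifier.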

Recurrent Neural Networks (RNNs)

RNNs are sequential models: at each step they carry forward a hidden state summarizing everything seen so far, so each step's output feeds into the next. This makes them well suited to tasks such as text generation, where an RNN repeatedly predicts the next word until it has formed a full sentence. For example, given "Hello" as input, it would not produce "Hello, how are you?" in one shot. It would first extend the input to "Hello, how", then conclude that the next word after "Hello, how" is likely "are", giving "Hello, how are", and repeat this process word by word until the sentence is complete.
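Here is a minimal sketch of that word-by-word loop in PyTorch. The tiny vocabulary and model are illustrative assumptions, and the network is untrained, so its actual output will be gibberish; the point is the shape of the generation loop:

```python
import torch
import torch.nn as nn

vocab = ["Hello", ",", "how", "are", "you", "?"]
word_to_id = {w: i for i, w in enumerate(vocab)}

class TinyRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        out, _ = self.rnn(x)   # each step's output depends on all earlier steps
        return self.fc(out)    # next-word logits at every position

model = TinyRNN(len(vocab))
words = ["Hello"]
for _ in range(5):
    ids = torch.tensor([[word_to_id[w] for w in words]])
    next_id = model(ids)[0, -1].argmax().item()  # most likely next word
    words.append(vocab[next_id])                 # "Hello" -> "Hello ," -> ...
print(" ".join(words))
```

A trained model would walk through exactly the sequence described above: one predicted word appended per iteration until the sentence is done.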

Naturally, an RNN's predictions depend on its training data. If it were trained purely on interactions between strangers, it would be likely to predict "name" given "What is your" as input; trained on interactions between close friends, it might predict something more personal, since "What is your name?" would rarely appear in that dataset. With enough training, however, RNNs can make good predictions even on sequences that never appear in the training data. This ability is called generalization: the model applies patterns learned from other word sequences to sequences it has never seen before.

One downside of RNNs is that over long sequences they tend to lose information from early inputs: during training, the gradients that carry the learning signal shrink as they are propagated back through many steps, so the network struggles to learn long-range dependencies. This is known as the vanishing gradient problem. Advanced variants such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) add gating mechanisms designed to mitigate it.
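In PyTorch, swapping a plain RNN layer for one of these gated variants is a one-line change; the sizes below are illustrative:

```python
import torch.nn as nn

# Gated cells maintain a separate memory path, so useful information
# (and gradients) survive across many more time steps.
rnn = nn.RNN(16, 32, batch_first=True)    # plain RNN: prone to vanishing gradients
lstm = nn.LSTM(16, 32, batch_first=True)  # LSTM: gated memory cell
gru = nn.GRU(16, 32, batch_first=True)    # GRU: a lighter-weight gated variant
```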


In the next post, I will cover model evaluation, the final step in the NLP Pipeline.

