A recurrent neural network (RNN) is a type of neural network well suited to modeling sequence data. Derived from feedforward networks, RNNs process inputs one step at a time while carrying information forward, loosely analogous to how human memory works. Simply put, recurrent neural networks can produce predictions on sequential data that other algorithms cannot. RNNs are a robust and widely used class of neural network, and they remain among the most promising architectures because of their internal memory.
Like many other deep learning algorithms, recurrent neural networks are relatively old: they were first developed in the 1980s, but we have seen their true potential only in recent years. A rise in computational power, the huge amounts of data we now work with, and the invention of long short-term memory (LSTM) in the 1990s have brought RNNs to the foreground.
Long Short-Term Memory (LSTM)
Long short-term memory networks are an extension of recurrent neural networks that essentially extends the memory. They are therefore well suited to learning from important experiences separated by very long time lags. The units of an LSTM are used as the building blocks of the layers of an RNN, which is then often called an LSTM network.
LSTMs enable RNNs to remember inputs over a long period of time. This is because LSTMs store information in a memory, much like the memory of a computer: an LSTM can read, write, and delete information from its memory.
This memory can be seen as a gated cell, where "gated" means the cell decides whether to store or delete information (i.e., whether to open the gates or not) based on the importance it assigns to that information. Importance is assigned through weights, which are themselves learned by the algorithm. In other words, the network learns over time which information is important and which is not.
An LSTM has three gates: an input gate, a forget gate, and an output gate. These gates decide whether to let new input in (input gate), delete information because it is no longer important (forget gate), or let the stored information affect the output at the current time step (output gate).
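The three gates can be made concrete with a minimal NumPy sketch of a single LSTM time step. The parameter shapes and random weights below are hypothetical, chosen only for illustration; deep learning frameworks implement this same gating internally.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters for the
    input (i), forget (f), and output (o) gates and the candidate (g)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # all gate pre-activations, shape (4*H,)
    i = sigmoid(z[0*H:1*H])             # input gate: how much new info to admit
    f = sigmoid(z[1*H:2*H])             # forget gate: how much old memory to keep
    o = sigmoid(z[2*H:3*H])             # output gate: how much memory to expose
    g = np.tanh(z[3*H:4*H])             # candidate cell content
    c = f * c_prev + i * g              # update the cell state (the memory)
    h = o * np.tanh(c)                  # hidden state emitted at this step
    return h, c

# Toy dimensions and random (untrained) parameters, purely for illustration.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):    # run 5 time steps
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # → (4,)
```

Note how the forget gate multiplies the previous cell state: when it is close to 1, the memory is carried forward unchanged, which is what lets LSTMs bridge long time lags.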
How can we generate sequence data?
The universal way to generate sequence data in deep learning is to train a network, usually an RNN or a convnet, to predict the next token or the next few tokens in a sequence, using the previous tokens as input. For example, given the input "the cat is on the ma," the network is trained to predict the target "t," the next character. As usual when working with text data, tokens are typically words or characters, and any network that can model the probability of the next token given the previous ones is called a language model. A language model captures the latent statistical structure of language.
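To make "model the probability of the next token given the previous ones" concrete, here is a deliberately tiny language model: a character bigram model that conditions on just one previous character by counting. It is a stand-in for the RNN or convnet the text describes, not a realistic replacement, but the interface (context in, next-token distribution out) is the same.

```python
from collections import Counter, defaultdict

def train_char_bigram(text):
    """Count how often each character follows each context character,
    then normalise the counts into conditional probabilities."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    model = {}
    for prev, ctr in counts.items():
        total = sum(ctr.values())
        model[prev] = {ch: n / total for ch, n in ctr.items()}
    return model

corpus = "the cat is on the mat"
model = train_char_bigram(corpus)
# In this corpus, 'a' is only ever followed by 't' ("cat", "mat"),
# so the model predicts 't' after "the cat is on the ma" with certainty.
print(model['a'])  # → {'t': 1.0}
```

A trained RNN plays the same role with a much longer context: instead of a lookup table over one previous character, it computes the next-token distribution from its hidden state.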
Once we have such a trained language model, we can sample from it to generate new sequences: we feed it an initial string of text (called conditioning data), ask it to generate the next character or the next word (we can even generate several tokens at once), add the generated output back to the input data, and repeat the process many times. This loop lets us generate sequences of arbitrary length that reflect the structure of the data on which the model was trained: sequences that look almost like human-written sentences.