Meta-learning is a subfield of machine learning in which automatic learning algorithms are applied to metadata about machine learning experiments.
In this article, we will take an in-depth look at the LSTM meta-learner: what is the key idea behind using such metadata, and how can automatic learning become flexible enough to solve new learning problems?
The LSTM meta-learner is a form of meta-learning. It has two stages:
- Meta-learner: In this stage, the model focuses on acquiring general knowledge across different tasks.
- Base learner: In this stage, the model tries to optimize its parameters for a task-specific objective.
The main idea of the LSTM meta-learner is to train an LSTM cell to learn the update rule for a new task. In meta-learning terms, the LSTM cell serves as the meta-learner, while the model trained on a task-specific objective, for example dog breed classification, is the base learner.
Why use an LSTM cell?
The cell-state update in an LSTM parallels a gradient-based update in backpropagation, so it can be used to learn the update rule for the base learner's objective.
To understand this better, recall that LSTMs store a history of information through their gates, as shown in the diagram above. Similarly, many variants of stochastic gradient descent, such as momentum, RMSProp, and Adam, store information about past gradients to enable better optimization.
As a result, an LSTM cell can be viewed as a better optimization scheme: it allows the model to capture both the short-term knowledge of a specific task and the long-term knowledge common across tasks.
- There is a resemblance between the gradient-based update in backpropagation and the cell-state update in an LSTM.
- Knowing the history of gradients benefits the gradient update; consider how momentum works.
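As a quick illustration of the second point, here is a minimal sketch (not from the article) of SGD with momentum on the toy objective f(x) = x², where the velocity term acts as a memory of past gradients:

```python
# Minimal sketch: SGD with momentum on f(x) = x^2.
# The velocity term stores an exponentially decaying history of past
# gradients -- the kind of "memory" an LSTM cell could also learn.

def grad(x):
    return 2.0 * x  # gradient of f(x) = x^2

x, velocity = 5.0, 0.0
lr, beta = 0.1, 0.9
for _ in range(200):
    velocity = beta * velocity + grad(x)  # accumulate gradient history
    x = x - lr * velocity                 # update using that history

print(abs(x) < 1e-2)  # True: x has converged close to the minimum at 0
```

The update rule here is hand-designed; the LSTM meta-learner's point is that such a rule can instead be learned.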
The update for the learner's parameters at time step t with a learning rate α_t is:
θ_t = θ_{t−1} − α_t ∇_{θ_{t−1}} L_t
It has the same form as the cell-state update in an LSTM, c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t, if we set:
- forget gate f_t = 1,
- input gate i_t = α_t,
- cell state c_t = θ_t, and
- candidate cell state c̃_t = −∇_{θ_{t−1}} L_t.
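The gate settings above can be checked numerically. The following sketch (illustrative values, not the paper's implementation) verifies that with f_t = 1, i_t = α_t, and c̃_t = −∇L, the LSTM cell-state update reproduces a plain gradient-descent step:

```python
import numpy as np

theta_prev = np.array([0.5, -1.2, 3.0])   # current parameters theta_{t-1}
grad = np.array([0.1, -0.4, 2.0])         # gradient of the loss at theta_{t-1}
alpha = 0.05                              # learning rate alpha_t

# Ordinary gradient-descent update: theta_t = theta_{t-1} - alpha * grad
sgd_step = theta_prev - alpha * grad

# LSTM cell-state update: c_t = f_t * c_{t-1} + i_t * c_tilde
f_t, i_t, c_tilde = 1.0, alpha, -grad
cell_step = f_t * theta_prev + i_t * c_tilde

print(np.allclose(sgd_step, cell_step))  # True: the two updates coincide
```

In the actual meta-learner, f_t and i_t are not fixed constants but outputs of the LSTM, so the update rule itself becomes learnable.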
The training process mimics what happens at test time, which has been shown to be beneficial in Matching Networks. In each training epoch, we first sample a dataset D = (D_train, D_test) ∈ D̂_meta-train and then sample mini-batches from D_train to update θ for T rounds. The final state of the learner parameters, θ_T, is then used to train the meta-learner on the test data D_test.
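The episodic structure above can be sketched as follows. This is a toy stand-in, not the paper's code: the task, loss, and the fixed-learning-rate inner update are all hypothetical simplifications (in the real method, an LSTM proposes each inner update):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Toy task: fit a scalar theta to a hidden target from noisy samples."""
    target = rng.normal()
    d_train = target + 0.1 * rng.normal(size=20)
    d_test = target + 0.1 * rng.normal(size=20)
    return d_train, d_test

def loss_and_grad(theta, batch):
    residual = theta - batch.mean()
    return residual ** 2, 2.0 * residual

T, alpha = 5, 0.3
meta_test_losses = []
for episode in range(10):
    d_train, d_test = sample_task()          # sample D = (D_train, D_test)
    theta = 0.0                              # base-learner initialization
    for t in range(T):                       # T update rounds on D_train
        batch = rng.choice(d_train, size=5)  # sample a mini-batch
        _, g = loss_and_grad(theta, batch)
        theta = theta - alpha * g            # here an LSTM would propose the update
    test_loss, _ = loss_and_grad(theta, d_test)  # evaluate theta_T on D_test
    meta_test_losses.append(test_loss)       # this signal trains the meta-learner

print(len(meta_test_losses))  # 10 episodes completed
```

The key point is that the meta-learner is trained on the test loss of θ_T, so the learned update rule is optimized for exactly the situation it faces at test time.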
There are two implementation details that deserve extra attention:
- How to compress the parameter space of the LSTM meta-learner? Since the meta-learner models the parameters of another neural network, it would have hundreds of thousands of variables to learn. This motivates the idea of sharing parameters across coordinates, so the same compact update rule is applied to every parameter independently.
- To simplify the training process, the meta-learner assumes that the loss L_t and the gradient ∇_{θ_{t−1}} L_t are independent.
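The coordinate-sharing idea can be sketched as follows. Here a tiny two-parameter function stands in for a compact learned LSTM cell (a hypothetical simplification, not the paper's architecture); the point is that the shared rule is applied elementwise, so the meta-learner's size does not grow with the base learner's parameter count:

```python
import numpy as np

def shared_update(theta, grad, w):
    """Same learned weights w applied to every coordinate independently
    (toy stand-in for a coordinate-wise LSTM cell)."""
    learned_lr, gate = w
    return gate * theta - learned_lr * grad

w = np.array([0.1, 1.0])  # 2 meta-parameters, shared across all coordinates
theta = np.random.default_rng(0).normal(size=100_000)  # large base learner
grad = 2.0 * theta        # gradient of the toy loss sum(theta**2)

theta_new = shared_update(theta, grad, w)  # applied elementwise

print(w.size)          # 2: meta-learner size is independent of theta.size
print(theta_new.size)  # 100000: base learner is unchanged in size
```

With gate = 1.0 and learned_lr = 0.1 this reduces to θ − 0.1 · ∇L on every coordinate, but both values could in principle be learned per time step.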