A common and highly effective approach to deep learning on small image datasets is to use a pretrained network. A pretrained network is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. If this original dataset is large enough and general enough, then the spatial hierarchy of features learned by the pretrained network can effectively act as a generic model of the visual world, and hence its features can prove useful for many different computer-vision problems, even though these new problems may involve completely different classes than those of the original task.
For instance, we might train a network on ImageNet (where classes are mostly animals and everyday objects) and then repurpose this trained network for something as remote as identifying furniture items in images. Such portability of learned features across different problems is a key advantage of deep learning compared to many older, shallow-learning approaches, and it makes deep learning very effective for small-data problems. In this case, let’s consider a large convnet trained on the ImageNet dataset (1.4 million labeled images and 1,000 different classes). ImageNet contains many animal classes, including different species of cats and dogs, so we can expect such a network to perform well on the dogs-versus-cats classification problem.
We’ll use the VGG16 architecture, developed by Karen Simonyan and Andrew Zisserman in 2014; it’s a simple and widely used convnet architecture for ImageNet. Although it’s an older model, far from the current state of the art and somewhat heavier than many more recent models, I chose it because its architecture is similar to what we’re already familiar with and is easy to understand without introducing any new concepts. This may be your first encounter with one of these cutesy model names (VGG, ResNet, Inception, Inception-ResNet, Xception, and so on); you’ll get used to them, because they will come up frequently if you keep doing deep learning for computer vision. There are two ways to use a pretrained network: feature extraction and fine-tuning.
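Both approaches start from the same place: the convolutional part of the pretrained network, with its original ImageNet classifier removed. The following is a minimal sketch of that starting point, assuming the Keras API bundled with TensorFlow (the text above does not name a library). The helper name `build_conv_base` and the 150 × 150 input size are illustrative assumptions, not values taken from the text.

```python
from tensorflow import keras


def build_conv_base(weights="imagenet", input_shape=(150, 150, 3)):
    """Return VGG16's convolutional base, without its ImageNet classifier.

    `build_conv_base` is a hypothetical helper for illustration only.
    """
    return keras.applications.VGG16(
        # "imagenet" loads weights pretrained on ImageNet (downloaded on
        # first use); None builds the same architecture with random weights.
        weights=weights,
        # Drop the densely connected 1,000-class classifier on top.
        include_top=False,
        # Assumed image size (height, width, channels); adjust to your data.
        input_shape=input_shape,
    )
```

Calling `build_conv_base()` returns a model whose output is a stack of feature maps rather than class probabilities; `build_conv_base().summary()` lists its layers. With the assumed 150 × 150 input, five rounds of 2 × 2 max pooling leave a final feature map of shape `(4, 4, 512)`.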