### Neural Network Structure

We already know that training a neural network revolves around the following objects:

- Layers, which are combined into a network (or model).
- The input data and corresponding targets.
- The loss function, which defines the feedback signal used for learning.
- The optimizer, which determines how learning proceeds.

**We can picture their interaction as follows:**

**The network:** composed of layers that are chained together, it maps the input data to predictions.

**The loss function:** it then compares these predictions to the targets, producing a loss value. This value measures how well the network's predictions match what was expected.

**The optimizer:** it uses this loss value to update the network's weights.
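The loop below sketches this interaction in plain Python rather than Keras. It makes several assumptions. The toy "network" is a single linear unit `w * x + b`. The loss is mean squared error. The optimizer is full-batch gradient descent, which simplifies real SGD (real SGD draws random mini-batches). The helper names and the toy data, generated from `y = 2x + 1`, are hypothetical.

```python
# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def predict(w, b, x):
    """The 'network': maps an input to a prediction."""
    return w * x + b

def mse_loss(w, b):
    """The loss function: compares predictions to targets (mean squared error)."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradients(w, b):
    """Analytic gradients of the MSE loss with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
    return dw, db

def gradient_step(w, b, lr=0.05):
    """The optimizer: uses the loss gradient to update the weights."""
    dw, db = gradients(w, b)
    return w - lr * dw, b - lr * db

w, b = 0.0, 0.0  # initial weights
for _ in range(2000):
    w, b = gradient_step(w, b)

print(f"w={w:.3f}, b={b:.3f}, loss={mse_loss(w, b):.6f}")
```

After training, `w` and `b` should be close to 2 and 1, the values that generated the data.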

### Layers: the building blocks of deep learning

The fundamental data structure in neural networks is the layer. A layer is a data-processing module that takes one or more tensors as input and outputs one or more tensors. Some layers are stateless, but more commonly layers have a state. That state is the layer's weights: one or several tensors, learned with stochastic gradient descent, that together contain the network's knowledge.
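The difference between stateless and stateful layers can be sketched in plain Python. This is a simplified illustration, not how Keras implements layers. `ReLU` holds no weights. `DenseLayer`, a hypothetical class name, keeps a weight matrix and a bias vector as its state, and an optimizer would update these during training.

```python
import random

class ReLU:
    """Stateless layer: the output depends only on the input."""
    def __call__(self, inputs):
        return [[max(0.0, v) for v in row] for row in inputs]

class DenseLayer:
    """Stateful layer (hypothetical): holds a weight matrix and a bias vector."""
    def __init__(self, input_dim, units, seed=0):
        rng = random.Random(seed)
        # State: an input_dim x units weight matrix with small random values
        self.kernel = [[rng.uniform(-0.05, 0.05) for _ in range(units)]
                       for _ in range(input_dim)]
        self.bias = [0.0] * units

    def __call__(self, inputs):
        # output[i][j] = sum_k inputs[i][k] * kernel[k][j] + bias[j]
        return [[sum(row[k] * self.kernel[k][j] for k in range(len(row))) + self.bias[j]
                 for j in range(len(self.bias))]
                for row in inputs]

    def weights(self):
        """The tensors an optimizer would update during training."""
        return [self.kernel, self.bias]

batch = [[1.0, -2.0, 3.0], [0.5, 0.0, -1.0]]  # shape (2, 3)
dense = DenseLayer(input_dim=3, units=4)
relu = ReLU()
out = relu(dense(batch))  # shape (2, 4)
print(len(out), len(out[0]))
```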

Different layers are suited to different tensor formats and different kinds of data processing:

- Simple vector data is stored in 2D tensors of shape `(samples, features)`. It is usually processed by densely connected layers, also called fully connected or dense layers.
- Sequence data is stored in 3D tensors of shape `(samples, timesteps, features)`. It is typically processed by recurrent layers.
- Image data is stored in 4D tensors. It is usually processed by 2D convolution layers.

We can think of layers as the LEGO bricks of deep learning, a metaphor that frameworks like Keras make explicit. In Keras, deep-learning models are built by clipping together compatible layers to form useful data-transformation pipelines. Here, "layer compatibility" means one specific thing: every layer accepts only input tensors of a certain shape and returns output tensors of a certain shape.

Example:

```python
from keras import layers

# A dense layer with 32 output units that expects 784-dimensional input vectors
layer = layers.Dense(32, input_shape=(784,))
```

We have created a layer that only accepts 2D tensors of shape `(batch_size, 784)` as input. Axis 0, the batch dimension, is unspecified, so any batch size is accepted. The layer returns a tensor of shape `(batch_size, 32)`, in which the 784 features have been transformed into 32. It can therefore only be connected to a downstream layer that expects 32-dimensional vectors as its input. With Keras, we don't have to worry about compatibility, because the layers we add to our models are built dynamically to match the shape of the incoming layer.
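The shape rule can be sketched as a small check in plain Python. This assumes shapes are plain tuples, with `None` standing for the unspecified batch axis. `dense_output_shape` is a hypothetical helper, not a Keras function.

```python
def dense_output_shape(input_shape, units, expected_input_dim):
    """Return a dense layer's output shape, or raise if the input is incompatible.

    Hypothetical helper: shapes are tuples and None marks an unspecified batch axis.
    """
    if len(input_shape) != 2:
        raise ValueError(f"expected a 2D input, got shape {input_shape}")
    batch, features = input_shape
    if features != expected_input_dim:
        raise ValueError(f"expected last dimension {expected_input_dim}, got {features}")
    return (batch, units)

# The layer from the example: accepts (batch, 784), returns (batch, 32)
out_shape = dense_output_shape((None, 784), units=32, expected_input_dim=784)
print(out_shape)  # (None, 32)

# Any batch size is accepted, because axis 0 is unspecified
print(dense_output_shape((128, 784), units=32, expected_input_dim=784))  # (128, 32)

# A downstream layer expecting 32-dimensional vectors is compatible...
dense_output_shape(out_shape, units=10, expected_input_dim=32)

# ...but one expecting 64-dimensional vectors is not
try:
    dense_output_shape(out_shape, units=10, expected_input_dim=64)
except ValueError as err:
    print("incompatible:", err)
```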

For example, suppose we write the following:

```python
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(32, input_shape=(784,)))
# No input_shape here: Keras infers it from the previous layer's output
model.add(layers.Dense(32))
```

The second layer didn't receive an input shape argument. Instead, it automatically inferred its input shape from the output shape of the layer before it.
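A rough plain-Python sketch of that inference follows. It is not how Keras works internally. `ShapeOnlyDense` and `MiniSequential` are hypothetical classes that track only shapes. When a layer is added without an input shape, it takes the previous layer's output size.

```python
class ShapeOnlyDense:
    """Hypothetical stand-in for a dense layer that only tracks shapes."""
    def __init__(self, units, input_shape=None):
        self.units = units
        self.input_shape = input_shape  # None means "infer from the previous layer"

class MiniSequential:
    """Hypothetical linear stack that infers missing input shapes."""
    def __init__(self):
        self.layers = []

    def add(self, layer):
        if layer.input_shape is None:
            if not self.layers:
                raise ValueError("the first layer needs an explicit input_shape")
            # Infer this layer's input shape from the previous layer's output
            layer.input_shape = (self.layers[-1].units,)
        self.layers.append(layer)

    def shapes(self):
        """Return (input_shape, output_shape) pairs, with None as the batch axis."""
        return [((None,) + layer.input_shape, (None, layer.units)) for layer in self.layers]

model = MiniSequential()
model.add(ShapeOnlyDense(32, input_shape=(784,)))
model.add(ShapeOnlyDense(32))  # no input_shape given
for in_shape, out_shape in model.shapes():
    print(in_shape, "->", out_shape)
```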

### Models: networks of layers

A deep-learning model is a directed, acyclic graph of layers. The most common instance is a linear stack of layers that maps a single input to a single output. As we move forward, we'll encounter a much broader variety of network topologies. Some common ones include the following:

- Two-branch networks
- Multihead networks
- Inception blocks

The topology of a network defines a hypothesis space. Picking the right network architecture is more an art than a science. There are some best practices and principles we can rely on, but only practice can make us good neural-network architects.
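A network that isn't a linear stack can be viewed as a directed acyclic graph in which each layer consumes the outputs of its parent layers. The plain-Python sketch below (Python 3.9+ for `graphlib`) assumes a toy two-branch topology. Each "layer" is a simple function, and the node names and operations are hypothetical. The graph is evaluated in topological order, so every node runs only after all of its inputs are ready.

```python
from graphlib import TopologicalSorter

# Hypothetical two-branch topology: input -> (branch_a, branch_b) -> merge -> output
# Each entry maps a node name to (parent names, function of the parents' outputs)
graph_spec = {
    "input":    ([], lambda: [1.0, 2.0, 3.0]),
    "branch_a": (["input"], lambda x: [2 * v for v in x]),
    "branch_b": (["input"], lambda x: [v + 1 for v in x]),
    "merge":    (["branch_a", "branch_b"], lambda a, b: [p + q for p, q in zip(a, b)]),
    "output":   (["merge"], lambda m: sum(m)),
}

def run_graph(spec):
    """Evaluate a directed acyclic graph of layers in topological order."""
    # TopologicalSorter expects a mapping from each node to its predecessors
    graph = {name: set(parents) for name, (parents, _) in spec.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        parents, fn = spec[name]
        outputs[name] = fn(*(outputs[p] for p in parents))
    return outputs

results = run_graph(graph_spec)
print(results["merge"], results["output"])
```

A cycle in `graph_spec` would make `static_order()` raise `graphlib.CycleError`, which reflects the "acyclic" requirement.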

### Loss functions and optimizers

Once the network architecture is defined, we still have to choose two more things:

- **Loss function** (also called the objective function): the quantity that will be minimized during training. It represents a measure of success for the task at hand.
- **Optimizer:** it determines how the network will be updated based on the loss function. It implements a specific variant of stochastic gradient descent (SGD).
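The update rules of two common SGD variants, plain SGD and SGD with momentum, can be compared on a toy problem. This sketch makes several assumptions. It uses the one-parameter loss `f(w) = (w - 3)**2` with an exact gradient, so the "stochastic" part (random mini-batches) is left out. The learning rate, momentum and step counts are arbitrary, and this is not the Keras optimizer implementation.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)**2 (assumed for illustration)."""
    return 2.0 * (w - 3.0)

def plain_sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w = w - lr * grad(w)  # step against the gradient
    return w

def sgd_momentum(w, lr=0.1, momentum=0.9, steps=500):
    velocity = 0.0
    for _ in range(steps):
        # Velocity accumulates past gradients; momentum sets how much is kept
        velocity = momentum * velocity - lr * grad(w)
        w = w + velocity
    return w

print(plain_sgd(0.0), sgd_momentum(0.0))
```

Both variants drive `w` toward 3, the minimum of the toy loss. Momentum matters more on real loss surfaces, where accumulated velocity can help the optimizer move through flat regions and shallow local minima.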

A neural network with multiple outputs may have multiple loss functions, one per output. However, the gradient-descent process must be based on a single scalar loss value. So for multi-loss networks, all the losses are combined (for example, averaged or summed with weights) into a single scalar quantity.
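The combination step can be sketched as a weighted sum of per-output losses. The sketch assumes two hypothetical outputs on a batch of two samples. One is a regression head scored with mean squared error, the other a binary classification head scored with binary crossentropy. The output names, values and weights are made up for illustration. Keras exposes per-output weights through the `loss_weights` argument of `Model.compile`.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over a batch."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary crossentropy over a batch of probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical per-output losses of a two-headed network
losses = {
    "price": mse([3.0, 5.0], [2.5, 5.5]),
    "is_sold": binary_crossentropy([1, 0], [0.9, 0.2]),
}
loss_weights = {"price": 1.0, "is_sold": 0.5}  # assumed weights

# Gradient descent needs one scalar, so combine the losses as a weighted sum
total_loss = sum(loss_weights[name] * value for name, value in losses.items())
print(losses, total_loss)
```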

Choosing the right objective function for the right problem is extremely important, because our network will take any shortcut it can to minimize the loss.

If the objective doesn't fully correlate with success on the task at hand, our network will end up doing things we may not have wanted. Choose the objective wisely, or we'll face unintended side effects. Fortunately, for common problems such as classification and regression, there are simple guidelines we can follow to choose the correct loss.
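As a rough starting point for common problems, the usual choices are:

- binary crossentropy for two-class classification
- categorical crossentropy for many-class classification
- mean squared error for regression

The sketch below encodes these guidelines as a lookup table. `suggest_loss` is a hypothetical helper. The strings are loss identifiers that Keras accepts in `model.compile(loss=...)`. The sketch also implements categorical crossentropy for a single sample, to show how the loss rewards confident correct predictions and penalizes confident wrong ones. The probability values are made up for illustration.

```python
import math

# Common starting points (guidelines, not hard rules); values are Keras loss identifiers
LOSS_FOR_PROBLEM = {
    "binary_classification": "binary_crossentropy",
    "multiclass_classification": "categorical_crossentropy",
    "regression": "mean_squared_error",
}

def suggest_loss(problem_type):
    """Hypothetical helper: look up a conventional loss for a problem type."""
    try:
        return LOSS_FOR_PROBLEM[problem_type]
    except KeyError:
        raise ValueError(f"no default guideline for {problem_type!r}") from None

def categorical_crossentropy(one_hot_target, probabilities, eps=1e-7):
    """Loss for one sample: -sum(target * log(predicted probability))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, probabilities))

print(suggest_loss("multiclass_classification"))

# The true class is index 1 in both cases
good = categorical_crossentropy([0, 1, 0], [0.05, 0.9, 0.05])  # confident and correct
bad = categorical_crossentropy([0, 1, 0], [0.9, 0.05, 0.05])   # confident and wrong
print(good, bad)
```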