 # Supervised Learning Algorithms

### Introduction

Supervised learning algorithms are, roughly speaking, learning algorithms that learn to associate some input with some output, given a training set of examples of inputs x and outputs y. In many cases, the outputs y may be difficult to collect automatically and must be provided by a human "supervisor," but the term still applies even when the training set targets were collected automatically.

### Probabilistic Supervised Learning

Most supervised learning algorithms are based on estimating a probability distribution p(y | x). We can do this simply by using maximum likelihood estimation to find the best parameter vector θ for a parametric family of distributions p(y | x; θ). We have already seen that linear regression corresponds to the family
p(y | x; θ) = N(y; θᵀx, I).
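Under this Gaussian family, maximum likelihood estimation reduces to least squares, which has the closed-form solution given by the normal equations. A minimal sketch on synthetic data (the true weight 2.0 and the noise level are arbitrary choices for illustration):

```python
import numpy as np

# Synthetic data from y = 2.0 * x + Gaussian noise (2.0 is an arbitrary choice)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

# Maximizing the likelihood of p(y | x; theta) = N(y; theta^T x, I) is
# equivalent to minimizing squared error, solved by the normal equations:
# theta = (X^T X)^{-1} X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

The recovered `theta` should be close to the true weight 2.0, since the noise is small relative to the sample size.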
We can generalize linear regression to the classification scenario by defining a different family of probability distributions. If we have two classes, class 0 and class 1, we need only specify the probability of one of them; the probability of class 1 determines the probability of class 0, because the two values must sum to 1.
The normal distribution over real-valued numbers that we used for linear regression is parametrized in terms of a mean, and any value we supply for that mean is valid. A distribution over a binary variable is slightly more complicated, because its mean must lie between 0 and 1. One way to solve this problem is to use the logistic sigmoid function σ to squash the output of the linear function into the interval (0, 1) and interpret that value as a probability:
p(y = 1 | x; θ) = σ(θᵀx).
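This squashing can be seen directly in code. A small sketch, where the parameter vector θ and the input x are made-up values purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and input, chosen only to demonstrate the mapping
theta = np.array([1.5, -0.5])
x = np.array([2.0, 1.0])

p_class1 = sigmoid(theta @ x)   # p(y = 1 | x; theta)
p_class0 = 1.0 - p_class1       # the two class probabilities sum to 1
```

Here θᵀx = 2.5, a value that could lie anywhere on the real line, while the sigmoid maps it to a valid probability of about 0.92.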
This approach is known as logistic regression (a somewhat strange name, since we use the model for classification rather than regression). In the case of linear regression, we were able to find the optimal weights by solving the normal equations. Logistic regression is somewhat harder: there is no closed-form solution for its optimal weights. Instead, we must search for them by maximizing the log-likelihood, which we can do by minimizing the negative log-likelihood (NLL) using gradient descent. This same strategy can be applied to essentially any supervised learning problem by writing down a parametric family of conditional probability distributions over the right kind of input and output variables.

##### Mansoor Ahmed
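The gradient-descent procedure described above can be sketched as follows. The data-generating weights, learning rate, and iteration count are all arbitrary choices for this example, not values from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic binary labels drawn from a logistic model with made-up weights
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
true_theta = np.array([3.0, -2.0])
y = (rng.uniform(size=200) < sigmoid(X @ true_theta)).astype(float)

# Minimize the negative log-likelihood
#   NLL(theta) = -sum_i [ y_i log sigma(theta^T x_i)
#                         + (1 - y_i) log(1 - sigma(theta^T x_i)) ]
# whose gradient is X^T (sigma(X theta) - y); no closed form exists.
theta = np.zeros(2)
learning_rate = 0.1
for _ in range(500):
    grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
    theta -= learning_rate * grad

accuracy = np.mean((sigmoid(X @ theta) > 0.5) == (y == 1.0))
```

Because the NLL is convex in θ, plain gradient descent converges toward the maximum likelihood estimate; the learned weights should recover the signs of the generating weights and classify most of the training points correctly.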
Mansoor Ahmed is a Chemical Engineer, web developer, and writer currently living in Pakistan. My interests range from technology to web development. I am also interested in programming, writing, and reading.