Estimators, Bias, and Variance


The field of statistics provides many tools for achieving the machine learning goal of solving a task, not only on the training set but in a way that generalizes. Foundational concepts such as parameter estimation, bias, and variance are valuable for formally characterizing the notions of generalization, underfitting, and overfitting. In this post, we will learn about estimators, bias, and variance in machine learning.


First, let's look at point estimation.

Point Estimation

In statistics, point estimation is the procedure of finding an approximate value of some parameter, for instance the mean of a population, from random samples of that population. The accuracy of any particular estimate is not known exactly. However, probabilistic statements about the accuracy of such estimates over many experiments can be made.

From a deep learning perspective, point estimation is the attempt to provide the single best prediction of some quantity of interest. That quantity may be a single parameter, a vector of parameters such as the weights in linear regression, or even a whole function.

To distinguish estimates of parameters from their true values, a point estimate of a parameter θ is denoted θ̂.

Let {x^(1), x^(2), …, x^(m)} be a set of m independent and identically distributed (i.i.d.) data points. A point estimator or statistic is any function of the data:

θ̂_m = g(x^(1), …, x^(m))

Thus, a statistic is any function of the data; the definition does not require that g return a value close to the true θ. A good estimator is a function whose output is close to the true underlying θ that generated the data. We assume the true parameter value θ is fixed but unknown, whereas the point estimate θ̂ is a function of the data. Since the data is drawn from a random process, any function of the data is random; therefore, θ̂ is a random variable.
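As a minimal sketch of this idea, the snippet below (using NumPy; the distribution, the true mean of 5.0, and the choice of the sample mean as the function g are all illustrative assumptions) builds a point estimator from m i.i.d. data points:

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown, but fixed) parameter: the mean of the generating distribution.
theta_true = 5.0

# m i.i.d. data points x^(1), ..., x^(m).
m = 1000
x = rng.normal(loc=theta_true, scale=2.0, size=m)

# A point estimator is any function g of the data; here g is the sample mean.
theta_hat = x.mean()

print(theta_hat)  # close to 5.0, but random: it varies with the sample
```

Rerunning with a different seed gives a different θ̂, which is exactly the sense in which the estimator is a random variable.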

Function Estimation

Point estimation can also refer to estimating the relationship between input and target variables; we call such point estimates function estimators. In function estimation we predict a variable y given an input x. We assume there is a function f(x) describing the relationship between x and y, for example y = f(x) + ε, where ε stands for the part of y that is not predictable from x. We are interested in approximating f with a model or estimate f̂. Function estimation is really the same as estimating a parameter θ: the function estimator f̂ is simply a point estimator in function space. In polynomial regression, for instance, we can equivalently view ourselves as estimating a parameter vector w or as estimating a function mapping from x to y.
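The equivalence between estimating w and estimating f̂ can be sketched with a degree-1 polynomial fit (the true function f(x) = 2x + 1 and the noise level here are made-up values for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# True relationship f(x) = 2x + 1, observed with noise eps: y = f(x) + eps.
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=200)

# Estimating the function f is the same as estimating the parameter
# vector w = (slope, intercept) of the polynomial model.
w = np.polyfit(x, y, deg=1)
f_hat = np.poly1d(w)  # the function estimator built from w

print(w)  # approximately [2.0, 1.0]
```

The fitted coefficients w are a point estimate in parameter space, and `f_hat` is the corresponding point estimate in function space.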

Properties of Point Estimators

Bias and variance are the most commonly studied properties of point estimators; they tell us how an estimator behaves. The bias of an estimator is the difference between its expected estimate and the true value it is trying to estimate. Intuitively, it is a measure of how close, on average, the estimator is to the actual quantity being estimated. Models trained on different sample sets Xᵢ, Yᵢ drawn from the true distribution will end up with different values of θ, as each tries to explain, fit, and estimate its particular sample best.

The bias of an estimator

  • The bias of an estimator for parameter θ is defined as:

bias(θ̂_m) = E[θ̂_m] − θ

  • The estimator is unbiased if bias(θ̂_m) = 0, which implies that:

E[θ̂_m] = θ

  • An estimator is asymptotically unbiased if:

lim_{m→∞} bias(θ̂_m) = 0
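A classic illustration of these definitions is the sample variance: dividing by m gives a biased estimator of σ², while dividing by m − 1 gives an unbiased one. The simulation below (a sketch with made-up values: σ² = 4, sample size m = 5) approximates E[θ̂_m] by averaging over many trials:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 4.0        # true variance of the generating distribution
m = 5               # small samples make the bias easy to see
trials = 200_000

# trials independent samples of size m each
x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, m))

# Two estimators of the variance:
biased = x.var(axis=1, ddof=0)    # divides by m;     E[.] = (m-1)/m * sigma^2
unbiased = x.var(axis=1, ddof=1)  # divides by m - 1; E[.] = sigma^2

print(biased.mean())    # ~3.2, so bias(theta_hat) ~ -sigma^2/m = -0.8
print(unbiased.mean())  # ~4.0, so bias(theta_hat) ~ 0
```

Note that the biased estimator is still asymptotically unbiased: its bias, −σ²/m, vanishes as m grows.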

Variance and Standard Error

Another property of an estimator is how much we expect it to vary as a function of the data sample. Just as we computed the expectation of the estimator to determine its bias, we can compute its variance. The variance of an estimator is simply Var(θ̂), where the random variable is the training set. The square root of the variance is called the standard error, denoted SE(θ̂).

Importance of Standard Error

It tells us how much we would expect the estimate to vary as we obtain different samples from the same distribution. The standard error of the mean is defined as:

SE(μ̂_m) = √( Var[ (1/m) Σᵢ x^(i) ] ) = σ/√m

  • Where σ² is the true variance of the samples x^(i)
  • The standard error is frequently estimated using an estimate of σ
  • Though not unbiased, this approximation is reasonable
  • The square root of the sample variance tends to underestimate the true standard deviation
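The σ/√m formula can be checked empirically by drawing many samples of size m and measuring how much the sample mean actually varies (σ = 2 and m = 100 below are illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
m = 100
trials = 50_000

# Many independent samples of size m; compute the mean of each.
samples = rng.normal(0.0, sigma, size=(trials, m))
means = samples.mean(axis=1)

se_empirical = means.std(ddof=1)   # observed spread of the sample mean
se_theory = sigma / np.sqrt(m)     # SE(mu_hat) = sigma / sqrt(m)

print(se_empirical, se_theory)     # both ~0.2
```

The empirical spread of the sample mean matches σ/√m, and quadrupling m would halve it.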

Standard Error in Machine Learning

We often estimate the generalization error by computing the error on the test set; the number of samples in the test set determines the accuracy of this estimate. Since the mean will be approximately normally distributed according to the central limit theorem, we can compute the probability that the true expectation falls in any chosen interval. For example, the 95 percent confidence interval centered on the mean μ̂_m is:

(μ̂_m − 1.96 SE(μ̂_m), μ̂_m + 1.96 SE(μ̂_m))

We commonly say that machine learning algorithm A is better than machine learning algorithm B if the upper bound of A's 95 percent confidence interval is less than the lower bound of B's.
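A sketch of this comparison rule, assuming per-example 0/1 test errors and simulated error rates of roughly 10% and 20% (all values here are hypothetical):

```python
import numpy as np

def error_confidence_interval(errors, z=1.96):
    """95% confidence interval for the expected error, from per-example errors."""
    errors = np.asarray(errors, dtype=float)
    mean = errors.mean()
    se = errors.std(ddof=1) / np.sqrt(errors.size)  # standard error of the mean
    return mean - z * se, mean + z * se

rng = np.random.default_rng(4)
# Hypothetical per-example 0/1 errors on a test set of 2000 points:
errors_a = rng.binomial(1, 0.10, size=2000)  # algorithm A, ~10% error rate
errors_b = rng.binomial(1, 0.20, size=2000)  # algorithm B, ~20% error rate

lo_a, hi_a = error_confidence_interval(errors_a)
lo_b, hi_b = error_confidence_interval(errors_b)

# A is "better" under this criterion if its upper bound is below B's lower bound.
print(hi_a < lo_b)
```

With overlapping intervals, this criterion would not let us declare either algorithm better from this test set alone.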

Confidence Intervals for Error

Applied to the test-set error, the same construction gives the 95 percent confidence interval for the error estimate: the measured error rate plus or minus 1.96 times its standard error.

Trading-off Bias and Variance

Bias and variance measure two different sources of error in an estimator. Bias measures the expected deviation from the true value of the function or parameter, while variance measures the deviation from the expected estimate that any particular sampling of the data is likely to cause.