Independent and identically distributed (i.i.d.)

From Virtual Reality, Augmented Reality Wiki

Introduction

In machine learning, independent and identically distributed (i.i.d.) refers to the statistical assumption that the data points in a dataset are drawn independently of one another from the same probability distribution.

Mathematical Definition

If we have a dataset of n samples denoted by X = [x1, ..., xn] and a probability distribution denoted P, then the i.i.d. assumption is that:

  • The samples are identically distributed: each sample xi is drawn from the exact same distribution P.
  • The samples are independent: the value of one sample gives no information about the value of any other sample.
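Written as a formula, the i.i.d. assumption says that the joint distribution of the samples factorizes into a product of identical terms. Independence gives the product, and the shared distribution P makes every factor the same. This is a sketch in the notation of this section, assuming P can be written as a density or mass function:

```latex
P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i)
```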

Applications in Machine Learning

The i.i.d. assumption is commonly made in supervised learning algorithms such as linear regression and neural networks. It is also made in unsupervised learning algorithms such as k-means clustering and principal component analysis.

Machine learning algorithms often rely on the i.i.d. assumption because it makes powerful mathematical tools available, such as the central limit theorem and the law of large numbers, which justify drawing inferences about the underlying distribution from a finite sample.
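Both results can be checked numerically with a small simulation. The sketch below is illustrative only: it assumes i.i.d. draws from a Uniform(0, 1) distribution (true mean 0.5, variance 1/12), and the helper name sample_mean, the seed, and the sample sizes are arbitrary choices made for this example.

```python
import random
import statistics


def sample_mean(n, rng):
    """Mean of n i.i.d. draws from Uniform(0, 1), whose true mean is 0.5."""
    return statistics.fmean([rng.random() for _ in range(n)])


rng = random.Random(0)  # fixed seed (an arbitrary choice) for reproducibility

# Law of large numbers: the sample mean approaches the true mean as n grows.
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean(n, rng), 4))

# Central limit theorem: means of many independent batches cluster around 0.5
# with a spread close to sigma / sqrt(n), where sigma^2 = 1/12 for Uniform(0, 1).
batch_size = 100
batch_means = [sample_mean(batch_size, rng) for _ in range(2_000)]
observed_spread = statistics.stdev(batch_means)
predicted_spread = (1 / 12) ** 0.5 / batch_size ** 0.5
print(round(observed_spread, 4), round(predicted_spread, 4))
```

If the draws were not independent (for example, if each value depended on the previous one), the predicted spread would generally no longer match the observed one, which is why these guarantees depend on the i.i.d. assumption.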

Explain Like I'm 5 (ELI5)

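Imagine a big jar of mixed candies. You pick one candy without looking, put it back, shake the jar, and pick again. Every pick comes from the same jar, so the picks are identically distributed, and what you picked before does not change what you pick next, so the picks are independent. In machine learning, i.i.d. means the data points are collected the same way as those candy picks. This idea helps machine learning find patterns in data.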