Understanding Maximum Likelihood Estimation (MLE) in Machine Learning

Moklesur Rahman
4 min read · Jul 22, 2023

Maximum Likelihood Estimation (MLE) is a fundamental method in machine learning and statistics for estimating the parameters of probabilistic models. It underlies many familiar techniques: least-squares regression corresponds to MLE under Gaussian noise, and minimizing cross-entropy loss corresponds to MLE for a categorical model. In this post, we will explore the concept of Maximum Likelihood Estimation, its importance, and how it is applied in various machine learning scenarios.

What is Maximum Likelihood Estimation?

Maximum Likelihood Estimation is a statistical method used to estimate the parameters of a probabilistic model based on observed data. The goal of MLE is to find the set of parameter values that maximize the likelihood function, which measures the probability of observing the given data under the assumed model.
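To make the definition concrete, here is a minimal sketch using a coin-flip (Bernoulli) model. The `flips` array is made up for illustration, and the helper name `bernoulli_likelihood` is our own. The sketch evaluates the likelihood on a grid of candidate probabilities and compares the best grid point with the known closed-form answer, the fraction of heads.

```python
import numpy as np

# Hypothetical observations: 1 = heads, 0 = tails (made up for illustration)
flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def bernoulli_likelihood(p, x):
    """Probability of the observed sequence x if each flip is heads with probability p."""
    heads = x.sum()
    tails = x.size - heads
    return p**heads * (1 - p)**tails

# Evaluate the likelihood on a grid of candidate values and keep the best one
candidates = np.linspace(0.01, 0.99, 99)
likelihoods = np.array([bernoulli_likelihood(p, flips) for p in candidates])
p_hat_grid = candidates[np.argmax(likelihoods)]

# For a Bernoulli model the maximizer is known in closed form: the fraction of heads
p_hat_closed_form = flips.mean()

print(f"grid search: {p_hat_grid:.2f}, closed form: {p_hat_closed_form:.2f}")
```

A grid search is only practical for one or two parameters. It is used here because it mirrors the definition directly: try parameter values and keep the one under which the observed data is most probable.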

Let’s understand the key components of MLE:

1. Likelihood Function: The likelihood function, denoted by L(θ | D), treats the observed data D as fixed and varies the model parameters θ. Its value is the probability (or, for continuous data, the probability density) of the data D under a specific value of θ. For independent and identically distributed (i.i.d.) data points x_1, ..., x_n, the likelihood factorizes into a product of the individual terms: L(θ | D) = p(x_1 | θ) × p(x_2 | θ) × ... × p(x_n | θ).

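2. Log-Likelihood Function: In practice, it is common to work with the log-likelihood function, denoted by log L(θ | D), which is the natural logarithm of the likelihood function. The logarithm turns the product over data points into a sum, which is easier to differentiate and avoids the numerical underflow that occurs when many small probabilities are multiplied. Because the logarithm is strictly increasing, the likelihood and the log-likelihood are maximized by the same parameter values.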

3. MLE Objective: The MLE objective is to find the parameter values θ that maximize the log-likelihood function:

θ_MLE = argmax_θ log L(θ | D)
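The three components above can be sketched in a few lines. This is an illustration under stated assumptions, not a general-purpose implementation: the sample is synthetic, the standard deviation is treated as known (2.0), and the helper name `gaussian_log_likelihood` is our own. It shows the raw likelihood product underflowing to zero on a large sample while the log-likelihood stays finite, then locates the argmax over a grid of candidate means.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
sample = rng.normal(loc=5.0, scale=2.0, size=2000)  # hypothetical i.i.d. data

# 1. Likelihood as a product of densities: underflows to 0.0 for a sample this large
likelihood = np.prod(norm.pdf(sample, loc=5.0, scale=2.0))

# 2. Log-likelihood as a sum of log-densities: stays finite
log_likelihood = np.sum(norm.logpdf(sample, loc=5.0, scale=2.0))

print("likelihood:    ", likelihood)
print("log-likelihood:", log_likelihood)

# 3. MLE objective: pick the mean that maximizes the log-likelihood
#    (sigma is assumed known here to keep the search one-dimensional)
def gaussian_log_likelihood(mu, x, sigma=2.0):
    return np.sum(norm.logpdf(x, loc=mu, scale=sigma))

mu_grid = np.linspace(3.0, 7.0, 401)  # candidate means, spaced 0.01 apart
ll_values = np.array([gaussian_log_likelihood(mu, sample) for mu in mu_grid])
mu_hat = mu_grid[np.argmax(ll_values)]

print("grid argmax:", mu_hat, "sample mean:", sample.mean())
```

The grid argmax lands on the grid point closest to the sample mean, which is the exact maximizer for a Gaussian with known standard deviation.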


Let’s walk through a simple example of Maximum Likelihood Estimation using Python. We’ll create a synthetic dataset and then use MLE to estimate the parameters of a Gaussian distribution that generated the data.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generating synthetic data from a Gaussian distribution with known parameters
true_mean = 5.0
true_std = 2.0
num_samples = 100
data = np.random.normal(loc=true_mean, scale=true_std, size=num_samples)

# Visualization of the data distribution
plt.hist(data, bins=20, density=True, alpha=0.6, color='b', label='Data')…
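The listing above is cut off before the estimation step. As a separate, self-contained sketch (not the author's continuation), the following shows three common ways the Gaussian parameters might be estimated by maximum likelihood: the closed-form solution, `scipy.stats.norm.fit`, and direct numerical minimization of the negative log-likelihood. The seed, the starting point, and the helper name `negative_log_likelihood` are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Synthetic data with the same settings as the listing above (seed is our assumption)
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=100)

# Closed form: the Gaussian MLE is the sample mean and the
# square root of the mean squared deviation (ddof=0, not the unbiased ddof=1)
mean_mle = data.mean()
std_mle = data.std(ddof=0)

# SciPy's built-in fit returns the maximum likelihood (loc, scale) for a normal
mean_fit, std_fit = norm.fit(data)

# Numerical route: minimize the negative log-likelihood.
# Optimizing log(sigma) keeps sigma positive without explicit constraints.
def negative_log_likelihood(params, x):
    mu, log_sigma = params
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

result = minimize(negative_log_likelihood, x0=np.array([0.0, 0.0]), args=(data,))
mean_num, std_num = result.x[0], np.exp(result.x[1])

print(f"closed form: mean={mean_mle:.4f}, std={std_mle:.4f}")
print(f"norm.fit:    mean={mean_fit:.4f}, std={std_fit:.4f}")
print(f"numerical:   mean={mean_num:.4f}, std={std_num:.4f}")
```

All three routes should agree up to optimizer tolerance. The numerical route is the one that carries over to models without a closed-form solution. With only 100 samples, the estimates will generally differ somewhat from the true values of 5.0 and 2.0.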



Moklesur Rahman

PhD student | Computer Science | University of Milan | Data science | AI in Cardiology | Writer | Researcher