Sinkhorn-Knopp: Unraveling Optimal Transport for Data Alignment

Moklesur Rahman
4 min read · Jun 22, 2023

In the realm of data alignment and optimal transport, the Sinkhorn-Knopp algorithm has emerged as a powerful tool for solving transportation optimization problems. With applications ranging from image matching to text alignment and network analysis, Sinkhorn-Knopp offers an elegant and efficient way to find near-optimal mappings between two sets of data points. In this blog post, we will explore the fundamentals of Sinkhorn-Knopp and its significance in the field of optimal transport.

Understanding Optimal Transport: Optimal transport is a mathematical framework that measures the minimum cost of transforming one probability distribution into another; this minimum cost defines the Wasserstein distance. The framework has its roots in transportation planning, where it was first used to study the efficient allocation of resources between supply and demand, but its applications now extend across data science and machine learning.

The Problem of Optimal Transport: Given two sets of data points with different distributions, the task of optimal transport is to find an optimal mapping that aligns the two distributions while minimizing the total transportation cost. This cost is typically defined by a distance or similarity metric between the data points.
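To make the cost concrete, here is a minimal sketch (with made-up toy points) of the most common choice, a cost matrix of pairwise squared Euclidean distances between the two point sets:

```python
import numpy as np

# Hypothetical example data: 3 source points and 2 target points in 1-D
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.5], [2.5]])

# Cost matrix C[i, j] = ||x_i - y_j||^2 via broadcasting
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
print(C.shape)  # (3, 2): one cost per source-target pair
```

Any other distance or similarity metric can be substituted here; the algorithm only needs the resulting matrix of pairwise costs.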

Sinkhorn-Knopp Algorithm: The Sinkhorn-Knopp algorithm provides an efficient, iterative method for solving the optimal transport problem. It combines entropy regularization with iterative matrix scaling to compute a near-optimal solution. The algorithm takes as input two probability distributions and a cost matrix, along with a regularization parameter that controls the trade-off between accuracy and computational efficiency: smaller values yield solutions closer to the exact optimum but require more iterations.
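Written out, the entropy-regularized objective (using a cost matrix C, source and target marginals r and c, and regularization strength ε — notation chosen here for illustration) is:

```latex
\min_{T \in U(r, c)} \; \langle T, C \rangle - \varepsilon H(T),
\qquad
U(r, c) = \{\, T \ge 0 : T\mathbf{1} = r,\; T^{\top}\mathbf{1} = c \,\},
```

where H(T) = −Σᵢⱼ Tᵢⱼ (log Tᵢⱼ − 1) is the entropy of the transport plan T. The entropy term makes the problem strictly convex and smooth, which is what allows the simple alternating-scaling iterations described below.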

The steps of the Sinkhorn-Knopp algorithm can be summarized as follows:

  1. Initialization: Form the Gibbs kernel K = exp(−C/ε), a non-negative matrix derived from the cost matrix C and the regularization parameter ε, and initialize the row and column scaling vectors to ones.
  2. Iterative Scaling: Alternate between rescaling the rows and columns of the scaled kernel until convergence, so that its row sums match the source distribution and its column sums match the target distribution. (When both marginals are uniform, the result is a doubly stochastic matrix.)
  3. Convergence: Stop the iteration when a stopping criterion is met, such as the maximum number of iterations or a desired level of accuracy.
  4. Optimal Mapping: Extract the (near-)optimal transport plan from the converged scaling vectors; its entries describe how much mass moves between each pair of points.
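The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation (the function name, toy cost matrix, and default parameters below are all choices made for this example):

```python
import numpy as np

def sinkhorn_knopp(C, r, c, reg=0.1, n_iters=1000, tol=1e-9):
    """Entropy-regularized optimal transport via Sinkhorn-Knopp scaling.

    C: (n, m) cost matrix; r: (n,) source marginal; c: (m,) target marginal.
    reg: regularization strength (smaller -> closer to exact OT, slower).
    """
    K = np.exp(-C / reg)              # Step 1: Gibbs kernel
    u = np.ones_like(r)
    v = np.ones_like(c)
    for _ in range(n_iters):          # Step 2: alternating scaling
        u_prev = u
        u = r / (K @ v)               # fix row sums to r
        v = c / (K.T @ u)             # fix column sums to c
        if np.max(np.abs(u - u_prev)) < tol:
            break                     # Step 3: convergence check
    # Step 4: transport plan T = diag(u) K diag(v)
    return u[:, None] * K * v[None, :]

# Toy example: 3 source points, 2 target points, uniform marginals
C = np.array([[0.25, 6.25],
              [0.25, 2.25],
              [2.25, 0.25]])
T = sinkhorn_knopp(C, np.full(3, 1 / 3), np.full(2, 1 / 2))
print(T.sum(axis=1))  # row sums ≈ source marginal [1/3, 1/3, 1/3]
```

Note that the plan's row and column sums reproduce the two marginals, which is exactly the constraint the alternating scaling enforces.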




PhD student | Computer Science | University of Milan | Data science | AI in Cardiology | Writer | Researcher