Normalization

1. L2 (or Euclidean) normalization

First, the dataset X is assumed to be zero-centered, meaning that the mean of all the points in X is zero. Next, L2 normalization is applied to each point in X: every point is divided by its Euclidean distance from the origin (which, because the dataset is zero-centered, coincides with the mean of the data). This transformation causes each point to lie on the surface of a hypersphere with unit radius:

x̂ᵢ = xᵢ / ||xᵢ||

where ||xᵢ|| is the Euclidean distance (L2 norm) of the point xᵢ from the origin.

The hypersphere is centered at the origin, i.e., the point at which every feature equals zero. All the points on the surface of the hypersphere lie at the same distance from the origin: one unit, since the radius of the hypersphere is one.

Overall, the L2 normalization of a zero-centered dataset transforms each point in X to lie on the surface of a hypersphere centered at the origin, with a radius of one unit.
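As a minimal sketch of this operation (using NumPy on a synthetic dataset; the array X, the random seed, and the shapes are assumptions made purely for illustration):

    import numpy as np

    # Minimal sketch of L2 normalization on a synthetic, zero-centered dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))          # 100 points, 3 features
    X = X - X.mean(axis=0)                 # zero-center the dataset

    # Divide each point by its Euclidean (L2) norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_normalized = X / norms

    # Every point now lies on the unit hypersphere centered at the origin.
    print(np.allclose(np.linalg.norm(X_normalized, axis=1), 1.0))  # True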

“Normalizing a dataset leads to a projection where the existing relationships are kept only in terms of angular distance.” - [1]

Explanation of the above statement [1]:

Normalizing a dataset involves transforming the values of the dataset so that they have a common scale or range. When this is done, the relative magnitudes of the different features of the dataset are preserved, but their absolute magnitudes are changed.

When a dataset is normalized using the L2 (or Euclidean) norm, each point in the dataset is mapped onto the surface of a hypersphere with unit radius centered at the origin. This transformation preserves the relative magnitudes of the features within each point, but not their absolute magnitudes: only the direction of each point survives, so the relationships between points are kept only in terms of their angular distance from each other.

What this means is that the cosine of the angle between two points in the normalized dataset serves as a measure of their similarity, and this similarity is based solely on their relative directions, since the lengths have been normalized away. The angular distance between any two points in the normalized dataset is therefore a natural measure of their similarity or dissimilarity, and the cosine of the angle between them behaves like a correlation (for two zero-mean vectors, cosine similarity and Pearson correlation coincide exactly).

In summary, when a dataset is normalized, the relationships between the data points are preserved only in terms of their angular distance from each other. The relative orientation of the points is kept while their absolute magnitudes are lost, and the cosine of the angle between two points in the normalized dataset acts as a correlation-like measure of their similarity.
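The following NumPy sketch checks this relationship numerically; the vectors a and b are synthetic and are centered individually, an assumption made here only so that the cosine and the Pearson correlation agree exactly:

    import numpy as np

    # Two synthetic, individually zero-centered vectors.
    rng = np.random.default_rng(1)
    a = rng.normal(size=50)
    b = 0.7 * a + rng.normal(size=50)
    a = a - a.mean()
    b = b - b.mean()

    # Cosine of the angle between a and b (their cosine similarity).
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Pearson correlation coefficient between a and b.
    pearson = np.corrcoef(a, b)[0, 1]

    print(np.isclose(cosine, pearson))              # True
    # The angular distance is the arccosine of the similarity.
    angle = np.arccos(np.clip(cosine, -1.0, 1.0))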

Terms:

1. Relative magnitudes: Refers to the relationship between the magnitudes of different features or variables in a dataset. In the context of normalization, relative magnitudes are preserved, meaning that the relationships between the features are maintained after normalization.

2. Absolute magnitudes: Refers to the actual numerical values of the features or variables in a dataset. In the context of normalization, absolute magnitudes are changed, meaning that the values of the features are transformed to a common scale or range.

3. Angular distance: Refers to the angle between two vectors in a high-dimensional space. In the context of normalization, the angular distance between two points in the dataset is a measure of their similarity or dissimilarity, and is used to preserve the relative orientation of the data points.

4. Relative directions: Refers to the direction of a vector relative to the origin or another vector in a high-dimensional space. In the context of normalization, the direction of each point in the dataset is preserved, while their absolute magnitudes are transformed.

2. Whitening

Whitening is a preprocessing step that is often used in machine learning to normalize and decorrelate the features of a dataset. The goal of whitening is to transform the data so that it has zero mean and unit variance, and the features are uncorrelated with each other.

The whitening process involves a few main steps. First, the dataset is zero-centered by subtracting the mean of each feature from all data points. Next, the covariance matrix of the centered dataset is computed and its eigenvectors and eigenvalues are calculated: the eigenvectors represent the directions of maximum variance in the data, while the eigenvalues represent the amount of variance along those directions. Finally, the centered data is projected onto the eigenvectors and each projected component is divided by the square root of the corresponding eigenvalue, so that every direction has unit variance and the resulting features are uncorrelated.
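A rough NumPy sketch of this procedure (PCA-style whitening on a synthetic dataset; the data, the seed, and the variable names are assumptions for illustration only):

    import numpy as np

    # Synthetic dataset with correlated features.
    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

    # Step 1: zero-center the dataset.
    X_centered = X - X.mean(axis=0)

    # Step 2: covariance matrix and its eigendecomposition.
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 3: project onto the eigenvectors and rescale by 1/sqrt(eigenvalue).
    X_white = X_centered @ eigvecs / np.sqrt(eigvals)

    # The whitened data has zero mean, unit variance, and uncorrelated
    # features: its covariance matrix is (numerically) the identity.
    print(np.allclose(np.cov(X_white, rowvar=False), np.eye(X.shape[1])))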

Advantages of whitening

1. Decorrelation: Whitening decorrelates the features of a dataset, which can improve the performance and stability of many machine learning algorithms.

2. Normalization: Whitening also normalizes the features of a dataset, which can be useful in cases where the different features have very different scales or units.

3. Invariance to orthogonal transformations: A whitened dataset remains whitened under any orthogonal transformation, i.e., multiplying it by a matrix P with PᵀP = I (such as a rotation or reflection) yields data that is still white. This can be useful when applying such transformations to the data (see the sketch after this list).
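A small NumPy sketch of points 1 and 3 (the whitening code mirrors the earlier sketch; the random orthogonal matrix P is an illustrative assumption):

    import numpy as np

    # Whiten a synthetic dataset (same recipe as in the sketch above).
    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    X_white = Xc @ eigvecs / np.sqrt(eigvals)

    # Decorrelation: the covariance of the whitened data is the identity.
    print(np.allclose(np.cov(X_white, rowvar=False), np.eye(4)))

    # Invariance: apply a random orthogonal matrix P (built via QR).
    P, _ = np.linalg.qr(rng.normal(size=(4, 4)))
    X_rotated = X_white @ P

    # The covariance of the rotated data is P.T @ I @ P = I, so it stays white.
    print(np.allclose(np.cov(X_rotated, rowvar=False), np.eye(4)))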

Disadvantages of whitening

1. Loss of interpretability: The whitened features are linear combinations of the original ones, so they no longer correspond to quantities present in the original data, which can make the transformed dataset harder to visualize or work with.

2. Sensitivity to outliers: Whitening can be sensitive to outliers in the data, which can make the covariance matrix unstable or ill-conditioned (see the sketch after this list).

3. Computational complexity: Whitening can be computationally expensive for large datasets, as it involves computing the covariance matrix and performing an eigenvalue decomposition. This can make it difficult to apply in real-time or online settings.
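As a sketch of the outlier issue in point 2, together with one common workaround (adding a small constant to the eigenvalues before rescaling; the epsilon value here is an arbitrary assumption, not a recommendation from the text):

    import numpy as np

    # Synthetic data with a single extreme outlier.
    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 3))
    X[0] = [1e4, -1e4, 1e4]

    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    print(np.linalg.cond(cov))       # large condition number: ill-conditioned covariance

    # Regularized whitening: a small epsilon keeps the rescaling stable.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eps = 1e-5
    X_white = Xc @ eigvecs / np.sqrt(eigvals + eps)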