statistics
Class Initialization

java.lang.Object
  extended by statistics.Initialization

public abstract class Initialization
extends java.lang.Object

Collection of algorithms to initialize a codebook

Author:
sikoried

Nested Class Summary
static class Initialization.DensityRankingMethod
           
 
Constructor Summary
Initialization()
           
 
Method Summary
static int assignToCluster(double[] x, Density[] d)
          Compute the ID of the nearest centroid.
static int assignToCluster(double[] x, java.util.List<Density> list)
          Compute the ID of the nearest centroid.
static MixtureDensity gMeansClustering(java.util.List<Sample> data, double alpha, int maxc, boolean diagonalCovariances)
          Perform a Gaussian-means (G-means) clustering on the given data set.
static MixtureDensity hierarchicalGaussianClustering(java.util.List<Sample> data, int maxc, boolean diagonalCovariances, Initialization.DensityRankingMethod rank)
          Perform a hierarchical Gaussian clustering: beginning with a single density, repeatedly split the cluster with the highest variance into two, placing the new means along the strongest eigenvector.
static MixtureDensity kMeansClustering(java.util.List<Sample> data, int nd, boolean diagonalCovariances)
          Perform a simple k-means clustering on the data.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Initialization

public Initialization()
Method Detail

assignToCluster

public static int assignToCluster(double[] x,
                                  Density[] d)
Compute the ID of the nearest centroid.

Parameters:
x - feature vector to assign
d - candidate densities
Returns:
ID of the nearest centroid

assignToCluster

public static int assignToCluster(double[] x,
                                  java.util.List<Density> list)
Compute the ID of the nearest centroid.

Parameters:
x - feature vector to assign
list - candidate densities
Returns:
ID of the nearest centroid
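
The nearest-centroid lookup described for both assignToCluster overloads can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes squared Euclidean distance and represents centroids as plain double[] rows instead of Density objects, and the class name NearestCentroidSketch is hypothetical. The distance measure actually used by the library is not stated on this page.

```java
final class NearestCentroidSketch {

    /**
     * Returns the index of the centroid closest to x.
     * Assumption: squared Euclidean distance; ties go to the lowest index.
     */
    static int assignToCluster(double[] x, double[][] centroids) {
        if (centroids.length == 0)
            throw new IllegalArgumentException("need at least one centroid");
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - centroids[i][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = { {0.0, 0.0}, {5.0, 5.0} };
        // (4.0, 4.5) is much closer to (5, 5) than to the origin
        System.out.println(assignToCluster(new double[] {4.0, 4.5}, centroids)); // prints 1
    }
}
```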

gMeansClustering

public static MixtureDensity gMeansClustering(java.util.List<Sample> data,
                                              double alpha,
                                              int maxc,
                                              boolean diagonalCovariances)
                                       throws TrainingException
Perform a Gaussian-means (G-means) clustering on the given data set. The number of mixture components is learned from the data: each cluster is tested for Gaussianity, and clusters that fail the test are split (see also Hamerly03-LTK).

Parameters:
data - list of data samples
alpha - significance level; one of {0.1, 0.05, 0.025, 0.01}
maxc - maximum number of clusters
Returns:
Throws:
TrainingException

hierarchicalGaussianClustering

public static MixtureDensity hierarchicalGaussianClustering(java.util.List<Sample> data,
                                                            int maxc,
                                                            boolean diagonalCovariances,
                                                            Initialization.DensityRankingMethod rank)
                                                     throws TrainingException
Perform a hierarchical Gaussian clustering: beginning with a single density, repeatedly split the cluster with the highest variance into two, placing the new means along the strongest eigenvector.

Parameters:
data - List of data samples
maxc - Maximum number of clusters
diagonalCovariances - whether to use diagonal covariances
Returns:
ready-to-use MixtureDensity
Throws:
TrainingException
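
A single split step of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: data is a plain double[][] instead of a list of Sample objects, the strongest eigenvector is found by power iteration, and the two new means are placed one standard deviation (the square root of the largest eigenvalue) on either side of the old mean. The offset, the iteration count, and all names in EigenSplitSketch are hypothetical.

```java
import java.util.Arrays;

final class EigenSplitSketch {

    /** Component-wise mean of the rows of data. */
    static double[] mean(double[][] data) {
        int d = data[0].length;
        double[] mu = new double[d];
        for (double[] x : data)
            for (int j = 0; j < d; j++) mu[j] += x[j];
        for (int j = 0; j < d; j++) mu[j] /= data.length;
        return mu;
    }

    /** Covariance matrix normalized by n (not n - 1). */
    static double[][] covariance(double[][] data, double[] mu) {
        int d = mu.length;
        double[][] c = new double[d][d];
        for (double[] x : data)
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    c[i][j] += (x[i] - mu[i]) * (x[j] - mu[j]);
        for (double[] row : c)
            for (int j = 0; j < d; j++) row[j] /= data.length;
        return c;
    }

    /** Matrix-vector product c * v. */
    static double[] multiply(double[][] c, double[] v) {
        double[] w = new double[c.length];
        for (int i = 0; i < c.length; i++)
            for (int j = 0; j < v.length; j++) w[i] += c[i][j] * v[j];
        return w;
    }

    /**
     * Power iteration for the dominant eigenvector of a symmetric matrix.
     * Caveat: a start vector orthogonal to that eigenvector would not converge to it.
     */
    static double[] principalEigenvector(double[][] c, int iterations) {
        int d = c.length;
        double[] v = new double[d];
        Arrays.fill(v, 1.0 / Math.sqrt(d));
        for (int it = 0; it < iterations; it++) {
            double[] w = multiply(c, v);
            double norm = 0.0;
            for (double wi : w) norm += wi * wi;
            norm = Math.sqrt(norm);
            if (norm == 0.0) break; // zero covariance: keep the current direction
            for (int i = 0; i < d; i++) v[i] = w[i] / norm;
        }
        return v;
    }

    /** Returns two new means: mu - s*v and mu + s*v, with s = sqrt(largest eigenvalue). */
    static double[][] splitMeans(double[][] data) {
        double[] mu = mean(data);
        double[][] c = covariance(data, mu);
        double[] v = principalEigenvector(c, 100);
        double[] cv = multiply(c, v);
        double lambda = 0.0; // Rayleigh quotient v^T C v for unit-length v
        for (int i = 0; i < v.length; i++) lambda += v[i] * cv[i];
        double s = Math.sqrt(Math.max(lambda, 0.0));
        double[] lower = new double[mu.length];
        double[] upper = new double[mu.length];
        for (int i = 0; i < mu.length; i++) {
            lower[i] = mu[i] - s * v[i];
            upper[i] = mu[i] + s * v[i];
        }
        return new double[][] { lower, upper };
    }
}
```

In a full hierarchical procedure this step would be repeated on the cluster selected for splitting, with samples reassigned to the new means after each split, until maxc clusters exist.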

kMeansClustering

public static MixtureDensity kMeansClustering(java.util.List<Sample> data,
                                              int nd,
                                              boolean diagonalCovariances)
                                       throws TrainingException
Perform a simple k-means clustering on the data. The number of clusters must be specified in advance.

Parameters:
data - list of data samples
nd - number of clusters
Returns:
Primitive MixtureDensity
Throws:
TrainingException
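
For orientation, a plain k-means loop (Lloyd's algorithm) with a fixed number of clusters can be sketched as follows. This is not the library's implementation: it works on double[][] rather than Sample lists, returns centroids instead of a MixtureDensity, seeds the centroids with the first k data points, and uses squared Euclidean distance. All of these choices, and the names in KMeansSketch, are assumptions made for illustration.

```java
import java.util.Arrays;

final class KMeansSketch {

    /** Index of the centroid with the smallest squared Euclidean distance to x. */
    static int nearest(double[] x, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - centroids[c][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Runs Lloyd's algorithm until assignments stop changing or maxIter is reached. */
    static double[][] kMeans(double[][] data, int k, int maxIter) {
        if (k < 1 || data.length < k)
            throw new IllegalArgumentException("need 1 <= k <= number of samples");
        int d = data[0].length;
        // Assumption: seed with the first k samples (deterministic, but order-dependent)
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = data[c].clone();
        int[] assignment = new int[data.length];
        Arrays.fill(assignment, -1);
        for (int it = 0; it < maxIter; it++) {
            // Assignment step: attach every sample to its nearest centroid
            boolean changed = false;
            for (int n = 0; n < data.length; n++) {
                int best = nearest(data[n], centroids);
                if (best != assignment[n]) {
                    assignment[n] = best;
                    changed = true;
                }
            }
            if (!changed) break;
            // Update step: move each centroid to the mean of its samples
            double[][] sums = new double[k][d];
            int[] counts = new int[k];
            for (int n = 0; n < data.length; n++) {
                counts[assignment[n]]++;
                for (int j = 0; j < d; j++) sums[assignment[n]][j] += data[n][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster: leave centroid unchanged
                for (int j = 0; j < d; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {0, 1}, {10, 10}, {10, 11} };
        System.out.println(Arrays.deepToString(kMeans(data, 2, 50)));
        // prints [[0.0, 0.5], [10.0, 10.5]]
    }
}
```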