mdsj

Class Data


public class Data
extends java.lang.Object

Convenience class for data handling.

Method Summary

static double
distance(double[][] matrix, int i, int j)
Gives the Euclidean distance between two data points in a data matrix.
static double[][]
distanceMatrix(double[][] matrix)
Gives the complete matrix of Euclidean distances in a configuration of data points in D-dimensional Euclidean space, where D=matrix[0].length.
static void
doubleCenter(double[][] matrix)
Double-centers the matrix so that each row and each column sums to zero, by subtracting the row mean in each row, subtracting the column mean in each column, and adding the overall mean to each entry.
static String
format(double[][] matrix)
Formats all entries of a matrix as a string.
static int[]
landmarkIndices(double[][] matrix)
From a rectangular k x n matrix of dissimilarities, an index array of length k is computed.
static double[][]
landmarkMatrix(double[][] matrix)
From a rectangular k x n matrix of dissimilarities, a square k x k dissimilarity matrix is computed which contains just the dissimilarities among the k objects described by the k rows.
static double[][]
maxminPivotMatrix(double[][] matrix, int k)
Gives a pivot matrix for a configuration of data points in Euclidean space.
static void
multiply(double[][] matrix, double factor)
Scales each entry in a matrix by a factor.
static double
normalize(double[] x)
Normalizes a vector to have unit length.
static void
normalize(double[][] x)
Scales every column of a matrix to have length one.
static double[][]
pivotRows(double[][] matrix, int k)
Given a set of dissimilarity rows, a subsample of rows is constructed which should be as representative as possible, using a greedy farthest-minimal dissimilarity approach.
static double
prod(double[] x, double[] y)
Computes the inner product of two vectors of the same length, i.e., the sum of entry-wise products x[0]*y[0]+...+x[n-1]*y[n-1], where n is the minimum length of the vectors.
static double[][]
randomPivotMatrix(double[][] matrix, int k)
Gives a pivot matrix for a configuration of data points in Euclidean space.
static void
randomize(double[][] matrix)
Fills a matrix with pseudo-random entries in the range -0.5 to 0.5 with uniform probability distribution.
static void
scale(double[][] x, double[][] D)
Scales a configuration such that the sum of the corresponding distances equals the sum of the input dissimilarities.
static void
selfprod(double[][] d, double[][] result)
Computes the self product of a matrix d with its transpose d'.
static void
squareDoubleCenter(double[][] matrix)
Squares each entry of a matrix and then double-centers it.
static void
squareEntries(double[][] matrix)
Squares each entry in a matrix.

Method Details

distance

public static double distance(double[][] matrix,
                              int i,
                              int j)
Gives the Euclidean distance between two data points in a data matrix. It is computed as (sum((matrix[*][i]-matrix[*][j])^2))^(1/2), where the sum is over all row indices, indicated by *.
Parameters:
matrix - the data matrix
i - the index of a data point
j - the index of a data point
Returns:
the Euclidean distance between the two data points
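
The computation described above can be illustrated with a minimal sketch. It is not the mdsj source: the class name DistanceSketch is hypothetical, and the sketch assumes that data points are stored as columns and coordinates as rows, which is what summing over all row indices suggests.

```java
// Hypothetical stand-alone sketch, not the mdsj implementation.
final class DistanceSketch {
    // Assumes columns are data points and rows are coordinates.
    static double distance(double[][] matrix, int i, int j) {
        double sum = 0.0;
        for (double[] row : matrix) {
            double diff = row[i] - row[j]; // coordinate difference in this row
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

With the two points (0,0) and (3,4) stored as columns, the sketch yields a distance of 5.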

distanceMatrix

public static double[][] distanceMatrix(double[][] matrix)
Gives the complete matrix of Euclidean distances in a configuration of data points in D-dimensional Euclidean space, where D=matrix[0].length.
Parameters:
matrix - high-dimensional coordinates for data points
Returns:
matrix of Euclidean distances among the data points

doubleCenter

public static void doubleCenter(double[][] matrix)
Double-centers the matrix so that each row and each column sums to zero, by subtracting the row mean in each row, subtracting the column mean in each column, and adding the overall mean to each entry.
Parameters:
matrix - the matrix to be double-centered
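
A minimal sketch of the double-centering step follows. It is an illustration under assumptions rather than the library's code: the class name DoubleCenterSketch is hypothetical, and the matrix is assumed to be non-empty and rectangular.

```java
// Hypothetical stand-alone sketch, not the mdsj implementation.
final class DoubleCenterSketch {
    // Replaces a[i][j] by a[i][j] - rowMean[i] - colMean[j] + overallMean, in place.
    static void doubleCenter(double[][] a) {
        int rows = a.length;
        int cols = a[0].length; // assumes a non-empty rectangular matrix
        double[] rowMean = new double[rows];
        double[] colMean = new double[cols];
        double overall = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowMean[i] += a[i][j];
                colMean[j] += a[i][j];
                overall += a[i][j];
            }
        }
        for (int i = 0; i < rows; i++) rowMean[i] /= cols;
        for (int j = 0; j < cols; j++) colMean[j] /= rows;
        overall /= (double) rows * cols;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                a[i][j] += overall - rowMean[i] - colMean[j];
            }
        }
    }
}
```

After the call, every row sum and every column sum is zero up to rounding error.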

format

public static String format(double[][] matrix)
Formats all entries of a matrix as a string. Each row in the matrix gets a space-separated line representation.
Parameters:
matrix - the matrix to be printed
Returns:
a string representation of the matrix

landmarkIndices

public static int[] landmarkIndices(double[][] matrix)
From a rectangular k x n matrix of dissimilarities, an index array of length k is computed. The entry at position i is the original index in the range [0,n-1] for which the i-th row in the original matrix stands.
Parameters:
matrix - a rectangular k x n dissimilarity matrix
Returns:
array of indices of length k
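
One plausible way to recover these indices is sketched below. It rests on an assumption that this description does not state: the object represented by row i is taken to be the column j with matrix[i][j] equal to zero, since an object has zero dissimilarity to itself. The class name LandmarkIndicesSketch and the -1 marker for rows without a zero entry are hypothetical.

```java
// Hypothetical stand-alone sketch, not the mdsj implementation.
final class LandmarkIndicesSketch {
    // Assumption: row i stands for the first column j with matrix[i][j] == 0.
    static int[] landmarkIndices(double[][] matrix) {
        int[] index = new int[matrix.length];
        for (int i = 0; i < matrix.length; i++) {
            index[i] = -1; // hypothetical marker: no zero entry found in this row
            for (int j = 0; j < matrix[i].length; j++) {
                if (matrix[i][j] == 0.0) {
                    index[i] = j;
                    break;
                }
            }
        }
        return index;
    }
}
```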

landmarkMatrix

public static double[][] landmarkMatrix(double[][] matrix)
From a rectangular k x n matrix of dissimilarities, a square k x k dissimilarity matrix is computed which contains just the dissimilarities among the k objects described by the k rows.
Parameters:
matrix - a rectangular k x n dissimilarity matrix
Returns:
a square dissimilarity matrix
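
The extraction of the k x k block can be sketched as follows. This is an illustration under assumptions, not the library's code: the class name LandmarkMatrixSketch is hypothetical, and the original index of each row is assumed to be the column holding a zero entry in that row.

```java
// Hypothetical stand-alone sketch, not the mdsj implementation.
final class LandmarkMatrixSketch {
    static double[][] landmarkMatrix(double[][] matrix) {
        int k = matrix.length;
        int[] index = new int[k];
        // Assumption: row i represents the first column j with matrix[i][j] == 0.
        for (int i = 0; i < k; i++) {
            index[i] = -1;
            for (int j = 0; j < matrix[i].length; j++) {
                if (matrix[i][j] == 0.0) {
                    index[i] = j;
                    break;
                }
            }
            if (index[i] < 0) {
                throw new IllegalArgumentException("row " + i + " has no zero entry");
            }
        }
        // Keep only the columns that belong to the k row objects.
        double[][] result = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                result[i][j] = matrix[i][index[j]];
            }
        }
        return result;
    }
}
```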

maxminPivotMatrix

public static double[][] maxminPivotMatrix(double[][] matrix,
                                           int k)
Gives a pivot matrix for a configuration of data points in Euclidean space. The result is a subset of columns of the matrix of Euclidean distances. A point whose column is to be included is called a pivot. The pivots are selected with a maxmin strategy. The first pivot is selected randomly; the i-th pivot is selected to maximize the minimum Euclidean distance to the i-1 pivots selected so far.
Parameters:
matrix - coordinates of the points
k - number of pivots
Returns:
the pivot matrix of Euclidean distances
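
The maxmin selection rule can be sketched in isolation. The sketch below is not the mdsj code and differs from it in stated ways: it works on a precomputed square distance matrix, it returns the pivot indices rather than the pivot matrix, and it takes the first pivot as a parameter (the hypothetical argument first) instead of drawing it at random, so that the result is reproducible.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch, not the mdsj implementation.
final class MaxminPivotSketch {
    // dist is a square n x n distance matrix; first is the index of the first pivot.
    static int[] maxminPivots(double[][] dist, int k, int first) {
        int n = dist.length;
        int[] pivots = new int[k];
        // minDist[j] = distance from point j to its nearest pivot chosen so far
        double[] minDist = new double[n];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);
        int next = first;
        for (int p = 0; p < k; p++) {
            pivots[p] = next;
            for (int j = 0; j < n; j++) {
                minDist[j] = Math.min(minDist[j], dist[next][j]);
            }
            // The next pivot is the point farthest from all pivots chosen so far.
            int arg = 0;
            for (int j = 1; j < n; j++) {
                if (minDist[j] > minDist[arg]) arg = j;
            }
            next = arg;
        }
        return pivots;
    }
}
```

For four points on a line at positions 0, 1, 2 and 10, starting from the point at 0, the sketch picks the point at 10 next and then the point at 2.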

multiply

public static void multiply(double[][] matrix,
                            double factor)
Scales each entry in a matrix by a factor.
Parameters:
matrix - the matrix
factor - the scaling factor

normalize

public static double normalize(double[] x)
Normalizes a vector to have unit length.
Parameters:
x - a vector
Returns:
the vector's former length
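
A minimal sketch of in-place normalization that returns the former length is given below. The class name NormalizeSketch is hypothetical, and leaving a zero vector unchanged is an assumption; the behavior of the library in that case is not documented here.

```java
// Hypothetical stand-alone sketch, not the mdsj implementation.
final class NormalizeSketch {
    static double normalize(double[] x) {
        double sumSq = 0.0;
        for (double v : x) sumSq += v * v;
        double norm = Math.sqrt(sumSq); // Euclidean length before scaling
        if (norm > 0.0) {               // assumption: a zero vector is left as is
            for (int i = 0; i < x.length; i++) x[i] /= norm;
        }
        return norm;
    }
}
```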

normalize

public static void normalize(double[][] x)
Scales every column of a matrix to have length one.
Parameters:
x - a matrix

pivotRows

public static double[][] pivotRows(double[][] matrix,
                                   int k)
Given a set of dissimilarity rows, a subsample of rows is constructed which should be as representative as possible, using a greedy farthest-minimal dissimilarity approach. The number of rows to be selected must not be larger than the number of rows present in the input matrix, i.e., k<=matrix.length. The indices in the second dimension (i.e., within every row) are the actual ones. The indices in the first dimension are meaningless, but the original index corresponding to a row i may be determined by checking for which j the equality d[i][j]=0 holds.
Parameters:
matrix - a collection of dissimilarity rows
k - the number of rows to be selected
Returns:
a subset of the original dissimilarity rows

prod

public static double prod(double[] x,
                          double[] y)
Computes the inner product of two vectors of the same length, i.e., the sum of entry-wise products x[0]*y[0]+...+x[n-1]*y[n-1], where n is the minimum length of the vectors.
Parameters:
x - a vector
y - a vector
Returns:
the inner product of the vectors
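
The sum described above can be sketched as follows. The class name ProdSketch is hypothetical and the code is an illustration, not the mdsj source; it uses the shorter of the two lengths as n, as the description states.

```java
// Hypothetical stand-alone sketch, not the mdsj implementation.
final class ProdSketch {
    static double prod(double[] x, double[] y) {
        int n = Math.min(x.length, y.length); // entries beyond the shorter vector are ignored
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }
}
```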

randomPivotMatrix

public static double[][] randomPivotMatrix(double[][] matrix,
                                           int k)
Gives a pivot matrix for a configuration of data points in Euclidean space. The result is a subset of columns of the matrix of Euclidean distances. A point whose column is to be included is called a pivot. The pivots are selected at random; it is possible for a point to be selected as a pivot more than once.
Parameters:
matrix - coordinates of the points
k - number of pivots
Returns:
the pivot matrix of Euclidean distances

randomize

public static void randomize(double[][] matrix)
Fills a matrix with pseudo-random entries in the range -0.5 to 0.5 with uniform probability distribution. This is useful for initializing the iterative computation of eigenvectors.
Parameters:
matrix - the matrix to be filled
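
A minimal sketch of such a fill is shown below. It is not the library's code: the class name RandomizeSketch and the extra rng parameter are hypothetical (the parameter makes the fill reproducible with a seed), and the random source used by mdsj is not documented here. With java.util.Random, the entries fall in the half-open range [-0.5, 0.5).

```java
import java.util.Random;

// Hypothetical stand-alone sketch, not the mdsj implementation.
final class RandomizeSketch {
    static void randomize(double[][] matrix, Random rng) {
        for (double[] row : matrix) {
            for (int j = 0; j < row.length; j++) {
                row[j] = rng.nextDouble() - 0.5; // nextDouble() is uniform on [0, 1)
            }
        }
    }
}
```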

scale

public static void scale(double[][] x,
                         double[][] D)
Scales a configuration such that the sum of the corresponding distances equals the sum of the input dissimilarities.
Parameters:
x - configuration matrix
D - square matrix of input dissimilarities
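
Because Euclidean distances scale linearly with the coordinates, one way to obtain this property is to multiply every coordinate by the ratio of the two sums. The sketch below illustrates that idea under assumptions and is not the mdsj code: the class name ScaleSketch is hypothetical, points are assumed to be stored as columns of x, both sums run over all ordered pairs, and a configuration whose distances are all zero is left unchanged.

```java
// Hypothetical stand-alone sketch, not the mdsj implementation.
final class ScaleSketch {
    // x: configuration with points as columns (assumption); D: n x n dissimilarities.
    static void scale(double[][] x, double[][] D) {
        int n = D.length;
        double sumD = 0.0;
        double sumDist = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sumD += D[i][j];
                double sq = 0.0;
                for (double[] row : x) {
                    double diff = row[i] - row[j];
                    sq += diff * diff;
                }
                sumDist += Math.sqrt(sq);
            }
        }
        if (sumDist == 0.0) return; // assumption: nothing to scale
        double factor = sumD / sumDist;
        for (double[] row : x) {
            for (int j = 0; j < row.length; j++) row[j] *= factor;
        }
    }
}
```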

selfprod

public static void selfprod(double[][] d,
                            double[][] result)
Computes the self product of a matrix d with its transpose d'.
Parameters:
d - the matrix
result - the (symmetric) product dd'
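
The product dd' has entries result[i][j] equal to the inner product of rows i and j of d. A minimal sketch follows; the class name SelfprodSketch is hypothetical, and the sketch assumes that result has already been allocated with d.length rows and d.length columns.

```java
// Hypothetical stand-alone sketch, not the mdsj implementation.
final class SelfprodSketch {
    // Assumes result is preallocated as a d.length x d.length matrix.
    static void selfprod(double[][] d, double[][] result) {
        int n = d.length;
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < d[i].length; k++) {
                    sum += d[i][k] * d[j][k]; // row i times row j
                }
                result[i][j] = sum;
                result[j][i] = sum; // the product is symmetric
            }
        }
    }
}
```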

squareDoubleCenter

public static void squareDoubleCenter(double[][] matrix)
Squares each entry of a matrix and then double-centers it.
Parameters:
matrix - the matrix

squareEntries

public static void squareEntries(double[][] matrix)
Squares each entry in a matrix.
Parameters:
matrix - the matrix