public abstract class AbstractSOMLibSparseInputData extends java.lang.Object implements InputData
Abstract base implementation of InputData. Sub-classes have to implement constructors and methods to read input vectors and create an InputData object, for example by reading from a file or a database.
Modifier and Type | Field and Description |
---|---|
protected SOMLibClassInformation | classInfo: Any class label information attached to the input vectors. |
protected ContentType | contentType: The content type of the vectors ("text", "audio", ...). |
java.lang.String[] | dataNames: The labels/names of the vectors. |
protected int | dim: The dimension of the input vectors. |
private double[][] | distanceMatrix: A matrix containing the pairwise distances between two vectors. FIXME: use LeightWeightMemoryInputVectorDistanceMatrix instead. |
protected static java.lang.String | ERROR_MESSAGE_FILE_FORMAT_CORRUPT |
private double[] | extremes: Holds the computed results of getMinValue() and getMaxValue(). |
protected int | featureMatrixCols: Column dimension of the feature matrix before having been vectorized to input vector. |
protected int | featureMatrixRows: Row dimension of the feature matrix before having been vectorized to input vector. |
private double[][] | intervals: Holds the computed results of getDataIntervals(). |
protected boolean | isNormalized: Indicates whether the input data has been normalised. |
protected cern.colt.matrix.impl.DenseDoubleMatrix1D | meanVector: The mean of all the input vectors. |
protected double | mqe0 |
protected java.util.LinkedHashMap<java.lang.String,java.lang.Integer> | nameCache: A mapping from the name to the index of an input vector, for faster access. |
protected int | numVectors: The number of vectors in this input data collection. |
protected java.util.Random | rand |
protected java.lang.String | source: Where this input data was read from. |
protected TemplateVector | templateVector: A TemplateVector attached to this input data. |
private double[][] | transformedVectors: A transformation of the input vectors. |
Fields inherited from interface InputData: inputFileNameSuffix, MISSING_VALUE
Modifier | Constructor and Description |
---|---|
protected | AbstractSOMLibSparseInputData() |
protected | AbstractSOMLibSparseInputData(boolean norm, java.util.Random random) |
protected | AbstractSOMLibSparseInputData(java.lang.String[] dataNames, int dim, boolean norm, java.util.Random rand, TemplateVector tv, SOMLibClassInformation clsInfo) |
Modifier and Type | Method and Description |
---|---|
private boolean | assertEqual(java.lang.Object name, java.lang.Object i1, java.lang.Object i2) |
SOMLibClassInformation | classInformation(): Gets the class info associated with this input data. |
static AbstractSOMLibSparseInputData | create(InputDatum[] inputData, SOMLibClassInformation classInfo) |
int | dim(): Gets the dimension of the input data. |
boolean | equals(java.lang.Object obj) |
InputDatum[] | getByNameDistanceSorted(double[] vector, java.util.Collection<java.lang.String> inputNames, DistanceMetric metric): Retrieves the InputDatum corresponding to the given input names, sorted by their distance to the given vector. |
ContentType | getContentType(): Gets the content type. |
double[][] | getData(): Returns the input data as a double array. |
double[][] | getData(java.lang.String className): Returns the vectors of all inputs associated with the given class name. |
double[][] | getDataIntervals(): Returns the min and max values for each feature, in a matrix of dim x 2. |
java.lang.String | getDataSource(): Returns the name/URI/etc. of the data source. |
double[][] | getDistanceMatrix() |
java.util.ArrayList<InputDistance> | getDistances(int inputIndex, DistanceMetric metric): Returns the distances from the vector at the given index to the vectors of the data set. |
java.util.Hashtable<java.lang.Integer,java.lang.Integer> | getFeatureDensities(): Returns feature density statistics of the input data, namely a mapping from the number of input objects in which a specific feature is non-zero to the total number of features with that density. |
int | getFeatureMatrixColumns(): Gets the number of columns before vectorisation. |
int | getFeatureMatrixRows(): Gets the number of rows before vectorisation. |
static java.lang.String | getFileNameSuffix() |
static java.lang.String | getFormatName() |
InputDatum | getInputDatum(java.lang.String label): Gets an input datum with a specified label. |
InputDatum[] | getInputDatum(java.lang.String[] labels): Returns an array of input data with the specified labels. |
int | getInputDatumIndex(java.lang.String label) |
java.lang.String | getLabel(int index): Returns the label of the input vector at the given index. |
java.lang.String[] | getLabels(): Returns an array containing the labels of all the input data. |
double | getMaxValue(): Returns the maximum value in the input data. |
cern.colt.matrix.DoubleMatrix1D | getMeanVector(): Gets the mean vector of the input vectors. |
cern.colt.matrix.DoubleMatrix1D | getMeanVector(java.lang.String[] labels): Returns the mean vector of the input vectors specified by the given labels. |
double | getMinValue(): Returns the minimum value in the input data. |
SmallestElementSet<InputDistance> | getNearestDistances(int inputIndex, int neighbours, DistanceMetric metric) |
InputDatum[] | getNearestN(double[] vector, DistanceMetric metric, int number): Retrieves the given number of InputDatum that are closest to the given vector. |
InputDatum[] | getNearestN(int inputIndex, DistanceMetric metric, int number): Returns the n nearest input vectors to the vector at the given index of the data set. |
InputDatum[] | getNearestNUnsorted(int inputIndex, DistanceMetric metric, int number) |
private InputDatum[] | getNNearest(java.util.ArrayList<InputDistance> distances) |
private InputDatum[] | getNNearest(int number, java.util.ArrayList<InputDistance> distances) |
InputDatum | getRandomInputDatum(int iteration, int numIterations): Gets a random input sample from the input data set. |
void | initDistanceMatrix(DistanceMetric metric): Calculates the distanceMatrix. Careful: this is a lengthy process and should be done only if needed. |
boolean | isNormalizedToUnitLength(): Indicates whether this data set has been normalised to the unit length. |
int | numVectors(): Gives the size of this input data set. |
void | setClassInfo(SOMLibClassInformation classInfo) |
void | setTemplateVector(TemplateVector templateVector): Sets the template vector to be associated with this input data. |
TemplateVector | templateVector(): Gets the template vector associated with this input data. |
void | transformValues(DistanceMetric metric): Calculates the matrix of transformedVectors using DistanceMetric.transformVector(double[]) of the given metric. |
Methods inherited from class java.lang.Object: clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface InputData: getInputDatum, getInputVector, getValue, mqe0, subset
protected static final java.lang.String ERROR_MESSAGE_FILE_FORMAT_CORRUPT
protected java.lang.String source
protected SOMLibClassInformation classInfo
public java.lang.String[] dataNames
protected ContentType contentType
An input file should use the following header format for content types:
$DATA_TYPE text
or
$DATA_TYPE audio-rp
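As an illustration of the `$DATA_TYPE` header shown above, the following minimal sketch extracts the content type value from a single header line. `ContentTypeHeaderSketch` and `parseDataType` are hypothetical names introduced here; this is not the parser used by the library, and it assumes the key and the value are separated by whitespace.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of the documented API. */
final class ContentTypeHeaderSketch {

    private static final String KEY = "$DATA_TYPE";

    /** Returns the value following $DATA_TYPE, or empty if the line is not such a header. */
    static Optional<String> parseDataType(String line) {
        // Split into the header key and the remainder of the line (assumed whitespace-separated).
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length == 2 && parts[0].equals(KEY)) {
            return Optional.of(parts[1].trim());
        }
        return Optional.empty();
    }
}
```

A line such as `$DATA_TYPE audio-rp` yields the string `audio-rp`; any other line yields an empty result.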
protected int featureMatrixRows
protected int featureMatrixCols
protected int dim
protected boolean isNormalized
protected cern.colt.matrix.impl.DenseDoubleMatrix1D meanVector
protected double mqe0
protected int numVectors
protected java.util.Random rand
protected TemplateVector templateVector
A TemplateVector attached to this input data.
private double[][] transformedVectors
private double[][] distanceMatrix
A matrix containing the pairwise distances between two vectors. FIXME: use LeightWeightMemoryInputVectorDistanceMatrix instead.
protected java.util.LinkedHashMap<java.lang.String,java.lang.Integer> nameCache
private double[][] intervals
Holds the computed results of getDataIntervals().
private double[] extremes
Holds the computed results of getMinValue() and getMaxValue().
protected AbstractSOMLibSparseInputData(java.lang.String[] dataNames, int dim, boolean norm, java.util.Random rand, TemplateVector tv, SOMLibClassInformation clsInfo)
protected AbstractSOMLibSparseInputData(boolean norm, java.util.Random random)
protected AbstractSOMLibSparseInputData()
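public int dim()
Description copied from interface: InputData
Gets the dimension of the input data.
public ContentType getContentType()
Description copied from interface: InputData
Gets the content type.
Specified by: getContentType in interface InputData
public int getFeatureMatrixRows()
Description copied from interface: InputData
Gets the number of rows before vectorisation.
Specified by: getFeatureMatrixRows in interface InputData
public int getFeatureMatrixColumns()
Description copied from interface: InputData
Gets the number of columns before vectorisation.
Specified by: getFeatureMatrixColumns in interface InputData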
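public cern.colt.matrix.DoubleMatrix1D getMeanVector()
Description copied from interface: InputData
Gets the mean vector of the input vectors.
Specified by: getMeanVector in interface InputData
public cern.colt.matrix.DoubleMatrix1D getMeanVector(java.lang.String[] labels)
Description copied from interface: InputData
Returns the mean vector of the input vectors specified by the given labels.
Specified by: getMeanVector in interface InputData
Parameters: labels - label names of the input data.
public boolean isNormalizedToUnitLength()
Description copied from interface: InputData
Indicates whether this data set has been normalised to the unit length.
Specified by: isNormalizedToUnitLength in interface InputData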
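To make the two getMeanVector variants concrete, here is a minimal sketch on plain arrays. It assumes the mean is the component-wise arithmetic mean and that labels are resolved to vectors by name; `MeanVectorSketch` and its parameters are hypothetical and do not use the Colt DoubleMatrix1D type returned by the documented methods.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; plain arrays replace the Colt matrix types. */
final class MeanVectorSketch {

    /** Component-wise arithmetic mean of all given vectors. */
    static double[] meanVector(double[][] vectors) {
        if (vectors.length == 0) {
            throw new IllegalArgumentException("need at least one vector");
        }
        double[] mean = new double[vectors[0].length];
        for (double[] v : vectors) {
            for (int j = 0; j < mean.length; j++) {
                mean[j] += v[j];
            }
        }
        for (int j = 0; j < mean.length; j++) {
            mean[j] /= vectors.length;
        }
        return mean;
    }

    /** Mean of only those vectors whose name appears in labels. */
    static double[] meanVector(double[][] vectors, String[] dataNames, String[] labels) {
        // Build a name-to-index lookup so each label is resolved once.
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < dataNames.length; i++) {
            index.put(dataNames[i], i);
        }
        double[][] selected = new double[labels.length][];
        for (int k = 0; k < labels.length; k++) {
            Integer i = index.get(labels[k]);
            if (i == null) {
                throw new IllegalArgumentException("unknown label: " + labels[k]);
            }
            selected[k] = vectors[i];
        }
        return meanVector(selected);
    }
}
```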
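public int numVectors()
Description copied from interface: InputData
Gives the size of this input data set.
Specified by: numVectors in interface InputData
public TemplateVector templateVector()
Description copied from interface: InputData
Gets the template vector associated with this input data.
Specified by: templateVector in interface InputData
public SOMLibClassInformation classInformation()
Description copied from interface: InputData
Gets the class info associated with this input data.
Specified by: classInformation in interface InputData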
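public void setTemplateVector(TemplateVector templateVector)
Description copied from interface: InputData
Sets the template vector to be associated with this input data.
Specified by: setTemplateVector in interface InputData
Parameters: templateVector - the new template vector.
public InputDatum getInputDatum(java.lang.String label)
Description copied from interface: InputData
Gets an input datum with a specified label.
Specified by: getInputDatum in interface InputData
Parameters: label - the name of the input datum.
public int getInputDatumIndex(java.lang.String label)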
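The nameCache field is described as a mapping from the name to the index of an input vector, for faster access. A minimal sketch of how such a cache can back a lookup like getInputDatumIndex follows. `NameCacheSketch` is a hypothetical class, and returning -1 for an unknown label is an assumption made here, not documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; shows a label-to-index cache. */
final class NameCacheSketch {

    private final String[] dataNames;
    private final Map<String, Integer> nameCache = new LinkedHashMap<>();

    NameCacheSketch(String[] dataNames) {
        this.dataNames = dataNames.clone();
        for (int i = 0; i < this.dataNames.length; i++) {
            // Keep the first index if a label occurs more than once.
            nameCache.putIfAbsent(this.dataNames[i], i);
        }
    }

    /** Index of the vector with the given label, or -1 if unknown (assumed convention). */
    int getInputDatumIndex(String label) {
        Integer index = nameCache.get(label);
        return index == null ? -1 : index;
    }

    /** Label of the vector at the given index. */
    String getLabel(int index) {
        return dataNames[index];
    }
}
```

The map turns each lookup into a single hash access instead of a linear scan over dataNames.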
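public InputDatum getRandomInputDatum(int iteration, int numIterations)
Description copied from interface: InputData
Gets a random input sample from the input data set.
Specified by: getRandomInputDatum in interface InputData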
public InputDatum[] getInputDatum(java.lang.String[] labels)
Description copied from interface: InputData
Returns an array of input data with the specified labels.
Specified by: getInputDatum in interface InputData
Parameters: labels - the labels of the input data.
public void transformValues(DistanceMetric metric)
Calculates the matrix of transformedVectors using DistanceMetric.transformVector(double[]) of the given metric.
Parameters: metric - the metric to be used to transform the values.
public void initDistanceMatrix(DistanceMetric metric) throws MetricException
Calculates the distanceMatrix. Careful: this is a lengthy process and should be done only if needed. Requires the matrix of transformedVectors to be initialised (e.g. via transformValues(DistanceMetric)).
Parameters: metric - the metric to use for calculating the distances.
Throws: MetricException - if DistanceMetric.distance(double[], double[]) encounters a problem.
public InputDatum[] getNearestN(int inputIndex, DistanceMetric metric, int number) throws MetricException
Returns the n nearest input vectors to the vector at the given index of the data set.
Parameters: inputIndex - the index of the vector.
metric - the metric to use for the distance comparison. Only used when the distanceMatrix is not pre-calculated.
number - the number of nearest input vectors desired.
Throws: MetricException - if DistanceMetric.distance(DoubleMatrix1D, double[]) encounters a problem.
public java.util.ArrayList<InputDistance> getDistances(int inputIndex, DistanceMetric metric) throws MetricException
Returns the distances from the vector at the given index to the vectors of the data set.
Parameters: inputIndex - the index of the vector.
metric - the metric to use for the distance comparison. Only used when the distanceMatrix is not pre-calculated.
Throws: MetricException - if DistanceMetric.distance(DoubleMatrix1D, double[]) encounters a problem.
public SmallestElementSet<InputDistance> getNearestDistances(int inputIndex, int neighbours, DistanceMetric metric) throws MetricException
Throws: MetricException
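The relationship between initDistanceMatrix and the index-based getNearestN can be sketched on plain arrays as follows. This is an illustration under stated assumptions: Euclidean distance stands in for the DistanceMetric, the query vector itself is excluded from its own neighbours, and `DistanceMatrixSketch` with all of its methods is hypothetical rather than the class's actual implementation.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical stand-in for illustration; Euclidean distance replaces DistanceMetric. */
final class DistanceMatrixSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Pairwise distances; O(n^2) metric evaluations, hence worth computing only if needed. */
    static double[][] distanceMatrix(double[][] vectors) {
        int n = vectors.length;
        double[][] matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // The matrix is symmetric, so each pair is computed once.
                matrix[i][j] = matrix[j][i] = euclidean(vectors[i], vectors[j]);
            }
        }
        return matrix;
    }

    /** Indices of the nearest vectors to inputIndex, closest first (self excluded by assumption). */
    static int[] nearestN(double[][] distances, int inputIndex, int number) {
        final double[] row = distances[inputIndex];
        Integer[] candidates = new Integer[row.length - 1];
        int k = 0;
        for (int i = 0; i < row.length; i++) {
            if (i != inputIndex) {
                candidates[k++] = i;
            }
        }
        Arrays.sort(candidates, Comparator.comparingDouble((Integer i) -> row[i]));
        int count = Math.max(0, Math.min(number, candidates.length));
        int[] result = new int[count];
        for (int r = 0; r < count; r++) {
            result[r] = candidates[r];
        }
        return result;
    }
}
```

With a pre-calculated matrix, a nearest-neighbour query only sorts one row; without it, every query has to evaluate the metric against all vectors.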
private InputDatum[] getNNearest(java.util.ArrayList<InputDistance> distances)
private InputDatum[] getNNearest(int number, java.util.ArrayList<InputDistance> distances)
public InputDatum[] getNearestNUnsorted(int inputIndex, DistanceMetric metric, int number) throws MetricException
Throws: MetricException
public InputDatum[] getNearestN(double[] vector, DistanceMetric metric, int number) throws MetricException
Retrieves the given number of InputDatum that are closest to the given vector.
Throws: MetricException
public InputDatum[] getByNameDistanceSorted(double[] vector, java.util.Collection<java.lang.String> inputNames, DistanceMetric metric) throws MetricException
Retrieves the InputDatum corresponding to the given input names, sorted by their distance to the given vector.
Throws: MetricException
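public double[][] getData()
Description copied from interface: InputData
Returns the input data as a double array.
public double[][] getData(java.lang.String className) throws SOMToolboxException
Description copied from interface: InputData
Returns the vectors of all inputs associated with the given class name.
Specified by: getData in interface InputData
Throws: SOMToolboxException - if no class information file is loaded.
public void setClassInfo(SOMLibClassInformation classInfo)
Specified by: setClassInfo in interface InputData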
public double[][] getDistanceMatrix()
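public double[][] getDataIntervals()
Description copied from interface: InputData
Returns the min and max values for each feature, in a matrix of dim x 2.
Specified by: getDataIntervals in interface InputData
public double getMinValue()
Description copied from interface: InputData
Returns the minimum value in the input data.
Specified by: getMinValue in interface InputData
public double getMaxValue()
Description copied from interface: InputData
Returns the maximum value in the input data.
Specified by: getMaxValue in interface InputData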
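The dim x 2 layout returned by getDataIntervals, and its relation to getMinValue and getMaxValue, can be sketched on a plain array. The column order (minimum in column 0, maximum in column 1) is an assumption, and `DataIntervalsSketch` is a hypothetical class, not the documented implementation.

```java
/** Hypothetical stand-in for illustration; operates on a dense double[][] data set. */
final class DataIntervalsSketch {

    /** Per-feature {min, max} pairs: one row per feature, two columns (assumed order). */
    static double[][] dataIntervals(double[][] data) {
        int dim = data[0].length;
        double[][] intervals = new double[dim][2];
        for (int j = 0; j < dim; j++) {
            intervals[j][0] = Double.POSITIVE_INFINITY;
            intervals[j][1] = Double.NEGATIVE_INFINITY;
        }
        for (double[] vector : data) {
            for (int j = 0; j < dim; j++) {
                intervals[j][0] = Math.min(intervals[j][0], vector[j]);
                intervals[j][1] = Math.max(intervals[j][1], vector[j]);
            }
        }
        return intervals;
    }

    /** Global {min, max} over all features, derived from the per-feature intervals. */
    static double[] extremes(double[][] intervals) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[] interval : intervals) {
            min = Math.min(min, interval[0]);
            max = Math.max(max, interval[1]);
        }
        return new double[] {min, max};
    }
}
```

Caching both results, as the intervals and extremes fields suggest, avoids rescanning the whole data set on repeated calls.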
public java.util.Hashtable<java.lang.Integer,java.lang.Integer> getFeatureDensities()
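The mapping returned by getFeatureDensities is described in the method summary as going from the number of input objects in which a feature is non-zero to the number of features with that density. A minimal sketch of that counting on a dense array follows; `FeatureDensitiesSketch` is hypothetical, and the real class works on sparse data, so its implementation may differ.

```java
import java.util.Hashtable;

/** Hypothetical stand-in for illustration; counts non-zero occurrences per feature. */
final class FeatureDensitiesSketch {

    /** Maps "number of inputs where a feature is non-zero" to "number of such features". */
    static Hashtable<Integer, Integer> featureDensities(double[][] data) {
        int dim = data.length == 0 ? 0 : data[0].length;
        Hashtable<Integer, Integer> densities = new Hashtable<>();
        for (int j = 0; j < dim; j++) {
            int nonZero = 0;
            for (double[] vector : data) {
                if (vector[j] != 0) {
                    nonZero++;
                }
            }
            // One more feature with this density.
            densities.merge(nonZero, 1, Integer::sum);
        }
        return densities;
    }
}
```

For three inputs where two features are non-zero in two inputs each and one feature is zero everywhere, the result maps 2 to 2 and 0 to 1.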
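public java.lang.String[] getLabels()
Description copied from interface: InputData
Returns an array containing the labels of all the input data.
public java.lang.String getLabel(int index)
Description copied from interface: InputData
Returns the label of the input vector at the given index.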
public boolean equals(java.lang.Object obj)
Overrides: equals in class java.lang.Object
private boolean assertEqual(java.lang.Object name, java.lang.Object i1, java.lang.Object i2)
public static AbstractSOMLibSparseInputData create(InputDatum[] inputData, SOMLibClassInformation classInfo)
public static java.lang.String getFormatName()
public static java.lang.String getFileNameSuffix()
public java.lang.String getDataSource()
Description copied from interface: InputData
Returns the name/URI/etc. of the data source.
Specified by: getDataSource in interface InputData