public class DatasetInformation
extends java.lang.Object
InputData and SOMLibClassInformation, respectively!
Modifier and Type | Field and Description |
---|---|
private SOMLibClassInformation |
classInfo |
private java.lang.String |
classInformationFilename |
private java.lang.String[] |
classNames |
(package private) boolean |
denseData |
private boolean[] |
discrete
only an estimate: a dimension is called discrete if all of its values are integer values
|
static int |
DISCRETE |
private EditableReportProperties |
EP |
private InputData |
inputData |
private java.lang.String |
inputDataFilename |
private TemplateVector |
inputTemplate |
private double[] |
max
holds for each dimension the maximal value
|
static int |
MAX_VALUE |
private double[] |
mean
holds for each dimension the mean value
|
static int |
MEAN_VALUE |
private double[] |
min
holds for each dimension the minimal value
|
static int |
MIN_VALUE |
private boolean[] |
only01
flags for each dimension whether it contains only the values 0 and 1
|
static int |
ONLY01 |
private java.util.Vector<java.lang.Integer> |
selectedIndices |
private java.lang.String |
tvFilename |
private double[] |
var
holds for each dimension the variance
|
static int |
VAR_VALUE |
static int |
ZERO_VALUE |
private int[] |
zeroValues
holds for each dimension the number of 0 values.
|
Constructor and Description |
---|
DatasetInformation(java.util.Vector<java.lang.Integer> selectedIndices,
java.lang.String inputDataFilename,
java.lang.String tvFilename,
java.lang.String classInformationFile,
EditableReportProperties EP)
creates a new object storing information about a given dataset
|
DatasetInformation(java.util.Vector<java.lang.Integer> selectedIndices,
java.lang.String inputDataFilename,
java.lang.String tvFilename,
java.lang.String classInformationFile,
EditableReportProperties EP,
CommonSOMViewerStateData state) |
Modifier and Type | Method and Description |
---|---|
private static java.lang.String |
applyNameFix(java.lang.String target)
small helper method for getTrainingDataInfo
|
double |
calculateAccumulatedVariance()
small helper method used to display the dimensions in the top part of the output document.
It accumulates the variances and calculates their percentage of the total variance.
|
private void |
checkDatatypes()
runs over all dimensions of the input vectors and tries to gather information about their data ranges and
other properties. The information gathered is:
- min and max value within each dimension (this.min, this.max)
- whether a dimension contains only 0/1 values (this.only01)
- whether a dimension contains only plain integer values (this.discrete)
- how many 0 (=missing?) values are in each dimension (this.zeroValues)
The results are stored in the appropriate arrays.
|
boolean |
classInfoAvailable()
returns whether class information is attached to the input vectors. Does not check whether it is a valid file,
only whether a String with length > 0 has been specified as path
|
java.lang.String |
getAttributeLabel(int dim)
returns the label (that is the name defined for an attribute in the template vector file) for the specified
attribute.
|
boolean |
getBoolDataProps(int type,
int attribute)
FIXME: split this into simple single getter methods...
|
int[] |
getClassColorRGB(int c)
returns an array of length three containing the r,g,b values of the colour used to colour the specified class
|
int |
getClassIndexOfInput(java.lang.String inputLabel)
returns the index of the class the input vector specified by its label belongs to
|
SOMLibClassInformation |
getClassInfo() |
java.lang.String |
getClassInformationFilename()
returns the path of the file containing the class information
|
double[] |
getClassMeanVector(int classId)
returns the mean vector of all input items belonging to the given class
|
java.util.Vector<java.lang.String> |
getClusterName(ClusterNode node,
int clusterByValue,
int nodeDepth)
Tries to name a cluster by the input data mapped to units lying within the cluster. For naming the cluster, some
very simple heuristics are used: first, if there are any labels of the clusters which correspond to 0/1
attributes, and their values are all 0 (or 1) in the cluster, the name of this attribute is included in the name
of the cluster.
|
EditableReportProperties |
getEP()
Returns the Editable Report Properties for the Semantic Report
|
InputData |
getInputData()
returns the InputData object storing information about the input data used for training the SOM.
|
java.lang.String |
getInputDataFilename()
returns the complete filename of the file containing the input data; complete filename means including the path.
|
InputDatum |
getInputDatum(int d)
returns the InputDatum at the specified index
|
InputDatum |
getInputDatum(java.lang.String name)
returns the InputDatum labelled with the specified name
|
java.lang.String[] |
getInputLabelsofClass(int classId)
returns a list of labels of all input items belonging to the given class
|
java.lang.String |
getNameOfClass(int c)
returns the name of the class specified by the index
|
int |
getNumberOfClasses()
returns the number of classes.
|
int |
getNumberOfClassmembers(int c)
returns the number of input elements belonging to the given class; if no class information is attached to this
input, -1 is returned
|
int |
getNumberOfInputVectors()
returns the number of input vectors used for training the SOM, that is the number of different vectors present in
the input file for the SOM training.
|
int |
getNumberOfSelectedInputs()
returns the number of inputs the user has selected to get information about their position on the SOM
|
int |
getNumberOfZeroValues(int index)
returns the number of input vectors that have 0 as value in the given dimension
|
double |
getNumericalDataProps(int type,
int attribute)
FIXME: split this into simple single getter methods...
|
double[][] |
getPCAdeterminedDims()
This method calculates the most important dimensions of the dataset according to the results of a PCA, and arranges
the resulting dimension indices along the first index of a new array.
|
int |
getSelectedInputId(int index)
returns the id of the input vector at position index in the list of selected inputs. Each input vector is
identified by an id, which is its index in the complete input.
|
java.lang.String |
getTemplateFilename()
returns the complete filename of the file containing the template data; complete filename means including the
path.
|
java.lang.String[] |
getTrainingDataInfo()
Returns the names of the 3 files used for training
|
int |
getVectorDim()
returns the dimension of the input vectors, that is the same as the number of attributes used to describe the
objects.
|
boolean |
is01(int index)
returns whether the values in the given dimension are all only 0 or 1
|
boolean |
isDiscrete(int index)
returns whether our heuristic estimates this dimension to contain discrete values. This is the case if all values
in this dimension are exact integer values.
|
boolean |
isNormalized()
returns whether the input set has been normalized (in fact, this function returns the result of
InputData.isNormalizedToUnitLength())
|
public static final int MIN_VALUE
public static final int MAX_VALUE
public static final int MEAN_VALUE
public static final int VAR_VALUE
public static final int ZERO_VALUE
public static final int ONLY01
public static final int DISCRETE
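The constants above are passed as the `type` argument of `getNumericalDataProps` and `getBoolDataProps`. The following is only a minimal sketch of how such a type-based lookup could be organised. The class `DataPropsSketch`, the numeric values of the constants, the assignment of each constant to one of the two methods, and the exception thrown for an unknown type are all assumptions made for illustration; none of them is confirmed by this page.

```java
/** Hypothetical stand-in for illustration; not the DatasetInformation source. */
final class DataPropsSketch {
    // Assumed values: this page lists the constant names but not their values.
    static final int MIN_VALUE = 0;
    static final int MAX_VALUE = 1;
    static final int MEAN_VALUE = 2;
    static final int VAR_VALUE = 3;
    static final int ZERO_VALUE = 4;
    static final int ONLY01 = 5;
    static final int DISCRETE = 6;

    private final double[] min, max, mean, var;
    private final int[] zeroValues;
    private final boolean[] only01, discrete;

    DataPropsSketch(double[] min, double[] max, double[] mean, double[] var,
                    int[] zeroValues, boolean[] only01, boolean[] discrete) {
        this.min = min;
        this.max = max;
        this.mean = mean;
        this.var = var;
        this.zeroValues = zeroValues;
        this.only01 = only01;
        this.discrete = discrete;
    }

    /** Returns the numerical property selected by type for the given attribute index. */
    double getNumericalDataProps(int type, int attribute) {
        switch (type) {
            case MIN_VALUE:  return min[attribute];
            case MAX_VALUE:  return max[attribute];
            case MEAN_VALUE: return mean[attribute];
            case VAR_VALUE:  return var[attribute];
            case ZERO_VALUE: return zeroValues[attribute];
            default: // assumed behaviour for an unsupported type
                throw new IllegalArgumentException("not a numerical property: " + type);
        }
    }

    /** Returns the boolean property selected by type for the given attribute index. */
    boolean getBoolDataProps(int type, int attribute) {
        switch (type) {
            case ONLY01:   return only01[attribute];
            case DISCRETE: return discrete[attribute];
            default: // assumed behaviour for an unsupported type
                throw new IllegalArgumentException("not a boolean property: " + type);
        }
    }
}
```

Single-purpose getters such as `is01(int)` and `isDiscrete(int)` avoid the type switch entirely, which is presumably what the FIXME note on these two methods is asking for.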
private java.util.Vector<java.lang.Integer> selectedIndices
private InputData inputData
private java.lang.String inputDataFilename
private java.lang.String tvFilename
private TemplateVector inputTemplate
private SOMLibClassInformation classInfo
private java.lang.String[] classNames
private java.lang.String classInformationFilename
private EditableReportProperties EP
private boolean[] only01
private boolean[] discrete
private double[] min
private double[] max
private double[] mean
private double[] var
private int[] zeroValues
boolean denseData
public DatasetInformation(java.util.Vector<java.lang.Integer> selectedIndices, java.lang.String inputDataFilename, java.lang.String tvFilename, java.lang.String classInformationFile, EditableReportProperties EP)
selectedIndices - Vector of indices of the input items selected for more information
inputDataFilename - the path to the file containing the input data
tvFilename - the path to the file containing the template vector
classInformationFile - the path to the file containing the class information
EP - the customized Report Features of the Semantic Report
public DatasetInformation(java.util.Vector<java.lang.Integer> selectedIndices, java.lang.String inputDataFilename, java.lang.String tvFilename, java.lang.String classInformationFile, EditableReportProperties EP, CommonSOMViewerStateData state)
public boolean classInfoAvailable()
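As described in the method summary, `classInfoAvailable()` only tests whether a non-empty path was given, not whether the file exists or is valid. A minimal sketch of that check follows; the class name `ClassInfoAvailableSketch`, the static signature, and the treatment of `null` as "not available" are assumptions for illustration.

```java
/** Hypothetical stand-in for illustration; not the DatasetInformation source. */
final class ClassInfoAvailableSketch {
    /**
     * True if a path with length > 0 was specified. The file itself is never
     * opened or validated. Treating null as "not available" is an assumption.
     */
    static boolean classInfoAvailable(String classInformationFilename) {
        return classInformationFilename != null && classInformationFilename.length() > 0;
    }
}
```

A caller that needs a usable file therefore still has to handle read errors itself, even when this check returns true.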
public SOMLibClassInformation getClassInfo()
public int getNumberOfInputVectors()
public double[] getClassMeanVector(int classId)
classId - the id of the class for which the mean vector shall be calculated
public int getVectorDim()
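The mean vector returned by `getClassMeanVector(int classId)` is, per its description, the component-wise mean of all input items belonging to the class. The sketch below illustrates that computation on plain arrays. The class `ClassMeanSketch`, the `classOf` array mapping each input to a class index, and the `null` result for an empty class are assumptions; the real method works on `InputData` and `SOMLibClassInformation`.

```java
/** Hypothetical stand-in for illustration; not the DatasetInformation source. */
final class ClassMeanSketch {
    /**
     * Component-wise mean of all vectors whose class index equals classId.
     * Returns null if no vector belongs to the class (assumed behaviour).
     */
    static double[] classMean(double[][] vectors, int[] classOf, int classId) {
        int dim = vectors.length == 0 ? 0 : vectors[0].length;
        double[] sum = new double[dim];
        int count = 0;
        for (int i = 0; i < vectors.length; i++) {
            if (classOf[i] != classId) {
                continue; // vector belongs to another class
            }
            for (int d = 0; d < dim; d++) {
                sum[d] += vectors[i][d];
            }
            count++;
        }
        if (count == 0) {
            return null;
        }
        for (int d = 0; d < dim; d++) {
            sum[d] /= count; // turn the sums into means
        }
        return sum;
    }
}
```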
public boolean is01(int index)
index - the dimension (starting with 0) for which this property is requested
public boolean isDiscrete(int index)
index - the dimension (starting with 0) for which the estimation is requested
public int getNumberOfZeroValues(int index)
index - the dimension (starting with 0) for which the number is requested
public boolean isNormalized()
public double getNumericalDataProps(int type, int attribute)
type - specifies the type of information to be returned: allowed are some constants defined by this class (see above)
attribute - the index of the attribute for which the value shall be returned (starting with 0)
public boolean getBoolDataProps(int type, int attribute)
type - specifies the type of information to be returned: allowed are some constants defined by this class (see above)
attribute - the index of the attribute for which the value shall be returned (starting with 0)
public java.lang.String getAttributeLabel(int dim)
dim - the index within the vector of the attribute whose label shall be returned
public int getNumberOfClasses()
public java.lang.String getNameOfClass(int c)
c - the index of the class (starting with 0)
public java.lang.String[] getInputLabelsofClass(int classId)
classId - the id of the class for which the input items are requested
public int[] getClassColorRGB(int c)
c - the index of the class for which the colour is requested
public int getNumberOfClassmembers(int c)
c - the index of the class (starting with 0)
public int getClassIndexOfInput(java.lang.String inputLabel)
public java.lang.String getClassInformationFilename()
private void checkDatatypes()
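The per-dimension checks that `checkDatatypes()` is described as performing (min and max, 0/1-only, integer-only, count of zero values) can be sketched in a single pass over the data. This is an illustration under assumptions, not the actual implementation: the class `DimensionStatsSketch` and the plain `double[][]` input are hypothetical, and the integer test via `Math.rint` is one possible reading of "exact integer values".

```java
import java.util.Arrays;

/** Hypothetical stand-in for illustration; not the DatasetInformation source. */
final class DimensionStatsSketch {
    final double[] min;
    final double[] max;
    final boolean[] only01;
    final boolean[] discrete;
    final int[] zeroValues;

    DimensionStatsSketch(double[][] vectors, int dim) {
        min = new double[dim];
        max = new double[dim];
        only01 = new boolean[dim];
        discrete = new boolean[dim];
        zeroValues = new int[dim];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        Arrays.fill(only01, true);   // assume 0/1-only until a counterexample is seen
        Arrays.fill(discrete, true); // assume integer-only until a counterexample is seen

        for (double[] v : vectors) {
            for (int d = 0; d < dim; d++) {
                double x = v[d];
                min[d] = Math.min(min[d], x);
                max[d] = Math.max(max[d], x);
                if (x == 0.0) {
                    zeroValues[d]++;
                }
                if (x != 0.0 && x != 1.0) {
                    only01[d] = false;
                }
                if (x != Math.rint(x)) { // value has a fractional part
                    discrete[d] = false;
                }
            }
        }
    }
}
```

Note that a 0/1-only dimension is always flagged as discrete as well under this reading, since 0 and 1 are integers.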
public InputData getInputData()
public InputDatum getInputDatum(java.lang.String name)
public InputDatum getInputDatum(int d)
public int getNumberOfSelectedInputs()
public int getSelectedInputId(int index)
index - the index of the vector in the list of selected inputs
public java.lang.String getInputDataFilename()
public java.lang.String getTemplateFilename()
public java.util.Vector<java.lang.String> getClusterName(ClusterNode node, int clusterByValue, int nodeDepth)
node - the node representing the cluster that shall be named
clusterByValue - indicates whether the labels for the cluster shall be created by value (is handed unchanged to ClusterNode.getLabels(clusterByValue, boolean))
nodeDepth - the depth of the node in the tree, whereby the root (i.e. the cluster containing the whole map) node has depth 1
public double[][] getPCAdeterminedDims()
public double calculateAccumulatedVariance()
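The summary for `calculateAccumulatedVariance()` says it accumulates variances and expresses them as a percentage of the total variance. Which dimensions are accumulated is not stated on this page, so the sketch below assumes the first `k` entries of a per-dimension variance array. The class `AccumulatedVarianceSketch`, the parameter `k`, and the return value of 0 for a zero total are all hypothetical.

```java
/** Hypothetical stand-in for illustration; not the DatasetInformation source. */
final class AccumulatedVarianceSketch {
    /**
     * Percentage of the total variance covered by the first k dimensions.
     * Returns 0 if the total variance is 0 (assumed behaviour).
     */
    static double accumulatedVariancePercent(double[] var, int k) {
        double total = 0.0;
        for (double v : var) {
            total += v;
        }
        if (total == 0.0) {
            return 0.0;
        }
        double accumulated = 0.0;
        int limit = Math.min(k, var.length);
        for (int d = 0; d < limit; d++) {
            accumulated += var[d];
        }
        return 100.0 * accumulated / total;
    }
}
```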
public java.lang.String[] getTrainingDataInfo()
private static java.lang.String applyNameFix(java.lang.String target)
public EditableReportProperties getEP()