Package 'Evacluster'

Title: Evaluation Clustering Methods for Disease Subtypes Diagnosis
Description: Contains a set of clustering methods and evaluation metrics to select the best number of clusters based on clustering stability. Two references describe the methodology: Fahimeh Nezhadmoghadam and Jose Tamez-Pena (2021) <doi:10.1016/j.compbiomed.2021.104753>, and Fahimeh Nezhadmoghadam, et al. (2021) <doi:10.2174/1567205018666210831145825>.
Authors: Fahimeh Nezhadmoghadam, Jose Gerardo Tamez-Pena
Maintainer: Fahimeh Nezhadmoghadam <[email protected]>
License: LGPL (>= 2)
Version: 0.1.0
Built: 2024-11-15 05:22:56 UTC
Source: https://github.com/cran/Evacluster

Help Index


Evaluation Clustering Methods for Disease Subtypes Diagnosis (Evacluster)

Description

Contains a set of clustering methods and evaluation metrics to select the best number of clusters based on clustering stability.

Details

Package: Evacluster
Type: Package
Version: 0.1.0
Date: 2022-03-25
License: LGPL (>= 2)

Purpose: Provides clustering models and evaluation metrics for finding the number of clusters by computing clustering stability. The best number of clusters is selected via consensus clustering and clustering stability.

Author(s)

Fahimeh Nezhadmoghadam, Jose Gerardo Tamez-Pena. Maintainer: Fahimeh Nezhadmoghadam <[email protected]>

References

Nezhadmoghadam, Fahimeh, and Jose Tamez-Pena. "Risk profiles for negative and positive COVID-19 hospitalized patients." (2021) Computers in Biology and Medicine 136: 104753.
Nezhadmoghadam, Fahimeh, et al. "Robust discovery of mild cognitive impairment subtypes and their risk of Alzheimer's disease conversion using unsupervised machine learning and Gaussian mixture modeling." (2021) Current Alzheimer Research 18(7): 595-606.

Examples

## Not run: 
    ### Evacluster Package Examples ####
    library(datasets)
    data(iris)

   # Split data to training set and testing set
   rndSamples <- sample(nrow(iris),100)
   trainData <- iris[rndSamples,]
   testData <- iris[-rndSamples,]

  
   ## Expectation Maximization Clustering
   # Perform Expectation Maximization clustering on the training set with 3 clusters
   clust <- EMCluster(trainData[,1:4],3)

   # Predict the cluster labels of new data based on the cluster labels of the training set
   pre <- predict(clust,testData[,1:4])
   
   
   ## Fuzzy C-means Clustering
   # Perform fuzzy c-means clustering on the training set with 3 clusters
   clust <- FuzzyCluster(trainData[,1:4],3)

   # Predict the labels of the new data
   pre <- predict(clust,testData[,1:4])
   
   
   ## hierarchical clustering
   # Perform hierarchical clustering on the training set with 3 clusters
   clust <- hierarchicalCluster(trainData[,1:4],distmethod="euclidean",clusters=3)

   # Predict the labels of the new data
   pre <- predict(clust,testData[,1:4])
   
   
   ## K-means Clustering
   # Perform k-means clustering on the training set with 3 clusters
   clust <- kmeansCluster(trainData[,1:4],3)

   # Predict the labels of the new data
   pre <- predict(clust,testData[,1:4])
   
   
   ## Partitioning Around Medoids (PAM) Clustering
   # Perform PAM clustering on the training set with 3 clusters
   clust <- pamCluster(trainData[,1:4],3)

   # Predict the labels of the new data
   pre <- predict(clust,testData[,1:4])
   
   
   ## Non-negative matrix factorization (NMF)
   # Perform NMF clustering on the training set with rank 3
   clust <- nmfCluster(trainData[,1:4],rank=3)

   # Predict the labels of the new data
   pre <- predict(clust,testData[,1:4])
   
   
   ## t-Distributed Stochastic Neighbor Embedding (t-SNE)
   
   library(mlbench)
   data(Sonar)
 
   rndSamples <- sample(nrow(Sonar),150)
   trainData <- Sonar[rndSamples,]
   testData <- Sonar[-rndSamples,]
 
   # Perform tSNE dimensionality reduction method on training data 
   tsne_trainData <- tsneReductor(trainData[,1:60],dim = 3,perplexity = 10,max_iter = 1000)
   
   # Perform an embedding of the new data using the existing embedding
   tsne_testData <- predict(tsne_trainData,k=3,testData[,1:60])
   
   
   ## clustering stability function
   # Compute the stability of clustering to select the best number of clusters.
   library(mlbench)
   data(Sonar)
 
   Sonar$Class <- as.numeric(Sonar$Class)
   Sonar$Class[Sonar$Class == 1] <- 0
   Sonar$Class[Sonar$Class == 2] <- 1
   
   # Compute the stability of clustering using kmeans clustering, UMAP as the
   # dimensionality reduction method, and a feature selection technique
   
  ClustStab <- clusterStability(data=Sonar, clustermethod=kmeansCluster, dimenreducmethod="UMAP",
                              n_components = 3,featureselection="yes", outcome="Class",
                              fs.pvalue = 0.05,randomTests = 100,trainFraction = 0.7,center=3)
   
   
   # Get the labels of the subjects that share the same connectivity
   clusterLabels <- getConsensusCluster(ClustStab,who="training",thr=seq(0.80,0.30,-0.1))


     # Compute the stability of clustering using PAM clustering, tSNE as the
     # dimensionality reduction method, and a feature selection technique
     
   ClustStab <- clusterStability(data=Sonar, clustermethod=pamCluster, dimenreducmethod="tSNE",
                              n_components = 3, perplexity=10,max_iter=100,k_neighbor=2,
                             featureselection="yes", outcome="Class",fs.pvalue = 0.05,
                               randomTests = 100,trainFraction = 0.7,k=3)
          
    # Get the labels of the subjects that share the same connectivity
   clusterLabels <- getConsensusCluster(ClustStab,who="training",thr=seq(0.80,0.30,-0.1))
                     
                     
    # Compute the stability of clustering using hierarchical clustering, PCA as the
    # dimensionality reduction method, and without applying feature selection
                                 
   ClustStab <- clusterStability(data=Sonar, clustermethod=hierarchicalCluster, 
                               dimenreducmethod="PCA", n_components = 3,featureselection="no",
                               randomTests = 100,trainFraction = 0.7,distmethod="euclidean", 
                               clusters=3)
                               
 # Get the labels of the subjects that share the same connectivity
   clusterLabels <- getConsensusCluster(ClustStab,who="training",thr=seq(0.80,0.30,-0.1))
   
   
   # Show the clustering stability results
   mycolors <- c("red","green","blue","yellow")
 
   ordermatrix <- ClustStab$dataConcensus
 
   heatmapsubsample <- sample(nrow(ordermatrix),70)
 
   orderindex <- 10*clusterLabels + ClustStab$trainJaccardpoint
 
   orderindex <- orderindex[heatmapsubsample]
   orderindex <- order(orderindex)
   ordermatrix <- ordermatrix[heatmapsubsample,heatmapsubsample]
   ordermatrix <- ordermatrix[orderindex,orderindex]
   rowcolors <- mycolors[1+clusterLabels[heatmapsubsample]]
   rowcolors <- rowcolors[orderindex]
 
 
   hplot <- gplots::heatmap.2(as.matrix(ordermatrix),Rowv=FALSE,Colv=FALSE,
                            RowSideColors = rowcolors,ColSideColors = rowcolors,dendrogram = "none",
                            trace="none",main="Cluster Co-Association \n (k=3)")
                            
   
   # Compare the PAC values of clustering stability with different numbers of clusters 
   
   ClustStab2 <- clusterStability(data=Sonar, clustermethod=kmeansCluster, dimenreducmethod="UMAP",
                              n_components = 3,featureselection="yes", outcome="Class",
                              fs.pvalue = 0.05,randomTests = 100,trainFraction = 0.7,center=2)
 
   ClustStab3 <- clusterStability(data=Sonar, clustermethod=kmeansCluster, dimenreducmethod="UMAP",
                                n_components = 3,featureselection="yes", outcome="Class",
                                fs.pvalue = 0.05,randomTests = 100,trainFraction = 0.7,center=3)
 
   ClustStab4 <- clusterStability(data=Sonar, clustermethod=kmeansCluster, dimenreducmethod="UMAP",
                                n_components = 3,featureselection="yes", outcome="Class",
                                fs.pvalue = 0.05,randomTests = 100,trainFraction = 0.7,center=4)
                                
                                
   color_range <- c("#FDFC74", "#76FF7A", "#B2EC5D")
 
 
   pac_values <- c(ClustStab2$PAC,ClustStab3$PAC,ClustStab4$PAC)

   barplot(pac_values, xlab = "Number of clusters", ylab = "PAC",
           names.arg = c("2","3","4"), ylim = c(0,0.3), col = color_range)
                            
   
## End(Not run)

clustering stability function

Description

This function computes the stability of the clustering, which helps to select the best number of clusters. Feature selection and dimensionality reduction methods can be used before clustering the data.

Usage

clusterStability(
  data = NULL,
  clustermethod = NULL,
  dimenreducmethod = NULL,
  n_components = 3,
  perplexity = 25,
  max_iter = 1000,
  k_neighbor = 3,
  featureselection = NULL,
  outcome = NULL,
  fs.pvalue = 0.05,
  randomTests = 20,
  trainFraction = 0.5,
  pac.thr = 0.1,
  ...
)

Arguments

data

A data set

clustermethod

The clustering method. This can be one of "Mclust", "pamCluster", "kmeansCluster", "hierarchicalCluster", and "FuzzyCluster".

dimenreducmethod

The dimensionality reduction method. This must be one of "UMAP", "tSNE", and "PCA".

n_components

The dimension of the space into which the data are embedded. It can be set to any integer value in the range 2 to 100.

perplexity

The perplexity parameter, which determines the effective number of neighbors in the tSNE method (used only with the tSNE reduction method).

max_iter

The maximum number of iterations for the tSNE reduction method.

k_neighbor

Used when embedding new data into an existing tSNE embedding: the embedding of a new point is the mean of its nearest neighbors in the training embedding, with the number of neighbors computed as #Neighbors = sqrt(#Samples/k_neighbor).

featureselection

Determines whether feature selection is applied before clustering the data. If used, it should be "yes"; otherwise "no".

outcome

The outcome feature used for feature selection.

fs.pvalue

The p-value threshold used in the feature selection process. The default value is 0.05.

randomTests

The number of iterations of the clustering process for computing the cluster stability.

trainFraction

The fraction of the data used for training. The default value is 0.5.

pac.thr

The threshold used for computing the proportion of ambiguous clustering (PAC) score, defined as the fraction of sample pairs with consensus indices falling in the interval (pac.thr, 1 - pac.thr). The default value is 0.1 (a minimal sketch of the computation follows this argument list).

...

Additional arguments passed to the chosen clustering method (for example, center for kmeansCluster, k for pamCluster, or distmethod and clusters for hierarchicalCluster).
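
As a point of reference, the following is a minimal sketch, not part of the package API, of how a PAC score can be computed from a consensus matrix under the definition above; clusterStability computes and returns PAC itself.

# Hypothetical helper, for illustration only: PAC is the fraction of sample
# pairs whose consensus index falls strictly inside (pac.thr, 1 - pac.thr).
pac_score <- function(consensus, pac.thr = 0.1) {
  ci <- consensus[lower.tri(consensus)]    # consensus indices of unique pairs
  mean(ci > pac.thr & ci < (1 - pac.thr))  # proportion of ambiguous pairs
}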

Value

A list with the following elements:

  • randIndex - A vector of Rand index values, which measure the similarity between two clusterings.

  • jaccIndex - A vector of Jaccard index values, which measure how frequently pairs of items are joined together in two clusterings.

  • randomSamples - A vector with the indices of the samples selected for training in each iteration.

  • clusterLabels - A vector with the cluster labels from all iterations.

  • jaccardpoint - The corresponding Jaccard index for each data point of the testing set.

  • averageNumberofClusters - The mean number of clusters.

  • testConsesus - A vector of consensus clustering results for the testing set.

  • trainRandIndex - A vector of Rand index values for the training set.

  • trainJaccIndex - A vector of Jaccard index values for the training set.

  • trainJaccardpoint - The corresponding Jaccard index for each data point of the training set.

  • PAC - The proportion of ambiguous clustering (PAC) score.

  • dataConcensus - A vector of consensus clustering results for the training set.

Examples

library("mlbench")
data(Sonar)

Sonar$Class <- as.numeric(Sonar$Class)
Sonar$Class[Sonar$Class == 1] <- 0 
Sonar$Class[Sonar$Class == 2] <- 1

ClustStab <- clusterStability(data=Sonar, clustermethod=kmeansCluster, dimenreducmethod="UMAP",
                              n_components = 3,featureselection="yes", outcome="Class",
                              fs.pvalue = 0.05,randomTests = 100,trainFraction = 0.7,center=3)


ClustStab <- clusterStability(data=Sonar, clustermethod=pamCluster, dimenreducmethod="tSNE",
                              n_components = 3, perplexity=10,max_iter=100,k_neighbor=2,
                              featureselection="yes", outcome="Class",fs.pvalue = 0.05,
                              randomTests = 100,trainFraction = 0.7,k=3)


ClustStab <- clusterStability(data=Sonar, clustermethod=hierarchicalCluster, 
                              dimenreducmethod="PCA", n_components = 3,featureselection="no",
                              randomTests = 100,trainFraction = 0.7,distmethod="euclidean",
                              clusters=3)

Expectation Maximization Clustering

Description

This function performs the EM algorithm for model-based clustering of finite mixtures of multivariate Gaussian distributions. The general purpose of clustering is to detect clusters in the data and to assign the data to those clusters.

Usage

EMCluster(data = NULL, ...)

Arguments

data

A data set

...

k: The number of clusters

Value

A list of cluster labels and the object returned by init.EM

Examples

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

cls <- EMCluster(trainData[,1:4],3)

Fuzzy C-means Clustering Algorithm

Description

This function works by assigning a membership to each data point for each cluster center, based on the distance between the cluster center and the data point. Each data object is a member of all clusters, with varying degrees of fuzzy membership between 0 and 1.

Usage

FuzzyCluster(data = NULL, ...)

Arguments

data

A data set

...

k: The number of clusters

Value

A list of cluster labels and an R object returned by the fcm function (package ppclust)

Examples

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

cls <- FuzzyCluster(trainData[,1:4],3)

Consensus Clustering Results

Description

This function gets the labels of the subjects that share the same connectivity.

Usage

getConsensusCluster(object, who = "training", thr = seq(0.8, 0.3, -0.1))

Arguments

object

An object returned by the clusterStability function

who

Selects which consensus clustering result is returned: who="training" for the training set; any other value selects the testing set.

thr

The sequence of consensus thresholds to use, given as seq(initial, final, increment); a negative increment produces a descending sequence.

Value

A list of labels for the samples that share the same connectivity.

Examples

library("mlbench")
data(Sonar)

Sonar$Class <- as.numeric(Sonar$Class)
Sonar$Class[Sonar$Class == 1] <- 0 
Sonar$Class[Sonar$Class == 2] <- 1

ClustStab <- clusterStability(data=Sonar, clustermethod=kmeansCluster, dimenreducmethod="UMAP",
                              n_components = 3,featureselection="yes", outcome="Class",
                              fs.pvalue = 0.05,randomTests = 100,trainFraction = 0.7,center=3)

clusterLabels <- getConsensusCluster(ClustStab,who="training",thr=seq(0.80,0.30,-0.1))

hierarchical clustering

Description

This function seeks to build a hierarchy of clusters.

Usage

hierarchicalCluster(data = NULL, distmethod = NULL, clusters = NULL, ...)

Arguments

data

A data set

distmethod

The distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski".

clusters

The number of Clusters

...

Additional parameters passed to the hclust function

Value

A list of cluster labels

Examples

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

cls <- hierarchicalCluster(trainData[,1:4],distmethod="euclidean",clusters=3)

K-means Clustering

Description

This function classifies unlabeled data by grouping observations by features rather than pre-defined categories. It splits the data into K different clusters and describes the location of each cluster center. A new data point can then be assigned to a cluster (class) based on the closest center of mass.

Usage

kmeansCluster(data = NULL, ...)

Arguments

data

A data set

...

center: The number of centers

Value

A list of cluster labels and an R object of class "kmeans"

Examples

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

cls <- kmeansCluster(trainData[,1:4],3)

Non-negative matrix factorization (NMF)

Description

This function factorizes the sample matrix into (usually) two matrices: W, containing the cluster centroids, and H, containing the cluster memberships.

Usage

nmfCluster(data = NULL, rank = NULL)

Arguments

data

A data set

rank

Specification of the factorization rank

Value

A list of cluster labels, an R object of class "nmf", and the centers of the clusters

Examples

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

cls <- nmfCluster(trainData[,1:4],rank=3)

Partitioning Around Medoids (PAM) Clustering

Description

This function partitions the data into k clusters "around medoids". In contrast to the k-means algorithm, this clustering method chooses actual data points as centers.

Usage

pamCluster(data = NULL, ...)

Arguments

data

A data set

...

k: The number of clusters

Value

A list of cluster labels and an R object of class "pam" (from the cluster package)

Examples

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

cls <- pamCluster(trainData[,1:4],3)

EMCluster prediction function

Description

This function predicts the cluster labels of new data based on the cluster labels of the training set.

Usage

## S3 method for class 'EMCluster'
predict(object, ...)

Arguments

object

An object returned by the EMCluster function

...

The new sample set

Value

A list of cluster labels
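
Examples

A minimal usage sketch, following the package-level examples:

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

# Fit the clustering model on the training set, then predict labels for new data
cls <- EMCluster(trainData[,1:4],3)
pre <- predict(cls,testData[,1:4])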


FuzzyCluster prediction function

Description

This function predicts the cluster labels of new data based on the cluster labels of the training set.

Usage

## S3 method for class 'FuzzyCluster'
predict(object, ...)

Arguments

object

An object returned by the FuzzyCluster function

...

The new sample set

Value

A list of cluster labels
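
Examples

A minimal usage sketch, following the package-level examples:

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

# Fit the clustering model on the training set, then predict labels for new data
cls <- FuzzyCluster(trainData[,1:4],3)
pre <- predict(cls,testData[,1:4])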


hierarchicalCluster prediction function

Description

This function predicts the cluster labels of new data based on the cluster labels of the training set.

Usage

## S3 method for class 'hierarchicalCluster'
predict(object, ...)

Arguments

object

An object returned by the hierarchicalCluster function

...

The new sample set

Value

A list of cluster labels
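
Examples

A minimal usage sketch, following the package-level examples:

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

# Fit the clustering model on the training set, then predict labels for new data
cls <- hierarchicalCluster(trainData[,1:4],distmethod="euclidean",clusters=3)
pre <- predict(cls,testData[,1:4])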


kmeansCluster prediction function

Description

This function predicts the cluster labels of new data based on the cluster labels of the training set.

Usage

## S3 method for class 'kmeansCluster'
predict(object, ...)

Arguments

object

An object returned by the kmeansCluster function

...

The new sample set

Value

A list of cluster labels
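
Examples

A minimal usage sketch, following the package-level examples:

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

# Fit the clustering model on the training set, then predict labels for new data
cls <- kmeansCluster(trainData[,1:4],3)
pre <- predict(cls,testData[,1:4])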


nmfCluster prediction function

Description

This function predicts the cluster labels of new data based on the cluster labels of the training set.

Usage

## S3 method for class 'nmfCluster'
predict(object, ...)

Arguments

object

An object returned by the nmfCluster function

...

The new sample set

Value

A list of cluster labels
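
Examples

A minimal usage sketch, following the package-level examples:

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

# Fit the clustering model on the training set, then predict labels for new data
cls <- nmfCluster(trainData[,1:4],rank=3)
pre <- predict(cls,testData[,1:4])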


pamCluster prediction function

Description

This function predicts the cluster labels of new data based on the cluster labels of the training set.

Usage

## S3 method for class 'pamCluster'
predict(object, ...)

Arguments

object

An object returned by the pamCluster function

...

The new sample set

Value

A list of cluster labels
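
Examples

A minimal usage sketch, following the package-level examples:

library(datasets)
data(iris)

rndSamples <- sample(nrow(iris),100)
trainData <- iris[rndSamples,]
testData <- iris[-rndSamples,]

# Fit the clustering model on the training set, then predict labels for new data
cls <- pamCluster(trainData[,1:4],3)
pre <- predict(cls,testData[,1:4])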


tsneReductor prediction function

Description

This function performs an embedding of new data using an existing embedding.

Usage

## S3 method for class 'tsneReductor'
predict(object, k = NULL, ...)

Arguments

object

An object returned by the tsneReductor function

k

Sets the number of nearest neighbors whose embedding coordinates are averaged for each new point (#Neighbors = sqrt(#Samples/k)).

...

The new sample set

Value

tsneY: An embedding of the new data

Examples

library("mlbench")
data(Sonar)

rndSamples <- sample(nrow(Sonar),150)
trainData <- Sonar[rndSamples,]
testData <- Sonar[-rndSamples,]

tsne_trainData <- tsneReductor(trainData[,1:60],dim = 3,perplexity = 10,max_iter = 1000)

tsne_testData <- predict(tsne_trainData,k=3,testData[,1:60])

t-Distributed Stochastic Neighbor Embedding (t-SNE)

Description

This method is an unsupervised, non-linear technique used for data exploration and for visualizing high-dimensional data. This function constructs a low-dimensional embedding of high-dimensional data, distances, or similarities.

Usage

tsneReductor(data = NULL, dim = 2, perplexity = 30, max_iter = 500)

Arguments

data

Data matrix (each row is an observation, each column is a variable)

dim

Integer; the output dimensionality (default = 2)

perplexity

Numeric; the perplexity parameter (must satisfy 3 * perplexity < nrow(X) - 1; default = 30)

max_iter

Integer; the number of iterations (default = 500)

Value

tsneY: A matrix containing the new representations of the observations, in the number of dimensions selected by the user

Examples

library("mlbench")
data(Sonar)

rndSamples <- sample(nrow(Sonar),150)
trainData <- Sonar[rndSamples,]
testData <- Sonar[-rndSamples,]

tsne_trainData <- tsneReductor(trainData[,1:60],dim = 3,perplexity = 10,max_iter = 1000)