Package Overview

Implements the Expectation Maximisation (EM) algorithm for clustering multivariate and univariate datasets. Two versions are implemented: EM and EM*, which converges faster by avoiding revisiting the data. For more details on EM*, see the ‘References’ section below.
The package has been tested with numerical datasets and is not recommended for categorical or ordinal data. It comes bundled with a demonstration dataset (ionosphere_data.csv). For more help, type ?DCEM in the R console after installing the package.
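
For readers unfamiliar with EM, the following is a minimal sketch of plain EM for a univariate Gaussian mixture, written in base R. It is an illustration only: it is not the package's implementation, it does not cover EM*, and the function name em_univariate and its arguments are hypothetical.

```r
# Plain EM for a one-dimensional Gaussian mixture (illustrative sketch only).
em_univariate <- function(x, k = 2, max_iter = 100, tol = 1e-4) {
  n <- length(x)
  # Initialise means at spread-out quantiles, with a common sd and equal priors.
  mu <- as.numeric(quantile(x, probs = (seq_len(k) - 0.5) / k))
  sigma <- rep(sd(x), k)
  prior <- rep(1 / k, k)

  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability of each Gaussian for each point (n x k).
    dens <- sapply(seq_len(k), function(j) prior[j] * dnorm(x, mu[j], sigma[j]))
    post <- dens / rowSums(dens)

    # M-step: re-estimate means, standard deviations and priors.
    nk <- colSums(post)
    mu_new <- colSums(post * x) / nk
    sigma <- sqrt(colSums(post * outer(x, mu_new, "-")^2) / nk)
    prior <- nk / n

    # Stop once the means move by less than the threshold.
    converged <- max(abs(mu_new - mu)) < tol
    mu <- mu_new
    if (converged) break
  }
  list(prob = post, mean = mu, sd = sigma, prior = prior)
}

# Two well-separated synthetic clusters.
set.seed(42)
x <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 6, sd = 1))
fit <- em_univariate(x)
print(fit$mean)
print(fit$prior)
```

The tol and max_iter arguments play the same role here as a convergence threshold and an iteration cap do in any EM routine.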

Data imputation is currently not supported; the user must handle missing data before using the package.
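
As a rough illustration of handling missing data beforehand, the base R sketch below shows two common options: dropping incomplete rows, or filling gaps with the column mean. The data frame raw and its columns are made up for the example; which strategy suits a given dataset is a judgment call.

```r
# Hypothetical numeric data with missing values.
raw <- data.frame(x = c(1.2, NA, 3.4, 5.6),
                  y = c(0.5, 0.7, NA, 0.9))

# Option 1: keep only the rows that have no missing values.
complete <- raw[complete.cases(raw), , drop = FALSE]

# Option 2: replace each missing value with the mean of its column.
imputed <- raw
for (col in names(imputed)) {
  missing <- is.na(imputed[[col]])
  imputed[[col]][missing] <- mean(imputed[[col]], na.rm = TRUE)
}

print(complete)
print(imputed)
```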

Contact

For any Bug Fixes/Feature Update(s)

[Parichit Sharma: parishar@iu.edu]

For Reporting Issues

Issues

GitHub Repository Link

GitHub Repository

Installation Instructions

Installing from CRAN

install.packages("DCEM")

Installing from the Binary Package

install.packages("dcem_1.0.0.tgz", repos = NULL, type = "source")

How to use the package (An Example: working with the default bundled dataset)

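# Load the bundled dataset; assumes the working directory contains data/ionosphere_data.csv.
# The file is comma-separated with "." as the decimal mark, so read.csv() applies.
ionosphere_data = read.csv(
  file = file.path(getwd(), "data", "ionosphere_data.csv"),
  sep = ",",
  header = FALSE,
  stringsAsFactors = FALSE
)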

Paste the code below into the R session to clean the dataset.

ionosphere_data =  trim_data("35,2", ionosphere_data)
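
To make the cleaning step concrete, here is a base R sketch of what dropping columns by index looks like. It assumes that trim_data() removes the columns listed in its first argument (a comma-separated string of column indices); check ?trim_data to confirm. The toy data frame and the variable names are hypothetical.

```r
# Hypothetical toy data: column 2 is constant and column 4 is a class label.
toy <- data.frame(a = 1:3,
                  b = c(0, 0, 0),
                  c = c(0.5, 0.7, 0.9),
                  label = c("g", "b", "g"))

# Parse a comma-separated string of column indices, as in "35,2" above.
cols_to_drop <- as.integer(strsplit("4,2", ",")[[1]])

# Negative indices drop the named columns; drop = FALSE keeps a data frame.
toy_clean <- toy[, -cols_to_drop, drop = FALSE]
print(names(toy_clean))
```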

Paste the code below into the R session to call the dcem_train() function.

dcem_out = dcem_train(data = ionosphere_data, threshold = 0.0001, iteration_count = 50, num_clusters = 2)

The output, dcem_out, contains the following elements:

[1] Posterior probabilities: dcem_out$prob
    A matrix of posterior probabilities for the points in the dataset.

[2] Mean(s): dcem_out$mean
    For multivariate data: a matrix of means for the Gaussians. Each row of the matrix corresponds to the mean of one Gaussian.
    For univariate data: a vector of means. Each element of the vector corresponds to one Gaussian.

[3] Covariance matrices: dcem_out$cov
    For multivariate data: a list of covariance matrices for the Gaussians.
    For univariate data: standard deviations, dcem_out$sd, a vector of standard deviation(s) for the Gaussians.

[4] Priors: dcem_out$prior
    A vector of priors for the Gaussians.
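
A common next step is to turn posterior probabilities into hard cluster labels by picking the most probable Gaussian for each point. The sketch below uses a made-up matrix and assumes one row per point and one column per Gaussian; if the matrix returned by the package is laid out the other way round, transpose it with t() first.

```r
# Hypothetical posterior matrix: 4 points (rows) x 2 Gaussians (columns).
prob <- matrix(c(0.90, 0.10,
                 0.20, 0.80,
                 0.60, 0.40,
                 0.05, 0.95),
               ncol = 2, byrow = TRUE)

# For each row, the index of the column holding the largest probability.
labels <- max.col(prob, ties.method = "first")
print(labels)
```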

How to access the help (after installing the package)

?dcem_star_train
?dcem_train
?dcem_test
?DCEM