Package Overview

Implements the Expectation Maximisation (EM) algorithm for clustering multivariate and univariate datasets. Two versions are implemented: EM* (which converges faster by avoiding revisiting the data) and traditional EM. For more details on EM*, see the ‘References’ section below.
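To make the roles of the tuning parameters concrete, here is a minimal base-R sketch of traditional EM for a univariate Gaussian mixture. It is a generic textbook illustration, not the package's implementation, and it does not cover EM*. The function name em_univariate, the quantile-based initialisation, and the convergence test (largest change in the means below threshold) are assumptions made for this example; the argument and output names only mirror those used by dcem_train().

```r
# Generic EM for a univariate Gaussian mixture (illustration only, not DCEM's code).
em_univariate <- function(x, num_clusters = 2, threshold = 1e-4, iteration_count = 50) {
  n <- length(x)

  # Assumed initialisation: means from spread-out quantiles, shared sd, uniform priors.
  meu <- as.numeric(quantile(x, probs = seq(0.25, 0.75, length.out = num_clusters)))
  sigma <- rep(sd(x), num_clusters)
  prior <- rep(1 / num_clusters, num_clusters)

  for (iter in seq_len(iteration_count)) {
    # E-step: posterior probability of each cluster for each point (n x k matrix).
    dens <- sapply(seq_len(num_clusters),
                   function(k) prior[k] * dnorm(x, mean = meu[k], sd = sigma[k]))
    prob <- dens / rowSums(dens)

    # M-step: re-estimate priors, means and standard deviations.
    nk <- colSums(prob)
    new_meu <- colSums(prob * x) / nk
    sigma <- sqrt(colSums(prob * outer(x, new_meu, "-")^2) / nk)
    prior <- nk / n

    # Assumed stopping rule: the means moved by less than the threshold.
    converged <- max(abs(new_meu - meu)) < threshold
    meu <- new_meu
    if (converged) break
  }

  list(prob = prob, meu = meu, sigma = sigma, prior = prior,
       membership = max.col(prob, ties.method = "first"))
}

# Two well-separated simulated clusters.
set.seed(1)
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 10, sd = 1))
fit <- em_univariate(x)
print(fit$meu)
```

Every iteration of this sketch visits all data points in the E-step, which is the cost that EM* is described as reducing.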

The package has been tested with both real and simulated datasets and comes bundled with a dataset for demonstration (ionosphere_data.csv). For more help, type ?DCEM in the R console after installing the package.

Data imputation is currently not supported; the user must handle missing data before using the package.
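Since missing values have to be dealt with beforehand, the following base-R sketch shows two common options on a toy data frame: dropping incomplete rows, or filling each gap with the column mean. The data and variable names are made up for illustration, and neither option is prescribed by the package; which one is appropriate depends on the dataset.

```r
# Toy data frame with missing values (illustrative only).
raw <- data.frame(a = c(1.0, NA, 3.0, 4.0),
                  b = c(2.0, 5.0, NA, 8.0))

# Option 1: drop every row that contains at least one NA.
complete_rows <- raw[complete.cases(raw), , drop = FALSE]

# Option 2: replace each NA with the mean of its column (simple imputation).
imputed <- as.data.frame(lapply(raw, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}))

print(complete_rows)
print(imputed)
```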

Contact

For bug fixes or feature updates

[Parichit Sharma: parishar@iu.edu]

For Reporting Issues

Issues

GitHub Repository Link

GitHub Repository

Installation Instructions

Installing from CRAN

install.packages("DCEM")

Installing from the Source Package

R CMD INSTALL DCEM_2.0.2.tar.gz

How to use the package (An Example: working with the default bundled dataset)

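Paste the code below into the R session to load the bundled dataset (this assumes the working directory contains the data folder).

ionosphere_data = read.csv2(
  file = file.path(getwd(), "data", "ionosphere_data.csv"),
  sep = ",",
  header = FALSE,
  stringsAsFactors = FALSE
)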
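After loading a headerless CSV it can help to confirm the shape and column types before clustering. The sketch below uses a small inline CSV string as a stand-in for the real file, so the values and the csv_text name are hypothetical. With header = FALSE, R names the columns V1, V2, and so on.

```r
# Inline stand-in for a headerless CSV file (values are made up).
csv_text <- "1,0,0.99539,g\n1,0,1.00000,b"

toy <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Inspect dimensions and the class of every column.
print(dim(toy))
print(sapply(toy, class))

# Flag columns that are not numeric (for example, a label column).
non_numeric <- names(toy)[!sapply(toy, is.numeric)]
print(non_numeric)
```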

Paste the code below into the R session to clean the dataset.

ionosphere_data = trim_data("35,2", ionosphere_data)
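As a rough picture of what this cleaning step does, the base-R sketch below drops columns named in a comma-separated string of indices. It assumes the first argument of trim_data() lists the column numbers to remove; drop_columns is a hypothetical helper written for this example and is not the package's implementation.

```r
# Hypothetical helper, not DCEM's trim_data(): drop columns given as e.g. "4,2".
drop_columns <- function(columns, data) {
  # Split the string on commas and convert the pieces to integer indices.
  idx <- as.integer(trimws(strsplit(columns, ",", fixed = TRUE)[[1]]))
  # Negative indexing removes the listed columns; drop = FALSE keeps a data frame.
  data[, -idx, drop = FALSE]
}

# Toy data: a constant column (v2) and a label column that should not be clustered.
toy <- data.frame(v1 = 1:3,
                  v2 = 0,
                  v3 = c(0.5, 0.1, 0.9),
                  label = c("g", "b", "g"))

trimmed <- drop_columns("4,2", toy)
print(names(trimmed))
```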

Paste the code below into the R session to call the dcem_train() function.

dcem_out = dcem_train(data = ionosphere_data, threshold = 0.0001, iteration_count = 50, num_clusters = 2)

The returned list (dcem_out) contains the following elements:

          [1] Posterior probabilities: dcem_out$prob: A matrix of posterior probabilities for the
              points in the dataset.

          [2] Means (meu): dcem_out$meu

              For multivariate data: A matrix of means. Each row in the
              matrix corresponds to one mean.

              For univariate data: A vector of means. Each element of the vector corresponds
              to one mean.

          [3] Covariance matrices: dcem_out$sigma

              For multivariate data: A list of covariance matrices.

              For univariate data: A vector of standard deviations.

          [4] Priors: dcem_out$prior: A vector of priors.

          [5] Membership: dcem_out$membership: A vector of cluster memberships for the data points.
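The sketch below shows how the elements of such a result list might be inspected. To stay runnable without the package, it uses a hand-built stand-in list with made-up values rather than real dcem_train() output. It also assumes the posterior matrix has one row per data point and one column per cluster; check the package help for the actual orientation.

```r
# Hand-built stand-in for the list returned by dcem_train() (values are made up).
out <- list(
  prob = matrix(c(0.9, 0.1,
                  0.2, 0.8,
                  0.7, 0.3), ncol = 2, byrow = TRUE),   # assumed: one row per point
  meu = matrix(c(0, 0,
                 5, 5), nrow = 2, byrow = TRUE),        # one row per cluster mean
  sigma = list(diag(2), diag(2)),                       # one covariance matrix per cluster
  prior = c(0.6, 0.4),
  membership = c(1, 2, 1)
)

# Priors should sum to one, and so should each point's posterior probabilities.
stopifnot(isTRUE(all.equal(sum(out$prior), 1)))
stopifnot(isTRUE(all.equal(rowSums(out$prob), rep(1, nrow(out$prob)))))

# Number of points assigned to each cluster.
cluster_sizes <- table(out$membership)
print(cluster_sizes)

# Most probable cluster per point, derived from the posterior matrix.
hard_labels <- max.col(out$prob, ties.method = "first")
print(hard_labels)
```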

How to access the help (after installing the package)

?DCEM
?dcem_test
?dcem_star_train
?dcem_train