R@DePauw.edu

       The community of R users at DePauw


ChemoSpec

ChemoSpec is an R package for the chemometric analysis of spectra.

It consists of functions for plotting spectra (NMR, IR, etc.) and carrying out various forms of exploratory data analysis, such as HCA and PCA. The design allows comparison of data from samples that fall into groups, such as treatment vs. control. Robust methods appropriate for this type of high-dimensional data are available. ChemoSpec is designed to be user friendly for people with limited background in R. Considerable effort was made to ensure consistency across the various functions and plots.

You can access the tarball and source files for ChemoSpec at GitHub.  The tarball is also available at R-Forge (note: I am having Subversion issues at R-Forge, so use GitHub for now). ChemoSpec is composed only of R source files; nothing is compiled. Hence, it should be platform independent.

Some of the plots that ChemoSpec can create are shown here.  Click on the small version to see a larger version. These were created using a built-in data set of IR spectra of plant cuticles.  In addition to creating plots such as these, ChemoSpec lets you edit the data sets to remove particular samples or particular frequency ranges.  Binning of the frequency data is also provided.  For more information, download and install the package and check out the documentation.  Please e-mail any questions, comments, or suggestions.
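To make the editing steps concrete, here is a minimal sketch in base R of removing a frequency range and then binning. It does not use ChemoSpec's own functions; the matrix layout (one spectrum per row), the variable names, the removed range, and the choice to average frequencies while summing intensities within a bin are all assumptions made for illustration.

```r
# Hypothetical data: 4 spectra (rows) measured at 12 frequencies (columns).
set.seed(1)
freq <- seq(1000, 1110, by = 10)
spectra <- matrix(runif(4 * 12), nrow = 4)

# Remove an uninformative frequency range (1030-1050, chosen arbitrarily).
keep <- !(freq >= 1030 & freq <= 1050)
freq_trim <- freq[keep]
spectra_trim <- spectra[, keep, drop = FALSE]

# Bin the remaining points in groups of 3 adjacent points.
bin_ratio <- 3
bin_id <- (seq_along(freq_trim) - 1) %/% bin_ratio

# Assumption: frequencies are averaged and intensities summed within a bin.
freq_binned <- as.numeric(tapply(freq_trim, bin_id, mean))
spectra_binned <- t(rowsum(t(spectra_trim), bin_id))
```

Summing preserves the total intensity of each spectrum, which is why it is used here; averaging would work equally well if only relative shapes matter.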

Plotting Spectra

Spectra may be plotted offset or overlaid. The offset can be specified, as can the vertical magnification.  The location of the sample name label can be controlled, including not plotting it at all.  The color coding is assigned automatically during data import, according to a user-specified scheme.

Principal Components Analysis (PCA) Score Plots

Either classical or robust PCA can be carried out using various scaling options.  Which PC to plot on each axis can be specified.  Ellipses correspond to the groupings specified during data import and use the same color scheme as the samples. The ellipses may be drawn based upon classical or robust methods (not to be confused with how the PCA is conducted), or they may be omitted.  Each point can be labeled with its sample number.  The labeling can be controlled to label all points or just the most extreme points.  A key and information about the data processing are automatically generated.
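The idea behind a score plot can be sketched with base R's prcomp() on simulated data. This is only the classical case and is not ChemoSpec's interface: the data, the group labels, and the colors are invented for illustration, and the ellipses and robust options described above are not shown.

```r
# Simulated spectra: two groups of 10 samples, 50 frequency points each.
set.seed(42)
n_per_group <- 10
n_freq <- 50
group <- factor(rep(c("control", "treatment"), each = n_per_group))
spectra <- matrix(rnorm(2 * n_per_group * n_freq, sd = 0.1), ncol = n_freq)

# The treatment group gets an extra broad peak centered at point 25.
peak <- 5 * dnorm(seq_len(n_freq), mean = 25, sd = 4)
is_trt <- group == "treatment"
spectra[is_trt, ] <- sweep(spectra[is_trt, ], 2, peak, "+")

# Classical PCA on mean-centered, unscaled data.
pca <- prcomp(spectra, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# Score plot: PC1 vs PC2, colored by group.
group_cols <- c("blue", "red")
plot(scores, col = group_cols[group], pch = 19, xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(group), col = group_cols, pch = 19)
```

Changing the columns selected from pca$x picks which PC goes on each axis, and setting scale. = TRUE would autoscale each frequency before the analysis.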

Scree Plot of PCA Results

This is the typical means of determining how many PCs are needed to describe the data.  Both the individual and the cumulative contribution of each PC to the variance explained are plotted.  A dotted line marks the 95% level, which many researchers consider a good explanatory threshold. A notation is made about the data processing history.
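The quantities shown in a scree plot are easy to compute by hand. The sketch below uses base R's prcomp() on simulated data (three hypothetical latent factors plus noise); it illustrates the calculation only and is not how ChemoSpec draws its plot.

```r
# Simulated data: 30 samples, 20 variables, 3 latent factors plus noise.
set.seed(7)
latent <- matrix(rnorm(30 * 3), ncol = 3)
mixing <- matrix(rnorm(3 * 20), nrow = 3)
X <- latent %*% mixing + matrix(rnorm(30 * 20, sd = 0.2), ncol = 20)

pca <- prcomp(X, center = TRUE)

# Individual and cumulative proportion of variance explained by each PC.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

# Smallest number of PCs whose cumulative contribution reaches 95%.
n_pc_95 <- which(cum_explained >= 0.95)[1]

# Scree plot with a dotted line at the 95% threshold.
plot(cum_explained, type = "b", ylim = c(0, 1),
     xlab = "Number of PCs", ylab = "Proportion of variance")
lines(var_explained, type = "b", lty = 2)
abline(h = 0.95, lty = 3)
```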

Bootstrap Analysis of the Number of Principal Components

A bootstrap or cross-validation approach is taken in which some samples are used to compute the PCs and others are used to check the results.  This is an alternate means of deciding how many PCs should be kept for further work.  Bootstrap analysis is only available for classical PCA.
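The general idea of fitting PCs on some samples and checking them on others can be sketched as follows. This is a simple repeated random-split cross-validation written in base R, not the algorithm ChemoSpec uses; the helper holdout_errors(), the split sizes, and the simulated data are all hypothetical.

```r
# Simulated data: 30 samples, 20 variables, 3 latent factors plus noise.
set.seed(11)
latent <- matrix(rnorm(30 * 3), ncol = 3)
mixing <- matrix(rnorm(3 * 20), nrow = 3)
X <- latent %*% mixing + matrix(rnorm(30 * 20, sd = 0.2), ncol = 20)

# Hypothetical helper: fit PCA on the training rows, then measure how well
# the first k PCs reconstruct the held-out rows, for k = 1..max_pc.
holdout_errors <- function(train, test, max_pc) {
  pca <- prcomp(train, center = TRUE)
  test_c <- sweep(test, 2, pca$center, "-")
  sapply(seq_len(max_pc), function(k) {
    V <- pca$rotation[, seq_len(k), drop = FALSE]
    mean((test_c - test_c %*% V %*% t(V))^2)
  })
}

# Repeat the random split 50 times, holding out 10 samples each time.
max_pc <- 6
errs <- replicate(50, {
  test_idx <- sample(nrow(X), size = 10)
  holdout_errors(X[-test_idx, , drop = FALSE],
                 X[test_idx, , drop = FALSE], max_pc)
})

# Average held-out reconstruction error for each number of PCs.
mean_err <- rowMeans(errs)
```

The held-out error can only decrease as PCs are added, so one looks for the point where it levels off rather than for a minimum.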

PCA Diagnostics

Two different plots are available for PCA diagnostics, as a means of identifying potential outliers.

Loadings Plots

Loadings to be plotted may be specified, as can a reference spectrum.  The gap that you see here in the loadings is due to some data being removed from regions in the IR spectrum that don't carry any information.
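A loadings plot with such a gap can be reproduced in base R. The sketch below is an illustration under assumed conditions rather than ChemoSpec's code: the spectra are simulated with one varying band near 1700, and the removed region (2000 to 2500) is chosen arbitrarily.

```r
# Simulated spectra: 15 samples, 60 frequency points, one varying band.
set.seed(5)
freq <- seq(500, 4000, length.out = 60)
band <- exp(-((freq - 1700) / 150)^2)
amount <- runif(15)
spectra <- outer(amount, band) + matrix(rnorm(15 * 60, sd = 0.05), nrow = 15)

# Drop a region assumed to carry no information before the PCA.
keep <- freq < 2000 | freq > 2500
pca <- prcomp(spectra[, keep], center = TRUE)

# Loadings for PC1, padded with NA over the removed region so that
# the line plot shows a gap there.
loadings1 <- rep(NA_real_, length(freq))
loadings1[keep] <- pca$rotation[, 1]

plot(freq, loadings1, type = "l", xlab = "Frequency", ylab = "PC1 loading")
```

The largest loadings fall where the spectra vary most between samples, which here is the simulated band.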

Hierarchical Cluster Analysis (HCA)

HCA can be performed using the clustering options available in R.  The result is plotted with the same color coding by sample as in the other plots.  The clustering method employed is also noted on the plot.
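The clustering options referred to are those of R's dist() and hclust(). A minimal sketch on simulated data follows; the group labels, sample names, distance measure, and linkage method are arbitrary choices for illustration, and the per-sample color coding that ChemoSpec adds is not reproduced.

```r
# Simulated spectra: two groups of 5 samples, 30 frequency points each.
set.seed(3)
group <- factor(rep(c("control", "treatment"), each = 5))
spectra <- matrix(rnorm(10 * 30, sd = 0.1), nrow = 10)

# The treatment group has a raised band at points 10-15.
is_trt <- group == "treatment"
spectra[is_trt, 10:15] <- spectra[is_trt, 10:15] + 2
rownames(spectra) <- paste0(substr(as.character(group), 1, 1), 1:10)

# Distance matrix and agglomerative clustering.
d <- dist(spectra, method = "euclidean")
hc <- hclust(d, method = "complete")

# Cut the tree into two clusters and draw the dendrogram,
# recording the method in the title.
clusters <- cutree(hc, k = 2)
plot(hc, main = "HCA (Euclidean distance, complete linkage)")
```

Other values of the method argument in dist() and hclust(), such as "manhattan" or "average", can be substituted directly.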

3D display of PCA Scores

ChemoSpec can display 3D plots of scores, with a simple mechanism to change the view.  Better still, ChemoSpec data can be passed to the interactive display system GGobi for tours of any number of PCs, including projection pursuit methods.  The resulting views can be saved to a graphics file.

Projection pursuit from GGobi


This page maintained by Bryan Hanson, Dept of Chemistry & Biochemistry. Last update: October 22, 2009