Authors: Julio Trevisan Mr, Plamen P. Angelov Dr. & Francis L. Martin Dr.
One attractive feature of infrared (IR) spectroscopy is that it can be used to investigate class-specific (i.e., treatment, tissue type, etc.) alterations in the absorption signature. Such alterations can act as biomarkers of mechanism associated with pathways or effects. One may wish to investigate such alterations from different standpoints, including: (1) intensity; (2) statistical significance; and (3) composite (multi-spectral-region) alterations. These three viewpoints were implemented computationally and named BM1, BM2 and BM3. They can be applied to datasets of classed IR spectra through a user-friendly MATLAB interface.
BM1. Most intuitive of the methods. The mean spectrum from a given class is subtracted from the mean spectrum from a reference class (e.g., “vehicle control”) thus obtaining a “difference-between-means curve”.
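The BM1 calculation can be sketched in a few lines of plain Python. This is only an illustration of the difference-between-means idea, not the toolkit's MATLAB code; the function names and the toy spectra are assumptions made for the example, and the subtraction order (reference mean minus class mean) follows the wording above.

```python
def mean_spectrum(spectra):
    """Column-wise mean of a list of equal-length spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def difference_between_means(class_spectra, reference_spectra):
    """Reference-class mean minus given-class mean, one value per wavenumber."""
    class_mean = mean_spectrum(class_spectra)
    reference_mean = mean_spectrum(reference_spectra)
    return [r - c for r, c in zip(reference_mean, class_mean)]


# Toy data (hypothetical): two spectra per class, three wavenumbers each.
vehicle_control = [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0]]
treated = [[1.0, 3.0, 2.0], [1.0, 3.0, 2.0]]

bm1_curve = difference_between_means(treated, vehicle_control)
print(bm1_curve)
```

Wavenumbers where the curve departs most from zero are the candidate intensity biomarkers.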
BM2. Each variable (wavenumber) is taken one at a time as input to a univariate linear classifier, yielding a per-wavenumber “classification rate curve” (1). Cross-validation is used to determine the classification rates. This method is close to the t-test criterion (2), but more precise.
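A minimal sketch of a per-wavenumber classification rate curve follows. It rests on several assumptions that the text does not specify: the univariate linear classifier is taken to be a nearest-class-mean rule (equivalent to a threshold midway between the two class means), cross-validation is leave-one-out, and classes are labelled 0 and 1. The function names and toy data are hypothetical.

```python
def nearest_mean_loo_rate(values, labels):
    """Leave-one-out accuracy of a nearest-class-mean rule on one variable."""
    correct = 0
    for i, (v, y) in enumerate(zip(values, labels)):
        # Training set: every spectrum except the held-out one.
        train = [(tv, ty) for j, (tv, ty) in enumerate(zip(values, labels)) if j != i]
        class0 = [tv for tv, ty in train if ty == 0]
        class1 = [tv for tv, ty in train if ty == 1]
        m0 = sum(class0) / len(class0)
        m1 = sum(class1) / len(class1)
        # Assign the held-out value to the class with the closer mean
        # (ties go to class 0).
        predicted = 1 if abs(v - m1) < abs(v - m0) else 0
        correct += int(predicted == y)
    return correct / len(values)


def classification_rate_curve(spectra, labels):
    """One cross-validated classification rate per wavenumber (column)."""
    return [nearest_mean_loo_rate(list(column), labels) for column in zip(*spectra)]


# Toy data (hypothetical): six spectra, two wavenumbers; only the first separates the classes.
spectra = [[0.10, 0.50], [0.20, 0.40], [0.15, 0.60],
           [0.90, 0.50], [0.80, 0.45], [0.85, 0.55]]
labels = [0, 0, 0, 1, 1, 1]

bm2_curve = classification_rate_curve(spectra, labels)
print(bm2_curve)
```

Each class needs at least two spectra so that a class mean can still be computed when one spectrum is held out.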
BM3. When multiple variables are assessed together, the jointly best variables for classification may differ substantially from the ranking of the individually best variables. This method generates a histogram showing how many times each wavenumber appeared in the “best variable set” obtained through feature selection. The size of that set is given by TopVars (method parameter: number of “best variables”), and the feature selection is repeated as many times as set by NoBootstraps (method parameter: number of validation bootstraps).
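The bootstrap counting behind the BM3 histogram can be sketched as below. The feature selection algorithm itself is not specified here, so the example plugs in a deliberately simple placeholder selector (ranking by the absolute gap between class means); this stand-in is univariate and is not the multivariate selection the method relies on. Any selector with the same signature could be substituted. All function names, the 0/1 class labels and the toy data are assumptions; `top_vars` and `no_bootstraps` mirror the TopVars and NoBootstraps parameters.

```python
import random
from collections import Counter


def rank_by_mean_gap(X, y, top_vars):
    """Placeholder selector: indices of the top_vars variables with the largest |class-mean gap|."""
    n_vars = len(X[0])
    gaps = []
    for k in range(n_vars):
        a = [row[k] for row, lab in zip(X, y) if lab == 0]
        b = [row[k] for row, lab in zip(X, y) if lab == 1]
        gaps.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(n_vars), key=lambda k: gaps[k], reverse=True)[:top_vars]


def selection_histogram(X, y, top_vars, no_bootstraps, selector=rank_by_mean_gap, seed=0):
    """Count how often each variable is selected across bootstrap resamples."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(X)
    done = 0
    while done < no_bootstraps:
        # Resample spectra with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        if len({y[i] for i in idx}) < 2:
            continue  # both classes are needed; draw again
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        counts.update(selector(Xb, yb, top_vars))
        done += 1
    return [counts[k] for k in range(len(X[0]))]


# Toy data (hypothetical): six spectra, three wavenumbers; only the first separates the classes.
X = [[0.10, 0.50, 0.30], [0.20, 0.40, 0.35], [0.15, 0.60, 0.30],
     [0.90, 0.50, 0.32], [0.80, 0.45, 0.33], [0.85, 0.55, 0.31]]
y = [0, 0, 0, 1, 1, 1]

bm3_histogram = selection_histogram(X, y, top_vars=1, no_bootstraps=50)
print(bm3_histogram)
```

The histogram entries always sum to `top_vars * no_bootstraps`, so a wavenumber's count can be read as the fraction of bootstraps in which it was selected.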
The aim of this protocol is to identify and visualize class-related biomarkers in IR spectral datasets by means of a simple sequence of steps executed through a user-friendly interface (Figure 1). Two visual representations are provided in which all BM results are presented concurrently, allowing comparison of the results generated by each method.
Running this protocol will require:
Preparing data files
BMTool can read text files in CSV format (http://tools.ietf.org/html/rfc4180). A sample file is provided (“txt/vc_mnng.txt”), which can also be opened by Excel (www.microsoft.com), OpenOffice (www.openoffice.org) or a similar program. BMTool expects CSV files to be internally organized in the following way:
IR spectra represented in the file need to be pre-processed. Commonly employed pre-processing sequences are: baseline correction followed by normalization to the Amide I peak (3,4) (or Amide II peak (5)), or second differentiation followed by vector normalization (6).
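The normalization and differentiation steps named above can be illustrated with the following sketch. It is not the pre-processing code used by the authors. Several details are assumptions: the Amide I peak is searched in a 1600 to 1700 cm-1 window (the band lies near 1650 cm-1), the second derivative is approximated by an unscaled central second difference on evenly spaced wavenumbers, and baseline correction is omitted. Function names and toy values are hypothetical.

```python
import math


def normalize_to_peak(wavenumbers, spectrum, lo=1600.0, hi=1700.0):
    """Divide the spectrum by its maximum absorbance inside the [lo, hi] window."""
    peak = max(a for w, a in zip(wavenumbers, spectrum) if lo <= w <= hi)
    return [a / peak for a in spectrum]


def second_difference(spectrum):
    """Unscaled central second difference; the result is two points shorter."""
    return [spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
            for i in range(1, len(spectrum) - 1)]


def vector_normalize(spectrum):
    """Scale the spectrum to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in spectrum))
    return [a / norm for a in spectrum]


# Toy spectrum (hypothetical values).
wavenumbers = [1500.0, 1550.0, 1600.0, 1650.0, 1700.0]
spectrum = [0.2, 0.4, 0.5, 2.0, 0.5]

amide_normalized = normalize_to_peak(wavenumbers, spectrum)
derivative_normalized = vector_normalize(second_difference(spectrum))
print(amide_normalized)
print(derivative_normalized)
```

In practice, derivatives of measured spectra are usually computed with a smoothing filter, since plain finite differences amplify noise.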
Alternatively, data can be imported from a spectroscopy database, a resource that is intended to be made publicly available in the future.
3. Using bmtool – the GUI contains a set of panels that are numbered to facilitate the following operation steps. These steps go from loading a dataset to visualizing analysis results:
Time is dependent on the computer setup, number of spectra, number of variables (i.e., wavenumbers) in the dataset, and choice of method parameters. Times reported below result from the following settings: Intel Core i5-750 processor and a dataset conta
Results from the sample data file are presented. The sample data file contains two treatment regimens, vehicle control (VC) and N-nitroso-N-methylnitroguanidine (MNNG), in Syrian hamster embryo (SHE) cells. Applying the BMs to this dataset allows one to investigate the effects of MNNG compared with the corresponding control in SHE cells.
Figure 3a shows BM1, BM2 and BM3 curves with their respective five most important peaks marked. Peaks that are present in two different BM curves within a distance of 25 cm⁻¹ are connected by a dashed line that signifies confirmation of the importance of the respective IR region.
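The peak-confirmation rule can be expressed compactly. The sketch below pairs peaks from different BM curves that lie within a given distance of each other. The peak positions are invented for illustration, the function name is hypothetical, and treating the 25 cm-1 limit as inclusive is an assumption.

```python
from itertools import combinations


def confirmed_peak_pairs(peaks_by_method, max_distance=25.0):
    """Pairs of peaks from two different methods lying within max_distance (cm-1)."""
    pairs = []
    for (name_a, peaks_a), (name_b, peaks_b) in combinations(peaks_by_method.items(), 2):
        for pa in peaks_a:
            for pb in peaks_b:
                if abs(pa - pb) <= max_distance:
                    pairs.append((name_a, pa, name_b, pb))
    return pairs


# Hypothetical peak positions (cm-1), not taken from the sample dataset.
peaks = {
    "BM1": [1650.0, 1080.0],
    "BM2": [1640.0, 1230.0],
    "BM3": [1225.0, 1500.0],
}

for pair in confirmed_peak_pairs(peaks):
    print(pair)
```

Each returned tuple corresponds to one dashed connector in a plot such as Figure 3a.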
Figure 3b shows a compact version of the previous panel, named the biomarker-localization (BL) plot, in which only the markers from Figure 3a are retained and symbol size is proportional to peak intensity.
In addition, a BL plot containing seven comparisons of different treatment conditions vs. VC in a SHE study is shown in Figure 3c. The BL plot allows for quick visualization of biomarker weighting and comparison between classes in a class-rich dataset.
This work was funded by Unilever as “part of Unilever’s ongoing effort to develop novel ways of delivering consumer safety”.
Figure 1: Schematic of the biomarker identification protocol implemented in the toolkit.
Infrared spectra, pre-processed and classed (i.e., each spectrum is assigned a class, e.g., “vehicle control”, “treatment 1”, “treatment 2”, etc.), are input into three different biomarker identification methods (BM1, BM2 and BM3). Each method individually generates a result curve (see text). Results are combined by means of visualization strategies.
Figure 2: Screenshot of the main window of BMTool.
This graphical user interface contains numbered panels organized in the most probable operating sequence.
Figure 3: Anticipated results.
(a) BM1, BM2 and BM3 curves with their respective five most important peaks marked. Peaks that are present in two different BM curves within a distance of 25 cm⁻¹ are connected by a dashed line that signifies confirmation of the importance of the respective infrared region. (b) Compact version of the previous panel, named the biomarker-localization (BL) plot, in which only the markers from Figure 3a are retained. (c) Another BL plot containing all seven comparisons between treatment condition and VC of an eight-treatment-regimen Syrian hamster embryo (SHE) study; allows for quick visualization and comparison between classes in a class-rich dataset.
Julio Trevisan Mr & Plamen P. Angelov Dr., Department of Communication Systems, Lancaster University, Lancaster LA1 4WA, UK; Centre for Biophotonics, Lancaster Environment Centre, Lancaster University, Lancaster LA1 4YQ, UK
Francis L. Martin Dr., Centre for Biophotonics, Lancaster Environment Centre, Lancaster University, Lancaster LA1 4YQ, UK
Source: Protocol Exchange (2010) doi:10.1038/nprot.2010.97. Originally published online 4 May 2010.