Harnessing the complexity of LC-MSn data

Robert Mistrik

HighChem, Ltd., Cajakova 18, 81105 Bratislava, Slovakia

Abstract

HPLC coupled with tandem mass spectrometry is rapidly becoming the technique of choice for high-throughput identification of small molecules. However, the complexity of the samples and the number of resulting spectra preclude manual analysis of the generated data. The advent and proliferation of novel computer-based procedures have enabled efficient processing and interpretation of the large volumes of complex data acquired with LC/MSn techniques. In this lecture we will present a systematic analysis of LC/MSn runs from complex biological samples using a multi-step computational approach. The process begins with a novel preprocessing method for component detection and MSn spectra deconvolution in data-dependent experiments. To optimize component detection and subsequent compound identification, it is important to extract spectra from all acquired MS stages. The logical data structure that best reflects the dependencies between spectra is a tree, which has been adopted as the standard data structure in all our data systems. Recent software advances provide management and database processing capabilities for MSn spectral trees.
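
A minimal Python sketch of the spectral-tree idea, assuming each node holds one scan and its children are the scans of product ions selected from it. The class name SpectrumNode, its fields, and the m/z values are hypothetical illustrations, not the data model of the software described here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpectrumNode:
    """One MSn scan; children are scans of product ions selected from it (hypothetical model)."""
    ms_level: int
    precursor_mz: float | None      # None for the MS1 survey scan
    peaks: dict[float, float]       # m/z -> relative intensity
    children: list[SpectrumNode] = field(default_factory=list)

    def add_child(self, child: SpectrumNode) -> SpectrumNode:
        # A dependent scan must be exactly one MS stage deeper than its parent.
        if child.ms_level != self.ms_level + 1:
            raise ValueError("child scan must be one MS stage deeper than its parent")
        self.children.append(child)
        return child

    def walk(self, path: tuple[float, ...] = ()):
        """Yield (precursor path, node) pairs in depth-first order."""
        here = path if self.precursor_mz is None else path + (self.precursor_mz,)
        yield here, self
        for child in self.children:
            yield from child.walk(here)


# Hypothetical three-stage tree: MS1 survey -> MS2 of m/z 611.16 -> MS3 of m/z 303.05
ms1 = SpectrumNode(1, None, {611.16: 100.0})
ms2 = ms1.add_child(SpectrumNode(2, 611.16, {465.10: 40.0, 303.05: 100.0}))
ms2.add_child(SpectrumNode(3, 303.05, {257.04: 30.0, 229.05: 20.0}))

for path, node in ms1.walk():
    print(f"MS{node.ms_level}", path)
```

The precursor path (the chain of selected m/z values from the survey scan downward) identifies each node uniquely within a run, which is what makes trees convenient as database keys.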

Deconvoluted LC/MSn data is processed in three steps. The first step is searching the detected spectral trees against commercially available and user-created ESI-IT spectral collections. To confirm compound identification, the data is further analyzed using principal component analysis. An example will demonstrate the confident identification of a flavonol glycoside in three different Citrus fruits from complex LC/MS3 runs.
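
The library-search step could be sketched as follows, assuming trees are stored as dictionaries that map a path of nominal precursor m/z values to a peak list, and that corresponding nodes are compared by binned cosine similarity. The functions tree_score and search, the 0.5 Da bin width, and the averaging scheme are hypothetical choices for illustration; the search algorithms used for spectral trees in practice may score matches differently.

```python
import math


def binned(peaks, width=0.5):
    """Sum intensities into m/z bins of the given width (Da)."""
    out = {}
    for mz, intensity in peaks.items():
        key = round(mz / width)
        out[key] = out.get(key, 0.0) + intensity
    return out


def cosine(a, b, width=0.5):
    """Cosine similarity of two peak lists after binning."""
    va, vb = binned(a, width), binned(b, width)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def tree_score(query, reference, width=0.5):
    """Mean cosine over query nodes; nodes whose precursor path is absent in the reference score 0."""
    scores = [cosine(peaks, reference[path], width) if path in reference else 0.0
              for path, peaks in query.items()]
    return sum(scores) / len(scores) if scores else 0.0


def search(query, library, width=0.5):
    """Rank library entries by tree score, best first."""
    hits = [(name, tree_score(query, ref, width)) for name, ref in library.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Trees keyed by nominal precursor path; all names and values are hypothetical.
query = {(611,): {465.10: 40.0, 303.05: 100.0}, (611, 303): {257.04: 30.0, 229.05: 20.0}}
library = {
    "entry A": {(611,): {465.10: 35.0, 303.05: 100.0}, (611, 303): {257.04: 25.0, 229.05: 20.0}},
    "entry B": {(611,): {449.11: 100.0}},
}
for name, score in search(query, library):
    print(name, round(score, 3))
```

Scoring unmatched query nodes as zero means that agreement at deeper MS stages raises the score, which reflects the idea that MS3 and higher spectra add discriminating evidence beyond MS2 alone.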

Because of the vast structural diversity of small molecules and the statistically low probability of identifying a compound by library searching, the next, interpretative step often cannot be avoided. Effective interpretation of mass spectral data ultimately requires an understanding of gas-phase ion fragmentation mechanisms. To advance the understanding of this ion chemistry, mechanistic fragmentation knowledge was systematized in a comprehensive database collated from printed literature dedicated to mass spectrometry. Advanced algorithms can use this accumulated knowledge of fragmentation mechanisms to generate fragmentation pathways. Although this approach requires human interaction, it promises to make data interpretation more effective in a high-throughput environment. The second example will be metabolite identification using a fragmentation library.
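
To illustrate how mechanistic knowledge can drive pathway generation, the sketch below applies a small, hypothetical rule set of neutral losses recursively to a precursor ion and keeps only those steps whose fragments appear in the observed spectrum. It is a deliberately simple stand-in for the much richer mechanism database described above; the rule set, the explain function, the 0.01 Da tolerance, and the m/z values are assumptions, and the output is a list of candidates for a human to review.

```python
# Hypothetical rule set: neutral loss -> monoisotopic mass (Da).
NEUTRAL_LOSSES = {
    "H2O": 18.010565,
    "CO": 27.994915,
    "CO2": 43.989829,
    "deoxyhexose (C6H10O4)": 146.057909,
    "hexose (C6H10O5)": 162.052824,
}


def explain(precursor_mz, observed, tol=0.01, max_steps=3):
    """Enumerate chains of rule-based losses whose every fragment matches an observed peak."""
    pathways = []

    def extend(mz, path):
        if path:
            pathways.append(list(path))  # record every partial pathway found so far
        if len(path) == max_steps:
            return
        for name, loss in NEUTRAL_LOSSES.items():
            fragment = mz - loss
            # Accept the step only if the predicted fragment is seen within tolerance.
            if any(abs(fragment - o) <= tol for o in observed):
                path.append((name, round(fragment, 4)))
                extend(fragment, path)
                path.pop()

    extend(precursor_mz, [])
    return pathways


# Illustrative values: a precursor ion and two product ions from an MSn spectrum.
for pathway in explain(611.1607, [465.1028, 303.0499]):
    print(" -> ".join(f"-{name}: {mz}" for name, mz in pathway))
```

Requiring every intermediate fragment to be observed keeps the search small; a real system would also need ion-structure reasoning, which is where the mechanistic database and human judgment come in.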

In studies involving a large number of complex samples (e.g. systems biology, metabonomics, dereplication), the third step in the data analysis chain can be applied. Because samples of related origin share recurring components, cross-matching techniques may prove valuable. The lecture will discuss the development and implementation of chromatographic libraries containing data-reduced chromatograms with deconvoluted LC/MSn spectral trees. Retention times and TIC profiles, together with selected scans or detected component spectra or trees, can be exported to user libraries and searched using various criteria. Extensive annotation capabilities and a range of correlation algorithms are essential for cross-matching studies. An example from dereplication analysis will be given, and the database technology used will be discussed.
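
Cross-matching components against a chromatographic library might combine a retention-time window with a spectral comparison. In this sketch the comparison is a simple matched-peak fraction rather than any specific correlation algorithm from the software discussed; the Component class, the cross_match function, the 0.2 min window, and the 0.6 threshold are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Component:
    """A detected component (hypothetical model): name, retention time, major peak m/z values."""
    name: str
    rt: float          # retention time in minutes
    mzs: list[float]   # m/z values of the component spectrum's major peaks


def peak_overlap(a, b, tol=0.01):
    """Fraction of peaks in a that have a partner in b within tol (Da)."""
    if not a:
        return 0.0
    hits = sum(1 for x in a if any(abs(x - y) <= tol for y in b))
    return hits / len(a)


def cross_match(sample, library, rt_tol=0.2, min_overlap=0.6):
    """Map each sample component to its best library component inside the RT window, or None."""
    result = {}
    for comp in sample:
        best = None
        for ref in library:
            if abs(comp.rt - ref.rt) > rt_tol:
                continue  # retention time filter first, spectral check second
            score = peak_overlap(comp.mzs, ref.mzs)
            if score >= min_overlap and (best is None or score > best[1]):
                best = (ref.name, score)
        result[comp.name] = best
    return result


# Hypothetical library and sample components.
library = [Component("known A", 12.4, [303.05, 465.10, 611.16]),
           Component("known B", 15.0, [287.06, 449.11])]
sample = [Component("peak 1", 12.5, [303.05, 465.10, 611.16]),
          Component("peak 2", 15.1, [301.07, 153.02]),
          Component("peak 3", 20.0, [303.05])]
print(cross_match(sample, library))
```

Filtering by retention time before comparing spectra keeps the number of spectral comparisons low, which matters when many runs are matched against a growing library; in dereplication, components left unmatched are the candidates worth closer study.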
