Mass Spectra Classification

The primary goal of spectra classification is to find correlations between the properties of compounds and their mass spectra. Because the physical and chemical properties and biological activities of chemical compounds are to a large extent a function of molecular structure, the results of classification analysis reflect structural features that are determined by the fragment ions appearing in a mass spectrum. From the user's viewpoint, the important advantage of classification methods is that the user does not need detailed knowledge of the complex spectra-structure relationship to get satisfactory results. The classification strategy in Mass Frontier is based on a user-friendly graphic presentation of the results, which can be easily viewed on the screen.

Mass Frontier contains three classification methods: Principal Component Analysis (PCA), Fuzzy Clustering, and Self-Organizing Maps (SOM), which are a special class of neural networks. These methods are based on different principles and allow the user to explore complex data from various perspectives: PCA uses multivariate statistics, Fuzzy Clustering assigns data to clusters, and SOM is based on competitive learning.

In multivariate statistics, each spectrum can be considered as a single point in an n-dimensional space, with the peak intensities being the coordinates of this point. Each dimension (axis) of that space represents the mass-to-charge ratio (m/z) of a peak. The dimensionality is therefore determined by the m/z value of the last peak in the spectrum. For example, the EI spectrum of hydrogen exhibits two peaks, at m/z = 1 (intensity 2%) and m/z = 2 (100%). This spectrum can be viewed as a point in a two-dimensional space with the coordinates [2, 100]. In practice, we deal with spectra that have a far higher dimensionality than two. If the dimensionality is too high, or several coordinates are equal to zero (a mass spectrum usually does not have peaks at every m/z value), the classification methods may not provide the results we require. Therefore, the dimensionality is reduced either before a spectrum is placed in the n-dimensional space or during the classification process.
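The mapping from a peak list to a point in n-dimensional space can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the Mass Frontier implementation; the function name `spectrum_to_vector` and the assumption of integer (nominal) m/z values are ours.

```python
import numpy as np

def spectrum_to_vector(peaks, n_dims=None):
    """Turn a peak list into an intensity vector.

    peaks  : iterable of (m/z, intensity) pairs with integer m/z >= 1
             (nominal masses are an assumption of this sketch).
    n_dims : number of axes; defaults to the m/z of the last peak.
    """
    peaks = list(peaks)
    max_mz = max(mz for mz, _ in peaks)
    if n_dims is None:
        n_dims = max_mz
    if max_mz > n_dims:
        raise ValueError("n_dims is smaller than the highest m/z value")
    vec = np.zeros(n_dims)
    for mz, intensity in peaks:
        vec[mz - 1] = intensity  # axis index i corresponds to m/z = i + 1
    return vec

# EI spectrum of hydrogen from the text: m/z 1 (2%) and m/z 2 (100%)
hydrogen_vec = spectrum_to_vector([(1, 2.0), (2, 100.0)])

# A made-up spectrum with peaks only at m/z 15 and 16:
# it needs 16 axes, 14 of which carry a zero intensity.
sparse_vec = spectrum_to_vector([(15, 80.0), (16, 100.0)])
n_zero_axes = int((sparse_vec == 0).sum())
```

The second example shows why most coordinates of a real spectrum are zero, which is one reason a reduction of dimensionality is needed.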

The basic hypothesis of multivariate statistical methods is that the distance between points (spectra) in an n-dimensional space is related to a relevant property of the compounds that these points represent. If the points are close enough to form a cluster or a separated region, we can assume that the compounds corresponding to these points exhibit common or similar properties. To ensure that the results of the classification methods are statistically significant, we should place a large number of spectra (usually one or more groups, each with 10 to 1,000 spectra) in the same n-dimensional space. We then apply multivariate statistical methods, with various parameters, to evaluate these points (spectra). The objective of a classification process is to separate these points (spectra) into two or more classes according to the desired structural or other properties.
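As a rough illustration of the distance hypothesis, the sketch below computes Euclidean distances between a few spectra and projects them onto two principal components using NumPy's singular value decomposition. The four spectra are invented toy data, the choice of Euclidean distance is an assumption, and the helper names `pairwise_distances` and `pca_scores` are hypothetical; how Mass Frontier computes its results internally is not described here.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows (spectra) in X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pca_scores(X, n_components=2):
    """Project mean-centered spectra onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Toy data (made up): rows are spectra, columns are intensities at m/z 1..4.
# Rows 0-1 imitate one group of compounds, rows 2-3 another.
spectra = np.array([
    [2.0, 100.0,   0.0,  0.0],
    [3.0, 100.0,   1.0,  0.0],
    [0.0,   5.0, 100.0, 40.0],
    [0.0,   4.0, 100.0, 45.0],
])

dist = pairwise_distances(spectra)
scores = pca_scores(spectra, n_components=2)
```

In this toy data set, spectra from the same group lie much closer together than spectra from different groups, and the first principal component alone separates the two groups by sign. Real data sets would need far more spectra per group, as noted above.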