A New MSn Database Concept With Sample Library

Robert Mistrik1; Alexej Nikiforov2; Ernst Pittenauer3; Milos Suchy1 and Juraj Lutisan1

1HighChem, Ltd., Cajakova 18, 81105 Bratislava, Slovakia; 2Institute of Organic Chemistry, University of Vienna, Waehringerstr. 38, 1090 Vienna, Austria; 3Austrian Agency for Health and Food Safety, Agricultural Inspection Services and Research Vienna, Spargelfeldstr. 191, 1226 Vienna, Austria

Overview

A software concept for a modern mass spectral database is described. A sample library was collected to demonstrate the capabilities of this concept. The most important feature of our database solution is its ability to manage all kinds of heterogeneous mass spectral data. It supports flexible searches, links between data fields and external sources, annotation of peaks, spectra and chromatograms, and data fields that can be changed or added dynamically. We aim to set a standard for mass spectral libraries with this database concept.

Introduction

Because of the complexity of the processes occurring in the mass spectrometer, the spectral library search is still considered the most reliable method for identifying unknown compounds. The need for rapid compound identification, together with emerging mass spectral techniques, creates a growing demand for libraries that can manage various types of data. We decided to develop a new database concept that covers a broad range of mass spectrometric techniques. A sample collection of ESI/MSn spectra was created and stored in the newly developed database.

Database features

Open Access Format – The database structure will be freely available from www.highchem.com. Limited free support will be provided by HighChem Ltd.
Microsoft SQL Server – A recognized and broadly accepted relational database system. There are no licensing fees on this level. Low overhead database.
Vendor, Instrument and Software Independent – The database format is broadly applicable and independent of instrument or software. Parameter-matching values can be stored for each record.
Standardized Data Format – To protect database compatibility, the key spectral and chromatographic data must comply with the suggested database format. We are open to any suggestions and ideas while this field is under development.
Dynamic Data Fields – Customizable data fields to suit individual needs. Users and software companies can adjust or expand data fields.
Highly Annotated Library – Spectral and chromatographic data can be associated with additional information (experimental, compound characteristic, biochemical data, etc.). These fields may be expanded by the user without additional programming.
MSn Capable – Data from tandem experiments are stored in spectral trees and are hierarchically consistent. Parallel product spectra may be stored (various collision energies, average and composite spectra, SCID spectra, wide-band activation spectra, zoom spectra, various isolation widths, ...).
Low & High Resolution – Both high- and low-resolution data can be stored and searched. To speed up the search process, pre-screening data are generated automatically.
Peak Type Assignment – For a correct library search, uncharacteristic ions from LC/MS experiments (cluster or adduct ions, dimers, doubly-charged ions) should be flagged manually when creating reference data. Those ions will be processed with different metrics.
Storage of Chromatograms – Data-reduced or whole chromatograms from data-dependent, product-ion or full-scan experiments may be stored and searched. Chromatographic peaks can be annotated.
Spectrum-Chromatogram Equivalence Principle – GC/MS or LC/MS chromatograms are treated as a set of spectra in spectra searching. This feature is useful for libraries of unidentified compounds (metabolomics). It also allows target-compound analysis in a series of chromatograms.
Completely Searchable – Everything is searchable using a simple or a combined search. The spectra-search algorithm for MSn data is provided separately and will not be included in the database. The standard NIST search algorithm will be directly implemented in the library.
Fully Structurally Oriented – Chemical structures can be assigned to every record, MSn tree and individual mass spectral peak. Structures are stored in MDL mol format (small molecules) and Brookhaven PDB format (proteins). As with the spectra-search, the structure search algorithm is not provided with the database.
Fragmentation Patterns – Fragmentation and rearrangement mechanisms from a fragmentation library can be assigned to every mass spectral peak (feature possible with Mass Frontier 4.0).
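
As a minimal sketch of how dynamic data fields might be modelled relationally, the following Python example uses the standard-library sqlite3 module as a stand-in for Microsoft SQL Server. The table and column names (record, field_def, field_value) and the compound names are hypothetical and are not taken from the actual database format. The point is only that a new field becomes a row in a definition table, so it can be added and searched without a schema change.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical entity-attribute-value layout for dynamic fields."""
    conn.executescript("""
        CREATE TABLE record (
            record_id     INTEGER PRIMARY KEY,
            compound_name TEXT NOT NULL
        );
        CREATE TABLE field_def (
            field_id   INTEGER PRIMARY KEY,
            field_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE field_value (
            record_id INTEGER NOT NULL REFERENCES record(record_id),
            field_id  INTEGER NOT NULL REFERENCES field_def(field_id),
            value     TEXT NOT NULL,
            PRIMARY KEY (record_id, field_id)
        );
    """)


def add_field(conn, field_name):
    """Define a new user field and return its id; no ALTER TABLE is needed."""
    cur = conn.execute(
        "INSERT INTO field_def (field_name) VALUES (?)", (field_name,)
    )
    return cur.lastrowid


def set_value(conn, record_id, field_name, value):
    """Attach a value of a previously defined field to a record."""
    conn.execute(
        "INSERT OR REPLACE INTO field_value (record_id, field_id, value) "
        "SELECT ?, field_id, ? FROM field_def WHERE field_name = ?",
        (record_id, str(value), field_name),
    )


def search(conn, field_name, value):
    """Return the names of records whose dynamic field equals the value."""
    rows = conn.execute(
        "SELECT r.compound_name FROM record r "
        "JOIN field_value v ON v.record_id = r.record_id "
        "JOIN field_def d ON d.field_id = v.field_id "
        "WHERE d.field_name = ? AND v.value = ? "
        "ORDER BY r.compound_name",
        (field_name, str(value)),
    )
    return [name for (name,) in rows]


# Illustrative usage with invented records.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO record VALUES (1, 'compound A')")
conn.execute("INSERT INTO record VALUES (2, 'compound B')")
add_field(conn, "ionization_mode")
set_value(conn, 1, "ionization_mode", "positive")
set_value(conn, 2, "ionization_mode", "negative")
print(search(conn, "ionization_mode", "positive"))
```

An entity-attribute-value layout trades some query convenience for flexibility; a production design on SQL Server may well differ.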


MSn Data Tree Structure
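
A spectral tree of this kind can be sketched as a recursive data structure. The class and attribute names below (SpectrumNode, ms_level, precursor_mz) and all numeric values are illustrative assumptions, not the actual database format. The sketch enforces one simple form of hierarchical consistency, namely that each child spectrum is exactly one MS stage deeper than its parent, while several parallel product spectra (for example, at different collision energies) may share the same parent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class SpectrumNode:
    ms_level: int                     # 1 = full scan, 2 = MS2, 3 = MS3, ...
    precursor_mz: Optional[float]     # None for the full-scan root
    peaks: List[Tuple[float, float]]  # (m/z, relative intensity)
    collision_energy: Optional[float] = None
    children: List["SpectrumNode"] = field(default_factory=list)

    def add_child(self, child: "SpectrumNode") -> "SpectrumNode":
        """Attach a product spectrum; reject nodes that skip an MS stage."""
        if child.ms_level != self.ms_level + 1:
            raise ValueError("child must be exactly one MS stage deeper")
        self.children.append(child)
        return child

    def walk(self) -> Iterator["SpectrumNode"]:
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def depth(self) -> int:
        """Highest MS stage present anywhere in the tree."""
        return max(node.ms_level for node in self.walk())


# Illustrative usage with invented m/z values.
root = SpectrumNode(1, None, [(300.2, 100.0)])
ms2_low = root.add_child(
    SpectrumNode(2, 300.2, [(282.2, 100.0)], collision_energy=25.0))
ms2_high = root.add_child(
    SpectrumNode(2, 300.2, [(282.2, 40.0), (254.1, 100.0)], collision_energy=35.0))
ms2_low.add_child(SpectrumNode(3, 282.2, [(254.1, 100.0)]))
print(root.depth(), len(root.children))
```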

The Sample ESI/MSn library

The initial collection contains 300 MSn spectra of commercially available human and veterinary pharmaceuticals. The goal of this data collection is to assist toxicologists in the identification of drugs and their metabolites in biological samples. Since widely used pharmaceuticals are increasingly becoming environmental pollutants, this library can also be used to identify unknown compounds in environmental matrices.

All measurements were performed using a Thermo Finnigan LCQ Deca XP (Thermo Finnigan, San Jose, CA, USA) fitted with an orthogonal electrospray ion source in positive- and/or negative-ion mode. Full-scan spectra were acquired over the mass range of interest, with the molecular ion species ([M+nH]n+, [M+nCat-(n-H)]+, [M-nH]n-) maximized by autotune (n = 1, 2, 3; Cat = NH4+, Na+, K+). Source CID spectra were acquired by manual adjustment for maximum product-ion intensity. The relative collision energy for MS2 as well as MS3 spectra was optimized by manually reducing the original precursor-ion intensity to roughly 5-20 (30) % relative abundance, compared to the product ion representing the base peak (i.e., 100%), either with or without wideband activation. A minimum of 30 scans (1 scan representing 3 microscans) in centroid mode was acquired for all experiments. Further experimental parameters are: roughly 1-5 mg analyte per 100 ml solvent (either water : methanol or water : acetonitrile = 1 : 1), with or without additive (additives are 0.1% acetic acid, 10 mM ammonium acetate, 0.1% ammonia, etc.); flow rate: 5-10 µl/min; capillary temperature: 150-300°C, depending on the chemical nature of the analyte; isolation width: typically 4 mass units; and spray voltage: 4-5 kV.
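
The collision-energy criterion described above can be expressed as a simple check. The sketch below, with hypothetical function names, normalizes a product spectrum to its base peak and tests whether the residual precursor ion falls within the 5-20 % window. The value of 30 % given in parentheses in the text is read here as an occasional upper limit that can be passed explicitly, which is an assumption, as is the m/z tolerance of 0.5 used to locate the precursor.

```python
def precursor_relative_abundance(peaks, precursor_mz, tol=0.5):
    """Residual precursor intensity as a percentage of the base peak.

    peaks is a list of (m/z, intensity) pairs; returns 0.0 if no peak lies
    within tol (an assumed tolerance) of the precursor m/z.
    """
    if not peaks:
        raise ValueError("empty spectrum")
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        raise ValueError("spectrum has no positive intensity")
    precursor = max(
        (intensity for mz, intensity in peaks if abs(mz - precursor_mz) <= tol),
        default=0.0,
    )
    return 100.0 * precursor / base


def energy_is_acceptable(peaks, precursor_mz, low=5.0, high=20.0, tol=0.5):
    """True if the residual precursor lies in the [low, high] percent window."""
    value = precursor_relative_abundance(peaks, precursor_mz, tol)
    return low <= value <= high


# Illustrative usage with invented intensities.
well_fragmented = [(300.2, 120.0), (282.2, 1000.0), (254.1, 350.0)]
under_fragmented = [(300.2, 1000.0), (282.2, 200.0)]
print(energy_is_acceptable(well_fragmented, 300.2))
print(energy_is_acceptable(under_fragmented, 300.2))
```

In the second spectrum the precursor is still the base peak, so the check fails and the collision energy would need to be raised.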

All cluster, adduct, dimer and doubly-charged ions were manually annotated to produce high-quality reference data. The search algorithm processes those peaks with special metrics. The sample library was used as a training set for the development of the presented database concept. The concept was continuously adjusted as the library size and complexity increased. In the near future, we plan to assign fragmentation mechanisms, which will be stored in the fragmentation library (a separate project), to prominent peaks in this library. This sample library demonstrates a highly sophisticated mass spectral data collection.
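
The text does not specify which metrics are applied to the annotated ions. Purely as an illustration of the idea, the sketch below computes a cosine similarity between a query spectrum and a reference spectrum in which reference peaks flagged as uncharacteristic are down-weighted. The peak-type labels, the weight values, and the binning of m/z to the nearest integer are all assumptions made for this example and do not describe the actual search algorithm.

```python
import math

# Hypothetical weights: ordinary fragments count fully, flagged ions count less.
TYPE_WEIGHTS = {
    "fragment": 1.0,
    "adduct": 0.2,
    "cluster": 0.2,
    "dimer": 0.2,
    "doubly_charged": 0.2,
}


def weighted_cosine(query, reference):
    """Weighted cosine similarity on integer m/z bins.

    query:     list of (m/z, intensity)
    reference: list of (m/z, intensity, peak_type)
    """
    ref, weights = {}, {}
    for mz, intensity, peak_type in reference:
        b = round(mz)
        ref[b] = ref.get(b, 0.0) + intensity
        # If several peaks share a bin, the lowest weight wins.
        weights[b] = min(weights.get(b, 1.0), TYPE_WEIGHTS[peak_type])
    qry = {}
    for mz, intensity in query:
        b = round(mz)
        qry[b] = qry.get(b, 0.0) + intensity

    bins = set(ref) | set(qry)
    dot = sum(weights.get(b, 1.0) * qry.get(b, 0.0) * ref.get(b, 0.0) for b in bins)
    norm_q = math.sqrt(sum(weights.get(b, 1.0) * qry.get(b, 0.0) ** 2 for b in bins))
    norm_r = math.sqrt(sum(weights.get(b, 1.0) * ref.get(b, 0.0) ** 2 for b in bins))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


# Illustrative usage: the query lacks the adduct ion present in the reference.
query = [(138.0, 100.0), (110.0, 50.0)]
flagged = [(138.0, 100.0, "fragment"), (110.0, 50.0, "fragment"),
           (217.0, 80.0, "adduct")]
unflagged = [(mz, intensity, "fragment") for mz, intensity, _ in flagged]
print(weighted_cosine(query, flagged) > weighted_cosine(query, unflagged))
```

With the adduct flagged, its absence from the query costs less, so the match score is higher than when the same peak is treated as an ordinary fragment.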


An example of a user interface design for the presented database concept

Please join our initiative

We would like to encourage scientists and software companies to join our initiative for a new database format that can become a standard for mass spectral libraries. This database format should be instrument- and vendor-independent. The database format specifications, along with an empty database, will be available for download from www.highchem.com/database without the database-accessing software. We are open to any ideas and suggestions that might be important for specific applications.

Data relationship diagram of the presented database

 
