Proceedings Article

Priority based functional group identification of organic molecules using machine learning

TL;DR: This is the first effort to address the priority based functional group identification of organic molecules problem using machine learning (ML), and a unique aspect of this study is the incorporation of domain specific information into the process of classification by employing a set of priority rules generated from expert knowledge.
Abstract: Functional groups in organic compounds determine the properties of the compounds. When multiple functional groups are present, the dominant functional group determines the majority of the compound's properties. Hence, priority-based identification of functional groups is an important problem in chemistry. Fourier-transform infrared (FTIR) spectroscopy is a commonly used spectroscopic method for identifying the presence or absence of functional groups within a compound, and the current approach to this task relies mainly on visual inspection and analysis of the FTIR spectral data. However, visual identification by humans is error-prone, especially when patterns in the FTIR spectrum overlap, which obscures the distinctive features that help identify the different functional groups in an unknown sample. Therefore, the main goal of this paper is to develop a machine-learning-based classification system that can perform priority-based functional group identification of organic molecules. To the best of our knowledge, this is the first effort to address this problem using machine learning (ML), and a unique aspect of our study is the incorporation of domain-specific information into the classification process through a set of priority rules generated from expert knowledge. We have carried out an extensive study on real IR spectral data, first using a rule-based approach and then using ML in an effort to improve the classification accuracy. Our analysis indicates that the basic rule-based method is reasonably effective in predicting the presence (or absence) of functional groups. However, this approach is not accurate enough in practice for the more challenging problem of priority-based identification, and ML-based classification offers much higher identification accuracy in this case. The primary reason is that an ML algorithm can adaptively exploit patterns in the data to classify the functional group, unlike the rule-based approach, which uses a fixed set of rules. Finally, we have also carried out an extensive statistical analysis of the results using confidence intervals and permutation tests, in an effort to gain more descriptive information about the learning process rather than simply treating it as a black box.
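To make the idea of priority-based identification concrete, the following is a minimal sketch of the final selection step only: given the set of functional groups judged present in a spectrum, pick the dominant one. The priority order shown is an assumption that loosely follows the conventional nomenclature seniority of functional groups; the abstract does not list the expert rules actually used, and the names `PRIORITY`, `RANK` and `dominant_group` are hypothetical.

```python
# Hypothetical priority order, highest priority first. This loosely follows
# the conventional nomenclature seniority of functional groups and is NOT
# taken from the paper, whose expert rules are not given in the abstract.
PRIORITY = [
    "carboxylic acid",
    "ester",
    "amide",
    "nitrile",
    "aldehyde",
    "ketone",
    "alcohol",
    "amine",
]

# Map each group name to its rank (0 = highest priority).
RANK = {name: i for i, name in enumerate(PRIORITY)}


def dominant_group(detected):
    """Return the highest-priority group among those detected, or None.

    `detected` is any iterable of group names, for example the output of a
    rule-based band check or of a per-group classifier. Names that are not
    in the priority table are ignored.
    """
    known = [g for g in detected if g in RANK]
    if not known:
        return None
    return min(known, key=RANK.__getitem__)


if __name__ == "__main__":
    # A compound showing both an O-H and a C=O (ketone) signature.
    print(dominant_group({"alcohol", "ketone"}))
```

In this sketch the detection step and the priority step are kept separate, so the same selection function could sit behind either a rule-based detector or an ML classifier.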
Citations
01 Mar 1992
TL;DR: In this paper, the authors present a survey of the state-of-the-art technologies used in the field of data collection and analysis in the context of data mining. http://www.gramota.net/materials/3/2017/9/58.html
Abstract: THE CRISIS OF POWER IN KABARDINO-BALKARIA IN AUGUST 1991. The article examines the influence of the 1991 putsch on the ethnopolitical situation in the Kabardino-Balkar Republic. It analyzes how the national factor was "drawn into" the political confrontation that flared up and grew into a crisis of power. The authors are the first to formulate and substantiate the idea that the events of August 1991 led to the unification of the national movements of the "titular" peoples of Kabardino-Balkaria, which by then were already disloyal to one another, as expressed in their common demand for the resignation of the republic's top officials. Article address: www.gramota.net/materials/3/2017/9/58.html

160 citations

References
Book
01 Jan 1963
TL;DR: In this paper, a sequence of procedures for identifying an unknown organic liquid using mass, NMR, IR, and UV spectroscopy is presented, along with specific examples of unknowns and their spectra.
Abstract: Presents a sequence of procedures for identifying an unknown organic liquid using mass, NMR, IR, and UV spectroscopy, along with specific examples of unknowns and their spectra.

11,753 citations

Book
18 Jun 2004
TL;DR: This new edition of this highly successful manual is not only a revised text but has been extended to meet the interpretive needs of Raman users as well as those working in the IR region, creating a uniquely practical, comprehensive and detailed source for spectral interpretation.
Abstract: The third edition of this highly successful manual is not only a revised text but has been extended to meet the interpretive needs of Raman users as well as those working in the IR region. The result is a uniquely practical, comprehensive and detailed source for spectral interpretation. Combining in one volume the correlation charts and tables for spectral interpretation for these two complementary techniques, this book will be of great benefit to those using or considering either technique. In addition to the new Raman coverage, the new edition offers:
  • a new section on macromolecules, including synthetic polymers and biomolecules;
  • an expansion of the section on NIR (near infrared region) to reflect recent growth in this area;
  • an extended chapter on inorganic compounds, including minerals and glasses;
  • redrawn and updated charts, plus a number of new charts covering data new to this edition.
This new edition will be invaluable in every industrial, university, government and hospital laboratory where infrared (FT-IR) and Raman spectral data need to be analysed.

6,428 citations

Book
01 Oct 2000

4,025 citations


Additional excerpts

  • ...We have implemented four supervised machine learning algorithms, namely Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest Classifier (RFC), and compared the results for this problem [5]....


Reference Entry
15 Sep 2006
TL;DR: This article presents a simple, first-pass approach to interpreting the infrared spectrum of a molecule, based on the characteristic absorptions produced by its structural features, whether the backbone of the molecule or the functional groups attached to it.
Abstract: The vibrational spectrum of a molecule is considered to be a unique physical property and is characteristic of the molecule. As such, the infrared spectrum can be used as a fingerprint for identification by the comparison of the spectrum from an “unknown” with previously recorded reference spectra. This is the basis of computer-based spectral searching. In the absence of a suitable reference database, it is possible to effect a basic interpretation of the spectrum from first principles, leading to characterization, and possibly even identification, of an unknown sample. This first-principles approach is based on the fact that structural features of the molecule, whether they are the backbone of the molecule or the functional groups attached to the molecule, produce characteristic and reproducible absorptions in the spectrum. This information can indicate whether there is a backbone to the structure and, if so, whether the backbone consists of linear or branched chains. Next, it is possible to determine if there is unsaturation and/or aromatic rings in the structure. Finally, it is possible to deduce whether specific functional groups are present. If detected, one is also able to determine the local orientation of the group and its local environment and/or location in the structure. The origins of the sample, its prehistory, and the manner in which the sample is handled all have an impact on the final result. Basic rules of interpretation exist and, if followed, a simple, first-pass interpretation leading to material characterization is possible. This article addresses these issues in a simple, logical fashion. Practical examples are included to help guide the reader through the basic concepts of infrared spectral interpretation.

3,824 citations


"Priority based functional group ide..." refers background in this paper

  • ...Fourier-transform Infrared (FTIR) spectroscopy is an important, commonly used spectroscopic method for identifying the presence or absence of functional groups within a compound and thereby helps in the structural identification of unknown molecules [2, 4]....


Journal Article
TL;DR: This manuscript brings together some of the leaders in this field to allow the standardization of methods and procedures for adapting a multistage approach to a methodology that can be applied to a variety of cell biological questions or used within a clinical setting for disease screening or diagnosis.
Abstract: IR spectroscopy is an excellent method for biological analyses. It enables the nonperturbative, label-free extraction of biochemical information and images toward diagnosis and the assessment of cell functionality. Although not strictly microscopy in the conventional sense, it allows the construction of images of tissue or cell architecture by the passing of spectral data through a variety of computational algorithms. Because such images are constructed from fingerprint spectra, the notion is that they can be an objective reflection of the underlying health status of the analyzed sample. One of the major difficulties in the field has been determining a consensus on spectral pre-processing and data analysis. This manuscript brings together as coauthors some of the leaders in this field to allow the standardization of methods and procedures for adapting a multistage approach to a methodology that can be applied to a variety of cell biological questions or used within a clinical setting for disease screening or diagnosis. We describe a protocol for collecting IR spectra and images from biological samples (e.g., fixed cytology and tissue sections, live cells or biofluids) that assesses the instrumental options available, appropriate sample preparation, different sampling modes as well as important advances in spectral data acquisition. After acquisition, data processing consists of a sequence of steps including quality control, spectral pre-processing, feature extraction and classification of the supervised or unsupervised type. A typical experiment can be completed and analyzed within hours. Example results are presented on the use of IR spectra combined with multivariate data processing.

1,340 citations


"Priority based functional group ide..." refers background in this paper

  • ...Fourier-transform Infrared (FTIR) spectroscopy is an important, commonly used spectroscopic method for identifying the presence or absence of functional groups within a compound and thereby helps in the structural identification of unknown molecules [2, 4]....

