
We aggregated the three-dimensional py-GC-MS data of the samples in each of the nine categories, which include modern plants (non-photosynthetic), modern plants (photosynthetic), fossil microorganisms (photosynthetic), fossil coal, wood, and oil shale, fossil animals, modern fungi, carbonaceous meteorites, and synthetic samples. These graphs display the peak intensities (vertical axis, normalized to the highest peak intensity in each category) across 3,240 elution-time bins or “scans” (right axis) and their mass spectra over 150 m/z bins (left axis). Credit: PNAS
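As a rough illustration of the data layout the caption describes, the sketch below builds one category surface from a stack of per-sample intensity matrices. The function name `aggregate_category`, the use of a plain sum as the aggregation step, and the random input values are all assumptions made for illustration; the caption states only the grid size (3,240 scans by 150 m/z bins) and that intensities are normalized to the highest peak in each category.

```python
import numpy as np

# Grid size taken from the caption: elution-time bins ("scans") and m/z bins.
N_SCANS, N_MZ = 3240, 150


def aggregate_category(samples):
    """Combine per-sample (scan x m/z) matrices and scale the highest peak to 1.

    Summing is an assumed aggregation choice, not a detail given in the source.
    """
    total = np.sum(samples, axis=0)
    peak = total.max()
    return total / peak if peak > 0 else total


rng = np.random.default_rng(0)
# Hypothetical category of 5 samples with random non-negative intensities.
samples = rng.random((5, N_SCANS, N_MZ))
surface = aggregate_category(samples)
print(surface.shape, surface.max())
```

Normalizing within each category makes the surfaces comparable in shape, at the cost of hiding absolute differences in signal strength between categories.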
Significance
Elucidating biochemical information from ancient organic-rich sediments, especially the timing of the emergence of photosynthesis relative to the estimated oxygenation of the Earth’s atmosphere, remains a challenging opportunity. To address this question, we analyzed 406 diverse ancient and modern samples and used supervised machine learning to distinguish between samples of biological and abiotic origin and between photosynthetic and non-photosynthetic physiology.
Comparing the training data with organic-rich samples of uncertain affinity points to biogenic samples in sedimentary rocks as old as 3.33 billion years, and to photosynthetic life in rocks as old as 2.52 billion years. The application of supervised machine learning therefore approximately doubles the period over which fossil organisms can be shown to retain molecular information about evolutionary relationships and physiology.
Abstract
Throughout Earth’s history, organic molecules of both abiotic and biological origin have been buried in sedimentary rocks. Most of these organic molecules have been significantly altered by geological processes over time.
Nevertheless, the nature and distribution of these ancient fragmentary organic remains have the potential to reveal diagnostic biomolecular information after billions of years of burial. Here, we used pyrolysis gas chromatography and mass spectrometry to analyze 406 fossil, modern, meteorite, and synthetic samples.
We explored these analytical data via supervised machine learning methods to identify samples of biogenic and abiotic origin, plant-animal phylogenetic affinities, and photosynthetic and non-photosynthetic physiology.
We divided 272 samples with known phylogenetic affinities and physiology into nine categories, each of which was further split into a 75% training set and a 25% test set. Random forest models trained on these data accurately predict pairwise assignments for modern versus fossil or meteorite organic matter (100% correct assignment), fossil plant tissue versus meteorite organic matter (97%), modern versus fossil plant tissue (98%), and modern plant versus animal tissue (95%). Pairwise comparisons between fossil biogenic and abiotic samples yielded 93% correct classification, and analysis of modern and ancient photosynthetic and non-photosynthetic samples also yielded 93% correct assignments.
Our analysis shows that molecular biosignatures can survive in ancient fossils, allowing the origin and traits of the source organisms to be identified. Consistent with previous morphological and isotopic inferences, we present evidence for biogenic molecular assemblages in Paleoarchean rocks (3.33 Ga) and photoautotrophy in Neoarchean rocks (2.52 Ga).
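The 75%/25% split and pairwise random forest classification described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic stand-in data, not the study's pipeline: the feature count, class sizes, class separation, and hyperparameters are all assumptions, and the printed accuracy says nothing about the paper's reported figures.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's measurements): two classes of
# 60 samples each with 200 features; class 1 is shifted in the first 10.
n_per_class, n_features = 60, 200
X0 = rng.normal(0.0, 1.0, (n_per_class, n_features))
X1 = rng.normal(0.0, 1.0, (n_per_class, n_features))
X1[:, :10] += 2.0
X = np.vstack([X0, X1])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Hold out 25% of each class for testing, mirroring the 75%/25% split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed hyperparameters; the source does not state the ones used.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Fraction of held-out samples assigned to the correct class.
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Each pairwise comparison in the abstract (for example, fossil biogenic versus abiotic) would correspond to one such two-class model, scored by the fraction of held-out samples it assigns correctly.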
Organic geochemical evidence for life in Archean rocks identified by pyrolysis-GC-MS and supervised machine learning, PNAS (open access)
