BEYOND PREDICTION: GAINING PEDOLOGIC KNOWLEDGE FROM SENSORS AND MACHINE-LEARNING MODELS
pedometrics; soil mapping; soil sensing; proximal sensing; remote sensing; machine-learning; explainable AI; pXRF; Vis-NIR; MIR; soil organic carbon; soil texture.
Soils are heterogeneous and vary greatly through space and time. Soil sensing has recently become popular because it facilitates the assessment of soils and their properties, especially when combined with machine-learning models. This thesis explores the application of proximal and remote sensors to the characterization of soil profiles, the optimization of soil sampling, and the estimation of soil properties. In addition, the reliability of black-box models is assessed using machine-learning interpretation tools. Portable X-ray fluorescence (pXRF), visible and near-infrared diffuse reflectance spectroscopy (Vis-NIR), mid-infrared spectroscopy (MIR), the Nix colorimeter, and Sentinel-2 imagery were used in several studies aiming to characterize soils and estimate their properties. Partial dependence plots, individual conditional expectation (ICE) curves, permutation feature importance, and Shapley values were applied to investigate the reliability of soil texture and organic carbon estimation models. When applied to the characterization of soil profiles, Vis-NIR and MIR identified variations in soil organic matter and mineralogy, whereas pXRF revealed lithologic discontinuities. Temporal trends extracted from Sentinel-2 imagery served as proxies for variations in soil moisture and vegetation, which allowed soil sampling to be optimized. Sampling grids guided by remotely sensed temporal trends and coupled with conditioned Latin hypercube sampling represented soil variability with fewer samples than conventional random and grid sampling strategies. Using 1749 soil samples from five countries, pXRF estimated sand, silt, and clay contents accurately (R² > 0.84), as well as soil organic carbon (R² = 0.74), despite widely differing climates, parent materials, and weathering intensities. Machine-learning model interpretation techniques indicated that the models learned physically sound relationships between covariates and soil texture.
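To make two of the interpretation tools named above concrete, here is a minimal sketch, assuming a fitted regression model exposed as a plain Python function, of how ICE curves and the partial dependence plot (PDP) are computed. Each ICE curve varies one covariate over a grid while holding a single sample's other covariates fixed, and the PDP is the pointwise average of those curves. The names `ice_curves`, `partial_dependence`, and `toy_clay_model`, as well as the covariate values, are hypothetical illustrations and not the models or data used in this thesis.

```python
# Minimal sketch of individual conditional expectation (ICE) curves and the
# partial dependence plot (PDP), which is the average of the ICE curves.
# The model and data are hypothetical placeholders, not the thesis's own.
from typing import Callable, List, Sequence

Row = List[float]


def ice_curves(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: int,
    grid: Sequence[float],
) -> List[List[float]]:
    """One curve per sample: vary `feature` over `grid`, keep other covariates fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)       # copy so the original sample is untouched
            modified[feature] = value  # overwrite only the covariate of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves: Sequence[Sequence[float]]) -> List[float]:
    """PDP value at each grid point is the mean of the ICE curves at that point."""
    n = len(curves)
    return [sum(curve[j] for curve in curves) / n for j in range(len(curves[0]))]


# Hypothetical covariates per sample: [Si, Fe, Al] in arbitrary units.
samples = [[30.0, 2.0, 5.0], [20.0, 5.0, 8.0], [10.0, 8.0, 12.0]]


def toy_clay_model(x: Row) -> float:
    """Hypothetical fitted model: clay rises with Fe and Al and falls with Si."""
    si, fe, al = x
    return 40.0 - 0.5 * si + 2.0 * fe + 1.5 * al


fe_grid = [0.0, 5.0, 10.0]
ice = ice_curves(toy_clay_model, samples, feature=1, grid=fe_grid)
print("PDP for Fe:", partial_dependence(ice))  # → [42.5, 52.5, 62.5]
```

Because this toy model is additive, its ICE curves are parallel; in a real model, crossing or diverging ICE curves would reveal interactions between covariates that the averaged PDP hides.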
Si was mostly related to quartz content in the sand fraction, while Fe and Al acted as fingerprints of clay content. Silt was particularly difficult to model using pXRF data, likely because no characteristic minerals occur exclusively in the silt fraction. Si, Fe, and Al were the most relevant chemical elements for estimating soil texture. Machine-learning interpretation methods are recommended for diagnosing potential inconsistencies in models and for learning about the relationships between covariates and soil properties. Soil sensing has matured considerably and has proven reliable: combined with machine learning, sensors can be applied to soil characterization and mapping, provided that sound modeling techniques are used.
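Permutation feature importance, one of the interpretation tools listed in this abstract, can be illustrated with a short, standard-library-only sketch: shuffle one covariate column, re-predict, and record how much R² drops. Under this sketch's assumptions, `permutation_importance`, `r_squared`, `toy_clay_model`, and the synthetic data are all hypothetical; the thesis's actual implementation (library, scoring metric, number of repeats) is not stated here, and Shapley values are not covered.

```python
# Minimal sketch of permutation feature importance: the mean drop in R² when
# one covariate column is shuffled, breaking its link to the target.
# All names and data here are hypothetical illustrations.
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(
    predict: Callable[[Row], float],
    X: Sequence[Row],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean decrease in R² per covariate after shuffling that covariate's column."""
    rng = random.Random(seed)
    baseline = r_squared(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # permute only covariate j across samples
            X_perm = []
            for row, value in zip(X, column):
                copy = list(row)
                copy[j] = value
                X_perm.append(copy)
            drops.append(baseline - r_squared(y, [predict(r) for r in X_perm]))
        importances.append(mean(drops))
    return importances


# Hypothetical synthetic covariates per sample: [Si, Fe, Al].
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 40), data_rng.uniform(0, 10), data_rng.uniform(0, 15)]
     for _ in range(200)]


def toy_clay_model(x: Row) -> float:
    """Hypothetical model that uses Fe and Al but ignores Si entirely."""
    return 5.0 + 3.0 * x[1] + 2.0 * x[2]


y = [toy_clay_model(x) + data_rng.gauss(0, 1) for x in X]
print(permutation_importance(toy_clay_model, X, y))
```

Because `toy_clay_model` ignores the first covariate, its importance is exactly 0.0, while Fe and Al show positive drops in R². In practice, a large drop flags a covariate the model relies on, which can then be checked against pedologic expectations such as the link between Fe, Al, and clay content.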