Pedologic interpretation of machine-learning models used in digital soil mapping.
Pedometrics, pedology, sand fraction, clay fraction, soil organic carbon, pXRF, explainable AI, interpretable machine learning.
Machine-learning algorithms can model complex soil phenomena from many environmental covariates, but they are hard to interpret. The goal of this work is to use model interpretation methods to make machine-learning models more transparent and explainable, and to learn from the relationships that the models find between environmental covariates and soil properties. The chemical composition of soil samples was measured using portable X-ray fluorescence (pXRF) and combined with topographic attributes to train machine-learning models that estimate sand, clay, and soil organic carbon (SOC) contents. The interpretation methods used include partial dependence plots (PDP), accumulated local effects (ALE), and Shapley values. Preliminary results showed that the total contents of Si, Al, Fe, and K were the main drivers of soil texture and differed substantially between soils derived from gabbro and gneiss, attesting to the influence of parent material even after long and intense weathering. Total Ca contents explained most of the SOC variability, but this relationship cannot be easily generalized because Ca contents are highly influenced by land use. Results so far indicate that machine-learning interpretation methods can make the information produced by models more accessible to the scientific community, increasing their reliability and the trust placed in them, and turning models into sources of pedologic knowledge rather than purely utilitarian prediction tools.
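As a rough illustration of the idea behind a partial dependence plot, the sketch below computes a partial dependence curve by hand: one covariate is forced to each value on a grid for every sample, and the model predictions are averaged. It is a minimal sketch under stated assumptions, not the workflow used in this study. The function `partial_dependence`, the stand-in model `toy_clay_model`, its coefficients, and the sample values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)       # copy so the original sample is untouched
            modified[feature] = value  # overwrite only the covariate of interest
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve


# Hypothetical stand-in for a trained model: clay content as a function of
# [Al, Si] contents. The coefficients are invented for illustration only.
def toy_clay_model(x):
    al, si = x
    return 20.0 + 2.0 * al - 0.5 * si


# Invented samples: each row is [Al, Si].
rows = [[5.0, 20.0], [10.0, 30.0], [15.0, 40.0]]
al_grid = [0.0, 10.0, 20.0]

# Partial dependence of predicted clay on Al (column 0).
pd_al = partial_dependence(toy_clay_model, rows, feature=0, grid=al_grid)
print(list(zip(al_grid, pd_al)))
```

Plotting `pd_al` against `al_grid` would give the PDP for Al. Because the averaging ignores correlations among covariates, it can evaluate the model at unrealistic covariate combinations, which is the limitation that ALE is designed to address.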