Study area
Yulin City (Figure 1) is an inland city at the northernmost tip of China's Shaanxi Province, covering an area of 42,920.2 square kilometers. The terrain descends from highlands in the northwest to lowlands in the southeast, with significant elevation changes. Yulin has a temperate, arid to semi-arid continental monsoon climate with four distinct seasons. The north of the region consists of sandy grassland, and the south of loess hills and gullies. In recent years, alongside rapid economic growth, Yulin's ecological environment has improved significantly. However, because the city lies on the border between the Loess Plateau and the Mu Us Sandy Land, this progress has been accompanied by the challenge of balancing economic and social development with ecological protection. The area is a typical agro-pastoral ecosystem, characterized by high ecological sensitivity and a relatively fragile environment. Harmonizing economic and social development with ecological conservation, and guiding the layout of urban space accordingly, will remain Yulin City's central long-term concern. The region is an important energy and chemical base in China, an important ecological shield in the middle reaches of the Yellow River basin, and a model area for ecological civilization within the Loess Plateau, and therefore plays a unique and multifaceted role in the broader context.

The study area, Yulin City, Shaanxi Province, China: (a) administrative map of China, (b) administrative map of Shaanxi Province, (c) digital elevation map of Yulin City. Maps were plotted with QGIS (https://qgis.org/en/site/, version 3.34).
Data sources and processing
The study draws on diverse data sources, including basic geographic, land resource, natural resource, location, socio-economic, and climate data, all from 2020. Using QGIS 3.34, all layers were standardized to the CGCS2000 coordinate system and unified to a spatial resolution of 90 m × 90 m, yielding 5,298,221 evaluation units. See Table 1 for data sources and processing details. Land use data were reclassified into six categories using QGIS 3.34: cultivated land, forest land, grassland, water bodies, construction land, and unused land. Annual precipitation distribution data were created by spatially interpolating precipitation data using Kriging. Location condition data were generated using Euclidean distance analysis to represent the distance to different road types. The population data were corrected against the 2020 Yulin City Statistical Yearbook following the method of Liu and Hu39. Points of interest (POI) data were converted to continuous spatial data using kernel density analysis.
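The Euclidean distance analysis mentioned above can be illustrated with a minimal brute-force sketch. This is not the QGIS workflow used in the study: the function name `euclidean_distance_grid`, the toy `road_mask`, and the brute-force search are assumptions made for illustration, with only the 90 m cell size taken from the text.

```python
import math

def euclidean_distance_grid(road_mask, cell_size=90.0):
    """Return, for every cell, the distance in metres to the nearest road cell.

    road_mask is a list of rows; a truthy value marks a road cell.
    Brute force, so only suitable for small illustrative grids.
    """
    road_cells = [
        (r, c)
        for r, row in enumerate(road_mask)
        for c, is_road in enumerate(row)
        if is_road
    ]
    if not road_cells:
        raise ValueError("road_mask contains no road cells")
    return [
        [
            min(math.hypot(r - rr, c - rc) for rr, rc in road_cells) * cell_size
            for c in range(len(row))
        ]
        for r, row in enumerate(road_mask)
    ]

# Hypothetical 3 x 3 grid with a single road cell in the top-right corner.
road_mask = [
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
]
dist = euclidean_distance_grid(road_mask)
```

For a grid with millions of cells, as in the study area, a dedicated distance-transform tool would be needed instead of this pairwise search.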
Feature engineering
Following the principles of scientific rigor, representativeness, and accessibility in feature selection, the selected features are listed in Table 2 (Supplementary Figure S1 online). Drawing on previous studies and accounting for the particular ecological dynamics of the Yulin region, we selected a set of 22 features, including important indicators such as soil erosion, NDVI, and NPP. For ecological space, we selected land resource and natural resource data to form the evaluation features3,16,25. For agricultural space, we focused on land resources, water resources, meteorological conditions, and environmental variables17,27. For urban space, which is bounded by defined development parameters, we incorporated basic geographic data (topography and slope), socio-economic indicators derived from a point of interest (POI) dataset, and climate data to form the evaluation features19,20.
Model descriptions
(1) Logistic regression (LR): A linear model for classification problems. It models the probability of an event by expressing the logarithm of the odds (log odds) as a linear combination of one or more independent variables, and performs well when the underlying relationships are linear32.
(2) Naive Bayes (NB): A simple and effective classification algorithm based on Bayes' theorem. It assumes that features are independent of one another and assigns each instance to the category with the highest probability. It is relatively simple, suits large datasets, and adapts well to high-dimensional feature spaces22.
(3) Gradient Boosted Decision Tree (GBDT): An ensemble learning method that improves model performance by iteratively training decision trees using gradient boosting. It can be applied to regression and classification problems and captures nonlinear relationships in the data. Its strong fitting capability yields good performance even on complex datasets33.
(4) Random Forest (RF): An ensemble learning method that reduces overfitting by training multiple decision trees and aggregating their votes. RF is used for both classification and regression and performs well on high-dimensional and large-scale datasets21,41.
(5) Artificial Neural Network (ANN): A model that performs complex nonlinear modeling with multiple layers of neurons. It suits tasks such as large-scale data handling, image processing, and natural language processing, and has strong fitting capability5,42,43.
(6) Self-Attention Residual Neural Network (SARes-NET)
We designed a deep neural network structure based on the characteristics of the data, as shown in Figure 2. The model uses 22 features: basic geographic data such as topography and slope; land resource data such as soil erosion and soil texture; natural resource data such as land use, NPP, NDVI, and rainfall; location data such as proximity to roads and rivers; socio-economic data such as population density, night-time lights, and the density of various commercial and service points; and meteorological data such as annual average temperature and humidity (Supplementary Figure S1). These features form the feature vector that is input to the network. The network first raises the feature dimensionality to 100 through fully connected layers. It then performs nonlinear computation through two self-attention residual modules to obtain feature vectors with higher-level semantic information. Next, the vector dimension is reduced to 50 and the features are densified by a set of fully connected layers and another self-attention residual module. The resulting 50-dimensional feature vector is passed through a fully connected layer to calculate scores for the three land use categories. Finally, a softmax function converts these scores into predicted probabilities for the ecological, agricultural, and urban land use categories, which sum to 1.
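The final step of the network, converting three class scores into probabilities that sum to 1, can be sketched in plain Python. This covers only the softmax step, not the SARes-NET layers themselves; the score values and the `softmax` helper below are made up for illustration and do not come from the study.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the resulting probabilities.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output scores of the last fully connected layer for one grid cell.
scores = {"ecological": 2.0, "agricultural": 0.5, "urban": -1.0}
probs = dict(zip(scores, softmax(list(scores.values()))))
predicted = max(probs, key=probs.get)
```

Here `predicted` is the category with the highest probability, which is how a single label would typically be read off the three outputs.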

Diagram of the Self-Attention Residual Neural Network (SARes-NET). The network uses 22 features to form a feature vector, increases the feature dimension to 100, applies two self-attention residual modules for nonlinear computation, and then reduces the dimension to 50. The densified feature vector is processed by a fully connected layer to calculate scores for the three land use categories, which a softmax function converts into predicted probabilities.
Experimental design
This study integrates diverse geospatial data to assess the suitability of ecological, agricultural, and urban (EAU) spaces (see Supplementary Figure S1 online). Evaluation indicators were selected first, and the data were then preprocessed to construct a comprehensive sample set. To evaluate the EAU space, six models were compared and validated: ANN, GBDT, LR, NB, RF, and SARes-NET. The best-performing model identified through this validation was then used for the detailed evaluation of the EAU space, as shown in Figure 3.

Workflow of the proposed method. SARes-NET was compared with five other models, and the best model was selected for the spatial assessment of the three spaces.
Building the dataset
(1) Multi-source spatial data aggregation: All data are aggregated onto a uniform grid with a spatial resolution of 90 m × 90 m to construct a feature matrix. The entire study area is subdivided into 5,298,221 units.
(2) EAU spatial sample division: Ecological protection red lines, nature reserves, and drinking water source areas are important ecological areas designated by the Chinese government. In this study, these areas serve as ecological space samples (1,052,712). Agricultural space samples (983,306) were selected within permanent basic farmland, and urban space samples (60,961) were selected within the city's existing built-up area, which is considered to have limited potential for change.
(3) Dataset partitioning based on stratified sampling: To keep the training and test samples consistent in class composition, we use stratified sampling with a 7:3 ratio to partition the samples into training and test sets. The training set is used to train the model, and the test set is used to validate the model and evaluate its performance.
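The 7:3 stratified partition in step (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper `stratified_split`, the fixed seed, and the toy label counts are hypothetical, and only the 7:3 ratio and the three class names come from the text.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train and test sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Each class contributes the same fraction to the training set,
        # so the class proportions are preserved in both subsets.
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test

# Toy labels; the real sample counts are far larger and more imbalanced.
labels = ["ecological"] * 100 + ["agricultural"] * 90 + ["urban"] * 10
train_idx, test_idx = stratified_split(labels)
```

Splitting per class matters here because urban samples are much rarer than the other two classes, so a purely random split could leave too few of them in the test set.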
Model evaluation metrics
To evaluate model performance comprehensively, this study combines the confusion matrix with the ROC curve and the area under it (AUC) to assess classification performance44 (see Supplementary Methods). From the confusion matrix, the accuracy (ACC), precision (PRE), recall (REC), and kappa coefficient of each model were calculated and compared45. The specific meaning of these indicators is detailed in the Supplementary Methods.
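As a rough illustration of how these confusion-matrix metrics relate to one another, the sketch below computes accuracy, per-class precision and recall, and Cohen's kappa for a toy three-class example. The function names and the toy labels are assumptions, and the per-class (rather than averaged) reporting is a choice made here; the exact definitions used in the study are those in its Supplementary Methods.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def summarize(matrix):
    """Return accuracy, per-class precision, per-class recall, and kappa."""
    k = len(matrix)
    n = sum(map(sum, matrix))
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    acc = sum(matrix[i][i] for i in range(k)) / n
    precision = [matrix[j][j] / col_tot[j] if col_tot[j] else 0.0 for j in range(k)]
    recall = [matrix[i][i] / row_tot[i] if row_tot[i] else 0.0 for i in range(k)]
    # Agreement expected by chance, from the row and column marginals.
    # A degenerate case with expected == 1 is not handled in this sketch.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (acc - expected) / (1 - expected)
    return acc, precision, recall, kappa

classes = ["E", "A", "U"]  # ecological, agricultural, urban
y_true = ["E", "E", "E", "E", "A", "A", "A", "U", "U", "U"]
y_pred = ["E", "E", "E", "A", "A", "A", "U", "U", "U", "E"]
matrix = confusion_matrix(y_true, y_pred, classes)
acc, precision, recall, kappa = summarize(matrix)
```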
Implementation details
The model was trained on a 64-bit Windows 10 operating system using the Python programming language, the PyTorch 1.12.0 deep learning framework, and CUDA 11.3. The GPU was an NVIDIA RTX A4000 (16 GB VRAM), the CPU was an 8-core, 16-thread Intel(R) Xeon(R) W-2245 CPU @ 3.90 GHz, and the total memory size was 64 GB.
Mutual information method
Rooted in information theory, mutual information (MI) was first introduced by Claude Shannon in 1948 in his landmark paper “A Mathematical Theory of Communication”46, although he did not explicitly call it “mutual information”. The term was later coined by Robert Fano47. MI quantifies the amount of information shared between two random variables. Mathematically, MI is the Kullback-Leibler divergence between the joint probability distribution of two random variables and the product of their marginal probability distributions. In practice, MI is used extensively across domains, especially in machine learning and data mining. It is often used as a feature selection method: the importance of each feature is evaluated by measuring the mutual information between that feature and the target variable, and the most relevant features are selected18,48. The calculation formula is as follows.
$$\mathrm{MI}(x_i; y) = \sum_{x_i} \sum_{y} p(x_i, y) \log \frac{p(x_i, y)}{p(x_i)\, p(y)}$$
(1)
In the formula, \(p(x_i, y)\) is the joint probability distribution of the two variables \(x_i\) and \(y\), while \(p(x_i)\) and \(p(y)\) are the marginal probability distributions of \(x_i\) and \(y\), respectively. Here, \(x_i\) is the i-th input feature and \(y\) is the class label.
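Equation (1) can be evaluated directly from empirical frequencies when both variables are discrete. The sketch below is an illustration under that assumption: the function name `mutual_information` and the toy data are hypothetical, continuous features would first need to be discretized or handled by a dedicated estimator, and the natural logarithm is used, so the result is in nats.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate MI(x; y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts for p(x, y)
    px = Counter(xs)               # counts for p(x)
    py = Counter(ys)               # counts for p(y)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

labels = [0, 0, 1, 1]
informative = ["a", "a", "b", "b"]    # determines the label exactly
uninformative = ["a", "b", "a", "b"]  # independent of the label

mi_informative = mutual_information(informative, labels)
mi_uninformative = mutual_information(uninformative, labels)
```

A feature that fully determines the label reaches the entropy of the label (ln 2 here), while an independent feature scores zero, which is the property that makes MI usable for ranking features.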