Fine-tuned multimodal large language model for autonomous state cognition system of shape-recognition 6-bar tensegrity integrated with flexible sensors

The sections encompass the structure’s performance, sensor’s performance, machine learning methods, alarm system, fine-tuned LLM, and fault diagnosis, collectively addressing the integration of structural analysis, sensing capabilities, and AI-driven diagnostic methods for tensegrity systems.

Structure’s performance

To evaluate the functional performance of the tensegrity structure, we constructed a model of a six-strut tensegrity system and developed an experimental platform comprising six bars, 24 sensors, and two Arduino Mega microcontrollers (Fig. 2a, b). The data acquisition system employs an Arduino Mega 2560 microcontroller for synchronized signal collection. Strain sensors are distributed symmetrically across representative tension members according to a structural observability principle, ensuring sufficient geometric reconstruction constraints while minimizing redundancy. The sampling frequency is set to 10 Hz, which is sufficient to capture the dynamic structural responses under driving forces and external disturbances. We systematically measured deformation displacements, rotational angles, strain variations, and torsional angles during dynamic testing of the 6-bar tensegrity structure under distinct kinematic loading modes. Specifically, the platform’s single-cycle compression response exhibited characteristic nonlinear behavior in the external force-displacement relationship (Fig. 2c). During the loading process, the structure exhibited negligible deformation under increasing external force until a critical threshold was reached, at which point the response transitioned distinctly to a stiffening regime due to geometric realignment and internal force redistribution. Pronounced hysteresis was observed during the unloading process: at a displacement of 10 mm, the structure required only 13 N during unloading, compared with 32 N during loading. This quantitatively illustrates energy dissipation induced by internal friction and geometric nonlinearity. Moreover, the relationship between stress and rotation angle exhibited a quasi-linear correlation within the strain magnitude range of 0–0.5 (Fig. 2d).
Beyond this threshold, the compressive members reached their elastic buckling limit, transitioning the system into a tension-dominant deformation regime in which the rotational angles asymptotically approached 80° with progressive stabilization. This nonlinear phase arises from prestress redistribution in tension cables, forming a spatial truss configuration that enforces kinematic constraints through three-dimensional self-locking mechanisms. As the applied load increases, the corresponding strain intensifies throughout the period of tensile deformation. Specifically, when the load reaches 25 N, the strain attains a magnitude of 0.3. However, as the load continues to increase, the rate of strain accumulation gradually decelerates, likely due to the intrinsic tensile characteristics (Fig. 2e). This trend is further reflected in the overall shear angle of the tensegrity structure, as illustrated in Fig. 2f. The shear angle exhibits a linear increase with escalating load, reaching a maximum value of 22° at an applied load of 47 N.
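The energy dissipated in one loading cycle corresponds to the area enclosed between the loading and unloading force-displacement curves. The following is a minimal sketch of that calculation, not the authors' procedure. Only the 32 N and 13 N values at 10 mm come from the text; every other sample point is hypothetical and chosen purely for illustration.

```python
def trapezoid_area(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


# Hypothetical force-displacement samples; only the 10 mm values are from the text.
displacement_mm = [0.0, 5.0, 10.0, 15.0]
loading_force_n = [0.0, 20.0, 32.0, 40.0]
unloading_force_n = [0.0, 6.0, 13.0, 40.0]

work_in = trapezoid_area(displacement_mm, loading_force_n)      # N*mm
work_out = trapezoid_area(displacement_mm, unloading_force_n)   # N*mm
dissipated_mj = work_in - work_out  # 1 N*mm equals 1 mJ
```

With measured curves in place of the hypothetical samples, `dissipated_mj` gives the hysteresis loss per cycle.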

**Fig. 2: Performance validation of 6-bar tensegrity structure.**

Sensor’s performance

In this study, we investigated the performance and characteristics of the strain sensor under dynamic conditions. Figure 3a illustrates the fitting strategy for the relationship between relative resistance variation and strain in the flexible sensor fabricated from diene rubber under normal, bending, and stretching conditions. When the sensor is in its normal state, the relative resistance can be expressed as follows:

$$\frac{\Delta R}{R}=\frac{R_{i}-R_{0}}{R_{0}},\quad i=1,2 \tag{6}$$

**Fig. 3: Performance and characteristics of strain sensor under dynamic conditions.**

Both the strain characteristics and the relative resistance variation change accordingly when the sensor undergoes deformation. In the bending condition, a polynomial function fitting approach is employed to describe the relative resistance variation. In the stretching condition, a machine learning-based estimation method is utilized to construct a predictive model of the resistance change. Figure 3b demonstrates the relative resistance characteristics of the sensor under a compressive strain ranging from −0.9 to 0 mm, at a bending rate of 50 mm/min. The sensor demonstrates consistent responsiveness across varying levels of compressive strain; specifically, as the strain increases, the relative resistance variation also increases. A polynomial fitting method was applied to the experimental data, yielding a coefficient of determination (R²) of 0.99. This high R² value indicates that the fitted regression function accurately represents the observed data, thereby validating the applicability of the numerical model.
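The relative resistance of Eq. (6) and the polynomial fit with its coefficient of determination can be sketched as follows. This is an illustrative example on synthetic data. The quadratic ground truth, the baseline resistance `r0`, and the polynomial degree are assumptions, since the text does not state the degree or the raw values.

```python
import numpy as np


def relative_resistance(r, r0):
    """Eq. (6): (R_i - R_0) / R_0."""
    return (np.asarray(r, dtype=float) - r0) / r0


def fit_polynomial(strain, dr_over_r, degree):
    """Least-squares polynomial fit; returns the coefficients and R^2."""
    coeffs = np.polyfit(strain, dr_over_r, degree)
    predicted = np.polyval(coeffs, strain)
    ss_res = np.sum((dr_over_r - predicted) ** 2)
    ss_tot = np.sum((dr_over_r - np.mean(dr_over_r)) ** 2)
    return coeffs, 1.0 - ss_res / ss_tot


# Synthetic stand-in for the measured bending data (assumed quadratic response).
strain = np.linspace(-0.9, 0.0, 10)
r0 = 100.0
resistance = r0 * (1.0 + 0.5 * strain**2 - 0.2 * strain)

coeffs, r2 = fit_polynomial(strain, relative_resistance(resistance, r0), degree=2)
```

On real measurements, `r2` plays the role of the reported R² and the degree would be chosen by comparing fits.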

We investigate the influence of environmental conditions on the performance of the sensor. Figure 3c shows the effect of temperature on the sensor’s resistance. Within the temperature range of 20–80 °C, there exists a significant linear relationship between temperature and sensor resistance. Similarly, Fig. 3d shows that the relative resistance variation exhibits a consistent increasing trend with rising humidity from 20 to 90%. The sensor also demonstrates stable performance over more than 540 cycles of tensile strain. During the initial cycles (0–30) and the final cycles (510–540), no significant decrease is observed in the fluctuation range of the relative resistance variation, indicating excellent long-term durability (Fig. 3e). Notably, the fabricated device exhibits highly consistent conductive periodicity even under large deformations.

Machine learning methods

Accurate model fitting serves as a fundamental prerequisite for establishing robust and reliable simulation platforms. Here, we systematically introduced, scrutinized, and comparatively evaluated four representative models: polynomial regression, residual neural networks (ResNets), sparse identification of nonlinear dynamics (SINDy), and long short-term memory (LSTM) in terms of their fitting principles and predictive capabilities, using mean squared error (MSE) and computational time as key evaluation metrics. Polynomial regression captures nonlinear data relationships by constructing higher-order polynomial terms of input features, thereby embedding nonlinear patterns within a linear regression framework. Its fundamental principle involves expanding the feature space by incorporating polynomial terms, effectively transforming an inherently nonlinear problem into a linear one within a higher-dimensional space (Fig. 4a). At low polynomial degree, the model fails to capture complex data patterns, resulting in a relatively modest fitting score of 0.933 and a relatively high MSE of 14.28 × 10⁻³ (Fig. 4b). As the polynomial degree n increases, the model’s fitting performance improves, with the score rising from 0.933 to 0.966 and the MSE decreasing by 49.36%.
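The feature-expansion idea can be sketched as follows: the input is expanded into powers of itself, and an ordinary linear least-squares solve is applied to the expanded features. This is a minimal illustration, with a synthetic sine curve as the target and arbitrarily chosen degrees, not the sensor data or settings from the study.

```python
import numpy as np


def polynomial_features(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def fit_and_score(x, y, degree):
    """Linear least squares on the expanded features; returns weights and MSE."""
    features = polynomial_features(x, degree)
    weights = np.linalg.lstsq(features, y, rcond=None)[0]
    mse = float(np.mean((features @ weights - y) ** 2))
    return weights, mse


# Synthetic nonlinear target (assumption for illustration only).
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x)

mse_by_degree = {degree: fit_and_score(x, y, degree)[1] for degree in (1, 3, 5)}
```

Because each lower-degree model is nested in the higher-degree one, the training MSE cannot increase with degree, which matches the trend described above. Held-out data would still be needed to detect overfitting.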

**Fig. 4: Performance analysis of machine learning estimation methods for fitting sensor relative resistance change models.**

ResNets address the vanishing/exploding gradient problem in deep network training by introducing skip connections to construct residual blocks. This approach enables layer-wise capture of high-order nonlinear features while maintaining stable gradient flow, thereby granting the model robust nonlinear fitting capabilities (Fig. 4c). The relationship between training epochs and mean squared error (MSE) reveals two distinct phases (Fig. 4d). In the under-trained phase (fewer epochs), the model exhibits suboptimal alignment between training and test data, with an MSE of 63.24 × 10⁻³, indicating substantial errors. As training progresses, the MSE declines by 96.83%, and predictions on both training and test datasets converge to nearly identical patterns, signifying effective generalization and the attenuation of overfitting.

SINDy seeks to identify sparse governing equations of a system from data, employing the minimal set of nonlinear terms necessary to accurately characterize its evolutionary dynamics (Fig. 4e). It demonstrates pronounced limitations when applied to fitting sensor relative resistance change models (Fig. 4f). The fitted model’s MSE consistently exceeds 0.4 across varying rates, a level deemed unacceptably high for practical applications. This limitation originates from the derivative-dependent nature of SINDy: numerical differentiation is inherently sensitive to noise, and the presence of input noise in sensor signals likely amplifies errors during dynamics reconstruction, thereby compromising the model’s fitting accuracy.
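The sparse-regression step at the core of SINDy can be illustrated with sequentially thresholded least squares on a toy system. This is a sketch under stated assumptions: the dynamics dx/dt = −2x, the candidate library [1, x, x²], and the threshold of 0.1 are all chosen for illustration and are unrelated to the sensor model in the study.

```python
import numpy as np


def stlsq(library, derivatives, threshold, iterations=10):
    """Sequentially thresholded least squares: zero small terms, refit the rest."""
    coeffs = np.linalg.lstsq(library, derivatives, rcond=None)[0]
    for _ in range(iterations):
        small = np.abs(coeffs) < threshold
        coeffs[small] = 0.0
        active = ~small
        if not active.any():
            break
        coeffs[active] = np.linalg.lstsq(library[:, active], derivatives, rcond=None)[0]
    return coeffs


# Noise-free toy trajectory of dx/dt = -2x (assumed system, not the sensor data).
t = np.linspace(0.0, 2.0, 201)
x = np.exp(-2.0 * t)
dx_dt = np.gradient(x, t, edge_order=2)  # numerical derivative, the noise-sensitive step

library = np.column_stack([np.ones_like(x), x, x**2])  # candidate terms 1, x, x^2
coeffs = stlsq(library, dx_dt, threshold=0.1)
```

On this clean signal only the `x` term survives, with a coefficient close to −2. Adding measurement noise to `x` before the `np.gradient` call degrades the recovered coefficients, which is the failure mode described above.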

We compared the MSE and computational efficiency of the four machine learning estimation methods for model fitting (Fig. 4g). Polynomial regression achieves rapid computation (∼2 s) but with a relatively high MSE of 0.00771, underscoring its limitations in nonlinear time series fitting. In contrast, ResNets balance accuracy and efficiency, attaining an MSE of 0.002 with moderate computation time (∼6 s). However, their susceptibility to the vanishing gradient problem impairs predictive accuracy for long-term dependencies, resulting in inferior performance to LSTM. Notably, SINDy performs anomalously, exhibiting an MSE of 0.414676, nearly 400 times that of LSTM, together with the longest computation time (∼11 s). This outcome suggests that the complex nonlinear interactions inherent in real-world systems often surpass the representational capacity of predefined basis function libraries, hindering the effective capture of system dynamics via sparse differential equations. The LSTM model achieves the lowest MSE of 0.0011, markedly outperforming the other methods. While its computational time (∼7 s) slightly exceeds that of ResNets, this difference is acceptable in our applications.

Alarm system

We developed an alert system to improve the discernibility of the tensegrity robot’s operational states. Real-time resistance signals are obtained by directly measuring the inter-nodal distances. These resistance variation signals are processed using polynomial fitting and LSTM models to reconstruct bending and stretching states with high fidelity. We achieved synchronized state reconstruction of the tensegrity robot in both virtual and physical environments through the deployment of Arduino and Python¹². Subsequently, data from fault detection are transmitted via two Arduino units to an LED control board to simulate emergency alerts, such as activating LEDs to notify users when sensor disconnections occur. Finally, we relay the monitoring data to mobile devices via Wi-Fi using a Python Flask server (Fig. 5a).

We devised a differentiation strategy to assess the efficacy of the alarm system. Specifically, the system was configured to activate a single LED upon disconnection of the A12B21 link, whereas the disconnection of the B21C12 link triggered the activation of two LEDs (Fig. 5b). Figure 5c presents the quantified transmission and computational delays observed during sensor failure scenarios. In the experiments, manually disconnecting two tendons from the structure resulted in prolonged computational time for LED activation. We systematically measured time intervals across three critical stages: (1) disconnection detection by the Arduino Mega, (2) data reception by the Python processing module, and (3) final LED activation. Each of the 24 conductive tendons undergoes parallel computational workflows for state monitoring. However, the cumulative processing load increases linearly with the number of tendons, thereby substantially extending the overall computation time. The Arduino Mega’s serial interface introduces delays due to its limited bandwidth, which necessitates the sequential processing of both incoming sensor data and outgoing computational outputs.
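The link-to-LED differentiation strategy amounts to a lookup from the disconnected tendon to an alert pattern. A minimal sketch follows. The open-circuit threshold, the function name, and the dictionary layout are hypothetical. Only the two link names and their LED counts come from the text.

```python
# LED counts per link as described in the text.
LED_COUNT_BY_LINK = {"A12B21": 1, "B21C12": 2}

# Assumed resistance above which a tendon is treated as disconnected.
OPEN_CIRCUIT_OHMS = 1.0e6


def alarm_pattern(resistances_ohm):
    """Map each disconnected link to the number of LEDs that should light up."""
    return {
        link: LED_COUNT_BY_LINK.get(link, 0)
        for link, resistance in resistances_ohm.items()
        if resistance >= OPEN_CIRCUIT_OHMS
    }


# Hypothetical readings: the first link is open, the second is intact.
pattern = alarm_pattern({"A12B21": 2.0e6, "B21C12": 500.0})
```

In a deployment along the lines described, the resulting pattern would be sent back over the serial link to the LED control board.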

To enhance remote monitoring capabilities, we integrated a router with a Python Flask server and developed a mobile-accessible web interface (Fig. 5d). The mobile application was implemented as a simple, responsive web page. The testing workflow is as follows: First, sensor data is collected by the Arduino Mega and transmitted to the Python Flask server. Subsequently, a pre-configured web interface on mobile devices retrieves the resistance data via Wi-Fi connectivity. Finally, the resistance values on the web interface are automatically refreshed upon sensor state changes (Supplementary Video S1).

Fine-tuned LLM

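The system architecture for shape-recognition and fault diagnosis of our tensegrity structure based on a fine-tuned LLM is shown in Fig. 6. The architecture primarily consists of the following parts, described in turn below: tokenization of the time-series sensor signals, patch-based embedding of structural images, a modal alignment adapter, a pretrained LLM backbone fine-tuned with LoRA, and an autoencoder-based anomaly detection stage.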

Specifically, the time-series resistance signals from the 24 flexible sensors are first segmented into fixed-length temporal windows. Each window is treated as a one-dimensional temporal patch. A linear projection layer then maps each patch into a fixed-dimensional embedding vector. These embedding vectors are concatenated sequentially to form time-series tokens, which are subsequently fed into the multimodal fusion module. Positional encoding is applied along the temporal dimension to preserve chronological dependencies before fusion with textual and image embeddings.
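The windowing, projection, and positional-encoding steps can be sketched as follows. This is an illustration under assumptions: the patch length of 16, the embedding width of 32, per-channel patching, sinusoidal encodings, and the random projection matrix (a stand-in for a learned layer) are not specified in the text.

```python
import numpy as np


def sinusoidal_positions(n_positions, d_model):
    """Standard sinusoidal positional encoding (d_model assumed even)."""
    positions = np.arange(n_positions)[:, None]
    dims = np.arange(0, d_model, 2)[None, :]
    angles = positions / np.power(10000.0, dims / d_model)
    encoding = np.zeros((n_positions, d_model))
    encoding[:, 0::2] = np.sin(angles)
    encoding[:, 1::2] = np.cos(angles)
    return encoding


def tokenize(signals, patch_len, projection):
    """Turn (channels, samples) signals into a (channels * patches, d_model) token array."""
    n_channels, n_samples = signals.shape
    n_patches = n_samples // patch_len  # trailing samples that do not fill a window are dropped
    patches = signals[:, : n_patches * patch_len].reshape(n_channels, n_patches, patch_len)
    tokens = patches @ projection  # linear projection of each temporal patch
    tokens = tokens + sinusoidal_positions(n_patches, projection.shape[1])
    return tokens.reshape(n_channels * n_patches, -1)


rng = np.random.default_rng(0)
signals = rng.normal(size=(24, 100))     # 24 sensors, 10 s at 10 Hz (synthetic values)
projection = rng.normal(size=(16, 32))   # stand-in for a learned projection layer
tokens = tokenize(signals, patch_len=16, projection=projection)
```

Here 100 samples yield six complete windows per sensor, so the 24 channels produce 144 tokens of width 32.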

For each RGB structural image, we adopt a patch-based embedding strategy similar to Vision Transformer architectures. The image is divided into fixed-size spatial patches, and each patch is flattened and linearly projected into an embedding vector. Spatial positional encodings are added to preserve geometric layout information of the tensegrity structure. These image tokens are then aligned dimensionally with the time-series tokens before multimodal fusion.
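Patch-based image embedding can be sketched in a few lines of array manipulation. The image size, patch size, embedding width, and random matrices below are assumptions. In a trained model the projection and positional embeddings would be learned parameters.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping square patches."""
    height, width, channels = image.shape
    rows, cols = height // patch_size, width // patch_size
    cropped = image[: rows * patch_size, : cols * patch_size]
    grid = cropped.reshape(rows, patch_size, cols, patch_size, channels)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, channels)
    return grid.reshape(rows * cols, patch_size * patch_size * channels)


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))                                # synthetic RGB image
patches = image_to_patches(image, patch_size=16)               # (16, 768)
projection = rng.normal(size=(patches.shape[1], 32))           # stand-in for a learned projection
position_embedding = rng.normal(size=(patches.shape[0], 32))   # stand-in for spatial positions
image_tokens = patches @ projection + position_embedding       # (16, 32)
```

Choosing the same embedding width as the time-series tokens is what allows the two token streams to be concatenated before fusion.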

Then, a lightweight modal alignment adapter, the “Patch Re-programmer”³³, converts the heterogeneous embeddings (time-series and image tokens) into the semantic token space of the pre-trained large language model. This adapter ensures compatibility between physical sensor embeddings and text embeddings through dimensionality projection and feature normalization.

After tokenization and embedding, text prompt tokens, time-series tokens, and image tokens are concatenated into a unified token sequence. The pretrained LLM backbone processes this sequence via self-attention, enabling cross-modal interaction between structural geometry, temporal dynamics, and linguistic instructions.

To specialize the model for domain-specific diagnostic tasks, we adopt Low-Rank Adaptation (LoRA). During fine-tuning, only low-rank adapter matrices are updated, while the original backbone weights remain frozen. This parameter-efficient strategy allows the model to learn tensegrity-specific structural reasoning without overfitting or incurring high computational cost.
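The LoRA update can be illustrated on a single linear layer: the frozen weight W is left untouched and a low-rank product scaled by alpha / r is added to its output. The sketch below covers only the forward pass and parameter count. The layer size, rank, and alpha are illustrative assumptions, not the settings used in the study.

```python
import numpy as np


class LoRALinear:
    """Linear layer with a frozen weight and a trainable low-rank update."""

    def __init__(self, frozen_weight, rank, alpha, rng):
        out_dim, in_dim = frozen_weight.shape
        self.weight = frozen_weight                          # frozen backbone weight
        self.a = rng.normal(0.0, 0.01, size=(rank, in_dim))  # trainable, small random init
        self.b = np.zeros((out_dim, rank))                   # trainable, zero init
        self.scale = alpha / rank

    def forward(self, x):
        # Equivalent to x @ (W + scale * B @ A).T without forming the full update.
        return x @ self.weight.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_parameters(self):
        return self.a.size + self.b.size


rng = np.random.default_rng(0)
frozen = rng.normal(size=(64, 64))
layer = LoRALinear(frozen, rank=4, alpha=8.0, rng=rng)
x = rng.normal(size=(3, 64))
output = layer.forward(x)
```

Because `b` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only 512 adapter parameters would be trained here instead of the 4096 in the full weight.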

In the anomaly detection stage, an autoencoder is trained on normal operational data to minimize the mean squared reconstruction error. During inference, reconstruction errors are used as anomaly scores. Inter-feature normalization is applied to remove scale discrepancies across sensor channels, and statistical thresholds combined with domain-specific frequency criteria are used to categorize anomalies into sensor interference, sensor breakage, rod deformation, or normal state.
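Reconstruction-error scoring can be sketched with a linear autoencoder obtained from a truncated SVD, which is a stand-in for the trained network rather than the model used in the study. The synthetic two-factor data, the mean-plus-three-standard-deviations threshold, and the injected fault are all assumptions for illustration.

```python
import numpy as np


def fit_linear_autoencoder(normal_data, n_components):
    """Per-channel normalisation plus a PCA basis acting as encoder and decoder."""
    mean = normal_data.mean(axis=0)
    std = normal_data.std(axis=0) + 1e-12
    z = (normal_data - mean) / std
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[:n_components]


def anomaly_scores(data, mean, std, components):
    """Mean squared reconstruction error per sample in normalised units."""
    z = (data - mean) / std
    reconstructed = (z @ components.T) @ components
    return np.mean((z - reconstructed) ** 2, axis=1)


# Synthetic normal operation: 24 channels driven by two latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 24))
normal_data = latent @ mixing + 0.01 * rng.normal(size=(500, 24))

mean, std, components = fit_linear_autoencoder(normal_data, n_components=2)
normal_scores = anomaly_scores(normal_data, mean, std, components)
threshold = normal_scores.mean() + 3.0 * normal_scores.std()  # assumed statistical threshold

# Hypothetical fault: one channel jumps by ten standard deviations.
faulty = normal_data[:1].copy()
faulty[0, 5] += 10.0 * std[5]
is_anomalous = anomaly_scores(faulty, mean, std, components)[0] > threshold
```

Assigning a flagged sample to one of the four states would need the additional frequency-domain criteria mentioned above, which this sketch does not attempt.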

Fault diagnosis

We systematically compared the pretrained LLM and the LoRA-tuned LLM across three dimensions: training cycles, prompt length, and text generation quality. Figure 7a shows the relationship between training cycles and perplexity. Both models exhibit significant reductions in perplexity and ultimately converge to comparable levels, indicating analogous model capacities. Notably, the LoRA-tuned LLM shows a sharper perplexity decline early in training, suggesting that its low-rank adapters expedite convergence through parameter-efficient fine-tuning that reduces computational redundancy. We then compare BLEU scores under varying prompt lengths (Fig. 7b). Both models exhibit statistically consistent trends in BLEU score as context length increases, indicating that the parameter-efficient fine-tuning approach employed by the LoRA method effectively preserves the model’s adaptability to long-text generation. The marginally higher BLEU scores achieved by the LoRA-tuned LLM on extended sentences, in comparison to the pretrained LLM, suggest that LoRA may enhance the coherence of long texts through targeted fine-tuning. The final evaluation compares text quality across fluency, alignment with human expression, and logicality (Fig. 7c, Supplementary Materials Fig. S1). In terms of fluency, the slightly higher scores of the LoRA-tuned LLM suggest that adapter tuning may optimize vocabulary distributions, thereby reducing grammatical errors. Regarding alignment with human expression, the minor improvement of the LoRA-tuned LLM may result from leveraging human feedback data to enhance naturalness. Most notably, the LoRA-tuned LLM exhibits substantial gains in logicality, indicating that low-rank adapters effectively reinforce contextual reasoning capabilities, thereby substantiating the efficacy of task-specific fine-tuning.
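Perplexity, the metric tracked against training cycles, is the exponential of the mean negative log-likelihood per token. The following minimal sketch uses made-up token probabilities. The function name and inputs are hypothetical and not tied to any particular training framework.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability assigned to each reference token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical example: a model that assigns probability 0.25 to every reference token.
uniform_log_probs = [math.log(0.25)] * 4
uniform_perplexity = perplexity(uniform_log_probs)
```

A model that spreads probability evenly over four choices at every step has a perplexity of four, so a falling curve means the model is becoming less uncertain about the reference text.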

Additionally, we conducted an analysis of the proposed monitoring system’s overall accuracy under varying input modalities. The results indicate that all three input modalities exhibit progressive accuracy improvements as training epochs increase (Fig. 7d–f, Supplementary Materials Fig. S2). During the initial training phase (0–10 epochs), all input modalities demonstrate a rapid increase in accuracy. However, in the mid-phase training (10–40 epochs), heightened accuracy volatility emerges in unimodal systems, particularly evident in image-based models. This finding suggests that the learning processes of unimodal systems may be more susceptible to interference. In the final training phase (≥40 epochs), all models demonstrate convergence. Sensor data input achieves an accuracy of 0.88–0.90 (Fig. 7d), image data input stabilizes between 0.80 and 0.85 (Fig. 7e), while multimodal integration input attains approximately 0.95 accuracy (Fig. 7f).

We classify fault states into four categories: sensor interference, sensor breakage, rod deformation, and normal. Confusion matrices are provided for monitoring systems trained on three distinct input data types, illustrating their classification performance across these states (Fig. 7g–i). Figure 7g presents the confusion matrix for the diagnostic strategy using only sensor data input. The matrix reveals classification accuracy rates of 91% for Class I (sensor interference), 89% for Class II (sensor breakage), 75% for Class III (rod deformation), and 82% for Class IV (normal). In comparison, Fig. 7h demonstrates that the image-based monitoring strategy achieves significantly lower classification confidence for Class I and II states (91% → 71% and 89% → 71%, respectively) than the sensor-based approach. However, it shows improved confidence levels for Class III (75% → 84%). This discrepancy may be attributed to the inherent limitations of image recognition techniques. Classes I and II primarily involve internal structural faults that are inherently difficult to capture clearly in single-frame images, whereas sensor data can effectively reflect these internal state variations. In contrast, Class III state variations exhibit more visually discernible characteristics, rendering image-based classification more effective for this particular condition. Figure 7i introduces multimodal inputs to enhance the diagnostic strategy’s training model. This modification yields significant improvements in classification accuracy across all categories while effectively addressing the limitations inherent to both sensor-based and image-based single-modality approaches.
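Per-class accuracy figures of this kind are read off the diagonal of a row-normalised confusion matrix. The sketch below shows the bookkeeping on a handful of hypothetical labels. The label names are shorthand for the four states, and the predictions are invented for illustration, not taken from Fig. 7.

```python
CLASSES = ["interference", "breakage", "rod_deformation", "normal"]


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows index the true class, columns index the predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical labels, two samples per class.
true_labels = ["interference", "interference", "breakage", "breakage",
               "rod_deformation", "rod_deformation", "normal", "normal"]
predicted_labels = ["interference", "interference", "breakage", "rod_deformation",
                    "rod_deformation", "rod_deformation", "normal", "interference"]

matrix = confusion_matrix(true_labels, predicted_labels, CLASSES)
class_accuracy = per_class_accuracy(matrix)
overall_accuracy = sum(matrix[i][i] for i in range(len(CLASSES))) / len(true_labels)
```

The off-diagonal entries show which states are confused with each other, which is the information used above to compare the sensor-only and image-only strategies.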

The diagnostic module performs four-state classification, as quantitatively validated in Fig. 7. The fine-tuned LLM operates downstream, receiving structured diagnostic outputs and generating expert-level reasoning reports. Representative prompt–response examples corresponding to each classified state are provided in Fig. S1, demonstrating the integration of structural mechanics knowledge and probabilistic fault interpretation.

In this study, we present a test case to assess the performance of the Autonomous Integrity Cognition System following the fine-tuning of the LLM (Fig. 8). Users induce structural deformation by compressing the tensegrity system, simulating ground collision scenarios (Fig. 8a). Concurrently, multimodal sensors capture real-time physical deformation data, which is transmitted to the fine-tuned LLM system (Supplementary Video S2). The system relies on the fine-tuned LLM to integrate the robot body information with the perception data, and realizes intelligent interaction through a natural language interface. To demonstrate the interaction mechanisms, we present typical dialog scenarios including structural status queries, sensor value readings, and proprioceptive parameter verification. Figure 8b shows the logical architecture of the system’s fault diagnosis. The workflow begins with a physical fault trigger, then uses multi-sensor fusion and synchronized structural image capture to transmit multimodal inputs to the fine-tuned large language model for cross-modal analysis. The monitoring system then identifies and classifies the fault characteristics, ultimately generating interpretable technical diagnostic reports.

**Fig. 8: Demonstration of the autonomous integrity cognition system via fine-tuned LLM.**

Finally, we simulated a sensor fracture failure scenario (Fig. 8c, Supplementary Materials Fig. S2, Supplementary Video S3). During the experimental execution phase, the operator applied axial stress to the sensing unit until fracture occurred, triggering a redistribution of structural states. The mechanical response data obtained during the image generation phase reveal that, during the failure progression, sensing channels A12B21 and B21C12 exhibited abrupt decreases in length values at 24 s and 20 s, respectively, prompting virtual model reconstruction based on these state changes. During the diagnostic phase, the system achieved 94% confidence in Type I fault identification, far above the other fault categories (Types II–IV combined <6%), demonstrating strong autonomous integrity cognition capabilities. The system ultimately generates both fault classification results and interpretable technical diagnostic reports through the LLM system.
