Adaptive power-saving mode control in NB-IoT networks using soft actor-critic reinforcement learning for optimal power management

In recent years, extensive research has been conducted to enhance energy efficiency and power-saving mechanisms in NB-IoT networks. Several studies have explored optimization techniques, reinforcement learning approaches, and energy-efficient scheduling mechanisms to improve network performance.

Hadjadj-Aoul and Ait-Chellouche proposed a deep reinforcement learning-based access control mechanism to mitigate congestion in NB-IoT networks. They modeled the access problem as a Markov decision process and used the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to optimize the Access Class Barring (ACB) mechanism. Unlike heuristic-based methods, their approach dynamically adjusted to network variations, even with incomplete system state information. Simulations showed superior performance over adaptive and PID-based techniques, maintaining optimal access attempts. Their study highlights reinforcement learning as a promising alternative for NB-IoT access control³⁵.

Al Rabee et al. introduced an actor-critic reinforcement learning-based power allocation framework for energy harvesting (EH) NOMA relay-assisted mmWave networks to enhance energy efficiency and data throughput. Their two-phase approach first optimizes power allocation at an EH-capable source node using an actor-critic RL method, adapting to unpredictable EH and channel conditions. In the second phase, a NOMA-based mechanism assigns power levels to users for efficient relay transmission. Unlike conventional techniques that struggle with non-convex optimization, their method uses sequential convex approximation for better convergence. Simulations showed superior performance in maximizing data rates and improving energy efficiency, highlighting RL’s potential for resource allocation in next-generation networks³⁶.

Lauridsen et al. conducted empirical power consumption measurements on early-generation NB-IoT devices to develop a battery lifetime estimation model. Their study provided the first publicly available dataset on real-world NB-IoT power usage across different operational states. Results showed that uplink transmissions at 23 dBm consumed 716 mW due to low power amplifier efficiency (37%), while receiving control/data channels used 213 mW. Idle-mode eDRX and PSM consumed 21 mW and 13 µW, respectively. Real-world power consumption exceeded 3GPP estimates, shortening battery life by 5 to 10% when PSM was applied. The study suggested firmware and hardware improvements to enhance future NB-IoT energy efficiency³⁷.
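To illustrate how per-state power figures of this kind feed a lifetime estimate, the following is a minimal sketch rather than the model used in the study. The per-state powers are the measurements quoted above; the 5 Wh battery, the daily time spent in each state, and the function name `battery_life_years` are assumptions made for illustration only.

```python
# Power per state, taken from the measurements quoted above (watts).
POWER_W = {
    "tx": 0.716,    # uplink transmission at 23 dBm
    "rx": 0.213,    # receiving control/data channels
    "edrx": 0.021,  # idle-mode eDRX
    "psm": 13e-6,   # Power Saving Mode
}

SECONDS_PER_DAY = 24 * 3600


def battery_life_years(battery_wh, seconds_per_day):
    """Estimate battery lifetime from the time spent in each state per day.

    `seconds_per_day` maps a non-PSM state name to seconds per day; the
    rest of the day is assumed to be spent in PSM. Battery self-discharge
    is ignored (a simplifying assumption).
    """
    busy = sum(seconds_per_day.values())
    if busy > SECONDS_PER_DAY:
        raise ValueError("state durations exceed one day")
    energy_j = sum(POWER_W[state] * t for state, t in seconds_per_day.items())
    energy_j += POWER_W["psm"] * (SECONDS_PER_DAY - busy)
    battery_j = battery_wh * 3600.0  # 1 Wh = 3600 J
    return battery_j / energy_j / 365.0


# Hypothetical daily profile (not from the study): 2 s TX, 5 s RX, 20 s eDRX.
profile = {"tx": 2.0, "rx": 5.0, "edrx": 20.0}
print(round(battery_life_years(5.0, profile), 1))
```

With this hypothetical profile the estimate comes out at roughly 12 years, and a few seconds of extra transmission per day shift it noticeably because the transmit state draws several orders of magnitude more power than PSM.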

Migabo et al. introduced the Energy-Efficient Adaptive Channel Coding (EEACC) scheme for NB-IoT to improve energy efficiency while maintaining network reliability. This two-dimensional approach dynamically selects the optimal channel coding scheme based on real-time channel conditions classified as bad, medium, or good, using periodic Block Error Rate (BLER) assessments. EEACC also reduces transmission repetitions by leveraging successful transmission probabilities, ensuring efficient resource use. Simulations showed that EEACC outperforms existing Narrowband Link Adaptation (NBLA) techniques in energy efficiency, reliability, scalability, and latency. Its resilience to channel impairments makes it ideal for energy-constrained IoT applications, with future validation planned for smart water metering and further theoretical optimization³⁸.

Barbauzx et al. developed an analytical model to evaluate the balance between capacity and energy efficiency in NB-IoT systems, with a focus on battery life. Using M/D/H/K queues, their model assessed energy performance across different coverage distributions, payload sizes, and communication rates. Comparison with 3GPP results confirmed the model's accuracy for single-terminal cells. Their analysis revealed that Early Data Transmission (EDT) not only improves latency and connection density but also enhances energy efficiency. However, higher loads in multi-terminal cells negatively impact battery life due to control channel demodulation. The authors proposed an efficient solution that extends battery life without modifying standard communication modules, making their model a valuable tool for optimizing NB-IoT energy efficiency³⁹.

Khan and Alam developed an empirical model to evaluate the baseline energy consumption of NB-IoT radio transceivers, focusing on the Radio Resource Control (RRC) protocol. Using two commercial NB-IoT boards and test networks from two mobile operators, they collected data to create an accurate energy consumption model. Their profiling of the BG96 NB-IoT module showed evaluation errors between 0.33% and 15.38%, confirming the model's reliability. This work fills a gap in energy profiling literature and serves as a benchmark for optimizing NB-IoT battery life. Future research will explore energy-saving strategies tailored to specific application requirements using this model⁴⁰.

Manzar et al. investigated downlink (DL) packet reception energy consumption in NB-IoT and proposed a Particle Swarm Optimization (PSO)-based strategy to enhance energy efficiency. They analyzed key parameters such as transport block size, repetition count, and segmentation, optimizing factors like received power (PRX), number of sub-frames (NSF), and RLC/MAC header length (HRLCMAC). Their results showed an 84.98% energy reduction when optimizing PRX and HRLCMAC together and 61.07% when optimizing PRX and NSF. The study demonstrated PSO's potential for improving NB-IoT energy efficiency, with applications in smart homes, vehicles, and grids. Further enhancements could be achieved by integrating low-energy modulation and optimized MAC protocols⁴¹.

Andres-Maldonado et al. developed and validated an analytical energy consumption model for NB-IoT devices, aimed at improving energy management in Low-Power Wide-Area (LPWA) networks. Using a six-state Markov chain, their model estimated average energy consumption and latency for periodic uplink reporting. Experiments with two commercial NB-IoT devices connected to a base station emulator validated the model across scheduling, coverage extension, and single subcarrier configurations, with a maximum error of 21%. Their findings showed that NB-IoT UEs can achieve a 10-year battery life and 10-second latency under optimal conditions. This study contributes to energy-efficient strategies for future LPWA applications⁴².

Di Lecce et al. investigated cooperative relaying techniques to further reduce energy consumption in NB-IoT networks, which are already designed for low-power operation. They proposed an optimal relay selection algorithm to minimize energy use within a cell and introduced a greedy algorithm that achieved near-optimal performance with lower computational complexity. Simulations showed that cooperative relaying reduced energy consumption by up to 30%, with the greedy algorithm consuming only 10% more than the optimal strategy. Their findings highlight cooperative relaying as an effective energy-saving approach. Future research will explore throughput, delays, and advanced power control mechanisms for further optimization⁴³.

Jiang et al. proposed a Cooperative Multi-Agent Deep Q-Learning (CMA-DQN) approach to optimize multi-group NB-IoT networks, addressing configuration challenges without prior traffic statistics. In this model, Deep Q-Network (DQN) agents independently control configuration variables and are cooperatively trained based on transmission feedback. CMA-DQN significantly outperformed heuristic-based load estimation (LE-URC), especially in heavy traffic, by dynamically adjusting repetition values to optimize resource allocation. This improved Random Access Opportunities (RAOs) and reduced collisions. Their results highlight CMA-DQN as an effective solution for managing scarce resources and enhancing NB-IoT performance under varying traffic conditions⁴⁴.

Jiang et al. developed Q-learning-based methods to optimize uplink resource configurations in NB-IoT networks, maximizing served IoT devices per Transmission Time Interval (TTI). They introduced tabular Q-learning (tabular-Q), Linear Approximation Q-learning (LA-Q), and Deep Q-learning (DQN), all of which outperformed heuristic-based load estimation (LE-URC) approaches. LA-Q and DQN achieved similar performance to tabular-Q but required less training time. To handle high-dimensional configurations, they extended LA-Q and DQN with Action Aggregation (AA-LA-Q, AA-DQN), improving convergence. Additionally, Cooperative Multi-Agent DQN (CMA-DQN) was introduced for parallel sub-task optimization, showing superior efficiency. Their findings highlight Q-learning as a robust solution for real-time NB-IoT resource allocation⁴⁵.

Michelinakis et al. conducted an empirical study on NB-IoT energy consumption, analyzing the impact of configuration parameters on efficiency. Using measurements from two NB-IoT boards and two European operators, they found that while NB-IoT is marketed as plug-and-play, energy efficiency depends on proper configuration. Paging intervals in the connected state significantly affected power use, with some operators misconfiguring these settings. Packet size and signal quality had minimal impact unless signal strength was very poor. Adjustments like enabling RAI and eDRX led to major energy savings. Their findings emphasize the role of module settings, operator configurations, and energy-saving mechanisms in battery life, suggesting further research on protocol tuning for improved efficiency⁴⁶.

Rastogi et al. proposed a semi-Markov-based energy-saving model for NB-IoT devices, introducing an Auxiliary State in the DRX mechanism to reduce power consumption, especially for small data transmissions. Unlike traditional approaches, this method optimizes energy use by minimizing unnecessary activity when data packets are minimal. Evaluations showed power-saving improvements of up to 97.1% and 98.25% by adjusting eDRX and PSM timers, respectively. The model effectively conserved energy without significant delay increases across various data arrival rates. Their findings highlight the potential of integrating additional states into NB-IoT mechanisms for better energy efficiency, with future research focusing on further parameter optimization⁴⁷.

Zhang et al. proposed a power control scheme to enhance energy efficiency (EE) in the Narrowband Physical Uplink Shared Channel (NPUSCH) of NB-IoT, addressing interference from its non-orthogonality with NPRACH. They introduced guard bands to mitigate interference and formulated an EE optimization problem considering circuit power consumption and minimum data rates. Using fractional programming, they developed an iterative power control algorithm that quickly converged to near-optimal solutions. Simulations showed significant EE improvements with guard bands, especially for low data rate communications. However, the trade-off between EE and spectral efficiency (SE) requires further exploration for high-data-rate applications⁴⁸.

Sultania et al. developed an energy consumption model for NB-IoT devices using Power Saving Mode (PSM) and Extended Discontinuous Reception (eDRX) to evaluate energy efficiency in large-scale IoT deployments. Based on a Poisson arrival process, their model showed an average error of 11.82% compared to NS-3 simulations. Results indicated that with a 5 Wh battery and optimized PSM/eDRX settings, NB-IoT devices could last over 12 years with one packet transmission per day. However, small Idle state timers increased energy consumption by 3 to 7 times. Their findings emphasize the importance of proper power-saving configurations for extended battery life in IoT applications like shared bicycle tracking⁴⁹.
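As a rough check on what such a lifetime implies, the figures quoted above (a 5 Wh battery lasting 12 years) bound the average power draw and the daily energy budget as follows. This is back-of-the-envelope arithmetic that ignores battery self-discharge, not a result taken from the study.

```latex
\[
P_{\text{avg}} \le \frac{5\ \text{Wh}}{12 \times 365 \times 24\ \text{h}}
  = \frac{5}{105120}\ \text{W} \approx 47.6\ \mu\text{W},
\qquad
E_{\text{day}} \le \frac{5\ \text{Wh} \times 3600\ \text{J/Wh}}{12 \times 365}
  = \frac{18000}{4380}\ \text{J} \approx 4.11\ \text{J}.
\]
```

A budget of about 4 J per day leaves little room for anything other than deep sleep, which is consistent with the reported sensitivity to small Idle state timers.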

Navarro et al. conducted a comparative study on the energy consumption of various communication protocols (MQTT, TCP, UDP, and LwM2M) used in NB-IoT applications with the BG96 module. They found that UDP had the lowest energy consumption, especially with frequent transmissions, while MQTT was the most cost-effective for feature-rich IoT applications. Payload size (10 to 100 bytes) had minimal impact on energy use, allowing flexible data transmission. Their results emphasize that protocol choice should align with system requirements, with UDP suited for low-power needs and MQTT for cost-efficient solutions. Their methodology provides a foundation for further optimizing energy consumption in IoT communication protocols⁵⁰.

Elhaddad et al. evaluated the energy consumption of three NB-IoT modules under simulated LTE network conditions, analyzing factors such as T3324, T3412, uplink transmit power, and SIB message parameters. They developed a data traffic-dependent energy model to estimate battery lifetime under different communication scenarios, including periodic UDP uplink transmissions. Their findings showed that energy consumption per bit varied with NPUSCH repetitions, highlighting the impact of transmission periods on power usage. Optimizing power amplifier (PA) design and hardware architecture could further improve energy efficiency. The study suggests that firmware, hardware, and network optimizations will enhance future NB-IoT device battery life⁵¹.

Abbas et al. studied NB-IoT energy consumption, analyzing the impact of tunable and non-tunable parameters on efficiency. They found that enabling full Discontinuous Reception (DRX), especially connected-mode DRX (cDRX), could cut energy use by up to 50% over 10 years. The RRC inactivity timer played a crucial role, while CoAP retransmission timers and eDRX cycles had minimal impact. Traffic intensity and burstiness significantly influenced energy usage, with lower-intensity data bursts reducing power consumption. Their study provided guidelines for optimizing the NB-IoT protocol stack to meet the 3GPP 10-year battery life target. Future research will compare full vs. partial DRX support and validate findings through real-world NB-IoT testbed measurements⁵².

Chen et al. proposed an energy-efficient multi-hop LoRa broadcasting scheme (FLBS) for IoT networks, optimizing transmission energy consumption and large-scale data distribution. Using reinforcement learning for optimal relay selection, FLBS reduced communication time by 87.4% and saved 12.61% more energy than traditional methods. It proved highly effective for small-scale IoT applications like remote upgrades in circular areas but faced challenges in large regions with limited channels. Future work will extend FLBS to larger areas, integrate caching, explore device-to-device (D2D) communication, and apply it to smart city and power delivery systems to further enhance energy efficiency⁵³.

Yu and Lo studied energy-efficient non-anchor channel allocation in NB-IoT cellular networks, identifying that increasing non-anchor channels can sometimes raise device energy consumption. Unlike traditional allocation problems, this exhibits a non-convex property. To address this, they developed a dynamic programming algorithm to determine the optimal number of non-anchor channels per base station, minimizing energy use. They also proposed an energy-efficient channel reuse algorithm, reducing energy consumption by 66% compared to baseline methods. Their findings highlight the need for careful channel allocation to prevent unnecessary power consumption in NB-IoT transmissions⁵⁴.

Yu and Wu investigated energy-efficient scheduling for search-space periods in NB-IoT, aiming to reduce blind decoding (BD) and idle time. Since base stations can only schedule devices with the same search-space period per subframe, resource allocation is limited. They proposed an algorithm to optimize search-space periods and a scheduling method that reduces BD and idle time while meeting data demands. Their approach lowered energy consumption by 77% compared to baseline methods. Findings showed that reducing search-space periods and DCI repetitions had a greater impact on base station energy use than on devices. Future work will explore multiple non-anchor channels and base stations for further optimization⁵⁵.

Liang et al. tackled energy-efficient uplink resource unit (RU) scheduling for ultra-reliable NB-IoT communications, modeling it as an NP-complete optimization problem. They proposed a two-phase scheduling scheme: the first phase optimizes default transmission settings to minimize energy use while meeting QoS requirements, while the second phase balances transmission urgency and flexibility to ensure delay constraints. Simulations showed that their method effectively reduced energy consumption while serving more devices with guaranteed QoS. Their findings demonstrate NB-IoT’s capability to support large-scale IoT applications with minimal energy usage, making it a strong candidate for energy-efficient 5G communications⁵⁶.

Zholamanov et al. proposed an enhanced reinforcement learning algorithm, Double Deep Q-Network with Prioritized Experience Replay (DDQN-PER), to optimize energy consumption (EC) and packet delivery ratio (PDR) in LoRa wireless networks. Their method selects optimal transmission parameters, such as spreading factor (SF) and transmission power (TP), to minimize energy use while maximizing PDR. Simulations showed a 17.2% PDR improvement over Adaptive Data Rate (ADR) and a 6.2 to 8.11% boost over other RL-based methods. DDQN-PER excelled in large-scale networks (1000 devices) and maintained performance in obstacle-prone environments. Future research will validate the algorithm in real LoRaWAN networks, explore mobile node adaptation, and integrate it with other communication protocols for greater efficiency⁵⁷.

Bortnik et al. developed a machine learning (ML)-based method to estimate NB-IoT device energy consumption using statistical modem data instead of additional circuitry. They created a labeled dataset using an NB-IoT module with an onboard current measurement circuit, analyzing parameters like radio channel quality, transmission power, and TX/RX time. Feature selection showed strong correlations between energy consumption and temporal parameters. Among 11 ML models evaluated, Decision Tree Regression (DTR), Gradient Boosting (GBR), XGBoost (XGBR), and Polynomial Regression (PR) achieved up to 93.8% accuracy with minimal memory use (as low as 3 KB). Future research will explore advanced ML models, improved feature selection, and on-device self-estimation for energy efficiency⁵⁸.

Lingala et al. compared Power Saving Mode (PSM) and Power Down Mode (PDM) in NB-IoT modems using a Quectel modem. While PDM had lower current consumption for over 95% of the time, PSM proved more energy-efficient overall, considering active, idle, and sleep periods. PDM introduced additional signaling overhead and delays in uplink/downlink transmissions, reducing its advantages. PSM consistently outperformed PDM in most scenarios, except when base stations provided lower-than-required T3412 timer values. The study concluded that PSM is the preferred mode for NB-IoT, offering a better balance between power savings and communication efficiency⁵⁹.

Caso et al. conducted a large-scale data-driven analysis of the Random Access (RA) procedure in NB-IoT networks, examining the impact of deployment, radio coverage, and operator configurations. While RA generally met performance requirements, increasing connectivity and scenario variability posed optimization challenges. They proposed a Machine Learning (ML)-based enhancement, using radio conditions like RSRP, SINR, and RSRQ to predict RA success and delay with high accuracy. Their approach optimized RA configurations, reducing power consumption by at least 50%. Future work will explore implementation in dynamic environments and advanced system scenarios for further optimization⁶⁰.

Lukic et al. conducted a real-world evaluation of NB-IoT module energy consumption using a custom-designed high-resolution data collection platform. Their study analyzed energy usage across different transmission phases, highlighting the impact of both device and network-side configurations. Experiments with a mobile operator revealed significant variations in energy consumption depending on UE and eNB settings. Future plans include scaling the study to 100 NB-IoT nodes to gather extensive data under various configurations. Their findings provide valuable insights into real-world NB-IoT energy efficiency, crucial for maximizing battery life in large-scale deployments⁶¹.

Zhao et al. proposed an intelligent NB-IoT-based street lighting system with an energy-saving algorithm to reduce energy consumption, maintenance costs, and operational complexity. The system integrates a cloud server, remote monitoring, and streetlight control terminals, using NB-IoT and Power Line Carrier (PLC) communication for intelligent local and remote control. It adjusts brightness based on ambient light and vehicle speeds, enabling on-demand lighting to save energy. Additionally, it supports environmental monitoring, fault alarms, and abnormal protection. The system improves adaptability, cost efficiency, and real-time responsiveness, making it a promising solution for future smart city infrastructure⁶².

Kim et al. proposed a multi-agent reinforcement learning (MARL) framework, MAQ-OCB, to optimize energy efficiency (EE) and minimize user outages in ultra-dense small cell networks. Using distributed Q-learning for outage-aware cell breathing, the framework reduces network energy consumption while maintaining QoS in 6G wireless networks. Simulations showed MAQ-OCB outperformed traditional algorithms like No TPC, On-Off, and centralized Q-learning (C-OCB). Two variations were tested: one using neighboring small cell base station (SBS) state information and another relying only on its own state. Results confirmed MAQ-OCB’s effectiveness in improving EE and reducing outages, demonstrating its potential for energy-efficient 6G networks⁶³.

Alamu et al. reviewed machine learning (ML) applications in energy harvesting (EH) IoT networks, focusing on challenges from stochastic energy sources and wireless fading channels. They explored ML techniques such as reinforcement learning (RL), deep learning (DL), and deep reinforcement learning (DRL) for optimizing energy usage. While RL adapts well to environmental changes, it struggles with large state-action spaces in massive IoT deployments. DRL offers better data processing but requires energy-efficient optimization for practical use. The study highlighted the need for lightweight DRL models to support EH in large-scale IoT networks, particularly for future 6G applications⁶⁴.

Guo and Xiang proposed a multi-agent reinforcement learning (MARL) framework to optimize energy efficiency in NB-IoT networks by improving power ramping and preamble allocation. Traditional random preamble allocation in LTE lacks efficiency for large-scale IoT deployments. Their joint optimization approach integrates power ramping and preamble selection, enhancing energy efficiency and random access (RA) success probability. Using a Win-or-Learn-Fast Policy Hill-Climbing (WoLF-PHC) algorithm with a simplified “stateless” modification, simulations demonstrated significant energy savings. Future work will incorporate Power Saving Mode (PSM), coverage enhancement (CE) classes, and state variables like RSRP to further refine optimization⁶⁵.

Chen et al. proposed an energy-efficient LoRa broadcasting scheme, FLBS, for IoT applications like remote upgrades in circular areas. By combining LoRa protocols with multi-hop technology, the scheme optimizes relay selection and transmission power to reduce energy consumption. Using a reinforcement learning-based algorithm, FLBS outperformed traditional methods in energy savings. The study emphasized the importance of considering actual LoRa hardware parameters and environmental factors. Future research will explore caching, device-to-device (D2D) communication, and integrating LoRa mesh to enhance scalability and applicability in complex IoT scenarios⁶⁶.

Haridas et al. examined the use of energy-harvesting technologies to extend NB-IoT device battery life in smart home applications. Their analysis of energy consumption across coverage classes revealed discrepancies between actual and expected 10-year lifespans. They explored ambient energy sources for harvesting, showing that, in ideal conditions, perpetual operation was possible but highly dependent on energy availability. Key challenges included managing unpredictable energy sources and optimizing long-term sustainability. Their findings highlight the potential of energy harvesting for improving NB-IoT efficiency, with future work focusing on overcoming integration challenges for reliable IoT applications⁶⁷.

Chang et al. optimized NB-IoT power consumption using adaptive radio access (RA) strategies, focusing on enhanced coverage levels (ECLs). Through field measurements on two testbeds, they identified inefficiencies in ECL selection and proposed an adaptive RA approach incorporating predictive ECL selection and opportunistic packet transmission. Their method reduced UE power consumption by up to 36% while maintaining block error rate (BLER) performance. The study emphasized ECL selection’s role in improving energy efficiency without compromising reliability. Future work will refine uplink quality predictions and optimize ECL selection from both UE and eNodeB perspectives⁶⁸.

Sultania et al. developed an analytical model to evaluate NB-IoT power consumption and downlink (DL) latency using Power Saving Mode (PSM) and extended Discontinuous Reception (eDRX). Based on a Markov chain, the model accurately predicted energy consumption and latency, achieving over 91% accuracy compared to ns-3 simulations. Their multi-objective Pareto analysis identified optimal parameter configurations, favoring smaller timer values for low-latency or infrequent uplink (UL) traffic scenarios. Future research will explore additional power-saving techniques, such as Release Assistance Indication (RAI), Wake-up signals, and Early Data Transmission, to further enhance NB-IoT energy efficiency⁶⁹.

Jorke et al. analyzed the power consumption of NB-IoT and eMTC in smart city environments, comparing data rate, battery life, latency, and spectral efficiency under different coverage conditions. Their study found that eMTC outperformed NB-IoT in moderate conditions (144 dB coupling loss) with a 4% longer battery life and higher data rates. However, in extreme conditions (164 dB coupling loss), NB-IoT provided an 18% longer battery life due to reduced transmission repetitions. While eMTC performed better at 155 dB or lower, NB-IoT’s superior spectral efficiency and lower bandwidth needs make it ideal for large-scale IoT deployments⁷⁰.

Duhovnikov et al. evaluated the feasibility of NB-IoT for low-power aircraft applications, conducting experiments with a Sodaq NB-IoT module on private and commercial networks. Their findings showed that optimizing Power Saving Mode (PSM) could extend battery life for several years, but configuration and hardware design play a crucial role in aviation use cases. While NB-IoT demonstrated promise for certain applications, 5G was deemed necessary for more demanding aviation needs. The study emphasized optimizing peripheral energy consumption and extending transmission cycles to improve battery life, with future research focusing on further enhancements for aviation scenarios⁷¹.

Lee and Lee proposed a Prediction-Based Energy Saving Mechanism (PBESM) to enhance NB-IoT uplink transmission efficiency by reducing energy consumption. PBESM includes a deep packet inspection-based network architecture to predict uplink packet occurrences and an algorithm that optimizes scheduling requests by pre-assigning radio resources. This reduces random access attempts, lowering transmission energy use by up to 34%. Additionally, PBESM improved session active time by 16% without requiring hardware modifications on IoT devices. Future research will integrate software-defined networking for better packet inspection and explore contention resolution in multi-user scenarios to enhance efficiency²¹.

Alobaidy and Singh conducted a real-world evaluation of NB-IoT performance in Malaysia, analyzing coverage, path loss, packet delivery rate (PDR), latency, and power consumption. NB-IoT achieved a 91.76% PDR, supporting high data rates even with low signal quality, but latency variations significantly impacted battery efficiency. Compared to LoRaWAN and Sigfox, NB-IoT had a much shorter battery life: 344.9 days versus 1608.9 and 1527.6 days, respectively. While NB-IoT excelled in data rate and coverage, its power consumption was higher than expected. The study emphasized optimizing power management and deployment strategies for better efficiency and highlighted their measurement platform as a useful tool for IoT network tracking⁷².

Alkhayyal and Mostafa conducted a systematic literature review on the role of machine learning (ML) and artificial intelligence (AI) in enhancing LoRaWAN energy efficiency and performance for IoT applications. Their review highlighted the effectiveness of deep reinforcement learning (DRL) and supervised learning in optimizing resource allocation, network stability, and energy consumption. Key factors such as Spreading Factor (SF), bandwidth (BW), and coding rate (CR) were identified as crucial for balancing communication range, data rate, and power efficiency. The study emphasized the need for adaptive ML-based algorithms to dynamically adjust network parameters. Future research will focus on real-time adaptive systems and cross-layer optimization for improved network performance⁷³.

Nauman et al. investigated Intelligent Device-to-Device (I-D2D) communication to optimize data delivery and energy efficiency in NB-IoT, particularly for delay-sensitive applications like healthcare IoT. They addressed the high power consumption caused by repeated control and data transmissions between NB-IoT User Equipment (UE) and base stations. Their proposed two-hop D2D communication model reduced transmission repetitions, improving efficiency. Relay selection was formulated as a Multi-Armed Bandit (MAB) problem and solved using a Reinforcement Learning (RL) approach. Simulations showed that I-D2D improved Packet Delivery Ratio (PDR) and reduced End-to-End Delay (EED). Future work will focus on large-scale deployment and real-world integration into IoT networks⁷⁴.
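The bandit formulation of relay selection can be sketched as follows. The paragraph above does not specify which learning rule the authors used, so this example substitutes a plain epsilon-greedy bandit. The class name `EpsilonGreedyRelaySelector`, the per-relay success probabilities, and the binary delivery reward are all hypothetical.

```python
import random


class EpsilonGreedyRelaySelector:
    """Epsilon-greedy bandit in which each candidate relay is one arm."""

    def __init__(self, n_relays, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_relays    # times each relay was chosen
        self.values = [0.0] * n_relays  # running mean reward per relay
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, relay, reward):
        # Incremental update of the mean reward observed for this relay.
        self.counts[relay] += 1
        self.values[relay] += (reward - self.values[relay]) / self.counts[relay]


def simulate(success_prob, rounds=5000, seed=0):
    """Run the selector against relays with fixed (hypothetical) delivery odds."""
    rng = random.Random(seed)
    selector = EpsilonGreedyRelaySelector(len(success_prob), seed=seed + 1)
    for _ in range(rounds):
        relay = selector.select()
        reward = 1.0 if rng.random() < success_prob[relay] else 0.0
        selector.update(relay, reward)
    return selector


selector = simulate([0.3, 0.8, 0.5])
print(selector.counts)
```

In this toy setting the selector concentrates its choices on the relay with the highest delivery probability while still sampling the others occasionally, which is the behavior a relay-selection bandit relies on when link quality is not known in advance.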

Pei, Zhang, and Li proposed an energy-saving mechanism for NB-IoT based on extended discontinuous reception (eDRX), focusing on power consumption and access delay. They developed a Markov model to analyze NB-IoT device states, incorporating the random access process often overlooked in energy calculations. Their findings showed that backoff time after access failures significantly impacts energy consumption and delay. By linearly increasing backoff time, they reduced variations in access delays and improved energy efficiency. This study provides valuable insights into optimizing NB-IoT power management, particularly in scenarios with frequent network access attempts⁷⁵.
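A linearly increasing backoff can be sketched as below. This is only an illustration of the idea described above, not the authors' scheme: the base window, the step size, and the uniform draw within the window are assumptions, and the function names are hypothetical.

```python
import random


def backoff_window_ms(failures, base_ms=256, step_ms=256):
    """Backoff window after `failures` consecutive access failures.

    The window grows by a fixed step per failure (linear growth);
    base_ms and step_ms are illustrative values.
    """
    if failures < 0:
        raise ValueError("failures must be non-negative")
    return base_ms + step_ms * failures


def draw_backoff_ms(failures, rng):
    """Pick a uniformly random wait inside the current window."""
    return rng.uniform(0.0, backoff_window_ms(failures))


print([backoff_window_ms(k) for k in range(4)])  # [256, 512, 768, 1024]
```

Because the window grows by a constant amount rather than doubling, the spread of waiting times widens gradually from one retry to the next.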

Bali et al. explored the energy efficiency of NB-IoT in smart applications, emphasizing its role in reducing IoT energy consumption and carbon footprints. They highlighted the integration of Green IoT with NB-IoT as a promising approach, particularly for large-scale applications like smart agriculture. NB-IoT’s low power usage, massive connectivity, and strong indoor coverage make it well-suited for sustainable IoT solutions. They proposed a green NB-IoT model for agriculture to promote energy-efficient technologies. While NB-IoT is cost-effective and reliable, challenges remain in further optimizing energy efficiency for large-scale deployments, necessitating continued research¹⁸.

Anbazhagan and Mugelan proposed an energy-saving technique for NB-IoT, integrating a Proxy state and enhanced Release Assistance Indication (ERAI) within a semi-Markov framework. This approach optimizes the Discontinuous Reception (DRX) mechanism by reducing unnecessary wake-ups, significantly improving the Power Saving Factor (PSF). Their method achieved up to 99.4% energy savings with optimized eDRX durations and 99.9% with optimized PSM settings, extending device battery life for low-data applications. Future work will refine the semi-Markov model, validate it in real-world scenarios, and explore trade-offs between energy efficiency and communication delays, with potential adaptation for other LPWAN technologies⁷⁶.

Anbazhagan and Mugelan introduced a Soft Actor-Critic (SAC) reinforcement learning algorithm to optimize resource allocation in NB-IoT networks, tackling challenges like dynamic user demands and variable channel conditions. SAC outperformed traditional methods like DQN and PPO, improving energy efficiency by 10.25%, throughput by 214.98%, and fairness (Jain’s index) by 614.46%. It also enhanced recovery time and marginally improved latency, making it ideal for energy-efficient, low-latency applications. SAC demonstrated scalability across urban, industrial, and rural IoT deployments, proving to be a robust solution for optimizing NB-IoT resource allocation and network performance⁷⁷.
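
For reference, Jain's fairness index for allocations x_1, ..., x_n is (sum of x_i)^2 / (n * sum of x_i^2). It ranges from 1/n (one user receives everything) to 1 (all users receive equal shares). The sketch below computes it; the function name and the sample allocations are illustrative and are not taken from the cited study.

```python
from typing import Sequence


def jain_index(allocations: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(allocations)
    sum_of_squares = sum(x * x for x in allocations)
    if n == 0 or sum_of_squares == 0:
        raise ValueError("need at least one non-zero allocation")
    total = sum(allocations)
    return (total * total) / (n * sum_of_squares)


if __name__ == "__main__":
    print(jain_index([1.0, 1.0, 1.0, 1.0]))  # equal shares among four users
    print(jain_index([1.0, 0.0, 0.0, 0.0]))  # one user receives everything
```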

Technical gaps and research motivation

Despite significant advancements in energy-efficient NB-IoT systems, several challenges remain. Most existing studies focus on either static optimization techniques or isolated power-saving mechanisms and lack a comprehensive, adaptive approach. While considerable attention has been given to downlink optimization, uplink energy efficiency, which is essential for prolonged device operation, has been largely neglected. Additionally, although reinforcement learning has been explored for resource allocation, its integration with power-saving mechanisms and intelligent decision-making models is still in its early stages. Empirical studies also reveal inconsistencies between theoretical models and real-world energy consumption, highlighting the need for adaptive power control strategies. A major limitation of current approaches is the lack of dynamic power-saving mode switching based on real-time network conditions: most existing mechanisms operate under fixed configurations, resulting in inefficient energy utilization. Furthermore, although cooperative relaying and multi-hop strategies have been investigated in other domains, their potential for enhancing power saving in NB-IoT remains largely unexplored.

To address these challenges, this research introduces an adaptive power-saving mode control framework based on Soft Actor-Critic (SAC) reinforcement learning. Unlike conventional methods, this approach dynamically adjusts power-saving modes in response to changing network conditions, improving energy efficiency while maintaining Quality of Service (QoS). By integrating reinforcement learning with established power-saving modes such as Power Saving Mode (PSM) and extended Discontinuous Reception (eDRX), the framework provides a flexible power management strategy that bridges the gap between theoretical energy models and practical deployment constraints. It maintains a real-time balance between energy efficiency and service quality, making it well suited to large-scale NB-IoT deployments.

Traditional static power-saving strategies in NB-IoT, such as fixed DRX (Discontinuous Reception) or PSM (Power Saving Mode) configurations, rely on pre-defined timers and thresholds or deterministic scheduling rules that do not respond to dynamic changes in network traffic, signal quality, or application requirements. While such rule-based approaches are simple to implement and computationally inexpensive, they lack the flexibility to adapt in real time. As a result, they often lead to suboptimal energy consumption, increased latency, or reduced reliability under fluctuating conditions.

In contrast, the proposed Soft Actor-Critic (SAC)-based power management approach continuously interacts with the environment and learns to adapt its mode-switching policy based on evolving system states. This adaptability allows the SAC agent to balance energy efficiency and transmission reliability more effectively than static methods. Given the inherently time-varying and device-specific nature of NB-IoT deployments, static models were deemed unsuitable for simulation in this context. Instead, our focus was on benchmarking against dynamic deep reinforcement learning algorithms (DQN and PPO), which offer a more realistic performance baseline for intelligent control in heterogeneous and uncertain IoT environments.

This research leverages the Soft Actor-Critic (SAC) algorithm to improve power management in NB-IoT networks. Unlike traditional approaches, SAC dynamically adjusts power-saving modes based on real-time network conditions, enhancing adaptability and efficiency. SAC differs from other reinforcement learning algorithms, such as Proximal Policy Optimization (PPO) and Deep Q-Networks (DQN), in its handling of continuous action spaces and its entropy regularization, which encourages robust exploration and helps prevent premature convergence to suboptimal policies. Additionally, SAC's off-policy learning and stochastic policy allow for efficient data utilization and adaptability to fluctuating network conditions, leading to more reliable and context-aware power-saving decisions. By addressing the limitations of existing solutions, the SAC-based approach enhances power efficiency, network performance, and overall sustainability in NB-IoT networks, paving the way for more resilient and scalable IoT deployments.
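
The entropy regularization mentioned above can be made concrete with the standard maximum-entropy objective from the general SAC literature. The symbols here are generic and are an assumption about notation, not necessarily those used later in this paper: r is the reward, α is the temperature that weights the entropy term, and ρ_π is the state-action distribution induced by the policy π.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = - \mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

A larger α rewards more random action selection and therefore more exploration. As α approaches zero, the objective reduces to the conventional expected-return objective.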
