This study introduced the LDA-BiLSTM model based on the globally ordered yet locally unordered data characteristics of medical clinical pathways, constructing a more interpretable clinical pathway model, as shown in Fig. 3.

The Core Workflow of the Paper. This flow diagram outlines the process of transforming patient log data into a format suitable for a sequence-to-sequence deep learning model. The workflow begins with patient log data, which is subjected to a topic model LDA to extract features represented as topics with their corresponding probabilities. Subsequently, a data augmentation decision process enriches the data by associating days with the extracted topics, leading to a more robust dataset. This augmented data then feeds into the sequence-to-sequence deep learning model, which utilizes a bidirectional LSTM network with an attention mechanism for better capturing the temporal dependencies and nuances in the data. The model encodes diagnostic and treatment labels, predicting the patient’s progression through a time-distributed fully connected layer. The output is a sequence of predictions, with each day’s prediction informed by the previous days’ data and decisions, aiming to provide accurate foresight into the patient’s clinical pathway.
Data collection and preprocessing
Our study used data from a local municipal tertiary hospital, analyzing myocardial infarction inpatient logs (ICD code I21) from 2018 to 2023. The data were obtained from anonymized datasets, which do not contain any personally identifiable information. The hospital provided informed consent for the use of this data for research purposes, ensuring compliance with data protection and privacy standards. The dataset included 1280 STEMI and 10 NSTEMI patients, with a total of 107,566 medical order entries. In preprocessing, we filtered out irrelevant orders, converted order times to hospitalization days, merged diagnostic and treatment items in a MySQL database, and deduplicated the advice field for data accuracy. Table 1 illustrates the patient data characteristics for myocardial infarction treatments spanning 3–15 days, showcasing the average hospital stay, case count, diagnostic and treatment item numbers, and hospital admission log lengths, highlighting the complexity and diversity of clinical pathways.
We first applied the LDA topic model to the dataset to extract thematic content from daily hospitalization data, identifying key diagnostic and treatment patterns. Then, using the total count of diagnostic and treatment items as multi-labels, we employed the BiLSTM deep learning model to predict personalized clinical pathways for STEMI patients under various treatment plans.
LDA-BiLSTM model
We first present the relevant symbols and definitions involved, followed by an introduction to the research methods used.
Definition 1
(Clinical Activity): A clinical activity \(\:a\) is a medical event occurring at a specific point in time, representing the smallest unit in the diagnostic and treatment process.
Definition 2
(Clinical Item): A clinical item \(\:I=\left\{{a}_{1},{a}_{2},\dots\:,{a}_{i},\dots\:\right\}\) is a set composed of multiple clinical activities.
Definition 3
(Clinical Day): A clinical day \(\:d\) refers to the sum of all clinical items occurring in a patient on a specific day, denoted as \(\:d=\left\{{I}_{1},{I}_{2},\dots\:,{I}_{j},\dots\:\right\}\). Items within a clinical day are not ordered by time sequence.
Definition 4
(Patient Trace): A patient trace consists of several clinical days, and the collection of all patient traces for a specific disease is denoted as \(\:T=\left\{{d}_{1},{d}_{2},\dots\:,{d}_{n},\dots\:\right\}\).
Latent Dirichlet Allocation (LDA) is based on the assumption that a set of latent topics exists within multiple document data, each topic comprising a group of related words. In this paper, we correspond clinical days and clinical items to documents and words in the LDA concept, respectively, to extract latent topics of the treatment process, considering them as potential diagnostic and treatment patterns. The specific generation process is described as follows (Table 2 illustrates related symbol annotations).
For each clinical day \(\:d\):
-
(a)
According to the Dirichlet prior distribution, generate the clinical day-topic distribution \(\:{\theta\:}_{d} \sim Dir\left(\alpha\:\right)\) and the topic-clinical item distribution \(\:{\phi\:}_{k} \sim Dir\left(\beta\:\right)\).
-
(b)
For the \(\:{I}_{j}\)-th clinical item on the \(\:d\)-th clinical day:
-
i.
Generate a topic \(\:{z}_{d,k} \sim Multi \left({\theta\:}_{d}\right)\) from the clinical day-topic distribution \(\:{\theta\:}_{d}\left(d,{\phi\:}_{k},k=\text{1,2},\dots\:.,K\right)\).
-
ii.
Generate \(\:{s}_{k,j} \sim Multi \left({\phi\:}_{k}\right)\) from the topic-clinical item distribution \(\:{\phi\:}_{k}\left({I}_{j},{P}_{j},j=\text{1,2},\dots\:,J\right)\).
In this study, we utilized the LDA topic model implemented via variational Bayesian inference methods23 to analyze STEMI patient clinical day logs. The LDA model uses variational parameters (\(\:\gamma\:\) and \(\:\delta\:\)) to approximate the true posterior distribution \(\:P(Z,\:S\:|\:\alpha\:,\:\beta\:)\), as shown in Eq. (2), enabling the discovery of latent themes in clinical logs. Our goal is to identify the clinical day document-topic distribution \(\:{\theta\:}_{d}\) and the topic-clinical item vocabulary distribution \(\:{\phi\:}_{k}\), essential for understanding STEMI treatment process.
$$\:P\left(Q,Z,S|\alpha\:,\beta\:\right)=P\left(Z|\alpha\:\right)\bullet\:P\left(S|Z,\beta\:\right)=\int\:P\left(Z|\theta\:\right)P\left(\theta\:|\alpha\:\right)d\theta\:\bullet\:\int\:P\left(S|Z,\phi\:\right)\bullet\:P\left(\phi\:|\beta\:\right)d\phi\:$$
(2)
To optimize the model, we focus on minimizing the Kullback-Leibler(KL) divergence between the variational distribution \(\:Q\) and the true posterior distribution \(\:P\), thereby maximizing the evidence lower bound (ELBO) of the log marginal likelihood. Within the variational expectation-maximization (EM) steps in LDA, we alternately adjust the hyperparameters \(\:\alpha\:,\beta\:\) to maximize the ELBO. Specifically, \(\:\gamma\:\) is optimized in the variational E-step, and \(\:\delta\:\) in the M-step.
In this study, we use the LDA model for data augmentation, generating new datasets by mining diagnostic and therapeutic themes for each clinical day. The key strategy is to transform clinical item text into thematic combinations, representing various diagnostic and therapeutic processes. High-quality themes are selected based on the highest coherence scores, with the top n consistency themes used to form combinations for each clinical day, akin to resampling in high-dimensional feature space.
Coherence scores, indicating semantic relatedness with in topics24, are crucial for topic quality and stability, especially in the medical field. High-coherence topics, being more interpretable and representative of specific treatment processes, are easier for medical professionals to understand and use. By focusing on high-coherence themes, we ensure that the model is trained on the most representative and meaningful features, reducing the impact of irrelevant data. This strategy enhances model generalizability and accuracy in predicting clinical pathways by allowing the model to focus on the most relevant and informative features from the dataset.
Our method extracts meaningful treatment process themes from clinical log data, providing valuable insights for analyzing STEMI patient treatments. We leverage these latent themes from the topic model, combining them with clinical day sequences to capture treatment process variations. To analyze the dynamic evolution of these themes, we employ deep learning models for time-series data analysis.
Recurrent Neural Networks (RNNs) are designed for sequential data processing, but they often struggle with long sequences due to vanishing and exploding gradient issues. Long Short-Term Memory networks (LSTMs) address this by preserving long-term dependencies and regulating information flow with three gates: input, forget, and output gates25,26. Bidirectional LSTMs (BiLSTMs) further enhance this by combining forward and backward LSTM architectures, considering both past and future context in data sequences. This results in a more comprehensive contextual output at each time step in BiLSTMs27. The ability of BiLSTM to capture both past and future dependencies in treatment sequences is crucial for accurately modeling clinical pathways, where the timing and order of interventions can have significant impacts on patient outcomes. This bidirectional approach ensures that the model accounts for all relevant temporal dynamics, thereby improving the prediction of patient-specific treatment plans.
BiLSTM extends LSTM with two layers processing data in both forward and reverse directions, capturing dependencies from a sequence’s past and future. Its output combines the hidden states of both layers, addressing the gradient vanishing issue in traditional RNNs for long sequences. This makes BiLSTM ideal for time-series prediction and labeling tasks. BiLSTM units perform four key operations at each time step—forget gate(Eq. (3)), input gate, cell state, and output gate(Eq. (4))—ensuring efficient data processing.
$$\:{f}_{t}=\sigma\:\left({W}_{f}\bullet\:\left[{h}_{t-1},{x}_{t}\right]+{b}_{f}\right)$$
(3)
$$\:{i}_{t}=\sigma\:\left({W}_{i}\bullet\:\left[{h}_{t-1},{x}_{t}\right]+{b}_{i}\right)$$
(4)
Thus, the combination of LDA for thematic data augmentation and BiLSTM for capturing temporal dependencies allows for a more robust and precise prediction model. This not only enhances the interpretability of the treatment pathways but also improves key performance metrics such as accuracy, precision, and recall, as demonstrated in our experimental results.
Model architecture
To clarify, the raw clinical day records contain variable-length sets of medical orders, which are transformed into fixed-length binary topic vectors using LDA and MultiLabelBinarizer, as detailed in the following section. This transformation enables the model to abstract meaningful semantic patterns from daily medical activity and then model their temporal evolution through the BiLSTM framework. Combining LDA and BiLSTM, our model utilizes LDA’s capability to extract key diagnostic and therapeutic patterns from clinical logs and BiLSTM’s strength in revealing temporal trends. This synergy allows for identifying crucial treatment patterns and their evolution over time, offering more precise and personalized plans for STEMI patients. The LDA model enhances the accuracy and robustness of deep learning models by providing diagnostic and therapeutic patterns as prior knowledge, especially important given the original data’s limited volume and the clinical day data’s high dimensionality. High-coherence theme combinations add diversity to medical text data, enriching the information for model training and reducing overfitting. BiLSTM’s bidirectional memory captures changes in pattern states across clinical days, offering an advantage over unidirectional memory networks in understanding the entire treatment process.
Previous research in clinical pathway construction displayed treatment stages in tabular formats, showing the presence or absence of diagnostic and treatment items28. Our approach uses the LDA model to mine multiple diagnostic and treatment patterns, encoding these items into a binary multi-label feature for specific diseases. This encoding enriches clinical day data with detailed item information. We then utilize a BiLSTM network, connected to a fully connected layer, to capture the sequence of clinical days, handling the set of labels within each day in one time step. To address the locally unordered but globally ordered nature of clinical pathways, we develop a Bidirectional Sequence Labelling LSTM (BiDiSeL-LSTM) model. This model integrates local multi-label features of clinical day items with the global sequential characteristics of the treatment process.
This study introduces a sequence labeling model combining Bidirectional Long Short-Term Memory networks (BiLSTM) and a fully connected layer for temporal distribution. Designed for sequential data, the model classifies outputs at each time step, akin to sequence labeling in natural language processing. In the context of clinical pathways, it assigns labels to diagnostic and treatment items per day, based on multi-label feature encoding of these items. This model captures the evolution in clinical day sequences as diagnostic and treatment item states change. The model’s architecture is depicted in Fig. 3.
Multi-label feature encoding
Preprocessing for the BiDiSeL-LSTM model involves multi-label feature encoding of treatment data. We transformed unstructured text data into structured numerical features, linking treatment patterns with diagnostic and treatment item vocabularies as multi-label features. Using MultiLabelBinarizer, we converted clinical item labels into binary arrays in two steps. First, we identified N diagnostic and treatment items per disease, creating a label set based on patterns mined by the LDA model. Second, for each clinical day, MultiLabelBinarizer generates a binary feature vector, marking the presence (1) or absence (0) of each item. This results in a binary array \(\:{\left(\text{0,1}\right)}^{N}\) for each clinical day, correlating each element with a label. The process is illustrated in Fig. 4 and Eqs. (5) and (6).
$$\:L=Labels\:I=Labels\left\{{I}_{1},{I}_{2},\dots\:,{I}_{N}\right\}=\left\{{l}_{1},{l}_{2},\dots\:,{l}_{N}\right\}$$
(5)
$$\:Labels\:{L}_{d}=\left\{{l}_{1},\dots\:,{l}_{m}\right\}$$
(6)

Multi-label feature encoding. This diagram demonstrates the data preprocessing stages for predictive modeling in a healthcare setting. It begins with the integer encoding of diagnostic and treatment items, where each item is assigned a unique numerical identifier. This encoded data is then used by a post-ranking topic model LDA (Latent Dirichlet Allocation) to determine the distribution of topics for each day. Each topic is characterized by a set of probabilities associated with various diagnostics and treatment terms, represented as vectors. The process is depicted for multiple days, illustrating the dynamic nature of the topic distributions over time. The extracted topics and their probabilities are then formatted for multi-label feature encoding, which transforms the data into binary vectors suitable for input into predictive models. This binary representation encapsulates the presence or absence of specific topics on a given day, facilitating the subsequent application of machine learning algorithms for diagnostic and treatment predictions.
The binary feature array effectively represents the multi-label characteristics of diagnostic and treatment items within each pattern. Employing the BiLSTM model for multi-label classification on each clinical day is key to identifying treatment patterns. These models process sequential data and maintain relationships between labels. In the BiLSTM process, label features from each clinical day are inputs, enabling the model to recognize diagnostic and treatment item features related to specific treatment patterns for each day.
Bidirectional long short-term memory network (BiLSTM)
The BiLSTM’s bidirectional structure effectively captures the dependencies of clinical activities in both forward and backward directions during the treatment process. It analyzes clinical day logs by considering the patterns of clinical activities occurring both before and after a given day, thus accurately understanding the contextual relationships throughout the entire treatment sequence.
Temporal distribution fully connected layer
In our time series labeling model, it’s crucial that the model outputs relevant attribute labels at each time step, allowing for independent predictions. The fully connected layer ensures consistent weight matrix usage across all BiLSTM layer units at every sequence step, enhancing parameter efficiency and ensuring a thorough understanding of data, especially for complex clinical activities. The model captures the entirety of clinical activity patterns. For multi-label classification, we use a sigmoid activation function (Eq. (7)), which adjusts the output gate’s opening to generate label probabilities, indicating the likelihood of clinical activities on a specific day. A probability threshold is set for activating labels, determining necessary clinical activities for each day.
$$\:Sigmoid\:Function:\sigma\:\left(x\right)=\frac{1}{1+{e}^{-x}}$$
(7)
By integrating a time-distributed fully connected layer with a sigmoid activation function, our model efficiently predicts label information in time series. This method is particularly beneficial for time series analysis in medicine, crucial for deepening our understanding of disease-specific clinical activity data.
While Fig. 3 describes the technical workflow of the LDA-BiLSTM model, the following figure (Fig. 5) emphasizes the interpretability of the framework and illustrates how each module contributes to actionable clinical insights. For example, the “Extracted Clinical Topic” module extracts topics such as “Coagulation Four Panel with D-Dimer,” which the “Clinical Pathway” module leverages to predict sequential treatments. This dual emphasis on technical and clinical aspects ensures the model’s efficacy and applicability in real-world scenarios.

An Interpretable Framework for Sequential Clinical Treatment Prediction. The figure is divided into three parts, illustrating a simplified framework for real-world usage. Initially, clinicians make decisions for the first-day treatment plan based on the patient’s diagnosis and condition. This treatment plan is then input into the LDA-BiLSTM core model, pre-trained on historical data, which predicts the treatment strategy for the subsequent day. The next day’s treatment plan can also be used as input for iterative predictions, generating a sequential treatment pathway until the patient’s discharge. If any treatment adjustments are made during the process, the model can adapt and predict the revised treatment strategy for the following days, maintaining continuity in the clinical pathway across time steps.
Model training and parameter tuning
LDA topic model
In our LDA topic model training, we split the dataset into training and validation sets and set the number of topics. We explore different topic numbers within a range of 2 to 30 to identify optimal treatment patterns for each clinical day. Model performance is evaluated using coherence and perplexity metrics, with coherence being prioritized due to the fixed set of clinical items in treatment patterns. We select the model with the lowest perplexity within a 0.005 consistency score threshold to balance generalization ability in mining patterns from unknown records, for Acute ST-Segment Elevation Myocardial Infarction treatment data, as shown in Fig. 6. For STEMI patients, we select one treatment plan per day based on the highest consistency scores, with consistency fluctuations between 0.001 and 0.03. For NSTEMI patients, the highest consistency is observed with the smallest number of topics, despite complexity variations (see Supplementary Fig. S1-2 online). These metrics visually demonstrate the impact of topic number on model accuracy and pattern identification consistency.

Consistency and perplexity score curves for the second day of treatment in acute ST-segment elevation myocardial infarction. This figure displays the evaluation of topic models based on coherence and perplexity metrics for Day 0 of the dataset. On the left, the ‘Coherence for Day 0’ graph plots the topic coherence against different numbers of topics, which is a measure of the semantic similarity between high scoring words within a topic. The coherence score peaks suggest the most interpretable topic numbers. On the right, the ‘Perplexity for Day 0’ graph shows the perplexity score as a function of the number of topics, indicating the model’s prediction power; lower perplexity values suggest better model performance. The steep drop followed by a plateau in the perplexity graph suggests the point of diminishing returns, guiding the optimal choice of topic numbers for the model. Both metrics are crucial for determining the quality and the optimal number of topics to use in topic modeling for this specific dataset.
Coherence, measuring the association strength between treatment items, is calculated using mutual scores of high-probability words and employs a co-occurrence matrix and normalized pointwise mutual information. The top n topics with the highest coherence scores are selected each day. The coherence score (CS) is defined by the following formula (Eq. (8)), assesses the association strength between topics for each diagnosis day.
$$\:CS\left({z}_{d}\right)={\sum\:}_{k=1}^{K}CoherenceScore\left({z}_{d,k}\right)$$
(8)
Perplexity, as per Eq. (9), evaluates the model’s predictive accuracy, with lower values indicating better fit. We chose hyperparameters to minimize perplexity through iterative training.
$$\:Perplexity\left({Doc}_{test}\right)=exp\left(-\frac{{\sum\:}_{doc-1}^{M}logp\left({doc}_{w}\right)}{{\sum\:}_{doc-1}^{M}{N}_{doc}}\right)$$
(9)
Where \(\:{Doc}_{test}\) is the collection of documents in the test set, \(\:M\) is the number of documents in the test set, \(\:{doc}_{w}\) is the set of words in the \(\:doc\)-th document, \(\:{N}_{doc}\) is the total number of words in the \(\:doc\)-th document, and \(\:p\left({doc}_{w}\right)\) is the joint probability of document \(\:doc\) in the model.
The topics from the model are ranked by coherence scores. For each clinical day, the highest coherence topics are chosen, and specific treatment patterns are determined based on the chronological order of clinical activities. This method constructs rich, personalized treatment plans and enhances the generalization capability of deep learning models in understanding diverse treatment pattern variations (The variables in the following pseudocode are visible in Table 2).

LDA topic optimization and clinical pathway effectiveness analysis.
BiDiSeL- LSTM model
In our study, the BiDiSeL-LSTM model uses sliding time steps to pair treatment days, building a memory network that captures treatment item sets and their temporal dependencies. The BiLSTM network memorizes temporal and label information from treatment sequences, treating each as a separate queue and resetting parameters after each sequence. This process enables the prediction of subsequent treatment days and the generation of a clinical pathway from the first day’s data.
The model’s training strategy involves learning different treatment processes in each epoch to prevent overfitting and enhance pattern recognition. This is supported by parameter tuning, including using the Adam optimizer with various learning rates and adjusting β1 and β2 parameters. Finally, thematic analysis selects the top two treatment patterns with the highest coherence scores for each disease, creating diverse therapeutic processes. Data is divided into training, validation, and test sets in an 8:1:1 ratio, with dimensions corresponding to treatment item labels per disease. The model parameter settings are shown as Table 3.

