Welcome to the final installment of our production LSTM series. In Part 2, we trained three separate models, each predicting a single output, which is the standard practice of using a dedicated model for oil, gas, and water production. To expand your understanding, this part extends that work to a multivariate model: a single model trained to predict all three outcomes (oil, gas, and water) at once. We use the same production data (Figure 1) from Equinor's Volve field. The TensorFlow code for this part can be found here; we highly recommend following the code in a separate window while reading this article.
Source: All images created by the author.
We use data from the longest-producing well (15/9-F-12). The following columns are used to train the model: ON_STREAM_HRS, AVG_DOWNHOLE_PRESSURE, AVG_DOWNHOLE_TEMPERATURE, AVG_DP_TUBING, AVG_ANNULUS_PRESS, AVG_CHOKE_SIZE_P, AVG_CHOKE_UOM, AVG_WHP_P, AVG_WHT_P, DP_CHOKE_SIZE, BORE_OIL_VOL, BORE_GAS_VOL, and BORE_WAT_VOL.
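Loading the data and selecting these columns might look like the following sketch. The file name and well-name column are assumptions, and AVG_CHOKE_UOM (a text unit-of-measure column) is assumed to be dropped so that 12 numeric features remain, matching Figure 3:

```python
import pandas as pd

# Hypothetical file name; the Volve production data is published by Equinor.
df = pd.read_csv("volve_production_data.csv", parse_dates=["DATEPRD"])

# Keep only the longest-producing well, 15/9-F-12 (well-name column assumed).
df = df[df["NPD_WELL_BORE_NAME"] == "15/9-F-12"]

FEATURES = ["ON_STREAM_HRS", "AVG_DOWNHOLE_PRESSURE", "AVG_DOWNHOLE_TEMPERATURE",
            "AVG_DP_TUBING", "AVG_ANNULUS_PRESS", "AVG_CHOKE_SIZE_P",
            "AVG_WHP_P", "AVG_WHT_P", "DP_CHOKE_SIZE"]
TARGETS = ["BORE_OIL_VOL", "BORE_GAS_VOL", "BORE_WAT_VOL"]

df = df[["DATEPRD"] + FEATURES + TARGETS]  # 13 columns: date + 12 features
```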
Figure 2: A plot of oil, gas, and water production over time. As expected, water volume is inversely correlated with oil and gas. Figure 3 shows the post-processed data (null removal, removal of negative values, data scaling).
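A minimal sketch of these preprocessing steps, continuing from the loading sketch above. Min-max scaling is an assumption, inferred from the [0, 1] range of the sample values printed later in this article:

```python
from sklearn.preprocessing import MinMaxScaler

# Null removal and removal of rows with negative production values.
df = df.dropna()
df = df[(df[TARGETS] >= 0).all(axis=1)]

# Scale every column except the date to [0, 1].
scaler = MinMaxScaler()
df[FEATURES + TARGETS] = scaler.fit_transform(df[FEATURES + TARGETS])
```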
There are a total of 2991 rows and 13 columns, as shown in the bottom left of Figure 3. The first column is DATEPRD and is used for plotting purposes only, which leaves 12 columns (features). Figure 4 shows the data after splitting into 70% training, 15% validation, and 15% test sets (a split sketch follows the counts below):
- Number of samples in train set: 2093
- Number of samples in validation set: 449
- Number of samples in test set: 449
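A minimal sketch of this chronological split, continuing from the sketches above; integer truncation on 2991 rows reproduces the counts exactly:

```python
# Chronological 70/15/15 split (no shuffling, since this is a time series);
# with 2991 rows this yields 2093 / 449 / 449 samples.
n = len(df)
train_df = df.iloc[: int(n * 0.70)]
val_df   = df.iloc[int(n * 0.70): int(n * 0.85)]
test_df  = df.iloc[int(n * 0.85):]
```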
As detailed in the previous part, an LSTM takes data from previous time steps (previous days) and predicts the target for the current time step (the current day). Through experimentation, we found that using 5 time steps (the last 5 days) as input and 1 time step as output gives the best results in this case. We therefore feed all the features (including oil, gas, and water production) for days 1 through 5 and predict the targets (oil, gas, and water production) for day 6; then we feed the features for days 2 through 6 and predict the targets for day 7, and so on. To do this, we define a "windowed_dataset" function that pairs the features from the previous 5 time steps with the targets for the current time step, as shown in the code.
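Below is a minimal sketch of such a function using the tf.data API. The article's actual implementation is in the linked notebook, so names and defaults here are illustrative:

```python
import tensorflow as tf

def windowed_dataset(series, window_size=5, batch_size=32, n_targets=3):
    """Pair each window of `window_size` days of features with the next
    day's targets. `series` is an array of shape (time, 12) whose last
    `n_targets` columns are oil, gas, and water production."""
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    # Input: all 12 features for days 1..5;
    # output: the last 3 columns (oil, gas, water) of day 6.
    ds = ds.map(lambda w: (w[:-1], w[-1, -n_targets:]))
    return ds.batch(batch_size).prefetch(1)
```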
Applying the "windowed_dataset" function to the data converts the input into a 3D tensor and the output into a 2D tensor. The input has the shape (batch_size, window_size = 5, features), while the output has the shape (batch_size, targets). Here,
- batch_size is 32, meaning 32 samples are processed at once.
- window_size (for the input) is 5, indicating that each sample spans 5 time steps, or 5 days.
- the output window is a single time step, since we predict the targets for only one day into the future (this singleton dimension is squeezed, giving the 2D output tensor).
- features is 12, the number of input variables at each time step.
- targets is 3: the oil, gas, and water production for the predicted time step.
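Continuing the sketches above, we can verify these shapes by pulling one batch:

```python
train_ds = windowed_dataset(train_df[FEATURES + TARGETS].to_numpy())

for x_batch, y_batch in train_ds.take(1):
    print(x_batch.shape)  # (32, 5, 12) -> (batch_size, window_size, features)
    print(y_batch.shape)  # (32, 3)     -> (batch_size, targets)
```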
To make this concrete, let's examine some data from the first batch, shown below.
There are 5 time steps (5 days) of input, each consisting of 12 features, and the output is a single time step (1 day) with 3 target values. Note that the last three of the 12 input features are the previous day's oil, gas, and water production. You can also see that the targets of one window (oil, gas, and water production) appear among the inputs of the next window, and this pattern continues for subsequent time steps:
Input data, 5 time steps:
[[4.60000000e-01 9.78644267e-01 9.71262840e-01 7.19610527e-01 6.20908184e-01 1.44844313e-01 7.77839472e-01 1.76237915e-01 7.45028032e-01 4.83384250e-02 4.65185867e-02 6.02341404e-02]
[9.60000000e-01 9.62688840e-01 9.80421321e-01 6.51296485e-01 5.48525494e-01 2.28329865e-01 8.80399601e-01 5.77277891e-01 7.22026979e-01 3.17506950e-01 3.41211161e-01 1.11547602e-03]
[9.00000000e-01 9.39029466e-01 9.83888895e-01 6.50305816e-01 4.21707581e-01 3.12499664e-01 8.28179460e-01 7.86704514e-01 6.45492548e-01 5.30523767e-01 6.10007499e-01 1.81479969e-04]
[9.26000000e-01 9.45626620e-01 9.83669275e-01 6.46602008e-01 1.75352493e-01 2.85189302e-01 8.50846736e-01 7.57051390e-01 6.78770230e-01 4.42930431e-01 4.58654585e-01 1.51754802e-04]
[9.60000000e-01 9.40139637e-01 9.84220296e-01 6.45418394e-01 4.12548449e-01 3.02279023e-01 8.40678816e-01 7.73517171e-01 6.63527573e-01 5.18247692e-01 5.00578165e-01 7.65031939e-04]]
Output data, one step:
[5.06501446e-01 5.05708662e-01 2.89429261e-04]
Next input data, 5 time steps:
[[9.60000000e-01 9.62688840e-01 9.80421321e-01 6.51296485e-01 5.48525494e-01 2.28329865e-01 8.80399601e-01 5.77277891e-01 7.22026979e-01 3.17506950e-01 3.41211161e-01 1.11547602e-03]
[9.00000000e-01 9.39029466e-01 9.83888895e-01 6.50305816e-01 4.21707581e-01 3.12499664e-01 8.28179460e-01 7.86704514e-01 6.45492548e-01 5.30523767e-01 6.10007499e-01 1.81479969e-04]
[9.26000000e-01 9.45626620e-01 9.83669275e-01 6.46602008e-01 1.75352493e-01 2.85189302e-01 8.50846736e-01 7.57051390e-01 6.78770230e-01 4.42930431e-01 4.58654585e-01 1.51754802e-04]
[9.60000000e-01 9.40139637e-01 9.84220296e-01 6.45418394e-01 4.12548449e-01 3.02279023e-01 8.40678816e-01 7.73517171e-01 6.63527573e-01 5.18247692e-01 5.00578165e-01 7.65031939e-04]
[9.60000000e-01 9.38920170e-01 9.84408049e-01 6.44863216e-01 4.79168806e-01 2.99740521e-01 8.39014000e-01 7.96483690e-01 6.61992601e-01 5.06501446e-01 5.05708662e-01 2.89429261e-04]]
Output data, one step:
[0.50614313 0.49502421 0.00101378]
Next input data, 5 time steps:
[[9.00000000e-01 9.39029466e-01 9.83888895e-01 6.50305816e-01 4.21707581e-01 3.12499664e-01 8.28179460e-01 7.86704514e-01 6.45492548e-01 5.30523767e-01 6.10007499e-01 1.81479969e-04]
[9.26000000e-01 9.45626620e-01 9.83669275e-01 6.46602008e-01 1.75352493e-01 2.85189302e-01 8.50846736e-01 7.57051390e-01 6.78770230e-01 4.42930431e-01 4.58654585e-01 1.51754802e-04]
[9.60000000e-01 9.40139637e-01 9.84220296e-01 6.45418394e-01 4.12548449e-01 3.02279023e-01 8.40678816e-01 7.73517171e-01 6.63527573e-01 5.18247692e-01 5.00578165e-01 7.65031939e-04]
[9.60000000e-01 9.38920170e-01 9.84408049e-01 6.44863216e-01 4.79168806e-01 2.99740521e-01 8.39014000e-01 7.96483690e-01 6.61992601e-01 5.06501446e-01 5.05708662e-01 2.89429261e-04]
[9.20000000e-01 9.34661316e-01 9.84735168e-01 6.46949520e-01 5.32329634e-01 3.03084030e-01 8.25001569e-01 8.02090048e-01 6.46641243e-01 5.06143132e-01 4.95024213e-01 1.01378466e-03]]
Output data, one step:
[5.17295018e-01 5.13430126e-01 1.14207222e-04]
Great! Next, let's build the LSTM model. I chose the same architecture (Figure 5) used in Part 2. Figure 6 shows the results on the blind (test) dataset.
As is evident from Figure 6, the LSTM performs very poorly at predicting three different outputs. My hypothesis is that the main culprit is water production, which follows a contrasting pattern compared to the other two outputs (oil and gas), so the model has difficulty learning the diverging trends. I experimented with different hyperparameters and custom loss functions to get a good fit for all three, but unfortunately to no avail.
However, the most favorable results were obtained when the LSTM was combined with a convolutional neural network (CNN). This coupled model is called a Long- and Short-term Time-series Network (LSTNet). The convolutional component searches for patterns anywhere within the input time series and converts a long input sequence into a much shorter, downsampled sequence of higher-level features. This extracted feature sequence then becomes the input to the LSTM component of the network.
The LSTNet model architecture is shown in Figure 7: a Conv1D layer with 284 filters and a kernel size of 3, followed by an LSTM layer with 284 units and a ReLU activation function, then a dropout layer with a rate of 0.5, and finally a dense layer with 3 units and a sigmoid activation function. The Adam optimizer with a learning rate of 0.0001 is used, with mean squared logarithmic error as the loss function. As in the previous part, early stopping halts training when there is no significant improvement on the validation dataset; with this configuration, it triggers at approximately 300 epochs. Figure 8 shows the model's performance on the blind dataset.
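Here is a minimal Keras sketch of the architecture just described. The convolution's activation and the early-stopping patience are assumptions not stated in the text:

```python
import tensorflow as tf

WINDOW_SIZE, N_FEATURES, N_TARGETS = 5, 12, 3

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW_SIZE, N_FEATURES)),
    # Conv front-end extracts short local patterns before the LSTM
    # (ReLU here is an assumption; the text does not specify it).
    tf.keras.layers.Conv1D(filters=284, kernel_size=3, activation="relu"),
    tf.keras.layers.LSTM(284, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(N_TARGETS, activation="sigmoid"),
])

model.compile(
    loss=tf.keras.losses.MeanSquaredLogarithmicError(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
)

# Early stopping on the validation loss; the patience value is illustrative.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True)

val_ds = windowed_dataset(val_df[FEATURES + TARGETS].to_numpy())
# history = model.fit(train_ds, validation_data=val_ds,
#                     epochs=1000, callbacks=[early_stop])
```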
The results (Figure 8) are very good, though not perfect for water, especially in the tail. You can try tuning the hyperparameters further to get a better fit for water as well.
Until now, we have only used production data to predict a well's future production, which is, after all, empirical decline curve analysis (DCA). Reservoir characteristics, completion design, and production constraints play no role in DCA. Additionally, DCA completely ignores water or gas injection rates.
Therefore, it is not a wise use of machine learning merely to replicate DCA. In full-field modeling terms, it is very important to include spatial (static) data alongside temporal (dynamic) data in the model: porosity, permeability, thickness, perforations, and offset well data, to name a few. A future article will cover this topic: preparing spatiotemporal datasets for machine learning.
Thank you for your time. Stay tuned for new content in the future!