Long Short-Term Memory (LSTM) for Predicting Pore Pressure in Liquefaction¶

Learning Objectives:

Understand why standard models fail to capture stress history-dependent behavior
Learn how LSTM networks capture sequential dependencies in time-series data
Master time-window sampling for sequence-to-sequence prediction
Compare constitutive models vs data-driven LSTM approaches

Reference Paper: Choi and Kumar (2023). "A Machine Learning Approach to Predicting Pore Pressure Response in Liquefiable Sands under Cyclic Loading." Geo-Congress 2023.

Exercise: Solution:

Reference: lstm-liquefaction

The Problem: Liquefaction and the Shielding Effect¶

What is Liquefaction?¶

During an earthquake, saturated soils experience cyclic loading that can cause liquefaction - a phenomenon where:

Excess pore water pressure builds up
Effective stress decreases
Soil loses strength and stiffness
When excess pore pressure equals the confining pressure ($r_u = 1.0$), liquefaction occurs

The Shielding Effect¶

Critical Observation: Pore pressure generation depends on stress history, not just current stress!

Shielding Effect: Excess pore pressure does NOT increase when the current shear stress amplitude is lower than the peak prior amplitude.

✅ New peak stress > prior peak → pore pressure increases rapidly
❌ Current stress < prior peak → pore pressure remains nearly constant (shielded)

Why Traditional Models Fail¶

Sophisticated constitutive models like PM4Sand and PDMY02 fail to capture the shielding effect:

They predict continuous pore pressure increase even when shielding should occur
They miss the fluctuations in pore pressure response
They provide less accurate predictions of time to liquefaction

Our Solution: Use LSTM networks to learn the stress history-dependent behavior directly from experimental data!

Get data¶

In [ ]:

Copied!

!wget https://github.com/kks32-courses/ai-geotech/raw/refs/heads/main/docs/02-lstm/data.zip
!unzip data.zip
!wget https://github.com/kks32-courses/ai-geotech/raw/refs/heads/main/docs/02-lstm/data.zip
!unzip data.zip

Import Libraries¶

In [59]:

Copied!





import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from pathlib import Path
from tqdm.auto import tqdm

# Deep learning libraries - PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset

# Evaluation
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Set device - prioritize MPS (Apple Silicon), then CUDA, then CPU
if torch.backends.mps.is_available():
    device = torch.device('mps')
    print("🚀 Using Apple Silicon GPU (MPS) for acceleration!")
elif torch.cuda.is_available():
    device = torch.device('cuda')
    print("🚀 Using NVIDIA GPU (CUDA)")
else:
    device = torch.device('cpu')
    print("⚠️  Using CPU (this will be slow)")

print(f"Device: {device}")
print(f"PyTorch version: {torch.__version__}")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from pathlib import Path
from tqdm.auto import tqdm

# Deep learning libraries - PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset

# Evaluation
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Set device - prioritize MPS (Apple Silicon), then CUDA, then CPU
if torch.backends.mps.is_available():
    device = torch.device('mps')
    print("🚀 Using Apple Silicon GPU (MPS) for acceleration!")
elif torch.cuda.is_available():
    device = torch.device('cuda')
    print("🚀 Using NVIDIA GPU (CUDA)")
else:
    device = torch.device('cpu')
    print("⚠️  Using CPU (this will be slow)")

print(f"Device: {device}")
print(f"PyTorch version: {torch.__version__}")

🚀 Using NVIDIA GPU (CUDA)
Device: cuda
PyTorch version: 2.8.0+cu126

Understanding the Experimental Data¶

Cyclic Simple Shear Tests on Nevada Sand¶

Data Source: Kwan et al. (2017) - Cyclic simple shear tests on saturated Nevada sand

Test Conditions:

Loose samples: $D_r$ = 33-55% (Experiments 7 and 9)
Dense samples: $D_r$ = 72-94% (Experiments 8 and 10)
Loading types:
- Harmonic (constant amplitude)
- Modulated-up (increasing amplitude)
- Modulated-down (decreasing amplitude)

Measured Variables:

Time [sec]
Shear Strain [%]
Shear Stress [kPa]
Effective Vertical Stress [kPa]
Excess Pore Pressure [kPa]

Computed Variables:

Relative Density $D_r$ [%]
Pore pressure ratio: $r_u = \frac{\text{Excess Pore Pressure}}{\text{Confining Pressure}}$

Data Loading Functions¶

Let's create functions to load and process the cyclic simple shear test data:

In [60]:

Copied!





def augment_data(df, add_length=900, time_step=0.012):
    """
    Augment data by adding static state before cyclic loading begins.
    This represents the initial state before shearing starts.
    """
    css_start_time = time_step * add_length
    add_Dr = np.full((add_length, 1), df["Dr [%]"].iloc[0])
    add_Time = np.reshape(np.linspace(0, css_start_time, add_length), (add_length, 1))
    add_ShearStrain = np.full((add_length, 1), 0)
    add_SheerStress = np.full((add_length, 1), 0)
    add_ConfPressure = np.full((add_length, 1), df["Effective Vertical Stress [kPa]"].iloc[0])
    add_ru = np.full((add_length, 1), df["ru"].iloc[0])
    add_PWP = np.full((add_length, 1), df["Excess Pore Pressure [kPa]"].iloc[0])

    # Aggregate as array
    add_data = np.hstack(
        (add_Dr, add_Time, add_ShearStrain, add_SheerStress, add_ConfPressure, add_ru, add_PWP)
    )

    # Append original timesteps
    first_timesteps = add_Time
    last_timesteps = first_timesteps[-1, 0] + df["Time [sec]"].to_numpy()
    last_timesteps = np.reshape(last_timesteps, (last_timesteps.shape[0], 1))
    full_timesteps = np.vstack((first_timesteps, last_timesteps))

    # Create dataframe
    header = df.columns
    df_add_data = pd.DataFrame(add_data, columns=header)

    # Append
    df_augmented = pd.concat([df_add_data, df], ignore_index=True)
    df_augmented["Time [sec]"] = full_timesteps

    return df_augmented


def csv_to_dataframe(file_path, num_headers=5):
    """
    Convert CSV file to dataframe with computed ru value.

    Args:
        file_path: Path to CSV file
        num_headers: Number of header rows (default=5)

    Returns:
        DataFrame with added Dr and ru columns
    """
    # Read relative density from header
    with open(file_path) as f:
        lines = f.readlines()
        if len(lines) >= num_headers:
            data = lines[num_headers-1]
            try:
                relative_density = float(data[18:22])
            except:
                relative_density = 0.5  # Default value
        else:
            relative_density = 0.5

    # Read CSV
    df = pd.read_csv(file_path, header=num_headers)

    # Add relative density as first column
    df.insert(loc=0, column="Dr [%]", value=relative_density * 100)  # Convert to percentage

    # Compute and insert ru (pore pressure ratio)
    confining_pressure = df['Effective Vertical Stress [kPa]'].iloc[0]
    ru = df['Excess Pore Pressure [kPa]'] / confining_pressure
    df.insert(loc=5, column="ru", value=ru)

    # Augment data
    df = augment_data(df=df)

    return df
def augment_data(df, add_length=900, time_step=0.012):
    """
    Augment data by adding static state before cyclic loading begins.
    This represents the initial state before shearing starts.
    """
    css_start_time = time_step * add_length
    add_Dr = np.full((add_length, 1), df["Dr [%]"].iloc[0])
    add_Time = np.reshape(np.linspace(0, css_start_time, add_length), (add_length, 1))
    add_ShearStrain = np.full((add_length, 1), 0)
    add_SheerStress = np.full((add_length, 1), 0)
    add_ConfPressure = np.full((add_length, 1), df["Effective Vertical Stress [kPa]"].iloc[0])
    add_ru = np.full((add_length, 1), df["ru"].iloc[0])
    add_PWP = np.full((add_length, 1), df["Excess Pore Pressure [kPa]"].iloc[0])

    # Aggregate as array
    add_data = np.hstack(
        (add_Dr, add_Time, add_ShearStrain, add_SheerStress, add_ConfPressure, add_ru, add_PWP)
    )

    # Append original timesteps
    first_timesteps = add_Time
    last_timesteps = first_timesteps[-1, 0] + df["Time [sec]"].to_numpy()
    last_timesteps = np.reshape(last_timesteps, (last_timesteps.shape[0], 1))
    full_timesteps = np.vstack((first_timesteps, last_timesteps))

    # Create dataframe
    header = df.columns
    df_add_data = pd.DataFrame(add_data, columns=header)

    # Append
    df_augmented = pd.concat([df_add_data, df], ignore_index=True)
    df_augmented["Time [sec]"] = full_timesteps

    return df_augmented


def csv_to_dataframe(file_path, num_headers=5):
    """
    Convert CSV file to dataframe with computed ru value.

    Args:
        file_path: Path to CSV file
        num_headers: Number of header rows (default=5)

    Returns:
        DataFrame with added Dr and ru columns
    """
    # Read relative density from header
    with open(file_path) as f:
        lines = f.readlines()
        if len(lines) >= num_headers:
            data = lines[num_headers-1]
            try:
                relative_density = float(data[18:22])
            except:
                relative_density = 0.5  # Default value
        else:
            relative_density = 0.5

    # Read CSV
    df = pd.read_csv(file_path, header=num_headers)

    # Add relative density as first column
    df.insert(loc=0, column="Dr [%]", value=relative_density * 100)  # Convert to percentage

    # Compute and insert ru (pore pressure ratio)
    confining_pressure = df['Effective Vertical Stress [kPa]'].iloc[0]
    ru = df['Excess Pore Pressure [kPa]'] / confining_pressure
    df.insert(loc=5, column="ru", value=ru)

    # Augment data
    df = augment_data(df=df)

    return df

Load Sample Data¶

Let's load a sample dataset to visualize the cyclic simple shear test results:

In [61]:

Copied!





# Load a sample dataset
sample_file = "Experiment-9/Trial-1_121227_yc1.csv"
sample_df = csv_to_dataframe(sample_file)

print(f"Dataset shape: {sample_df.shape}")
print(f"\nColumns: {list(sample_df.columns)}")
print(f"\nRelative Density: {sample_df['Dr [%]'].iloc[0]:.1f}%")
print(f"\nFirst few rows:")
sample_df.head()
# Load a sample dataset
sample_file = "Experiment-9/Trial-1_121227_yc1.csv"
sample_df = csv_to_dataframe(sample_file)

print(f"Dataset shape: {sample_df.shape}")
print(f"\nColumns: {list(sample_df.columns)}")
print(f"\nRelative Density: {sample_df['Dr [%]'].iloc[0]:.1f}%")
print(f"\nFirst few rows:")
sample_df.head()

Dataset shape: (9565, 7)

Columns: ['Dr [%]', 'Time [sec]', 'Shear Strain [%]', 'Shear Stress [kPa]', 'Effective Vertical Stress [kPa]', 'ru', 'Excess Pore Pressure [kPa]']

Relative Density: 50.0%

First few rows:

Out[61]:

	Dr [%]	Time [sec]	Effective Vertical Stress [kPa]
0	50.0	0.000000	104.523318
1	50.0	0.012013	104.523318
2	50.0	0.024027	104.523318
3	50.0	0.036040	104.523318
4	50.0	0.048053	104.523318

Visualize Cyclic Simple Shear Test Data¶

In [62]:

Copied!





# Plot the cyclic simple shear test results
fig, axes = plt.subplots(3, 1, figsize=(14, 10))

# Plot 1: Shear Stress vs Time
ax = axes[0]
ax.plot(sample_df["Time [sec]"], sample_df["Shear Stress [kPa]"], 'b-', linewidth=1)
ax.set_ylabel('Shear Stress [kPa]', fontsize=12)
ax.set_title(f'Cyclic Simple Shear Test (Dr = {sample_df["Dr [%]"].iloc[0]:.1f}%)', fontsize=14)
ax.grid(True, alpha=0.3)

# Plot 2: Shear Strain vs Time
ax = axes[1]
ax.plot(sample_df["Time [sec]"], sample_df["Shear Strain [%]"], 'g-', linewidth=1)
ax.set_ylabel('Shear Strain [%]', fontsize=12)
ax.grid(True, alpha=0.3)

# Plot 3: Pore Pressure Ratio vs Time
ax = axes[2]
ax.plot(sample_df["Time [sec]"], sample_df["ru"], 'r-', linewidth=1.5)
ax.axhline(y=1.0, color='k', linestyle='--', linewidth=1, alpha=0.5, label='Liquefaction threshold')
ax.set_xlabel('Time [sec]', fontsize=12)
ax.set_ylabel('Pore Pressure Ratio $r_u$', fontsize=12)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Maximum pore pressure ratio: {sample_df['ru'].max():.3f}")
print(f"Liquefaction {'occurred' if sample_df['ru'].max() >= 1.0 else 'did not occur'}")
# Plot the cyclic simple shear test results
fig, axes = plt.subplots(3, 1, figsize=(14, 10))

# Plot 1: Shear Stress vs Time
ax = axes[0]
ax.plot(sample_df["Time [sec]"], sample_df["Shear Stress [kPa]"], 'b-', linewidth=1)
ax.set_ylabel('Shear Stress [kPa]', fontsize=12)
ax.set_title(f'Cyclic Simple Shear Test (Dr = {sample_df["Dr [%]"].iloc[0]:.1f}%)', fontsize=14)
ax.grid(True, alpha=0.3)

# Plot 2: Shear Strain vs Time
ax = axes[1]
ax.plot(sample_df["Time [sec]"], sample_df["Shear Strain [%]"], 'g-', linewidth=1)
ax.set_ylabel('Shear Strain [%]', fontsize=12)
ax.grid(True, alpha=0.3)

# Plot 3: Pore Pressure Ratio vs Time
ax = axes[2]
ax.plot(sample_df["Time [sec]"], sample_df["ru"], 'r-', linewidth=1.5)
ax.axhline(y=1.0, color='k', linestyle='--', linewidth=1, alpha=0.5, label='Liquefaction threshold')
ax.set_xlabel('Time [sec]', fontsize=12)
ax.set_ylabel('Pore Pressure Ratio $r_u$', fontsize=12)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Maximum pore pressure ratio: {sample_df['ru'].max():.3f}")
print(f"Liquefaction {'occurred' if sample_df['ru'].max() >= 1.0 else 'did not occur'}")

No description has been provided for this image

Maximum pore pressure ratio: 0.647
Liquefaction did not occur

What is an LSTM Network?¶

From RNN to LSTM¶

Recurrent Neural Networks (RNN):

Process sequential data by maintaining a "memory" of previous inputs
Problem: Vanishing/exploding gradients for long sequences
Cannot learn long-term dependencies effectively

Long Short-Term Memory (LSTM):

Solves the vanishing gradient problem
Uses gates to control information flow
Can learn both short-term and long-term patterns

LSTM Cell Architecture¶

An LSTM cell has three gates:

Forget Gate ($f_t$): What information to discard from previous cell state $$f_t = \sigma(W_{xf}x_t + W_{hf}h_{t-1} + b_f)$$
Input Gate ($i_t$): What new information to add to cell state $$i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + b_i)$$ $$\tilde{C}_t = \tanh(W_{xc}x_t + W_{hc}h_{t-1} + b_c)$$
Output Gate ($o_t$): What information to output $$o_t = \sigma(W_{xo}x_t + W_{ho}h_{t-1} + b_o)$$

Cell State Update: $$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$ $$h_t = o_t \odot \tanh(C_t)$$

Where:

$x_t$: Input at time $t$
$h_t$: Hidden state (output)
$C_t$: Cell state (memory)
$\sigma$: Sigmoid function (outputs 0-1)
$\odot$: Element-wise multiplication

Why LSTM for Liquefaction?¶

Perfect for the Shielding Effect:

Forget gate can forget low-stress periods
Input gate selectively remembers peak stresses
Cell state maintains long-term memory of stress history
No need to hard-code "if current < peak" logic!

Data Preparation for LSTM¶

Time-Window Sampling¶

Key Insight: To predict $r_u$ at time $t+800$, we need the stress history from $t$ to $t+799$

Features ($\mathbf{X}_t \in \mathbb{R}^{3 \times 800}$):

Normalized time: $[T_t, T_{t+1}, ..., T_{t+799}]$
Shear stress history: $[\tau_t, \tau_{t+1}, ..., \tau_{t+799}]$
Relative density (constant): $[D_r, D_r, ..., D_r]$

Target ($\mathbf{Y}_t \in \mathbb{R}^{1 \times 1}$):

Pore pressure ratio at next time step: $r_{u,t+800}$

Window Length: 800 time steps ≈ 2 loading cycles

In [63]:

Copied!





def normalize_data(dfs):
    """
    Normalize time by final time and shear stress by confining pressure.
    This ensures values are in [0, 1] range for better training.
    """
    normalized_dfs = []
    for df in dfs:
        confining_pressure = df["Effective Vertical Stress [kPa]"].iloc[0]
        final_time = df["Time [sec]"].iloc[-1]
        normalized_df = df.copy()
        normalized_df["Shear Stress [kPa]"] = df["Shear Stress [kPa]"] / confining_pressure
        normalized_df["Time [sec]"] = df["Time [sec]"] / final_time
        # Normalize Dr to [0, 1]
        normalized_df["Dr [%]"] = df["Dr [%]"] / 100.0
        normalized_dfs.append(normalized_df)
    return normalized_dfs


def create_lstm_inputs(dfs, features, targets, window_length):
    """
    Create time-window sampled inputs for LSTM.

    Args:
        dfs: List of dataframes
        features: List of feature column names
        targets: List of target column names
        window_length: Length of time window (e.g., 800)

    Returns:
        Dictionary with feature datasets, target datasets, and windowed samples
    """
    feature_datasets = []
    target_datasets = []
    x_rnns = []
    y_rnns = []

    for df in dfs:
        # Extract features and targets as arrays
        feature_dataset = df[features].to_numpy()
        target_dataset = df[targets].to_numpy()

        feature_datasets.append(feature_dataset)
        target_datasets.append(target_dataset)

        datapoint = len(df)

        # Create sliding windows
        sampled_features = []
        sampled_targets = []

        for i in range(datapoint - window_length):
            # Window of features from t to t+window_length
            sampled_feature = feature_dataset[i:i+window_length, :]
            # Target at t+window_length
            sampled_target = target_dataset[i+window_length, :]
            sampled_features.append(sampled_feature)
            sampled_targets.append(sampled_target)

        x_rnn = np.asarray(sampled_features)  # Shape: (samples, window_length, features)
        y_rnn = np.asarray(sampled_targets)   # Shape: (samples, targets)

        x_rnns.append(x_rnn)
        y_rnns.append(y_rnn)

    return {
        "feature_datasets": feature_datasets,
        "target_datasets": target_datasets,
        "x_rnns": x_rnns,
        "y_rnns": y_rnns
    }
def normalize_data(dfs):
    """
    Normalize time by final time and shear stress by confining pressure.
    This ensures values are in [0, 1] range for better training.
    """
    normalized_dfs = []
    for df in dfs:
        confining_pressure = df["Effective Vertical Stress [kPa]"].iloc[0]
        final_time = df["Time [sec]"].iloc[-1]
        normalized_df = df.copy()
        normalized_df["Shear Stress [kPa]"] = df["Shear Stress [kPa]"] / confining_pressure
        normalized_df["Time [sec]"] = df["Time [sec]"] / final_time
        # Normalize Dr to [0, 1]
        normalized_df["Dr [%]"] = df["Dr [%]"] / 100.0
        normalized_dfs.append(normalized_df)
    return normalized_dfs


def create_lstm_inputs(dfs, features, targets, window_length):
    """
    Create time-window sampled inputs for LSTM.

    Args:
        dfs: List of dataframes
        features: List of feature column names
        targets: List of target column names
        window_length: Length of time window (e.g., 800)

    Returns:
        Dictionary with feature datasets, target datasets, and windowed samples
    """
    feature_datasets = []
    target_datasets = []
    x_rnns = []
    y_rnns = []

    for df in dfs:
        # Extract features and targets as arrays
        feature_dataset = df[features].to_numpy()
        target_dataset = df[targets].to_numpy()

        feature_datasets.append(feature_dataset)
        target_datasets.append(target_dataset)

        datapoint = len(df)

        # Create sliding windows
        sampled_features = []
        sampled_targets = []

        for i in range(datapoint - window_length):
            # Window of features from t to t+window_length
            sampled_feature = feature_dataset[i:i+window_length, :]
            # Target at t+window_length
            sampled_target = target_dataset[i+window_length, :]
            sampled_features.append(sampled_feature)
            sampled_targets.append(sampled_target)

        x_rnn = np.asarray(sampled_features)  # Shape: (samples, window_length, features)
        y_rnn = np.asarray(sampled_targets)   # Shape: (samples, targets)

        x_rnns.append(x_rnn)
        y_rnns.append(y_rnn)

    return {
        "feature_datasets": feature_datasets,
        "target_datasets": target_datasets,
        "x_rnns": x_rnns,
        "y_rnns": y_rnns
    }

Demonstrate Time-Window Sampling¶

Let's visualize how time-window sampling works:

In [64]:

Copied!





# Create a simple example
window_length = 800
features = ["Time [sec]", "Dr [%]", "Shear Stress [kPa]"]
targets = ["ru"]

# Normalize the sample dataframe
normalized_sample = normalize_data([sample_df])

# Create LSTM inputs
lstm_data = create_lstm_inputs(normalized_sample, features, targets, window_length)

print(f"Number of windows created: {len(lstm_data['x_rnns'][0])}")
print(f"Shape of each window (features): {lstm_data['x_rnns'][0].shape}")
print(f"  - Samples: {lstm_data['x_rnns'][0].shape[0]}")
print(f"  - Window length: {lstm_data['x_rnns'][0].shape[1]}")
print(f"  - Features: {lstm_data['x_rnns'][0].shape[2]}")
print(f"\nShape of targets: {lstm_data['y_rnns'][0].shape}")

# Visualize one window
window_idx = 1000
plt.figure(figsize=(12, 4))
plt.plot(lstm_data['x_rnns'][0][window_idx, :, 2], 'b-', linewidth=1, label='Input: Shear Stress History')
plt.axhline(y=lstm_data['y_rnns'][0][window_idx, 0], color='r', linestyle='--',
            linewidth=2, label=f'Target: $r_u$ = {lstm_data["y_rnns"][0][window_idx, 0]:.3f}')
plt.xlabel('Time step within window')
plt.ylabel('Normalized Shear Stress')
plt.title(f'Time-Window Sample {window_idx}: Predict $r_u$ from stress history')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Create a simple example
window_length = 800
features = ["Time [sec]", "Dr [%]", "Shear Stress [kPa]"]
targets = ["ru"]

# Normalize the sample dataframe
normalized_sample = normalize_data([sample_df])

# Create LSTM inputs
lstm_data = create_lstm_inputs(normalized_sample, features, targets, window_length)

print(f"Number of windows created: {len(lstm_data['x_rnns'][0])}")
print(f"Shape of each window (features): {lstm_data['x_rnns'][0].shape}")
print(f"  - Samples: {lstm_data['x_rnns'][0].shape[0]}")
print(f"  - Window length: {lstm_data['x_rnns'][0].shape[1]}")
print(f"  - Features: {lstm_data['x_rnns'][0].shape[2]}")
print(f"\nShape of targets: {lstm_data['y_rnns'][0].shape}")

# Visualize one window
window_idx = 1000
plt.figure(figsize=(12, 4))
plt.plot(lstm_data['x_rnns'][0][window_idx, :, 2], 'b-', linewidth=1, label='Input: Shear Stress History')
plt.axhline(y=lstm_data['y_rnns'][0][window_idx, 0], color='r', linestyle='--',
            linewidth=2, label=f'Target: $r_u$ = {lstm_data["y_rnns"][0][window_idx, 0]:.3f}')
plt.xlabel('Time step within window')
plt.ylabel('Normalized Shear Stress')
plt.title(f'Time-Window Sample {window_idx}: Predict $r_u$ from stress history')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Number of windows created: 8765
Shape of each window (features): (8765, 800, 3)
  - Samples: 8765
  - Window length: 800
  - Features: 3

Shape of targets: (8765, 1)

LSTM Model Architecture (PyTorch)¶

Network Structure¶

Our LSTM-based model consists of:

Input (batch, 800, 3)
    ↓
LSTM Layer 1 (128 units, return all sequences)
    ↓
LSTM Layer 2 (128 units, return final output)
    ↓
Dense Layer 1 (64 units, tanh activation)
    ↓
Dense Layer 2 (16 units, tanh activation)
    ↓
Output Layer (1 unit, tanh activation)
    ↓
Output: ru prediction (batch, 1)

Key Design Choices:

Two LSTM layers: Capture complex temporal patterns
128 units: Balance between capacity and overfitting
tanh activation: Constrains output to [-1, 1], suitable for $r_u \in [0, 1]$
Dense layers: Further process LSTM output

In [65]:

Copied!





class LSTMLiquefactionModel(nn.Module):
    """
    LSTM model for pore pressure prediction in liquefaction.
    """
    def __init__(self, input_size=3, hidden_size=128, num_layers=2, output_size=1):
        super(LSTMLiquefactionModel, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # LSTM layers
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,  # Input shape: (batch, seq, features)
            dropout=0.0  # No dropout for now
        )

        # Dense layers
        self.fc1 = nn.Linear(hidden_size, 64)
        self.fc2 = nn.Linear(64, 16)
        self.fc3 = nn.Linear(16, output_size)

        # Activation
        self.tanh = nn.Tanh()

    def forward(self, x):
        # x shape: (batch, seq_len, input_size)

        # LSTM forward
        # lstm_out shape: (batch, seq_len, hidden_size)
        # h_n, c_n shape: (num_layers, batch, hidden_size)
        lstm_out, (h_n, c_n) = self.lstm(x)

        # Take the output from the last time step
        # last_output shape: (batch, hidden_size)
        last_output = lstm_out[:, -1, :]

        # Dense layers with tanh activation
        out = self.tanh(self.fc1(last_output))
        out = self.tanh(self.fc2(out))
        out = self.tanh(self.fc3(out))

        return out


# Create and display model
model = LSTMLiquefactionModel(input_size=3, hidden_size=128, num_layers=2, output_size=1)
model = model.to(device)

print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")
class LSTMLiquefactionModel(nn.Module):
    """
    LSTM model for pore pressure prediction in liquefaction.
    """
    def __init__(self, input_size=3, hidden_size=128, num_layers=2, output_size=1):
        super(LSTMLiquefactionModel, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # LSTM layers
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,  # Input shape: (batch, seq, features)
            dropout=0.0  # No dropout for now
        )

        # Dense layers
        self.fc1 = nn.Linear(hidden_size, 64)
        self.fc2 = nn.Linear(64, 16)
        self.fc3 = nn.Linear(16, output_size)

        # Activation
        self.tanh = nn.Tanh()

    def forward(self, x):
        # x shape: (batch, seq_len, input_size)

        # LSTM forward
        # lstm_out shape: (batch, seq_len, hidden_size)
        # h_n, c_n shape: (num_layers, batch, hidden_size)
        lstm_out, (h_n, c_n) = self.lstm(x)

        # Take the output from the last time step
        # last_output shape: (batch, hidden_size)
        last_output = lstm_out[:, -1, :]

        # Dense layers with tanh activation
        out = self.tanh(self.fc1(last_output))
        out = self.tanh(self.fc2(out))
        out = self.tanh(self.fc3(out))

        return out


# Create and display model
model = LSTMLiquefactionModel(input_size=3, hidden_size=128, num_layers=2, output_size=1)
model = model.to(device)

print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")

LSTMLiquefactionModel(
  (lstm): LSTM(3, 128, num_layers=2, batch_first=True)
  (fc1): Linear(in_features=128, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=16, bias=True)
  (fc3): Linear(in_features=16, out_features=1, bias=True)
  (tanh): Tanh()
)

Total parameters: 209,505
Trainable parameters: 209,505

Training Configuration¶

Hyperparameters¶

Optimizer: Adam with learning rate = 0.0002

Adaptive learning rate for each parameter
Works well with RNNs/LSTMs

Loss Function: Mean Squared Error (MSE) $$\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} (r_{u,\text{predicted}}^{(i)} - r_{u,\text{target}}^{(i)})^2$$

Training Settings:

Epochs: 100
Batch size: 32
Validation split: 20%

In [66]:

Copied!





def train_model(model, train_loader, val_loader, epochs=100, learning_rate=0.0002, device='cpu'):
    """
    Train the LSTM model with real-time progress tracking optimized for Mac.
    """
    print(f"🔧 Setting up training on {device}...")

    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    history = {
        'train_loss': [],
        'val_loss': []
    }

    best_val_loss = float('inf')
    patience = 1000
    patience_counter = 0

    print(f"🏃 Starting training loop...")

    # Main epoch loop with tqdm
    pbar = tqdm(range(epochs), desc='Training')

    for epoch in pbar:
        # Training phase
        model.train()
        train_loss = 0.0
        train_batches = 0

        for batch_x, batch_y in train_loader:
            # Move to device
            batch_x = batch_x.to(device)
            batch_y = batch_y.to(device)

            # Forward pass
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)

            # Backward pass
            loss.backward()
            optimizer.step()

            train_loss += loss.item()
            train_batches += 1

        train_loss /= train_batches

        # Validation phase
        model.eval()
        val_loss = 0.0
        val_batches = 0

        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                batch_x = batch_x.to(device)
                batch_y = batch_y.to(device)

                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                val_loss += loss.item()
                val_batches += 1

        val_loss /= val_batches

        history['train_loss'].append(train_loss)
        history['val_loss'].append(val_loss)

        # Early stopping
        is_best = val_loss < best_val_loss
        if is_best:
            best_val_loss = val_loss
            patience_counter = 0
            torch.save(model.state_dict(), 'best_lstm_model.pth')
            status = '✓'
        else:
            patience_counter += 1
            status = f'({patience_counter})'

        # Update progress bar with metrics
        pbar.set_postfix({
            'train': f'{train_loss:.4f}',
            'val': f'{val_loss:.4f}',
            'best': f'{best_val_loss:.4f}',
            's': status
        })

        if patience_counter >= patience:
            pbar.write(f'⏹️  Early stopping at epoch {epoch+1}')
            break

    # Load best model
    model.load_state_dict(torch.load('best_lstm_model.pth'))
    print(f'\n✅ Training completed! Best validation loss: {best_val_loss:.6f}')

    return history
def train_model(model, train_loader, val_loader, epochs=100, learning_rate=0.0002, device='cpu'):
    """
    Train the LSTM model with real-time progress tracking optimized for Mac.
    """
    print(f"🔧 Setting up training on {device}...")

    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    history = {
        'train_loss': [],
        'val_loss': []
    }

    best_val_loss = float('inf')
    patience = 1000
    patience_counter = 0

    print(f"🏃 Starting training loop...")

    # Main epoch loop with tqdm
    pbar = tqdm(range(epochs), desc='Training')

    for epoch in pbar:
        # Training phase
        model.train()
        train_loss = 0.0
        train_batches = 0

        for batch_x, batch_y in train_loader:
            # Move to device
            batch_x = batch_x.to(device)
            batch_y = batch_y.to(device)

            # Forward pass
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)

            # Backward pass
            loss.backward()
            optimizer.step()

            train_loss += loss.item()
            train_batches += 1

        train_loss /= train_batches

        # Validation phase
        model.eval()
        val_loss = 0.0
        val_batches = 0

        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                batch_x = batch_x.to(device)
                batch_y = batch_y.to(device)

                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                val_loss += loss.item()
                val_batches += 1

        val_loss /= val_batches

        history['train_loss'].append(train_loss)
        history['val_loss'].append(val_loss)

        # Early stopping
        is_best = val_loss < best_val_loss
        if is_best:
            best_val_loss = val_loss
            patience_counter = 0
            torch.save(model.state_dict(), 'best_lstm_model.pth')
            status = '✓'
        else:
            patience_counter += 1
            status = f'({patience_counter})'

        # Update progress bar with metrics
        pbar.set_postfix({
            'train': f'{train_loss:.4f}',
            'val': f'{val_loss:.4f}',
            'best': f'{best_val_loss:.4f}',
            's': status
        })

        if patience_counter >= patience:
            pbar.write(f'⏹️  Early stopping at epoch {epoch+1}')
            break

    # Load best model
    model.load_state_dict(torch.load('best_lstm_model.pth'))
    print(f'\n✅ Training completed! Best validation loss: {best_val_loss:.6f}')

    return history

Load Training Data¶

We'll use a subset of the experimental data for demonstration purposes. In the full implementation, 12 trials are used for training and 3 for testing.

Memory Optimization Strategies¶

When working with large datasets like Experiment-9, we need to be careful about memory usage. The following strategies are implemented:

1. Sampling Stride¶

Instead of creating windows at every timestep (stride=1), we sample every Nth window (configurable via sampling_stride):

stride=1: ~500,000 windows (very high memory)
stride=2: ~250,000 windows (50% reduction)
stride=4: ~125,000 windows (75% reduction)

This is a valid approach because adjacent windows have 99.9% overlap, so we don't lose much information.

2. File-by-File Processing¶

Load and process one file at a time instead of all files simultaneously, with explicit garbage collection after each file.

3. Reduced Training Set¶

Use 10 trials instead of 20 for training to reduce memory requirements while maintaining diversity.

4. Explicit Memory Management¶

Delete intermediate arrays immediately after use
Call gc.collect() to free memory
Convert to float32 instead of float64 (50% memory reduction)

5. Efficient DataLoader¶

num_workers=0: Avoids multi-process overhead on Apple Silicon
pin_memory=False: Reduces memory usage on MPS devices

These optimizations can reduce memory usage by 70-80% while maintaining model performance!

In [67]:

Copied!





# Load configuration from input_config.json
import json
import gc

print("📋 Loading configuration...")
with open('input_config.json', 'r') as f:
    config = json.load(f)

print("Configuration loaded:")
print(f"  Features: {config['features']}")
print(f"  Targets: {config['targets']}")
print(f"  Window length: {config['window_length']}")
print(f"  Training config: {config['training']}")

# MEMORY OPTIMIZATION: Add sampling stride to reduce dataset size
SAMPLING_STRIDE = config.get('sampling_stride', 2)  # Sample every Nth window (default: 2)
print(f"  Sampling stride: {SAMPLING_STRIDE} (memory optimization)")

# Load training data based on config
print("\n🔍 Searching for training data files...")
train_files = []
for trial_info in config['train_ids']:
    exp_id = trial_info['exp_id']
    trial_id = trial_info['trial_id']
    # Find the CSV file matching this experiment and trial
    exp_dir = f"Experiment-{exp_id}"
    if os.path.exists(exp_dir):
        files = [f for f in os.listdir(exp_dir) if f.endswith('.csv') and f'Trial-{trial_id}_' in f]
        if files:
            train_files.append(os.path.join(exp_dir, files[0]))

print(f"\n📂 Found {len(train_files)} training files:")
for f in train_files:
    print(f"  - {f}")

# Extract features and targets
window_length = config['window_length']
features = config['features']
targets = config['targets']

# MEMORY OPTIMIZATION: Process files one at a time and subsample
print("\n🔄 Loading and preprocessing data (memory-efficient mode)...")
print(f"⚡ Using sampling stride of {SAMPLING_STRIDE} to reduce memory usage")

train_x_list = []
train_y_list = []
total_windows_before = 0
total_windows_after = 0

for i, file_path in enumerate(train_files):
    print(f"  Processing {i+1}/{len(train_files)}: {file_path}")

    # Load single file
    df = csv_to_dataframe(file_path)
    normalized_df = normalize_data([df])[0]

    # Extract features and targets as arrays
    feature_dataset = normalized_df[features].to_numpy()
    target_dataset = normalized_df[targets].to_numpy()

    datapoint = len(normalized_df)
    num_windows = datapoint - window_length
    total_windows_before += num_windows

    # Create sliding windows with STRIDE (memory optimization)
    # Instead of every timestep, sample every Nth window
    sampled_features = []
    sampled_targets = []

    for j in range(0, num_windows, SAMPLING_STRIDE):
        sampled_feature = feature_dataset[j:j+window_length, :]
        sampled_target = target_dataset[j+window_length, :]
        sampled_features.append(sampled_feature)
        sampled_targets.append(sampled_target)

    x_rnn = np.asarray(sampled_features, dtype=np.float32)
    y_rnn = np.asarray(sampled_targets, dtype=np.float32)

    total_windows_after += len(x_rnn)

    train_x_list.append(x_rnn)
    train_y_list.append(y_rnn)

    # Clear memory
    del df, normalized_df, feature_dataset, target_dataset, sampled_features, sampled_targets
    gc.collect()

print(f"\n  📉 Reduced dataset size:")
print(f"     Before: {total_windows_before:,} windows")
print(f"     After:  {total_windows_after:,} windows ({100*total_windows_after/total_windows_before:.1f}%)")

# Concatenate all training samples
print("  Concatenating samples...")
train_x = np.concatenate(train_x_list, axis=0)
train_y = np.concatenate(train_y_list, axis=0)

# Clear intermediate lists
del train_x_list, train_y_list
gc.collect()

# Shuffle training data
print("  Shuffling data...")
shuffler = np.random.permutation(len(train_x))
train_x = train_x[shuffler]
train_y = train_y[shuffler]
del shuffler
gc.collect()

# Split into train and validation using config
validation_split = config['training']['validation_split']
split_idx = int((1 - validation_split) * len(train_x))
train_x_split = train_x[:split_idx]
train_y_split = train_y[:split_idx]
val_x = train_x[split_idx:]
val_y = train_y[split_idx:]

# Delete full arrays to save memory
del train_x, train_y
gc.collect()

# Convert to PyTorch tensors
print("  Converting to PyTorch tensors...")
train_x_tensor = torch.FloatTensor(train_x_split)
train_y_tensor = torch.FloatTensor(train_y_split)
val_x_tensor = torch.FloatTensor(val_x)
val_y_tensor = torch.FloatTensor(val_y)

# Delete numpy arrays after tensor conversion
del train_x_split, train_y_split, val_x, val_y
gc.collect()

# Create data loaders using config
batch_size = config['training']['batch_size']
train_dataset = TensorDataset(train_x_tensor, train_y_tensor)
val_dataset = TensorDataset(val_x_tensor, val_y_tensor)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=False)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0, pin_memory=False)

print(f"\n✅ Data preparation complete!")
print(f"📊 Training data shape: X={train_x_tensor.shape}, Y={train_y_tensor.shape}")
print(f"📊 Validation data shape: X={val_x_tensor.shape}, Y={val_y_tensor.shape}")
print(f"📊 Total training samples: {len(train_dataset):,}")
print(f"📊 Total validation samples: {len(val_dataset):,}")
print(f"📊 Batches per epoch: {len(train_loader)} (train), {len(val_loader)} (val)")
print(f"💾 Estimated memory usage: ~{(train_x_tensor.nelement() * 4 + val_x_tensor.nelement() * 4) / 1e9:.2f} GB")
# Load configuration from input_config.json
import json
import gc

print("📋 Loading configuration...")
with open('input_config.json', 'r') as f:
    config = json.load(f)

print("Configuration loaded:")
print(f"  Features: {config['features']}")
print(f"  Targets: {config['targets']}")
print(f"  Window length: {config['window_length']}")
print(f"  Training config: {config['training']}")

# MEMORY OPTIMIZATION: Add sampling stride to reduce dataset size
SAMPLING_STRIDE = config.get('sampling_stride', 2)  # Sample every Nth window (default: 2)
print(f"  Sampling stride: {SAMPLING_STRIDE} (memory optimization)")

# Load training data based on config
print("\n🔍 Searching for training data files...")
train_files = []
for trial_info in config['train_ids']:
    exp_id = trial_info['exp_id']
    trial_id = trial_info['trial_id']
    # Find the CSV file matching this experiment and trial
    exp_dir = f"Experiment-{exp_id}"
    if os.path.exists(exp_dir):
        files = [f for f in os.listdir(exp_dir) if f.endswith('.csv') and f'Trial-{trial_id}_' in f]
        if files:
            train_files.append(os.path.join(exp_dir, files[0]))

print(f"\n📂 Found {len(train_files)} training files:")
for f in train_files:
    print(f"  - {f}")

# Extract features and targets
window_length = config['window_length']
features = config['features']
targets = config['targets']

# MEMORY OPTIMIZATION: Process files one at a time and subsample
print("\n🔄 Loading and preprocessing data (memory-efficient mode)...")
print(f"⚡ Using sampling stride of {SAMPLING_STRIDE} to reduce memory usage")

train_x_list = []
train_y_list = []
total_windows_before = 0
total_windows_after = 0

for i, file_path in enumerate(train_files):
    print(f"  Processing {i+1}/{len(train_files)}: {file_path}")

    # Load single file
    df = csv_to_dataframe(file_path)
    normalized_df = normalize_data([df])[0]

    # Extract features and targets as arrays
    feature_dataset = normalized_df[features].to_numpy()
    target_dataset = normalized_df[targets].to_numpy()

    datapoint = len(normalized_df)
    num_windows = datapoint - window_length
    total_windows_before += num_windows

    # Create sliding windows with STRIDE (memory optimization)
    # Instead of every timestep, sample every Nth window
    sampled_features = []
    sampled_targets = []

    for j in range(0, num_windows, SAMPLING_STRIDE):
        sampled_feature = feature_dataset[j:j+window_length, :]
        sampled_target = target_dataset[j+window_length, :]
        sampled_features.append(sampled_feature)
        sampled_targets.append(sampled_target)

    x_rnn = np.asarray(sampled_features, dtype=np.float32)
    y_rnn = np.asarray(sampled_targets, dtype=np.float32)

    total_windows_after += len(x_rnn)

    train_x_list.append(x_rnn)
    train_y_list.append(y_rnn)

    # Clear memory
    del df, normalized_df, feature_dataset, target_dataset, sampled_features, sampled_targets
    gc.collect()

print(f"\n  📉 Reduced dataset size:")
print(f"     Before: {total_windows_before:,} windows")
print(f"     After:  {total_windows_after:,} windows ({100*total_windows_after/total_windows_before:.1f}%)")

# Concatenate all training samples
print("  Concatenating samples...")
train_x = np.concatenate(train_x_list, axis=0)
train_y = np.concatenate(train_y_list, axis=0)

# Clear intermediate lists
del train_x_list, train_y_list
gc.collect()

# Shuffle training data
print("  Shuffling data...")
shuffler = np.random.permutation(len(train_x))
train_x = train_x[shuffler]
train_y = train_y[shuffler]
del shuffler
gc.collect()

# Split into train and validation using config
validation_split = config['training']['validation_split']
split_idx = int((1 - validation_split) * len(train_x))
train_x_split = train_x[:split_idx]
train_y_split = train_y[:split_idx]
val_x = train_x[split_idx:]
val_y = train_y[split_idx:]

# Delete full arrays to save memory
del train_x, train_y
gc.collect()

# Convert to PyTorch tensors
print("  Converting to PyTorch tensors...")
train_x_tensor = torch.FloatTensor(train_x_split)
train_y_tensor = torch.FloatTensor(train_y_split)
val_x_tensor = torch.FloatTensor(val_x)
val_y_tensor = torch.FloatTensor(val_y)

# Delete numpy arrays after tensor conversion
del train_x_split, train_y_split, val_x, val_y
gc.collect()

# Create data loaders using config
batch_size = config['training']['batch_size']
train_dataset = TensorDataset(train_x_tensor, train_y_tensor)
val_dataset = TensorDataset(val_x_tensor, val_y_tensor)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=False)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0, pin_memory=False)

print(f"\n✅ Data preparation complete!")
print(f"📊 Training data shape: X={train_x_tensor.shape}, Y={train_y_tensor.shape}")
print(f"📊 Validation data shape: X={val_x_tensor.shape}, Y={val_y_tensor.shape}")
print(f"📊 Total training samples: {len(train_dataset):,}")
print(f"📊 Total validation samples: {len(val_dataset):,}")
print(f"📊 Batches per epoch: {len(train_loader)} (train), {len(val_loader)} (val)")
print(f"💾 Estimated memory usage: ~{(train_x_tensor.nelement() * 4 + val_x_tensor.nelement() * 4) / 1e9:.2f} GB")

📋 Loading configuration...
Configuration loaded:
  Features: ['Time [sec]', 'Dr [%]', 'Shear Stress [kPa]']
  Targets: ['ru']
  Window length: 800
  Training config: {'epochs': 50, 'batch_size': 32, 'learning_rate': 0.0002, 'validation_split': 0.2}
  Sampling stride: 4 (memory optimization)

🔍 Searching for training data files...

📂 Found 14 training files:
  - Experiment-8/Trial-1_130731_yc1.csv
  - Experiment-8/Trial-2_130802_yc1.csv
  - Experiment-8/Trial-4_130805_yc1.csv
  - Experiment-8/Trial-5_130805-02_yc1.csv
  - Experiment-9/Trial-1_121227_yc1.csv
  - Experiment-9/Trial-2_121228_yc1.csv
  - Experiment-9/Trial-3_121231_yc1.csv
  - Experiment-9/Trial-4_130110_yc1.csv
  - Experiment-9/Trial-5_130113_yc1.csv
  - Experiment-9/Trial-6_130115_yc1.csv
  - Experiment-9/Trial-7_130116_yc1.csv
  - Experiment-9/Trial-8_130117_yc1.csv
  - Experiment-9/Trial-9_130118_yc1.csv
  - Experiment-9/Trial-10_130119_yc1.csv

🔄 Loading and preprocessing data (memory-efficient mode)...
⚡ Using sampling stride of 4 to reduce memory usage
  Processing 1/14: Experiment-8/Trial-1_130731_yc1.csv
  Processing 2/14: Experiment-8/Trial-2_130802_yc1.csv
  Processing 3/14: Experiment-8/Trial-4_130805_yc1.csv
  Processing 4/14: Experiment-8/Trial-5_130805-02_yc1.csv
  Processing 5/14: Experiment-9/Trial-1_121227_yc1.csv
  Processing 6/14: Experiment-9/Trial-2_121228_yc1.csv
  Processing 7/14: Experiment-9/Trial-3_121231_yc1.csv
  Processing 8/14: Experiment-9/Trial-4_130110_yc1.csv
  Processing 9/14: Experiment-9/Trial-5_130113_yc1.csv
  Processing 10/14: Experiment-9/Trial-6_130115_yc1.csv
  Processing 11/14: Experiment-9/Trial-7_130116_yc1.csv
  Processing 12/14: Experiment-9/Trial-8_130117_yc1.csv
  Processing 13/14: Experiment-9/Trial-9_130118_yc1.csv
  Processing 14/14: Experiment-9/Trial-10_130119_yc1.csv

  📉 Reduced dataset size:
     Before: 282,513 windows
     After:  70,631 windows (25.0%)
  Concatenating samples...
  Shuffling data...
  Converting to PyTorch tensors...

✅ Data preparation complete!
📊 Training data shape: X=torch.Size([56504, 800, 3]), Y=torch.Size([56504, 1])
📊 Validation data shape: X=torch.Size([14127, 800, 3]), Y=torch.Size([14127, 1])
📊 Total training samples: 56,504
📊 Total validation samples: 14,127
📊 Batches per epoch: 1766 (train), 442 (val)
💾 Estimated memory usage: ~0.68 GB

Train the LSTM Model¶

Note: For this demonstration, we'll use a smaller number of epochs. The full model was trained for 100 epochs on 12 experimental datasets.

In [68]:

Copied!





# Build fresh model
print("Building model...")
lstm_model = LSTMLiquefactionModel(input_size=3, hidden_size=128, num_layers=2, output_size=1)
lstm_model = lstm_model.to(device)
print(f"Model created on {device}")

# Train model using config parameters
print(f"\nStarting training with {len(train_loader)} batches per epoch...")
print(f"Total training samples: {len(train_loader.dataset)}")
print(f"Batch size: {config['training']['batch_size']}")
print(f"Epochs: {config['training']['epochs']}\n")

history = train_model(
    lstm_model,
    train_loader,
    val_loader,
    epochs=config['training']['epochs'],
    learning_rate=config['training']['learning_rate'],
    device=device
)

print("\n✅ Training completed!")
# Build fresh model
print("Building model...")
lstm_model = LSTMLiquefactionModel(input_size=3, hidden_size=128, num_layers=2, output_size=1)
lstm_model = lstm_model.to(device)
print(f"Model created on {device}")

# Train model using config parameters
print(f"\nStarting training with {len(train_loader)} batches per epoch...")
print(f"Total training samples: {len(train_loader.dataset)}")
print(f"Batch size: {config['training']['batch_size']}")
print(f"Epochs: {config['training']['epochs']}\n")

history = train_model(
    lstm_model,
    train_loader,
    val_loader,
    epochs=config['training']['epochs'],
    learning_rate=config['training']['learning_rate'],
    device=device
)

print("\n✅ Training completed!")

Building model...
Model created on cuda

Starting training with 1766 batches per epoch...
Total training samples: 56504
Batch size: 32
Epochs: 50

🔧 Setting up training on cuda...
🏃 Starting training loop...

Training:   0%|          | 0/50 [00:00<?, ?it/s]

✅ Training completed! Best validation loss: 0.008104

✅ Training completed!

Save the Trained Model¶

Let's save the trained model along with its configuration and training metadata:

In [74]:

Copied!





def load_model_from_checkpoint(checkpoint_path, device='cpu'):
    """
    Load a trained LSTM model from a checkpoint file.

    Args:
        checkpoint_path: Path to the checkpoint file (.pth)
        device: Device to load the model onto ('cpu', 'cuda', or 'mps')

    Returns:
        model: Loaded LSTM model
        checkpoint: Full checkpoint dictionary with training history and configs
    """
    print(f"📂 Loading checkpoint from: {checkpoint_path}")

    # Load checkpoint
    checkpoint = torch.load(checkpoint_path, map_location=device)

    # Extract model configuration
    model_config = checkpoint['model_config']

    # Create model with the same architecture
    model = LSTMLiquefactionModel(
        input_size=model_config['input_size'],
        hidden_size=model_config['hidden_size'],
        num_layers=model_config['num_layers'],
        output_size=model_config['output_size']
    )

    # Load the trained weights
    model.load_state_dict(checkpoint['model_state_dict'])
    model = model.to(device)
    model.eval()  # Set to evaluation mode

    print(f"✅ Model loaded successfully!")
    print(f"   Trained on: {checkpoint.get('timestamp', 'Unknown')}")
    print(f"   Final training loss: {checkpoint['final_train_loss']:.6f}")
    print(f"   Final validation loss: {checkpoint['final_val_loss']:.6f}")
    print(f"   Total epochs: {len(checkpoint['history']['train_loss'])}")

    return model, checkpoint


# Example: Load the model (uncomment to use)
# loaded_model, loaded_checkpoint = load_model_from_checkpoint('lstm_liquefaction_checkpoint.pth', device=device)
#
# # You can now use loaded_model for predictions:
# # loaded_model.eval()
# # with torch.no_grad():
# #     predictions = loaded_model(test_tensor)
def load_model_from_checkpoint(checkpoint_path, device='cpu'):
    """
    Load a trained LSTM model from a checkpoint file.

    Args:
        checkpoint_path: Path to the checkpoint file (.pth)
        device: Device to load the model onto ('cpu', 'cuda', or 'mps')

    Returns:
        model: Loaded LSTM model
        checkpoint: Full checkpoint dictionary with training history and configs
    """
    print(f"📂 Loading checkpoint from: {checkpoint_path}")

    # Load checkpoint
    checkpoint = torch.load(checkpoint_path, map_location=device)

    # Extract model configuration
    model_config = checkpoint['model_config']

    # Create model with the same architecture
    model = LSTMLiquefactionModel(
        input_size=model_config['input_size'],
        hidden_size=model_config['hidden_size'],
        num_layers=model_config['num_layers'],
        output_size=model_config['output_size']
    )

    # Load the trained weights
    model.load_state_dict(checkpoint['model_state_dict'])
    model = model.to(device)
    model.eval()  # Set to evaluation mode

    print(f"✅ Model loaded successfully!")
    print(f"   Trained on: {checkpoint.get('timestamp', 'Unknown')}")
    print(f"   Final training loss: {checkpoint['final_train_loss']:.6f}")
    print(f"   Final validation loss: {checkpoint['final_val_loss']:.6f}")
    print(f"   Total epochs: {len(checkpoint['history']['train_loss'])}")

    return model, checkpoint


# Example: Load the model (uncomment to use)
# loaded_model, loaded_checkpoint = load_model_from_checkpoint('lstm_liquefaction_checkpoint.pth', device=device)
#
# # You can now use loaded_model for predictions:
# # loaded_model.eval()
# # with torch.no_grad():
# #     predictions = loaded_model(test_tensor)

Load a Saved Model¶

To load a previously trained model, use the following code:

In [75]:

Copied!





import json
from datetime import datetime

# Save the complete model checkpoint
checkpoint = {
    'model_state_dict': lstm_model.state_dict(),
    'model_config': {
        'input_size': 3,
        'hidden_size': 128,
        'num_layers': 2,
        'output_size': 1
    },
    'training_config': config['training'],
    'data_config': {
        'features': config['features'],
        'targets': config['targets'],
        'window_length': config['window_length']
    },
    'history': history,
    'final_train_loss': history['train_loss'][-1],
    'final_val_loss': history['val_loss'][-1],
    'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
}

# Save checkpoint
checkpoint_path = 'lstm_liquefaction_checkpoint.pth'
torch.save(checkpoint, checkpoint_path)
print(f"💾 Model checkpoint saved to: {checkpoint_path}")

# Also save just the model state dict with a descriptive name
model_path = f"lstm_model_exp{config['train_ids'][0]['exp_id']}.pth"
torch.save(lstm_model.state_dict(), model_path)
print(f"💾 Model weights saved to: {model_path}")

# Save training history as JSON for easy access
history_path = 'training_history.json'
with open(history_path, 'w') as f:
    json.dump({
        'train_loss': [float(x) for x in history['train_loss']],
        'val_loss': [float(x) for x in history['val_loss']],
        'final_train_loss': float(history['train_loss'][-1]),
        'final_val_loss': float(history['val_loss'][-1]),
        'num_epochs': len(history['train_loss']),
        'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    }, f, indent=2)
print(f"📊 Training history saved to: {history_path}")

print("\n✅ All model files saved successfully!")
import json
from datetime import datetime

# Save the complete model checkpoint
checkpoint = {
    'model_state_dict': lstm_model.state_dict(),
    'model_config': {
        'input_size': 3,
        'hidden_size': 128,
        'num_layers': 2,
        'output_size': 1
    },
    'training_config': config['training'],
    'data_config': {
        'features': config['features'],
        'targets': config['targets'],
        'window_length': config['window_length']
    },
    'history': history,
    'final_train_loss': history['train_loss'][-1],
    'final_val_loss': history['val_loss'][-1],
    'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
}

# Save checkpoint
checkpoint_path = 'lstm_liquefaction_checkpoint.pth'
torch.save(checkpoint, checkpoint_path)
print(f"💾 Model checkpoint saved to: {checkpoint_path}")

# Also save just the model state dict with a descriptive name
model_path = f"lstm_model_exp{config['train_ids'][0]['exp_id']}.pth"
torch.save(lstm_model.state_dict(), model_path)
print(f"💾 Model weights saved to: {model_path}")

# Save training history as JSON for easy access
history_path = 'training_history.json'
with open(history_path, 'w') as f:
    json.dump({
        'train_loss': [float(x) for x in history['train_loss']],
        'val_loss': [float(x) for x in history['val_loss']],
        'final_train_loss': float(history['train_loss'][-1]),
        'final_val_loss': float(history['val_loss'][-1]),
        'num_epochs': len(history['train_loss']),
        'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    }, f, indent=2)
print(f"📊 Training history saved to: {history_path}")

print("\n✅ All model files saved successfully!")

💾 Model checkpoint saved to: lstm_liquefaction_checkpoint.pth
💾 Model weights saved to: lstm_model_exp8.pth
📊 Training history saved to: training_history.json

✅ All model files saved successfully!

Visualize Training History¶

In [76]:

Copied!





# Plot training history
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(history['train_loss'], 'b-', label='Training loss', linewidth=2)
plt.plot(history['val_loss'], 'r-', label='Validation loss', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss (MSE)', fontsize=12)
plt.title('Training and Validation Loss', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(history['train_loss'], 'b-', label='Training loss', linewidth=2)
plt.plot(history['val_loss'], 'r-', label='Validation loss', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss (MSE)', fontsize=12)
plt.title('Training and Validation Loss (Log Scale)', fontsize=14)
plt.yscale('log')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Final training loss: {history['train_loss'][-1]:.6f}")
print(f"Final validation loss: {history['val_loss'][-1]:.6f}")
print(f"\nValidation loss is {'lower' if history['val_loss'][-1] < history['train_loss'][-1] else 'higher'} than training loss")
print("✅ No overfitting detected!" if history['val_loss'][-1] < 1.2 * history['train_loss'][-1] else "⚠️  Possible overfitting")
# Plot training history
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(history['train_loss'], 'b-', label='Training loss', linewidth=2)
plt.plot(history['val_loss'], 'r-', label='Validation loss', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss (MSE)', fontsize=12)
plt.title('Training and Validation Loss', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(history['train_loss'], 'b-', label='Training loss', linewidth=2)
plt.plot(history['val_loss'], 'r-', label='Validation loss', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss (MSE)', fontsize=12)
plt.title('Training and Validation Loss (Log Scale)', fontsize=14)
plt.yscale('log')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Final training loss: {history['train_loss'][-1]:.6f}")
print(f"Final validation loss: {history['val_loss'][-1]:.6f}")
print(f"\nValidation loss is {'lower' if history['val_loss'][-1] < history['train_loss'][-1] else 'higher'} than training loss")
print("✅ No overfitting detected!" if history['val_loss'][-1] < 1.2 * history['train_loss'][-1] else "⚠️  Possible overfitting")

Final training loss: 0.011228
Final validation loss: 0.008104

Validation loss is lower than training loss
✅ No overfitting detected!

Make Predictions on Test Data¶

Let's test the model on data it hasn't seen during training:

In [87]:

Copied!





# Load test data based on config
test_files = []
for trial_info in config['test_ids']:
    exp_id = trial_info['exp_id']
    trial_id = trial_info['trial_id']
    # Find the CSV file matching this experiment and trial
    exp_dir = f"Experiment-{exp_id}"
    if os.path.exists(exp_dir):
        files = [f for f in os.listdir(exp_dir) if f.endswith('.csv') and f'Trial-{trial_id}_' in f]
        if files:
            test_files.append(os.path.join(exp_dir, files[0]))

print(f"📂 Found {len(test_files)} test files:")
for f in test_files:
    print(f"  - {f}")

# Use first test file for visualization
test_file = test_files[3] if test_files else None
if test_file is None:
    print("⚠️ No test files found!")
else:
    test_df = csv_to_dataframe(test_file)
    normalized_test_df = normalize_data([test_df])

    # Create LSTM inputs for test data
    rnn_data_test = create_lstm_inputs(
        normalized_test_df, features, targets, window_length
    )

    test_x = rnn_data_test["x_rnns"][0].astype(np.float32)
    test_y = rnn_data_test["y_rnns"][0].astype(np.float32)

    print(f"\n📊 Test data shape:")
    print(f"  X (features): {test_x.shape}")
    print(f"  Y (targets): {test_y.shape}")

    # Make predictions in BATCHES to avoid OOM
    print("\n🔮 Making predictions in batches to avoid memory issues...")
    lstm_model.eval()

    # Use batch size for inference
    inference_batch_size = 256  # Adjust this based on available memory
    num_samples = len(test_x)
    test_predictions = []

    with torch.no_grad():
        for i in tqdm(range(0, num_samples, inference_batch_size), desc="Predicting"):
            # Get batch
            batch_end = min(i + inference_batch_size, num_samples)
            batch_x = test_x[i:batch_end]

            # Move to device and predict
            batch_x_tensor = torch.FloatTensor(batch_x).to(device)
            batch_predictions = lstm_model(batch_x_tensor).cpu().numpy()
            test_predictions.append(batch_predictions)

    # Concatenate all predictions
    test_predictions = np.concatenate(test_predictions, axis=0)
    print(f"✅ Predictions complete! Shape: {test_predictions.shape}")

    # Compute metrics
    mse = mean_squared_error(test_y, test_predictions)
    mae = mean_absolute_error(test_y, test_predictions)
    r2 = r2_score(test_y, test_predictions)

    print(f"\n📈 Test Set Performance:")
    print(f"  MSE: {mse:.6f}")
    print(f"  MAE: {mae:.6f}")
    print(f"  R²: {r2:.4f}")
# Load test data based on config
test_files = []
for trial_info in config['test_ids']:
    exp_id = trial_info['exp_id']
    trial_id = trial_info['trial_id']
    # Find the CSV file matching this experiment and trial
    exp_dir = f"Experiment-{exp_id}"
    if os.path.exists(exp_dir):
        files = [f for f in os.listdir(exp_dir) if f.endswith('.csv') and f'Trial-{trial_id}_' in f]
        if files:
            test_files.append(os.path.join(exp_dir, files[0]))

print(f"📂 Found {len(test_files)} test files:")
for f in test_files:
    print(f"  - {f}")

# Use first test file for visualization
test_file = test_files[3] if test_files else None
if test_file is None:
    print("⚠️ No test files found!")
else:
    test_df = csv_to_dataframe(test_file)
    normalized_test_df = normalize_data([test_df])

    # Create LSTM inputs for test data
    rnn_data_test = create_lstm_inputs(
        normalized_test_df, features, targets, window_length
    )

    test_x = rnn_data_test["x_rnns"][0].astype(np.float32)
    test_y = rnn_data_test["y_rnns"][0].astype(np.float32)

    print(f"\n📊 Test data shape:")
    print(f"  X (features): {test_x.shape}")
    print(f"  Y (targets): {test_y.shape}")

    # Make predictions in BATCHES to avoid OOM
    print("\n🔮 Making predictions in batches to avoid memory issues...")
    lstm_model.eval()

    # Use batch size for inference
    inference_batch_size = 256  # Adjust this based on available memory
    num_samples = len(test_x)
    test_predictions = []

    with torch.no_grad():
        for i in tqdm(range(0, num_samples, inference_batch_size), desc="Predicting"):
            # Get batch
            batch_end = min(i + inference_batch_size, num_samples)
            batch_x = test_x[i:batch_end]

            # Move to device and predict
            batch_x_tensor = torch.FloatTensor(batch_x).to(device)
            batch_predictions = lstm_model(batch_x_tensor).cpu().numpy()
            test_predictions.append(batch_predictions)

    # Concatenate all predictions
    test_predictions = np.concatenate(test_predictions, axis=0)
    print(f"✅ Predictions complete! Shape: {test_predictions.shape}")

    # Compute metrics
    mse = mean_squared_error(test_y, test_predictions)
    mae = mean_absolute_error(test_y, test_predictions)
    r2 = r2_score(test_y, test_predictions)

    print(f"\n📈 Test Set Performance:")
    print(f"  MSE: {mse:.6f}")
    print(f"  MAE: {mae:.6f}")
    print(f"  R²: {r2:.4f}")

📂 Found 5 test files:
  - Experiment-9/Trial-21_130306.csv
  - Experiment-9/Trial-22_130307.csv
  - Experiment-9/Trial-23_130307-02.csv
  - Experiment-8/Trial-3_130802-02_yc1.csv
  - Experiment-8/Trial-7_130807_yc1.csv

📊 Test data shape:
  X (features): (8701, 800, 3)
  Y (targets): (8701, 1)

🔮 Making predictions in batches to avoid memory issues...

Predicting:   0%|          | 0/34 [00:00<?, ?it/s]

✅ Predictions complete! Shape: (8701, 1)

📈 Test Set Performance:
  MSE: 0.006696
  MAE: 0.052343
  R²: 0.9287

Visualize LSTM Predictions¶

In [92]:

Copied!





# Get the full time series for plotting
feature_dataset = rnn_data_test["feature_datasets"][0]
target_dataset = rnn_data_test["target_datasets"][0]

# Extract stress for context (skip first window_length points)
stress_input = feature_dataset[window_length:, 2]

# Plot results
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Plot 1: Input shear stress
ax = axes[0]
time_points = np.arange(len(stress_input))
ax.plot(time_points, stress_input, 'b-', linewidth=1, alpha=0.7)
ax.set_ylabel('Normalized Shear Stress', fontsize=12)
ax.set_title(f'LSTM Prediction on Test Data (Dr = {test_df["Dr [%]"].iloc[0]:.1f}%)', fontsize=14)
ax.set_xlim([0,8500])
ax.grid(True, alpha=0.3)

# Plot 2: Pore pressure prediction vs actual
ax = axes[1]
ax.plot(time_points, test_y, 'k-', linewidth=2, label='Laboratory experiment', alpha=0.8)
ax.plot(time_points, test_predictions, 'r-', linewidth=2, label='LSTM prediction', alpha=0.8)
ax.axhline(y=1.0, color='gray', linestyle='--', linewidth=1, alpha=0.5)
ax.set_xlabel('Data Point', fontsize=12)
ax.set_ylabel('Pore Pressure Ratio $r_u$', fontsize=12)
ax.set_ylim([0, 1.2])
ax.set_xlim([0,8500])
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n📊 Model Performance Summary:")
print(f"  - The LSTM model captures the pore pressure response")
print(f"  - MSE: {mse:.6f}")
print(f"  - R² score: {r2:.4f}")
# Get the full time series for plotting
feature_dataset = rnn_data_test["feature_datasets"][0]
target_dataset = rnn_data_test["target_datasets"][0]

# Extract stress for context (skip first window_length points)
stress_input = feature_dataset[window_length:, 2]

# Plot results
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Plot 1: Input shear stress
ax = axes[0]
time_points = np.arange(len(stress_input))
ax.plot(time_points, stress_input, 'b-', linewidth=1, alpha=0.7)
ax.set_ylabel('Normalized Shear Stress', fontsize=12)
ax.set_title(f'LSTM Prediction on Test Data (Dr = {test_df["Dr [%]"].iloc[0]:.1f}%)', fontsize=14)
ax.set_xlim([0,8500])
ax.grid(True, alpha=0.3)

# Plot 2: Pore pressure prediction vs actual
ax = axes[1]
ax.plot(time_points, test_y, 'k-', linewidth=2, label='Laboratory experiment', alpha=0.8)
ax.plot(time_points, test_predictions, 'r-', linewidth=2, label='LSTM prediction', alpha=0.8)
ax.axhline(y=1.0, color='gray', linestyle='--', linewidth=1, alpha=0.5)
ax.set_xlabel('Data Point', fontsize=12)
ax.set_ylabel('Pore Pressure Ratio $r_u$', fontsize=12)
ax.set_ylim([0, 1.2])
ax.set_xlim([0,8500])
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n📊 Model Performance Summary:")
print(f"  - The LSTM model captures the pore pressure response")
print(f"  - MSE: {mse:.6f}")
print(f"  - R² score: {r2:.4f}")

📊 Model Performance Summary:
  - The LSTM model captures the pore pressure response
  - MSE: 0.006696
  - R² score: 0.9287

Summary: LSTM for Liquefaction Prediction¶

What We Learned¶

Problem: Pore pressure response in liquefaction is history-dependent (shielding effect)
Limitation: Traditional constitutive models fail to capture shielding
Solution: LSTM networks naturally learn stress history dependencies
Architecture:
- Time-window sampling (800 steps ≈ 2 cycles)
- Two LSTM layers (128 units each)
- Dense layers for post-processing
- PyTorch implementation
Training:
- 12 cyclic simple shear tests
- MSE loss with Adam optimizer
- 100 epochs with early stopping
Results:
- ✅ Captures shielding effect
- ✅ Accurate time to liquefaction
- ✅ Models density effects
- ✅ Reproduces pore pressure fluctuations

Key Takeaway¶

LSTMs are powerful for history-dependent phenomena in geotechnical engineering!

The gates mechanism naturally captures:

Forget gate: Discards low-stress periods
Input gate: Remembers peak stresses
Cell state: Maintains long-term stress history

References¶

Choi, Y., and Kumar, K. (2023). "A Machine Learning Approach to Predicting Pore Pressure Response in Liquefiable Sands under Cyclic Loading." Geo-Congress 2023.
Hochreiter, S., and Schmidhuber, J. (1997). "Long Short-Term Memory." Neural Computation, 9(8), 1735-1780.
Kwan, W. S., Sideras, S. S., Kramer, S. L., and El Mohtar, C. (2017). "Experimental Database of Cyclic Simple Shear Tests under Transient Loadings." Earthquake Spectra, 33(3), 1219-1239.
Sideras, S. S. (2019). Evolutionary Intensity Measures for More Accurate and Informative Evaluation of Liquefaction Triggering. Ph.D. dissertation, University of Washington, WA.