Enhancing Multi-Output AIS Prediction with Indirect Sea Level Referencing: Feature Augmentation for Improved Accuracy in Korean Coastal Waters
Abstract
This study introduces a novel methodology for enhancing Automatic Identification System (AIS) trajectory forecasting in regions characterized by significant tidal variations through feature augmentation, specifically the indirect incorporation of sea level data via the nearest tidal gauge. Traditional AIS prediction models predominantly utilize features such as latitude, longitude, speed over ground (SOG), and course over ground (COG) for time series forecasting. However, these models often overlook the influence of tidal fluctuations, which can significantly impact prediction accuracy in areas with pronounced tidal changes. To address this limitation, we propose a feature augmentation approach that incorporates the Haversine distance to the nearest tidal gauge and the real-time sea level at that gauge as additional features. Direct access to sea level data at a vessel’s precise location presents practical challenges, making this indirect method an efficient and effective solution. Through comprehensive analyses across multiple deep learning models and test scenarios, our results demonstrate that this augmented feature set can substantially improve AIS forecasting performance in regions with significant tidal variation surrounding the Korean Peninsula.
1. Introduction
The accelerating rise in sea levels, driven by factors such as global warming and the potential collapse of the Atlantic Meridional Overturning Circulation (AMOC) (van Westen et al., 2024), poses significant challenges for maritime navigation and vessel route prediction. The dynamic nature of these changes is reshaping coastal topographies and tidal patterns, making accurate predictions of vessel movements near coastlines more difficult. Traditionally, models for predicting vessel movement have relied on historical AIS data, focusing on vessel trajectories, speed, and heading to forecast future movements (Li et al., 2023). However, these models often overlook real-time environmental conditions such as tidal fluctuations, which can lead to inaccuracies, especially in regions with significant tidal changes.
Incorporating real-time sea level data into vessel route prediction models is critical for improving accuracy, particularly in coastal regions where tidal variations are prominent. This is especially relevant for maritime environments like those around the Korean Peninsula, where both long-term sea level trends and short-term tidal effects create volatile conditions (Tebaldi et al., 2021). However, direct access to real-time sea level measurements at precise vessel locations is often challenging due to the lack of high-resolution data. Organizations like the International Hydrographic Organization (IHO) provide data, such as S-104 water level information, for surface navigation (Stewart et al., 2019). However, both access to and availability of these data are limited. Moreover, the data provided are mostly not high-resolution sea level measurements at the exact location of the vessel but sea level data segmented into grids over the region. Each grid cell offers only low-resolution information, which poses a significant limitation.
This study introduces an indirect sea level referencing method, which integrates real-time sea level data from the nearest tidal gauge and incorporates the Haversine distance to these gauges. By doing so, the prediction accuracy of vessel routes in coastal areas with fluctuating tides is enhanced. The waters surrounding the Korean Peninsula, with their distinct maritime environments, offer an ideal context for this investigation. Major ports such as Busan and Incheon, situated in areas with varying tidal ranges, underscore the necessity for more precise and reliable predictive models (Kim et al., 2024).
2. Literature Review
AIS-based vessel route prediction has traditionally relied on historical trajectory data. Earlier studies applied statistical and machine learning models, such as the Extended Kalman Filter (EKF) (Perera et al., 2010), k-means clustering with Artificial Neural Networks (Gan et al., 2016), and k-Nearest Neighbors (kNN) classification (Duca et al., 2017) to improve prediction accuracy.
More recently, deep learning models like Recurrent Neural Networks (RNNs) (Capobianco et al., 2021) have shown greater effectiveness in open-sea environments. Further advancements, including Long Short-Term Memory (LSTM) networks (Tang et al., 2019), LSTM encoder-decoder architectures (Nguyen et al., 2018), Bidirectional LSTM (Bi-LSTM) (Gao et al., 2018), Bidirectional Gated Recurrent Units (Bi-GRU) (Wang et al., 2020), and GRU encoder-decoder frameworks (You et al., 2020), have improved predictive performance by capturing long-range dependencies in time series data. Recently, Transformer models have been adopted to enhance AIS predictions by utilizing novel data representations and loss functions (Nguyen et al., 2024).
Despite advances in modeling, many studies still rely on conventional AIS features and overlook dynamic environmental factors, such as sea level variations, particularly in coastal areas (Tebaldi et al., 2021). Real-time environmental data, especially sea level measurements, can enhance AIS-based predictions, but availability and integration remain challenging. High-resolution sea level data require extensive coastal infrastructure, such as tidal gauges (Stewart et al., 2019), and integrating these data into models is further complicated by the variability of coastal environments where tidal ranges vary significantly (Kantha et al., 1996).
Some recent research has integrated environmental factors like sea level and tidal variations into deep learning models for improved predictions. Kim et al. (2024) developed a U-Net model for surface sediment classification in tidal flats using UAV images, highlighting the importance of tidal channel density in improving accuracy. Shahabi and Tahvildari (2024) introduced a CNN-LSTM model for predicting coastal water levels in the Chesapeake Bay, combining wind data and astronomical tides. Their model achieved accuracy comparable to hydrodynamic models, demonstrating the effectiveness of deep learning in managing coastal water levels and flood risks.
Additionally, Guo et al. (2024) investigated the use of environmental factors like wind and tides to optimize the navigation of unmanned surface vehicles (USVs). Kantha et al. (1996) examined the shallow water tides around Korea, emphasizing the significant impact of tidal variations, particularly in the Yellow Sea, where the M4 tide can reach amplitudes of up to 10cm near the coast. Their work highlighted the importance of high-resolution models for accurate tidal predictions in coastal regions and the challenges of forecasting sea levels in such dynamic environments.
3. Definitions and Problem Statements
3.1 Definitions
3.1.1 Ship Trajectory
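A ship trajectory, denoted as Traj, is a sequence of 120 time-stamped points, which represents a fixed-length window covering both past and future vessel movements at 10-minute intervals. This window size of 120 was chosen to capture 5 hours of historical data (60 points) and 5 hours of future predictions (60 points), allowing the model to make accurate forecasts based on the previous trajectory:
$$Traj = \{p_1, p_2, \ldots, p_{120}\}$$
Each point $p_i$ records the vessel’s state at time step $i$. In the basic configuration, this consists of latitude (LA), longitude (LO), speed over ground (SOG), and course over ground (COG), together with the derived changes in latitude (ΔLA) and longitude (ΔLO).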
In more advanced models, additional features are incorporated. The Sea Level feature, for example, introduces tidal data collected from the nearest gauge, reflecting the water level at the vessel’s location. This feature is particularly important for understanding how fluctuations in sea level can affect vessel movement. Another key feature is the Distance, which represents the Haversine distance between the vessel’s position and the nearest tidal gauge. This provides spatial context to the sea level data by measuring how far the vessel is from the location where tidal information is collected. The inclusion of these features in more complex models allows for more accurate predictions by accounting for environmental factors that influence vessel trajectories.
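The Haversine distance d between the vessel’s position (LA1, LO1) and the tidal gauge’s position (LA2, LO2) is calculated as follows:
$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\Delta\rho}{2}\right) + \cos(\rho_1)\cos(\rho_2)\sin^2\left(\frac{\Delta\lambda}{2}\right)}\right)$$
Here, r is the Earth’s radius, ρ1 and ρ2 represent the latitudes of the vessel and the gauge (LA1 and LA2, expressed in radians), and ∆ρ and ∆λ denote the differences in latitude and longitude, respectively.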
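As an illustration of the Haversine distance defined above, the following is a minimal sketch in Python. The function names, the mean Earth radius constant, and the choice of kilometres and nautical miles as output units are assumptions made for this example and are not taken from the study's implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0088   # assumed mean Earth radius
KM_PER_NAUTICAL_MILE = 1.852  # exact by definition of the nautical mile


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    rho1, rho2 = math.radians(lat1), math.radians(lat2)
    d_rho = rho2 - rho1                    # difference in latitude
    d_lambda = math.radians(lon2 - lon1)   # difference in longitude
    a = (math.sin(d_rho / 2) ** 2
         + math.cos(rho1) * math.cos(rho2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def haversine_nm(lat1, lon1, lat2, lon2):
    """Same distance expressed in nautical miles."""
    return haversine_km(lat1, lon1, lat2, lon2) / KM_PER_NAUTICAL_MILE
```

One degree of longitude along the equator corresponds to roughly 111.2 km under this radius, which gives a quick sanity check for the function.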
3.1.2 Ship Trajectory Dataset
The dataset of ship trajectories is defined as:
$$D = \{Traj_j\}_{j=1}^{n}$$
where j = 1, …, n indexes the trajectories and n denotes the number of trajectories.
3.2 Problem Generation and Statements
The geographical complexity of the Korean Peninsula, bordered by three distinct seas with varying tidal ranges, presents unique challenges for accurate ship trajectory prediction, as shown in Fig. 1. The dots in the map of Fig. 1 directly reflect the data represented in Fig. 2, where the sea level variations for each tidal gauge are plotted.
The East Sea shows relatively stable sea level variations within 100cm, the South Sea experiences more significant fluctuations, ranging from 100 to 500cm, and the West Sea has even greater variations, from 500 to 900cm. These three regions were separated using a one-dimensional clustering analysis of sea level variation, which identified significant changes in sea level behavior around the 100cm and 500cm marks. These two values were used as the segmentation boundaries in Fig. 2.
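Once the 100cm and 500cm boundaries are fixed, assigning a tidal gauge to one of the three groups reduces to a threshold test on its observed sea level range. The sketch below illustrates only this assignment step, not the clustering analysis itself. The function names and the treatment of values falling exactly on a boundary are assumptions.

```python
def sea_level_range_cm(readings_cm):
    """Observed variation at a gauge: highest reading minus lowest reading."""
    if not readings_cm:
        raise ValueError("at least one reading is required")
    return max(readings_cm) - min(readings_cm)


def classify_gauge(range_cm, low_boundary=100.0, high_boundary=500.0):
    """Assign a gauge to a variation group using the two reported boundaries.

    Boundary handling (strictly below 100 is 'low', up to and including
    500 is 'moderate') is an assumption made for this sketch.
    """
    if range_cm < 0:
        raise ValueError("range must be non-negative")
    if range_cm < low_boundary:
        return "low"        # typical of East Sea gauges
    if range_cm <= high_boundary:
        return "moderate"   # typical of South Sea gauges
    return "high"           # typical of West Sea gauges
```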
Among these regions, the South Sea poses the most complex challenges due to its strong currents, numerous islands, and substantial tidal changes. Additionally, as ships approach coastal areas, especially near ports, the increased traffic density further complicates trajectory predictions.
4. Experiments and Discussion
The prediction tasks in this study were conducted using six time-series deep learning models: RNN, LSTM, GRU, Bi-LSTM, Bi-GRU, and Transformer. The dataset comprised AIS data collected over a six-month period from the waters surrounding the Korean Peninsula, along with concurrent sea level data obtained from tidal gauges positioned across various coastal regions nationwide. The performance of the models was evaluated on the test data using six different metrics. Fig. 3 illustrates the overall flow of this study in a single diagram.
4.1 Dataset and Experimental Settings
4.1.1 Dataset
The dataset employed in this study consists of two primary components. The first component is the Automatic Identification System (AIS) data, which encompasses AIS records for cargo vessels over a six-month period, from June 1, 2022, to November 30, 2022. The defined Region of Interest (ROI) is a rectangular area bounded by the coordinates (31.0°N to 39.0°N) in latitude and (124.0°E to 132.0°E) in longitude.
Prior to preprocessing, the dataset comprised 9,546 unique vessel identifiers. AIS messages were aggregated by vessel ID to form continuous trajectories. To account for instances of anchorage or data loss, any trajectory with a data gap exceeding one hour was segmented into separate trajectories. The mean trajectory duration was 454.2 minutes, and the dataset initially contained 216,517 trajectories. Fig. 4 presents a sample of the indirect sea level referencing approach by illustrating the AIS trajectory in conjunction with data from the nearest tidal gauge and the distance to the gauge.
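The gap-based segmentation described above can be sketched as follows. This is an illustrative version, assuming each AIS message is a dictionary with a hypothetical "time" key holding a datetime, and that all messages passed in belong to one vessel.

```python
from datetime import datetime, timedelta


def split_on_gaps(messages, max_gap=timedelta(hours=1)):
    """Split one vessel's AIS messages into trajectories.

    A new trajectory starts whenever the time between two consecutive
    messages exceeds max_gap (a gap of exactly max_gap does not split).
    """
    segments = []
    current = []
    for msg in sorted(messages, key=lambda m: m["time"]):
        if current and msg["time"] - current[-1]["time"] > max_gap:
            segments.append(current)
            current = []
        current.append(msg)
    if current:
        segments.append(current)
    return segments
```

Applying the function per vessel ID and concatenating the results would yield the full trajectory set before filtering.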
Additionally, the sea level data influenced by tidal forces was sourced from the public Smart Tidal Forecasting system provided by the Korea Hydrographic and Oceanographic Agency. This data, collected from June 1, 2022, to November 30, 2022, aligns precisely with the AIS trajectory data, capturing sea level measurements at one-minute intervals from tidal gauges across the country. Fig. 2 illustrates the classification of tidal gauges based on sea level variation into three categories, represented by different colors. A total of 129 tidal gauges were utilized during the study period, with 22 gauges showing a sea level variation of less than 100cm (blue), 68 gauges with variations up to 500cm (green), and 39 gauges with variations exceeding 500cm (red). This classification naturally reflects the geographical distinctions between the East Sea, South Sea, and West Sea, corresponding to the varying tidal conditions observed in these regions.
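The indirect referencing step pairs each AIS point with the nearest tidal gauge and the gauge reading for the same minute. The following sketch shows one way this lookup could work. The gauge names, coordinates, dictionary layout, and the use of kilometres for the distance feature are all hypothetical choices for illustration, and the study's actual data structures are not described in the text.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in kilometres (inputs in degrees)."""
    rho1, rho2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((rho2 - rho1) / 2) ** 2
         + math.cos(rho1) * math.cos(rho2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def nearest_gauge(lat, lon, gauges):
    """Return (gauge_name, distance_km) for the closest gauge.

    gauges maps a gauge name to its (lat, lon) position.
    """
    return min(
        ((name, haversine_km(lat, lon, g_lat, g_lon))
         for name, (g_lat, g_lon) in gauges.items()),
        key=lambda item: item[1],
    )


def augment_point(point, gauges, sea_levels):
    """Add the sea level and gauge distance features to one AIS point.

    sea_levels maps (gauge_name, minute) to a reading in cm, where minute
    is a datetime truncated to whole minutes. A missing reading gives None.
    """
    name, distance = nearest_gauge(point["lat"], point["lon"], gauges)
    minute = point["time"].replace(second=0, microsecond=0)
    return {**point,
            "sea_level_cm": sea_levels.get((name, minute)),
            "gauge_distance_km": distance}
```

A linear scan over 129 gauges per point is inexpensive. A spatial index would only matter for much larger gauge networks.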
4.1.2 Data Pre-processing
The AIS data underwent several preprocessing steps to ensure data quality and consistency for training. Given that AIS data is collected at irregular time intervals across various vessels, the data was first resampled to a uniform 1-minute interval. This resampling process standardized the data and ensured consistency across all trajectories, a crucial step for accurate time-series prediction. To further refine the dataset, several filtering criteria were applied.
Trajectories from vessels with a gross tonnage below 30 tons were removed, as these do not represent typical cargo ship movements and could introduce noise into the predictions.
Any trajectory where the Speed Over Ground (SOG) was less than 3 knots or greater than 30 knots was excluded to eliminate unrealistic vessel speeds that might represent sensor errors or anomalies.
For each trajectory, the total distance traveled between the first and last point was calculated using the Haversine formula. If this distance was less than 35 nautical miles, the trajectory was excluded, as such movements likely represent stationary ships or those performing local maneuvers like docking or circling.
Any trajectory shorter than 600 minutes was excluded to maintain consistency in the length of the trajectories used for training. For trajectories longer than 600 minutes, a sliding window approach was applied. The trajectory was split into 600-minute segments, with a 60-minute step size to minimize data loss and capture multiple subsections of longer journeys.
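The filtering criteria and the sliding window described above can be sketched as follows, operating on trajectories already resampled to 1-minute intervals. Several details are assumptions made for illustration: points are dictionaries with "lat", "lon", and "sog" keys, a single out-of-range SOG value rejects the whole trajectory, and 600 minutes is taken to mean 600 one-minute points.

```python
import math


def haversine_nm(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in nautical miles (inputs in degrees)."""
    rho1, rho2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((rho2 - rho1) / 2) ** 2
         + math.cos(rho1) * math.cos(rho2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a))) / 1.852


def passes_filters(points, gross_tonnage, min_points=600):
    """Apply the tonnage, speed, displacement, and length criteria."""
    if gross_tonnage < 30:
        return False
    if len(points) < min_points:
        return False
    if any(p["sog"] < 3 or p["sog"] > 30 for p in points):
        return False
    first, last = points[0], points[-1]
    displacement = haversine_nm(first["lat"], first["lon"],
                                last["lat"], last["lon"])
    return displacement >= 35


def sliding_windows(points, window=600, step=60):
    """Cut a long trajectory into overlapping fixed-length segments."""
    return [points[start:start + window]
            for start in range(0, len(points) - window + 1, step)]
```

With a 60-minute step, a 720-minute trajectory yields three overlapping 600-minute segments starting at minutes 0, 60, and 120.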
After the initial filtering, the data was downsampled to a 10-minute interval to balance model performance and computational efficiency. The new resampling interval maintained essential temporal trends while reducing the dataset size for smoother training.
To capture vessel movement directionality, the changes in latitude (ΔLA) and longitude (ΔLO) between consecutive points were calculated. Because the first point of a trajectory has no predecessor, its undefined difference was filled using the backward filling method, ensuring the data remained consistent without introducing gaps.
As a result, the numeric features used for training, including latitude (LA), longitude (LO), speed over ground (SOG), course over ground (COG), change in latitude (ΔLA), change in longitude (ΔLO), sea level, and distance to the nearest tidal gauge, were all normalized using min-max scaling. This ensured that all feature values were within the 0-1 range, facilitating efficient and stable model training. The scaler was fit exclusively on the training data to prevent any data leakage. Then, to focus on relevant test cases, trajectories within a 20-nautical-mile radius of the nearest tidal gauge were selected. These trajectories were categorized into nine distinct test cases based on sea level variations, with intervals of 100cm ranging from 0 to 900cm, as illustrated in Fig. 5.
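The downsampling and differencing steps can be sketched as follows. The sketch assumes that downsampling keeps every tenth 1-minute sample and that backward filling means copying the first available difference into the undefined first position. Both are interpretations, since the text does not spell out these details.

```python
def downsample(points, factor=10):
    """Keep every factor-th sample, e.g. 1-minute data to 10-minute data."""
    return points[::factor]


def deltas_with_backfill(values):
    """Differences between consecutive values, same length as the input.

    The first position has no predecessor, so it is backfilled with the
    first defined difference (values[1] - values[0]).
    """
    if len(values) < 2:
        return [0.0] * len(values)
    diffs = [values[i] - values[i - 1] for i in range(1, len(values))]
    return [diffs[0]] + diffs
```

Applying deltas_with_backfill separately to the latitude and longitude series gives the ΔLA and ΔLO features for each point.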
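To make the leakage point concrete, the sketch below shows min-max scaling where the minimum and maximum of each feature come from the training rows only. The class is a minimal stand-in written for this example, not the scaler used in the study.

```python
class MinMaxScaler:
    """Per-column min-max scaling with statistics taken from fit() only."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.mins = [min(col) for col in columns]
        self.maxs = [max(col) for col in columns]
        return self

    def transform(self, rows):
        scaled = []
        for row in rows:
            out = []
            for value, lo, hi in zip(row, self.mins, self.maxs):
                span = hi - lo
                # A constant training column maps to 0.0 to avoid dividing by zero.
                out.append((value - lo) / span if span else 0.0)
            scaled.append(out)
        return scaled
```

Because validation and test rows are transformed with training statistics, a value outside the training range maps outside the 0-1 interval. This is expected behavior and is the price of avoiding leakage.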
4.1.3 Training, Validation, and Hyperparameter Settings
The training and validation process adhered to a Holdout Validation strategy, where the dataset was divided chronologically into approximately 70% for training, 15% for validation, and 15% for testing. Specifically, trajectories starting between June 1, 2022, and September 30, 2022, were allocated for training (39,202 trajectories), October 1, 2022, to October 31, 2022, was used as the validation period (9,013 trajectories), and November 1, 2022, to November 30, 2022, comprised the test set (8,641 trajectories).
During validation, each model’s performance was monitored to identify the best model configuration by evaluating validation loss. The best-performing model on the validation set was then saved and used for subsequent testing on the unseen test set. This Holdout Validation approach was chosen over cross-validation due to the ample size and density of the dataset, making holdout validation sufficient for capturing the model’s generalizability. By reserving a stable test set for final evaluation, the methodology ensures that the performance metrics accurately reflect the model’s ability to generalize to new data without the potential variability introduced by cross-validation.
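The chronological split can be expressed as a simple lookup on each trajectory's start date, using the period boundaries and trajectory counts reported above. The function name and return labels are hypothetical.

```python
from datetime import date

# Period boundaries as reported in the text (inclusive on both ends).
SPLIT_PERIODS = {
    "train": (date(2022, 6, 1), date(2022, 9, 30)),
    "val": (date(2022, 10, 1), date(2022, 10, 31)),
    "test": (date(2022, 11, 1), date(2022, 11, 30)),
}


def assign_split(start_date):
    """Return the split name for a trajectory based on its start date."""
    for name, (first, last) in SPLIT_PERIODS.items():
        if first <= start_date <= last:
            return name
    raise ValueError(f"{start_date} is outside the study period")


# Shares implied by the reported trajectory counts.
counts = {"train": 39202, "val": 9013, "test": 8641}
total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}
```

The resulting shares are close to, but not exactly, 70/15/15, because the split follows calendar months rather than fixed proportions.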
The six deep learning models evaluated in this study — RNN, LSTM, GRU, Bi-LSTM, Bi-GRU, and Transformer — were implemented using the PyTorch framework. During training, the Adam optimizer was employed for parameter updates, known for its ability to handle sparse gradients and non-stationary objectives. Adam adapts the learning rate for each parameter individually by estimating the first and second moments of the gradients, thereby improving optimization efficiency, particularly in complex objective functions.
Early stopping was used as a key mechanism to prevent overfitting. Training was terminated if the validation loss did not improve for 10 consecutive epochs, ensuring that the models did not continue training once they reached a performance plateau on unseen data. Furthermore, dropout was applied at a rate of 0.1 during training to regularize the models by randomly deactivating neurons, helping prevent over-reliance on specific features and promoting generalization.
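The early stopping rule with a patience of 10 epochs can be sketched as a small framework-independent helper. The class name, the min_delta parameter, and the improved flag (used to decide when to save a checkpoint) are assumptions made for illustration.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0
        self.improved = False  # True when the latest epoch set a new best

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
            self.improved = True
        else:
            self.epochs_without_improvement += 1
            self.improved = False
        return self.epochs_without_improvement >= self.patience
```

In a training loop, the model weights would be saved whenever improved is True and the loop would break when step returns True, so that the saved checkpoint corresponds to the lowest validation loss.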
The hyperparameters employed across all models are summarized in Table 3, while the experimental environment is described in Table 4. These configurations were consistently applied to all models, ensuring a fair comparison of performance across different architectures.
4.2 Evaluation Metrics
This study employs six evaluation metrics, as utilized in Li et al. (2023), to rigorously assess the trajectory prediction performance of six prediction methods from both global and local perspectives. These metrics — Mean Squared Error (MSE), Mean Absolute Error (MAE), Symmetric Mean Absolute Percentage Error (SMAPE), Final Displacement Error (FDE), Final Displacement (FD), and Average Euclidean Distance (AED) — offer a comprehensive evaluation framework that captures different dimensions of model accuracy and error distribution.
MSE penalizes larger errors more heavily by squaring the differences between predicted and actual values, making it particularly sensitive to outliers. MAE, on the other hand, calculates the average of absolute errors, providing a balanced measure of accuracy by treating all errors equally, regardless of size. SMAPE offers a scale-independent view of prediction accuracy by using a percentage-based error metric, making it particularly useful for comparing trajectories of different magnitudes. FDE focuses on the accuracy of the final point in the trajectory by measuring the Euclidean distance between the predicted and actual final positions. FD assesses the overall accuracy of the entire trajectory by capturing the total displacement error from start to finish. Finally, AED provides a holistic measure of accuracy across the full trajectory by calculating the average Euclidean distance between the predicted and actual positions at each time step. The specific formulas are defined as follows:
where
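For reference, the sketch below implements commonly used definitions of five of the six metrics (MSE, MAE, SMAPE, FDE, and AED) for a trajectory given as a list of two-dimensional points. These are standard textbook forms and are an assumption here, since the exact formulas used in the study may differ in scaling or units. FD is omitted because its definition cannot be recovered from the text.

```python
import math


def _check(true_pts, pred_pts):
    if not true_pts or len(true_pts) != len(pred_pts):
        raise ValueError("trajectories must be non-empty and of equal length")


def mse(true_pts, pred_pts):
    """Mean squared error over all coordinates."""
    _check(true_pts, pred_pts)
    total = sum((t[k] - p[k]) ** 2
                for t, p in zip(true_pts, pred_pts) for k in (0, 1))
    return total / (2 * len(true_pts))


def mae(true_pts, pred_pts):
    """Mean absolute error over all coordinates."""
    _check(true_pts, pred_pts)
    total = sum(abs(t[k] - p[k])
                for t, p in zip(true_pts, pred_pts) for k in (0, 1))
    return total / (2 * len(true_pts))


def smape(true_pts, pred_pts):
    """Symmetric MAPE in percent; terms with a zero denominator count as 0."""
    _check(true_pts, pred_pts)
    terms = []
    for t, p in zip(true_pts, pred_pts):
        for k in (0, 1):
            denom = abs(t[k]) + abs(p[k])
            terms.append(2 * abs(t[k] - p[k]) / denom if denom else 0.0)
    return 100 * sum(terms) / len(terms)


def fde(true_pts, pred_pts):
    """Euclidean distance between the final predicted and actual points."""
    _check(true_pts, pred_pts)
    return math.dist(true_pts[-1], pred_pts[-1])


def aed(true_pts, pred_pts):
    """Average Euclidean distance across all time steps."""
    _check(true_pts, pred_pts)
    return sum(math.dist(t, p)
               for t, p in zip(true_pts, pred_pts)) / len(true_pts)
```

Distances here are computed in the coordinate space of the inputs. Reporting them in kilometres or nautical miles would require projecting the coordinates or substituting a great-circle distance.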
4.3 Experiment Results
4.3.1 Overall Results
This section presents the experimental results of various deep learning models in a multi-output prediction framework, predicting the next 5 hours of ship trajectories based on the previous 5 hours of data. Figures 6 through 14 depict the prediction outcomes of the selected models with their optimal feature configurations: RNN and Bi-GRU with the Basic feature set, LSTM and GRU with the Distance feature set, and Bi-LSTM and Transformer with the Extended feature set. Across all test cases, the Transformer model consistently demonstrated superior performance, closely aligning its predictions with the actual routes.
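In the multi-output setting, each fixed-length window is divided into an observed part and a future part, and the model emits all future positions in one forward pass rather than feeding its own predictions back step by step. A minimal sketch of this framing follows. It assumes each point is a tuple whose first two entries are latitude and longitude, and that only positions are used as targets. Neither assumption is confirmed by the text.

```python
def split_window(window, n_input):
    """Divide one trajectory window into model inputs and prediction targets.

    inputs  : the first n_input points with all of their features
    targets : (lat, lon) of every remaining point, predicted jointly
    """
    if not 0 < n_input < len(window):
        raise ValueError("n_input must fall strictly inside the window")
    inputs = window[:n_input]
    targets = [(point[0], point[1]) for point in window[n_input:]]
    return inputs, targets
```

For a 120-point window split evenly, this yields 60 input points and 60 target positions.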
Figures 15 through 20 and Tables 5 through 10 provide the detailed performance metrics obtained through this evaluation process, offering a quantitative perspective on each model’s accuracy across different sea level conditions. The models were trained on four months of AIS data from June to September, validated on October data, and evaluated using the November data as the test set. The test set was divided into specific test cases to measure model performance under varying conditions, with metrics obtained for each case based on the predictions generated by the trained models.
In particular, Fig. 8 illustrates a scenario where the Transformer model’s predictions were almost indistinguishable from the actual route, while other models exhibited significant deviations. This demonstrates the effectiveness of combining the multi-output prediction framework with feature augmentation, particularly with sea level data and Haversine distance to tidal gauges. The Extended feature configuration of the Transformer was notably robust, excelling in complex environments like the South Sea, which is characterized by strong tidal variations and intricate coastlines.
These results underscore the importance of incorporating additional environmental features, such as sea level, into trajectory prediction tasks. The Transformer model's superior performance across multiple tidal regions highlights the advantage of using augmented features, especially in maritime environments with complex tidal dynamics.
4.3.2 Model-Specific Analysis
The detailed analysis of each model focused on six key metrics. Below, the performance of each model is discussed with an emphasis on the impact of feature augmentation and how the models responded to different sea level ranges.
The RNN model (Figure 15, Table 5) struggled to fully leverage the Extended features. In the 0-100cm and 600-700cm sea level ranges, the Basic feature set outperformed the Extended set in all metrics, indicating that adding sea level and distance features did not improve short-term predictions. Similarly, in the 100-200cm and 700-800cm ranges, the Basic model performed better across most metrics. In the 800-900cm range, the Extended features led to improved performance in all metrics except FD, suggesting that while additional features helped in highly dynamic environments, challenges remained in predicting the final destination.
The LSTM model (Figure 16, Table 6) demonstrated moderate improvements over RNN, especially in low sea level variation environments. In the 0-100cm range, Basic features outperformed Extended features in all metrics except FDE. As sea level variations increased, the Extended features generally led to better performance across most metrics, particularly in the 100-500cm range. However, similar to RNN, FD remained lower with Basic features in the ranges of 600cm and above, indicating a difficulty in maintaining final destination accuracy with Extended features in extreme conditions.
The GRU model (Figure 17, Table 7) showed more substantial gains with the Extended features, particularly in sea level ranges above 500cm. Metrics like RMSE, MAE, and SMAPE consistently showed better performance with the Extended set. In the 800-900cm range, however, the FD metric increased for the Extended model, indicating a trade-off: while the overall trajectory accuracy improved, FD suffered. This suggests that additional features helped in short- and medium-term predictions, but introduced complexities in extreme tidal environments.
For both the Bi-LSTM (Figure 18, Table 8) and Bi-GRU (Figure 19, Table 9) models, lower overall error rates were observed compared to their unidirectional counterparts. Despite improved performance in trajectory prediction, the additional features may have overcomplicated the prediction task in extreme tidal environments. Interestingly, FD values for these models did not degrade as significantly, suggesting that while the overall trajectory predictions struggled, the final displacement remained stable.
The Transformer model (Figure 20, Table 10) outperformed all other models across all sea level ranges, with significantly lower error rates in all of the metrics. However, in the 800-900cm range, the Transformer model exhibited increased errors in FDE, FD, and AED with the Extended features. This suggests that, within this range, the Transformer remained highly effective at short- to medium-term predictions but struggled to predict the final destination in extreme tidal environments. The increased errors in FDE and FD indicate a gap between trajectory accuracy and final displacement accuracy, especially in dynamic environments.
Recent research in time series forecasting supports these findings. Tang and Matteson (2021) demonstrated that Transformer architectures, particularly when combined with probabilistic modeling, excel in handling long-range dependencies and non-deterministic dynamics in time series data. Their probabilistic Transformer model provided better long-term forecasts while also capturing uncertainty, which aligns with the performance observed in this study. However, the challenges in predicting final destination accuracy in extreme conditions suggest that further refinement is needed.
4.3.3 Sea Level Effect
A clear trend emerges across all models: as the range of sea level differences increases, prediction errors rise. This highlights the strong influence of sea level variations on ship trajectory prediction, particularly in areas with significant tidal differences.
In the East Sea (0-100cm), models generally exhibited lower error rates. The relatively stable sea level and straightforward coastline contributed to easier predictions, limiting the impact of feature augmentation. For instance, the Transformer model showed only marginal improvements with the Extended features in this region.
In contrast, the South Sea (100-500cm) presented a more challenging environment, characterized by significant tidal changes and a complex coastline. In this region, the Transformer model’s Extended configuration significantly outperformed the Basic model, demonstrating the value of incorporating sea level data and distance to tidal gauges. These features helped capture the dynamic tidal variations, leading to better performance.
In the West Sea (500-900cm), which experiences extreme tidal variations, the Transformer model’s Extended configuration continued to outperform the Basic configuration, though the improvement rate diminished compared to the South Sea. In the 800-900cm range, the Extended model even underperformed in some cases, particularly with the Transformer and Bi-GRU models. One possible explanation is that the 10-hour trajectory window (5 hours of input followed by 5 hours of prediction) extends beyond regions where sea level impacts vessel movement, reducing the effectiveness of the additional features.
Overall, the results demonstrate that sea level variations significantly impact ship trajectory prediction. The Extended model’s inclusion of sea level data and Haversine distance to the nearest tidal gauge proves particularly effective in regions with moderate tidal variations, but faces limitations in extreme environments like the West Sea.
5. Conclusion
This study conducted experiments leveraging the prominent sea level differences observed along the coasts of Korea’s East, West, and South Seas to demonstrate the significance of an indirect sea level referencing approach. Recognizing the challenge of obtaining real-time sea level data at a vessel’s current location, this method references sea level data from the nearest tidal gauge, augmented by the Haversine distance to the gauge as an additional feature. This indirect referencing method consistently outperformed basic feature models across most of the deep learning models, showing improvements in key metrics and test cases. Notably, the Transformer model exhibited superior performance, particularly benefiting from feature augmentation, which effectively captured long-range dependencies in time series forecasting.
While the simulations covered a broad range of tidal conditions, the study primarily focused on sea level as an environmental factor. To enhance future research, the inclusion of additional environmental variables, such as wind and wave data, could offer further insights and refine model predictions in more dynamic coastal conditions. Incorporating these factors would provide a more comprehensive understanding of the various influences on ship trajectory prediction, particularly in rapidly changing maritime environments.
Additionally, this study adopted a multi-output approach, where the past 5 hours of trajectory data were used to predict the next 5 hours. As a potential direction for future work, research could explore a multi-step prediction method, enabling the model to begin predictions based on shorter trajectories and dynamically generate forecasts over more frequent intervals. This would enhance the flexibility and applicability of the models in real-world navigation, particularly in time-sensitive scenarios.