A team of researchers has developed an innovative AI system that could enhance weather forecasting by providing meteorologists with explainable predictions. The system, which focuses on short-term rainfall forecasts, aims to make AI-powered weather predictions more transparent and trustworthy for operational use.
The research, conducted by scientists from the Korea Advanced Institute of Science and Technology (KAIST) and other institutions, addresses a key challenge in adopting AI for critical applications like weather forecasting: the "black box" nature of many AI models, which makes their decision-making process opaque to users.
"Weather prediction has always been crucial for society, influencing everything from agriculture to public safety," said lead researcher Soyeon Kim. "With climate change increasing weather volatility, accurate forecasts are more important than ever. Our goal was to create an AI system that not only makes accurate predictions but also explains its reasoning in ways meaningful to meteorologists."
The team's approach centered on developing an "explainable AI" (XAI) interface that provides three key types of information alongside its rainfall predictions:
Model performance breakdowns for different rainfall scenarios
Reasoning behind specific predictions
Confidence levels for forecasts
"We wanted forecasters to be able to understand the AI's strengths and weaknesses, see why it was making certain predictions, and gauge how much to trust each forecast," Kim explained. "This level of transparency is critical for operational use in high-stakes domains like weather forecasting."
The AI Model: Predicting Short-Term Rainfall
At the core of the system is an artificial neural network called UNet2, developed by South Korea's National Institute of Meteorological Sciences. UNet2 analyzes radar data to predict rainfall intensity over the next 1-6 hours.
The model takes in sequences of radar images along with location and time information. It then classifies each location into one of three categories: no rain (0-1 mm/hr), light rain (1-10 mm/hr), or heavy rain (over 10 mm/hr).
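For a concrete picture, the class boundaries described above can be written as a simple thresholding rule. The sketch below (Python with NumPy) is illustrative only: the thresholds come from the article, but the function name and the handling of values exactly at 1 and 10 mm/hr are assumptions, not the authors' code.

```python
import numpy as np

# Rainfall classes described in the article:
#   0 = no rain    (below 1 mm/hr)
#   1 = light rain (1-10 mm/hr)
#   2 = heavy rain (above 10 mm/hr)
def rainfall_class(rain_rate_mm_hr):
    """Map a radar-derived rain-rate field (mm/hr) to the three class labels.

    Illustrative only: boundary handling at exactly 1 and 10 mm/hr is assumed.
    """
    rate = np.asarray(rain_rate_mm_hr, dtype=float)
    return np.digitize(rate, bins=[1.0, 10.0])  # returns 0, 1, or 2 per grid cell

# Example: a tiny 2x2 field of rain rates
print(rainfall_class([[0.2, 3.5], [12.0, 0.0]]))  # [[0 1]
                                                  #  [2 0]]
```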
In tests, UNet2 showed comparable or superior performance to other leading short-term rainfall prediction models, particularly for forecasts 1-2 hours ahead. For example, when predicting light rainfall (1-10 mm/hr) one hour in advance, UNet2 achieved an F1 score (the harmonic mean of precision and recall) of 0.824, slightly edging out Google's MetNet model at 0.822.
The model showed even more significant improvements for heavy rainfall predictions. For rainfall over 10 mm/hr one hour ahead, UNet2 scored 0.604 compared to MetNet's 0.480.
"Accurate prediction of heavy rainfall events is particularly important for public safety," noted Kim. "The improved performance for these scenarios could be valuable for issuing timely warnings."
Explaining Model Performance
The first component of the team's XAI interface provides a breakdown of the model's performance for different types of rainfall events. This allows forecasters to understand the AI's strengths and weaknesses across various weather scenarios.
To accomplish this, the researchers developed a separate neural network to classify input data into one of five rainfall types common in South Korea: monsoon front (southern region), monsoon front (central region), isolated thunderstorm, extratropical cyclone (east coast), and extratropical cyclone (inland).
Once an input is classified, the interface displays the model's historical performance metrics for that specific rainfall type. This includes measures like the F1 score, Critical Success Index, and bias.
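These are standard forecast-verification scores, and all of them can be derived from a contingency table of hits, misses, and false alarms for a given rainfall threshold. The snippet below is a minimal sketch of how such metrics are commonly computed; it is not the authors' evaluation code, and the example counts are hypothetical.

```python
def verification_scores(hits, misses, false_alarms):
    """Standard categorical verification scores from contingency-table counts."""
    pod = hits / (hits + misses)                         # probability of detection
    sr = hits / (hits + false_alarms)                    # success ratio (1 - FAR)
    csi = hits / (hits + misses + false_alarms)          # Critical Success Index
    bias = (hits + false_alarms) / (hits + misses)       # frequency bias
    f1 = 2 * hits / (2 * hits + misses + false_alarms)   # harmonic mean of POD and SR
    return {"POD": pod, "SR": sr, "CSI": csi, "bias": bias, "F1": f1}

# Hypothetical counts for one rainfall type, threshold, and lead time
print(verification_scores(hits=820, misses=180, false_alarms=170))
```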
"Forecasters often consider different factors when dealing with various types of rainfall events," Kim said. "By providing performance breakdowns, we allow them to calibrate their trust in the AI predictions based on the specific weather patterns they're seeing."
The team visualized these performance metrics using a "performance diagram" that combines multiple evaluation indicators into a single, easy-to-interpret chart. This allows quick assessment of how well the model handles different rainfall intensities and lead times for each weather pattern.
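A performance diagram of this kind typically plots probability of detection against success ratio, with contours of CSI and lines of constant frequency bias overlaid, so a single point summarizes several scores at once. The matplotlib sketch below shows the general construction; the plotted points are hypothetical, and it does not reproduce the exact styling of the paper's charts.

```python
import numpy as np
import matplotlib.pyplot as plt

# CSI contours follow from CSI = 1 / (1/SR + 1/POD - 1)
sr = np.linspace(0.01, 1.0, 200)    # success ratio (1 - false-alarm ratio)
pod = np.linspace(0.01, 1.0, 200)   # probability of detection
SR, POD = np.meshgrid(sr, pod)
CSI = 1.0 / (1.0 / SR + 1.0 / POD - 1.0)

fig, ax = plt.subplots(figsize=(5, 5))
contours = ax.contour(SR, POD, CSI, levels=np.arange(0.1, 1.0, 0.1), colors="grey")
ax.clabel(contours, fmt="%.1f")

# Lines of constant frequency bias: bias = POD / SR
for b in (0.5, 1.0, 1.5, 2.0):
    ax.plot(sr, np.clip(b * sr, 0.0, 1.0), linestyle=":", color="lightgrey")

# Hypothetical scores for one rainfall type at 1-, 3-, and 6-hour lead times
ax.scatter([0.85, 0.76, 0.64], [0.82, 0.70, 0.55], zorder=3)
ax.set_xlabel("Success ratio (1 - FAR)")
ax.set_ylabel("Probability of detection")
ax.set_title("Performance diagram (illustrative)")
plt.show()
```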
Reasoning Behind Predictions
The second key component of the XAI interface aims to explain why the AI is making specific predictions. This uses a technique called "feature attribution" to highlight which parts of the input data had the biggest influence on a given forecast.
After evaluating several methods, the team selected a technique called Integrated Gradients. This approach generates heatmaps showing which areas of the input radar images were most important for the model's prediction.
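Integrated Gradients attributes a prediction to each input element by accumulating the model's gradients along a straight path from a reference "baseline" input to the actual input. The PyTorch sketch below shows the basic computation under some assumptions: it treats the baseline as an all-zero radar field and explains the summed score of the predicted class, whereas the paper's actual implementation details (baseline choice, number of steps, output handling) may differ.

```python
import torch

def integrated_gradients(model, x, target_class, baseline=None, steps=32):
    """Riemann approximation of Integrated Gradients for one input tensor.

    model        : callable mapping input to class scores (e.g. (1, C, H, W) -> (1, K, H, W))
    x            : input radar sequence, e.g. shape (1, C, H, W)
    target_class : index of the class whose score is being explained
    baseline     : reference input; an all-zero "no echo" field is assumed here
    """
    if baseline is None:
        baseline = torch.zeros_like(x)

    total_grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Interpolate between baseline and input, then take the gradient of the
        # target-class score with respect to that interpolated point.
        point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        score = model(point)[:, target_class].sum()
        grad, = torch.autograd.grad(score, point)
        total_grads += grad

    # Attribution heatmap = (input - baseline) * average gradient along the path.
    return (x - baseline) * (total_grads / steps)
```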
"We wanted forecasters to be able to see what the AI was 'looking at' when making its predictions," explained co-author Junho Choi. "This allows them to assess whether the model is focusing on meteorologically relevant factors."
The researchers worked with domain experts to analyze whether the generated explanations aligned with known meteorological principles. In several case studies of extreme precipitation events, they found the AI's focus areas often corresponded with important atmospheric features identified by meteorologists.
For instance, in one case involving an extratropical cyclone, the feature attribution highlighted areas along South Korea's Taebaek Mountain range. This aligned with experts' understanding that moist air from surrounding seas was likely causing convergence and precipitation as it rose over the mountains.
"Finding this kind of alignment between the AI's reasoning and domain knowledge helps build trust in the system," Choi noted. "It suggests the model is picking up on genuinely relevant atmospheric dynamics rather than spurious correlations."
The analysis also revealed some limitations of the current approach. Because the model relies solely on radar data, the explanations are limited to showing horizontal movement of precipitation systems. The researchers suggest future versions could incorporate additional data types to provide more comprehensive explanations.
Confidence Calibration
The third component of the XAI interface provides calibrated confidence scores for the model's predictions. This aims to give forecasters a realistic sense of how much they can trust each forecast.
Raw output from neural networks often suffers from overconfidence, showing high certainty even for incorrect predictions. To address this, the team applied a technique called "temperature scaling" to adjust the model's output probabilities.
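Temperature scaling divides the network's logits by a single learned scalar T before the softmax, which softens overconfident probabilities without changing which class is predicted. The PyTorch sketch below is a generic version of the technique, fit on a held-out validation set; the variable names and optimizer settings are assumptions rather than details from the paper.

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, max_iter=100):
    """Learn a temperature T so that softmax(logits / T) is better calibrated.

    val_logits : (N, num_classes) raw model outputs on a held-out validation set
    val_labels : (N,) integer class labels
    """
    log_t = torch.zeros(1, requires_grad=True)  # optimise log T so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)

    def nll():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(nll)
    return log_t.exp().item()

# At inference time: calibrated_probs = torch.softmax(logits / T, dim=-1)
```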
"Properly calibrated confidence scores are crucial for operational use," said co-author Yeji Choi. "Forecasters need to know when the model is genuinely certain versus when it's making an educated guess."
The researchers evaluated the calibration using a metric called Expected Calibration Error (ECE). They found that their approach significantly improved calibration across all forecast lead times, with the biggest improvements seen for longer-range predictions.
For example, for 6-hour forecasts, the ECE improved from 0.320 to 0.003 after calibration. This indicates that the reported confidence scores much more closely match the model's actual accuracy at this lead time.
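Expected Calibration Error bins predictions by their confidence and measures the average gap, weighted by bin size, between confidence and observed accuracy within each bin; a perfectly calibrated model scores 0. Below is a minimal NumPy sketch of the metric with an assumed 10-bin setup; the paper's binning choices may differ.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted average |accuracy - confidence| gap.

    confidences : (N,) predicted probability of the chosen class
    correct     : (N,) 1.0 if the prediction was right, else 0.0
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)

    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight the gap by the fraction of samples in the bin
    return ece
```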
Designing a User-Centered Interface
With the core XAI components in place, the team focused on designing an interface that would present the information in an intuitive, user-friendly manner. They worked closely with forecasters throughout the process to refine the design.
"It was crucial to strike a balance between providing comprehensive explanations and avoiding information overload," said Kim. "Forecasters often need to make rapid decisions, so we needed an interface that quickly conveyed key insights."
The resulting interface allows users to toggle between different explanation types and drill down into details as needed. It also incorporates additional contextual information like satellite imagery and numerical weather prediction model outputs to help forecasters corroborate the AI's predictions.
User Feedback and Future Directions
To evaluate the system, the team conducted a small-scale user study with four professional forecasters from the Korea Meteorological Administration. The participants interacted with three prototype interfaces: one showing only the AI's predictions, one with basic explanations, and one with refined explanations and additional contextual information.
The results showed that adding explanations increased forecasters' trust in the AI system. The performance breakdown by rainfall type and confidence calibration explanations were particularly well-received.
However, forecasters found some aspects of the explanations, particularly the feature attribution heatmaps, difficult to interpret quickly. They also emphasized that for operational use, the system would need to integrate seamlessly with existing forecasting workflows and tools.
"The feedback highlights both the potential and challenges of bringing explainable AI into operational meteorology," Kim reflected. "There's clearly value in these explanations, but we need to keep refining how we present them to maximize their practical utility."
The researchers outline several directions for future work based on the study results:
Incorporating multi-modal data sources to provide more comprehensive explanations of atmospheric processes
Developing metrics to quantify the complexity of explanations and optimize for both faithfulness and interpretability
Creating more interactive explanation capabilities that allow forecasters to query the AI system in real-time
Conducting larger-scale evaluations in operational settings to assess the system's impact on forecast quality and forecaster workflows
"This study is really just a first step towards making AI weather forecasts more transparent and trustworthy," Kim said. "There's still much work to be done to bridge the gap between AI capabilities and the needs of operational meteorology."
Broader Implications for Explainable AI
While focused on weather forecasting, the researchers believe their user-centered approach to XAI could inform similar efforts in other high-stakes domains where AI is increasingly being applied, such as healthcare, finance, and autonomous vehicles.
"In any field where AI is making critical decisions, it's vital that human experts can understand and trust those decisions," said Jaesik Choi, the study's senior author. "Our work demonstrates a framework for deeply involving domain experts in the design of explanations, rather than just having AI researchers decide what explanations to provide."
The study also highlights the importance of going beyond technical measures of explanation quality to assess the real-world utility of XAI systems. "It's not enough for explanations to be mathematically sound; they need to be actionable for the intended users," Choi noted.
As AI systems become more prevalent in decision-making processes across industries, the ability to explain their reasoning in human-understandable terms will likely become increasingly important. This research provides valuable insights into the challenges and opportunities of creating such systems in collaboration with domain experts.
The full details of the study, titled "Explainable AI-Based Interface System for Weather Forecasting Model," are available in a preprint on arXiv. The researchers plan to refine the system based on their initial findings and conduct more extensive real-world testing in collaboration with meteorological agencies.