METHODS ARTICLE | SHORT
Koch & Schneider 2022: GEUS Bulletin 49. 8292. https://doi.org/10.34194/geusb.v49.8292

Long short-term memory networks enhance rainfall-runoff modelling at the national scale of Denmark

Julian Koch*, Raphael Schneider
Department of Hydrology, Geological Survey of Denmark and Greenland, Copenhagen, Denmark

*Correspondence: juko@geus.dk
ORCID: https://orcid.org/0000-0002-7732-3436, https://orcid.org/0000-0001-9628-0809
Received: 17 Aug 2021
Accepted: 07 Dec 2021
Published: 13 Jan 2022
Keywords: rainfall-runoff modelling, long short-term memory networks, deep learning, knowledge-guided machine learning, pre-training-finetuning
Abbreviations:
CAMELS: catchment attributes and meteorology for large-sample studies
Fbal: flow balance
KGE: Kling-Gupta Efficiency
LSTM: long short-term memory
ML: machine learning
MSE: mean squared error
PBM: physically based model
GEUS Bulletin is an open access, peer-reviewed journal published by the Geological Survey of Denmark and Greenland (GEUS). This article is distributed under a CC-BY 4.0 licence, permitting free redistribution, and reproduction for any purpose, even commercial, provided proper citation of the original work. Author(s) retain copyright.
Edited by: Hyojin Kim (GEUS, Denmark)
Reviewed by: Two anonymous reviewers.
Funding: None provided
Competing interests: See the Competing interests section
Additional files: See the Additional files section

Abstract
This study explores the application of long short-term memory (LSTM) networks to simulate runoff at the national scale of Denmark using data from 301 catchments. This is the first LSTM application on Danish data. The results were benchmarked against the Danish national water resources model (DK-model), a physically based hydrological model. The median Kling-Gupta Efficiency (KGE), a common metric to assess performance of runoff predictions (optimum of 1), increased from 0.7 (DK-model) to 0.8 (LSTM) when the LSTM was trained against all catchments. Overall, the LSTM outperformed the DK-model in 80% of catchments. Despite the compelling KGE evaluation, the water balance closure was modelled less accurately by the LSTM. The applicability of LSTM networks for modelling ungauged catchments was assessed via a spatial split-sample experiment. With a 20% spatial hold-out, the LSTM performed worse than the DK-model. However, after pre-training, that is, weight initialisation obtained from training against simulated data from the DK-model, the performance of the LSTM improved. This forms a convincing argument for the knowledge-guided machine learning (ML) paradigm, which integrates physically based models and ML to train robust models that generalise well.

Introduction
The runoff at a given point along a river network can be defined as the outflow generated in the upstream contributing area. Accurate modelling of runoff has been a prime research theme for several decades (Wagener et al. 2004). A multitude of numerical modelling tools, from parsimonious conceptual rainfall-runoff models to complex fully distributed physically based models (PBMs), have been developed. In recent years, machine learning (ML) models, in particular long short-term memory (LSTM) networks, have proved useful for rainfall-runoff modelling. Since the first application by Kratzert et al. (2018), LSTMs have quickly gained popularity and have typically outperformed traditional hydrological models under data-rich settings (Mai et al. 2021) and in ungauged catchments (Kratzert et al. 2019a).

The knowledge-guided ML paradigm aims to increase robustness and generalisability by integrating scientific knowledge into ML models (Nearing et al. 2020; Reichstein et al. 2019). This can be achieved by building physical constraints, such as the first-principle law of mass conservation (Hoedt et al. 2021), into an ML model or by using a PBM to augment training data (Jia et al. 2021). In this context, pre-training by weight initialisation using PBM simulation data appears to be very promising, as a pre-trained LSTM attempts to emulate a PBM.

The rapid advancement of ML models for runoff prediction was facilitated by the availability of multiple large-scale runoff data sets containing long timeseries of observed runoff, dynamic meteorological forcing and static catchment attributes, referred to as catchment attributes and meteorology for large-sample studies (CAMELS) data sets (e.g. Addor et al. 2017). In this article, we highlight the value of Danish hydrological big data for the advancement of ML-based runoff modelling. The Danish case offers a data-rich setting with over 300 stations and high-quality auxiliary data. Moreover, there exists a national water resources model (the DK-model), an advanced hydrological PBM that integrates groundwater and surface water processes (Højberg et al. 2013; Stisen et al. 2019). The DK-model is well suited as a benchmark for ML model development and provides simulated runoff, which is valuable for augmentation, as well as auxiliary hydrological information, such as groundwater conditions.

In this study, we aim to (1) highlight the value of Danish hydrological big data for advancing ML research at an international level, (2) implement a state-of-the-art LSTM to model runoff at the national scale of Denmark and (3) test a knowledge-guided LSTM based upon pre-training against simulated runoff obtained from a PBM.
Methods

Data
As in existing CAMELS data sets, we curated a data set comprising observed runoff as well as dynamic and static attributes for 301 Danish catchments (Fig. 1). The catchments vary in size between 10 km² and 2574 km², with an average of 133 km². The dynamic variables cover a period of 21 years (1990–2011) at daily timesteps and comprise observed runoff, simulated runoff (DK-model), air temperature, potential evapotranspiration and precipitation (Fig. 2). The three meteorological variables were derived from gridded data provided by the Danish Meteorological Institute and represent daily-averaged conditions for the entire catchment (Scharling 1999a, 1999b). A complete timeseries of 21 years of daily observed runoff was available for 51% of the catchments, and 77% of the catchments had at least 80% coverage. The runoff was normalised by the catchment size to mm/day to give equal weight to the catchments during training, independent of their size.

In total, 17 static catchment attributes were compiled. Eleven of these were calculated as catchment averages: precipitation, potential evapotranspiration, air temperature, slope, topographic wetness index, clay fraction, annual, summer and winter simulated water table depth (DK-model), exceedance probability of a simulated water table depth of less than 1 m (DK-model) and the thickness of the surficial clay layer. Five land-use classes were expressed as percentages: forest, wetland, lake, agriculture and urban. Finally, the catchment area was included as well. All data are available at https://doi.org/10.22008/FK2/YCQXTR.

Long short-term memory
The LSTM network architecture is a special type of recurrent neural network, designed to store and regulate information over time, which makes LSTMs well suited to learn long-term dependencies and memory effects (Hochreiter & Schmidhuber 1997). The LSTM is described in full elsewhere (Kratzert et al. 2018; Shen 2018).
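For orientation, the following is a minimal NumPy sketch of one time step of a standard LSTM cell in the sense of Hochreiter & Schmidhuber (1997). It is neither the entity-aware variant applied in this study nor the NeuralHydrology implementation. The function name lstm_step, the stacked weight layout and the toy dimensions are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """Advance a standard LSTM cell by one time step.

    x: inputs at this step, shape (n_inputs,)
    h_prev, c_prev: previous hidden and cell state, shape (n_hidden,)
    W: input weights, shape (4 * n_hidden, n_inputs)
    U: recurrent weights, shape (4 * n_hidden, n_hidden)
    b: biases, shape (4 * n_hidden,)
    Rows are stacked as input gate, forget gate, candidate, output gate
    (a layout chosen for this sketch, not taken from the article).
    """
    z = W @ x + U @ h_prev + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g   # forget part of the old memory, add new information
    h = o * np.tanh(c)       # expose a gated view of the memory
    return h, c


# Toy usage with hypothetical sizes: 3 daily inputs, 8 hidden states, 5 days
rng = np.random.default_rng(0)
n_inputs, n_hidden, n_steps = 3, 8, 5
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_inputs))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x in rng.normal(size=(n_steps, n_inputs)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state c is what lets the network carry information, such as catchment storage, across many days, while the gates decide what is kept, added and passed on at each step.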
Similar to traditional hydrological models, the LSTM processes input data time step after time step. Runoff on a specific day is simulated from the meteorological timeseries of the preceding n days. Kratzert et al. (2019b) developed the entity-aware LSTM, an adaptation of the standard LSTM that is capable of learning catchment similarities based on the static attributes, which are treated in a separate embedding layer. In this study, we applied the proposed entity-aware LSTM, referred to simply as LSTM hereafter. We used the NeuralHydrology codebase (github.com/neuralhydrology/neuralhydrology/) to train and evaluate the models used in this study.

Fig. 1 Map of Denmark showing the 301 catchments used in this study. 60 catchments were randomly sampled for the spatio-temporal split-sample experiment. (Map legend: Q stations, spatial split-sample, catchments; scale bar in km.)
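The way each daily runoff value is paired with a fixed-length window of meteorological input can be sketched as follows. This is a minimal illustration, not the NeuralHydrology data loader. The function name make_windows, the array layout and the choice to let the window end on the prediction day are assumptions, and days without observed runoff are simply dropped.

```python
import numpy as np


def make_windows(forcing, runoff, seq_len):
    """Pair each runoff value with the seq_len days of forcing that end on that day.

    forcing: daily meteorological inputs, shape (n_days, n_features)
    runoff: daily runoff, shape (n_days,), NaN where no observation exists
    Returns X with shape (n_samples, seq_len, n_features) and y with shape (n_samples,).
    """
    n_days = forcing.shape[0]
    if n_days < seq_len:
        raise ValueError("time series is shorter than the input length")
    # One window per possible end day; the window ending on day index end - 1
    # is paired with the runoff on that same day (an assumption of this sketch).
    X = np.stack([forcing[end - seq_len:end] for end in range(seq_len, n_days + 1)])
    y = runoff[seq_len - 1:]
    keep = ~np.isnan(y)  # drop days without observed runoff
    return X[keep], y[keep]


# Toy usage: 10 days, 3 forcing variables, windows of 4 days, one gap in the runoff
forcing = np.arange(30, dtype=float).reshape(10, 3)
runoff = np.arange(10, dtype=float)
runoff[5] = np.nan
X, y = make_windows(forcing, runoff, seq_len=4)
```

Building the windows per catchment and then concatenating them is one way to train a single model on all catchments at once; the static attributes would be attached to every window of the catchment they belong to.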
Experimental setup

Hyperparameters and general settings
As the purpose of this study was an initial exploration of LSTM applicability, hyperparameters were not optimised. Following Kratzert et al. (2019b), we assigned the following hyperparameter values: a learning rate of 0.001, a batch size of 256, an input length of 270 days, 64 hidden cell states, a dropout rate of 0.4 and 20 training epochs. All models were trained with five different seeds, and the average of the five models was used for the final LSTM prediction. The model setup files are available at https://doi.org/10.22008/FK2/YCQXTR.

Split-sample experiments
We conducted both a temporal split-sample and a spatio-temporal split-sample experiment to test the capabilities of an LSTM for Danish runoff data. The temporal split-sample experiment used data from all 301 stations for training. The timeseries were split into a training period of 11 years (2000–2011) and a test period of 10 years (1990–1999; Fig. 2). The two periods correspond to the calibration and test periods of the DK-model, which permitted a fair comparison between the two models. The spatio-temporal split-sample experiment used the same training and test periods. In addition, 20% of the stations were randomly selected, removed from the training data set and retained for model evaluation (i.e. a 20% spatial hold-out; Fig. 1). This experiment offers a more robust evaluation, as it tests whether a model trained on 80% of the stations transfers to the remaining 20%. This allows us to assess the ability to predict runoff in ungauged basins.

Pre-training
The concept of pre-training can be used to initialise the weights of an LSTM using alternative runoff data before fine-tuning the LSTM using the actual runoff data from the catchments of interest. Runoff data for pre-training can potentially be obtained from observational data sets from a larger or different geographical region or from a PBM.
In this study, we followed the latter approach and employed simulation data from the DK-model for pre-training. In this way, the LSTM aimed to emulate the process descriptions of the PBM before being fine-tuned against observed runoff. The training epochs were set to 15 for pre-training and 5 for fine-tuning. Simulated runoff at all 301 stations for the training period of 11 years (2000–2011) was used for pre-training, and pre-training was applied in both split-sample experiments.

Evaluation metrics
For training the LSTM network, the mean squared error (MSE) between the observed and simulated runoff was selected as the loss function. Two alternative metrics were calculated for the model evaluation, namely the Kling-Gupta Efficiency (KGE) and the averaged flow balance (Fbal). KGE is a three-component metric that considers the correlation, the standard deviation ratio and the bias between the observed and simulated runoff (Gupta et al. 2009). Fbal quantifies the water balance closure between the observed and simulated runoff relative to the observed flow (Henriksen et al. 2003). Negative Fbal scores indicate that the model overestimates the observed runoff. The optimal values for KGE and Fbal are 1 and 0, respectively.

Results and discussions
The cumulative distribution functions for KGE and Fbal are presented in Figure 3. The LSTM was benchmarked against the DK-model (PBM), and the effect of pre-training was also investigated. With respect to KGE, the LSTM performed better than the PBM in the temporal split-sample experiment, both with and without pre-training. The median KGE was 0.8 for both LSTM configurations and 0.7 for the PBM. The conclusion was less clear for the water balance closure (Fbal); here, the PBM showed normally distributed under- and over-estimates with a median close to zero. However, the LSTMs were skewed towards negative values, that is, overestimation of runoff, with a median of –0.08. Overestimated runoff was predominantly evident during the low-flow summer periods. Using alternative loss functions instead of the MSE in the LSTM training may alleviate this problem.

Fig. 2 Dynamic input data for a single catchment used to train the LSTM. a: Precipitation. b: Potential evapotranspiration. c: Air temperature. d: Observed runoff (obs) and simulated runoff (PBM) were used as training data. The training period and test period are shown in a. (Axes: time, 1990–2010; units: mm/day in a and b, °C in c, m³/s in d.)

The spatio-temporal split-sample experiment revealed that the LSTM did not generalise well to ungauged basins. Although the median KGE of 0.69 was comparable with that of the PBM (KGE = 0.73), KGE scores were poor for the lowest 20% of the cumulative distribution function. The same was evident for Fbal, where the lowest 20% performed poorly with respect to the PBM. However, pre-training using PBM data resulted in better performance for ungauged basins, making the LSTM comparable with the PBM. This emphasises the merit of pre-training. For the spatio-temporal split-sample experiment, where information was evidently missing in the training data set, pre-training against PBM data helped to increase performance.
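The two evaluation metrics reported above can be sketched as follows. The KGE follows the three-component form of Gupta et al. (2009). The exact expression for Fbal is not given in the text, so the form used here, (mean(obs) - mean(sim)) / mean(obs), is an assumption chosen to match the stated convention that the optimum is 0 and that negative values indicate overestimation. The function names and the handling of gaps are likewise illustrative.

```python
import numpy as np


def _paired(obs, sim):
    """Keep only the days on which both series have a value."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    keep = ~(np.isnan(obs) | np.isnan(sim))
    return obs[keep], sim[keep]


def kge(obs, sim):
    """Kling-Gupta Efficiency (Gupta et al. 2009); 1 is optimal."""
    obs, sim = _paired(obs, sim)
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = np.std(sim) / np.std(obs)    # standard deviation ratio
    beta = np.mean(sim) / np.mean(obs)   # bias ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def fbal(obs, sim):
    """Flow balance relative to observed flow; 0 is optimal.

    Assumed form: (mean(obs) - mean(sim)) / mean(obs), so that a model
    that overestimates runoff receives a negative score.
    """
    obs, sim = _paired(obs, sim)
    return (np.mean(obs) - np.mean(sim)) / np.mean(obs)


# Toy usage: a simulation that doubles every observed value, with one gap in obs
obs = np.array([1.0, 2.0, np.nan, 3.0, 4.0])
sim_double = np.array([2.0, 4.0, 9.0, 6.0, 8.0])
kge_double = kge(obs, sim_double)
fbal_double = fbal(obs, sim_double)
```

In the toy case the correlation is perfect, but both the variability and the mean are doubled, so the KGE is penalised twice while Fbal only reflects the volume error. This illustrates why a model can score well on one metric and poorly on the other.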
For the temporal split-sample experiment, where the observations provided enough information, pre-training did not change the performance.

Considering KGE for the temporal split-sample experiment, the LSTM outperformed the PBM in 80% of catchments. This fell to 44% for the spatio-temporal experiment but rose to 54% with pre-training. Considering the absolute Fbal, 68% of the catchments were simulated more accurately by the LSTM than by the PBM for the temporal split-sample experiment. For the spatio-temporal split-sample experiment, this share rose slightly from 44% to 49% with pre-training.

The simulation results for two selected catchments for the 10-year test period of the temporal and spatio-temporal split-sample experiments are presented in Figure 4. In the first catchment (260080), the performance of the LSTMs and the PBM was comparable, with a KGE score of 0.8 (PBM) and 0.78 (LSTM) for the temporal split-sample experiment. The performance dropped to 0.66 in the spatio-temporal split-sample experiment but increased to 0.8 through pre-training. The second catchment (420022) showed a very poor performance for the spatio-temporal split-sample experiment (KGE = –0.12). However, KGE improved to 0.78 through pre-training and thus became comparable with the PBM (KGE = 0.75). In other words, catchment 260080 could be simulated fairly accurately in the spatio-temporal split-sample experiment without pre-training, because the LSTM could learn the runoff behaviour of that catchment using data from similar neighbouring or upstream catchments. However, the runoff behaviour of catchment 420022 could not be learned without data from the same catchment. Nevertheless, pre-training using PBM data helped to increase the performance of the LSTM. Figure 4 presents results for a large (260080, 323 km²) and a small (420022, 44 km²) catchment. Performance is generally lower for smaller catchments owing to the stronger imprint of anthropogenic activities (drainage and abstraction) and the greater uncertainty of precipitation data.

Fig. 3 Cumulative distribution functions for KGE and Fbal in the test period for runoff simulated by the DK-model (PBM), the LSTM model and the pre-trained LSTM model (prtrn LSTM). The temporal split-sample experiment is depicted in the left panels and the spatio-temporal split-sample experiment in the right panels. The optimal value of Fbal is highlighted with a dashed horizontal line. The LSTM predictions are based on the mean of 5 seeds, indicated here with transparent coloured lines. (Panel titles: Temporal split-sample, Spatio-temporal split-sample; axes: Probability against KGE and Fbal; legend: LSTM, PBM, prtrn LSTM.)

The superior performance of LSTMs over conceptual rainfall-runoff models or hydrological PBMs was demonstrated for temporal split-sample experiments by Kratzert et al. (2018), Gauch et al. (2021), Mai et al. (2021) and others; however, conclusions on the spatial transferability to ungauged basins are disputed. Kratzert et al. (2019a) reported a superior performance of LSTM for a small spatial hold-out (8%), whereas Mai et al. (2021) found a worse performance for a more systematic spatial hold-out.

Loss function plots are presented in Supplementary file S1 to elucidate the training of the applied modelling experiments in more detail. The plots generally support the chosen hyperparameter values and number of training epochs.

To our knowledge, this is the first study that demonstrates the merits of pre-training against PBM simulation data for runoff modelling in the context of knowledge-guided ML.
In a related study, pre-training using PBM data was found to be beneficial for the modelling of lake-water temperature (Read et al. 2019). For rainfall-runoff modelling, pre-training has so far been found to be suitable for transferring trained LSTMs from one geographical region to another (Ma et al. 2021). We have shown that pre-training using PBM data offers great potential to initialise the LSTM with diverse runoff behaviour. Here, we restricted pre-training to the same catchments and period as the observed data; however, in theory, PBM simulations for different climate change scenarios or a larger geographical domain could inform the LSTM with diverse runoff behaviour not seen in the observed runoff data.

Fig. 4 Two example catchments showing the observed (obs) and simulated runoff for the test period. a: Catchment 260080, 323 km². b: Catchment 420022, 44 km². Simulated data comprise the DK-model (PBM), the LSTM-based model and the pre-trained LSTM-based model (prtrn LSTM). The runoff predictions of the LSTM-based models are given for the temporal split-sample (ts) and spatio-temporal split-sample (sts) experiments. (Axes: time, 1990–2000, against runoff in m³/s; legend: LSTM_ts, LSTM_sts, prtrn LSTM_ts, prtrn LSTM_sts, PBM, obs.)

Most of the published studies on LSTM runoff modelling are of catchments with a low anthropogenic impact; however, recent efforts to model highly managed catchments have documented promising results as well (Ouyang et al. 2021).
The 301 Danish catchments selected in this study are, to a large degree, affected by groundwater abstraction and drainage, and the effect of the degree of anthropogenic impact on model performance and transferability should be investigated in future work.

Conclusions
We draw the following main conclusions from the initial application of LSTM networks for rainfall-runoff modelling at the national scale of Denmark:

Danish hydrological big data have the potential to support ML research at an international level. The DK-model serves as a valuable benchmark as well as a source for augmented training data and input data in the form of static catchment attributes.

An LSTM can outperform a state-of-the-art hydrological model; however, accuracy decreases for ungauged catchments. This can be alleviated by pre-training against physically based simulated runoff, providing crucial information to the LSTM where needed.

Future research studies should (1) advance knowledge-guided ML to use hydrological knowledge provided by the DK-model optimally; (2) test alternative LSTM architectures, hyperparameters and loss functions; (3) study the effect of anthropogenic impact (drainage and groundwater abstraction) on the LSTM; (4) investigate ways of interpreting LSTM models to gain new insights into the runoff process in Denmark; (5) apply a broad range of hydrological signatures in the evaluation of LSTMs; and (6) produce a CAMELS data set for Denmark to provide high-quality hydrological and meteorological data.

Acknowledgements
The authors acknowledge the developer team behind the NeuralHydrology codebase for making LSTM modelling tools so accessible. Furthermore, GEUS colleagues H.J. Henriksen and S. Stisen are thanked for providing valuable feedback on this manuscript. Two anonymous reviewers are thanked for providing valuable comments on this manuscript.

Author contributions
JK: code development, writing original draft and visualisation. RS: data preparation.
Both authors conceptualised the study and its design, and read, edited and agreed to the published version of the manuscript.

Competing interests
The authors declare no competing interests.

Additional files
All data and model setups are available at: https://doi.org/10.22008/FK2/YCQXTR. An additional supplementary file is available at: https://doi.org/10.22008/FK2/WCF76I.

References
Addor, N., Newman, A.J., Mizukami, N. & Clark, M.P. 2017: The CAMELS data set: Catchment attributes and meteorology for large-sample studies. Hydrology and Earth System Sciences 21, 5293–5313. https://doi.org/10.5194/hess-21-5293-2017
Gauch, M., Mai, J. & Lin, J. 2021: The proper care and feeding of CAMELS: How limited training data affects streamflow prediction. Environmental Modelling and Software 135, 104926. https://doi.org/10.1016/j.envsoft.2020.104926
Gupta, H.V., Kling, H., Yilmaz, K.K. & Martinez, G.F. 2009: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology 377(1–2), 80–91. https://doi.org/10.1016/j.jhydrol.2009.08.003
Henriksen, H.J., Troldborg, L., Nyegaard, P., Sonnenborg, T.O., Refsgaard, J.C. & Madsen, B. 2003: Methodology for construction, calibration and validation of a national hydrological model for Denmark. Journal of Hydrology 280, 52–71. https://doi.org/10.1016/S0022-1694(03)00186-0
Hochreiter, S. & Schmidhuber, J. 1997: Long Short-Term Memory. Neural Computation 9, 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Hoedt, P.-J., Kratzert, F., Klotz, D., Halmich, C., Holzleitner, M., Nearing, G., Hochreiter, S. & Klambauer, G. 2021: MC-LSTM: Mass-Conserving LSTM. arXiv preprint arXiv:2101.05186 (2021).
Højberg, A.L., Troldborg, L., Stisen, S., Christensen, B.B.S. & Henriksen, H.J. 2013: Stakeholder driven update and improvement of a national water resources model. Environmental Modelling and Software 40, 202–213. https://doi.org/10.1016/j.envsoft.2012.09.010
Jia, X. et al. 2021: Physics-guided recurrent graph model for predicting flow and temperature in river networks. In: Demeniconi, C. & Davidson, I. (eds): Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), Virtual conference, 612–620. https://doi.org/10.1137/1.9781611976700.69
Kratzert, F., Klotz, D., Brenner, C., Schulz, K. & Herrnegger, M. 2018: Rainfall-runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrology and Earth System Sciences 22, 6005–6022. https://doi.org/10.5194/hess-22-6005-2018
Kratzert, F., Klotz, D., Herrnegger, M., Sampson, A.K., Hochreiter, S. & Nearing, G.S. 2019a: Toward improved predictions in ungauged basins: Exploiting the power of machine learning. Water Resources Research 55, 11344–11354. https://doi.org/10.1029/2019WR026065
Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S. & Nearing, G. 2019b: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets. Hydrology and Earth System Sciences 23, 5089–5110. https://doi.org/10.5194/hess-23-5089-2019
Ma, K. et al. 2021: Transferring hydrologic data across continents – Leveraging data-rich regions to improve hydrologic prediction in data-sparse regions. Water Resources Research 57, e2020WR028600. https://doi.org/10.1029/2020WR028600
Mai, J. et al. 2021: Great lakes runoff intercomparison project phase 3: Lake Erie (GRIP-E). Journal of Hydrologic Engineering 26, 1–19. https://doi.org/10.1061/(asce)he.1943-5584.0002097
Nearing, G.S., Kratzert, F., Sampson, A.K., Pelissier, C.S., Klotz, D., Frame, J.M., Prieto, C. & Gupta, H.V. 2020: What role does hydrological science play in the age of machine learning? Water Resources Research 57, e2020WR028091. https://doi.org/10.1029/2020wr028091
Ouyang, W., Lawson, K., Feng, D., Ye, L., Zhang, C. & Shen, C. 2021: Continental-scale streamflow modeling of basins with reservoirs: Towards a coherent deep-learning-based strategy. Journal of Hydrology 599, 126455. https://doi.org/10.1016/j.jhydrol.2021.126455
Read, J.S. et al. 2019: Process-guided deep learning predictions of lake water temperature. Water Resources Research 55, 9173–9190. https://doi.org/10.1029/2019WR024922
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N. & Prabhat 2019: Deep learning and process understanding for data-driven Earth system science. Nature 566(7743), 195–204. https://doi.org/10.1038/s41586-019-0912-1
Scharling, M. 1999a: Klimagrid Danmark: Nedbør, lufttemperatur og potentiel fordampning 20*20 & 40*40 km. Danish Meteorological Institute Technical Report 99-12, DMI, Copenhagen, DK.
Scharling, M. 1999b: Klimagrid Danmark: Nedbør 10*10 km (ver. 2). Danish Meteorological Institute Technical Report 99-15, DMI, Copenhagen, DK.
Shen, C. 2018: A trans-disciplinary review of deep learning research and its relevance for water resources scientists. Water Resources Research 54, 8558–8593. https://doi.org/10.1029/2018WR022643
Stisen, S., Ondracek, M., Troldborg, L., Schneider, R.J.M. & van Thil, M.J. 2019: National vandressource model. Modelopstilling og kalibrering af DK-model 2019. Danmarks og Grønlands Geologiske Undersøgelse Rapport 2019/31, GEUS, Copenhagen, DK.
Wagener, T., Wheater, H.S. & Gupta, H.V. 2004: Rainfall-runoff modelling in gauged and ungauged catchments, 332 pp. London: Imperial College Press. https://doi.org/10.1142/p335