The errors and uncertainties associated with gap-filling algorithms for water, carbon, and energy flux data have always been one of the main challenges of the global network of microclimatological tower sites that use the eddy covariance (EC) technique. To address these concerns and find more efficient gap-filling algorithms, we reviewed eight algorithms for estimating missing values of environmental drivers and nine algorithms for the three major fluxes typically found in EC time series. We then examined the algorithms' performance under different gap-filling scenarios, utilising data from five EC towers during 2013. The objectives of this research were (a) to evaluate the impact of gap length on the performance of each algorithm and (b) to compare the performance of traditional and new gap-filling techniques for EC data, both for the fluxes and separately for their corresponding meteorological drivers. The algorithms' performance was evaluated by generating nine gap windows of different lengths, ranging from 1 to 365 d. In each scenario, a gap period was chosen at random, and the corresponding data were removed from the dataset. After running each scenario, a variety of statistical metrics were used to evaluate the algorithms' performance. The algorithms showed different levels of sensitivity to gap length: the Prophet Forecast Model (FBP) was the most sensitive, whilst the performance of artificial neural networks (ANNs), for instance, varied little with gap length. The algorithms' performance generally decreased with increasing gap length, yet the differences were not significant for windows shorter than 30 d. No significant differences between the algorithms were recognised for the meteorological and environmental drivers.
However, the linear algorithms showed slight superiority over the machine learning (ML) algorithms, except for the random forest (RF) algorithm estimating the ground heat flux (root mean square errors – RMSEs – of 28.91 and 33.92 for RF and classic linear regression – CLR – respectively). For the major fluxes, by contrast, the ML algorithms and the MDS showed superiority over the other algorithms. Even though ANNs, random forest (RF), and eXtreme Gradient Boost (XGB) showed comparable performance in gap-filling the major fluxes, RF provided more consistent results with slightly less bias than the other ML algorithms. The results indicated that no single algorithm outperforms the others in all situations, but RF is a potential alternative to the MDS and ANNs for flux gap-filling.

To address the global challenges of climatological and ecological change, environmental scientists and policy makers are demanding data that are continuous in time and space. In addition, there is a need to quantify and reduce uncertainties in such data, including observations of carbon, water, and energy exchanges, which are crucial components of national and international flux networks as well as global Earth-observing systems. Satellites partially fill this gap, as they provide excellent spatial coverage, but they have limited temporal resolution and do not measure at a point scale. As such, high-quality, long-term, temporally continuous site observations of ecosystem processes and fluxes are needed. The global eddy covariance (EC) flux tower network (FLUXNET) comprises regional networks (i.e. AmeriFlux, EUROFLUX, OzFlux) and was established in the late 1990s to address the global demand for such information (Aubinet et al., 1999; Baldocchi et al., 2001; Beringer et al., 2016; Hollinger et al., 1999; Menzer et al., 2013; Tenhunen et al., 1998). Despite EC data being frequently used to validate process modelling analyses, field surveys, and remote sensing assessments (Hagen et al., 2006), there are some serious concerns regarding the technique's challenges, e.g. data gaps and uncertainties. Hence, filling data gaps and reducing uncertainties through better gap-filling techniques are much needed.

Even though EC is a common technique for measuring fluxes of carbon, water, and energy, providing robust, high-quality, continuous observations remains challenging. One of the challenges for the technique, and therefore for the flux networks, is addressing data gaps and the uncertainties associated with the gap-filling process, especially when the gap windows are long (longer than 12 consecutive days, as described by Moffat et al., 2007). These gaps occur quite often for a variety of reasons, such as values out of range, spike detection or manual exclusion of date and time ranges, instrument or power failure, herbivores, fire, eagles' nests, lightning, and/or researchers on leave (Beringer et al., 2017). Since EC flux towers are often located in harsh climates, their data are more susceptible to adverse weather (e.g. rain), which can also prevent quick access to sites for repair and maintenance. This can, in turn, produce gaps which might be relatively long (Isaac et al., 2017) and thus problematic, for two reasons. First, loss of data is a threat to scientific studies, depending on the quantity, pattern, mechanism, and nature of the missing data (Altman and Bland, 2007; Molenberghs et al., 2014; Tannenbaum, 2010), because using an incomplete dataset might lead to biased, invalid, and unreliable results (Allison, 2000; Kang, 2013; Little, 2002). Second, continuous gap-filled data are required to calculate the annual or monthly budgets of carbon and water balance components (Hutley et al., 2005).

Other than the challenges caused by missing data, there are several sources of error and uncertainty in the EC technique. First, random error is associated with the stochastic nature of turbulence and its sampling errors (incomplete sampling of large eddies, uncertainty in the calculated covariance between the vertical wind velocity and the scalar of interest), instrument errors, and footprint variability (Aubinet et al., 2012). For instance, Dragoni et al. (2007) analysed 8 years (1999–2006) of EC-based data from the Morgan–Monroe State Forest and assessed instrument uncertainty as equal to 3 % of the total annual net ecosystem exchange (NEE). A second primary source of uncertainty in EC measurements is systematic error caused by methodological challenges and instrument calibration problems (e.g. sonic anemometer errors, spikes, gas analyser errors). Finally, a further source of uncertainty is data processing, especially gap-filling (Isaac et al., 2017; Moffat et al., 2007; Richardson et al., 2012; Richardson and Hollinger, 2007).

There are several uncertainties pertaining to gap-filling of missing values,
including measurement uncertainty (Richardson and Hollinger, 2007), the lengths and timing of the gaps (Falge et al., 2001; Richardson and Hollinger, 2007), and the particular gap-filling algorithm used (Falge et al., 2001; Moffat et al., 2007). Two issues dominate, however: long data gaps and the choice of a particular gap-filling algorithm (Aubinet et al., 2012). First, long gaps can significantly increase the total uncertainty, as ecosystem behaviour might change because of different agricultural periods or phenological phases (e.g. growing season, harvest period, bushfire) and thereby show different responses under similar meteorological conditions (Aubinet et al., 2012; Isaac et al., 2017; Richardson and Hollinger, 2007). Consequently, the period in which a long gap occurs is important. For example, Richardson and Hollinger (2007), using data from a range of FLUXNET sites, found that a 1-week data gap during spring green-up in a forest led to higher uncertainty than a 3-week gap during winter. Second, each gap-filling algorithm has its strengths and weaknesses; for instance, Moffat et al. (2007) compared 15 commonly used gap-filling algorithms and found no significant difference between the performance of the algorithms rated as having "good" reliability, based on analysis of variance of the root mean square error (RMSE). The overall gap-filling uncertainty was within

Several methods have typically been used to fill data gaps in both fluxes
and their meteorological drivers to manage the missing data problem. Due to
computational constraints of complex algorithms, early works to impute EC data gaps used interpolation methods based mostly on linear regression or
temporal autocorrelation (Falge et al., 2001; Lee et al., 1999). These approaches were quickly replaced by more sophisticated methods such as non-linear regressions (Barr et al., 2004; Falge et al., 2001; Moffat et al., 2007; Richardson et al., 2006), look-up tables (Falge et al., 2001; Law et al., 2002; Zhao and Huang, 2015), artificial neural networks (ANNs) (Aubinet
et al., 1999; Beringer et al., 2016; Cleverly et al., 2013; Hagen et al.,
2006; Isaac et al., 2017; Kunwor et al., 2017; Moffat et al., 2007; Papale
and Valentini, 2003; Pilegaard et al., 2001; Staebler, 1999), mean diurnal
variation (Falge et al., 2001; Moffat et al., 2007; Zhao and Huang, 2015), and multiple imputations (Hui et al., 2004; Moffat et al., 2007). Each of these methods has its pros and cons, as follows. (a) Interpolation methods such as the mean diurnal variation (MDV) do not need any drivers, yet their accuracy is lower than that of other approaches (Aubinet et al., 2012). Moreover, this method may provide biased results on extremely clear or cloudy days (Falge et al., 2001). MDV is not recommended when a gap is longer than 2 weeks because it cannot account for the non-linear relations between the drivers and the flux, leading to a high level of uncertainty (Falge et al., 2001). (b) The look-up table, especially its modified version – marginal distribution sampling (MDS) – has so far provided performance close to that of ANNs and has been more reliable and consistent than the other algorithms. Hence, MDS was chosen as one of the standard gap-filling methods in EUROFLUX (Aubinet et al., 2012). Nevertheless, the performance of MDS in filling extra-long gaps is not well known (Kim et al., 2020). (c) ANNs have commonly been used to gap-fill EC fluxes since 2000, and because of their robust and consistent results they are considered a standard gap-filling algorithm in several networks, e.g. ICOS, FLUXNET, and OzFlux (Aubinet et al., 2012; Beringer et al., 2017; Isaac et al., 2017). Despite their reliable performance, ANNs – and ML algorithms in general – face some challenges. Over-fitting, for instance, is a major concern and can occur when the number of degrees of freedom is high while the training window is not long enough or the quality of the training dataset is low. This challenge becomes acute when gaps occur while the ecosystem behaviour is changing and shows different responses under similar meteorological conditions. Furthermore, there is a desire to keep the training windows short so that the algorithm can track shifts in ecosystem behaviour; yet this increases the risk of over-fitting, depending on the algorithm. In other words, the training window should be neither so short that it causes over-fitting nor so long that the algorithm ignores changes in ecological conditions. Long gaps are considered one of the primary uncertainty sources of
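As an illustration of the interpolation family discussed in (a), a minimal MDV-style sketch is given below. It assumes a regular half-hourly pandas Series, and the ±7 d averaging window is an illustrative choice rather than the exact published scheme:

```python
import numpy as np
import pandas as pd

def mdv_fill(series, window_days=7):
    """Fill gaps with the mean diurnal variation: each missing record is
    replaced by the mean of the same time of day over the surrounding
    +/- window_days days (a simplified sketch, not the published scheme)."""
    filled = series.copy()
    # number of records per day, derived from the series' fixed frequency
    steps_per_day = int(round(pd.Timedelta("1D") / pd.Timedelta(series.index.freq)))
    values = series.to_numpy()
    for i in np.flatnonzero(np.isnan(values)):
        # indices of the same time of day on neighbouring days
        neighbours = [i + d * steps_per_day
                      for d in range(-window_days, window_days + 1)
                      if d != 0 and 0 <= i + d * steps_per_day < len(values)]
        candidates = values[neighbours]
        if not np.all(np.isnan(candidates)):
            filled.iloc[i] = np.nanmean(candidates)
    return filled
```

Because the fill value is an average over neighbouring days at the same time of day, the method needs no meteorological drivers, which is exactly the property (and the limitation) noted above.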

Apart from the limitations and disadvantages of the algorithms mentioned above, gap-filling of fluxes (e.g. NEE) faces some other challenges that make it necessary to find or develop new gap-filling algorithms. The current methods are not flexible enough to perform well in unusual situations or with extreme values (Kunwor et al., 2017), and there is little room to optimise them further (Moffat et al., 2007). Moreover, even with the best available algorithms, such as ANNs, the model (gap-filling) uncertainty still accounts for a sizable proportion of the total uncertainty, especially when the gaps are relatively long. Since the 2000s, when MDS and ANNs were chosen as the most reliable gap-filling methods for EC flux observations, many new ML and optimisation algorithms have been developed and applied in various scientific fields; some have shown superiority over ANNs, either individually or as part of a hybrid or ensemble model (e.g. Gani et al., 2016). As a result, comparing cutting-edge algorithms with the current standard ones can show whether there is any room to improve the gap-filling process within the field. In light of these concerns, this paper has two objectives: (a) to determine the impact of different gap lengths on the performance of each algorithm and (b) to compare the performance of traditional and new gap-filling techniques, separately for fluxes and for their meteorological drivers – particularly soil moisture, which has always been a challenging variable to gap-fill due to the biology and heterogeneity of soil parameters.
To address these objectives, we utilised nine different algorithms – eXtreme Gradient Boost (XGB), random forest (RF) algorithm, artificial neural networks (ANNs), marginal distribution sampling (MDS), classic linear regression (CLR), support vector regression (SVR), elastic net regularisation (ELN), panel data (PD), and the Prophet Forecast Model (FBP) – to fill the gaps of the major fluxes and eight of them (excluding MDS) to fill the gaps of the environmental drivers. We then assessed their relative performance to evaluate potentially better ways to fill EC flux data. To test the approaches, we used five flux towers from the OzFlux network. To evaluate the performance of these algorithms, nine scenarios for gaps were planned – from a day to a whole year – and applied to the datasets, and different common performance metrics (e.g. RMSE, MBE) and visual graphs were used.

In order to address the first objective of this research, nine different gap lengths were superimposed on the datasets, i.e. 1, 5, 10, 20, 30, 60, 90, 180, and 365 d. To address the second objective, we chose nine different algorithms to fill the gaps, covering a wide variety of approaches, from a simple algorithm like CLR to cutting-edge ML algorithms like XGB (MDS was not used to gap-fill the environmental drivers). The data used in this paper came from five EC towers of the OzFlux network, i.e. Alice Springs Mulga, Calperum, Gingin, Howard Springs, and Tumbarumba, from 2012 to 2013, with a time resolution of 30 min, except for Tumbarumba (60 min). Additionally, data from three sources outside the network were used as ancillary data to help the algorithms fill environmental driver gaps.
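The gap-superimposition step can be sketched as follows. The window lengths come from the text, while the uniform random start position and the half-hourly resolution are our assumptions:

```python
import numpy as np
import pandas as pd

# Gap-window lengths (days) taken from the text
GAP_LENGTHS_DAYS = [1, 5, 10, 20, 30, 60, 90, 180, 365]

def superimpose_gap(df, column, gap_days, year=2013, seed=0):
    """Return a copy of df with `gap_days` consecutive days of `column`
    set to NaN at a random position inside `year` (half-hourly data)."""
    rng = np.random.default_rng(seed)
    year_index = df.index[df.index.year == year]
    n_steps = gap_days * 48                      # 48 half-hours per day
    last_possible = len(year_index) - n_steps
    start = int(rng.integers(0, max(last_possible, 1)))
    gap_index = year_index[start:start + n_steps]
    gapped = df.copy()
    gapped.loc[gap_index, column] = np.nan
    return gapped
```

Re-running this with different seeds mimics the five permutations per scenario described below, so that performance is not tied to one seasonal position of the gap.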

The data used for this research came from OzFlux, which is the regional
Australian and New Zealand flux tower network that aims to provide a
continental-scale national research facility to monitor and assess
Australia's terrestrial biosphere and climate (Beringer et al., 2016). As described in Isaac et al. (2017), all OzFlux towers continuously measure and record meteorological and flux variables at resolutions up to 10 Hz and use a 30 min averaging period, with a few exceptions (data are available from

The datasets used in this research came from five towers of the OzFlux network between 2012 and 2013, each representative of a different climate and land cover within Australian ecological conditions (Alice Springs Mulga: tropical and subtropical desert, Calperum: steppe, Gingin: Mediterranean, Howard Springs: tropical savanna, Tumbarumba: oceanic; Table 1 and Beringer et al., 2016). The datasets included 15 meteorological drivers and three major fluxes (Table 2) recorded using the EC technique at a 30 min temporal resolution, except for Tumbarumba, which was hourly. Additionally, relevant ancillary datasets for these towers were used to follow the OzFlux network gap-filling protocol (Table 3). Each dataset was quality checked at three levels based on the OzFlux network protocol described in Isaac et al. (2017) and applied using PyFluxPro version 0.9.2. To address the underestimation of canopy respiration by EC measurements at night, we used the change-point detection (CPD) method (Barr et al., 2013) to reject night-time records when the friction velocity fell below each site's threshold value. After discarding the inappropriate measurements, overall coverage of 72 %–88 % and 21 %–48 % was achieved for diurnal and nocturnal records, respectively, during 2013 (the year on which the artificial gaps were superimposed).

Information on the five towers from which data were used, including their name, location, dominant species, and climate.

List of variables and their units used in this research, including the three main fluxes and their environmental drivers.

The ancillary datasets used to gap-fill each environmental variable are shown in Table 3. For each of these variables, the same variable from the ancillary source was used to fill the gaps. For instance, to gap-fill Ah, the Ah records of AWS, ACCESS-R, and BIOS2 were used. To gap-fill the missing values of fluxes, i.e.

The ancillary sources used to gap-fill each environmental driver.

Eight imputation algorithms for estimating 15 environmental drivers and nine algorithms for the three major fluxes were chosen to make the comparison. These algorithms were selected in such a way that a variety of approaches were tested, from the standard methods like ANNs and MDS to the newer algorithms which have rarely or never been used in the field, such as eXtreme Gradient Boosting and panel data (Table 4).

The name and the abbreviation of the gap-filling algorithms.

Reichstein et al. (2005) introduced the MDS as an enhanced look-up table method that considers both the covariation of fluxes with meteorological variables and the temporal autocorrelation of the fluxes (Aubinet et al., 2012). Alongside the ANNs, the MDS is considered one of the standard gap-filling methods for flux data within FLUXNET and was selected in this study to give the community a clear picture of the other algorithms' performance. Unlike the other algorithms used in this research, we used

Rooted in the 1950s, artificial neural networks are ML methods inspired by biological neural networks and are classified as supervised learning methods (Dreyfus, 1990; Farley and Clark, 1954). ANNs work with several connected units called nodes, which mimic the functionality of neurons in an animal brain by sending and receiving signals to other nodes. The ANN technique used in this paper was the multi-layer perceptron regressor, which optimises the squared loss using stochastic gradient descent. sklearn.neural_network.MLPRegressor was used to apply this method in Python, and its hyperparameters were 800 and 500 for "hidden_layer_sizes" and "max_iter", respectively, based on a grid search. ANNs are one of the current standard approaches for gap-filling in FLUXNET and were used in this research as a performance reference for the other algorithms.
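Based on the reported hyperparameters, the ANN setup can be sketched as follows. The synthetic data and the scaling step are our assumptions; only hidden_layer_sizes and max_iter come from the text:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (drivers, flux) training data
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))                    # 6 hypothetical drivers
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=500)

# hidden_layer_sizes=(800,) and max_iter=500 follow the grid search in the text
ann = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(800,), max_iter=500, random_state=0),
)
ann.fit(X, y)
pred = ann.predict(X)
```

In a gap-filling run, X would be the driver records inside the gap window and the fitted model would supply the missing flux values.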

A classical linear regression is an equation developed to estimate the value
of the dependent variable (

Random forest, a supervised ML algorithm used for both classification and regression, consists of multiple trees constructed systematically by pseudo-randomly selecting subsets of components of the feature vector, i.e. trees constructed in randomly chosen subspaces (Ho, 1998). The RF algorithm was developed to overcome the over-fitting problem, a common limitation of the decision-tree-based methods that preceded it (Ho, 1995, 1998). sklearn.ensemble.RandomForestRegressor was used to apply this method in Python, and the hyperparameters used were 5 and 1000 for "max_depth" and "n_estimators", respectively, based on a grid search.
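The RF setup can be sketched in the same way; the synthetic data are our assumption, while max_depth and n_estimators follow the grid search reported in the text:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))                    # hypothetical driver matrix
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)

# max_depth=5 and n_estimators=1000 follow the grid search in the text
rf = RandomForestRegressor(max_depth=5, n_estimators=1000, random_state=0)
rf.fit(X, y)
pred_rf = rf.predict(X)
```

Each of the 1000 trees is grown on a bootstrap sample and a random feature subset, and the prediction is the average over trees, which is what gives RF its resistance to over-fitting.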

As a non-linear method, support vector regression was developed based on Vapnik's concept of support vector theory (Drucker et al., 1997). An SVR algorithm is trained by trying to solve the following problem:
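In practice, the SVR can be applied via scikit-learn; the RBF kernel and the C and epsilon values below are illustrative assumptions, since the study's exact settings are not reproduced in this excerpt:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 4))            # hypothetical drivers
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=300)    # non-linear target

# Scaling matters for SVR; kernel/C/epsilon here are illustrative choices
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
svr.fit(X, y)
pred_svr = svr.predict(X)
```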

The elastic net is a linear regularised regression method that introduces a small amount of bias by adding two penalty components to the regression, shrinking the coefficients of the independent variables and thus providing better long-term predictions. Given that these two penalty components come from ridge regression and LASSO, the elastic net can be considered a hybrid of ridge and LASSO regression, thereby overcoming the limitations of both.
The estimates from the ELN method can be formulated as below (Zou and Hastie, 2005):
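A practical sketch with scikit-learn's ElasticNet is given below; the synthetic data and the alpha and l1_ratio values are illustrative choices, not the paper's settings:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# ElasticNet minimises  1/(2n) * ||y - Xb||^2
#   + alpha * (l1_ratio * ||b||_1 + 0.5 * (1 - l1_ratio) * ||b||_2^2),
# i.e. the LASSO (L1) and ridge (L2) penalties combined.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
true_coef = np.array([1.5, -2.0, 0, 0, 0, 0, 0.5, 0])   # sparse truth
y = X @ true_coef + 0.05 * rng.normal(size=200)

eln = ElasticNet(alpha=0.1, l1_ratio=0.5)
eln.fit(X, y)
```

The L1 part drives uninformative coefficients toward zero while the L2 part stabilises correlated drivers, which is the hybrid behaviour described above.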

The panel data method is a multidimensional statistical method mainly used in econometrics to analyse datasets which involve time series of observations
amongst individual cross sections (Baltagi, 1995), usually based on ordinary least squares (OLS) or generalised least squares (GLS). A two-way panel data model consists of two extra components beyond a CLR as follows (Baltagi, 1995; Hsiao et al., 2002; Wooldridge, 2002):
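In standard notation (following Baltagi, 1995), such a two-way specification can be written as

```latex
y_{it} = \alpha + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + \mu_i + \lambda_t + \nu_{it},
```

where, for cross section (here, tower) $i$ at time $t$, $\mu_i$ is the individual effect and $\lambda_t$ is the time effect (the two components beyond a CLR), and $\nu_{it}$ is the idiosyncratic error.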

The eXtreme Gradient Boost algorithm is an enhanced form of gradient boosting, introduced in 1999, that works with parallel boosted decision trees. Similar to RF, it can be used for a variety of data processing purposes, including classification and regression (Friedman, 2001, 2002; Ye et al., 2009). The XGB method is resistant to over-fitting and provides a robust, portable, and scalable algorithm for large-scale boosted decision-tree techniques. sklearn.ensemble.GradientBoostingRegressor was used to apply this method in Python, and its hyperparameters were chosen based on a grid search as follows:
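A sketch of this setup is shown below; since the tuned values are not reproduced in this excerpt, the parameter grid is a placeholder, and the data are synthetic:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=300)

# Placeholder grid: the paper's tuned values are not shown in this excerpt
grid = {"n_estimators": [100, 300], "max_depth": [3, 5]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0), grid, cv=3)
search.fit(X, y)
best = search.best_estimator_
```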

The Prophet Forecasting Model, also known as “Prophet”, is a time series
forecasting model developed by Facebook to manage the common features of
business time series. It is designed to have intuitive parameters that can be
adjusted without knowledge of the underlying model's details (Taylor and Letham, 2018). The model builds on a decomposable time series model (Harvey and Peters, 1990) with three main components: trend, seasonality, and holidays (Taylor and Letham, 2018):
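In the notation of Taylor and Letham (2018), the decomposition reads

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t,
```

where $g(t)$ is the trend, $s(t)$ the seasonal component, $h(t)$ the holiday (here, irregular-event) effect, and $\varepsilon_t$ the error term.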

In order to determine the effect of gap size on the performance of our gap-filling algorithms, data were removed at random over nine different gap windows (i.e. 1, 5, 10, 20, 30, 60, 90, 180, and 365 consecutive days) during 2013. The data from 2012 to 2013 (excluding the superimposed gaps) were then used to train the algorithms. Finally, the trained algorithms were used to fill the artificial gaps superimposed on the datasets. The entire process was repeated five times in each scenario to ensure that performance was not sensitive to the position of the gap (i.e. seasonally). As such, 15 variables, nine window lengths, eight gap-filling methods (MDS excluded), and five permutations across five towers resulted in 27 000 computations for the meteorological features. Similarly, three fluxes, nine window lengths, nine gap-filling methods, and five permutations across five towers resulted in 6075 computations overall for the major fluxes.
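The run counts follow directly from the factorial design; as a quick sanity check:

```python
# Meteorological drivers: variables x windows x methods x permutations x towers
met_runs = 15 * 9 * 8 * 5 * 5
assert met_runs == 27_000

# Major fluxes: fluxes x windows x methods x permutations x towers
flux_runs = 3 * 9 * 9 * 5 * 5
assert flux_runs == 6_075
```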

Different statistical metrics were used to evaluate algorithms' performance
and enable comparison between measured values from the flux towers with each
gap-filling algorithm prediction. These metrics included the coefficient of
determination (
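Assuming the standard definitions of these metrics, the RMSE and MBE used throughout can be computed as follows (a minimal sketch; the function names are ours, and the sign convention for MBE is chosen so that negative values indicate overestimation, matching the discussion later in the text):

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def mbe(observed, predicted):
    """Mean bias error: negative when the algorithm overestimates."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    return float(np.mean(observed - predicted))
```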

Even though factors such as ground heat (

The average performance metrics for each gap-filling
algorithm regarding

A heat map of mean RMSE values of

These outcomes were expected for the XGB as it uses a more regularised model
formalisation to control over-fitting (Chen and Guestrin, 2016), which, on paper, leads to better performance against its ML rivals. The relatively poor performance of FBP was also foreseen because, unlike other algorithms, FBP did not use any feature to estimate flux values other than the previous time series of flux values. However, the weaker performance of the ELN compared to CLR was unforeseen as by adding two penalty components to the regression line, the ELN is supposed to improve the long-term prediction compared to the traditional linear regression methods. Tukey's HSD (honestly significant difference) test at the level of 0.05 was applied to the results to determine whether the difference amongst the algorithms was significant (Table 5). When the null hypothesis is confirmed there is no significant difference between the mean values of the RMSE. According to the results, there were significant differences between certain algorithms, and the XGB, RF, and ANNs were different from the rest, showing that these three performed considerably better. Tukey's HSD test, however, did not reject the second error probability between RF, XGB, and ANNs, meaning that the three algorithms were not significantly different from each other. This result agrees with the results of Falge et al. (2001) and Moffat et al. (2007) in the sense that ANNs are one of the best available gap-filling algorithms, and there is no significant difference amongst the appropriate algorithms. However, the test showed that the performance of the MDS was significantly different from the ANNs. It seems that the difference has occurred because of the longer gaps (

To address this paper's first objective, which was to find the sensitivity of the gap-filling algorithms to the gap window length, we used
the averaged RMSE,

The average RMSE,

According to the MBE values (Table 5), the algorithms generally had negative MBEs, indicating an overestimation of the

Measured vs. estimated values of

Observations from the EC technique often include extremely low or high values after quality control (QC), especially at night when some of the theoretical assumptions might be violated. One of the practical challenges associated with the EC technique is that it is often difficult to distinguish between the good data and the noise (Aubinet et al., 2012; Burba and Anderson, 2010). This problem seems to affect the outcomes of the gap-filling algorithms in this research, as none of them performed ideally in capturing the observed variance (Table 5). Even though RMSE,

The linear algorithms, CLR, PD, and ELN, performed worse concerning the VR compared to the ML algorithms, with the VR of

The performance of algorithms for

The average metrics for

Measured vs. estimated values of

As with the other flux results, the metrics of RMSE,

The average metrics for

Measured vs. estimated values of

Since meteorological and environmental drivers are needed to fill the gaps
of the three turbulent fluxes (

The average RMSE for

Nine gap-filling algorithms were used in this study: eXtreme Gradient Boost
(XGB), random forest (RF) algorithm, artificial neural networks (ANNs),
marginal distribution sampling (MDS), support vector regression (SVR),
classical linear regression (CLR), panel data (PD), elastic net
regularisation (ELN), and the Prophet Forecasting Model (FBP). All algorithms performed similarly in estimating the meteorological and
environmental drivers (turbulent fluxes included) across all stations, except the FBP, which performed poorly because it did not use any ancillary data. The best results were achieved for gaps of 30 d and shorter, while the worst were obtained for the longest windows of 180 and 365 d. Although most of the algorithms performed almost equally well in estimating the meteorological and environmental drivers, the linear algorithms (CLR, ELN, and PD) performed slightly better, though not significantly so according to Tukey's HSD test. The only apparent exception was

The XGB was the newest ML algorithm used in this research, and based on most performance metrics it provided comparatively robust estimates of the fluxes. In estimating the meteorological drivers, though, the XGB did not show any superiority over the other algorithms, especially the linear ones. Moreover, the XGB took 4 to 6 times longer to train and tune, making it less feasible when time and processing power matter or when several years of data need to be gap-filled. Hence, we do not recommend the XGB as an alternative to the current standard algorithms. Nevertheless, because of its strengths in particular situations, this algorithm might be suitable for use in an ensemble model alongside algorithms with different weaknesses.

The RF was the best all-round algorithm amongst the nine used in this study, providing the most consistent and robust estimates of the fluxes (similar to XGB). It is also less complicated and runs faster than the XGB. The RF also provided the best results for

The ANNs estimated the fluxes better than the linear algorithms, most notably for

The MDS performed similarly to, yet not quite as well as, the XGB, RF, and ANNs in gap-filling the fluxes. Its performance was close to that of the SVR but was more reliable for

The SVR showed consistent inferiority to the other ML algorithms and did not fulfil our expectations for either the meteorological drivers or the major fluxes. The only strength of the SVR was that it captured extreme values better than any other algorithm. However, given its larger RMSE, this apparent advantage is suspect and might have arisen from over-fitting. This dubious performance suggests that the SVR is more vulnerable to over-fitting for these data types. Hence, we suggest that the SVR not be used in environmental modelling related to the reviewed drivers and fluxes.

The CLR, the simplest algorithm used in this research, provided a comparatively acceptable performance in estimating the meteorological
drivers, except for

The PD performed slightly better than the CLR, yet it did not show significant superiority over the other linear algorithms used in this research. This unexpectedly weak performance can be explained by two factors. First, one of the assumptions of the PD is that the cross sections (here, towers) behave similarly under similar conditions (the independent variables), with differences arising only from the specific characteristics of each individual cross section. The five towers selected in this research appear to have violated this assumption because they lie in widely different ecosystems. Based on previous studies in which the PD performed well (Izady et al., 2013, 2016; Mahabbati et al., 2017), a decent level of homogeneity appears vital for the PD to perform satisfactorily: in all those cases, the cross-sectional ecosystems had significant similarities and the distances between them were smaller, so characteristics such as radiation, climate, and rainfall were considerably more homogeneous than for the towers used in this research. Second, the PD has commonly been used to analyse time series with weekly or longer resolution, with some exceptions using daily time steps. In this research, the data resolution was half-hourly instead, which dramatically increased the computational demands of the algorithm and led to days of processing for a single run. This happens because the algorithm creates a dummy variable for each time step, so the design matrix becomes too large to compute on a regular PC. Considering the computational expense of this algorithm, we recommend that other researchers not use the PD when the time resolution is finer than daily.
Despite this limitation, we still encourage further use of the PD whenever there is a decent level of homogeneity amongst the cross sections and the time resolution is daily or coarser.

As a hybrid linear model, the ELN did not show any superiority over the CLR, despite modifications intended to provide more accurate estimates. However, the ELN performed well in estimating the drivers, with slight superiority on some occasions (e.g. for

The FBP was unique amongst the algorithms used in this research in that it did not use any independent variables to estimate the values of drivers and fluxes. Its performance was the least satisfactory of all the algorithms; therefore, the FBP cannot be considered a reliable alternative to the current gap-filling algorithms, especially for longer gaps.

Given that some of the environmental drivers that affect

Finally, it is noteworthy that some of the flux drivers used in this study as input features for the gap-filling algorithms are not commonly used or might not be globally available. However, considering that similar relative performance has been achieved in other research using different sets of input features (Kim et al., 2020), the relative performance of the algorithms reviewed here should generally hold for different input features.

Eight different gap-filling algorithms for estimating 16 meteorological
drivers and nine algorithms for the three key ecosystem turbulent
fluxes (sensible heat flux –

Since the RF was more consistent than its competitors, including the ANNs, we suggest using it alongside the commonly used algorithms in challenging scenarios, such as long gaps, to determine whether this superiority can be generalised.

It appears that even after three levels of quality control by the flux processing software (e.g. PyFluxPro), the data are still quite noisy. This noise is an important source of both uncertainty and inaccuracy in the outcome, regardless of the algorithm used to gap-fill the data. As a result, an additional level of quality control, using methods such as wavelets or matrix factorisation in addition to the classical checks currently applied by PyFluxPro and similar platforms, could probably improve the data quality and thereby the final imputation results.

For future research, using recurrent neural networks (RNNs) instead of feed-forward neural networks (FFNNs) could improve the estimates, as RNNs allow the model to capture the temporal dynamics of the time series. Unlike FFNNs, wherein activations flow only from the input layer to the output layer, RNNs also have neuron connections pointing backwards (Géron, 2019). The need for an algorithm capable of accounting for time has been mentioned in previous research as one of the reasons for testing new algorithms (Richardson and Hollinger, 2007).

Developing ensemble models using algorithms with different weaknesses and strengths may also enhance the results when a single algorithm shows performance deficiency.

All data used in this research are available at this repository address:

The supplement related to this article is available online at:

The ideas for this study originated in discussions with AM, JB, and ML. AM carried out the analysis, supported by IM and PI. The paper was prepared with contributions from all authors.

The authors declare that they have no conflict of interest.

The authors would like to acknowledge the Terrestrial Ecosystems Research
Network (TERN) (

This paper was edited by Jean Dumoulin and reviewed by Thomas Wutzler and one anonymous referee.