Examination of patterns and trends in streamflow variables used in stream ecology across the US

1. Abstract

Hydrology is one of the primary factors influencing the physical and biological characteristics of streams. Intra- and inter-variation in hydrologic regime alters the composition, structure and function of aquatic ecosystems through its impact on physical habitat characteristics. To better understand the relationship between stream biota and streamflow variables, I selected 2419 minimally impacted gaging sites across the US and used ArcGIS to investigate spatial patterns and trends in 17 streamflow variables important to stream ecology.

The spatial analysis showed a higher Base Flow Index (BFI) in the East North Central region (around Wisconsin and Michigan), a high Coefficient of Variation of Daily Flows (DAYCV) in the West Central US, higher predictability (P) in the western US, and relatively longer flood duration on the South Atlantic coast (around Florida) and in the western US (Upper Mountain region).

The trend analysis showed that mean daily discharge (Qmean) has a significant increasing trend in the Central Northern US, and that the 50% timing of flow (T50) also shows an increasing trend.

2. Introduction

Hydrologic regime plays a major role in determining the biotic composition, structure and function of aquatic ecosystems (Richter, 1996). Streamflow, which is correlated with many critical physical characteristics of rivers, can be considered a “master variable” that limits the distribution and abundance of aquatic species (Power et al. 1995; Resh et al. 1988) and regulates the ecological integrity of flowing water systems (Poff et al. 1997). Because geology, climate, topography, land cover and land use differ across the country, the streamflow characteristics that matter for stream ecology, and the trends they follow, can also be expected to differ. To understand how these characteristics are spatially distributed across the country and what trends they follow, I took up this GIS term project. I expect that examining these patterns will help us understand not only the relationship of these variables with stream biota but also the impacts of changes in streamflow regime on them. This project considered 2 questions:

a) Do the streamflow variables considered show a spatial pattern across the US? If so, which ones?

b) Do these variables show trends over time? Do these trends follow a spatial pattern too?

3. Method

The streamflow variables were selected from the stream ecology literature. Stations were selected from the GAGES (Geospatial Attributes of Gages for Evaluating Streamflow; Falcone et al. 2010) dataset, and the station data were organized using ArcGIS. The streamflow data for these stations were downloaded from the USGS (US Geological Survey) website using R, and the variables were calculated in R. Variables were not calculated for stations with more than 10% missing data values. Finally, the values of each variable were plotted using ArcMap, with symbology chosen to make any patterns clearer.

3.1 Selection of stations

GAGES (Geospatial Attributes of Gages for Evaluating Streamflow; Falcone et al. 2010) provides data for 6785 USGS stream gages and their upstream watersheds. These sites have at least 20 years of complete-year flow record. The dataset also provides a quantitative measure of how disturbed each stream is, expressed as a Hydro Disturbance Index. Based on this measure and on visual inspection of the gages, the dataset identifies 1512 reference-quality stream gages across the country. This was the first subset of stations chosen for analysis and calculation of streamflow variables.

HCDN (Hydro-Climatic Data Network) [Slack and Landwehr, 1992] is a dataset developed for studying variation in surface water conditions throughout the United States. A large part of the HCDN dataset (1457 gaging stations) is also included in GAGES, and these stations were added to the set of stations used.

The HCDN stations and the reference-quality stations overlap by 550 stations, so the combined set contains 1512 + 1457 - 550 = 2419 stations.

Fig 1: Location of the 2419 stations used for analysis

I used R to download the streamflow data required for the analysis from the United States Geological Survey (USGS) website.

3.2 Selection of Streamflow variables and their definitions

There are 5 critical components of the streamflow regime that are important to stream biota: flow magnitude, duration, frequency, timing, and rate of change [Poff, 1996]. Based on these components, Poff [1996] focused on flow variables that represent variability and predictability. Other ecologists have suggested additional variables for the streamflow regime [Richter et al. 1996, Puckridge et al. 1998]. From these and other sources, the streamflow variables deemed important for stream ecosystems were chosen. These variables are given below:

1) Baseflow Index (BFI)

2) Coefficient of variation of Daily flows (DAYCV)

3) Mean daily discharge (Qmean)

4) Mean number of zero flow days per year (ZERODAYS)

5) Daily flow with a 1.67 year recurrence interval (Q1.67)

6) Colwell's index of Predictability (P)

7) Colwell's index of Constancy (C)

8) Colwell's index of Contingency (M)

9) Average 7 day minimum streamflow (7Qmin)

10) Average 7 day maximum streamflow (7Qmax)

11) Flow Reversals per year (R)

12) Flood Duration (FLDDUR)

13) Average 25% timing of flow (T25)

14) Average 50% timing of flow (T50)

15) Average 75% timing of flow (T75)

16) First Harmonic of flow (H1)

17) Time of Peak (Tp)

Explanations of these variables and their calculation methods are provided in Appendix B.

To calculate the trend in mean daily discharge, the mean daily discharge of each year is divided by the station's average discharge over the whole record, and this ratio is regressed against year. This removes the scale from the discharge, so the slope is expressed as a fraction of the station average per year rather than in discharge units, and it is used only to see whether the trend is increasing or decreasing.
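The normalization and regression described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R code used for the project; the function name and the toy discharge values are hypothetical, and the significance test behind the reported p-values is not shown.

```python
def normalized_trend_slope(years, annual_mean_q):
    """Least-squares slope of (annual mean discharge / station mean) against year.

    Hypothetical helper: the result is a fraction of the station mean per year.
    """
    station_mean = sum(annual_mean_q) / len(annual_mean_q)
    ratio = [q / station_mean for q in annual_mean_q]  # scale-free discharge

    year_bar = sum(years) / len(years)
    ratio_bar = sum(ratio) / len(ratio)

    # Ordinary least-squares slope: Sxy / Sxx
    sxx = sum((y - year_bar) ** 2 for y in years)
    sxy = sum((y - year_bar) * (r - ratio_bar) for y, r in zip(years, ratio))
    return sxy / sxx


# Toy record: discharge rising by 1 unit per year around a mean of 10
years = [2000, 2001, 2002, 2003, 2004]
small_stream = [8.0, 9.0, 10.0, 11.0, 12.0]
large_stream = [q * 1000.0 for q in small_stream]

print(normalized_trend_slope(years, small_stream))
print(normalized_trend_slope(years, large_stream))
```

Both toy streams give the same slope, which is the point of dividing by the station average: stations of very different size can share one map legend.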

3.3 Calculation and Plotting of the variables:

The variables were calculated using R and then plotted using ArcGIS. Each variable is symbolized with different colors and sizes where necessary. To interpret the spatial patterns in these maps, I have used the regions of the US shown in Appendix C.

4. Results:

Some variables exhibited clear spatial patterns in various regions of the country. Plots of the other variables are harder to interpret and need further examination. Only the maps that exhibited spatial patterns are discussed here; the remaining maps are included in Appendix A.

The values in parentheses (in the legend) give the number of stations in a particular range.

Base Flow Index:

Fig 2: Base Flow Index

Fig 2 gives the map of Base Flow Index across the US. The BFI values are high in the East North Central region (Wisconsin and Michigan); these 2 states have 21 stations with a BFI greater than 45. The BFI values are also high in the mid-western areas of the US (north of Utah, east of Idaho and western Oregon). Another pattern can be seen in the south-eastern US (around western North Carolina), where the BFI is higher than in the surrounding areas. A higher BFI indicates that flows are relatively constant throughout the year.

Coefficient of Variation of Daily Flows

Fig 3 gives the coefficient of variation of daily flows. It shows higher values in the West Central US (Texas and areas to the north of Texas) and the South West region (California). The coefficient of variation is high where the number of zero flow days is high (Fig 4). There were 71 stations with a coefficient of variation greater than 6 and more than 60 zero flow days. Of these 71 stations, 18 were in Texas and 15 were in California.

Fig 3: Coefficient of Variation of Daily Flows

Fig 4: Zero Flow Days

Predictability

Fig 5 gives the predictability of flow. Predictability is higher in the western part of the country than in the eastern part. The mid-mountain region (around Colorado, Wyoming and Utah) shows high predictability: 83 sites in this region have a predictability of more than 0.6. The West South Central region (around Texas) and the west coast (around California) also have high predictability.

Comparing this map with the map of coefficient of variation of daily flows, regions with a high coefficient of variation coincide with regions of high predictability, except in the mid-mountain region, where predictability is high even though the variation is low.

Fig 5: Predictability

Flood Duration:

Fig 6 gives the map of flood duration. Flood duration is clearly higher in the lower mid-mountain region (around Utah and Colorado), where 39 stations have a flood duration of more than 5 days. The South Atlantic region (around Florida) also shows longer flood duration: 14 of the 47 stations in Florida have a flood duration of more than 5 days.

Fig 6: Flood Duration

Trend in Mean Daily Discharge:

Fig 7 gives the spatial distribution of the trend in mean daily discharge. This map and Table 2 indicate a significant increasing trend in the East North Central US (around Iowa). There also seems to be a decreasing trend in the lower West North Central region (around western Kansas). Table 2 gives the number of stations that showed a significant change in mean daily discharge.

Highly significant stations

Slope	Number of stations	Notes
< -0.03	6	5 of them are around Western Kansas
-0.03 to -0.005	22	
-0.005 to 0.005	17	
0.005 to 0.03	80	More than 55 of them are in the East North Central part of the US (Iowa and northern parts)
> 0.03	14	10 of them are in the eastern parts of the West North Central part of the US (South Dakota and North Dakota)

Significant stations

Slope	Number of stations	Notes
< -0.03	9	5 of them are around Western Kansas
-0.03 to -0.005	110	
-0.005 to 0.005	187	
0.005 to 0.03	295	More than 50 of them are in the East North Central part of the US (Iowa and northern parts); more than 100 are in the eastern part of the US from Vermont to the northern part of Virginia
> 0.03	42	8 are in the East North Central part of the US (Iowa and northern parts); 14 are in the Upper South Atlantic part of the US (around New Jersey)

Table 2: Number of stations that showed significant change in mean daily discharge

Fig 7: Trend in Mean Daily Discharge

Trend in 50% timing of flow (T50):

The trend in 50% timing of flow is increasing across the entire country. The trend is increasing at 32 highly significant stations (p-value less than 0.001) and 1055 significant stations (p-values between 0.001 and 0.1). Similar patterns were seen for the 25% and 75% timing of flow.

Fig 8: Trend in 50% timing of flow

5. Conclusion:

These maps indicate that some of the variables important to stream ecology have clear spatial variation across the US. Based on these patterns, streamflow in the western US shows more variation than in the eastern regions. The trend analysis shows that mean daily discharge is increasing in the North Central region as well as the Middle Atlantic region. Because T25, T50 and T75 are measured in days from the start of the water year (Oct 1), their increasing trends indicate that these flow timings are occurring later in the water year.

6. Discussion

This analysis used 2419 stations across the US. Stations are more densely distributed in the eastern region than in the western region. The southern mountain region (Nevada, Arizona and New Mexico) has few, sparsely distributed gaging stations (only 111 of 2419), so trends in this region may be less clear. For the streamflow variables, the break values used in the map legends play an important role in bringing out patterns and trends. They were chosen either following previous work or so as to make the patterns clearer, and this choice should be investigated further so that the patterns are not misinterpreted. Trends were calculated by linear regression, but the relationship between the response and the predictor was not linear, so the trends could also be calculated using non-linear methods.

APPENDICES

Appendix A

Maps of other streamflow variables are presented here. They need further examination.

Fig A.1: Map for 25% timing of flow

Fig A.2: Map for 50% timing of flow

Fig A.3: Map for 75% timing of flow

Fig A.4: Map for time of peak in first harmonic

Fig A.5: Map for Time of Peak

Fig A.6: Map for Constancy

Fig A.7: Map for Contingency

Appendix B

(1) Base Flow Index (BFI)

It is the average across all years of the ratio of the lowest daily flow to the annual average flow, expressed as a percentage. It represents the stability of flow (Poff, 1996). Values near 100% indicate a fairly constant flow and values near 0% indicate an intermittent stream.

(2) Coefficient of Variation of Daily Flows (DAYCV)

It is the ratio of the standard deviation of daily flows to the average of daily flows. It represents the overall variability of the streamflow regime [Poff, 1996].

(3) Mean daily discharge (QMEAN)

It is the mean daily discharge over all the years of record and represents the magnitude of flow.
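The first three variables can be sketched as below. This is a minimal Python illustration, not the R code used for the project. It assumes a hypothetical `flows_by_year` mapping from year to a list of daily flows, takes BFI as the mean of the yearly ratios, and uses the population standard deviation for DAYCV; the text does not say which standard deviation was used.

```python
from statistics import fmean, pstdev


def base_flow_index(flows_by_year):
    """BFI: mean over years of (lowest daily flow / annual mean flow), in percent."""
    ratios = [min(daily) / fmean(daily) for daily in flows_by_year.values()]
    return 100.0 * fmean(ratios)


def day_cv(flows_by_year):
    """DAYCV: standard deviation of all daily flows divided by their mean."""
    daily = [q for year in flows_by_year.values() for q in year]
    return pstdev(daily) / fmean(daily)  # population SD is an assumption


def q_mean(flows_by_year):
    """QMEAN: mean daily discharge over all years of record."""
    daily = [q for year in flows_by_year.values() for q in year]
    return fmean(daily)


# Toy record with three "days" per year, for illustration only
flows = {2001: [2.0, 4.0, 6.0], 2002: [5.0, 5.0, 5.0]}
print(base_flow_index(flows), day_cv(flows), q_mean(flows))
```

In the toy record, the perfectly steady year contributes a ratio of 1 and the variable year a ratio of 0.5, which shows how BFI rewards constant flow.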

(4) Colwell’s Index of Predictability (P)

(5) Colwell’s Index of Constancy (C)

(6) Colwell’s Index of Contingency (M)

Predictability and Constancy are measures of how predictable and constant the daily flows are over the years (Colwell, 1974). Both variables take values from 0 to 1. To better explain Colwell’s indices, frequency tables for 6 arbitrary streams, with 4 states and 4 periods (or seasons), are considered in Table 1. A stream whose discharge never varies is perfectly predictable (Table 1 Stream A). A fluctuating stream can also be completely predictable if its streamflow changes with full certainty on a periodic basis (Table 1 Stream B). This gives the idea of contingency: if the streamflow changes with full certainty after a time period, it is said to have a contingency of 1. As predictability is the sum of constancy and contingency, predictability can also reach 1 through a combination of the two (Table 1 Stream C). When all states are equally likely to occur in any period, both the contingency and the constancy are 0, which makes the predictability 0 (Table 1 Stream F). If knowing the season (period) gives no more information about the flow, the contingency is 0 (Table 1 Stream D). When the states of the streamflow fluctuate to the greatest degree possible during each year, the constancy is 0 (Table 1 Stream E).

Table 1: Seasonal Predictability, Constancy and Contingency of 6 arbitrary streams. Columns in each matrix represent tri-monthly periods (I = Oct – Dec, II = Jan – Mar, III = Apr – Jun, IV = Jul – Sep) and rows represent the states (µ is the average monthly discharge). Frequencies of flow states for a 96-month (8-year) period are tabulated in each matrix. For example, the values in column 2 of Stream E indicate that, during period II (January to March), the flow was between µ and 2µ for 12 months and between 2µ and 3µ for 10 months, and never went below µ or above 3µ.

Shannon’s entropy theory is used to calculate these 3 variables. The data must first be binned into discrete groups. Here, the data is binned into 7 discrete groups bounded by 6 break values (<0.5µ, 0.5µ-1.0µ, 1.0µ-1.5µ, 1.5µ-2.0µ, 2.0µ-2.5µ, 2.5µ-3.0µ, >3.0µ) [Gordon et al. 2004] and 12 periods (months) to represent the seasonal cycle.

t →

Periods (j) (→) / States (i)	1	2	..	..	..	..	11	12
>3.0µ
2.5µ-3.0µ
…
…
0.5µ-1.0µ
<0.5µ
									N

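The number of occurrences of daily streamflow values in each state and period is counted and placed in the table. If N_ij is the number of days of flow in month ‘j’ and state ‘i’, the uncertainty with respect to time is

$$H_j = -\sum_{i=1}^{s} p_{ij} \log p_{ij}, \qquad p_{ij} = \frac{N_{ij}}{X_j}$$

where $X_j = \sum_{i=1}^{s} N_{ij}$ is the number of values in period j, s is the number of states, and p_ij is the probability that the process is in group i in period j. Averaging this across all t periods, with each period weighted by its share X_j/N of the N values (Colwell, 1974), we obtain:

$$H_s = \sum_{j=1}^{t} \frac{X_j}{N} H_j = -\sum_{j=1}^{t} \sum_{i=1}^{s} \frac{N_{ij}}{N} \log \frac{N_{ij}}{X_j}$$

Maximum uncertainty occurs when, in each period, every value group is equally probable,

i.e. N_ij=X_j/s

When this occurs H_s=log(s).

Predictability is therefore defined as

$$P = 1 - \frac{H_s}{\log(s)}$$

Its value ranges from 0 to 1, where 0 is maximum uncertainty and 1 is complete certainty as to which value group the process is in during each period.

In constancy, seasonal variability across periods is disregarded. The uncertainty with respect to value groups (states) is quantified using entropy as:

$$H_c = -\sum_{i=1}^{s} \frac{Y_i}{N} \log \frac{Y_i}{N}$$

where $Y_i = \sum_{j=1}^{t} N_{ij}$ and $N = \sum_{i=1}^{s} Y_i$.

Maximum uncertainty occurs when each value group is equally probable over the whole record, that is Y_i/N = 1/s.

When this occurs H_c=log(s).

Constancy is therefore defined as

$$C = 1 - \frac{H_c}{\log(s)}$$

Its value also ranges from 0 to 1.

Contingency is defined as the degree to which time period and value group are dependent on each other. In information theory this can be quantified by the mutual information (Jelinek, 1968) defined as:

$$I = H_c - H_s$$

where H_c and H_s are as defined above and

$$H_s = H_{ct} - H_t$$

where

$$H_t = -\sum_{j=1}^{t} \frac{X_j}{N} \log \frac{X_j}{N}$$

and

$$H_{ct} = -\sum_{i=1}^{s} \sum_{j=1}^{t} \frac{N_{ij}}{N} \log \frac{N_{ij}}{N}$$

A scaled measure of contingency is given by

$$M = \frac{I}{\log(s)} = \frac{H_c - H_s}{\log(s)}$$

It can also be seen that P=C+M. Predictability is a combination of the constancy of the process and the contingency of the process on time period.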
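The entropy-based definitions of P, C and M in this section can be sketched as follows. This is a Python illustration under stated assumptions, not the project's R code: `table[i][j]` is a hypothetical list-of-lists holding N_ij (rows are states, columns are periods), the per-period uncertainty is weighted by X_j/N as in Colwell (1974), and natural logarithms are used. The toy tables mimic the constant, perfectly seasonal and fully random streams described in the text.

```python
import math


def colwell_indices(table):
    """Return (P, C, M) from a frequency table where table[i][j] = N_ij."""
    s = len(table)            # number of states (rows)
    t = len(table[0])         # number of periods (columns)
    x = [sum(table[i][j] for i in range(s)) for j in range(t)]  # X_j, column totals
    y = [sum(row) for row in table]                             # Y_i, row totals
    n = sum(y)                                                  # N, grand total

    def plogp(p):
        # Convention: 0 * log(0) = 0
        return p * math.log(p) if p > 0 else 0.0

    # H_c: uncertainty with respect to state, ignoring period
    h_c = -sum(plogp(y_i / n) for y_i in y)

    # H_s: uncertainty of state within a period, averaged with weights X_j / N
    h_s = 0.0
    for j in range(t):
        if x[j] == 0:
            continue  # an empty period contributes nothing
        h_j = -sum(plogp(table[i][j] / x[j]) for i in range(s))
        h_s += (x[j] / n) * h_j

    log_s = math.log(s)
    p = 1.0 - h_s / log_s
    c = 1.0 - h_c / log_s
    m = (h_c - h_s) / log_s   # scaled mutual information, so P = C + M
    return p, c, m


constant = [[24, 24, 24, 24], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
seasonal = [[24, 0, 0, 0], [0, 24, 0, 0], [0, 0, 24, 0], [0, 0, 0, 24]]
random_like = [[6, 6, 6, 6] for _ in range(4)]

for name, tab in [("constant", constant), ("seasonal", seasonal), ("random", random_like)]:
    print(name, [round(v, 3) for v in colwell_indices(tab)])
```

Computing M as (H_c - H_s)/log(s) rather than from a separate joint entropy keeps the identity P = C + M exact up to floating-point rounding.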

(7) Average 7 day minimum flow (7Qmin)

For each year, the 7-day moving average of daily flow is calculated, and its minimum is the 7-day minimum flow for that year. 7Qmin is the average of those yearly 7-day minimum values.

(8) Average 7 day maximum flow (7Qmax)

For each year, the 7-day moving average of daily flow is calculated, and its maximum is the 7-day maximum flow for that year. 7Qmax is the average of those yearly 7-day maximum values.
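A minimal Python sketch of the two 7-day statistics follows; it is an illustration, not the R code used for the project. The `flows_by_year` mapping and the function names are hypothetical, and windows are assumed not to cross year boundaries.

```python
from statistics import fmean


def seven_day_extremes(daily, window=7):
    """Return (min, max) of the moving average over `window` consecutive days."""
    means = [
        sum(daily[k:k + window]) / window
        for k in range(len(daily) - window + 1)
    ]
    return min(means), max(means)


def q7min_q7max(flows_by_year):
    """Average the yearly 7-day minima and maxima over all years of record."""
    extremes = [seven_day_extremes(daily) for daily in flows_by_year.values()]
    return fmean(e[0] for e in extremes), fmean(e[1] for e in extremes)


# Toy record with ten "days" per year, for illustration only
flows = {
    2001: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    2002: [2.0] * 10,
}
print(q7min_q7max(flows))
```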

(9) Average number of flow reversals (R)

The number of changes in the direction of flow from one day to the next (increasing to decreasing or vice versa) is counted for each year. The average of these yearly counts over all the years of record gives the flow reversals per year.
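Counting reversals can be sketched as below. This Python illustration is not the project's R code; the function names are hypothetical, and it assumes that days with no change in flow neither count as a reversal nor reset the current direction, a detail the text does not specify.

```python
from statistics import fmean


def reversals_in_year(daily):
    """Count switches between rising and falling flow within one year."""
    count = 0
    previous = 0  # +1 rising, -1 falling, 0 not yet known
    for today, tomorrow in zip(daily, daily[1:]):
        change = tomorrow - today
        direction = (change > 0) - (change < 0)
        if direction == 0:
            continue  # flat day: ignored (assumption)
        if previous != 0 and direction != previous:
            count += 1
        previous = direction
    return count


def flow_reversals(flows_by_year):
    """R: average number of reversals per year over the record."""
    return fmean(reversals_in_year(daily) for daily in flows_by_year.values())


print(reversals_in_year([1, 2, 3, 2, 1, 2]))  # rise, rise, fall, fall, rise
```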

(10) Bank Full Flow (Q1.67)

A log-normal probability distribution is first fitted to the annual maximum daily flow series, and the flow with an exceedance probability of 1/1.67 is taken as the Bank Full Flow. This corresponds to the flow at which channel maintenance is most effective and is an index of physical habitat disturbance in streams [Dunne and Leopold, 1978].

(11) Flood Duration

Flood Duration quantifies the duration of flooding as the average number of days per year when the daily flow equals or exceeds Q1.67.
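Bank full flow and flood duration can be sketched together. This Python illustration is not the project's R code. It assumes the log-normal distribution is fitted by taking the mean and sample standard deviation of the log flows (the text does not state the fitting method), and the function names and toy values are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def bankfull_flow(annual_max, recurrence=1.67):
    """Flow whose exceedance probability is 1/recurrence under a log-normal fit."""
    logs = [math.log(q) for q in annual_max]
    fitted = NormalDist(fmean(logs), stdev(logs))  # moment fit on log flows (assumption)
    non_exceedance = 1.0 - 1.0 / recurrence
    return math.exp(fitted.inv_cdf(non_exceedance))


def flood_duration(flows_by_year, threshold):
    """FLDDUR: average number of days per year with flow >= threshold."""
    days = [sum(1 for q in daily if q >= threshold) for daily in flows_by_year.values()]
    return fmean(days)


# Toy record with three "days" per year, for illustration only
flows = {2001: [1.0, 6.0, 7.0], 2002: [1.0, 2.0, 3.0], 2003: [4.0, 20.0, 5.0]}
annual_max = [max(daily) for daily in flows.values()]
q167 = bankfull_flow(annual_max)
print(q167, flood_duration(flows, q167))
```

Because 1/1.67 is about 0.6, the bank full flow sits below the median of the fitted annual maxima, so it is exceeded in most years.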

(12) Average Number of Zero Flow Days (ZERODAYS)

It is the mean number of zero flow days per year and quantifies low flow disturbance and intermittency in streamflow [Poff, 1996].

(13) Average 25% timing of flow (T25)

T25yr is the time of the water year by which 25% of the total flow has occurred, measured in days from the start of the water year (Oct 1). T25 is the mean of T25yr across all years of record.

(14) Average 50% timing of flow (T50)

T50yr is the time of the water year by which 50% of the total flow has occurred, measured in days from the start of the water year (Oct 1). T50 is the mean of T50yr across all years of record.

(15) Average 75% timing of flow (T75)

T75yr is the time of the water year by which 75% of the total flow has occurred, measured in days from the start of the water year (Oct 1). T75 is the mean of T75yr across all years of record.
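The three timing variables differ only in the fraction used, so one helper covers them. This is a Python sketch, not the project's R code. It assumes each list of daily flows starts on Oct 1 and that Oct 1 is counted as day 1; the function names are hypothetical.

```python
from statistics import fmean


def timing_day(daily, fraction):
    """Day of the water year by which `fraction` of the year's total flow has passed."""
    target = fraction * sum(daily)
    cumulative = 0.0
    for day, q in enumerate(daily, start=1):  # day 1 = Oct 1 (assumption)
        cumulative += q
        if cumulative >= target:
            return day
    return len(daily)  # guard against floating-point round-off


def mean_timing(flows_by_water_year, fraction):
    """T25, T50 or T75: mean of the yearly timing days over the record."""
    return fmean(timing_day(daily, fraction) for daily in flows_by_water_year.values())


# Toy record with four "days" per water year, for illustration only
flows = {2001: [1.0, 1.0, 1.0, 1.0], 2002: [0.0, 0.0, 4.0, 0.0]}
for fraction in (0.25, 0.50, 0.75):
    print(fraction, mean_timing(flows, fraction))
```

With this definition, a larger value means the given share of the annual flow arrives later in the water year.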

(16) Time of Peak in first Harmonic (H1)

It is the time of the peak of the first harmonic of flow. It is calculated by fitting a Fourier series to the average daily flow and finding the time of peak of the harmonic that represents the annual cycle.

(17) Time of Peak (Tp)

It is the time of peak flow calculated from the daily average across all the years.
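The two peak-timing measures above can be sketched as follows. This Python illustration is not the project's R code. It assumes `mean_daily` holds the across-year average flow for each day of the year, indexes days from 0, and obtains the first harmonic from the standard Fourier cosine and sine sums; the function names and the synthetic series are hypothetical.

```python
import math


def first_harmonic_peak_day(mean_daily):
    """H1: day (0-based) at which the annual-cycle harmonic reaches its peak."""
    n = len(mean_daily)
    omega = 2.0 * math.pi / n
    # First-harmonic coefficients; a common scale factor does not affect the phase
    a = sum(q * math.cos(omega * d) for d, q in enumerate(mean_daily))
    b = sum(q * math.sin(omega * d) for d, q in enumerate(mean_daily))
    # a*cos(wd) + b*sin(wd) = A*cos(wd - phase), which peaks at d = phase / w
    phase = math.atan2(b, a)
    return (phase / omega) % n


def time_of_peak(mean_daily):
    """Tp: day (0-based) with the largest average daily flow."""
    return max(range(len(mean_daily)), key=mean_daily.__getitem__)


# Synthetic annual cycle peaking on day 100, for illustration only
n_days = 365
series = [10.0 + 3.0 * math.cos(2.0 * math.pi * (d - 100) / n_days) for d in range(n_days)]
print(first_harmonic_peak_day(series), time_of_peak(series))
```

For a smooth single-peaked regime the two measures agree; they diverge when the highest daily average comes from a short spike away from the broad seasonal maximum.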

For the calculation of trends:

Appendix C

Map of the US regions used to interpret the spatial patterns in this report: http://www.eia.doe.gov/emeu/reps/maps/us_census_files/cendivco.gif

Source: www.eia.doe.gov

References:

(1) Chinnayakanahalli, K. L. (2010), Characterizing ecologically relevant variations in streamflow regimes.

(2) Falcone, J. A., D. M. Carlisle, D. M. Wolock, and M. R. Meador (2010), GAGES: A stream gage database for evaluating natural and altered flow conditions in the conterminous United States, Ecology, 91, 621. http://esapubs.org/archive/ecol/E091/045/default.htm

(3) Poff, N.L. (1996), A hydrogeography of unregulated streams in the United States and an examination of scale-dependence in some hydrological descriptors, Freshwater Biology, 36(1), 71-91.

(4) Poff, N. L., J. D. Allan, M. B. Bain, J. R. Karr, K. L. Prestegaard, B. D. Richter, R. E. Sparks, and J. C. Stromberg (1997), The natural flow regime: A paradigm for conservation and restoration of river ecosystems, BioScience, 47, 769-784.

(5) Puckridge, J. T., F. Sheldon, K. F. Walker, and A. J. Boulton (1998), Flow variability and the ecology of large rivers, Marine & Freshwater Research, 49, 55-72.

(6) Slack, J.R. & Landwehr, J.M. (1992) Hydro-climatic data network: A U.S. Geological Survey streamflow data set for the United States for the study of climate variations, 1874 - 1988. p. 193.

(7) Bunn, S.E. & Arthington, A.H. (2002) Basic principles and ecological consequences of altered flow regimes for aquatic biodiversity. Environmental Management, 30, 492-507.

(8) Clausen, B. & Biggs, B.J.F. (1997) Relationships between benthic biota and hydrological indices in New Zealand streams. Freshwater Biology, 38, 327-342.

(9) Olden, J.D. & Poff, N.L. (2003) Redundancy and the choice of hydrologic indices for characterizing streamflow regimes. River Research and Applications, 19, 101-121.

(10) Poff, N.L. & Allan, J.D. (1995) Functional organization of stream fish assemblages in relation to hydrological variability. Ecology, 76, 606-627.

(11) Poff, N.L. & Ward, J.V. (1989) Implications of streamflow variability and predictability for lotic community structure - a regional-analysis of streamflow patterns. Canadian Journal of Fisheries and Aquatic Sciences, 46, 1805-1818.