Comparison of Time-Series Data Analysis

In this post, I examine a few different methods for determining the largest flow events from a time series of flow data for the White River at Newport, Arkansas. The analysis uses daily average flow data from the USGS. I want not only the highest flow values but also the dates associated with those values.

This flow data can be found in the following location:

USGS White River at Newport Daily Flow Data

The image below indicates that the flow data is available from 1927 to the present. I chose to download the data as a tab-separated file.

Method #1: Using Excel to Sort the Data

Probably the easiest and most direct method for performing this analysis is to import the data into Excel and then sort by the flow values. You need to sort the entire table to ensure that the proper date is still linked to the corresponding flow. After importing the data and sorting by flow value from largest to smallest, the top 15 values are shown below. The largest value of 340,000 cfs occurred on April 18, 1945. You can see that some of the other top values are also associated with this April 1945 event.
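The same sort can be sketched in Python for readers who prefer a script to a spreadsheet. This is a minimal sketch, not part of the original workflow: it assumes a tab-separated layout with the date in column index 2 and the flow in column index 3, and the sample rows are illustrative placeholders except for the 340,000 cfs value on April 18, 1945, which is quoted above. The function name `top_flows` and the site number are hypothetical.

```python
import csv
import io

# Illustrative stand-in for the downloaded file. Only the 1945-04-18 value
# comes from the post; the other rows and the site number are placeholders.
SAMPLE = (
    "agency_cd\tsite_no\tdatetime\tflow_cfs\n"
    "USGS\t00000000\t1945-04-16\t250000\n"
    "USGS\t00000000\t1945-04-17\t320000\n"
    "USGS\t00000000\t1945-04-18\t340000\n"
    "USGS\t00000000\t1945-04-19\t335000\n"
    "USGS\t00000000\t1950-01-10\t90000\n"
)


def top_flows(text, n=15, date_col=2, flow_col=3):
    """Return the n largest (date, flow) pairs, largest flow first."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        try:
            flow = float(row[flow_col])
        except (IndexError, ValueError):
            continue  # skip header, comment, and malformed lines
        records.append((row[date_col], flow))
    # Sorting whole (date, flow) pairs keeps each date attached to its flow,
    # which is the same reason the entire table must be sorted in Excel.
    return sorted(records, key=lambda rec: rec[1], reverse=True)[:n]


for day, flow in top_flows(SAMPLE, n=3):
    print(day, flow)
```

Because each record is kept as a pair, the date cannot become detached from its flow value during the sort.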

Method #2: Using USGS to Obtain the Annual Peaks

If you are only interested in the annual peaks, USGS will typically have those values tabulated. Note, however, that peak streamflow values from USGS can be instantaneous values, which are likely to differ from the daily average values. The instantaneous data set may also cover a different period of record than the daily data set. In the data set below, you can see that the peak streamflow data begins in the year 1886 for this site.

On the USGS site, we can sort the streamflow values from highest to lowest by clicking on that column heading. The first click appears to order them from lowest to highest. Clicking on it again orders them from highest to lowest.

In the image above, two items are apparent in the data. The first is that the peak recorded flow is 387,000 cfs on April 17, 1927. This value did not appear in the daily values since the daily values began in October 1927. The second is that the peak flow in the April 1945 event is recorded as 343,000 cfs, as opposed to the 340,000 cfs that was found using the daily data. It is important to provide this type of context whenever you are summarizing time series data.

Method #3: Using HEC-DSSVue

HEC-DSSVue is a Corps of Engineers program for viewing and editing HEC-DSS database files, and it is well suited for handling time series data. To use HEC-DSSVue for this analysis, I first import the data. I was unable to get the direct import from USGS to work, so I copied the data from an Excel spreadsheet using the manual data entry in HEC-DSSVue. I now have a full time series of the daily data. If I want to get the peak streamflow for each year, I can use the Math Functions within HEC-DSSVue.

To access this, I select Tools > Math Functions. I then select Time Functions. I select the Maximum for Period. I select Water Year as the New Period Interval. Since I want to preserve the date of the maximum value, I also select Save as Irregular Interval with the Block Size being IR-YEAR (meaning irregular values for each year as opposed to using the end of the water year as the date for each year).

HEC-DSSVue creates an additional path that can be sorted by clicking on the header of the flow column.
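For comparison, the water-year maximum can be approximated outside HEC-DSSVue. The sketch below is not how HEC-DSSVue implements the calculation; it only illustrates the idea of keeping the actual date of each annual maximum rather than stamping it with the end of the water year. It assumes the usual definition of a water year (October 1 through September 30, named for the calendar year in which it ends). The function names and the sample records are hypothetical, apart from the April 18, 1945 value quoted earlier.

```python
from datetime import date


def water_year(d):
    """Water year runs 1 Oct to 30 Sep and is named for the year it ends in."""
    return d.year + 1 if d.month >= 10 else d.year


def water_year_peaks(records):
    """Map each water year to the (date, flow) pair with the largest flow."""
    peaks = {}
    for day, flow in records:
        wy = water_year(day)
        # Keep the first-seen record on ties; replace only on a larger flow.
        if wy not in peaks or flow > peaks[wy][1]:
            peaks[wy] = (day, flow)
    return peaks


# Illustrative records; only the 1945-04-18 value comes from the post.
records = [
    (date(1944, 9, 30), 20000.0),
    (date(1944, 10, 1), 25000.0),
    (date(1945, 4, 18), 340000.0),
    (date(1945, 10, 5), 40000.0),
]

peaks = water_year_peaks(records)
for wy in sorted(peaks):
    day, flow = peaks[wy]
    print(wy, day.isoformat(), flow)
```

Storing the date alongside the flow plays the same role as the irregular-interval option described above: each annual value keeps the day on which it occurred.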

Method #2 and Method #3 are similar in that they give peak annual values. Method #2 gave peak instantaneous values while Method #3 gave peak daily values. Method #1 gave peak daily values that were not limited to one value per year.

Method #4: Python Script to Extract Largest Daily Flow Values

Since the USGS website and HEC-DSSVue are set up for this type of data, I would prefer to use those sources or to use Excel for data sorting if I needed multiple values within a given year. However, it is possible to read text files using Python and to print a desired result. In this example, I use Python to print out values that exceed 100,000 cfs. It should be noted that I shortened the data set significantly for testing. The data set is shown below.

In the Python code below, I import the module named csv. I use that to read each line in the text file as a list by indicating that the items are tab delimited. I then select the flow column, which has an index of 3 since the first item in a Python list has an index of zero, and test whether the value is above 100,000 cfs. If it is, the date and flow value are printed.
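A minimal sketch of a script matching this description follows. The flow column index of 3 and the 100,000 cfs threshold come from the text; the date column index of 2, the function name `flows_above`, the placeholder site number, and the sample rows are assumptions made for illustration. Lines whose flow field is not numeric (comments, headers, format rows) are skipped rather than treated as errors.

```python
import csv
import io

THRESHOLD_CFS = 100_000  # threshold stated in the post


def flows_above(lines, threshold=THRESHOLD_CFS, date_col=2, flow_col=3):
    """Return (date, flow) pairs whose flow exceeds the threshold."""
    results = []
    for row in csv.reader(lines, delimiter="\t"):
        try:
            flow = float(row[flow_col])
        except (IndexError, ValueError):
            continue  # comment, header, or otherwise non-numeric line
        if flow > threshold:
            results.append((row[date_col], flow))
    return results


# Illustrative shortened data set; only the 1945-04-18 value is from the post.
sample = io.StringIO(
    "# illustrative comment line\n"
    "agency_cd\tsite_no\tdatetime\tflow_cfs\n"
    "USGS\t00000000\t1945-04-18\t340000\n"
    "USGS\t00000000\t1945-06-01\t50000\n"
)

for day, flow in flows_above(sample):
    print(day, flow)
```

To run this against a downloaded file instead of the in-memory sample, pass an open file object, for example `open("flow_data.txt", newline="")`, where the filename is hypothetical.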

The result of this code is shown below.

The ResSim Blog
