In the field of water resources engineering, it is often necessary to perform data analysis on time-series data. For example, a user of this data may want to summarize average monthly flow data to examine the month or season when flows are the highest along with determining the difference between the average monthly high flow and the average monthly low flow. This can be done using a variety of methods. Below, methods for doing this directly in Microsoft Excel and for using Visual Basic in Microsoft Excel are shown.
For this example, daily average USGS flow data for the White
River at Newport, Arkansas is used. The
data ranged from 1927 to 2020. A small
portion of the data is shown below. The
date is given in column A while the flow value is given in column B. This data set has close to 34,000 values.
=IF(MONTH(A2)=7,B2,"")
This “IF” statement indicates that if the month in column A
has a numeric value of 7 for July, then column C is populated with the flow
value found in column B for that row. If
it is not the month of July, then that row in column C is to be empty. After doing that, the minimum, maximum, and
average values in column C are computed.
avg |
15677.26 |
min |
0 |
max |
92600 |
To solve this issue, an adjustment to the “IF” statement is
required. An additional check is added
to the “IF” to check whether the flow value is blank. This is shown below. Now, the two conditions of the month being
July and the flow value not being empty must be satisfied to populate the
column. Doing this removes the zero
values which were adding nothing to the running sum of the flows, but they were
increasing the number of values. This
had the effect of reducing the true average.
=IF(AND(MONTH(A2)=7,NOT(B2="")),B2,"")
Using this updated “IF” statement and re-computing the
average, minimum, and maximum values gives the following:
avg |
16758.45 |
min |
4180 |
max |
92600 |
As expected, the average value increases since the zero
values are no longer counted and the minimum value is no longer zero.
The computation of this in Excel is relatively
straightforward, but Visual Basic within Excel can also be used. The first code that is shown reproduces the
example where missing values still add to the count. In this code, the variable “Monthnum” is set
as an integer. A running sum and a count
of July values is used to compute the average.
The code begins in cell A1, which is a header cell. The code obtains the date value and the flow
value for each time step by using the offset command to move through the
values. The “Do” loop is used to
continue the operation until it runs out of date values. It should be noted that if there are blank
values in the date column, this would need to be cleaned up for the code to
work as intended.
Once this code is simulated, the average value for July is
output to cell “F11”. The computed value
for this code is 15677.26, which matches the average found using the first “IF”
statement shown earlier. The answers are
the same since this code suffers from the same issues as the earlier “IF”
statement.
The solution is to once again alter the IF statement to
account for empty values. The rest of
the code remains the same. The updated
code is shown below.
Adding the “IsEmpty”
check solves the issue and gives the correct average of 16758.45.
This analysis demonstrates straightforward methods for
determining monthly average values using Excel and using Excel with Visual
Basic. It also shows the importance of
examining data prior to an analysis along with the importance of reviewing
results provided by the chosen method.
This Blog is very useful and informative.
ReplyDeletedata scientist course in delhi
Good information you shared. keep posting.
ReplyDeletebest data science course in delhi
This comment has been removed by the author.
ReplyDelete