Climate Analysis Help Page (Last Update: 05/05)

Here is a link to the Climate Analysis project outline: Climate Analysis
New Assessment Benchmarks:

| Functions or Tasks | Points |
| --- | --- |
| 10 points per function (maximum of 8) | 80 |
| 5 points per Task (2 - 11) (maximum of 4) | 20 |
| 8 functions and 4 Tasks (2 - 11) completed by 05/07 | 10 |
| 5 points per additional task (maximum of 2) by 05/18 | 10 |

\(\boxdot\) Hand-In Instructions:

I will place a Project Submission item on the Katie Course page. Please upload your single Python file (having the suffix .py) that includes all completed functions and/or Tasks by 4:00 PM Friday, May 18.
Your file must be named using your Norse user name. For example, my Norse user name is bernatzr, so my submission file would be named proj_bernatzr.py. This naming convention applies to all submissions, including those from students who previously submitted work by the early hand-in date.
If you submitted a file for extra credit by May 8:
- If you have completed additional tasks since your early hand-in of May 8, please submit your current, single file that includes all your completed work (including work you completed by the early-submission date).
- DO NOT submit another file if you have not completed any additional work since you submitted a file by the early submission date.
Your main() function must look like the one shown below. Include one line for each task you have completed, and comment out all but the line that invokes the first of those tasks.
```
def main():
    task_03()  
    # task_04()
    # task_05()
    # task_06()
    # task_07()
    # task_08()
```

\(\boxdot\) NumPy Help

The numpy module is used in this project. It may need to be installed depending on the version of the Python interpreter your PyCharm IDE is using.
Here is a link to a NumPy reference document: NumPy Reference
Here is some basic NumPy information:
- Creating an array of all float zeros (change ‘float’ to ‘int’ to create an array of int zeros)
```
import numpy as np
n_rows = 3
n_cols = 4
new_array = np.zeros((n_rows, n_cols), dtype=float)
print(new_array)
```
- Some basic algebra of NumPy arrays. Adding two NumPy arrays of the same dimension (using the + operator) results in a new NumPy array with corresponding individual entries in the two arrays summed and placed in the new array at the corresponding row and column location. Using the multiplication operator * with a scalar (either a float or int object) and a NumPy array results in a NumPy array where each entry of the original array is multiplied by the scalar.
```
a1 = np.array([[1, 2, 3], [4, 5, 6]])  
print(a1)
a2 = np.array([[7, 8, 9], [1, 2, 3]])
print(a2)
s = a1 + a2
print('The sum of a1 and a2 is:')
print(s)
print('The scalar 2 times array a1 is:')
print(2*a1)  
```
- Index and slice operators. The slice operator : provides a means of slicing individual rows or columns of data from a NumPy array. Note that the result of a slice is a one-dimensional NumPy array, not a Python list, even though it prints with brackets [ ] (the printed entries are separated by spaces rather than commas).
```
a1 = np.array([[1, 2, 3], [4, 5, 6]]) 
print(a1[0,0], a1[0, 1], a1[1, 1], a1[-1, -1])
print('Print the first row of a1:')
print(a1[0, :])
print('Print the last column of a1:')
print(a1[:, -1])  
```
- The use of slicing in this project. The basic data structure in the project is a NumPy array with 126 rows (one for each year) and 365 columns (one for each day; leap years are treated as non-leap years). The first row (index = 0) of the array represents all days of the year 1893. The second row (index = 1) gives the data for 1894, and so on.
  - If you want to slice the first row to get the maximum daily temperatures for the year 1893, then use
```
data_1893 = max_t[0, :]  
```
  - If you want the maximum temperature for January 31 for all years, then use
```
max_Jan31 = max_t[:, 30]  
```
  - If you want the maximum temperature for all days of February for all years, then use
```
max_Feb = max_t[:, 31:59]
```
- Flattening and reshaping a NumPy array. This information will be useful for the n_day_average() function.
```
a = np.array([[1, 2, 3], [4, 5, 6]])
a_flat = a.flatten()
print(a_flat)
b = a_flat.reshape(3,2)
print(b)
```
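The February slice shown above generalizes to any month once the starting column of each month is known. The following is a minimal sketch of that idea, not part of the project code: the helper name month_columns and the stand-in array demo are hypothetical, and demo simply stores each column's own index so the slices are easy to check.

```python
import numpy as np

# Lengths of the 12 months in a non-leap year (the project ignores leap days).
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def month_columns(m_no):
    """Return (start, stop) column indices for month m_no (1 = January).

    Hypothetical helper: stop is exclusive, so a_in[:, start:stop] is the month.
    """
    start = sum(MONTH_LENGTHS[:m_no - 1])
    stop = start + MONTH_LENGTHS[m_no - 1]
    return start, stop


# Stand-in for max_t: 126 rows, 365 columns, each entry equal to its column index.
demo = np.tile(np.arange(365), (126, 1))

start, stop = month_columns(2)
feb_all_years = demo[:, start:stop]   # same columns as max_t[:, 31:59]
jan_1893 = demo[0, :31]               # January of the first year

print(start, stop)
print(feb_all_years.shape)
print(jan_1893.shape)
```

With the real data, replacing demo with max_t or min_t should give the same column ranges, assuming the arrays have the (126, 365) layout described above.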

\(\boxdot\) Task Help

Write a function for each of Tasks 3 - 11 that you choose to complete. Name the function using the task number. For example, the function for Task 3 would start with the line def task_03():

def task_03():
    max_t = read_data('max_temp')
    min_t = read_data('min_temp')
    dly_ave = daily_ave(max_t, min_t)  
    ann_ave = annual_ave_t(dly_ave)
    rnk_ave = rank_list(ann_ave) 
    wrt_file_t03(ann_ave, rnk_ave)

Your main() function should look like

def main():  
    task_03()  
    task_04()    
    .  
    .  
    .

Functions
1. clc_ave(list_in) - Because of the possibility of missing data (-99), you will most likely have to traverse the incoming list so that the -99 values are not incorporated into the average. However, if you have implemented a scheme for replacing missing data so that no -99 terms are in the list_in object, you may want to consider using a NumPy function for calculating the mean. Refer to the NumPy reference link above, and go to Section 3.28, Statistics.
  
  NOTE: There is at least one date for which the maximum temperature is NOT missing but the minimum temperature IS missing, so this will complicate using the algebraic and/or statistical features of NumPy.
2. julian_2_mn_day(j) - We worked on this function for Lab Activity 6 of Chapter 7. A brute force way to accomplish this task is to use a chained selection process.
3. daily_ave(t_max, t_min) - The parameters t_max and t_min are NumPy arrays with dimension (126, 365). The return is a similar object of the same dimension. You can use t_ave = (t_max + t_min) * 0.5 provided you have some means of dealing with the missing data days.
4. annual_ave_t(a_in) - Take advantage of the slice operator : to carve out a row (year) of data and use it as a parameter for the clc_ave(list_in) function. Create an empty list for the annual average temperatures and append the return of the clc_ave(list_in) function.
5. monthly_ave_t(a_in, m_no, yr_s, yr_e) - Use the slice operator : to carve out a month of data for a given year from the a_in array that has dimensions (126, 365). For example, January data for 1893 is given by a_in[0, :31], and February of 1894 is given by a_in[1, 31:59], etc. Initialize an empty list, and append the return of the clc_ave(list_in) function.
6. rank_list(l_in, yr_b) - Use a bubble sort technique on the l_in items to rank them in decreasing order. Before starting the ranking process, create a list of two-element sub-lists using the contents of l_in and the corresponding year of each l_in item. For example, suppose l_in = [40.6, 38.5, 43.9, …]. Create a list like s_list = [[40.6, 1893], [38.5, 1894], [43.9, 1895], …]. The bubble sort routine will compare the first items of adjacent sub-lists and switch the sub-lists if these first items indicate they are out of order.
7. plot_temp_vs_day(t_list) - The function will use the turtle module to generate a plot of temperatures (or average temperatures) versus the Julian day. The majority of the code for this function can be copied and pasted from the course web page for Chapter 5, Activity 1. You will need to make some modifications, especially to the last portion that plots the points.
  
  The t_list parameter holds one or more temperature lists, for example t_list = [t_year, his_ave]. The t_year list is 365 days long and is sliced out of the result of the daily_ave(t_max, t_min) function. That is, suppose t_ave = daily_ave(t_max, t_min). Then t_year = t_2012 = t_ave[-7, :], which is all columns (days) of the seventh-to-last row (2012) of t_ave. The other suggested list for the plotting routine is the historic daily average temperature, which is the average of each individual column of the t_ave array. For the sake of clarity, name it his_ave; it would be generated using the following code:
```
his_ave = []
for c in range(365):
    # Average column c (one calendar day) over all years.
    his_ave.append(clc_ave(t_ave[:, c]))
```
8. n_day_average(a_in, n) - I suggest you take advantage of the flatten() and reshape() methods of NumPy. Please refer to the corresponding item in the basic NumPy information section above.
  
  Here is an example of what this function will do. Suppose you have a 2 row, 3 column array of data that is referenced by a_in = [[1, 2, 3], [4, 5, 6]]. Then, a_flat = a_in.flatten() = [1, 2, 3, 4, 5, 6]. Suppose n = 2, so that we want to create an average for each pair of consecutive values in a_flat. Create an equal-length NumPy array to hold the averages using
  
  b_flat = np.zeros(len(a_flat), dtype = float) = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
  
  The first non-zero term in b_flat will be stored in index location 1 with value \((1 + 2)*0.5 = 1.5\). The next term in b_flat is \((2+3)*0.5 = 2.5\), and so on. The final form of b_flat is
  
  b_flat = [0.0, 1.5, 2.5, 3.5, 4.5, 5.5]
  
  Before returning the result, b_flat must be reshaped to match the dimensions of the input array.
  
  n_ave = b_flat.reshape(2,3)
  
  If a_in is the same as above, but n = 3, the result for b_flat is
  
  b_flat = [0.0, 0.0, 2.0, 3.0, 4.0, 5.0]
9. n_day_precip_total(a_in, n) -
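The worked example in item 8 can be sketched as follows. This is one possible arrangement under stated assumptions, not the required solution: it leaves the first n - 1 entries at 0.0 as in the example, and it assumes (the text above does not say) that any window containing a -99 should itself be reported as -99.0.

```python
import numpy as np

MISSING = -99.0  # missing-data marker used throughout the project


def n_day_average(a_in, n):
    """Trailing n-day averages of a_in, returned with the same shape as a_in."""
    a_flat = a_in.flatten()
    b_flat = np.zeros(len(a_flat), dtype=float)
    # The first window that holds n values ends at index n - 1.
    for i in range(n - 1, len(a_flat)):
        window = a_flat[i - n + 1:i + 1]
        if (window == MISSING).any():
            # Assumption: report the whole window as missing.
            b_flat[i] = MISSING
        else:
            b_flat[i] = window.sum() / n
    return b_flat.reshape(a_in.shape)


a_in = np.array([[1, 2, 3], [4, 5, 6]])
print(n_day_average(a_in, 2))
print(n_day_average(a_in, 3))
```

Because the array is flattened first, a window near the start of one row reaches back into the end of the previous row, which matches the example where the value at index 3 for n = 2 is the average of 3 and 4.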
Missing Data Scheme - I suggest you begin by reporting a result as missing whenever the data it depends on are missing, rather than developing a scheme to replace missing data. Then, your results for Tasks 3 and 4 should match the results you see on the Decorah Weather Page.
Annual Average Temperature - Use function read_data(c_var) to read the maximum and minimum temperature arrays. Use function daily_ave(t_max, t_min) to create a t_ave array of 126 rows (years) and 365 columns (days). The result will be used by annual_ave_t(a_in) that will return a list of length 126. Each element of the list will represent the annual average temperature (-99.0 will represent data missing for one or more days of the year). This return will be used by rank_list(l_in, yr_b) that will return a list ranked from greatest to least.

You can check your results by going to the Decorah Weather Page and clicking on the Annual Analysis link under the Annual Heading in the far left pane. Then, click on the Annual Average Table option on the left side of the Annual Analysis page.
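The chain of calls described above might be sketched as follows on a tiny stand-in data set. The sketch assumes the "report missing" policy: a daily average is -99.0 if either input is missing, and an annual average is -99.0 if any day in the year is missing. The function bodies are illustrative guesses at one workable approach, and the 2 x 3 arrays stand in for the real (126, 365) arrays returned by read_data(c_var).

```python
import numpy as np

MISSING = -99.0


def daily_ave(t_max, t_min):
    """Average of max and min; MISSING wherever either input is missing."""
    either_missing = (t_max == MISSING) | (t_min == MISSING)
    return np.where(either_missing, MISSING, (t_max + t_min) * 0.5)


def clc_ave(list_in):
    """Mean of list_in, or MISSING if any entry is missing (assumed policy)."""
    total = 0.0
    for value in list_in:
        if value == MISSING:
            return MISSING
        total += value
    return total / len(list_in)


def annual_ave_t(a_in):
    """Return one average per row (year) of a_in as a list."""
    ann_ave = []
    for r in range(a_in.shape[0]):
        ann_ave.append(clc_ave(a_in[r, :]))
    return ann_ave


# Stand-in data: 2 "years" of 3 "days" each.
t_max = np.array([[50.0, 60.0, 70.0], [40.0, -99.0, 80.0]])
t_min = np.array([[30.0, 40.0, 50.0], [20.0, 30.0, -99.0]])

t_ave = daily_ave(t_max, t_min)
print(t_ave)
print(annual_ave_t(t_ave))
```

The second stand-in year has a missing maximum on one day and a missing minimum on another, so both of those daily averages, and the annual average for that year, come out as -99.0.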

The function shown below will be used in Task 3 to write your results to a file. Please copy and paste this function into your code.
```
def wrt_file_t03(l1, l2):
    o_f = open('ann_ave_temp_his.txt', 'w')
    o_f.write('{0:^16s}|{1:^19s}\n'.format('Chronological', 'Ranked'))
    o_f.write('{0:^16s}|{1:^19s}\n'.format('-' * 16, '-' * 19))
    o_f.write('{0:^7s}{1:^9s}|{2:^6s}{3:^6s}{4:^7s}\n'.format('Year', 'Ave', 'No.', 'Year', 'Ave'))
    o_f.write('{0:^16s}|{1:^19s}\n'.format('-' * 16, '-' * 19))
    for i in range(len(l1)):
        o_f.write('{0:^7d}{1:^9.2f}|{2:^6d}{3:^6d}{4:^7.2f}\n'.format(i+1893,l1[i],i+1,l2[i][1]+1893,l2[i][0]))  
    o_f.close()
```
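Note that wrt_file_t03 adds 1893 to the second element of each ranked pair (l2[i][1]+1893), so as written it expects that element to be a row index (0 for 1893) rather than the year itself. The sketch below is one way rank_list(l_in, yr_b) might be written using the bubble sort described in the function list; treating yr_b as the value stored for the first item is an assumption, since its meaning is not spelled out here.

```python
def rank_list(l_in, yr_b):
    """Return [value, year] pairs sorted from greatest to least value.

    Assumption: yr_b is the year (or offset) stored with the first item.
    """
    # Pair each value with its year before sorting so the two stay together.
    s_list = [[value, yr_b + i] for i, value in enumerate(l_in)]

    # Bubble sort: keep sweeping until a full pass makes no switches.
    switched = True
    while switched:
        switched = False
        for i in range(len(s_list) - 1):
            if s_list[i][0] < s_list[i + 1][0]:
                s_list[i], s_list[i + 1] = s_list[i + 1], s_list[i]
                switched = True
    return s_list


ann_ave = [40.6, 38.5, 43.9]
print(rank_list(ann_ave, 1893))  # pairs carry the actual year
print(rank_list(ann_ave, 0))     # pairs carry the row index
```

Under this reading, calling rank_list(ann_ave, 0) produces pairs that wrt_file_t03 prints with the correct years. Any -99.0 averages sort to the bottom of the ranking because they are smaller than every real temperature.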

The Decorah Weather Page can be used to check your results for a given month. Follow the link for a given month under the Monthly Data section in the left pane. On the Monthly Climate History page that will open, click on the Average Monthly Temperature History link.

The function shown below will be used in Task 4 to write your results to a file. Please copy and paste this function into your code.

def wrt_file_t04(l1, l2, mn):
    o_f = open('m'+str(mn)+'_ave_temp_his.txt', 'w')
    n_list = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
    o_f.write('{0:^36s}\n'.format(n_list[mn-1]+' Temperature History'))
    o_f.write('{0:^16s}|{1:^19s}\n'.format('Chronological', 'Ranked'))
    o_f.write('{0:^16s}|{1:^19s}\n'.format('-' * 16, '-' * 19))
    o_f.write('{0:^7s}{1:^9s}|{2:^6s}{3:^6s}{4:^7s}\n'.format('Year', 'Ave', 'No.', 'Year', 'Ave'))
    o_f.write('{0:^16s}|{1:^19s}\n'.format('-' * 16, '-' * 19))
    for i in range(len(l1)):
        o_f.write('{0:^7d}{1:^9.2f}|{2:^6d}{3:^6d}{4:^7.2f}\n'.format(i+1893,l1[i],i+1,l2[i][1]+1893,l2[i][0]))
    o_f.close()

This task uses the columns (days) of the t_max and t_min arrays to determine the average value over all years (rows). One issue with this task is how to handle missing data. If data for a given day (January 10, for example) are missing for some years, the average calculated from the non-missing years should still be a good approximation. Therefore, missing data in a column should not result in a -99 for the day. The statement t_l = np.delete(t_max[:, i], np.where(t_max[:, i] == -99)) is a useful way to remove the -99 values from a column. Construct a 365-day list for the average maximum temperature (l1) and for the average minimum temperature (l2). Then, pass them and the month number as parameters to the wrt_file_t05(l1, l2, mn) function, which you should copy and paste into your code.
```
def wrt_file_t05(l1, l2, mn):
    o_f = open('m' + str(mn) + '_daily_ave_temps.txt', 'w')
    n_list = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    d_s =[0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]
    m_l = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    o_f.write('{0:^28s}\n'.format('Ave Max & Min Temps for '+n_list[mn-1]))
    o_f.write('{0:^28s}\n'.format('-'*28))
    o_f.write('{0:^6s}|{1:^10s}|{2:^10s}\n'.format('Day','Ave Max','Ave Min'))
    o_f.write('{0:^6s}|{1:^10s}|{2:^10s}\n'.format('-'*6,'-'*10,'-'*10))
    for i in range(m_l[mn-1]):
        o_f.write('{0:^6d}|{1:^10.2f}|{2:^10.2f}\n'.format(i+1,l1[i+d_s[mn-1]],l2[i+d_s[mn-1]]))
    o_f.close()
```
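The np.delete approach described for this task can be sketched as follows. The function name column_averages and the small stand-in array are hypothetical, and returning -99.0 for a day that is missing in every year is an assumption added here so that the function always returns a value.

```python
import numpy as np


def column_averages(a_in, missing=-99):
    """Average each column (day) over all rows (years), skipping missing values."""
    ave = []
    for i in range(a_in.shape[1]):
        col = a_in[:, i]
        # np.where(...)[0] is the array of row indices holding the missing marker.
        valid = np.delete(col, np.where(col == missing)[0])
        if len(valid) == 0:
            # Assumption: a day with no data at all is reported as missing.
            ave.append(float(missing))
        else:
            ave.append(float(valid.mean()))
    return ave


# Stand-in for t_max: 3 "years" of 3 "days" each.
t_max = np.array([[30.0, -99.0, 50.0],
                  [40.0, -99.0, -99.0],
                  [50.0, -99.0, 70.0]])
print(column_averages(t_max))
```

Calling the same function on the full t_max and t_min arrays would give the two 365-entry lists that wrt_file_t05 expects, assuming the arrays have one column per day.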
Here is what the description should say: Use the plot_temp_vs_day(t_list) function to create a plot of the daily average temperatures for the year 2012 and the historic daily average temperatures versus the day. Set your window coordinates to xmin = 0, xmax = 365, ymin = -40 and ymax = 120. Use an x tick space of 7 (this represents a spacing of 7 days) and a y tick space of 10 (represents a 10 degree space).

The historic averages are determined using code such as
```
max_t = read_data('max_temp')
min_t = read_data('min_temp')
t_ave = daily_ave(max_t, min_t)  
his_ave = []
for i in range(len(t_ave[0,:])):
    his_ave.append(clc_ave(t_ave[:, i]))  
```
This task will use the n_day_ave() function to calculate the average temperature for an n-day interval. The return of this function is an array of 126 rows and 365 columns.
- Eventually, two lists (call them something like top and bot) will be ranked to determine the top and bottom ten values. Use a nested loop on rows and columns to search the array for values greater than a threshold value for the warmest, and for values greater than -99 and less than a threshold value for the coldest. Suppose the name of the returned array is ave_03 for the 3-day averages. Then
```
ave_03 = n_day_ave(t_ave, 3)
nrows = len(ave_03[:, 0])  
ncols = len(ave_03[0, :]) 
top = []
bot = []
for r in range(nrows):  
    for c in range(ncols):  
       if ave_03[r, c] > 90.0:  
           top.append([ave_03[r,c], r, c]) 
       if -99.0 < ave_03[r, c] < -10.0:  
           bot.append([ave_03[r,c], r, c])  
```
- The row and column values need to be saved in the lists in order to determine the corresponding year and Julian day. Next, use a bubble sort technique on these lists to determine the top ten warmest (from the top list) and bottom ten coldest (from the bot list). The top list should be sorted in decreasing order (warmest to coldest), and the bot list in increasing order (coldest to warmest). For example, the code for the top list is shown below. The code for the bot list would be identical except that the < sign would be a > sign.
```
switched = True  
while switched: 
    switched = False
    for i in range(len(top) - 1):  
        if top[i][0] < top[i+1][0]:  
            top[i], top[i+1] = top[i+1], top[i]  
            switched = True  
```
- The first ten items in the sorted lists can be sliced out and printed to give the desired information using the julian_2_mn_dy(j) function. Here is an example of how the code might look:
```
for item in top[:10]:
    print('{0:3.2f} {1:4d} {2:3d} {3:2d}'.format(item[0], item[1]+1893, julian_2_mn_dy(1+item[2])[0], julian_2_mn_dy(1+item[2])[1]))
for item in bot[:10]:
    print('{0:3.2f} {1:4d} {2:3d} {3:2d}'.format(item[0], item[1]+1893, julian_2_mn_dy(1+item[2])[0], julian_2_mn_dy(1+item[2])[1]))  
```
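The print statements above index the return of julian_2_mn_dy(j) with [0] and [1], which suggests it returns the month number and the day of the month as a pair. A minimal sketch under that assumption is shown here. It walks a list of month lengths instead of using the chained selection mentioned in the function list, and it uses the spelling julian_2_mn_dy from the code above (the function list spells it julian_2_mn_day).

```python
# Lengths of the 12 months in a non-leap year (the project ignores leap days).
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def julian_2_mn_dy(j):
    """Convert a Julian day (1 - 365) to a (month, day) tuple."""
    if not 1 <= j <= 365:
        raise ValueError('Julian day must be between 1 and 365')
    month = 1
    for length in MONTH_LENGTHS:
        if j <= length:
            return month, j
        # Not in this month: remove its days and move to the next month.
        j -= length
        month += 1


print(julian_2_mn_dy(1))
print(julian_2_mn_dy(59))
print(julian_2_mn_dy(60))
print(julian_2_mn_dy(365))
```

Because the column index in the arrays starts at 0, the calls above pass 1+item[2] so that column 0 maps to January 1.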
Follow the steps of Task 7.