Stephen Lightfoote

60 minute read

This is also a follow up to the AWEA Wind Resource Working Group’s webinar on the open-source ecosystem and an expansion of my original blog post: [https://renewable-analytics.netlify.com/2018/05/30/example-wind-resource-assessment-using-r/].

I’ve also submitted this to [https://www.kaggle.com/srlightfoote/example-wind-analysis-in-r].

The goal here is to illustrate how aspects of a typical wind resource assessment from meteorological data can be accomplished using open source tools and serve as a introduction for analyst, in this case using R. Using representative data, we’ll walk through some of the typical steps taken to import, visualize, and analize meteorological data with the goal of estimating the long-term wind regime at a given hub height. Hopefully this illustrates how an analyst could use a notebook to go from quality controlled data to .tab files, the typical intput to wind flow models, in a single environment. With this environment you get all the benefits of an open-source scripting language such as auditability, transparency, scalability, repeatability, and disbributability with very little of the typical overhead needed to make sure an analysis can be transfered from one person or organization to another.

Load Libraries

First, load the R libraries used to perform a typical wind analysis.

library(tidyverse)
library(lubridate)
library(clifro)
library(MASS)
library(stringr)
library(zoo)
library(viridis)
library(broom)
library(plotly)

Read in Data

Read in the mast data and the reference data.

mast_data = read_csv('../../data/example_wind_data/demo_mast.csv',col_types=cols())
ref_data = read_csv('../../data/example_wind_data/demo_refs.csv',col_types=cols())

Site Data Exploratory Analysis

Now to the meat of the Kernel, which is to illustrate how to analyze and visualize typical steps within a wind resource assessment where the end goal is to characterise the long-term wind regime at a given hub height. For our purposes we’ll say 120 m.

Data Structure

Measured 10-minute data from multiple heights at a site met mast.

head(mast_data)

Along with average daily wind speeds from ten nearby reference stations.

head(ref_data)

These data have been cleaned and reconstructed so the recovery rate is very good for each sensor.

round(colMeans(!is.na(mast_data))*100,2)
##           Stamp      DIR_80_AVG DIR_95_COMB_AVG      DIR_95_AVG 
##          100.00           97.89           97.89           97.89 
## SPD_15_COMB_AVG SPD_32_COMB_AVG SPD_47_COMB_AVG SPD_59_COMB_AVG 
##           96.66           96.75           98.22           98.24 
##         T_4_AVG        T_58_AVG 
##          100.00          100.00

Metadata

We’ll supply some sample metadata about the mast and infer the metadata about the channels from the headers

primary_ano='SPD_59_COMB_AVG'
primary_vane='DIR_95_COMB_AVG'
metadata=list()
metadata$mast=data.frame(name='mast', lat=45, lon=-90, height=60, elev=500, primary_ano='SPD_59_COMB_AVG', primary_vane='DIR_95_COMB_AVG',stringsAsFactors=F)
metadata$channels=data.frame(str_split(names(mast_data),'_',simplify=T),stringsAsFactors = F)[,1:3]
names(metadata$channels)=c('type','height','signal')
metadata$channels<-mutate(metadata$channels,
                          height=as.numeric(metadata$channels$height),
                          name=names(mast_data),
                         isprimA= name ==primary_ano,
                         isprimV=name==primary_vane)
metadata
## $mast
##   name lat lon height elev     primary_ano    primary_vane
## 1 mast  45 -90     60  500 SPD_59_COMB_AVG DIR_95_COMB_AVG
## 
## $channels
##     type height signal            name isprimA isprimV
## 1  Stamp     NA                  Stamp   FALSE   FALSE
## 2    DIR     80    AVG      DIR_80_AVG   FALSE   FALSE
## 3    DIR     95   COMB DIR_95_COMB_AVG   FALSE    TRUE
## 4    DIR     95    AVG      DIR_95_AVG   FALSE   FALSE
## 5    SPD     15   COMB SPD_15_COMB_AVG   FALSE   FALSE
## 6    SPD     32   COMB SPD_32_COMB_AVG   FALSE   FALSE
## 7    SPD     47   COMB SPD_47_COMB_AVG   FALSE   FALSE
## 8    SPD     59   COMB SPD_59_COMB_AVG    TRUE   FALSE
## 9      T      4    AVG         T_4_AVG   FALSE   FALSE
## 10     T     58    AVG        T_58_AVG   FALSE   FALSE

Now, our first analysis question is: what does our wind and energy rose look like at this site? We’ll use the clifro package to illustrate the wind rose and energy roses. We’ll also bin the data in tabular format.

prim_df<-mast_data[,c(primary_ano,primary_vane,'T_58_AVG')]
names(prim_df)<-c('primary_ano','primary_vane','temperature')
mast_data<-bind_cols(mast_data,prim_df)
rm(prim_df)

mast_data<-mutate(mast_data,
                dir_bin=cut(primary_vane,seq(0,360,22.5)),
                density= (1013.25*100)/(287*(temperature+273.15))*exp((-9.81*metadata$mast$elev*(1.0397-0.000025*metadata$mast$elev))/(287*(temperature+273.15))),
                wpd=primary_ano^3*.5*density)

mast_data %>%
filter(!is.na(dir_bin)) %>%
group_by(dir_bin) %>%
summarise(counts=sum(!is.na(primary_ano)),
         mean_spd=mean(primary_ano,na.rm=T),
          mean_energy=mean(wpd,na.rm=T))
windrose(speed = mast_data$primary_ano,
                 direction =mast_data$primary_vane,
                 speed_cuts = seq(0,25,5),
                 ggtheme='minimal')

Next, what is the mean of monthly means (momm) wind speed at the top measurement height?

monthly_means<-mast_data %>%
mutate(month=month(Stamp),
      daysInMonth=days_in_month(Stamp)) %>%
group_by(month) %>%
summarise(mean_ws=mean(primary_ano,na.rm=T),
         daysInMonth=mean(daysInMonth))

momm<-weighted.mean(monthly_means$mean_ws,monthly_means$daysInMonth)
print(paste('MoMM at measurement height: ',round(momm,3),' m/s',sep=''))
## [1] "MoMM at measurement height: 7.655 m/s"

Shear analysis

Let’s calculate the shear between heights. First, create a list of comparisons.

heights<-filter(metadata$channels,type=='SPD') %>% pull(name)

shear.pairs<-expand.grid(data.frame(height1=heights,height2=heights,stringsAsFactors=F),stringsAsFactors = F) %>%
filter(height2!=height1)
shear.pairs

now loop through the comparison sets to get the shear values

shear<-list()
shear.df<-list()
for(i in 1:nrow(shear.pairs)){
  df<-mast_data[,c('Stamp',shear.pairs$height2[i],shear.pairs$height1[i])]
  magl<-as.numeric(str_split(shear.pairs[i,],'_',simplify=T)[,2])
  df$alpha<-as.vector(log(df[,3]/df[,2])/log(magl[1]/magl[2]))[,1]
  shear[[i]]<-data.frame(top=shear.pairs$height1[i],
                         bottom=shear.pairs$height2[i],
                         top_height=magl[1],
                         bottom_height=magl[2],
                         mean_shear=round(mean(df$alpha,na.rm=T),3),
                         stringsAsFactors=F)
}
shear<-bind_rows(shear)
shear<-mutate(shear,mean_height=(top_height+bottom_height)/2)

ggplotly(
  ggplot(shear,aes(x=mean_shear,y=mean_height))+
    geom_point()+
    geom_errorbar(aes(ymin = bottom_height,ymax = top_height))+
    scale_x_continuous(limits=c(.1,.3))+
    scale_y_continuous(limits=c(0,60))+
    theme_minimal()
)

For this site, we have two possible options to chose the representative annual shear. One is to take the average of all the sensor combinations. The other is to take the top sensor combination because it appears there is a risk of the shear relaxing with height and since we’re extrapolating up to 120 m our final results will depend on our shear analysis. Maybe even better yet, we could do both and explore the sensetivity between the two:

shear_mean<-mean(shear$mean_shear)
shear_top<-shear$mean_shear[nrow(shear)]
print(paste('Mean shear: ',round(shear_mean,3),' and top shear: ',shear_top,sep=''))
## [1] "Mean shear: 0.203 and top shear: 0.166"

Reference Data Exploratory Analysis

Now we look at some of the data characteristics of our reference station data in order to ensure they are valid options for MCP. First, are the annual profiles similar? And then, are there any concerning trends in the data that would indicate inconsistencies at the reference stations? We can check this by normalizing the monthly mean wind speeds by the long-term mean for each month.

ref_means<-ref_data %>%
  mutate(month=month(`Date/Time`)) %>%
  dplyr::select(-`Date/Time`) %>%
  group_by(month) %>%
  summarise_all('mean',na.rm=T) %>%  
  gather(ref,mean_ws,-month)

ggplot(ref_means,aes(factor(month),mean_ws,color=ref,group=ref))+
  geom_line()+
theme_minimal()+
scale_color_viridis(discrete=T)

normalized_monthlies<-ref_data %>%
  gather(ref,ws,-`Date/Time`) %>%
  mutate(yrmo=as.Date(cut(`Date/Time`,'month'))) %>%
    group_by(ref,yrmo) %>%
    summarise(ws=mean(ws,na.rm=T)) %>%
  mutate(month=month(yrmo)) %>%
  left_join(ref_means,c('ref','month')) %>%
  mutate(normalized_ws=ws/mean_ws) %>%
  ungroup()

normalized_rolling<-normalized_monthlies %>%
  group_by(ref) %>%
  mutate(rolling_mean=rollmean(normalized_ws,12,fill = NA,align='left'))

ggplotly(
  ggplot(normalized_rolling,aes(yrmo,rolling_mean,color=ref))+
    geom_line()+
    theme_minimal()+
    scale_y_continuous(labels=scales::percent)+
    scale_color_viridis(discrete=T)
)

Again, we can use the legend to turn on and off the normalized monthly rolling average traces. This is a highly subjective test but it would appear reference stations 9 and 10 display inconsistent behavior. If you turn them off you can see the monthly anomalies become much tighter.

Long-term analysis

MCP

We are now ready to correlate our site to our reference stations and estimate our long-term mean wind speed at our measurement height. We’ll then apply our alpha value(s) to our long-term mean wind speed to get our long-term hub-height mean wind speed at our mast locations. We’re making good progress.

There are many correlation methods available, for simplicity we’ll perform a daily correlation binned by month between our site and each reference station. First we’ll resample our mast data to daily. We’ll then concatnate our site and reference data and calculate linear correlations.

ref_data_daily<-ref_data %>%
  gather(ref,ws,-`Date/Time`) %>%
mutate(day=as.Date(cut(`Date/Time`,'day'))) %>%
group_by(ref,day) %>%
summarise(ref_ws=mean(ws,na.rm=T))

mast_data_daily<-mast_data %>%
mutate(day=as.Date(cut(`Stamp`,'day'))) %>%
group_by(day) %>%
summarise(site_ws=mean(primary_ano,na.rm=T))

joined_daily<-inner_join(ref_data_daily,mast_data_daily,'day') %>%
mutate(month=month(day))

ggplot(joined_daily,aes(site_ws,ref_ws,color=month,group=ref))+
geom_point(size=.6)+
geom_smooth(method='lm',color='red')+
facet_wrap(~ref,scales='free')+
theme_minimal()+
scale_color_viridis()

Now we can cycle through our reference stations to get the correlation results while leaving out Referene Stations 9 and 10 because of the suspected inconsistency discussed above. Here is an example of a daily correlation binned by month between our site and Reference Station 1

monthly_fits<-joined_daily %>%
filter(!(ref %in% c('9','10'))) %>%
group_by(ref,month) %>%
do(model = lm(site_ws ~ ref_ws, data = .))
fit_stats<-glance(monthly_fits,model)
head(fit_stats,12)

Synthesize Data

Now that we have the measure and correlate steps complete, we need to apply our correaltions to our reference data to predict our site wind speeds. We’ll do this by applying the monthly correlations to our data and then splicing with our measured data so that for any given day we take the measured data over the synthesized data. Alternatively, we could apply our correlations to the monthly mean reference data but this would de-prioritize measure site data and isn’t recommended.

# include month index
ref_data_daily<-mutate(ref_data_daily,month=month(day))

# since this is a simple linear model with two coefficients, we can just map those coefficients to the data rather than using a 'predict()' function
fit_coef<-tidy(monthly_fits,model) %>% 
  dplyr::select(ref,month,term,estimate) %>% 
  spread(term,estimate) %>%
  rename(slope=ref_ws)

# join the model coefficients to the reference station wind speed and calculate a synthesized wind speed
ref_data_daily<-ref_data_daily %>%
  filter(!(ref %in% c('9','10'))) %>%
           left_join(fit_coef,c('ref','month')) %>%
  mutate(synth_ws=ref_ws*slope+`(Intercept)`)

There are a bunch of ways we could combine these predictions but for the sake of simplicity and for demonstration purposes we’ll just take the simple mean of all the predictions.

long_term_predictions<-ref_data_daily %>%
  group_by(day) %>%
  summarise(synth_ws=mean(synth_ws,na.rm=T)) %>%
  filter(!(day %in% mast_data_daily$day))

mast_data_daily<-mutate(mast_data_daily,synth_ws=site_ws)
long_term_predictions<-bind_rows(long_term_predictions,dplyr::select(mast_data_daily,day,synth_ws))

annualized = long_term_predictions %>%
  mutate(month=month(day),
         daysInMonth=days_in_month(day)) %>%
  group_by(month) %>%
  summarise(daysInMonth=mean(daysInMonth),
            synth_ws=mean(synth_ws,na.rm=T))
lt_mws<-weighted.mean(annualized$synth_ws,annualized$daysInMonth)

print(paste('Long-term mean wind speed at 59 m: ',round(lt_mws,3),' m/s',sep=''))
## [1] "Long-term mean wind speed at 59 m: 7.844 m/s"

Shear to Hub Height

Finally, if we apply our alpha values to our long-term mean wind speed at measurement height we calculate our long-term hub-height mean wind speed at the mast location.

hub_height = 120 #m
meas_height = 59 #m
lt_hh_mws_top_alpha = lt_mws * (hub_height/meas_height)^shear_top
lt_hh_mws_mean_alpha = lt_mws * (hub_height/meas_height)^shear_mean
print(paste('Long-term hub-height mean wind speed using top sensor combination for shear: ',round(lt_hh_mws_top_alpha,3),' m/s',sep=''))
## [1] "Long-term hub-height mean wind speed using top sensor combination for shear: 8.825 m/s"
print(paste('Long-term hub-height mean wind speed using mean of all sensor combinations for shear: ',round(lt_hh_mws_mean_alpha,3),' m/s',sep=''))
## [1] "Long-term hub-height mean wind speed using mean of all sensor combinations for shear: 9.061 m/s"

Now we have our long-term hub height mean wind speeds. There seems to be about a 2.6% difference in our wind speeds depending on the shear methodology we choose. Not to mention these are different than Cory’s results using mostly the same methods. This is helpful information to know and can be passed along to the finaincal modeling team as a stress case or potential upside. This can also help inform the uncertainty analysis which isn’t covered in this notebook

Frequency distribution analysis

Finally, we’ve reached the last step in our typical assessment where we need to define the shape of our wind speed distribution. We’ll then scale this to our long-term hub-height mean wind speed and produce a .tab file as our input to our wind flow model. We again have multiple methods we could employ. For demonstration purposes we’ll take a simple approach of applying the top sensor combination mean shear across the board to the top level sensor data to predict the wind speeds at hub height for a given year of data (2015). We’ll then apply a linear scaling to the long-term mean and calculate the weibull parameters from this data. It’s simplistic to be sure, but not terrible for an example.

# apply shear
weibull_data<-mast_data %>%
  filter(year(Stamp)==2015) %>%
  mutate(hh_spd=primary_ano* (hub_height/meas_height)^shear_top)

# calculate mean of monthly means
momm_weibull<-weibull_data %>%
  mutate(month=month(Stamp),daysInMonth=days_in_month(Stamp)) %>%
  group_by(month) %>%
  summarise(hh_spd=mean(hh_spd,na.rm=T),
            daysInMonth=mean(daysInMonth))
momm_weibull<-weighted.mean(momm_weibull$hh_spd,momm_weibull$daysInMonth)

# apply scaling
weibull_data<-mutate(weibull_data,lt_hh_ws=hh_spd*(lt_hh_mws_top_alpha/momm_weibull))

Here’s the resulting wind distribution

weibull_fit<-fitdistr(weibull_data$lt_hh_ws[!is.na(weibull_data$lt_hh_ws)],'weibull')
x<-seq(0,20,.01)
weibull_density<-tibble(x,y=dweibull(x = x,shape = weibull_fit$estimate[1],scale = weibull_fit$estimate[2]))
ggplot(weibull_data,aes(lt_hh_ws))+
  geom_histogram(aes(y=..density..),bins=30,color='white')+
  geom_line(data=weibull_density,aes(x=x,y=y),color='red')+
  theme_minimal()+
  labs(title='Long-term hub-height wind speed frequency distribution',
       x='Wind speed [m/s]',
       y='Frequency')

Alright, finally, we have our desired output. Namely, a long-term hub_height wind speed frequency distribution scaled to our hub-height mean wind speed. Great. We can now output this as a .tab file and use it to initialize our wind flow model. Note, it’s been like 7 years since I’ve created a tab file so this is probably close to the write format, but see Cory’s post for more details.

tab<-weibull_data %>%
  mutate(spd_bin=round(lt_hh_ws)) %>%
  filter(!is.na(spd_bin) & !is.na(dir_bin)) %>%
  group_by(spd_bin,dir_bin) %>%
  tally() %>%
  spread(dir_bin,n,fill = 0)
tab

Hope you found this useful.

comments powered by Disqus