Stephen Lightfoote


Introduction

The purpose of this post is to illustrate an example method for assessing changes in wind turbine performance following the installation of performance modifications. In this case we're looking at vortex generators (VGs), which are typically aftermarket hardware installed along wind turbine blades and are purported to improve blade performance by reducing flow separation, thereby improving lift and increasing power efficiency.

In this particular instance, VGs were installed on four wind turbines (target turbines) and no modifications were made to neighboring turbines (source turbines). Because the target turbines are not located downwind of an IEC-compliant permanent met mast, a useful way to assess the impact of the VGs is to develop a model that uses source turbine performance to predict target turbine performance. The model is trained on the historical relationship between the target and source turbines (training period), and the performance change due to the VGs is then estimated as the % difference between the actual and predicted production of the target turbine for a period after installation (test period).
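In other words, the estimate for each target turbine reduces to a simple ratio over the test period. A minimal sketch of the metric (the function and argument names here are illustrative, not taken from the analysis below):

# sketch: % performance change as the ratio of actual to predicted energy, minus 1
pct_change <- function(actual_kwh, predicted_kwh){
  sum(actual_kwh, na.rm = TRUE) / sum(predicted_kwh, na.rm = TRUE) - 1
}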

The method and results are depicted in the example below using R.

Test Setup

library(tidyverse)
library(mgcv)
library(lubridate)
library(knitr) # needed for kable() below

# test parameters
targets<-c(80,86,126,128)
sources<-c(81,87,127,127)
start_before_period <- as.Date('2013-10-01')
end_before_period <- as.Date('2014-10-01')
start_after_period <- as.Date('2014-11-01')
end_after_period <- as.Date('2015-11-01')

# assign target / source labels to the 10-minute SCADA data
# (trainingData covers the period before VG installation, testData the period after)
trainingData<-mutate(trainingData,type=case_when(wtg %in% targets ~ 'target',wtg %in% sources ~'source'))
testData<-mutate(testData,type=case_when(wtg %in% targets ~ 'target',wtg %in% sources ~'source'))
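Note that trainingData and testData (the 10-minute SCADA records for the periods before and after installation, respectively) are assumed to have been loaded before the snippet above. A minimal sketch of how they might be built from a single SCADA extract (the file name and read step are hypothetical):

# hypothetical load of a 10-minute SCADA extract with the columns used below
scada <- read_csv('scada_10min.csv') # assumed columns: SCADA_TStamp, wtg, kW_Mean, NacOrientationMean_degs
trainingData <- filter(scada, as.Date(SCADA_TStamp) >= start_before_period,
                       as.Date(SCADA_TStamp) < end_before_period)
testData <- filter(scada, as.Date(SCADA_TStamp) >= start_after_period,
                   as.Date(SCADA_TStamp) < end_after_period)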

Train the GAM Model

In this instance we used a Generalized Additive Model (GAM) from the mgcv package, where the inputs used to predict the target turbine production are the source turbine production and the nacelle positions of the target and source turbines (to account for the impact of wake effects on the available resource at either turbine).
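The nacelle position terms use a cyclic cubic spline basis (bs = 'cc'), so the smooth wraps around at 0/360 degrees. A quick toy check on synthetic data (not the project data) illustrates the idea:

# toy illustration: a cyclic spline treats 0 and 360 degrees as the same point,
# which suits direction variables such as nacelle position
set.seed(1)
toy <- data.frame(dir = runif(500, 0, 360))
toy$y <- sin(toy$dir * pi / 180) + rnorm(500, sd = 0.1)
toy.gam <- gam(y ~ s(dir, bs = 'cc', k = 36), data = toy, knots = list(dir = c(0, 360)))
plot(toy.gam) # the fitted smooth meets itself at the 0/360 boundary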

models<-list()
train<-list()
for(i in 1:length(targets)){
  # format the training Data
  training.kw<-trainingData %>% 
    filter(wtg %in% c(targets[i],sources[i])) %>%
    dplyr::select(type,SCADA_TStamp,kW_Mean) %>% 
    distinct(SCADA_TStamp,type,.keep_all = TRUE) %>% 
    spread(type,kW_Mean)
  names(training.kw)<-c('SCADA_TStamp',paste(names(training.kw[,2:ncol(training.kw)]),'.kW_Mean',sep=''))
  
  training.nacpos<-trainingData %>% 
    filter(wtg %in% c(targets[i],sources[i])) %>%
    dplyr::select(type,SCADA_TStamp,NacOrientationMean_degs) %>% 
    distinct(SCADA_TStamp,type,.keep_all = TRUE) %>% 
    spread(type,NacOrientationMean_degs)
  names(training.nacpos)<-c('SCADA_TStamp',paste(names(training.nacpos[,2:ncol(training.nacpos)]),'.NacOrientationMean_degs',sep=''))
  
  train[[i]]<-left_join(training.kw,training.nacpos,'SCADA_TStamp')
  
  # train model
  models[[i]]<-gam(data = train[[i]],
                   formula = target.kW_Mean ~ s(source.kW_Mean) +
                     s(target.NacOrientationMean_degs, bs = 'cc', k = 36) +
                     s(source.NacOrientationMean_degs, bs = 'cc', k = 36))
  
  # predict performance in the training period (for reference)
  train[[i]]$predicted_KW<-predict.gam(object = models[[i]],newdata = train[[i]])
  train[[i]]$wtg<-targets[i]
  # show plot
  plot.gam(models[[i]],pages = 1)
  #show summary
  print(summary(models[[i]]))
  
}

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## target.kW_Mean ~ s(source.kW_Mean) + s(target.NacOrientationMean_degs, 
##     bs = "cc", k = 36) + s(source.NacOrientationMean_degs, bs = "cc", 
##     k = 36)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 423.3468     0.3005    1409   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                      edf Ref.df         F p-value    
## s(source.kW_Mean)                  7.649  8.537 104796.53  <2e-16 ***
## s(target.NacOrientationMean_degs) 31.089 34.000     20.12  <2e-16 ***
## s(source.NacOrientationMean_degs) 32.516 34.000     48.44  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.962   Deviance explained = 96.3%
## GCV = 3625.7  Scale est. = 3619.2    n = 40087

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## target.kW_Mean ~ s(source.kW_Mean) + s(target.NacOrientationMean_degs, 
##     bs = "cc", k = 36) + s(source.NacOrientationMean_degs, bs = "cc", 
##     k = 36)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 422.6513     0.2676    1580   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                      edf Ref.df         F p-value    
## s(source.kW_Mean)                  7.167  8.198 1.328e+05  <2e-16 ***
## s(target.NacOrientationMean_degs) 33.238 34.000 4.755e+01  <2e-16 ***
## s(source.NacOrientationMean_degs) 28.044 34.000 3.872e+00  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.97   Deviance explained =   97%
## GCV =   2874  Scale est. = 2869      n = 40071

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## target.kW_Mean ~ s(source.kW_Mean) + s(target.NacOrientationMean_degs, 
##     bs = "cc", k = 36) + s(source.NacOrientationMean_degs, bs = "cc", 
##     k = 36)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 405.2291     0.2569    1578   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                      edf Ref.df         F p-value    
## s(source.kW_Mean)                  5.576  6.713 1.692e+05  <2e-16 ***
## s(target.NacOrientationMean_degs) 32.123 34.000 6.436e+00  <2e-16 ***
## s(source.NacOrientationMean_degs) 29.611 34.000 1.026e+01  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.971   Deviance explained = 97.1%
## GCV = 2614.1  Scale est. = 2609.6    n = 39549

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## target.kW_Mean ~ s(source.kW_Mean) + s(target.NacOrientationMean_degs, 
##     bs = "cc", k = 36) + s(source.NacOrientationMean_degs, bs = "cc", 
##     k = 36)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 403.1133     0.2702    1492   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                     edf Ref.df         F p-value    
## s(source.kW_Mean)                  7.47  8.419 1.231e+05  <2e-16 ***
## s(target.NacOrientationMean_degs) 29.73 34.000 7.621e+00  <2e-16 ***
## s(source.NacOrientationMean_degs) 32.25 34.000 2.953e+01  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.968   Deviance explained = 96.8%
## GCV = 2887.1  Scale est. = 2882      n = 39468
train<-bind_rows(train)
trainingData<-left_join(trainingData,select(train,SCADA_TStamp,wtg,predicted_KW),c('SCADA_TStamp','wtg'))
trainingData$period<-'Training'

Okay, the models look pretty good (adjusted R-squared of roughly 0.96-0.97 for all four turbines). Now, let's test!
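Before moving on, it is worth sanity-checking the fits. A quick sketch using mgcv's built-in diagnostics (illustrative only, not part of the original analysis):

# residual and basis-dimension diagnostics for the first target turbine's model
gam.check(models[[1]])
# training-period bias check: the overall actual/predicted ratio should sit very close to 1
with(filter(train, wtg == targets[1]),
     sum(target.kW_Mean, na.rm = TRUE) / sum(predicted_KW, na.rm = TRUE))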

Predict Performance for the Test Period

Loop through each target turbine and predict the production during the test period using the models developed during the training period.

test<-list()
for(i in 1:length(targets)){
  #format the test data for the model  
  test.kw<-testData %>% 
    filter(wtg %in% c(targets[i],sources[i])) %>%
    dplyr::select(type,SCADA_TStamp,kW_Mean) %>% 
    distinct(SCADA_TStamp,type,.keep_all = TRUE) %>% 
    spread(type,kW_Mean)
  names(test.kw)<-c('SCADA_TStamp',paste(names(test.kw[,2:ncol(test.kw)]),'.kW_Mean',sep=''))
  
  test.nacpos<-testData %>% 
    filter(wtg %in% c(targets[i],sources[i])) %>%
    dplyr::select(type,SCADA_TStamp,NacOrientationMean_degs) %>% 
    distinct(SCADA_TStamp,type,.keep_all = TRUE) %>% 
    spread(type,NacOrientationMean_degs)
  names(test.nacpos)<-c('SCADA_TStamp',paste(names(test.nacpos[,2:ncol(test.nacpos)]),'.NacOrientationMean_degs',sep=''))

  test[[i]]<-left_join(test.kw,test.nacpos,'SCADA_TStamp')

  # predict
  test[[i]]$predicted_KW<-predict.gam(object = models[[i]],newdata = test[[i]])
  test[[i]]$wtg<-targets[i]
}

# merge predictions back into testData data.frame
test<-bind_rows(test)  
testData<-left_join(testData,select(test,SCADA_TStamp,wtg,predicted_KW),c('SCADA_TStamp','wtg'))
testData$period<-'Test'

Okay, we now have our predictions, so do the target turbines show any change in performance?

Calculate % Deviation in Performance during the Test Period

In this instance, all four target turbines showed an increase in performance during the test period, presumably due to the installation of the VGs. While the gains are not identical across turbines, they are consistently positive and large enough to represent a meaningful impact.

#calculate results
Results<- bind_rows(trainingData,testData) %>%
  filter(wtg %in% targets & !is.na(predicted_KW)& !is.na(kW_Mean)) %>%
  mutate(wtg=factor(wtg)) %>% 
  group_by(period,wtg) %>%
  arrange(SCADA_TStamp) %>% 
  # divide the 10-minute mean kW values by 6 to convert to kWh
  mutate(cum_Actual_KWh=cumsum(kW_Mean)/6,
         cum_Predicted_KWh=cumsum(predicted_KW)/6,
         cum_pct_actual_div_predicted=cum_Actual_KWh/cum_Predicted_KWh)

#output table
test.parameters<-data.frame(wtg=targets,source=sources)
out.table<-testData %>%
  filter(wtg %in% targets & !is.na(predicted_KW)& !is.na(kW_Mean) & period=='Test') %>%
  group_by(wtg) %>%
  summarise(Actual_KWh=sum(kW_Mean)/6,
            Predicted_KWh=sum(predicted_KW)/6,
            actual_dv_predicted=Actual_KWh/Predicted_KWh) %>%
  left_join(test.parameters,'wtg')

kable(out.table)
| wtg| Actual_KWh| Predicted_KWh| actual_dv_predicted| source|
|---:|----------:|-------------:|-------------------:|------:|
|  80|    2048532|       1996685|            1.025966|     81|
|  86|    2095334|       2072429|            1.011052|     87|
| 126|    1975732|       1915400|            1.031498|    127|
| 128|    1987114|       1928578|            1.030352|    127|
#show mean
mean(out.table$actual_dv_predicted)
## [1] 1.024717

So that is useful information. But how accurate are those predictions? Are these conclusions robust?

One useful way to look at this is a 'convergence' plot, which tracks the cumulative ratio between actual and predicted production over time. For the training period, the ratio should be 100%, since the goal of the model is to accurately predict the target turbine production. But how long does it take for the model to converge to 100%? Ideally the ratio would converge to 100% relatively quickly and would not waver up and down, indicating that the training period is of sufficient length and that the training data itself is not subject to any large unexplained changes in either target or source turbine efficiency.

The same idea applies to the test period: ideally the ratio between actual and predicted production of the target turbines converges to some new value and holds steady for long enough that the new ratio can be taken as indicative of the performance change. How does our test stack up?

# show convergence plot
ggplot(Results,aes(as.Date(SCADA_TStamp),cum_pct_actual_div_predicted,color=wtg))+
  annotate(geom = 'rect',xmin = start_before_period,xmax=end_before_period,ymin=.9,ymax=1.1,alpha=.2,fill='blue',color='black')+
  annotate(geom = 'rect',xmin = start_after_period,xmax=end_after_period,ymin=.9,ymax=1.1,alpha=.2,fill='red',color='black')+
  geom_hline(yintercept = 1,lty=2,lwd=1.2)+
  geom_point()+
  labs(x='Date',y='Actual / Predicted %',title='% Change in Performance due to VG\'s',subtitle='Training Period in Blue, Test Period in Red')+
  theme_bw()+
  scale_x_date(date_breaks='1 months',date_labels='%m-%Y')+
  theme(axis.text.x=element_text(angle=90))+
  scale_y_continuous(labels=scales::percent,limits=c(.9,1.1),breaks=seq(.9,1.1,.01))

From this plot it seems clear that the model training and test periods were of sufficient length, with the possible exception of WTG-86, which appears to show a decreasing trend in performance over time.
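One way to follow up on the WTG-86 behavior (a sketch, not part of the original analysis) is to compute the actual/predicted ratio by calendar month during the test period and look for a downward drift:

# monthly actual/predicted ratio for WTG-86 during the test period
testData %>%
  filter(wtg == 86, !is.na(predicted_KW), !is.na(kW_Mean)) %>%
  group_by(month = floor_date(as.Date(SCADA_TStamp), 'month')) %>%
  summarise(actual_div_predicted = sum(kW_Mean) / sum(predicted_KW)) %>%
  arrange(month)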

See you next time!
