Stephen Lightfoote

3 minute read

Motivation

The goal here is to show how to query, analyze and visualize ERA-Interim meteorological data in NetCDF format using tidy principles in R.

Load Libraries

library(tidyverse)
library(reticulate)
library(ncdf4)
library(ggthemes)
library(viridis)

Query ECMWF

To do this I’m using the ecmwfapi python package through reticulate. The API take a python dictionary object so we’ll create a list in R and convert it to it’s python equivalent. The downloaded file is specfiiced as NetCDF. For this example, we’ll download 6-hourly wind vector components, air temperature and sea level pressure for the USA for the month of March 2018.

ecmwf<-import('ecmwfapi')
#### Download the Data ####
server = ecmwf$ECMWFDataServer()
query<-r_to_py(list(
  class='ei',
  dataset= "interim",
  date= "2018-03-01/to/2018-03-31",
  expver= "1",
  grid= "0.75/0.75",
  levtype="sfc",
  param= "151.128/165.128/166.128/167.128", # sea level pressure, 2m temperature, 10m u wind component, 10m v wind component
  area="50/-130/25/-65", #N/W/S/E, get USA bounding box
  step= "0",
  stream="oper",
  time="00:00:00/06:00:00/12:00:00/18:00:00",
  type="an",
  format= "netcdf",
  target='file.nc'
))
server$retrieve(query)

Read in the data

open the file

myPath='../../data/ERAI/' # you will edit this
ncin<-nc_open(file.path(myPath,'file.nc'))

get coordinates

lat=ncvar_get(ncin,'latitude')
lon=ncvar_get(ncin,'longitude')

convert time

Time is entered as an integer with an origin. To convert it to a time-aware field in R, we’ll need to extract that origin and convert the integers.

t<- ncvar_get(ncin, "time")
tunits<-ncatt_get(ncin,'time')
tunits$units
## [1] "hours since 1900-01-01 00:00:0.0"
tustr<- strsplit(tunits$units, " ")
timestamp = as.POSIXct(t*3600,tz='GMT',origin=tustr[[1]][3])

import data into dataframe

The NetCDF format in this case contains geospatial data from 4 fields in three dimensions (latitude, longitude and time). The data fields also contain useful metadata. Let’s importing the data alongside the metadata.

data<-data_frame(name=attributes(ncin$var)$names) %>%
  bind_cols(map_df(.$name,ncatt_get,nc=ncin)) %>%
  mutate(values=map(name,ncvar_get,nc=ncin))
nc_close(ncin)
data

Create Processed Metrics

We can now operate on different data fields and append the post processed metrics as additional fields in the data frame. Here’s an example of calculating the scaler wind speed from it’s vector components.

newMetric<-data_frame(name='ws10',units='m s**-1',long_name='Wind Speed',
                      values=list(sqrt(data$values[data$name=='u10'][[1]]^2+data$values[data$name=='v10'][[1]]^2)))
data<-bind_rows(data,newMetric)
data

Tidy Timeseries

There’s lots of efficient ways to manipulate the NetCDF data for analysis, but for this exmaple we’ll convert the multidimensional data into a tidy 2-D data frame. To do this, first, create an empty data.frame with lat,lon and time index. Order is important to match dimensions of netcdf list

df<-expand.grid(lon=lon-360,lat=lat,timestamp=timestamp,name=data$name) %>%
  mutate(coord=factor(paste(lon,lat,'/')))

Now, pull the data fields, simplify to two-dimensions and append to the empty dataframe.

df<-data %>%
  pull(values) %>%
  unlist() %>%
  as_tibble() %>%
  bind_cols(df)

head(df)

show plot

By converting the data to a 2-D data frame we can now leverage the power of the tidyverse functions to manipulate and visualize the data. For example, here’s a timeseries plot of the data for a subset of coordinates.

# plot timeseries
df %>%
  filter(between(lat,42,45) & between(lon,-70,-69)) %>%
  left_join(select(data,name,long_name),'name') %>%
  ggplot(aes(timestamp,value,color=coord))+
  geom_line()+
  facet_wrap(~long_name,ncol=1,scales='free')+
  theme_minimal()+
  theme(legend.position = 'bottom')+
  scale_color_viridis(discrete = T)

Aggregate and Geospatial Plot

We can also aggregate and visualize the 2-D data in ggplot2.

Temperature

mean_temp<-df %>%
  filter(name=='t2m') %>%
  group_by(lat,lon) %>%
  summarise(value=mean(value))

  ggplot(mean_temp,aes(lon,lat))+
  geom_raster(aes(fill=value),interpolate = T,alpha=.75)+
  geom_polygon(data=map_data('state'),aes(x=long,y=lat,group=group),color='black',fill=NA)+
  theme_map()+
  scale_fill_viridis(option = 'A')+
  labs(title='Mean 2m Temperature')

Wind Speed

In this example, we can re-calculate the scaler wind speed from the aggregated vector components to show the average wind speed at 10m across the world for the month of March 2018.

mean_ws<-df %>%
  filter(name %in% c('u10','v10')) %>%
  group_by(lat,lon,name) %>%
  summarise(value=mean(value)) %>%
  spread(name,value) %>%
  mutate(ws=sqrt(u10^2+v10^2))

ggplot(mean_ws,aes(lon,lat))+
  geom_raster(aes(fill=ws),interpolate = T,alpha=.75)+
  geom_polygon(data=map_data('state'),aes(x=long,y=lat,group=group),color='black',fill=NA)+
  theme_map()+
  scale_fill_viridis()+
  labs(title='Mean 10m Wind Speed')

Hope you found this useful. Let me know what you think!

comments powered by Disqus