Motivation
The goal here is to show how to query, analyze and visualize ERA-Interim meteorological data in NetCDF format using tidy principles in R.
Load Libraries
library(tidyverse)
library(reticulate)
library(ncdf4)
library(ggthemes)
library(viridis)
Query ECMWF
To do this I’m using the ecmwfapi python package through reticulate. The API take a python dictionary object so we’ll create a list in R and convert it to it’s python equivalent. The downloaded file is specfiiced as NetCDF. For this example, we’ll download 6-hourly wind vector components, air temperature and sea level pressure for the USA for the month of March 2018.
ecmwf<-import('ecmwfapi')
#### Download the Data ####
server = ecmwf$ECMWFDataServer()
query<-r_to_py(list(
class='ei',
dataset= "interim",
date= "2018-03-01/to/2018-03-31",
expver= "1",
grid= "0.75/0.75",
levtype="sfc",
param= "151.128/165.128/166.128/167.128", # sea level pressure, 2m temperature, 10m u wind component, 10m v wind component
area="50/-130/25/-65", #N/W/S/E, get USA bounding box
step= "0",
stream="oper",
time="00:00:00/06:00:00/12:00:00/18:00:00",
type="an",
format= "netcdf",
target='file.nc'
))
server$retrieve(query)
Read in the data
open the file
myPath='../../data/ERAI/' # you will edit this
ncin<-nc_open(file.path(myPath,'file.nc'))
get coordinates
lat=ncvar_get(ncin,'latitude')
lon=ncvar_get(ncin,'longitude')
convert time
Time is entered as an integer with an origin. To convert it to a time-aware field in R, we’ll need to extract that origin and convert the integers.
t<- ncvar_get(ncin, "time")
tunits<-ncatt_get(ncin,'time')
tunits$units
## [1] "hours since 1900-01-01 00:00:0.0"
tustr<- strsplit(tunits$units, " ")
timestamp = as.POSIXct(t*3600,tz='GMT',origin=tustr[[1]][3])
import data into dataframe
The NetCDF format in this case contains geospatial data from 4 fields in three dimensions (latitude, longitude and time). The data fields also contain useful metadata. Let’s importing the data alongside the metadata.
data<-data_frame(name=attributes(ncin$var)$names) %>%
bind_cols(map_df(.$name,ncatt_get,nc=ncin)) %>%
mutate(values=map(name,ncvar_get,nc=ncin))
nc_close(ncin)
data
Create Processed Metrics
We can now operate on different data fields and append the post processed metrics as additional fields in the data frame. Here’s an example of calculating the scaler wind speed from it’s vector components.
newMetric<-data_frame(name='ws10',units='m s**-1',long_name='Wind Speed',
values=list(sqrt(data$values[data$name=='u10'][[1]]^2+data$values[data$name=='v10'][[1]]^2)))
data<-bind_rows(data,newMetric)
data
Tidy Timeseries
There’s lots of efficient ways to manipulate the NetCDF data for analysis, but for this exmaple we’ll convert the multidimensional data into a tidy 2-D data frame. To do this, first, create an empty data.frame with lat,lon and time index. Order is important to match dimensions of netcdf list
df<-expand.grid(lon=lon-360,lat=lat,timestamp=timestamp,name=data$name) %>%
mutate(coord=factor(paste(lon,lat,'/')))
Now, pull the data fields, simplify to two-dimensions and append to the empty dataframe.
df<-data %>%
pull(values) %>%
unlist() %>%
as_tibble() %>%
bind_cols(df)
head(df)
show plot
By converting the data to a 2-D data frame we can now leverage the power of the tidyverse functions to manipulate and visualize the data. For example, here’s a timeseries plot of the data for a subset of coordinates.
# plot timeseries
df %>%
filter(between(lat,42,45) & between(lon,-70,-69)) %>%
left_join(select(data,name,long_name),'name') %>%
ggplot(aes(timestamp,value,color=coord))+
geom_line()+
facet_wrap(~long_name,ncol=1,scales='free')+
theme_minimal()+
theme(legend.position = 'bottom')+
scale_color_viridis(discrete = T)
Aggregate and Geospatial Plot
We can also aggregate and visualize the 2-D data in ggplot2.
Temperature
mean_temp<-df %>%
filter(name=='t2m') %>%
group_by(lat,lon) %>%
summarise(value=mean(value))
ggplot(mean_temp,aes(lon,lat))+
geom_raster(aes(fill=value),interpolate = T,alpha=.75)+
geom_polygon(data=map_data('state'),aes(x=long,y=lat,group=group),color='black',fill=NA)+
theme_map()+
scale_fill_viridis(option = 'A')+
labs(title='Mean 2m Temperature')
Wind Speed
In this example, we can re-calculate the scaler wind speed from the aggregated vector components to show the average wind speed at 10m across the world for the month of March 2018.
mean_ws<-df %>%
filter(name %in% c('u10','v10')) %>%
group_by(lat,lon,name) %>%
summarise(value=mean(value)) %>%
spread(name,value) %>%
mutate(ws=sqrt(u10^2+v10^2))
ggplot(mean_ws,aes(lon,lat))+
geom_raster(aes(fill=ws),interpolate = T,alpha=.75)+
geom_polygon(data=map_data('state'),aes(x=long,y=lat,group=group),color='black',fill=NA)+
theme_map()+
scale_fill_viridis()+
labs(title='Mean 10m Wind Speed')
Hope you found this useful. Let me know what you think!
Share this post
Twitter
Google+
Facebook
Reddit
LinkedIn
StumbleUpon
Pinterest
Email