Snow Modeling

Contributors: Melissa Wrzesien1,2, Brendan McAndrew1,3, Jian Li4,5, Caleb Spradlin4,6

1 Hydrological Sciences Lab, NASA Goddard Space Flight Center, 2 University of Maryland, 3 Science Systems and Applications, Inc., 4 InuTeq, LLC, 5 Computational and Information Sciences and Technology Office (CISTO), NASA Goddard Space Flight Center, 6 High Performance Computing, NASA Goddard Space Flight Center

Learning Objectives

  • Learn about the role of modeling in a satellite mission

  • Open, explore, and visualize gridded model output

  • Compare model output to raster- and point-based observation datasets

The goal of a future snow satellite mission is to improve our understanding of global snow mass beyond what current estimates provide. However, a single sensor likely won't be able to accurately observe all types of snow in all conditions across the entire globe. Instead, we need to combine future snow observations with other tools, including models and observations from currently available satellites.

Models can also help extend snow estimates into areas where observations are not available, whether data are missing due to orbit gaps or masked out in regions of higher uncertainty. Remote sensing observations and models can be combined through data assimilation, a process in which observations are used to constrain model estimates.

../../_images/DAflowchart_v4.png

The figure above shows a conceptual example of how satellite observations with orbit gaps could be combined with a model to produce results with no missing data. (Figure modified from diagram provided by Eunsang Cho)
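At the heart of most assimilation schemes is an update step that weights the model estimate and the observation by their respective uncertainties. Below is a minimal, hypothetical sketch of a scalar Kalman-style update with made-up numbers; it is for intuition only and is not the algorithm LIS uses.

# hypothetical example: blend a model SWE estimate with an observation,
# weighting each by its error variance (a scalar Kalman-style update)
model_swe, model_var = 0.35, 0.04  # model estimate [m] and error variance
obs_swe, obs_var = 0.50, 0.01      # observation [m] and error variance

# gain: how much to trust the observation relative to the model
gain = model_var / (model_var + obs_var)

# analysis: the updated estimate falls between model and observation
analysis_swe = model_swe + gain * (obs_swe - model_swe)
print(f'updated SWE estimate: {analysis_swe:.3f} m')  # 0.470, closer to the obs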

What is LIS?

Since models will likely play a role in processing observations from a snow satellite mission, it is important to become comfortable working with gridded data products. Today we will use sample model output from NASA’s Land Information System (LIS).

The Land Information System is a software framework for land surface modeling and data assimilation developed with the goal of integrating satellite and ground-based observational data products with advanced modeling techniques to produce estimates of land surface states and fluxes.

TL;DR LIS = land surface models + data assimilation

../../_images/LIS_schematic.png

One key feature LIS provides is flexibility to meet user needs. LIS consists of a collection of plug-ins, or modules, that allow users to design simulations with their choice of land surface model, meteorological forcing, data assimilation, hydrological routing, land surface parameters, and more. The plug-in based design also provides extensibility, making it easier to add new functionality to the LIS framework.

Current efforts to expand support for snow modeling include implementation of snow modules, such as SnowModel and Crocus, into the LIS framework. These models, when run at the scale of ~100 meters, will enable simulation of wind redistribution, snow grain size, and other important processes for more accurate snow modeling.

Development of LIS is led by the Hydrological Sciences Laboratory at NASA’s Goddard Space Flight Center. (Figure above provided by the LIS team)

Working with Modeled Output

Here we will use sample output from a LIS-SnowModel simulation over a western Colorado domain. The simulation writes daily estimates of SWE, snow depth, and snow density at a spatial resolution of 100 m. We've provided four years of output, though here we'll mostly use output from water year 2020.

Below, we’ll walk through how to open and interact with the LIS-SnowModel output. Our main objectives are to demonstrate how to do the following with a gridded dataset like LIS-SnowModel:

  • Understand the attributes of a model data file and its available variables

  • Create a spatial map of model output

  • Create time series at a point and averaged across the domain

  • Compare LIS-SnowModel to raster and point data

  • Create a visualization for quick evaluation of model estimates

Import Libraries

# interface to Amazon S3 filesystem
import s3fs

# interact with n-d arrays
import numpy as np
import xarray as xr

# interact with tabular data (incl. spatial)
import pandas as pd
import geopandas as gpd

# interactive plots
import holoviews as hv
import geoviews as gv
import hvplot.pandas
import hvplot.xarray
# set bokeh as the plotting backend for holoviews and geoviews
#hv.extension('bokeh')
#gv.extension('bokeh')

# used to find nearest grid cell to a given location
from scipy.spatial import distance

import fsspec
from datetime import datetime as dt

from geoviews import opts
from geoviews import tile_sources as gvts
from datashader.colors import viridis
import datashader
from holoviews.operation.datashader import datashade, shade, dynspread, spread, rasterize
from holoviews.streams import Selection1D, Params
import panel as pn
import param as pm
import warnings
import holoviews.operation.datashader as hd
warnings.filterwarnings("ignore")

pn.extension()
pn.param.ParamMethod.loading_indicator = True

Load the LIS-SnowModel Output

The xarray library makes working with labelled n-dimensional arrays easy and efficient. If you're familiar with the pandas library, xarray should feel quite similar.

Here we load the LIS-SnowModel output into an xarray.Dataset object:

# create S3 filesystem object
s3 = s3fs.S3FileSystem(anon=False)

# define the name of our S3 bucket
bucket_name = 'eis-dh-hydro/SNOWEX-HACKWEEK'

# define paths to the two Zarr stores on S3
lis_output_s3_path_time_chunk = f's3://{bucket_name}/2022/ZARR/SURFACEMODEL/LIS_HIST_rechunkedV4.d01.zarr'
lis_output_s3_path = f's3://{bucket_name}/2022/ZARR/SURFACEMODEL/LIS_HIST_default_chunks.d01.zarr/'

# create key-value mapper for S3 object (required to read data stored on S3)
lis_output_mapper = s3.get_mapper(lis_output_s3_path)
lis_output_mapper_tc = s3.get_mapper(lis_output_s3_path_time_chunk)

# open the dataset
lis_output_ds = xr.open_zarr(lis_output_mapper, consolidated=True)
lis_output_ds_tc = xr.open_zarr(lis_output_mapper_tc, consolidated=True)

Explore the Data

Display an interactive widget for inspecting the dataset by running a cell containing the variable name. Expand the dropdown menus and click on the document and database icons to inspect the variables and attributes.

lis_output_ds
<xarray.Dataset>
Dimensions:              (time: 1694, north_south: 6650, east_west: 4800)
Coordinates:
    lat                  (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
    lon                  (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
  * time                 (time) datetime64[ns] 2016-09-02 ... 2021-04-22
Dimensions without coordinates: north_south, east_west
Data variables:
    SM_SWE_inst          (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
    SM_SnowDensity_inst  (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
    SM_SnowDepth_inst    (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
Attributes: (12/18)
    DX:                      0.10000000149011612
    DY:                      0.10000000149011612
    MAP_PROJECTION:          LAMBERT CONFORMAL
    NUM_SOIL_LAYERS:         1
    SOIL_LAYER_THICKNESSES:  1.0
    SOUTH_WEST_CORNER_LAT:   35.49998092651367
    ...                      ...
    history:                 created on date: 2022-05-16T17:12:56.387
    institution:             NASA GSFC
    missing_value:           -9999.0
    references:              Kumar_etal_EMS_2006, Peters-Lidard_etal_ISSE_2007
    source:                  
    title:                   LIS land surface model output

Accessing Attributes

Dataset attributes (metadata) are accessible via the attrs attribute:

lis_output_ds.attrs
{'DX': 0.10000000149011612,
 'DY': 0.10000000149011612,
 'MAP_PROJECTION': 'LAMBERT CONFORMAL',
 'NUM_SOIL_LAYERS': 1,
 'SOIL_LAYER_THICKNESSES': 1.0,
 'SOUTH_WEST_CORNER_LAT': 35.49998092651367,
 'SOUTH_WEST_CORNER_LON': -109.50030517578125,
 'STANDARD_LON': -106.75,
 'TRUELAT1': 37.0,
 'TRUELAT2': 40.0,
 'comment': 'website: http://lis.gsfc.nasa.gov/',
 'conventions': 'CF-1.6',
 'history': 'created on date: 2022-05-16T17:12:56.387',
 'institution': 'NASA GSFC',
 'missing_value': -9999.0,
 'references': 'Kumar_etal_EMS_2006, Peters-Lidard_etal_ISSE_2007',
 'source': '',
 'title': 'LIS land surface model output'}
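Individual attributes can be pulled out like dictionary entries:

# access a single attribute by key
lis_output_ds.attrs['MAP_PROJECTION']
'LAMBERT CONFORMAL'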

Accessing Variables

Variables can be accessed using either dot notation or square bracket notation. Here we will stick with square bracket notation:

# square bracket notation
lis_output_ds['SM_SnowDepth_inst']
<xarray.DataArray 'SM_SnowDepth_inst' (time: 1694, north_south: 6650,
                                       east_west: 4800)>
dask.array<open_dataset-c6d82014a30be2b9e8bf97ca3ffac825SM_SnowDepth_inst, shape=(1694, 6650, 4800), dtype=float32, chunksize=(1, 6650, 4800), chunktype=numpy.ndarray>
Coordinates:
    lat      (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
    lon      (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
  * time     (time) datetime64[ns] 2016-09-02 2016-09-03 ... 2021-04-22
Dimensions without coordinates: north_south, east_west
Attributes:
    long_name:      snow depth
    standard_name:  snow_depth
    units:          m
    vmax:           999999986991104.0
    vmin:           -999999986991104.0
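For reference, dot notation returns the same object; it works whenever the variable name is a valid Python identifier:

# dot notation -- equivalent to lis_output_ds['SM_SnowDepth_inst']
lis_output_ds.SM_SnowDepth_inst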

Watch out for large datasets!

Note that the roughly four years of model output (1694 daily time steps over a 6650 x 4800 grid) comes to more than 200 GB per variable! As we'll see below, this is just for an area over western Colorado. If we're interested in modeling the western United States, CONUS, or even the globe at high resolution, we need to be prepared to work with some large datasets.
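You can verify sizes like this yourself: xarray reports the in-memory (uncompressed) size of an object through its nbytes attribute.

# uncompressed size of a single variable, in GB
# 1694 * 6650 * 4800 float32 values * 4 bytes each ~= 216 GB
lis_output_ds['SM_SnowDepth_inst'].nbytes / 1e9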

Dimensions and Coordinate Variables

The dimensions and coordinate variable fields put the “labelled” in “labelled n-dimensional arrays”:

  • Dimensions: labels for each dimension in the dataset (e.g., time)

  • Coordinates: labels for indexing along dimensions (e.g., '2020-01-01')

We can use these labels to select, slice, and aggregate the dataset.
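For a quick look at these labels, both are exposed directly on the Dataset:

# dimension names and their sizes
print(lis_output_ds.dims)

# first and last labels along the time dimension
print(lis_output_ds['time'].values[[0, -1]])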

Subsetting in Space or Time

xarray provides two methods for selecting or subsetting along coordinate variables:

  • index selection: ds.isel(time=0)

  • value selection: ds.sel(time='2020-01-01')

For example, we can use value selection to select based on the coordinates of a given dimension:

lis_output_ds.sel(time='2020-01-01')
<xarray.Dataset>
Dimensions:              (north_south: 6650, east_west: 4800)
Coordinates:
    lat                  (north_south, east_west) float32 dask.array<chunksize=(6650, 4800), meta=np.ndarray>
    lon                  (north_south, east_west) float32 dask.array<chunksize=(6650, 4800), meta=np.ndarray>
    time                 datetime64[ns] 2020-01-01
Dimensions without coordinates: north_south, east_west
Data variables:
    SM_SWE_inst          (north_south, east_west) float32 dask.array<chunksize=(6650, 4800), meta=np.ndarray>
    SM_SnowDensity_inst  (north_south, east_west) float32 dask.array<chunksize=(6650, 4800), meta=np.ndarray>
    SM_SnowDepth_inst    (north_south, east_west) float32 dask.array<chunksize=(6650, 4800), meta=np.ndarray>
Attributes: (12/18)
    DX:                      0.10000000149011612
    DY:                      0.10000000149011612
    MAP_PROJECTION:          LAMBERT CONFORMAL
    NUM_SOIL_LAYERS:         1
    SOIL_LAYER_THICKNESSES:  1.0
    SOUTH_WEST_CORNER_LAT:   35.49998092651367
    ...                      ...
    history:                 created on date: 2022-05-16T17:12:56.387
    institution:             NASA GSFC
    missing_value:           -9999.0
    references:              Kumar_etal_EMS_2006, Peters-Lidard_etal_ISSE_2007
    source:                  
    title:                   LIS land surface model output
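Index selection gives us the same kind of subset, but by integer position rather than label. For example, this selects the first time step in the dataset (2016-09-02):

# index selection: the first time step
lis_output_ds.isel(time=0)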

The .sel() approach also allows shortcuts in some cases. For example, here we select all time steps in the month of January 2020. Note that the length of the time dimension is now only 31.

lis_output_ds.sel(time='2020-01')
<xarray.Dataset>
Dimensions:              (time: 31, north_south: 6650, east_west: 4800)
Coordinates:
    lat                  (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
    lon                  (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
  * time                 (time) datetime64[ns] 2020-01-01 ... 2020-01-31
Dimensions without coordinates: north_south, east_west
Data variables:
    SM_SWE_inst          (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
    SM_SnowDensity_inst  (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
    SM_SnowDepth_inst    (time, north_south, east_west) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
Attributes: (12/18)
    DX:                      0.10000000149011612
    DY:                      0.10000000149011612
    MAP_PROJECTION:          LAMBERT CONFORMAL
    NUM_SOIL_LAYERS:         1
    SOIL_LAYER_THICKNESSES:  1.0
    SOUTH_WEST_CORNER_LAT:   35.49998092651367
    ...                      ...
    history:                 created on date: 2022-05-16T17:12:56.387
    institution:             NASA GSFC
    missing_value:           -9999.0
    references:              Kumar_etal_EMS_2006, Peters-Lidard_etal_ISSE_2007
    source:                  
    title:                   LIS land surface model output

Latitude and Longitude

You may have noticed that latitude (lat) and longitude (lon) are not listed as dimensions. This dataset would be easier to work with if lat and lon were coordinate variables and dimensions. Here we define a helper function that reads the spatial information from the dataset attributes, generates arrays containing the lat and lon values, and appends them to the dataset:

def add_latlon_coords(dataset: xr.Dataset) -> xr.Dataset:
    """Adds lat/lon as dimensions and coordinates to an xarray.Dataset object."""

    # get attributes from dataset
    attrs = dataset.attrs

    # get x, y resolutions
    # (hard-coded for this ~100 m / ~0.001 degree domain; the commented-out
    # expressions show the generic calculation from the dataset attributes)
    dx = .001  # round(float(attrs['DX']), 3)
    dy = .001  # round(float(attrs['DY']), 3)

    # get the number of grid cells in the x, y dimensions
    ew_len = len(dataset['east_west'])
    ns_len = len(dataset['north_south'])

    # get lower-left lat and lon
    ll_lat = round(float(attrs['SOUTH_WEST_CORNER_LAT']), 3)
    ll_lon = round(float(attrs['SOUTH_WEST_CORNER_LON']), 3)

    # upper-right lat and lon (also hard-coded for this domain)
    ur_lat = 41.5122    # ll_lat + (dy * ns_len)
    ur_lon = -103.9831  # ll_lon + (dx * ew_len)

    # define the new coordinates
    coords = {
        # create arrays containing the lat/lon at each grid cell
        'lat': np.linspace(ll_lat, ur_lat, ns_len, dtype=np.float32, endpoint=False).round(4),
        'lon': np.linspace(ll_lon, ur_lon, ew_len, dtype=np.float32, endpoint=False).round(4)
    }

    # preserve the attributes of the original lat/lon variables
    lon_attrs = dataset.lon.attrs
    lat_attrs = dataset.lat.attrs

    # rename the original lat and lon variables
    dataset = dataset.rename({'lon': 'orig_lon', 'lat': 'orig_lat'})
    # rename the grid dimensions to lat and lon
    dataset = dataset.rename({'north_south': 'lat', 'east_west': 'lon'})
    # assign the coords above as coordinates
    dataset = dataset.assign_coords(coords)
    dataset.lon.attrs = lon_attrs
    dataset.lat.attrs = lat_attrs

    return dataset

Now that the function is defined, let’s use it to append lat and lon coordinates to the LIS output:

lis_output_ds = add_latlon_coords(lis_output_ds)
lis_output_ds_tc = add_latlon_coords(lis_output_ds_tc)

Inspect the dataset:

lis_output_ds
<xarray.Dataset>
Dimensions:              (time: 1694, lat: 6650, lon: 4800)
Coordinates:
    orig_lat             (time, lat, lon) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
    orig_lon             (time, lat, lon) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
  * time                 (time) datetime64[ns] 2016-09-02 ... 2021-04-22
  * lat                  (lat) float32 35.5 35.5 35.5 35.5 ... 41.51 41.51 41.51
  * lon                  (lon) float32 -109.5 -109.5 -109.5 ... -104.0 -104.0
Data variables:
    SM_SWE_inst          (time, lat, lon) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
    SM_SnowDensity_inst  (time, lat, lon) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
    SM_SnowDepth_inst    (time, lat, lon) float32 dask.array<chunksize=(1, 6650, 4800), meta=np.ndarray>
Attributes: (12/18)
    DX:                      0.10000000149011612
    DY:                      0.10000000149011612
    MAP_PROJECTION:          LAMBERT CONFORMAL
    NUM_SOIL_LAYERS:         1
    SOIL_LAYER_THICKNESSES:  1.0
    SOUTH_WEST_CORNER_LAT:   35.49998092651367
    ...                      ...
    history:                 created on date: 2022-05-16T17:12:56.387
    institution:             NASA GSFC
    missing_value:           -9999.0
    references:              Kumar_etal_EMS_2006, Peters-Lidard_etal_ISSE_2007
    source:                  
    title:                   LIS land surface model output

Now lat and lon are listed as coordinate variables and have replaced the north_south and east_west dimensions. This will make it easier to spatially subset the dataset!

Basic Spatial Subsetting

We can pass Python's built-in slice() to .sel() on the lat and lon dimensions to select data between a range of latitudes and longitudes:

# uncomment line below to work with the full domain -> this will be much slower!
#lis_output_ds.sel(lat=slice(35.5, 41), lon=slice(-109, -104))

# just Grand Mesa -> smaller domain for faster run times
# note the smaller lat/lon extents in the dimensions
lis_output_ds.sel(lat=slice(38.6, 39.4), lon=slice(-108.6, -107.1))
<xarray.Dataset>
Dimensions:              (time: 1694, lat: 885, lon: 1306)
Coordinates:
    orig_lat             (time, lat, lon) float32 dask.array<chunksize=(1, 885, 1306), meta=np.ndarray>
    orig_lon             (time, lat, lon) float32 dask.array<chunksize=(1, 885, 1306), meta=np.ndarray>
  * time                 (time) datetime64[ns] 2016-09-02 ... 2021-04-22
  * lat                  (lat) float32 38.6 38.6 38.6 38.6 ... 39.4 39.4 39.4
  * lon                  (lon) float32 -108.6 -108.6 -108.6 ... -107.1 -107.1
Data variables:
    SM_SWE_inst          (time, lat, lon) float32 dask.array<chunksize=(1, 885, 1306), meta=np.ndarray>
    SM_SnowDensity_inst  (time, lat, lon) float32 dask.array<chunksize=(1, 885, 1306), meta=np.ndarray>
    SM_SnowDepth_inst    (time, lat, lon) float32 dask.array<chunksize=(1, 885, 1306), meta=np.ndarray>
Attributes: (12/18)
    DX:                      0.10000000149011612
    DY:                      0.10000000149011612
    MAP_PROJECTION:          LAMBERT CONFORMAL
    NUM_SOIL_LAYERS:         1
    SOIL_LAYER_THICKNESSES:  1.0
    SOUTH_WEST_CORNER_LAT:   35.49998092651367
    ...                      ...
    history:                 created on date: 2022-05-16T17:12:56.387
    institution:             NASA GSFC
    missing_value:           -9999.0
    references:              Kumar_etal_EMS_2006, Peters-Lidard_etal_ISSE_2007
    source:                  
    title:                   LIS land surface model output

Notice how the sizes of the lat and lon dimensions have decreased.

Subset Across Multiple Dimensions

Now we will combine the above examples for subsetting both spatially and temporally:

# define a range of dates to select
wy_2020_slice = slice('2019-10-01', '2020-09-30')

# define lat/lon for Grand Mesa area
lat_slice = slice(38.6, 39.4)
lon_slice = slice(-108.6, -107.1)

# select the snow depth variable and subset in both time and space
snd_GM_wy2020_ds = lis_output_ds['SM_SnowDepth_inst'].sel(time=wy_2020_slice, lat=lat_slice, lon=lon_slice)
snd_GM_wy2020_ds_tc = lis_output_ds_tc['SM_SnowDepth_inst'].sel(time=wy_2020_slice, lat=lat_slice, lon=lon_slice)

# inspect resulting dataset
snd_GM_wy2020_ds
<xarray.DataArray 'SM_SnowDepth_inst' (time: 366, lat: 885, lon: 1306)>
dask.array<getitem, shape=(366, 885, 1306), dtype=float32, chunksize=(1, 885, 1306), chunktype=numpy.ndarray>
Coordinates:
    orig_lat  (time, lat, lon) float32 dask.array<chunksize=(1, 885, 1306), meta=np.ndarray>
    orig_lon  (time, lat, lon) float32 dask.array<chunksize=(1, 885, 1306), meta=np.ndarray>
  * time      (time) datetime64[ns] 2019-10-01 2019-10-02 ... 2020-09-30
  * lat       (lat) float32 38.6 38.6 38.6 38.6 38.6 ... 39.4 39.4 39.4 39.4
  * lon       (lon) float32 -108.6 -108.6 -108.6 -108.6 ... -107.1 -107.1 -107.1
Attributes:
    long_name:      snow depth
    standard_name:  snow_depth
    units:          m
    vmax:           999999986991104.0
    vmin:           -999999986991104.0

Subset for more manageable sizes!

We've now subsetted both spatially (down to just the Grand Mesa domain) and temporally (water year 2020). Note the smaller size of the data array: a decrease from over 200 GB to about 1.7 GB! This smaller dataset will be much easier to work with. Feel free to try some of the commands with the full dataset; just give it a few minutes to process.
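With lat and lon as true coordinates, we can also pull a time series at a single point using method='nearest', which snaps to the closest grid cell. The coordinates below are an arbitrary point on Grand Mesa chosen for illustration, not a specific station:

# snow depth time series at the grid cell nearest an arbitrary point
# (illustrative coordinates, not a specific station location)
snd_point_ts = snd_GM_wy2020_ds.sel(lat=39.03, lon=-108.2, method='nearest')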

Plotting

We'll use two plotting libraries:

  • matplotlib: static plots (xarray's built-in .plot() method uses it under the hood, so we don't need to import it explicitly)

  • hvplot: interactive plots

We can make a quick matplotlib-based plot of the subsetted data using the .plot() method supplied by xarray objects. For this example, we'll select one day and plot it:

# simple matplotlib plot
snd_GM_wy2020_ds.sel(time='2020-01-01').plot();
../../_images/1_snowmodeling_tutorial_39_0.png

Similarly, we can make an interactive plot using the hvplot accessor and specifying a quadmesh plot type:

# hvplot based map
snd_GM_20200101_plot = snd_GM_wy2020_ds.sel(time='2020-01-01').hvplot.quadmesh(geo=True, rasterize=True, project=True,
                                                                               xlabel='lon', ylabel='lat', cmap='viridis',
                                                                               tiles='EsriImagery')

snd_GM_20200101_plot