## 1. Introduction

The backbone of the current global observing system (GOS) of the Earth’s atmosphere for global-scale numerical weather prediction (NWP) is the series of passive remote sensing microwave (MW) radiometers. The importance of MW radiance observations has been demonstrated numerous times at several weather centers by impact experiments and by regular monitoring of the sensitivity of the short-term forecast error to different observation types (Cardinali, 2009). Current MW sensors are expensive, difficult to accommodate, and slow to replace because of long development cycles, and a single failure can have a large impact on global NWP. One approach to mitigate the challenges of high cost and possible data gaps for MW radiometers in the future is to explore alternatives such as SmallSats, which are lower in cost, smaller in size, and quicker to build and refresh. SmallSat designers can make use of infrastructure already developed for satellite platforms, solar panels, communication links, and ground systems (Poghosyan and Golkar, 2017), and multiple low-cost launch opportunities for SmallSats now exist. SmallSats in constellations provide shorter revisit times and the possibility of on-orbit backups. In comparison to present satellite missions, SmallSat constellations are expected to be regularly refreshed with state-of-the-art technology due to reduced manufacturing times.

The Earth Observing Nanosatellite–Microwave (EON-MW) is the planned culmination of a series of on-orbit experiments (Blackwell et al., 2019) being conducted by MIT/LL that have already demonstrated the capability of CubeSats to host MW sensors for remote sensing of the atmospheric temperature and water vapor profiles, of vertically integrated cloud parameters, and of surface conditions including skin temperature and MW emissivity, which is sensitive to surface properties such as soil moisture and vegetative properties. CubeSats are a class of SmallSats sized in multiples of 10 cm cubes or Units (U), giving rise to the 1U, 3U, etc., nomenclature used for describing these SmallSats. The MIT/LL experiments include the 8-channel 3U MicroMAS-1 (Osaretin et al., 2013), the 12-channel 3U MicroMAS-2, and the 10-channel plus global positioning system radio occultation (GPSRO) receiver 3U MiRaTA. There is now an ecosystem of CubeSat components that can be combined for novel applications. Based on the MIT/LL experiments, two missions are planned. TROPICS is a NASA mission with a constellation of 6–12 MicroMAS-2 CubeSats in high-obliquity low Earth orbit (LEO) (Blackwell, 2017). The EON-MW satellite is a 12U CubeSat designed to provide data continuity with the existing NOAA operational AMSU and ATMS MW sounding systems and to provide a backup for the ATMS sensors on SNPP and JPSS (Blackwell and Pereira, 2016; Blackwell et al., 2017, 2019).

The present study—a risk reduction activity sponsored by NOAA OPPA—has the following high-level objectives:

• Assess the potential of the EON-MW sensor to address NOAA mission requirements and to mature NOAA systems to exploit this type of data;
• Provide inputs to Next-Gen Space Architecture decision making, e.g. by assessing new SmallSat capabilities and their impacts; and
• Assess the relative performance of different constellations of orbits of EON-MW satellites.

These objectives lead to specific questions at various levels of assessment. First, we assess the sensor characteristics and geophysical capabilities of the EON-MW sensor specifications. Here, observation centric approaches are used to quantify the inherent value of the EON-MW sensors, for example by assessing the accuracy of the EON-MW retrievals. By design the EON-MW channel set closely matches that of ATMS. We quantify the impact of remaining sensor differences on the geophysical capability of EON-MW. Second, we conduct observing system simulation experiments (OSSEs) using the Community Global OSSE Package (CGOP) to assess the impact of EON-MW on analyses and forecasts of the NOAA Global Data Assimilation System (GDAS) and Global Forecast System (GFS). In an OSSE, observations are synthesized from a nature run (NR). The NR is a long forecast generated by a sophisticated NWP model which statistically simulates the real atmosphere (Hoffman and Atlas, 2016) and which is considered to be the ‘truth’. In the current study, the NR is the NASA Goddard Earth Observing System Model, version 5 (GEOS-5) nature run (G5NR) (Putman et al., 2015). The simulated observations are then used by the NOAA data assimilation (DA) system to quantitatively estimate NR states, and a series of forecasts are verified against the NR. For the current experiments the CGOP has been updated from the 3D-ensemble variational (3DEnVar) to the 4D-ensemble variational (4DEnVar) version of the Gridpoint Statistical Interpolation (GSI), which calculates the analysis increments and is the central component of the DA system. The OSSEs conducted and their associated specific questions are listed in Table 1. For example, experiments AM and SSMIS provide a direct comparison of EON-MW and SSMIS to address the question of whether EON-MW can replace or mitigate the loss of the DMSP satellite for the purpose of global NWP. 
Table 2 details the differences between the experiments, which involve only changes in the MW and IR sensors in LEO that are used in each experiment. The particular sensors and platforms are detailed in Section 2, but note that the lower part of Table 2 shows counts of the different sensor types and orbits compared to the Control experiment. For example, compared to Control, experiment AM adds one MW sensor in an Early orbit, and experiment 2Tropics adds two MW sensors in tropical orbits.

Designing and conducting OSSEs can be difficult. On the one hand, we want everything to be as realistic as possible. On the other hand, our limited knowledge about the real world results in approximations, and our limited computational resources result in small samples unless simplifications are made. Further, small samples in the context of current operational global NWP systems often result in impacts that are not statistically significant. Differences from operational practice and approximations to reality must be examined to ensure that the experiment results are valid for the purposes of the experiment (Hoffman and Atlas, 2016). The goals of the current experiments are to examine the relative impact of different configurations of MW sensors. As described below, our experimental setup differs from the current operational setup in some important respects and a number of approximations were made, particularly in the simulation of the MW sensor observation errors. Consequently, results reported here are not expected to directly apply to current operational global NWP. However, our experimental setup is useful to answer in a preliminary way the questions raised about the relative impact of different configurations of MW sensors.

Some caveats to this study are the following:

• This study does not include assessments of the impact of EON-MW on NOAA mission products and services and on consumers of weather information.
• In this study we discuss the quantity, quality, and scales of the information contained in the observations, which we will hereafter refer to collectively as information value. However, we do not make use of the concepts of information content and degrees of freedom for signal that are derived from the information theory approach to Bayesian estimation in terms of the prior and posterior covariance matrices of the retrieval or data assimilation (Eyre, 1990; Rodgers, 1998; Fisher, 2003).
• The different assessment approaches provide different information and different methods are appropriate for different questions. It is important to not apply the results of an assessment beyond its domain of validity or to situations that invalidate some of its assumptions.
• An important reason to consider impacts at the different levels of assessment is that not all improvements in sensor capabilities translate immediately into improvements in analyses and forecasts and not all improvements in analyses and forecasts translate into noticeable improvements in customer experience (i.e. societal and economic impact on end-users).
• Currently, it may be very difficult to quantitatively trace the impact of a proposed new satellite system to societal benefits. Such investments are still made because we expect that as sensors, systems, tools, and products evolve in tandem, improvements will eventually feed upwards through the value chain.

We will return to some of these caveats in the conclusions. In addition, the realism of OSSE results depends on the realism of the simulated observation errors. The importance of proper error addition was discussed by Privé et al. (2014). For comparing existing and proposed MW sensors we simulated observation errors with reported noise equivalent temperature difference (NEΔT) values. Other approaches are possible but are not directly comparable. For example, we could apply the Errico approach to evaluate appropriate sensor noise values for existing instruments, but then would have to somehow extend these to the proposed instruments. Even our choice of NEΔT has some ambiguity, as the specifications were often outperformed in reality.

The organization of this paper is as follows: The EON-MW channel characteristics are presented and compared to those of ATMS in Section 2. Section 3 defines the orbital constellations that will be considered here for EON-MW. A prerequisite for both the geophysical capability assessment and the OSSE (model) assessment is the simulation of the EON-MW observations. The use of the Community Radiative Transfer Model (CRTM) for this purpose is briefly outlined in Section 4. The inherent information value of the EON-MW observations is assessed in Section 5. The OSSE experiments and results are presented in Section 6. Finally, Section 7 contains a discussion and concluding remarks.

## 2. Sensor characteristics assessment

EON-MW is a NOAA/NESDIS mission built by MIT/LL to extend the JPSS mission and/or to mitigate a gap due to the loss of existing polar orbiting assets. The EON-MW channel set is very similar to that of ATMS. EON-MW is a cross-track scanning 12U CubeSat with a design specification that includes 20 kg mass (16 kg for the spacecraft bus and 4 kg for the sensor), $22 \times 22 \times 34$ cm volume, 50 W average power consumption, 2-year lifetime, radiation tolerance, wide observation swath, S-band uplink, and X-band downlink. For comparison, the ATMS sensor alone has a mass of 75 kg and 130 W average power consumption.

As a potential replacement for ATMS, the EON-MW channels are by design very similar to those on ATMS (Kim et al., 2014). Table 3 compares ATMS and EON-MW channels. Based on their sensitivity the channels are classified as surface channels (channels 1 and 2), temperature sounding channels (3–15) and water vapor sounding channels (16–22). Except for stratospheric temperature channels, all channels have at least some sensitivity to cloud hydrometeors (and a lot in the case of channel 16 near 90 GHz). Integrated water vapor can be very accurately retrieved from channel 1 (at the 23 GHz water vapor resonance). Seven of the EON-MW channels in the 50 GHz oxygen band and two in the 183 GHz water vapor band were flight tested on MiRaTA in 2017 (Blackwell et al., 2017). In addition, three of the 183 GHz water vapor channels were flight tested on MicroMAS-2 in 2017.

Differences between ATMS and EON-MW are highlighted in bold in Table 3. Table 3 values were used to generate the CRTM coefficients for ATMS and EON-MW used in our experiments (and used operationally for ATMS). A notable change for channel central frequency is in the 183 GHz water vapor channels (17–22). The ATMS 183 GHz channels are all double sideband, while the EON-MW channels are all single sideband, keeping the higher frequency sideband in each case. (Note that for dual or quadruple sideband channels the bandwidth given in Table 3 is for a single sideband.) That is, while the ATMS channels are ± 7, 4.5, 3, 1.8, and 1 GHz about the 183.31 GHz water vapor resonance, the EON-MW channels are 8, 6, 4, 2, and 0 GHz above the resonance. For channel 10 the ATMS bandwidth is approximately twice that of EON-MW, while the opposite is true for channel 6. In practice, these changes in bandwidth, including the single vs. double sideband choice for the 183 GHz channels, will affect the instrument NEΔT levels achieved. Another notable change is that all the EON-MW channel radiances are the average of purely horizontally and vertically polarized signals, while the ATMS channels are either quasi-horizontally or quasi-vertically polarized. Quasi-horizontal means polarization is in the along-track plane, and quasi-vertical means that it is normal to the along-track plane (Kim et al., 2014).
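
The sideband arrangement described above can be written out numerically. The following is a minimal sketch that uses only the offsets quoted in the text; the function names are hypothetical, and the authoritative channel definitions are those of Table 3.

```python
# Offsets (GHz) from the 183.31 GHz water vapor line, as quoted in the text.
LINE_GHZ = 183.31
ATMS_OFFSETS = [7.0, 4.5, 3.0, 1.8, 1.0]  # double sideband: line +/- offset
EON_OFFSETS = [8.0, 6.0, 4.0, 2.0, 0.0]   # single sideband: line + offset


def double_sideband_centers(line_ghz, offset_ghz):
    """Return the (lower, upper) passband centers of a double-sideband channel."""
    return (round(line_ghz - offset_ghz, 2), round(line_ghz + offset_ghz, 2))


def single_sideband_center(line_ghz, offset_ghz):
    """Return the passband center of a single (upper) sideband channel."""
    return round(line_ghz + offset_ghz, 2)


atms = [double_sideband_centers(LINE_GHZ, d) for d in ATMS_OFFSETS]
eon = [single_sideband_center(LINE_GHZ, d) for d in EON_OFFSETS]

for a, e in zip(atms, eon):
    print(f"ATMS {a[0]:.2f} / {a[1]:.2f} GHz   EON-MW {e:.2f} GHz")
```

Listing the passbands side by side makes it apparent that the EON-MW set spans from the line center out to 8 GHz above it, whereas the ATMS upper sidebands span 1 to 7 GHz above it.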

The effect of all these changes taken together is significant, as seen in Fig. 1, amounting to a few degrees in brightness temperature (BT) for the surface sensing channels and for the humidity sensing channels. There are virtually no differences for the higher peaking temperature channels 5–15. The statistics displayed in Fig. 1 are based on a small but representative set of calculations. There are differences in polarization for all channels, but since the clear atmosphere is not polarized, BT changes as a result of this difference only occur due to the interaction of radiation with the surface, as seen in channels 1–4. The differences for the humidity sensing channels are a result of the differences in the central frequencies and band passes. These differences affect the water vapor channel weighting functions in a way that potentially increases the quantity of information contained in the EON-MW water vapor channels relative to the ATMS water vapor channels.

Fig. 1.

BT difference statistics for EON-MW minus ATMS for each channel for bias, RMSE, minimum and maximum (different line types and colors), calculated for a diverse set of 58 cases that are used to test CRTM. The test cases, obtained from a GSI analysis are for both land and ocean, and clear and cloudy conditions.

Panels a and b of Fig. 2 show the weighting function comparison of the proposed EON-MW (solid lines) to ATMS (dotted lines) for a standard atmosphere. For channels where the weighting functions agree (e.g. channels 1–5), the two lines overlap and only the solid lines are visible. The weighting function half-widths and the number of channels determine the vertical resolution of the instrument. Since neighboring weighting functions overlap, the retrievals (and their errors) are correlated. The wider the weighting functions and the greater the overlap, the lower the number of independent pieces of information (sometimes called degrees of freedom) that can be retrieved from the observations. Here, EON-MW and ATMS have very similar weighting functions and hence very similar effective vertical resolution. Also, both instruments are better able to resolve vertical profiles of temperature than those of water vapor. Except for channels 6 and 10, for which the bandwidths are different (Table 3), all the temperature channel weighting functions are the same or nearly the same since they differ only in their spectral response function. However, for water vapor channels 18–22, the weighting functions, while roughly the same shape, are displaced vertically, with the net effect that EON-MW is sensitive to more of the vertical column than ATMS. In Fig. 2b, EON-MW channel 22 peaks higher and channel 18 peaks lower in the atmosphere than any of the ATMS channels. These same relationships hold for both tropical and polar atmospheres (not shown). As a result, the EON-MW channels contain more information about water vapor than the ATMS channels, provided the instrument noise levels are comparable. While these differences may appear small in Fig. 2, there are noticeable positive humidity impacts seen in the OSSEs reported below (Section 6) due to EON-MW.

Fig. 2.

Weighting functions (unitless) calculated for EON-MW (solid lines) and ATMS (dotted lines) for (a) the surface and temperature sounding channels and (b) the water vapor sounding channels, both for a standard atmosphere. Note the different vertical axes.

In terms of errors, EON-MW has higher specified noise (NEΔT) values than ATMS (Table 4) for most of the lower frequency channels (1–16, except for 10, 13 and 15, which are equal). For the higher frequency channels (17–22) the specified requirements match, in spite of the bandwidth differences noted above. It should be noted that the NEΔT levels achieved by ATMS both in pre-launch tests (Table 4) and as estimated on orbit are substantially better than the specifications, and for many channels the values estimated by Kim et al. (2014, Fig. 6) and on an ongoing basis by the STAR Integrated Calibration/Validation System are roughly only half the specified value. (See the tab ‘RDR Channel NEDT’ at https://www.star.nesdis.noaa.gov/icvs/status_NPP_ATMS.php.) It is unclear whether EON-MW will outperform its specifications in a similar manner.

In reality, the true performance of EON-MW will depend on pointing accuracy and calibration as well as the NEΔT levels achieved. Solutions for these issues for SmallSats are emerging and will be implemented for EON-MW. Regarding pointing accuracy, capabilities have greatly improved recently, with sub-arcsecond (1-sigma) pointing control achieved on-orbit in a 3U CubeSat (Pong, 2018). Regarding calibration accuracy and stability, noise diodes have been used for the warm calibration reference point (and deep space for the cold calibration reference point) (e.g. Iturbide-Sanchez et al., 2007; Draper et al., 2015). In the EON-MW design, noise diodes will be sampled for calibration every two seconds. Crews et al. (2020) report that for TROPICS, prelaunch noise diode testing has been successfully extended to 205 GHz. Various ‘vicarious’ methods will be employed post launch to evaluate pointing and calibration accuracy and stability, for example, by monitoring the position of coastlines as evident in surface sensitive channels, and by comparing observed BTs to BTs simulated from collocated profiles of temperature and water vapor from radiosondes, GPSRO retrievals, and NWP analyses (e.g. Crews et al., 2018). The reader should note the distinction usually made between the sensor noise values and the BT observation errors used to simulate and assimilate these data. As discussed in Sections 4 and 6, the observation errors should include sensor errors, forward problem errors and representativeness errors. However, in the current set of experiments only sensor errors are included for radiance observations.

## 3. Proposed on-orbit investment: the EON-MW sensor and CubeSat constellations

A number of scenarios for EON-MW orbital constellations are considered in this study. This is not an exhaustive list and other scenarios might be the subjects of further studies. The orbits of polar orbiting satellites are conveniently referenced by their equatorial crossing time closest to local noon. Equatorial crossing time can change, e.g. for DMSP F18, and in this study these times are correct for late summer of 2014: the SNPP ‘afternoon’ orbit has a 13:30 crossing time, while the MetOp-A ‘mid-morning’ orbit has a 9:29 crossing time, and the DMSP F18 ‘early-morning’ orbit has an 8:06 crossing time. MetOp-B and MetOp-A are approximately opposite each other in the same orbital plane (separated by 49 minutes in a 101-minute orbit and with equatorial crossing times only 2 minutes apart).
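
As a quick check of the statement that MetOp-B and MetOp-A are approximately opposite each other, the along-orbit separation implied by the quoted numbers can be computed directly. This is a sketch that assumes a circular orbit with uniform angular rate; the function name is hypothetical.

```python
def phase_separation_deg(time_offset_min, period_min):
    """Angular separation along the orbit for a given time offset,
    assuming a circular orbit traversed at a uniform rate."""
    return 360.0 * time_offset_min / period_min


# 49 minutes of separation in a 101-minute orbit, as quoted in the text.
metop_phase = phase_separation_deg(49.0, 101.0)
print(f"MetOp-A/B separation: {metop_phase:.1f} degrees")
```

A separation close to 180° corresponds to the two satellites sitting on nearly opposite sides of the same orbital plane.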

In this study it is assumed that EON-MW observations have the same temporal and spatial distributions as existing instruments. At higher polar orbits (e.g. 833 km for ATMS) the swath width is larger (2300 km for ATMS), the ground instantaneous field of view (GIFOV) is larger, and the ground track is slower than at lower orbits (e.g. 402 km height and 878 km swath width for TRMM). The following potential polar orbits are considered for EON-MW: SNPP, F18, and shifted F18 (F18’), which is offset to be 2 h earlier in local time and 30° eastward in longitude. The expected EON-MW data coverage during a 6 h assimilation window is shown for these three orbits in Fig. 3. In addition, Fig. 3c shows the expected EON-MW data coverage for the TRMM orbit. One further potential EON-MW orbit is the shifted TRMM (TRMM’) orbit, which is offset 180° eastward in longitude.

Fig. 3.

Six-hour data coverage for simulated EON-MW observations including (a) the F18 early morning orbit, (b) the afternoon SNPP orbit, (c) the high obliquity TRMM (tropical) orbit, and (d) the shifted F18 (F18’) orbit. The six-hour windows are centered on 0000 UTC 8 August 2006 (a, b, d) or 0600 UTC 8 August 2006 (c).

In reality the EON-MW orbit is still to be determined. For MW radiometers like EON-MW and ATMS, the GIFOV is measured in tens of km depending on frequency and, at any given frequency, is proportional to orbit height divided by aperture size. As a result, if EON-MW, with an 11-cm aperture, is in a low enough orbit such as the TRMM orbit, it can provide similar horizontal resolution as ATMS, with a 20-cm aperture size, albeit with a corresponding reduced swath width. However, in this study all BTs are simulated from the NR, or within the DA from the short-term forecast, interpolated to the observation locations assuming horizontal homogeneity of the atmospheric and surface parameters across the satellite footprint, effectively ignoring observation spatial resolution. This is a reasonable assumption in this study because the effective resolutions of the global NR and forecast models are several times the quoted resolution (i.e. grid spacing) and are thus several tens of km (Laprise, 1992). In situations where small scales are important, such as for hurricanes and severe storms, the horizontal resolution of these instruments may smooth small-scale features and somewhat reduce their usefulness.
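
The proportionality between GIFOV and orbit height divided by aperture size can be illustrated with the numbers quoted in this paper. The sketch below compares only relative footprint sizes at a fixed frequency; it does not compute absolute footprints, and the function name is hypothetical.

```python
def relative_footprint(height_km, aperture_cm):
    """Quantity proportional to the GIFOV at a fixed frequency:
    orbit height divided by aperture size."""
    return height_km / aperture_cm


atms = relative_footprint(833.0, 20.0)      # ATMS orbit height and aperture
eon_low = relative_footprint(402.0, 11.0)   # EON-MW at the TRMM orbit height
eon_high = relative_footprint(833.0, 11.0)  # EON-MW at the ATMS orbit height

ratio_low = eon_low / atms
ratio_high = eon_high / atms
print(f"EON-MW at 402 km vs ATMS: {ratio_low:.2f}")
print(f"EON-MW at 833 km vs ATMS: {ratio_high:.2f}")
```

Under this scaling, an 11-cm aperture at 402 km gives a footprint somewhat smaller than that of ATMS, while the same aperture at 833 km gives a footprint nearly twice as large.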

## 4. Sensor simulation

For all of the geophysical capability assessment (Section 5), the OSSE (model) assessment (Section 6), and the DA procedures, EON-MW radiances (in the form of BTs) are simulated using the CRTM (Chen et al., 2012; Zhu et al., 2012). CRTM is used in all cases to simulate radiances (BTs) for other MW and IR sensors. For clear-sky cases, the principal inputs to the CRTM are profiles of temperature and water vapor mixing ratio, along with surface properties and the observing geometry of the sensor. For the calculation of optical depth CRTM uses ‘coefficient’ tables for each sensor. The sensor coefficients are determined by best fitting the fast optical depth model to values calculated by the very accurate and computationally expensive line-by-line model. For EON-MW, new CRTM coefficient files were generated by the CRTM team based on the sensor channel characteristics provided by MIT/LL that are given in Table 3 and assuming a box-car spectral response function. With the exception of the radiances calculated for Fig. 1 and for Section 5, all the radiance simulations in this study are clear sky, which is realized by setting the cloud liquid and ice water to zero.

In the geophysical capability assessment (Section 5) and in the OSSEs (Section 6), the CRTM inputs are obtained by interpolating the gridded G5NR fields linearly in latitude and longitude and time to the locations and times of the satellite observations. The same procedure is used in the GSI, but there it interpolates the current estimate of the analysis, which initially is the 6 h forecast. In all cases the G5NR surface types were used to simulate observations and the GSI surface types were used in the DA. As a result, the emissivities used by CRTM in simulating radiances and in the DA are different. In reality our knowledge of surface emissivity is limited. Therefore, it is realistic that the emissivity used in the analysis is different from that used to simulate the observations. Note that due to the flexible interface of CRTM, no vertical interpolation is necessary. To simulate EON-MW observations for the OSSEs, the times, locations, and viewing geometries of the observations must be defined. For the purpose of this study, the operational observational data files used by GSI provided templates to create simulated observation files. Then, the observation latitude, longitude, time, scan angle, and zenith angle from real observations were extracted (with exceptions noted below) and used to interpolate the G5NR profiles and to provide the inputs for CRTM. Since, for the OSSEs, observations were required for the period of 08 August to 15 September 2006, real ATMS, SSMIS, and TMI observations for the same dates during 2014 were used to define the observation templates needed to generate the simulated BTs, with the year simply changed from 2014 to 2006. The viewing pattern for EON-MW in the F18 orbit was simulated by shifting the SNPP ATMS viewing pattern in longitude to match the F18 orbit and then assigning the times from the F18 orbit. For the TRMM orbit, we calculated the EON-MW viewing geometry from the location of the TRMM satellite and the assumed EON-MW scanning pattern. 
Thus the relevant geometric parameters inherited by the simulated EON-MW observations are the altitude (833 km for ATMS and 402 km for TRMM) and swath widths (2300 km for ATMS and 878 km for TRMM). For F18’ and TRMM’ all longitudes for F18 and TRMM were shifted 30° eastward and 180° eastward, respectively.
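
The longitude shifts used to define the F18’ and TRMM’ viewing patterns amount to adding a constant offset and wrapping the result. A minimal sketch follows; the [-180°, 180°) longitude convention, the function name, and the sample longitudes are assumptions made for illustration and are not taken from the observation files.

```python
def shift_longitude_east(lon_deg, shift_deg):
    """Shift a longitude eastward and wrap the result into [-180, 180)."""
    return (lon_deg + shift_deg + 180.0) % 360.0 - 180.0


# Made-up sample longitudes standing in for observation locations.
sample_lons = [-170.0, 0.0, 165.0]

f18_prime = [shift_longitude_east(lon, 30.0) for lon in sample_lons]
trmm_prime = [shift_longitude_east(lon, 180.0) for lon in sample_lons]
print(f18_prime)
print(trmm_prime)
```

Latitudes, times, and viewing angles are left unchanged by such a shift, so the shifted pattern keeps the swath geometry of the original orbit.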

The outputs from CRTM are so-called ‘perfect’ observations, which in fact have errors due to the interpolation from the NR, the conversions by the forward operators, and the representativeness errors that arise because of the difference in scales between the NR and the DA system model. For the OSSEs, normally distributed uncorrelated random observation errors are explicitly added to the perfect BTs. In the current experiments we used NEΔT values as the estimate of the standard deviation of these explicitly added observation errors. Except for EON-MW, where the design values (Table 4) were used, the NEΔT values were obtained from the STAR Integrated Calibration/Validation System (https://www.star.nesdis.noaa.gov/icvs/status_NPP_ATMS.php). For ATMS, these values are approximately equal to the pre-launch values given in Table 4. For all other observations (i.e. all non-radiance observations) the explicitly added observation error standard deviations are estimated by the Errico et al. (2013) procedure. We note here that for DA, the ‘real’ errors include instrument, forward problem, and representativeness errors (see Hoffman et al. (2017b) for a thorough discussion). The representativeness errors in this context include the variability on all spatial scales between the scales of the observations (25–50 km for typical MW BTs) and the scales resolved by the DA system (100–200 km for typical global NWP DA systems). In any case it is an idealization to assume that the simulated observation errors are normal, uncorrelated, and unbiased and have a constant variance, as was done in the present study. Such idealized errors are less challenging to DA systems, which are designed to filter out just such errors. Future OSSEs might model the simulated observation errors more realistically, with biases and standard deviations that vary by location or synoptic situation.
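
The explicit error addition described above (unbiased, uncorrelated, normally distributed errors with a constant standard deviation equal to NEΔT) can be sketched as follows. The BT and NEΔT values are placeholders chosen for illustration, not Table 4 values, and the function name is hypothetical; this is not the CGOP implementation.

```python
import random
import statistics


def add_sensor_noise(perfect_bts, nedt_k, rng):
    """Add unbiased, uncorrelated Gaussian noise with standard
    deviation nedt_k (K) to a list of 'perfect' BTs (K)."""
    return [bt + rng.gauss(0.0, nedt_k) for bt in perfect_bts]


rng = random.Random(2006)   # fixed seed so the sketch is repeatable
perfect = [250.0] * 20000   # placeholder 'perfect' BTs (K), not real data
nedt = 0.5                  # illustrative NEdT (K), not a Table 4 value

noisy = add_sensor_noise(perfect, nedt, rng)
errors = [n - p for n, p in zip(noisy, perfect)]
sample_bias = statistics.fmean(errors)
sample_std = statistics.pstdev(errors)
print(f"bias = {sample_bias:.3f} K, std = {sample_std:.3f} K")
```

With a large sample, the mean of the added errors is close to zero and their standard deviation is close to the prescribed NEΔT, which is the idealization noted in the text.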

## 5. Geophysical capability assessment

Sensor data typically flow through a processing chain with three distinct data levels (engineering data, sensor data, and geophysical data), often referred to as Level 0, 1, and 2 data or L0, L1, and L2 data. For EON-MW, L1 data are BTs and associated meta-data, while L2 data contain retrieved profiles of temperature and water vapor mixing ratio, as well as surface temperature and emissivity. Note that the L1 and L2 error characteristics are critical to how the data are used. For satellite IR and MW sensors, DA systems often assimilate radiances or BT (L1 data). However, assessments of geophysical capabilities conducted using retrieval algorithms (i.e. L1 to L2 converters) can be extremely useful and avoid some of the limitations that are inherent in current DA systems. For example, a new MW sensor might provide potentially valuable hydrometeor profiles that are not used in the current DA systems, in part, perhaps because this type of data has not been available from previous sensors.

There are many sources of error in the retrieval process and in practice the information value of a satellite IR and MW sensor can be assessed by applying the CRTM or a similar forward problem within a variational retrieval scheme. For an existing sensor, the retrieved geophysical profiles can be compared to in situ observations (e.g. radiosondes), operational analyses, or retrievals from an already validated sensor. For a new (proposed) sensor, everything can be done in simulation or by analogy with an existing similar sensor. In the present study, the variational retrieval was replaced by a machine learning tool based on deep(-layer) neural networks (DNNs) (Boukabara et al., 2019). Although not critical in the present case, such new tools can provide orders of magnitude computational speedup.

Figure 4 compares the retrieval accuracy for profiles of temperature and water vapor mixing ratio for four MW sensors—AMSU-A/MHS, SSMIS, EON-MW, and ATMS. For this figure, the four sensors were simulated by CRTM from the G5NR for 1 and 8 August 2006, using the locations and times of day of the observations from the real ATMS on SNPP for 1 and 8 August 2014 (about 250000 observations per day). These simulations, unlike the other simulations in this study, have no explicitly added errors, but do include the effects of clouds by providing the G5NR cloud liquid water and cloud ice content profiles to the CRTM. It is reasonable to train the ML model using simulated observations without errors. Including errors in the validation data set would increase the retrieval errors, but should have little impact on the comparison of the different sensors, which is the object of this calculation. The 1 August cloudy simulated data and corresponding G5NR geophysical data were used to train one DNN for each sensor (Shahroudi et al., 2019a). Each DNN has 3 hidden layers with 35, 40, and 35 nodes and retrieves the temperature profile, the water vapor mixing ratio profile, three integrated cloud parameters, and the skin temperature. Then, the resulting machine learning model retrievals for the independent sample of 8 August were verified against the G5NR at the 72 G5NR vertical layers. The capabilities of the four sensors to retrieve geophysical parameters are similar, but there are notable differences: At levels around 500 hPa, SSMIS is better in terms of RMSE temperature, but for pressure levels around 200 and 800 hPa the SSMIS RMSE water vapor mixing ratio is larger than the other instruments, in part due to larger biases. EON-MW is as good or better than the other sensors for the RMSE of water vapor mixing ratio, most notably in the upper troposphere from 300 to 150 hPa. The bias seen in Fig. 
4 for all the sensors is less than approximately 2 K for temperature and less than approximately 10% for mixing ratio. With a bigger and more representative training set we should expect near zero bias. The current situation is due to the difference of global BT between the training day (1 August) and the verification day (8 August).

Fig. 4.

Vertical profiles of retrieval mean errors (bias, a, c) and RMSE (b, d) for temperature (K, a, b) and water vapor mixing ratio (%, c, d) for four MW sensors (colors) for the 8 August 2006 independent sample described in the text.

Results presented in Fig. 4 show that EON-MW has capabilities to derive temperature and water vapor similar to other MW radiometers. In Fig. 5, AMSU-A/MHS, SSMIS, EON-MW, and ATMS are compared in terms of summary assessment metrics (SAMs, Hoffman et al., 2017a, 2018). SAMs are averages of normalized primary assessment metrics (PAMs), including in this example the RMSE, bias, and correlation. The normalization used is the empirical cumulative distribution function (ECDF) determined from the sample including all treatments, in this case AMSU-A/MHS, SSMIS, EON-MW, and ATMS. Consequently, under the null hypothesis that the different treatments are indistinguishable, each normalized assessment metric (NAM) has a uniform probability on the unit interval [0,1] with a mean of 1/2. Also, under the null hypothesis, the SAMs are approximately Gaussian and have mean 1/2 and variance 1/(12n), where n is the number of NAMs averaged. Here, SAMs are calculated separately for temperature from the surface to 10 mb (48 layers) and for water vapor mixing ratio from the surface to 100 mb (35 layers). The ECDF normalized PAMs are unitless and are approximately distributed uniformly on the interval from 0 (worst) to 1 (best). Results for a second type of normalization, the rescaled minmax normalization, are also shown in the figure (as black outlines) to give a sense of the sensitivity of the method to the normalization method (Hoffman et al., 2018). The use of SAMs increases statistical significance, but note that in Fig. 5 the individual PAMs are assumed independent. Thus, SSMIS and ATMS are statistically significantly better than AMSU-A/MHS and EON-MW for temperature retrievals, and EON-MW is statistically significantly better than the others for water vapor mixing ratio retrievals. Some of the other differences between sensors are within the 95% confidence interval error bars plotted at the ends of the SAM color bars. 
For example, SSMIS is somewhat better than ATMS for temperature retrievals but these SAM differences based on the limited sample used are not statistically significant. The reader should note that the additional SSMIS capabilities to retrieve surface properties (e.g. ocean surface wind speed, precipitation, and sea-ice concentration) and mesospheric temperatures were not included in the above comparisons. Further, the results presented in this section must be considered preliminary and the precise numerical values are not meaningful. The results are expected to vary with the time period of the sample and the retrieval method used.
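
The ECDF normalization and SAM averaging described above can be illustrated with a minimal Python sketch. It assumes one common ECDF convention (the fraction of the pooled sample that is no better than the value in question), which may differ in detail from the convention of Hoffman et al. (2018). The function names and the input values are hypothetical and purely illustrative.

```python
import math


def ecdf_normalize(values, higher_is_better=False):
    """Map pooled PAM values to NAMs in (0, 1], with 1 being best.

    Assumed convention: the NAM of a value is the fraction of the pooled
    sample (all treatments) that is no better than that value.
    """
    # Orient the metric so that larger always means better
    # (e.g. negate RMSE or absolute bias, keep correlation as is).
    oriented = [v if higher_is_better else -v for v in values]
    n = len(oriented)
    return [sum(1 for w in oriented if w <= v) / n for v in oriented]


def sam(nams):
    """A SAM is the plain average of the NAMs it summarizes."""
    return sum(nams) / len(nams)


def null_interval(n, z=1.96):
    """95% interval for a SAM of n independent NAMs under the null hypothesis.

    Each NAM is uniform on [0, 1] (variance 1/12), so the SAM has
    mean 1/2 and variance 1/(12 n).
    """
    half_width = z * math.sqrt(1.0 / (12.0 * n))
    return 0.5 - half_width, 0.5 + half_width


# Hypothetical RMSE values (K) for four treatments at a single level.
example_nams = ecdf_normalize([1.2, 1.0, 1.3, 1.1])
```

In this toy case the treatment with the smallest RMSE receives a NAM of 1 and the one with the largest RMSE receives 0.25. If, for illustration, n = 144 independent NAMs were averaged, the 95% null interval would be approximately 0.5 ± 0.047.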

Fig. 5.

SAMs for AMSU-A/MHS, SSMIS, EON-MW, and ATMS for temperature (left) and water vapor mixing ratio (right) for the same sample as Fig. 4. In this case, the PAMs include RMSE, correlation, and bias of temperature and water vapor mixing ratio profiles. The color bars are for ECDF normalization and the black outlines are for rescaled minmax normalization (Hoffman et al., 2018). Confidence intervals for the ECDF SAMs are plotted at the 95% level and grey shading indicates the 95% confidence interval for the null hypothesis (H0) that there is no difference between sensors.

Figure 6 provides a geophysical capability assessment, i.e. the ability to describe the environment, of EON-MW alone, the suite of SNPP sensors (ATMS, CrIS, and VIIRS) and the constellation of sensors on SNPP, NOAA-18, and MetOp-B. This analysis shows the complementarity of the various sensors. In Fig. 6 each panel is a table with geophysical variables (measured by MW or IR sensors) as columns and instrument attributes as rows. Each cell is assigned optimal, intermediate, or low capability based on comparing the sensor, suite or constellation attributes to the maximum and minimum useful values. The individual sensor attributes are based on DNN retrievals explained above and on the instrument specifications. The maximum and minimum useful values are based on the current NOAA retrieval algorithms such as the Microwave Integrated Retrieval System (MiRS) and the NOAA Unique Combined Atmospheric Processing System (NUCAPS) (https://www.star.nesdis.noaa.gov/portfolio/productCatalog.php). Sensor attributes better than the maximum useful value are assigned the optimal capability, attributes worse than the minimum useful value are assigned the low capability and otherwise (i.e. in between) the sensor attributes are assigned the intermediate capability. When combining results for multiple sensors, the best capability of all the individual sensors in each table cell is kept. For this capability assessment, ATMS and EON-MW are equivalent because of the overall similarity of these two sensors. While EON-MW (or ATMS) alone leaves many gaps in desired observation capability, the combination of sensors in the SNPP suite or the polar orbiting constellation is very capable, scoring optimal for most parameters and for most characteristics. The main limitation for the constellation of sensors on SNPP, NOAA-18 and MetOp-B is that vertical resolution, characterized by the weighting function half widths and the number of channels, while adequate for many purposes, is less than optimal.
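
The cell-by-cell classification and the rule for combining sensors can be sketched in a few lines of Python. The function names and the threshold values in the comments are hypothetical; the actual maximum and minimum useful values come from the NOAA product requirements cited above and are not reproduced here.

```python
# Ranking used when keeping the best capability among several sensors.
ORDER = {"low": 0, "intermediate": 1, "optimal": 2}


def classify(value, min_useful, max_useful, higher_is_better=True):
    """Assign a capability class to one table cell.

    For attributes where smaller is better (e.g. error size), pass
    higher_is_better=False; min_useful is then the numerically larger
    (worst acceptable) threshold and max_useful the smaller one.
    """
    if not higher_is_better:
        # Flip signs so that "larger is better" holds for the comparisons.
        value, min_useful, max_useful = -value, -min_useful, -max_useful
    if value > max_useful:
        return "optimal"
    if value < min_useful:
        return "low"
    return "intermediate"


def combine(capabilities):
    """For a suite or constellation, keep the best capability in the cell."""
    return max(capabilities, key=ORDER.__getitem__)
```

For example, with hypothetical error thresholds of 2.0 K (minimum useful) and 1.0 K (maximum useful), `classify(1.5, 2.0, 1.0, higher_is_better=False)` returns `"intermediate"`, and combining that cell with an `"optimal"` cell from another sensor returns `"optimal"`.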

Fig. 6.

Geophysical capability assessment for different geophysical variables (columns) and key sensor attributes (rows) for (top) EON-MW, (middle) all SNPP sensors (ATMS, CrIS, VIIRS), and (bottom) SNPP sensors complemented by the sensors on two additional polar orbiters—AMSU-A/MHS and HIRS/4 on NOAA-18 and AMSU-A/MHS, IASI, and HIRS/4 on MetOp. Among the attributes, performance refers to error size, density to the observation spacing, and reliability to impact of losing one sensor in a constellation. The color in each cell indicates the capability assessment for that geophysical variable (column) and attribute (row). Green, yellow, and red indicate optimal, marginal, and low capability, respectively.

6.

## NWP analysis and forecast impact assessment

In Section 5 we considered how to assess the observation system capabilities in terms of the L1 and L2 observations themselves. In this section we shift our focus from the observation space, i.e. from intrinsic data impacts, to the model space, i.e. to application-dependent model impacts. OSSEs are conducted to evaluate proposed future EON-MW observations in current global NWP applications.

6.1.

### Methods

Global observing system experiments (OSEs) are data denial experiments to determine the impact of existing observing systems on global scale NWP. First, a Control experiment is run that parallels current practice. Then identical Test runs are made excluding or adding a particular observing system. OSSEs determine the impact of new observing systems by performing data denial experiments similar to OSEs, but using simulated observations. OSSEs provide a rigorous, cost-effective approach to evaluate the potential impact of new observing systems and alternate deployments of existing systems, and to optimize observing strategies. Well-designed OSSEs can account for many aspects of the interaction of data and DA system, but there are always limitations and caveats (Hoffman and Atlas, 2016) including the quantification of observation error characteristics based on specifications or preflight engineering studies and the fact that today’s DA system will evolve prior to launch. Further discussion of the limitations and caveats of this study is included in the conclusions (Section 7). OSSEs are also used to prepare for the assimilation of new types of data in order to accelerate their application to operational prediction, as well as to optimize the assimilation of existing data (Hoffman and Atlas, 2016). The OSSE process is depicted in Fig. 7. In an OSSE, the DA system (GDAS/GFS in the figure) assimilates simulated observations prepared by interpolating a realistic NR to observation locations given by a template file and applying different forward models: the CRTM for radiance observations, simple vertical interpolation for conventional observations, and the GPSRO forward model for GPSRO observations. Normally distributed uncorrelated random observation errors are explicitly added to the simulated radiances, conventional observations, and GPSRO profiles.
Except for the NR, taken to represent the ‘true’ atmosphere, and the simulation of the observations, everything in the OSSE setup is made to be as similar as possible to the operational DA and forecast system. Of course, in the OSSE framework we can validate analyses and forecasts versus the truth. In our experiments we use the CGOP, which has been documented and validated by Boukabara et al. (2016, 2018b). In summary, the CGOP includes:

Fig. 7.

Flow chart of the OSSE system. Here, blue are inputs, orange are processes, and green are calculated quantities, although calculated quantities such as the error added simulated observations are also the input to the next process. See text for explanation.

• A NR, the G5NR, a 2-yr (May 2005–May 2007), 7-km-resolution, non-hydrostatic GEOS-5 forecast (Gelaro et al., 2015; Putman et al., 2015);
• Forward operators, to simulate error-free observations, including the CRTM for BTs (Chen et al., 2012; Zhu et al., 2012) and a GPSRO observation simulator (Cucurull et al., 2013);
• An observation error addition procedure (Errico et al., 2013, described briefly in Section 4);
• A DA and forecast system, here the GDAS including the hybrid 3DEnVar or 4DEnVar GSI DA system (Kleist and Ide, 2015a, 2015b) and the GFS (NOAA, 2015); and
• A forecast model, the global spectral model (GSM) (NWS, 2014).

In contrast to other recent SmallSat OSSEs (Shahroudi et al., 2019b; Zhou et al., 2019), the current experiments do include explicit observation errors for all observation types. The standard deviations of these errors are given by the estimated NEΔT values for all radiance observations and by values determined from the Errico et al. procedure for all other observations. Similar to those recent studies, the current study employed a research version of the GDAS/GFS configuration with reduced resolution of T670 (20 km resolution) for the deterministic forecast, and T254 (53 km resolution) for the ensemble forecasts and data assimilation. The 64-layer sigma-pressure hybrid coordinate and 80-member ensembles are the same as the currently operational configuration. The OSSE system has been extensively validated by Boukabara et al. (2018a). This study showed that relative performance in parallel OSEs and OSSEs is well matched even in the case of no explicitly added observation errors when the absolute performance in the OSSEs is superior to that in the OSEs.
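
The explicit error addition for radiances can be illustrated with a minimal Python sketch, assuming (as stated above) uncorrelated, normally distributed errors with a per-channel standard deviation equal to the NEΔT. The function name and all numerical values are hypothetical; this is not the CGOP implementation.

```python
import random


def add_observation_errors(perfect_bts, nedt, seed=None):
    """Add uncorrelated Gaussian noise to error-free simulated BTs.

    perfect_bts : error-free brightness temperatures (K), one per channel
    nedt        : noise standard deviation (K) for the same channels
    seed        : optional seed so that an experiment can be reproduced
    """
    rng = random.Random(seed)
    # Each channel receives an independent draw; no channel-to-channel
    # correlation is introduced.
    return [bt + rng.gauss(0.0, sigma) for bt, sigma in zip(perfect_bts, nedt)]


# Hypothetical three-channel example (values are illustrative only).
noisy = add_observation_errors([250.0, 230.0, 270.0], [0.3, 0.5, 1.0], seed=42)
```

Using a fixed seed keeps the perturbed observations identical between experiments that share an observation set, so that differences between experiments are not caused by different noise realizations.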

For the current experiments the CGOP GSI DA system has been updated from 3DEnVar to 4DEnVar. The GSI combines variational and Kalman filter methods. The main GSI analysis is a high-resolution updated version of the model state that is obtained by variationally combining information from the observations and the short-term (or background) high-resolution deterministic forecast. This analysis uses a hybrid estimate of the background error covariance that combines a climatological estimate and a dynamic uncertainty estimate. The dynamic uncertainty estimate is derived from a parallel low-resolution ensemble Kalman filter DA. At every cycle the ensemble is re-centered around the high-resolution variational analysis in order to synchronize the Kalman filter DA to the variational DA.
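
Two ingredients of this hybrid scheme, the blending of climatological and ensemble error estimates and the re-centering of the ensemble, can be sketched in Python for a toy state vector. The sketch ignores the resolution change between the ensemble and the deterministic analysis, treats only a scalar variance rather than a full covariance, and assumes blending weights that sum to one; the function names and weights are hypothetical and do not describe the GSI code.

```python
def hybrid_variance(clim_var, ens_var, clim_weight=0.25):
    """Blend a climatological and an ensemble-derived error variance.

    clim_weight is a hypothetical weight in [0, 1]; the remainder is
    given to the flow-dependent ensemble estimate.
    """
    return clim_weight * clim_var + (1.0 - clim_weight) * ens_var


def recenter(members, analysis):
    """Shift the ensemble so that its mean equals the variational analysis.

    members  : list of state vectors (one list of floats per member)
    analysis : state vector of the variational analysis (same length)
    The deviations of each member from the ensemble mean are preserved.
    """
    n = len(members)
    mean = [sum(column) / n for column in zip(*members)]
    return [
        [x - m + a for x, m, a in zip(member, mean, analysis)]
        for member in members
    ]
```

After re-centering, the ensemble spread is unchanged while the ensemble mean coincides with the variational analysis, which is what keeps the two DA components synchronized from cycle to cycle.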

A new routine was added to the GSI for processing EON-MW observations that closely parallels the methodology used operationally for ATMS observations in terms of observation error characterization, quality control (QC), and data thinning. As is the case for all radiance observations, channel-channel correlations are ignored by the DA system for EON-MW. The estimated observation error (EOE) standard deviations required by GSI (Table 4) are taken from the operational system and have been tuned to reflect instrument error, forward problem error and representativeness error. As with other MW sensors, ATMS channels sensitive to the surface and moisture have higher forward problem and representativeness errors and hence higher estimated observation error standard deviations than the temperature sounding channels. Although the explicitly added errors are larger for EON-MW than for ATMS (Section 4) the same error standard deviations are used in GSI for EON-MW and ATMS. This holds for the surface-affected channels and the water vapor channels because the forward problem and representativeness errors are the same and dominate the total observation error. For the higher-peaking temperature channels the GSI closely fits ATMS and will therefore overfit EON-MW in our experiments. Thus, in the OSSEs, as is the case in reality, the GSI values differ from the true observational error standard deviations. The QC routines that were implemented for EON-MW are analogous to those for ATMS. First, following operational practice, all channels are used except that all Channel 15 observations are eliminated (i.e. blacklisted) because of the high noise level in that channel (see Table 4). (However, with sufficient spatial filtering, the data from this highest peaking channel could be useful.) Second, BTs that would be adversely affected by cloud liquid water (CLW) present in the G5NR are eliminated. Channels adversely affected by surface emissivity are also eliminated.
This helps to create a realistic pattern of observations used by the analysis even though all radiances are simulated for clear-sky conditions. However, the G5NR overestimates cloud water amounts globally and precipitation over land and in the Pacific Intertropical Convergence Zone (ITCZ) (Gelaro et al., 2015). Therefore, due to this QC procedure, we expect the GSI to use fewer MW observations in the OSSEs than in reality. (See discussion of Fig. 8.) Third, a background check removes any points where the departure of the EON-MW observation from the background (O–B) exceeds 3 times the observation error. There is no explicit thinning in the OSSE because data are only simulated at locations in the template files that had already been selected by the thinning procedures of the operational system. That is, simulated observations were only created for those real observations that were actually provided to the operational DA system after operational preprocessing (i.e. after being thinned and passing gross QC). The operational thinning operates on each sensor separately and selects the best observations closest to the analysis time and that are not closer than a set distance (typically 145 km) to observations already selected.
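
The background check and the distance-based thinning described above can be sketched in Python as follows. This is a simplified greedy illustration that ranks observations only by their time offset from the analysis; the operational thinning uses additional quality criteria that are not modeled here. The function names and the tuple layout are hypothetical.

```python
import math


def passes_background_check(obs, background, obs_error, factor=3.0):
    """Keep an observation only if |O - B| does not exceed factor * error."""
    return abs(obs - background) <= factor * obs_error


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))


def thin(observations, min_dist_km=145.0):
    """Greedy thinning of (lat, lon, hours_from_analysis) tuples.

    Observations closest to the analysis time are considered first and
    are kept only if they lie at least min_dist_km from every observation
    already selected.
    """
    selected = []
    for ob in sorted(observations, key=lambda o: abs(o[2])):
        if all(great_circle_km(ob[0], ob[1], s[0], s[1]) >= min_dist_km
               for s in selected):
            selected.append(ob)
    return selected
```

For example, of two observations one degree of longitude apart at the equator (about 111 km), only the one closer to the analysis time survives thinning with the 145 km spacing.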

Fig. 8.

DA diagnostics for BT globally averaged over two weeks for (top, a–d) ATMS from an impact experiment using real observations with the same set up as experiment ATMS but for 2014 and for (bottom, e–i) EON-MW from the PM experiment. For each of ATMS and EON-MW, the following statistics are given as a function of channel: (a, e) global mean of O–A and O–B (K), (b, f) global standard deviation of O–A and O–B (K), (c, g) the bias correction (K) relative to the analysis and background, and (d, i) the number of observations assimilated (counts/day).

Observations used in the experiments described here come from a variety of platforms including conventional platforms (surface stations, ships, radiosondes, etc.) and satellites in LEO or geosynchronous equatorial orbit (GEO). As described earlier for the simulated EON-MW observations, all observations are simulated at the times and locations of the 2014 real observations with the year changed to 2006, with the exception of the addition of GOES-16 atmospheric motion vectors from 2017. Except for IR and MW sensors in polar orbit, and the new EON-MW sensors (in polar or TRMM-like orbits) all OSSEs use the same observations, which include:

• All conventional observations normally used by the (January 2015 implementation of the) GDAS, except for aircraft observations;
• All satellite based winds (cloud track winds, scatterometer winds, etc.) normally used by the GDAS;
• All radio occultation observations normally used by the GDAS; and
• Radiances normally used by GDAS from the GOES-15 sounder (SNDR) and Meteosat-10 SEVIRI sensors.

Table 2 lists the orbit(s) (i.e. platform(s)) for each LEO IR and MW sensor used in each experiment. Note that only a subset of all available radiance observations is included in the experiments. The basic scheme outlined for simulating EON-MW observations in Section 4 is followed for all other observation types, except that CRTM is replaced with either the forward operator of Cucurull et al. (2013) for radio occultation observations or simple vertical (logarithmic in pressure) interpolation for the conventional and atmospheric motion vector (AMV) observations. The following caveats are noted here: First, the simulated AMVs are located where actual AMVs were available in 2014. These locations are only climatologically correct since the 2014 actual cloud cover does not match the NR cloud cover. Similarly, the simulated radiosondes ascend vertically rather than following the wind in the NR. These factors might have some impacts on the sensitivity of the analysis to these data types. Second, if more data from other satellites (e.g. AIRS and AMSU on Aqua, SSMIS on F18, and all sensors on Chinese satellites) and from aircraft were included in the Control configuration, the OSSE impacts of adding EON-MW observations might be less. The other satellite data were excluded by design to enhance the impact of the configurations studied within computer resource constraints. The aircraft data were not included inadvertently due to a 2015 change in input formats that removed aircraft observations from the customary conventional observation input file.

All experiments listed in Table 2 were initialized at 1800 UTC 7 August 2006 from day 7 of the Control OSSE that itself was initialized at 1800 UTC 31 July 2006 from an operational analysis. Each of the OSSEs runs from 8 August through 15 September, with the DA system cycled every 6 h at the four synoptic times: 0000, 0600, 1200, and 1800 UTC. The first 7 days are considered a spin-up period and all assessments are based on the DA cycles and initial forecast times from 0000 UTC 15 August through 0000 UTC 15 September. The GFS 0 to 168 h forecasts were initialized by the 0000 UTC GDAS analysis each day during the assessment period.
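
The cycling schedule of the assessment period can be reproduced with a short Python sketch, which also makes the resulting sample sizes explicit. The function names are hypothetical; the dates are those given above.

```python
from datetime import datetime, timedelta

# Assessment period stated in the text (after the 7-day spin-up).
ASSESS_START = datetime(2006, 8, 15, 0)
ASSESS_END = datetime(2006, 9, 15, 0)


def cycle_times(start, end, step_hours=6):
    """All DA cycle times from start to end (inclusive) every step_hours."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += timedelta(hours=step_hours)
    return times


def forecast_init_times(start, end):
    """The 0000 UTC cycles, from which the 0 to 168 h forecasts are launched."""
    return [t for t in cycle_times(start, end) if t.hour == 0]


cycles = cycle_times(ASSESS_START, ASSESS_END)
inits = forecast_init_times(ASSESS_START, ASSESS_END)
```

With these dates there are 125 six-hourly DA cycles and 32 forecast initial times in the assessment period.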

6.2.

### Results

Results are presented here in terms of analysis accuracy, forecast scorecards, and SAMs. One advantage of OSSEs is that the ‘true’ state of the atmosphere (i.e. the NR) is perfectly known. The global analyses and forecasts from OSSE experiments in this study are verified with respect to the G5NR. To assess the impact on the GDAS analysis, the bias and RMSE of the analysis were calculated for geopotential height and relative humidity (RH) for different levels. To assess the impact on the GFS forecast, the RMSE, bias (absolute mean error or AME) and anomaly correlation (AC) were calculated for geopotential height, temperature, RH, and vector wind, for different forecast times, different levels and different domains. Some of the ACs are not calculated because in the standard NCEP Verification Statistics Data Base (VSDB) package climatologies are only available for certain level and variable combinations. As a result, there are no RH ACs and ACs are missing for some levels for the other variables.
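
For reference, the three verification statistics can be written as a minimal Python sketch. It operates on flat lists of collocated values and omits the area weighting and any centering of the anomalies that an operational verification package may apply, so it should be read as an illustration of the definitions rather than as the VSDB implementation. The function names are hypothetical.

```python
import math


def bias(forecast, truth):
    """Mean error; its absolute value corresponds to the AME used in the text."""
    return sum(f - t for f, t in zip(forecast, truth)) / len(truth)


def rmse(forecast, truth):
    """Root-mean-square error of the forecast relative to the truth (NR)."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth))


def anomaly_correlation(forecast, truth, climatology):
    """Uncentered correlation of forecast and true anomalies.

    Anomalies are departures from a climatology, which is why the AC can
    only be computed for variables and levels that have a climatology.
    """
    fa = [f - c for f, c in zip(forecast, climatology)]
    ta = [t - c for t, c in zip(truth, climatology)]
    numerator = sum(a * b for a, b in zip(fa, ta))
    denominator = math.sqrt(sum(a * a for a in fa) * sum(b * b for b in ta))
    return numerator / denominator
```

Note that smaller values are better for RMSE and AME, whereas larger values are better for AC, so the statistics must be oriented consistently before they are normalized and averaged into SAMs.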

In order to check that the DA system makes similar use of the real and simulated BT observations, DA diagnostic statistics were compared between experiment PM and a similar OSE assimilating real ATMS observations. This OSE was run for the period 25 May through 7 August 2014 and statistics were collected for the last two weeks of this period. For these experiments, Channel 15, which has the highest noise level of all the channels (see Table 4), is blacklisted (not used). Figure 8 shows that the O–A statistics closely follow the O–B statistics, which is expected when the observations are being used consistently. The largest changes from O–B to O–A are seen in the 183 GHz humidity channels (17–22), where the addition of ATMS or EON-MW reduces the bias by about 0.1 K for each channel and for each cycle. Biases are somewhat larger for EON-MW compared to ATMS, with the largest values being close to 1 K. Biases and standard deviations are all small enough that we can conclude that the DA system is fitting the EON-MW simulated observations as closely as the ATMS real observations. As expected, the variational bias correction (VarBC) is making larger corrections in the real data case for ATMS compared to the EON-MW OSSE, because observations in the OSSE include random errors, but no explicit biases. The DA system is also using more of the ATMS observations than for EON-MW. This is expected as discussed earlier due to the QC procedure based on cloudiness.
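
The departure statistics plotted in Fig. 8 are simple to state in code. The sketch below computes the mean and standard deviation of O–B or O–A for a single channel from collocated observed and model-equivalent BTs; the function name and the sample values are hypothetical.

```python
import statistics


def departure_stats(observed, model_equivalent):
    """Mean and (population) standard deviation of observation departures.

    Pass the background equivalents to obtain O-B statistics or the
    analysis equivalents to obtain O-A statistics.
    """
    departures = [o - m for o, m in zip(observed, model_equivalent)]
    return statistics.fmean(departures), statistics.pstdev(departures)


# Hypothetical BTs (K) for one channel at three locations.
mean_omb, std_omb = departure_stats([250.0, 251.0, 252.0], [249.0, 251.0, 253.0])
```

When the analysis draws toward the observations, the O–A standard deviation is smaller than the O–B standard deviation for the same set of observations.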

In an OSSE, we can also examine the actual analysis error, a key impact metric, since the truth is known from the NR. Accordingly, Fig. 9 shows the vertical profiles of the mean (left panels) and standard deviation (right panels) of the analysis error of geopotential height (top panels) and relative humidity (bottom panels) for 6 of the experiments. Geopotential height bias increases linearly through the troposphere to a maximum of approximately 7 m, while geopotential height RMSE reaches as much as 12 m in the mid-troposphere. Relative humidity bias is uniformly less than 3% in the troposphere, while relative humidity RMSE is typically between 10 and 15%. Differences between experiments are very small. Even the differences between Control and 1Polar are small, on the order of 1 m of geopotential height or 1% of RH. In the figure the curves for Control and the experiments that add one or two sensors to Control are packed very close together. Errors are larger for 1Polar and PM, an experiment that adds an EON-MW sensor to 1Polar, and the differences between 1Polar and PM RMSE for relative humidity show a small (order of 1%) but consistent improvement for most of the troposphere (800 to 200 hPa) due to adding the EON-MW sensor.

Fig. 9.

Vertical profiles of analysis mean error (i.e. bias, a, c) and analysis RMSE (b, d) for geopotential height (m, a, b) and relative humidity (%, c, d) for the different OSSEs (colors) for the global domain.

Figure 10 shows one of the many possible scorecards that were made comparing pairs of experiments. This scorecard compares AM to Control for forecast days 1, 3, 5, and 6 (or forecast hours 24, 72, 120, and 144). The metrics include AC and RMSE for different variables—geopotential height, vector wind, and temperature—at different vertical levels, over different regions—North America, Northern Hemisphere extratropics (NHX), Southern Hemisphere extratropics (SHX), and Tropics. Colors and shapes reflect the improvement or degradation of the impact as explained in the legend. Most of the changes are small, but there are significant improvements for AM for height forecasts in the stratosphere. Note that these stratospheric levels are not included in the figures that follow.

Fig. 10.

Scorecard comparing forecast skill for anomaly correlations and RMSE for different variables at different levels for different forecast lengths for AM vs. Control. The symbols and colors indicate the probability that AM is better than Control. As shown below the scorecard, the green symbols (from left to right) indicate that AM is better at the 95%, 99% and 99.9% significance levels, respectively, while the red symbols indicate that AM is worse at the 99.9%, 99% and 95% significance levels, respectively. Gray indicates no statistically significant differences and blue indicates that the anomaly correlations in the tropics are not considered.

Figures 11 and 12 display summary assessment metrics (SAMs, Hoffman et al., 2017a, 2018) for the eight experiments globally and for various categories. Under the null hypothesis that there is no difference between the experiments each SAM would have an expected value of 1/2, which is the base of the color bars, and 95% of the SAMs would be within the grey shading around 1/2. These SAMs are calculated for the categories given along the x-axes in Figs. 11b and 12, i.e. for eight forecast times (from 0 to 7 days), for five levels (250, 500, 700, 850, 1000 hPa), for three domains (NHX, SHX, Tropics), for four variables (Z, T, V, RH), and for three statistics (AC, RMSE, AME). The SAMs are averages over the 32 forecast verification times (0000 UTC 15 August to 15 September 2006) as well as over all categories except the category along the x-axis. Correlations are accounted for along each dimension (i.e. between forecast times, levels, domains, variables and statistics) in assessing confidence intervals using the method of Hoffman et al. (2018). To avoid the limitation of the relatively small sample size of the current experiments, the SAMs presented here use the factors that reduce the sample size to an effective sample size calculated by Hoffman et al. (2018) from three years of forecasts from three operational global NWP centers.
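
To give a sense of why the effective sample size matters, the following worked arithmetic uses the category counts listed above. It is an upper bound on the number of NAMs entering a global SAM, since some ACs are not available, and the reduction factor $f$ is a hypothetical symbol introduced here only for illustration; the actual factors are those of Hoffman et al. (2018) and are not reproduced.

$$
n_{\max} = 8 \times 5 \times 3 \times 4 \times 3 \times 32 = 46\,080
$$

$$
\Delta_{95} = \frac{1.96}{\sqrt{12\, n_{\mathrm{eff}}}}, \qquad n_{\mathrm{eff}} = f\, n_{\max}, \qquad 0 < f \le 1
$$

Here $\Delta_{95}$ is the half-width of the 95% null-hypothesis interval around 1/2, which follows from the variance $1/(12n)$ of an average of $n$ independent uniform NAMs. If all NAMs were independent ($f = 1$), then $\Delta_{95} \approx 0.0026$. Because $\Delta_{95}$ scales as $1/\sqrt{f}$, a purely illustrative value of $f = 0.01$ would widen the interval tenfold, to about $0.026$. Ignoring the correlations would therefore make small differences between experiments appear far more significant than they are.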

Fig. 11.

Forecast impacts in terms of (a) global ECDF SAMs and (b) ECDF SAMs as a function of forecast time for each experiment (colors). The color bars are for ECDF normalization (Hoffman et al., 2018). Confidence intervals for the ECDF SAMs are plotted at the 95% level and grey shading indicates the 95% confidence interval for the null hypothesis (H0) that there is no difference between experiments. Correlations have been accounted for in determining the confidence intervals as noted in the text. Note that the forecast time zero SAM in panel (b) is the analysis SAM since verification is with respect to the G5NR.

Figure 11a presents the global SAMs, which average over all NAMs listed in the previous paragraph. The experiments based on Control are significantly better than those based on 1Polar, but more detailed comparisons are not statistically significant. Compared to Control, adding more data is more notably helpful for SSMIS, 2AM, and 2Tropics than for AM. There is some improvement for AM, but in this configuration replacing SSMIS requires more than a single EON-MW. Compared to 1Polar, adding more data is more helpful for PM than for ATMS. There is also improvement for ATMS, but a single EON-MW more than replaces a single ATMS in this very observation-poor scenario. Note that AM vs. SSMIS and PM vs. ATMS are direct head-to-head comparisons where the only difference is that SSMIS or ATMS is replaced by EON-MW.

Figure 11b presents the same plot as Fig. 11a but separately for each forecast time from 0 to 168 h. The decay of impact with forecast time has been seen in all previous OSEs/OSSEs that we have examined with SAMs and is caused by the mixing of model errors with initial errors. At time zero, the analysis SAMs show the full impact of the different observing systems. As the forecast proceeds, model errors, which have the same causes for all experiments being compared, grow and dominate the total errors, and the differences between experiments decrease. Note that the analysis SAMs (at 0 h forecast time) show nearly significant improvement for SSMIS and 2Tropics compared to Control and for ATMS and PM compared to 1Polar. AM and 2AM show small analysis improvements compared to Control. All these improvements decay with forecast time.

Figure 12 presents SAMs for each level, domain, variable and statistic for both the analysis alone (color bars) and averages over the forecast times from 24 to 168 h (black outlines). (In Fig. 11, the black outlines and color bars are identical, but since the analysis impact was shown in Fig. 11b to be the most important, in Fig. 12 color bars are plotted for the analysis SAMs and black outlines for the forecast SAMs.) In Fig. 12a it is striking that AM is best at 250 hPa during the forecasts, but at lower levels and at all levels at the analysis time SSMIS and 2Tropics are better. ATMS has a somewhat negative impact at the lowest levels. In Fig. 12b, 2AM is slightly better in the extratropical forecasts, where it provides more observations than any other experiment, but worse than AM for the tropical analysis, where SSMIS and 2Tropics are both good. By variable, in Fig. 12c, although SSMIS and 2Tropics provide the best analysis, the results for Z and T are somewhat confused, with Control besting 2AM for the Z analysis. For V and RH, additional data help all the analyses and forecasts. For RH, 2Tropics and 2AM are better than SSMIS, and PM is better than 1Polar for both analyses and forecasts. Part of this improvement is likely due to the changes of EON-MW water vapor bands (Channels 17–22) that are seen in Table 3 and discussed in Section 2. Finally, in Fig. 12d, 2AM is best in terms of forecast AC, while for RMSE and AME (forecast bias), SSMIS and then 2Tropics are best.

Fig. 12.

Analysis and forecast impacts in terms of ECDF SAMs by (a) level (hPa), (b) domain, (c) variable, and (d) statistic for each experiment (colors). The color bars are for the analysis SAMs (forecast hour 0 only) and the black outlines are for the forecast SAMs (forecast hours 24 through 168). In this figure, confidence intervals are plotted at the 95% level for the forecast SAMs and grey shading indicates the 95% null hypothesis (H0) confidence interval for the analysis SAMs.

7.

## Discussion and concluding remarks

Two types of observing system assessment were carried out for EON-MW, a 12U CubeSat analog of ATMS. Since they have different strengths and weaknesses, different approaches are used here to assess (1) the geophysical capability of EON-MW and (2) the forecast and analysis impact of EON-MW on the GDAS. This two-pronged assessment gives a more complete picture of the value of a proposed observing system. This is the case because current DA systems have some limitations. (More about this below.) Therefore, the full potential of current and proposed sensors is not exploited, and may not be exposed by analysis and forecast impact tests. As DA systems evolve, they will exploit more and more of the information value of the observations. Consequently, geophysical capability assessments are a valuable adjunct to assessments based on OSSEs that use a complete, but current, DA and forecast system.

For EON-MW, the sensor characteristics and geophysical capabilities assessments are straightforward. EON-MW was compared to ATMS in terms of instrument specifications, weighting functions, and retrievals. The complementarity of different sensor sets to ATMS or EON-MW was also examined. The results of these sensor specifications and geophysical capability assessments are that EON-MW is close to equivalent to ATMS. This result was anticipated since the EON-MW channels and sensor specifications closely follow those of ATMS. However, there is evidence that EON-MW has more useful humidity information than ATMS. Thus, impacts were expected to be seen in the OSSE water vapor results because of the differences in the water vapor channels (Table 3) in terms of central frequency, band width, and the number of pass bands (2 for ATMS and 1 for EON-MW).

The assessments presented in this paper do not address the pros and cons comparing a suite of SmallSats to a single large observatory. For example, a constellation of MW and IR SmallSats even if flying in close formation could not attain the degree of image registration provided by current LEO satellites, potentially leading to changes in the way the data can be used or loss of accuracy for some applications. On the other hand, SmallSats are more economical, have a quicker development cycle and are easier to launch; therefore SmallSats can provide more robust solutions for a lower price and allow for greater risk-taking in testing new measurement concepts and in advancing cutting edge technologies.

The OSSEs described in Tables 1 and 2 were designed to evaluate several different EON-MW constellations for gap mitigation and/or replacement of ATMS in the context of global NWP. In the OSSEs, the truth is taken to be the G5NR. CRTM creates simulated observations from the geophysical profiles of temperature and moisture evaluated from the G5NR. Each experiment was designed to address one of the questions in Table 1. As in most OSSE studies, clear-cut unambiguous results are limited by sample size and the chaotic nature of the atmospheric dynamics. Here, our results and conclusions are summarized in the context of these and other caveats. The summary findings, based principally on the SAMs, are the following:

1. All experiments show positive impacts when considering global (overall) SAMs. Adding one or two MW sensors to Control or one sensor to 1Polar yields analysis improvements that are nearly statistically significant in some cases.
2. AM (the Control + EON-MW on F18 experiment) compared to SSMIS (the Control + SSMIS on F18 experiment) shows that a single EON-MW sensor partially mitigates the loss of the DMSP satellite. This mitigation is not complete and SSMIS alone results in somewhat better analyses and forecasts relative to an EON-MW replacement.
3. 2AM (the Control + EON-MW on F18 and F18’ experiment) is comparable in terms of global skill to SSMIS, indicating that 2 EON-MW sensors in staggered early morning orbits, even if only two hours apart, are capable of replacing the troposphere and stratosphere temperature and humidity sounding capabilities of SSMIS. However, the analysis SAMs for AM and 2AM are comparable.
4. 2Tropics (the Control + EON-MW on TRMM and TRMM’ experiment) is also very comparable in terms of global skill to SSMIS and 2AM, but there is a substantial improvement in analysis skill for 2Tropics and SSMIS compared to Control, AM, and 2AM.
5. As expected, 1Polar significantly reduced global analysis and forecast skills compared to Control.
6. PM (the 1Polar + EON-MW on SNPP experiment) mitigates the loss of the SNPP satellite. In this scenario, the loss of ATMS is fully compensated by just a single EON-MW. That is, ATMS (the 1Polar + ATMS on SNPP experiment) compared to PM shows that EON-MW improves analysis and forecast skill relative to ATMS alone, although not significantly, in this otherwise data-poor scenario.
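
For reference, the experiment definitions quoted in the findings above can be collected in a small lookup structure. The sketch below only restates the pairings given in the text (baseline, added sensor, host orbit); the class and function names (`Experiment`, `EXPERIMENTS`, `by_baseline`) are hypothetical and were chosen for illustration, and the contents of the Control and 1Polar baselines themselves are not encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """One OSSE scenario as named in the summary findings (illustrative only)."""
    name: str          # experiment label used in the text
    baseline: str      # "Control" or "1Polar"
    added_sensor: str  # sensor added on top of the baseline
    orbits: tuple      # host orbit(s) named in the text


# Pairings restated from the numbered findings; nothing here is new information.
EXPERIMENTS = [
    Experiment("SSMIS", "Control", "SSMIS", ("F18",)),
    Experiment("AM", "Control", "EON-MW", ("F18",)),
    Experiment("2AM", "Control", "EON-MW", ("F18", "F18'")),
    Experiment("2Tropics", "Control", "EON-MW", ("TRMM", "TRMM'")),
    Experiment("ATMS", "1Polar", "ATMS", ("SNPP",)),
    Experiment("PM", "1Polar", "EON-MW", ("SNPP",)),
]


def by_baseline(baseline):
    """Return the names of all experiments built on the given baseline."""
    return [e.name for e in EXPERIMENTS if e.baseline == baseline]
```

Grouping by baseline makes explicit that comparisons are meaningful only within the Control-based group or within the 1Polar-based group.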

The reader should note that these quantitative impacts depend on the DA system used (here, a low-resolution research version of the operational NCEP system) and on the metrics examined (here, metrics describing the large-scale global forecast skill).

In the results presented here, differences between the Control-based scenarios or 1Polar-based scenarios are not statistically significant at the 95% level (but are nearly so for some of the analysis SAM comparisons). One reason is the limited sample size in these experiments. A second reason is that for the current GOS there are many satellite sensors and eliminating or adding a single sensor or satellite may not have a large impact on forecast skill. To partially ameliorate this, the Control experiment excludes BTs from some polar-orbiting sensors (e.g. AIRS and AMSU on Aqua, and SSMIS on F18). Aircraft observations were also excluded for reasons given in Section 6.1.
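
To illustrate how a limited sample affects a significance statement at the 95% level, the following is a minimal sketch of a paired percentile bootstrap on the difference in a skill score between two experiments verified on the same cases. It is not the test used in this study; the function names, the choice of bootstrap, and the default of 2000 resamples are assumptions made for illustration. It also treats the paired differences as independent, which is optimistic when consecutive forecast cycles are serially correlated.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean paired difference (a - b)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_resamples):
        # Resample the paired differences with replacement.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


def is_significant(scores_a, scores_b, **kwargs):
    """True if the bootstrap interval for the mean difference excludes zero."""
    lower, upper = paired_bootstrap_ci(scores_a, scores_b, **kwargs)
    return lower > 0.0 or upper < 0.0
```

With few verification cases the interval is wide, so a small but consistent improvement can fail to exclude zero even when it is real.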

Two further issues in current DA systems present barriers to the full exploitation of the information value of observations, especially satellite radiance observations. First, there are representativeness errors. In the DA context, representativeness error is the variability present in observations but not represented by the DA system, and it is considered a component of observation error. This means that the DA system ignores some information contained in the observations. As DA system resolution increases, the DA system estimates of the errors of the satellite BTs will decrease, and more of the information contained in the BTs will be extracted by the DA system. Second, dense observing systems like satellite radiances often have correlated observation errors. These correlations are difficult to estimate and are often ignored in DA systems. Instead, the DA estimates of the error standard deviations are inflated, or the data are thinned or replaced with super-observations. However, Bormann et al. (2010) and Stewart et al. (2014) estimated such error correlations. Bathmann (2018) investigated the convergence properties of the Desroziers method used for estimating observation error correlations. Bormann et al. (2011) argue that simple error inflation is not sufficient to account for the observed correlations. In any case, incorrect specifications of error characteristics reduce DA skill. The finding that two additional instruments add little to the analysis benefit obtained from a single additional instrument (AM vs. 2AM) may be related to these factors. Therefore, additional tuning of the error standard deviations and data thinning used in the analysis, factors that shift the weight given to individual data sources within the analysis, should be the subject of further research.
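
The effect of ignoring correlated observation errors can be illustrated with a toy calculation that is not part of the experiments reported here. Assume n observations of a single scalar quantity, each with error standard deviation sigma and a common pairwise error correlation rho; the function names below are hypothetical. The true error variance of the averaged observation is sigma^2 (1 + (n - 1) rho) / n, whereas a system that assumes uncorrelated errors would use sigma^2 / n.

```python
def mean_error_variance(n_obs, sigma, rho):
    """True error variance of the average of n_obs equally correlated observations."""
    return sigma ** 2 * (1.0 + (n_obs - 1) * rho) / n_obs


def assumed_variance_if_uncorrelated(n_obs, sigma):
    """Error variance of the average when correlations are (wrongly) ignored."""
    return sigma ** 2 / n_obs


def matching_inflation(n_obs, rho):
    """Factor applied to sigma so the uncorrelated assumption matches the true variance."""
    return (1.0 + (n_obs - 1) * rho) ** 0.5
```

For sigma = 1 and rho = 0.5, doubling the number of observations from 10 to 20 lowers the true variance only from 0.55 to 0.525, while the uncorrelated assumption would claim a drop from 0.1 to 0.05. In this fully symmetric toy case a single inflation factor reproduces the true variance exactly; that need not hold when the correlations have structure, for example between channels.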

In conclusion, impact assessments like the ones described here support decision-making but do not make decisions. An impact assessment based on the OSSE method focuses on the value of specific observing systems as they relate to specific performance metrics of specific (necessarily current) mission applications. The outcome of this type of assessment and its applicability are critically dependent on precisely how the experiment was framed. Accordingly, while impact assessments are important inputs to decision-making, results delivered alone and out of context can lead to poorly informed decisions. Therefore, there is also a need to assess the inherent geophysical information value. Since current DA systems may not take full advantage of the potential of current and proposed sensors, underutilized sensors may have a greater value in a future DA system. Thus, geophysical capability assessments are needed to evaluate the potential utility of sensors outside the context of a DA system, and to point the way for how DA systems might evolve. Finally, there are also many critical investment/divestment decision factors, such as real cost, opportunity costs, partnership implications, exploitability, and sustainability, that are not addressed by assessment experiments. In the case of EON-MW, the latest decision is that NOAA is funding the formulation phase of EON-MW. The nominal launch date is 2021.