NASA Logo, National Aeronautics and Space Administration
CDAWeb

README for Binning in CDAWeb

CDAWeb now allows most quantities to be binned in time. For example, plasma data measured at 80-second resolution can be binned to 5-minute resolution when the higher cadence is not required. The chosen begin time, end time, and binning interval define a unique set of bins, and each bin is filled with all points that fall within it. The values in each bin are averaged to give the resulting value. Because disparate datasets can be binned in the same way, scatter plots and other inter-comparisons become possible that often would not be possible at native resolution. The standard deviation of the values in each bin is also returned if the bin contains more than three points. If error bars are provided for the input data, the output "standard deviation" is set to the input error whenever that error is larger than the standard deviation.
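The binning rules above can be sketched in a few lines of Python. This is only an illustration, not the CDAWeb code: the function name bin_average and its arguments are hypothetical, bins are assumed to be half-open intervals [start, stop), the sample (n-1) standard deviation is assumed, and the "input error" for a bin is assumed to be the mean of the input error bars in that bin, since the README does not specify these details.

```python
import math
from statistics import mean, stdev


def bin_average(times, values, start, stop, width, errors=None):
    """Average values into fixed-width time bins (hypothetical helper)."""
    n_bins = math.ceil((stop - start) / width)
    buckets = [[] for _ in range(n_bins)]
    err_buckets = [[] for _ in range(n_bins)]

    # Drop each point into the bin that contains its time tag.
    for i, (t, v) in enumerate(zip(times, values)):
        if not (start <= t < stop):
            continue  # assumption: points outside [start, stop) are ignored
        k = min(int((t - start) // width), n_bins - 1)
        buckets[k].append(v)
        if errors is not None:
            err_buckets[k].append(errors[i])

    centers = [start + (k + 0.5) * width for k in range(n_bins)]
    means, stds = [], []
    for vals, errs in zip(buckets, err_buckets):
        means.append(mean(vals) if vals else None)
        # Standard deviation only when the bin holds more than three points.
        sd = stdev(vals) if len(vals) > 3 else None
        # Assumption: the bin's input error is the mean of its error bars;
        # it replaces the standard deviation when it is the larger of the two.
        if sd is not None and errs:
            sd = max(sd, mean(errs))
        stds.append(sd)

    nbin = [len(vals) for vals in buckets]  # number of points per bin
    return centers, means, stds, nbin
```

For instance, six samples taken every 80 seconds and binned at 300 seconds between 0 and 600 seconds give two bins: the first holds four points and gets both a mean and a standard deviation, while the second holds two points and gets a mean only.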

New variables named <var>_nbin are created to give the number of points in each bin. For bins that contain no values, the user can choose either to use a fill value or to interpolate linearly across runs of empty bins. The latter is especially useful when the data will be used for operations such as Fourier transforms, where uniform sampling is assumed.
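The two options for empty bins can be illustrated as follows. This is a sketch under assumptions rather than the CDAWeb implementation: the function name fill_empty_bins is hypothetical, the fill value of -1.0e31 is an assumed placeholder, and empty bins at the edges of the interval (which have no neighbor on one side) are assumed to receive the fill value because they cannot be interpolated.

```python
def fill_empty_bins(centers, means, nbin, method="fill", fill_value=-1.0e31):
    """Replace empty bins (nbin == 0) by a fill value or by interpolation."""
    if method not in ("fill", "interpolate"):
        raise ValueError("method must be 'fill' or 'interpolate'")

    good = [i for i, n in enumerate(nbin) if n > 0]  # bins that hold data
    out = list(means)
    for i, n in enumerate(nbin):
        if n > 0:
            continue
        # Nearest populated bins on each side of the empty one.
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if method == "fill" or left is None or right is None:
            out[i] = fill_value  # assumption: edge gaps cannot be interpolated
        else:
            frac = (centers[i] - centers[left]) / (centers[right] - centers[left])
            out[i] = means[left] + frac * (means[right] - means[left])
    return out
```

With bin centers at 150, 450, 750, and 1050 seconds, means of 1.0 and 4.0 in the first and last bins, and two empty bins between them, the "interpolate" option fills the gap with values on the straight line between 1.0 and 4.0, while the "fill" option marks both empty bins with the fill value.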

All dependent variables that can be binned are time dependent (denoted 0th order dependence). However, many dependent variables also have higher order dependencies; for example, plasma wave spectrograms depend on at least both time and frequency. If a dependent variable has one or more higher order dependencies that vary in time, the binning code decomposes the data into unique subsets (modes) such that the higher order dependencies do not vary in time within each subset. Only the dominant subset (the one with the highest occurrence) is binned, and the dependent variable in the other subsets is set to the fill value. If the number of subsets is larger than 5, the data will not be binned. For example, some time-varying angular dependencies are decomposed into more than 400 subsets, and this binning method would likely lead to a severe decimation of the data.
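The mode decomposition described above might look like the following sketch. It is not the CDAWeb code: the function name select_dominant_mode is hypothetical, each record is assumed to carry its own copy of the higher order dependency (for example, a list of frequency channels), the fill value of -1.0e31 is an assumed placeholder, and ties between equally common modes are assumed to go to the mode seen first.

```python
from collections import Counter


def select_dominant_mode(dep_rows, data_rows, max_modes=5, fill_value=-1.0e31):
    """Keep only the records belonging to the most common dependency mode.

    dep_rows:  one higher order dependency (e.g. frequencies) per record
    data_rows: the dependent variable values for the same records
    Returns None when there are more than max_modes distinct modes,
    meaning the variable would not be binned at all.
    """
    modes = Counter(tuple(row) for row in dep_rows)
    if len(modes) > max_modes:
        return None

    # The dominant mode is the dependency that occurs most often.
    dominant, _ = modes.most_common(1)[0]

    # Records from every other mode are replaced by the fill value.
    masked = [
        list(data) if tuple(dep) == dominant else [fill_value] * len(data)
        for dep, data in zip(dep_rows, data_rows)
    ]
    return dominant, masked
```

After this step the surviving records share a single, time-independent dependency, so they can be binned in time like any other variable.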

Finally, a spike removal algorithm is included that allows outlier points to be eliminated. The algorithm is described in https://hpde.gsfc.nasa.gov/CleanAlgorithm.pdf. It is based on a running multi-sigma criterion. How "aggressive" the criterion is in eliminating "bad points" is controlled by a drop-down menu. The extreme, moderate, and mild outlier options correspond to six, five, and four sigma criteria, respectively. Most recent datasets have been despiked before being posted on the SPDF (CDAWeb), but a number of older datasets still contain outliers that this procedure eliminates effectively.
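To give a feel for what a running multi-sigma criterion does, here is a generic sketch. It is explicitly not the algorithm from the document linked above, whose details are not reproduced here: the function name despike, the window size, the use of the neighbors' mean and sample standard deviation, and the fill value of -1.0e31 are all assumptions made for illustration. Only the mapping of the three menu options to six, five, and four sigma comes from this README.

```python
from statistics import mean, stdev

# Sigma thresholds for the three menu options named in this README.
SIGMA_BY_OPTION = {"extreme": 6.0, "moderate": 5.0, "mild": 4.0}


def despike(values, option="moderate", half_window=5, fill_value=-1.0e31):
    """Flag points that lie far from the statistics of their neighbors."""
    n_sigma = SIGMA_BY_OPTION[option]
    out = list(values)
    for i, v in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        # Assumption: the tested point is left out of its own statistics,
        # so a large spike cannot inflate the sigma it is compared against.
        neighbors = [values[j] for j in range(lo, hi) if j != i]
        if len(neighbors) < 2:
            continue  # not enough points to estimate a spread
        mu = mean(neighbors)
        sigma = stdev(neighbors)
        # A zero spread gives no scale to judge against, so skip the point.
        if sigma > 0 and abs(v - mu) > n_sigma * sigma:
            out[i] = fill_value
    return out
```

In a series that hovers near 1.0 with a single sample of 50.0, only that sample is replaced by the fill value. Its neighbors survive because the spike itself inflates the spread they are compared against.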

9/20/2018
