Optical Spectra Data Quality Flags

Here we describe the data quality flags for SDSS-III/IV spectroscopic data. The data quality information is broken down on a per-plate and per-object basis. The basic idea is that unique plates or spectra known to be science quality are called "primary", so selecting quality spectra should be easy. In addition, we have separated the Legacy program in SDSS-I and -II as a special case and tracked the unique, best spectrum for each Legacy target to make it easy to define uniform samples.

Plate Quality

Each plate is assigned a quality (PLATEQUALITY) with one of three values:

good
a good science quality plate
marginal
a plate that falls short of the "good" criteria but still meets the looser thresholds given below
bad
a plate whose results should be treated with skepticism

For SDSS-I and -II, each plate has three flags set to 0 or 1:

IS_BEST
set to 1 if this plate is the best observation of a plate (whether or not it is marked as bad); 0 otherwise.
IS_PRIMARY
set to 1 if this plate is the best observation of a given plate, and the observation is not marked as "bad"; 0 otherwise. A plate can only be IS_PRIMARY if it is also IS_BEST.
IS_TILE
set to 1 if this plate is the best Legacy plate for covering its location; 0 otherwise. Select all IS_TILE plates to get just the Legacy survey. A plate can only be IS_TILE if it is also IS_PRIMARY.
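The relationship between the three flags can be sketched in a few lines of Python. This is only an illustration: it assumes each plate observation has been read into a dictionary keyed by the field names above, and the plate numbers, MJDs, and helper names are made up rather than taken from the catalog.

```python
# Hypothetical plate records: the dict layout and all values are illustrative only.
plates = [
    {"PLATE": 1, "MJD": 50001, "PLATEQUALITY": "good", "IS_BEST": 1, "IS_PRIMARY": 1, "IS_TILE": 1},
    {"PLATE": 1, "MJD": 50002, "PLATEQUALITY": "good", "IS_BEST": 0, "IS_PRIMARY": 0, "IS_TILE": 0},
    {"PLATE": 2, "MJD": 50003, "PLATEQUALITY": "bad", "IS_BEST": 1, "IS_PRIMARY": 0, "IS_TILE": 0},
    {"PLATE": 3, "MJD": 50004, "PLATEQUALITY": "marginal", "IS_BEST": 1, "IS_PRIMARY": 1, "IS_TILE": 0},
]


def expected_is_primary(plate):
    """IS_PRIMARY as described above: best observation and not marked 'bad'."""
    return 1 if plate["IS_BEST"] == 1 and plate["PLATEQUALITY"] != "bad" else 0


def flags_consistent(plate):
    """Check the stated hierarchy: IS_TILE implies IS_PRIMARY implies IS_BEST."""
    if plate["IS_TILE"] == 1 and plate["IS_PRIMARY"] != 1:
        return False
    if plate["IS_PRIMARY"] == 1 and plate["IS_BEST"] != 1:
        return False
    return True


# Science-quality plates, and the subset that makes up the Legacy survey.
primary_plates = [p for p in plates if p["IS_PRIMARY"] == 1]
legacy_plates = [p for p in plates if p["IS_TILE"] == 1]
```

Because of the hierarchy, filtering on IS_TILE alone is enough to obtain Legacy plates that are also primary.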

Selecting plates which are not "bad" will yield a good sample of spectra. Many of the "bad" plates actually contain useful data (in particular, many highly certain redshifts). However, bad plates should be treated with care (in particular, they may have bad spectrophotometry or residual sky subtraction problems).

The PLATEQUALITY string is set independently for each observation (labeled by its MJD) of each plate. For DR8 plates the definition varies depending on whether the plate is an SDSS plate (that is, has survey set to 'sdss'), a SEGUE-1 plate (that is, has survey set to 'segue1'), or a SEGUE-2 plate (that is, has survey set to 'segue2').

For SDSS-III and -IV, the conditions are based on the signal-to-noise and the fraction of bad pixels. The thresholds changed in mid-2018:

Before 2018-05-23 (MJD 58029): SN2_B > 10 AND SN2_R > 22 AND FBADPIX < 0.10 -> 'good'
After  2018-05-23 (MJD 58029): SN2_B >  8 AND SN2_R > 18 AND FBADPIX < 0.10 -> 'good'
otherwise -> 'bad'
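As a rough sketch, the two threshold sets above could be applied as follows. The function name and the `after_change` argument are hypothetical; the caller is assumed to decide which regime an observation falls in from its MJD, so no changeover date is hard-coded here.

```python
def plate_quality_sdss34(sn2_b, sn2_r, fbadpix, after_change):
    """Classify an SDSS-III/IV plate observation as 'good' or 'bad'.

    after_change: True to use the later (looser) thresholds,
    False to use the earlier ones. This flag is an assumption of
    the sketch, not a field in the files.
    """
    min_b, min_r = (8, 18) if after_change else (10, 22)
    if sn2_b > min_b and sn2_r > min_r and fbadpix < 0.10:
        return "good"
    return "bad"
```

For example, an observation with SN2_B = 9, SN2_R = 20 and FBADPIX = 0.05 passes only under the later thresholds.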

For SDSS plates, the conditions are based on the signal-to-noise and the fraction of bad pixels:

PLATESN2 > 15 AND FBADPIX < 0.05 -> 'good'
PLATESN2 > 9  AND FBADPIX < 0.13 -> 'marginal' (if not 'good')
otherwise -> 'bad'
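The three-way rule for SDSS plates can be written as a short cascade. This is a sketch of the automatic criteria only; the function name is hypothetical, and it does not account for plates flagged by hand.

```python
def plate_quality_sdss(platesn2, fbadpix):
    """Classify an SDSS plate observation from PLATESN2 and FBADPIX."""
    if platesn2 > 15 and fbadpix < 0.05:
        return "good"
    # Only reached when the plate is not 'good'.
    if platesn2 > 9 and fbadpix < 0.13:
        return "marginal"
    return "bad"
```

Checking 'good' first implements the "(if not 'good')" qualifier, since every plate that passes the 'good' test would also pass the 'marginal' one.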

For SEGUE-1 plates, the conditions are based on the signal-to-noise of the main sequence turnoff at g = 18 (stored as SNTURNOFF), except for some special plates:

for faint plates SN of turnoff @ g=18 > 16 -> 'good'
for bright plates SN of turnoff @ g=18 > 7.5 -> 'good'
for low-latitude or test plates, set by hand

For SEGUE-2 plates, the conditions are also based on the signal-to-noise of the main sequence turnoff at g = 18:

median(SN for MS-turnoff @ g=18) > 10 -> 'good'
otherwise -> 'bad'

Finally, for many plates we have simply identified the data as bad by hand, and flagged them as such. The conditions used are noted in the QUALITY_COMMENTS fields in the files.

Spectrum Quality

Quality information is also available on a per-object basis. In particular, to select a unique set of targets, one wants to select on one of the following fields:

SPECPRIMARY
set to 1 if this is the best observation of a particular position on the sky; 0 otherwise.
SPECLEGACY
set to 1 if this is the best observation of a particular position on the sky from a Legacy plate; 0 otherwise.

To be "primary", the catalog entry has to be observed on a "primary" observation of a plate as defined above (the best observation of a plate, and not bad quality). The same object can be observed on different but neighboring plates; such duplicates are removed with the following preferences, in decreasing order of importance:

  1. Prefer observations with positive SN_MEDIAN
  2. Prefer observations with ZWARNING_NOQSO=0
  3. Prefer observations with larger SN_MEDIAN
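One way to read this ordering is as a lexicographic comparison: later criteria only break ties left by earlier ones. The sketch below assumes that reading, and assumes each duplicate observation is a dictionary holding SN_MEDIAN and ZWARNING_NOQSO; the function names and sample values are illustrative.

```python
def preference_key(obs):
    """Sort key for the three preferences, most important first."""
    sn = obs["SN_MEDIAN"]
    # Tuples compare element by element, and True sorts above False.
    return (sn > 0, obs["ZWARNING_NOQSO"] == 0, sn)


def best_observation(observations):
    """Return the preferred observation among duplicates of one object."""
    return max(observations, key=preference_key)


# Made-up duplicates of a single object on three plates.
duplicates = [
    {"PLATE": 1, "SN_MEDIAN": 12.0, "ZWARNING_NOQSO": 4},
    {"PLATE": 2, "SN_MEDIAN": 5.0, "ZWARNING_NOQSO": 0},
    {"PLATE": 3, "SN_MEDIAN": -1.0, "ZWARNING_NOQSO": 0},
]
best = best_observation(duplicates)
```

Here the plate 2 observation is chosen: it has positive SN_MEDIAN and a clean ZWARNING_NOQSO, which outranks the higher signal-to-noise of the plate 1 observation.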

The same criteria apply to be a Legacy spectrum, except the catalog entry must be on a Legacy plate (not just a primary plate).

As implied above, there are some general pieces of information stored in the catalog about the quality of the spectrum:

SN_MEDIAN
a "median" signal-to-noise per resolution element from the four spectrographs
ZWARNING_NOQSO
a bitmask flagging anything unusual about the spectrum; sometimes these are benign, sometimes they indicate errors.

Pixel Masks

Quality information also exists on a per-pixel basis. There are uncertainty values associated with the flux in each pixel, and there are also bitmasks recording information about the quality of each pixel.

HDUs 1 to 3 of the spPlate files store the error and mask information. HDU1 stores the "inverse variance" of the uncertainties (one over sigma-squared, that is). This quantity may be used, for example, in model fits to the spectra. It is set to zero for pixels that should be ignored entirely (another way of thinking about it is that they have infinite error).
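To illustrate how the inverse variance acts as a weight, here is a minimal sketch of an inverse-variance weighted mean over a few pixels. It assumes the flux and inverse variance values have already been read into plain Python lists; the function name and the numbers are made up.

```python
def ivar_weighted_mean(flux, ivar):
    """Inverse-variance weighted mean of flux values.

    Pixels with ivar == 0 carry zero weight, so they drop out of the
    sums without any explicit masking. Returns (mean, ivar_of_mean),
    or (None, 0.0) when no pixel has non-zero weight.
    """
    weight_sum = sum(ivar)
    if weight_sum == 0:
        return None, 0.0
    mean = sum(f * w for f, w in zip(flux, ivar)) / weight_sum
    return mean, weight_sum


# The third pixel is flagged to be ignored (ivar = 0), so its
# outlying flux has no effect on the result.
mean, mean_ivar = ivar_weighted_mean([1.0, 3.0, 100.0], [1.0, 1.0, 0.0])
```

The same weights would appear in a chi-squared model fit, where each squared residual is multiplied by the pixel's inverse variance.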

HDU2 and HDU3 store the pixel mask information. These images yield a bitmask for each pixel, in particular the SPPIXMASK bitmask. Since the final spectrum is a combination of individual exposures, it may be that some bits were flagged in some exposures but not in others. HDU2 is the "and mask", which lists all the bits that were set for that pixel in all exposures. HDU3 is the "or mask", which lists all the bits that were set for that pixel in any one (but not necessarily all) of the exposures. The "and mask" (HDU2) is the mask of greatest use.
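The difference between the two masks can be shown with a small sketch that combines per-exposure bitmasks pixel by pixel. The bit names and values below are hypothetical stand-ins, not actual SPPIXMASK bits, and the list-of-lists layout is an assumption made for illustration.

```python
from functools import reduce
from operator import and_, or_

# Hypothetical flag bits, for illustration only.
BIT_A = 1 << 0
BIT_B = 1 << 1


def combine_masks(exposure_masks):
    """Combine per-exposure pixel masks into an 'and mask' and an 'or mask'.

    exposure_masks: one list of integer bitmasks per exposure,
    all of the same length (one entry per pixel).
    """
    per_pixel = list(zip(*exposure_masks))
    and_mask = [reduce(and_, bits) for bits in per_pixel]  # set in all exposures
    or_mask = [reduce(or_, bits) for bits in per_pixel]    # set in any exposure
    return and_mask, or_mask


# Three exposures of a three-pixel spectrum.
exposures = [
    [BIT_A, BIT_A | BIT_B, 0],
    [BIT_A, BIT_B, 0],
    [BIT_A, BIT_B, BIT_B],
]
and_mask, or_mask = combine_masks(exposures)
```

In this example the last pixel has BIT_B set in only one exposure, so the bit appears in the "or mask" but not in the "and mask".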