APOGEE Visit Spectra Reduction

The first stage of the APOGEE data reduction pipeline (apred) reduces the raw spectra from the consecutive, spectrally dithered exposures of one visit (a particular plate observed on a given night) and extracts the individual spectrum of each object on the plate. The reduction includes dark subtraction, flat-fielding, wavelength and flux calibration, an attempt to remove sky emission and absorption within the Earth’s atmosphere, and the combination of the individual spectrally dithered exposures into a single spectrum for each object. The pipeline also makes an initial estimate of the radial velocity of each object. The visit reduction steps are described in more detail below. See Nidever et al. (2015) for further technical details.

Spectroscopic Observing Procedures

The APOGEE visit data reduction steps are easier to follow in the context of how the data are gathered, from plate plugging through the raw data collection.

Plate Plugging

When the observatory is ready to observe a plate, the observatory staff plug optical fibers into the holes drilled into the plates, and map which fibers correspond to which holes (and therefore which objects) by shining light through each fiber from the slit ends. These mapping data are incorporated into one of the Header and Data Units (HDUs) of the apPlate file described below.

Raw Data Collection

Observers mount cartridges containing the plugged plates on the telescope. A series of 500 s exposures is taken for each plate. For most APOGEE plates, 8 exposures are obtained on a given night, although this number can vary with the available time, observing conditions, etc.

Because of the resolution of the APOGEE spectra relative to the pixel size of the APOGEE detectors, the spectrum from a single exposure slightly undersamples the resolution element at the short-wavelength end. To avoid the challenges of working with undersampled data, APOGEE spectra are taken in pairs, with the detectors shifted slightly (by one half of a detector pixel) between the two exposures of the pair; we refer to this shifting as dithering, and each “dither pair” of exposures includes observations at these two offset dither positions. A standard 8-exposure APOGEE observing sequence consists of exposures at the two different dither positions (A and B), typically taken in the pattern ABBA ABBA. The data reduction requires exposures in dither pairs, so any unpaired exposure is discarded.
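To make the pairing rule concrete, here is a minimal Python sketch that groups an exposure sequence into A/B dither pairs and reports any exposure left without a partner. The function name and the rule that a pair is two consecutive exposures at different dither positions are assumptions for illustration, not the pipeline's actual logic.

```python
# Hypothetical sketch: group an exposure sequence into A/B dither pairs.
# Assumption: a pair is two consecutive exposures at different dither positions.

def make_dither_pairs(positions):
    """Return (pairs, unpaired) as lists of exposure indices."""
    pairs = []
    unpaired = []
    i = 0
    while i < len(positions):
        if i + 1 < len(positions) and positions[i] != positions[i + 1]:
            pairs.append((i, i + 1))
            i += 2
        else:
            unpaired.append(i)  # no partner at the other dither position
            i += 1
    return pairs, unpaired


pairs, unpaired = make_dither_pairs(list("ABBAABBA"))
print(pairs)     # [(0, 1), (2, 3), (4, 5), (6, 7)]
print(unpaired)  # []
```

With a truncated sequence such as ABBAA, the final A exposure has no partner and would be discarded.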

APOGEE’s infrared detectors can be read “non-destructively”, so that the amount of charge in each pixel can be measured without affecting that charge. This permits the levels on the detectors to be measured during the exposure. While readout noise can be significant for a single read of the detectors, the ability to read them multiple times during an exposure reduces the net readout noise in the final exposure. For APOGEE, the detectors are read in an “up-the-ramp” mode in which they are read every 10.7 seconds. Thus, a single 500-second exposure generally consists of 47 readouts (500 s / 10.7 s ≈ 47). Because of the multiple readouts, the raw APOGEE data for an exposure form a “data cube”, with two of the dimensions representing the location on the detectors and the third representing the time sequence.
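The benefit of many non-destructive reads can be quantified. Assuming uncorrelated read noise σ per read, no photon noise, and an unweighted least-squares line fit to N equally spaced reads, the read-noise contribution to the total counts is σ·sqrt(12(N - 1)/(N(N + 1))), which reduces to sqrt(2)·σ for a simple two-read difference. The sketch below (hypothetical helper name) evaluates this for the APOGEE read cadence.

```python
import math

def ramp_read_noise(n_reads, sigma_read):
    """Read-noise-only uncertainty on the total counts from an unweighted
    least-squares slope fit to n_reads equally spaced reads, multiplied by the
    ramp duration. Ignores photon noise and correlated read noise."""
    n = n_reads
    return sigma_read * math.sqrt(12.0 * (n - 1) / (n * (n + 1)))


n_reads = round(500 / 10.7)  # reads in a 500 s exposure at 10.7 s per read
print(n_reads)                      # 47
print(ramp_read_noise(2, 1.0))      # sqrt(2), i.e. a last-minus-first difference
print(ramp_read_noise(n_reads, 1.0))  # roughly 0.49
```

Under these idealized assumptions, 47 reads cut the read-noise contribution by nearly a factor of three relative to a two-read difference.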

APOGEE Visit Data Reduction

The apred sequence consists of three sequential steps:

1. ap3d: Extract 2-dimensional images from the 3-dimensional raw data cubes and apply the basic calibration steps of dark subtraction and flat fielding.
2. ap2d: Extract and calibrate 1-dimensional spectra from the 2-dimensional images and attach a wavelength calibration.
3. ap1dvisit: From the 1-dimensional spectra, measure the dither shifts between the individual exposures, subtract sky from each fiber, correct for telluric absorption in each fiber, combine the dithered exposures into a single well-sampled visit spectrum, perform flux calibration, and derive an initial radial velocity estimate for the spectrum.

ap3d

For each readout of each exposure, the raw data are first corrected for bias variations in the IR detectors and electronics. This is accomplished by using a reference array of pixels that are generated by the readout electronics, as well as a set of reference pixels around the edge of each detector.
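A minimal sketch of the reference-pixel part of this correction, assuming a border of 4 reference columns on each side of the detector (typical of HAWAII-2RG-type detectors) and a per-row median of those columns as the bias estimate. The separate reference array produced by the readout electronics is not modelled here, and the function name is hypothetical.

```python
import numpy as np

def subtract_reference_bias(read, border=4):
    """Subtract a per-row bias level from one 2D read.

    The bias of each row is estimated as the median of the reference columns
    at the left and right edges; `border` = 4 is an assumption."""
    ref = np.concatenate([read[:, :border], read[:, -border:]], axis=1)
    row_bias = np.median(ref, axis=1, keepdims=True)  # shape (ny, 1)
    return read - row_bias


# Toy read: each row has its own bias offset; interior pixels also carry signal.
offsets = np.arange(8.0).reshape(8, 1) * 10.0
read = np.zeros((8, 16)) + offsets
read[:, 4:-4] += 5.0
corrected = subtract_reference_bias(read)
```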

Each individual readout is then corrected for dark current by subtracting a calibration dark frame made from a combination of multiple individual dark exposures.
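A sketch of building a master dark and subtracting it read by read. It assumes the dark cubes have the same number of reads as the science cube, and it uses a median to combine them; the text above says only “a combination”, so the median is an assumption. Function names are hypothetical.

```python
import numpy as np

def make_master_dark(dark_cubes):
    """Median-combine dark cubes of shape (n_reads, ny, nx); the median
    rejects occasional outliers such as cosmic-ray hits in one dark."""
    return np.median(np.stack(dark_cubes), axis=0)

def subtract_dark(raw_cube, dark_cube):
    """Subtract the master dark from every read of a raw cube."""
    if raw_cube.shape != dark_cube.shape:
        raise ValueError("raw and dark cubes must have the same shape")
    return raw_cube - dark_cube
```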

The data are then collapsed from the 3D data cubes into 2D images on a pixel-by-pixel basis. At the most basic level, a linear function is fit to the series of up-the-ramp readouts for each pixel to determine the best-fitting slope. A linear fit is used for all exposures, even if observing conditions vary during the exposure. The best-fitting slope, multiplied by the total exposure time, is taken as the flux at that pixel for the exposure.
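The per-pixel linear fit can be vectorized over the whole cube. This sketch (hypothetical function name) computes the ordinary least-squares slope for every pixel of a cube shaped (n_reads, ny, nx) and multiplies it by the exposure time, as described above; it ignores read weighting and any non-linearity correction.

```python
import numpy as np

def ramp_to_flux(cube, dt, exptime):
    """Collapse a (n_reads, ny, nx) cube to a 2D image of counts.

    slope = sum((t - tbar) * (y - ybar)) / sum((t - tbar)**2), per pixel,
    then flux = slope * exptime."""
    n_reads = cube.shape[0]
    t = np.arange(n_reads) * dt
    t_centered = t - t.mean()
    y_centered = cube - cube.mean(axis=0)
    # Contract over the read axis to get one slope per pixel, shape (ny, nx).
    slope = np.tensordot(t_centered, y_centered, axes=(0, 0)) / np.sum(t_centered ** 2)
    return slope * exptime


# Toy cube: 47 reads, 10.7 s apart, with a different count rate per pixel.
t = np.arange(47) * 10.7
rate = np.array([[1.0, 2.0], [0.5, 3.0]])
cube = t[:, None, None] * rate[None, :, :] + 100.0  # constant offset from bias/reset
flux = ramp_to_flux(cube, dt=10.7, exptime=500.0)
```

Because only the slope is used, any constant offset in the reads drops out of the result.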

The up-the-ramp sampling allows cosmic-ray events to be recognized during the course of the exposure, because they appear as sudden jumps in the accumulated charge between successive reads. The ap3d software attempts to recognize these events and flags the affected pixels.
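One simple way to recognize such jumps, sketched below, is to compare each read-to-read difference in a pixel's ramp with the typical difference for that pixel. The median-plus-10-sigma threshold, with sigma from a scaled median absolute deviation, is an assumption chosen for illustration, not the criterion used by ap3d.

```python
import numpy as np

def find_jumps(ramp, nsigma=10.0):
    """Indices of read-to-read differences that jump far above the typical
    accumulation per read for this pixel (index i means between reads i and i+1)."""
    diffs = np.diff(ramp)
    med = np.median(diffs)
    # Scaled median absolute deviation: a robust estimate of per-read scatter.
    sigma = 1.4826 * np.median(np.abs(diffs - med))
    sigma = max(sigma, np.finfo(float).eps)  # guard against noiseless ramps
    return np.flatnonzero(diffs - med > nsigma * sigma)


# Toy ramp: 47 reads, about 5 counts per read plus noise, and a hit after read 20.
rng = np.random.default_rng(0)
ramp = np.concatenate([[0.0], np.cumsum(5.0 + rng.normal(0.0, 1.0, 46))])
ramp[21:] += 200.0
jumps = find_jumps(ramp)
```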

The 2D images are then corrected for variations in pixel-to-pixel response by dividing them by a calibration flat field, which is constructed from an average of multiple exposure frames illuminated by a flat light source within the spectrograph.
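A minimal sketch of flat-field construction and application. It assumes the averaged flat is normalized to a median of 1 and that pixels with a non-positive flat value are masked as NaN; both choices, and the function names, are hypothetical.

```python
import numpy as np

def make_flat(flat_frames):
    """Average several illuminated flat frames and normalize to a median of 1."""
    flat = np.mean(np.stack(flat_frames), axis=0)
    return flat / np.median(flat)

def apply_flat(image, flat):
    """Divide out the pixel-to-pixel response; unusable pixels become NaN."""
    out = np.full(image.shape, np.nan)
    good = flat > 0
    out[good] = image[good] / flat[good]
    return out


response = np.array([[0.9, 1.1], [1.0, 1.0]])   # toy pixel response
flat = make_flat([response * 1000.0, response * 1010.0])
calibrated = apply_flat(response * 50.0, flat)  # a uniform 50-count scene
```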

ap2d

The ap2d routine takes the calibrated 2D images and extracts the individual 1D spectra from each exposure. This is accomplished by modelling the cross-dispersion distribution of the light from each fiber as a function of wavelength. The flux from all 300 fibers is fit simultaneously to account for the contribution of the wings of each fiber's light distribution to the two adjacent spectra. The profiles for each fiber are derived from a calibration frame taken through the telescope immediately after the exposure sequence on each plate. The shape and magnitude of the light that each fiber's wings contribute to the adjacent fibers are estimated using a library of calibration observations in which only every sixth fiber is illuminated.
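The simultaneous fit can be illustrated for a single detector column. In this sketch each fiber's cross-dispersion profile is a unit-area Gaussian with known center and width, which is an assumption; the pipeline uses empirically derived profiles. Solving for all fibers at once with linear least squares apportions the overlapping wings between neighbouring fibers.

```python
import numpy as np

def extract_column(column, centers, sigma):
    """Fit the fluxes of all fibers in one detector column simultaneously.

    Each fiber profile is a unit-area Gaussian (assumption), so the fitted
    coefficient of each profile is that fiber's total flux in this column."""
    y = np.arange(column.size)
    centers = np.asarray(centers, dtype=float)
    # Design matrix: one profile per fiber, shape (n_pixels, n_fibers).
    profiles = np.exp(-0.5 * ((y[:, None] - centers[None, :]) / sigma) ** 2)
    profiles /= profiles.sum(axis=0, keepdims=True)
    fluxes, *_ = np.linalg.lstsq(profiles, column, rcond=None)
    return fluxes
```

Because neighbouring profiles overlap, summing pixels in a fixed aperture would leak light between fibers; the joint fit avoids that as long as the profile model is accurate.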

After the 1D spectra are extracted, a wavelength calibration is applied, as determined from observations of arc calibration lamps. Because the APOGEE spectrograph is in a gravitationally fixed orientation and is kept at a stable vacuum and temperature, the form of the wavelength solution is very stable, and a single wavelength calibration is adopted to determine the non-linear terms in the conversion between pixel location and wavelength. Note that the wavelength scale of each fiber is slightly different because of the fibers' different locations in the pseudo-slit.
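Fitting the pixel-to-wavelength relation from identified arc lines can be sketched with a low-order polynomial, as below. The cubic degree, the synthetic “true” solution, and the arc-line pixel positions are hypothetical values for illustration only.

```python
import numpy as np

def fit_wavelength_solution(pixels, wavelengths, degree=3):
    """Least-squares polynomial mapping pixel position to wavelength."""
    return np.polynomial.Polynomial.fit(pixels, wavelengths, degree)


# Hypothetical "true" dispersion relation and arc-line positions (pixels).
true = np.polynomial.Polynomial([15100.0, 0.28, -2.0e-6])
arc_pixels = np.linspace(50.0, 2000.0, 25)
sol = fit_wavelength_solution(arc_pixels, true(arc_pixels))
wave_at_center = sol(1024.0)  # wavelength (Angstroms) at pixel 1024
```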

The wavelength calibration of the APOGEE data is done using vacuum wavelengths. However, the wavelengths of atomic transitions are usually quoted in air at standard temperature and pressure (S.T.P.); this is how the CRC Handbook of Chemistry and Physics lists them for transitions redward of 2000 Ångstroms. Thus, recognizing spectral lines associated with specific atomic transitions may require converting the APOGEE wavelengths to the equivalent air values at S.T.P. For APOGEE data, we have used the conversion from Ciddor (Applied Optics, Vol. 35, p. 1566, 1996) to convert between vacuum and air wavelengths. To convert a vacuum wavelength (VAC) in Ångstroms to an air wavelength (AIR), use:

$$\mathrm{AIR} = \frac{\mathrm{VAC}}{1 + \dfrac{5.792105\times10^{-2}}{238.0185 - (10^{4}/\mathrm{VAC})^{2}} + \dfrac{1.67917\times10^{-3}}{57.362 - (10^{4}/\mathrm{VAC})^{2}}}$$
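The formula translates directly into code. The sketch below implements it for wavelengths in Ångstroms and adds an inverse (air to vacuum) by fixed-point iteration; the inverse is an addition of this sketch, not part of the Ciddor conversion as quoted.

```python
def vac_to_air(vac):
    """Vacuum to air wavelength (Angstroms) using the formula above.
    Works on floats or NumPy arrays."""
    sigma2 = (1.0e4 / vac) ** 2  # squared wavenumber in inverse microns squared
    n = 1.0 + 5.792105e-2 / (238.0185 - sigma2) + 1.67917e-3 / (57.362 - sigma2)
    return vac / n

def air_to_vac(air, n_iter=5):
    """Invert vac_to_air by iterating vac = air * n(vac). Converges quickly
    because the refractive index varies slowly with wavelength."""
    vac = air
    for _ in range(n_iter):
        vac = air * (vac / vac_to_air(vac))  # vac / vac_to_air(vac) is n(vac)
    return vac


print(round(vac_to_air(15000.0), 2))  # 14995.9
```

In the APOGEE H band the air wavelength is shorter than the vacuum value by roughly 4 Ångstroms.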

There are small linear shifts in the wavelength scale between different exposures, which result from (i) the intentional dithering of the detectors between exposures to allow for well-sampled combined images, and (ii) a small, slowly varying flexure in the instrument optical bench as the liquid nitrogen tank depletes over time (a larger “reset” shift occurs when this tank is filled, but this is always done during the daytime). The linear shifts are measured using prominent night sky emission lines that appear in every spectrum, and these shifts are applied to the wavelength solution.
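A minimal sketch of measuring such a shift from sky lines. It assumes the expected pixel positions of a few isolated OH lines are known, uses a flux-weighted centroid in a small window around each, and takes the median offset as the shift. The window size and helper names are hypothetical, and lines are assumed to lie well away from the array edges.

```python
import numpy as np

def line_centroid(flux, guess, half_width=4):
    """Flux-weighted centroid of an emission line within +/- half_width
    pixels of the (rounded) guess position."""
    lo = int(round(guess)) - half_width
    hi = int(round(guess)) + half_width + 1
    x = np.arange(lo, hi)
    f = flux[lo:hi]
    return np.sum(x * f) / np.sum(f)

def measure_shift(flux, reference_pixels):
    """Median offset (pixels) of sky lines from their reference positions."""
    offsets = [line_centroid(flux, p) - p for p in reference_pixels]
    return float(np.median(offsets))


# Toy sky spectrum: three lines displaced by +0.3 pixel from their reference positions.
x = np.arange(500)
reference = [100, 220, 350]
sky = sum(np.exp(-0.5 * (x - (p + 0.3)) ** 2) for p in reference)
shift = measure_shift(sky, reference)
```

A measured shift `dx` can be folded into a reference wavelength solution by evaluating it at `pixel - dx`, since a feature that lands at `p + dx` in this exposure has the reference wavelength of pixel `p`.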

ap1dvisit

The first stage in ap1dvisit determines, to high accuracy, the relative shifts between the exposures of a visit that result from dithering the detectors. Cross-correlating the exposures with one another measures these shifts more accurately than the sky lines determine the wavelength zeropoint.
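A sketch of measuring the relative shift between two exposures by cross-correlation, with the peak refined to sub-pixel precision by a parabola through the three highest points. The search range and parabolic refinement are assumptions of this illustration, and because `np.roll` wraps around, features near the array ends should be avoided or masked.

```python
import numpy as np

def xcorr_shift(ref, spec, max_lag=5):
    """Shift (pixels) of spec relative to ref; positive means spec is
    displaced toward higher pixel numbers."""
    ref = ref - ref.mean()
    spec = spec - spec.mean()
    lags = np.arange(-max_lag, max_lag + 1)
    # np.roll(spec, -lag)[x] == spec[x + lag], so the peak is at lag == shift.
    cc = np.array([np.sum(ref * np.roll(spec, -lag)) for lag in lags])
    i = int(np.argmax(cc))
    if i == 0 or i == len(lags) - 1:
        return float(lags[i])  # peak at edge of the search window; no refinement
    y0, y1, y2 = cc[i - 1], cc[i], cc[i + 1]
    # Vertex of the parabola through (-1, y0), (0, y1), (+1, y2).
    return float(lags[i] + 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2))


x = np.arange(400)
def lines(offset):
    return (np.exp(-0.5 * ((x - 150 - offset) / 3.0) ** 2)
            + np.exp(-0.5 * ((x - 260 - offset) / 3.0) ** 2))

shift = xcorr_shift(lines(0.0), lines(0.4))
```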

Each fiber of each exposure is then corrected for the contribution of night-sky emission. The IR portion of the spectrum includes large numbers of very bright OH emission lines. There can also be some continuum sky contribution, especially when there is significant moonlight (and even more so when thin clouds are present). Sky subtraction uses 35 sky fibers distributed across each plug plate; multiple fibers are used because the IR sky can be spatially variable. For each object, the sky is estimated from the four nearest sky fibers. However, because the wavelength scale is not identical for each fiber, the sky spectra need to be shifted slightly before they can be subtracted. Also, because the line profiles differ slightly from fiber to fiber, the subtraction of the bright night-sky lines is imperfect, and the small regions surrounding each of these lines are essentially rendered useless for science. This is an area for potential improvement in the pipeline, but we note that even with perfect sky modelling, the signal-to-noise under bright sky lines would be significantly degraded compared with the surrounding spectrum.
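The sky estimate for one object can be sketched as follows, assuming plate coordinates for every fiber are available. Here the slight difference between fiber wavelength scales is handled by linearly interpolating each sky spectrum onto the object's wavelength grid, and the four sky spectra are averaged with equal weights; both choices, and the function names, are assumptions for illustration.

```python
import numpy as np

def nearest_sky_fibers(obj_xy, sky_xy, k=4):
    """Indices of the k sky fibers closest to the object on the plate."""
    d = np.hypot(sky_xy[:, 0] - obj_xy[0], sky_xy[:, 1] - obj_xy[1])
    return np.argsort(d)[:k]

def subtract_sky(obj_wave, obj_flux, sky_waves, sky_fluxes, obj_xy, sky_xy, k=4):
    """Interpolate the k nearest sky spectra onto the object's wavelength
    grid, average them with equal weights, and subtract the result."""
    idx = nearest_sky_fibers(obj_xy, sky_xy, k)
    sky = np.mean([np.interp(obj_wave, sky_waves[i], sky_fluxes[i]) for i in idx], axis=0)
    return obj_flux - sky


# Toy plate: four nearby sky fibers at 10 counts and one distant, much brighter one.
sky_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [50.0, 50.0]])
obj_wave = np.linspace(15100.0, 16900.0, 100)
sky_waves = [obj_wave + 0.1 * i for i in range(5)]  # slightly different scales
sky_fluxes = [np.full(100, 10.0)] * 4 + [np.full(100, 1000.0)]
residual = subtract_sky(obj_wave, np.full(100, 13.0), sky_waves, sky_fluxes,
                        (0.5, 0.5), sky_xy)
```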

The Earth’s atmosphere also produces significant absorption in the observed spectra, arising from CO2, H2O, and CH4 bands in the APOGEE spectral window. A correction for this telluric absorption is derived from observations of 35 “telluric” standards spread across the plate. These stars are chosen by their intrinsic color, with the goal of targeting hot stars that have relatively few spectral features in the APOGEE wavelength region. Multiple telluric stars are used because the absorption can vary across the field of view. For each telluric standard, the amplitude of the absorption in each of the separate families of CO2, H2O, and CH4 bands is estimated by fitting model absorption spectra to the observed spectrum. A surface is fit to these scaling factors across the plate, and this surface is used to predict the appropriate scale factors for each individual fiber. These scaling factors, along with model telluric spectra convolved with the fiber-specific line spread function, are used to correct each individual spectrum. This method seems to work reasonably well in many cases, but the telluric correction remains imperfect in some.
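A minimal sketch of the surface step for one molecular family, assuming a first-order (planar) surface in plate coordinates, followed by applying a scaled model. Scaling the model transmission T as T**s (that is, scaling its optical depth) is an assumption of this sketch; the text above does not specify how the scale factors enter. Function names are hypothetical.

```python
import numpy as np

def fit_scale_surface(x, y, scales):
    """Least-squares plane s(x, y) = a + b*x + c*y through the scale factors
    measured for the telluric standards (a planar surface is an assumption)."""
    A = np.column_stack([np.ones_like(x), x, y])
    coeffs, *_ = np.linalg.lstsq(A, scales, rcond=None)
    return coeffs

def eval_surface(coeffs, x, y):
    """Predicted scale factor at plate position (x, y)."""
    return coeffs[0] + coeffs[1] * x + coeffs[2] * y

def correct_telluric(flux, model_transmission, scale):
    """Divide by the model transmission with its optical depth scaled by
    `scale` (the T**scale form is an assumption)."""
    return flux / model_transmission ** scale


# Toy telluric standards at six plate positions.
x = np.array([-5.0, 0.0, 5.0, -3.0, 4.0, 1.0])
y = np.array([2.0, -4.0, 1.0, 5.0, -2.0, 0.0])
coeffs = fit_scale_surface(x, y, 1.0 + 0.01 * x - 0.02 * y)
scale_for_science_fiber = eval_surface(coeffs, 3.0, 4.0)
```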

After sky and telluric correction, the two spectra of each dither pair are combined to produce well-sampled spectra. All of the pairs are then combined to produce a single visit spectrum for each object.
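If the two exposures of a pair were offset by exactly half a pixel, the simplest combination would interleave their samples, as sketched below. This is an idealization: measured dither shifts are never exactly half a pixel, and the pipeline's actual combination may interpolate rather than interleave. The function names and the plain mean over pairs are hypothetical.

```python
import numpy as np

def interleave_dither_pair(spec_a, spec_b):
    """Combine two spectra offset by half a pixel into one spectrum with
    twice the sampling, assuming spec_b is sampled half a pixel after spec_a."""
    if spec_a.shape != spec_b.shape:
        raise ValueError("dither pair spectra must have the same length")
    combined = np.empty(2 * spec_a.size, dtype=np.result_type(spec_a, spec_b))
    combined[0::2] = spec_a  # samples at integer pixel positions
    combined[1::2] = spec_b  # samples at half-pixel positions
    return combined

def combine_visit(pairs):
    """Average the interleaved spectra of all dither pairs in a visit."""
    return np.mean([interleave_dither_pair(a, b) for a, b in pairs], axis=0)


a = np.array([1.0, 3.0, 5.0])
b = np.array([2.0, 4.0, 6.0])
visit = combine_visit([(a, b), (a + 2.0, b + 2.0)])
```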

The final visit spectra are then approximately flux calibrated. The relative flux calibration uses a calibration frame that tabulates the instrument's spectral response, as determined from an observation of a blackbody source. The absolute level of each spectrum is then set by scaling to the object's catalog H-band magnitude. We note that the subsequent pipeline for the analysis of stellar parameters and abundances (ASPCAP) normalizes the spectra to a pseudo-continuum, so the flux calibration done here is not critical.
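A sketch of the two-step calibration described above. Using the median as the spectrum's “level” and the `zero_point_flux` parameter (the flux assigned to an H = 0 star, in the output units) are hypothetical choices for illustration, not the pipeline's definitions.

```python
import numpy as np

def flux_calibrate(counts, response, hmag, zero_point_flux):
    """Relative calibration by the instrument response, then an absolute
    scaling so the median flux matches the level implied by the H magnitude."""
    relative = counts / response
    target_level = zero_point_flux * 10.0 ** (-0.4 * hmag)
    return relative * (target_level / np.nanmedian(relative))


calibrated = flux_calibrate(np.array([5.0, 20.0, 60.0]),
                            np.array([0.5, 1.0, 2.0]),
                            hmag=5.0, zero_point_flux=1.0e4)
```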

Finally, an initial radial velocity (RV) estimate is made by cross-correlating each visit spectrum with a grid of synthetic spectra. The best-matching synthetic spectrum serves as a template, and the shift between the observed spectrum and this template provides an initial RV estimate. Note that this estimate is later refined using multiple visits to the same object, because these provide a higher signal-to-noise spectrum.
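A minimal sketch of template selection and velocity measurement. It assumes the spectrum and templates share a grid uniform in ln(wavelength), so a Doppler shift becomes a constant pixel lag, and normalizes each by subtracting its mean and dividing by its standard deviation. It resolves only integer lags and uses the non-relativistic relation v = c (λ_obs/λ_rest - 1); the helper name, grid spacing, and toy templates are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def best_template_rv(spec, templates, dlnlam, max_lag=50):
    """Return (index of best template, velocity in km/s) from the highest
    normalized cross-correlation peak over all templates and lags."""
    lags = np.arange(-max_lag, max_lag + 1)
    s = (spec - spec.mean()) / spec.std()
    best_cc, best_k, best_lag = -np.inf, None, 0
    for k, tmpl in enumerate(templates):
        t = (tmpl - tmpl.mean()) / tmpl.std()
        # np.roll(t, lag)[x] == t[x - lag]: the template shifted redward by lag.
        cc = np.array([np.mean(s * np.roll(t, lag)) for lag in lags])
        i = int(np.argmax(cc))
        if cc[i] > best_cc:
            best_cc, best_k, best_lag = cc[i], k, int(lags[i])
    velocity = C_KMS * (np.exp(best_lag * dlnlam) - 1.0)
    return best_k, float(velocity)


x = np.arange(1000)
def absorption(centers, depths, width=2.0):
    flux = np.ones(x.size)
    for c, d in zip(centers, depths):
        flux -= d * np.exp(-0.5 * ((x - c) / width) ** 2)
    return flux

templates = [absorption([300, 600], [0.5, 0.3]), absorption([450, 700], [0.5, 0.4])]
observed = np.roll(templates[0], 7)  # template 0 redshifted by 7 pixels
k, v = best_template_rv(observed, templates, dlnlam=1.0e-5)
print(k, round(v, 1))  # 0 21.0
```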

Output visit spectra: apVisit files

The final dither-combined spectra from a given visit are written into individual apVisit files, as described in detail in the apVisit data model. See the documentation on APOGEE data for more information on how to retrieve these.

Multiple visit spectra of the same object are combined in the next stage of the pipeline, visit combination.