APOGEE Visit Spectra Reduction

This page gives a brief description of the first stage of the APOGEE Data Reduction Pipeline (apred), which reduces the raw spectra of consecutive, spectrally-dithered exposures of one visit of a particular plate on a given night.

Overview of APRED

Essentially, the apred pipeline stage extracts the individual visit spectra for each of the objects targeted on a plate and observed on a given night. While some details of the reduction pipeline have changed over the course of the APOGEE and APOGEE-2 surveys, Nidever et al. (2015) describes most of the procedures. Updates are provided in the focused APOGEE Data Release Papers and summarized here.

Data obtained from both the APOGEE-N and APOGEE-S spectrographs are processed in the same way, but different calibration frames are used. The handling of visit-level data collected with the NMSU 1-m telescope (connected to the APOGEE-N spectrograph) is discussed in Holtzman et al. (2015).

APOGEE Visit Data Reduction

The apred stage consists of three sequential steps:

1. Extract 2-dimensional images from the 3-dimensional raw data cubes and apply the basic calibration steps of dark subtraction and flat fielding (ap3d).
2. Extract 1-dimensional spectra from the 2-dimensional images and attach a wavelength calibration (ap2d).
3. From the 1-dimensional spectra, measure the dither shifts between the individual exposures, subtract sky from each fiber, correct for telluric absorption in each fiber, combine the dithered exposures into a single well-sampled visit spectrum, and perform flux calibration (ap1dvisit).


For each readout of each exposure, the raw data are first corrected for bias variations in the IR detectors and electronics. This is accomplished by using a reference array of pixels that are generated by the readout electronics, as well as a set of reference pixels around the edge of each detector.

Each individual readout is then corrected for a dark current contribution, by subtracting a calibration dark current frame made from a combination of multiple individual dark frames.

The data are then collapsed from the 3D data cubes into 2D images, which is done on a pixel-by-pixel basis. A linear function is fit to the series of up-the-ramp readouts for each pixel to determine the best-fitting slope. A linear function is used to fit all exposures, even if conditions vary throughout the exposure. The best-fitting linear slope, multiplied by the total exposure time, is taken to be the flux at this pixel location for the exposure.
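As a minimal sketch of the up-the-ramp fit for a single pixel, the slope can be obtained by ordinary least squares. It assumes equally spaced readouts and an exposure time equal to the span of the reads, and it ignores any weighting, saturation, or cosmic-ray handling. The function name `ramp_flux` and its arguments are illustrative and not part of the pipeline.

```python
def ramp_flux(reads, read_time):
    """Fit a straight line to the up-the-ramp reads of one pixel.

    reads     : accumulated counts at each non-destructive readout
    read_time : time between consecutive readouts, in seconds
    Returns the fitted slope (counts/s) multiplied by the total exposure
    time, taken here as (number of reads - 1) * read_time (an assumption).
    """
    n = len(reads)
    if n < 2:
        raise ValueError("need at least two reads to fit a slope")
    times = [i * read_time for i in range(n)]
    t_mean = sum(times) / n
    r_mean = sum(reads) / n
    # Ordinary least-squares slope: cov(t, r) / var(t)
    num = sum((t - t_mean) * (r - r_mean) for t, r in zip(times, reads))
    den = sum((t - t_mean) ** 2 for t in times)
    slope = num / den
    total_time = (n - 1) * read_time
    return slope * total_time


# A noiseless ramp accumulating 5 counts/s on top of a 100-count offset,
# read out every 10 s
reads = [100 + 5 * 10 * i for i in range(6)]
flux = ramp_flux(reads, read_time=10.0)
```

Because only the slope is used, a constant offset in the reads (here 100 counts) does not affect the derived flux.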

The up-the-ramp sampling allows for the recognition of cosmic ray events during the exposure. Cosmic ray events appear as significant jumps in the rate of charge accumulation within the series of data points in up-the-ramp sampling. The ap3d software attempts to recognize these events using this signature and then flags the affected pixels.
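One simple way to recognize such a jump, offered only as an illustration and not as the algorithm implemented in ap3d, is to compare each read-to-read difference with the median difference for that pixel. The threshold `nsigma` and the `noise_floor` are hypothetical tuning parameters.

```python
from statistics import median


def find_ramp_jumps(reads, nsigma=10.0, noise_floor=1.0):
    """Return the indices of reads that follow an anomalously large jump.

    reads       : accumulated counts of one pixel at each readout
    nsigma      : detection threshold in units of the robust scatter
    noise_floor : lower limit on the scatter, to avoid a zero threshold
    """
    diffs = [b - a for a, b in zip(reads, reads[1:])]
    med = median(diffs)
    # Median absolute deviation, scaled to the equivalent Gaussian sigma
    mad = median(abs(d - med) for d in diffs)
    sigma = max(1.4826 * mad, noise_floor)
    # Only positive jumps are flagged: a cosmic ray adds charge
    return [i + 1 for i, d in enumerate(diffs) if d - med > nsigma * sigma]


# A steady ramp of about 50 counts per read with a hit before read 4
reads = [0, 50, 101, 150, 700, 751, 800, 849]
jumps = find_ramp_jumps(reads)
```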

The 2D images are then corrected for variations in pixel-to-pixel response by dividing them by a calibration flat field. The calibration flat field is an average of multiple exposure frames illuminated by a light source within the spectrograph. This light source is neither spatially nor spectrally flat, so it serves only to remove pixel-to-pixel response variations.
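The division by the flat field can be sketched as follows, assuming the flat has already been normalized so that a typical pixel has a response near 1. Masking pixels below a `min_response` threshold is an assumption made here to avoid dividing by near-zero values. It is not a documented pipeline setting.

```python
def flat_field(image, flat, min_response=0.1):
    """Divide a 2D image by a normalized flat field.

    image, flat  : lists of rows with identical shapes
    min_response : pixels with a flat value below this are returned as
                   None (hypothetical threshold, not from the pipeline)
    """
    return [
        [pix / f if f >= min_response else None
         for pix, f in zip(image_row, flat_row)]
        for image_row, flat_row in zip(image, flat)
    ]


image = [[100.0, 90.0], [110.0, 5.0]]
flat = [[1.0, 0.9], [1.1, 0.0]]   # the last pixel has no response
corrected = flat_field(image, flat)
```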

Attempts have been made to correct for the persistence in the APOGEE-N spectrograph, which affected a third of the "blue" detector before it was replaced in 2014 and also impacted a smaller fraction of the "green" detector. Based on an analysis of illuminated frames followed by a series of long dark frames, a double-exponential fit for the amplitude of the persistence was derived for all pixels. This correction, described in Holtzman et al. (2018), depends only on the exposure level and elapsed time. It was applied only to the "blue" detector for the 2011-2014 data. The correction is only partial and does not wholly remove the persistence issues. Therefore, during the visit combination step, visits that have been significantly affected by persistence are down-weighted. Note that the persistence effects in the Southern spectrograph are minimal.

The processed individual exposure 2D frames are saved as ap2D/as2D (for APOGEE-N/APOGEE-S) files.


The ap2d routine takes the calibrated 2D images and extracts individual 1D spectra for each exposure by modeling the distribution of the light from each fiber along the slit as a function of wavelength. The wings of the light distribution from each fiber can affect adjacent spectra; thus, the flux from all 300 fibers is fit simultaneously, allowing for contributions to the light from a fiber and its two adjacent neighbors. The profiles for each fiber are derived from a flat-field frame taken through the telescope (with the mirror covers closed and illuminated) immediately after the exposure sequence. The shape and magnitude of the contribution of light from the wings of the fiber into the adjacent fibers are estimated using a library of calibration observations where only every sixth fiber is illuminated. The modeling is not perfect, so objects in which the star in the adjacent fiber is much brighter than the star in the extracted fiber are flagged.
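The idea of fitting overlapping fiber profiles simultaneously can be illustrated with a toy example for one detector column. It assumes Gaussian profiles and only three fibers, and it solves the least-squares normal equations with a small hand-written solver so that it has no dependencies. The real profiles are empirical, and with 300 fibers that each overlap only their neighbors the system is banded. All function names here are hypothetical.

```python
import math


def gaussian_profile(center, sigma, npix):
    """Toy spatial profile of one fiber along the slit direction."""
    return [math.exp(-0.5 * ((y - center) / sigma) ** 2) for y in range(npix)]


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def extract_column(data, profiles):
    """Least-squares amplitude of every fiber profile in one column."""
    n = len(profiles)
    # Normal equations: (P^T P) f = P^T d
    A = [[sum(pi * pj for pi, pj in zip(profiles[i], profiles[j]))
          for j in range(n)] for i in range(n)]
    b = [sum(p * d for p, d in zip(profiles[i], data)) for i in range(n)]
    return solve_linear(A, b)


# Three overlapping fibers; the faint middle one sits between bright ones
profiles = [gaussian_profile(c, 3.0, 25) for c in (6, 12, 18)]
true_amps = [100.0, 10.0, 50.0]
data = [sum(a * p[y] for a, p in zip(true_amps, profiles)) for y in range(25)]
amps = extract_column(data, profiles)
```

Summing the pixels around the middle fiber instead would overestimate its flux, because the wings of both bright neighbors fall inside that window. The simultaneous fit separates the three contributions.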

The external flat field frames are also used to correct for fiber-to-fiber throughput variations and to remove some structure from sensitivity variations across the detector.

After the 1D spectra are extracted, a wavelength calibration is applied. The general wavelength solution is determined from observations of arc calibration lamps, and a separate zeropoint for each exposure is determined from sky emission lines. Because the APOGEE spectrograph is in a gravitationally-fixed orientation and is kept at a stable vacuum and temperature, the form of the general wavelength solution is very stable. Annual wavelength solutions have been derived from stacks of individual arc calibration frames taken throughout the year. Note that the wavelength solution is slightly different for each fiber because of the distinct locations of the fibers in the pseudo-slit.

There are small linear shifts in the wavelength scale between different exposures. These result from two sources: (i) the intentional dithering of the detectors between exposures to allow for well-sampled combined images, and (ii) a small, slowly varying flexure in the instrument optical bench. The flexure in the optical bench occurs as the liquid nitrogen tank depletes over time (a larger "reset" shift occurs when this tank is filled, but this is always done during the day). The linear shifts are measured using prominent night sky emission lines that appear in every spectrum, and these shifts are applied to the wavelength solution.
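A minimal sketch of this zero-point step is given below. It assumes the shift is taken as the median offset between measured and reference sky-line pixel positions, and that the general wavelength solution is a polynomial in pixel coordinate. Both choices, the coefficient values, and the function names are illustrative only.

```python
from statistics import median


def measure_pixel_shift(observed_centers, reference_centers):
    """Median offset (pixels) between observed and reference sky lines."""
    return median(o - r for o, r in zip(observed_centers, reference_centers))


def shifted_wavelength(pixel, coeffs, shift):
    """Evaluate the general wavelength solution at (pixel - shift).

    coeffs : polynomial coefficients, lowest order first (hypothetical)
    shift  : linear shift of this exposure, in pixels
    """
    x = pixel - shift
    wave = 0.0
    for c in reversed(coeffs):   # Horner's scheme
        wave = wave * x + c
    return wave


# Sky lines found about 0.5 pixel redward of their reference positions
reference = [100.0, 500.0, 1500.0]
observed = [100.48, 500.52, 1500.5]
shift = measure_pixel_shift(observed, reference)

coeffs = [15000.0, 0.2]          # toy linear solution, in Angstroms
wave = shifted_wavelength(500.5, coeffs, shift)
```

Using the median rather than the mean keeps a single badly measured or blended line from biasing the shift.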

The wavelength calibration of the APOGEE data is done in vacuum wavelengths. However, the wavelengths of atomic transitions are usually quoted at standard temperature and pressure (S.T.P.); this is how the CRC Handbook of Chemistry and Physics lists them for transitions redward of 2000 Ångstroms. Thus, recognizing spectral lines associated with specific atomic transitions may require converting the SDSS data to the equivalent values at S.T.P.  For APOGEE data, we have used the conversion from Ciddor (Applied Optics, Vol 35, p 1566, 1996) to convert between vacuum and air wavelengths. For a vacuum wavelength (VAC) in Ångstroms, convert to air wavelength (AIR) using the equation:

AIR = VAC / (1.0 + 5.792105E-2/(238.0185 - (1.E4/VAC)^2) + 1.67917E-3/(57.362 - (1.E4/VAC)^2))
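The conversion above translates directly into a short function. The name `vacuum_to_air` is chosen here for illustration.

```python
def vacuum_to_air(vac):
    """Convert a vacuum wavelength in Angstroms to the air wavelength.

    Implements the formula quoted above; the denominator is the
    refractive index of air at the given wavelength.
    """
    s2 = (1.0e4 / vac) ** 2   # squared wavenumber, in inverse microns squared
    n = 1.0 + 5.792105e-2 / (238.0185 - s2) + 1.67917e-3 / (57.362 - s2)
    return vac / n


# A wavelength near the middle of the APOGEE window
air = vacuum_to_air(16000.0)
```

In the H band the air wavelength is shorter than the vacuum wavelength by roughly 4 Ångstroms.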

The processed individual exposure 1D frames are saved as ap1D/as1D files.


The first stage in ap1dvisit determines the linear shift between the exposures within a visit; these shifts result from the dithering of the detectors. Cross-correlating the different exposures with each other determines the shift to higher precision than a direct measurement of the wavelength zero point (e.g., from the sky lines).
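The cross-correlation measurement can be sketched as below. This sketch assumes the cross-correlation is evaluated at integer lags and the peak is refined with a parabola. That is one common way to reach sub-pixel precision, not necessarily the method used in ap1dvisit. The function name and `max_lag` are hypothetical.

```python
import math


def cross_correlation_shift(a, b, max_lag=5):
    """Estimate the shift s (in pixels) such that b[i] ~ a[i - s]."""
    n = len(a)
    lags = list(range(-max_lag, max_lag + 1))
    ccf = []
    for lag in lags:
        # Sum over the pixels where both a[i] and b[i + lag] exist
        lo, hi = max(0, -lag), min(n, n - lag)
        ccf.append(sum(a[i] * b[i + lag] for i in range(lo, hi)))
    k = max(range(len(ccf)), key=ccf.__getitem__)
    if k == 0 or k == len(ccf) - 1:
        return float(lags[k])   # peak at the edge of the search window
    # Refine the peak with a parabola through it and its two neighbors
    y0, y1, y2 = ccf[k - 1], ccf[k], ccf[k + 1]
    denom = y0 - 2.0 * y1 + y2
    offset = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
    return lags[k] + offset


# Two toy exposures of one emission line, offset by 2.5 pixels
exp_a = [math.exp(-0.5 * ((i - 50.0) / 3.0) ** 2) for i in range(100)]
exp_b = [math.exp(-0.5 * ((i - 52.5) / 3.0) ** 2) for i in range(100)]
shift = cross_correlation_shift(exp_a, exp_b)
```

The cross-correlation uses every feature in the spectrum at once, which is why it can be more precise than centroiding a handful of sky lines.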

Each fiber of each exposure is then corrected for the contribution of night sky emission. The IR portion of the spectrum includes a significant number of very bright OH emission lines. There can also be some continuum sky contribution, especially when there is substantial moonlight or when thin clouds are present. Sky subtraction is accomplished using sky fibers that are distributed across each plug plate. Multiple fibers are used to take into account variations in the IR sky. For each object, the sky is estimated from the nearest four sky fibers. However, as the wavelength scale is not identical for each fiber, the sky spectra need to be shifted before they can be subtracted. Also, because the line profiles differ slightly from fiber to fiber, the pipeline limits sky fibers to be no more than 75 fibers along the slit from the fiber being subtracted. Even still, LSF variations and interpolation issues lead to imperfect sky subtraction, particularly for the brightest night sky lines. Because the sky subtraction for the bright night sky lines is non-ideal, there are small regions of the spectra that are effectively rendered useless for science surrounding each sky feature. Sky removal remains an area for improvement in the pipeline. We note, however, that even with perfect sky modeling, the signal-to-noise under bright sky lines would be substantially degraded compared with the surrounding spectrum.
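The selection of sky fibers described above might be sketched as follows. The sketch assumes that "nearest" refers to distance on the plate, that the four sky spectra are combined with a plain mean, and that they have already been resampled onto the wavelength scale of the object fiber. The dictionary keys and function names are hypothetical.

```python
import math


def pick_sky_fibers(target, sky_fibers, n_sky=4, max_slit_sep=75):
    """Choose the sky fibers used for one object.

    target, sky_fibers : dicts with 'fiber' (position along the slit)
                         and 'x', 'y' (position on the plate)
    """
    eligible = [s for s in sky_fibers
                if abs(s["fiber"] - target["fiber"]) <= max_slit_sep]
    eligible.sort(key=lambda s: math.hypot(s["x"] - target["x"],
                                           s["y"] - target["y"]))
    return eligible[:n_sky]


def subtract_sky(obj_flux, sky_fluxes):
    """Subtract the mean of sky spectra sampled on the object's grid."""
    n = len(sky_fluxes)
    mean_sky = [sum(column) / n for column in zip(*sky_fluxes)]
    return [f - s for f, s in zip(obj_flux, mean_sky)]


target = {"fiber": 100, "x": 0.0, "y": 0.0}
skies = [
    {"fiber": 10, "x": 1.0, "y": 0.0},    # close on plate, far on slit
    {"fiber": 60, "x": 5.0, "y": 0.0},
    {"fiber": 90, "x": 2.0, "y": 0.0},
    {"fiber": 120, "x": 3.0, "y": 0.0},
    {"fiber": 150, "x": 4.0, "y": 0.0},
    {"fiber": 170, "x": 10.0, "y": 0.0},
    {"fiber": 200, "x": 0.5, "y": 0.0},   # close on plate, far on slit
]
chosen = [s["fiber"] for s in pick_sky_fibers(target, skies)]
clean = subtract_sky([10.0, 12.0, 50.0],
                     [[1.0, 2.0, 40.0], [3.0, 2.0, 40.0]])
```

Fibers 10 and 200 are the closest on the plate but are rejected because they lie more than 75 fibers away along the slit.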

The Earth's atmosphere also leads to significant absorption in the observed spectra, which arises from CO2, H2O, and CH4 bands in the APOGEE spectral window. A correction for this telluric absorption is derived from observations of "telluric" standards spread across the plate. The goal is to use hot stars that exhibit relatively few spectral features in the APOGEE wavelength region to determine the strength of the telluric absorption. To achieve this, each plate targets a number of blue stars based on their intrinsic color. Multiple telluric stars are chosen for each plate because the absorption can vary across the field of view. For each telluric standard, the amplitudes of the absorption from the separate families of CO2, H2O, and CH4 bands are estimated by fitting model absorption spectra to the observed spectrum. A surface is fit to these scaling factors and this surface is used to predict the appropriate scale factors for each fiber. The individual-fiber scaling factors, together with model telluric spectra that have been convolved with the fiber-specific line spread function, are used to correct each individual science spectrum. Significant improvements have been made to the telluric correction over time, but there are still some cases where the correction remains imperfect, since it depends on the accuracy of the model and also of the LSF determination.
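How per-species scale factors might be applied to a science spectrum is sketched below. The text does not specify how the scaling enters the model. This example assumes that each species' optical depth scales linearly with its factor, so the model transmission is raised to that power. It also omits the LSF convolution and the surface fit across the plate. The function names and the `min_trans` cutoff are hypothetical.

```python
def telluric_model(species_models, scales):
    """Combine per-species model transmission spectra into one model.

    species_models : dict of species name -> transmission per pixel (0..1)
    scales         : dict of species name -> scale factor for this fiber
    Assumes transmission T becomes T**scale (optical depth times scale).
    """
    npix = len(next(iter(species_models.values())))
    combined = [1.0] * npix
    for name, transmission in species_models.items():
        s = scales[name]
        combined = [c * t ** s for c, t in zip(combined, transmission)]
    return combined


def correct_telluric(flux, transmission, min_trans=0.05):
    """Divide by the model; return None where the sky is nearly opaque."""
    return [f / t if t >= min_trans else None
            for f, t in zip(flux, transmission)]


models = {
    "CO2": [1.0, 0.81, 1.0],
    "H2O": [1.0, 1.0, 0.5],
    "CH4": [1.0, 1.0, 1.0],
}
scales = {"CO2": 0.5, "H2O": 2.0, "CH4": 1.0}
model = telluric_model(models, scales)
corrected = correct_telluric([10.0, 9.0, 2.5], model)
```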

After sky correction, pairs of dithered frames are combined to produce well-sampled images. The spectra from each pair are then combined to create a single "visit" spectrum for each object observed.
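The gain in sampling from a dither pair can be illustrated with the simplest possible combination. The example assumes the two exposures are offset by exactly half a pixel, which the text above does not state, and simply interleaves them. A real combination has to handle the measured, non-ideal shifts by interpolation, so this is not the pipeline's method. The function name is hypothetical.

```python
def interleave_dither_pair(spec_a, spec_b):
    """Interleave two spectra sampled half a pixel apart.

    spec_a is sampled at pixels 0, 1, 2, ... and spec_b at 0.5, 1.5,
    2.5, ...; the result is sampled every half pixel.
    """
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must have the same length")
    combined = []
    for a, b in zip(spec_a, spec_b):
        combined.extend((a, b))
    return combined


combined = interleave_dither_pair([1.0, 3.0, 5.0], [2.0, 4.0, 6.0])
```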

The final visit spectra are approximately flux calibrated. The relative flux calibration is performed using the observations of the hot telluric stars, which are assumed to have an $F_\lambda\propto \lambda^{-4}$ shape. The absolute level of the spectrum is then determined using a scaling based on the object's catalog H-band magnitude. We note that the subsequent pipeline for the analysis for stellar parameters and abundances (ASPCAP) normalizes the spectra to a pseudo-continuum, so the flux calibration done here is not critical.
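The two calibration steps can be sketched as follows. The sketch assumes the response is taken from a single hot star, that the absolute scaling matches the median flux to the level implied by the H magnitude, and that `zero_point` (the flux of an H = 0 source) is supplied by the caller. These choices and the function names are illustrative. The toy example uses arbitrary flux units.

```python
from statistics import median


def relative_response(wave, hot_star_flux):
    """Response curve from a hot star assumed to follow lambda**-4.

    The normalization is arbitrary; only the shape is used.
    """
    ref = median(wave)
    return [f / (w / ref) ** -4 for w, f in zip(wave, hot_star_flux)]


def flux_calibrate(flux, response, hmag, zero_point):
    """Remove the response shape, then scale to the H-band magnitude."""
    shaped = [f / r for f, r in zip(flux, response)]
    level = zero_point * 10.0 ** (-0.4 * hmag)
    scale = level / median(shaped)
    return [s * scale for s in shaped]


wave = [15000.0, 16000.0, 17000.0]
true_response = [0.5, 1.0, 0.8]
# Observed hot star: intrinsic lambda**-4 shape times the response
hot_star = [r * (w / 16000.0) ** -4 for r, w in zip(true_response, wave)]
response = relative_response(wave, hot_star)

# A science target with an intrinsically flat spectrum
observed = [3.0 * r for r in true_response]
calibrated = flux_calibrate(observed, response, hmag=10.0, zero_point=1.0)
```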

Prior to DR17, an initial radial velocity (RV) estimate was made at the ap1dvisit stage even though it was not subsequently used; this has been turned off for the DR17 processing.

The corrected individual exposure frames are saved as apCframe/asCframe files.

Output visit spectra: apVisit/asVisit files

The final dither-combined spectra from a given visit are written into individual apVisit/asVisit files, as described in detail in the apVisit/asVisit data model. Here, the "ap" refers to spectra taken from the northern spectrograph at APO and "as" refers to spectra taken from the southern spectrograph at LCO.

See the documentation on APOGEE data for more information on how to retrieve these visit-level spectra.

Multiple visit spectra of the same object are combined in the next stage of the pipeline: Visit Combination.