The Cannon

Note that the APOGEE data in DR15 are identical to those in DR14, but use the DR15 documentation pages for reference.

Stellar Parameters and Abundances: the Cannon

DR15 includes an alternate set of stellar parameters and abundances derived using a data-driven method called the Cannon (Ness et al. 2015, Casey et al. 2016). This technique parameterizes the spectral fluxes as a function of a set of externally determined stellar parameters and abundances. In principle, these could be any physical or parameterized quantities, so they are generically referred to as labels. The Cannon uses a training set of stellar spectra with known labels to determine the function parameters that best reproduce those spectra; the trained function is then applied to a larger data set to derive the unknown labels of those stars. The method has the advantage of exploiting all of the information that may be present in the stellar spectra, even where a complete physical understanding of that information is lacking. For more details, refer to the papers listed above.
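As a rough illustration of the train-then-apply idea described above, here is a minimal NumPy sketch of a Cannon-style quadratic model; it is not the Cannon-2 code. It assumes a label vector made of a constant term, the linear terms, and all quadratic and cross terms, fits the coefficients at each pixel by inverse-variance-weighted least squares (omitting the per-pixel intrinsic scatter term that the published method also fits), and recovers a new star's labels with a simple Gauss-Newton iteration. The function names `label_vector`, `train`, and `infer` are hypothetical. In practice, labels are usually shifted and scaled before training so that the quadratic terms are well conditioned.

```python
import numpy as np


def label_vector(labels):
    """Quadratic label vector [1, l_1..l_K, l_i*l_j for i <= j], one row per star."""
    labels = np.atleast_2d(np.asarray(labels, dtype=float))
    n_stars, n_labels = labels.shape
    cols = [np.ones(n_stars)]
    cols.extend(labels[:, i] for i in range(n_labels))
    for i in range(n_labels):
        for j in range(i, n_labels):
            cols.append(labels[:, i] * labels[:, j])
    return np.column_stack(cols)


def train(labels, flux, ivar):
    """Fit coefficients theta with shape (n_pixels, n_terms), pixel by pixel.

    Simplification (assumption): weights are the inverse variances only; the
    intrinsic scatter term of the published method is omitted.
    """
    design = label_vector(labels)                      # (n_stars, n_terms)
    theta = np.zeros((flux.shape[1], design.shape[1]))
    for p in range(flux.shape[1]):
        w = np.sqrt(ivar[:, p])                        # square-root weights for lstsq
        theta[p], *_ = np.linalg.lstsq(design * w[:, None], flux[:, p] * w, rcond=None)
    return theta


def infer(theta, flux, ivar, x0, n_iter=20, eps=1e-6):
    """Estimate one star's labels by Gauss-Newton with a finite-difference Jacobian."""
    x = np.asarray(x0, dtype=float).copy()
    w = np.sqrt(ivar)
    for _ in range(n_iter):
        model = theta @ label_vector(x)[0]
        resid = (flux - model) * w
        jac = np.empty((flux.size, x.size))
        for k in range(x.size):
            dx = np.zeros_like(x)
            dx[k] = eps
            jac[:, k] = (theta @ label_vector(x + dx)[0] - model) / eps * w
        step, *_ = np.linalg.lstsq(jac, resid, rcond=None)
        x += step
    return x
```

Calling `train` once on the training set and then `infer` for each remaining star mirrors the split between the training step and the application step; a reasonable starting point `x0` (for example, the ASPCAP values) helps the iteration converge.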

As described in Holtzman et al. (in press at AJ), we have trained the Cannon on the calibrated ASPCAP stellar parameters and abundances for giant stars, so it is critical to note that the Cannon results are not an independent determination of these quantities. However, in some cases the Cannon may give better results because it can respond to features in the spectra that are not well modeled in the ASPCAP analysis: for example, lines with imperfect atomic data, lines missing from the line list, or lines that are not well modeled by the 1D LTE approach used by ASPCAP.

Cannon results have been determined using the Cannon-2 code (Casey et al. 2016), with some modifications specific to APOGEE data, which are described below.

  • For the uncertainties in the input spectra, we have adopted the same uncertainties used in the ASPCAP pipeline. In particular, these uncertainties draw on improved knowledge of the sky spectrum to mask broader regions around sky lines, which are often imperfectly subtracted.
  • For the labels that are associated with individual elemental abundances, we use "censoring" in the Cannon parameterization. This means that, for each such label, only the pixels that our line list indicates are affected by that element's abundance are allowed to respond to the label. Censoring in this way reduces the risk that intrinsic, astrophysical correlations between the abundances of different elements in the training set are imposed on the full data set. We have found that without censoring, such correlations can lead to abundances that appear to be more precise, but this precision may not reflect higher accuracy if the correlations are not present across the entire data set. Censoring was implemented using the elemental windows from the ASPCAP analysis. This may be overly conservative, because the ASPCAP windows exclude spectral regions that are sensitive to a given element if those regions are also sensitive to other abundances in the same elemental abundance group.
  • The use of censoring has a significant effect on results from the Cannon. Without censoring, results from the Cannon almost always show less scatter in astrophysical trends than the ASPCAP results. However, with censoring, scatter from the Cannon can be larger, and, for some elements, exceeds the scatter in ASPCAP results.
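One way to picture censoring is as a mask on the model coefficients: at pixels outside an element's window, every coefficient of a term that involves that element's label (its linear term, its square, and its cross terms) is fixed at zero, so those pixels cannot respond to that label. The sketch below builds such a mask for a quadratic label vector and fits only the unmasked coefficients. It is an illustrative assumption of how this can be set up, with hypothetical function names and synthetic windows, not the APOGEE implementation.

```python
import numpy as np


def label_vector(labels):
    """Quadratic label vector [1, l_1..l_K, l_i*l_j for i <= j], one row per star."""
    labels = np.atleast_2d(np.asarray(labels, dtype=float))
    n_stars, n_labels = labels.shape
    cols = [np.ones(n_stars)]
    cols.extend(labels[:, i] for i in range(n_labels))
    for i in range(n_labels):
        for j in range(i, n_labels):
            cols.append(labels[:, i] * labels[:, j])
    return np.column_stack(cols)


def term_labels(n_labels):
    """Label indices involved in each term, in the same order as label_vector."""
    terms = [set()]                                              # constant term
    terms += [{i} for i in range(n_labels)]                      # linear terms
    terms += [{i, j} for i in range(n_labels) for j in range(i, n_labels)]  # quadratic
    return terms


def censor_mask(n_pix, n_labels, windows):
    """Boolean (n_pix, n_terms) mask; True where a coefficient may be nonzero.

    windows maps a label index to a boolean pixel mask (True inside that
    element's window). Labels not in windows use the full spectrum.
    """
    terms = term_labels(n_labels)
    mask = np.ones((n_pix, len(terms)), dtype=bool)
    for label, window in windows.items():
        for t, involved in enumerate(terms):
            if label in involved:
                mask[:, t] &= window
    return mask


def train_censored(labels, flux, ivar, mask):
    """Inverse-variance-weighted least squares per pixel, using only unmasked terms."""
    design = label_vector(labels)
    theta = np.zeros((flux.shape[1], design.shape[1]))
    for p in range(flux.shape[1]):
        keep = mask[p]
        w = np.sqrt(ivar[:, p])
        coef, *_ = np.linalg.lstsq(design[:, keep] * w[:, None], flux[:, p] * w, rcond=None)
        theta[p, keep] = coef                                    # censored terms stay exactly 0
    return theta
```

Because the censored coefficients are exactly zero, a correlation between, say, an element's abundance and [M/H] in the training set cannot leak into that element's label through pixels outside its window.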

In practice, we apply the following steps to derive the Cannon labels:

  1. The combined apStar spectra are normalized, following the prescription in Casey et al. (2016).
  2. A training set is constructed that attempts to sample a wide range of stellar parameters. We divide the ($T_{\rm eff}$, log $g$, [M/H]) space into cubes covering the range $3500\,{\rm K} \le T_{\rm eff} \le 5500\,{\rm K}$, $0 \le \log g \le 3.9$, and $-2.5 \le {\rm [M/H]} \le 0.5$, and take the 50 stars with the highest signal-to-noise ratio in each cube. These restrictions result in a training set of $\sim$1500 stars. For the labels of this subsample, we adopt the calibrated ASPCAP $T_{\rm eff}$, log $g$, [M/H], [alpha/M], and [X/H] for 20 different species. Note that we restrict the training set to giant stars, both because the ASPCAP calibration relations for dwarfs are increasingly uncertain (or lacking) and because, over a broader range of stellar parameters, the quadratic parameterization used by Cannon-2 is inadequate.
  3. A Cannon model is trained on this sample, using the full spectrum for $T_{\rm eff}$, log $g$, [M/H], and [alpha/M], but using wavelength censoring for the labels that refer to individual element abundances. We adopt the ASPCAP windows for the individual elemental abundances as the wavelength censors.
  4. The trained model is applied to the remainder of the APOGEE data whose ASPCAP parameters fall within the range of the labels adopted for the training set.
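Steps 2 and 4 can be sketched as follows. The cube edges are not specified in this documentation, so any bin widths passed in are hypothetical, as are the function names. "Within the range of the labels adopted for the training set" is interpreted here, as an assumption, as lying between the minimum and maximum training-set value of each label.

```python
import numpy as np


def select_training_set(teff, logg, mh, snr, edges, n_per_cube=50):
    """Indices of the n_per_cube highest-S/N stars in each (Teff, logg, [M/H]) cube.

    edges = (teff_edges, logg_edges, mh_edges), each increasing; the bin
    widths are hypothetical, since the documentation does not give them.
    """
    teff_e, logg_e, mh_e = (np.asarray(e, dtype=float) for e in edges)
    inside = ((teff >= teff_e[0]) & (teff <= teff_e[-1])
              & (logg >= logg_e[0]) & (logg <= logg_e[-1])
              & (mh >= mh_e[0]) & (mh <= mh_e[-1]))
    idx = np.flatnonzero(inside)

    def bin_of(values, e):
        # digitize returns 1..len(e); clip so a value equal to the top edge
        # falls in the last cube instead of overflowing.
        return np.clip(np.digitize(values, e) - 1, 0, len(e) - 2)

    dims = (len(teff_e) - 1, len(logg_e) - 1, len(mh_e) - 1)
    cube = np.ravel_multi_index(
        (bin_of(teff[idx], teff_e), bin_of(logg[idx], logg_e), bin_of(mh[idx], mh_e)), dims)

    chosen = []
    for c in np.unique(cube):
        members = idx[cube == c]
        order = np.argsort(snr[members])[::-1]               # highest S/N first
        chosen.append(members[order[:n_per_cube]])
    return np.sort(np.concatenate(chosen)) if chosen else np.array([], dtype=int)


def within_training_range(params, training_params):
    """Boolean mask: every label of a star lies within the training set's min-max range."""
    lo = training_params.min(axis=0)
    hi = training_params.max(axis=0)
    return np.all((params >= lo) & (params <= hi), axis=1)
```

Here `params` holds one row per star and one column per label (for example, ASPCAP $T_{\rm eff}$, log $g$, and [M/H]); only stars flagged `True` would be passed to the trained model.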

Using Cannon Labels

The Cannon-derived labels have an identical format to the ASPCAP-derived ones and are delivered in files that mimic the ASPCAP data products as closely as possible (see below).

It is important to note that all the issues and caveats associated with ASPCAP apply to the Cannon results, because the latter depend on the former. See further description of these issues in the ASPCAP documentation and in APOGEE caveats.

In addition, users should examine the distribution of $\chi^2$ values in their Cannon results. The $\chi^2$ of a star indicates roughly how closely the Cannon model, evaluated at that star's derived labels, reproduces its spectrum; stars with large $\chi^2$ will not have reliable results.
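A minimal sketch of this $\chi^2$ diagnostic, assuming the normalized observed fluxes, the corresponding Cannon model fluxes, and the inverse variances are available as arrays with one row per star, and that zero inverse variance marks a masked pixel. The reduced-$\chi^2$ normalization (unmasked pixels minus fitted labels) and any cutoff applied to it are illustrative choices, not the criterion used by APOGEE; the function names are hypothetical.

```python
import numpy as np


def cannon_chi2(flux, model, ivar):
    """Per-star chi^2 = sum over pixels of (flux - model)^2 * ivar.

    Pixels with ivar == 0 are treated as masked and contribute nothing.
    Returns chi^2 and the number of unmasked pixels for each star.
    """
    good = ivar > 0
    terms = np.where(good, (flux - model) ** 2 * ivar, 0.0)
    return terms.sum(axis=1), good.sum(axis=1)


def reduced_chi2(chi2, n_good, n_labels):
    """chi^2 per degree of freedom (unmasked pixels minus fitted labels)."""
    return chi2 / (n_good - n_labels)
```

A cut such as `reduced_chi2(chi2, n_good, n_labels) > threshold` then flags stars for closer inspection; the threshold is best chosen by looking at the distribution for the sample at hand.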

An in-depth discussion of the Cannon results can be found in Holtzman et al. (2018, submitted). A comparison of the Cannon results to external abundance studies can be found in Jönsson et al. (in press at AJ).

Data Products

The data products from the Cannon pipeline have a similar format to those from ASPCAP.

For example, the allStarCannon file compiles all of the Cannon label results in a single SAS file and CAS table, analogous to the allStar file with ASPCAP results. The allStarCannon file has been constructed to be a line-for-line match with the allStar file, to make it simple to use either the ASPCAP or the Cannon results, or to compare them. However, the allStarCannon file does not repeat the ASPCAP parameter and abundance information contained in the allStar file; in its place, it provides the Cannon label results.
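Because the two files are matched line for line, comparing ASPCAP and Cannon values needs no cross-matching: row i of one file describes the same star as row i of the other. The sketch below assumes the tables have already been read into NumPy structured arrays or FITS record arrays (for example with `astropy.io.fits.getdata`), that both carry an `APOGEE_ID` column and a label column such as `TEFF`, and that missing values are stored as -9999. These column names and the sentinel are assumptions to check against the data model, and `compare_label` is a hypothetical helper.

```python
import numpy as np

BAD = -9999.0  # assumed sentinel for missing values; check the data model


def compare_label(allstar, cannon, column, id_column="APOGEE_ID"):
    """Cannon minus ASPCAP for one label, relying on the line-for-line row match.

    Rows where either value is missing or non-finite are returned as NaN.
    """
    if len(allstar) != len(cannon):
        raise ValueError("tables are not line-for-line matched")
    if not np.array_equal(allstar[id_column], cannon[id_column]):
        raise ValueError("row order differs between tables")
    a = np.asarray(allstar[column], dtype=float)
    c = np.asarray(cannon[column], dtype=float)
    good = (a > BAD) & (c > BAD) & np.isfinite(a) & np.isfinite(c)
    return np.where(good, c - a, np.nan)
```

The ID check is a cheap safeguard: if either table has been filtered or re-sorted before the comparison, the line-for-line correspondence no longer holds and the function refuses to proceed.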

The cannonStar files contain the Cannon results for individual stars, including labels, normalized spectra, uncertainties, and Cannon recreated spectra, in FITS image format on the SAS.

The cannonField files bundle up all of the cannonStar results for stars in a given field, analogous to the aspcapField files. These contain the APOGEE spectra and Cannon recreations, as well as the derived Cannon label values, in FITS format on the SAS.

In addition, we provide the ASPCAP allStar entries for the stars used to train the Cannon model in a FITS table on the SAS, along with the trained Cannon-2 model itself.

Cannon Data

File Type/Name | Description | SAS Location | CAS Table
-------------- | ----------- | ------------ | ---------
allStarCannon | catalog of all Cannon stellar parameters and abundances | FITS file | cannonStar
cannonField | Cannon stellar parameters and abundances for stars in a single field | path/LOCATION_ID/ | ---
cannonStar | Cannon labels, normalized spectra, uncertainties, and best-fit spectra for a single star | path/LOCATION_ID/cannonStar*.fits | ---
cannonTrainingSet | IDs, properties, and labels for stars in the training set | FITS file | ---
cannonModel | trained Cannon model, containing the pixel coefficients and covariances | FITS file | ---