Photometric Redshifts

Summary

Data Release 12 includes photometric redshift estimates for all primary photometric measurements tagged as galaxies (i.e., the elements of the GalaxyTag view). The current version features a greatly expanded training set, an updated template fitting method and a more detailed treatment of errors. Unlike previous releases, we provide only one version of the table, based on a single estimation technique; in particular, the photozRF table has been discontinued.

This page summarizes the methods used to calculate the photometric redshift estimates, with details to follow in Beck et al. (in preparation, 2015).

Methods overview

The estimation method is the same as the one used in Data Release 10; following the name used in Csabai et al. (2007), we refer to it as a kd-tree nearest neighbor fit (KF). The KF estimates are stored in the table Photoz.

The method is empirical: it uses a training set of galaxies with both photometric and spectroscopic observations as a reference, and applies a machine learning technique to estimate redshifts. We chose this approach over template fitting methods because machine learning techniques achieve higher overall precision. The second estimation method offered in previous releases (random forests, stored in photozRF) was dropped because we found that the main factor limiting the accuracy of the results is the composition and photometric errors of the training set, not the choice of machine learning technique.
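The core of a kd-tree nearest neighbor fit can be sketched as follows. This is a simplified illustration, not the production code: it assumes that for each galaxy the k nearest training galaxies are found in a color space, and that within that neighborhood redshift is modeled as a linear function of the colors, fit by least squares (the local fit of Csabai et al. (2007) may differ in detail). It omits the outlier removal and error estimation of the real method, uses a brute-force search where a kd-tree would answer the same query faster, and its function names and default k=100 are assumptions.

```python
import math


def nearest_neighbors(query, train_colors, k):
    """Indices of the k training points closest to `query` (Euclidean).

    Brute force for clarity; a kd-tree returns the same neighbors faster
    on large training sets.
    """
    dists = [math.dist(query, c) for c in train_colors]
    return sorted(range(len(train_colors)), key=dists.__getitem__)[:k]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_linear_photoz(query, train_colors, train_z, k=100):
    """Fit z = a0 + sum_i a_i * color_i to the k nearest neighbors
    (least squares via the normal equations) and evaluate it at `query`."""
    idx = nearest_neighbors(query, train_colors, k)
    rows = [[1.0, *train_colors[i]] for i in idx]  # design matrix rows
    zs = [train_z[i] for i in idx]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(p)]
    coeffs = solve_linear(ata, atz)
    return sum(c * x for c, x in zip(coeffs, [1.0, *query]))
```

If the neighbors are degenerate (for example, all lying on a line in color space), the normal equations become singular; a production implementation would need to guard against that case.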

To infer values of physical parameters of galaxies, such as k-corrections, spectral type, and rest frame colors, we extend the KF method with a conservative method of template fitting. We determined the best-fitting template via a minimum chi-square fit to the photometric magnitudes, using the composite spectral template atlas of Dobos et al. (2012). The photometric errors were calculated using the prescriptions of Scranton et al. (2005).
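A minimum chi-square template selection can be sketched as below. This is an illustration under stated assumptions: each template is assumed to have already been shifted to the KF redshift and converted into model fluxes in the five SDSS bands, its only free parameter is an overall amplitude (solved analytically), and the fit is done in flux rather than magnitude space for that reason. The function names are hypothetical.

```python
def fit_template(obs_flux, obs_err, template_flux):
    """Best-fit amplitude and chi-square of one template (weighted least squares)."""
    w = [1.0 / e**2 for e in obs_err]  # inverse-variance weights
    num = sum(wi * f * t for wi, f, t in zip(w, obs_flux, template_flux))
    den = sum(wi * t * t for wi, t in zip(w, template_flux))
    amp = num / den
    chisq = sum(wi * (f - amp * t) ** 2 for wi, f, t in zip(w, obs_flux, template_flux))
    return amp, chisq


def best_template(obs_flux, obs_err, templates):
    """Return (template_id, amplitude, chisq) with the lowest chi-square,
    or None if no template gives a positive amplitude."""
    best = None
    for tid, tflux in templates.items():
        amp, chisq = fit_template(obs_flux, obs_err, tflux)
        if amp <= 0:  # a non-positive amplitude is unphysical
            continue
        if best is None or chisq < best[2]:
            best = (tid, amp, chisq)
    return best
```

Because a single template is scaled rather than a combination of templates being fit, the result can only be as good as the closest template in the library, but it cannot produce a non-physical mixture.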

The method used previously in Data Release 10 computed a non-negative linear combination of spectral model templates using non-negative least squares (NNLS). Although this method is more sophisticated, it is prone to overfitting and allows non-physical spectral solutions, which is especially a problem when the photometric errors are underestimated. The current method is limited by the number and coverage of the templates used, but it avoids these issues.

Training set overview

The training set is made up of three main subsets. The first two are extracted from the DR12 spectroscopic catalog: the main galaxy sample, containing more than 830,000 galaxies (average r magnitude: 17.3, average redshift: 0.15, extending to 0.5), and the BOSS sample, comprising over 1,060,000 galaxies (average r: 19.7, average redshift: 0.45, extending to 0.8). For the third subset, we matched results from nine public spectroscopic redshift surveys to SDSS photometry, yielding 76,000 additional galaxies (average r: 19.1, average redshift: 0.25, extending to 0.8). We applied cuts in photometric color space and on photometric errors to ensure higher accuracy; these cuts, however, also greatly limit the size and redshift coverage of the third subset. The RMS of the estimation errors for the three subsets is 0.029, 0.050 and 0.070, respectively. Fainter objects generally have significantly larger photometric errors, which in turn leads to larger redshift estimation errors.
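The RMS values quoted above summarize the scatter between photometric and spectroscopic redshifts. The sketch below computes that scatter per bin of r magnitude, which makes the growth of the error toward fainter objects directly visible. It assumes the estimation error is the plain difference z_phot - z_spec (not normalized by 1 + z) and uses an arbitrary 1-magnitude bin width; the function names are hypothetical.

```python
import math
from collections import defaultdict


def rms(values):
    """Root-mean-square of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def rms_by_r_bin(r_mag, z_phot, z_spec, bin_width=1.0):
    """RMS of z_phot - z_spec in bins of r magnitude, keyed by bin lower edge."""
    bins = defaultdict(list)
    for r, zp, zs in zip(r_mag, z_phot, z_spec):
        lower = math.floor(r / bin_width) * bin_width
        bins[lower].append(zp - zs)
    return {edge: rms(diffs) for edge, diffs in sorted(bins.items())}
```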

Error fields and flags

The error statistics of the reference set are good indicators of the errors of the estimated redshifts only when the objects being estimated follow the same distribution in color space and have the same photometric error properties as the training set. The KF method provides an explicit estimate of the redshift error (zErr), and we have found this estimate to be reliable and unbiased when these assumptions hold.
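One way to check, on galaxies that have spectroscopic redshifts, whether zErr is reliable and unbiased is to examine the normalized residuals (z - z_spec) / zErr: for an unbiased, correctly scaled error estimate their mean should be close to 0 and their standard deviation close to 1. This check assumes roughly Gaussian errors, and the function name is hypothetical.

```python
import math


def normalized_error_stats(z_phot, z_spec, z_err):
    """Mean and standard deviation of (z_phot - z_spec) / z_err.

    Values near (0, 1) suggest the error estimate is unbiased and
    correctly scaled for this sample.
    """
    pulls = [(p - s) / e for p, s, e in zip(z_phot, z_spec, z_err)]
    mean = sum(pulls) / len(pulls)
    std = math.sqrt(sum((x - mean) ** 2 for x in pulls) / len(pulls))
    return mean, std
```

A standard deviation well above 1 would indicate that zErr underestimates the true errors, which is what one would expect for galaxies that fall outside the color distribution or photometric error range of the training set.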

The new flag field photoErrorClass in the Photoz table divides galaxies into 7 categories based on their photometric errors: class 1 is the best and matches the limits of the training set, class 2 is somewhat worse, and so on up to class 7, the worst. In addition, the sign of photoErrorClass shows whether the galaxy lies within the bounding box of its k nearest neighbors: it is positive if the galaxy is inside the bounding box (we interpolate) and negative if it is outside (we extrapolate). The following table shows the average RMS for each photoErrorClass value, calculated for all galaxies with available spectroscopic redshifts.

Average RMS for different photoErrorClass values
|photoErrorClass|	RMS, positive class (interpolated)	RMS, negative class (extrapolated)
1	0.043	0.066
2	0.074	0.17
3	0.074	0.15
4	0.085	0.16
5	0.097	0.16
6	0.11	0.17
7	0.17	0.26
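The sign convention of photoErrorClass can be illustrated with a small sketch. It assumes an axis-aligned bounding box in the same color space used to find the neighbors; the helper names are hypothetical, and the volume function only mirrors the description of nnVol as the volume of that box, not its exact computation.

```python
def inside_bounding_box(point, neighbors):
    """True if every coordinate of `point` lies within the neighbors'
    min/max range along that axis (axis-aligned bounding box)."""
    for axis, value in enumerate(point):
        coords = [n[axis] for n in neighbors]
        if not (min(coords) <= value <= max(coords)):
            return False
    return True


def bounding_box_volume(neighbors):
    """Product of the neighbors' coordinate ranges along each axis."""
    volume = 1.0
    for axis in range(len(neighbors[0])):
        coords = [n[axis] for n in neighbors]
        volume *= max(coords) - min(coords)
    return volume


def decode_photo_error_class(photo_error_class):
    """Split photoErrorClass into (class 1..7, True if interpolated)."""
    if photo_error_class == 0 or abs(photo_error_class) > 7:
        raise ValueError(f"unexpected photoErrorClass {photo_error_class}")
    return abs(photo_error_class), photo_error_class > 0
```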

The redshift error (zErr) values in the Photoz table are valid estimates only for photoErrorClass 1. For other error classes, an additional statistical error must be taken into account; however, this error depends strongly on the location in color and magnitude space. We recommend using photoErrorClass 1 (or at most -1, 2 and 3), with additional filtering based on the zErr values.
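The recommended selection can be sketched as a filter over rows of the Photoz table, represented here as Python dictionaries keyed by column name (an assumption about how the data were retrieved). The cutoff max_zerr is a free parameter chosen by the user, since no specific threshold is recommended here, and rows carrying the -9999 placeholder for zErr are excluded. The function and constant names are hypothetical.

```python
STRICT_CLASSES = {1}
RELAXED_CLASSES = {1, -1, 2, 3}
MISSING = -9999  # placeholder used instead of NULL


def select_reliable(rows, max_zerr, relaxed=False):
    """Keep rows with a recommended photoErrorClass and zErr <= max_zerr."""
    allowed = RELAXED_CLASSES if relaxed else STRICT_CLASSES
    return [
        row
        for row in rows
        if row["photoErrorClass"] in allowed
        and row["zErr"] != MISSING  # -9999 would otherwise pass the cut
        and row["zErr"] <= max_zerr
    ]
```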

We added the table PhotozErrorMap, which provides supporting information, based on the training set, on how the errors depend on position in color space. For each cell of a grid in r magnitude and g-r and r-i colors, it gives the average actual RMS, the average error estimate, the average standard deviation of the redshifts of the k nearest neighbors, the average photometric and spectroscopic redshifts, and the number of training-set galaxies. This table can be used to pinpoint regions with poor training set coverage and with bad error estimates.
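Looking up the error-map entry for a particular galaxy amounts to locating its cell on the three-dimensional grid. In the sketch below, the grid origin, the cell sizes and the dictionary-based map (keyed by integer cell indices, with illustrative field names) are all hypothetical placeholders, not the actual binning or column names of PhotozErrorMap; the table's schema should be consulted for those.

```python
import math


def grid_cell(r_mag, g_r, r_i, origin=(14.0, -0.5, -0.5), step=(0.5, 0.1, 0.1)):
    """Integer (r, g-r, r-i) cell indices on a regular grid.

    `origin` and `step` are placeholder values, not the real binning.
    """
    return tuple(
        math.floor((v - o) / s) for v, o, s in zip((r_mag, g_r, r_i), origin, step)
    )


def lookup_error_map(error_map, r_mag, g_r, r_i, **grid):
    """Entry for the galaxy's cell, or None if the training set has no
    galaxies there (no coverage at all)."""
    return error_map.get(grid_cell(r_mag, g_r, r_i, **grid))
```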

The KF method provides some additional parameters that can be useful for quality assurance. For each galaxy in the Photoz table, nnCount is the number of nearest neighbors after outliers are removed; a value much smaller than 100 indicates poor training set coverage for that galaxy. Similarly, a large nnVol (the volume of the bounding box) warns that the reference set is only very sparsely populated around that galaxy. Although the spectroscopic redshift of the nearest neighbor (nnSpecz) and the average redshift of the nearest neighbors (nnAvgZ) are not as good estimators as the fitted redshift (z), values significantly different from z may indicate large errors. Note that in all related tables, the large negative value -9999 is used instead of NULL to indicate that the estimation was not possible or that the data are not available.
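These checks can be combined into a per-galaxy report. In the sketch below, the thresholds min_nn (for nnCount) and max_dz (for the difference between z and nnSpecz or nnAvgZ) are hypothetical values to be tuned, and each row is assumed to be a dictionary keyed by column name. The -9999 placeholder is mapped to None before any comparison so that it cannot be mistaken for a real value.

```python
MISSING = -9999  # stands in for NULL in the photo-z tables


def clean(value):
    """Map the -9999 placeholder to None."""
    return None if value == MISSING else value


def quality_warnings(row, min_nn=50, max_dz=0.1):
    """Human-readable warnings for one Photoz row (empty list if none)."""
    z, nn_count = clean(row["z"]), clean(row["nnCount"])
    nn_specz, nn_avgz = clean(row["nnSpecz"]), clean(row["nnAvgZ"])
    if z is None:
        return ["no photometric redshift estimate"]
    warnings = []
    if nn_count is not None and nn_count < min_nn:
        warnings.append(f"only {nn_count} neighbors: poor training set coverage")
    for name, other in (("nnSpecz", nn_specz), ("nnAvgZ", nn_avgz)):
        if other is not None and abs(other - z) > max_dz:
            warnings.append(f"{name} differs from z by {abs(other - z):.3f}")
    return warnings
```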

Template fitting details

After the photometric redshift of each galaxy is determined, template fitting is used to estimate the galaxy's k-correction, distance modulus, absolute magnitudes, rest frame colors, and spectral type. The templates are fit at the fixed redshift given by the KF estimator. Where applicable, a cosmology with Omega=0.2739, Lambda=0.726 and h=0.705 was assumed, to match what is used elsewhere in SkyServer/CasJobs; luminosity distances are expressed in Mpc. The chisq and rnorm values indicate the quality of the minimum chi-square fit, and bestFitTemplateID identifies the spectral template giving the best fit; bestFitTemplateID=0 indicates a failed fit. The empirical spectral templates described in Dobos et al. (2012) were used, and the following table lists the bestFitTemplateID values with the corresponding names from the library (http://www.vo.elte.hu/compositeatlas).

bestFitTemplateID values with the corresponding names from the spectral library
bestFitTemplateID	name	bestFitTemplateID	name	bestFitTemplateID	name
1	p_RG	14	GG	27	s_G
2	h_RG	15	p_BG	28	G
3	hh_RG	16	h_BG	29	RED0_0
4	t_RG	17	hh_BG	30	RED1_0
5	l_RG	18	t_BG	31	RED2_0
6	s_RG	19	l_BG	32	RED3_0
7	RG	20	s_BG	33	RED4_0
8	p_GG	21	BG	34	SF0_0
9	h_GG	22	p_G	35	SF1_0
10	hh_GG	23	h_G	36	SF2_0
11	t_GG	24	hh_G	37	SF3_0
12	l_GG	25	t_G	38	SF4_0
13	s_GG	26	l_G
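The template fit outputs can be decoded with a short sketch. The bestFitTemplateID table above follows a regular pattern (seven variants each of the RG, GG, BG and G templates, then RED0_0 to RED4_0 and SF0_0 to SF4_0), so the mapping can be generated rather than typed in. The second part computes the luminosity distance and distance modulus for the cosmology quoted above with standard Friedmann-Lemaître-Robertson-Walker formulas, taking Omega as the matter density parameter, neglecting radiation, and integrating numerically with Simpson's rule. This is an illustration, not necessarily the code used to produce the Photoz columns, and all function names are hypothetical.

```python
import math

# bestFitTemplateID -> template name
PREFIXES = ["p_", "h_", "hh_", "t_", "l_", "s_", ""]
CLASSES = ["RG", "GG", "BG", "G"]


def build_template_names():
    """Map bestFitTemplateID 1..38 to the names listed in the table above."""
    names = [p + c for c in CLASSES for p in PREFIXES]  # IDs 1-28
    names += [f"RED{i}_0" for i in range(5)]  # IDs 29-33
    names += [f"SF{i}_0" for i in range(5)]  # IDs 34-38
    return dict(enumerate(names, start=1))


TEMPLATE_NAMES = build_template_names()


def template_name(best_fit_template_id):
    """Template name, or None for bestFitTemplateID = 0 (failed fit)."""
    if best_fit_template_id == 0:
        return None
    return TEMPLATE_NAMES[best_fit_template_id]


# Distances for the stated cosmology
C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance_mpc(z, omega_m=0.2739, omega_l=0.726, h=0.705, steps=1000):
    """Luminosity distance in Mpc for a Lambda-CDM model (Simpson's rule)."""
    if z == 0:
        return 0.0
    omega_k = 1.0 - omega_m - omega_l  # tiny but nonzero for these values
    d_h = C_KM_S / (100.0 * h)  # Hubble distance in Mpc

    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1 + zz) ** 3 + omega_k * (1 + zz) ** 2 + omega_l)

    n = steps if steps % 2 == 0 else steps + 1  # Simpson needs an even count
    dz = z / n
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * inv_e(i * dz)
    d_c = d_h * total * dz / 3.0  # line-of-sight comoving distance

    # Transverse comoving distance, allowing for slight spatial curvature.
    if omega_k > 0:
        sk = math.sqrt(omega_k)
        d_m = d_h / sk * math.sinh(sk * d_c / d_h)
    elif omega_k < 0:
        sk = math.sqrt(-omega_k)
        d_m = d_h / sk * math.sin(sk * d_c / d_h)
    else:
        d_m = d_c
    return (1 + z) * d_m


def distance_modulus(z, **cosmo):
    """mu = 5 log10(D_L / 10 pc) = 5 log10(D_L / Mpc) + 25."""
    return 5.0 * math.log10(luminosity_distance_mpc(z, **cosmo)) + 25.0
```

Given an apparent magnitude m and a k-correction K in the same band, the absolute magnitude follows as M = m - distance_modulus(z) - K.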

Examples

Two examples of how to query photometric redshifts in DR12 are shown in SkyServer under Sample Queries: Photometric Redshifts.

External survey data

The following table references the spectroscopic redshift surveys that we used to extend the training set.

Spectroscopic redshift survey references
Survey name	Reference	Website
2dF	Colless et al. (2001), Colless et al. (2003)	http://magnum.anu.edu.au/~TDFgg/
6dF	Jones et al. (2004), Jones et al. (2009)	http://www.6dfgs.net/
DEEP2	Davis et al. (2003), Newman et al. (2013)	http://deep.ps.uci.edu/
GAMA	Driver et al. (2011), Baldry et al. (2014)	http://www.gama-survey.org/
PRIMUS	Coil et al. (2011), Cool et al. (2013)	http://primus.ucsd.edu/
VIPERS	Garilli et al. (2014), Guzzo et al. (2014)	http://vipers.inaf.it/
VVDS	Le Fèvre et al. (2004), Garilli et al. (2008)	http://cesam.oamp.fr/vvdsproject/vvds.htm
WiggleZ	Drinkwater et al. (2010), Parkinson et al. (2012)	http://wigglez.swin.edu.au/site/
zCOSMOS	Lilly et al. (2007), Lilly et al. (2009)	https://cosmos.astro.caltech.edu/page/photoz

Note on redshifts with random forests (photozRF)

Prior SDSS data releases included photometric redshift estimates derived using another method based on random forests. Those estimates were stored in a CAS table called photozRF and were accessible through SkyServer and CasJobs.

Random forest-based photometric redshift estimates are no longer provided in Data Release 12 (DR12) because they perform significantly worse at estimating the photometric redshifts of the faint red galaxies targeted by BOSS. We recommend using the values in the Photoz table in all cases.

If you need data from the photozRF table, you can still obtain values from prior SDSS data releases. The most recent SDSS release to contain photozRF was Data Release 10 (DR10). The DR10 estimates are described in the DR10 photometric redshifts page and in Carliles et al. (2010).
