Large-Scale Structure Galaxy Catalog

How do I create a large-scale structure (LSS) galaxy catalog?

This page describes the necessary files and processes used through DR11 to produce BOSS LSS catalogs from lower-level SDSS-III files. If you can work through this tutorial successfully, you can then review the updated processes described in Reid et al. (2016) for BOSS DR12, Ata et al. (2017) for eBOSS DR14 quasars, and Bautista et al. (2018) for eBOSS LRGs in order to reproduce those LSS catalogs.

If you simply would like to access the LSS catalogs, they are described here.

The first section describes the necessary files and where to get them, and the subsequent sections each describe a step in the procedure.

Necessary Files

To follow this procedure, you will need the following files:

Target list: bosstile-final-collated-boss2-boss32.fits (data model)
Redshifts: specObj-dr10.fits (data model)
Acceptance mask: boss_geometry_2012_11_19.fits or boss_geometry_2012_11_19.ply (data model)
Rejection mask (bad fields): badfield_mask_unphot-ugriz_pix.ply
Rejection mask (bright stars): bright_star_mask_pix.ply
Rejection mask (centerposts): centerpost_mask.ply
Rejection mask (collision priority): collision_priority_mask.ply

Create target photometric catalog

The first step toward producing a uniform large-scale structure catalog is to generate a list of objects targeted with the same target selection algorithm. This is complicated by the fact that the final photometry for an object (as given in the photoObj table) may not match the photometry that was used when the targeting algorithm was run for that object. Early chunks used an earlier version of the photometric data, and one must use the correct target photometry when generating a catalog. In addition, the target selection algorithms changed slightly after the early chunks (or, in the case of LOWZ, after bugs in target selection were fixed). To address this, we have created a merged target list that uses the correct target photometry for each object, along with a version of the BOSS_TARGET1 target flag that reflects the final target selection as applied to the appropriate photometry for each object. The target list file linked above includes a bitfield, BOSS_TARGET1_009, that can be used just like BOSS_TARGET1 to select a set of galaxy targets:

LOWZ (bit 0): (BOSS_TARGET1_009 > 0) && ((BOSS_TARGET1_009 & 1) > 0)
CMASS (bit 1): (BOSS_TARGET1_009 > 0) && ((BOSS_TARGET1_009 & 2) > 0)
Here & denotes bitwise AND and && denotes logical AND.

Selecting objects with these flags removes chunks 1-6 for LOWZ (where the target selection bug appeared) and, for CMASS, uses the correct version of the photometry to select targets while applying the final version of target selection (i.e., with the final brighter magnitude cut). If you want to see the values of the target photometry (e.g., magnitudes) that were used to target each object, the run/rerun/camcol/field/id values in this file refer to the target photometry in photoObj. Note that some targets fall into both the LOWZ and CMASS selection boxes. This is not a problem if you are only considering objects from one of the samples. If you are performing a combined analysis using both sets of targets, you should apply a redshift cut to separate them; the minimum in their number densities falls around z ~ 0.43.
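The flag tests and the redshift split described above can be sketched with NumPy as follows. This sketch assumes that the BOSS_TARGET1_009 column has already been read into an integer array. The function names are hypothetical, and the 0.43 split value is taken from the text.

```python
import numpy as np

# Bit assignments in BOSS_TARGET1_009, per the expressions above:
# bit 0 (value 1) marks LOWZ targets, bit 1 (value 2) marks CMASS targets.
SAMPLE_BITS = {"LOWZ": 1 << 0, "CMASS": 1 << 1}


def select_sample(boss_target1_009, sample):
    """Boolean mask of objects whose BOSS_TARGET1_009 flag selects `sample`."""
    flags = np.asarray(boss_target1_009, dtype=np.int64)
    bit = SAMPLE_BITS[sample]
    # Bitwise AND (&) tests the single bit; a logical AND would only test truthiness.
    return (flags > 0) & ((flags & bit) > 0)


def split_by_redshift(z, z_split=0.43):
    """Split a combined sample at z_split: returns (low_z_mask, high_z_mask)."""
    z = np.asarray(z, dtype=float)
    return z < z_split, z >= z_split
```

You can choose which side of the split receives the boundary value. In this sketch, z = 0.43 goes to the high-redshift side.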

Classify (match) objects with redshifts

Objects matched to the target photometry should be classified as one of four types:

good redshift from SDSS-I/II ("legacy"), ZWARNING == 0, SPECPRIMARY == 1, classified as GALAXY.
good redshift from BOSS observations (classified as a galaxy or a star, with SPECPRIMARY == 1), using Z_NOQSO.
redshift failures (ZWARNING_NOQSO > 0).

Objects should be matched in the above sequence; if an object matches more than once, the last match overrides the others. These categories contribute differently when computing the completeness in each sector on the sky.

Redshift matching details

Match objects from your target catalog with redshifts from the specObj file. Use the PROGRAMNAME field in that file to select "boss" and "legacy" redshifts:

PROGRAMNAME == "boss"
or
PROGRAMNAME == "legacy"

Match targets to their redshifts using RUN, RERUN, CAMCOL, FIELD, ID (in the target list) and TARGETOBJID (in specObj). See the ObjID glossary entry for a description of the relationship between these fields and how to convert between them. Warning: TARGETOBJID in specObj is stored as a 22-character string (because of problems with unsigned 64-bit integers in FITS binary tables), so you will have to strip it of whitespace and convert it to an unsigned 64-bit integer before performing the match. From this matched target/redshift list, keep only the primary ("best") spectrum of each object, ignoring repeat observations of the same object, by applying the following cut to all of the selections below:

SPECPRIMARY == 1
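The sketch below shows one way to perform the match. It does not build a 64-bit objID from the target list, which would also require the skyVersion and firstField values. Instead, it decodes each TARGETOBJID string into its run, rerun, camcol, field, and id components and matches on that tuple. The bit layout it uses is our reading of the SDSS ObjID glossary entry, so treat it as an assumption to verify. Python integers are arbitrary precision, so the signed/unsigned 64-bit problem does not arise. The function names are hypothetical.

```python
import numpy as np


def decode_objid(objid_str):
    """Unpack a TARGETOBJID string into (run, rerun, camcol, field, id).

    Assumed bit layout (check against the ObjID glossary entry):
    skyVersion 59-62, rerun 48-58, run 32-47, camcol 29-31,
    firstField 28, field 16-27, id 0-15.
    """
    # Strip the whitespace padding; Python ints cannot overflow here.
    objid = int(objid_str.strip())
    run = (objid >> 32) & 0xFFFF
    rerun = (objid >> 48) & 0x7FF
    camcol = (objid >> 29) & 0x7
    field = (objid >> 16) & 0xFFF
    obj = objid & 0xFFFF
    return run, rerun, camcol, field, obj


def match_targets(targets, target_objids):
    """For each (run, rerun, camcol, field, id) target, return the index of its
    spectrum in `target_objids`, or -1 if there is none.

    Apply the SPECPRIMARY == 1 cut to specObj first so that each object has at
    most one spectrum; otherwise only the first occurrence is kept here.
    """
    lookup = {}
    for i, s in enumerate(target_objids):
        s = s.strip()
        if s:  # skip blank entries
            lookup.setdefault(decode_objid(s), i)
    return np.array([lookup.get(tuple(int(v) for v in t), -1) for t in targets])
```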

Select "good" redshifts from the "best" observations with:

(ZWARNING_NOQSO == 0) and (PROGRAMNAME == "boss")

(ZWARNING == 0) and (PROGRAMNAME == "legacy")

Select "failed" redshifts from the "best" observations with:

(ZWARNING_NOQSO > 0) and (PROGRAMNAME == "boss")

(ZWARNING > 0) and (PROGRAMNAME == "legacy")
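The "good" and "failed" selections above can be expressed as boolean masks over the matched specObj columns. This is a minimal sketch, assuming that the columns have already been loaded into array-like objects. It strips the trailing blanks that FITS string columns often carry. It leaves out the object-class requirements (for example, GALAXY for legacy objects). The function name is hypothetical.

```python
import numpy as np


def classify_spectra(programname, specprimary, zwarning, zwarning_noqso):
    """Return boolean masks (good, failed) following the selections above."""
    prog = np.char.strip(np.asarray(programname).astype(str))
    primary = np.asarray(specprimary) == 1
    zw = np.asarray(zwarning)
    zwn = np.asarray(zwarning_noqso)
    boss = prog == "boss"
    legacy = prog == "legacy"
    # BOSS spectra use the _NOQSO redshift warnings; legacy spectra use ZWARNING.
    good = primary & ((boss & (zwn == 0)) | (legacy & (zw == 0)))
    failed = primary & ((boss & (zwn > 0)) | (legacy & (zw > 0)))
    return good, failed
```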

To use the redshift failure correction of Anderson et al. (section 3.5), you need to separate good star redshifts from good galaxy redshifts. Select objects with spectra that are best fit as "star" from the "best" redshifts with:

Weights

Fiber correction weights

Because of the finite size of the fibers, two objects closer than 62 arcseconds cannot both have spectra taken on the same plate. To correct for this, one must apply a set of weights to those targets that have "collided" with other targets within that radius. There are several ways to correct for fiber collisions, including using the nearest-neighbor redshift (used in Anderson et al. 2012), projected-correlation-function weights (used in White et al. 2011), and using the sectors covered by multiple plates to directly compute the correlation function of collided fibers (detailed in Guo et al. 2012). See the respective papers for details on these methods; Guo et al. (2012) also gives a comprehensive overview of the pros and cons of each.
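All of these corrections first need to find which targets lie within 62 arcseconds of one another. The sketch below does only that step. It converts RA/Dec to unit vectors and compares the cosine of each pair's separation with the cosine of the collision radius. The brute-force pair loop is O(N^2) and is shown only for clarity. For a full catalog, a spatial tree such as scipy.spatial.cKDTree on the unit vectors would be the usual choice. The function names are hypothetical.

```python
import numpy as np

FIBER_COLLISION_ARCSEC = 62.0


def radec_to_xyz(ra_deg, dec_deg):
    """Unit vectors on the sphere for RA/Dec given in degrees."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.column_stack([np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)])


def collided_pairs(ra_deg, dec_deg, radius_arcsec=FIBER_COLLISION_ARCSEC):
    """Return index pairs (i, j), i < j, separated by less than radius_arcsec."""
    xyz = radec_to_xyz(ra_deg, dec_deg)
    cos_sep = np.clip(xyz @ xyz.T, -1.0, 1.0)
    cos_lim = np.cos(np.radians(radius_arcsec / 3600.0))
    # Upper triangle only (k=1) so each pair is reported once and self-pairs are skipped.
    i, j = np.where(np.triu(cos_sep > cos_lim, k=1))
    return list(zip(i.tolist(), j.tolist()))
```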

Redshift failures weights

As discussed in detail in Section 2.3 of Ross et al. (2012), a small fraction of targeted BOSS galaxies do not obtain a redshift (1.8% for CMASS, 0.4% for LOWZ), and these redshift failures are not uniformly distributed. To remove the spurious fluctuations they would otherwise introduce, the weight of each redshift-failure galaxy is applied to its nearest neighbor in the analysis of the CMASS and LOWZ samples.
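The nearest-neighbor transfer described above can be sketched as follows. Each "donor" (for example, a redshift failure) hands its current weight to the angularly nearest eligible "receiver" (for example, a galaxy with a good redshift) and is left with zero weight. The same function could be applied with collided targets as donors for the nearest-neighbor fiber-collision correction mentioned above. Two choices are assumptions here and are not specified on this page: how receivers are restricted, for example to the same sample or sector, and the order in which the corrections are applied. The function name is hypothetical, and the brute-force search is only suitable for small examples.

```python
import numpy as np


def transfer_to_nearest(ra_deg, dec_deg, weight, donor, can_receive):
    """Move each donor's weight onto its nearest eligible receiver.

    donor and can_receive are boolean masks and should not overlap.
    Returns a new weight array in which donors end with weight 0.
    """
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    xyz = np.column_stack([np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)])
    w = np.asarray(weight, dtype=float).copy()
    receivers = np.flatnonzero(can_receive)
    for d in np.flatnonzero(donor):
        # The largest dot product corresponds to the smallest angular separation.
        nearest = receivers[np.argmax(xyz[receivers] @ xyz[d])]
        w[nearest] += w[d]
        w[d] = 0.0
    return w
```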

Angular systematic weights

In DR10, we find that both seeing and stellar density correlate with galaxy density, as suggested in Ho et al. 2012 and Ross et al. 2012. We correct for these effects by using a seeing map and a stellar density map to calculate a weight for each galaxy; for more details, see Anderson et al. (2013, in prep.). For DR10, we apply the derived weights to each galaxy according to the stellar density and seeing at its position:

weight_star = Astar_interpol + (Bstar_interpol*starden)

weight_seeing = 2.0 / Aseeing / (1.0 - erf((seeingval - Bseeing) / Cseeing))

weight_systot = weight_star*weight_seeing
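The three formulas above translate directly into a function. The coefficient values (Astar_interpol, Bstar_interpol, Aseeing, Bseeing, Cseeing) are not given on this page, so in this sketch they are plain inputs that you would supply from the DR10 analysis. The function name is hypothetical.

```python
import math


def systematic_weight(starden, seeingval, Astar_interpol, Bstar_interpol,
                      Aseeing, Bseeing, Cseeing):
    """weight_systot = weight_star * weight_seeing, following the formulas above."""
    # Linear dependence on the local stellar density.
    weight_star = Astar_interpol + Bstar_interpol * starden
    # Error-function dependence on the local seeing.
    weight_seeing = 2.0 / Aseeing / (1.0 - math.erf((seeingval - Bseeing) / Cseeing))
    return weight_star * weight_seeing
```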

Minimum variance weights

To minimize the error in the measured clustering signal, weights based on the sample's redshift distribution, such as FKP or J3 weights, should be applied. In the DR10 LSS catalogs, FKP weights were applied.
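The FKP weight has the standard form w = 1 / (1 + n(z) P0). Here n(z) is the expected number density of the sample at each galaxy's redshift, and P0 is the power-spectrum amplitude at the scales of interest. The sketch assumes that n(z) has already been computed from the sample's redshift distribution and a fiducial cosmology; that step is not shown. The default P0 = 20000 h^-3 Mpc^3 is a value commonly used for BOSS galaxies, but treat it as an assumption. The function name is hypothetical.

```python
import numpy as np


def fkp_weight(nbar, P0=20000.0):
    """FKP weight 1 / (1 + nbar * P0).

    nbar: expected number density at each object's redshift, in h^3 Mpc^-3.
    P0: assumed power-spectrum amplitude, in h^-3 Mpc^3.
    """
    return 1.0 / (1.0 + np.asarray(nbar, dtype=float) * P0)
```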

Weights summary

The weights we have described above should be combined using Equation 18 of Anderson et al. 2012 to generate a final weight for each galaxy. Note that use of the J3 weights (as in Reid et al. 2012) is slightly more complicated, as the J3 weight is applied to pairs of galaxies.
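As an illustration only, the sketch below combines the per-galaxy weights in the form w_FKP * w_systot * (w_cp + w_rf - 1). Here w_cp is the close-pair (fiber-collision) weight and w_rf is the redshift-failure weight, and both equal 1 for a galaxy that received no correction. We believe this form matches the combination used in BOSS analyses. Still, it is an assumption, so check it against Equation 18 of Anderson et al. 2012 before relying on it. The "- 1" keeps an uncorrected galaxy at unit weight while adding the extra counts from both corrections. J3 pair weights are not covered. The function name is hypothetical.

```python
def total_weight(w_fkp, w_systot, w_cp, w_rf):
    """Assumed combination w_fkp * w_systot * (w_cp + w_rf - 1).

    Verify against Equation 18 of Anderson et al. 2012 before use.
    """
    return w_fkp * w_systot * (w_cp + w_rf - 1.0)
```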

Angular Selection function

The masks describe the regions of sky observable by BOSS. They are spherical polygon files in the mangle format and comprise both an acceptance mask (regions of the sky that were included in the survey) and a rejection mask (regions of the sky that are explicitly excluded). The rejection mask removes regions around bright stars, the center posts of the plates, fields with bad imaging data, and regions where other targets had priority for being assigned a fiber. When computing a correlation function statistic, we use points distributed uniformly at random within the mask to trace out the geometry. The program ransack, distributed with mangle, generates uniformly distributed randoms in an inclusion mask. Note that ransack does not work with FITS files, but we have provided a mangle .ply formatted version of the file for this purpose.
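To show what "uniformly distributed on the sky" means in practice, the sketch below draws points uniformly in RA and in sin(Dec) within an RA/Dec box. This gives equal numbers of points per unit solid angle. It is not a replacement for ransack, which samples within the mangle polygons directly. With this approach, you would draw within a box bounding the footprint and then keep only the points that pass the acceptance and rejection masks. The function name and box limits are hypothetical.

```python
import numpy as np


def uniform_randoms(n, ra_min, ra_max, dec_min, dec_max, seed=None):
    """Draw n points uniformly on the sphere inside an RA/Dec box (degrees)."""
    rng = np.random.default_rng(seed)
    ra = rng.uniform(ra_min, ra_max, n)
    # Uniform in sin(Dec), not Dec, so that area near the poles is not oversampled.
    sin_dec = rng.uniform(np.sin(np.radians(dec_min)), np.sin(np.radians(dec_max)), n)
    dec = np.degrees(np.arcsin(sin_dec))
    return ra, dec
```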

Acceptance masks

There is one acceptance mask file given in the file list above. Accept all objects (galaxy targets and randoms) that are contained within the polygons in the acceptance mask.

Rejection masks

There are 4 rejection mask files given in the file list above. Reject all objects (galaxy targets and randoms) that are in the polygons in the bright-star mask, centerposts mask, collision priority mask, and bad field mask.
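The accept/reject logic can be sketched directly from mangle's polygon representation. Each polygon is an intersection of "caps", and each cap is given as (x, y, z, cm) with cm = 1 - cos(theta). This sketch assumes mangle's convention that a point r lies inside a cap when 1 - r·c < cm for cm >= 0, and outside the cap of size |cm| when cm < 0. It also assumes that the polygons have already been parsed from the .ply files. Parsing and polygon weights are not handled; in practice you would rely on mangle's own tools. The function names are hypothetical.

```python
import numpy as np


def radec_to_xyz(ra_deg, dec_deg):
    """Unit vectors on the sphere for RA/Dec given in degrees."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.column_stack([np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)])


def in_polygon(xyz, caps):
    """True where a point lies inside every cap of the polygon."""
    inside = np.ones(len(xyz), dtype=bool)
    for x, y, z, cm in caps:
        d = 1.0 - xyz @ np.array([x, y, z])
        # cm >= 0: inside the cap; cm < 0: inside its complement.
        inside &= (d < cm) if cm >= 0 else (d > -cm)
    return inside


def apply_masks(ra_deg, dec_deg, acceptance_polys, rejection_polys):
    """Keep points inside any acceptance polygon and outside every rejection polygon."""
    xyz = radec_to_xyz(ra_deg, dec_deg)
    keep = np.zeros(len(xyz), dtype=bool)
    for caps in acceptance_polys:
        keep |= in_polygon(xyz, caps)
    for caps in rejection_polys:
        keep &= ~in_polygon(xyz, caps)
    return keep
```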

Determine BOSS completeness by sector

This completeness specifies the probability in a given sector of the survey of obtaining a redshift for a target, and is an input for creating the angular mask of your galaxy sample. Anderson et al. 2012 (section 3.3) provide details on how to account for redshift failures, fiber collision corrections, and legacy objects when computing the sector completeness. The completeness in sectors containing no BOSS targets is ambiguous (these sectors are typically very small). In Anderson et al., we chose to remove such sectors if they were not surrounded in every direction by nearby sectors within 2 degrees that had spectroscopic observations, or if they were smaller than 0.1273 square degrees.
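Counting objects per sector can be sketched as below, assuming a simplified completeness C = N_observed / (N_targets - N_known). Here N_known counts targets that already have legacy redshifts. Which categories count as "observed" (for example, whether close pairs and redshift failures are included) is defined in section 3.3 of Anderson et al. 2012. You would pass in the sector ID of each object you count. Sectors with no remaining BOSS targets return NaN, which reflects the ambiguity noted above. The function name is hypothetical.

```python
from collections import Counter


def sector_completeness(target_sectors, observed_sectors, known_sectors=()):
    """Assumed completeness per sector: N_observed / (N_targets - N_known).

    Each argument is a sequence holding one sector ID per object.
    """
    n_targ = Counter(target_sectors)
    n_obs = Counter(observed_sectors)
    n_known = Counter(known_sectors)
    out = {}
    for sector, nt in n_targ.items():
        denom = nt - n_known[sector]
        # No BOSS targets left to observe: completeness is undefined.
        out[sector] = n_obs[sector] / denom if denom > 0 else float("nan")
    return out
```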

Downsample the legacy sample and close pairs to BOSS completeness

The DR10 analysis subsamples the "legacy" galaxy sample in each sector based on its "BOSS" completeness so that the full galaxy sample is described by a single mask. This uses the sector completeness defined above. This random downsampling of "legacy" galaxies ensures that they have the same selection function as the BOSS sample. In addition, for fiber-collided BOSS-legacy and legacy-legacy pairs, one redshift of the pair is removed according to the fraction of unresolved fiber collisions in each sector. See section 3.3 of Anderson et al. 2012 for more detail.
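The random downsampling step amounts to keeping each legacy galaxy with probability equal to its sector's BOSS completeness. This sketch assumes that the completeness is available as a mapping from sector ID to value. It does not cover the removal of redshifts from collided pairs. The function name is hypothetical.

```python
import numpy as np


def downsample_legacy(legacy_sectors, completeness, seed=None):
    """Keep each legacy galaxy with probability completeness[its sector]."""
    rng = np.random.default_rng(seed)
    p = np.array([completeness[s] for s in legacy_sectors], dtype=float)
    # A uniform draw in [0, 1) falls below p with probability p.
    return rng.random(len(p)) < p
```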

Remove incomplete sectors

One should also reject from the mask (along with their associated galaxies and randoms) any sectors whose completeness is below some threshold (taken to be 70% in Anderson et al., section 3.5), to remove highly incomplete sectors that have been only partially observed. Anderson et al. also applied a redshift-completeness cut, removing sectors with more than 10 galaxies but fewer than 80% good redshifts (see their equation 13). This removes sectors with a significant fraction of bad data.
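The two sector cuts can be sketched as a filter over per-sector statistics, using the 70%, 80%, and 10-galaxy thresholds from the text. This sketch assumes that redshift completeness is simply n_good_z / n_gal per sector. Equation 13 of Anderson et al. defines it precisely, so this simplification may not match. The function name is hypothetical.

```python
def good_sectors(completeness, n_gal, n_good_z, c_min=0.7, zc_min=0.8, n_min=10):
    """Return the set of sector IDs that pass both cuts.

    completeness: sector -> completeness; n_gal, n_good_z: sector -> counts.
    """
    keep = set()
    for sector, c in completeness.items():
        if not c >= c_min:  # also rejects NaN completeness
            continue
        n = n_gal.get(sector, 0)
        # Redshift-completeness cut only applies to sectors with more than n_min galaxies.
        if n > n_min and n_good_z.get(sector, 0) / n < zc_min:
            continue
        keep.add(sector)
    return keep
```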

Radial selection function

The radial selection function for both LOWZ and CMASS galaxies differs between the northern and southern hemispheres, so distinct radial distributions for the two hemispheres must be used when assigning redshifts to the random galaxy catalog. Section 6 of Ross et al. (2012) compares methods of sampling the underlying redshift distribution to assign redshifts to the random galaxies, including randomly selecting from a "shuffled" list of galaxy redshifts and various choices for smooth spline fits to the observed distribution.
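The "shuffled" method mentioned above can be sketched as follows. Each random point takes a redshift drawn at random from the galaxies in the same hemisphere, so the northern and southern radial distributions stay distinct. The optional galaxy weights, which make draws proportional to weight, are an illustrative choice rather than something specified here. The spline-fit alternatives are not shown. The function name is hypothetical.

```python
import numpy as np


def assign_random_redshifts(gal_z, gal_is_north, ran_is_north, gal_weight=None, seed=None):
    """Give each random a redshift drawn from galaxies in its own hemisphere."""
    rng = np.random.default_rng(seed)
    gal_z = np.asarray(gal_z, dtype=float)
    gal_n = np.asarray(gal_is_north, dtype=bool)
    ran_n = np.asarray(ran_is_north, dtype=bool)
    w = np.ones(len(gal_z)) if gal_weight is None else np.asarray(gal_weight, dtype=float)
    z = np.empty(len(ran_n))
    for north in (True, False):
        sel = ran_n == north
        if not sel.any():
            continue
        pool = gal_n == north
        # Draw with replacement, with probability proportional to galaxy weight.
        p = w[pool] / w[pool].sum()
        z[sel] = rng.choice(gal_z[pool], size=int(sel.sum()), p=p)
    return z
```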

Clustering statistics

The instructions above provide all of the necessary information to generate data and random catalogs. You are now ready to compute your favorite clustering statistic!