# Working with MaNGA Data

Below, we collect some usage guidelines for the MaNGA data to help you navigate its complexity. Please make sure you understand and follow these guidelines because they can be critical to any science application of the data. Please also have a look at our MaNGA tutorials for further guidance.

## DRP: Data Quality Flags

The 3D phase of the DRP has an overall reduction quality bit, *DRP3QUAL*, that indicates any potential quality control issues with a given output file for each observation. Most of these issues, like shallow observations, are simply warnings that the data might not be of the usual quality. Flux-calibration failures, however, trigger the *CRITICAL* quality bit, which indicates that there may be severe problems with the data. This is determined by whether or not the astrometric calibration is successful without a substantial rescaling of the flux to match the imaging data.

Critical failures occur in roughly 1% of DR15 observations (which are identical to DR16). These are a mixture of true critical failures (where, e.g., an IFU is badly out of focus, such as 7495-6103) and less critical issues where bright objects at the edges of the field or transients cause problems with the astrometric solution. The latter include some instances where the on-sky surface brightness distribution seems to be genuinely different from that predicted by the preimaging (such as 8332-12702). In some cases the extra flux comes from astronomical transients such as supernovae in the MaNGA target galaxies, and in other cases from terrestrial transients (e.g., satellite trails). Such terrestrial transients are generally identified by visual inspection, and the relevant exposures are manually removed from the final set for a given galaxy (see bogey.par). Astronomical transients are not removed from the MaNGA data cubes.

## DRP: Cube Quality Array

Each MaNGA data cube has an associated 3D mask extension (`MASK`) that uses the *DRP3PIXMASK* maskbits to describe the quality of each spaxel in the data cube and whether it should be used in any analysis. This includes effects such as the IFU footprint, missing data, foreground stars (where known), etc. Any use of the MaNGA data cubes should consider these maskbits.
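As a minimal sketch of how such a maskbit test might look, the snippet below checks a single bit in an integer mask value. The helper name `bit_is_set` is hypothetical, and the bit position used (10, assumed here to correspond to a `DONOTUSE` flag in *DRP3PIXMASK*) is an assumption that should be verified against the maskbit definitions for your data release.

```python
def bit_is_set(mask_value, bit):
    """Return True if the given bit is set in an integer mask value."""
    return (int(mask_value) & (1 << bit)) != 0

# Assumed bit position for a DONOTUSE flag; verify against the
# DRP3PIXMASK definition for your data release.
DONOTUSE_BIT = 10

# Hypothetical mask value for one spaxel at one wavelength:
# bits 0 and 10 are set.
example_mask = (1 << 0) | (1 << 10)

# The spaxel is treated as usable only if the DONOTUSE bit is not set.
usable = not bit_is_set(example_mask, DONOTUSE_BIT)
```

The same bitwise test applies elementwise to a whole `MASK` array when it is written with array operations instead of scalar integers.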

## DRP: LSF Estimates

The estimates of the spectral line-spread function (LSF) are provided in four different ways for each data cube. The estimates are either based on measurements that do not include (`PREDISP`, `PRESPECRES`) or do include (`DISP`, `SPECRES`) the integration of the LSF over the detector pixel, and they are provided either for each spaxel in the cube (`*DISP`) or as a median resolution for the cube as a whole (`*SPECRES`).

The measurements of the LSF are performed for each fiber spectrum in the row-stacked spectra (RSS). The spectral resolution is `R = \lambda / (2.355\ \sigma_\lambda)`, where `\sigma_\lambda` are the values in the `*DISP` extension and `\lambda` is in the `WAVE` extension.
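The relation above can be sketched as a small helper. The function name `spectral_resolution` and the input values are hypothetical; the only assumption is that the wavelength and `\sigma_\lambda` are given in the same units.

```python
def spectral_resolution(wave, sigma_lambda):
    """Spectral resolution R = lambda / (2.355 * sigma_lambda).

    Both inputs must be in the same wavelength units. The factor 2.355
    converts a Gaussian sigma to a full width at half maximum.
    """
    return wave / (2.355 * sigma_lambda)

# Hypothetical values: wavelength and LSF sigma, both in angstroms.
example_R = spectral_resolution(4710.0, 1.0)
```

Because the function only multiplies and divides, it also works elementwise if `wave` and `sigma_lambda` are NumPy arrays of matching shape.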

The `*SPECRES` extension vector is the per-wavelength median of the `R` measurements over all fiber spectra for a given `PLATEIFU`. For the data cubes, the measurement of `\sigma_\lambda` for each spaxel is computed via a weighted sum of `\sigma_\lambda^2` following the same interpolation algorithm as for the flux rectification (see Law et al. 2016, AJ, 152, 83).

In terms of the pixel integration, whether or not you should use the `PRE*` extensions depends on how you analyze the spectra. For example, if you fit a Gaussian function directly to the spectra in the LOGCUBE file, the effect of the LSF on your line-width measurements already includes the pixel integration. This means you should *not* use the LSF estimates in the `PRE*` extensions, but instead use the `DISP` or `SPECRES` extensions. Alternatively, if you set up a template spectrum that you then convolve with a Gaussian kernel to fit the spectrum, you likely *do* want to use the LSF estimates in the `PRE*` extensions to remove the instrumental dispersion because your modeling of Doppler broadening does not include the pixel integration; it is instead included in the matching of the model spectrum to the pixel-integrated data.

## DRP: Datacube Covariance

Since the individual fiber spectra are combined into a rectified data cube, there is significant covariance between adjacent spaxels. When combining spectra from multiple spaxels, a rigorous calculation of the inverse variance in the combined spectrum must account for this covariance. Roughly, the calibration of the noise vector is:

`n_{covar}/n_{no\ covar} = 1 + 1.62 \log_{10}(N_b)`,

for `N_b \leq 100` and

`n_{covar} / n_{no\ covar} = 4.2`,

for `N_b > 100`,

where `n_{no\ covar}` is determined via a nominal error calculation using the inverse variance provided in the datacube and `N_b` is the number of binned spaxels. The correction factor is constant above `N_b = 100` because additional spaxels at that point are uncorrelated with the original spaxels. It is important to note that this calibration is dependent on the spaxels being adjacent to one another.
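The approximate calibration above can be written as a short function. This is a sketch of the formula as stated here, not pipeline code; the function names are hypothetical.

```python
import math

def covariance_noise_factor(n_binned):
    """Approximate ratio n_covar / n_no_covar for adjacent binned spaxels."""
    if n_binned < 1:
        raise ValueError("n_binned must be at least 1")
    if n_binned > 100:
        # Constant correction above 100 spaxels.
        return 4.2
    return 1.0 + 1.62 * math.log10(n_binned)

def corrected_noise(nominal_noise, n_binned):
    """Scale a nominal (no-covariance) noise estimate by the factor above."""
    return nominal_noise * covariance_noise_factor(n_binned)
```

For a single spaxel the factor is 1, so the nominal error is returned unchanged. Note that the two branches of the formula as given do not meet exactly at `N_b = 100` (4.24 versus 4.2), which reflects the approximate nature of the calibration.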

In DR16, this calculation can be done more rigorously using the correlation matrices included with each DRP data cube. Since the MaNGA reconstructed PSF is nearly constant with wavelength, the spatial covariance matrix also varies only slowly with wavelength. The DRP therefore describes the covariance in the data cubes by providing sparse correlation matrices (i.e., covariance matrices normalized so that all non-zero diagonal elements are unity) at `g`, `r`, `i`, and `z` bands that may be interpolated to any intermediate wavelength.

The correlation matrix is nominally extremely large, having about 5184 x 5184 elements for a 127-fiber IFU data cube. This is because there is an element describing the correlation of each of the spaxels in the 72x72 image slice with every other spaxel in that slice. The vast majority of elements are zero, however, and the matrix is symmetric. The DRP therefore saves the information in sparse table format for substantial space savings. The correlation matrix table is a binary table containing one row for each non-zero element; see, e.g., the `GCORREL` extension in the `LOGCUBE` files here.

The binary-table extension has 5 columns: `INDXI_C1`, `INDXI_C2`, `INDXJ_C1`, `INDXJ_C2`, and `RHOIJ`.

Here (`INDXI_C1`, `INDXI_C2`) describes the x,y index of the first point in the flux array, (`INDXJ_C1`, `INDXJ_C2`) describes the x,y index of the second point in the flux array, and `RHOIJ` gives the correlation coefficient between the two spaxels. The header of the extension provides the wavelength at which the correlation matrix has been calculated (`BBWAVE`), the 0-indexed slice number associated with wavelength `BBWAVE` (`BBINDEX`), the type of matrix provided (either Covariance or Correlation; `COVTYPE`), and the dimensionality of the full correlation matrix (`COVSHAPE`). There are facilities in both IDL and python to construct (sparse) matrices from these data.
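To illustrate how the table columns map onto a matrix, here is a minimal NumPy sketch that fills a dense array from the five columns. It is not one of the facilities mentioned above. The function name is hypothetical, and two assumptions are made: the two spatial indices are flattened in C (row-major) order, and each table row is mirrored across the diagonal to enforce symmetry. Check both against the data model before relying on this.

```python
import numpy as np

def correlation_from_table(indxi_c1, indxi_c2, indxj_c1, indxj_c2,
                           rhoij, spatial_shape):
    """Build a dense correlation matrix from sparse-table columns.

    ASSUMPTION: the two spatial indices are flattened in C order, i.e.
    flat index = c1 * spatial_shape[1] + c2.
    """
    i = np.ravel_multi_index(
        (np.asarray(indxi_c1, dtype=int), np.asarray(indxi_c2, dtype=int)),
        spatial_shape)
    j = np.ravel_multi_index(
        (np.asarray(indxj_c1, dtype=int), np.asarray(indxj_c2, dtype=int)),
        spatial_shape)
    n = spatial_shape[0] * spatial_shape[1]
    rho = np.zeros((n, n), dtype=float)
    rho[i, j] = rhoij
    rho[j, i] = rhoij  # mirror each entry; the matrix is symmetric
    return rho

# Hypothetical 2x2 map with three non-zero table rows.
rho_example = correlation_from_table(
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 1], [1.0, 0.5, 1.0], (2, 2))
```

A dense array is only practical for small examples. For a full 72x72 slice, the 5184 x 5184 matrix takes roughly 200 MB in double precision, so a sparse representation is preferable.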

When binning data, one can calculate the uncertainty in the binned spaxels by writing the binning operation as a matrix multiplication for each wavelength slice:

`B = T\ \times\ F`,

where `T` is an `N_b \times N_s` array that bins the `N_s` spaxels in the flattened flux map `F` into `N_b` bins, and

`C = T\ \times\ \Sigma\ \times\ T^T`

is the `N_b \times N_b` covariance matrix for the binned fluxes and `\Sigma` is the `N_s \times N_s` covariance read from the DRP file.

See also further discussion in Law et al. (2016, AJ, 152, 83) and Westfall et al. in prep.
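The matrix form of the binning operation can be sketched with NumPy for a toy case. All numbers below are hypothetical. The correlation matrix is converted to a covariance matrix using the per-spaxel errors, `\Sigma_{ij} = \rho_{ij} \sigma_i \sigma_j`, which is the standard relation between the two.

```python
import numpy as np

# Hypothetical flattened flux map with N_s = 4 spaxels and 1-sigma errors
# (in practice the errors come from the inverse variance in the data cube).
flux = np.array([1.0, 2.0, 3.0, 4.0])
sigma = np.array([0.1, 0.1, 0.2, 0.2])

# Hypothetical correlation matrix: neighboring spaxels are correlated.
rho = np.array([[1.0, 0.5, 0.0, 0.0],
                [0.5, 1.0, 0.5, 0.0],
                [0.0, 0.5, 1.0, 0.5],
                [0.0, 0.0, 0.5, 1.0]])

# Covariance from correlation: Sigma_ij = rho_ij * sigma_i * sigma_j
cov = rho * np.outer(sigma, sigma)

# T is N_b x N_s: bin 0 sums spaxels 0 and 1, bin 1 sums spaxels 2 and 3.
T = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

binned_flux = T @ flux                      # B = T x F
binned_cov = T @ cov @ T.T                  # C = T x Sigma x T^T
binned_err = np.sqrt(np.diag(binned_cov))   # 1-sigma errors of the bins
```

In this toy case the variance of the first bin is 0.03 rather than the 0.02 obtained by ignoring covariance, and the off-diagonal element of `binned_cov` shows that the two bins remain correlated with each other.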

## Array indexing (IDL vs. astropy)

The primary data products of the Data Reduction Pipeline (DRP) are FITS files. When reading these files, it is important to understand the ordering of the data within the array. FITS files were originally developed using FORTRAN, a column-major language. When reading the files using IDL, the intended ordering of the axes as (x,y,λ) is maintained. However, this ordering is transposed to (λ,y,x) when using astropy.io.fits, because NumPy arrays are row-major by default. Please see the astropy FAQ for details, and the MaNGA Python Tutorial for example code.
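The axis ordering can be illustrated without opening a file. The sketch below uses a small synthetic array as a stand-in for a flux cube as astropy.io.fits would return it; the array sizes and variable names are hypothetical.

```python
import numpy as np

# Stand-in for a flux cube as returned by astropy.io.fits: (n_wave, n_y, n_x)
n_wave, n_y, n_x = 5, 3, 4
flux = np.arange(n_wave * n_y * n_x, dtype=float).reshape(n_wave, n_y, n_x)

# Spectrum of the spaxel at x = 2, y = 1: the wavelength axis comes first.
x, y = 2, 1
spectrum = flux[:, y, x]

# Reorder the axes to (x, y, wave) to mirror the ordering seen in IDL.
flux_xyw = np.transpose(flux, (2, 1, 0))
```

After the transpose, `flux_xyw[x, y, :]` returns the same spectrum as `flux[:, y, x]`.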

## DAP: The Hybrid Binning Scheme

Once the binning step is performed, each DAP module only works with the "binned" spectra (called this even if a bin consists of a single spaxel), except for the `HYB` binning approach. In the `HYB` case, the emission-line modeling (see here) is done by first fitting the binned spectra and then distributing those results as a starting point for fitting the individual spaxels closest to each bin. Because the data are fit as a hybrid between the Voronoi (`VOR`) binning and unbinned schemes, there are a few things to keep in mind:

- Because the stellar kinematics are held fixed to the binned results during the spaxel-by-spaxel continuum+emission-line fit, there will be covariance among the emission-line and spectral-index results for spaxels associated with a single bin, beyond what one would expect from the datacube construction alone (described above). These covariances have not yet been characterized.
- The binned spectra provided in the `HYB` model cube files are from the Voronoi binning step; however, the emission-line models are fit to the individual spaxels. When using the model cube files for this binning scheme:
  - The stellar-continuum fits (computed using data in the model `LOGCUBE` file) should be compared to the Voronoi binned spectra in the file, but
  - the best-fitting model spectra (stellar continuum + gas emission) in the `MODEL` extension should be compared to the individual spectra from the DRP `LOGCUBE` file!
- Because the emission-line modeling is done on the individual spaxels, the emission-line moments are recalculated after the emission-line modeling to ensure the stellar continuum used for both the Gaussian model and the moment is identical. In the `HYB` case, this means the emission-line moments are provided for the individual spaxels. It also means that the spectral indices are measured on the individual spaxels because the emission-line model is first subtracted from the data before the index measurements.

## DAP: Velocity-dispersion Corrections

**WARNING: Some MAPS file extensions must be corrected to obtain the astrophysically relevant quantities as discussed here.**

The **stellar and gas velocity dispersion measurements** must be corrected for instrumental resolution effects to obtain the astrophysical Doppler broadening. The corrected gas velocity dispersion is:

`sigma_gas_corr = sqrt( square(EMLINE_GSIGMA) - square(EMLINE_INSTSIGMA) )`

and the corrected stellar velocity dispersion is:

`sigma_star_corr = sqrt( square(STELLAR_SIGMA) - square(STELLAR_SIGMACORR) )`,

where `EMLINE_GSIGMA`, `EMLINE_INSTSIGMA`, `STELLAR_SIGMA`, and `STELLAR_SIGMACORR` are the relevant extensions in the `MAPS` file.

In both cases, beware of imaginary numbers. That is, when the correction is larger than the provided value, the above equations result in taking the square root of a negative number. The correction for the stellar velocity dispersion measurements is based on a fit of the optimal template with and without the resolution matched to the MaNGA data, as described in the DAP technical paper.
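A minimal NumPy sketch of the quadrature correction, guarding against the negative-argument case, might look like the following. The function name is hypothetical, and returning NaN where the correction exceeds the measurement is an illustrative choice, not a recommendation. How to treat such measurements is a science decision.

```python
import numpy as np

def corrected_sigma(sigma_obs, sigma_corr):
    """Quadrature-subtract a dispersion correction from a measurement.

    Elements where the correction exceeds the measured value would need
    the square root of a negative number; they are set to NaN here
    (an illustrative choice only).
    """
    sigma_obs = np.asarray(sigma_obs, dtype=float)
    sigma_corr = np.asarray(sigma_corr, dtype=float)
    diff = sigma_obs**2 - sigma_corr**2
    # Clip before the square root so no invalid-value warning is raised.
    root = np.sqrt(np.maximum(diff, 0.0))
    return np.where(diff >= 0.0, root, np.nan)

# Hypothetical measured and correction values in km/s.
example = corrected_sigma([50.0, 30.0], [30.0, 50.0])
```

The same function applies to either pair of extensions (`EMLINE_GSIGMA` with `EMLINE_INSTSIGMA`, or `STELLAR_SIGMA` with `STELLAR_SIGMACORR`) once the maps have been read into arrays.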

Also, velocity-dispersion corrections are provided for the **spectral indices** (see the details of these calculations in the DAP technical paper). To apply the corrections, you have to know the unit of each index, which can be determined using the `Un` (`n` is the number of the channel) header keywords in the SPECINDEX extension. For indices that are either unitless or in angstrom units:

`specindex_ang_corr = SPECINDEX * SPECINDEX_CORR`

and for magnitude units:

`specindex_mag_corr = SPECINDEX + SPECINDEX_CORR`,

where `SPECINDEX` and `SPECINDEX_CORR` are the relevant extensions in the `MAPS` file.
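The unit-dependent rule can be sketched as a small helper. The function name is hypothetical, and the assumption that magnitude-unit indices are identified by the string `mag` in the `Un` header keyword should be checked against the actual header values.

```python
def correct_specindex(index, correction, unit):
    """Apply the velocity-dispersion correction to a spectral index.

    ASSUMPTION: magnitude-unit indices have unit string 'mag'; any other
    unit (angstroms or unitless) uses the multiplicative correction.
    """
    if unit.strip().lower() == "mag":
        return index + correction
    return index * correction

# Hypothetical index values and corrections.
ang_example = correct_specindex(2.0, 1.1, "ang")
mag_example = correct_specindex(0.10, 0.02, "mag")
```

Because only `+` and `*` are used, the helper also works elementwise on NumPy arrays read from one channel of the `SPECINDEX` and `SPECINDEX_CORR` extensions.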

Tutorials are available that demonstrate how to apply these corrections.