Astroinformatics Tools
Astrophysics is one of today's hotspots in science, where increasingly massive datasets are quietly revolutionizing knowledge of both traditional astrophysics topics and the most fundamental physics.

Efficient Photometric Selection of Quasars from the Sloan Digital Sky Survey II. ~1,000,000 Quasars from Data Release 6
Gordon T. Richards, Adam D. Myers, Alexander G. Gray, Ryan N. Riegel, Robert C. Nichol, Robert J. Brunner, Alexander S. Szalay, Donald P. Schneider, Scott F. Anderson
The Astrophysical Journal Supplement, 2009

Our nonparametric Bayes classification algorithms made it possible to select an unprecedented ~1 million quasar candidates, roughly all the quasars that can possibly be obtained from the SDSS, the largest sky survey at the time.

Abstract: We present a catalog of 1,172,157 quasar candidates selected from the photometric imaging data of the Sloan Digital Sky Survey (SDSS). The objects are all point sources to a limiting magnitude of i = 21.3 from 8417 deg^2 of imaging from SDSS Data Release 6 (DR6). This sample extends our previous catalog by using the latest SDSS public release data and probing both ultraviolet (UV)-excess and high-redshift quasars. While the addition of high-redshift candidates reduces the overall efficiency (quasars:quasar candidates) of the catalog to ~80%, it is expected to contain no fewer than 850,000 bona fide quasars, which is ~8 times the number of our previous sample and ~10 times the size of the largest spectroscopic quasar catalog. Cross-matching between our photometric catalog and spectroscopic quasar catalogs from both the SDSS and 2dF survey yields 88,879 spectroscopically confirmed quasars. For judicious selection of the most robust UV-excess sources (~500,000 objects in all), the efficiency is nearly 97%, more than sufficient for detailed statistical analyses. The catalog's completeness to type 1 (broad-line) quasars is expected to be no worse than 70%, with most missing objects occurring at z < 0.7 and 2.5 < z < 3.0. In addition to classification information, we provide photometric redshift estimates (typically good to Delta z = +/- 0.3 at 2 sigma) and cross-matching with radio, X-ray, and proper-motion catalogs. Finally, we consider the catalog's utility for determining the optical luminosity function of quasars and are able to confirm the flattening of the bright-end slope of the quasar luminosity function at z ~4 as compared to z ~2.

@article{richards2009quasarsj2, title = "{Efficient Photometric Selection of Quasars from the Sloan Digital Sky Survey II. $\sim$ 1,000,000 Quasars from Data Release Six}", author = "Gordon T. Richards and Adam D. Myers and Alexander G. Gray and Ryan N. Riegel and Robert C. Nichol and Robert J. Brunner and Alexander S. Szalay and Donald P. Schneider and Scott F. Anderson", journal = APJS, year = "2009", volume = "180", pages = "67--83" }
See also

Fast n-point Correlation Functions
Efficiently computing n-point correlation functions is of fundamental importance in astrophysics.
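As a minimal illustration of what the simplest such computation involves, the sketch below estimates a 2-point correlation function by brute-force pair counting and the standard Landy-Szalay estimator, xi = (DD - 2DR + RR) / RR with normalized pair counts. It costs O(N^2) in the number of points, which is exactly the cost that fast tree-based n-point algorithms aim to avoid; it is not the FASTlab implementation. The function names, bin edges, and uniform toy catalogs are hypothetical choices for illustration.

```python
import numpy as np

def pair_counts(a, b, edges, same=False):
    """Histogram of pairwise separations between point sets a and b (brute force)."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    if same:
        # Keep each unordered pair once and drop self-pairs.
        i, j = np.triu_indices(len(a), k=1)
        d = d[i, j]
    else:
        d = d.ravel()
    counts, _ = np.histogram(d, bins=edges)
    return counts

def landy_szalay(data, rand, edges):
    """Landy-Szalay estimate of xi(r) in each separation bin."""
    nd, nr = len(data), len(rand)
    dd = pair_counts(data, data, edges, same=True) / (nd * (nd - 1) / 2)
    rr = pair_counts(rand, rand, edges, same=True) / (nr * (nr - 1) / 2)
    dr = pair_counts(data, rand, edges) / (nd * nr)
    return (dd - 2 * dr + rr) / rr

# Hypothetical toy catalogs: unclustered "data" and a larger random catalog in a unit cube.
rng = np.random.default_rng(0)
data = rng.uniform(0, 1, size=(300, 3))
rand = rng.uniform(0, 1, size=(600, 3))
edges = np.linspace(0.05, 0.3, 6)
print(landy_szalay(data, rand, edges))  # values scatter around 0 for unclustered points
```

Because the data here are unclustered, the estimate should scatter around zero; real catalogs need a random catalog that follows the survey mask.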

Fast Friends-of-Friends
We demonstrate the most efficient overall algorithm for the classic Euclidean minimum spanning tree (EMST) problem, in the context of hierarchical clustering (known in astronomy as "friends-of-friends").
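For intuition about the connection: the friends-of-friends groups at linking length b are the connected components that remain after deleting every EMST edge longer than b. The brute-force union-find sketch below finds the same groups directly by linking every pair of points closer than b. It is O(N^2) and is not the dual-tree EMST algorithm referred to above; the function name and sample coordinates are hypothetical.

```python
import numpy as np

def friends_of_friends(points, b):
    """Label points so that any two points closer than b share a group (single linkage)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Distances from point i to all later points; link every "friend" within b.
        d = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        for j in np.nonzero(d < b)[0] + i + 1:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[rj] = ri
    roots = [find(i) for i in range(n)]
    # Relabel roots as consecutive group ids 0, 1, 2, ... in order of first appearance.
    ids = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return np.array([ids[r] for r in roots])

pts = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [5.0, 5.0], [5.4, 5.0]])
print(friends_of_friends(pts, b=0.6))  # [0 0 0 1 1]
```

Note that the first three points form one group even though points 0 and 2 are 1.0 apart: membership is transitive through the chain of friends, which is why single linkage and the EMST give the same answer.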

In preparation

Kernels for Measurement Error
We are developing ways to incorporate measurement errors into kernel methods, motivated by astronomical problems.
Eight-Dimensional Mid-Infrared/Optical Bayesian Quasar Selection
Gordon T. Richards, Rajesh P. Deo, Mark Lacy, Adam D. Myers, Robert C. Nichol, Nadia L. Zakamska, Robert J. Brunner, W. N. Brandt, Alexander G. Gray, John K. Parejko, Andrew Ptak, Donald P. Schneider, Lisa J. Storrie-Lombardi, Alexander S. Szalay
The Astronomical Journal, 2009

Our nonparametric Bayes classification algorithms enabled this demonstration of 8-D mid-infrared/optical quasar selection, which points the way to quasar samples spanning the largest redshift and luminosity range to date.

Abstract: We explore the multidimensional, multiwavelength selection of quasars from mid-IR (MIR) plus optical data, specifically from Spitzer-IRAC and the Sloan Digital Sky Survey (SDSS). Traditionally, quasar selection relies on cuts in 2-D color space despite the fact that most modern surveys (optical and infrared) are done in more than 3 bandpasses. In this paper we apply modern statistical techniques to combined Spitzer MIR and SDSS optical data, allowing up to 8-D color selection of quasars. Using a Bayesian selection method, we catalog 5546 quasar candidates to an 8.0 μm depth of 56 μJy over an area of 24 deg^2. Roughly 70% of these candidates are not identified by applying the same Bayesian algorithm to 4-color SDSS optical data alone. The 8-D optical+MIR selection on this data set recovers 97.7% of known type 1 quasars in this area and greatly improves the effectiveness of identifying 3.5 < z < 5 quasars, which are challenging to identify (without considerable contamination) using MIR data alone. We demonstrate that, even using only the two shortest-wavelength IRAC bandpasses (3.6 and 4.5 μm), it is possible to use our Bayesian techniques to select quasars with 97% completeness and as little as 10% contamination (as compared to 60% contamination using color cuts alone). We compute photometric redshifts for our sample; comparison with known objects suggests a photometric redshift accuracy of 93.6% (Delta z +/- 0.3), remaining roughly constant when the two reddest MIR bands are excluded. Despite the fact that our methods are designed to find type 1 (unobscured) quasars, as many as 1200 of the objects are type 2 (obscured) quasar candidates. Coupling deep optical imaging data with deep mid-IR data could enable selection of quasars in significant numbers past the peak of the quasar luminosity function (QLF) to at least z ~4. Such a sample would constrain the shape of the QLF both above and below the break luminosity (L^*_Q) and enable quasar clustering studies over the largest range of redshift and luminosity to date, yielding significant gains in our understanding of the physics of quasars and their contribution to galaxy evolution.

@article{richards2009spitzer, title = "{Eight-Dimensional Mid-Infrared/Optical Bayesian Quasar Selection}", author = "Gordon T. Richards and Rajesh P. Deo and Mark Lacy and Adam D. Myers and Robert C. Nichol and Nadia L. Zakamska and Robert J. Brunner and W. N. Brandt and Alexander G. Gray and John K. Parejko and Andrew Ptak and Donald P. Schneider and Lisa J. Storrie-Lombardi and Alexander S. Szalay", journal = AJ, year = "2009", volume = "137", pages = "3884--3899" }
The Three-Point Correlation Function of Luminous Red Galaxies in the Sloan Digital Sky Survey
Gauri V. Kulkarni, Robert C. Nichol, Ravi K. Sheth, Hee-Jong Seo, Daniel J. Eisenstein, and Alexander G. Gray
Monthly Notices of the Royal Astronomical Society, 2007

Our n-point algorithms enabled the first large-scale calculation of the 3-point correlation function of luminous red galaxies, providing sharper constraints on models of how these galaxies populate dark matter halos.

Abstract: We present measurements of the redshift-space three-point correlation function of 50,967 Luminous Red Galaxies (LRGs) from Data Release 3 (DR3) of the Sloan Digital Sky Survey (SDSS). We have studied the shape dependence of the reduced three-point correlation function, Qz(s,q,theta), on three different scales, s = 4, 7 and 10 h^-1 Mpc, and over the ranges 1 < q < 3 and 0 < theta < 180 deg. On small scales (s = 4 h^-1 Mpc), Qz is nearly constant, with little change as a function of q and theta. However, there is evidence for a shallow U-shaped behaviour (with theta), which is expected from theoretical modeling of Qz. On larger scales (s = 7 and 10 h^-1 Mpc), the U-shaped anisotropy in Qz (with theta) is more clearly detected. We compare this shape-dependence in Qz(s,q,theta) with that seen in mock galaxy catalogues which were generated by populating the dark matter halos in large N-body simulations with mock galaxies using various Halo Occupation Distributions (HOD). We find that the combination of the observed number density of LRGs, the (redshift-space) two-point correlation function and Qz provides a strong constraint on the allowed HOD parameters (M_min, M_1, alpha) and breaks key degeneracies between these parameters. For example, our observed Qz disfavors mock catalogues that overpopulate massive dark matter halos with many LRG satellites. We also estimate the linear bias of LRGs to be b = 1.87 +/- 0.07, in excellent agreement with other measurements.
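For reference, a reduced three-point function of this kind is conventionally defined by dividing the three-point function zeta by symmetrized products of the two-point function xi, which largely cancels the dependence on the overall clustering amplitude. Assuming the common parameterization in which a triangle has sides s and qs separated by the angle theta (our reading of the variables above, not a statement of the paper's exact conventions; zeta, xi, and the side labels r_12, r_23, r_31 are introduced here), the definition reads:

```latex
Q_z(s, q, \theta) =
  \frac{\zeta(r_{12}, r_{23}, r_{31})}
       {\xi(r_{12})\,\xi(r_{23}) + \xi(r_{23})\,\xi(r_{31}) + \xi(r_{31})\,\xi(r_{12})},
\qquad
r_{12} = s, \quad r_{23} = q\,s, \quad r_{31} = s\sqrt{1 + q^{2} - 2q\cos\theta}.
```

The third side follows from the law of cosines, so theta = 0 and theta = 180 deg correspond to collinear triangles, which is where the U-shaped behaviour described above shows up.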

@article{kulkarni2007sdss3pt, title = "{The Three-Point Correlation Function of Luminous Red Galaxies in the Sloan Digital Sky Survey}", author = "G. Kulkarni and R. Nichol and R. Sheth and H. Seo and D. Eisenstein and Alexander G. Gray", journal = MNRAS, year = "2007", volume="378", pages="1196--1206" }
A High Redshift Detection of the Integrated Sachs-Wolfe Effect
Tommaso Giannantonio, Robert G. Crittenden, Robert C. Nichol, Ryan Scranton, Gordon T. Richards, Adam D. Myers, Robert J. Brunner, Alexander G. Gray, Andrew J. Connolly, and Donald P. Schneider
Physical Review D, 2006

Our nonparametric Bayes classification algorithms and n-point algorithms enabled what appears to be the earliest (highest-redshift) evidence yet for dark energy.

Abstract: We present evidence of a large-angle correlation between the cosmic microwave background measured by WMAP and a catalog of photometrically detected quasars from the SDSS. The observed cross correlation is (0.30 +/- 0.14) μK at zero lag, with a shape consistent with that expected for correlations arising from the integrated Sachs-Wolfe effect. The photometric redshifts of the quasars are centered at z ~1.5, making this the deepest survey in which such a correlation has been observed. Assuming this correlation is due to the ISW effect, this constitutes the earliest evidence yet for dark energy and it can be used to constrain exotic dark energy models.

@article{giannantonio2006isw, title = "{A High Redshift Detection of the Integrated Sachs-Wolfe Effect}", author = "T. Giannantonio and R. Crittenden and R. Nichol and R. Scranton and G. Richards and A. Myers and R. Brunner and Alexander G. Gray" # " and A. Connolly and D. Schneider", journal = PHYSREVD, volume = "74", year = "2006" }
First Measurement of the Clustering Evolution of Photometrically-Classified Quasars
Adam D. Myers, Robert J. Brunner, Gordon T. Richards, Robert C. Nichol, Donald P. Schneider, Daniel E. Vanden Berk, Ryan Scranton, Alexander G. Gray, and Jon Brinkmann
The Astrophysical Journal, 2006

Our nonparametric Bayes classification algorithms and n-point algorithms enabled the largest-scale study of the 2-point correlation function of quasars.

Abstract: We present new measurements of the quasar autocorrelation from a sample of ~80,000 photometrically-classified quasars taken from SDSS DR1. We find a best-fit model of omega(theta) = (0.066^{+0.026}_{-0.024}) theta^{-(0.98 +/- 0.15)} for the angular autocorrelation, consistent with estimates from spectroscopic quasar surveys. We show that only models with little or no evolution in the clustering of quasars in comoving coordinates since z ~1.4 can recover a scale-length consistent with local galaxies and Active Galactic Nuclei (AGNs). A model with little evolution of quasar clustering in comoving coordinates is best explained in the current cosmological paradigm by rapid evolution in quasar bias. We show that quasar biasing must have changed from b_Q ~3 at a (photometric) redshift of z = 2.2 to b_Q ~1.2-1.3 by z = 0.75. Such a rapid increase with redshift in biasing implies that quasars at z ~2 cannot be the progenitors of modern L* objects; rather, they must now reside in dense environments, such as clusters. Similarly, the duration of the UVX quasar phase must be short enough to explain why local UVX quasars reside in essentially unbiased structures. Our estimates of b_Q are in good agreement with recent spectroscopic results, which demonstrate that the implied evolution in b_Q is consistent with quasars inhabiting halos of similar mass at every redshift. Treating quasar clustering as a function of both redshift and luminosity, we find no evidence for luminosity dependence in quasar clustering, and that redshift evolution thus affects quasar clustering more than changes in quasars' luminosity. We provide a new method for quantifying stellar contamination in photometrically-classified quasar catalogs via the correlation function.

@article{myers2006quas2pt, title = "{First Measurement of the Clustering Evolution of Photometrically-Classified Quasars}", author = "A. Myers and R. Brunner and G. Richards and R. Nichol and D. Schneider and D. {Vanden Berk} and R. Scranton and Alexander G. Gray" # " and J. Brinkmann", journal = APJ, volume="638", pages ="622--634", year="2006" }
Detection of Cosmic Magnification with the Sloan Digital Sky Survey
Ryan Scranton, Brice Menard, Gordon T. Richards, Robert C. Nichol, Adam D. Myers, Bhuvnesh Jain, Alexander G. Gray, Matthias Bartelmann, Robert J. Brunner, Andrew J. Connolly, James E. Gunn, Ravi K. Sheth, Neta A. Bahcall, Jon Brinkmann, Jon Loveday, Donald P. Schneider, Aniruddha Thakar, and Donald G. York
The Astrophysical Journal, 2005

Our nonparametric Bayes classification algorithms enabled the first large-scale detection of cosmic magnification, an effect predicted by general relativity.

Abstract: We present an 8 sigma detection of cosmic magnification measured by the variation of quasar density due to gravitational lensing by foreground large scale structure. To make this measurement we used 3800 square degrees of photometric observations from the Sloan Digital Sky Survey (SDSS) containing ~200,000 quasars and 13 million galaxies. Our measurement of the galaxy-quasar cross-correlation function exhibits the amplitude, angular dependence and change in sign as a function of the slope of the observed quasar number counts that is expected from magnification bias due to weak gravitational lensing. We show that observational uncertainties (stellar contamination, Galactic dust extinction, seeing variations and errors in the photometric redshifts) are well controlled and do not significantly affect the lensing signal. By weighting the quasars with the number count slope, we combine the cross-correlation of quasars for our full magnitude range and detect the lensing signal at > 4 sigma in all five SDSS filters. Our measurements of cosmic magnification probe scales ranging from 60 kpc/h to 10 Mpc/h and are in good agreement with theoretical predictions based on the WMAP concordance cosmology. As with galaxy-galaxy lensing, future measurements of cosmic magnification will provide useful constraints on the galaxy-mass power spectrum.

@article{scranton2005cosmag, title = "{Detection of Cosmic Magnification with the Sloan Digital Sky Survey}", author = "R. Scranton and B. Menard and G. Richards and R. Nichol and A. Myers and B. Jain and Alexander G. Gray and M. Bartelmann and R. Brunner and A. Connolly and J. Gunn and R. Sheth and N. Bahcall and J. Brinkmann and J. Loveday and D. Schneider and A. Thakar and D. York", journal = APJ, volume = "633", pages="589--602", year="2005" }
Galaxy Ecology: Groups and Low-Density Environments in the SDSS and 2dFGRS
Michael Balogh, Vince Eke, Chris Miller, Ian Lewis, Richard Bower, Warrick Couch, Robert Nichol, Joss Bland-Hawthorn, Ivan K. Baldry, Carlton Baugh, Terry Bridges, Russell Cannon, Shaun Cole, Matthew Colless, Chris Collins, Nicholas Cross, Gavin Dalton, Roberto De Propris, Simon P. Driver, George Efstathiou, Richard S. Ellis, Carlos S. Frenk, Karl Glazebrook, Percy Gomez, Alexander G. Gray, Edward Hawkins, Carole Jackson, Ofer Lahav, Stuart Lumsden, Steve Maddox, Darren Madgwick, Peder Norberg, John A. Peacock, Will Percival, Bruce A. Peterson, Will Sutherland, and Keith Taylor
Monthly Notices of the Royal Astronomical Society, 2004

Our kernel density estimation algorithms performed the density calculations behind this analysis, which links galaxy environment to star formation and so bears on why galaxies end up as star-forming spirals or quiescent ellipticals.

Abstract: We analyse the observed correlation between galaxy environment and H-alpha emission line strength, using volume-limited samples and group catalogues of 24968 galaxies drawn from the 2dF Galaxy Redshift Survey (M_b < -19.5) and the Sloan Digital Sky Survey (M_r < -20.6). We characterise the environment by 1) Sigma_5, the surface number density of galaxies determined by the projected distance to the 5th nearest neighbour; and 2) rho_1.1 and rho_5.5, three-dimensional density estimates obtained by convolving the galaxy distribution with Gaussian kernels of dispersion 1.1 Mpc and 5.5 Mpc, respectively. We find that star-forming and quiescent galaxies form two distinct populations, as characterised by their H-alpha equivalent width, EW(H-alpha). The relative numbers of star-forming and quiescent galaxies vary strongly and continuously with local density. However, the distribution of EW(H-alpha) amongst the star-forming population is independent of environment. The fraction of star-forming galaxies shows strong sensitivity to the density on large scales, rho_5.5, which is likely independent of the trend with local density, rho_1.1. We use two differently-selected group catalogues to demonstrate that the correlation with galaxy density is approximately independent of group velocity dispersion, for sigma = 200-1000 km/s. Even in the lowest density environments, no more than ~70 per cent of galaxies show significant H-alpha emission. Based on these results, we conclude that the present-day correlation between star formation rate and environment is a result of short-timescale mechanisms that take place preferentially at high redshift, such as starbursts induced by galaxy-galaxy interactions.
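The projected density Sigma_5 described above is conventionally computed as Sigma_N = N / (pi d_N^2), where d_N is the projected distance to the Nth nearest neighbour. The sketch below applies that definition with N = 5 to a toy set of projected positions using brute-force distances. The function name, the toy grid, and its units are hypothetical, and the paper's exact conventions (for example, any velocity cut on which galaxies count as neighbours) may differ.

```python
import numpy as np

def sigma_n(xy, n=5):
    """Projected density Sigma_N = N / (pi * d_N**2) for every point, brute force."""
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    d_sorted = np.sort(d, axis=1)  # column 0 is each point's zero distance to itself
    d_n = d_sorted[:, n]           # distance to the Nth nearest *other* point
    return n / (np.pi * d_n ** 2)

# Hypothetical toy field: a 3x3 grid with unit spacing plus one isolated point.
grid = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)
xy = np.vstack([grid, [[20.0, 20.0]]])
print(sigma_n(xy, n=5))
```

The grid centre has four neighbours at distance 1 and four at sqrt(2), so its 5th-nearest-neighbour distance is sqrt(2) and Sigma_5 = 5 / (2 pi); the isolated point gets a far lower density, as expected.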

@article{balogh2004kde, title = "{Galaxy Ecology: Groups and Low-Density Environments in the SDSS and 2dFGRS}", author = "M. Balogh and V. Eke and C. Miller and I. Lewis and R. Bower and W. Couch and R. Nichol and J. Bland-Hawthorn and I.K. Baldry and C. Baugh and T. Bridges and R. Cannon and S. Cole and M. Colless and C. Collins and N. Cross and G. Dalton and R. De Propris and S.P. Driver and G. Efstathiou and R.S. Ellis and C.S. Frenk and K. Glazebrook and P. Gomez and O. Lahav and S. Lumsden and S. Maddox and D. Madgwick and P. Norberg and Alexander G. Gray" # " and E. Hawkins and C. Jackson and J.A. Peacock and W. Percival and B.A. Peterson and W. Sutherland and K. Taylor", journal = MNRAS, volume = "348", year = "2004" }
Efficient Photometric Selection of Quasars from the Sloan Digital Sky Survey: 100,000 z < 3 Quasars from Data Release One
Gordon T. Richards, Robert C. Nichol, Alexander G. Gray, Robert J. Brunner, Robert H. Lupton, Daniel E. Vanden Berk, Shang Shan Chong, Michael A. Weinstein, Donald P. Schneider, Scott F. Anderson, Jeffrey A. Munn, Hugh C. Harris, Michael A. Strauss, Xiaohui Fan, James E. Gunn, Zeljko Ivezic, Donald G. York, and J. Brinkmann
The Astrophysical Journal Supplement, 2004

Our nonparametric Bayes classification algorithms enabled the simultaneous scalability and accuracy needed to achieve the largest and most reliable quasar catalog to date.

Abstract: We present a catalog of 100,563 unresolved, UV-excess (UVX) quasar candidates to g = 21 from 2099 deg^2 of the Sloan Digital Sky Survey (SDSS) Data Release One (DR1) imaging data. Existing spectra of 22,737 sources reveal that 22,191 (97.6%) are quasars; accounting for the magnitude dependence of this efficiency, we estimate that 95,502 (95.0%) of the objects in the catalog are quasars. Such a high efficiency is unprecedented in broad-band surveys of quasars. This "proof-of-concept" sample is designed to be maximally efficient, but still has 94.7% completeness to unresolved, g <~ 19.5, UVX quasars from the DR1 quasar catalog. This efficient and complete selection is the result of our application of a probability density type analysis to training sets that describe the 4-D color distribution of stars and spectroscopically confirmed quasars in the SDSS. Specifically, we use a non-parametric Bayesian classification, based on kernel density estimation, to parameterize the color distribution of astronomical sources, allowing for fast and robust classification. We further supplement the catalog by providing photometric redshifts and matches to FIRST/VLA, ROSAT, and USNO-B sources. Future work needed to extend this selection algorithm to larger redshifts, fainter magnitudes, and resolved sources is discussed. Finally, we examine some science applications of the catalog, particularly a tentative quasar number counts distribution covering the largest range in magnitude (14.2 < g < 21.0) ever made within the framework of a single quasar survey.
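The classification idea described above can be sketched as follows: estimate each class's color density with a Gaussian kernel density estimate built from its training set, weight it by a prior, and assign each source to the class with the larger posterior probability. This brute-force version costs O(N_train) per source, whereas the catalog relied on fast tree-based algorithms; the bandwidth, the quasar prior, the function names, and the synthetic 4-D "colors" are all hypothetical, not the values used for the SDSS.

```python
import numpy as np

def gaussian_kde(train, query, h):
    """Average of isotropic Gaussian kernels of bandwidth h centred on each training point."""
    dim = train.shape[1]
    sq = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    norm = (2 * np.pi * h ** 2) ** (dim / 2)
    return np.exp(-sq / (2 * h ** 2)).sum(axis=1) / (len(train) * norm)

def kde_bayes_classify(stars, quasars, query, h=0.1, prior_quasar=0.1):
    """Return P(quasar | colors) for each query point via Bayes' rule with KDE likelihoods."""
    post_q = gaussian_kde(quasars, query, h) * prior_quasar
    post_s = gaussian_kde(stars, query, h) * (1 - prior_quasar)
    return post_q / (post_q + post_s)

# Hypothetical training sets: two well-separated clouds in a 4-D color space.
rng = np.random.default_rng(1)
stars = rng.normal(loc=[1.0, 0.5, 0.2, 0.1], scale=0.1, size=(500, 4))
quasars = rng.normal(loc=[0.2, 0.2, 0.2, 0.2], scale=0.1, size=(200, 4))
query = np.array([[1.0, 0.5, 0.2, 0.1],   # star-like colors
                  [0.2, 0.2, 0.2, 0.2]])  # quasar-like colors
print(kde_bayes_classify(stars, quasars, query))
```

The bandwidth h controls the bias/variance trade-off of each density estimate and in practice would be chosen by cross-validation rather than fixed by hand.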

@article{richards2004quasarsj, title = "{Efficient Photometric Selection of Quasars from the Sloan Digital Sky Survey: 100,000 $z<3$ Quasars from Data Release One}", author = "G. Richards and R. Nichol and Alexander G. Gray and R. Brunner and R. Lupton and D. {Vanden Berk} and S. Chong and M. Weinstein and D. Schneider and S. Anderson and J. Munn and H. Harris and M. Strauss and X. Fan and J. Gunn and Z. Ivezic and D. York and J. Brinkmann", journal = APJS, volume = "155", pages = "257--269", year = "2004" }
The Clustering of Active Galactic Nuclei in the Sloan Digital Sky Survey
David A. Wake, Christopher J. Miller, Tiziana Di Matteo, Robert C. Nichol, Adrian Pope, Alexander S. Szalay, Alexander G. Gray, Donald P. Schneider, Donald G. York
The Astrophysical Journal Letters, 2004

Our n-point algorithms enabled the largest-scale study of the 2-point correlation function of active galactic nuclei.

Abstract: We present the two-point correlation function (2PCF) of narrow-line active galactic nuclei (AGNs) selected within the First Data Release of the Sloan Digital Sky Survey. Using a sample of 13,605 AGNs in the redshift range 0.055 < z < 0.2, we find that the AGN autocorrelation function is consistent with the observed galaxy autocorrelation function on scales from 0.2 to greater than 100 h^-1 Mpc. The AGN hosts trace an intermediate population of galaxies and are not detected in either the bluest (youngest) disk-dominated galaxies or many of the reddest (oldest) galaxies. We show that the AGN 2PCF is dependent on the luminosity of the narrow [O III] emission line (L[OIII]), with low L[OIII] AGNs having a higher clustering amplitude than high L[OIII] AGNs. This is consistent with lower activity AGNs residing in more massive galaxies than higher activity AGNs, and L[OIII] providing a good indicator of the fueling rate. Using a model relating halo mass to black hole mass in cosmological simulations, we show that AGNs hosted by ~10^12 M_{solar} dark matter halos have a 2PCF that matches that of the observed sample. This mass scale implies a mean black hole mass for the sample of M_{BH} ~10^8 M_{solar}.

@article{wake2004agn, title = "{The Clustering of Active Galactic Nuclei in the Sloan Digital Sky Survey}", author = "D. Wake and C. Miller and T. Di Matteo and R. Nichol and A. Pope and A. Szalay and Alexander G. Gray and D. Schneider and D. York", journal = APJL, volume = "610", pages="L85--L88", year="2004" }