MathStatBites at SCMA8: Astro Image Processing is BLISS?
In June 2023, astronomers and statisticians flocked to “Happy Valley,” Pennsylvania, for the eighth installment of Statistical Challenges in Modern Astronomy (SCMA), a conference held roughly twice each decade. The meeting, hosted at Penn State University, marked a transition in leadership from founding members Eric Feigelson and Jogesh Babu to Hyungsuk Tak, who led the proceedings. While the astronomical applications varied widely, including modeling of stars, galaxies, supernovae, X-ray observations, and gravitational waves, the methods displayed a strong Bayesian bent. Simulation-based inference (SBI), which uses synthetic models to learn an approximate function for the likelihood of physical parameters given data, featured prominently among the talk topics. This article features work presented in two back-to-back talks on a probabilistic method for modeling (point) sources of light in astronomical images, such as stars or galaxies, delivered by Prof. Jeffrey Regier and Ismael Mendoza of the University of Michigan–Ann Arbor.
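To make the SBI idea concrete, here is a minimal sketch using rejection ABC (approximate Bayesian computation) on a toy star-flux problem, as a simplified stand-in for the neural simulation-based methods discussed at the meeting. The forward model, prior, tolerance, and all numbers are illustrative assumptions, not part of the presented work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "observation": noisy flux measurements of a star with unknown true flux.
true_flux = 12.0
observed = rng.normal(true_flux, 2.0, size=50)
obs_summary = observed.mean()

def simulate(flux, n=50):
    """Hypothetical forward model: noisy flux measurements around a given flux."""
    return rng.normal(flux, 2.0, size=n)

# Rejection ABC: keep prior draws whose simulated data resemble the observation.
prior_draws = rng.uniform(0.0, 30.0, size=100_000)
accepted = [
    flux for flux in prior_draws
    if abs(simulate(flux).mean() - obs_summary) < 0.2
]

print(f"Approximate posterior mean flux: {np.mean(accepted):.2f}")
print(f"Accepted {len(accepted)} of {len(prior_draws)} prior draws")
```

The accepted draws approximate the posterior over the flux; the neural SBI methods highlighted at SCMA replace this hard accept/reject rule with a learned density estimator, which scales to far more complex simulators.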
Bridging the Gap between Models and Data
One of the key goals of science is to create theoretical models that are useful for describing the world we see around us. However, no model is perfect. The inability of models to replicate observations is often called the “synthetic gap.” For example, it may be too computationally expensive to include a known effect or to vary a large number of known parameters, or there may be unknown instrumental effects associated with variability in conditions during data acquisition.
Finding Clusters of Crime in Philadelphia
“Violent crime fell 3 percent in Philadelphia in 2010.” This headline from the Philadelphia Inquirer captures Philadelphia’s reported decline in crime in the late 2000s and 2010s. However, is this claim exactly what it appears to be? In their paper, “Crime in Philadelphia: Bayesian Clustering and Particle Optimization,” Balocchi, Deshpande, George, and Jensen use Bayesian hierarchical modeling and clustering to identify more nuanced patterns in the temporal trends and baseline levels of crime in Philadelphia.
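As a rough illustration of the kind of pattern being sought (not the authors’ actual method, which relies on Bayesian hierarchical modeling and particle optimization), one could summarize each neighborhood by a baseline crime level and a linear time trend and then cluster those summaries. The sketch below does this with k-means on simulated counts; all data and parameter choices are purely hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Hypothetical data: yearly crime counts for 100 neighborhoods over 12 years.
years = np.arange(12)
counts = rng.poisson(lam=50, size=(100, 12))

# Summarize each neighborhood by a baseline level (intercept) and a time trend (slope).
features = np.array([np.polyfit(years, c, deg=1)[::-1] for c in counts])

# K-means is a simplified stand-in for the paper's Bayesian partitioning approach.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
print("Neighborhoods per cluster:", np.bincount(labels))
```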
On swallowing shrewd marketing baits: A silent salute to demand evolution
To John, enticements can never exert a pull. Probably the product of a disciplined upbringing. When John wants to buy something, he knows exactly what he’s looking for. He gets in and he gets out. No dilly-dallying, no pointless scrolling. Few of us are like John; the rest of us secretly aspire to be. Go on. Admit it! The science of enticing customers is sustained by this weakness.
An Introduction to Second-Generation p-Values
For centuries, hypothesis testing has been one of the fundamental inferential concepts in statistics, used to guide the scientific community and to confirm (or revise) one’s beliefs. The p-value has become a famous and nearly universal metric for rejecting (or failing to reject) a null hypothesis H0, which essentially encodes a common belief held even before the experimental data are collected.
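As a concrete anchor before turning to the second-generation version, here is a minimal classical p-value computation: a one-sample t-test of hypothetical exam scores against a null mean of 70. All numbers are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical experiment: does a new teaching method change exam scores
# from the historical mean of 70? Null hypothesis H0: mu = 70.
scores = rng.normal(loc=72, scale=10, size=40)

t_stat, p_value = stats.ttest_1samp(scores, popmean=70)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# Under the usual convention, p < 0.05 would lead us to reject H0.
```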
How Adopting Reproducible Practices Can Benefit Data Science Education
As the fields of statistics and data science have grown, the importance of reproducibility in research, and of easing the “replication crisis,” has become increasingly apparent. The inability to reproduce scientific results even when using the same data and code can erode confidence in the validity of research and can make it difficult to build on and advance scientific knowledge.
Pinpointing Causality across Time and Geography: Uncovering the Relationship between Airstrikes and Insurgent Violence in Iraq
“Correlation is not causation,” as the saying goes, yet sometimes it can be, provided certain assumptions are met. Describing those assumptions and developing methods to estimate causal effects, not just correlations, is the central concern of the field of causal inference. Broadly speaking, causal inference seeks to measure the effect of a treatment on an outcome. The treatment can be an actual medicine or something more abstract, like a policy. Much of the literature in this space focuses on relatively simple treatments and outcomes and uses data that exhibits little dependence between observations. For example, clinicians often want to measure the effect of a binary treatment (received the drug or not) on a binary outcome (developed the disease or not). The data used to answer such questions is typically patient-level data in which the patients are assumed to be independent of one another. To be clear, these simple setups are enormously useful and describe commonplace causal questions.
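A minimal sketch of that simple setup, assuming a hypothetical randomized trial with independent patients: under randomization, the difference in mean outcomes between treated and untreated groups estimates the average effect of the drug on disease risk. The data-generating numbers below are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical randomized trial: binary treatment (drug vs. placebo),
# binary outcome (developed the disease or not), independent patients.
n = 2000
treated = rng.binomial(1, 0.5, size=n)
# Assume the drug lowers the disease probability from 30% to 20%.
p_disease = np.where(treated == 1, 0.20, 0.30)
disease = rng.binomial(1, p_disease)

# With randomization, the difference in means estimates the average treatment effect.
ate_hat = disease[treated == 1].mean() - disease[treated == 0].mean()
print(f"Estimated effect of treatment on disease risk: {ate_hat:.3f}")
```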
Pulling Patterns out of Data with a Graph
Large volumes of data pour in every day from scientific facilities such as CERN and the Sloan Digital Sky Survey. Data is arriving so fast that researchers struggle to keep pace with the analysis and are increasingly developing automated analysis methods to aid in this herculean task. As a first step, it is now commonplace to perform dimension reduction, reducing a large number of measurements to a set of key values that are easier to visualize and interpret.
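As a concrete, assumed example of such a dimension-reduction step, the sketch below applies principal component analysis (PCA) to simulated measurements, compressing 50 correlated columns into a few components that retain most of the variance. PCA is one common choice here, not necessarily the graph-based method featured in the article.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)

# Hypothetical survey-like data: 10,000 objects, 50 correlated measurements
# driven by a few underlying factors plus noise.
latent = rng.normal(size=(10_000, 3))
mixing = rng.normal(size=(3, 50))
measurements = latent @ mixing + 0.1 * rng.normal(size=(10_000, 50))

# Keep a handful of components that capture most of the variance.
pca = PCA(n_components=3)
reduced = pca.fit_transform(measurements)
print(reduced.shape, pca.explained_variance_ratio_.round(3))
```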