MathStat Bites – Page 2 – Learn about new maths and stats research, a bite at a time

April 18, 2024 by Mackenzie Simper

A balanced excited random walker returns home

Title: Balanced Excited Random Walk in Two Dimensions Authors and Year: Omer Angel, Mark Holmes, Alejandro Ramirez; 2023 Journal: Annals of Probability Will balance and excitement always lead a random walker home? A new paper in the Annals of Probability attempts to answer this question and explores paths along the way. Random Walks Imagine you moved into a new neighborhood and you are excited to go on a walk to explore. The neighborhood is arranged in a grid structure, so at each intersection you have four choices for which direction to take: left, right, forwards, or backwards. Since you don’t know where you’re going, you decide to use some randomness to pick which direction to take. This is a random walk: a type of random process that is just a succession of steps on some sort of graph according to some probabilistic rules. The neighborhood grid gives a walk in…

markov chain probability random walk randomness recurrence transience

February 27, 2024 by Moinak Bhaduri

“Changes” in statistics, “changes” in computer science, changes in outlook

No matter how free interactions become, tribalism remains a basic trait. The impulse to form groups based on similarities of habits – of ways of thinking, the tendency to congregate across disciplinary divides, never goes away fully regardless of how progressive our outlook gets. While that tendency to form cults is not problematic in itself (there is even something called community detection in network science that exploits – and exploits to great effects – this tendency) when it morphs into animosity, into tensions, things get especially tragic. The issue that needs to be solved gets bypassed, instead noise around these silly fights come to the fore. For example, the main task at hand could be designing a drug that is effective against a disease, but the trouble may lie in the choice of the benchmark against which this fresh drug must be pitted. In popular media, that benchmark may be the placebo – an inconsequential sugar pill, while in more objective science it could be the drug that is currently in use. There are instances everywhere of how scientists and journalists come in each other’s way (Ben Goldacre’s book Bad Science imparts crucial insights) or how even among scientists, factionalism persists: how statisticians – even to this day – prefer to be classed as frequentists or Bayesians, or how even among Bayesians, whether someone is an empirical Bayesian or not. The sad chain never stops. You may have thought of this tendency and its result. How it is promise betrayed, collaboration throttled in the moment of blossoming. While the core cause behind that scant tolerance, behind that clinging on to, may be a deep passion for what one does, the problem at hand pays little regard to that dedication. The problem’s outlook stays ultimately pragmatic: it just needs solving. By whatever tools. From whatever fields. Alarmingly, the segregations or subdivisions we sampled above and the differences they lead to – convenient though they may be – do not always remain academic: distant to the point of staying irrelevant. At times, they deliver chills much closer to the bone: whether a pure or applied mathematician will get hired or promoted, how getting published in computer science journals should be – according to many – more frequent compared to those in mainstream statistics, etc.

change point Cumulative Sum cusum neural network statistics

February 7, 2024 by Alyssa Columbus

Decoding Digital Preferences: A Collaborative Approach to Solving the Mysteries of A/B Testing

As a digital detective, your mission is to decipher the preferences of your website visitors. Your primary tool? A/B testing – a method used in online controlled experiments where two versions of a webpage (version A and version B) are presented to different subsets of users under the same conditions. It’s akin to a magnifying glass, enabling you to scrutinize the minute details of user interactions across two versions of a webpage to discern their preferences. However, this case isn’t as straightforward as it seems. A recent article by Nicholas Larsen et al. in The American Statistician reveals the hidden challenges of A/B testing that can affect the results of online experiments. If these challenges aren’t tackled correctly, they can lead to misleading conclusions, affecting decisions in both online businesses and academic research.

A/B testing Collaboration Controlled Experiments Data Bias Evidence Methods Sampling statistics

November 10, 2023 by David Han

Differential Privacy Unveiled – the Case of the 2020 Census for Redistricting and Data Privacy

Census statistics play a pivotal role in making public policy decisions such as redrawing legislative districts and allocating federal funds as well as supporting social science research. However, given the risk of revealing individual information, many statistical agencies are considering disclosure control methods based on differential privacy, by adding noise to tabulated data and subsequently conducting postprocessing. The U.S. Census Bureau in particular has implemented a Disclosure Avoidance System (DAS) based on differential privacy technology to protect individual Census responses. This system adds random noise, guided by a privacy loss budget (denoted by ϵ), to Census tabulations, aiming to prevent the disclosure of personal information as mandated by law. The privacy loss budget value ϵ determines the level of privacy protection, with higher ϵ values allowing more noise. While the adoption of differential privacy has been controversial, this approach is crucial for maintaining data confidentiality. Other countries and organizations are also considering this technology as well.

Data Bias Data Privacy Demographics Differential Privacy Generalized Additive Model Herfindahl-Hirschman Index Race States statistics US Census Voting Districts

November 3, 2023 by Mackenzie Simper

Improving Nature’s Randomized Control Trial

Does a higher body mass index (BMI) increase the severity of COVID-19 symptoms? Mendelian randomization is one method that can be used to study this question without worrying about unmeasured variables (e.g., weight, height, or sex) that could affect the results. A recent paper published in the Annals of Statistics developed a new technique for Mendelian randomization which improves the ability to measure cause-and-effect relationships.

BMI Body Mass Index COVID COVID-19 Mendelian Randomization Randomized Control Trial RCT statistics Winner's Curse

October 19, 2023 by Amal Machtalay

How can we model political polarization?

Title: An Agent-Based Statistical Physics Model for Political Polarization: A Monte Carlo Study Authors & Year: Hung T. Diep, Miron Kaufman, and Sanda Kaufman (2023) Journal: Entropy [DOI: https://doi.org/10.3390/e25070981] Review Prepared by Amal Machtalay Political polarization refers to a phenomenon where people’s political beliefs become radical, often resulting in the increasing division between political parties, which can have significant social consequences. This polarization is a complex system that is characterized by multiple factors: numerous interacting components (individual agents/voters, politicians, groups, media, etc.), non-linear dynamics (meaning that small changes can lead to large and uncertain effects), and emergent behavior (where collective phenomena result from local interactions, like when individuals engage with posts on social media that align with their political beliefs). The authors study the case of three USA political groups, each group indexed by $i\in \left\{ 1,2,3\right\}$: Two types of interactions are classified and illustrated in Figure 1:…

agent based complex systems group interactions hamiltonian modeling politics polling data statistical mechanics

October 17, 2023 by Erin McGee

Protecting representation and preventing gerrymandering

By: Erin McGee Paper title: Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans Authors and year: Cory McCartan, Kosuke Imai, 2023 Journal: Annals of Applied Statistics (Forthcoming 2023), https://doi.org/10.48550/arXiv.2008.06131 In 2011, the Pennsylvania General Assembly was accused of drawing a redistricting plan for the state that diluted the power of Democratic voters, while strengthening the Republican vote. The case made its way to the Pennsylvania Supreme Court, where it was determined to be an unfairly drawn map. With more complicated techniques, gerrymandering, or altering districts to purposefully amplify the voting power of some, while reducing others, becomes harder to recognize. Gerrymandered districts are usually identifiable by the ‘jigsaw’ shapes that split counties and municipalities in an attempt to pack certain voting groups into the same district, while splitting others. However, proving that a district map has been purposefully manipulated, as it was in Pennsylvania, is no easy task.…

gerrymandering Monte Carlo Simulations statistics

October 3, 2023 by Andrew Saydjari

Combining Nested Sampling and Normalizing Flows

In order to validate our understanding of the world around us, we want to compare theoretical models to data we have actually observed. Often, these models are functions of parameters, and we want to know the values of those parameters such that the models most closely represent the world. For example, we may believe the concentration of one molecule in a chemical reaction should decrease exponentially with time. However, we also want to know the rate constant, the parameter in the model that multiplies time in the exponential, such that the model exponential curve actually resembles a specific reaction that we observe. This is the problem of parameter inference, for which we often turn to Bayesian methods, especially when working with complex models and/or many parameters..

astronomy Bayesian monte carlo methods nested sampling normalizing flows

September 19, 2023 by David Han

Assurance, a Bayesian Approach in Reliability Demonstration Testing for Quality Technology

Title: Assurance for Sample Size Determination in Reliability Demonstration Testing Authors & Year: Kevin Wilson & Malcolm Farrow (2021) Journal: Technometrics [DOI: 10.1080/00401706.2020.1867646] Why Reliability Demonstration Testing? Ensuring high reliability is critical for hardware products, especially those involved in safety-critical functions such as railway systems and nuclear power reactors. To build trust, manufacturers use reliability demonstration tests (RDT) where a sample of products is tested and failures are observed. If the test meets specific criteria, it demonstrates the product’s reliability. The RDT design varies based on the type of hardware product being tested, whether it is failure on demand or time to failure. Traditionally, sample sizes for RDT have been determined using methods that consider the power of a hypothesis test or risk criteria. Various approaches, such as Bayesian methods and risk criteria evaluation, have been developed over the decades in order to enhance the effectiveness of RDT. These measures…

Bayesian BAyesian acceptance testing reliability reliability demonstration test sample size determination statistics

September 5, 2023 by Moinak Bhaduri

Conform – or else! Conformal scores as tools to lay out a set of likely classification labels

The setting repeats depressingly often. A hurricane inching towards the Florida coast. Weather scientists glued to tracking monitors, hunched over simulation printouts, trying to remove people out of harm’s way. Urgent to them is the need to mark a patch of the shore where the hurricane is likely to hit. Those living in this patch need to be relocated. These scientists, and many before them – it’s hard to say since when – realized what’s at issue here is not quite so much the precise location the storm is going to hit – precise to the exact grain of sand, but a stretch of land (whose length may shrink gradually depending on how late we leave the forecasting) where it is going to affect people with a high chance. A forecast interval of sorts.

classification conformal predictions coverage prediction sets

20/56