Flexible Priors to Predict the Number of Unseen Features
How many species in our ecosystem have not been discovered? How many words did William Shakespeare know but never use in his written works? The unseen species problem has applications in both the sciences and the humanities, and it has been studied since the 1940s. This classical problem has recently been generalized to the unseen features problem. In genomic applications, a feature is a genetic variant relative to a reference genome, and the scientific goal is to estimate the number of new genetic variants that would be observed if we were to collect more samples.
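To get a feel for how such estimates work, here is a minimal Python sketch of the classical Good-Toulmin estimator, one longstanding approach to the unseen species problem (not necessarily the method in the article; the sample data below are made up):

```python
# Toy sketch of the classical Good-Toulmin unseen-species estimator.
# Given n observed samples, it predicts how many NEW species would appear
# in t*n additional samples via an alternating series over the
# "frequency of frequencies" Phi_k = number of species seen exactly k times.

from collections import Counter

def good_toulmin(observations, t):
    """Predict the number of unseen species in t * n more samples.

    observations: list of observed species labels (n samples total).
    t: ratio of future sample size to current sample size (works best for t <= 1).
    """
    counts = Counter(observations)           # species -> frequency
    freq_of_freq = Counter(counts.values())  # k -> number of species seen k times
    max_k = max(freq_of_freq)
    # U(t) = sum_{k>=1} (-1)^{k+1} * t^k * Phi_k
    return sum((-1) ** (k + 1) * t ** k * freq_of_freq.get(k, 0)
               for k in range(1, max_k + 1))

# Hypothetical data: 6 samples, species A seen 3 times, B twice, C once.
samples = ["A", "A", "A", "B", "B", "C"]
print(good_toulmin(samples, 1.0))  # Phi_1 - Phi_2 + Phi_3 = 1 - 1 + 1 = 1.0
```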
A solid which floats in every orientation: An answer to a problem from the Scottish Book
If a solid object floats in water in every position, is it necessarily a sphere? In a paper published this year in the Annals of Mathematics, Dmitry Ryabogin proves the answer is “no”.
The Stellahedral Geometry of Matroids
Some of the hardest questions to answer in math are the simplest to state. For example, “when does a sequence of numbers $a_1, a_2, a_3, \ldots$ have the property that $a_{i}^2 \geq a_{i-1}a_{i+1}$?” A sequence with this property is called “log-concave”. To get familiar with log-concavity, let’s consider the most famous log-concave sequence: a row of Pascal’s triangle.
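To see log-concavity in action, here is a quick check of a row of Pascal's triangle (a minimal sketch using Python's built-in `math.comb`):

```python
from math import comb

def is_log_concave(seq):
    """Check a_i^2 >= a_{i-1} * a_{i+1} for every interior term."""
    return all(seq[i] ** 2 >= seq[i - 1] * seq[i + 1]
               for i in range(1, len(seq) - 1))

row = [comb(8, k) for k in range(9)]  # row 8 of Pascal's triangle
print(row)                  # [1, 8, 28, 56, 70, 56, 28, 8, 1]
print(is_log_concave(row))  # True
```

Every row of Pascal's triangle passes this check, while a sequence that bounces up and down, like 1, 5, 1, 5, fails it.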
How Statistics Can Save Lives in a Pandemic
In responding to a pandemic, time is of the essence. As the COVID-19 pandemic has raged on, it has become evident that complex decisions must be made as quickly as possible, and quality data and statistics are necessary to drive the solutions that can prevent mass illness and death. Therefore, it is essential to outline a robust and generalizable statistical process that can not only help to diminish the current COVID-19 pandemic but also assist in the prevention of potential future pandemics.
Determining the best way to route drivers for ridesharing via reinforcement learning
Companies often want to test the impact of one design decision over another. For example, Google might want to compare the current ranking of search results (version A) with an alternative ranking (version B) and evaluate how the modification would affect users’ decisions and click behavior. An experiment to determine this impact on users is known as an A/B test, and many methods have been designed to measure the ‘treatment’ effect of the proposed change. However, these classical methods typically assume that changing one person’s treatment will not affect others (known as the Stable Unit Treatment Value Assumption, or SUTVA). In the Google example, this is typically a valid assumption—showing one user different search results shouldn’t impact another user’s click behavior. But in some situations, SUTVA is violated, and new methods must be introduced to properly measure the effect of design changes.
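When SUTVA does hold, the treatment effect can be estimated with a simple difference in mean outcomes between the two groups. A minimal simulated sketch (all click rates here are hypothetical, not Google's):

```python
import random

random.seed(0)

# Hypothetical A/B test: outcome is 1 if a user clicks, 0 otherwise.
# Version A has a true click rate of 0.10, version B of 0.12.
group_a = [1 if random.random() < 0.10 else 0 for _ in range(10_000)]
group_b = [1 if random.random() < 0.12 else 0 for _ in range(10_000)]

# Under SUTVA, one user's assignment does not affect another user's outcome,
# so the difference in mean click rates estimates the treatment effect.
ate_hat = sum(group_b) / len(group_b) - sum(group_a) / len(group_a)
print(f"estimated treatment effect: {ate_hat:.3f}")  # close to the true 0.02
```

If SUTVA were violated (say, users share links with each other), this simple difference would be a biased estimate, which is what motivates the new methods the article describes.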
E-values in statistics: apt additions or instruments of generational revolt?
It was never meant to last, you know. Statistical measures have their heydays; permanent relevance is no guarantee. The p-value was – and still is – a tool like no other. Through the years it has been caressed and condemned, worshipped and feared, praised and slandered – all the while standing at the crossroads of almost every hypothesis-testing, modeling, and prediction problem. Operationally, a p-value is convenient: we reject, almost mechanically, our null assumption if this value falls below certain discipline-specific thresholds like 0.01, 0.05, etc. Still, its cumbersome construction, which invites tricky interpretations and striking misuses, frequently lands it on the wrong side of both practitioners and stats purists. Bodies such as the American Statistical Association routinely issue cautions around its use (https://doi.org/10.1080/00031305.2016.1154108). Experts have been hearing its death rattle for quite a while. The article “E-values: calibration, combination, and applications” by V. Vovk and R. Wang could be the final twist of the knife. Here, the authors offer a promising alternative – the e-value – which can coexist with – and, at times, replace – its troubled ancestor.
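To give a flavor of the alternative: Vovk and Wang study calibrators that convert a p-value into an e-value, including the family $f_\kappa(p) = \kappa p^{\kappa-1}$ for $\kappa \in (0,1)$. A minimal sketch (the choice $\kappa = 0.5$ below is just for illustration):

```python
# A p-to-e calibrator from the family studied by Vovk and Wang:
# f_kappa(p) = kappa * p^(kappa - 1), for kappa in (0, 1).
# Any such f turns a valid p-value into an e-value, i.e. a nonnegative
# statistic whose expected value under the null is at most 1.

def p_to_e(p, kappa=0.5):
    """Calibrate a p-value into an e-value; kappa must lie in (0, 1)."""
    assert 0 < kappa < 1 and 0 < p <= 1
    return kappa * p ** (kappa - 1)

for p in (0.05, 0.01, 0.001):
    print(p, round(p_to_e(p), 2))
# 0.05 -> 2.24, 0.01 -> 5.0, 0.001 -> 15.81
```

Large e-values, unlike small p-values, can be multiplied across independent studies, which is one of the properties that makes them attractive.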
Predicting the Future (events)
For quality assessments in reliability and industrial engineering, it is often necessary to predict the number of future events (e.g., system or component failures). Examples include predicting warranty returns and predicting future product failures that could lead to serious property damage and/or human casualties. Business decisions such as product recalls are based on such predictions.
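As a toy illustration (not the article's methodology), suppose failures follow a homogeneous Poisson process; then the observed failure rate can be extrapolated to a future window:

```python
# Toy sketch: predicting future failure counts, assuming (for illustration
# only) that failures arrive as a homogeneous Poisson process. Real warranty
# and reliability predictions use far richer models than this.

def predict_failures(n_observed, t_observed, t_future):
    """Estimate the failure rate from past data, then extrapolate it."""
    rate_hat = n_observed / t_observed  # failures per unit time
    return rate_hat * t_future         # expected failures in the future window

# Hypothetical numbers: 30 failures over 12 months; predict the next 6 months.
print(predict_failures(30, 12.0, 6.0))  # 15.0
```

A decision such as a recall would also need uncertainty bounds around this point prediction, which is exactly where the statistical work lies.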
Improving imaging to get the best possible picture of cancer
Have you ever had to retake a photograph on your phone because the sun was shining way too brightly in the background, causing the subject to appear with a halo? Maybe your arm just couldn’t reach far enough, and those 10 family members just couldn’t fit inside the frame, leaving someone ever so slightly on the outskirts?
Making the Joint Statistical Meetings truly “joint” with a broader audience
I recently got back from the Joint Statistical Meetings in Washington, D.C. where I talked about making audiences concrete and motivating authentic arguments for statistics students (and spread the word about MathStatBites of course). This is a big conference where statisticians from all over the world get together to talk shop, and it was back in person after a few years of going virtual.
Navigating the “Black Hole of Statistics”: Model Selection
A statistical toolbox in some ways is like an endless buffet. There are tons of statistical methods out there, ranging from linear models to statistical tests to neural networks. In addition, with increasing amounts of data, new applications from other fields, and increased computational power, methods are constantly being created or improved upon. Having so many possibilities, of course, has its perks. But researchers inevitably must face this daunting question: what method do you choose and why?