Category Archives: Uncategorized

As the fields of statistics and data science have grown, the importance of reproducibility in research and easing the “replication crisis” has become increasingly apparent. The inability to reproduce scientific results when using the same data and code may lead to a lack of confidence in the validity of research and can make it difficult to build on and advance scientific knowledge.

Read more

“Correlation is not causation”, as the saying goes, yet sometimes it can be, if certain assumptions are met. Describing those assumptions and developing methods to estimate causal effects, not just correlations, is the central concern of the causal inference field. Broadly speaking, causal inference seeks to measure the effect of a treatment on an outcome. This treatment can be an actual medicine or something more abstract like a policy. Much of the literature in this space focuses on relatively simple treatments/outcomes and uses data which doesn’t exhibit much dependency. As an example, clinicians often want to measure the effect of a binary treatment (received the drug or not) on a binary outcome (developed the disease or not). The data used to answer such questions is typically patient-level data where the patients are assumed to be independent from each other. To be clear, these simple setups are enormously useful and describe commonplace causal questions.

Read more

This MathStatBites post is a bit meta; this time around we are considering how you might use posts on this site in your teaching. MathStatBites posts are great for those who want to get a sense of what is happening in the latest mathematics and statistics research but who may not have a detailed background on the particular topic at hand. Beyond reading for fun or self study, they can also be a great resource for teachers looking to introduce their students to research and communication strategies for reaching broader audiences.

Read more

Whatever your exact interests in data, frequently, inseparable from model-building, stand other related responsibilities. Sample two crucial ones:

a. the checking of how well your model did: the less frequently you make big, bad decisions – like predicting someone’s salary to be $95,000, an estimate far adrift from the real, say, $70,000 in case it’s a regression problem, or saying a customer will buy a product when, in fact, she won’t, under a classification environment – the happier you are. These accuracies are unsurprisingly, often used to guide the model-building process.

b. the explaining of how you arrived at a prediction: this involves unpacking or interpreting the $95,000. The person, due to his experience, makes $10,000 more than the average, due to his education, makes $20,000 more, but due to his state of residence, makes $5000 less than the average, etc. These ups and downs contribute to a net final value.

Read more

Large volumes of data are pouring in every day from scientific experiments like CERN and the Sloan Digital Sky Survey. Data is coming in so fast, that researchers struggle to keep pace with the analysis and are increasingly developing automated analysis methods to aid in this herculean task. As a first step, it is now commonplace to perform dimension reduction in order to reduce a large number of measurements to a set of key values that are easier to visualize and interpret.

Read more

For statistical modeling and analyses, construction of a confidence interval for a parameter of interest is an important inferential task to quantify the uncertainty around the parameter estimate. For instance, the true average lifetime of a cell phone can be a parameter of interest, which is unknown to both manufacturers and consumers. Its confidence interval can guide the manufacturers to determine an appropriate warranty period as well as to communicate the device reliability and quality to consumers. Unfortunately, exact methods to build confidence intervals are often unavailable in practice and approximate procedures are employed instead.

Read more

Have you ever wondered what it’s like to run an insurance company? What role does statistics play in insurance company operations and how can its use be profitable? In this article, we’re going to explore property insurance and a very recent improvement in statistical modeling in this area.

Read more

Can you imagine lying really, really still for at least 15 minutes? That is the reality of patients who need to complete a magnetic resonance imaging (MRI) scan. Even if you could keep still for that long, a scan could take up to 15 – 90 minutes! Patients need to lie as still as possible so that the MRI machine can capture images used to detect and diagnose diseases. Even the tiniest patient movement can distort the final image that is returned. 

Read more

How many species in our ecosystem have not been discovered? How many words did Williams Shakespeare know but not include in his written works? The unseen species problem has applications in both sciences and humanities, and it has been studied since the 1940s. This classical problem is recently generalized to the unseen features problem. In genomic applications, a feature is a genetic variant compared to a reference genome, and the scientific goal is to estimate the number of new genetic variants to be observed if we were to collect more samples.

Read more

40/50