Blog

Whatever your exact interests in data, other responsibilities frequently come hand in hand with model-building. Here are two crucial ones:

a. checking how well your model did: the less frequently you make big, bad decisions (like predicting someone’s salary to be $95,000 when the real value is, say, $70,000 in a regression problem, or saying a customer will buy a product when, in fact, she won’t in a classification setting), the happier you are. Unsurprisingly, these accuracy measures are often used to guide the model-building process.

b. explaining how you arrived at a prediction: this involves unpacking, or interpreting, the $95,000. Due to his experience, the person makes $10,000 more than the average; due to his education, $20,000 more; but due to his state of residence, $5,000 less, and so on. These ups and downs add up to the net final value.
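
The additive breakdown above can be sketched in a few lines of Python. The baseline and per-feature contributions below are illustrative numbers chosen to match the teaser, not output from any real model:

```python
# Hypothetical additive explanation of a salary prediction:
# start from a baseline (here, an assumed population average of $70,000)
# and add one signed contribution per feature.
baseline = 70000
contributions = {
    "experience": 10000,          # makes $10,000 more than the average
    "education": 20000,           # makes $20,000 more
    "state_of_residence": -5000,  # makes $5,000 less
}

# The ups and downs sum to the net final prediction.
prediction = baseline + sum(contributions.values())
print(prediction)  # 95000
```

Methods such as SHAP formalize exactly this kind of per-feature, sum-to-prediction decomposition.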

Read more

Large volumes of data are pouring in every day from scientific experiments like CERN and the Sloan Digital Sky Survey. Data is coming in so fast that researchers struggle to keep pace with the analysis and are increasingly developing automated analysis methods to aid in this herculean task. As a first step, it is now commonplace to perform dimension reduction in order to reduce a large number of measurements to a set of key values that are easier to visualize and interpret.
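
A minimal sketch of one common dimension-reduction method, principal component analysis (PCA), computed here with NumPy’s SVD on simulated data (the dataset and the choice of two components are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # 200 measurements, 10 variables each

# Center the data, then use the singular value decomposition
# to find the directions of greatest variance.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2  # keep two "key values" per measurement
scores = Xc @ Vt[:k].T  # projection onto the top-k principal components
print(scores.shape)  # (200, 2)
```

Each original 10-dimensional measurement is summarized by two coordinates that are easy to plot and interpret.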

Read more

For statistical modeling and analysis, constructing a confidence interval for a parameter of interest is an important inferential task that quantifies the uncertainty around the parameter estimate. For instance, the true average lifetime of a cell phone can be a parameter of interest, which is unknown to both manufacturers and consumers. Its confidence interval can guide manufacturers in determining an appropriate warranty period as well as in communicating device reliability and quality to consumers. Unfortunately, exact methods to build confidence intervals are often unavailable in practice, and approximate procedures are employed instead.
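
One widely used approximate procedure is the normal-approximation interval for a mean. A minimal sketch on simulated lifetimes (the data, sample size, and 95% level are made-up illustrations, not the post’s actual method):

```python
import math
import random

random.seed(1)
# Hypothetical sample of phone lifetimes in years (simulated, not real data).
lifetimes = [random.expovariate(1 / 3.0) for _ in range(400)]

n = len(lifetimes)
mean = sum(lifetimes) / n
var = sum((x - mean) ** 2 for x in lifetimes) / (n - 1)
se = math.sqrt(var / n)  # standard error of the sample mean

# Approximate 95% confidence interval via the normal approximation:
# sample mean plus or minus 1.96 standard errors.
lo, hi = mean - 1.96 * se, mean + 1.96 * se
print((lo, hi))
```

The approximation relies on the central limit theorem, which is exactly where such intervals can fall short for small samples or skewed data.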

Read more

Have you ever wondered what it’s like to run an insurance company? What role does statistics play in insurance company operations and how can its use be profitable? In this article, we’re going to explore property insurance and a very recent improvement in statistical modeling in this area.

Read more

Can you imagine lying really, really still for at least 15 minutes? That is the reality for patients who need to complete a magnetic resonance imaging (MRI) scan. Even if you could keep still for that long, a scan can take anywhere from 15 to 90 minutes! Patients need to lie as still as possible so that the MRI machine can capture the images used to detect and diagnose diseases. Even the tiniest patient movement can distort the final image.

Read more

How many species in our ecosystem have not yet been discovered? How many words did William Shakespeare know but not include in his written works? The unseen species problem has applications in both the sciences and the humanities, and it has been studied since the 1940s. This classical problem has recently been generalized to the unseen features problem. In genomic applications, a feature is a genetic variant relative to a reference genome, and the scientific goal is to estimate the number of new genetic variants that would be observed if we were to collect more samples.
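
One classical answer to the unseen-species question (not necessarily the estimator this post discusses) is the Chao1 estimator, which lower-bounds total species richness using how many species were seen exactly once and exactly twice. A sketch on a made-up sample:

```python
from collections import Counter

# Hypothetical sample of observed species labels (illustrative only).
sample = ["a", "a", "b", "c", "c", "c", "d", "e", "e", "f"]

counts = Counter(sample)
s_obs = len(counts)  # number of distinct species seen so far
f1 = sum(1 for c in counts.values() if c == 1)  # species seen exactly once
f2 = sum(1 for c in counts.values() if c == 2)  # species seen exactly twice

# Chao1 lower-bound estimate of total species richness:
# the rarest observed species carry the signal about the unseen ones.
s_chao1 = s_obs + f1 ** 2 / (2 * f2)
print(s_chao1)  # 8.25: six species observed, about two more estimated unseen
```

The intuition carries over to genomics: variants seen once or twice in current samples drive estimates of how many new variants further sequencing would reveal.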

Read more

If a solid object floats in water in every position, is it necessarily a sphere? In a paper published this year in the Annals of Mathematics, Dmitry Ryabogin proves the answer is “no”. 

Read more

Some of the hardest questions to answer in math are the simplest to state. For example: “when does a sequence of numbers $a_1, a_2, a_3, \ldots$ have the property that $a_{i}^2 \geq a_{i-1}a_{i+1}$?” A sequence having this property is called “log-concave”. To get familiar with log-concavity, let’s consider the most famous log-concave sequence: the sequence given by a row of Pascal’s triangle.
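
The log-concavity of a row of Pascal’s triangle is easy to check numerically; a quick sketch in Python using binomial coefficients (the row $n = 10$ is an arbitrary choice):

```python
from math import comb

n = 10
row = [comb(n, k) for k in range(n + 1)]  # a row of Pascal's triangle

# Verify the log-concavity inequality a_i^2 >= a_{i-1} * a_{i+1}
# at every interior position of the row.
is_log_concave = all(
    row[i] ** 2 >= row[i - 1] * row[i + 1] for i in range(1, len(row) - 1)
)
print(is_log_concave)  # True
```

Of course, checking one row is not a proof; the binomial coefficients are log-concave for every $n$, which can be shown directly from the factorial formula.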

Read more

In responding to a pandemic, time is of the essence. As the COVID-19 pandemic has raged on, it has become evident that complex decisions must be made as quickly as possible, and quality data and statistics are necessary to drive the solutions that can prevent mass illness and death. Therefore, it is essential to outline a robust and generalizable statistical process that can not only help to diminish the current COVID-19 pandemic but also assist in the prevention of potential future pandemics. 

Read more
