Zeros of Random Polynomials and Their Higher Derivatives
Complex polynomials are one of the oldest and most fundamental objects of study in mathematics, and are ubiquitous in applications.
How Adopting Reproducible Practices Can Benefit Data Science Education
As the fields of statistics and data science have grown, the importance of reproducibility in research and easing the “replication crisis” has become increasingly apparent. The inability to reproduce scientific results when using the same data and code may lead to a lack of confidence in the validity of research and can make it difficult to build on and advance scientific knowledge.
Pinpointing Causality across Time and Geography: Uncovering the Relationship between Airstrikes and Insurgent Violence in Iraq
“Correlation is not causation”, as the saying goes, yet sometimes it can be, if certain assumptions are met. Describing those assumptions and developing methods to estimate causal effects, not just correlations, is the central concern of the causal inference field. Broadly speaking, causal inference seeks to measure the effect of a treatment on an outcome. This treatment can be an actual medicine or something more abstract like a policy. Much of the literature in this space focuses on relatively simple treatments and outcomes and uses data that doesn’t exhibit much dependency. As an example, clinicians often want to measure the effect of a binary treatment (received the drug or not) on a binary outcome (developed the disease or not). The data used to answer such questions is typically patient-level data where the patients are assumed to be independent of each other. To be clear, these simple setups are enormously useful and describe commonplace causal questions.
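In that simple setup, and under assumptions such as randomized treatment assignment, the difference in outcome rates between the two groups estimates the causal effect. A minimal sketch on simulated patient-level data (every number here is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical patient-level data: treated = 1 means received the drug,
# disease = True means developed the disease.
treated = rng.integers(0, 2, size=1000)
disease = np.where(treated == 1,
                   rng.random(1000) < 0.10,   # assumed 10% risk with the drug
                   rng.random(1000) < 0.25)   # assumed 25% risk without it

# Risk difference: P(disease | treated) - P(disease | untreated).
# This equals the causal effect only under assumptions like randomization.
risk_treated = disease[treated == 1].mean()
risk_control = disease[treated == 0].mean()
risk_difference = risk_treated - risk_control
print(round(risk_difference, 3))
```

With observational rather than randomized data, this naive contrast is exactly where “correlation is not causation” bites, and more careful methods are needed.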
Explainable groupings in the face of noisy, high-dimensional madness: Wild ambitions tamed through features’ salience
Whatever your exact interests in data, other responsibilities frequently stand inseparable from model-building. Here are two crucial ones:
a. checking how well your model did: the less frequently you make big, bad decisions (like predicting someone’s salary to be $95,000 when the real figure is, say, $70,000 in a regression problem, or saying a customer will buy a product when, in fact, she won’t in a classification problem), the happier you are. These accuracy measures are, unsurprisingly, often used to guide the model-building process.
b. explaining how you arrived at a prediction: this involves unpacking, or interpreting, the $95,000. Due to his experience, the person makes $10,000 more than the average; due to his education, $20,000 more; but due to his state of residence, $5,000 less. These ups and downs contribute to a net final value.
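The salary decomposition above can be sketched as a toy additive attribution. The baseline figure and the exact contributions are illustrative numbers chosen to match the example, not the output of any particular explanation method:

```python
# Hypothetical baseline: the average salary in the data, assumed to be $70,000
# here so that the contributions below reproduce the $95,000 prediction.
baseline_salary = 70_000

contributions = {
    "experience": +10_000,   # makes more than average due to experience
    "education":  +20_000,   # makes more due to education
    "state":       -5_000,   # makes less due to state of residence
}

# The ups and downs sum to the net final value.
prediction = baseline_salary + sum(contributions.values())
print(prediction)  # 95000
```

Methods such as Shapley-value-based attributions produce decompositions of exactly this additive form.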
Pulling Patterns out of Data with a Graph
Large volumes of data are pouring in every day from scientific experiments such as those at CERN and the Sloan Digital Sky Survey. Data is coming in so fast that researchers struggle to keep pace with the analysis and are increasingly developing automated analysis methods to aid in this herculean task. As a first step, it is now commonplace to perform dimension reduction in order to reduce a large number of measurements to a set of key values that are easier to visualize and interpret.
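As a small illustration of that first step, here is a minimal principal component analysis via the SVD on simulated measurements; the data shape, the number of underlying directions, and the noise level are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: 500 observations of 50 correlated measurements that mostly
# vary along 2 underlying directions, plus a little noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 50))

# PCA via the SVD of the centered data: keep the top-2 components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T   # 500 x 2: the "key values" for each observation

# Fraction of total variance captured by the two retained components.
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(scores.shape, round(explained, 3))
```

Here 50 measurements per observation collapse to 2 key values while retaining nearly all of the variance, which is exactly what makes the reduced data easier to visualize and interpret.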
New Methods for Calculating Confidence Intervals
For statistical modeling and analyses, construction of a confidence interval for a parameter of interest is an important inferential task to quantify the uncertainty around the parameter estimate. For instance, the true average lifetime of a cell phone can be a parameter of interest, which is unknown to both manufacturers and consumers. Its confidence interval can guide the manufacturers to determine an appropriate warranty period as well as to communicate the device reliability and quality to consumers. Unfortunately, exact methods to build confidence intervals are often unavailable in practice and approximate procedures are employed instead.
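One widely used approximate procedure is the normal-approximation interval for a mean. A minimal sketch, using simulated phone lifetimes in place of real warranty data (the sample size and the exponential lifetime model are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sample of 200 phone lifetimes in years; the true mean,
# unknown in practice, is 3 here by construction.
lifetimes = rng.exponential(scale=3.0, size=200)

# Approximate 95% confidence interval for the mean lifetime,
# based on the central limit theorem.
n = len(lifetimes)
mean = lifetimes.mean()
se = lifetimes.std(ddof=1) / np.sqrt(n)
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The approximation relies on a reasonably large sample; for small samples or skewed data, intervals like this can miss their nominal coverage, which motivates the search for better methods.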
How do insurance companies make profits?
Have you ever wondered what it’s like to run an insurance company? What role does statistics play in insurance company operations and how can its use be profitable? In this article, we’re going to explore property insurance and a very recent improvement in statistical modeling in this area.
When Physics and Engineering Imaging Solutions Collide in MRI Scans
Can you imagine lying really, really still for at least 15 minutes? That is the reality of patients who need to complete a magnetic resonance imaging (MRI) scan. Even if you could keep still for that long, a scan can take anywhere from 15 to 90 minutes! Patients need to lie as still as possible so that the MRI machine can capture images used to detect and diagnose diseases. Even the tiniest patient movement can distort the final image that is returned.
How to Learn about Housing Dynamics when You Don’t Have Housing Data
Data surrounds us in many aspects of our lives. We look at ratings on Amazon to determine whether to buy a product. We use Fitbits to track our step count. We browse Netflix recommendations generated using our streaming history. Everywhere, decisions are being made from numbers and data. However, while it seems like we can get data on anything, some datasets are much easier to collect than others.