Machine learning models are excellent at discovering patterns in data to make predictions. However, their insights are limited to the input data itself. What if we could provide additional knowledge about the model features to improve learning? For example, suppose we have prior knowledge that certain features are more important than others in predicting the target variable. Researchers have developed a new method called the feature-weighted elastic net (“fwelnet”) that integrates this extra feature knowledge to train smarter models, resulting in more accurate predictions than regular techniques.
One of the key goals of science is to create theoretical models that are useful at describing the world we see around us. However, no model is perfect. The inability of models to replicate observations is often called the “synthetic gap.” For example, it may be too computationally expensive to include a known effect or to vary a large number of known parameters. Or, there may be unknown instrumental effects associated with variability in conditions during the data acquisition.
Pinpointing Causality across Time and Geography: Uncovering the Relationship between Airstrikes and Insurgent Violence in Iraq
“Correlation is not causation”, as the saying goes, yet sometimes it can be, if certain assumptions are met. Describing those assumptions and developing methods to estimate causal effects, not just correlations, is the central concern of the causal inference field. Broadly speaking, causal inference seeks to measure the effect of a treatment on an outcome. This treatment can be an actual medicine or something more abstract like a policy. Much of the literature in this space focuses on relatively simple treatments/outcomes and uses data which doesn’t exhibit much dependency. As an example, clinicians often want to measure the effect of a binary treatment (received the drug or not) on a binary outcome (developed the disease or not). The data used to answer such questions is typically patient-level data where the patients are assumed to be independent from each other. To be clear, these simple setups are enormously useful and describe commonplace causal questions.
In responding to a pandemic, time is of the essence. As the COVID-19 pandemic has raged on, it has become evident that complex decisions must be made as quickly as possible, and quality data and statistics are necessary to drive the solutions that can prevent mass illness and death. Therefore, it is essential to outline a robust and generalizable statistical process that can not only help to diminish the current COVID-19 pandemic but also assist in the prevention of potential future pandemics.