Explainable groupings in the face of noisy, high-dimensional madness: Wild ambitions tamed through features’ salience
Whatever your exact interests in data, frequently, inseparable from model-building, stand other related responsibilities. Sample two crucial ones:
a. the checking of how well your model did: the less frequently you make big, bad decisions – like predicting someone’s salary to be $95,000, an estimate far adrift from the real, say, $70,000 in case it’s a regression problem, or saying a customer will buy a product when, in fact, she won’t, under a classification environment – the happier you are. These accuracies are unsurprisingly, often used to guide the model-building process.
b. the explaining of how you arrived at a prediction: this involves unpacking or interpreting the $95,000. The person, due to his experience, makes $10,000 more than the average, due to his education, makes $20,000 more, but due to his state of residence, makes $5000 less than the average, etc. These ups and downs contribute to a net final value.
Large volumes of data are pouring in every day from scientific experiments like CERN and the Sloan Digital Sky Survey. Data is coming in so fast, that researchers struggle to keep pace with the analysis and are increasingly developing automated analysis methods to aid in this herculean task. As a first step, it is now commonplace to perform dimension reduction in order to reduce a large number of measurements to a set of key values that are easier to visualize and interpret.