Profiling Target with BoxPlots
What is this about?
Using boxplots in variable importance analysis gives a quick view of how the quartiles differ across the values of a binary target variable.
## Loading funModeling !
library(funModeling)
data(heart_disease)
plotar(data=heart_disease, str_input="age", str_target="has_heart_disease", plot_type = "boxplot")
The rhomboid near the median line represents the mean.
When to use boxplots? When you need to analyze different percentiles across the classes to predict. This is a powerful technique because outliers bias percentiles much less than they bias the mean.
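The robustness claim above can be checked with a minimal base R sketch. The values below are made up for illustration (they are not taken from heart_disease), and the variable names are hypothetical.

```r
# Made-up toy values, for illustration only
ages <- c(40, 45, 50, 55, 60)
ages_with_outlier <- c(ages, 500)  # add one extreme, artificial value

# How far does each summary move once the outlier is added?
mean_shift <- mean(ages_with_outlier) - mean(ages)        # mean moves by 75
median_shift <- median(ages_with_outlier) - median(ages)  # median moves by 2.5

print(c(mean_shift = mean_shift, median_shift = median_shift))
```

A single extreme value drags the mean far away, while the median (the line drawn inside each box) barely moves.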
Boxplot: Good vs. Bad variable
Passing more than one input variable makes it easy to compare boxplots quickly and thus pick out the best variables.
plotar(data=heart_disease, str_input=c('max_heart_rate', 'resting_blood_pressure'), str_target="has_heart_disease", plot_type = "boxplot")
max_heart_rate is clearly a better predictor than
As a general rule, a variable will rank as more important if its boxplots are not aligned horizontally, that is, if the boxes for each target class sit at clearly different heights.
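One rough way to put numbers on this rule is to compute the quartiles of an input separately for each target class. The sketch below uses only base R; quartiles_by_class is a hypothetical helper (not part of funModeling) and the data frame is made up for illustration.

```r
# Hypothetical helper, not part of funModeling:
# returns the 25%, 50% and 75% quantiles of x for each class in target
quartiles_by_class <- function(x, target) {
  sapply(split(x, target), quantile, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
}

# Made-up toy data, for illustration only
toy <- data.frame(
  value = c(1, 2, 3, 4, 5, 11, 12, 13, 14, 15),
  target = rep(c("no", "yes"), each = 5)
)

q <- quartiles_by_class(toy$value, toy$target)
print(q)  # medians are 3 for "no" and 13 for "yes": boxes far apart
```

With funModeling loaded, the same helper could presumably be called as quartiles_by_class(heart_disease$max_heart_rate, heart_disease$has_heart_disease). Quartiles that barely differ between classes correspond to boxes that line up horizontally.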
Statistical tests: percentiles are another feature that such tests use to determine, for example, whether the means across groups are the same.
cross_plot can handle from 1 to N input variables, and plots generated by them can be easily exported in high quality with parameter
plotar(data=heart_disease, str_input=c('max_heart_rate', 'resting_blood_pressure'), str_target="has_heart_disease", plot_type = "boxplot", path_out = "my_awsome_folder")
- Keep this in mind when using histograms and boxplots. They are nice to see when the variable:
- Has a good spread (not concentrated on just 3, 4 or 6 different values), and
- Has no extreme outliers (this can be treated with the prep_outliers function present in this package)
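Both conditions can be screened before plotting. The following is a minimal sketch in base R: boxplot_suitability is a hypothetical helper (not part of funModeling), the vectors are made up, and outliers are counted with the default whisker rule of base R's boxplot.stats (coef = 1.5), which is an assumption about what counts as "extreme".

```r
# Hypothetical helper, not part of funModeling:
# counts distinct values and points flagged as outliers by boxplot.stats
boxplot_suitability <- function(x) {
  x <- x[!is.na(x)]
  list(
    n_unique = length(unique(x)),
    n_outliers = length(boxplot.stats(x)$out)
  )
}

# Made-up vectors, for illustration only
few_values <- c(3, 3, 4, 4, 6, 6, 3, 4)              # only 3 distinct values
with_outlier <- c(10, 12, 11, 13, 12, 14, 11, 100)   # one extreme value

res_few <- boxplot_suitability(few_values)
res_out <- boxplot_suitability(with_outlier)
print(res_few)
print(res_out)
```

A very low n_unique suggests the variable is too concentrated for a boxplot to be informative, and a non-zero n_outliers suggests treating the outliers first.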