https://nightingaledvs.com/ive-stopped-using-box-plots-should-you/
Thinking about this argument against box plots:
It's true that there are many distributions that are not well-suited to box plots, and that it's easy to concoct examples that are poorly/confusingly represented by box plots.
My counterpoints:
* Being able to come up with an example where a box plot is misleading reminds me of Anscombe's Quartet (https://en.wikipedia.org/wiki/Anscombe%27s_quartet). You will *always* be able to do that -- at a super high conceptual level, any time you compress or throw away data, as one does when reducing a set of numbers to its mean or standard deviation, there's wiggle room that can hide stuff. It's like projecting points in the plane onto a line -- you lose something.
A box plot, abstractly, is like projecting a set of N values down to roughly 5 numbers (the median, the two quartiles, and the two whisker ends), so you'll always have room to get some kind of Anscombe's Quartet behavior.
* Box plots work pretty well for symmetric, unimodal distributions -- "normal-ish", you might say. On the one hand, that's a big limitation; on the other hand, there are a LOT of those distributions in practice!
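To make the "N values down to 5 numbers" point concrete, here's a minimal sketch using only Python's standard library. The two datasets are made up by hand for illustration, and `five_number_summary` is my own helper name, not anything from the linked article. It uses the "inclusive" quartile convention; plotting libraries may compute quartiles a bit differently.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), roughly what a box plot draws."""
    data = sorted(values)
    # method="inclusive" is one of several quartile conventions;
    # it is an assumption here, not the only valid choice.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])


# Hypothetical data, chosen by hand for illustration.
spread_out = [0, 2, 4, 4.5, 5, 5.5, 6, 8, 10]  # values spread across the range
two_clumps = [0, 4, 4, 4, 5, 6, 6, 6, 10]      # most values piled up at 4 and 6

print(five_number_summary(spread_out))
print(five_number_summary(two_clumps))
```

Both calls produce the same summary, so under this convention the two box plots would look the same even though one dataset is spread out and the other sits in two clumps.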
So. I don't think the argument there is strong enough to *entirely* abandon box plots, but it is correct that these days we have lots of other, better options.
#statistics #data #data_visualization