It's disheartening to see that the content of this isn't just a "well, duh!" to everyone, and that some people actually need to be convinced.
That said, this article is weird ("theoretical probability" vs. "experimental probability"... huh?) and is not exactly something I would want to share around. It almost seems like the author wrote it for their own benefit, after recently realizing themselves that probability is important and useful/essential in data science, and wanting to share what they learned. I'm not sure it's worth anyone's time to read it.
Yes, it's very weird. I can't tell if the author can't express themselves well enough in English, doesn't understand the topic, or something else happened (editorialization, SEO garbage...)
> Likelihood is applied when you want to increase the chances of a specific event or outcome occurring.
???
> if 12 cars go down a particular road at 11 am every day, we can use Poisson distribution to figure out how many cars go down that road at 11 am in a month.
???
> if we’re looking at how many chocolate bars were sold in a day, we would use the normal distribution. However, if we want to look into how many were sold in a specific hour, we will use t-distribution
I got whiplash from how quickly I went from "awesome, this person is going to explain some nuanced distinction I've never considered!" to "I don't think this makes any sense".
My guess is they're trying to get kids to think about inductive vs. deductive reasoning. It may also set a foundation for understanding the difference between posterior and prior distributions. But it comes across as if the two were different types of object, which feels a little off.
I'm not sure if this is related, but I believe there are very deep open questions in probability (e.g. the interpretation of probabilities based on counterfactuals), and it's what came to mind: https://en.wikipedia.org/wiki/Probability_interpretations
In some sense statistics is like "applied" probability, which does sound a bit like "experimental" probability but that's not what I understand the linked article to mean. My understanding is:
I flip a coin twice. It lands heads-up both times. Then the experimental probability of this coin landing heads-up is 1.
You give me a coin which you guarantee has a 50/50 chance of landing heads-up. The theoretical probability of it landing heads-up twice is 1/4.
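The two notions above can be sketched in a few lines of Python (same coin and same counts as in the example):

```python
from fractions import Fraction

# Experimental probability: the observed frequency in the data we actually have.
flips = ["H", "H"]  # two flips, both heads
experimental_p_heads = Fraction(flips.count("H"), len(flips))
print(experimental_p_heads)  # 1

# Theoretical probability: derived from the assumed model (a fair coin).
p_heads = Fraction(1, 2)
theoretical_p_two_heads = p_heads * p_heads
print(theoretical_p_two_heads)  # 1/4
```

Same coin, two different answers, because they answer two different questions: "what happened in my data?" versus "what does my model say should happen?".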
Either type of probability can be used in statistics.
Dunno; in my college course we were taught that probability is about knowing the probabilistic properties of an experiment and predicting its outcomes, whereas statistics is about knowing the outcomes and predicting the probabilistic properties of the experiment.
Statistics (the old school kind) is just about computing summary metrics for large amounts of data.
Probability is about deriving properties of a sample from the population, based on population parameters.
Statistical inference (what is usually meant by "statistics" these days) is about figuring out what sort of population parameters could have produced the sample at hand -- i.e. probability but backwards.
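A toy illustration of that "probability but backwards" direction, assuming the simplest possible coin model: the forward direction computes the likelihood of data given a parameter, and inference scans parameters for the one that makes the observed data most likely (the MLE, which for a coin is just the sample frequency):

```python
# Probability (forward): given p, how likely is this data?
# Inference (backward): given this data, which p makes it most likely?

def likelihood(p: float, heads: int, tails: int) -> float:
    """Probability of one particular sequence with these counts, under heads-probability p."""
    return p**heads * (1 - p)**tails

observed_heads, observed_tails = 7, 3

# Brute-force scan of candidate parameters; the binomial MLE is the sample frequency.
candidates = [i / 100 for i in range(1, 100)]
mle = max(candidates, key=lambda p: likelihood(p, observed_heads, observed_tails))
print(mle)  # 0.7
```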
Well, I certainly expected more than a few basic definitions.
For instance I was expecting a few case studies where a blind application of sklearn defaults led to issues fixed with proper probabilistic thinking.
I remain unconvinced about the importance of probabilities for data science, even more so when I doubt 1% of practicing data scientists could explain what a measure is.
If only there were people who lived before us who put some ink on dead trees and made an effort to think clearly about probability. We could just read what smart people figured out. Let's start a new field called Human Learning!
The intuitive notion of probability built into our brains doesn't match the mathematical concept, leading to outcomes that sound paradoxical in regular language but are perfectly normal to trained minds. This pops up all the time when data scientists discuss their findings with untrained managers. The longer a mind has used the mathematical notion of probability, the less it remembers the early intuitive one, and it loses contact with regular folks and their perception.
The problem with probability is that it's actually quite a difficult topic. It is not very intuitive to most people, and it's basically the "footgun" of applied math: things can feel right but still be woefully incorrect. There are enough fallacies for everyone.
I'm not saying that this is a "problem" in the sense that one should discard probability and statistics - but rather that in order to truly know what's going on, you need to actually study the topics. Deeper than surface-level knowledge. It's not something you can learn over the course of a tutorial or two.
Statisticians used to be kind of insane wizard figures in a lot of hospitals, because thinking hard about statistics literally requires you to think really hard about unknowns and to simplify/model the entire world using equations. But a lot of that sense of a herculean task has been lost in modern "data science", where neural nets are basically treated as a panacea, there are no measurement errors, and all problems are perfectly predicted.
If a statistician from the 50s walked into a modern startup office, they'd laugh at the lack of validation and testing, and not be at all shocked when none of the predictions worked.
> To break it down - probability is about possible results, whilst likelihood is about hypotheses.
Nooooo. I want to report this article to the fake-news / harmful-speech authorities. Never has a better case been made against free speech and for censorship.
"Theoretical probability vs. experimental probability": take a moment to reflect on that.
We are all drilled to some extent to accept probability theory, but there are some truly mind-bending exercises in interpreting probability.
How do you define probability for an event that can only happen once? E.g., what is the probability of our cosmic microwave background pattern?
How do you define conditional probability when conditioning on the assumption that an impossible event has occurred? E.g., what is the probability of the color of my wall changing, assuming Garfield the cartoon cat entered the room?
> How do you define conditional probability when conditioning on the assumption that an impossible event has occurred? E.g., what is the probability of the color of my wall changing, assuming Garfield the cartoon cat entered the room?
I mean, mathematically you cannot apply a measure to something outside the space the measure is defined on (the event space is a sigma-algebra over the set of possible outcomes, often the power set in the discrete case), so at least from a theory standpoint this question isn't that deep.
Actually it is, since measure theory and the Kolmogorov construction of probability is only one way to build an axiomatic theory of probability.
It just happens to be the one most of us learn first.
It's a bit like asking if the set of all sets that do not contain themselves is contained in itself. The question makes no sense in set theory, but it leads to important extensions.
Disclaimer: Googler here, but not on the TF team and opinions are my own.
Tangential, but related: for the deep learning folks, I recommend having a look at TensorFlow Probability or some PyTorch analogue. For instance, following [1] you'll see that bolting on the capability to model at least aleatoric uncertainty is super simple.
The benefits are enormous. Apart from the technical possibilities such an approach unlocks (e.g. if the output is favorable but the uncertainty is too high, still don't act on the prediction, because the cost of acting on a false positive is much greater than the cost of not acting on a false negative), I found that being able to output the model's uncertainty goes a long way toward convincing non-technical stakeholders to trust predictions.
I didn't know there was a name for this and established set of tools for it. Good things come from even the trashiest of HN front page articles, thank you.
Edit: wow, I don't think I've seen "support vector machine" mentioned even once anywhere in the last 2 years. Times have changed! I wonder how this toolbox has been updated in the age of deep learning and probabilistic programming.
I think this is supposed to be a general technique that can be applied to any classification algorithm. Whether the performance is practical for DNNs is another matter. As I understand it there is ongoing research in the area. I think there is overlap between the inventors of SVMs and conformal prediction, so that might explain why they are mentioned!
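To make the "general technique applicable to any algorithm" point concrete, here's a sketch of split conformal regression (made-up numbers; the underlying fitted model is assumed and irrelevant): hold out a calibration set, take a conformal quantile of the absolute residuals, and use it as a symmetric prediction-interval half-width.

```python
import math

def split_conformal_halfwidth(residuals, alpha=0.2):
    """Half-width q such that a new point falls within ±q of the
    prediction with probability >= 1 - alpha (under exchangeability)."""
    n = len(residuals)
    # Conformal rank: the ceil((n + 1) * (1 - alpha))-th smallest residual.
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(residuals)[k - 1]

# Calibration residuals |y - y_hat| from some already-fitted model.
calibration_residuals = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.8, 0.6, 0.7, 1.0]
q = split_conformal_halfwidth(calibration_residuals, alpha=0.2)

y_hat = 3.0  # model prediction for a new point
print((y_hat - q, y_hat + q))  # the prediction interval
```

Note the model only appears through its residuals, which is why the wrapper is algorithm-agnostic, SVM or DNN alike.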
I wish people would just give simple explanations for this stuff and not make it more complicated than it needs to be.
Something like a visual explanation showing how a conditional distribution is like a single row or column from a 2D table would be so much easier for people to understand.
But also not accurate. Probability interpreted as having to do with sets is just the classical formulation, and really, it's more like measure theory than "probability", which is a specific application.
The bayesian interpretation, which is easily the most general, is happy to take the arguments of the probability function inscrutably as bare propositions.
It's not not accurate, it's just not complete from the perspective of modern theory.
I agree that conditional probability is best demonstrated graphically. The sample space is a big rectangle, and events are represented as polygons or other closed shapes, with area proportional to their probability. Conditional probability elegantly falls out of this representation by treating one of these event-shapes as a new "sub sample space" and computing fractions of areas.
I don't know. I think you need to start with the "world in a rectangle model" to build intuition.
Then you can blow your students' minds with the possibility that the "rectangle" is actually the unknowably massive timeline of the entire universe, from the Big Bang until its heat death, and all of our models and reasoning are conditional on some subset of that gigantic world-rectangle. In my opinion, the power of this philosophical leap is lost without the basic geometric intuition.
It’s not totally accurate, but it gives a pretty good intuition that an average middle school student could easily grasp.
Think about it like PCA. Do you lose some information? Yes. But you can do a lot with only the first few principal components (i.e. basic concepts).
Things like events, sets, measures, etc. add important details: why you have to divide the selected row/column by its sum to get a true conditional distribution; how many dimensions the array should have; the constraints, properties, and relationships of subarrays; and so on. But that all becomes a lot easier to understand once you have the basic intuition. And once you get good enough, you can start to let go of the grounding in arrays, ground in other concepts, and think about it more abstractly.
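The row-of-a-table picture in code, with a made-up joint distribution: conditioning is just selecting a row and renormalising it by the row sum.

```python
from fractions import Fraction

# Joint distribution P(Weather, Mood) as a 2D table (made-up numbers).
moods = ["happy", "grumpy"]
joint = {
    ("sunny", "happy"): Fraction(4, 10),
    ("sunny", "grumpy"): Fraction(1, 10),
    ("rainy", "happy"): Fraction(2, 10),
    ("rainy", "grumpy"): Fraction(3, 10),
}

def conditional_given_weather(w):
    """P(Mood | Weather=w): take the row for w, divide by its sum."""
    row = {m: joint[(w, m)] for m in moods}
    total = sum(row.values())  # P(Weather=w), the normalising constant
    return {m: p / total for m, p in row.items()}

print(conditional_given_weather("rainy"))
# {'happy': Fraction(2, 5), 'grumpy': Fraction(3, 5)}
```

The division by `total` is exactly the "divide the selected row by its sum" step, and it's why the result is a proper distribution that sums to 1.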