It's disheartening to see that the content of this isn't just a "well, duh!" to everyone, and that some people actually need to be convinced.
That said, this article is weird ("theoretical probability" vs. "experimental probability"... huh?) and is not exactly something I would want to share around. It almost seems like the author wrote it for their own benefit, after recently realizing themselves that probability is important and useful/essential in data science, and wanting to share what they learned. I'm not sure it's worth anyone's time to read it.
Yes, it's very weird. I can't tell if the author can't express themselves well enough in English, doesn't understand the topic, or something else happened (editorialization, SEO garbage...)
> Likelihood is applied when you want to increase the chances of a specific event or outcome occurring.
???
> if 12 cars go down a particular road at 11 am every day, we can use Poisson distribution to figure out how many cars go down that road at 11 am in a month.
???
> if we’re looking at how many chocolate bars were sold in a day, we would use the normal distribution. However, if we want to look into how many were sold in a specific hour, we will use t-distribution
I got whiplash from how quickly I went from "awesome, this person is going to explain some nuanced distinction I've never considered!" to "I don't think this makes any sense".
My guess is they're trying to get kids to think about inductive vs. deductive reasoning. It may also set a foundation for understanding the difference between posterior and prior distributions. But it comes across as if the two were different types of object, which feels a little off.
I'm not sure if this is related, but I believe there are very deep open questions in probability (e.g. the interpretation of probabilities based on counterfactuals), and it's what came to mind: https://en.wikipedia.org/wiki/Probability_interpretations
In some sense statistics is like "applied" probability, which does sound a bit like "experimental" probability but that's not what I understand the linked article to mean. My understanding is:
I flip a coin twice. It lands heads-up both times. Then the experimental probability of this coin landing heads-up is 1.
You give me a coin which you guarantee has a 50/50 chance of landing heads-up. The theoretical probability of it landing heads-up twice is 1/4.
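The two notions above can be sketched in a few lines of Python (same coin and same counts as in the example):

```python
from fractions import Fraction

# Experimental probability: the observed frequency in the data we actually have.
flips = ["H", "H"]  # two flips, both heads
experimental_p_heads = Fraction(flips.count("H"), len(flips))
print(experimental_p_heads)  # 1

# Theoretical probability: derived from the assumed model (a fair coin).
p_heads = Fraction(1, 2)
theoretical_p_two_heads = p_heads * p_heads
print(theoretical_p_two_heads)  # 1/4
```

Same coin, two different answers, because they answer two different questions: "what happened in my data?" versus "what does my model say should happen?".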
Either type of probability can be used in statistics.
Dunno; in my college course we were taught that probability is about knowing the probabilistic properties of an experiment and predicting its outcomes, whereas statistics is about knowing the outcomes and predicting the probabilistic properties of the experiment.
Statistics (the old school kind) is just about computing summary metrics for large amounts of data.
Probability is about deriving properties of a sample from the population, based on population parameters.
Statistical inference (what is usually meant by "statistics" these days) is about figuring out what sort of population parameters could have produced the sample at hand -- i.e. probability but backwards.
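A toy illustration of that "probability but backwards" direction, assuming the simplest possible coin model: the forward direction computes the likelihood of data given a parameter, and inference scans parameters for the one that makes the observed data most likely (the MLE, which for a coin is just the sample frequency):

```python
# Probability (forward): given p, how likely is this data?
# Inference (backward): given this data, which p makes it most likely?

def likelihood(p: float, heads: int, tails: int) -> float:
    """Probability of one particular sequence with these counts, under heads-probability p."""
    return p**heads * (1 - p)**tails

observed_heads, observed_tails = 7, 3

# Brute-force scan of candidate parameters; the binomial MLE is the sample frequency.
candidates = [i / 100 for i in range(1, 100)]
mle = max(candidates, key=lambda p: likelihood(p, observed_heads, observed_tails))
print(mle)  # 0.7
```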
Well, I certainly expected more than a few basic definitions.
For instance I was expecting a few case studies where a blind application of sklearn defaults led to issues fixed with proper probabilistic thinking.
I remain unconvinced about the importance of probabilities for data science, even more so when I doubt 1% of practicing data scientists could explain what a measure is.
If only there were people who lived before us who put some ink on dead trees and made an effort to think clearly about probability. We could just read what smart people figured out. Let's start a new field called Human Learning!
The intuitive notion of probability built into our brains doesn't match the mathematical concept, leading to outcomes that sound paradoxical in regular language but are perfectly normal to trained minds. This pops up all the time when data scientists discuss their findings with untrained managers. The longer a mind has used the mathematical notion of probability, the less it remembers the early intuitive one, and it loses contact with regular folks and their perception.
The problem with probability is that it's actually quite a difficult topic. It is not very intuitive to most people, and it's basically the "footgun" of applied math: things can feel right but still be woefully incorrect. There are enough fallacies for everyone.
I'm not saying that this is a "problem" in the sense that one should discard probability and statistics - but rather that in order to truly know what's going on, you need to actually study the topics. Deeper than surface-level knowledge. It's not something you can learn over the course of a tutorial or two.
Statisticians used to be kind of insane wizard figures in a lot of hospitals, because thinking hard about statistics literally requires you to think really hard about unknowns and to simplify/model the entire world using equations. But a lot of that sense of a herculean task has been lost in modern "data science", where neural nets are basically treated as a panacea, there are no measurement errors, and all problems are perfectly predicted.
If a statistician from the 50s walked into a modern startup office, they'd laugh at the lack of validation and testing, and not be at all shocked when none of the predictions worked.
> To break it down - probability is about possible results, whilst likelihood is about hypotheses.
Nooooo. I want to report this article to the fake-news / harmful-speech authorities. Never has a better case been made against free speech and for censorship.
"Theoretical probability vs. experimental probability": take a moment to reflect on that.
We are all drilled to some extent to accept probability theory, but there are some truly mind-bending exercises in interpreting probability.
How do you define probability for an event that can only happen once? E.g., what is the probability of our cosmic microwave background pattern?
How do you define conditional probability when conditioning on the assumption that an impossible event has occurred? E.g., what is the probability of the color of my wall changing, assuming Garfield the cartoon cat entered the room?
> How do you define conditional probability when conditioning on the assumption that an impossible event has occurred? E.g., what is the probability of the color of my wall changing, assuming Garfield the cartoon cat entered the room?
I mean, mathematically you cannot apply a measure to something outside the space the measure is defined on (the event space is a sigma-algebra over the set of possible outcomes, often the power set in the discrete case), so at least from a theory standpoint this question isn't that deep.
Actually it is, since measure theory and the Kolmogorov construction of probability is only one way to build an axiomatic theory of probability.
It just happens to be the one most of us learn first.
It's a bit like asking if the set of all sets that do not contain themselves is contained in itself. The question makes no sense in set theory, but it leads to important extensions.
Disclaimer: Googler here, but not on the TF team and opinions are my own.
Tangential, but related: for the deep learning folks, I recommend having a look at TensorFlow Probability or some PyTorch analogue. For instance, following [1] you'll see that bolting on the capability to model at least aleatoric uncertainty is super simple.
The benefits are enormous. Apart from the technical possibilities such an approach unlocks (e.g. if the output is favorable but the uncertainty is too high, still don't act on the prediction, because the cost of acting on a false positive is much greater than the cost of not acting on a false negative), I found that being able to output the model's uncertainty goes a long way toward convincing non-technical stakeholders to trust predictions.
I didn't know there was a name for this and established set of tools for it. Good things come from even the trashiest of HN front page articles, thank you.
Edit: wow, I don't think I've seen "support vector machine" mentioned even once anywhere in the last 2 years. Times have changed! I wonder how this toolbox has been updated in the age of deep learning and probabilistic programming.
I think this is supposed to be a general technique that can be applied to any classification algorithm. Whether the performance is practical for DNNs is another matter. As I understand it there is ongoing research in the area. I think there is overlap between the inventors of SVMs and conformal prediction, so that might explain why they are mentioned!
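To make the "general technique applicable to any algorithm" point concrete, here's a sketch of split conformal regression (made-up numbers; the underlying fitted model is assumed and irrelevant): hold out a calibration set, take a conformal quantile of the absolute residuals, and use it as a symmetric prediction-interval half-width.

```python
import math

def split_conformal_halfwidth(residuals, alpha=0.2):
    """Half-width q such that a new point falls within ±q of the
    prediction with probability >= 1 - alpha (under exchangeability)."""
    n = len(residuals)
    # Conformal rank: the ceil((n + 1) * (1 - alpha))-th smallest residual.
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(residuals)[k - 1]

# Calibration residuals |y - y_hat| from some already-fitted model.
calibration_residuals = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.8, 0.6, 0.7, 1.0]
q = split_conformal_halfwidth(calibration_residuals, alpha=0.2)

y_hat = 3.0  # model prediction for a new point
print((y_hat - q, y_hat + q))  # the prediction interval
```

Note the model only appears through its residuals, which is why the wrapper is algorithm-agnostic, SVM or DNN alike.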
I wish people would just give simple explanations for this stuff and not make it more complicated than it needs to be.
Something like a visual explanation showing how a conditional distribution is like a single row or column from a 2D table would be so much easier for people to understand.
But also not accurate. Probability interpreted as having to do with sets is just the classical formulation, and really, it's more like measure theory than "probability", which is a specific application.
The bayesian interpretation, which is easily the most general, is happy to take the arguments of the probability function inscrutably as bare propositions.
It's not not accurate, it's just not complete from the perspective of modern theory.
I agree that conditional probability is best demonstrated graphically. The sample space is a big rectangle, and events are represented as polygons or other closed shapes, with area proportional to their probability. Conditional probability elegantly falls out of this representation by treating one of these event-shapes as a new "sub sample space" and computing fractions of areas.
I don't know. I think you need to start with the "world in a rectangle model" to build intuition.
Then you can blow your students' minds with the possibility that the "rectangle" is actually the unknowably massive timeline of the entire universe, from the Big Bang until its heat death, and all of our models and reasoning are conditional on some subset of that gigantic world-rectangle. In my opinion, the power of this philosophical leap is lost without the basic geometric intuition.
It’s not totally accurate, but it gives a pretty good intuition that an average middle school student could easily grasp.
Think about it like PCA. Do you lose some information? Yes. But you can do a lot with only the first few principal components (i.e. basic concepts).
Things like events, sets, measures, etc. add important details: why you have to divide the selected row/column by its sum to get a true conditional distribution; how many dimensions the array should have; the constraints, properties, and relationships of subarrays; and so on. But that all becomes a lot easier to understand once you have the basic intuition. And once you get good enough, you can start to let go of the grounding in arrays, ground in other concepts, and think about it more abstractly.
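The row-of-a-table picture in code, with a made-up joint distribution: conditioning is just selecting a row and renormalising it by the row sum.

```python
from fractions import Fraction

# Joint distribution P(Weather, Mood) as a 2D table (made-up numbers).
moods = ["happy", "grumpy"]
joint = {
    ("sunny", "happy"): Fraction(4, 10),
    ("sunny", "grumpy"): Fraction(1, 10),
    ("rainy", "happy"): Fraction(2, 10),
    ("rainy", "grumpy"): Fraction(3, 10),
}

def conditional_given_weather(w):
    """P(Mood | Weather=w): take the row for w, divide by its sum."""
    row = {m: joint[(w, m)] for m in moods}
    total = sum(row.values())  # P(Weather=w), the normalising constant
    return {m: p / total for m, p in row.items()}

print(conditional_given_weather("rainy"))
# {'happy': Fraction(2, 5), 'grumpy': Fraction(3, 5)}
```

The division by `total` is exactly the "divide the selected row by its sum" step, and it's why the result is a proper distribution that sums to 1.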