
As I have mentioned to you several times, start with Vapnik; that's where modern ML begins. Breiman is great, but he got a few things wrong: for example, he thought boosting and bagging are fundamentally the same, i.e. that they are both just variance minimization methods. In fact boosting is what gives random forests their power. BTW, a less well known trick: use random rotations, not just random selection of features or data points. There's a lot more to ML than decision trees. One of its fundamental frameworks is convergence of empirical processes, which is essentially Glivenko-Cantelli on steroids. Vapnik and Chervonenkis provided the first modern breakthrough result, but a lot has happened since then.
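
A minimal sketch of the random-rotation trick (assuming numpy and scikit-learn, 0/1 labels; the class name RandomRotationForest is just for illustration, not anyone's reference implementation): each tree gets a bootstrap sample seen through its own random orthogonal rotation of feature space.

    # Sketch: random-rotation tree ensemble (illustration only)
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def random_rotation(d, rng):
        # QR of a Gaussian matrix gives a random orthogonal matrix;
        # fixing the signs of R's diagonal makes it Haar-distributed.
        q, r = np.linalg.qr(rng.normal(size=(d, d)))
        return q * np.sign(np.diag(r))

    class RandomRotationForest:
        def __init__(self, n_trees=100, seed=0):
            self.n_trees = n_trees
            self.rng = np.random.default_rng(seed)

        def fit(self, X, y):
            d = X.shape[1]
            self.members = []
            for _ in range(self.n_trees):
                R = random_rotation(d, self.rng)
                idx = self.rng.integers(0, len(X), len(X))  # bootstrap rows
                tree = DecisionTreeClassifier().fit(X[idx] @ R, y[idx])
                self.members.append((R, tree))
            return self

        def predict(self, X):
            # average the trees' 0/1 votes, then threshold at 1/2
            votes = np.mean([t.predict(X @ R) for R, t in self.members], axis=0)
            return (votes >= 0.5).astype(int)

The point is that axis-aligned splits, seen through a random rotation, become oblique splits in the original coordinates, which decorrelates the trees beyond what feature subsampling alone gives.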

I would say screw the hype, just read what it is. I think your disdain for the name 'machine learning' is getting in your way.

I strongly disagree with your allegation of academic dishonesty, though. Optimization has been a fundamental part of ML from the very start, with a strong overlap in the communities. I am sure Nemirovski will ring a bell. In any case, how can ML steal from applied math when it is applied math? The lingo differs here and there: estimating parameters becomes learning parameters, etc.

Andrew Ng's course is not for you. Go directly to the books and then to conference and journal papers.

Focus varies, but info theory, stats, signal processing, and approximation theory are pretty much the same thing, and so are their open problems.

BTW, you could not be more wrong about stochastic optimal control or repeated games being far from current ML. Oh, and another major, major tool in ML is the geometry of function spaces. You are reading the wrong sources, probably the populist ones. Get to the real stuff; given what I know of your taste in math, I think you will enjoy it.



Thanks, I'll keep that in mind.

"glivenko cantelli on steroids", good. Sounds like they actually did something.

Yes, I'm torqued by the new learning labels on old bottles of pure/applied math, but that is not in my way.

> The lingo differs here and there: estimating parameters becomes learning parameters, etc.

Rubs my fur the wrong way.

If they have some stuff beyond borrowing from Breiman, okay.

What's "in my way" now is my startup: I've got the math derived and typed into TeX and the 80,000 lines of typing for the code, with the code running, intended for production, and in alpha test, so just for now I no longer have any pressing math problems to solve.

But, in time, I may return to my math and tweak it a little to try to get some variance reduction. Maybe some of the better machine learning literature would help, or maybe I'll just derive it myself again.

Function space geometry is about where much of my core math is.

Thanks.


Heh! Indeed, it's all geometry :)

Happy to hear back from you. I am actually gladdened that your anomaly detection work has been getting some interest on HN lately. Hope something comes of it. I am now slowly coming to the conclusion that pushing better methods onto an existing stack would be really hard. Too much friction, too much politics. Perhaps the way is to create your own better cloud of servers, but that's really big league stuff. Not sure I have the stomach for that.

Curious whether you have given any thought to the choice of the metric space where you define your statistics. That might play an important role, from what I have seen. There might be an interesting manifold story there.

Big spoilsports are non-stationarities, and even bigger are those fat tails. If only everything had a moment generating function.
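
A tiny illustration of why the fat tails spoil things (a sketch, assuming numpy): the Cauchy distribution has no moment generating function and not even a mean, so its sample average never settles down, while the Gaussian average concentrates like 1/sqrt(n).

    import numpy as np

    rng = np.random.default_rng(0)
    for n in (10**2, 10**4, 10**6):
        # Cauchy: no MGF, no mean -- the running average keeps jumping around
        print("cauchy", n, rng.standard_cauchy(n).mean())
    for n in (10**2, 10**4, 10**6):
        # Gaussian: MGF exists -- the average shrinks toward 0 like 1/sqrt(n)
        print("normal", n, rng.standard_normal(n).mean())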

I see that you have been pointed to Abu-Mostafa; he is definitely a good source. Not that Andrew Ng is unaware of the stuff, far from it; he is fighting a different battle: to make parts of ML a commodity.

If you have time, you can browse the proceedings of COLT (the Conference on Learning Theory) and ICML.

> or maybe I'll just derive it myself again.

You almost always have to derive it yourself anyway even after you have seen the derivation by somebody else.


http://amlbook.com

https://www.youtube.com/watch?v=mbyG85GZ0PI (Incidentally, for graycat: Yaser Abu-Mostafa is a Muslim Egyptian immigrant from Cairo.) He covers the VC dimension in https://www.youtube.com/watch?v=Dc0sr0kdBVI and leaves the proof to an appendix of the book.


Thanks. I looked for a few minutes at two of his lectures. I'll keep the URL and watch his lectures during dinners.

Lucky he got out of Egypt before they strung him up! Such good people are what US immigration has, for some decades now, tried to be about. Maybe we will get back to it.

"Muslim"? I don't care if he is Zoroastrian either. Or worships some sun god. I don't care about his religion. I do care if he wants to blow up buildings. Somehow I doubt if he does.

Looking at his videos, my first-cut, crude guess is that he is looking at modern generalizations of old discriminant analysis. Yup, that can be important. Maybe it could be important for, say, one of my old interests, anomaly detection, as a doable alternative to Neyman-Pearson where often in practice we don't have nearly enough data. Maybe his interest is in medical diagnosis which, IIRC, was some of Breiman's interest.

But, first cut, it looks like, again, the criterion will be: does the model fit the data well? That is, we have little or nothing to recommend the model except that it fits the data well. But, then, in the case of his lectures, it looks like maybe he is making progress toward also knowing that the model will predict well. I'm looking forward to how he does that.

In contrast, if that is important: in my work in anomaly detection, discussed here on HN often enough, I found the false alarm rate from some derivations in applied probability, with no model fitting at all. Okay, I don't care if the cat is black or white as long as it catches mice.

From a glance, it looks like he is addressing what is meant by learning -- terrific! Not just throwing words around! Then he seems to be addressing when such learning is feasible, etc. Sounds good; I've wondered some about something like that.

But my interest now in what he is doing is a bit limited since the core math in my startup seems to be quite different.

Thanks.


> it looks like maybe he is making progress toward also knowing that the model will predict well. I'm looking forward to how he does that.

Yes, that's exactly it. By way of Vapnik and Chervonenkis' result (essentially a uniform law of large numbers), one upper-bounds the expected accuracy (over the unknown distribution) of a classifier in terms of the training error and another quantity that depends on the class of hypotheses that one is using. One can give bounds even when one is using an infinite class, for example all linear functions in the feature space, or some Hilbert space of functions.
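
In symbols, one standard textbook form of the VC bound (constants vary from source to source): with probability at least 1 - \delta over an i.i.d. sample of size n, simultaneously for every hypothesis h in a class of VC dimension d,

    R(h) \le \hat{R}_n(h) + \sqrt{\frac{8}{n} \left( d \ln\frac{2en}{d} + \ln\frac{4}{\delta} \right)}

where R(h) is the expected error over the unknown distribution and \hat{R}_n(h) is the training error; the square-root term is the "other quantity" that depends only on the hypothesis class, not on the distribution.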

This was one of _the_ major early breakthrough results. It's often quoted in the context of ML, but it really is a result in probability theory. Since they bound the most pessimistic situation possible, the bounds are quite loose (although achievable).

It also brought about a paradigm change in mindset. Since the optimal classifier is just the thresholded conditional density, early approaches had focused mostly on estimating this conditional density. But that's an impossible task. V&C showed that even if you do not have enough data to learn the density, you may have more than enough for good prediction accuracy. Don't learn the conditional density; just learn the discriminating function directly by optimizing its expected loss.
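
In one line (notation mine): instead of estimating the conditional density and thresholding it, pick the empirical risk minimizer over the class,

    \hat{h} = \arg\min_{h \in H} \frac{1}{n} \sum_{i=1}^{n} \ell(h(x_i), y_i)

for a loss \ell, and let the uniform convergence result above guarantee that this empirical minimizer also has small expected loss.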

People have since moved to different tools to bound expected prediction accuracy. You get a lot more reasonable bounds with, say, the PAC-Bayesian theorem.
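
For reference, one common McAllester-style form of the PAC-Bayesian bound (again, constants differ between sources): fix a prior \pi over hypotheses before seeing the data; then with probability at least 1 - \delta, simultaneously for all posteriors \rho,

    \mathbb{E}_{h \sim \rho}[R(h)] \le \mathbb{E}_{h \sim \rho}[\hat{R}_n(h)] + \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln(2\sqrt{n}/\delta)}{2n}}

Here the KL divergence from the prior plays the role the VC dimension played above, which is why a well-chosen prior can give much more reasonable numbers.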

The key thing is that these bounds are distribution-independent, non-asymptotic, and also dimensionality-independent.



