Hacker News | HighFreqAsuka's comments

Take a look at The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text (https://arxiv.org/pdf/2506.05209). They build a reasonable 7B parameter model using only open-licensed data.


They mostly do. They did risk legal contamination by including Whisper-transcribed text and web text that may carry licensing gotchas. Other than that, it's a great collection for low-risk training.


> Here’s what we’ve found out: It’s not about the cocoa-beans, but about the way they are treated during the manufacturing process.

I eat a lot of high end single origin chocolate bars, and I simply don’t believe this. Two bars from the same brand at the same percentage of cocoa content using different beans taste completely different. In exactly the same way as wine or coffee. It’s one of the most interesting parts of eating good chocolate. I just don’t believe this approach will ever replace my chocolate consumption, but may have a shot at the larger market of bad chocolate bars.


What are some of your choice bars? Recently I tried a few single origin bars and was floored by how different they tasted amongst themselves, and how dramatically different they were to your run-of-the-mill Godiva or Lindt.


Fruition is my entry recommendation. Every bar they make is good. Then Castronova, Goodnow Farms, Askinosie, Soma, and Dick Taylor. A good heuristic for evaluating a new brand is if the package tells you where the beans are from and if the ingredients list is “cocoa beans, sugar”.

EDIT: I realize you asked about bars not brands, but I’m in transit and brands was easier than individual bars. I’m a huge fan of the Askinosie orange bar in particular.


How much of this is variation between farms and how much is variation between batches? If the taste of every batch (a given farm's crop in a given season) were effectively random, we would also see large variations from farm to farm. The theory you are suggesting is that a given farm (or a given country, or a given brand) is long-term consistent and distinct from the others, but a double-blind taste test at low N would be insufficient to show that; you would need to test and quantify the other forms of variation as well.


I'm likewise extremely skeptical, but I'd still like to taste it.

Maybe a replacement where chocolate isn't the main ingredient?


We've collectively failed at this problem as a society.

We've: 1. Made it extremely difficult, even illegal, to build denser housing. 2. Devalued the status of electricians, plumbers, and carpenters, leading to a shortage of the people we need to build more homes. 3. Made it much too easy to get very large mortgage loans, incentivizing people to leverage themselves much more than they should to purchase homes, bidding up the price of homes as otherwise financially smart people are forced to play the game as well. 4. Built software to collectively price-fix rents, favoring higher rents over maximum occupancy.

All of this has turned housing into an asset class, in which a significant fraction of the average American's net worth is invested, and has led to huge inflows of investor money. The incentives to not fix any of this are very strong.


How is this any fault of “society’s” beyond the requirement that you must live in or close to certain cities/areas to make “good money” in any non-law/non-medical profession? I’m going to lookout for my family without any malicious intent towards others. Other people might have different values and they are free to live, vote, etc where and how they want. As a resident of the Phoenix metro area I wouldn’t expect to be able to have any impact on how Santa Monica is building.


The current state of affairs is the direct result of government policy and investment, starting with mid-century "urban renewal" (which demolished existing dense and close-in housing and infrastructure), continuing with government bias towards suburbanization through municipal bonds and auto infrastructure spending, and most recently with the large tax and subsidy benefits that go to homeowners, particularly first-time purchasers. Zoning and "character" codes factor in as well. Private interests were certainly involved (particularly the realty industry's contribution to white flight and the banking industry's mortgage financing schemes), but much of this is very much "society's" fault. I go by the "The Devil Wears Prada" theory of decision-making: if you picked it off the rack, or see a lot of other people doing the same thing, question who made the decision for you.


Well in Phoenix's case the big failure is continuing to subsidize water for agricultural usage. That is a choice society has made and it is causing housing in the valley to get more expensive as new builds are starting to be banned over water rights.


Canada only did 1, but not really 2 or 3 as much as the states, and the mess is even worse.

What it always comes down to is a pretty basic supply and demand problem in my head. Build a surplus of housing, and even the (4) algorithms will push rent and prices lower.


Did you not read the word “potentially”? Topological spaces are a more general class of spaces that includes the discrete case as a special case.


By having nearly no volume to begin with. Most US stocks already trade across many of the ~13 US exchanges, so what you're describing is already the status quo. Under Reg NMS, traders cannot trade through protected quotes, so trading firms already have to be aware of the prices on all US exchanges.

But LTSE has so little volume that you frankly forget it exists most of the time.


The exclusivity and (assumed) low volume is what made me think of arb opportunities in the first place :) I doubt I'll be getting an invitation to trade.


I haven't tried, but I believe that upon request, your brokerage firm will direct trades to LTSE or any venue of your choosing.


No, there are many very mathematically inclined deep learning researchers. It's an empirical science because the mathematical tools we possess are not sufficient to describe the phenomena we observe and make predictions under one unified theory. Being an empirical science does not mean that the field is a "wild west". Deep learning models are amenable to repeatable, controlled experiments, from which you can improve your understanding of what will happen in most cases. Good practitioners know this.


>It's an empirical science because the mathematical tools we possess are not sufficient to describe the phenomena we observe and make predictions under one unified theory.

To me, deep learning is itself a [long-awaited] tool (with well-established, and simple at that, math underneath: gradient-based optimization, vector-space representation, and compression) for making real progress toward mathematical foundations of the empirical science of cognition.

In the '90s there was work showing, for example, that the Gabor filters in the first layer of the biological visual cortex are optimal for the kind of feature-based image recognition we perform. And as it happens, the convolution kernels in the first layers of deep visual networks also converge to Gabor-like filters. I see [signs of] similar convergence in the other layers (and the semantically meaningful vector operations in the embedding space of LLMs are also very telling). Proving optimality or anything similar is much harder there, yet to me those "repeatable controlled experiments" (i.e., stable convergence) strongly indicate that it will hold: something drives that convergence, and when there is such a drive in a dynamical system you naturally end up asymptotically "attracted" to something either fixed or periodic. That would be a (or even "the") mathematical foundation for understanding cognition. (Divergence from real biological cognition, i.e., the emergence of a completely different yet comparable type of cognition, would also be a great result, if not an even greater one.)
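For anyone unfamiliar with the term: a Gabor filter is just a sinusoidal grating under a Gaussian envelope. A minimal numpy sketch (size, wavelength, and bandwidth below are illustrative choices of mine, not values from any of the cited work):

```python
import numpy as np

def gabor(size=11, wavelength=4.0, theta=0.0, sigma=2.5):
    # 2-D Gabor kernel: a cosine grating oriented at angle `theta`,
    # windowed by an isotropic Gaussian envelope. This is the shape
    # first-layer visual filters are often observed to converge to.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the grating
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

k = gabor()
print(k.shape)  # (11, 11); k[5, 5] is the center, where envelope and cosine are both 1
```

Plotting a few of these at different orientations and wavelengths produces images strikingly similar to published visualizations of trained first-layer conv kernels.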


The main point you're making is fair

The only gripe I have is with:

> Being an empirical science does not mean that the field is a "wild west"

I think what you meant to say is: "Being an empirical science does not *necessarily* mean that the field is a 'wild west'"

you clearly haven't seen the social sciences

> Good practitioners know this

sure?

Edit: Removed unnecessary portions that wouldn't have continued the conversation in any meaningful way


I think the necessarily is clearly implied from context.


I've seen quite a few of these books attempting to explain deep learning from a mathematical perspective, and it always surprises me. Deep learning is clearly an empirical science for the time being, and there is very little theoretical work impactful enough that I would think to include it in a book. Of the such books I've seen, this one seems like actively the worst. A significant amount of space is dedicated to proving lemmas that provide no additional understanding and are only loosely related to deep learning. And a significant chunk of the code is just plotting code, which I don't even understand why you'd include. I'm confident that very few people will ever read significant chunks of this.

I think the best textbooks are still Deep Learning by Goodfellow et al. and the more modern Understanding Deep Learning (https://udlbook.github.io/udlbook/).


This book is not aimed at practitioners, but I don't think that means it deserves to be called "actively the worst one".

Even though the frontier of deep learning is very much empirical, there’s interesting work trying to understand why the techniques work, not only which ones do.

I'm sorry, but saying proofs are not a good method for gaining understanding is ridiculous. Of course it's not great for everyone, but a book titled "Mathematical Introduction to X" is obviously for people with some mathematical training. For that kind of audience, lemmas and their proofs are a natural way of building understanding.


Just read the section on ResNets (Section 1.5) and tell me if you think that's the best way to explain ResNets to literally anyone. Tell me if, from that description, you take away that the reason skip connections improve performance is that they improve gradient flow in very deep networks.


> the reason skip connections improve performance is that they improve gradient flow in very deep networks.

Can you prove this statement?


Neither do the authors in the book, and I'd argue that after (only) reading the book, the reader wouldn't be equipped to attempt this either (see my other post in this thread), so I think the parent poster has a point.


Yes, I have a very good point in fact. But the above comment purposely chooses not to argue with it, because it's easier to ignore it entirely and argue something else.


The problem is you presented something as a fact while it’s just a guess. Some people guess it’s an improved gradient flow, others guess it’s a smoother loss surface, someone else guesses it’s a shortcut for early layer information to reach later layers, etc. We don’t actually know why resnets work so well.


The point of that comment doesn't have anything to do with how ResNets actually work. You missed the actual point.

> We don’t actually know why resnets work so well.

Yes, actually, we do. We know from the literature that very deep neural networks suffered from vanishing gradients in their early layers, in the same way traditional RNNs did. We know that was the motivation for introducing skip connections, which gives us a hypothesis we can test. We can measure, using the test I described, the difference in the size of the gradients in the early layers with and without skip connections. We can do this across many different problems for additional statistical power. We can analyze the linear case and see that the repeated matmuls should lead to small gradients when their singular values are small. To ignore all of this and say that, well, we don't have a general proof that satisfies a mathematician, so I guess we just don't know, is silly.


You're doing it again: presenting guesses as facts. Why would a ResNet, a batch-normalized network using ReLU activations, suffer from the vanishing gradient problem? Does it? Have you actually done the experiment you described? I have, and I didn't see gradients vanish. Sometimes gradients exploded, likely from a bad weight initialization (to be clear, that's a guess), and sometimes they didn't; but even when they didn't, the networks never converged. The best we can do is to say: "skip connections seem to help training deep networks, and we have a few guesses as to why, none of which is very convincing".

> We know, from the literature

Let's look at the literature:

1. Training Very Deep Neural Networks: Rethinking the Role of Skip Connections (https://orbilu.uni.lu/bitstream/10993/47494/1/OyedotunAl%20I...): hypothesizes that skip connections might help prevent the transformation of activations into singular matrices, which in turn could lead to unstable gradients (or not; it's a guess).

2. Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization (https://openreview.net/pdf?id=LJohl5DnZf): hypothesizes an optimal information flow through the network, and that a particular form of regularization helps improve that flow (no skip connections needed).

3. Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers (https://arxiv.org/abs/2203.08120): focuses on initial conditions and proposes better activation functions.

Clearly the issues are a bit more complicated than the vanishing gradients problem, and each of these papers offer a different explanation of why skip connections help.

It's similar to people building a bridge in the 15th century: there was empirical evidence and intuition about how bridges should be built, but very little theory explaining that evidence or intuition. Your statements are like "next time we should make the support columns thicker so that the bridge doesn't collapse", when in reality it collapsed due to resonant oscillations induced by people marching on it in unison. Thicker columns will probably help, but they do nothing to improve understanding of the issue. They are just a guess.

That's why we need mathematicians looking at it, and attempting to formalize at least parts of the empirical evidence, so that someone, some day, will develop a compelling theory.


Empirically yes, I can consider a very deep fully-connected network, measure the gradients in each layer with and without skip connections, and compare. I can do this across multiple seeds and run a statistical test on the deltas.
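A minimal sketch of that comparison in the linear case (toy numpy; the width, depth, and initialization scale below are my own illustrative choices): the input gradient through a deep linear stack is the product of the layer Jacobians, so we can compare the norm of that product with plain layers versus identity skip connections.

```python
import numpy as np

rng = np.random.default_rng(0)
d, depth = 64, 50  # width and depth of the toy linear network

# Weights scaled so each plain layer shrinks a vector's norm by ~0.5
# in expectation; with a skip connection the identity term dominates.
Ws = [rng.normal(0.0, 0.5 / np.sqrt(d), (d, d)) for _ in range(depth)]

def input_jacobian(residual: bool) -> np.ndarray:
    # Jacobian of the network output w.r.t. its input:
    # prod(W_i) for plain layers, prod(I + W_i) for residual layers.
    J = np.eye(d)
    for W in Ws:
        J = ((np.eye(d) + W) if residual else W) @ J
    return J

plain = np.linalg.norm(input_jacobian(residual=False))
resid = np.linalg.norm(input_jacobian(residual=True))
print(f"plain ||J|| = {plain:.2e}, residual ||J|| = {resid:.2e}")
```

With these settings the plain product's norm collapses toward zero (vanishing gradient) while the residual product stays well-conditioned; a full experiment would repeat this across seeds and with nonlinearities, as described above.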


Empirical studies are only useful until the system is mathematically understood. For example, I can construct transformer circuits where the skip connection (provably) purely adds noise.

I can also prove in particular cases the MLP's sole purpose is to remove the noise added from the skip connection.


UDL has some dense math notation in it.

Math isn't just about proofs. It's a way to communicate. There are several different ways to communicate how a neural net functions. One is with pictures. One is with some code. One is with words. One is with some quite dense math notation.


I would say UDL should be very accessible to any undergrad from a strong program.

I would not call the notation "dense"; rather, it's "abused" notation. Once you have seen the abused notation enough times, it just makes sense. Aka "mathematical maturity" in the ML space.

My views on this have changed: as a first-year PhD student in ML I got annoyed by the shorthand. Now, as someone with a PhD, I get it. It's just too cumbersome to write out exactly what you mean, and you write as if for peers, plus or minus a level.


I agree with that, I think UDL uses the necessary amount of math to communicate the ideas correctly. That is obviously a good thing. What it does not do is pretend to be presenting a mathematical theory of deep learning. Basically UDL is exactly how I think current textbooks should be presented.


I think the mathematical background starts making sense once you get a good understanding of the topic. People then make the wrong assumption that understanding the math will help with learning the overall topic, but that's usually pretty hard.

Rather than trying to form an intuition based on the theory, it's often easier to understand the technicalities after getting an intuition. This is generally true in the exact sciences, especially mathematics. That's why examples are helpful.


Looks like you left the submissions instructions for AISTATS on the last page of the PDF. Don't know if that was intentional but I'm guessing it wasn't.


Ugh, LaTeX is such a pain sometimes. Thanks for catching that.


Reading these comments is starting to feel like an https://xkcd.com/2304/ situation.


There's a theory, price-over-volume, that the sudden shift in demand and expectation of inflation gave companies room to explore a different point on the revenue curve, where they increase the price and simply sell less volume. Prior to the pandemic this was risky and people assumed they were near optimal already. During the pandemic a bunch of companies learned they could push on price, sell less, but still make more revenue. All companies did this independently and simultaneously so the usual competitive effects didn't kick in. And now we're at a new equilibrium that the few companies in each industry are happy with.
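A back-of-the-envelope illustration of that trade-off (all numbers and the constant-elasticity demand curve are made-up assumptions, purely for arithmetic): if demand is inelastic enough, a price hike that sheds some volume still raises total revenue.

```python
def revenue(price, base_price=10.0, base_volume=100.0, elasticity=-0.5):
    # Constant-elasticity demand: volume scales as (price/base)^elasticity.
    # elasticity = -0.5 means demand is inelastic (|e| < 1), so revenue
    # rises with price even as unit volume falls.
    volume = base_volume * (price / base_price) ** elasticity
    return price * volume, volume

before, _ = revenue(10.0)          # baseline: $10 x 100 units = $1000
after, vol = revenue(12.0)         # 20% price hike, some volume lost
print(f"revenue at $10: {before:.0f}; at $12: {after:.0f} on {vol:.1f} units")
```

The pre-pandemic risk the comment describes is that firms didn't know their elasticity and assumed they were already near the revenue-maximizing point; the demand shock let them probe it.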


Plenty of industries raised prices with very little effect on volume. However, it was not truly "independent": companies can easily coordinate by sending signals via forecasts, press releases, news leaks, etc. If P&G and Unilever both decide to raise the price of soap and toothpaste, are you going to shower or brush less? Meanwhile, demand exploded for discretionary spending like travel, restaurants, and recreation despite price increases.


> however, it was not truly "independent",

Of course I don't mean to imply they operate in a vacuum. They can obviously see competitors raising their prices. I just mean to say they don't get into a room and actively decide on a price.


Except price fixing does occur. I'm not saying everything is the result of it, but companies absolutely do fix prices.


Just for clarity, the linked paper in the twitter thread is "An autonomous laboratory for the accelerated synthesis of novel materials" (https://www.nature.com/articles/s41586-023-06734-w) which does have two authors from DeepMind but seems to be mostly from material science researchers at UC Berkeley. This thread is not about the recent Nature paper "Scaling deep learning for materials discovery" (https://www.nature.com/articles/s41586-023-06735-9) from Deepmind which made news a few days ago.

