
To add a little context, this suspension comes immediately after Anna's Archive publicly implicated themselves in the Spotify scraping "hack", in which they downloaded nearly the entire content library of Spotify and were preparing to release it publicly (~300TB worth) via torrent.

They published a blog post outlining their plans.



Did the operators _want_ to poke the well connected & well funded bear with a historical anger problem?


No, but they weren't not going to, given that their mission is to archive all cultural content, by hook or by crook.


Archiving it and publishing it are different things.

More importantly, they may sabotage their mission: If Spotify shuts them down, their existing archives and especially future archives may be effectively lost.


I guess I should say more accurately: Their mission is to both archive it and publish it. They seem to be explicitly against copyright, on principle. Which I greatly respect.


It's time to abolish copyright. It creates more problems (stifles innovation, creates rents) than it solves (rewards innovation).


It doesn't create problems for large companies that make AI systems.


Yeah, it seems to only be a problem when you're a human being remixing the culture you grew up with.

Meta can admit to soullessly scraping books they don't own for their for-profit AI datasets [1], and it's not a problem because they're Meta. But if you're an artist? Nope. Sampling in hip hop songs, for example, is in a "complex legal gray area" (translation: "it's illegal but we don't want to admit that out loud") [2].

[1] https://futurism.com/the-byte/facebook-trained-ai-pirated-bo...

[2] https://urbanspook.com/copyright-laws-2025-impact-on-hip-hop...


Fortunately, Spotify does not have that power. Anna's Archive is not based in US or EU jurisdictions. They can make access for normal people a bit harder, but not shut it down.

(Edited for clarity)


> Fortunately, Spotify does not have that power. They are not based in US or EU jurisdictions.

Perhaps I misunderstood something, but according to my understanding:

1. Spotify is registered in Luxembourg and has its operational headquarters in Sweden (Stockholm). Both are EU countries.

2. I guess it won't be Spotify that sues, but the individual music labels (very likely united).


Anna's Archive is not based in the EU (sorry for not being clear). So EU law has limited power to enforce a ban. In Germany it is already "banned" at the ISP level, but only via DNS.

But the real servers are hosted in Kazakhstan or Russia, I think. And they do not cooperate much with EU courts.

So unless the EU installs a great firewall like China, they cannot really shut it down.


> But the real servers are hosted in Kazakhstan or Russia, I think. And they do not cooperate much with EU courts.

I believe the "official" AA servers only host the website + source code. The actual copyrighted content is stored by volunteers who seed the torrents.


Exactly, this is why the 'Hydra' is difficult to take down.


Presumably the opposing party is residing in non-US-or(and? depends on the order of evaluation)-EU territory, but I might be mistaken. "They" refers to both sides in the parent comment.


I'm not sure archiving and publishing are different things.


They are, but archiving without publishing is pointless.

I occasionally wonder how many enormous collections of culture like that of Marion Stokes[1] have been lost because their curators made no effort to realize the value of their collection.

1. https://en.wikipedia.org/wiki/Marion_Stokes


Most archives - the ones in libraries, etc. - are not published, except they are available to qualified people who physically travel there. Most are not even fully indexed - nobody knows all of what's there.


My perspective is compatible with this fact. An archive that approximately nobody can access, and/or whose contents nobody knows, has no value to society at large, except the potential that it may some day be published.

The good news is I'd guess the number of (nonreligious/nonproprietary) institutionally managed pointless archives is dwindling.


> They are, but archiving without publishing is pointless.

One may collect/archive now (when the data is, well, "available"), and publish later, when copyright expires and the material will likely be harder to obtain.


Both are illegal. If you just hoard, you will never know whether what you have is useful. The only way to judge that is by letting people use it.


I can save a copy of my friend's book on my computer, archiving it. Nobody else could see it unless I publish it.


They stated that they would pass the information on to other archivists and public/private trackers, no? They obviously have backups, since there are multiple users seeding GBs and even TBs of data. Mirrors can be created as well, like TPB.


No, because they are all backed up via torrents. Good luck getting those "shut down" from the DHT.


They didn’t come anywhere close to the entire content library, the 300TB represents about 33% of Spotify, though it is close to 100% of the played music.


Kind of nuts that 66% of their library is virtually unplayed. It’s hard to make it as a musician.


It is ridiculously easy to create an album with Suno and push it to Spotify. I'm surprised it's only 66% TBH


Anna's archive has a great analysis of the Spotify data.

They identify a huge surge in tracks that few listen to after gen AI started.

The analysis is worth reading. The distribution is (Pareto)^3: roughly 99% of the plays go to about 1% of the catalogue.
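To make that kind of concentration claim concrete, here is a minimal sketch of how one could measure the share of plays that goes to the top 1% of a catalogue. The play counts are synthetic (drawn from a Pareto distribution with an arbitrarily chosen shape parameter), not Spotify data, and `top_share` is a hypothetical helper name.

```python
import random


def top_share(play_counts, top_fraction=0.01):
    """Return the fraction of all plays that goes to the most-played tracks."""
    ranked = sorted(play_counts, reverse=True)
    # Always look at at least one track, even for tiny catalogues.
    k = max(1, int(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Synthetic catalogue: heavy-tailed play counts. The shape 0.6 is an
# assumption picked for illustration, not a value fitted to real data.
rng = random.Random(0)
synthetic_plays = [rng.paretovariate(0.6) for _ in range(100_000)]

print(f"top 1% share of plays: {top_share(synthetic_plays):.1%}")
```

Running the same function over real per-track play counts would show how close the actual catalogue is to the figure quoted above.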


1. Generate slop music nobody will ever listen to

2. ????

3. Profit


It's actually:

1. Generate slop music no _human_ will ever listen to

2. Use a botnet to "play" this music en masse

3. Profit

This is a whole arms race, with companies (such as Beatdapp) specializing in detecting fraudulent plays.

Source: I work for a niche music retailer that struggles with the same issues on a smaller scale.


From a stat I saw years ago, about the same share of apps on the iOS App Store has never been downloaded.


To be completely fair, I am not certain what it means for a track to be "virtually unplayed".

First off, it was striking to me how little of the "top 10 000" they published back on Christmas I recognize. I'm not sure what I expected, but 10 000 sounds like a big number, so it seemed likely to me that if I picked a random song from my playlist, I could find it there. It turned out I hardly can find an artist I recognize. Ok, I can recall a song from Lady Gaga and even Billie Eilish, I've heard of Bruno Mars (cannot recall any song), but I have no idea what "Bad Bunny", "Doechii", or "Drake" is. I mean, I think I do have a pretty good idea what these things are (abstractly), and I probably wouldn't want to listen to that. And while I knew that all this stuff is very popular, I didn't quite realize how little room in the top 10 000 it leaves for the music I (and everyone I know) actually listen to.

I didn't download the metadata they released (it would be hard to process it on my laptop anyway), but now I wonder how much of my 3 TB music collection is in top 100 000, or heck, even top 1M Spotify, or on Spotify at all.

I also am sometimes surprised how few scrobbles some tracks get. I didn't bother to find out what this means, how many people still scrobble to Last.fm or ListenBrainz, but it is just surprising when I see that a track that I didn't consider to be obscure was scrobbled like 50 times this year.

So I'm saying that the music world seems to be terribly fragmented, even more than I imagined. So the very premise of AA backing up 97% of Spotify (by the number of plays) may be a much lesser achievement at "preserving culture" than it may sound. And of course we are about 8 years too late to back up everything, since by now half of it must be generative NN bullshit. And I'm not even sure it's in those leftover 3% (bots listen to bot-generated music too, right?).


> It turned out I hardly can find an artist I recognize

I've heard of 9 of the top 10 and 15 of the top 20 at https://chartmasters.org/most-monthly-listeners-on-spotify/

You might not listen, but surely you have heard of Taylor Swift, Justin Bieber, Ariana Grande, Ed Sheeran, Coldplay and of course the Christmas staples of Mariah Carey and Wham?


First off, this is not the top we are talking about, since there is one that AA provided[0]. I am not sure why it matters which names exactly I've heard of, but if you are that curious: I don't know what Ed Sheeran and Wham are (but cannot vouch I've never heard their music in a supermarket), but I definitely remember "Coldplay" being mentioned in a joke onstage by a NIN member[1], but I didn't bother to check out what they are. I can imagine the faces of Taylor Swift & Justin Bieber, but cannot name any song, and I'm sure I've heard Mariah Carey somewhere, since that name has been around longer than Rihanna. I have a song or two of Ariana Grande in my playlist though.

Edit: Ok, I've finally googled "Coldplay". Yeah, definitely heard "Clocks" somewhere.

[0] https://annas-archive.li/blog/spotify/spotify-top-10k-songs-...

[1] https://www.youtube.com/watch?v=qboe5CebixA


You're a (waaay) outlier.


Are you sure? See, my point is a conjecture (based on a reasonable assumption that I cannot be that special), that there must be really a lot of us "outliers" out there (so I'm not even sure it's reasonable to call us that).

Let's reiterate. I am well aware that more people listen to that Bad Rabbit, Taylor Swift or Justin Bieber than they listen to <random name from my playlist>, it's not really a surprise. There even is a special name for people like that, it's "celebrity". In fact, that's probably how most people who are into music (including myself, I might say) would categorize them, as "celebrities", not as "musicians" (though, mind you, of course they are musicians, as everyone who ever sang a song is, it's just that when I hear the word "musician" I don't necessarily think of Taylor Swift). Hence these people indulge themselves for not knowing who these guys are, explaining it that "they are not into celebrities".

And it's no surprise that a lot of people listen to celebrities. I mean, if Trump would release a song right now, it would become #1 on Spotify in no time (for a very short time, but still). Well, maybe not #1, but close.

But I also suppose there are a lot of people who are into music. Maybe not so many, as there are people who are into celebrities, but it's still a lot. And after seeing that top-10 000 I suddenly find it very plausible, that a lot of tracks these people call "massive hits" may turn out to be "virtually unplayed". And hence not in those "97% of Spotify (by # of plays)" that AA archived. I am not even claiming it, I'm just saying that this doesn't seem to be impossible.

For instance, any DnB fan would say that "everyone knows Noisia and Black Sun Empire". It would be an absolutely laughable attempt at "preserving human culture" not to include them. Surely all of their tracks must be at least in top-5M, right? Well, after seeing top 10K I'm not so sure anymore.

Maybe you've never heard of them, but surely you've heard of Prodigy. Not a single track from Prodigy on top-10K. Or Chemical Brothers. Or Burial, or Placebo, or Nightwish, or King Crimson. These are very famous names in respective circles. There are 2 tracks from Massive Attack, both featured in super-famous movies and trending on TikTok right now. For God's sake, there are only 8 tracks from Madonna in top 10K. Versus 26 from Imagine Dragons and 124 from "Bad Bunny", whatever it is. How do you like Madonna for an obscure artist?

So, my point is that there may be a lot of people listening almost exclusively to "virtually unplayed" music. Entire discographies of (niche) cult-artists may turn out to be buried in these 66% of "virtually unplayed" tracks.

I guess I should just get the metadata and check, but I'm pretty sure that would be outside the capabilities of the hardware I have on hand, so I'm not sure how to go about that.
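For what it's worth, a check like that does not need much RAM if the file is streamed one record at a time instead of being loaded whole. A rough sketch follows. The file format is an assumption (newline-delimited JSON with an `artist` field); the real layout of the metadata release may well differ, and `count_artist_tracks` and the sample titles are made up for illustration.

```python
import io
import json
from collections import Counter


def count_artist_tracks(lines, wanted_artists):
    """Count tracks per wanted artist while reading one record at a time."""
    wanted = {name.casefold() for name in wanted_artists}
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # "artist" is an assumed field name, not a documented one.
        artist = str(record.get("artist", "")).casefold()
        if artist in wanted:
            counts[artist] += 1
    return counts


# Tiny in-memory stand-in for a large file; with a real file, pass the
# open file object instead so only one line is held in memory at a time.
sample = io.StringIO(
    '{"artist": "Noisia", "title": "placeholder track a"}\n'
    '{"artist": "Someone Else", "title": "placeholder track b"}\n'
    '{"artist": "noisia", "title": "placeholder track c"}\n'
)
print(count_artist_tracks(sample, ["Noisia", "Black Sun Empire"]))
```

Memory use depends on the number of artists being looked up, not on the size of the file, so the same loop should work on a laptop as long as the disk can hold the download.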


The metadata torrent is only ~200GB, which should be well within your capabilities.

https://annas-archive.li/torrents/spotify

Anyway, I think you should keep in mind 2 things:

1) 10,000 tracks really is not a lot. It sounds like a lot, but isn't. My own - relatively small - collection is nearly double that.

2) 10,000 tracks... out of 256,000,000 that AA archived.

I'd be very interested to see some more analysis done on this, particularly as it relates to, say, Last.fm statistics - but I suspect the missing music is not as significant as you think.

In any case, even if every one of those "niche" artists you list is missing from this collection, I don't think it's fair to say it's a "laughable attempt" - it's certainly better than nothing, even if it's not perfect.


The funny thing is, since the advent of streaming I no longer listen to the radio. I listen to new music, but little pop music, and I have never heard a single track from Swift, Bieber, Grande or Sheeran. Coldplay is the only act I like on that list, and the streaming services are pretty good at only playing what I like.

If they were pre-streaming artists I probably would have heard a lot of their catalog because radio played it over and over. Unfortunately you just can’t get away from the Christmas music.


Sure, but I'm sure you've heard of Taylor Swift and Justin Bieber.


Traditional radio mostly sucks, but Soma.fm and KEXP are both great for discovering new music.


Very hard if you have little talent.



> For now this is a torrents-only archive aimed at preservation, but if there is enough interest, we could add downloading of individual files to Anna’s Archive. Please let us know if you’d like this.

If it is torrents only, what difference does unregistering the domain make?


Ideally, if AA doesn't have any public web presence it's a lot harder to publicly disseminate those torrents.

Realistically, it's just a way for someone to say something is being done about this, even if it's not going to actually make a difference.


Establishing a position Anna's opponents may consider an advantage.

And there is a site idea!

Annasopponents.news --> Can inform passersby on anything related to Anna's Archive along with activism related material, how to's and the like.


Yeah, obviously I don't know if it is actually related, but my first thought when I couldn't open it today was "Told you so"...


Spotify was created from a library of pirated music... the irony


Came here to say that.

A while back, another site started with a pile of pirated music: allofmp3.com. Remember those peeps?

Their business model was to sell music by selling bandwidth. Basically, it was all the music you wanted, charged by the megabit downloaded.

Pop titles were $0.10 to $0.25. A whole album at 256 kbps was roughly $3, give or take.

What got me really thinking was how great the UX was. At the time, few came close.

The end of that site was packaged up with Russia's entry into the WTO.

I seem to remember hearing about huge torrents out there too. The right infohash can point a person to huge archives of various kinds: books, video, academic papers, music, the WikiLeaks insurance files, which are password protected, as perhaps all of these are.
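For anyone wondering what an infohash actually is: in BitTorrent v1 it is the SHA-1 digest of the bencoded `info` dictionary from the .torrent file, and a magnet link simply carries that digest. Below is a minimal sketch with a hand-rolled bencoder. The `example_info` dictionary is invented for illustration and does not correspond to any real torrent, and the helper names are hypothetical.

```python
import hashlib


def bencode(value):
    """Encode ints, bytes, str, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def infohash_v1(info):
    """Hex SHA-1 of the bencoded info dictionary (BitTorrent v1)."""
    return hashlib.sha1(bencode(info)).hexdigest()


def magnet_link(info):
    """Build a bare magnet URI that carries only the infohash."""
    return "magnet:?xt=urn:btih:" + infohash_v1(info)


# Invented single-file torrent: one 12-byte file that fits in one piece.
example_info = {
    "name": "example.txt",
    "length": 12,
    "piece length": 16384,
    "pieces": hashlib.sha1(b"hello world!").digest(),
}
print(magnet_link(example_info))
```

Because the hash covers the piece hashes inside `info`, anyone holding the infohash can verify that what peers send them is the same data, which is why a short string is enough to locate an archive of any size.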


As someone who grew up poor in an ex-Eastern Bloc country, allofmp3.com was a godsend.



