Internet is losing its memory: Cerf (itnews.com.au)
134 points by adrian_mrd on June 29, 2018 | 77 comments


In many ways he's understating the problem. At least there used to be file formats to support, and files to lose. But how much of the data you interact with daily is even directly accessible to you? How much of it can you access when you're offline?

Take Slack. You can install it locally, but that doesn't matter, because it won't start without an internet connection, and its local storage is completely opaque. Compare this to a mail client, where everything is stored locally, indexed and searchable offline, and can be exported into a universal albeit messy MIME format, as well as imported into other accounts.
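To make the point about local, exportable mail concrete, here is a minimal Python sketch (not something from the thread) that copies an IMAP folder into a local mbox file using only the standard-library `imaplib` and `mailbox` modules. The function names are illustrative, and the server name and credentials in the commented usage are placeholders.

```python
import imaplib
import mailbox

def fetch_raw_messages(imap, folder="INBOX"):
    """Yield raw RFC 822 bytes for every message in an IMAP folder."""
    imap.select(folder, readonly=True)  # read-only, so fetching doesn't set \Seen
    _, data = imap.search(None, "ALL")  # data[0] is e.g. b"1 2 3"
    for num in data[0].split():
        _, parts = imap.fetch(num, "(RFC822)")
        yield parts[0][1]  # (envelope, raw message bytes) tuple

def append_to_mbox(raw_messages, path):
    """Append raw messages to a local mbox file that most mail clients can import."""
    box = mailbox.mbox(path)  # creates the file if it doesn't exist
    box.lock()
    try:
        for raw in raw_messages:
            box.add(raw)
        box.flush()
    finally:
        box.unlock()
        box.close()

# Example usage (needs a real server and credentials, so it is not run here):
# with imaplib.IMAP4_SSL("imap.example.com") as imap:
#     imap.login("user@example.com", "app-password")
#     append_to_mbox(fetch_raw_messages(imap), "inbox-backup.mbox")
```

The result is a plain file you can copy, back up, grep, or import elsewhere, which is exactly what a cloud-only service doesn't give you.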

Of course, this is necessary for the business model: the Slack free-plan event horizon wouldn't be effective if it only applied to which new data you can sync down... and if users only discovered this when, e.g., moving to a new computer, they would quickly start to wonder why they can't just transfer the data they already have themselves.

With Google Drive, your 'docs' are just placeholders pointing to the cloud. You can export them individually to Word or print them to PDF, but it's a manual process, which you will likely only think of when it's too late.

For an example of how this can really matter: an acquaintance is embroiled in a legal dispute with an ex-employer. Thanks to Apple's sane implementation of IMAP Mail on iOS, she still has access to all her company communications there to use as evidence. Unlike on her PC, where she was just using webmail the entire time and has nothing.

The incentive in the cloud age is to create dysfunctional products that provide an illusion of permanence instead of the reality of tangibility. I expect this is only going to become a bigger problem over time.


These are all features for companies: no liability from the haunting written word.

I'm desperately trying to connect to "modern" chat services via things like Pidgin so I have logs. While on some level remembering everything exactly the way it was is unnatural, it's important that links and knowledge can be kept.

I think people are starting to realize that actually having a copy is important. Hosting things for yourself, taking care of backups, etc., are painful, hard problems, but they're worth it in the long run.

One more thing: much digital media is significantly more ephemeral than a lot of us realize. Out of countless CDs from the past 20 years, I can still read only a few (the ones written with a 1x Plextor SCSI drive are all still fine). If you truly value a photo, make a proper print: archival-grade ink, archival-grade paper.


> If you truly value a photo, make a proper print: archival-grade ink, archival-grade paper.

Or you could parity-pad your backups so even 20% corruption is still recoverable and periodically renew them. Several orders of magnitude cheaper.
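A minimal sketch of the parity idea, assuming one XOR parity block per group of four data blocks (25% overhead). This layout survives losing any one block in each group of five, i.e. up to 20% of the stored blocks if the damage is spread evenly; two hits in the same group are unrecoverable. Real tools such as par2 use Reed-Solomon codes, which tolerate more flexible loss patterns. The block and group sizes here are illustrative assumptions.

```python
from functools import reduce

BLOCK = 4   # bytes per block (tiny, for illustration only)
GROUP = 4   # data blocks per parity block: 1 parity in every 5 blocks stored

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data):
    """Split data into blocks and append one XOR parity block to each group."""
    data = data + b"\0" * (-len(data) % (BLOCK * GROUP))  # pad to whole groups
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    groups = [blocks[i:i + GROUP] for i in range(0, len(blocks), GROUP)]
    return [group + [xor_blocks(group)] for group in groups]

def recover(groups, length):
    """Rebuild at most one missing (None) block per group, then strip padding."""
    out = bytearray()
    for group in groups:
        missing = [i for i, block in enumerate(group) if block is None]
        if len(missing) > 1:
            raise ValueError("two blocks lost in one group: XOR parity can't help")
        if missing:
            group = list(group)
            # XOR of the survivors (data + parity) equals the lost block.
            group[missing[0]] = xor_blocks([b for b in group if b is not None])
        out += b"".join(group[:GROUP])  # keep data blocks, drop parity
    return bytes(out[:length])
```

Note that the original length has to be stored alongside the blocks, and "periodically renew them" still matters: parity only helps if you notice the damage before a second block in the same group goes bad.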


Then there is a house fire and your computers are destroyed and you're in a coma. Would your spouse know how to retrieve your wedding photo from your parity-padded offsite backups? Does anyone in your life know where they are, and the access information, and the encryption key/password, and how they're organized, and how to find specific content within them?


> You can export them individually to Word or print them to PDF

Actually it's really really easy to do in bulk: https://takeout.google.com/settings/takeout?pli=1

Not invalidating the rest of what you said.


I think the rest of what they said matters, though. It's not "preservable by default". If you need access to that data in any situation where your connectivity to Google or that account is severed, you're shit out of luck. The best case example is if you don't have internet. The bad case is really that "legal dispute" argument, where you're dealing with a bad actor who has power over you and your Google account. The worst case: Google themselves severs your access.


I'm trying to not phrase this in a dick way, but what did you think I meant by "Not invalidating the rest of what you said"?


Very true, but also similar to not having a good system of backups, which is hardly uncommon for many users (present company excluded I'm sure).


Google's data take-out has proven fairly robust.


> The incentive in the cloud age is to create dysfunctional products that provide an illusion of permanence instead of the reality of tangibility. I expect this is only going to become a bigger problem over time.

Isn't this what SaaS is all about?


Well more specifically it's one of the longstanding arguments against SaaS as a concept.

It's a tired tune to say that RMS was right, but he was. Letting people be controlled by software is not a good idea for pragmatic reasons that go beyond mere morality.


I think it is up to everyone to weigh the pros and cons of using SaaS. Many enterprises would not even exist if they had to build and maintain their stack themselves.

Also, I would not equate not controlling your software with being controlled by it. In some cases maybe, but definitely not all.


There are shades of grey in SaaS implementations, though. That the product is operated through a third-party server doesn't always imply it needs to be only available when you're on-line, and doesn't imply that your data needs to be taken hostage. A big part of the problem here is what 'stillkicking mentioned - doing away with files. Instead of data files, you have "documents" (meaning specific to a service). You can no longer open them independently, copy to local storage, or e-mail it to a friend - you're restricted to "sharing" them within the platform. This is not a technical necessity. It's just driven by business model.


This is why I believe in CRDTs: https://news.ycombinator.com/item?id=17221221

With the right spec, you can have Google Docs style real-time collaboration or Slack style chat while allowing users to own and archive their data, remaining resilient under arbitrary network conditions and topologies, and retaining full edit history with the option to link to any previous revisions or even individual changes (with context). The system could be built on top of databases, files, or both. Your "*.crdt" documents could live in your Dropbox and work seamlessly with any software that understands the spec.

It'll take some work to get there, though. And of course, it'll never be as resource-efficient as a centralized architecture. But today’s devices can more than handle the strain.
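To illustrate the core CRDT property (replicas that sync in any order, any number of times, converge to the same state), here is a deliberately small Python sketch of a grow-only set (G-Set) of chat messages ordered by Lamport timestamps. It is a toy under stated assumptions, not the design from the linked thread: editing and deleting messages would need richer CRDTs such as an OR-Set or a sequence CRDT, and the `ChatLog`/`Message` names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True, order=True)
class Message:
    lamport: int   # logical timestamp; primary sort key for display order
    replica: str   # id of the device that wrote it; breaks timestamp ties
    text: str

@dataclass
class ChatLog:
    """Grow-only set (G-Set) CRDT of messages: merge is plain set union."""
    replica: str
    messages: set = field(default_factory=set)
    clock: int = 0

    def post(self, text):
        self.clock += 1  # Lamport rule: tick before each local event
        self.messages.add(Message(self.clock, self.replica, text))

    def merge(self, other):
        # Union is commutative, associative and idempotent, so replicas converge
        # no matter the sync order or how often the same state is re-merged.
        self.messages |= other.messages
        latest = max((m.lamport for m in self.messages), default=0)
        self.clock = max(self.clock, latest)  # Lamport rule: catch up on receive

    def history(self):
        """Deterministic total order: every replica renders the same transcript."""
        return [m.text for m in sorted(self.messages)]
```

Because the whole state is just a set of immutable records, it could be serialized to a file (say, in the Dropbox folder mentioned above) and merged with whatever copy another device holds, with no server acting as the source of truth.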


> For an example of how this can really matter: an acquaintance is embroiled in a legal dispute with an ex-employer. Thanks to Apple's sane implementation of IMAP Mail on iOS, she still has access to all her company communications there to use as evidence. Unlike on her PC, where she was just using webmail the entire time and has nothing.

I agree this is an issue, but just thought it was worth pointing out that (cloud + offline) is better than just offline.

I'm a bit of a digital packrat. Despite carefully preserving and porting my data from system to system, I lost years of email archives in the aughts because Outlook Express would prompt me to archive old mail, but I didn't realize it was over-writing the old archive until I went looking for a message and couldn't find it (when it was too late). The lost email covered all but the last few months of my undergraduate degree.


Cloud + offline is best. Just offline is worse, but just cloud is the worst, and that's what we're being offered 99% of the time. E-mail is a special case because the protocol and traditions around it predate the modern commercialized Internet. But, e.g., all the other communication services regular people use nowadays are cloud-only.


This is needless pandering and fear-mongering, akin to the hysteria that inevitably comes with the evolution of technology. For example, the "telephone will be the end of society" trope that was popular at the turn of the 20th century.

Anyway.

> With Google Drive, your 'docs' are just placeholders pointing to the cloud. You can export them individually to Word or print them to PDF, but it's a manual process, which you will likely only think of when it's too late.

This has been trod to death over the years. Let's do some find-and-replace.

With a Hard Drive, your 'docs' are just placeholders pointing to the platter. You can copy them individually to diskette or print them to hardcopy, but it's a manual process, which you will likely only think of when it's too late.


People need to keep their own personal archives. While there are great projects like archive.org, it's becoming clearer that they're not enough. There aren't enough resources for an organization (one that isn't spying on everyone) to archive YouTube, or Slack history, etc., so it's up to you to back up that data if it's important to you. Culture can easily be lost to the void if we don't do this.

As I've mentioned in another HN thread, I've been using youtube-dl to back up YouTube videos that are important to me, since it's increasingly obvious that YouTube is merely a profit machine that would delete all of its users' videos in one fell swoop if it could make an extra buck off it. I've also begun to back up some blogs that have vast amounts of great content, but I've found that HTTrack is... well, terrible. That's not to say it doesn't technically do its job, but it's way too aggressive. I want something that's easier to configure and does a better job of taking "snapshots" of blog entries: blocking ads and JavaScript, resuming properly if my power goes out, etc.
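Two of those wished-for features, resuming after an interruption and dropping scripts, are simple enough to sketch in Python. This is a hypothetical helper, not a replacement for HTTrack: `fetch` is any function you supply that returns a page's HTML, the regex filter is crude (it will not catch every ad), and crawling links or saving images is out of scope.

```python
import os
import re
from urllib.parse import urlparse

# Crude, hypothetical filter: drop whole <script>, <iframe> and <noscript> elements.
BLOCKED = re.compile(r"<(script|iframe|noscript)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)

def strip_active_content(html):
    return BLOCKED.sub("", html)

def snapshot_path(url, out_dir):
    """e.g. https://blog.example/2018/06/post/ -> out_dir/blog.example/2018_06_post.html"""
    parsed = urlparse(url)
    slug = parsed.path.strip("/").replace("/", "_") or "index"
    return os.path.join(out_dir, parsed.netloc, slug + ".html")

def snapshot(urls, fetch, out_dir="snapshots"):
    """Save each URL once; pages saved by an earlier run are skipped, so reruns resume."""
    saved = []
    for url in urls:
        path = snapshot_path(url, out_dir)
        if os.path.exists(path):
            continue  # finished in a previous run
        html = strip_active_content(fetch(url))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        tmp = path + ".part"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(html)
            f.flush()
            os.fsync(f.fileno())  # get the bytes onto disk before the rename
        os.replace(tmp, path)  # an interrupted run never leaves a partial file at `path`
        saved.append(path)
    return saved
```

Writing to a `.part` file and renaming it is what makes "skip if it exists" a safe resume check: a file only appears under its final name once it is complete.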


There may be more modern projects, but there's also "Lots of Copies Keep Stuff Safe" at https://www.lockss.org

When you say personal archives, I wonder a little about what we'd lose to filtering by people who know they're building an archive for posterity, versus people in the future rediscovering things that were safely forgotten for some long span of time.

At the delta of personal archives and projects like this, I imagine something a little like SETI@home or Folding@home, but for home archival (is this just describing IPFS?), but I think one of the challenges is how we preserve the right kind of data. I did some spring data cleaning this year while building a new system, and one of my tasks was addressing duplicate files. This forced me to confront hundreds of gigabytes of VM images, snapshots, and massive piles of duplicate files from various installations of Ruby and Python.

But perhaps I underestimate the value of even the duplicate VM images or language installs. Perhaps, centuries hence, researchers will sift through yottabytes of data looking for all of the dependencies needed to resurrect applications they must experience in order to understand references to them in well-stewarded tweets, posts, or articles.
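For the duplicate-file cleanup described above, a common approach is to group files by size first (cheap) and only hash the files whose sizes collide. A minimal standard-library Python sketch; it only reports duplicates rather than deleting anything, and `find_duplicates` is a hypothetical name.

```python
import hashlib
import os
from collections import defaultdict

def sha256_of(path, chunk=1 << 20):
    """Hash a file in 1 MiB chunks so large VM images don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size can't have a duplicate; skip hashing
        for path in paths:
            by_hash[sha256_of(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Deciding which copy to keep (and whether the "duplicate" Python install is actually worth preserving) stays a human judgment; the script only narrows down where to look.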


You can always help by volunteering machine time to the archive team if you want a low effort way of helping. https://archiveteam.org/


I noticed old articles I had bookmarked were disappearing, so I’ve started using a self-hosted Bookstack to save a copy of everything interesting I read. I enter reader mode, copy everything on the page, and then paste it into the WYSIWYG editor. Everything is formatted nicely, and it even creates a menu using the heading tags of the sections. The only problem is that, while images copy correctly, they’re still hosted on the original server, so I have to replace them with an uploaded copy. It only takes a couple of minutes, though. I could probably automate this with a script.
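A minimal sketch of that automation in Python, assuming the saved page is available as an HTML string: it collects every absolute `<img src>`, maps each URL to a stable local filename, and rewrites the page to point at the local copies. Uploading to BookStack is not shown (that would go through BookStack's own API, which this sketch doesn't model), and `localize_images` and the `images/` directory are hypothetical names.

```python
import hashlib
import os
from html.parser import HTMLParser
from urllib.parse import urlparse

class ImgCollector(HTMLParser):
    """Collect the src of every <img> tag that points at a remote server."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src and src.startswith(("http://", "https://")):
                self.srcs.append(src)

def local_name(url):
    """Stable local filename: a hash of the URL plus the original extension."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".img"
    return "images/" + hashlib.sha256(url.encode()).hexdigest()[:16] + ext

def localize_images(html):
    """Return (rewritten_html, {remote_url: local_path}) for hotlinked images."""
    collector = ImgCollector()
    collector.feed(html)
    mapping = {url: local_name(url) for url in collector.srcs}
    for url, path in mapping.items():
        # Naive textual replace. HTMLParser unescapes attribute values, so a URL
        # written with &amp; in the source won't match and will be left as-is.
        html = html.replace(url, path)
    return html, mapping
```

Downloading is then a loop over the returned mapping, e.g. `urllib.request.urlretrieve(url, path)` after `os.makedirs("images", exist_ok=True)`.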


If you use Firefox, there have been several implementations of this add-on that I've used for ~10 years:

https://en.wikipedia.org/wiki/ScrapBook


Losing memory via attrition is certainly a concern. But I think the bigger concern is losing memory via censorship and copyright. For most people, the internet is now Google search and a couple of social media companies. Google has been curating search results for a while now. We no longer see what is out there on the internet, but a narrow, Google-approved view of it. For example, 10 years ago Google showed us the entire elephant (more or less); today, we only get to see the tail or the trunk or whichever part of the elephant Google decides to frame. Now all of the social media companies are curating too. And with the loss of net neutrality, ISPs will undoubtedly curate.

And then there are the copyright issues, which are making archiving more difficult. Kids today don't know what a great tool Google cache was, because it's gone. And archiving sites are being attacked by news organizations, media companies, politicians, etc.

In other words, the internet dark age isn't going to be a result of formats getting old (though that is an issue). It's going to be a result of us only being allowed to see through the frame that a handful of companies deem appropriate. The dark ages didn't happen because formats got old. The dark ages happened because the powerful decided that we should view the world within a certain narrow religious frame and censored everything else.


In the olden days, everyone was slapping together their own websites, rather than using the same old templated wordpress, hugo, etc, so there was far more character, and not everything was a blog. And so people surfed the web. These days the web is more purpose oriented, people looking for something, the "surfing" all within walls, wave pools.


The days of the web ring, which often as not was broken or took you through dozens of content-free sites but could occasionally drop you into something fantastic.

Browsing Internet Directories.

Just plain finding some guy's massive meticulously-maintained HTML-only site full of fascinating articles, bookmarking it, slowly working your way through it over the course of your evening browsing sessions.


> These days the web is more purpose oriented, people looking for something

It also does its best to make that difficult. Advertisers luring people away from what they were doing. Content marketers spamming the web to the point that a regular person is more likely to be trapped in some bullshit low-quality clickbait article than to get correct information. You need to be hyper-focused just to get what you're looking for.


By contrast, before the internet, most everything was routinely lost anyway.

Only papers from famous people were kept. Even so, for example, HP's historical archive was all on paper and placed in a single building. Which burned down.

The WTC collapse destroyed the unpublished archive of Kennedy pictures.


It is true that a lot more stuff gets kept these days. But we seem to be much more liable to losing important information. As Bret Victor [0] has pointed out, we use largely the same mechanisms for managing the most trivial to the most important information, and most people don't really understand how they work.

> a 2013 survey of law- and policy-related publications found that, at the end of six years, nearly fifty per cent of the URLs cited in those publications no longer worked. According to a 2014 study conducted at Harvard Law School, "more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs within United States Supreme Court opinions, do not link to the originally cited information." [1]

I doubt that the US Supreme Court was regularly losing track of cited documents before "information technology" came along. A lot of documents may be recoverable even if the original URLs are broken, but is this really the best we can do to manage information in the information age?

[0]: http://worrydream.com/TheWebOfAlexandria/2.html

[1]: https://www.newyorker.com/magazine/2015/01/26/cobweb


The fact that URLs cited in publications stop working over time is why I often included their content as appendices to my essays. That way, even if the original source is lost, the referenced material isn't.

Then again, material published in journals and books is more reliable than material published on the internet, if only because it can be relied upon to still be found decades from now.


Well, if it was published, it was deposited with the national library: the British Library or the Library of Congress. I think other nations have similar systems. This covered newspapers, magazines and books. The British Library page:

https://www.bl.uk/aboutus/legaldeposit/introduction/

"By law, a copy of every UK print publication must be given to the British Library by its publishers, and to five other major libraries that request it. This system is called legal deposit and has been a part of English law since 1662.

From 6 April 2013, legal deposit also covers material published digitally and online"


What happens when the Library of Congress, or the British library, burns down? What happens when the Vatican library burns down (a large part of it consists of unique documents with zero other copies)?


We still have receipts and tax records carved into ancient Babylonian clay tablets, and letters from the 18th and 19th century, so I believe it's not really true that only papers from famous people were kept, rather that artifacts of interest to elite classes (be they religious or scientific) tended to remain preserved or published.

Although it is interesting (and a bit depressing) that we can't really seem to escape that model for preserving knowledge over the long term. Physical media needs patronage, buildings, printing presses, etc, while digital media needs infrastructure, manufacturing, programmers, etc.


Do you have any letters, photos, tax records, etc., from your great grandparents? Do you know anything at all about your ancestors who came to America (if you're living in America)?


The fraction of retained materials is minuscule. The total amount of stored and written records was also small.

At the time of Gutenberg, there were about 30,000 books in all of Europe. (I think; research suggests this may be higher, see the mss. chart below.) Not titles, but books: individual, discrete volumes. The University of Paris in 1200 had on the order of 2,000 volumes, among the largest collections of the time.

There were something shy of one billion volumes by 1800. Publishing in England during the 19th century was about 1,000 titles/yr. For much of the 2nd half of the 20th century, the US Library of Congress added about 300,000 titles/yr. That's remained fairly constant, though "nontraditional" publishing (self-published and on-demand titles) takes this to over one million titles annually.

https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/Eu...

https://upload.wikimedia.org/wikipedia/commons/thumb/2/24/Eu...

Google has counted the total number of books now in existence (titles), and came up with 129,864,880.

https://www.telegraph.co.uk/technology/google/7930273/Google...

http://booksearch.blogspot.com/2010/08/books-of-world-stand-...

The Thesaurus Linguae Graecae is a comprehensive archive of all known surviving Ancient Greek literature:

Today the Online TLG contains more than 110 million words from over 10,000 works associated with 4,000 authors and is constantly updated and improved with new features and texts.

http://stephanus.tlg.uci.edu/tlg.php

(Many of the authors survive only in fragmentary quotation.)

Given that 100k words is a substantial book (about 400 pages), the entire surviving Greek bibliography, every retained written word, would come to roughly 1,100 such volumes.

Well under 5 GB of uncompressed text.
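A quick check of the size arithmetic, under stated assumptions: about six characters per word (five letters plus a space), and between one byte per character (an ASCII transliteration such as Beta Code) and three bytes per character (precomposed polytonic Greek in UTF-8). Under those assumptions the corpus is roughly 1,100 volumes and somewhere around 0.7 to 2 GB, comfortably under 5 GB either way.

```python
WORDS = 110_000_000         # "more than 110 million words" in the Online TLG
WORDS_PER_VOLUME = 100_000  # "a substantial book (about 400 pages)"
CHARS_PER_WORD = 6          # assumption: ~5 letters plus one space

volumes = WORDS / WORDS_PER_VOLUME
chars = WORDS * CHARS_PER_WORD
low_gb = chars * 1 / 1e9    # assumption: 1 byte/char (ASCII transliteration)
high_gb = chars * 3 / 1e9   # assumption: 3 bytes/char (precomposed polytonic Greek, UTF-8)

print(f"about {volumes:,.0f} volumes, {low_gb:.2f} to {high_gb:.2f} GB")
# prints: about 1,100 volumes, 0.66 to 1.98 GB
```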

The chemical stability of clay means that, yes, some records have survived. They are a minute fraction of all ever created, and those represent a minute fraction of all information that existed to be recorded, but never was.

The largest surviving cuneiform archive is at the British Museum, comprised of about 130,000 tablets, though the core collection is about 30,000. Each tablet is more analogous to a page than a book.

https://members.bib-arch.org/biblical-archaeology-review/31/...

https://cdli.ucla.edu/collections/bm/bm.html


David Granick wasn't famous, yet his works were preserved, by a library.

* https://news.ycombinator.com/item?id=16472614

* https://news.ycombinator.com/item?id=16363283


I'm probably weird in that my philosophy doesn't jibe well with the current societal norm, but I think it is important to come to terms with a simple truth: all things are ultimately lost. Physics tells us that even the universe itself will end, even if different models disagree as to precisely how. Things begin, they exist, and then they end. That they are not eternal does not mean that their existence was meaningless. Which isn't to say that preservation is pointless or wrong, but I think it is unhealthy to try and hang on too firmly to the past.


I think that's fighting human nature. One of the things that separates us from other creatures is our constant desire to fight entropy. Sentient life itself is the ultimate reversal of entropy; while the universe becomes more complex around us, we reduce complexity in every environment we enter by organizing matter to build more and more efficient systems.

It's not crazy to think that, if we live that long, we will figure out a way to escape even the end of the universe (I'm reminded of Asimov's The Last Question). But that progress definitely won't happen if future generations can't build on what we've learned.

I actually think living with the idea that everything is temporary is very very dangerous. Nothing is temporary, except you. We need to live with the humility to recognize that our life on this planet is incomprehensibly short, and we need to spend it doing everything we can to give our children the headstart they need to do everything they can to give their children a headstart... into infinity. That's the only way we improve as a species. That's why there is no greater sin than destroying or limiting access to information, and conversely why the internet is literally the most substantial development in the history of mankind.


That doesn't separate us from other living things, or even natural processes, in any way. Many things appear to sow order in the universe on a local scale, but the math is pretty clear that this isn't actually fighting entropy at all.

The Last Question is a work of fiction. All works of fact seem to support the concept that it is not possible. That isn't to say that we will never discover properties of the universe that open new possibilities, but A) I think eternity is just a different kind of ending anyway, and B) stubbornly grasping to such a concept seems a desperate attempt to deny one's own mortality.

I'm not advocating for forgetting things for the sake of it, I'm advocating against the idea that everything must be preserved. Certainly we should try to keep in mind the lessons of our past, but we should also be willing to let go of things that haven't given us good reason to keep them around.


Your last sentence completely misses the point of careful archiving. Because someone "didn't think it was important to keep around", we've permanently lost entire parts of history. The western alphabet is based on the Phoenician alphabet, and "Phoenicians" is a name given to them by the Greeks. The very basis of our written language was created by a culture whose name for itself we don't even know. I'm confused as to why you think it's a good thing to accept mortality. If anything, our past has shown us that reaching past mortality is one of the major reasons for human achievement.


In what way would the world be improved if we did know what name they called themselves? Let's assume it would be. There is a cost to preserving all this information. Time, resources, space, etc. If you preserved everything every human ever said throughout all of history, that collection of information would now be so ludicrously vast that it is difficult to imagine finding anything within it that you're actually looking for. And what have you given up for this effort? We try to preserve what we think is important, and sometimes we are wrong, but that's ok. Things don't have to be perfect.

Accepting mortality is important for a lot of reasons I find difficult to explain properly. It's a matter of shared context being the basis for communication and the requirement for a whole lot of concepts to have to be understood by both of us, and referenced by the same words, to convey actual understanding. It is difficult, for instance, to separate the topic of mortality from the discussion of the conception of self. I will attempt an explanation regardless.

Knowing, and accepting that your end is inevitable is important context for how you choose to interact with the world. Let's try a thought experiment: There's a version of the world you want to live in, and there's alternatively a version in which you don't, right? Since you will someday die anyway, it makes no sense to survive at the cost of moving the world towards the version that you don't want to live in. If you don't accept that the end is inevitable, you can justify such acts anyway because you'll always be able to move the world the other direction later, just as soon as the current existential threat is dealt with. Only, of course, there are always more existential threats, whether real or imagined.

Many atrocities were and are committed in the name of survival, whether of an individual's physical self, or one of their shared memetic selves like culture or society, their genetic self embodied in their children, or sometimes even more raw memetic concepts. That's the "rational" way of interacting with the world. You have a goal (survival of some conception of self), and you do anything to achieve that goal, because rational approaches do not accept failure as an outcome. Accepting mortality is recognizing that the goal is ultimately unachievable, and because of that, there are potentially things of greater value than survival. You don't have to survive, failure to achieve that goal is an acceptable outcome, and you can choose to interact with the world differently.


It's the good kind of nihilism I suppose: accepting that ultimate futility doesn't spare you from having to strive for meaning, but it does help you relativize your losses.

That said, on the topic of conservation, or more accurately history, I don't think you can convincingly argue for forgetting things as a society. The cost of repeating errors is as great as the benefit of safekeeping knowledge.


I'm advocating for the letting go of things. Certainly there are things that are valuable to preserve, but preservation for preservation's sake, I think, is unhealthy. Will society really benefit from preserving every geocities site, every newgrounds flash game, every scrap of poorly written slashfic? Probably not. Are there instances of each that are culturally relevant? Sure, and it might be worth keeping those around longer.


Yes but remembering the past and applying its lessons to the present also is not free.


I have some sympathy for your philosophy, but clearly it's important to have a reasonable amount of memory and preservation, and to have some degree of control over it. What we're facing seems like a rather random, senseless erosion of those things.


Is it? If it were really of great value, would we have allowed it to erode?


Yes. Happens all the time. Individuals, companies and societies put biggest priority on short-term benefits. All it takes for some knowledge to erode is to be temporarily not worth maintaining, compared with other things you have to deal with at the moment. Then some time later you realize that knowledge would be extremely useful, but it's too late now.

You mention physics. A constant in our universe is that things decay unless actively maintained. All life is a fight against entropy. Trying to preserve things is only natural.


By that metric most of the universe fights against entropy. Stars form from clouds of hydrogen, planetary systems from dust, structure is born from formlessness. The stars grow, they consume, they mature, and then they die. Never once did they fight entropy except on a local scale. As I said, the math is pretty clear on this.

We put priority on benefits we can realize. Short term benefits are easier to realize, and decisions made in the moment, absent malicious intent, make sense in the moment. Hindsight, as they say, is 20/20. We make a prediction now what will be important later, sometimes we are right, sometimes we are wrong. Perfection is impossible, and even if it weren't, it would not be worth the cost of trying to attain it.


Shit happens. That's human nature.


The comment about Archimedes and infinitesimals reminds me of something that has been bothering me for a long time: what brilliant ideas are lurking in the world, unknown to almost all of us, that could change the course of history if only they were publicised?


I've had a similar thought experiment (or sci-fi plot?) for a long time.

If an advanced alien civilization came to Earth and helped ancient human civilizations build and operate a computer system that could share, store and translate scientific discoveries all across the globe (while not being allowed to help humans in any way besides operating the system) throughout 5,000 years of history, how could it change human history?


The ancient humans would then also (eventually) develop patents because someone somewhere would have thought about it even back then - and with worldwide distribution - it would catch on. Effectively, what I'm saying is - when faced with a common greater good that is a game changer for humanity, humanity then has to collectively want the greater good. And humans can be selfish - ancient or not.


Another take: What brilliant ideas exist publicly today that have not yet been recognized as brilliant?

And how might you go about systematically identifying them?

And what might your incentives be for doing so?


“These days, the problem isn’t how to innovate; it’s how to get society to adopt the good ideas that already exist.”

— Douglas Engelbart


http://longbets.org/601/

https://adactio.com/journal/11937

"The original URL for this prediction (www.longbets.org/601) will no longer be available in eleven years."

This has been a known problem for a while, but it's always nice to see more voices crying out about it to the more general public.


Due to the Curse of SaaS soon the web won't be about documents any more but about applications that have no obligation to preserve anything. The only thing you can be sure of is that your advertising profiles will be stored for the next 100 years or so.


Frankly, it's somewhat hilarious to hear all this coming from Cerf, who works for the worst offenders of data centralization of all time. Think of the data that gets lost on Drive every day due to their product decisions around the SaaS model + proprietary document formats, or the data that was lost/hidden on Code, Wave, Plus, their shattered IM platforms, etc.


A large problem in archiving is actually the copyright law. Sites like sci-hub or Library Genesis do a great job in archiving scientific papers and books, but are illegal.

Also, archive.org is sometimes on the border between legal and illegal, e.g. not all material that can be found on archive.org is really legal; though for lots of the formally illegal archived content, the copyright holders do not care or do not want to cause an outcry.


Slightly related to this article, I recommend this TEDx talk by Cerf himself: https://www.youtube.com/watch?v=GV0A82TCrf0


The death of Adobe Flash Player (and its closed SWF format) also takes a toll on an uncountable number of human artistic creations, games, etc. I wonder if there is some way to preserve that relic of 2000s.


> I wonder if there is some way to preserve that relic of 2000s.

Flash is fading from mainstream use on the web (thankfully), but it's not hard to run Flash today if you want to.


Seems like virtual machines will be able to run flash for a long long time.


Seems like there are a few people still working on Shumway, the flash engine written in JavaScript.


I doubt anyone is working on Shumway. It's in the Firefox graveyard.

https://bugzilla.mozilla.org/describecomponents.cgi?product=...


From Mozilla, yes, but there was some activity by others this year: https://github.com/mozilla/shumway/pull/2442


At first I thought this was saying all information on the internet should be preserved (which is a growing privacy concern), but after reading more carefully I think it’s about having a way to preserve the information that is intended to be public and permanent from the beginning. We don’t have any good ways yet to guarantee future access to public information.

I’d never heard of the Digital Object Architecture (DOA). The article itself makes light of the unfortunate acronym with “History pronounced DOA”. That actually left me confused about what they were talking about for a minute.

No idea if it’s a good way to preserve academic papers on the internet, but the business model side of it is entirely an open question, which makes me wonder whether it solves anything at all. The problem with information on the internet is that the people who publish it eventually lose the interest or the ability to keep paying for storage and access.

“Economics/Business Model

While the Handle System has been used for many years in publishing and library systems, generalizing to other applications, e.g., Internet of Things, will likely generate economic concerns related to the business model of the system, especially at the Global Handle Registry. Will organizations be charged for each identifier? Will organizations that acquire a prefix be able to create unlimited sub-prefixes or will they be charged for each sub-prefix? How will these policies be developed? How will the money flow? What will be the impact on developing countries or small businesses?”

https://www.internetsociety.org/resources/doc/2016/overview-...


Think about it: we still have almost-complete Usenet archives from the early 80s to the late 90s; the whole network has been preserved like a time capsule. You can visit https://olduse.net (maintained by a retired Debian developer) and see the heyday of hackers and early technology adopters as if it were just yesterday. The entire society is archived there.

Discussions about the recently proved four-color theorem, the latest version of the C compiler, the difference between vacuum tube and solid-state amplifiers, the places where the GNU Project and the Linux kernel were announced, early online culture, tons of colorful, hilarious, but forgotten and buried memes, and weird phenomena that emerged from the collective (un)consensus... Sci-fi fandom as an integral part of online and hacker culture, millions of lewd stories written in alt.sex, "Imminent Death of Usenet Predicted!", "There is no cabal", alt.french.captain.borg.borg.borg, coffee and cat warnings, the Church of Kibology, the anti-spam movement, creationism vs. evolution debates on talk.origins, the Meow Wars (the first online meme war), all the personal attacks, trolls, flame wars, and "cyber-stalking", etc.

Then the centralized WWW replaced the distributed Usenet, and crappy HTML replaced a perfectly machine-readable data format. Would we have a similar archive for Reddit or Hacker News? Possibly not. So, Hacker News, come and create one! You can do it!

Another unique challenge created by the WWW is the inaccessibility of server-side software: exporting and preserving the data is NOT enough, unlike Usenet, where any client can load the data. The user interface and functionality of a website are also part of the collective memory that needs to be preserved. We need a replica of the website's software, with an identical user interface and all the functions of the original: clicking a username to see that user's posts, their karma, etc. I don't think anyone has even noticed that this problem exists. Luckily, major websites such as Reddit or 4chan all use FLOSS software, which would make the work easier, but it is still a huge challenge because the raw databases are inaccessible. Also, to keep some content meaningful in the future, external resources such as hyperlinks to other websites and images should be preserved as well; considering this, the chance of creating an authentic and complete archive is even lower.

---

But even if we were still using a distributed network where data preservation is technically possible and there is no walled garden, preservation might still be difficult. In the era of Usenet, people often attached their name, address and phone number to their posts; there were virtually no threats except for a few trolls, which is why archiving Usenet was possible in the first place. But the Internet is not the Net anymore: now not only humans but almost every piece of equipment along the route may be your enemy.

The ongoing security and privacy movement is a huge threat to historical records. From my observation, at least of the infosec hacker community: after Snowden's revelations, public and open discussion is slowly being transformed into private, closed, encrypted and ephemeral activity, plus self-hosted platforms and protocols like ActivityPub, GNU/Social and Mastodon. This is indeed good from a security and privacy perspective, and it is exactly what we need now.

But we are also creating a huge gap in knowledge, information and history on the Internet. After my death, none of my self-hosted code, my blog, or my GNU/Social posts will survive. In short, "collect 'em all" is both malicious NSA dragnet surveillance and a glorious act of historical preservation. This is where the contradiction lies.

I don't know what to do. For the WWW, archive.org is a workaround, and I think it needs more donations. But for all the other self-hosted things like Mastodon and git servers, there is no solution at all.


How do I navigate this olduse.net site? Is this a forum? I can't make heads or tails of how to see all those articles you mentioned. I've clicked every link on the homepage.


Use a Usenet client to connect to the server nntp.olduse.net. While rn, the classic newsreader Larry Wall wrote in 1984, probably doesn't work today, other implementations such as slrn are still around.

See also: https://en.wikipedia.org/wiki/Newsreader_(Usenet)
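If you would rather poke at the server from a script than install a newsreader, here is a minimal sketch in Python that speaks raw NNTP (RFC 3977) over a plain socket: it reads the server greeting, sends MODE READER, selects a newsgroup with GROUP, and parses the `211 count low high group` reply. It assumes nntp.olduse.net still accepts anonymous connections on the standard port 119; the helper names are made up for illustration, and only the wire commands and reply codes come from the protocol.

```python
import socket

# Hypothetical helper names; the commands (MODE READER, GROUP, QUIT)
# and reply codes (200/201 greeting, 211 group selected) are from RFC 3977.


def parse_group_response(line: str) -> tuple[int, int, int, str]:
    """Parse a successful GROUP reply of the form '211 count low high group'."""
    parts = line.split()
    if len(parts) < 5 or parts[0] != "211":
        raise ValueError(f"unexpected GROUP reply: {line!r}")
    count, low, high = (int(p) for p in parts[1:4])
    return count, low, high, parts[4]


def _read_line(stream) -> str:
    # NNTP status lines end in CRLF; strip it and decode leniently.
    return stream.readline().decode("utf-8", errors="replace").rstrip("\r\n")


def _command(stream, command: str) -> str:
    # Send one CRLF-terminated command and return the single status line.
    stream.write(command.encode("ascii") + b"\r\n")
    stream.flush()
    return _read_line(stream)


def select_group(host: str, group: str, port: int = 119, timeout: float = 10.0):
    """Connect to an NNTP server, select a newsgroup, and return its stats."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rwb") as stream:
            greeting = _read_line(stream)
            # 200 = posting allowed, 201 = posting prohibited; both allow reading.
            if not greeting.startswith(("200", "201")):
                raise ConnectionError(f"server refused connection: {greeting!r}")
            _command(stream, "MODE READER")
            reply = _command(stream, f"GROUP {group}")
            _command(stream, "QUIT")
    return parse_group_response(reply)
```

Calling `select_group("nntp.olduse.net", "net.unix")` should return a `(count, low, high, name)` tuple if the server is reachable and carries that group; `net.unix` is only a hypothetical example, so pick a group that exists in the period being replayed. A real newsreader like slrn does all this (and article fetching) for you.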

Note that olduse.net is a playback of Usenet time-shifted 20 years back, so some of the events I mentioned have yet to occur. You can also navigate the website, download the original archives, and load them yourself to explore.

If you don't have a setup yet, read these interesting articles to get a quick idea of how it works:

* http://olduse.net/blog/what_rms_saw/

* http://olduse.net/blog/Dennis_Ritchie/

* http://olduse.net/blog/stargate_controversy/

You can also just browse the old Usenet on Google Groups. It's the same content anyway, but the experience is poor.


It's worth noting that in complex systems, forgetting can be just as important as remembering. The ability to evolve and change is in part predicated on the ability to selectively forget some elements that are no longer helpful. As another commenter pointed out, some aspects of our society's cultural memory are, probably, best left in the past, if preserved at all.


Knowing what should be forgotten requires either foreknowledge of the future or reliance on inaccurate predictions based on current assumptions. Thus, the less information of uncertain usefulness is retained, the more those assumptions dictate what will be considered useful in the future, short of rediscovering things from scratch. That limits adaptability. Just try to imagine how many things were invented in the past that weren't seen as useful at the time and had to be rediscovered later. How many ideas were lost in the Dark Ages, for instance, because they offended religious sensibilities at the time? If there is capacity to retain information in an organized way without undue cost, it should be retained as a hedge against future uncertainty. No single generation of humans should be trusted to make such decisions without being unduly influenced by the biases of its time.


If you're on Windows, https://www.mailstore.com has a mature, free version that will backup all your local and cloud mail to a searchable offline archive. Uses standard formats. Can also move your mail between cloud providers.


We have been forgetting things for thousands of years. Last I looked, we still don’t know how to make Damascus steel or Roman concrete. People will reinvent the things they need and maybe never even know that it’s a reinvention. I reinvented a couple of fundamental graphics techniques because I needed them and didn’t study computer graphics in school. I’m sure this happens all the time.

I’m all for making it easier to recover data and generally make it easier to store stuff. And I believe you have a right to your data (yay GDPR). But this isn’t a catastrophe.


> we still don’t know how to make Damascus steel or Roman concrete

That's an oft-repeated but ultimately meaningless statement.

We know how to make things better than Damascus steel or Roman concrete. Our processes have exceeded those of the ancients. The quality of Damascus steel is likely overrated; it stood out mainly because it wasn't quite the absolute shit that was regularly traded at the time. Same with concrete.

We might not have exquisite written recipes and procedures for these materials, but they have been reverse-engineered, and it turns out, they weren't that great compared to modern chemical and material engineered products.

We don't need to 'rediscover' Damascus steel because it is obsolete. A romantic idea, and poetic in how it was 'lost' to time, but consider that it was 'lost' to time, similar to Japanese steel-working, because nobody wanted to buy it anymore. It became economically and culturally irrelevant.


Exactly. We forgot how to do a thing and invented better. The “lost” knowledge isn’t really worth much to us today.

And that will be true of essentially everything we forget in the future as well — the forgetting is the sign that it wasn’t needed.


In case anyone hasn't already heard of it, IPFS (https://ipfs.io) is a really cool project that aims to help solve this problem.


Glad someone else mentioned it. I haven't seen much news from them recently other than a picture-sync application.

Looking forward to the next few years and their (anticipated) growth


The Archive Team has been doing this work for at least 10 years. If you'd like to help archive the internet, they have an easy-to-use tool you just run and let work in the background.

https://archiveteam.org/index.php?title=ArchiveTeam_Warrior



