Yesterday I was wondering about just deleting my data archives. Over the last 20 years I have accumulated so much data. Sure, most of it is backed up to the cloud, and the writing is a very slim slice of the overall total. I was considering just wiping it all and starting over. My thoughts then drifted to what I might miss after that great data purge. I might need my writing archives, photos, and videos. Maybe keeping all my photos and family videos would be a good thing to actually commit to. From what I know, all three sets of that data are backed up to a couple of clouds and I have alternate copies. I’m not sure why I went to all of that trouble to keep it. We used to print our favorite photographs and share them in books with people. Nobody is going to want to look through 20 or 30 thousand digital photos. They are not even shared online anymore. At one point, all of my digital photos were posted on Flickr or just shared in an online directory. I have considered bringing back a digital photography section of my weblog to share content. I’m not talking about sharing all my photos this time around, but it would be good to share the ones that people might enjoy or use as a desktop background.
Today I’m really focused on what parts of the internet are more permanent than others. I’m curious about what will happen in the future: a decade from now, will GitHub and YouTube still be housing content? It is really about my effort to question what will remain online year after year. Back on May 20, 2021 I released an album on YouTube called, “This is an ambient music recording called dissonant dystopia.” That work of art is 33 minutes of dissonant music and it will exist online as long as YouTube houses it. That means its existence is pretty much tied to the permanence of YouTube as a platform. I’m going to guess that a lot of content faces the same constraint: the continued existence of the art is tied to the platform where it is hosted. I could probably post the album to a few other places to increase the odds of it outlasting YouTube as a platform, but I’m not sure that effort is worth my time. My guess about the future of online permanence is that Instagram and YouTube will continue to exist for as long as the modern internet persists as a technology.
It is times like these when I begin to wonder what will happen to the world wide web as pockets of private isolation creep up within the walls of applications. We are seeing a fragmentation of what was the open internet, be it the continued growth of dark pockets of the online world or just application-based islands. You can get to the front door of these parts of the internet, but they are not truly a part of an open internet. They are something else, and that something else is evolving right now before our eyes. We could very well see a change in the format of content in the next decade. Sure, hypertext has connected the world, but a metaverse will potentially be a video/image stream that is way beyond a text-based communication method. Keep in mind that this weblog barely contains any imagery and the primary method of communicating content is text. In a metaverse of rooms, zones, areas, or community spaces, it is entirely possible that everything will be immersive and that image and sound will define the method of communication.
Really the most advanced method of communication I have considered is either recording these missives as audio for a podcast or making a video version of a podcast that simply shows me reading the content. Either way that would be a one-way method of communication, whether via text dissemination, audio recording, or video recording. It would be nothing outside of an asynchronous method of communication. I might respond to a comment or a note that somebody provided, but it would not be within an immersive environment. It would be purely asynchronous in nature.
At the end of my writing session yesterday I accidentally sent out 307 tweets. Deleting every one of those by hand on Twitter as the rate-limited API spit them out was a little bit nerve-wracking. My expectation was that either the deduplication feature over at Twitter would catch this or the integration code on my side was written well enough not to post things modified using the bulk edit feature. Neither of those things held true, and the logic failed. That really did mean that a bunch of people who had alerts turned on received a lot of update notifications. Given that I have recently started using a VTech landline headset system to obfuscate my cellular connection and avoid notifications, I’m feeling a little bit of shame related to that glaring coding mistake.
Releasing those posts from private mode back to published brings the public archive to a complete status from 2020 to current. At some point, I’m going to bring back all the posts from the 3,000-word-a-day writing habit period of 2018, but I’m going to need to fix that integration with Twitter before making that update. The easiest way to fix it would be to simply go to the settings menu and disconnect Twitter. Right now the setting for “Sharing posts to your Twitter feed” is enabled. It would take just one click to disconnect it, and that would pretty much solve the problem, not via code, but by literally removing the potential for the problem to occur again. Maybe later this week that is what it will come to after some contemplation about the problem. I am really considering releasing the 153 posts that are currently set to private mode from that highly productive writing period.
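For what it is worth, the guard the integration was missing could be as simple as keeping a local ledger of post IDs that have already been syndicated, so that bulk edits or re-publishes never trigger a second tweet. Here is a minimal sketch of that idea; the file name, the helper functions, and the assumption that posts arrive as dictionaries with an "id" field are all hypothetical, not the actual integration code.

```python
import json
from pathlib import Path

# Hypothetical local ledger of post IDs that have already been tweeted.
SHARED_LOG = Path("shared_posts.json")

def load_shared_ids(path: Path = SHARED_LOG) -> set:
    """Return the set of post IDs that have already been syndicated."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()

def posts_to_share(all_posts: list, shared_ids: set) -> list:
    """Only posts never shared before qualify; edited or re-published
    posts already in the ledger are skipped entirely."""
    return [post for post in all_posts if post["id"] not in shared_ids]

def record_shared(post_id, path: Path = SHARED_LOG) -> None:
    """Append a post ID to the ledger after a successful share."""
    ids = load_shared_ids(path)
    ids.add(post_id)
    path.write_text(json.dumps(sorted(ids)))
```

With a check like this in front of the sharing call, flipping 307 posts from private back to published would produce zero tweets, because every one of those IDs would already be in the ledger.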
I have really spent a fair amount of time thinking about the nature of permanence and the written word recently. Until we start saving content to crystals (5D optical data storage), all of this writing and posting is going to be ephemeral at best. It is possible that my code on GitHub will be stored that way at some point, and the GPT-2 model trained on my writing corpus would fall into that storage process and be saved for posterity. However, just because content got saved to crystal and was potentially accessible for ages does not mean any interest in the content would exist. People might not boot up the Nels bot for dialogue and exchange. Most of the interest in complex language modeling right now is based on overwhelmingly large datasets vs. contained individual personality development.
To that end I was reading the article “The Pile: An 800GB Dataset of Diverse Text for Language Modeling” on the arXiv website, from Gao et al., 2020. That diverse collection includes 825 gigabytes of content that has functionally been pooled across all its sources with the authorship removed. That pooling strips individuality from the language model in favor of generalization. Future models might end up going the other direction and favoring personality over generalization, but that might end up being more isolated based on what I’m seeing so far in terms of language modeling.
On the brighter side of things, these experiences are focusing my research interests on that pivotal point of consideration between generalized and personality-specific language models. I have a sample IEEE paper format template saved as a Microsoft Word document, ready to house that future paper, on my desktop screen right now. It’s entirely possible that after hitting publish on this missive that is where my attention will be placed for the rest of the day.