@phonedude
Created June 28, 2023 18:14
Skyblog, once the leading blog platform in France, is going to shut down in mid-August. How does this exemplify the risk of losing our internet memory?
It is indeed very concerning. This site holds significant cultural and technological importance from the time it was introduced, although blogging has since been overshadowed by social media platforms like Facebook, Twitter, Instagram, and others. I understand that it's not the same as it was when it first came out in 2002 or whenever it was.
Now, there is some precedent for this. Do you remember the Geocities site in the US? It was quite popular in the 90s and allowed users to create their own websites, although it wasn't exactly a blog platform. It enjoyed popularity until it was eventually sold to Yahoo, which struggled to find a clear direction for it and eventually shut it down. Consequently, this time capsule of the mid to late 90s mostly disappeared. While a few sites have recreated Geocities, and the Internet Archive has archived a significant portion of it, its disappearance marked an important loss in the web's development. I fear that the same fate awaits Skyblog.
Although I haven't delved into it yet, I believe a substantial portion of Skyblog is already archived at the Internet Archive. It would be worthwhile to explore it further. In the press release, I noticed that they mentioned collaborating with INA and BNF to archive the materials. However, I'm uncertain about the details of their plan, particularly regarding the anonymization of blogs and the transfer of archives. Nonetheless, I trust that INA is a reputable organization, and I'm familiar with their work. I hope they will handle this situation appropriately. It's encouraging to see that Skyblog is in contact with them and the BNF, indicating their intention to give proper attention to this matter instead of abruptly shutting it down.
What are the reasons behind such decisions?
So, there is a common misconception that maintaining a website is either free or inexpensive, but that's not entirely true, is it? The cost of hosting has indeed decreased significantly over the years. It's undoubtedly much cheaper than it was 10 or 20 years ago. However, keeping a website up and running smoothly, preventing hacking attempts, and ensuring a smooth transition to the latest platform requires a considerable amount of human labor, and that is where the actual cost lies. Although there are some economies of scale in place, I assume the people running Skyblog, or is it Skyrock, realized they could no longer generate revenue from it, either through advertising or other means. The advertising dollars have largely shifted to bigger platforms like Instagram, Facebook, and Twitter. Consequently, the funds have dried up, making it an unprofitable burden for the parent company. Unfortunately, companies generally prioritize their bottom line over preserving cultural history. It's librarians, journalists, and academics who truly care about these matters. While companies may pay lip service to such sentiments, when it comes to committing financial resources every year indefinitely, they lose interest. Considering Skyblog has been in operation for 21 years, it's honestly quite an impressive run for an internet company.
How could Skyblog's disappearance have consequences in terms of culture and memory?
So, the French cultural history, or at least the online pop history, is certainly at risk. Fan sites, sports blogs, music discussions, TV show reviews: these all contribute to the cultural zeitgeist of France from 20 years ago, and accessing Skyblog is essential for studying that era. However, there are other aspects to consider as well. While Skyblog may disappear, content on the web often finds new locations rather than completely vanishing. It's likely that a significant portion of it has already been archived by the Internet Archive, and Skyblog is also collaborating with INA and BNF. So, the content will still exist in some form.
The challenge lies in finding it. It may be tucked away in one or more web archives, making it difficult for the average person to locate and explore. Academics and researchers will likely find ways to access it, but we're concerned about the loss of accessibility for the average person who wishes to browse and engage with this historical content. This situation reminds me of Geocities, as I mentioned earlier. Another example is Tumblr, which still exists but has undergone significant changes, arguably losing its original essence.
This is part of the natural evolution or cycle of these systems. Social sites like Myspace, which was an early innovator, technically still exist, but their audiences have moved on to other platforms. Skyblog captures a specific moment in time when people embraced blogging before the advent of social media. It holds value and will continue to exist, albeit with difficulties in locating it.
You led a 2012 study, "Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?" What did you find out then?
That was a particular concern, especially because I was collaborating with my PhD student at the time, who happened to be Egyptian. He was in the US while the revolution was unfolding in Egypt. Naturally, he relied on social media for updates since there was no reliable source of news coming out of the country. Social media became a vital platform for sharing information, especially since governments hadn't yet learned to control or influence it effectively. It acted as a subversive force during that period.
However, the landscape has changed since then. Governments and individuals are now fully aware of the power of social media and engage in ideological battles, using bots and other tactics to influence public opinion. Back then, images, videos, and information were shared on social media platforms. For example, Twitter didn't have its own image hosting services, so people relied on external image hosting sites like yfrog and others. Unfortunately, many of these sites have vanished, resulting in the loss of a significant amount of data. This occurred only 11 to 12 years ago, around the time the revolution started.
If one were to reconstruct a timeline of events and understand the evolution of the revolution, it would require considerable effort to find the original tweets, locate the associated images, and piece them together to form a coherent narrative. The web is often considered the first draft of history, and those moments were truly historical. Historians, academics, journalists, and anyone interested in history would undoubtedly want access to that material. However, a large portion of it has either disappeared entirely or become challenging to locate.
Apart from social media, there may be moral or ethical reasons to remember and preserve that information, particularly in high-stakes situations like a revolution. Suppose the regime that comes into power seeks to punish those who opposed it during the revolution. In such cases, web archives should be cautious not to become tools of oppression, enabling individuals to dig up past support for opposition groups and use it as a means of retribution. Privacy and the protection of individuals' safety must be considered in these contexts.
You mentioned the 2012 study: Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost? http://arxiv.org/abs/1209.3026
I'd also like to point to the 2013 follow-up study we did: Resurrecting My Revolution: Using Social Link Neighborhood in Bringing Context to the Disappearing Web
http://arxiv.org/abs/1309.2648
I'd also like to draw your attention to a recently published article where we evaluate how much we can trust the archived web to be an accurate replay of the past web: Hashes are not suitable to verify fixity of the public archived web
https://doi.org/10.1371/journal.pone.0286879
But perhaps most importantly, especially with respect to blogs and other social media, is that history is not just the purview of historians. For the Internet Archive's 20th anniversary (2016) and 25th anniversary (2021), our research group did a series of blog posts about topics of personal interest and not just boring academic stuff:
https://ws-dl.blogspot.com/2016/11/2016-11-21-ws-dl-celebration-of-ia20.html
https://ws-dl.blogspot.com/2021/11/2021-11-03-ws-dl-celebration-of.html
Because that's the paradox: things are disappearing, yet at the same time nothing ever disappears completely from the Internet…
Yeah, it both disappears and never goes away at the same time. So if you were tweeting during the Egyptian revolution and I really wanted to get the dirt on you and could expend the resources, I could probably surface at least some of the content now. In 2012, web archives were characterized by limited resources and slow progress. However, the situation has changed significantly. Take the Internet Archive, for instance. While it may not possess the same level of resources as Google or Amazon, it has considerably enhanced its capabilities. The organization now operates with greater efficiency and speed. It is highly improbable that a significant event would occur without the Internet Archive capturing a substantial portion of it. This situation can be viewed both positively and negatively, depending on the nature of the content and one's perspective.
In our original study, we observed a yearly disappearance rate of approximately 10% for online resources. This trend persists until it reaches a baseline, where some content continues to exist while the majority fades away. Through various studies, we have noticed a linear decay of resources on the web, especially concerning platforms or sites involving human activity. Deleting a single image or tweet is not the typical scenario. Instead, users often choose to delete their entire accounts or face account suspension, resulting in the removal of all associated content. Consequently, rather than sporadic individual deletions, bulk removals tend to occur. This pattern is exemplified by instances such as the downfall of Skyblog, where an entire site ceases to exist.
Who are the archivers of the Internet? Are there concerns about them?
Indeed, there are numerous web archive initiatives, many of which are based in Europe and supported by national libraries and archives. For example, Arquivo.pt is a public web archive initiative in Portugal, while the British Library and the National Library of France (BNF) maintain their own web archives. Although accessing the BNF web archive requires a visit to a special room in Paris, it contains a wealth of valuable content. In the United States, the most prominent and trusted initiative is the Internet Archive, a nonprofit organization founded by Brewster Kahle. Unlike commercial companies, the Internet Archive is not involved in selling data for purposes like facial recognition. However, it is important to recognize that both benevolent and malevolent actors are aware of the Internet Archive's vast collection. While the Internet Archive does not engage in questionable activities, its resources can be accessed and utilized by both good and bad actors.
In addition to the Internet Archive, there are other web archives in the United States. One such initiative is Perma.cc, based at Harvard University, which focuses on preserving legal documents and academic literature for the legal academic community. Another organization called Archive.today is based in Eastern Europe. These web archives typically operate on a page-by-page basis, allowing users to request specific URLs for archiving. The popularity of a requested URL often determines the number of copies archived. However, the Internet Archive is the only known organization in the US that conducts broad-based web crawling, aiming to capture as much content as possible without prior knowledge of its future significance. This approach is particularly valuable considering that individuals who may become influential figures in the future, such as the next Prime Minister of France, may have had personal blogs during their formative years. Being able to explore their perspectives from their youth and early education can offer unique insights.
Alongside the Internet Archive, there is a loosely affiliated volunteer group called Archive Team, which acts as an emergency response team when a major site announces its closure. For example, when Yahoo shut down Yahoo Answers, Archive Team mobilized its decentralized software, enabling individuals to mine and archive the content in a distributed manner. These efforts ensure that valuable knowledge bases, like the extensive Q&A repository on Yahoo Answers, are preserved. It is highly likely that Archive Team would also be activated in a scenario involving Skyblog's shutdown, given its significance and extensive history.
I saw a comparison about the data loss of the internet that could be like losing the great Library of Alexandria. Is that what is at stake?
That metaphor is indeed a favorite among librarians, academics, philosophers, and others. The historical loss of the Library of Alexandria serves as a poignant reminder of the importance of preserving knowledge. However, the scale of what is currently at stake is vastly larger. It is true that a considerable portion of the content at risk may not be deemed important, such as an abundance of kitten pictures. However, the challenge lies in the fact that we cannot predict in advance what will become significant in the future. That is precisely why we strive to collect and preserve as much content as possible. While the metaphor emphasizes the magnitude of cultural output that is endangered, it is crucial to acknowledge that some of it may indeed be lost forever rather than simply relocated to another digital space.