
@horsemankukka
Last active April 4, 2023 08:36

So, apparently at the end of September Elisa's customers received a message saying that Elisa's website space will be shut down “by the end of the year” due to low usage and outdated technology.[1] The exact date and time apparently was not stated, since customers mention both Dec 31 and Jan 01.

Many of the sites under those domains probably went down as early as 2020, when Elisa moved all mailboxes and website spaces that had been included in broadband contracts at no extra cost to a separate paid email service costing 36€/year.

The official support site only talks about services connected to Elisa email accounts, which are all supposed to be at http://www.elisanet.fi/personal.name[2], but the discussion in the support community includes a quote from a support chat confirming that all saunalahti.fi-hosted websites are being discontinued too.[1] The URL format for these pages is supposed to be http://www.saunalahti.fi/~username. Customers with websites at kolumbus.fi are also reporting that their websites will be closed.[3] Those URLs seem to follow the format http://www.kolumbus.fi/personal.name.
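As a rough illustration, the three URL formats above can be turned into candidate homepage URLs for a given account name. This is a minimal sketch assuming the quoted formats are accurate; `HOST_FORMATS` and `candidate_urls` are hypothetical names, and real accounts may use other variants.

```python
# Hypothetical table of the per-host URL formats described above.
# "{name}" is the account name (personal.name or username, depending on host).
HOST_FORMATS = {
    "elisanet.fi": "http://www.elisanet.fi/{name}",
    "saunalahti.fi": "http://www.saunalahti.fi/~{name}",
    "kolumbus.fi": "http://www.kolumbus.fi/{name}",
}


def candidate_urls(name: str) -> list:
    """Return one candidate homepage URL per host for the given account name.

    Nothing is fetched here; existence has to be checked separately.
    """
    return [fmt.format(name=name) for fmt in HOST_FORMATS.values()]


if __name__ == "__main__":
    for url in candidate_urls("matti.meikalainen"):
        print(url)
```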

Domains listed in the tags of a customer forum FAQ post about the Elisa email service[4]: netti.fi, nic.fi, kolumbus.fi, elisanet.fi, kotikone.fi, netikka.fi, kotiposti.fi (just email).

The following sites are going to be closed, according to customers:

  • netti.nic.fi/~username[5]
    • under the same IP address are:
      • www.nic.fi/~username
      • gamma.nic.fi/~username
Probably under threat:
  • www.netti.fi/~username
  • www.kotikone.fi/username
The large number of domains is the result of multiple acquisitions over the years.

The support page documents that if public_html does not contain index.htm or index.html, a "not found" page is shown. An identical page appears when the user does not exist at all. So a 404 at domain.tld/username doesn't necessarily mean that nothing exists in that webspace, and checking for an index page may not be an ideal strategy for limiting requests to paths gathered from the Wayback CDX, Common Crawl or similar.
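The practical consequence can be sketched as a three-way reading of the status at domain.tld/username. This is a hypothetical helper that assumes only the behaviour described on the support page: a 404 at the root means "unknown" rather than "absent", so deeper paths collected from CDX data are still worth requesting.

```python
from enum import Enum


class RootStatus(Enum):
    HAS_INDEX = "has index page"
    # Per the support page, "no index.htm(l)" and "no such user" look identical.
    UNKNOWN = "no index page, or no such user"
    OTHER = "unexpected response"


def interpret_root(status_code: int) -> RootStatus:
    """Classify the HTTP status of http://domain.tld/username (hypothetical helper)."""
    if status_code == 200:
        return RootStatus.HAS_INDEX
    if status_code == 404:
        return RootStatus.UNKNOWN
    return RootStatus.OTHER


def should_fetch_subpaths(status_code: int) -> bool:
    """Keep CDX-derived subpaths for a user unless the root gave some other error."""
    return interpret_root(status_code) in (RootStatus.HAS_INDEX, RootStatus.UNKNOWN)
```

The design choice is simply not to treat a 404 as proof of absence, which keeps the request list larger but avoids dropping index-less webspaces.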

Google site: operator results, for relative scale (for each domain, whichever of the www and non-www searches gave more results is kept):

site:kotikone.fi ~743 results
site:nic.fi ~6,260 results
site:netti.nic.fi ~963 results
site:gamma.nic.fi ~2,550 results
site:www.saunalahti.fi ~26,600 results
site:kolumbus.fi ~96,800 results
site:www.elisanet.fi ~21,500 results
site:www.netti.fi ~1,870 results

UPDATE 2022-11-29T15:50: More domains from a support forum thread announcing the discontinuation of various old SMTP/POP/IMAP servers[6]:

Google site: operator results per domain:

personal.eunet.fi ~209 results
www.personal.eunet.fi ~49 results
wwnet.fi ~925 results
www.dlc.fi ~4,710 results
*.pp.fi ~948 results (these seem to be company websites, though)
www.nettilinja.fi ~108 results

UPDATE 2022-11-29T16:06: probably also sci.fi (~5,520 results)

  1. https://yhteiso.elisa.fi/muut-elisan-palvelut-29/elisan-kotisivutilan-lopetus-1-1-2023-523653
  2. https://elisa.fi/asiakaspalvelu/aihe/sahkoposti-kotisivut/ohje/kotisivutila/
  3. http://www.kolumbus.fi/sami.nordlund/
  4. https://yhteiso.elisa.fi/saehkoeposti-68/mikae-on-elisa-saehkoepostipalvelu-520819
  5. http://netti.nic.fi/~tomk/mina_ja_minusta.html
  6. https://yhteiso.elisa.fi/saehkoeposti-68/ennakkotieto-elisan-vanhat-saehkopostipalvelimet-paeaettyvaet-521195?postid=674989#post674989

@Pokechu22

I currently have this list running on ArchiveBot as an !a < list job (so it recurses over http://www.elisanet.fi/, seeded with the URLs in that list). I generated the list by grabbing all URLs that IA had previously crawled and then stripping them down to just the username. I also added Google and DuckDuckGo results, but only got about 300 results in total (after setting it to 100 results per page), although some of them were new. I'm not sure whether there are more results that Google just isn't returning, or whether its reported result counts are inflated. If you've got advice on how to do this better, let me know.
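The step of stripping previously crawled URLs down to usernames could look roughly like the sketch below. It assumes the input is a plain list of URLs (for example, the original-URL column of a CDX query) and that the first path segment after the host is the personal.name; `usernames_from_urls` and `seed_urls` are hypothetical helpers, not the tooling actually used for the job.

```python
from urllib.parse import urlsplit


def usernames_from_urls(urls, host="www.elisanet.fi"):
    """Return the sorted set of first path segments seen under `host`."""
    names = set()
    for url in urls:
        parts = urlsplit(url)
        # urlsplit lowercases .hostname; skip URLs on other hosts.
        if parts.hostname != host:
            continue
        # "/personal.name/sub/page.html" -> "personal.name"
        first = parts.path.lstrip("/").split("/", 1)[0]
        if first:
            names.add(first)
    return sorted(names)


def seed_urls(names, host="www.elisanet.fi"):
    """Build one seed URL per username for a recursive crawl."""
    return [f"http://{host}/{name}/" for name in names]


if __name__ == "__main__":
    crawled = [
        "http://www.elisanet.fi/matti.m/index.html",
        "http://www.elisanet.fi/matti.m/kuvat/a.jpg",
        "https://www.elisanet.fi/liisa.k",
    ]
    print(seed_urls(usernames_from_urls(crawled)))
```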

I've also built up a much larger list for kolumbus.fi based on existing data for www.kolumbus.fi and web.kolumbus.fi (I'm treating users on the latter as if they existed on the former, since I don't think anything still exists on the latter). That job isn't running yet, though.
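Treating web.kolumbus.fi users as if they existed on www.kolumbus.fi amounts to rewriting the host and then deduplicating. A minimal sketch, assuming both hosts share the same path layout (the assumption stated above, not something verified here); `to_live_host` and `merge_lists` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

LEGACY_HOST = "web.kolumbus.fi"  # host assumed to be gone
LIVE_HOST = "www.kolumbus.fi"    # host the same users are assumed to live on


def to_live_host(url):
    """Rewrite a web.kolumbus.fi URL onto www.kolumbus.fi; leave others unchanged."""
    parts = urlsplit(url)
    if parts.hostname == LEGACY_HOST:
        parts = parts._replace(netloc=LIVE_HOST)
    return urlunsplit(parts)


def merge_lists(urls):
    """Map every URL to the live host and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(to_live_host(u) for u in urls))
```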

The ArchiveTeam wiki page on ISP Hosting mentions a few different URL formats starting with ~ on both of those domains, which do seem to exist (though there seem to be more of them than just the ones listed). It also mentions a previous project for Saunalahti pages, done around 2015, so those pages are probably already saved (at least there probably isn't much new content) and can be deprioritized (though it may still be good to grab them again).

One other annoying thing is that nonexistent entries unconditionally redirect from http://www.elisanet.fi/personal.name to http://www.elisanet.fi/personal.name/ and then give a 403, instead of giving a 404 without a redirect when no user exists and a 403 only when the user exists but has no index page.
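Given that behaviour, one small optimisation is to request the slash form directly, which skips the unconditional redirect, while accepting that only a 200 actually confirms a user. A sketch under those assumptions; `with_trailing_slash` and `user_confirmed` are hypothetical helpers.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url):
    """Turn http://www.elisanet.fi/personal.name into .../personal.name/.

    Only bare user roots (a single path segment without a trailing slash)
    are changed; deeper paths such as /personal.name/page.html are left alone.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) == 1 and not parts.path.endswith("/"):
        parts = parts._replace(path=parts.path + "/")
    return urlunsplit(parts)


def user_confirmed(status_code):
    """A 403 is returned both for missing users and for users without an
    index page, so only a 200 confirms that the user exists."""
    return status_code == 200
```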

@Pokechu22

Pokechu22 commented Dec 1, 2022

The list for kolumbus.fi has also been started. Of the ~40000 possible usernames listed, about 780 actually had an index page. Some of those index pages were even still actively updated, such as http://www.kolumbus.fi/Jarpen/, which was last updated in October and mentions http://www.escstats.com/ (updated today) as its new location. I got that number by looking at the dashboard after the job finished going through the main list and started on subpages and images, but the actual list of working usernames won't be available until the job finishes.
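Once results are in, picking out the working usernames and the hit rate is a small filtering step. For scale, about 780 hits out of ~40000 candidates is a hit rate of roughly 2%. The sketch assumes a hypothetical input of (root_url, status) pairs, one per candidate; ArchiveBot's real logs and viewer data have their own formats, which are not modelled here.

```python
from urllib.parse import urlsplit


def working_usernames(results):
    """Return (sorted usernames whose root URL answered 200, hit rate).

    `results` is a hypothetical iterable of (root_url, http_status) pairs,
    one pair per candidate username.
    """
    results = list(results)
    working = set()
    for url, status in results:
        if status == 200:
            name = urlsplit(url).path.strip("/").split("/", 1)[0]
            if name:
                working.add(name)
    rate = len(working) / len(results) if results else 0.0
    return sorted(working), rate
```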

Oh, I also forgot: here is my source list of URLs for kolumbus.fi, and here's the list for elisanet.fi. And here are the ArchiveBot viewer pages for elisanet.fi and for kolumbus.fi, though they aren't ready yet.

@Pokechu22

I believe everything has been at least partially saved at this point, though a second pass to check for users without index pages, plus better scraping of Google, would be good.

This includes pp.fi (which personal.eunet.fi/pp/ seems to be the same as, at least for the most part?), though at this point only a few users survive there: http://www.james.pp.fi, http://www.juuso51.pp.fi, http://www.ltoy.pp.fi, http://www.mikan.pp.fi, http://www.takoja.pp.fi, http://personal.eunet.fi/pp/tstop/ (weirdly, http://www.tstop.pp.fi/ doesn't work), http://www.posic.pp.fi/en/yhteydet.htm (no index, but note how it says there is no permission for /~posic-1/ instead of /~/), http://www.nuuska.pp.fi/finnfox.htm (also lacks an index), and theoretically something on http://www.ural.pp.fi/, though it seems unlikely to be findable. I also scraped all of the Google cache links for pp.fi (which was a bit tedious, since it required manually coming up with search filters), as it seems some subdomains there went down in the last few months but are still cached. I'm fairly confident that everything that can be done for pp.fi has been done.
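Since personal.eunet.fi/pp/ appears to mirror the pp.fi subdomains only "for the most part" (tstop works in one form but not the other), it can be worth trying both forms for each name. A hypothetical sketch, assuming the mapping www.NAME.pp.fi and personal.eunet.fi/pp/NAME/; `pp_variants` and `pp_name_from_host` are illustrative names.

```python
def pp_variants(name):
    """Return both URL forms to try for a pp.fi user (assumed mapping)."""
    return [
        f"http://www.{name}.pp.fi/",
        f"http://personal.eunet.fi/pp/{name}/",
    ]


def pp_name_from_host(host):
    """Extract NAME from www.NAME.pp.fi or NAME.pp.fi; return None otherwise."""
    labels = host.lower().split(".")
    if labels[-2:] != ["pp", "fi"]:
        return None
    labels = labels[:-2]
    if labels and labels[0] == "www":
        labels = labels[1:]
    # Exactly one label must remain for a plain user subdomain.
    return labels[0] if len(labels) == 1 else None
```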

I've been a bit disorganized, so I don't have the lists I used readily available; most should be findable on https://archive.fart.website/archivebot/viewer/ or show up there in a few days, though.

@horsemankukka (author)

@Pokechu22 Thank you so much for your service! Looks like I didn't get any notifications about these comments, but I noticed your efforts now. Amazing job; regardless of how much was saved, it was very likely that most former URLs and usernames just didn't exist anymore anyway. And interesting to see that there was a grab in 2015 too. I'll have to dig into this treasure trove at some point…

What I forgot completely was to check the Finnish National Library's Web Archive CDX server. (Their captures are not available on the public web due to copyright law, but the CDX server can be queried publicly.) Maybe I should still do it to compare what was saved. I expect them to have less, though.
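Such a comparison mostly comes down to running the same prefix query against each CDX server and diffing the sets of original URLs. The sketch uses the Wayback Machine CDX API parameters (url, matchType, output, fl, collapse); whether the National Library's CDX server accepts the same parameters is an assumption, and `NATLIB_CDX` is a placeholder, not its real address. Only URL construction and set comparison are shown; no requests are made.

```python
from urllib.parse import urlencode

WAYBACK_CDX = "https://web.archive.org/cdx/search/cdx"
# Placeholder only: the National Library's CDX endpoint is not given here.
NATLIB_CDX = "https://cdx.example.invalid/cdx"


def cdx_query_url(base, url_prefix):
    """Build a CDX prefix query returning one original URL per unique urlkey."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",
        "output": "json",
        "fl": "original",
        "collapse": "urlkey",
    }
    return f"{base}?{urlencode(params)}"


def compare_captures(ours, theirs):
    """Return (only in ours, only in theirs, in both) as sorted lists."""
    a, b = set(ours), set(theirs)
    return sorted(a - b), sorted(b - a), sorted(a & b)
```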
