@aphyr
Created October 13, 2015 18:42
Spideroak
Aphyr
Oct 11, 9:12 PM
Hi there!
First time user here--I bought a 5TB plan for the year, checked my homedir in
"Backup", and hit save. It did a bunch of disk IO for a few minutes, then just
sat at 100% CPU for the last 8 hours. The "Scan Now" spinner is rotating and it
says "Scanning folders for new backup items". The Activity tab is empty and the
Actions log says "Application: save backup selection" is finalizing. I killed
the process and restarted it, but it seems stuck in the same way. Any idea what
I have to kick to get it working again?
--Kyle
A few more details: this is on an updated Debian Jessie box, spideroakone
1:6.0.1 amd64, and the device name is "waterhouse". Attached a couple
screenshots too. :)
Thanks for your time!
--Kyle
Hi Kyle,
This is Dana from SpiderOak Support.
I'm sorry that you are experiencing backup issues. I must point out that we do not recommend backing up the entire home directory. It is best to only select the folders that are under it. This is because the Home directory contains application data that should not be uploaded with SpiderOak. We recommend only backing up file types such as documents, pictures, videos, music, etc. Please try selecting individual folders under your Home directory instead and see what happens.
Best,
Dana
Yeah but, like, the whole point of me buying backup software is to take backups
of, well, all my data. Dotfiles, logs, git repos, databases, VMs, experiment
datafiles... I've got a lot of stuff here besides just photos and documents. If
you can't handle a 2TB homedir, why do you even offer a 5 TB plan? :-/
--Kyle
SpiderOak is not designed to back up entire home directories. Backing up dot files and logs should be ok, but it is not recommended to back up the other file types you have mentioned and there is nothing more we can do about that.
GIT repos are not recommended because there are special tools to back them up and we're not one of them.
No database should be backed up live. You have to run a database-specific tool to make a backup, then point SpiderOak to that.
VMs are way too big to back up and they are constantly changing since they are live. SpiderOak does not handle constantly changing files well and this can cause rapid space consumption on your account.
Best,
Dana
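(In practice, the dump-then-back-up workflow described above comes down to scheduling a database dump into a directory the backup client already watches. A minimal sketch with pg_dump; the database name, path, and schedule are placeholders for illustration, not anything from this thread:)
# Placeholder cron entry: dump the "app" database nightly into the backup
# selection, so the client uploads a static dump file rather than live data.
# (% is escaped as \% because cron treats a bare % specially.)
0 3 * * * pg_dump --format=custom --file=/home/aphyr/backups/app-$(date +\%F).dump app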
Yes, I know if you back up a file while it's being written it might be corrupt.
I'm a database engineer. I accept this. That's why I have mirrored disks, FS
snapshots, and onsite backups. I just wanted Spideroak for an offsite replica in
case of earthquake or fire.
I'm also not really concerned about having to recover the DBs--they're
infrequently changing and relatively small. Ditto for the VMs: they're only
a few GB a piece and almost never change. How do I know how much space my
changes take? Because I keep ZFS snapshots going back *years*. The deltas just
aren't that big. Look at the monthlies:
gravitas/home@2015-03-01_03.30.01--365d 1.18G - 1.61T -
gravitas/home@2015-04-01_03.30.01--365d 6.62G - 1.63T -
gravitas/home@2015-05-01_03.30.01--365d 6.13G - 1.64T -
gravitas/home@2015-06-01_03.30.01--365d 7.36G - 1.63T -
gravitas/home@2015-07-01_03.30.01--365d 23.1G - 1.58T -
gravitas/home@2015-08-01_03.30.01--365d 1.12G - 1.64T -
gravitas/home@2015-09-01_03.30.01--365d 861M - 1.65T -
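(A listing like the one above is what zfs list prints for snapshots; a minimal sketch of the command, assuming the gravitas/home dataset shown. The columns are NAME, USED, AVAIL, REFER, MOUNTPOINT, where USED is the delta unique to each snapshot and REFER is the total data it references; AVAIL and MOUNTPOINT are always "-" for snapshots.)
# List snapshots under the home dataset, showing per-snapshot deltas.
zfs list -t snapshot -r gravitas/home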
Rsyncing the whole 2TB to my backup server takes like, 10 minutes a night. A
fresh rsync takes a few hours. I assure you, this is not a pathological dataset.
There's no excuse for just chewing through 100% CPU for days on end without even
generating a file list or size estimate when I can traverse the entire directory
in *under 3 seconds*.
aphyr@waterhouse ~ [1]> time du -sh /home/aphyr
1.7T /home/aphyr
0.24user 2.32system 0:02.58elapsed 99%CPU (0avgtext+0avgdata 4400maxresident)k
0inputs+0outputs (0major+6542minor)pagefaults 0swaps
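(The nightly rsync mentioned above is roughly of this shape; the destination host and path are placeholders, assuming ssh access to the backup machine:)
# Incremental mirror of the homedir; only changed files cross the wire,
# which is why nightly runs finish in minutes.
rsync -aHAX --delete /home/aphyr/ backuphost:/backup/aphyr/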
Tell you what: how about you just refund my plan and close my account, and I'll
find some other service. :-/
--Kyle
p.s. 1978 called--they said they've invented something called a "symbolic link"
and it might be cool if you could support it someday

ghost commented Aug 22, 2017

Is this part of GitHub for SpiderOak? It is hard to tell. Anyhow, I can see that, indeed, 'SpiderOak is not designed to back up entire home directories.' For, indeed, it stalls the computer when trying to do so. It would have been nice to know this before installing and configuring - and then, given this situation, _un_installing - the software.

@YSEHaMATY

It's worth noting that when you're dealing with a network connection of limited speed, it's not possible for any backup process (short of totally parallel servers, which is something different) to indiscriminately back up everything in a live OS while keeping up with all the rapid changes that happen to any temp files. There has to be some level of compromise. Such software might be able to try to watch and take periodic snapshots of files that are in use, if they temporarily become available, but when a program creates or opens a file, it can also specify whether or not that file will be accessible to other processes (e.g. the sharing permissions), whether for reading or writing. Even if this restriction is evaded through a low-level system hook, the resulting backup of that file will easily be corrupt. At the very least, the system would have to completely suspend the app and whatever else it relies upon to even begin to try to avoid logical corruption.

Also, if you aren't selective with what is backed up, it will simply eat up your network bandwidth and space with needless redundant transfers of the same temp files, over and over. If your backup uses versioning, it will have a crazy number of different versions for such temp files, thus eating up your disk quota.


aphyr commented Jan 12, 2021

> It's worth noting that when you're dealing with a network connection of limited speed, it's not possible for any backup process (short of totally parallel servers, which is something different) to indiscriminately back up everything in a live OS while keeping up with all the rapid changes that happen to any temp files.

As I mentioned earlier, I'm a database engineer and I'm familiar with these issues. As I also mentioned earlier, I use ZFS snapshots. As I also mentioned earlier, the deltas involved just aren't that big.

I switched to duplicacy, which handles my entire 6TB homedir just fine. I'm pretty sure this was just spideroak's client being slow.
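For anyone landing here later, the duplicacy setup is roughly two commands; the repository id and storage target below are placeholders rather than my actual configuration (see the duplicacy docs for the exact storage URL syntax):

# Initialize the repository once, then run incremental, deduplicated backups.
cd /home/aphyr
duplicacy init waterhouse-home sftp://backup@example.com/backups
duplicacy backup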
