
I use Arq[1] on my macs and it does the same thing but handles all the "complicated" bits that I'd rather be programmatically dealt with. I have 6 macs, about 2TB of backups in Glacier and a ~20GB "system image" backup in hot-storage on S3.

[1] http://www.haystacksoftware.com/arq/



I also use Arq and recommend it to everyone on OS X. It has a sane, transparent format (modelled on git), it can store to Glacier, and it doesn't periodically explode like Backblaze used to.

Interested to hear more about your "system image". How do you do it and what do you store in it? Documents / preferences / Library etc? How would you use it in a recovery situation?


It's a separate backup vault in Arq that holds a clean OS install plus my essential documents, dotfiles, and an encrypted PII volume with scans of passport/SSC/driver's license/birth certificate. It's all on a 60GB SSD that's booted every week to update to the latest versions of everything and then backed up with Arq.

Restoration is downloading the vault to a new SSD and shoving it in a computer. The "root" user is included in the backup as an encrypted volume. My setup's a little strange in that the OS disks on my computers are filevaulted and each user's homedir is also in an encrypted volume that only they have the password to, mounted at the correct /Users/ folder with fstab.

(It'd probably be helpful to note that this is done on a hackintosh with 10 internal bays, one of which holds this SSD, with a clone of it kept elsewhere.)


Could you write that up in a blog post please? I'd love to read how that works. (I'm the author of Arq)


I just needed to upvote you for creating Arq. For a developer it's easy to guess at what it's going to do and it doesn't ever seem to do anything it shouldn't.

There is one question I've had about it. I keep everything in Glacier (about 700GB, I think). My understanding is that nothing is deleted from Glacier. If I deleted a whole backup set and then re-added it, would it upload the content again?

Also, I'm keeping an eye on http://www.filosync.com and I'm planning on trying it out soon. We currently run aerofs which is mostly really good - but sometimes we get into a situation (of our own making, we run it on an aws spot instance that can vanish) and sorting it out is hard work (and time consuming). The most frustrating aspect is that we don't know how it works, so we don't know what to expect when we perform certain operations. Sometimes it doesn't work how we expect and that bites us (I'm currently waiting for 12GB to sync - even though all the machines already have all the content).

Given my (really positive) experiences with Arq I'd like to try a product that was a bit more open about how it worked.


I could probably whack something together, but what specifically were you wondering about?


I'm wondering about the whole process. It sounds like an external drive that you periodically copy your documents, dot files etc to, and then let Arq back up? Or is it something else? (You talk about "booting" the disk -- do you mean booting the computer from that disk, or just mounting it?) Also I didn't understand the "root" user part. And is this a multi-user computer? You mentioned something about "each user".


Ah, I suppose I should break down my setup.

My main system: Hackintosh -- core i5 / 32GB RAM / GTX680, 2 x 250GB SSD, 2x 60GB SSD, 6 x 3TB 7.2K drive. (The handy thing here is my hackintosh build runs the vanilla kernel and I patch all my kexts at runtime so all the binaries are stock -- thus the backups are portable). This is my workstation and it's primarily a single-user system but there is the occasional remote user login. I use this same setup on all my macs though, and this is how it works:

   1 60GB SSD for the OS, applications, and caches
   X number of SSDs (usually 256GB), one for each user
   
The 60GB OS SSD is filevaulted with a password known to all users, so the disk is FDE'd while any of us can boot it. Each user SSD is filevault encrypted with their personal key. Each user is assigned a bash alias to mount their drive (So you can paste keys if you're using pubkeys with secure entry):

       alias maroch='diskutil cs unlockVolume <logicalvolumeid>' ## mount aroch's home directory
And there's the associated fstab entry

    /Volumes/aroch /Users/aroch hfs rw,bind 0 0
The "root" account is an admin account set up much the same way, but it is just an encrypted DMG and is mounted at boot using launchd and a hardware dongle as the password. If no dongle is present, there's a non-admin account with a real homedir that can also be used to elevate into an admin shell.

The 60GB backup SSD (in slot #4): An install of (now) OS X Mavericks that includes my standard set of homebrew installs, Xcode and the ~20 applications I use daily + Arq. It also has a backup of my fstab and the root account DMG. There's a daily cron that copies my personal dotfiles to it as well. There's a weekly cron that reboots the workstation into the backup SSD, and a startup script on the backup SSD that checks for system/MAS updates, applies them, runs an Arq backup (takes ~10 mins usually) and then reboots back into the main OS SSD. The files total about 17GB right now but the delta is usually less than 100MB.

If needed, the OS and settings can be restored with Arq by using the backup SSD's vault


Sounds like an awesome setup. So in theory you could restore your backup system drive from the workstation to a laptop, since you are running the vanilla kernel? How would one do that in practice? I was thinking about a way to sync my workstation and laptop at a system level at night/in the morning. Would something like this work for that? E.g. take the nightly system drive image of the workstation and clone it to the laptop in the morning?


Yes, in fact I have restored from my workstation to my Mac mini and MBP in the past. Cloning across two drives daily is probably not worth it. You'd be better off using rsync / rdiff over your local network.


ok sounds reasonable, but can you also sync system files (like installed homebrew etc) that way ?


Yes, you can sync /opt/ and make sure to have some key excludes in there (/var/ and /etc/ being the most likely)


Wow! That is an impressive setup!


Is there something like Arq compatible with both OSX and Windows ?

Features I'm interested in:
- must use S3 or similar, as long as it's private and "unlimited"
- must support multiple clients, even though I don't need real time sync
- should encrypt on the client without need of a separate sw

I've been using Dropbox + CloudFogger so far, but dropbox doesn't scale anymore with the quantity of data I have, mostly due to the number of files (and I don't like depending on two different sw for one task).


The only two I have used are git-annex [1] and duplicati [2]. I much prefer git-annex but the windows version isn't so stable.

[1] http://git-annex.branchable.com/

[2] http://www.duplicati.com/


Another happy Arq user here -- I just use it for Glacier backup, no need for hot S3 storage.


How do you like glacier? It seems it's fairly cheap until you want your data back. Any insights?


Personally I've factored that in to my cost of recovery. It's going to cost me about £120 to restore my data over the course of a week. It depends on your use case but for me - in the event that glacier is the last point I can get my data from, £120 is a small price to pay to get it back. I used backblaze before and it was about the same to get them to post out a hdd (which was an awesome service on the one occasion I had to use it).


In addition, Glacier should be thought of as "offline" storage, like a tape rotation scheme. When thought of that way, the cost of restoration is a non-issue.

It's not for "I may need this file next week" (that's what your local NAS / Time Machine is for). Glacier is for "what happens if this place burns down". At that point a week and £120 is not high on the list of your concerns...


It's too bad Amazon doesn't offer a discounted rate to restore from Glacier if you pay upfront.


Sorry to chase you between threads, but yes, I'd like to get in touch for the Getty files! Care to share contact info?


m8r-fhf8r1@mailinator.com (apologies, would prefer not to share my public email address here)

I'll reply back from my personal email account.


I like it quite a bit. Arq is my "backup of last resort" and what I use to provision new Macs if they're not with me. I have daily/weekly/monthly local backups and semi-local, bi-weekly backups (stored offsite or in a building with a different Mac). Setting up a new Mac remotely costs about $30 for the 60GB provisioning profile to be downloaded and restored (maybe more or less depending on the internet connection at the place). I pay a few hundred yearly for backups that I have no doubt will be there if I need them.


Arq is simple and works great, although it seems that half of my S3 bill is for the thousands of requests that Arq makes back and forth. A simpler upload & forget backup model could probably save a few bucks...


I make, on average, 40K requests a month which is like $2 while my storage costs are ~$22. At the low end you'll be paying what seems like a lot for requests but as things scale your requests costs become a much smaller percentage.

You can cut down on requests by moving to daily backups (what I do) and narrowing down what you're backing up. Backing up your applications folders will take a ton of requests (some apps have upwards of 1000 assets whose bless values may change if you open the application, and this triggers Arq).


Has anyone been able to accurately figure out how much Glacier will cost once data is retrieved from it?


Above the free retrieval quota (5% of your storage per month), pricing is based entirely on your peak hourly retrieval rate for the month, billed at $7.20 x [peak rate measured in GB/hr].

So for example, if you retrieve 100 GB in one hour, that's $720. If you retrieve 10 TB at the rate of 100 GB/hr for 100 hours, that's also $720.

Translated into bandwidth (assuming you're pulling at this rate for at least an hour), it's about $3 per Mbps. So if you limit your retrieval to 10 Mbps, you'll pay $30, regardless of how much data you retrieve. This makes it fairly easy to cap your expenditure, if you have a client that can stagger its requests appropriately (you need to throttle the Glacier retrieval requests, possibly using Glacier's range requests if you have very large files, not throttle at the network level).
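
To make the arithmetic above easy to check, here is a rough Python sketch. It assumes the $7.20 per GB/hr peak-rate figure quoted in this comment (which may not match what AWS charges today), ignores the free 5% retrieval quota, and uses decimal units (1 GB = 1000 MB). The function names are made up for illustration.

```python
# Peak-rate price quoted in the comment above. Treat it as an assumption,
# not as current AWS pricing.
PRICE_PER_PEAK_GB_PER_HOUR = 7.20


def retrieval_cost(peak_gb_per_hour):
    """Monthly retrieval charge, driven only by the peak hourly rate."""
    return PRICE_PER_PEAK_GB_PER_HOUR * peak_gb_per_hour


def mbps_to_gb_per_hour(mbps):
    """Convert a sustained rate in megabits/s to gigabytes/hour (decimal units)."""
    megabytes_per_second = mbps / 8
    return megabytes_per_second * 3600 / 1000


def hours_to_retrieve(total_gb, peak_gb_per_hour):
    """How long a retrieval takes when held at a constant hourly rate."""
    return total_gb / peak_gb_per_hour


if __name__ == "__main__":
    # 100 GB pulled within one hour
    print(f"${retrieval_cost(100):.2f}")
    # 10 TB staggered at 100 GB/hr: same peak, so same charge, over 100 hours
    print(f"${retrieval_cost(100):.2f} over {hours_to_retrieve(10_000, 100):.0f} hours")
    # Capping retrieval at roughly 10 Mbps
    print(f"${retrieval_cost(mbps_to_gb_per_hour(10)):.2f}")
```

Under these assumptions 10 Mbps works out to 4.5 GB/hr, or about $32, which is where the "about $3 per Mbps" rule of thumb comes from.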


Here's my last bill:

    $0.010 per GB / month - Storage 	2112.146 GB-Mo 	21.12
    $0.050 per 1,000 Requests 	40,447 Requests 	2.02

My last "restoration" was actually setting up a new iMac at a remote site. It involved downloading ~60GB over ~20 hours and cost me about $30.
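
For anyone who wants to plug in their own numbers, the two bill lines above can be reproduced with a small Python sketch. The rates are the ones shown on this bill and are only defaults here, and `monthly_bill` is a hypothetical helper, not anything Arq or AWS provides.

```python
def monthly_bill(storage_gb_months, requests,
                 storage_rate=0.010, rate_per_1000_requests=0.050):
    """Return (storage_cost, request_cost) in dollars, rounded to cents.

    Default rates are copied from the bill above and are assumptions
    for any other account or time period.
    """
    storage_cost = storage_gb_months * storage_rate
    request_cost = requests / 1000 * rate_per_1000_requests
    return round(storage_cost, 2), round(request_cost, 2)


if __name__ == "__main__":
    storage_cost, request_cost = monthly_bill(2112.146, 40_447)
    total = storage_cost + request_cost
    print(f"storage  ${storage_cost:.2f}")
    print(f"requests ${request_cost:.2f}")
    print(f"requests are {request_cost / total:.0%} of the total")
```

With these inputs the request charge is a bit under a tenth of the total, and that share shrinks further as storage grows faster than request count.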


Do you recommend Glacier over Crashplan? I'm thinking about switching


They are very different.

Glacier takes hours just to get the listing of files, or start the file download.

Their pricing is very different. Crashplan charges per computer. Glacier charges per GB of storage and per GB of transfer out.


In what cases would it be better to use Glacier? I don't have too many GB, ~30GB.


Glacier is for when you want to store essentially never-changing backups that you expect to very, very rarely access. You can store a couple GB or petabytes.

Think an archive of family videos and photos spanning 30 years, or all the tax documents for a large enterprise.


A 30GB backup, that's updated nightly, on AWS Glacier will cost somewhere in the neighborhood of $1 a month. A 300GB backup somewhere around $5 on AWS.


Glacier is cheaper for small backups that you almost never touch.



