Storing tons of data

"Please leave a message at the beep, we will get back to you when your support contract expires."

undecim
Posts: 289
Joined: Tue Jan 19, 2010 7:09 pm UTC

Storing tons of data

Postby undecim » Mon Oct 15, 2012 4:03 pm UTC

So I'm helping a friend of mine set up some serious file storage.

As of right now, he has a Linux box running 3x 2TB drives in a RAID 5 configuration, giving him 4TB of storage. He does a lot of audio and video work, and would like to archive everything he does so he has it later. He'd also be interested in versioning with git or Mercurial. (I hear the latter is better with binary files, but I haven't worked much with it. Obviously, I'll be reading up on it.)

In about 6 months he has already filled the 4TB, and has started storing on external drives. He's happy to pay for more drives and parts as long as he can have all his files available.

So I'm looking for a solution that is reliable, cheap (cost/capacity), and scalable.

I can handle geo-redundancy and can set him up with some user-friendly user and admin utilities. All I really need is to store all the files.

My initial thoughts are a setup like this.

I can add some PCI cards to the box to get 14 drives. I'll put the computer in some kind of cabinet for easy access when replacing drives, and use software RAID, ZFS, Btrfs, or LVM to allocate 10 of those drives for storage, 2 for parity, 1 as a spare (or more parity), and 1 for system use. With 2TB drives, this gives 20TB of immediately available storage.
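The capacity arithmetic for that layout, as a tiny sketch. It assumes all drives are the same size and counts in decimal TB before filesystem overhead; the helper name is made up for illustration.

```python
def usable_capacity_tb(drive_tb: float, data: int, parity: int, spares: int, system: int) -> dict:
    """Hypothetical helper: split a drive count into roles and total the capacity."""
    total = data + parity + spares + system
    return {
        "total_drives": total,
        "raw_tb": total * drive_tb,
        "usable_tb": data * drive_tb,   # only the data drives hold user files
        "tolerated_failures": parity,   # RAID 6 style: one failed drive per parity drive
    }


layout = usable_capacity_tb(drive_tb=2, data=10, parity=2, spares=1, system=1)
print(layout)
```

With 2TB drives this comes out to 14 drives, 28TB raw and 20TB usable; the formatted filesystem will report somewhat less.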

Next, I'll add a cron job for nightly versioning. This will make a snapshot of the current filesystem with git, hg, or maybe a custom script (Btrfs seems like it would be good for this, but I haven't used it). Data that is no longer part of the most current snapshot will be marked for removal, and can be moved to a tape drive when space is low.
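If Btrfs is the route, the nightly step could be as small as one read-only snapshot per day. This is a sketch only: the paths `/srv/media` and `/srv/.snapshots` and the script name are hypothetical, and it assumes `/srv/media` is a Btrfs subvolume. It uses the real `btrfs subvolume snapshot -r` command and defaults to a dry run unless `--run` is passed.

```python
import subprocess
import sys
from datetime import date
from pathlib import Path

# Hypothetical paths: /srv/media must be a Btrfs subvolume and
# /srv/.snapshots a directory on the same Btrfs filesystem.
SOURCE = Path("/srv/media")
SNAPSHOT_DIR = Path("/srv/.snapshots")


def snapshot_command(source: Path, snapshot_dir: Path, day: date) -> list[str]:
    """Build a read-only Btrfs snapshot command whose name carries the date."""
    target = snapshot_dir / f"media-{day.isoformat()}"
    return ["btrfs", "subvolume", "snapshot", "-r", str(source), str(target)]


def main() -> None:
    cmd = snapshot_command(SOURCE, SNAPSHOT_DIR, date.today())
    if "--run" in sys.argv:
        # Needs root; raises CalledProcessError if btrfs reports a failure.
        subprocess.run(cmd, check=True)
    else:
        print("dry run:", " ".join(cmd))


if __name__ == "__main__":
    main()
```

A crontab line such as `0 3 * * * /usr/bin/python3 /usr/local/bin/nightly_snapshot.py --run` (hypothetical install path) would run it at 3am. Btrfs snapshots are copy-on-write, so each night only costs the space of data that changed since the previous snapshot.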

I have two main concerns with this setup:

1) I don't know how reliable RAID 6 will be at this array size.

2) The cheapest LTO Ultrium 5 tape deck on Newegg is $1,700 (http://www.newegg.com/Product/Product.a ... 6840121080). After around 75TB of storage, tape becomes cheaper byte-for-byte than hard drives, even counting the cost of the deck, so this looks like a good long-term investment.
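The break-even point is just the deck cost divided by how much more a TB of disk costs than a TB of tape. A sketch; only the $1,700 deck price comes from the post, and the per-TB media prices below are hypothetical placeholders to swap for current quotes.

```python
def breakeven_tb(deck_cost: float, tape_per_tb: float, disk_per_tb: float) -> float:
    """Capacity C where deck_cost + tape_per_tb * C equals disk_per_tb * C."""
    if disk_per_tb <= tape_per_tb:
        raise ValueError("no break-even: tape media is not cheaper per TB than disk")
    return deck_cost / (disk_per_tb - tape_per_tb)


# $1,700 deck from the post; the $20/TB tape and $50/TB disk prices are made up.
print(round(breakeven_tb(1700, tape_per_tb=20, disk_per_tb=50), 1))  # 56.7
```

Working backwards, a 75TB break-even implies disk costing about $22.67 per TB more than tape (1700 / 75). Parity drives, spares, and how many tape copies you keep all shift the effective per-TB prices.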

Any thoughts?
Blue, blue, blue

LikwidCirkel
Posts: 169
Joined: Thu Nov 08, 2007 8:56 pm UTC
Location: on this forum (duh)

Re: Storing tons of data

Postby LikwidCirkel » Mon Oct 15, 2012 5:23 pm UTC

I've worked a bunch with raw audio and video, and can totally relate to this.

I'd suggest looking into external 8-bay (or more) eSATA RAID enclosures. I find they're more cost-effective than finding motherboards and PCIe cards to host a whole bunch of drives on the same system. The enclosure will often become obsolete much more slowly than the computer anyway. The effective speed might be a bit slower than with PCIe drive controllers, but it hasn't been an issue for me. I generally run RAID 0+1 rather than RAID 5 or 6, but use what makes sense.
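For an 8-bay box, the trade-off between those levels is mostly usable space versus guaranteed fault tolerance. A rough sketch, assuming equal-size drives; the helper is hypothetical, and RAID 0+1 is lumped in with RAID 10 because both give half the raw capacity and both only guarantee surviving one failure.

```python
def raid_summary(level: str, drives: int, drive_tb: float) -> tuple[float, int]:
    """Return (usable TB, number of drive failures the array is guaranteed to survive)."""
    if level == "raid10":
        # Mirrored pairs: half the raw space. It may survive more failures if they
        # land in different mirrors, but only one is guaranteed. RAID 0+1 (a mirror
        # of stripes) has the same capacity and survives fewer two-drive combinations.
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        return drives // 2 * drive_tb, 1
    if level == "raid5":
        return (drives - 1) * drive_tb, 1   # one drive's worth of parity
    if level == "raid6":
        return (drives - 2) * drive_tb, 2   # two drives' worth of parity
    raise ValueError(f"unknown level: {level}")


for level in ("raid10", "raid5", "raid6"):
    print(level, raid_summary(level, drives=8, drive_tb=2))
```

With eight 2TB drives that is 8TB for RAID 10, 14TB for RAID 5 and 12TB for RAID 6.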

cphite
Posts: 1370
Joined: Wed Mar 30, 2011 5:27 pm UTC

Re: Storing tons of data

Postby cphite » Mon Oct 15, 2012 10:07 pm UTC

You say he wants his files available... define "available" - does he want them to be immediately available, or available via loading a tape and waiting for it to retrieve?

Tape has some very clear advantages for someone who has a LOT of data that they don't want to lose. First, you can add as much storage as your wallet can support. But even more importantly, tapes can be taken offsite: to a friend's house, a storage locker, or some other such location. That way, in the case of a fire, theft, or whatever, he doesn't lose all that data. Given that, $1,700 for a tape drive is a pretty good investment.

The disadvantage of tape is that it's relatively slow. So if by "available" he means he wants his files to just be there, then I would suggest an external array like the one LikwidCirkel mentioned; it's an easy way to add a lot of hard drive capacity.

If money isn't an issue, consider both... let him use the eSATA array as a primary "working" backup, and back that up to tape on a schedule.

undecim
Posts: 289
Joined: Tue Jan 19, 2010 7:09 pm UTC

Re: Storing tons of data

Postby undecim » Mon Oct 15, 2012 10:59 pm UTC

LikwidCirkel wrote: I've worked a bunch with raw audio and video, and can totally relate to this.

I'd suggest looking into external 8-bay (or more) eSATA RAID enclosures. I find they're more cost-effective than finding motherboards and PCIe cards to host a whole bunch of drives on the same system. The enclosure will often become obsolete much slower than the computer anyway. The effective speed might be a bit slower than with PCIe drive controllers, but it hasn't been an issue for me. I generally run RAID 0+1, rather than RAID 5 or 6, but use what makes sense.


I'm worried about hardware RAID solutions. If the bay hardware dies, will I still be able to retrieve the data? I would rather set up any hardware as JBOD and use a software redundancy solution. That way I can just move the disks to another computer if any other hardware dies.

cphite wrote: You say he wants his files available... define "available" - does he want them to be immediately available, or available via loading a tape and waiting for it to retrieve?

Tape has some very clear advantages for someone who has a LOT of data that they don't want to lose. First, you can add as much storage as your wallet can support. But even more importantly, tapes can be taken offsite: to a friend's house, a storage locker, or some other such location. That way, in the case of a fire, theft, or whatever, he doesn't lose all that data. Given that, $1,700 for a tape drive is a pretty good investment.

The disadvantage of tape is that it's relatively slow. So if by "available" he means he wants his files to just be there, then I would suggest something like an external array like LikwidCirkel mentioned; it's an easy way to add a lot of HDD.

If money isn't an issue, consider both... let him use the eSATA array as a primary "working" backup, and back that up to tape on a schedule.


Well, he needs his most recent projects available. I figure 20TB of immediately available storage should give him around 6 months to a year of projects (with versioning), and then the tape can be used for archiving projects older than that.
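A quick runway check using the fill rate from the first post (4TB in about 6 months). The `version_overhead` multiplier is a hypothetical stand-in for however much extra space versioning ends up costing; nothing in the thread pins it down.

```python
def months_of_runway(free_tb: float, filled_tb: float, months_taken: float,
                     version_overhead: float = 1.0) -> float:
    """Months until free_tb is used up at the observed fill rate,
    scaled by a hypothetical versioning overhead factor."""
    # free / (filled / months * overhead), rearranged to keep the arithmetic exact
    return free_tb * months_taken / (filled_tb * version_overhead)


print(months_of_runway(20, 4, 6))                        # 30.0
print(months_of_runway(20, 4, 6, version_overhead=3.0))  # 10.0
```

At the observed rate, 20TB lasts about 30 months with no versioning overhead, so a 6 to 12 month estimate assumes versioning or growth multiplies the rate by roughly 2.5x to 5x.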

arbyd
Posts: 24
Joined: Thu Feb 11, 2010 4:33 pm UTC

Re: Storing tons of data

Postby arbyd » Tue Oct 16, 2012 6:11 pm UTC

undecim wrote: 1) I don't know how reliable RAID 6 will be at this array size.

Current practice is to avoid striped RAID with large drives, especially consumer-grade drives. Capacities are so large that you're very likely to hit another read error while rebuilding a failed array. Just google "raid 5 large drives" for any number of articles about the problem. In particular, see http://storagemojo.com/2010/02/27/does-raid-6-stops-working-in-2019/, and the second paragraph of http://queue.acm.org/detail.cfm?id=1670144. I'd stick with RAID 1 or 10.
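The usual back-of-envelope behind this argument: a rebuild has to read every surviving drive, and each bit read has some small chance of an unrecoverable read error (URE). The sketch assumes errors are independent and uses 1 error per 10^14 bits, a figure commonly quoted on consumer drive spec sheets; real drives may do better or worse.

```python
import math

TB = 1e12  # decimal terabyte, as drive vendors count


def p_read_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading bytes_read,
    assuming independent errors at a constant per-bit rate."""
    bits = bytes_read * 8
    # 1 - (1 - r)^bits, computed with log1p/expm1 so the tiny rate doesn't round away
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# Rebuilding the current 3x2TB RAID 5 reads both surviving drives (4TB).
print(f"{p_read_error(2 * 2 * TB):.1%}")   # 27.4%
# Rebuilding one drive of a 12-drive array of 2TB disks reads the other 11.
print(f"{p_read_error(11 * 2 * TB):.1%}")  # 82.8%
```

With RAID 6, a single read error during a one-drive rebuild can still be corrected from the second parity, which is why the worry is sharpest for RAID 5.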

LikwidCirkel
Posts: 169
Joined: Thu Nov 08, 2007 8:56 pm UTC
Location: on this forum (duh)

Re: Storing tons of data

Postby LikwidCirkel » Tue Oct 16, 2012 6:41 pm UTC

undecim wrote: I'm worried about hardware RAID solutions. If the bay hardware dies, will I still be able to retrieve the data? I would rather set up any hardware in as JBOD and use a software redundancy solution. That way I can just go to another computer with the disks if any other hardware dies.


This is a legitimate concern, but pretty much any RAID box can be used as JBOD if you'd like. If you get a device with a relatively common RAID controller, it's less of a concern. Hardware failures like this are much less likely than drive failures, but not impossible. I just use software RAID (mdadm), so I don't have this problem, but you can't generally do that nicely with an external RAID box.
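The mdadm route also addresses the portability worry: md stores its metadata on the member disks, so `mdadm --assemble --scan` can bring the array up on another Linux machine. A sketch that only builds (never runs) a creation command for the 10 data + 2 parity + 1 spare plan; the device names `/dev/sdb` through `/dev/sdn` are hypothetical, so check yours before running anything like this, because `--create` overwrites the member disks.

```python
import shlex


def mdadm_create(md_dev: str, level: int, members: list[str], spares: list[str]) -> list[str]:
    """Build an `mdadm --create` command line; it is only printed, never executed."""
    return ["mdadm", "--create", md_dev, f"--level={level}",
            f"--raid-devices={len(members)}", f"--spare-devices={len(spares)}",
            *members, *spares]


# Hypothetical device names: 12 active members (10 data + 2 parity) and 1 hot spare.
members = [f"/dev/sd{c}" for c in "bcdefghijklm"]
spares = ["/dev/sdn"]
print(shlex.join(mdadm_create("/dev/md0", 6, members, spares)))
```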

I do like the tape idea, but only if you're OK with treating it as an archive that isn't instantly available, of course. If it were my setup, I'd probably use a RAID box as the primary storage and back up to tape daily/weekly, and also dump stuff to tape for archival purposes when I don't think I'll need the data but had better save it just in case.

I also echo the comments about RAID 5/6. I don't use those because I'm too paranoid - it's always RAID 1 or 10 for me.

undecim
Posts: 289
Joined: Tue Jan 19, 2010 7:09 pm UTC

Re: Storing tons of data

Postby undecim » Wed Oct 17, 2012 6:19 pm UTC

arbyd wrote: Current practice is to avoid striped RAID with large drives


Well, what about non-RAID redundancy solutions like LVM or special filesystems?

wumpus
Posts: 546
Joined: Thu Feb 21, 2008 12:16 am UTC

Re: Storing tons of data

Postby wumpus » Thu Jan 03, 2013 6:51 pm UTC

arbyd wrote:
undecim wrote: 1) I don't know how reliable RAID 6 will be at this array size.

Current practice is to avoid striped RAID with large drives, especially consumer-grade drives. Capacities are so large that you're very likely to hit another read error while rebuilding a failed array. Just google "raid 5 large drives" for any number of articles about the problem. In particular, see http://storagemojo.com/2010/02/27/does-raid-6-stops-working-in-2019/, and the second paragraph of http://queue.acm.org/detail.cfm?id=1670144. I'd stick with RAID 1 or 10.


I fail to see how RAID 1 or 10 could possibly help you (with this issue). While you might get a read error with RAID 5, you are still roughly 1/N as likely (for N drives) to get the same error with RAID 1 or 10. Going to RAID 6 should fix all of these, but I haven't seen a good enough explanation of how it works (I'm assuming some sort of Reed-Solomon implementation).
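The Reed-Solomon guess is close to how Linux md does it (see H. Peter Anvin's write-up "The mathematics of RAID-6"): P is plain XOR parity, and Q weights data disk i by g^i in the finite field GF(2^8), with g = 2 and reducing polynomial 0x11D. A byte-at-a-time sketch of the math only; the real kernel works on whole blocks with table-driven or SIMD code, and these helper names are illustrative.

```python
# Arithmetic in GF(2^8) with the reducing polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a       # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D        # reduce modulo the field polynomial
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    # Valid for nonzero a, whose multiplicative order divides 255.
    result = 1
    for _ in range(n % 255):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    return gf_pow(a, 254)     # a^254 * a = a^255 = 1 for nonzero a


def syndromes(data: list[int]) -> tuple[int, int]:
    """P is plain XOR parity; Q weights data byte i by g^i with g = 2."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data: list[int], x: int, y: int, p: int, q: int) -> tuple[int, int]:
    """Rebuild the bytes at indices x != y; the values stored there are ignored."""
    survivors = [0 if i in (x, y) else d for i, d in enumerate(data)]
    p_s, q_s = syndromes(survivors)
    pd = p ^ p_s              # = D_x + D_y
    qd = q ^ q_s              # = g^x D_x + g^y D_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    # Eliminate D_y: qd + g^y * pd = (g^x + g^y) * D_x
    dx = gf_mul(qd ^ gf_mul(gy, pd), gf_inv(gx ^ gy))
    return dx, pd ^ dx


data = [0x12, 0x34, 0x56, 0x78]   # one byte from each of four data disks
p, q = syndromes(data)
print(recover_two(data, 1, 3, p, q) == (0x34, 0x78))  # True
```

Because P and Q give two independent equations, any two lost data disks can be solved for, which is what lets RAID 6 absorb a read error in the middle of a single-drive rebuild.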

