1737: "Datacenter Scale"

This forum is for the individual discussion thread that goes with each new comic.

Moderators: Moderators General, Prelates, Magistrates

User avatar
thunk
Posts: 466
Joined: Sat Apr 23, 2016 3:29 am UTC
Location: Arguably Exiled

1737: "Datacenter Scale"

Postby thunk » Fri Sep 23, 2016 5:05 am UTC

[comic image]

alt-text: "Asimov's Cosmic AC was created by linking all datacenters through hyperspace, which explains a lot. It didn't reverse entropy--it just discarded the universe when it reached end-of-life and ordered a new one."

News at 11: Randall vindicates the Vogons.
Free markets, free movement, free plops
Blitz on, my friends Quantized, GnomeAnne, and iskinner!
troo dat

User avatar
Pfhorrest
Posts: 5008
Joined: Fri Oct 30, 2009 6:11 am UTC
Contact:

Re: 1737: "Datacenter Scale"

Postby Pfhorrest » Fri Sep 23, 2016 5:12 am UTC

I'm assuming that this is hyperbole taking off from some kind of actual practice, but I'm a bit stumped to imagine what that actual practice might be.

In the scenario in the first panel: OK, so maybe RAID is pointless because of higher-level redundancy between different machines, but if it comes down to buying a new machine, or buying a new HD, how is buying the new machine actually more efficient? Not having redundancy within one machine, sure, but still just replacing the broken part to get a functional machine (which is part of a higher-level, inter-machine redundancy), instead of replacing the whole machine, seems like it would always be more efficient.

The further panels just look like larger-scale versions of that same problem, and I can't think of a smaller-scale version, much less why a smaller-scale one wouldn't be equally absurd, so this seems like absurdist hyperbole exaggerating... something that's already perfectly absurd to begin with, so what does the hyperbole add?
Forrest Cameranesi, Geek of All Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
The Codex Quaerendae (my philosophy) - The Chronicles of Quelouva (my fiction)

User avatar
rhomboidal
Posts: 791
Joined: Wed Jun 15, 2011 5:25 pm UTC
Contact:

Re: 1737: "Datacenter Scale"

Postby rhomboidal » Fri Sep 23, 2016 5:29 am UTC

At this point, datacenters should just be shaped like giant thumb drives.

FreeRoy
Posts: 2
Joined: Fri Sep 16, 2016 3:34 pm UTC

Re: 1737: "Datacenter Scale"

Postby FreeRoy » Fri Sep 23, 2016 6:13 am UTC

Thanks for the Asimov reference, Randall.
"And Cosmic AC said, 'Let there be Light!'

User avatar
Znirk
Posts: 174
Joined: Mon Jul 01, 2013 9:47 am UTC
Location: ZZ9 plural Z α

Re: 1737: "Datacenter Scale"

Postby Znirk » Fri Sep 23, 2016 7:39 am UTC

Pfhorrest wrote:just replacing the broken part to get a functional machine (which is part of a higher-level, inter-machine redundancy), instead of replacing the whole machine, seems like it would always be more efficient.

Depending on what it costs you to have someone open the old box, diagnose the exact problem, get the relevant spare part, and install it (and in some cases still have to replace the whole box), it may make economic sense. Not the same thing as actual sense, obviously, but still.

User avatar
cellocgw
Posts: 1956
Joined: Sat Jun 21, 2008 7:40 pm UTC

Re: 1737: "Datacenter Scale"

Postby cellocgw » Fri Sep 23, 2016 11:19 am UTC

Pfhorrest wrote:I'm assuming that this is hyperbole taking off from some kind of actual practice, but I'm a bit stumped to imagine what that actual practice might be.


Years ago, Google realized the cheapest and fastest way to run their database search engine was to buy the cheapest PCs they could find and hire kids to walk up and down the aisles, swapping in new ones for the ones that burned out. (There was a ton of cool software that ran virtual clusters & whatnot so that dead machines didn't cause either data loss or query delays, too)
https://app.box.com/witthoftresume
Former OTTer
Vote cellocgw for President 2020. #ScienceintheWhiteHouse http://cellocgw.wordpress.com
"The Planck length is 3.81779e-33 picas." -- keithl
" Earth weighs almost exactly π milliJupiters" -- what-if #146, note 7

Tub
Posts: 410
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: 1737: "Datacenter Scale"

Postby Tub » Fri Sep 23, 2016 11:32 am UTC

Znirk wrote:Depending on what it costs you to have someone open the old box, diagnose the exact problem, get the relevant spare part, and install it (and in some cases still have to replace the whole box), it may make economic sense.

It's obviously cheaper to have someone get a new box, get all relevant spare parts, install them, and then run diagnostics to make sure none of the fresh parts was defective.

The only way I can imagine this making sense is as part of a general replacement strategy. If you want to continually modernize your datacenter by throwing away the X oldest servers and buying Y new current-generation servers each year (thus increasing performance, efficiency and density), then there's a simple calculation saying that any machine older than n months does not get repaired, but is immediately replaced with a better server.
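Roughly, the "older than n months" cutoff falls out of a one-line comparison like this (a back-of-the-envelope sketch; the server price, planned life and repair cost are all made-up numbers):

Code: Select all
# Repair-vs-replace threshold, all numbers invented.
SERVER_PRICE = 3000.0         # price of a new server, USD (assumed)
PLANNED_LIFE_MONTHS = 48      # scheduled retirement age (assumed)
REPAIR_COST = 400.0           # parts plus technician time for a typical repair (assumed)

def worth_repairing(age_months: float) -> bool:
    remaining = max(PLANNED_LIFE_MONTHS - age_months, 0)
    # Value of the months the box has left, pro-rated against the price of a new server.
    remaining_value = SERVER_PRICE * remaining / PLANNED_LIFE_MONTHS
    return REPAIR_COST < remaining_value

# Break-even age n: repair anything younger, replace anything older.
n = PLANNED_LIFE_MONTHS * (1 - REPAIR_COST / SERVER_PRICE)
print(f"repair threshold: {n:.0f} months")        # ~42 months with these numbers
print(worth_repairing(24), worth_repairing(46))   # True False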

I'm really curious how it's actually handled in the real world.

teelo
Posts: 761
Joined: Thu Apr 08, 2010 11:50 pm UTC

Re: 1737: "Datacenter Scale"

Postby teelo » Fri Sep 23, 2016 12:04 pm UTC

I have a whole solar system full of spare Death Stars. They're like companion cubes from Portal.

synp
Posts: 43
Joined: Mon Feb 02, 2009 7:43 am UTC

Re: 1737: "Datacenter Scale"

Postby synp » Fri Sep 23, 2016 12:09 pm UTC

Pfhorrest wrote:I'm assuming that this is hyperbole taking off from some kind of actual practice, but I'm a bit stumped to imagine what that actual practice might be.

In the scenario in the first panel: OK, so maybe RAID is pointless because of higher-level redundancy between different machines, but if it comes down to buying a new machine, or buying a new HD, how is buying the new machine actually more efficient?

For operating the data center, it makes more sense to replace the server with a "new" one and go on as before. The new server synchronizes with the necessary data in minutes and everything is up and running.

That does not mean that the defective server gets "thrown away". It might make sense for a technician to look at the defective servers, and if the repair costs less than some threshold and the rest of the hardware is new enough (so that we don't expect something else to break within a month or two), they can make the repair and return the server to the queue of "new" servers.

I don't personally know if Google or Amazon actually do that.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26546
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: 1737: "Datacenter Scale"

Postby gmalivuk » Fri Sep 23, 2016 12:37 pm UTC

Tub wrote:
Znirk wrote:Depending on what it costs you to have someone open the old box, diagnose the exact problem, get the relevant spare part, and install it (and in some cases still have to replace the whole box), it may make economic sense.

It's obviously cheaper to have someone get a new box, get all relevant spare parts, install them, and then run diagnostics to make sure none of the fresh parts was defective.

That's not obvious to me. How much does it cost to have reduced capacity while all that diagnosis is happening? How much does it cost to pay the technician to do it?

At the very least, synp's suggestion that the defective server is immediately replaced and *then* diagnosed makes a lot of sense.
Unless stated otherwise, I do not care whether a statement, by itself, constitutes a persuasive political argument. I care whether it's true.
---
If this post has math that doesn't work for you, use TeX the World for Firefox or Chrome

(he/him/his)

User avatar
Flumble
Yes Man
Posts: 2082
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: 1737: "Datacenter Scale"

Postby Flumble » Fri Sep 23, 2016 12:39 pm UTC

Does anyone know how the maintenance of "data center containers" is performed?
Do you just buy one, plug it in and then send it back to the supplier once half of the boards/disks are failing?

User avatar
Jaruzel
Posts: 8
Joined: Fri Nov 30, 2012 7:16 am UTC
Location: London, UK

Re: 1737: "Datacenter Scale"

Postby Jaruzel » Fri Sep 23, 2016 12:58 pm UTC

For some reason, this one reminded me of Monty Python's 'Four Yorkshiremen' sketch. :D

For those who haven't seen it: https://www.youtube.com/watch?v=wgsdSOhW1mU

-Jar

User avatar
jc
Posts: 349
Joined: Fri May 04, 2007 5:48 pm UTC
Location: Waltham, Massachusetts, USA, Earth, Solar System, Milky Way Galaxy
Contact:

Re: 1737: "Datacenter Scale"

Postby jc » Fri Sep 23, 2016 1:08 pm UTC

Pfhorrest wrote:I'm assuming that this is hyperbole taking off from some kind of actual practice, but I'm a bit stumped to imagine what that actual practice might be. ...

There's a lot of precedent for this sort of approach. A simpler example: as an undergrad, one of the "temp" jobs I had was with the facilities dept., whose tasks included changing the fluorescent tubes in the ceilings all over campus. It turned out that replacing single failed bulbs wasn't time- or cost-effective. When one tube fails, chances are that more of the same type and age will also soon fail. So if you're sending a team with tubes and ladders to replace a failing tube, it's more practical to replace all of them in the same ceiling. Their records showed how long particular models of tubes would last before failing, and they sent in teams to sweep through a building, replacing all the tubes.

All sorts of electronics have similar properties, so wholesale replacement of batches turns out to be a very practical approach. Similar arguments apply to various non-electronic stuff, like underground pipes (which were entirely replaced in our neighborhood over the past year), road surfaces, house siding, etc. Randall has just extended the approach a bit.
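The economics mostly come down to the visit (team, ladders, time) costing far more than the tubes. A toy comparison with invented numbers:

Code: Select all
# Per-failure relamping vs. sweeping the whole ceiling in one visit.
# All costs invented; the point is that the visit dominates the tube price.
TUBES_PER_CEILING = 20
TUBE_COST = 3.0     # USD per tube (assumed)
TRIP_COST = 40.0    # sending a team with ladders, per visit (assumed)

one_by_one = TUBES_PER_CEILING * (TRIP_COST + TUBE_COST)   # one visit per failure
sweep = TRIP_COST + TUBES_PER_CEILING * TUBE_COST          # one visit, replace everything

print(one_by_one, sweep)   # 860.0 vs 100.0

This ignores the life thrown away in the still-working tubes, which is exactly what the exponential-lifetime argument further down the thread gets at.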

User avatar
Wee Red Bird
Posts: 189
Joined: Wed Apr 24, 2013 11:50 am UTC
Location: In a tree

Re: 1737: "Datacenter Scale"

Postby Wee Red Bird » Fri Sep 23, 2016 1:45 pm UTC

Tub wrote:The only way I can imagine this making sense is as part of a general replacement strategy. If you want to continually modernize your datacenter by throwing away the X oldest servers and buying Y new current-generation servers each year (thus increasing performance, efficiency and density), then there's a simple calculation saying that any machine older than n months does not get repaired, but is immediately replaced with a better server.

If you know the kit will be obsolete in 5 years, build a new datacenter in 4 years and bring it online. Bin old datacenter.
That would be cheaper than ripping out old equipment a section at a time, with all the risk of bringing down the whole, or a significant part, of the current system.
Sell it to a company that wants a prebuilt data centre that doesn't have to be state of the art, or start ripping out the old kit to work on replacing the other data centre in 5 years' time.

qvxb
Posts: 159
Joined: Mon Sep 19, 2016 10:20 pm UTC

Re: 1737: "Datacenter Scale"

Postby qvxb » Fri Sep 23, 2016 2:04 pm UTC

Like the Roman principle of "Kill 'em all, let Jupiter sort 'em out!".

User avatar
orthogon
Posts: 3006
Joined: Thu May 17, 2012 7:52 am UTC
Location: The Airy 1830 ellipsoid

Re: 1737: "Datacenter Scale"

Postby orthogon » Fri Sep 23, 2016 4:12 pm UTC

Jaruzel wrote:For some reason, this one reminded me of Monty Python's 'Four Yorkshiremen' sketch. :D

For those who haven't seen it: https://www.youtube.com/watch?v=wgsdSOhW1mU

-Jar

There's also a round in I'm Sorry I Haven't A Clue which involves the panelists making increasingly outrageous boasts until the buzzer sounds. I can't remember what it's called, though.
xtifr wrote:... and orthogon merely sounds undecided.

reval
Posts: 88
Joined: Fri Sep 23, 2016 2:56 pm UTC

Re: 1737: "Datacenter Scale"

Postby reval » Fri Sep 23, 2016 4:19 pm UTC

The fluorescent tube example works because those things age at a definite rate. After a few years, a 40W tube will draw closer to 22W and be pretty dim. You can use a Kill-a-watt meter to measure that.

I'm not sure the mean-time-between-failure for computers works the same way. Yes, they tend to go dead on me after about 8 years, but I always assumed that was due to the electrolytic capacitors. Now that clock speeds have hit a plateau around 2 GHz it may be time to start keeping old hardware around longer.

You'd have to get manufacturers to get past planned obsolescence and build for durability. You'd also have to do something about IT owners constantly trying to get rid of all the humans. Which is what this really seems to be about.

golden.number
Posts: 17
Joined: Wed Apr 29, 2015 4:08 pm UTC

Re: 1737: "Datacenter Scale"

Postby golden.number » Fri Sep 23, 2016 4:34 pm UTC

I love Asimov's "Let There Be Light". For me it's up there with Clarke's "The Nine Billion Names of God".

Tub
Posts: 410
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: 1737: "Datacenter Scale"

Postby Tub » Fri Sep 23, 2016 4:57 pm UTC

gmalivuk wrote:That's not obvious to me. How much does it cost to have reduced capacity while all that diagnosis is happening? How much does it cost to pay the technician to do it?

At the very least, synp's suggestion that the defective server is immediately replaced and *then* diagnosed makes a lot of sense.

One approach is to have a pile of unused servers. The faulty server is immediately replaced with a new one to get back to capacity. That requires someone to be on-site 24/7 to switch servers, and there's still some downtime.

The other approach is to have a pile of *used* servers, conveniently placed in a rack, already running and operational. If a server fails, that's not a problem, because a new one kicks in even before the technician realizes what happened. Replacements and repairs can be done as time permits.


At least for homogeneous servers, the second one seems smarter. When a shipment of new servers arrives, you will want to do a burn-in anyway, and the best place for that is not a lab bench, but a proper rack inside a proper datacenter. I don't see where you'd save money by removing the servers after the burn-in, just to put them onto a loose pile of unused servers for future replacements. But again, I don't run a datacenter, and I'd love to hear more substantial input.
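A minimal sketch of that second approach, with everything (names, health checks) hypothetical: the spare is already racked, burned in and running, so a "replacement" is just bookkeeping, and the dead box waits in place until someone has time for it.

Code: Select all
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    healthy: bool = True

@dataclass
class Pool:
    active: list = field(default_factory=list)
    spares: list = field(default_factory=list)   # used-but-working boxes, already racked

    def tick(self):
        for s in list(self.active):
            if not s.healthy:
                self.active.remove(s)                         # dead box stays in the rack, just unused
                if self.spares:
                    self.active.append(self.spares.pop())     # warm spare takes over immediately

pool = Pool(active=[Server("web-01"), Server("web-02")], spares=[Server("web-03")])
pool.active[0].healthy = False
pool.tick()
print([s.name for s in pool.active])   # ['web-02', 'web-03']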


Oh, and the cost ratio between parts and human work is slightly different for light bulbs and servers. Slightly.

User avatar
vodka.cobra
Posts: 371
Joined: Thu Mar 27, 2008 6:50 pm UTC
Location: Florida
Contact:

Re: 1737: "Datacenter Scale"

Postby vodka.cobra » Fri Sep 23, 2016 5:48 pm UTC

I've found that any discussion about scaling will inevitably result in someone suggesting MongoDB. The linked video/transcript is a hilarious rebuttal to these suggestions.
If the above comment has anything to do with hacking or cryptography, note that I work for a PHP security company and might know what I'm talking about.

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: 1737: "Datacenter Scale"

Postby korona » Fri Sep 23, 2016 6:15 pm UTC

Do datacenters use RAID storage attached to compute nodes at all? I suspect they only attach a tiny (non-RAID) HDD/SSD to each node (for fast node-local caching aka /tmp) and use dedicated NAS hardware for durable storage. In which case throwing away a NAS node does not make sense at all.

DanD
Posts: 316
Joined: Tue Oct 05, 2010 12:42 am UTC

Re: 1737: "Datacenter Scale"

Postby DanD » Fri Sep 23, 2016 6:27 pm UTC

Tub wrote: Oh, and the cost ratio between parts and human work is slightly different for light bulbs and servers. Slightly.


Yeah. The technician who can diagnose and replace parts in a server is considerably more expensive.

DanD
Posts: 316
Joined: Tue Oct 05, 2010 12:42 am UTC

Re: 1737: "Datacenter Scale"

Postby DanD » Fri Sep 23, 2016 6:29 pm UTC

korona wrote:Do datacenters use RAID storage attached to compute nodes at all? I suspect they only attach a tiny (non-RAID) HDD/SSD to each node (for fast node-local caching aka /tmp) and use dedicated NAS hardware for durable storage. In which case throwing away a NAS node does not make sense at all.


That sounds like a very expensive way to build a data center. It's simpler and cheaper to buy off the shelf servers with both.

User avatar
Moonfish
Posts: 34
Joined: Sat May 24, 2008 10:40 am UTC
Location: San Diego, California

Re: 1737: "Datacenter Scale"

Postby Moonfish » Fri Sep 23, 2016 6:59 pm UTC

jc wrote:It turns out that replacing single failed bulbs wasn't time- or cost-effective. When one tube fails, chances are that more of the same type and age will also soon fail. So if you're sending a team with tubes and ladders to replace a failing tube, it's more practical to replace all of them in the same ceiling.


In my modeling and simulation class we were given a programming assignment to find out whether replacing non-burned-out fluorescent tubes was cost-effective. It turned out that because a tube's lifespan is an exponentially distributed random variable rather than a bell-curve random variable, it was never cost-effective to replace a working bulb.

Here is a link to the assignment: http://www.cs.ucr.edu/~mart/177/CS177_Mid_S03_ans.pdf
The question is on Page 4:
Briefly explain why changing the second lightbulb in a fixture before it burns out cannot reduce
the maintenance costs in this model.


Here is my prof's sample answer:
The key factor is that the life expectancy for light bulbs has an exponential distribution, which is
memoryless. That means the remaining lifetime for a working bulb currently in a fixture is the
same as the total lifetime for a new bulb that could be installed in the fixture. Therefore, you get
zero extra time, on average, before that light bulb burns out by replacing it before it fails. Since
the cost of the replacement is larger than zero (for both materials and labor) and the benefit is
exactly zero, this is clearly a waste of time.
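For reference, the memorylessness property that answer leans on: if a lifetime $T$ is exponential with rate $\lambda$, then

$$\Pr(T > s + t \mid T > s) = \frac{e^{-\lambda (s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = \Pr(T > t),$$

so a tube that has already burned for $s$ hours has exactly the same survival prospects as a fresh one.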

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26546
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: 1737: "Datacenter Scale"

Postby gmalivuk » Fri Sep 23, 2016 7:15 pm UTC

The assignment uses an exponential distribution. That doesn't mean an exponential distribution is accurate.

Edit: In fact it appears to be factually inaccurate. See figure 8 here.
Unless stated otherwise, I do not care whether a statement, by itself, constitutes a persuasive political argument. I care whether it's true.
---
If this post has math that doesn't work for you, use TeX the World for Firefox or Chrome

(he/him/his)

YTPrenewed
Posts: 75
Joined: Tue Sep 07, 2010 12:09 am UTC

Re: 1737: "Datacenter Scale"

Postby YTPrenewed » Fri Sep 23, 2016 8:37 pm UTC

Is this by any chance a parody of the increasing popularity of "it is beyond saving" reasoning used in the context of politics? (Britain abandoning the EU instead of trying to fix it comes to mind...)
Last edited by YTPrenewed on Fri Sep 23, 2016 8:46 pm UTC, edited 1 time in total.

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: 1737: "Datacenter Scale"

Postby korona » Fri Sep 23, 2016 8:43 pm UTC

DanD wrote:
korona wrote:Do datacenters use RAID storage attached to compute nodes at all? I suspect they only attach a tiny (non-RAID) HDD/SSD to each node (for fast node-local caching aka /tmp) and use dedicated NAS hardware for durable storage. In which case throwing away a NAS node does not make sense at all.


That sounds like a very expensive way to build a data center. It's simpler and cheaper to buy off the shelf servers with both.

Well, it certainly seems to work well enough for AWS, Google Cloud Platform and the other PaaS providers.

rmsgrey
Posts: 3482
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1737: "Datacenter Scale"

Postby rmsgrey » Fri Sep 23, 2016 9:54 pm UTC

golden.number wrote:I love Asimov's "Let There Be Light". For me it's up there with Clarke's "The Nine Billion Names of God".


Also known as the Asimov story that people never remember the title of. For the record, the official title is "The Last Question".

Monster_user
Posts: 9
Joined: Mon Nov 29, 2010 2:35 am UTC

Re: 1737: "Datacenter Scale"

Postby Monster_user » Sat Sep 24, 2016 1:24 am UTC

I'll agree, throwing away an entire machine because a single drive failed is kind of pointless. You've still got to maintain a minimal level of redundancy. And any equipment designed for scale has front-swappable drive bays, and replacement drives can be ordered at a reasonable cost, already in the drive "caddy", to make it a simple swap in and out. The cost of a single drive is continually dropping, and remains below the cost of a replacement machine. Unless we are talking about grids of consumer devices, which are too costly to service at a large scale.

At a large scale, it doesn't make sense to swap the mounting hardware separately: just throw the drive and mounting hardware away together, and buy drives that come in the mounting hardware. Which *might* be an example at a smaller scale than is alluded to here.

The only way it makes sense is if the failure rate of drives is lower than the growth in computing demands, so that by the time the first drive fails (1 to 2 years) the hardware is incapable of handling the workload. Either that, or their inventory management is so bad they just can't find the damn things to replace a drive.

The cost of unboxing, racking, cabling, configuring, and (re)loading operating systems to integrate with the abstraction layer is significantly more than the credit-card-swipe ease of replacing a failed drive in a RAID array. The downtime is the same either way: zero.

xtifr
Posts: 336
Joined: Wed Oct 01, 2008 6:38 pm UTC

Re: 1737: "Datacenter Scale"

Postby xtifr » Sat Sep 24, 2016 1:49 am UTC

The comparison with light bulbs doesn't really work, because with light bulbs the location is important. If I'm running a city, and a street lamp on 3rd Ave burns out, I can't just turn on a newly built lamp on 174th Ave and say, "hey, the total illumination in the city is the same as before, so we're fine!" The people who actually live on 3rd Ave are likely to get upset.

With servers, nobody cares if you just leave the dead ones in place. So you can wait till a whole set (rack, row, datacenter, planet, whatever) has died before recycling the space. And there's definitely no "we should replace these others because they might die too" factor. Nobody cares if they might die. Wait till they do. The only thing you need to do is make sure that you're provisioning new servers slightly faster than the sum of new-capacity-needed + old-server-death-rate.
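With invented numbers, that provisioning rule is just:

Code: Select all
# New servers per month must cover both growth and attrition.
fleet_size = 10_000             # servers currently racked (assumed)
annual_failure_rate = 0.05      # ~5% of boxes die per year (assumed)
growth_per_month = 80           # extra servers needed for new capacity (assumed)

deaths_per_month = fleet_size * annual_failure_rate / 12
print(f"provision at least {growth_per_month + deaths_per_month:.0f} servers/month")   # ~122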
"[T]he author has followed the usual practice of contemporary books on graph theory, namely to use words that are similar but not identical to the terms used in other books on graph theory."
-- Donald Knuth, The Art of Computer Programming, Vol I, 3rd ed.

ijuin
Posts: 934
Joined: Fri Jan 09, 2009 6:02 pm UTC

Re: 1737: "Datacenter Scale"

Postby ijuin » Sun Sep 25, 2016 1:39 am UTC

DanD wrote:
Tub wrote: Oh, and the cost ratio between parts and human work is slightly different for light bulbs and servers. Slightly.


Yeah. The technician who can diagnose and replace parts in a server is considerably more expensive.


If it takes one man-hour to replace a hard drive, counting from the moment the server is pulled from the rack until it is back in service again, then the company has spent more money on the technician than on the hard drive.
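With assumed figures (a fully loaded technician cost against a commodity datacenter drive), the comparison is just:

Code: Select all
tech_cost_per_hour = 100.0   # fully loaded: salary, benefits, overhead (assumed)
drive_cost = 80.0            # commodity datacenter HDD (assumed)
hours = 1.0                  # from rack-out to back in service (assumed)

print(tech_cost_per_hour * hours > drive_cost)   # True with these numbers; flips if the drive is pricier or the swap is quicker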

Monster_user
Posts: 9
Joined: Mon Nov 29, 2010 2:35 am UTC

Re: 1737: "Datacenter Scale"

Postby Monster_user » Sun Sep 25, 2016 5:33 pm UTC

ijuin, the server is never pulled from the rack. If the server is running a mirrored RAID configuration, then it never even leaves service. Otherwise the tech has to rebuild the RAID, and then remirror the contents from another server.

Servers these days are also smarter, and can self-diagnose and order replacement drives when they fail. So the drive arrives later, and then the technician replaces the drive like swapping toast in a toaster.

The drives are front loading, like floppy disks. So the server is never moved, only a latch on that drive bay. Just make sure it is the one with the red light, not the green light. Any IT monkey in a datacenter can diagnose and replace a failed drive. That is as basic as it gets.

Western Rover
Posts: 22
Joined: Mon Aug 24, 2009 2:23 pm UTC

Re: 1737: "Datacenter Scale"

Postby Western Rover » Mon Sep 26, 2016 5:05 am UTC

rmsgrey wrote:
golden.number wrote:I love Asimov's "Let There Be Light". For me it's up there with Clarke's "The Nine Billion Names of God".


Also known as the Asimov story that people never remember the title of. For the record, the official title is "The Last Question".


I remember the name only because of the amazing planetarium show I saw in 1980 based on the short story. Each of the different sections of the story had different scenery projected on the dome, with a different kind of special effect in each section to depict the _AC (e.g. a moving laser for the Cosmic AC). The other characters were mostly not shown, only heard. The final "Let There Be Light" was a flash at the top of the dome and stars and galaxies pouring down.

The show spurred me into finding more Asimov at the library, first his Robot stories, and later Foundation.

ijuin
Posts: 934
Joined: Fri Jan 09, 2009 6:02 pm UTC

Re: 1737: "Datacenter Scale"

Postby ijuin » Mon Sep 26, 2016 4:06 pm UTC

Monster_user wrote:ijuin, the server is never pulled from the rack. If the server is running a mirrored RAID configuration, then it never even leaves service. Otherwise the tech has to rebuild the RAID, and then remirror the contents from another server.

Servers these days are also smarter, and can self-diagnose and order replacement drives when they fail. So the drive arrives later, and then the technician replaces the drive like swapping toast in a toaster.

The drives are front loading, like floppy disks. So the server is never moved, only a latch on that drive bay. Just make sure it is the one with the red light, not the green light. Any IT monkey in a datacenter can diagnose and replace a failed drive. That is as basic as it gets.


Since when are hard drives EVER hot-swappable? I was trained that one ALWAYS does a controlled shutdown and power-off before swapping, or else the new drive will be damaged.

User avatar
Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 3730
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

Re: 1737: "Datacenter Scale"

Postby Soupspoon » Mon Sep 26, 2016 4:33 pm UTC

ijuin wrote:Since when are hard drives EVER hot-swappable? I was trained that one ALWAYS does a controlled shutdown and power-off before swapping, or else the new drive will be damaged.
That's a feature of a RAID rack (or bay within a server-front). The extraction of a drive (working or not, but typically only when failed anyway) is dealt with by the inbuilt RAID controller (and possibly an electromechanical two-stop breaker to cleanly unmount the power/data connectors as you pop the drive handle out prior to pulling it).

Then the new, 'virgin', drive is detected on insertion and spin-up, interrogated, preformatted (if not coming that way from the manufacturer) and then the remirroring or suitably complementative XORing is done to bring it into the 'family' like it had always been there.

But I think (with exceptions) that desktop mobo RAID controllers still prefer you not to hot-swap without a power-down, and the behaviour of a system with unraided drives when hot-swapping 'internal' drives (or external ones without the OS being told to unmount/eject and let you know it is done) can be... unpredictable. Especially as popping a power and/or data connector manually can momentarily disconnect one of the parallel connections before another, depending on how the connector is grasped and pulled, unless it is specifically designed to let the 'vulnerable' pins slip off first and the 'maintenance' ones last.

ETA: That said, when the company I was working for had the first ever RAID replacement, we did tell the 200-odd employees to log off of the (as it was) Novell server whilst we did the swapping with only Admin logins allowed on, testing for data continuity, waited for it to resynchronise, then restarted it anyway for good measure. But half a dozen new drives later (probably a year or five after that first, it wasn't a massive datacentre, just a self-sufficient satellite site) we did indeed pop the drive with the red light glowing, push the ready-replacement in, order the replacement-replacement then retroactively fill out the Change Control Form we used to track all system 'changes', and if we told the userbase at all it was likely as casual chat in the lunchroom the next day.
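For comparison, the same dance with Linux software RAID looks roughly like this (a sketch only: the device names are hypothetical, it needs root and a real array, and a hardware controller in a hot-swap bay does the equivalent for you when you pop the caddy):

Code: Select all
import subprocess

ARRAY = "/dev/md0"     # hypothetical mirrored array
FAILED = "/dev/sdb1"   # the member with the metaphorical red light
FRESH = "/dev/sdc1"    # the ready-replacement, already partitioned

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("mdadm", "--manage", ARRAY, "--fail", FAILED)    # mark the dying member as failed
run("mdadm", "--manage", ARRAY, "--remove", FAILED)  # pull it out of the array
run("mdadm", "--manage", ARRAY, "--add", FRESH)      # add the new drive; the mirror rebuilds in the background
run("cat", "/proc/mdstat")                           # watch the resync progress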

User avatar
ShuRugal
Posts: 75
Joined: Wed Jan 26, 2011 5:19 am UTC

Re: 1737: "Datacenter Scale"

Postby ShuRugal » Mon Sep 26, 2016 5:40 pm UTC

Pfhorrest wrote:In the scenario in the first panel: OK, so maybe RAID is pointless because of higher-level redundancy between different machines, but if it comes down to buying a new machine, or buying a new HD, how is buying the new machine actually more efficient?


If the drive is internal to the machine, you have to pull the machine out of the rack to get to the drive. Then you have to open it. Then you have to remove the drive. Then you have to do it all in reverse to put it back in and test it. If it turns out that the drive controller on the mobo is what died, you still have to replace the machine, and you just paid some mobile screwdriver $50/hr for two hours (+ HDD cost) to find this out.

For this reason, though, rack-mount machines almost universally have drive bays that can be accessed from the front without unracking the machine. Any modern mirroring RAID configuration will even allow you to hot-swap drives, so the above scenario really doesn't apply to anyone doing anything at the datacenter scale. If the RAID controller dies, you will know it is dead because all of your drives will cease to operate, instead of just one. You will be able to verify that it is dead by consoling in and asking it, as well.

Though the comic does accurately parody the kinds of extreme ideas that can result from allowing the guys in finance to make IT decisions.

ps.02
Posts: 378
Joined: Fri Apr 05, 2013 8:02 pm UTC

Re: 1737: "Datacenter Scale"

Postby ps.02 » Mon Sep 26, 2016 10:59 pm UTC

ijuin wrote:Since when are hard drives EVER hot-swappable? I was trained that one ALWAYS does a controlled shutdown and power-off before swapping, or else the new drive will be damaged.

To expand on what others have said: your training was correct - for older consumer-grade disks - specifically, MFM and ATA (aka "IDE"), with that frustrating Molex power connector and 40- or 80-wire ribbon cable. Higher-end disk standards like SCSI, LVD SCSI, SSA, and Fibre Channel have pretty much always supported hot-swap to some degree, and server chassis with drive bays designed for swapping a disk without opening the rest of the case have been common for ages. Even ATA has officially supported hot-swap since the 2003 update known as Serial ATA or SATA. (Though in practice, proper hot-swap support was spotty in some early SATA implementations.)

Monster_user
Posts: 9
Joined: Mon Nov 29, 2010 2:35 am UTC

Re: 1737: "Datacenter Scale"

Postby Monster_user » Tue Sep 27, 2016 3:29 am UTC

Anybody else shocked that a "mobile screwdriver" gets paid $50 for two hours? As far as I am aware, they make $9 an hour (USD) around these parts.

Daggoth
Posts: 51
Joined: Wed Aug 05, 2009 2:37 am UTC

Re: 1737: "Datacenter Scale"

Postby Daggoth » Tue Sep 27, 2016 4:06 am UTC

"mobile screwdriver"

That's harsh lol, they actually do more than just screwing/unscrewing stuff

User avatar
squall_line
Posts: 169
Joined: Fri Mar 20, 2009 2:36 am UTC

Re: 1737: "Datacenter Scale"

Postby squall_line » Tue Sep 27, 2016 2:17 pm UTC

Wee Red Bird wrote:Sell it to a company that wants a prebuilt data centre that doesn't have to be state of the art, or start ripping out the old kit to work on replacing the other data center in 5 years time.


Or rip out the old kit and sell the hard drives on Amazon/Newegg/eBay as "new" with a steep discount compared to actually new. It's one of the most frustrating parts of trying to bargain shop for RAID drives as a casual / power user. Thankfully user reviews usually shut those clowns down before they do too much damage to honest people who are looking for a robust drive for home use.

Monster_user wrote:Anybody else's shocked that a "mobile screwdriver" gets paid $50 for two hours? As far as I am aware, they make $9 an hour (USD) around these parts.


I believe ShuRugal said $50/hr, so $100 total. And that likely includes salary plus benefits, not just salary.

