Supercomputing basics


cspirou
Posts: 147
Joined: Wed Jun 11, 2008 4:09 pm UTC

Supercomputing basics

Postby cspirou » Thu Feb 10, 2011 8:10 pm UTC

I just wanted to know some basics about the software involved in supercomputing.

Assume that the cluster works on Linux. How does the installation work? Does each node have the exact same setup? Or is there some kind of master node with the full setup with slave nodes that have a stripped down setup?

When using a supercomputer does the software interpret it as one large computer with a common hard drive and memory? Or is it treated more like a network where you have to indicate a specific computer to access?

Thanks for any info and forgive my ignorance.

Laguana
Posts: 49
Joined: Sat Jan 19, 2008 10:13 pm UTC

Re: Supercomputing basics

Postby Laguana » Thu Feb 10, 2011 9:35 pm UTC

I haven't done any supercomputing myself, but my university has a few set up and I've heard a bit about what they do. This is all from memory, and may be wrong etc.

The answer is basically "it depends". There are lots of kinds of supercomputers. Some use a global address space (probably hierarchical, so the programmer can tell the difference between "here" and "further away", since performance will matter); others use message passing on a more network-like setup (though with far more interconnectivity than a conventional LAN, so choosing which node performs what task with which other nodes can be an important decision).
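
To give a flavour of the message-passing style, here is a rough sketch using MPI, which is the usual way to program this sort of cluster. I haven't tested it, so treat it as illustrative only; the choice of rank 1 as the receiver and the payload value are just made up:

/* Minimal sketch of the message-passing style, using MPI.
   Compile with: mpicc hello.c -o hello
   Run with:     mpirun -np 4 ./hello                              */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I?       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many are there total? */

    if (size < 2) {                        /* this toy needs two ranks  */
        MPI_Finalize();
        return 0;
    }

    if (rank == 0) {
        int payload = 42;
        /* explicitly choose *which* process to talk to */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("rank 0 sent %d to rank 1\n", payload);
    } else if (rank == 1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", payload);
    }

    MPI_Finalize();
    return 0;
}

The point is just that the programmer addresses specific nodes and moves data around explicitly, rather than everything appearing as one big memory.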

I think that most commonly they are homogeneous, that is, all the nodes are the same, but that is more for convenience than a requirement. I have seen some student presentations on research into load balancing on heterogeneous architectures (where the nodes differ), but that work is still fairly young.

The master-slave setup is handy for some problems, but not for others. If you've built a supercomputer out of PS3s then you probably do have this kind of setup (the ~8 SPEs are very different from the PPU), but if it is a cluster of more conventional computers, then each core might be effectively identical.
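
To make the master/worker idea concrete, here is another rough MPI sketch where rank 0 acts as the master, handing out chunks of an array and collecting partial sums back. Again untested; the array size, and the assumption that it divides evenly among the ranks, are just for illustration:

/* Sketch of a master/worker pattern with MPI collectives: rank 0 owns
   the full data set and scatters equal chunks to every rank; each rank
   sums its chunk and the partial sums are reduced back onto rank 0.   */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024   /* total element count, assumed divisible by #ranks */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;
    double *full = NULL;
    double *part = malloc(chunk * sizeof(double));

    if (rank == 0) {                    /* the "master" sets up the work */
        full = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++)
            full[i] = 1.0;
    }

    /* hand each worker its share of the data */
    MPI_Scatter(full, chunk, MPI_DOUBLE, part, chunk, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    double local = 0.0, total = 0.0;
    for (int i = 0; i < chunk; i++)
        local += part[i];

    /* collect the partial results back on the master */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f (expected %d)\n", total, N);

    free(part);
    free(full);
    MPI_Finalize();
    return 0;
}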


Hope this gives you some idea of how things are.

letterX
Posts: 535
Joined: Fri Feb 22, 2008 4:00 am UTC
Location: Ithaca, NY

Re: Supercomputing basics

Postby letterX » Fri Feb 11, 2011 12:14 am UTC

Typically, the word "supercomputer" implies a rather more specialized sort of machine than just a bunch of Linux boxes connected together, usually with some sort of unconventional processor architecture or network topology. So I think what you actually want to know about is distributed computing more generally.

Anyways, distributed computing is super awesome. There's actually a very large parameter space for designing distributed systems, so the answer to pretty much all of your questions is that there are a number of design choices, each of which trades off how the system ends up behaving.

Whether each node has the same setup or there is a dedicated master node is definitely an interesting question. It's typically much easier to design around a master node that controls identical worker machines, but distributed systems frequently care a lot about fault tolerance, which is the property that even if some number of machines go down, your system can continue running. The main problem with having a master node is that if that machine dies, everything dies. So more complicated systems will have the functionality of the master replicated across several machines. Pushing this all the way to the other end of the spectrum, you get peer-to-peer systems (like BitTorrent) where all computers are equals, and may be running on completely different architectures, sometimes with completely different programs, but still participating in the same distributed protocol.

As far as distributed memory goes, there are systems where all processes share a flat address space, and the operating system does the work of making sure that all memory is actually accessed coherently and efficiently. You also have models like NUMA (non-uniform memory access), where certain parts of the address space live on different machines, so access will be faster or slower depending on where the data lives. And you commonly have communication over a network, where processes don't share an address space, but send packets to exchange information. When you're running a peer-to-peer system, you of course can't share an address space, so you have to go this route.
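
For contrast with message passing, here's a minimal sketch of the shared-address-space style using OpenMP threads on a single machine: every thread reads the same array directly, so no explicit communication is needed. Treat it as a sketch rather than a recipe:

/* Shared-memory sketch: all threads see the same array, so there are
   no sends or receives, just a parallel loop with a reduction.
   Compile with: gcc -fopenmp sum.c -o sum                            */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double data[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        data[i] = 1.0;                 /* every thread shares this memory */

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += data[i];                /* each thread handles a slice     */

    printf("sum = %f\n", sum);
    return 0;
}

On a NUMA machine the same code still works, but which node the pages of data live on starts to matter for performance, which is exactly the "here versus further away" distinction mentioned earlier in the thread.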

This is really a huge field, so I've thrown a bunch of words at you, but distributed computing is really important to the future of computer science. So feel free to google some of the buzzwords I've just mentioned, and ask any questions that come up!

