What is a good language for a scientist?

Posted: Fri May 28, 2010 7:32 am UTC
by redgrowth
I am not a complete noob to programming and CS. I was a CS major who switched to chemistry. I took AP CS in high school, and I've taken a college course about assembly and computer hardware. So I have some knowledge about data structures, and the behind the scenes operations of computers. It's been a while since I've done any coding, so while I'm not starting from scratch I don't remember much. Most of my (little) experience is with Python and Erlang. What I want is a language where I can get things done, and fast. The more abstract, the more high level, the better.

I'm anticipating I will need to write programs that do data crunching. I also want to write programs that interact with a computer to automate some tasks. This could mean searching the screen for certain images, and simulating keyboard and mouse input. Another task I want to do is to take chunks of text and make sure that they are formatted in a very specific way (citation checker). I also might need to write programs to control robots, and to do 3D modeling.

So I want a language that can easily do those tasks, but will also grow on me and facilitate solving other typical scientist type problems. Basically, I want a real language and not some VB for scientists type thing. I don't care what the paradigm is as long as it doesn't get in the way of what I am trying to do. I just want a powerful, flexible, easy to use language. Whatever the language is, it must have an interactive interpreter.

What are your recommendations? Also, does this particular language have a tutorial geared towards solving scientific type problems? Much thanks in advance.

Re: What is a good language for a scientist?

Posted: Fri May 28, 2010 7:49 am UTC
by Divinas
I would recommend Haskell and Python. You are asking for quite a broad spectrum of functionality, for which Python is great - it has a LOT of libraries, for just about anything you could ever want. Haskell on the other hand is awesome for number crunching - you can code quite complex mathematical problems very fast, and its performance matches low-level languages. I would also suggest getting a tutorial on regular expressions - they're kind of specific, but are useful just about everywhere you need string manipulation (for example "take chunks of text and make sure that they are formatted in a very specific way (citation checker)"). They also both have interactive shells that you can use.
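
To make the regular expression suggestion concrete, here is a minimal sketch of a citation checker in Python using only the standard `re` module. The citation format, the pattern and the `check_citation` helper are all made up for illustration; a real checker would need a pattern written for whatever style guide you actually follow.

```python
import re

# Hypothetical citation format, invented for this example:
#   Surname, A. B. (2009). Title of the article. Journal Name, 12, 345-350.
CITATION = re.compile(
    r"""
    ^(?P<author>[A-Z][a-z]+,\s(?:[A-Z]\.\s?)+)  # surname, comma, initials
    \s?\((?P<year>\d{4})\)\.\s                  # four-digit year in parentheses
    (?P<title>[^.]+)\.\s                        # title, ended by a period
    (?P<journal>[^,]+),\s                       # journal name, ended by a comma
    (?P<volume>\d+),\s                          # volume number
    (?P<pages>\d+-\d+)\.$                       # page range and final period
    """,
    re.VERBOSE,
)


def check_citation(line):
    """Return the parsed fields if the line fits the format, otherwise None."""
    match = CITATION.match(line.strip())
    return match.groupdict() if match else None


good = "Smith, J. A. (2009). Kinetics of ester hydrolysis. J Chem Phys, 12, 345-350."
bad = "Smith J.A. 2009, Kinetics of ester hydrolysis, J Chem Phys 12:345-350"

print(check_citation(good))  # a dict of the named groups
print(check_citation(bad))   # None, because the line does not fit the format
```

The named groups mean that a passing line also comes back split into fields, so the same pattern can be reused to reformat citations and not only to reject them.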

Re: What is a good language for a scientist?

Posted: Fri May 28, 2010 9:08 am UTC
by thoughtfully
Python fits your bill pretty well, but you will probably find that you will need some more diversity. A lot of visualization is done in C++. The established code base and practicing-researcher base of Fortran is huge. Python is a great start, though. There's a lot of scientific support for it. It doesn't even have to be slow, with cool stuff like the numpy or ScientificPython libraries. And coding your numerically intensive stuff in C and talking with it isn't so hard, if the other approaches aren't sufficient.

A more general note. When you start mucking around with the display and mouse/keyboard input, you are probably taking the wrong approach. That path leads to brittle, impossible to maintain code that will cause you a lot of headaches. If you ever get to that point, you really need to rethink your requirements and available tools to see if there isn't some other way, or if it's worth the trouble.

redgrowth wrote:Basically, I want a real language and not some VB for scientists type thing.

*cough*Matlab*cough*

Re: What is a good language for a scientist?

Posted: Fri May 28, 2010 9:40 am UTC
by You, sir, name?
thoughtfully wrote:
redgrowth wrote:Basically, I want a real language and not some VB for scientists type thing.

*cough*Matlab*cough*


Unfortunately, Matlab is pretty much industry standard. It's very convenient for numerical calculations, while general purpose languages are typically not.

Re: What is a good language for a scientist?

Posted: Fri May 28, 2010 12:19 pm UTC
by Jplus
Yes, MATLAB is an industry standard, but it is not a general purpose language as the OP requested.

Here's what I think: Python can do pretty much everything you want, in the convenient high-level style that you ask for... except when you want to do number crunching that is really computationally expensive, when you want to do fast real-time calculations, or when you want to program nonstandard hardware. In those cases you can use C++ with the Boost libraries. The Boost libraries offer nearly everything you'd possibly want for math (Boost.Math, Boost.uBLAS, Boost.Random, etc.), for interfacing with Python (Boost.Python), and for technical issues that you could meet at some point (like multithreading and interprocess communication). You get all this Boosty beauty for free and in one download + installer run. Both Python and C++ have a very real-worldish character and together they make a very powerful combination. Also, you can mimic the thread management that you know from Erlang in C++ with the FastFlow framework.
So I would suggest that you use Python and C++, with Boost and maybe FastFlow.

As for a few other languages that could be suggested or that you might consider for some other reason, but which I think you can leave aside:
Ruby is pretty much like Python. The difference mostly seems to be the "taste" of it. Some people are so much in love with Ruby that they evangelise it. However, Python has a much larger userbase and because of that probably also more library support, and I don't know of any Ruby-C++ interface like Boost.Python.
C is advocated by some over C++. You might at some point need C because for some hardware there is no (good) C++ compiler available, although C++ itself is already much more widely supported than most other programming languages. C++ was meant to be a better C and I think it is, mostly because it is stricter (and therefore safer) on the low level while it allows the programmer to combine more techniques and paradigms on the high level. Also, there is no Boost or FastFlow for C (although most Boost functionality can be obtained for C through other libraries). Fortunately, if you ever happen to program for a platform not supported by C++, C won't be hard to learn if you already know C++.
Haskell is, as has already been said, very convenient for programming computations, and also faster than Python. Still, it's not as fast and memory-efficient as C++ (or C). In general, it's also somewhat less real-worldish, although it certainly deserves the label "general purpose". When operating mostly on abstract computational problems it can be a perfect solution, but it has no place in the Python-C++-Boost "power of three".
Fortran is an old rock star (in science). It also has the health issues and the wrinkles of an old rock star. It has good library support for computational stuff, but it's not as convenient as Python (or in fact also C++), it's less powerful than C++, and it's not very real-worldish anymore anyway.
Java may be suggested to you because it has many fans. It's faster than Python but slower than C++. It's less flexible than both and takes more memory than both. It doesn't offer any library support that's not available in Python or C++, certainly not in the fields that you'll probably be programming in. Concluding, I think it's actually least worth considering of all.

Re: What is a good language for a scientist?

Posted: Fri May 28, 2010 5:14 pm UTC
by Berengal
One more vote for Haskell here, with Python coming in a second.

Haskell makes it very easy to abstract things, not to mention using those abstracted things as if they were built into the language. This makes it very convenient for working with your own object types, e.g. vectors, matrices etc., since you don't have to use notations that look weird. For example, you could be adding vectors using '+' and multiplying matrices using '*'. Additionally, Haskell is a pure, declarative language, which means functions are real relations, not just procedures, which makes translating mathematical formulas to code much nicer. It's not the most popular language, so it doesn't have the number of libraries available other languages do, but it does have a bunch, including many scientific ones. It also has a very nice interface with C should you need to do some nitty gritty low-level stuff.

Since it's a pure language, it's very easy to parallelize things in it. One way is through annotations, where you basically say "if you're evaluating this value, it may be beneficial to evaluate this other value in parallel". Since purity guarantees that a value will always be the same, regardless of when it's evaluated, the annotation will just create a "spark", which may or may not be picked up by a background worker thread. If it's picked up it will be evaluated in the background; if it's not picked up and evaluated before it's needed, the thread that needs it will evaluate it as usual; and if it turns out the value is unneeded before it's evaluated, the spark will be garbage collected and not evaluated at all.

Another, experimental way of doing parallelism is through nested data parallelism, which is like flat data parallelism (found mostly in FORTRAN implementations on steroids, and is basically just running loops in parallel with very little concurrency overhead) only nested (so nested loops, or parallel recursive functions). It's experimental, but pretty neat nevertheless.
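
For readers who know Python better than Haskell, the point about adding vectors with '+' and multiplying matrices with '*' can be sketched roughly as below. This is a Python analogue, not Haskell, and it says nothing about purity or sparks; the `Vec` and `Mat2` classes are hypothetical names invented for the example. In Haskell the same effect would come from type class instances rather than special methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec:
    """A 2D vector; frozen so that instances cannot be mutated after creation."""
    x: float
    y: float

    def __add__(self, other):
        # Component-wise addition, so that `u + v` reads like the maths.
        return Vec(self.x + other.x, self.y + other.y)


@dataclass(frozen=True)
class Mat2:
    """A 2x2 matrix [[a, b], [c, d]]."""
    a: float
    b: float
    c: float
    d: float

    def __mul__(self, other):
        # Ordinary matrix product (rows of self times columns of other).
        return Mat2(
            self.a * other.a + self.b * other.c,
            self.a * other.b + self.b * other.d,
            self.c * other.a + self.d * other.c,
            self.c * other.b + self.d * other.d,
        )


u = Vec(1, 2) + Vec(3, 4)
swap = Mat2(0, 1, 1, 0)
m = Mat2(1, 2, 3, 4) * swap
print(u)
print(m)
```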

Re: What is a good language for a scientist?

Posted: Fri May 28, 2010 6:26 pm UTC
by You, sir, name?
Jplus wrote:Yes, MATLAB is an industry standard, but it is not a general purpose language as the OP requested.


I think it's going to be hard to find a good general purpose language for a scientist (unless that is computer scientist). If you spend large amounts of time doing things that are not strictly relevant, it's by definition ill suited and something not many scientists will use. But if you can't easily do those irrelevant things, it's not really a general purpose language.

The notion is inherently flawed. It's like asking for a jet fighter that is suitable for transporting cargo containers.

Re: What is a good language for a scientist?

Posted: Fri Jun 04, 2010 5:57 pm UTC
by redgrowth
So I've read the posts, given this some thought and done more research into Python and Haskell. Python does have a huge amount of chemistry specific libraries. Haskell does not, but seems sexier (lazy evaluation, monads, yum!), and does data crunching faster.

Since Matlab is an industry standard, does Python or Haskell integrate well with Matlab? I've heard of SAGE. Does anyone know if it is a genuine Matlab replacement?

What are the advantages of Haskell over Lisp? Neither of them have great science libraries, and both seem to have similar run times. It seems to me that if I was going to go with an abstract language, I would want the most abstract language possible, and what is more abstract than Lisp/macros?

Re: What is a good language for a scientist?

Posted: Fri Jun 04, 2010 6:13 pm UTC
by stephentyrone
redgrowth wrote:Since Matlab is an industry standard, does Python or Haskell integrate well with Matlab? I've heard of SAGE. Does anyone know if it is a genuine Matlab replacement?


Sage doesn't try to be a Matlab replacement, and isn't. The free alternative to Matlab is Octave.

Sage is capable of doing many of the things that Matlab does, but usually not in the same way as you would in Matlab. Octave hews much more closely to Matlab's usage model, not only its capabilities.

Re: What is a good language for a scientist?

Posted: Fri Jun 04, 2010 6:27 pm UTC
by Berengal
redgrowth wrote:What are the advantages of Haskell over Lisp? Neither of them have great science libraries, and both seem to have similar run times. It seems to me that if I was going to go with an abstract language, I would want the most abstract language possible, and what is more abstract than Lisp/macros?
Which is "more abstract" is an impossible question to answer. Advocates of each respective language will be able to come up with different reasons for why their language is more suited to abstraction than the other.

I'm a Haskell advocate. My reason for liking Haskell is mainly its ability to deal with abstractions, and how it enables me to deal with them. Referential transparency helps on a very basic level; it's harder to make your abstractions leak since there's no hidden mutable state that could depend on execution order or composition order (I believe this is the same old argument about mutable state, except rephrased to use the word "abstraction").
However, the most helpful feature in dealing with abstractions in Haskell is hands down the type system, and this is the main reason why I like static typing, not the safety aspect and certainly not the optimization potential (though both are neat bonuses). It's amazing what you can abstract given a type system on the level of Haskell's, and how you can deal with those abstractions without getting lost, and how you can recognize them.

Re: What is a good language for a scientist?

Posted: Sat Jun 05, 2010 2:36 am UTC
by mouseposture
I would nominate Python, with the Numpy & Scipy libraries, to fill the Matlab niche. Weaker in some ways, stronger in others -- and one of the strengths is Python's better considered as a general purpose programming language. At least I think so, and I don't think my opinion's particularly eccentric.

Re: What is a good language for a scientist?

Posted: Sat Jun 05, 2010 3:51 am UTC
by cogman
You've described the need for more than one language.

Scripting type stuff: Python will do nicely here.
Number crunching: do NOT use python here. C/C++ is probably the best solution (barring full on assembly). C/C++ have way more high speed number crunching libraries than any other language out there.
Image recognition: again, C/C++ all the way. You need speed for this kind of task, not cutesy stuff like python.
Keyboard mouse simulation: It depends, pretty much any language capable of hooking into the OS api can do this.

One thing you have to accept, the faster something is, the lower level it will be. You can't have high abstraction and speed.

I personally recommend staying away from python for speed critical code. I've yet to see a "fast" python application. Yet, fast c/c++ applications can practically be written in your sleep. You'll lose the scripting capability, but gain a very strong language with tons of support.

So go C++, use the GMP along with the MPC library; that will take care of most math you need and do it faster than anything else out there. And make sure you use good algorithms.

Re: What is a good language for a scientist?

Posted: Sat Jun 05, 2010 4:26 am UTC
by thoughtfully
cogman wrote:Number crunching: do NOT use python here. C/C++ is probably the best solution (barring full on assembly). C/C++ have way more high speed number crunching libraries than any other language out there.

Not so fast. Developer time still often trumps computer time, especially when you're a part-time, nonprofessional coder. Python is a very popular choice for scientific applications, from the National Labs on down. A lot depends on the data sets you're working with, for one thing. Not everybody has the terabytes per minute a big detector at a particle accelerator would be working with.

Google is a corporation, but they still do quite a lot of highly technical stuff, and are well known for their extensive use of Python (they even hired Python's creator, Guido van Rossum). Yes, mostly scripting type applications, but I think characterizing it as "cutesy" is really unwarranted.

PS. Please don't take seriously the suggestion to try assembly language!

Re: What is a good language for a scientist?

Posted: Sat Jun 05, 2010 1:24 pm UTC
by cogman
thoughtfully wrote:
cogman wrote:Number crunching: do NOT use python here. C/C++ is probably the best solution (barring full on assembly). C/C++ have way more high speed number crunching libraries than any other language out there.

Not so fast. Developer time still often trumps computer time, especially when you're a part-time, nonprofessional coder. Python is a very popular choice for scientific applications, from the National Labs on down. A lot depends on the data sets you're working with, for one thing. Not everybody has the terabytes per minute a big detector at a particle accelerator would be working with.

Google is a corporation, but they still do quite a lot of highly technical stuff, and are well known for their extensive use of Python (they even hired Python's creator, Guido van Rossum). Yes, mostly scripting type applications, but I think characterizing it as "cutesy" is really unwarranted.

PS. Please don't take seriously the suggestion to try assembly language!

I have yet to see python used in a non-cutesy way. Though, if this is truly a part time thing, then I agree that python looks like a better choice. However, I can't seriously recommend it for serious number crunching.

You don't have to be doing things that are particle accelerator worthy before you start to need a faster language. By the time you start doing things with image manipulation, or edge detection, any benefit that python would have given to speed of development goes out the window. Really complex scientific problems have solutions developed just as fast in c++ as they are in python. For example, a neural net. I contend that a hand written c++ version is going to be just as complex as a hand written python version will be. Python may save you from pushing a few keys, but in the end, the concept is what slows you down, not the language.

Oh, and assembly is a viable language that more developers need to learn. It finds its way into tons of high speed number crunching applications that aren't used for particle accelerators. GMP, x264, gcc, etc., all have some significant portions written in assembly.

Re: What is a good language for a scientist?

Posted: Sat Jun 05, 2010 6:23 pm UTC
by hotaru
another option might be factor... it's a lot faster than python for number crunching, and at least for me, beats python on the developer time thing too.

Re: What is a good language for a scientist?

Posted: Sat Jun 05, 2010 6:37 pm UTC
by 0rm
Perl - nuff said.

Re: What is a good language for a scientist?

Posted: Sun Jun 06, 2010 7:22 am UTC
by squareroot
I'd say Perl and MATLAB. Unless you really need the interactions with the computer, like screen captures, and mouse and keyboard input, MATLAB will be great. Straightforward syntax, interactivity is great, viewing data in different formats is very easy.

I never bothered to learn it, but I have a copy of Mathematica. How's that compared to MATLAB?

Re: What is a good language for a scientist?

Posted: Sun Jun 06, 2010 5:42 pm UTC
by redgrowth
Unless someone can give me a good reason not to, I think I'm going to learn Clojure. It's a dialect of lisp that runs on the JVM so I get the functional goodness of Haskell, the macroness of lisp, and access to all the Java libraries.

Re: What is a good language for a scientist?

Posted: Sun Jun 06, 2010 8:56 pm UTC
by danreil
squareroot wrote:I'd say Perl and MATLAB. Unless you really need the interactions with the computer, like screen captures, and mouse and keyboard input, MATLAB will be great. Straightforward syntax, interactivity is great, viewing data in different formats is very easy.

I never bothered to learn it, but I have a copy of Mathematica. How's that compared to MATLAB?


Mathematica's great for symbolic manipulation. It finds antiderivatives, solves ODEs analytically, finds limits and does many other cool math things. It also has tons of built-in functions for number theory and other areas, and I think it's fantastic for plotting and graphics. However, as a strict programming language, I find it kind of lacking. For loops are annoying, as is writing custom functions (at least I find it that way). It's also going to be much slower than most languages, even more than Matlab, which is already slow.

Re: What is a good language for a scientist?

Posted: Sat Jun 12, 2010 12:00 pm UTC
by cwebster
cogman wrote:I have yet to see python used in a non-cutesy way. Though, if this is truly a part time thing, then I agree that python looks like a better choice. However, I can't seriously recommend it for serious number crunching.


I work for a scientific computing company and we work principally with Python as our primary development language for all of our real-world, very "non-cutesy" apps. NumPy plus SciPy plus Cython is a *very* powerful combination. For example, in one of our recent apps we use Python and NumPy to do digital signal processing, visualization and analysis of multi-gigabyte files: the code executed almost as fast as equivalent C++ and we were able to develop it extremely quickly.

cogman wrote:You don't have to be doing things that are particle accelerator worthy before you start to need a faster language.


I know of at least one major particle accelerator project at a national lab which uses Python as its primary data analysis language. I wouldn't be shocked if there were others.

cogman wrote:By the time you start doing things with image manipulation, or edge detection, any benefit that python would have given to speed of development goes out the window. Really complex scientific problems have solutions developed just as fast in c++ as they are in python. For example, a neural net. I contend that a hand written c++ version is going to be just as complex as a hand written python version will be. Python may save you from pushing a few keys, but in the end, the concept is what slows you down, not the language.


In my experience, Python gets out of the way and lets you work on the concepts without having to worry too much about the language. Python does have the very significant advantage from the point of view of a scientific developer that you don't need to worry as much about memory management, typing and declarations.

When Python truly isn't fast enough in its raw state, there is a plethora of extension modules for scientific computing available, it's fairly easy to wrap existing C or C++ code in Python, and you can speed up Python code by converting tight loops to Cython. And if worst comes to worst, you only need to write a little bit of C or C++, rather than the whole program in it.
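
As a rough illustration of the "only move the tight loop" idea, here is a small sketch using nothing but the standard library. It is neither Cython nor a wrapped C library: the C-implemented builtins `sum`, `map` and `operator.mul` in CPython stand in for compiled code, and the function names are invented for the example. The shape is the same, though: the hot loop leaves the interpreter and everything around it stays ordinary Python.

```python
import math
import operator
import timeit


def mean_square_loop(values):
    """Plain Python loop: every multiply and add goes through the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total / len(values)


def mean_square_fast(values):
    """Same value up to floating-point rounding; in CPython the loop runs in C."""
    return sum(map(operator.mul, values, values)) / len(values)


def compare(n=100_000, repeats=20):
    """Time both versions on the same data; the numbers depend on the machine."""
    data = [i * 0.5 for i in range(n)]
    slow = timeit.timeit(lambda: mean_square_loop(data), number=repeats)
    fast = timeit.timeit(lambda: mean_square_fast(data), number=repeats)
    print(f"loop: {slow:.3f}s  builtins: {fast:.3f}s")


if __name__ == "__main__":
    compare()
```

Measuring before rewriting is the important habit here: `timeit` (or a profiler) tells you which loop is worth moving, so you don't end up porting code that was never the bottleneck.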

Plus there is a pretty good open source community built around scientific computing using Python (scipy.org).