Why is Python slower than R?

Hello consuli,

thanks for the interesting contribution. Can you describe a little more precisely what kind of data set overwhelmed your R? Of course, I agree with you that R was created for typical statistical problems (we have examined only a small sample and still want to derive general statements) and not for big data questions (how do I handle all the data and separate the important from the unimportant).

Since I never deal with large data sets myself, I have to ask. Your example of 1 million customers who need data joined from two other databases doesn't sound that big at first. I would have thought that R's own merge could manage it, and that data.table would barely break a sweat. Or is your example more complex or extensive than the brief description suggests?

Of course, the canonical answer to the problem is: if the data lives in a DBMS, do the joins in the DBMS. I'm pretty sure that this is also the answer for the main competitor, Python, and even as a C++ programmer you would probably prefer to do it in SQL. Seen this way, it is no argument against R as currently the best statistical solution.
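To make the "join in the DBMS" point a bit more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table and column names (customers, orders, contacts, customer_id and so on) are made up for illustration and are not consuli's actual schema, and an in-memory SQLite database only stands in for whatever DBMS is really in use. The idea is simply that the database does the joining and aggregation, and the statistics language only receives the finished result.

```python
import sqlite3

# Hypothetical schema: a customer table plus two tables standing in for
# the "two other databases" mentioned in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders    (customer_id INTEGER, order_total REAL);
    CREATE TABLE contacts  (customer_id INTEGER, email TEXT);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ada"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (1, 5.5), (2, 20.0)])
conn.executemany("INSERT INTO contacts VALUES (?, ?)",
                 [(1, "ada@example.org"), (3, "cy@example.org")])

# The join and the aggregation both run inside the database engine;
# only one row per customer comes back to the client.
query = """
    SELECT c.customer_id, c.name, ct.email,
           COALESCE(SUM(o.order_total), 0.0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders   AS o  ON o.customer_id  = c.customer_id
    LEFT JOIN contacts AS ct ON ct.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name, ct.email
    ORDER BY c.customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With a real server-side DBMS the query would look much the same; only the connection line changes. This sketch assumes at most one contact row per customer, otherwise the order totals would be multiplied by the second join.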

consuli wrote: Then there are further technical programming restrictions: R is not a strictly structured object-oriented programming language

This is a very good point, because it is of course a clear difference from the main competitor, Python. Object orientation in R is handled in a very unusual way, and the fact that there are two or three different object systems is annoying (has Hadley Wickham actually already developed an object system for R? Maybe {tidy_obj}?). Python is designed from the ground up to be object-oriented and does it very nicely without becoming overly complex. Maybe it just seems that way to me because the penny only dropped for me on object orientation when I learned Python. But I also remember discussions in which some Java fans found Python's object orientation not strict enough (encapsulation was the keyword, IIRC, but I can no longer recall the details). That already indicates that object orientation is also a matter of taste. Personally, I find that object orientation can quickly be overdone and that it is not necessary for projects of low and medium complexity, and who really writes extensive projects in a statistical language?
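If I reconstruct the encapsulation complaint correctly, it is probably about the fact that Python has no truly private attributes, only naming conventions. A small sketch of what that means (the Account class and its attribute names are hypothetical, invented just for this illustration):

```python
class Account:
    """Hypothetical example class, not taken from the thread."""

    def __init__(self, balance):
        self._balance = balance   # single underscore: "internal" by convention only
        self.__pin = "1234"       # double underscore: name-mangled to _Account__pin

    @property
    def balance(self):
        return self._balance


acct = Account(100)

# Nothing stops outside code from touching the "internal" attribute.
acct._balance = -5

# The double-underscore attribute is only renamed, not hidden.
mangled_pin = acct._Account__pin

# Access under the original name fails from outside the class body.
try:
    acct.__pin
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

print(acct.balance, mangled_pin, direct_access_failed)
```

In Java, a private field would be rejected by the compiler at the point of access. In Python it is an agreement among programmers, which is probably what the stricter camp objects to.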

and does not allow GUI programming.

Here I fully agree with you. You cannot write programs in R that look to ordinary users the way programs look in this century. I think the Shiny hype has something to do with the fact that you can finally present something colorful that reacts to the mouse. But maybe I underestimate Shiny only because I haven't looked at it closely enough. In any case, I see Python's advantage not in possible speed gains but in the fact that it is suitable both as a general-purpose language and as a statistics and machine learning language. If anything, that could pull me in that direction at some point. At the moment, however, nothing is moving.

Otherwise, R suffers from the burden of its age. A lot of cruft has accumulated. A fresh, newly designed language could clean up some of the syntax (is it na.rm = TRUE or omit.na = TRUE or method = "rm.na" or ...). Instead of standardization and simplification, however, the trend in R goes in a different direction. Hadley Wickham is currently persuading the community that the smallest meaningful data form is not the vector but the data frame, and that functions must always take the data frame to be processed as their first argument, because then everything fits so nicely with the pipe. As a result, people now use the pipe even in the vignettes of their packages, where it doesn't belong, and we end up with a number of coding styles that clearly contradict each other and for the most part are not based on earlier common usage. For example, the period in variable and function names is extremely common in old functions and arguments. Is it now supposed to be replaced by the underscore because people with a C++, Java, or Python background would otherwise get confused, since it looks object-oriented to them? Little tip: these people can't learn early enough that object orientation looks different in R.

All in all, R clearly wins for me at the moment. As jogo already wrote: it is the combination of fast programming in R and fast computation in the parts that are implemented in FORTRAN and C.


Always program in such a way that the maxim of your programming style could be the basis of general legislation