Is it worth learning R programming

What is R?

Compared to other popular statistical programs such as SPSS, SAS or Stata, R is a relatively new development. About 20 years ago, in 1992, R was created by the professors Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand. Today the further development of R is carried out or organized by the so-called R Core Team. This is an ongoing process, so new, improved versions of R are released regularly. The current R version at the time this book was written was 3.0.2.

R is free and open source software (it is something like the Linux among statistics programs). Free means that it can be downloaded and used at no cost. Open source means that it is possible to inspect how the software is programmed. The code that performs the calculations in R is not secret, but can be viewed by everyone.

R is furthermore subject to a so-called copyleft license (GPL-2). This means that changes or extensions can be made to R, but these changes must be made available under the same license. Because of this, R can never become a paid product. This is a plus point, especially for a sector with chronically scarce resources, such as university education.

In addition to the R program itself, there are over 3000 extension packages for R. These packages provide additional, specialized functionality for a wide range of application areas (econometrics, biogenetics, social science, etc.). They too are, as a rule, freely available under a copyleft license. It is also possible to program extensions yourself and make them available to the community. Since many statistics experts around the world are doing this, the number of packages, and thus the functionality of R, is growing at a rapid pace (Fox, 2009).

The free availability combined with the immense range of functions has probably led to the sharp rise in the number of R users and institutions that use R in recent years. R has managed to earn its place in the canon of recognized and established statistical programs. Many say that it has already become the lingua franca of statistics.

Unlike SPSS, for example, R has no menu interface through which commands can be selected by clicking, because R is a scripting language. This means that the commands used to control R are entered directly as code. Most readers are probably already familiar with this from SPSS, SAS or Stata syntax. Furthermore, being a scripting language means that R's code is executed immediately after it is entered. It is therefore possible to work with R interactively. You will learn how to do this elsewhere in this book.
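
To give an impression of what working interactively can look like, here is a minimal sketch. The vector name x and its values are made up for illustration; c(), mean() and sd() are functions that ship with R. Each line is evaluated as soon as it is entered at the R prompt.

```r
# Hypothetical data: four numbers stored in a vector named x
x <- c(2, 4, 6, 8)

# Each call is evaluated immediately and its result is printed
mean(x)   # arithmetic mean of x, here 5
sd(x)     # sample standard deviation of x
```

Nothing needs to be compiled or run as a whole: you can enter one line, look at the result, and then decide what to ask for next.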

Why R?

This book is aimed primarily at psychologists and other social scientists. For them the question arises: “Does it make sense for me to learn R at all?” Some advantages of R (free, open source, range of functions) have already been pointed out. But are these advantages really relevant to you personally?

I am not an advocate of the opinion that one should definitely learn R instead of a menu-driven program (e.g. SPSS). A program like SPSS is sufficient for at least 90% of all use cases encountered as a social scientist. However, there are areas in which R offers significantly more than other programs. This concerns, for example, the graphical capabilities of R, which are impressive. New statistical methods are also, as a rule, available much earlier in R than in commercial programs. Another important argument is that R is extensible, i.e. you can implement new procedures yourself. Overall, the possibilities that R offers are immense. Accordingly, the ability to work with R will become an increasingly sought-after skill in many fields. Especially if you intend to specialize in methods, evaluation or related areas, mastering R will more and more become a basic requirement. So whether learning R makes sense for you depends somewhat on your personal goals and inclinations.
It is true that the effort needed to familiarize yourself with R is somewhat greater than with a menu-driven statistics program. However, this additional effort more than pays off in the long term.

Another advantage, which applies to all script-based statistics programs and thus also to R, is in my opinion of a didactic nature. It seems to me that using scripts compels you to engage more closely with the structure of the data and with how statistical analyses work. You feel closer to what otherwise happens behind the scenes of the program interface. This accelerates the understanding of statistical procedures.

With R in particular, it is also important that the program follows the paradigm of economy of output. With programs such as SPSS, SAS or Stata, running a statistical analysis often produces pages of output. This output contains very extensive information, much of which may not be of interest for the analysis at hand and is therefore superfluous. In R this is different. R beginners and those switching from other programs are often surprised that the output is very short and clear. Each additional analysis usually has to be requested by calling another function. As a result, more commands have to be written for the same steps than in other statistics programs. This approach is double-edged. On the one hand, having to request each part of an analysis separately can easily feel unwieldy. On the other hand, it means that the user has to deal more intensively with the (statistical) methods used, because you can only request analyses in a targeted manner if you are clear about which evaluation options actually exist and how they can be used sensibly. Overall, in my opinion, this leads to a more comprehensive and deeper understanding of the analyses carried out.
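
The following sketch shows what this economy of output can look like for a simple linear regression. The choice of example is an assumption made for illustration: it uses the mtcars data set that ships with R and regresses fuel consumption (mpg) on weight (wt). The functions lm(), summary(), confint() and anova() are part of the standard R installation.

```r
# Fit a simple linear regression on the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)

# Printing the model shows only the call and the two coefficients
fit

# Everything else has to be requested explicitly
summary(fit)   # standard errors, t-tests, R-squared
confint(fit)   # confidence intervals for the coefficients
anova(fit)     # analysis of variance table
```

Each additional line corresponds to a block of output that a menu-driven program would typically print unasked.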

I would like to emphasize one last point, as it played an important role in my own learning path. There is a very active community around R, that is, the users and developers of R. On the one hand, the community provides information in many different ways. On the other hand, you can ask its members for help via well-organized forums and mailing lists. I found this support to be very valuable when learning R, and it made my learning path much easier.

What is it like to learn R?

Beyond the hard facts that make learning R a worthwhile investment of time, I would like to describe below what learning R feels like for many social scientists. I draw on the reports of students from my seminars as well as on my own experience.

Especially for social scientists who have not yet learned a programming language, giving code instructions to the computer is often new territory. Feelings of being overwhelmed, of diffuse chaos and so on often appear along the learning path. It is important to realize that this is the case for almost everyone. It is not an indication that you are simply not suited to this. Our brain just needs a while to form patterns for unfamiliar processes.

R is unfortunately known for having a very flat learning curve. Progress is usually quite slow and somewhat laborious at the beginning. It will take some time before you can move around confidently in the program, and that is only possible with practice. This is probably the biggest challenge for the R beginner. When I give R courses, I always issue this warning in advance: you just have to get through the first two days of the course. It is the initial hurdle that has to be overcome before it gets easier.

As humans, however, we find it all too easy to avoid things that take several attempts to master. As the comedian Dr. Eckart von Hirschhausen remarks:

“Take your cue from children learning to walk. With the attitude of many adults, they would have stopped after the first fall and said: No, I'm sorry, walking on two legs is not my thing.”

To learn R it is advantageous to adopt the child's attitude: “Doesn't work. Well then, once more!” There will be moments of frustration: “How can this not work anymore? It worked a moment ago!” and so on. Learning R, or any other programming language, often puts you in situations at the beginning where the computer doesn't understand you. It's frustrating but inevitable. If we wanted to reframe this a little, however, we could also say: learning R has the positive side effect of increasing your tolerance for frustration!

The moments when it suddenly works, however, are all the more relieving and joyful. Although this may mean that I, as a user, have only written a few lines of correct code, it just feels good. It is as if you had just achieved something great: a sense of self-efficacy, like a puzzle that you have finally solved.

After a period of getting used to R, things slowly start to roll. It becomes more and more understandable what is done in R, how, and why. The sequences of commands, which initially seemed to make little sense, are now easier to understand. You gradually realize that everything follows a scheme and that it's not about learning commands by heart. A typical sentence that can then be heard is: “There is a logic to this. It's like a language!” Once you've gained this insight, you've come a long way. At this point, where you start to see structure, things start to get easier and the benefits of a scripting language become more apparent.

The following excerpt from a student's term paper illustrates very nicely how learning R can unfold.

“At the end of this term paper, I would like to briefly comment on the use of R as a tool for statistical data analysis. When I started work, it was initially quite difficult for me to adjust to the program and to muster the concentration and perseverance needed to work meaningfully with it. Especially with Task 2, I was close to giving up at some point because I couldn't seem to get the right code to work.

Task 3, on the other hand, aroused the play instinct a little, so that it was fun to track down the hidden statements in the data. When doing this type of task, using R was fun because you work much closer to the syntax than with other programs like SPSS. In a way, you are better connected to the data and analysis, at least that's how I felt.

In conclusion, it can be said that R offers great possibilities, especially in the area of graphics, which are definitely worth getting to know. In addition, you get a better understanding of syntax and data that remains hidden when using a click menu. Therefore I would personally recommend in any case including R as a standard program in psychological methodology and thereby making it much better known.”

(Student, Bachelor Psychology)

If we exclude socially desirable response behavior here, psychology students seem to perceive dealing with R as an enrichment despite initial difficulties and to appreciate its advantages.

R is a language

It has been pointed out several times that R is a language: a programming language. The fact that the term programming language contains the word language points to fundamental similarities between the two. Let us see what this means for learning R by asking ourselves how learning a language works, viewed from the outside.

Often, learning a language begins with getting to know individual words and their meanings. For this purpose, you usually keep a vocabulary book. Next come the basic grammatical rules. These determine how words can be linked to form statements. After a while you will be able to formulate a sentence, then many sentences, and later whole contexts of meaning. Learning a programming language is basically similar: you first learn the functions and data types (words) and the syntax (grammar), and then you can formulate statements or instructions to the computer.

One difference, however, is that human language tolerates a great deal of vagueness and a certain number of errors without becoming incomprehensible. Take, for example, the phrase “Where are the car?”. Every listener can understand this sentence, but it sounds strange and is grammatically incorrect. Unlike a human being, however, the R program is incapable of understanding what I mean when I do not express myself precisely. R is like a strict language teacher who simply shrugs off a wrong sentence and asks you to rephrase it correctly, without correcting it for you.

The equivalent of a dictionary for learning the words (or functions) is available in R: the R help system (see the following chapter). However, the grammar must also be internalized, because adherence to the syntactic (grammatical) rules of R is central. Periods, commas and semicolons within an instruction have a specific meaning and also have a fixed place. This has to be learned first. The German language, too, has (clear) grammatical rules, for example about where a comma is placed. Sometimes this is central to the meaning of the sentence:

Jeder denkt an sich selbst zuletzt. (Everyone thinks of themselves last.)

Jeder denkt an sich, selbst zuletzt. (Everyone thinks of themselves, even at the very end.)

Although the two sentences contain exactly the same words in the same order, they mean roughly the opposite of each other. This was achieved just by inserting a single comma! Without question, this is more of an exception in the German language. In most cases, a comma only helps to structure the sentence. We can usually understand many sentences correctly without a comma. Things are different with the most important punctuation mark, the full stop. Leaving out a full stop makes understanding difficult at best; at worst it makes understanding impossible. In short, grammatical rules, and punctuation in particular, play a central role in written language in enabling us to understand.

Applied to R, this means that the syntax (grammar) must be adhered to precisely in order to express the right thing. Since R has no other way of inferring what is meant, it must be formulated precisely. Failure to adhere to the syntactic rules immediately leads to an error. Grammatical errors, which in the context of a programming language are called syntax errors, are the most common mistakes R beginners make. It is definitely worth checking again and again whether the brackets, dots, etc. are really in the right place.
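
A small sketch of such a syntax error follows. The example is made up for illustration: the second instruction is missing its final closing bracket. So that the lines can be run as a script without stopping, the faulty instruction is passed as text to parse(), which only checks the syntax, and tryCatch() captures the resulting error message. Typed directly at the R prompt, the incomplete line would instead make R wait for the missing bracket.

```r
# Correct: every opening bracket has a matching closing bracket
mean(c(1, 2, 3))

# Faulty: the last closing bracket is missing.
# parse() checks the syntax of the text without executing it;
# tryCatch() returns the error message instead of aborting.
msg <- tryCatch(
  parse(text = "mean(c(1, 2, 3)"),
  error = function(e) conditionMessage(e)
)
msg   # R's complaint about the incomplete instruction
```

The two instructions differ by a single character, yet R can evaluate only the first.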

The good thing about the R syntax is that it follows clear rules. In this sense, R deserves to be called a language. Here it differs strongly, in my opinion, from the syntax of some other common statistics programs. Although it is also possible in these programs to formulate instructions in writing, it is very difficult to get the feeling that a basic knowledge of the syntax would allow you to formulate any other instruction. It is less clear which general rules the expressions are based on.

Working with R

When you start R for the first time, you may be a little disappointed, because R, for all its powerful range of functions, shows no more than an (almost) empty window when started, waiting to be filled with code instructions. As we just saw, the instructions and syntax have to be learned first. This means that working with R is different from working with SPSS, for example.

Working with SPSS is mostly done with the help of the graphical user interface. You can click through menus and submenus. This gives you an overview of which procedures the software provides and which settings are available. Since such a menu does not exist in R, this information must be provided elsewhere. This is done in the form of written, standardized documentation, which is, however, available only in English. All functions that can be used in R (regression, factor analysis, etc.) have associated documentation. This is necessary because it is simply impossible to keep the entirety of the functions and commands in mind. All settings that a function or procedure offers, such as the number of factors to be extracted in a factor analysis, can be looked up in this documentation.
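
As a rough sketch of what looking something up can look like, the lines below consult the documentation for factanal(), the factor analysis function in the stats package that ships with R. Choosing factanal() as the example is an assumption made for illustration; the text above does not name a specific function.

```r
# Open the help page for the factor analysis function
help("factanal")   # the shorthand ?factanal does the same

# Show the arguments (setting options) the function accepts
args(factanal)

# Check whether a particular setting exists, here the number of factors
"factors" %in% names(formals(factanal))
```

The help page describes each argument in turn, including factors, which sets the number of factors to be fitted.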

Consequently, working with R means that instead of clicking through a graphical selection menu, you always have to read the documentation to understand how individual functions are used. Anyone who already has experience with a programming language knows that programming always means reading a lot of documentation. This initially takes some getting used to, but it quickly becomes a matter of course. Without the documentation you would simply be lost. For this reason, the help system of R, which is presented in the following chapter, is of central importance. The figure is a nice metaphor for working with R and the importance of documentation.