Written by Barry Grant. Posted in Static content
Why should I use R?
If you are new to R and Bio3D then a couple of questions might naturally arise:
- What is R?
- What are the pros and cons of using R?
- Why use it instead of, say, a spreadsheet application or an application such as Matlab?
R is an environment for data analysis
R is a powerful environment and programming language for the analysis of numerical data. While there are many other common applications that will allow you to manipulate lists of numbers (e.g., spreadsheet programs), R also makes it easy to calculate a wide range of statistical and numerical quantities, provides a powerful environment for performing numerical simulations, and has fantastic graphics capabilities. Also, R is free!
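As a small taste of what this looks like in practice, the following minimal sketch simulates some data and summarizes it using only functions that ship with base R. The sample size, mean, and standard deviation are arbitrary values chosen for illustration.

```r
# Simulate 1000 draws from a normal distribution (illustrative values only)
set.seed(1)                      # make the simulation reproducible
x <- rnorm(1000, mean = 10, sd = 2)

# Calculate a few common summary quantities
x_mean <- mean(x)                # sample mean
x_sd   <- sd(x)                  # sample standard deviation
x_q    <- quantile(x, c(0.25, 0.5, 0.75))  # quartiles

print(x_mean)
print(x_sd)
print(x_q)
```

Each quantity is a single function call on the whole vector, with no cell ranges or formulas to copy down a column.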
What R lacks in apparent user-friendliness, it more than makes up for in power. While there is certainly a learning curve associated with developing the skills you will need to perform analyses in R, this is really true of any advanced software package that you will use. Once you acquire some of the basics, you will find that using R is logical and simple.
The language used by R is a "dialect" of the S statistical programming language. To quote John Chambers (major contributor and developer of the S language), “S is a programming language and environment for all kinds of computing involving data. It has a simple goal: to turn ideas into software, quickly and faithfully.”
- R is a free software implementation of the S language (http://www.r-project.org)
- R was first developed by R. Gentleman and R. Ihaka (U of Auckland, NZ) during the 1990s
- R has developed into an advanced statistical computing system, freely available for most computing platforms
- Updated versions are available every 3-4 months
The Pros and Cons of R
Pros include:
- Powerful, state-of-the-art
- Used by professional statisticians
- Lots of documentation
- Learn by example
- Easy to extend, modify and improve, with numerous add-on packages available
- Freely available for Unix, Windows & Mac
- Programmable: if R can’t do a particular task, you can program R to do it.
- R produces publication quality graphics.
R has a remarkable online presence in the form of help lists, tutorials, etc. which will facilitate solving the problems you inevitably run into in the course of your research. R represents the state-of-the-art in statistical computing.
Cons include:
- Not very easy to learn (many details)
- Easy to forget
- Sometimes forced to learn by example
- Documentation sometimes cryptic
- Not very (easily) interactive in the Excel point-and-click sense
- Command-based
- Still evolving: backward-compatibility has been an issue
- Slow at times when compared to a dedicated implementation (e.g., in C) of a particular task
If you “just want to do basic statistical analysis” then it's easy to find alternatives.
If you intend to do exploratory data analysis, such as protein structural bioinformatics and other bioinformatics tasks including expression analysis, then it's probably one of the best options.
Why Not a Spreadsheet?
While a spreadsheet is handy for manually entering and viewing small amounts of data along with guiding basic calculations, it is not ideal for more advanced problems on large datasets.
For example, calculating eigenvalues or numerically solving ordinary differential equations is a simple task in an environment such as R or Matlab, but these capabilities do not exist (to the best of my knowledge) in most common spreadsheet applications.
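To make the two examples concrete, here is a minimal sketch in base R. The eigenvalue part uses the built-in eigen() function. For the differential equation, rk4_solve() is a hypothetical helper written for this illustration (a classical fourth-order Runge-Kutta integrator), not a function provided by R; in practice an add-on package such as deSolve would be the more usual choice. The matrix and the decay equation dy/dt = -y are arbitrary test cases.

```r
# Eigenvalues of a small symmetric matrix with base R's eigen()
m <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
e <- eigen(m)
print(e$values)   # eigenvalues
print(e$vectors)  # corresponding eigenvectors, one per column

# Hypothetical helper: classical 4th-order Runge-Kutta for dy/dt = f(t, y)
rk4_solve <- function(f, y0, times) {
  y <- numeric(length(times))
  y[1] <- y0
  for (i in seq_len(length(times) - 1)) {
    h  <- times[i + 1] - times[i]
    t  <- times[i]
    k1 <- f(t, y[i])
    k2 <- f(t + h / 2, y[i] + h * k1 / 2)
    k3 <- f(t + h / 2, y[i] + h * k2 / 2)
    k4 <- f(t + h, y[i] + h * k3)
    y[i + 1] <- y[i] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
  }
  y
}

# Exponential decay, dy/dt = -y with y(0) = 1; the exact solution is exp(-t)
decay <- function(t, y) -y
times <- seq(0, 1, by = 0.1)
y <- rk4_solve(decay, y0 = 1, times = times)

# Compare the numerical and exact solutions at t = 1
print(c(numerical = y[length(y)], exact = exp(-1)))
```

The whole calculation fits in a short script that can be rerun or adapted to a different matrix or equation by changing a line or two.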
Why Not Matlab?