Both R and Python are popular programming languages for statistics. While R was developed for statisticians, Python is a general-purpose programming language that emphasizes productivity and code readability, with a syntax that is relatively easy to learn. Because it’s a full-fledged programming language, Python is more flexible when you need to integrate your analysis with web apps or production code. Python does have its weaknesses: many packages that are available in R are not available in Python, and visualization in Python is less straightforward than in R. You should choose the language that better fits the problems you are trying to solve.
Many Linux distributions and other Unix-like systems (Mac OS is Unix-based, not a version of Linux) come with Python pre-installed. To see if you have Python installed, open a terminal and type python. This starts the Python interpreter in interactive mode. The interactive mode is not very sophisticated, but you can use it to quickly test code.
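Once the interpreter starts, a quick way to confirm which Python you are actually running is to ask it directly. The snippet below is a minimal sketch that uses only the standard library; the function name interpreter_summary is my own choice for illustration, and the code should run on either Python 2 or Python 3.

```python
import platform
import sys


def interpreter_summary():
    """Return a one-line description of the running interpreter."""
    version = platform.python_version()  # version string of this interpreter
    return "Python {} at {}".format(version, sys.executable)


print(interpreter_summary())
# True when the major version number is 3 or higher
print("Python 3 or later:", sys.version_info[0] >= 3)
```

You can type these lines at the interactive prompt or save them to a file and run it with python.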
If Python is not installed, you can download it here: https://www.python.org/
The version that comes with Mac OS might be out of date and doesn’t come with third-party libraries.
You can follow the guide here to install the latest version of Python:
One of the biggest advantages of Python is the plethora of third-party libraries available for data scientists. However, managing all the libraries with their different versions, dependencies, and compatibilities can be a huge headache. Anaconda is a platform that provides over 100 of the most popular libraries, as well as a streamlined environment manager to help you easily install, update, and manage most of your libraries. I can’t recommend it enough!
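If you want to see which libraries your current Python can already find, a small check like the one below may help. This is only a sketch: the helper names is_installed and installed_report are hypothetical, the list of libraries is my own pick of common data science packages rather than an official Anaconda list, and it assumes Python 3.4 or later (for importlib.util.find_spec).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def installed_report(module_names):
    """Map each module name to True (found) or False (missing)."""
    return {name: is_installed(name) for name in module_names}


# Illustrative list only; swap in the libraries you care about.
libraries = ["numpy", "pandas", "matplotlib", "scipy"]

for name, available in installed_report(libraries).items():
    print(name, "installed" if available else "missing")
```

Anything reported as missing can then be installed with whichever package manager you use.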
You can download Anaconda here:
Anaconda installs IPython, which is a more sophisticated interactive Python shell. In addition, it installs the Jupyter application, which makes it really easy to create IPython notebooks. IPython notebooks are similar to R notebooks and they are really, really cool. They allow you to combine code, text, and figures in one interactive document. They are great for exploration, prototyping code, and showcasing your results.
Browse this gallery of noteworthy notebooks:
Integrated Development Environment (IDE):
As you move forward from interactive sessions, you will need an editor to write and save Python scripts and files, or an IDE (integrated development environment), which combines the editor with an IPython console, debugger, profiler, etc. There are many options. Anaconda ships Spyder (https://pythonhosted.org/spyder/), which is a Python IDE. If you’re familiar with MATLAB or RStudio, it provides a similar feel. Personally I prefer to use PyCharm (https://www.jetbrains.com/pycharm/), but if you’re just starting out, Spyder should be adequate for now.
Python 2 or 3?
If you’re just starting out, it’s probably better to get started with Python 3. It’s been around since 2008 and is in active development. If you have been using Python 2 for a while, you can create a Python 3 environment using Anaconda and switch between the two interpreters.
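To give a feel for what changes between the two versions, here is a small sketch of one well-known difference: division of integers. The function names are hypothetical and chosen only for this example; the comments describe Python 3 behavior and note where Python 2 differs.

```python
import sys


def true_division_example():
    # In Python 3, / between two integers returns a float.
    # In Python 2, the same expression would return the integer 3.
    return 7 / 2


def floor_division_example():
    # // performs floor division in both Python 2 and Python 3.
    return 7 // 2


# print is a function in Python 3; in Python 2 it was a statement.
print("Running Python major version:", sys.version_info[0])
print("7 / 2  =", true_division_example())
print("7 // 2 =", floor_division_example())
```

Differences like this are why code written for one version does not always run unchanged on the other, and why keeping a separate environment per interpreter is convenient.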
You can read more about making this decision here:
Instructions on how to create environments for different Python interpreters:
Once you have your environment set up, you should start learning Python. There are many resources available online. Personally, I found the book “Python for Data Analysis” by Wes McKinney very helpful.
This book assumes that you have some basic Python and coding knowledge. If you have no previous Python knowledge, Google’s Python course is probably a good place to start.
To get started with the Python scientific computing ecosystem:
The Hitchhiker’s Guide to Python is also a great resource.