Welcome to the first part of my office automation with Python series! This series is aimed at people like my coworkers: individuals with little to no programming experience who are eager to apply programming at the office to save time and effort. The techniques and tools I will show are a compilation of things I use at my current job to make me faster and more productive.
This tutorial is split into two parts: installing Anaconda and using Jupyter Notebook. Anaconda consists of Python, a very useful tool called conda, and several Python libraries and programs useful for data analysis, including Jupyter Notebook. Conda is a tool we can use to create virtual environments, which keep our base installation of Python from getting cluttered with libraries that can potentially conflict with each other.
Jupyter vs. Excel
“Why are Python and Jupyter so great? Excel does everything useful in my company!” When you deal with spreadsheets larger than 10MB, Excel can prove to be annoyingly slow. Python handles large sets of data much better than Excel does. Python’s pandas library also lets you run SQL-like operations on data in spreadsheets: you can filter, group, perform aggregate calculations, and join data from other spreadsheets. Unlike regular text editors or the command line, Jupyter lets you mix code blocks with formatted text (Markdown) blocks in a notebook, so you can write code and analysis in the same document. Jupyter also displays data as neatly formatted tables, which plain text editors and the command line don’t do.
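To make the "SQL-like" part concrete, here is a minimal sketch of filtering, grouping, and aggregating with pandas. The table and its column names (region, rep, sales) are made up for illustration; in practice you would load a real spreadsheet with pd.read_excel or pd.read_csv instead of typing the data in.

```python
import pandas as pd

# Hypothetical sales data; in real use this would come from pd.read_excel(...)
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "rep": ["Ann", "Bob", "Cara", "Dan", "Eve"],
    "sales": [1200, 300, 950, 400, 2100],
})

# Filter: keep only rows with sales of at least 500 (like SQL WHERE)
big_sales = sales[sales["sales"] >= 500]

# Group and aggregate: total and average sales per region (like SQL GROUP BY)
summary = big_sales.groupby("region")["sales"].agg(["sum", "mean"])

print(summary)
```

In a Jupyter notebook you can also end the cell with just summary instead of print(summary), and the result is rendered as a formatted table.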
The first step in setting up this environment is to install Anaconda, which bundles conda with Python, useful data science libraries, and various other helpful tools. I will cover those tools later in the series. In this tutorial, I am going to focus on Jupyter Notebook and how it can make your office job easier!
You can download Anaconda from the Anaconda website. Run the installer and accept all the recommended settings.
Using Jupyter Notebook:
After installing Anaconda, you should have Jupyter Notebook installed. Open your Start menu and search for Jupyter Notebook.
A terminal window should open along with a web browser. Go to the web browser and notice that it shows a directory of your documents. Click the ‘New’ button and select ‘Python 3’ to create a new Python 3 notebook.
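If you are not sure what to type into the first empty cell, something like the following works as a quick check that everything is installed. The numbers are invented for illustration. Type it into the cell and press Shift+Enter to run it; the printed result should appear directly under the cell.

```python
# Hypothetical hours worked on each day of one week
hours = [8, 7.5, 9, 8, 6.5]

# Add up the hours and work out the daily average
total_hours = sum(hours)
average_hours = total_hours / len(hours)

print(f"Total: {total_hours} hours, average: {average_hours} hours per day")
```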
A Jupyter Notebook Example:
The best way to show how to use a Jupyter Notebook is to create one. I made one and hosted it as a GitHub Gist. You can download it by clicking on the bottom border of the GitHub window below.
Excel is the standard tool for most office jobs, but it has its limitations. For example, let’s say you had to get data from two spreadsheets, join them together, filter some rows out, and then apply a calculation. This could take quite some time, since you would be forced to switch between the two spreadsheets constantly. What if the data is so large that not all the rows can be loaded into a spreadsheet? (Excel caps a worksheet at 1,048,576 rows.) The task becomes impossible.
This is where Jupyter Notebook and Python thrive. With these tools you are far less limited by file size, and should you need to do the same operations again, you already have a script made to do it. Jupyter Notebook can also help you think in a more abstract way, since you are performing operations on data as a whole rather than editing individual cells. Python also adds utilities that Excel doesn’t have. If this is a recurring task, you can take the Jupyter Notebook, save it as a Python file, and run it on a schedule with Windows Task Scheduler to save yourself time. I’ll go over this in the next tutorial.
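As a rough sketch of the join, filter, and calculate task described above, here is how it might look in pandas. The two tables, their column names, and the file names mentioned in the comments are all hypothetical; they stand in for whatever spreadsheets you actually work with.

```python
import pandas as pd

# Two hypothetical "spreadsheets"; in real use you might load them with
# pd.read_excel("orders.xlsx") and pd.read_excel("prices.xlsx")
orders = pd.DataFrame({
    "product_id": [101, 102, 103, 101],
    "quantity": [5, 0, 12, 3],
})
prices = pd.DataFrame({
    "product_id": [101, 102, 103],
    "unit_price": [2.50, 10.00, 4.00],
})

# Join: match each order to its price using the shared product_id column
merged = orders.merge(prices, on="product_id", how="left")

# Filter: drop orders with a quantity of zero
merged = merged[merged["quantity"] > 0].copy()

# Calculate: add a new column with the total for each order
merged["total"] = merged["quantity"] * merged["unit_price"]

print(merged)
```

Because every step is written down, running the same job on next month's spreadsheets only means pointing the first two lines at the new files.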
I hope that this tutorial has been an adequate intro to Jupyter Notebook. Hopefully you can save some time with Python and have more time to relax!