This post is updated on Jan 2019 following the version 2.0 release.
Google Sheets by itself is just another spreadsheet application much like Excel. With Google Apps Script, it becomes a spreadsheet application with VBA-like features but (in my opinion) is easier to use. With the multitude of add-ons, it can possibly be very powerful at the expense of the speed to load your spreadsheet.
However, there is a way to tap on to the powerful Python programming language to enable powerful data processing techniques, machine learning and deep learning data processing, and many more.
Python is one of the most popular programming languages out there that is very easy to learn. Couple with it the ability to automate the execution of the Python script, you can build plenty of powerful data tools, dashboards, alerting tool and much more.
Here, I will bring you through some simple steps to unlock the ability to use Python with Google Sheets.
Installing it is an easy pip install on your command line/terminal.
$ pip install pygsheets
After that, you need to get authorise the connection between Google Sheets and your Python script. To do that, simply follow the authorisation steps for Oauth Credentials that is clearly documented here.
You'll get a client_secret json file after that is done. Name it whichever way you want but keep it private.
The client_secret is a method to verify that you are connecting to Google Sheets and not anyone else. Anyone getting a hold of your client_secret is basically acting on your behalf.
Place your client_secret json file in the working directory. In your Python script, start with import and authorisation
import pygsheets gc = pygsheets.authorize(outh_file='client_secretxxx.json')
You will be prompted for a login.
Important! Make sure that the user that you use to get the Oauth credentials has access to the spreadsheet. Else, you'll get a "The caller does not have permission" error
That's it. Now the fun part begins.
First, you would want to link a particular Google Sheets spreadsheet and pass it to your Python script as a Spreadsheet object.
To do that, we can use the open_by_key method
sheet = gc.open_by_key(key)
Your key is the long string of letters and numbers in your spreadsheet's URL after "/d/". Replace key in the line above with that long string with quotes.
Now that you have a spreadsheet object in your hands, you can use any methods stated here for the Spreadsheet object.
Let's write a simple Hello World in cell A1 of Sheet1.
To do that, we need to get a Worksheet object for Sheet1. We can use its worksheet name to do so, as worksheet names are unique in a spreadsheet.
worksheet = sheet.worksheet_by_title('Sheet1')
We have a Worksheet object now. We can use any of the methods here to get what we want.
What we want is to manipulate or update the value in cell A1. We can use the update_cell method.
worksheet.update_value('A1', 'Hello World!')
Upon execution of that, you will have this:
Awesome! Now on to something more fun.
For those familiar with Python for data science, you would have heard of pandas dataframe.
pandas turn Python for a general-purpose programming language into a powerful data analysis and manipulation programming language.
To return a large dataset from Google Sheets to Python as a pandas dataframe, we use the get_as_df method of the Worksheet object.
import pandas # to use DataFrame methods on the df object df = worksheet.get_as_df()
There you go! We can now manipulate the data however we want!
There are so many things we can do once the data is in Python as a dataframe. We can commence cleaning up the data to be processed using machine learning, we can summarise the data and turn the Google Sheets into a powerful dashboard, and many more!
After you've done all the fun things on Python, it's time to return a nice set of data to Google Sheets and share it with your colleagues or friends.
I would advise that you prepare all the data that you want to push back to Google Sheets as one large, formatted dataframe first.
This is because, sadly, there is a timeout i.e. there's only a limited number of actions that you can execute pygsheets within a specific window of time.
Hence, it's best to minimise the number of actions while maximising the size of one action.
A good practice before pushing the data is to clear the data. Let's clear Sheet2, assuming that we have created one. If not, use the add_worksheet method of the Spreadsheet object.
A word of advice: If you want the code to be executed on a schedule, skip the add_worksheet method as that step will fail if you keep trying to create a worksheet with the same name.
Use the clear method to clear Sheet2.
second_worksheet = sheet.worksheet_by_title('Sheet2') second_worksheet.clear(start = 'A1') # start – topright cell address.
Use optional "end" argument to specify the end of the data range to clear.
Now, let's push our DataFrame object to Google Sheets with the set_dataframe method!
second_worksheet.set_dataframe(df, start = 'A3', nan = '')
The nan argument is important as any empty data in your DataFrame would be replaced with the ugly NaN otherwise.
That's it! You're now a Google Sheets master, capable of manipulating Google Sheets without touching Google Sheets!
This blog piece covers the most basic and most frequently used methods to manipulate Google Sheets data from Python.
If you refer to the documentation, we actually did not cover DataRange and Cell object.
From the Worksheet object, you can dive deeper into DataRange and Cell object via some methods.
For all my use thus far, I wasn't required to dive deeper. But the documentation is there to help you in case you're lost.