Microsoft Excel is great at spreadsheets but what if you want to use a different tool to analyse the data an Excel spreadsheet contains? In steps Python 3 and pandas, and as I’m currently looking at using pandas I’ve decided to make a few notes.
No, not this type of panda (kindly used under Creative Commons license from https://commons.wikimedia.org/wiki/File:Giant_Panda_Eating.jpg) but a Python module called pandas.
pandas is an open source data analysis library for Python and has a website at https://pandas.pydata.org/
For my example I have created a very basic spreadsheet (xlsx format) with some fictional comic book sales in:
To import Excel (xlsx) data into Python requires only a few lines of code:
Looking at the line to read the spreadsheet:
comic_data = panda_data.read_excel(‘comics_geektechstuff.xlsx’,sheet_name=’Sheet1′,skiprows=1)
comic_data is the variable where Python will store the data.
panda_data.read_excel opens the xlsx file
‘comics_geektechstuff.xlsx’ is the xlsx file name that I want to open
sheet_name=’Sheet1′ is the name of the sheet that I want to look at
skiprows=1 is used to skip the first row of data as this is an unneeded header (number of comics in my xlsx file)
print(comic_data.head()) can then be used to output the first five rows of data.
print(comic_data) could be used to output all the data.