I’ve been using Python Pandas to analyze the data I recently scraped from BoardGameGeek. I have used lists and dataframes, but this experience showed me how much I have to learn. The official documentation is a treasure trove of information, but can be difficult to understand. One of the first obstacles is figuring out where to find information relevant to the problem at hand. In this post I will share an example of a problem I encountered and how I used ‘dtypes’ and ‘type()’ to find the right documentation to solve it.
The Setup
import pandas
import sqlite3
database_connection = sqlite3.connect("my_db.sqlite")
connection_cursor = database_connection.cursor()
gen2_tags = pandas.read_sql_query \
("SELECT generation,tag,parent,child_count FROM node \
WHERE generation = '2'", \
database_connection)
database_connection.close()
This created a dataframe object from my database with these columns: ‘generation’, ‘tag’, ‘parent’ and ‘child_count.’ Since I won’t be loading any more data from the database, I close the database connection. From here on out, I will be working with data stored in the gen2_tags object. This bit of code just makes sure that all of the data in my columns is of the type that I expect (either string or int).
gen2_tags = gen2_tags.convert_dtypes()
gen2_tags = \
gen2_tags.astype({'generation':'int64'})
gen2_tags.dtypes
The Problem
Using groupby() and count(), I have a partial list of unique tags along with the number of times they occur.
gen2_tags[['tag','generation']].groupby('tag').count()
This is the output:
generation
tag
age 282212
bgaccessoryversion 4664
boardgame 98
boardgameaccessory 10496
boardgameartist 81863
... ...
videogamepublisher 59333
videogameseries 15200
videogametheme 69690
videogameversion 108737
yearpublished 282212
[74 rows x 1 columns]
At the bottom of the output, I can see that I have 74 rows. Unfortunately, I can’t see every unique tag value. The problem, then, is how to display only those unique values.
Solving the Problem
First, I try using ‘unique’ which I came across in StackOverflow using a Google search. I expect this to display just a list of the unique ‘tag’ values.
pandas.unique(gen2_tags['tag'])
<StringArray>
['yearpublished', 'minplayers',
'maxplayers', 'playingtime',
'minplaytime', 'maxplaytime',
'age', 'name',
'description', 'thumbnail',
'image', 'boardgamepublisher',
'boardgamehonor', 'boardgamemechanic',
'boardgamepodcastepisode', 'boardgameversion',
'boardgamefamily', 'boardgamecategory',
'boardgameartist', 'boardgamedesigner',
'boardgamesubdomain', 'poll',
'statistics', 'boardgameimplementation',
'videogamebg', 'boardgameaccessory',
'boardgameexpansion', 'boardgamecompilation',
'boardgameintegration', 'boardgame',
'boardgameissuearticle', 'bgaccessoryversion',
'videogametheme', 'videogameversion',
'videogamedeveloper', 'videogamepublisher',
'videogamegenre', 'videogamemode',
'videogameplatform', 'videogamecharacter',
'vgcharacterversion', 'rpgissuearticle',
'rpgartist', 'rpgpublisher',
'rpggenre', 'rpgdesigner',
'rpgproducer', 'rpgcategory',
'rpgitemversion', 'rpgpodcastepisode',
'error', 'commerceweblink',
'rpg', 'rpgmechanic',
'rpgseries', 'rpghonor',
'rpgsetting', 'rpgissueversion',
'rpgissue', 'videogameexpansion',
'videogamehonor', 'videogameseries',
'videogamecompilation', 'videogamehardwarecategory',
'videogamehwversion', 'videogamehwmfg',
'vgwalkthrough', 'videogamefranchise',
'videogame', 'videogamehardware',
'vgcharactercat', 'boardgameissueversion',
'boardgameissue', 'boardgameauthor']
Length: 74, dtype: string
I can see all the unique values, but I don’t want to see ‘StringArray’ or ‘Length’ or even the brackets and quotes. My initial instinct is to follow the Python documentation on lists for splicing:
tag_list = pandas.unique(generation_tags['tag']) print(tag_list[:])
But, it gives me the exact same output as before. Perhaps I am not dealing with a list in the basic Python sense at all? Looking over the output, the first line refers to a ‘StringArray.’ And this confirms it:
type(tag_list)
pandas.core.arrays.string_.StringArray
Not only that, but the object comes from Pandas. In the Pandas API reference for StringArray, I can confirm that the output is as expected, but it isn’t really helping me do what I want. The documentation refers me to a pandas.array object (in a ‘See Also’ box on the webpage).

The examples here are like the examples for the StringArray. Not too helpful, but I notice that the ‘See Also’ box sends me to something called a Pandas Series. It isn’t obvious at my first read through, but a bit of persistence gives me a new hint.

I realize that Series data is ‘iterable.’ Looking up what ‘iterable’ means in the Python Glossary suggests that I need a for loop. So I try:
for i in pandas.Series(tag_list):
print(i)
This displays a list of the unique ‘tag’ values without any extra characters or meta information.
yearpublished
minplayers
maxplayers
playingtime
minplaytime
maxplaytime
age
name
description
thumbnail
image
boardgamepublisher
boardgamehonor
boardgamemechanic
boardgamepodcastepisode
boardgameversion
boardgamefamily
boardgamecategory
boardgameartist
boardgamedesigner
boardgamesubdomain
poll
statistics
boardgameimplementation
videogamebg
boardgameaccessory
boardgameexpansion
boardgamecompilation
boardgameintegration
boardgame
boardgameissuearticle
bgaccessoryversion
videogametheme
videogameversion
videogamedeveloper
videogamepublisher
videogamegenre
videogamemode
videogameplatform
videogamecharacter
vgcharacterversion
rpgissuearticle
rpgartist
rpgpublisher
rpggenre
rpgdesigner
rpgproducer
rpgcategory
rpgitemversion
rpgpodcastepisode
error
commerceweblink
rpg
rpgmechanic
rpgseries
rpghonor
rpgsetting
rpgissueversion
rpgissue
videogameexpansion
videogamehonor
videogameseries
videogamecompilation
videogamehardwarecategory
videogamehwversion
videogamehwmfg
vgwalkthrough
videogamefranchise
videogame
videogamehardware
vgcharactercat
boardgameissueversion
boardgameissue
boardgameauthor
Problem solved!
Of course, I could have just copied some code from StackOverflow, but trying to read the APIs helps me understand Python on a whole different level. For example, I won’t soon forget to check the type of my Python objects when they don’t behave as I expect. One of the really nice aspects of Python is how much documentation is available. I should definitely use it!