Python To Copy Or Not

I am a bit lost in dataframes right now, so I decided to take a step back and review compound data types. A very tricky stumbling block for me occurred when I did a simple thing:

a_list = ['a','b','c','d','e']
b_list = a_list
print("This is list a:",a_list,
    "\nThis is list b:",b_list)

This is list a: ['a', 'b', 'c', 'd', 'e']
This is list b: ['a', 'b', 'c', 'd', 'e']

(output from previous print statement)

It looks like I have a copy of my first variable (a_list) in a new variable (b_list). But watch this:

b_list[2] = 'z'
print("This is list a:",a_list,
    "\nThis is list b:",b_list)

This is list a: ['a', 'b', 'z', 'd', 'e']
This is list b: ['a', 'b', 'z', 'd', 'e']

(output from previous print statement)

Meet Python Aliases

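Not what I expected at all! It turns out that Python names work like what I would call pointers from my experience in C++. The built-in id() function returns an object's identity (in CPython, its memory address):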

print("This is the address of list a:",id(a_list),
    "\nThis is the address of list b:",id(b_list))

This is the address of list a: 1686755852160
This is the address of list b: 1686755852160

(output from previous print statement)
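
The matching ids above can also be checked without reading long numbers. Here is a minimal sketch using the `is` operator, which compares identity, next to `==`, which compares contents. The name `other` is made up for this illustration.

```python
a_list = ['a', 'b', 'c', 'd', 'e']
b_list = a_list            # binds a second name to the same list object

# `is` asks "same object?", `==` asks "same contents?"
print(b_list is a_list)    # True
print(b_list == a_list)    # True

# A list built separately with equal contents is a different object.
other = ['a', 'b', 'c', 'd', 'e']   # hypothetical name, not from the post
print(other == a_list)     # True
print(other is a_list)     # False
```

So two names can be equal without being the same object, but an alias is always both.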

Assigning an existing object to a new name does not copy it; it creates an ‘alias’ for the same data in memory. This is true for every type, but it only becomes visible with mutable compound types like lists, where a change made through one name shows up through the other. In this case, the variable ‘b_list’ is an alias for the variable ‘a_list’. This saves memory because the data only exists in one place. But what if I really want a copy of the data?

Making a Copy

There are at least three ways to copy the list:

a1_copy = list(a_list)
a2_copy = a_list[:]
a3_copy = a_list.copy()
print("This is the address of list a:",id(a_list),
    "\nThis is the address of copy a1:",id(a1_copy),
    "\nThis is the address of copy a2:",id(a2_copy),
    "\nThis is the address of copy a3:",id(a3_copy))

This is the address of list a: 1686755852160
This is the address of copy a1: 1686755853440
This is the address of copy a2: 1686724145776
This is the address of copy a3: 1686755852480

(output of previous print statement)
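
Different ids suggest independent lists, but it is easy to confirm directly. This is a small sketch, assuming the same `a_list` as above, that changes one slot in each copy and then prints all four lists.

```python
a_list = ['a', 'b', 'c', 'd', 'e']
a1_copy = list(a_list)
a2_copy = a_list[:]
a3_copy = a_list.copy()

# Change a different position in each copy.
a1_copy[0] = 'x'
a2_copy[1] = 'y'
a3_copy[2] = 'z'

print(a_list)    # ['a', 'b', 'c', 'd', 'e']
print(a1_copy)   # ['x', 'b', 'c', 'd', 'e']
print(a2_copy)   # ['a', 'y', 'c', 'd', 'e']
print(a3_copy)   # ['a', 'b', 'z', 'd', 'e']
```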

It looks like I created three separate copies of a_list. Changing any single value in any of these four lists only changes it in the specified list. Hooray! Except when I did this:

c_list = ['a','b',['c','d'],'e']
c_copy = c_list[:]
c_list[1] = 'x'
c_list[2][0] = 'z'
print("This is list c:",c_list,
     "\nThis is copy c:",c_copy)

This is list c: ['a', 'x', ['z', 'd'], 'e']
This is copy c: ['a', 'b', ['z', 'd'], 'e']

(output from previous print statement)

Shallow Copying

Ack! Again, not what I was expecting. It turns out that those three copy statements above created ‘shallow’ copies. Python is obsessed with saving memory, at least in my opinion. A shallow copy builds a new outer list but fills it with references to the same element objects instead of copying the elements themselves. That is harmless for immutable elements, but a mutable element such as the nested list ends up shared between the original and the copy. The Python Glossary describes mutable objects as ones that can change their value but keep their id().

The object ‘c_list’ has a combination of mutable and immutable objects. Generally, immutable objects are things like numbers, strings and tuples. Lists fall into the category of mutable objects.

print("The second object in list c is:",type(c_list[1]))

The second object in list c is: <class 'str'>

(output from previous print statement)

print("The third object in list c is:",type(c_list[2]))

The third object in list c is: <class 'list'>

(output from previous print statement)

print("Id of second object in each:\n",
       id(c_list[1]),"\n",id(c_copy[1]))

Id of second object in each:
1686723511600
1686723910896

(output from previous print statement)

print("Id of third object in each:\n",
       id(c_list[2]),"\n",id(c_copy[2]))

Id of third object in each:
1686755853120
1686755853120

(output from previous print statement)
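
One way to see what the shallow copy actually holds is to compare the elements right after copying, before anything is reassigned. The sketch below rebuilds `c_list` from scratch; the helper name `same` is made up for the illustration.

```python
c_list = ['a', 'b', ['c', 'd'], 'e']
c_copy = c_list[:]

# Right after a shallow copy, every slot refers to the same object,
# whether that object is a string or a list.
same = [x is y for x, y in zip(c_list, c_copy)]   # hypothetical helper name
print(same)                       # [True, True, True, True]

# Assigning to a slot rebinds that slot in one list only.
c_list[1] = 'x'
print(c_list[1] is c_copy[1])     # False
print(c_copy[1])                  # b

# Changing the nested list in place is visible through both lists,
# because both still refer to the same inner list object.
c_list[2][0] = 'z'
print(c_list[2] is c_copy[2])     # True
print(c_copy[2])                  # ['z', 'd']
```

The ids of the second element differ in the output above because of the assignment to c_list[1], not because strings are copied differently from lists.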

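This is very important to understand. A Python ‘shallow’ copy copies the reference for every element, mutable or not. The ids of the second element differ only because I assigned a new string to c_list[1] after copying, which rebinds that slot in one list. The third element is still the same list object in both, so changing it in place shows up in both. Therefore, I need to plan ahead when copying variables that contain mutable objects. But what if I really, really, really want an actual copy of the entire object in memory?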

Deep Copying

There is such a thing as ‘deep’ copying in Python. It uses the separate ‘copy’ module from the standard library and has some tricky ways to handle recursive objects, so I am going to put off thinking about it for now. At this point, if I wanted to make a referentially unique copy of c_list, I might hack together something like this:

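c_list = ['a','b',['c','d'],'e']
c_copy = []
temp = []
for i in c_list:
  if len(i) == 1:
    # Hack: assumes every string item is a single character
    c_copy.append(i)
  elif len(i) > 1:
    # Anything longer is treated as a nested list and rebuilt item by item
    for j in i:
      temp.append(j)
    c_copy.append(temp)
    temp = []
  else:
    # Empty items are skipped
    continue
c_copy[2][0] = 'z'
print("This is list c:",c_list,
   "\nThis is copy c:",c_copy)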

This is list c: ['a', 'b', ['c', 'd'], 'e']
This is copy c: ['a', 'b', ['z', 'd'], 'e']

(output of previous print statement)

Not elegant and certainly not ‘pythonic’ as I’ve come across it on the web, but an interesting exercise in understanding copying variables in Python.
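
For comparison, here is a minimal sketch of the standard library route using `copy.deepcopy`, which copies nested objects as well. The names `c_deep`, `loop`, and `loop_copy` are made up for this illustration, and the second half only hints at how a list that contains itself is handled.

```python
import copy

c_list = ['a', 'b', ['c', 'd'], 'e']
c_deep = copy.deepcopy(c_list)    # the nested list is copied too

c_deep[2][0] = 'z'
print(c_list)                     # ['a', 'b', ['c', 'd'], 'e']
print(c_deep)                     # ['a', 'b', ['z', 'd'], 'e']
print(c_deep[2] is c_list[2])     # False

# A recursive object: a list that contains itself.
loop = [1, 2]
loop.append(loop)
loop_copy = copy.deepcopy(loop)
print(loop_copy is loop)          # False
print(loop_copy[2] is loop_copy)  # True, the copy points at itself
```

Unlike the hand-rolled loop, this does not depend on how long each string is or how many levels of nesting there are.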
