C++ STL Library Counterparts In Python
c++
is choice of majority of competitive programmers, and python
is not that friendly. One significant reason for this preference is Python’s slower code execution compared to C++. Python has an advantage, it has very very low learning curve, in case someone wants to learn the algorithm and not the language they can try python. However, it not advisable to run away from scary implementations since it’s a part of programming but for starters lets overlook those complexities. It’s not impossible to do competitive programming in python, there are people like pagenegod who have received grand master title in codeforces.
So this is my attempt to demystify python data structures in order to start competitive programming, as a c++
users I have tried to compare and contrast how to use the Python counterparts; if you already know the c++ counter parts.
Dynamic Arrays
list
is a collection of elements in python, just like vectors
in c++. The difference in python’s list and c++ vector is that in python’s list the elements can be of varying data type. Here is a list of basic operations that vectors support and then corresponding things in python:
Initializing And Inserts
1
2
3
vector<int> arr = {2, 3, 1, 5, 4}; // creates a list
arr.push_back(6); // adds a new element in the list
arr.insert(7, 0); // inserts the element 7 at position 0
1
2
3
arr = [2, 3, 1, 5, 4] # initializes an empty list
arr.append(6) # adds 6 to the end
arr.insert(7, 0) # adds 7 at position 0
The cost of operation of inserting at random position is \(O(n)\) where n is the size of the collection, if inserted at the end the cost of operation is \(O(1)\) for both vectors and list.
Finding And Deleting Elements
1
2
3
4
auto it = find(arr.begin(), arr.end(), 2); // gets an iterator to the element
int index = distance(arr.begin(), it); // distance between the iterator is the actual index of the element
arr.erase(it); // removes the element
1
2
index = arr.index(2) # finds the first index where value is 2
arr.pop(ind) # remove the element at index
The cost of finding is \(O(n)\) in both the cases, the cost of deleting is also \(O(n)\) if the position is random. If the position of deleting the element is last then the time complexity of deletion is \(O(1)\) else it’s \(O(n)\).
Sorting And Reversing
1
2
sort(arr.begin(), arr.end()); // sorts the vector in increasing order
reverse(arr.begin(), arr.end()); // reverses the vector
1
2
arr.sort() # sorts
arr.reverse() # reverses
To sort with specific keys, use the following:
1
2
// sorts the vector with different key, the third input is the custom comparator
std:sort(arr.begin(), arr.end(), [](int a, int b) { return f(a) < f(b); });
1
2
3
4
5
def f(x):
return x * x - 6 * x + 8
# sorts based on values of f(x), you can use lamda functions as well
arr.sort(key=f(x))
Binary Search
The best thing about stl is you don’t have write binary search from scratch, you have lower_bound
and upper_bound
implemented. Well python also has bisect. Lets see how to use them:
Remember it works only when the collections are sorted
1
2
3
4
5
// return the iterator to first element where *it >= x
auto l_it = lower_bound(arr.begin(), arr.end(), x);
// return the iterator to first element where *it > x
auto h_it = upper_bound(arr.begin(), arr.end(), x);
1
2
3
4
5
6
7
import bisect
# returns the first index where arr[idx] >= x
l_idx = bisect.bisect_left(x)
# returns the first index where arr[idx] > x
r_idx = bisect.bisect_right(x)
Sets
Sets are a collection of elements, they will not have duplicates. In c++
the sets are ordered and implemented using BinarySearchTrees
but in python it’s implemented as a HashTable
. So python’s sets are same as unordered_sets
in c++. The document says that the average operation cost for insertions and retrieval as \(O(1)\) but they can be hacked to run on \(O(N)\) which is their worst case scenario.
1
2
3
st = {1, 2, 3, 4, 5} # sets in python are initialized using {}
st.add(6) # adds an element to the set
print(4 in st) # prints True if 4 is in the set
python sets support all the set operations:
1
2
3
4
5
6
7
8
9
10
11
12
A = {1, 2, 3}
B = {3, 4, 5}
A == B # A is equivalent to B
A != B # A is not equivalent to B
A <= B # A is subset of B A <B>= B
A > B # A is proper superset of B
A | B # the union of A and B
A & B # the intersection of A and B
A - B # the set of elements in A but not B
A ˆ B # the symmetric difference
But again the danger is in the worst case time complexity, so if you want to avoid that it’s better to use something else, people use their own implementation of stuff
Dictionary
Dictionary are key value pairs in python, similar to maps
in c++ usage wise but complexity wise they are not. Dictionaries also have \(O(1)\) average time complexity for all the things, but worst case they are also \(O(N)\) instead of the ideal \(O(log(N))\) that c++ maps provide.
1
2
3
4
5
6
7
my_dict = {
"a": 1,
"b": 2,
"c": 3
}
my_dict.pop('c') # deletion
There is something called ordered dictionary, which might be confusing for people using
c++
since usually ordered means ordered by key but in python it’s ordered by time of insertion
1
2
3
4
5
from collections import OrderedDict
# The complexity of operations like insert, deletion and looup is still O(N) in worst case
my_dict = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
Talking alot about dictionary won’t really help if you are trying to do CP in python so we will skip this.
Priority Queues / Heaps
Priority queues in python are implemented using
heapq
. Theheapq
provides various methods present in the library to heapify a list and support the various heap operations in \(O(log(N))\).
If you are not aware of various heap operations and how they work, I would suggest you read this.
1
2
3
4
5
6
7
8
9
import heapq
my_heap = [1, 2, 3, 4, 5, 6, 7, 8]
heapq.heapify(my_heap) # this will heapify the list in O(N * log(N))
heapq.heappush(my_heap, 2) # this will push an element in O(log(N))
heapq.heappop(my_heap) # removes and returns the smallest element in O(log(N))
These functions are enough to know, but if you want to know more about other operations please read this
There is also a priority queue implemented in python:
1
2
3
4
5
6
7
8
9
10
11
12
13
from queue import PriorityQueue
pq = PriorityQueue()
# insert the values (priority, value)
pq.put((2, 'a'))
pq.put((1, 'b'))
pq.put((5, 'c'))
pq.put((4, 'd'))
# this will print the values (1, 'b'), (2, 'a'), (4, 'd'), (5, 'c')
while pq:
print(pq.get())
Insertions and removals are both \(O(log(N))\). The queue version uses heapq internally, however it’s slower since it’s made thread safe so it implements locks, encaptulations and stuff, which is not really required for competitive programming
Queue
There is an inbuilt Queue implemented in python, here is how to use it:
1
2
3
4
5
6
7
8
from queue import Queue
q = Queue() # Initializes a queue
q.put(1) # adds 1 to the queue
q.put(2)
q.put(3)
q.get() # gets the first element in the queue and deletes it from the queue
Deque
Python also supports the deque data structure:
1
2
3
4
5
6
7
8
9
from collections import deque
dq = deque()
dq.append(1) # adds an element to the end of the deque
dq.appendleft(2) # adds an element to the begining of the deque
dq.pop() # pops an element from the right of the deque
dp.popleft() # pops an element from the left of the deque
Python’s deque datastructure can be used to implement the
stack
,queue
anddeque
stl classes present in c++. Deques are better if you want to implement Queue and Stack operations.
Counters
This is a special mention. Alot of times there are case when you want to calculate the frequency of elements in an array.
Implementation in c++ without sorting and searching:
1
2
3
4
vector<int> nums = {1, 2, 3, 4, 1, 2, 1, 3, 3, 4, 5};
unordered_map<int, int> freq;
for (int num: nums) freq[num]++
To implement it in python:
1
2
3
4
5
from collections import Counters
nums = [1, 2, 3, 4, 1, 2, 1, 3, 3, 4, 5]
freq = Counter(nums) # contruct the frequency map of elements
print(freq[1]) # it will return the frequncy of 1
The End
We will continue the discussion and implement some famous algorithms in python. I am fairly new to python so the contents of the blog may change in the future. Please leave your suggestions in the comments below