Using defaultdict
00:00
Welcome back. In this lesson, I'm going to be showing you how to use defaultdict. How exactly you use it depends on the use case, but the general pattern is that we take one of the built-in types which Python offers, usually a mutable data structure, and map it to keys.
00:17 Which data structure we use is going to depend on our use case, and there are four of those which we're going to be looking at.
00:23 The first one is grouping items. Next, we'll look at grouping unique items. After that, we'll look at counting items, and finally, at accumulating items.
00:33
This all sounds a bit abstract, but it's much clearer once we start looking at the code, so let's jump right in. Here we are in the REPL, and the very first thing I'm going to do is import defaultdict from collections.
00:45
So now, I'm able to create a defaultdict, and that's what I'll do. I'm going to create a defaultdict, I'll call it dd, and I'll pass it list.
00:55
So what this allows me to do is append something to a key which isn't present, such as the key 'key'. And as you can see, this didn't cause my code to error out. In fact, if I have a look at my dictionary, you can see that there's a single key, 'key', and its value is a list containing 1.
01:13
Now what happens if I add another value, such as 2? Whoops.
01:20
Well, there we go. That didn't go as smoothly as I planned, but we can have a look at the defaultdict and see what it contains. And you can see that what happened was that the value 2 was smoothly appended to this list. Again, no errors, no issues of any kind.
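For reference, the REPL steps just described can be sketched roughly like this. It's a minimal reconstruction from the narration, not a copy of what was typed on screen.

```python
from collections import defaultdict

# list is the default factory: a missing key gets a new, empty list
dd = defaultdict(list)

dd["key"].append(1)  # "key" isn't present yet, but no KeyError is raised
dd["key"].append(2)  # the second value lands in the same list

print(dd)  # defaultdict(<class 'list'>, {'key': [1, 2]})
```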
01:35 This is the basic logic which we're going to be applying in each of the scenarios which we just discussed. Let's look at grouping items. First of all, I'm going to clear my REPL.
01:45
But remember, I've already imported defaultdict, so I don't need to do that again. Now, to set up my example, I'm going to be working off this data.
01:54 So basically what it is is a list of tuples, and each tuple is a department and an employee. So in the Sales department, we have John Doe and Martin Smith. In Marketing, we have Elizabeth Smith and Adam Doe. And Jane Doe works all alone in Accounting.
02:11
What I want to do here is I want to create a dictionary which will group people by department. So for example, I would like to have one key for 'Marketing', and then I would like to have 'Elizabeth Smith' and 'Adam Doe' both listed in a list under that key, since they both work in Marketing, right?
02:29
But we can imagine that this is a huge company: there are many departments, and I might not know all of them from the beginning. So now I've created a new defaultdict. This time it's called dep_dd.
02:41
So now that I've done that setup, what I can do is iterate over each department and each employee in dep, that is, in my list of departments and employees, and I can append each employee to a list which is mapped to a key which is the department.
02:59
And I don't have to worry about the department keys already being present, so I won't get an error even when I'm adding an employee to a department for the first time.
03:08
And since I'm using list as .default_factory, I can always add more employees to a department. It's not one department, one employee. So let's try this.
03:20
As you can see, that ran without any issues. Let's try looking at the defaultdict which I just created. And you can see here, for instance, that 'Sales' has two employees and indeed, they're both in my list: 'John Doe' and 'Martin Smith' are both there.
03:35 So that works as expected.
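Here's a rough sketch of the grouping example. The names and departments come from the description above, but the order of the tuples in dep is an assumption, since the on-screen data isn't part of this transcript.

```python
from collections import defaultdict

# (department, employee) pairs; the ordering here is assumed
dep = [
    ("Sales", "John Doe"),
    ("Sales", "Martin Smith"),
    ("Accounting", "Jane Doe"),
    ("Marketing", "Elizabeth Smith"),
    ("Marketing", "Adam Doe"),
]

dep_dd = defaultdict(list)
for department, employee in dep:
    # A department seen for the first time gets an empty list automatically
    dep_dd[department].append(employee)

print(dep_dd["Sales"])      # ['John Doe', 'Martin Smith']
print(dep_dd["Marketing"])  # ['Elizabeth Smith', 'Adam Doe']
```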
03:39 The next use case I would like to show you is grouping unique items.
03:43
Let's come back to the REPL. First of all, I'm going to clear it so that we have sort of a blank slate, and I'm going to create my dep list again. Except this time, the data isn't as clean. In fact, it's quite messy in the sense that I have multiple entries for the same values.
04:01
So for example, if you look at the last three values, it's 'Adam Doe' in the 'Marketing' department three times. And this is a very common situation, right? We're often working with dirty data, which is not optimally presented to us.
04:14
What I want to create is a dictionary-like structure, so a defaultdict, which only has one entry for 'Adam Doe', one entry for 'Elizabeth Smith', and so on.
04:24
The way to do this is very similar to what we did previously. In the example just before this one, I created a defaultdict and called it dep_dd, just like here. There, I passed list as the .default_factory. What I'm going to do now is pass set instead.
04:42
And what that does is that a set accepts only one of each value. So if I add the same value to the set again, it won't be entered again, since it's already present.
04:53
So the syntax for doing this is very similar to what I had done just before. Again, I'm iterating over my tuples, over department and employee, and this time I'm adding them, rather than appending them, to keys which are the department values. That's because the mutable data structure which I'm using is a set instead of a list.
05:12
So rather than lengthening a list with repeated values, the set will only accept unique values. Let's see how this went. I'll have a look at my defaultdict to see what it contains, and you can see in 'Sales', I have 'Martin Smith', 'John Doe'.
05:29
But what's more interesting is that in 'Marketing', I have 'Adam Doe' and 'Elizabeth Smith', and 'Adam Doe' only appears once even though in my original dataset up here, we had 'Adam Doe' three times.
05:42 The next use case we're going to be looking at is counting items.
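As a sketch of the unique-grouping example: the only detail the narration confirms about the messy data is that the last three entries are 'Adam Doe' in 'Marketing', so the rest of this dep list is an assumption made for illustration.

```python
from collections import defaultdict

# Messy data with repeated entries; only the last three rows are confirmed
dep = [
    ("Sales", "John Doe"),
    ("Sales", "Martin Smith"),
    ("Accounting", "Jane Doe"),
    ("Marketing", "Elizabeth Smith"),
    ("Marketing", "Elizabeth Smith"),
    ("Marketing", "Adam Doe"),
    ("Marketing", "Adam Doe"),
    ("Marketing", "Adam Doe"),
]

dep_dd = defaultdict(set)
for department, employee in dep:
    # set.add() ignores a value that is already present
    dep_dd[department].add(employee)

# Sets are unordered, so sort before printing to get a stable result
print(sorted(dep_dd["Marketing"]))  # ['Adam Doe', 'Elizabeth Smith']
```

Because sets don't keep insertion order, the order in which the names show up in the REPL can differ from run to run.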
05:47
For this example, I'm going to be using the same list I used in the very first example so there are no repetitions, just because it's a bit cleaner and easier to work with. If you thought I was going to start out by creating a defaultdict, as I did in the previous examples, you would be right.
06:04
And if you thought that this example would be different in that I would pass a different type to defaultdict, then that would be correct as well.
06:14
What I will pass this time is int instead of set and instead of list.
06:20
So there we go. I've created my defaultdict. Next, I will iterate over my list of tuples. And now what I'll be doing is incrementing the int mapped to each key. For a key that isn't in my defaultdict yet, int() supplies a starting value of 0.
06:36
And again, as in previous examples, the key is the department name. Okay, so we ran this code. Let's have a look at what my defaultdict contains.
06:46
As you can see, this time there is an int mapped to each key. The key is a department name. So the first one is 'Sales' and the value I have here is 2, and that's because two people work in 'Sales'.
06:59
That's 'John Doe' and 'Martin Smith'.
07:03 We've reached the final use case, and that's accumulating items. Again, this'll be easier to understand when we're looking at the REPL. In this use case, we're going to be using this data.
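The counting example could look something like this. Again, the tuple order is an assumption, and reusing the name dep_dd is a guess, since the narration doesn't name the variable here.

```python
from collections import defaultdict

# Same clean (department, employee) data as in the grouping example
dep = [
    ("Sales", "John Doe"),
    ("Sales", "Martin Smith"),
    ("Accounting", "Jane Doe"),
    ("Marketing", "Elizabeth Smith"),
    ("Marketing", "Adam Doe"),
]

dep_dd = defaultdict(int)  # int() returns 0, so missing keys start at zero
for department, _ in dep:
    dep_dd[department] += 1

print(dict(dep_dd))  # {'Sales': 2, 'Accounting': 1, 'Marketing': 2}
```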
07:15
What we have here is a series of departments again (or you can imagine these are sales types), and we have a value for each of them. So for instance, here at the very top line, you can see that we've sold three types of books, or we have three entries for sales in 'Books', or this could be perhaps spent on books.
07:34
But the point is we have different numerical values here, 1250.00, 1300.00, 1420.00, and so on. And what I'd like to do is add them.
07:43
I would like to end up with a dictionary-like structure, or a defaultdict, where I have one entry for 'Books' and the sum of these values, so I have consolidated totals.
07:56
And by now you're probably expecting me to create a defaultdict and pass a different argument to it, and that's correct. This time, I'm using float.
08:06 And as in the previous use cases, what I'm going to be doing is iterating over this list of tuples, and I'm going to be accumulating values. This time, I'm going to be using products as keys.
08:19 Now this should have run, and the best way to see what we came up with is to print this. And that's exactly what I'm going to be doing here. Let's see what this gives us.
08:30 And so you can see that the income for books was just under $4,000, for tutorials just under $2,000, and so on.
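A sketch of the accumulating example follows. The three 'Books' values are the ones read out above. The 'Tutorials' values, the variable names incomes and dd, and the print format are assumptions, chosen only so that the totals match the "just under" figures mentioned.

```python
from collections import defaultdict

incomes = [
    ("Books", 1250.00),
    ("Books", 1300.00),
    ("Books", 1420.00),
    # The Tutorials amounts below are illustrative, not taken from the lesson
    ("Tutorials", 560.00),
    ("Tutorials", 630.00),
    ("Tutorials", 750.00),
]

dd = defaultdict(float)  # float() returns 0.0, the starting total
for product, income in incomes:
    dd[product] += income

for product, total in dd.items():
    print(f"Total income for {product}: ${total:,.2f}")
# Total income for Books: $3,970.00
# Total income for Tutorials: $1,940.00
```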
08:40
That's the end of this lesson. We looked at four different use cases and how defaultdict can help us in each of them. In each case, we are grouping or consolidating or somehow reducing items that we have, maybe to unique items, and we're mapping them in different categories.
08:57
We're using keys as the way in which we can retrieve these consolidated values. So, I've shown you four use cases. I hope I've convinced you that defaultdict can be helpful and useful in resolving concrete problems. In the next lesson, we'll go deeper into defaultdict and see more of how it works under the hood.
