Explore the usage of the class ConcurrentDictionary as a shared data structure in asynchronous or multithreaded applications.
- [Instructor] The concurrent queue and concurrent stack imply a sequence in which elements are added to, and then removed from, the data structure. But a much more common scenario for synchronous code is to use dictionaries, and so there is also a concurrent dictionary that deals with key value pairs rather than a sequence of elements that is added to and removed from the collection. So concurrent dictionary has key value pairs. You can specify the types thanks to generics, and it has thread safety built in so you can add and remove safely.
Let's see how this works. Back to our AsyncDemos solution. Right click, add new project, concurrent dictionary, console application. So we'll do very similar things to before. We'll add two threads, two things running concurrently, but this time we will use a concurrent dictionary, and the concurrent dictionary is from the namespace System.Collections.Concurrent, and it has two data types to be specified as generic type arguments.
So the key itself and then the underlying value. I'll do an integer key and a string value, and here we have a shared data structure that'll be used as a dictionary from multiple threads. So as before, we'll create multiple threads. Now in this case, we're not looking so much at a producer/consumer style where something is adding to a queue or stack and something else is removing from that queue or stack, but rather, we would have relatively random access in the sense that some things will be added to the dictionary, and some things will be removed from the dictionary, and we need to be able to do so in a thread-safe manner, but the operations will look up keys and seek to modify the values associated with those keys.
Let's look at some of the common operations. Type add, and notice the three different ways in which we might need to add something. Because we cannot know whether a key value pair already exists in the collection until we call this method, we have the option to say we would like to add this key value pair, but if the key already exists, I'd like to do the following, and so provide a method. You can also say, well, I expect this value might already be there. I'll try and retrieve a value for this key, but if the key is not present let me add this key value pair.
Or you could simply say, I'd like to try to add this value, and if the key value pair already exists nothing happens. So let's do the simplest case first, TryAdd. It attempts to add the specified key and value to the concurrent dictionary, and so the key, 1, the value, the text one, and so that should not come as a surprise. This is almost as straightforward as a traditional Dictionary.Add, but there is an if statement involved because TryAdd will tell you whether you succeeded or not, right? So if you're adding unique values, unique key value pairs to this collection, all you might want to determine is, I thought I was adding this, oh it was there already.
Moving on, right? Right, TryAdd returns true if the value was added. Save all, build, run. Whoops. We need to change the project I'm running. Concurrent dictionary, set as startup project, run again, there we go. The key value pair 1 and one was added. So it's a very simple way to determine the state of the dictionary at the time when we do the operation.
Now the other variation was to say not just add, but get or add. So I'm saying given my algorithm, I would like to retrieve a value for a certain key. So the key I'm interested in is 1. Oh, but other people can add and remove while my code is running, so as an atomic operation, get or add. Get me the value of the key value pair 1, or if it's not present, add one, and so the return value is a string because our key value pairs come as integers and strings, and so at this point what'll happen? The key 1 already exists, and so we will be given the value one.
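As a rough sketch of the TryAdd behavior described above, assuming a console project that uses top-level statements; the variable names and the second value "uno" are made up for illustration and are not from the demo:

```csharp
using System;
using System.Collections.Concurrent;

// Shared dictionary with integer keys and string values.
var dictionary = new ConcurrentDictionary<int, string>();

// Key 1 is not present yet, so this TryAdd succeeds and returns true.
bool firstAdd = dictionary.TryAdd(1, "one");

// Key 1 now exists, so this TryAdd returns false and leaves "one" in place.
bool secondAdd = dictionary.TryAdd(1, "uno");

Console.WriteLine($"First add succeeded: {firstAdd}");
Console.WriteLine($"Second add succeeded: {secondAdd}");
Console.WriteLine($"Value stored for key 1: {dictionary[1]}");
```

The return value is the only signal you get, so code that cares whether the pair was new should check it rather than assume the add happened.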
If that key was not present, then our specified value would be the new value for that key value pair in the collection. One more, add or update. This is the more complex option. So if I'd like to add the key 1 with a value, well that's great, but it might already exist. So we can add a key value pair, or if the key already exists, we can run our own code to modify the existing value.
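The get or add call described above might look like the following minimal sketch. It assumes top-level statements, and the key 2 with the value "two" is a hypothetical addition to show the other branch:

```csharp
using System;
using System.Collections.Concurrent;

var dictionary = new ConcurrentDictionary<int, string>();
dictionary.TryAdd(1, "one");

// Key 1 already exists, so the supplied "ONE" is ignored
// and the stored value is returned.
string existing = dictionary.GetOrAdd(1, "ONE");

// Key 2 is absent, so "two" is added and returned.
string added = dictionary.GetOrAdd(2, "two");

Console.WriteLine(existing); // one
Console.WriteLine(added);    // two
```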
So this would be similar to what we had at first, but if that key already exists, we will have a function called. So here we have parameters, key value pairs are ints and strings, and this will map to a method of our choice. We do have to return a string, and we'll modify this in just a moment. So just watch what's happening. We're asking the concurrent collection to add or update.
If the key exists, we're in trouble. What're we going to do? We have a value we want to set, but there's an existing value in the dictionary already. Okay, so if the 1 does not exist, then it's simply an add as we had with the TryAdd. So now what happens if the key already exists? This method is given the key and the existing value. Not the value we're trying to set, the existing value, and so we can use these parameters in our code to determine what the value should be after the add or update.
So given that the value is currently one, if we wanted to change that to uppercase, well, we can be given the existing value and just change it to uppercase. So we can try an example, but notice that we are talking about a lambda here, so this is the parameter list for our method, this is the body of our method, and we're essentially providing code to the add or update method so that it can go ahead and either add a new key value pair, or modify an existing value based on the code we provide.
So you can see it gets a lot more complicated very quickly when we start considering all the scenarios that might happen to a dictionary from being accessed by multiple threads. So the value here will be the lowercase one. So 1 has been added, and so when we get the value we'll be ignoring this provided one string, and value will be the original one from up there. Save, run, confirm it's the lowercase text as was added originally, and so now here we'll say AddOrUpdate.
I'm thinking if the key doesn't exist, here's the uppercase string I want. If the key does already exist, I would like to change it to uppercase. So just to make sure we know which one we're getting, we can add underscores here, so we'll know if we get that one, and add or update will actually return the resulting string so we know what we get. So val2 will be what we write out this time. So the value here is the resulting value for the key 1 after we've made the determination whether it exists.
If it exists, our custom code runs, if it did not exist this value will be set. So let's see what we get, and so we get the uppercase one without underscores which is the result of our calculation here, the brief code. Okay, so three different ways to add to a collection depending on the requirements of your algorithm. Let's try and retrieve some values from this collection. We have get or add, and similarly to the scenario before, get something that exists should succeed, but get something that does not exist, you have the opportunity to add a value at that point in time.
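Here is a hedged sketch of the add or update call just walked through, again assuming top-level statements. The underscore-marked add values follow the demo's idea of making the two branches distinguishable; the key 2 case and the name `added` are hypothetical additions:

```csharp
using System;
using System.Collections.Concurrent;

var dictionary = new ConcurrentDictionary<int, string>();
dictionary.TryAdd(1, "one");

// Key 1 exists, so "_ONE_" is ignored. The lambda receives the key and the
// EXISTING value ("one") and returns the replacement value.
string val2 = dictionary.AddOrUpdate(1, "_ONE_", (key, existing) => existing.ToUpper());

// Key 2 does not exist, so the add value is stored and the lambda is not used.
string added = dictionary.AddOrUpdate(2, "_TWO_", (key, existing) => existing.ToUpper());

Console.WriteLine(val2);  // ONE
Console.WriteLine(added); // _TWO_
```

One thing worth keeping in mind: under contention the update function can be invoked more than once for a single call, so it is safest to keep it free of side effects.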
You also have TryGetValue, which is the commonly used one if we expect a key to exist. We can say out string val3 here, and of course it's a try, so if we succeed then we expect to be able to use that value, and so now at this point the existing value would be val2, the uppercase of what was originally added, and so val3 should be an uppercase string saying one. Now the catch for us here is that we did say try.
So when another thread is intervening and modifying things here, it can happen that the value was removed, and so we get false: we were not able to get the value. Or the value might have been modified, but we don't know that. It's multi-threading, so who knows what's happening in the background? All we know is that at the time we made this call, we succeeded in retrieving a value matched with its key, and the value is provided to us. Okay. So remember that other get. There was GetOrAdd, and I'll not write this one out.
Just notice there's the key value pair, and you can also do this with a function. You can ask, if you have this key, give me the matching value, and you can then specify either a substitute value or a code-generated value if that key does not yet exist. Save all, build, it's all good. So let's do something a little bit more real world. I'll comment all of this out so you can see it later, and let's just read from a file, and we'll split the file into multiple lines.
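A minimal sketch of the retrieval calls discussed here, assuming top-level statements. The missing key 42 and the generated string format are illustrative assumptions, and the last call uses the GetOrAdd overload that takes a function instead of a fixed value:

```csharp
using System;
using System.Collections.Concurrent;

var dictionary = new ConcurrentDictionary<int, string>();
dictionary.TryAdd(1, "ONE");

// TryGetValue returns true and fills the out variable when the key is present.
bool found = dictionary.TryGetValue(1, out var val3);
Console.WriteLine(found ? $"Found: {val3}" : "Key 1 was not present");

// For a key that is not there (or was removed by another thread),
// TryGetValue returns false instead of throwing.
bool foundMissing = dictionary.TryGetValue(42, out var missing);
Console.WriteLine($"Key 42 present: {foundMissing}");

// GetOrAdd with a function: the value is generated only if the key is absent.
string generated = dictionary.GetOrAdd(2, key => $"value-{key}");
Console.WriteLine(generated); // value-2
```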
We'll split each line into the individual words, and then we'll just count how many instances of each word exist in that file. So if someone says "the" a lot, then we'll count "the" multiple times. So I have a little sample file. You can download any public domain or other document you have access to, and in this text file we have lines of text, and of course those lines of text contain words. So a very simple thing to do would just be to read it using the file class.
So all the lines in the file. File from System.IO can read all lines for that file name, and this is an IO operation. If it's a large file it's a stream of data, so this is pretty close to efficient, and of course I am counting on that method being efficient at splitting by line breaks as well, but as an IO operation this might take a while. You might want to do this asynchronously, but assuming my processing of those lines is itself an expensive operation, that's the thing I want to split into parallel tasks.
So I would like to run, say, five, ten, 15 threads to do processing of each line on separate threads, or perhaps a batch of lines per thread. Now I don't want to manage all of this myself, so what I'll do, I'll use our trusty parallel method, and I'll ask the parallel for each string in that collection of lines I would like to do something interesting. Now remember what's happening here.
Parallel.ForEach will run a for loop for us over all of the elements in this collection, for each line in the lines collection, and it will run our function once for each element. Just like the body of a foreach loop, but it will split our collection into chunks, so the first ten or 50 or 100 items will be running on one thread, and the next ten, 50, or 100 items on the next thread. So I don't know how many threads it will create, but I know all of this will run on background threads, and so this block of code will have multiple threads running it in parallel, and so now I can get in trouble.
So all the way down here I'm going to create a new concurrent dictionary, and I want to keep track of individual words, so strings, and then an integer, a count of how many times that word occurs in this document. Because it's a concurrent dictionary, I can call the members on this class from multiple threads safely. Okay, so the ForEach running in parallel will give us text, the individual line of text.
So this is essentially the signature of our method, and here is the method then that will be run on multiple threads. So I'll have multiple words on each line of text, so line, let's just do something really simplistic. We'll split on a space. Of course, this could be more complicated. We'll take a look at each word we receive from that split operation, and now we can do some expensive processing. I'm just going to keep it really simple and say if string.IsNullOrWhiteSpace, so if we didn't actually get a word then just continue, and of course this is where you'll do much more expensive processing in terms of comparison of case and all sorts of other challenging things like that.
Now to do our word count, I want to make sure that words are counted as unique strings, and I only want each distinct word to appear once in the dictionary as a key, and then for every repetition of that word in the document I would like to increment the count, and so the concurrent dictionary, apologies, needs to be static, has an AddOrUpdate method to do exactly that. So if the word does not yet exist in the dictionary, we would like to add it, and we would like to add it with a count of 1.
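To make the Parallel.ForEach behavior above a bit more concrete, here is a small sketch assuming top-level statements. The sample lines and the `handledBy` dictionary are hypothetical; it records which managed thread processed each line. How many distinct threads appear is up to the runtime and can differ from run to run, and the calling thread may do some of the work too:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Stand-in data; the demo reads these lines from a text file instead.
string[] lines = { "first line", "second line", "third line", "fourth line" };

// Maps each line to the id of the thread that processed it.
var handledBy = new ConcurrentDictionary<string, int>();

// The lambda is the loop body; it runs once per element,
// potentially on several threads at the same time.
Parallel.ForEach(lines, line =>
{
    handledBy[line] = Environment.CurrentManagedThreadId;
});

// Parallel.ForEach returns only after every element has been processed.
foreach (var pair in handledBy)
{
    Console.WriteLine($"'{pair.Key}' was handled on thread {pair.Value}");
}
```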
We just discovered it for the first time in this document, but if the key already exists, so the dictionary already has the word present, it already has a count of how many times that word has occurred, so the add or update method provides a key. The key that was already present, and the current value, or in our case, the current count as is currently present in that dictionary, and so what do you want to do with the current value and the key if it already exists? If it did not exist, we'll set a value of one for the key the word, but if it already exists we want to take the current count, increment it by 1 and return that as the new value for that key value pair in that dictionary, and it is truly that simple.
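Putting the pieces together, the word count described above could be sketched roughly like this. It assumes top-level statements and uses a small in-memory array in place of the sample file, so the lines and the resulting counts are illustrative only; with a real file you would populate `lines` from File.ReadAllLines:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Stand-in for the lines read from the sample text file.
string[] lines = { "the enemy of the enemy", "the end" };

// Word -> number of times that word occurs across all lines.
var wordCounts = new ConcurrentDictionary<string, int>();

Parallel.ForEach(lines, line =>
{
    // Simplistic tokenizing: split on single spaces.
    foreach (string word in line.Split(' '))
    {
        // Skip empty entries, for example those produced by double spaces.
        if (string.IsNullOrWhiteSpace(word)) continue;

        // First sighting: store a count of 1.
        // Otherwise: take the current count and return it incremented by 1.
        wordCounts.AddOrUpdate(word, 1, (key, currentCount) => currentCount + 1);
    }
});

foreach (var pair in wordCounts)
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

In this sketch the dictionary is a local variable captured by the lambda; in a project with a static Main method it would typically be a static field, as in the demo.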
Set a breakpoint. By the end of this method we would've counted all the words in the document and can see how many times each one has occurred. Save all, build, no typos, and let's run. Here we have our dictionary. 67,000 unique words, and the unique words in there, 12 instances of enemy, two instances of murderers. You can imagine the type of document I'm parsing here, but essentially we've gotten to a point now where we could safely run a piece of code, this one here, on multiple threads at the same time and access a shared data structure that all those threads are accessing concurrently in a safe manner.
So the word count concurrent dictionary took care of multiple threads showing up at the same time trying to add things and managed that process by giving us a method that does something safe and concise that matched the kind of algorithm I was trying to implement. You can very often replace your existing collections with a concurrent dictionary, but bear in mind the pattern of how you access the concurrent dictionary does require you to update your methods and determine whether the way you used your current dictionary is appropriate for the concurrent dictionary.