Join Chander Dhall for an in-depth discussion in this video, Exclude paths from index, part of Cosmos DB: Import, Manipulate, Index and Query.
- [Instructor] As we discussed earlier, every single attribute is automatically indexed by Cosmos DB. That is advantageous if we are searching on all of these attributes. However, in my real-world experience, that's usually not the case, especially with deeply nested structures and verbose data. Let's say you and I have a blog post, and you already have an instance of Azure Search running that indexes the post. There's no point having Cosmos DB do the same for you.
Another example is having deeply nested structures that may never need to be searched. Turning off indexing for them can surely help. The first benefit is savings on storage costs. The second major benefit is write performance, as no CPU cycles are spent indexing those paths. A third, often ignored benefit is better read performance, because the index is now smaller. Let's get back to Visual Studio and write some code that excludes paths from indexing.
We'll just go and delete some of this code. And maybe we can use some of this code. We'll get the collection Uri. Use the same database Id and collection Id.
Now, we'll create a dynamic document. We could also create a typed document. Let's say it has an Id, which could be something we've not used so far. And then we can have a name. What I like to do is add entities or properties that have nested properties within them.
So, something like company can also have a name, but can also have a smaller entity like address. An address can have multiple properties. The benefit of making this dynamic is that I don't really need to worry about a type structure, and it's good for unit tests and a lot of demo code. But when you do have something serious in production, it might be advisable to go in a typed fashion, even though DocumentDB doesn't really care.
At the end of the day, it's still going to be JSON. Now, this is just an example to show you how deeply nested the path could be, and real world data could be a lot more complicated than this, too. And then you can have something like a shipping address. And keep in mind, this address is different. That's the address of that company and this is the address for this particular individual.
So, it's still different data altogether. And then I'll just use California, and that gives me a nested object right there. And I maybe have some more notes about the person, and that could vary completely. For some people, they may have awards, some people may have designations, and then some people could have titles. And it could be completely different for anyone there.
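The document described above ends up as plain JSON, so its shape can be sketched in any language. The course itself uses C# and the .NET SDK; the Python below is only a minimal illustration of the nested structure. Only company.name ("Cazton"), the mailing address city ("Austin") and notes.title ("CEO") come from the walkthrough. The id, the name, the company address values and the placement of "California" are assumptions, and leaf_paths is a hypothetical helper.

```python
import json

# Illustrative document. Values marked "placeholder" are assumptions,
# not taken from the course code.
document = {
    "id": "person-1",                # placeholder id
    "name": "Sample Person",         # placeholder name
    "company": {
        "name": "Cazton",
        # The company's own address (placeholder values, placement assumed).
        "address": {"city": "Sample City", "state": "California"},
    },
    # The individual's address, separate from the company address.
    "address": {
        "mailingAddress": {"city": "Austin"},
    },
    # Free-form notes that can differ from person to person.
    "notes": {"title": "CEO"},
}


def leaf_paths(node, prefix=""):
    """Return every leaf property path in the document, e.g. /company/name."""
    if not isinstance(node, dict):
        return [prefix]
    paths = []
    for key, value in node.items():
        paths.extend(leaf_paths(value, f"{prefix}/{key}"))
    return paths


print(json.dumps(document, indent=2))
print(leaf_paths(document))
```

Listing the leaf paths this way makes it easier to see which parts of the document an included or excluded path would cover.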
So, the benefit of DocumentDB is that since your data is schemaless, this shouldn't matter. Well, the next thing would be to create a collection definition. And it's going to be a document collection. And we could say Id is the same as collection Id. And then we need to add certain things to the definition, which is your indexing policy, and then say, "Include a path," because you're going to start with including pretty much everything we know.
And that happens by just using the included path, which is going to be the root. So, I want to go to the root and put star, which means everything under the root is included by default. Well, the next thing I would like to do is copy the same thing a few times, and then change this to have an excluded path.
Now, this path could be different. So, let's say I want to be here and say, "All right, I've got company, and anything below company is excluded." So, this way, company won't be included at all. And then this needs to change to excluded paths. Not only company, but anything below it, even address. So, even though address is one level below company, it will not be included.
City won't be included, same with name won't be included because we used star here. Another way to do the same exact thing, by the way, is escaping it in this fashion, and saying, "Okay, we've got company," and then you escape the same thing. Then you have this star. Now, this is if you want to use the right escape structures for the same thing, otherwise it doesn't matter. Both these lines do the same exact job. So, I'm going to just comment this for now.
And then, next, what we could try to do is go down a level. For example, I want address to be included but I don't want mailing address to be included. Now, this way, anything below the mailing address won't be included, or will be excluded from my index. Now, keep in mind, notes are still included, and then Id and name are still included. Well next, I would like to create the collection. And this time, we will take this code as it is, and then paste it here.
Now, we've got the collection async and we also have the Id, and then we need to pass in one important aspect, which is the collection definition. So, once we have that property, we've added all of this here. Now, as you can see, it's going to complain because we now don't need this, the collection definition's already got that. And then we have one more property to add, which could be your request options, and in this case, I don't think it matters. Otherwise, you could say an offer throughput of 20,000, or whatever you like.
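The indexing policy built above is stored on the collection as JSON, so its shape can be sketched independently of the C# code in the video. The Python below is a minimal illustration, assuming the two excluded paths are written as /company/* and /address/mailingAddress/*. The collection id and the escaped helper are hypothetical; the helper only shows the quoted form of a path that the instructor refers to as escaping.

```python
import json

# Indexing policy in the JSON form stored on a collection:
# include everything under the root, then carve out two subtrees.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [
        {"path": "/company/*"},                 # company and everything below it
        {"path": "/address/mailingAddress/*"},  # only the mailing address subtree
    ],
}

collection_definition = {
    "id": "sample-collection",  # hypothetical collection id
    "indexingPolicy": indexing_policy,
}


def escaped(path):
    """Quote each property name in a path, leaving the wildcard untouched."""
    parts = path.strip("/").split("/")
    return "/" + "/".join(p if p == "*" else f'"{p}"' for p in parts)


print(json.dumps(collection_definition, indent=2))
print(escaped("/company/*"))  # prints /"company"/*
```

Both the plain and the quoted form name the same path, which matches the point made in the walkthrough that the two lines do the same job.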
Well next, would be the document. And whatever is that created document, we're going to do the same exact way we've been doing, which is create a document async, and then send the collection SelfLink, which is under resource, and then pass in the document. And what's the document in this case? So, you know it's the same document we created in line number 39.
So now, the document should be part of the Azure Cosmos DB instance. And now, we can run a query. We're going to run a very simple query, create document query. And then take the SelfLink of the collection, and then pass in your query. It could be a very simple query like select star from root r where r.name equals my name.
And then close it right there. And the next thing you want to do is AsEnumerable, and then Any, because we're just checking for any record whatsoever. And if that is true, that means this went through. Now, this is pretty good to make sure that we can test at least one functionality, but before that we want to make sure that we do not get an error just in case we already had the collection created. So, we want to make sure that this is changed to CreateDocumentCollectionIfNotExistsAsync.
Now, if you don't have the collection, you don't have to worry about it. But in my case, I might already have the collection in the Azure Cosmos DB instance. So now, what we could do is place a breakpoint right here and run this. As you can see, we have the document, we have the collection definition with included and excluded paths. Now, the collection already existed in my case, so it was a very quick call. And the next thing would be to get the result.
And as you can see, the result is true, so this worked. Now, keep in mind if you debug this, and you have to run this over and over again, you might want to go in here and add a line of code that kills the collection. And you could do that by saying, "client.DeleteDocumentCollectionAsync," and then get the collection Uri. Just move this line of code right here, which makes it very simple so you don't have to get the collection Uri again.
Now, this way, it will delete the collection and then recreate it. By the way, you might want to take a completely different collection if you want to run this, so that it's more like a temporary collection. You do all this and then you edit up here, so it doesn't really matter. For test purposes, probably a good idea. So now, we saw that everything worked for name, which makes sense because it was part of the path, and it was never part of an excluded path. As you can see, company still is excluded.
So, why don't we do something like company.name equals Cazton as the next query and see what happens. So, we can copy the same code as it is, and say where company.name equals Cazton. And as you know, this will actually not return any records whatsoever. So now, one thing to keep in mind is that if you have this line of code, you might want to make sure that there is no document collection with that same name that exists. Since we just changed the name, there's a chance that it doesn't actually exist at all.
So, you want to put it in a try-catch, and this is just for demo purposes. And then what you do is just ignore it completely, and say, "Well, whatever that exception is." And you just put it so you know that happened, but it doesn't really come in your way. 'Cause we really don't have that collection. So, this will actually throw an exception saying, "Well, the collection doesn't exist." So, what we're doing is just a hack to make our demo work and then say it. Well then, we might also use this at the end of our code to get rid of that collection if it actually exists.
Now, in a regular real-world scenario, you would never run into this situation. But in this case, we're debugging, and what we don't want to do is have this get in our way, so we're just employing a hack. And maybe you could do the same here so you have the try-catch statement. All right, so this is good and we can now do an F5. And maybe we can just put a breakpoint right here so we can see what comes in this particular result. And then keep in mind that this particular line will actually give you an error, because this query is not allowed: it filters on a path that we have already excluded from the index.
So, what I would like to do is put a breakpoint right here so we can look into exception and see what it is. So, perfect, sometimes even an error message is the perfect thing to have. So, as you can see, we have one or more errors occurred. And then inner exception is what gives us a little bit more detail. What we could do is just do F5 and see what we get as an exception. So, once we run this we will see an exception that'll be thrown, which is perfect.
And as you can see, the exception is, "An invalid query has been specified with filters against path(s) excluded from indexing." And we don't need to get into details there. This is exactly what we were expecting, just because we were inside company and that particular path has been excluded. So, this is perfect. Now next, what we could do is also try the same thing with address.mailingAddress.city. So, I'm going to copy this same exact code. We're still going to get an exception.
And this time, we're going to go inside address, and we're going to go inside mailing address. And then the city will be Austin. Just to make sure that we had excluded that path, which is address and mailing address, so we should get an exception again. Now, just to remove this exception, I'm just going to comment this code, and just run this part.
And we can put a breakpoint right here on line number 105 and remove the one on 98. So now, we should see an exception at line number 105. Press F5, and you can see the exception. It is the same exact exception that we talked about. So, you can see that if you exclude a path, you won't be able to do these kinds of queries. I'm going to copy and comment this code, and then do something that actually makes sense, so you can see that a nested query does work for something like notes.
Just to get an idea, we had notes.title and the title was CEO. And this code should work and we should not get an exception. So, what we could do is we could place a breakpoint right here, and we should never hit this breakpoint. We should go to line number 122. So, I'm going to press F5 again. And as you can see, it went to line number 122. You can see the result, and the result happens to be true.
So, it was able to find the value, which in this case happens to be CEO for notes.title. And now, we're going to delete the collection and press F5.
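The four queries in this walkthrough can be summarized with a small model of how included and excluded paths interact. The Python below is a simplified sketch, not the service's actual implementation: it assumes the most specific (longest) matching path wins, it handles only patterns ending in a star, and is_indexed is a hypothetical helper.

```python
# Paths from the walkthrough: everything is included except two subtrees.
included = ["/*"]
excluded = ["/company/*", "/address/mailingAddress/*"]


def is_indexed(property_path, included, excluded):
    """Return True if the longest matching pattern is an included path."""
    def prefix_of(pattern):
        # "/company/*" -> "/company/", so only that subtree matches.
        return pattern[:-1] if pattern.endswith("*") else pattern

    best_len, best_included = -1, False
    for patterns, flag in ((included, True), (excluded, False)):
        for pattern in patterns:
            prefix = prefix_of(pattern)
            matches = (property_path + "/").startswith(prefix)
            if matches and len(prefix) > best_len:
                best_len, best_included = len(prefix), flag
    return best_included


# The property each query in the walkthrough filters on.
for path in ["/name", "/company/name",
             "/address/mailingAddress/city", "/notes/title"]:
    status = "indexed" if is_indexed(path, included, excluded) else "excluded"
    print(f"{path}: {status}")
```

Under this model, the filters on name and notes.title hit indexed paths and return a result, while the filters on company.name and address.mailingAddress.city hit excluded paths, which is where the walkthrough sees the exception.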
- Data import scenarios
- Creating a database
- Creating a partitioned collection
- Data manipulation
- Importing documents with a stored procedure
- User-defined functions
- Excluding indexing at a document level
- Range indexing on strings
- Querying with SQL parameters
- Range operations