Join Chander Dhall for an in-depth discussion in this video, Precision vs. recall, part of Azure Search for Developers.
- [Instructor] Before we start with Azure Search, I would like to walk you through some of the search fundamentals. Information retrieval has been one of the most fascinating topics in computer science, and it has its own vocabulary. One of the first things I'd like to talk about is precision versus recall. Whenever we talk about any search engine, precision is a very important aspect. Precision means the fraction of retrieved instances that are relevant.
And recall is the fraction of relevant instances that are retrieved. I know it's kind of confusing. Let's take an example. Let's say we have a query, and our query says I want to look for a company named Cazton. And this looks like a point query that we're going to make to a search engine. Let's assume that we have a search engine that's talking to a data store, and the data store has a total of 120,025 records.
And the relevant results are only 25. So we're left with 120,000 irrelevant results. Let's assume we made the call, and we retrieved only 15 results, all of them relevant. Now what does that mean in terms of recall? Well, since we know that the relevant results were 25 but we were only able to retrieve 15 of them, that means we missed 10 of the relevant results. So our recall, in this case, is just 15 out of 25, or 60%.
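To make that arithmetic concrete, here is a minimal Python sketch of the recall calculation using the numbers from the example. The function name `recall` and the variable names are illustrative assumptions, not part of Azure Search, and the sketch assumes that all 15 retrieved records are relevant.

```python
from fractions import Fraction


def recall(relevant_retrieved: int, total_relevant: int) -> Fraction:
    """Fraction of the relevant records that the search actually returned."""
    if total_relevant == 0:
        raise ValueError("recall is undefined when there are no relevant records")
    return Fraction(relevant_retrieved, total_relevant)


# Numbers from the example (variable names are hypothetical).
total_records = 120_025
total_relevant = 25
relevant_retrieved = 15  # assumption: every retrieved record is relevant

irrelevant = total_records - total_relevant   # 120,000 irrelevant records
missed = total_relevant - relevant_retrieved  # 10 relevant records never came back
example_recall = recall(relevant_retrieved, total_relevant)

print(f"recall = {example_recall} = {float(example_recall):.0%}")  # recall = 3/5 = 60%
```

Notice that the 120,000 irrelevant records never enter the recall formula. Recall only looks at the relevant records and how many of them we got back.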
Let's take another example where we made the query and the retrieved results were 15, but only 12 of them were relevant. Now what happens in that case? Well, in that case, our precision cannot be 100%. It will be 12 out of 15, or 80%. In order for it to be 100%, it should have been 15 out of 15. That means we should have retrieved 15 records, and all 15 of them should have been relevant to our query, which in this case is Cazton.
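The same example can be sketched with sets of record IDs, which is closer to how you would measure this against a real result list. This is only an illustration: the IDs such as `cazton-0` and `other-0` are made up, and `precision_and_recall` is a hypothetical helper, not an Azure Search API.

```python
from typing import Set, Tuple


def precision_and_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query, given sets of record IDs."""
    hits = retrieved & relevant  # records that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# 25 relevant records exist in the data store (made-up IDs).
relevant_ids = {f"cazton-{i}" for i in range(25)}

# The query returned 15 records: 12 relevant ones and 3 that are not.
retrieved_ids = {f"cazton-{i}" for i in range(12)} | {f"other-{i}" for i in range(3)}

precision, recall = precision_and_recall(retrieved_ids, relevant_ids)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")  # precision = 80%, recall = 48%
```

Precision divides by what we retrieved (15), while recall divides by what was relevant (25), so the same 12 hits give two different numbers.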
Now the question is, which one should we go after? Well, that totally depends on your business. Most of the time, it's a very good idea to keep checking what our recall and our precision are, because the higher the numbers on recall and precision, the better the search engine. There's enough business value to go after higher numbers of precision and recall, and the reason is simple. Let's take an example where we have something like an e-commerce engine, and we're trying to sell laptops, monitors, desktops, and a lot of other electronic items.
What happens if we had 15 laptops in our database and only 12 of them showed up when someone searched for laptop? And what if the three records that didn't show up were the laptops that I, as a user, wanted to buy? That means serious loss of business for the company, and that's not a very good thing. So precision and recall are very important, and that's one of the reasons a lot of companies will make sure that they do point queries.
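One way to keep checking for this kind of problem is to compare what the search returned against a known list of items that should have matched. The sketch below assumes we already have such a list. The laptop IDs and the `missing_relevant` helper are hypothetical, and the search results are hard-coded here instead of coming from a real search service.

```python
from typing import Iterable, List


def missing_relevant(expected: Iterable[str], returned: Iterable[str]) -> List[str]:
    """Items that should have matched the query but did not show up."""
    return sorted(set(expected) - set(returned))


# All 15 laptops in the catalog (made-up IDs).
catalog_laptops = [f"laptop-{n:02d}" for n in range(1, 16)]

# Pretend the search for "laptop" returned only the first 12.
search_results = catalog_laptops[:12]

missed_laptops = missing_relevant(catalog_laptops, search_results)
laptop_recall = (len(catalog_laptops) - len(missed_laptops)) / len(catalog_laptops)

print(missed_laptops)                   # ['laptop-13', 'laptop-14', 'laptop-15']
print(f"recall = {laptop_recall:.0%}")  # recall = 80%
```

Listing the missed items, and not just the percentage, tells us exactly which products a customer could not find.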
And when they do point queries, they end up using a robust backend that will give them 100% relevancy in that case. Precision and relevancy become more important when we're doing unstructured search. An unstructured search is where we're making a call to a data store which may not hold the data in a very structured fashion, and that has its own benefits too. One of the major benefits of unstructured search is that we can use a data store that is not an RDBMS but something more like a NoSQL store, which can be scaled out very easily.
If you do not know the differences between NoSQL and SQL, I highly recommend watching the course on NoSQL Development with DocumentDB. There's a small module that talks about the differences and gives you a bit of an idea.
- Querying and indexing
- Creating a search service
- Using APIs during searching
- Importing JSON data
- Handling synonyms
- Working with suggesters and facets