From the course: Extending Laravel with First Party Packages

How search indexing works

- [Instructor] Alright, so we've talked a lot about search at this point, and we've got everything installed with Laravel Scout, so we're ready to start building the search index. Before we go any further, I want to do a quick high-level overview to briefly explain how search works, because you need to understand this concept of indexing in order to understand what we do in the next few videos. So throughout this video, we're gonna talk, at a high level, about how modern search engines work, what an index is, and why we need to maintain an index. Those are the important points of this video.
So, how do modern search engines work? The idea is that we need to be prepared for searching. There's a big difference in speed between searching against an index and searching against the documents, and that's the biggest difference between modern search engines and older search engines. In older search engines, we used to search against the documents themselves. We used to go down the database and look at every record to see if a term in that record matched what we were looking for. So we had to look through every single record in order to find it, and that's incredibly inefficient. That's why newer search engines use this concept of an index.
The index helps us point to the right results, but it doesn't actually contain the record that we're looking for. It's going to point to the right record, and then we can just go fetch that record like we normally would. So if you think about when you have a blog post and you're looking for the blog post with an ID of 12, it's very fast for the database to get the record with the ID of 12, because it knows exactly where it is. Databases are set up for that, so they can go get that record really, really fast. So we search against an index.
The index will say okay, well, the term you're looking for, the result for that is in post number 12, and then we can go get post number 12 out of the database, which again is very fast. We use both systems for what they're best at, and that's how search engines work in the modern era.
So this is a really good thing I wanna read. This is from Algolia's blog, Milliseconds Matter, and it explains at a high level what indexes are: "The goal of the indexing process is to store your data in a specific data structure that is optimized for search. Without indexing, we would need to scan all documents in a database to detect which ones match a query. This would take forever and be highly inefficient. The main part of the indexing process is to create a data structure that contains a mapping of each word to the associated list of documents that contain that word. One mapping of the word to the list of documents is called an inverted list, with the name coming from the inversion of this space. Instead of having documents that contain words, we now have words with the list of documents containing each word. The concept is similar to the index you have at the end of a book, except all words are in the index." So hopefully that helps clear up a little bit about what an index is. Again, you don't need to know the details of the index, as long as you understand what it is.
I want to create a visual here of what needs to happen, since we are going to maintain this index. Indexes are how we perform very fast searches, but we also need to store information in our database. So we have our database as well, a MySQL database or whatever, that needs to store the entire information, whereas an index is an abbreviated way to find that information quickly. They're two separate things. So here you can see at the bottom I've created a new post called Hello World 1. We're publishing it, it's got this created_at timestamp, and we want to save this post.
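To make the inverted list from the Algolia quote concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not how Algolia or Laravel Scout implement it. The sample posts, the `tokenize` helper, and the function names are all hypothetical. It builds a mapping from each word to the IDs of the posts that contain it, then answers a search in two steps: look up the IDs in the index, then fetch those records by ID.

```python
from collections import defaultdict

# Hypothetical sample data standing in for database rows, keyed by post ID.
posts = {
    12: "Hello world from Laravel",
    13: "Searching with an index",
    14: "Hello again, index",
}


def tokenize(text):
    """Lowercase the text and split it into words, stripping basic punctuation."""
    return [word.strip(".,!?").lower() for word in text.split()]


def build_inverted_index(documents):
    """Map each word to the set of document IDs that contain that word."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, documents, term):
    """Step 1: ask the index which IDs match. Step 2: fetch those records by ID."""
    matching_ids = sorted(index.get(term.lower(), set()))
    return [(doc_id, documents[doc_id]) for doc_id in matching_ids]


index = build_inverted_index(posts)
results = search(index, posts, "hello")
```

Here `results` holds posts 12 and 14, found without scanning the text of every post at search time. A real engine adds ranking, typo tolerance, and much more on top of this basic structure.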
So when we save this in our Laravel application, this is what we need to do. First of all, we need to move it over into the database and save a record there. We're already doing this right now when we save something: we take the data from the user, push it over to our database, and save a record in our database. Now, if we want to create a search index, there's more to do in addition to saving it in the database, because we're not going to search against the database. We have this whole separate document called an index that we need to maintain as well. So after we push it to the database, we need to then push it to the index, and we create a small record in the index, which is much more efficient.
Then, let's think about this again. We get rid of that one and create a second record. This is Hello World 2, not published, somewhat different data. The user wants to save this. Again, we push it to the database, creating another full record in the database, and then we also have to make another push, this time to our index, creating another record in our index. So we have to keep maintaining this. We've got a third one here as an example, Hello World 3, with slightly different data. We're gonna push that to our database, create another record, push it to the index, create another record, and so forth. This continues on forever.
Now let's talk about when we want to delete. If we want to delete this post, we delete this record, and then we've got to push over to the index and tell it to delete that record as well. So that's an important step too. We have to be involved every time we work with and manipulate that database, because we need to keep the index in sync with it. By preparing the index ahead of time, we're gonna be able to search the index and not have to worry about the database.
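The save and delete walk-through above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Laravel Scout's actual mechanism: the `PostStore` class, its method names, and the record fields are hypothetical, and plain dictionaries stand in for both the database and the index. The point is only that every write to the "database" is paired with a matching write to the index.

```python
class PostStore:
    """Toy stand-in for an application that keeps a database and an index in sync."""

    def __init__(self):
        self.database = {}  # post ID -> full record
        self.index = {}     # word -> set of post IDs containing that word

    def _words(self, record):
        # Only the title is indexed in this sketch.
        return set(record["title"].lower().split())

    def _unindex(self, post_id):
        # Remove every index entry that points at this post.
        for word in self._words(self.database[post_id]):
            ids = self.index[word]
            ids.discard(post_id)
            if not ids:
                del self.index[word]

    def save(self, post_id, record):
        # On update, drop the stale index entries before re-indexing.
        if post_id in self.database:
            self._unindex(post_id)
        self.database[post_id] = record            # push to the database
        for word in self._words(record):           # then push to the index
            self.index.setdefault(word, set()).add(post_id)

    def delete(self, post_id):
        self._unindex(post_id)                     # tell the index to forget it
        del self.database[post_id]                 # and delete the full record

    def search(self, term):
        ids = sorted(self.index.get(term.lower(), set()))
        return [self.database[post_id] for post_id in ids]


store = PostStore()
store.save(1, {"title": "Hello World 1", "published": True})
store.save(2, {"title": "Hello World 2", "published": False})
store.save(3, {"title": "Hello World 3", "published": True})
store.delete(2)
titles = [post["title"] for post in store.search("hello")]
```

After the delete, searching for "hello" returns only the first and third posts. If `delete` removed the database record but skipped `_unindex`, the index would still point at a post that no longer exists, which is exactly the out-of-sync problem described above.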
And so that's what I mean by preparing for search. In old search engines, we didn't really prepare for search. When someone wanted to find a word, we would just try to find it in the database. We weren't prepared. Maintaining this index is something that we have to do constantly: every time a record is modified in the database, we need to also modify it in the index to keep the index in sync, and that's the important thing to keep in mind. That's why we look at other tools to help maintain it, and that's where Laravel Scout comes in. Laravel Scout is going to help maintain and manage that process. Every time something changes in the database, it's going to keep the index synchronized with it.
Now, again, we've talked about lots of different drivers you can use to create and manage these indexes, but the one that keeps coming up is Algolia. Laravel Scout supports Algolia out of the box. The Laravel community is a big fan of Algolia. I used Algolia before Laravel even supported it. It's a really, really good product, and it helps handle the distribution of your index worldwide. So this means if you're a United States company, everyone in the U.S. can get really quick search results, but Algolia also has distributed infrastructure, so people in Tel Aviv or in Belgium or in Australia, all these other places around the world, have a data center located near them and can get quick results as well, because there are data centers all over the world. Your index is going to be distributed all over the world, and then we can get really quick search results. We're talking a couple of milliseconds to navigate the index and get a response back, no matter where people are in the world. And that's one of the big benefits of using a cloud service provider like Algolia.
It also hosts the whole index for us, so we don't need to manage or maintain indexes or worry about them getting corrupted. They're set up for that, and they have super fast response times. Okay, so in the next video, we're gonna talk more about Algolia. We're gonna create a free account and set up Algolia, and that's what we're going to use throughout the rest of this section to maintain this index and run our searches.
