After learning about the theory of lemmatization and the goal of an NLP-enhanced search feature, you now implement the search feature in Xcode. First, have a look at how you can determine the dominant language of a given text. Afterwards, use NSLinguisticTagger again to add each word and all of its lemmas from all diary entries to your word sets, and then look for intersections with your search string.
- [Instructor] Back in Xcode, I've already opened our NLPDiary project and, in it, the EntryFilter.swift file. The nice thing here is that to enhance our application with natural language processing, we only need to modify the setOfWords function. What we're going to do now has a lot to do with lemmatization, which you've already learned about: this function takes a string and produces a set of word forms from it.
These word forms include all of the words of the text and their lemmas, so that's what we're trying to build. Just to refresh your memory: we receive a string that we analyze in this function, and we have a language parameter, which is an inout parameter. Inout means that modifying the local variable also modifies the passed-in argument; without inout, the passed-in argument keeps its original value. So you can loosely think of an inout parameter as behaving like a reference and a normal parameter as behaving like a value, although technically Swift copies an inout value in and writes it back out when the function returns.
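The difference between an inout parameter and a normal one can be seen in a small, self-contained sketch. The function names `describe(language:)` and `fillInLanguage(_:fallback:)` are hypothetical and not part of the NLPDiary project; the point is only that changes to an inout argument are visible to the caller after the call returns, while a normal parameter is a constant copy inside the function.

```swift
// Hypothetical helpers illustrating inout vs. a normal parameter.

// A normal parameter is an immutable copy; the caller's variable is untouched.
func describe(language: String?) -> String {
    return language ?? "unknown"
}

// An inout parameter is copied in, may be modified, and is written back
// to the caller's variable when the function returns.
func fillInLanguage(_ language: inout String?, fallback: String) {
    if language == nil {
        language = fallback
    }
}

var detected: String? = nil
print(describe(language: detected))       // prints "unknown"
fillInLanguage(&detected, fallback: "en") // note the & at the call site
print(detected ?? "nil")                  // prints "en"
```

The `&` at the call site is Swift's reminder that the function may change the caller's variable, which is exactly what setOfWords relies on to report a detected language back.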
That's a quick explanation of what our function does and what we intend to do. We want to return a Set of strings, and we've already created one here, wordSet, which we return at the bottom. We're going to remove all of the code that is currently in between, because it only divides a sentence into its words. So we're getting rid of that and adding a lot more logic, starting with a tagger. You're already familiar with the basics of NSLinguisticTagger, and we're going to initialize one right here again with NSLinguisticTagger, passing tagSchemes and options.
We won't need any options, but we do add the tag schemes we'd like to work with. In our case, as I've already said, we want to work with the lemmas and also with the language, so these are the two tag schemes we're going to use. For the options, I'm just entering zero. Next, we have to define a range, which you're also familiar with, because we need to tell the tagger which range we'd like to analyze. Here I could use NSMakeRange or just NSRange, initialized with a start location, in our case zero, and a length, which is the length of our string. For that I'm using the string parameter's utf16 view and its count, because NSRange measures lengths in UTF-16 code units rather than in characters.
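Why the range length comes from `utf16.count` rather than `count` is easiest to see with a string containing an emoji. This is a minimal sketch; the sample entry text is made up for illustration.

```swift
import Foundation

let entry = "I saw geese 👍 today"

// Characters as a user perceives them (grapheme clusters).
print(entry.count)       // 19
// UTF-16 code units, which is what NSString and NSRange count.
print(entry.utf16.count) // 20

// Range covering the whole string in UTF-16 units, as handed to the tagger.
let range = NSRange(location: 0, length: entry.utf16.count)
print(range.length == (entry as NSString).length) // true
```

Using `entry.count` here would produce a range one unit too short, so the tagger would silently skip the end of the text.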
Then we can use our tagger and set its string property, assigning the string we receive in our setOfWords function, and then we can move on to language detection. First, we check whether we received a language through our parameter list. So we write if let language, assigning the language parameter, and if this succeeds, we define an orthography and assign it to our tagger.
So we write let orthography and initialize it with NSOrthography's defaultOrthography(forLanguage:), passing along the language we have as a parameter. If the language is nil, the tagger should instead identify the language of our string automatically. So we add an else statement, in which we assign the tagger's dominantLanguage to our language parameter; dominantLanguage returns the dominant language of the string set on our linguistic tagger.
And that's all there is to language detection. Now let's quickly build our project, and Xcode tells us that we should actually do something with the orthography we defined. So if we got a language as a parameter for our function, we should also give our tagger this orthography by calling its setOrthography function, passing along the orthography and the range we defined.
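The language-detection step can be sketched as a standalone helper. The name `configureLanguage(of:range:language:)` is hypothetical; in the video this logic lives directly inside setOfWords. The sketch assumes the orthography is set only when a language was passed in, as in the if/else described above, and it needs iOS 11 / macOS 10.13 or later, where NSLinguisticTagger is available.

```swift
import Foundation

// Hypothetical helper mirroring the if/else in setOfWords.
func configureLanguage(of tagger: NSLinguisticTagger,
                       range: NSRange,
                       language: inout String?) {
    if let language = language {
        // A language was passed in: tell the tagger which orthography to use.
        let orthography = NSOrthography.defaultOrthography(forLanguage: language)
        tagger.setOrthography(orthography, range: range)
    } else {
        // No language given: let the tagger identify it from its string
        // and report it back to the caller through the inout parameter.
        language = tagger.dominantLanguage
    }
}

let text = "I am not sure if you can buy geese as pets."
let tagger = NSLinguisticTagger(tagSchemes: [.lemma, .language], options: 0)
tagger.string = text // must be set before detection or orthography
let range = NSRange(location: 0, length: text.utf16.count)

var language: String? = nil
configureLanguage(of: tagger, range: range, language: &language)
print(language ?? "not detected")
```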
With that, we've completed the work for language detection. As you can see, we still need to do something with wordSet: our goal is to populate this Set with all the words in our string, but also with all the lemmas we've found. Therefore, I'm going to call our tagger's enumerateTags function with the range we defined, with unit: .word, with scheme: .lemma, because that's what we're interested in, and with options to omit whitespace and punctuation.
That's all the configuration I need, so I can now append my closure and name the parameters I'm interested in: first the tag, which in our case is the lemma we're looking for, and then the tokenRange. We don't need the stop pointer that we would also get here, so we're skipping it, and now we can implement the closure. First, we're again interested in our token, so let's define a constant for it, casting our string to NSString again, because we want to use the NSString-specific function substring(with:).
For the range, we of course use the token range, because it gives us the exact start and end position of each token our linguistic tagger has found. That's the first thing we do, and now each word of the text should be inserted into the result Set. So we take our wordSet and insert the token we've found, in lowercased form.
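The per-token step, cutting a token out of the string with an NSRange and storing it lowercased, can be tried without a tagger. In this minimal sketch the range is looked up manually with `range(of:)` as a stand-in for the `tokenRange` the tagger would pass to the closure; the sample sentence is taken from the demo.

```swift
import Foundation

let entry = "It was about one Goose."
var wordSet = Set<String>()

// Stand-in for the tokenRange the tagger would report for this word.
let tokenRange = (entry as NSString).range(of: "Goose")

// Extract the token using the UTF-16-based NSRange...
let token = (entry as NSString).substring(with: tokenRange)
// ...and store it lowercased so "Goose" and "goose" compare equal.
wordSet.insert(token.lowercased())
print(wordSet) // ["goose"]
```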
That was the first step; the second step deals with the lemmas. Here we need to check whether we got a lemma, so we use an if let statement, lemma, and check whether we really have one using the tag's rawValue. If we do, we use our wordSet again and insert the new lemma, also as a lowercased string. And that's really all there is to enhancing a search with natural language processing.
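Putting the steps together, setOfWords might look roughly like the sketch below. It is reconstructed from the narration, not copied from the project, so details may differ from the course files. It requires iOS 11 / macOS 10.13 or later, and newer SDKs may warn that NSLinguisticTagger is deprecated in favor of the NaturalLanguage framework.

```swift
import Foundation

// Sketch of setOfWords as narrated: all words plus their lemmas, lowercased.
func setOfWords(string: String, language: inout String?) -> Set<String> {
    var wordSet = Set<String>()
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma, .language], options: 0)
    let range = NSRange(location: 0, length: string.utf16.count)
    tagger.string = string

    if let language = language {
        let orthography = NSOrthography.defaultOrthography(forLanguage: language)
        tagger.setOrthography(orthography, range: range)
    } else {
        language = tagger.dominantLanguage
    }

    tagger.enumerateTags(in: range,
                         unit: .word,
                         scheme: .lemma,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange, _ in
        // Every word as it appears in the text...
        let token = (string as NSString).substring(with: tokenRange)
        wordSet.insert(token.lowercased())
        // ...plus its lemma, if the tagger found one.
        if let lemma = tag?.rawValue {
            wordSet.insert(lemma.lowercased())
        }
    }
    return wordSet
}

var language: String? = nil
let words = setOfWords(string: "I am not sure if you can buy geese as pets.",
                       language: &language)
print(words.contains("geese")) // true: the token itself is always inserted
print(words.contains("goose")) // depends on the lemma the tagger reports for "geese"
```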
All the code you've seen earlier in filterEntries, which is really simple and which you might already have used for a search feature in another app, stays exactly the same. Now let's try this NLP-enhanced search out in the simulator and see if it really works, so let's wait until the simulator has started. Search for the word goose, for example, and here we have "I recently read a book. It was about one goose." And another one is "I am not sure if you can buy geese as pets." So, as you can see, lemmatization works, and we now have a much better search feature, powered by natural language processing.
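The filterEntries code itself isn't shown in this video, but the matching idea from the summary, looking for an intersection between the query's word set and an entry's word set, can be sketched as follows. The function `entryMatches(query:entry:)` is hypothetical, and the entry's word set is written out by hand, assuming the tagger reported "goose" as the lemma of "geese".

```swift
// Hypothetical matching step: an entry matches when its word set shares
// at least one word form with the query's word set.
func entryMatches(query: Set<String>, entry: Set<String>) -> Bool {
    return !query.intersection(entry).isEmpty
}

// Word set a search for "goose" might produce.
let queryWords: Set<String> = ["goose"]
// Hand-written word set for "I am not sure if you can buy geese as pets.",
// assuming "goose" was reported as the lemma of "geese".
let entryWords: Set<String> = ["i", "am", "be", "not", "sure", "if", "you",
                               "can", "buy", "geese", "goose", "as", "pets", "pet"]

print(entryMatches(query: queryWords, entry: entryWords)) // true
print(entryMatches(query: ["duck"], entry: entryWords))   // false
```

`Set.isDisjoint(with:)` would express the same check without building the intersection; the version above simply mirrors the "intersection" wording.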