- [David] Hi, I'm David Powers, and welcome to this week's edition of PHP Tips, Tricks, and Techniques designed to help you become a smarter, more productive PHP developer. This week's technique is something that should appeal to typographers: smart quotes and apostrophes. In other words, converting straight quotes in text to curly quotes. I've got an example in this file. The string at the top of the script contains a combination of double and single straight quotes.
But some of the straight quotes are apostrophes like this one here. And then at the end of the sentence we've got a particular challenge. We've got some straight quotes surrounding the title of the play, but in the middle there's an apostrophe. And on line six I define a function called smart quotes which takes a single argument, the text where you want to convert the straight quotes into curly quotes. I'll come back to how it works in a moment. But if I scroll down to the bottom we can see here that in the HTML on line 30 I'm passing that string to the smart quotes function and then outputting it.
So let's load this page into a browser to see the result. I'll just zoom in a little bit so we can see it more clearly. The opening and the closing quotes, they've been paired correctly. We've got some more opening double quotes here and then we've got an apostrophe. That looks good. That apostrophe also looks good. And then this challenge at the bottom here, we've got an opening single quote there, we've got the apostrophe in the middle, that's correct, and we've also got the closing right hand quote there.
They all look good. So let's return to my editing program to see how the smart quotes function works. We'll scroll back up and take a look at it. Lines seven through to 13 define a series of variables that look rather like Klingon poetry. Then on lines 15 and 16 they're assigned to a couple of arrays called patterns and replacements. And these in turn are passed as arguments to the preg_replace function.
The preg_replace function uses Perl-compatible regular expressions to match patterns in a string and replace them. And a really useful feature is that you can use arrays for both the patterns and for the replacements. So anything in the string passed to preg_replace that matches the first regular expression, in other words double quotes, will be replaced by double replace. So up here, double quotes is a regular expression, and double replace is the replacement, and it uses Unicode code points.
Then anything that matches ls_quote will be replaced by ls_replace. And anything that matches apostrophe is replaced by rs_replace. And then finally, rs_quote: anything that matches that is replaced by rs_replace again. So what about all of this Klingon poetry? Well, if regular expressions give you a headache, spare a thought for me. I had to work all this out in the first place. And if regular expressions are a complete mystery to you, check out Kevin Skoglund's in-depth course Learning Regular Expressions.
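To make the array behaviour concrete, here is a minimal sketch using made-up patterns (they are not from the exercise files). It shows that preg_replace pairs each pattern with the replacement at the same position, and that the pairs are applied in order, each one working on the result of the one before.

```php
<?php
// Hypothetical patterns, chosen only to show how the two arrays pair up.
$patterns     = ['/cat/', '/dog/'];
$replacements = ['dog', 'bird'];

// Pair 1 turns "cat" into "dog"; pair 2 then runs on that result,
// so the newly created "dog" is replaced as well.
$result = preg_replace($patterns, $replacements, 'cat and dog');

echo $result, PHP_EOL; // bird and bird
```

Because each pair sees the output of the previous one, the order of the entries in the two arrays matters.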
I'm just gonna concentrate on what we have here. Let's start with double quotes. This is the regular expression that matches pairs of opening and closing quotes and everything in between. It begins with a double quote in parentheses, so it matches a literal double quote and nothing else. The parentheses make it a capturing pattern. Then we've added another capturing pattern with a character class in square brackets that matches anything except a double quote.
The plus and question mark together form a lazy quantifier, so the character class matches one or more characters but as few as possible. So this matches and captures everything up to the matching closing quote. The backslash one matches whatever was captured in the first capturing group, in other words, another double quote. The replacement string is in double quotes because it contains escape sequences. Backslash u 201C is the Unicode code point for an opening double quote.
$2 represents everything captured by the second capturing group. In other words, everything between the opening and closing quotes. Then we replace the closing quote with the Unicode code point for a closing double quote. Next we deal with left single quotes. This is the regular expression that matches them. The first part of the regex is a negative lookbehind with backslash w, the shorthand for a word character.
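Before going further into the single-quote patterns, the double-quote pattern and replacement described above can be sketched roughly as follows. This is a reconstruction from the spoken description, not a copy of the exercise file: the variable spellings and the sample sentence are assumptions. The `\u{...}` escape syntax needs PHP 7 or later.

```php
<?php
// Reconstructed from the narration; names and sample text are assumptions.
// (")       captures a literal straight double quote (group 1)
// ([^"]+?)  lazily captures everything that is not a double quote (group 2)
// \1        matches whatever group 1 captured, i.e. the closing quote
$double_quotes = '/(")([^"]+?)\1/';

// Double-quoted so PHP expands the \u{...} escapes; $2 is left alone by PHP
// (variable names cannot start with a digit) and is read by preg_replace.
$double_replace = "\u{201C}$2\u{201D}";

$text      = 'She said "hello" and then "goodbye".';
$converted = preg_replace($double_quotes, $double_replace, $text);

echo $converted, PHP_EOL;
```

Each pair of straight double quotes is replaced as a unit, so opening and closing marks stay matched.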
So this won't match a single quote that is immediately preceded by a character that can appear in a word. It's followed by a literal single quote and a positive lookahead, again with backslash w. So whatever follows the single quote must be part of a word. The replacement string is the Unicode code point for a left single quote. The parentheses around the lookbehinds and lookaheads don't form capturing groups.
So this simply replaces the straight single quote. Next we deal with the apostrophe. The regular expression is very similar to the one for the single left quote. It begins with a positive lookbehind for a word character, followed by a single quote and a positive lookahead for a word character. In other words, this is looking for a single quote between two word characters. This won't match an apostrophe at the end of a word, but that case is taken care of by the pattern for a right single quote.
The replacement text is the Unicode code point for a right single quote. And that replaces the single straight quote that's been matched. Finally, right single quotes. This is the regular expression. It begins with a positive lookbehind. The character class matches word characters and some punctuation. Then a single straight quote followed by a negative lookahead that won't match any character that can appear in a word.
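The three single-quote patterns described so far can be compared side by side. The sketch below is a reconstruction from the narration, so treat the details as assumptions: the exact punctuation in the right-quote character class is a guess, the `u` modifier is an addition, and `quote_offsets()` is a hypothetical helper that only reports which straight quotes each pattern would claim.

```php
<?php
// Patterns reconstructed from the spoken description (assumptions noted above).
$ls_quote   = "/(?<!\w)'(?=\w)/u";       // not after a word character, but before one
$apostrophe = "/(?<=\w)'(?=\w)/u";       // between two word characters
$rs_quote   = "/(?<=[\w.,?!])'(?!\w)/u"; // after a word character or punctuation, not before one

// Hypothetical helper: byte offsets of every quote a pattern matches.
function quote_offsets(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
    return array_column($matches[0], 1);
}

$text = "'Twelfth Night' isn't the dogs' favourite.";

$left  = quote_offsets($ls_quote, $text);   // opening quote before "Twelfth"
$mid   = quote_offsets($apostrophe, $text); // apostrophe in "isn't"
$right = quote_offsets($rs_quote, $text);   // after "Night" and after "dogs"

echo 'ls_quote:   ', implode(', ', $left), PHP_EOL;
echo 'apostrophe: ', implode(', ', $mid), PHP_EOL;
echo 'rs_quote:   ', implode(', ', $right), PHP_EOL;
```

In this sample no straight quote is matched by more than one pattern, because the lookbehind and lookahead conditions exclude each other.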
The replacement character is the same as for an apostrophe. It inserts a right single quote in place of the straight one. That was really some heavy lifting with regular expressions. In this version of the smart quotes function I've assigned the regular expressions and replacement strings to variables for ease of identification. But in this other file, optimized.php, I've passed them directly as arrays to preg_replace.
This produces identical results to the more verbose version. Now I can't guarantee it will work in absolutely every situation where you want to replace straight quotes with curly ones, but I think it's fairly robust. Let me know if you find situations where it fails. Well that's it for this week. I hope you found it useful and that the Klingon poetry didn't give you too big a headache. 'Til next time, thanks for watching.
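For reference, a compact version along the lines described might look like the sketch below. It is assembled from the narration rather than taken from optimized.php, so the function name spelling, the punctuation in the last character class, the `u` modifier, and the sample sentence are all assumptions.

```php
<?php
/**
 * Sketch of a straight-to-curly quote converter (requires PHP 7+).
 * Patterns and replacements are passed directly as arrays and are
 * applied in order: double quotes, left single quotes, apostrophes,
 * right single quotes.
 */
function smart_quotes(string $text): string
{
    $patterns = [
        '/(")([^"]+?)\1/u',         // paired double quotes and their contents
        "/(?<!\w)'(?=\w)/u",        // left single quote
        "/(?<=\w)'(?=\w)/u",        // apostrophe inside a word
        "/(?<=[\w.,?!])'(?!\w)/u",  // right single quote or trailing apostrophe
    ];
    $replacements = [
        "\u{201C}$2\u{201D}",
        "\u{2018}",
        "\u{2019}",
        "\u{2019}",
    ];

    // preg_replace returns null on error; fall back to the original text.
    return preg_replace($patterns, $replacements, $text) ?? $text;
}

$sample = '"Let\'s go," she said. We saw \'A Midsummer Night\'s Dream\'.';
echo smart_quotes($sample), PHP_EOL;
```

One case where this sketch falls short is an apostrophe at the start of a word, as in 'Til: it is not preceded by a word character and is followed by one, so it is treated as an opening single quote.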
Note: The exercise files are free to all members. The code is commented to enhance your learning, but you will need database connectivity for some files to run as intended.