Join Joey D'Antoni for an in-depth discussion in this video The importance of datatypes, part of SQL Server Performance for Developers.
- [Instructor] One of the most important steps you can take in designing your database for performance and developing your application to perform well and scale is choosing the right data types and while this seems like a very simple task, you want to put text in a text field and numbers in a number field, it can be a little bit more nuanced than that. Especially when you start dealing with internationalization and Unicode data. We don't always deal with Unicode data but as you're building and internationalizing your applications, it's something you want to design for and the other challenge around data types is that once you've built an application and designed the database, more importantly, started loading data into that database and your customers are using the system, it's extremely difficult to make changes to the data types and the schema, typically something that requires pretty large adage if you're talking about doing it against production data.
So it's not something that can be easily done and fixed and there are lots of scenarios where you can make these kind of tragic mistakes. Whether it's choosing the wrong size for a number field and having way larger growth than you expected or maybe that whole internationalization thing. Maybe you started out domestically in North America and you never expected to need Unicode characters. But then you moved into Asia or Europe where you needed them and you had to make change over time. So it's something you need to have vision for at the beginning and it's not necessarily the easiest thing to adjust for.
So as always, good performance starts with good design. This applies in race cars and it applies in databases as well. You're never going to have a database that performs exceedingly well if your initial design is poor. If you're doing the wrong kinds of joins and you're doing all sorts of conversions, you're going to have major, major issues and data types influence all sorts of things. So space use, log use, the size of your tables and indexes are all influenced by data type use. There are things you can do to work around this.
You can enable compression and you can get some benefits by doing that. But at the end of the day, if you're creating all of your data types to be in NVARCHAR MAX 'cause you're not sure how big everything is going to be, you're going to have a really inefficient database. On the other extreme of that, it can be tempting to use the smallest data types. You may want to save some space and you may say that we're never going to have more than 256 values in this column, or in this table, so I'm going to use the tiny intValue to categorize that. It's almost never worth it to use tinyint 'cause there's almost in any structure that you build the possibility to have more than 256 values exist.
I'd highly recommend avoiding the use of that data type. And as we talk about tinyint, let's talk about some of the data types in SQL Server. If we talk about the text types, the main four there, CHAR, NCHAR, VARCHAR and NVARCHAR. The N represents data types that support Unicode. You'll learn more about Unicode in video three of this module, but it's important to understand upfront if you think you're going to have to store Unicode or international type data in your database and make those fields NVARCHAR. There is a small space penalty associated with that, so that would be why you would want to choose those over each other.
And CHAR versus VARCHAR. CHAR would be what you choose when you'd have a fixed width field so you know exactly how wide your field is always going to be. A good example of this would be zip code in the United States. It's always either four, nine or 10 characters, depending on how it's defined. But it's always one of those values. It's never any more than that. Next we have our numeric types, with BIGINT, SMALLINT, TINYINT and INT. And those are all just integer types that have a larger or smaller range depending on the size. Like I say, you want to avoid TINYINT.
We also have DECIMIAL, NUMERIC and MONEY, and those are how we define decimal type numbers. We also have a whole series of different date time data types. One of the more common issues I've seen with this DATETIME problem isn't with the DATETIME data types. It's the fact that a lot of developers will like to store their dates in either some sort of numeric field or even worse, in a text field. And what that means is they have to do a conversion every time and those conversions can be quite expensive. Use the appropriate DATETIME or date field for your application's data needs and always reference that in your application code with the same data types.
What you're trying to avoid there is converting, 'cause conversions are quite expensive. We have binary, so we can store binary files in the database. There are some costs associated with that. In that, there's binary files will consume a large number of data pages. You'll learn more about that in another video in this module. And then finally, just a few other data types that we have, XML, which is very expensive to store and load in the database, and parse, but sometimes you have to do it. The database engine frequently uses XML itself.
UNIQUEIDENTIFIER, or also known as a GUID. This is a type of field that will guarantee uniqueness across machines, but it's very wide, and we'll talk about that in another module specifically. There's a TIMESTAMP data type which isn't actually a time stamp. DATETIME2(7) is an actual time stamp. TIMESTAMP has some other uses. CURSOR is considered a data type, as is TABLE and HIERARCHYID, which is what we use for referential hierarchies. Understanding what you need to do with data types and performance, by far the most important thing you can do to improve the performance of your application is make sure all the data types in your application code match the database data types.
You want to avoid conversions. What does that mean? So, for example, if you're storing national ID number in a database, you don't necessarily want to store that as a numeric data type because you're never operating on it, you're never doing math on someone's national ID number. You're never looking for the max national ID number. You're always just matching. So that should be probably stored as a character field, even though it's not intuitive. When possible, you want to use the smallest data types you can to more efficiently store things. So if you have the option of choosing an integer over a GUID for a primary key, that's ideal.
But sometimes you may have to use a larger data type. If you have to merge data from multiple databases into a single one, you may need a unique identifier for your primary key. Or you may just have a very, very large volume of data and you have to use calm types, like BEGIN. One feature you can take advantage of that was added to all additions of SQL Server starting with SQL Server 2016 SP1 is data compression and there's two main types of data compression when we're not talking about columns or indexes, and that's data and page compression and if you have to use larger data types, row compression can help with this.
Page compression will help you reduce space even more, row compression will mainly fix the poor data type choices you've had to make, but page compression will get even better compression from there so, compression is something you definitely want to think about using and just always remember that compression is setup at the object level and not at the database level. So you need to compress tables and indexes independently of everything else. What that means is you're going to want to choose the largest tables and indexes in your database to compress to get maximal performance benefit.
- Query execution
- How to read an execution plan
- What not to do with SQL Server
- Why cursors are bad
- Why scalar UDFs are expensive
- Datatypes and design
- What is a columnstore index?
- Transaction isolation