Join Simon Allardice for an in-depth discussion in this video, Planning your database, part of SQL Server 2008 Essential Training.
It's very useful to have these sample databases to play around and experiment with, but let's face it: the main reason we want a database server is so that we can have our own database. But it would be a bad idea to start by just jumping into SQL Server Management Studio and going crazy with the right-click mouse button. That is how bad databases get started. So in the next few minutes, what I am going to do is talk about the basics of how to plan, or model, your database before we start to build it.
You see, in other areas of development, things like programming and web development, I'm a really big fan of the iterative, incremental design idea: you build apps quickly, you get them out, you revise them, you add new features repeatedly over weeks or even days. But that's not what I want to do with the database. You see, building a database is like getting tattooed. You really want it to be correct the first time you do it. Changes are possible, but they are painful.
The great thing, though, is that while this version of SQL Server, 2008 R2, might be new, relational databases aren't. They've been around since the '70s, and that's a good thing, as the methods for modeling a database have been battle-tested over four decades. To make a good one, you pretty much just follow a 40-year-old process. It doesn't really even matter what database management system you're using. Now, while you can use a diagramming piece of software like Visio, all you really need to model a database, at least initially, is pencil and paper, and to be prepared to think deeply about a few questions.
Database modeling is not the place to express your inner creativity and find wild and crazy new ways of doing things. If you want to get wild and crazy, do it in your user interface, in your application, but in your database, I am afraid you want to be patient, methodical, step-by-step. The first thing we need to do is ask ourselves a few questions. Number 1, what's it for? What is the point of this database? Be careful of the first answer that comes to mind.
Sure, in most cases, you're building a database to support an application, whether that's a desktop, mobile, or web application. But say you're building an online bookstore. It's way too easy to say that the database stores the necessary product and order information. Yes, that might be true, but you should be asking what the goals of the store, or the website, or the application are, because wherever you want to go over the next year or five should affect what you build right now.
Even having an elevator pitch or a mission statement in mind will help you build a better database. If the idea behind your online store was "We help customers find books, discover what others thought about them, purchase and track their orders, contribute their own reviews and opinions, learn about other products they might like based on people with similar reading habits," you'll build a very different database from that description than you would from the first one. The second question: what do you have already? Do you have an existing database, even in something like Access or FileMaker? If so, what's wrong with it? Because you don't want to just make the assumption that you're going to import all your existing data straight into this new system.
There are almost certainly problems with what you have. Take the opportunity to fix them rather than just re-creating those same problems in SQL Server. Do you have an existing process that this database will replace or help, even a manual process? If so, get all your physical assets together, printouts, order sheets, filing cabinets, and people, of course, because understanding the data that you already have is essential before you can answer the next question, which is: what tables do you need? Each database you make will consist of one or more tables, and these tables are the basic building blocks of a database.
You create a separate table for each entity, that is, each object or each thing that needs to be represented in your system. Some of your tables might represent things that exist in the real world, like a customer, a product, or an employee, but others can be more abstract: a blog entry, a comment, an appointment, or an event. Each of these tables will contain rows of information, one row per item, whether that's one employee, or a dozen, or a million.
This is one of those places where you could take a look for inspiration at some of the databases that are provided as samples. We can actually expand, say, AdventureWorks LT, the light version, and look at its tables here. They've got SalesLT.Customer, Product, ProductCategory, ProductDescription, SalesOrderDetail, SalesOrderHeader. Now, notice that the table names are prefixed here with SalesLT. That is something specific to SQL Server. It's what's called a schema, and it's a way of grouping your tables together into larger containers, if you will.
We are not going to do an awful lot with schemas, but to give you some idea of where that goes, if we look at one of the larger sample databases, the tables in here are grouped under several schemas. There are tables under the Person schema, such as Person.Address, Person.Contact, Person.ContactType, and we have Production and HumanResources schemas too. So the easy perspective to take is that it's just a way of grouping some of your stuff together. But we're defining fairly simple tables to begin with, and after figuring out what tables need to exist, you then zoom into each individual table to specify exactly what data should be stored for each entity.
So in this case, for an employee, you need to be as specific as possible, defining separate columns for each individual piece of information. As a rule, you're going to go as granular as possible: not just a name, but first name and last name, or even title and suffix if needed. As you are defining your columns, you're going to define the data type for each column. What is it? Is it text data? Or is it numeric? Is it a date or time? Or even binary data or XML? You are also going to define how big each column should be.
If it's a text column, for example, will it represent a few characters of a name? Or will it represent the contents of a thousand-page manuscript? Your database management system wants to know so it can be efficient about storing it. Is this column required or is it optional? Are there maximum and minimum values for it? Should it match a pattern like an email address, or phone number, or a credit card number? You see, flexibility is usually our friend, but it's not what you're looking for in your database. If you want to store an email address, you want to know it will always be an email address, not sometimes an email address, and sometimes a date, and sometimes an inspiring quote, and sometimes an MP3 file.
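The column questions above (type, size, required or optional, allowed range, pattern) can be sketched as a table definition. This is a minimal illustration only, run through Python's built-in sqlite3 module so it is self-contained; it is not SQL Server syntax. The `employee` table, its column names, the size limits, and the simplified email pattern are all assumptions made for the example. In SQL Server you would normally lean on declared types such as `nvarchar(50)` or `date` for much of this, whereas SQLite needs explicit CHECK rules to get the same effect.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        title       TEXT CHECK (length(title) <= 10),            -- optional, short
        first_name  TEXT NOT NULL CHECK (length(first_name) BETWEEN 1 AND 50),
        last_name   TEXT NOT NULL CHECK (length(last_name) BETWEEN 1 AND 50),
        email       TEXT NOT NULL CHECK (email LIKE '_%@_%._%'), -- rough pattern only
        hire_date   TEXT NOT NULL
                    CHECK (hire_date GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'),
        salary      NUMERIC CHECK (salary BETWEEN 0 AND 1000000) -- optional, bounded
    )
""")


def try_insert(row):
    """Return True if the row is accepted, False if a rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee (first_name, last_name, email, hire_date, salary) "
                "VALUES (?, ?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


# A well-formed row is stored.
accepted = try_insert(("Ann", "Lee", "ann@example.com", "2010-03-15", 52000))
# A date in the email column is refused by the pattern rule.
bad_email = try_insert(("Bob", "Ray", "2010-03-15", "2010-03-15", 48000))
# A required column left empty is refused by NOT NULL.
missing_name = try_insert((None, "Ray", "bob@example.com", "2010-03-15", 48000))

print(accepted, bad_email, missing_name)
```

The point of the sketch is that the rules live in the table definition, so every application that writes to the table gets the same checks.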
Defining your columns as exactly as possible means that SQL Server will enforce rules on those columns, your data will stay valid, and you won't end up with a database full of garbage. Next up, you need to define what keys you have, and that really comes down to: how do you get to a particular row? Each row in each table should have something called a primary key, and it's something that uniquely identifies that individual row. If we have an employee ID, it should take us to only one employee.
If we have a customer ID, it takes us to only one customer. So we define which column it is that contains our primary key. Now sometimes the key is already naturally in the data but often you'll need to generate a unique key and the database can help you with that. Then you'll define what relationships you need. You are splitting your database up into tables but many of these tables will need to know about each other. If you're creating an order, for example, that order would be typically connected to a particular customer and it will represent the purchase of one or more products.
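The idea of a generated primary key mentioned above can be sketched as follows. Again this is an illustration under assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, and the `customer` table and its columns are hypothetical. SQLite's `AUTOINCREMENT` plays the role here that an `IDENTITY` column typically plays in SQL Server.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated by the database
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL
    )
""")

insert_sql = "INSERT INTO customer (first_name, last_name) VALUES (?, ?)"

# Two different people who happen to share a name: the name is not a
# usable key, but each row still gets its own generated ID.
first_id = conn.execute(insert_sql, ("Ann", "Lee")).lastrowid
second_id = conn.execute(insert_sql, ("Ann", "Lee")).lastrowid

# A customer ID takes us to exactly one customer.
matches = conn.execute(
    "SELECT first_name, last_name FROM customer WHERE customer_id = ?",
    (second_id,),
).fetchall()

# Reusing an existing key value is refused.
try:
    conn.execute(
        "INSERT INTO customer (customer_id, first_name, last_name) VALUES (?, ?, ?)",
        (first_id, "Bob", "Ray"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first_id, second_id, len(matches), duplicate_rejected)
```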
We never want to duplicate data in our database. So we don't want to copy customer information into the order or copy product information into the order. We instead describe relationships between our order table and our customer and product tables. We'll shortly see how to define what those relationships are. They can be one to many, many to many, one to one, or none at all. You can then use these relationships to answer all sorts of questions. How many orders has a customer had? How many products were in a particular order? Or, going the whole other way, take a particular product and find all the customers who ever ordered it by going through the order table.
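Here is a rough sketch of the order, customer, and product relationships described above, and of the three questions asked of them. It runs on Python's built-in sqlite3 module as a stand-in for SQL Server. The table names, the `order_item` linking table used for the many-to-many side, and the sample rows are all assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- One customer to many orders: the order stores only the customer's key.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );

    -- Many orders to many products, resolved through a linking table.
    CREATE TABLE order_item (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        quantity   INTEGER NOT NULL CHECK (quantity > 0),
        PRIMARY KEY (order_id, product_id)
    );

    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO product  VALUES (10, 'SQL Basics'), (11, 'Data Modeling');
    INSERT INTO orders   VALUES (100, 1), (101, 1), (102, 2);
    INSERT INTO order_item VALUES (100, 10, 1), (100, 11, 2), (101, 10, 1), (102, 11, 1);
""")

# How many orders has a customer had?
orders_for_ann = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (1,)
).fetchone()[0]

# How many products were in a particular order?
products_in_order = conn.execute(
    "SELECT COUNT(*) FROM order_item WHERE order_id = ?", (100,)
).fetchone()[0]

# Going the other way: who ever ordered a particular product?
buyers = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT c.name
        FROM customer AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        JOIN order_item AS i ON i.order_id = o.order_id
        WHERE i.product_id = ?
        ORDER BY c.name
        """,
        (11,),
    )
]

# An order that points at a customer who does not exist is refused.
try:
    conn.execute("INSERT INTO orders VALUES (103, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orders_for_ann, products_in_order, buyers, orphan_rejected)
```

Customer and product details are stored once each; the order tables hold only keys, which is what lets the same data answer questions from either direction.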
After you've planned out these tables, these columns, these keys, and relationships, you can technically go ahead and build them in SQL Server, start adding some data, and see if it exposes any issues with your first design, and it typically will. You'll realize something needs to be stored differently or split out into its own table. There is something called database normalization, a set of guidelines and rules you can go through that will expose issues with your database. They are super important, but I'm going to talk about those after we've seen how to apply some of this in SQL Server.
- Using T-SQL (Transact-SQL)
- Managing databases with SQL Server Management Studio
- Understanding database normalization
- Using SELECT statements
- Building indexes
- Monitoring database size and integrity
- Backing up and restoring databases
- Creating functions and stored procedures
- Managing database permissions
- Creating and formatting reports
- Adding charts to reports
- Creating and executing a simple SSIS package