Validating input to a website is one of the most important security measures. Examine common data conditions that deserve consideration.
- Validating input is an important technique to ensure that only good data is allowed into your web application. As we just saw, regulating requests provides a first line of defense by examining the envelope around the data being sent to our servers. If the envelope passes inspection, then the data inside the envelope should be inspected next. Watching the data coming through well-known public pathways is one of the first steps to secure any website. Most hackers don't use secret back doors or unexpected zero-day exploits. More often, they use the standard data inputs, but send in malicious data.

Data validation determines whether the data being received as input is acceptable. This means you need to establish criteria to separate good data from bad data. What are your expectations for the data? What should be considered acceptable data? What should be considered unacceptable data? The answers to these questions will be different for every web application and may even change from page to page.

However, there are some common validations that are useful. The first is to validate the presence or the length of data. For example, a first name field should not be blank and should not be longer than 50 characters. Another common use is to ensure that strings are not longer than the space allocated in the database columns where they'll be stored.

The second is to validate the type of data. If a number is expected, the website should not accept a value that's not a number. If a file is being uploaded, we can check the file type. For example, a web application may want to only accept JPEG images or only PDF files.

The format of the data can also be validated to ensure that it matches an expected pattern. The most common example is to make sure that an email address looks like a legitimate email address. It should contain a username, an at symbol, a domain name, and end in a valid top-level domain like .com or .org.
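The presence, length, type, and format checks described above might be sketched in Python roughly as follows. This is only an illustration: the function names, the 50-character limit, the list of accepted upload types, and the email pattern are all assumptions made for the example, and the pattern is deliberately simple and does not cover every valid address.

```python
import re

# Hypothetical limits and patterns, chosen only for illustration.
MAX_NAME_LENGTH = 50
ALLOWED_UPLOAD_TYPES = {"image/jpeg", "application/pdf"}
# Simplified shape: username, "@", domain name, ".", alphabetic top-level domain.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
INTEGER_PATTERN = re.compile(r"-?[0-9]+")


def has_valid_length(value: str, max_length: int = MAX_NAME_LENGTH) -> bool:
    """Presence and length: not blank and no longer than max_length."""
    stripped = value.strip()
    return 0 < len(stripped) <= max_length


def is_integer(value: str) -> bool:
    """Type: the submitted text must consist only of an optional sign and digits."""
    return INTEGER_PATTERN.fullmatch(value) is not None


def is_allowed_upload(content_type: str) -> bool:
    """Type: accept only the listed file types.

    The declared type comes from the client, so a real application
    would also inspect the file contents.
    """
    return content_type.lower() in ALLOWED_UPLOAD_TYPES


def looks_like_email(value: str) -> bool:
    """Format: the whole value must match the simplified email pattern."""
    return EMAIL_PATTERN.fullmatch(value) is not None
```

Each function returns a plain boolean, so a form handler could collect the failed checks and report them together instead of stopping at the first one.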
Regular expressions are frequently used for validating the format of data.

We may also validate whether data is within a set of values. We could validate that it's a number between 1 and 10. Or we might validate that a value corresponds to a database value. Or we may validate whether data is excluded from a set of values. Validating for inclusion or exclusion in a list is a good place to use the allow lists and deny lists we discussed in the previous chapter.

We frequently want to make sure that an existing value is not being used again. A new user can't have the same username as an existing user. A blog post can't have the same URL as an existing blog post. So we can perform checks to confirm uniqueness. Often this validation requires making a database query to find out if a value already exists in the database.

When writing validations, make sure that your validation logic is correct. Every programming language has quirks and pitfalls that you need to watch out for. Testing for exact equality and testing whether a value is blank, empty, or not defined is often tricky. For example, in PHP, a validation might test for the presence of a value. If the value were NULL or an empty string, you would expect the test to return false and fail the validation. If the value were a non-empty string, you would expect it to pass the validation. But notice what happens when the value is the string "0". The test also returns false. Other numbers will return true and pass the validation, but zero is special and will fail. You can see why this would be a problem if you imagine a web form that asks how many pets you own. You can't answer zero.

So validating input is an important technique to ensure that only good data is allowed into our web application. It reduces software bugs that can become vulnerabilities, and it makes it more difficult for hackers to slip malicious data past our defenses.
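The inclusion, exclusion, and uniqueness checks, along with the presence pitfall, can be sketched in Python. Everything named here is hypothetical: the allow list, the deny list, and the `existing` set, which stands in for a database query. The pitfall shown is a Python analogue, not the PHP behavior itself. In Python the string "0" is truthy, but the integer 0 is falsy, so a truthiness-based presence test rejects a parsed answer of zero in much the same way.

```python
from typing import Any, Set

# Hypothetical lists, used only for illustration.
ALLOWED_SORT_ORDERS = {"newest", "oldest", "popular"}  # allow list
RESERVED_USERNAMES = {"admin", "root", "support"}      # deny list


def in_range(number: int, low: int = 1, high: int = 10) -> bool:
    """Inclusion: the number must fall between low and high, inclusive."""
    return low <= number <= high


def is_allowed_sort(value: str) -> bool:
    """Inclusion: the value must appear in the allow list."""
    return value in ALLOWED_SORT_ORDERS


def is_username_available(username: str, existing: Set[str]) -> bool:
    """Exclusion and uniqueness.

    `existing` stands in for a database lookup of current usernames.
    """
    return username not in RESERVED_USERNAMES and username not in existing


def naive_is_present(value: Any) -> bool:
    """Pitfall: relies on truthiness, so the integer 0 counts as missing."""
    return bool(value)


def is_present(value: Any) -> bool:
    """Explicit test: only None and blank strings count as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True
```

With the pet-count form in mind, `naive_is_present(0)` rejects a legitimate answer of zero, while `is_present(0)` accepts it and still rejects `None` and blank strings.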
- Threat models
- Least privilege
- Defense in depth
- Validating and sanitizing input
- Credential attacks
- SQL injection
- Cross-site scripting