Before getting into the detail of garbage collection, this chapter focuses on how Java can be efficient when you are creating strings, by placing them in a pool.
- [Instructor] In this chapter, we're going to spend some time understanding how the virtual machine's garbage collection process works. Although you have only limited control over the garbage collector, it's worth understanding what it's doing, as with this knowledge you can write code that avoids memory leaks. Or at least, we'll learn how to monitor our application's memory usage and the effectiveness of the garbage collector, and with this, we'll be able to detect and correct potential memory leaks. So let's start by considering when objects are eligible for garbage collection.
The stack is very efficient. Java can manage it very easily. It knows that as soon as a closing curly bracket is reached, it can pop variables off the stack that were created within the scope of the code block that's being exited. The problem, though, with the stack is that its scope is so tight. It's based on code blocks. Often, we want an object to live for a longer period of time than its enclosing scope. That is, we often want to share objects between code blocks.
Here's a simple example. In this code, we're passing an object, the customer reference by the variable c, from one method to another. The second method is referencing the object that we create in the first method. The actual object is being referenced here, not a copy of the object. If Java had put the data for the customer on the stack, it would become out of scope and therefore, invisible in the second method. So the fact that Java creates the customer object on the heap means that it's able to be shared, in this case, with the second method.
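The transcript doesn't show the on-screen code, so here is a minimal sketch of the kind of example being described. The `Customer` class, its fields, and the `rename` method are assumptions made up for illustration, not the course's actual code.

```java
class SharedObjectDemo {

    // Hypothetical Customer class, assumed for illustration only.
    static class Customer {
        private String name;

        Customer(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        void setName(String name) {
            this.name = name;
        }
    }

    // The second method receives a copy of the reference,
    // which still points at the same object on the heap.
    static void rename(Customer c) {
        c.setName("Sally");
    }

    public static void main(String[] args) {
        // The variable c lives on the stack; the Customer object lives on the heap.
        Customer c = new Customer("Bob");
        rename(c);
        // The change made in rename is visible here because both methods share one object.
        System.out.println(c.getName()); // prints "Sally"
    }
}
```

Because `rename` works on the original object rather than a copy, the name change is visible back in `main` once `rename` has returned.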
We've seen this already in this section of the course. We now know that's what the heap is for. It's a massive storage area for objects and the lifetimes of the objects on the heap are going to be very variable. Some might live only for a little while and some will live for a long time. If you've written code in other languages such as C, C++, Visual Basic, or Pascal, you might be aware that in those and other languages, you have the choice when you create an object, whether you want it to be stored on the stack or on the heap.
Java decided not to offer that choice, but to place all objects on the heap. The reason for this was that one of the design goals of Java was to simplify choices and where possible, to provide a single, clean way of doing things. Now, you might well argue that this is far from the Java that we know today, but that was the goal back in the mid-1990s. Actually, modern virtual machines are very efficient and clever. And if they detect that an object you're creating is not going to be shared, that is, it doesn't go outside the code block in which it's created, then the virtual machine will in fact create that object on the stack.
This is not something we need to know about. We won't see it and it won't impact anything we write. But I'm mentioning it just to point out that although Java doesn't give us any control over where objects are created, the virtual machine, in reality, makes the most efficient choice for us. So our code will generally run in an optimized way. There is a further optimization that virtual machines make and I'm going to take a moment to mention this one, as I think it's useful to know about.
I've created a blank project here in Eclipse, but I'm not going to be doing very much in here, so you don't need to create this for yourself. Just sit back and watch what I'm about to do. I want to create some simple code that's going to generate two strings with identical values. So let's have a string called one with a value of hello and a string called two with a value of hello. Now, from what we've learned, when this application runs, we expect there to be two variables on the stack pointing to two string objects on the heap.
Well, and this point only applies to strings, that's not quite true. It's not quite what happens. Because strings are immutable, that is, they can't be changed, Java is clever enough to know that it's safe to make both of these stack variables point to a single object on the heap, the same string object. When the second variable is created, Java reuses the existing heap object and points the second variable to the same object as the first.
This is known as string interning. Now, we can check that that is the case. I remember teaching in the Java Fundamentals course that you need to be careful when you're comparing strings to use dot-equals rather than equals-equals. We know that dot-equals tests value equality whereas equals-equals tests reference equality. So if these two variables are pointing to the same object on the heap, then equals-equals will return true.
Let's try it to see if that's the case. So if one equals-equals two, then we'll print out, they are the same object. And I'm going to put an else in. And if they're not, we'll print out, they are different objects. I just need to put some semicolons at the end of that line. And if I run this now, well, we can see they are indeed the same object.
So the virtual machine knows that there is no need to create a second string object with an identical value to the first. There's no harm in both of the stack variables pointing to the same object on the heap because strings are immutable. So although in our code, we think of two string objects having been created, actually, there's only one in reality. What actually happens with strings is that the virtual machine puts them into a pool and it will reuse the objects in this pool whenever it can.
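The demonstration just described can be sketched roughly as follows. The class and method names are assumptions, since the transcript only gives the variable names `one` and `two` and the printed messages.

```java
class StringPoolDemo {

    // Returns true when both literals refer to the same String object.
    static boolean literalsShareObject() {
        String one = "hello";
        String two = "hello";
        // == compares references, not contents.
        return one == two;
    }

    public static void main(String[] args) {
        if (literalsShareObject()) {
            System.out.println("They are the same object");
        } else {
            System.out.println("They are different objects");
        }
    }
}
```

Running this prints "They are the same object", because identical string literals are taken from the pool. In everyday code you would still compare strings with `.equals`; the `==` here is only a way to inspect object identity.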
Now, in general, this only happens with literal strings. It won't happen with strings that are calculated from something else. For example, let's create a further string. I'll call this one three, and I want to calculate it from something, so I'm going to create a new integer, give it a value, let's say 76, then call its toString method. And let's create a literal string with the same value. So we'll have string four equals the literal 76.
Well, if I compare string three and string four using equals-equals, let me copy the if statement down to there, and this time, we're comparing string three and four. If I run this now, we'll see that those second two strings are different objects. So Java hasn't been able to reuse the string it created for object three for object four. And that's because the string for object three did not get placed in the pool.
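A sketch of this second comparison is shown below. One assumption to flag: the narration creates a new integer object and calls its `toString` method, but the `Integer` constructor is deprecated in current Java versions, so this sketch swaps in the static `Integer.toString(76)` call, which also builds the string at runtime. The class and method names are made up for illustration.

```java
class CalculatedStringDemo {

    // Returns true only if the calculated string and the literal are the same object.
    static boolean calculatedSharesObjectWithLiteral() {
        String three = Integer.toString(76); // built at runtime, not taken from the pool
        String four = "76";                  // literal, placed in the pool
        return three == four;
    }

    public static void main(String[] args) {
        if (calculatedSharesObjectWithLiteral()) {
            System.out.println("They are the same object");
        } else {
            System.out.println("They are different objects");
        }
    }
}
```

This prints "They are different objects". The two strings have equal contents, so `three.equals(four)` would still be true; only the reference comparison fails.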
Now, there is a method on the string class to intern a string, that is, to force the virtual machine to place that string in the pool. And we can use this method where we think the string we're creating is important and is therefore sensible to reuse. The method's called intern and all I need to do is, at the end of the toString method, call .intern. If I run this again now, well, we'll see this time, they're the same object.
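Adding the `intern` call to the calculated string might look like this. As before, the class and method names are assumptions, and `Integer.toString(76)` stands in for the narrated new-integer-then-`toString` step.

```java
class InternDemo {

    // Returns true when the interned calculated string is the same object as the literal.
    static boolean internedSharesObjectWithLiteral() {
        // intern() returns the pooled String with these contents,
        // adding it to the pool first if it is not already there.
        String three = Integer.toString(76).intern();
        String four = "76";
        return three == four;
    }

    public static void main(String[] args) {
        if (internedSharesObjectWithLiteral()) {
            System.out.println("They are the same object");
        } else {
            System.out.println("They are different objects");
        }
    }
}
```

This time the program prints "They are the same object", since `three` now refers to the pooled string rather than the one built at runtime.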
Now, I've rarely seen the intern method used in reality. Java will automatically place literal strings into the pool anyway, so it's only these calculated strings that aren't placed in the pool. Now, the reason to use intern is that, of course, it's better for any strings that are going to be reused a lot to be in the pool, as this will minimize the number of objects created and needing to be garbage collected. We'll be saving the creation of lots of duplicate objects. But if we're having to use intern, there is, of course, the expense of running that intern method.
So what we've just seen here is a couple of different ways in which the Java virtual machine optimizes the creation of objects. It sometimes places objects on the stack and with strings, it might not create duplicate objects. But for now, let's put this information to one side and continue to work on the basis that all objects are created on the heap as separate objects. While we know it's not completely true, this is a good enough assumption, it's a good starting point to continue to learn about garbage collection.
- How memory works in Java
- Passing variables by value
- How objects are passed
- What are escaping references?
- How to avoid escaping references with collections and custom objects
- Garbage collection and generation sizes
- Detecting soft leaks
- Choosing a garbage collector
- Tuning a virtual machine
- Fixing a memory leak