Google I/O 2009 – V8: High Performance JavaScript Engine


Ager: My name is Mads Ager. I'm a software engineer at Google, and I work on the V8 JavaScript engine. This talk is about the internals of V8: what did we do to make V8 fast, and what were our design decisions when building it?

Here's the agenda. The main part of the talk covers the overall design goals we had for V8 and the internals, the techniques we used to make V8 fast. That involves things like hidden classes, native code generation with inline caching, and precise generational garbage collection. That will be the main part of the talk, where I explain these things.

After that, I'll tell you about a couple of recent developments in V8. The first is that we implemented a new JavaScript regular expression engine completely from scratch. The other is that we've built a new compiler infrastructure. I'm just going to briefly mention those and tell you a bit about them.

Then I'll go into something that I think is really important. I've called it object heap scalability here, or JavaScript scalability. That's all about how well your JavaScript engine deals with running really, really big web applications where you have a lot of objects, and therefore extra pressure on your memory management system. I think this is really important, and I think it has been underemphasized so far, so I'm going to spend a bit of time talking about it.

At the end, I'll tell you a bit about some of the performance bottlenecks we have in V8 right now, which will give you an indication of the kinds of things we'd like to work on to make V8 even faster. Okay, so for this talk, the first question is really:
why did Google want to build a new JavaScript engine? The answer is pretty simple. We believe that in order to build the next wave of big web applications, you need better JavaScript performance. At the time the V8 project started, the existing JavaScript engines were not very fast: basically, they were all interpreters working on abstract syntax trees or on bytecodes, and they had poor memory management systems, which led to big pause times and maybe even memory leaks.

We wanted to try to change this picture. We wanted to push JavaScript performance, and we thought the best way to do that was to build a completely new JavaScript engine, so that we could do our own designs and start completely from scratch. The overall goal for V8 is to push the performance bar for JavaScript, not only for Google Chrome and for V8: we wanted to push the industry and see if we can get better JavaScript performance in browsers in general, which will benefit all of us. It'll benefit Google, because we can build better web applications, and it'll benefit users as well, because users like web applications too. So we wanted to build a really fast JavaScript engine.
So why is that a challenge? Well, JavaScript is a very, very dynamic language. Objects are basically just HashMaps: you can add and remove properties to and from objects on the fly, in any order you like, and you have no type structure. So if you want to do a property lookup on an object in JavaScript, you don't know anything statically about the object. If you have an object and you want to find the x property, you have to do exactly that: you have to search. You look in your HashMap; if it's not there, you need to traverse the prototype chain and look for it there. And that's really slow.

Compare that to static, object-oriented languages, where you have a notion of type, where you have classes. If you do a property load in a language with a notion of class, the class tells you the layout of your object. So if you want to access an x property on an object that has a certain class, the class tells you the index of that property in your object, and you can generate just a few assembly instructions that will load the property. If you implement JavaScript naively, you instead get a full search of a number of HashMaps in a chain, and that's really slow.

Right, so JavaScript is highly dynamic. You can add properties to objects, you can remove properties from objects, and you can actually change prototype chains dynamically as well. You can change basically everything in your program on the fly. Also, JavaScript has a couple of language features that can give you trouble. One of them is eval.
Eval is a function that you can call with a string, to evaluate that string as JavaScript code. That in itself is actually not so bad. The problem is that it can also change the calling context. When you're trying to implement JavaScript, you really want to have some scope information: if you access a property x somewhere, you want to be able to see in your scopes where x is declared. But if there's a call to eval, you basically don't know, because eval can introduce new bindings dynamically. It could introduce a new x variable in the context where you're calling eval. So you just have to give up and say: statically, I don't know.

There are similar issues with "with" statements in JavaScript. A "with" statement introduces a new object into your scope chain. If you want easy access to all the properties of the document object, you can say "with (document)" and then have a block of code where a bare name like "write" is looked up on the document object, so it actually means document.write. That's all fine, but if you want scope information, you again have to give up inside a "with" statement, because a name might look like a variable that's declared in an enclosing scope, yet dynamically the object introduced by the "with" statement might have a property with that name as well, and that's the one you need to get. So overall, JavaScript is very dynamic, and that makes it a bit of a challenge to implement efficiently.
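Here's a small illustration of both features (hypothetical code, just to make the problem concrete):

    function f() {
      var x = 1;
      eval("var x = 2;");  // eval introduces a new binding in the calling scope
      return x;            // returns 2; no engine could know that statically
    }

    function g(obj) {
      var write = function (s) { return "local: " + s; };
      with (obj) {
        return write("hi"); // obj.write if obj has one, otherwise the local function
      }
    }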
So let's talk about what we did for V8 and which design decisions we took. The goal for V8 was to create a JavaScript engine that makes large object-oriented applications perform well. If you have large object-oriented applications, you're going to use abstractions. And if you use abstractions, you're going to have a lot of property accesses, a lot of function calls, and a lot of objects. That means that in order to make these kinds of applications run fast, you need fast property access, you need fast function calls, and you need fast, scalable memory management: you want to be able to handle a large number of objects efficiently.
These goals led to three key components of V8. The first component is something called hidden classes and hidden class transitions. Hidden classes are basically our notion of types. JavaScript does not have any types, and we're not adding types to JavaScript or anything like that; but internally, in the JavaScript engine, we have a notion of type, and that's our hidden class. I'll tell you more about that. Once we have that, we can use a technique called inline caching: we generate native code for all of our JavaScript, and we also generate code on the fly. These two things together, hidden classes and native code generation with inline caching, give us fast property access and fast function calls. Those are the first two items. The third thing we wanted was a fast memory management system, so we use a generational garbage collector. It's pretty much standard, but it works really well, and having a generational garbage collector means we scale well to dealing with a lot of objects. I'll tell you more about that as well. Okay, so let's dig in.
To set the scene, let me start by telling you a bit about V8's memory model. In V8, we use 32-bit tagged pointers. All of the objects in our heap are 4-byte aligned, which means that if you have a pointer to a JavaScript object, the two least significant bits are going to be zero. We can use that knowledge to do pointer tagging: since all pointers should have the two least significant bits zero, we can actually put ones there, to attach some extra information to our pointers, and then we just make sure to strip them off whenever we need the actual pointer.

What we use that for is allowing 31-bit signed integers to be immediate values, distinguished from real pointers by tagging. If you load a value from the heap and it's even, it's actually not a pointer: it's an immediate integer value, and you get the integer by shifting one bit to the right, basically shifting the tag away. That means we can deal with integers efficiently, because they're just immediate values and we don't have to do any heap allocation to handle them. On the other hand, if you load a value from our heap and it's odd, so it has a least significant bit of one, then it is an actual pointer, and the pointer to the object is that value with one subtracted, that is, with the tag cleared.
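As a rough illustration, here's the tagging scheme as a toy model in JavaScript itself (illustrative only; the real thing is machine-level bit manipulation on addresses):

    // Toy model of V8's 32-bit tagging scheme. Small integers ("smis") keep
    // the low bit 0; heap object pointers get the low bit set to 1.
    function tagSmi(n)     { return n << 1; }   // only valid for 31-bit values
    function isSmi(v)      { return (v & 1) === 0; }
    function untagSmi(v)   { return v >> 1; }   // shift the tag away

    function tagPointer(addr) { return addr | 1; }  // aligned, so low bits were 00
    function untagPointer(v)  { return v - 1; }     // clear the tag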
Our base JavaScript objects consist of three words. The first word is a hidden class pointer; this is our notion of type, as I told you before, and I'll get into the details in a second. The two other words point to where we actually hold the values of properties. We split the property backing storage into two parts: one for properties that are named by strings, like x, y, and z, and one for what we call "elements", which are basically properties with integer names, like 0, 1, 2, and so on. All of our JavaScript objects are these three-word structures. The backing storage for properties and for elements can be in two different states, namely a fast case and a slow case. In the fast case, the backing storage is just an array where we can do direct indexing. In the slow case, it's going to be a dictionary, a HashMap.
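For example (the exact heuristics are V8 internals, so take the comments as approximate):

    var o = {};
    o.x = 1;          // named property: stored via the properties pointer
    o[0] = "a";       // integer-named property: stored via the elements pointer
    o[1] = "b";       // dense elements stay in the fast, array-like case
    o[100000] = "c";  // a very sparse index like this can push the elements
                      // into the slow dictionary case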
Okay, so now we're ready to talk about hidden classes. As I told you, in static object-oriented languages with classes, you can do really fast property access, because the class describes your object and you know exactly where to find its properties. We wanted the same ability for JavaScript: we wanted to be able to load a property in a few assembly instructions. To use these standard optimization techniques, we needed to introduce a notion of type, and for that we use hidden classes. Hidden classes are classes, but they're internal to the JavaScript engine: something we keep track of, that users don't know about. Hidden classes group objects that have the exact same structure.

Talking about hidden classes, it's easiest to just have an example. In JavaScript, all objects are created from functions. Here's a Point function, a constructor function that we can use to construct points: it takes two arguments and assigns them to the x and y properties of the resulting object. The goal is that any point we construct should end up having the point map, basically the point hidden class. JavaScript objects created in the same way should end up with the same hidden class, because they'll have the same internal structure. So in this example, I'm going to create two points and show you how they end up having the same hidden class.
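The constructor from the example looks like this:

    function Point(x, y) {
      this.x = x;  // adds x: triggers a hidden class transition
      this.y = y;  // adds y: triggers another transition
    }

    var p1 = new Point(11, 22);  // first construction builds the transition chain
    var p2 = new Point(33, 44);  // second construction just reuses it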
Since all JavaScript objects are created from functions, we create an initial class for each function. The Point function here gets an initial class, class zero, which I've put up there. This is the hidden class that we're going to give to all new objects created from the Point function. When we create our first point, p1, that's this three-word thing with a pointer to a hidden class. Since this object was created using the Point function, its initial class is going to be the initial class of the Point function, class zero. It has properties and elements pointers as well, and in the beginning we allocate a small piece of backing storage for both properties and elements. That's the initial thing you get when you say "new Point". Then we start executing the body of the Point function with this newly allocated object as the "this" pointer.

The first thing we do is add an x property to this object. Adding a property to an object changes the structure of the object, which means we need to change the hidden class of the object. So when we assign to x, we create a new class. We do that by copying the original class, class zero, and adding to it the knowledge that we have an x property, and that the property is at offset zero in our backing storage. Then we put the value into the backing storage at offset zero. And importantly, we remember in class zero that if you have an object with class zero and you add a property named x, you need to transition to class one. That's what we call a hidden class transition.

A similar thing happens when we add the y property. Adding a new property to an object changes its structure, and therefore it needs to change its hidden class. We create a new hidden class by copying class one, add the knowledge that we have a y property whose value is at offset one in our backing storage, fill in the value, and update the hidden class pointer to point to this new class. And again, we remember in class one that if you have an object with class one and you add a y property, you need to transition to class two.

That finishes the construction of our first point. As you can see, constructing the first object from a constructor function is actually pretty heavyweight, because we need to create all of these new classes by copying existing ones and adding a bit of information. But that cost is only paid on the first construction.
On the next construction, we basically do the same thing, but we reuse our classes. The construction of p2 creates a new object, again a three-word thing, whose initial class is the class associated with the Point function, class zero. Then we execute the body of the Point function. The first thing we do is add an x property to our object. This point has class zero, and class zero records the information that if you have an object with class zero and you add a property called x, you need to transition to class one. So that's what we do: for the assignment to x, we just update the hidden class pointer and put the value in the backing storage. Similarly for y: we now have an object of class one, and class one knows that if you add a y property, you need to transition to class two. That's all we do: update the hidden class pointer and add the value to the backing storage.

So now we've created two point objects. They were created in exactly the same way, and they end up sharing the hidden class, class two. The important thing is that class two has a complete description of our objects: we know that all objects with class two have x at offset zero in the backing storage and y at offset one, and we can use that to optimize property accesses.
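As a toy model of the bookkeeping (illustrative only, not V8's actual data structures):

    // A hidden class records property offsets and the transitions out of it.
    function HiddenClass(parentOffsets, newName) {
      this.offsets = Object.assign({}, parentOffsets);  // copy the parent layout
      if (newName !== undefined) {
        this.offsets[newName] = Object.keys(this.offsets).length;
      }
      this.transitions = {};  // property name -> hidden class to transition to
    }

    HiddenClass.prototype.addProperty = function (name) {
      if (!this.transitions[name]) {  // first construction: create the new class
        this.transitions[name] = new HiddenClass(this.offsets, name);
      }
      return this.transitions[name];  // later constructions: just follow it
    };

    var c0 = new HiddenClass();
    var c1 = c0.addProperty("x");  // C0 -> C1, x at offset 0
    var c2 = c1.addProperty("y");  // C1 -> C2, y at offset 1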
So let me revisit my previous statement. I said that JavaScript is really challenging to implement because it's really, really dynamic, and that's true: JavaScript has a lot of features that allow you to write really dynamic code. But it turns out that if you write nice, well-factored code, in most cases you'll have parts of your code that deal with objects of a certain structure. Now that we have hidden classes, we can instrument our JavaScript engine to collect information about the hidden classes that reach particular access sites. And it turns out that in 90% of cases, or maybe even more, only objects with the same hidden class are ever seen at a given access site. That means that once you've seen one object at an access site, you basically know the structure of the objects that will show up there, and that's what we're going to use to optimize. At runtime, JavaScript is not that dynamic: you basically see things with the same structure again and again and again. So we can use that.
The only problem is that when we generate the native code for our JavaScript code, we don't know the hidden class of an object at an access site, because we have no type information in the language. We only learn something about our objects when we see the first object at an access site. To handle this, we use runtime code generation, and a technique called inline caching.

Here's a very simple picture of what inline caching is like. In the beginning, we just generate native code for our JavaScript code, and at every access site, we have absolutely no knowledge. So if we want to load an x property, we just call a piece of code that does a completely generic full lookup through our objects to find the property. But once we've done that once, we have some extra information: we know the hidden class of the object we've just looked at, and we know where we found the property in that object. Based on that knowledge, we can generate a small snippet of new code that's going to be really fast, and that assumes that the next time we get here, we'll see the same hidden class again. Then we rewrite the code so that this access site no longer goes to the fully generic code, but actually hits the fast lookup code. The only thing the fast lookup code has to do is validate our assumptions: it checks that the next object that gets here actually has the same hidden class as the first one, and if it does, it knows exactly where to load the property value from. If the assumptions are broken, if we get here with another type of object, we bail out and go to the fully generic lookup again.

And that's the same for all access sites: when we've seen the first object at an access site, we generate a new piece of code that does a fast lookup the next time, under the assumption that we get the same hidden class again. If we don't, we bail out and do a fully generic lookup.
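In JavaScript terms, an inline-cached load site behaves roughly like this sketch (using the toy object model from above, where an object carries a hiddenClass and a fields array; the real thing patches machine code in place):

    var cachedClass = null;  // uninitialized: no object seen at this site yet
    var cachedOffset = 0;

    function loadX(obj) {
      if (obj.hiddenClass === cachedClass) {
        return obj.fields[cachedOffset];          // fast, monomorphic path
      }
      var offset = obj.hiddenClass.offsets["x"];  // generic lookup (simplified)
      cachedClass = obj.hiddenClass;              // cache what we learned
      cachedOffset = offset;
      return obj.fields[offset];
    }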
Okay, so what does the fast case look like? It looks like this, and I've actually given you all of the information you need to completely understand what's going on here, so let me step through it. This is the small snippet of code we generate to access a property on an object. The first line loads the receiver from the stack into register eax. As I told you, we use tagged pointers, so now that we've loaded something, we need to check whether it's an object or an immediate value, maybe a small integer. That's what the second line does: it tests whether the least significant bit of eax is one. If it is, this is an object pointer. If it's not, it's an immediate integer value, for which this snippet doesn't do anything, so we jump out and go to the generic case. If we see an integer here, we just bail out. So the first lines load the receiver and check that it's an object.

The next couple of lines validate our assumptions, and our assumption is that we'll see the same hidden class again. This piece of code was generated with the knowledge that the hidden class of the first object that got here was at the address shown there, 0xf78 and so on. So now we need to check that the hidden class of the current object is the same one. We load from eax minus one, which removes the tag from the pointer and loads the map pointer, and we compare that to what we saw before. If it's the same, we're happy: we load the properties array from our object, load the property value directly from the properties array, and return.

So in the fast case, all we need to do is check our assumptions: load the receiver, check that it's an object, check that it has the correct hidden class, and then directly load the property. That's a few assembly instructions, and it's really, really fast. If any of our assumptions break, if we either get an integer or the class does not match, we bail out and go to a fully generic lookup.
Okay, so I've now told you about two states that our inline caches can be in. There's actually a third as well. The three inline cache states are: uninitialized, which is where everything begins, when we haven't seen any objects yet; monomorphic, which we transition to once we've seen the first object, meaning we've only seen one hidden class at this access site; and, if we see more hidden classes at the same access site, megamorphic, which just means we've seen several types here. In the megamorphic case, we actually use a cache of generated fast-case stubs: if we've already generated a stub for the hidden class and property we're seeing, we use it, and otherwise we go to the full generic lookup. So even if we see multiple different classes at the same access site, we still try to use some of our optimized stubs.

Another thing worth knowing about inline caches in V8 is that we clear them on full garbage collections. That might sound odd: we've just spent time generating code on the fly, and now we throw it all away. The first reason is that it allows us to get rid of unused code stubs. All of the code we generate is allocated in the JavaScript heap like any other JavaScript object, and it flows freely around. In order to be able to throw a stub away, we need to clear inline caches so there are no pointers to it. So if you have an application with a couple of stages, where in the first stage you set up a lot of stuff with one piece of code and then you transition to another phase where you basically use a different piece of code, we don't want to keep all the code objects from the first phase alive: if they're not going to be used anymore, that's a waste. Clearing inline caches at full garbage collections allows us to get rid of those unused code stubs.

The other thing that clearing inline caches at full garbage collections gives us is the ability to go back to the fast case. Say we've hit the slow case because we've seen multiple different hidden classes at an access site, but the program has since transitioned to another phase where we're only seeing objects with the same hidden class. We don't want to get stuck in the situation where we think we've seen multiple things. Clearing inline caches on full garbage collections gives our inline caches the ability to go back to the fast monomorphic case again.
Okay, so that was an introduction to how we make property accesses fast. There are two pieces: hidden classes with the transitions between them, and native code generation with inline caching.

The next thing I'd like to tell you about is our memory management. We use a precise generational garbage collector. What we really want is a JavaScript engine that scales well: if you have a lot of objects and big data structures when you run your program, we don't want the garbage collector to have to look at all of those objects every time it has to reclaim some memory, because that leads to big pause times and degrades your performance; every time you need some free memory, it would have to look at everything. So we don't want that.

Instead, we use generational garbage collection. We have two generations: a young generation, which is one small contiguous space that we garbage collect often, and an old generation, which is divided into a number of spaces and which we only occasionally garbage collect. The reasoning is that in object-oriented programs, most objects you allocate die really soon; you have a lot of temporary objects. The ones that do not die immediately are often long-lived. We allocate all of our objects in the young generation. When we run out of space in the young generation, we need to do a garbage collection, but we only garbage collect the young generation. Since most objects that are allocated die really young, there will be a lot of dead objects in the young generation, so we can get rid of most of them and be ready to allocate again. There's no reason for us to look at the old generation if we can free enough memory from the young generation. Objects that survive in the young generation are promoted to the old generation, and then we continue.
to the old generation and then will continue. Okay, so, um, the old
generation, as I said, is divided up into a number
of spaces. And I’m not going to go
into a lot of details with that. But, so–so the reasons
for having multiple spaces is basically to be able to, uh, um, manage the–
the bookkeeping in our memory management
system. So for instance,
uh, we need–we have pages that are 8, uh, 8K big. So if we have objects
that are larger than that, then we’re not going
to allocate them in the normal way. So we have a special space
for large objects. Uh, and also,
uh, as I said, we’re dynamically generating
native code on the fly. Um, so we need one space
that’s executable, uh, where we generate
our native code and run it from there. Um, so that’s
the code space. And then there are a number
of other spaces. Good. So which types of garbage
collection do we use? Um, I’m sorry about that. My slides
are a bit messed up here. Uh, I think we’ll be fine. So, um, we have three types
of garbage collection. So the first one
The first one is a scavenge, and on a scavenge collection, we only look at the young generation. This is what we do all the time. It's a copying collection, which means its cost is linear in the size of the live data. As I told you, a lot of objects die really young, so most of the data in the young generation is going to be dead. Since this garbage collection technique is linear in the size of the live objects, it's going to be really fast, because there won't be much live data. Typical pause times for a scavenge are one to two milliseconds, and we do it all the time.

Once in a while, we've promoted enough objects that we should try to clean up the old generation as well, so once in a while we do full garbage collections, and we have two flavors of those. The first is a non-compacting collection: a mark-sweep collection of both the young and the old generation. In the non-compacting version, all free memory gets added to free lists, and we allocate from there. This can cause fragmentation, and fragmentation is not good; that's why we have another kind of collection as well. A typical pause time for a full non-compacting collection on really big applications is up to 50 milliseconds. Often you'll see a lot less, but if you're building really big applications, you can see pause times up to 50 milliseconds.

The last type of garbage collection we have is a full compacting collection: a mark-sweep-compact collection of both generations. Instead of adding free memory to free lists, we compact everything: we move all the objects to the beginning of pages to get rid of the fragmentation. Inside V8, we keep track of how much fragmentation we've created, and when we reach a certain threshold, we do a compacting collection to get rid of it. Getting rid of fragmentation means we need to copy a lot of objects, so if you're building a really large application, you might see pause times of up to 100 milliseconds here.

Yeah? Oh, the question is how big the young generation is.
It's not very big. Honestly, I don't even remember exactly; I think it's a megabyte or two, something like that.

So, I'm not sure I understand the question. You're saying that if you have an application that allocates a lot of small objects, the young generation is going to be scanned frequently? Well, no, not really. If you allocate a lot of small objects, yes, they're going to be allocated in the young generation. When you fill up the young generation and you don't have any more room, you need to garbage collect, and at that point you look at all the live data. It's true that if all of your objects are still live, you're going to look through all of them, basically. But if they survive, they're moved to the old generation, and you're not going to look at them again. So yes, you're right: as long as you just allocate and keep everything alive, objects are allocated in the young generation, you scan all of them, and you basically move them to the old generation so you don't have to look at them the next time.

Yeah? Oh, the timing of these? These are all on desktop machines, basically running inside Chrome on a modern desktop machine. Okay.
So that covers how we made V8 handle property accesses and function calls really well and be fast at that, and how we made V8 scale by using generational garbage collection. Now let me talk a bit about some recent developments.

Recently, we implemented a new regular expression engine for V8, called Irregexp. Initially, just to get V8 up and running, we used PCRE, which is a standard library for regular expression matching. The problem with PCRE is that it does not match JavaScript regular expression semantics, so you need something else: either you modify PCRE, or you add a compatibility layer that translates JavaScript regular expressions into equivalent PCRE regular expressions and then uses PCRE. Luckily, at WebKit they had this problem as well, and they created a library called JSCRE, which matched JavaScript semantics. So we switched to that: the initial version of V8 used a library from WebKit called JSCRE.
But JSCRE did not fit V8 that well, so it didn't really perform well. The reason is that JSCRE expected a different string representation than most strings in V8, so in order to hand a string off to JSCRE, we would most of the time have to do a string conversion, and that's really costly. So this really didn't perform very well, and we thought we could do better; we also thought we could do better on the algorithms for regular expression matching. So we implemented our own regular expression engine, which has given us a 10x speedup on regular expression matching on our benchmarks. Our new library implements all of JavaScript regular expressions; there's no fallback to any other engine. It's a full JavaScript regular expression implementation.

So what's in there? First of all, the new regular expression engine is based on an automata approach. From an input regular expression, we build an automaton; then we perform analysis and optimization on that automaton; and from the resulting rewritten automaton, we generate native code. So all of your regular expressions are now executed as native code. In the analysis and optimization phase, we use a number of tricks to make regular expression matching cheaper: we try to avoid backtracking, and we try to reorder operations so that we perform the least expensive operations first.

When you do regular expression matching, most of the time you do eventually find a match, because you actually know it's going to be there somewhere. But you start from the beginning, and you may have to scan through a lot of input. So the common case at any given position is that you don't have a match: at the first position, you're probably not going to match, and so on. Reordering operations to perform the least expensive ones first gives us the opportunity to fail as quickly as possible, which is actually pretty important when you do regular expression matching. Okay.
Let me give you a couple of examples. One of the things we do to avoid backtracking is to search for the common parts of alternatives first. If you want to match either "Sun" or "Mon", the common part of this regular expression is that the character at index two is going to be "n". So if you're at the current position and the character at index two is not an "n", you're done: you don't have to do anything more, because you know it can't match here, and you move on. The more traditional approach would be to start with the first alternative, trying to match "Sun": you might find "S", you might find "u", and then you might not find "n". At that point, all you know is that you didn't match "Sun", so you have to try the other alternative, which will fail on the "M". By looking for common parts first, we just have to do one comparison and we know that neither alternative can match here. That avoids a lot of backtracking.
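In JavaScript terms, the check the engine performs for /Sun|Mon/ at a given position looks something like this sketch:

    // Both alternatives share "n" at index 2, so test that single character first.
    function matchesSunOrMonAt(s, i) {
      if (s.charAt(i + 2) !== "n") return false;  // one cheap test rejects both
      var head = s.substring(i, i + 2);
      return head === "Su" || head === "Mo";
    }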
Another nice thing about our new regular expression engine relates to strings: in V8, we have an ASCII representation for most strings, because a lot of the strings actually used in JavaScript programs are still ASCII. That means that with a single compare instruction, you can match up to four characters at a time. To search for "foobar" in an ASCII string, you load four bytes into a word and compare them against the bit pattern for "foob", then load the next two characters, two bytes, and do a two-byte compare for those. Being able to match four characters at a time is really nice, and really fast.
Okay, here's an example that shows you something about reordering. Here's a regular expression that looks for "foobar", where it can be spelled with either a lowercase or a capital F and a lowercase or a capital B, and it captures whatever was matched. We want to perform the cheap operations first, so we begin by matching "oo" and "ar" at positions one and four, because those are just simple character comparisons, and those are really quick. If one of them fails, we can skip all the rest. Next, we look for either "f" or "F" at position zero, and for "b" or "B" at position three. And only then do we perform the actual capture operation. The reordering here is really just about doing the cheap operations first, because that allows us to fail early and quickly.
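Reconstructed from that description, the expression is something like:

    var re = /([fF]oo[bB]ar)/;
    // Checked roughly in this order:
    //   1. "oo" at offsets 1-2 and "ar" at offsets 4-5 (cheap exact characters)
    //   2. [fF] at offset 0 and [bB] at offset 3 (character classes)
    //   3. only then is the capture recorded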
Good. So that's our new regular expression engine. Another thing we've done recently is change our compiler. The original compiler was extremely simple: it just built an abstract syntax tree, did no static analysis of any kind, did no register allocation, and generated code in one pass over the abstract syntax tree. Period. We've now implemented a new compiler infrastructure that performs register allocation. It's still a one-pass compiler: with JavaScript, you get all of your code over the network, so your JavaScript engine needs to be able to start running the code quickly, which means you need a fast compiler. But now it does its best to do register allocation as well. We're pretty excited about this, and we think it will form the basis for further native code optimizations in V8.
Okay, so that's what I wanted to say about recent developments. Now I'm going to talk about something quite different, but something I think is really, really important: namely, how well your JavaScript engine scales to deal with very large applications.

Why is that important? Well, web applications are becoming bigger and bigger and more complex, and with the increased complexity, you get more objects. More objects put extra stress on your memory management system, and if you do not have a system that scales to handle all of these objects, your performance will suffer. Not only are applications becoming bigger, which means more objects, but also a lot of browsers are still single-process, and in most single-process browsers, when you open multiple tabs with multiple applications, all of the JavaScript objects are allocated in the same JavaScript object heap. So you actually get a lot of objects in your JavaScript heap, and if your memory management system has to look at all objects on every garbage collection, that's going to hurt you. So it's really important that your JavaScript execution is good and fast in these situations as well, and it's important now, not only in the future.

Our approach to scalability is to use generational garbage collection, as I've just told you about. I've done a little experiment here, and I should say right away that it's really, really artificial, so take all of this with a grain of salt. But it does illustrate my point.
My scalability experiment was to take the raytrace benchmark from the V8 benchmark suite and run it a number of times, and on each iteration, allocate some extra memory in the JavaScript heap: the data structure I allocate adds one megabyte of extra live data per iteration. Then I map out the performance, the benchmark score, over these iterations, with an increasing amount of objects in the heap.
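Roughly, the harness does something like this (a hypothetical reconstruction; runRaytraceBenchmark stands in for the real benchmark driver):

    var extraLiveData = [];
    for (var i = 0; i < 30; i++) {
      // retain roughly one more megabyte of live data on each iteration
      extraLiveData.push(new Array(131072).fill(i));
      console.log("iteration " + i + ": score " + runRaytraceBenchmark());
    }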
The result looks like this for three browsers. As you can see, the performance characteristics are quite different. What we want to get to is basically a straight line. Chromium running V8 is not getting a straight line, but it's also not dropping off completely, and that's pretty much what you get when you use generational garbage collection like we do: larger heaps do cost something, but your performance does not completely degrade. Another interesting thing you can see in this graph is the downward spikes. Those spikes are where you hit the full compacting collections: if you actually get to a point where you need to do a full compacting collection, that's going to be visible on graphs like these.

The interesting thing here is that Firefox 3.5 beta 4 seems to be doing very well on scalability, though the benchmark score itself is not that impressive. Safari does really well on the benchmark score, but there seems to be something like a full scan of the heap on each garbage collection, because if you add extra data to the heap, then already at around five megabytes it's at a third of the speed it had in a freshly started browser.

I think this really matters, because this is a benchmark, one that's in our own benchmark suite, and it tells you something about how valid those benchmark results are in the real world. If you have a single-process browser and you open multiple tabs, that's going to lead to a lot of objects, and then your performance is going to be more like what's on the right of this graph.
So I think we need to put some more emphasis on this. It's really important for real-life situations, having many tabs or having big applications. To put a bit of focus on it, we've published a new benchmark in the V8 benchmark suite, which again is completely artificial. It builds a splay tree roughly 30 megabytes in size, something like that, and then performs operations on the splay tree, which means it allocates new nodes, modifies the splay tree, and garbage collects nodes that have been removed. We really want to put some emphasis on this: we want browsers to be scalable, and we want people to actually get the JavaScript performance they expect when they run the JavaScript benchmarks out there; those should basically reflect what you get in reality. Again, my experiment is completely artificial. If you want to try this yourself, take your browsers, load a number of tabs, run benchmarks, or try something else: write really big web applications and see how well JavaScript behaves under a bit of stress.

Good. That finishes my scalability rant. Let me tell you a bit about the performance bottlenecks I see for V8 currently.
One of the bottlenecks we have is that we only generate one fully general version of all the code we generate. That means we're basically not allowed to bake many assumptions into our code, because we need to handle all the special cases in the code we generate. So our code does a lot of checking for exceptional situations, situations that basically never happen in reality. One thing we should explore is generating multiple versions of the same code: in the beginning, generating a really optimized version that avoids a lot of the checking, and then, if some of our checks fail, bailing out to a slower version of the code, generated at that point. That should give us a big performance gain.
Another bottleneck is that our use of inline caching is based on calls to stubs. We generate these stubs and then use a call instruction to actually get to that code. That's good for code size, because there's just a call instruction in the code stream, but it's not necessarily the best thing we can do for performance. In some situations, we might want to inline the code directly in the code stream instead: rather than the call, we'd have the code right there. And that actually matters: we've done it for a couple of simple cases of loads, and it really does make a difference. The downside is code size: your code becomes bigger when you inline.
Okay, another issue, related to doing more inlining, is our write barrier. Since we use generational garbage collection, and we garbage collect the young generation often, we need to know about all pointers into the young generation, and that includes all pointers from objects in the old generation to objects in the young generation. That means that whenever we store a pointer into an old-generation object, we might have to record somewhere that this object is actually referring to a new-space object. That's done by something called a write barrier. Right now, our write barrier is a bit complicated: it has a lot of instructions, and it's a bit slow. We should experiment with new ways of implementing our write barrier. The reason this is related to inlining more code is that since our write barrier is pretty big, it's not really practical to inline stores directly in the code, because all stores would need the write barrier inlined as well, and that would lead to a big increase in code size.
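Conceptually, a write barrier is a check like the following sketch (a toy model; the generation predicates and remembered set are hypothetical stand-ins for real heap machinery):

    var rememberedSet = new Set();  // old-generation objects pointing into new space
    function inOldGeneration(o)   { return o && o.generation === "old"; }
    function inYoungGeneration(o) { return o && o.generation === "young"; }

    function writeField(obj, index, value) {
      obj.fields[index] = value;  // the actual store
      if (inOldGeneration(obj) && inYoungGeneration(value)) {
        rememberedSet.add(obj);   // record the old-to-new pointer for the next scavenge
      }
    }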
Okay, and the last thing here I don't really want to talk much about. But basically, there are a lot of global loads in JavaScript code, and we currently handle global loads the same way as any other loads. We don't have to, so we could generate better code for that. Another thing I didn't put up here, but which is also really important, is DOM performance. We haven't done much in that area to really optimize it, so that's another thing we should do: look more at DOM performance, the interaction between JavaScript and the DOM.

Good. So to summarize: V8 was designed both for speed and for scalability.
The goal of V8 is to raise the performance bar for JavaScript. We want to continue to push and see if we can make JavaScript even faster, and we don't want to push just for Google and Google Chrome; we really want to see if we can inspire other JavaScript vendors as well. That's why our full source code is available under a BSD license. Unfortunately, the URL scrolled off my slides here, but it's at code.google.com/p/v8; it should be pretty easy to find.

The graph I've put up here, which I'd like to end with, shows what has happened to the V8 benchmark score over time. The versions you see on the left are Google Chrome versions. This started in September of 2008 with the 1.0 beta, and since then we've just continued to push. And we will continue to do that. So that's all.
Thanks. [applause] Questions? Yes. Yeah, that's a good point. The question is: in my example of hidden classes and hidden class transitions, I added first x and then y, so what would happen if I added y first and then x? The answer is that that would be a different hidden class. It's really common for objects to be created in exactly the same way, and that's basically what saves us. But you're right: there are objects that have the same structure, namely an x property and a y property, that can end up with completely different maps if you add y and then x in one place, and x and then y in another. Those would be separate hidden classes. Yeah.
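For example:

    // Same eventual structure, different creation order: separate hidden classes.
    var a = {};
    a.x = 1;  // C0 -> C(x)
    a.y = 2;  // C(x) -> C(x,y)

    var b = {};
    b.y = 2;  // C0 -> C(y)
    b.x = 1;  // C(y) -> C(y,x), not the same class as C(x,y)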
Okay: what if you create an object through an object literal and not through a function? That's not going to be a problem, because object literals are also created based on a function, namely the Object function, and the Object function has an initial map. So basically the same thing works. Yeah.

Yeah, that's a good question. The question is whether we only apply this to user-defined objects, or whether we also use it for built-in stuff like Date, Math, Number, and so on.
The answer is that yes, we also do it for the built-in stuff. To go a bit deeper: a lot of our built-in stuff is actually implemented in JavaScript, not in C++. That means basically the same thing applies; it goes through exactly the same path. The cool thing about that is that if we improve our overall JavaScript performance, we also improve the speed of our libraries. Another cool thing about having libraries written in JavaScript is that it's really easy and quick to go change stuff.

It does lead to some extra complication, though, in the sense that in order to set up your Math, Number, and Date objects, you actually have to look at JavaScript code; you have to compile JavaScript code. To overcome that, we use something called snapshotting. Yes, this is becoming a longer explanation, sorry about that. Basically, when we build V8, we start up V8 once and load the basic libraries. That gives us a heap that contains all of the code for our basic libraries. We dump that heap as a binary image to a file, and then we integrate that binary data into the new version of V8 that we build. So when you start V8, even though your libraries are written in JavaScript, you don't have to parse and compile any JavaScript code to get them: all you do is slurp in that binary data and you're up and running.

Yeah. The question is whether there are any JavaScript design patterns I know of that perform better than others.
My answer to that would be no, but my answer should also be that it shouldn't matter that much: basically, you shouldn't worry about it; we should hopefully be able to make it fast. That said, as you've seen with the techniques we use here, we are somehow relying on user code to be well structured: to use abstractions, and not to try to be overly general, with code that operates on a million different types. But that shouldn't really be an issue, because in most application code you write, you might have a few parts that are really generic like that, and then you have most of your code, which is not. So basically, you shouldn't have to worry about it. That should be us worrying. Oh, okay.
So the question is whether I can give some information about other browsers that are using these techniques, and whether they're being fed back to WebKit. So, Apple's JavaScript engine, I guess it's called Nitro by now, is using these techniques as well, which is really cool. I think that's great, and they're getting great performance gains the same way we did. They use what we call hidden classes, and inline caching as well; I think they call their hidden classes "structure IDs" or something like that, but they use the same techniques, which is really cool. As for feeding it back to WebKit: yes, we are doing that, in the sense that the binding layer, the layer that binds V8 into the DOM, is being upstreamed to WebKit, so that actually lives at WebKit now. Hopefully it should be fairly easy going forward to use V8 instead of Nitro in WebKit, if you want to.

man: Over here. Over here.
Ager: Oh, yes.

man: Any performance boost to lambda functions, closures, optimizing the call stacks?

Ager: I'm not sure I understand the question. Could you repeat, please?

man: Is there a performance boost to the handling of lambda functions, closures?

Ager: Oh, whether we have any performance boosts on closures. Well, basically, that code is handled like all other code, so it gets the same kinds of benefits that the rest of the code does. So yes. I think we're running out of time, so I should stop now. You're all welcome to come talk to me afterwards; if you have other questions, please come talk to me. [applause]
