{namespace buck.what_makes_buck_so_fast}
/***/
{template .soyweb}
{call buck.page}
{param title: 'What Makes Buck so Fast?' /}
{param description}
An overview of what makes Buck fast at compiling your code.
{/param}
{param content}
<p>Buck exploits a number of strategies to reduce build times:</p>
<h2>A build rule knows all of the inputs that can affect its output</h2>
<p>
Buck is designed so that anything that can affect the output of a build rule must be specified
as an input to the build rule: hidden state is not allowed. (This is also important for ensuring that
results are consistent and reproducible for all developers.) Therefore, we can be sure that once a
rule's <code>deps</code> are satisfied, the rule itself can be built. This gives us confidence that
the <a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">DAG</a> that results from build
rules and their <code>deps</code> is complete: all dependencies are captured in the graph.
<p>
Having a DAG makes it straightforward for rules to be built in parallel, which can dramatically reduce
build times. The execution model for Buck is very simple: the leaf nodes
of the graph are added to a queue of rules to be built. When a thread is available, a rule is removed
from the queue, and built. Assuming it is built successfully, it notifies all of the rules that depend
on it that it is done. When a rule gets such a notification, it checks whether all its dependencies have
been satisfied, and if so, it gets added to the queue. Computation proceeds in this manner until all of
the nodes in the graph have gone through the queue.
Therefore, breaking modules into finer dependencies creates opportunities for increased
parallelism, improving throughput.
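<p>
The queue-based model described above can be sketched in a few lines of Python.
This is a minimal, single-threaded simulation rather than Buck's actual engine:
the function name <code>build_order</code> and the shape of the input mapping are
assumptions made for illustration, and "building" a rule just records its name.

```python
from collections import deque


def build_order(deps):
    """Return one valid build order for a mapping of rule -> list of deps.

    Assumes every rule, including leaves, appears as a key in `deps`.
    """
    # Count unsatisfied deps per rule and record reverse edges (dependents).
    remaining = dict((rule, len(set(ds))) for rule, ds in deps.items())
    dependents = dict((rule, []) for rule in deps)
    for rule, ds in deps.items():
        for dep in set(ds):
            dependents[dep].append(rule)

    # Leaf rules (no deps) seed the queue.
    queue = deque(sorted(r for r, n in remaining.items() if n == 0))
    order = []
    while queue:
        rule = queue.popleft()
        order.append(rule)  # stand-in for actually building the rule
        # Notify dependents; enqueue any whose deps are now all satisfied.
        for parent in sorted(dependents[rule]):
            remaining[parent] -= 1
            if remaining[parent] == 0:
                queue.append(parent)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order
```

<p>
In a real engine, every rule sitting in the queue at the same time could be handed
to a different thread, which is why a graph with many small rules tends to
parallelize better than one with a few large ones.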
<h2>Buck can store the outputs it generates in a cache</h2>
<p>
A build rule knows all of the inputs that can affect its output, and therefore it can combine that
information into a hash that represents the total input. This hash is used as
a <em>cache key</em> where the associated value in the cache is the output produced by the rule.
(See {call buck.concept_buckconfig /} for information on how to set up a cache.)
The following information contributes to the cache key for a build rule:
<ul>
<li>The values of the arguments used to define the build rule in the build file.
<li>The contents of any file arguments for the build rule.
<li>The version of Buck being used to build the rule. (This means that upgrading Buck to a new
version invalidates all of the cache keys generated by the old version.)
<li>The cache key for each of the rule's <code>deps</code>.
</ul>
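<p>
As a rough illustration of how the four ingredients above could be folded into one key,
here is a minimal sketch using SHA-1 from Python's standard library.
The function <code>cache_key</code>, its parameters, and the exact encoding are
assumptions for illustration; Buck's real key computation is not shown here and may differ.

```python
import hashlib


def cache_key(rule_args, file_contents, buck_version, dep_keys):
    """Combine all inputs of a rule into one hex digest (illustrative only).

    rule_args:     mapping of argument name -> value from the build file
    file_contents: mapping of file path -> bytes
    buck_version:  string identifying the build tool version
    dep_keys:      iterable of cache keys of the rule's deps
    """
    h = hashlib.sha1()

    def add(label, value):
        data = value if isinstance(value, bytes) else value.encode("utf-8")
        # Length prefixes keep adjacent fields from running together.
        for part in (label.encode("utf-8"), data):
            h.update(str(len(part)).encode("utf-8") + b":" + part)

    add("version", buck_version)
    for name in sorted(rule_args):
        add("arg:" + name, repr(rule_args[name]))
    for path in sorted(file_contents):
        add("file:" + path, file_contents[path])
    for key in sorted(dep_keys):
        add("dep", key)
    return h.hexdigest()
```

<p>
Because each dependency's key is itself an input, a change deep in the graph
changes the key of every rule above it.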
<p>
When Buck begins to build a build rule, the first thing it does is compute the <em>cache key</em> for
the rule. If there is a hit in any of the caches specified in <code>.buckconfig</code>, then it will
fetch the rule's output from the cache instead of building the rule locally. For outputs that are
expensive to compute, this saves substantial time. It also makes it fast to rebuild when switching
between branches in a <a href="http://en.wikipedia.org/wiki/Distributed_version_control_system">DVCS</a> such as Git or Mercurial.
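<p>
The lookup-before-build step can be sketched as follows. Caches are modeled as plain
Python dictionaries, and <code>fetch_or_build</code> is a hypothetical helper, not a Buck API.
Writing the freshly built output back to every cache is an assumption of this sketch.

```python
def fetch_or_build(key, caches, build):
    """Return (output, source) where source is "cache" or "built".

    caches: list of dict-like caches, consulted in order
    build:  zero-argument callable that produces the rule's output
    """
    for cache in caches:
        if key in cache:
            return cache[key], "cache"  # hit: skip the local build
    output = build()
    # Assumption: populate every cache so later builds can reuse the output.
    for cache in caches:
        cache[key] = output
    return output, "built"
```

<p>
Switching back to a branch that was built earlier reproduces the same keys,
so each rule becomes a cache hit instead of a rebuild.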
<p>
Because Buck uses the cache key to determine whether to rebuild a rule, you should never have to run {call buck.cmd_clean /}.
If anything that could affect the output of the rule changes, then the cache key should change, as well.
Because the change in input will cause a cache miss, Buck will rebuild the rule, overwriting its old outputs.
Since out-of-date outputs are guaranteed to be overwritten, there is no reason to clean the build.
<p>
If you are using some sort of <a href="http://en.wikipedia.org/wiki/Continuous_integration">continuous
integration (CI)</a> system, you will likely want your CI builds to populate a cache that can be read by your local builds.
That way, when a developer syncs to a revision that has already been built on your CI system, running
{sp}{call buck.cmd_build /} should not build anything locally, because every output can be fetched from the cache.
This works because the cache key computed by Buck when run on the CI system should match the key computed by Buck
on your local machine.
<h2>If a Java library's API doesn't change, code that uses the library doesn't need to be rebuilt</h2>
<p>
Oftentimes, a developer will modify Java code in a way that does not affect its externally-visible
API. For example, adding or removing private methods, as well as modifying the implementation of
existing methods (regardless of their visibility), does not change the API of a Java file.
<p>
When Buck builds a {call buck.java_library /} rule, it also computes its API.
Normally, modifying a private method
in a {call buck.java_library /} would cause it and all rules that depend on it to be rebuilt because the change in
cache keys would propagate up the DAG. However, Buck has special logic for a {call buck.java_library /} where,
if the <code>.java</code> input files have not changed since the previous build, and the API for each of its Java
dependencies has not changed since the previous build, then the {call buck.java_library /} will not be recompiled.
This is valid because we know that neither the input <code>.java</code> files nor the API against which they
would be compiled has changed, so the result would be the same if the rule were rebuilt. This localizes how much
Java code needs to be recompiled in response to a change, again reducing build times.
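<p>
To make the idea of an API that ignores private members and method bodies concrete,
here is a simplified sketch. The representation of a class as
<code>(visibility, signature, body)</code> tuples and the function <code>api_hash</code>
are assumptions for illustration; Buck derives the API from compiled output, and
real Java visibility rules are more nuanced than this.

```python
import hashlib


def api_hash(members):
    """Hash only the externally visible signatures of a library.

    members: iterable of (visibility, signature, body) tuples.
    Private members and all bodies are ignored on purpose.
    """
    visible = sorted(sig for vis, sig, _body in members if vis != "private")
    h = hashlib.sha1()
    for sig in visible:
        h.update(sig.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()
```

<p>
A dependent library whose own sources are unchanged only needs recompiling
when this hash changes for one of its dependencies.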
<h2>Rules can calculate their own "ABI" keys</h2>
<p>
As a generalization of the Java library API optimization,
every rule type has the freedom to determine whether or not to rebuild itself
based on information about the state of its dependencies.
For example, when editing a file in an {call buck.android_resource /} rule,
we don't need to recompile all dependent resources and libraries
if the set of exposed symbols doesn't change
(for example, if we just changed a padding value).
If we recompile an {call buck.android_library /} due to a dependency change,
but the resulting classes are identical,
we don't need to re-run DX.
<p>
This mechanism is fairly general.
When the build engine is preparing to build a rule,
in addition to the normal cache key,
it generates a key that excludes the keys of the dependencies.
This is combined with a key that the rule generates
by hashing whatever parts of its dependencies it considers "visible".
Usually, the dependency will help with this process
by outputting the relevant information
(like the Java API or hash of all classes)
to a single small file.
If both keys match the values from the last build,
then there is no need to rebuild.
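<p>
The two-key check can be sketched like this. All names here
(<code>abi_keys</code>, <code>can_skip_rebuild</code>, the string inputs) are hypothetical;
the sketch only assumes that a rule can list its own inputs and the "visible"
information published by its dependencies.

```python
import hashlib


def _digest(parts):
    h = hashlib.sha1()
    for part in parts:
        data = part.encode("utf-8")
        h.update(str(len(data)).encode("utf-8") + b":" + data)
    return h.hexdigest()


def abi_keys(own_inputs, visible_dep_info):
    """Return (key_without_deps, deps_abi_key) for a rule.

    own_inputs:       strings describing the rule's own args and files
    visible_dep_info: strings the deps expose, such as an API hash
    """
    key_without_deps = _digest(sorted(own_inputs))
    deps_abi_key = _digest(sorted(visible_dep_info))
    return key_without_deps, deps_abi_key


def can_skip_rebuild(current_keys, last_build_keys):
    # Skip only when both keys match the values recorded by the last build.
    return last_build_keys is not None and current_keys == last_build_keys
```

<p>
If a dependency is rebuilt but publishes the same visible information,
both keys are unchanged and the rule is left alone.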
<p>
Note that this optimization is currently separate from the distributed cache.
We'd like to combine them so that the cache can be used to fetch rules
built by a continuous integration server as long as the source files
and visible parts of the dependencies match.
<h2>Buck prefers to use first-order dependencies</h2>
<p>
By default, Buck uses first-order dependencies when compiling Java.
This means that compilation can only see explicitly declared dependencies,
not other libraries that your dependencies depend on.
This behavior can be changed at runtime with the
{sp}{call buck.cmd_build /} command by specifying a
different value for <code>--build-dependencies</code>.
<p>
We recommend keeping the default, however.
First-order dependencies dramatically shrink the set of APIs
that your library is exposed to,
which in turn narrows the scope of changes
that will force your library to be rebuilt.
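<p>
The difference between the two modes can be illustrated by computing what a rule
gets to compile against in each case. The helper names and the example graph are
made up for this sketch; they are not part of Buck.

```python
def first_order_classpath(deps, rule):
    """Only the deps the rule declares explicitly."""
    return sorted(set(deps[rule]))


def transitive_classpath(deps, rule):
    """Every rule reachable through the declared deps."""
    seen = set()
    stack = list(deps[rule])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(deps[dep])
    return sorted(seen)
```

<p>
With a chain <code>app</code> &rarr; <code>ui</code> &rarr; <code>net</code> &rarr; <code>json</code>,
an API change in <code>json</code> can only affect <code>app</code> directly in the transitive mode,
because in the first-order mode <code>app</code> compiles against <code>ui</code> alone.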
<h2>Fast Dex merging for Android</h2>
<p>
Other build tools also use Android's DX merge support
to merge your main program's dex file with third-party libraries.
However, Buck's support for fine-grained libraries
allows dex merging to work at a much finer granularity.
<p>
Buck also ships a customized version of DX
with significant performance improvements.
It uses a faster algorithm for merging many dex files.
It also has support for running multiple copies of DX
concurrently within a single long-lived buckd process,
which eliminates most of DX's start-up time.
<p>
As a result, when editing a small module and performing an incremental build,
we frequently see less than 1 second spent generating <code>classes.dex</code>.
<h2>Graph enhancement for increased rule granularity</h2>
<p>
Frequently, the granularity at which we expect users to declare build rules
is very different from the granularity at which
we want the build system to model them.
Users want coarse-grained rules for simplicity (like "android_binary"),
but the build system wants fine-grained rules
(like "aapt package" and "dex merge")
to allow for parallelism and fine-grained caching.
<p>
Internally, Buck uses a mechanism called "graph enhancement"
that allows its internal "action graph" (the DAG used for building)
to be different from what the user declared (internally called the "target graph").
Graph enhancement can add new synthetic rules
to break a monolithic task (like {call buck.android_binary /})
into independent subtasks
that might only have a subset of the original dependencies,
so dex merging does not depend on running a full <code>aapt package</code>.
It can also move dependency edges,
so compiling Android libraries does not depend on dexing their dependencies.
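<p>
A toy version of this expansion is sketched below. It takes one user-declared binary
and returns a finer-grained action graph as a mapping from node to dependencies.
The node names with a <code>#</code> suffix, the function <code>enhance_android_binary</code>,
and the exact set of synthetic nodes are all assumptions for illustration,
not the rules Buck actually generates.

```python
def enhance_android_binary(name, library_deps, resource_deps):
    """Expand one coarse target into several synthetic action nodes."""
    actions = dict()
    dex_nodes = []
    for lib in sorted(library_deps):
        # One dexing step per library, depending only on that library.
        dex_node = lib + "#dex"
        actions[dex_node] = [lib]
        dex_nodes.append(dex_node)
    # Dex merging depends on the per-library dex steps, not on packaging.
    actions[name + "#dex_merge"] = dex_nodes
    # Resource packaging depends only on the resource rules.
    actions[name + "#aapt_package"] = sorted(resource_deps)
    # The final binary joins the two independent subtasks.
    actions[name] = [name + "#aapt_package", name + "#dex_merge"]
    return actions
```

<p>
Because the merge and packaging nodes share no edge, they can run in parallel
and be cached independently of each other.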
{/param}
{/call}
{/template}