/*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
{namespace buck.what_makes_buck_so_fast}
/***/
{template .soyweb}
{call buck.page}
{param title: 'What Makes Buck so Fast?' /}
{param navid: 'concept_what_makes_buck_so_fast' /}
{param description}
An overview of what makes Buck fast at compiling your code.
{/param}
{param content}
<p>Buck exploits a number of strategies to reduce build times.</p>
<h2>Buck builds dependencies in parallel</h2>
<p>
Buck is designed so that any input files required by
a {call buck.build_target /} must be specified in
the {call buck.build_rule /} for that target. Therefore,
the directed acyclic
graph <a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">(DAG)</a> that
Buck constructs from the build rules accurately reflects
the build's dependencies, and once a rule's dependencies are
satisfied, the target for that rule can be built.
</p>
<p>
Having a DAG makes it straightforward for rules to be built in parallel,
which can dramatically reduce build times. Buck starts with the leaf
nodes of the graph, that is, targets that have no dependencies. Buck
adds these to a queue of targets to build. When a thread is available,
Buck removes a target from the front of the queue and builds it.
Assuming the target builds successfully, Buck notifies all of the rules
that depend on that target. When all of a rule's dependencies have been
satisfied, Buck adds that rule's target to the build queue. Computation
proceeds in this manner until all of the nodes in the graph have been
built. This execution model means that breaking modules into finer
dependencies creates opportunities for increased parallelism, which
improves throughput.
</p>
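<p>
The scheduling loop described above can be sketched in a few lines of Python.
This is a simplified, single-threaded illustration rather than Buck's
implementation: the <code>build_order</code> function and the target names are
hypothetical, and a real build would hand each queued target to a worker thread.
</p>

```python
from collections import deque


def build_order(deps):
    """Return the order in which targets leave the build queue.

    deps maps every target to the list of targets it depends on;
    each dependency must itself appear as a key.
    """
    # Count unsatisfied dependencies and record the reverse edges.
    remaining = dict((target, len(d)) for target, d in deps.items())
    dependents = dict((target, []) for target in deps)
    for target, d in deps.items():
        for dep in d:
            dependents[dep].append(target)

    # Start with the leaf nodes: targets that have no dependencies.
    queue = deque(sorted(t for t, n in remaining.items() if n == 0))
    order = []
    while queue:
        target = queue.popleft()  # a free thread would build this target
        order.append(target)
        # Notify every rule that depends on the target just built.
        for dependent in dependents[target]:
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                queue.append(dependent)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


graph = dict([
    ("app", ["ui", "net"]),
    ("ui", ["base"]),
    ("net", ["base"]),
    ("base", []),
])
print(build_order(graph))  # ['base', 'ui', 'net', 'app']
```

<p>
In this example, <code>ui</code> and <code>net</code> enter the queue at the same
moment, so with two threads available they could be built in parallel.
</p>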
<h2>Graph enhancement increases rule granularity</h2>
<p>
Frequently, the granularity at which users declare build rules is
different from the granularity at which we want the build system to
model them. For simplicity, users want coarse-grained rules, such
as {call buck.android_binary /}. However, the build system
wants fine-grained rules, such as <code>AaptPackage</code> and <code>DexMerge</code>, that
allow for more parallelism and more granular caching. See the
section on <em>caching</em> below.
</p>
<p>
Internally, Buck uses a mechanism called <em>graph enhancement</em>, which
transforms the <em>target graph</em>, specified by the build rules, into
an <em>action graph</em>, which is the DAG that Buck actually uses for building.
Graph enhancement can add new synthetic rules
to break a monolithic task, such as <code>android_binary</code>, into
independent subtasks, each of which might have only a subset of the
original task's dependencies. So, for example, dex merging would not depend
on a full run of <code>AaptPackage</code>. Graph enhancement can also move
dependency edges, so that compiling Android libraries does not depend on
dexing their dependencies.
</p>
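<p>
As a rough illustration of the idea, the following Python sketch splits one
coarse rule into synthetic actions that each carry only part of the original
dependencies. The <code>enhance</code> function, the action names, and the way
dependencies are divided are assumptions made for this example; they do not
describe how Buck actually decomposes <code>android_binary</code>.
</p>

```python
def enhance(name, resource_deps, library_deps):
    """Split one coarse rule into synthetic, finer-grained actions.

    Returns a mapping from action name to the actions or targets it needs.
    """
    aapt = name + "#aapt_package"   # hypothetical synthetic action name
    dex = name + "#dex_merge"       # hypothetical synthetic action name
    return dict([
        (aapt, list(resource_deps)),  # needs only the resources
        (dex, list(library_deps)),    # needs only the compiled libraries
        (name, [aapt, dex]),          # final step combines both results
    ])


action_graph = enhance("app", ["res_main"], ["lib_core", "lib_ui"])
for action in action_graph:
    print(action, action_graph[action])
```

<p>
Because the dex-merging action does not list the packaging action among its
dependencies, the two can run in parallel and be cached separately.
</p>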
<p>
<strong>Note:</strong> Adding or removing dependencies from your build
causes Buck to rebuild the action graph, which can significantly
increase the time required for your next build. However, changing the
{sp}<em>contents</em> of a dependency, such as a source file, does not cause
Buck to rebuild the action graph.
</p>
<h2>Buck caches build artifacts</h2>
<p>
A build rule, together with other aspects of the build
environment, specifies all of the inputs that can affect the rule's output.
Therefore, we can combine that information into a hash that represents
the totality of those inputs. This hash&mdash;called
a <em>{call buck.concept_link}{param page: 'rule_keys' /}{param name: 'RuleKey' /}{/call}</em>&mdash;is
used as an index or <em>cache key</em> into a cache where the associated value is the output produced by the rule.
(See {call buck.buckconfig_link /} for information on how to set up Buck's caches.)
Some of the factors that affect the RuleKey for a build rule are:
</p>
<ul>
<li>
The contents of any file arguments to the build rule. For example,
the files specified in the <code>srcs</code> or <code>headers</code> arguments.
If you use the {call buck.fn_glob /} function in these arguments to
pattern-match files, be aware that <code>glob()</code> can pull in
extraneous files generated by, for example, text editors or the
operating system. These files can then cause unexpected variations
in RuleKeys between different development computers.
</li>
<li>
The RuleKey for each of the rule's <em>dependencies</em>. By
dependencies, we mean build targets specified in the rule's
arguments. Note that you can specify build targets in arguments
other than just the <code>deps</code> argument. For example, you can
specify build targets in the <code>srcs</code> or <code>headers</code> arguments.
An expression in a <code>deps_query</code> argument can also return a set
of build targets.
</li>
<li>
The definitions of any {call buck.macros/} used to define the rule. Changes to
a macro can cascade through to the definition of a rule and thereby
change the rule's RuleKey.
</li>
<li>
The build environment, which includes the components of the
toolchain that are used to build the rule and the configurations of
those tools, such as compiler flags or linker flags. The build
configuration is a function of&mdash;for example&mdash;the rule itself,
the {call buck.buckconfig_link /} and <code>.buckconfig.local</code> files,
Buck's command-line parameters, and the defaults of the local
environment.
</li>
<li>
The version of Buck used to build the rule. This means that
upgrading Buck to a new version invalidates all of the RuleKeys
generated by the old version.
</li>
</ul>
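<p>
The factors listed above can be pictured as inputs to a single hash function.
The sketch below is illustrative only: <code>rule_key</code>, its parameters,
and the choice of SHA-1 are assumptions for this example, not Buck's actual
RuleKey algorithm.
</p>

```python
import hashlib


def rule_key(rule_type, source_contents, dep_keys, env, buck_version):
    """Fold every input that can affect a rule's output into one hash."""
    h = hashlib.sha1()
    for part in [buck_version, rule_type]:
        h.update(part.encode("utf-8") + b"\0")
    # Contents of file arguments such as srcs, keyed by path.
    for path in sorted(source_contents):
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha1(source_contents[path]).digest())
    # RuleKeys of the rule's dependencies.
    for key in sorted(dep_keys):
        h.update(key.encode("utf-8") + b"\0")
    # Build environment: tool versions, compiler flags, and so on.
    for name in sorted(env):
        h.update((name + "=" + env[name]).encode("utf-8") + b"\0")
    return h.hexdigest()


srcs = dict([("Foo.java", b"contents v1")])
env = dict(javac_version="11", flags="-g")
base = rule_key("java_library", srcs, [], env, "buck-1")

# A stray editor backup file matched by a glob changes the key.
with_stray = dict(srcs)
with_stray["Foo.java~"] = b"editor backup"

print(base == rule_key("java_library", srcs, [], env, "buck-1"))        # True
print(base == rule_key("java_library", with_stray, [], env, "buck-1"))  # False
print(base == rule_key("java_library", srcs, [], env, "buck-2"))        # False
```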
<p>
When Buck determines whether to build the target for a rule, it first
computes the RuleKey for the rule. If that key results in a hit in any of
the caches specified in <code>.buckconfig</code>, Buck fetches the
rule's output from the cache instead of building the rule locally. For
outputs that are expensive to build, this results in substantial savings.
This caching can also make it fast to rebuild when switching between branches in
a <a href="http://en.wikipedia.org/wiki/Distributed_version_control_system">DVCS</a> such
as Git or Mercurial&mdash;assuming that relatively few files differ between branches.
</p>
<p>
If you are using a <a href="http://en.wikipedia.org/wiki/Continuous_integration">continuous integration (CI)</a> system,
such as <a href="https://en.wikipedia.org/wiki/Jenkins_(software)">Jenkins</a>,
you should configure your CI builds to populate a cache that can be read
by local builds. That way, when a developer syncs to a revision that has already been
built by your CI system, a local build with {sp}{call buck.cmd_build /} can
pull build artifacts from the cache. In order for this strategy to
work, the RuleKeys computed by Buck on the CI system must match the
corresponding keys computed by Buck on the developer's local computer.
</p>
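<p>
The lookup-then-build behavior, including a cache populated by CI, might be
modeled as follows. The <code>build_or_fetch</code> helper and the use of plain
dictionaries as caches are assumptions for this sketch; Buck's real caches are
configured in <code>.buckconfig</code>.
</p>

```python
def build_or_fetch(key, caches, build_locally):
    """Return (output, how), consulting each cache before building."""
    for cache in caches:
        if key in cache:
            return cache[key], "cache hit"
    output = build_locally()
    # Store the new artifact so that later builds can reuse it.
    for cache in caches:
        cache[key] = output
    return output, "built locally"


def compile_target():
    return b"artifact bytes"


shared_cache = dict()  # stands in for a cache populated by CI

# The CI build misses, builds the target, and populates the shared cache.
print(build_or_fetch("key-for-rev-1", [shared_cache], compile_target)[1])

# A developer at the same revision computes the same key and gets a hit.
local_cache = dict()
print(build_or_fetch("key-for-rev-1", [local_cache, shared_cache], compile_target)[1])
```

<p>
The second call succeeds without compiling only because both machines computed
the same key, which is why matching RuleKeys between CI and local builds matter.
</p>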
<h2>The importance of deterministic builds</h2>
<p>
In order to take full advantage of caching, all the factors that affect
the output of the build should be kept safe from unintended changes. The
build should be <em>deterministic</em> in the sense that it should
reliably produce identical output across different build servers or
different developers' computers. For this reason, we recommend that you
keep all inputs to the build under source control. These include,
for example, Buck's configuration file, <code>.buckconfig</code>, and the
toolchains used to build the outputs, such as compilers and linkers.
This way, you can ensure that all developers on the project have the
same build environment.
</p>
<h3>Command-line configuration changes</h3>
<p>
Consistent with the preceding discussion, Buck reparses the build files
in a project if it detects certain changes in the build's configuration.
These could be changes in the <code>.buckconfig</code> file itself or be
the result of <em>specifying configuration parameters with
the </em> {call buck.cmd_link}{param name: 'common_parameters' /}{param rendered_text: '--config'/}{/call} <em>command-line option.</em>
</p>
<h2>If a Java library's API doesn't change, code that uses the library doesn't need to be rebuilt</h2>
<p>
Developers often modify a Java library in ways that do not affect the
library's externally-visible API. For example, adding or removing
private methods, or modifying the implementation of existing
methods&mdash;regardless of their visibility&mdash;does not change the
API exposed by the Java library.
</p>
<p>
When Buck builds a {call buck.java_library /} rule, it also computes that library's API.
Normally, modifying a private method
in a {call buck.java_library /} would cause the library and all the rules that depend on it to be rebuilt because the change in
RuleKeys would propagate up the DAG. However, Buck has special logic for {call buck.java_library /} where,
if the <code>.java</code> input files have not changed since the previous build, and the API for each of its Java
dependencies has not changed since the previous build, then the {call buck.java_library /} will not be recompiled.
This is valid because we know that neither the input <code>.java</code> files nor the API against which they
would be compiled has changed, so the result would be the same if the rule were rebuilt. This localizes how much
Java code needs to be recompiled in response to a change, again reducing build times.
</p>
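<p>
The following toy model shows why keying a dependent on its dependencies' APIs,
rather than on their full contents, avoids recompilation. Here a library is a
list of <code>(name, visibility, body)</code> tuples and its API is just the
names of its non-private methods; both the representation and the helper
functions are assumptions for this sketch. A real API also covers signatures,
fields, and types.
</p>

```python
import hashlib


def digest(parts):
    h = hashlib.sha1()
    for part in parts:
        h.update(part.encode("utf-8") + b"\0")
    return h.hexdigest()


def api_key(methods):
    """Hash only the externally visible part of a library."""
    return digest(sorted(name for name, visibility, body in methods
                         if visibility != "private"))


def compile_key(methods, dep_api_keys):
    """Hash the library's own sources plus the APIs it compiles against."""
    own = [name + ":" + visibility + ":" + body
           for name, visibility, body in methods]
    return digest(sorted(own) + ["--deps--"] + sorted(dep_api_keys))


# v2 changes an implementation and swaps a private helper; v3 adds a public method.
v1 = [("parse", "public", "return slow();"), ("slow", "private", "loop")]
v2 = [("parse", "public", "return fast();"), ("fast", "private", "table")]
v3 = v2 + [("reset", "public", "clear")]
client = [("main", "public", "calls parse")]

print(compile_key(v1, []) == compile_key(v2, []))  # False: the library recompiles
print(compile_key(client, [api_key(v1)]) == compile_key(client, [api_key(v2)]))  # True
print(compile_key(client, [api_key(v2)]) == compile_key(client, [api_key(v3)]))  # False
```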
<p>
For more information about this Java library optimization,
see {call buck.concept_link}{param page: 'java_abis' /}{param name: 'Java Application Binary Interfaces (ABIs)' /}{/call}.
</p>
<h2>RuleKeys and input-based keys</h2>
<p>
As a generalization of the Java library optimization&mdash;described in
the previous section&mdash;other rule types have functionality to
determine whether or not to rebuild themselves based on information
about the state of their dependencies&mdash;irrespective of whether
those dependencies have changed.
</p>
<p>
For example, if we change a file that is an input to
an {call buck.android_resource /} rule, we don't need to recompile
targets that depend on the resource if the set of exposed symbols
hasn't changed&mdash;such as the case where we
just change a padding value. Similarly, if we recompile
an {call buck.android_library /} due to a dependency change,
but the resulting classes are identical, we don't need to re-run the DEX
compiler (<code>dx</code>).
</p>
<p>
The determination about whether to rebuild is based on the value of an
additional key: an <em>input-based</em> key, which is distinct from the standard RuleKey
that Buck generates for the rule. For a standard RuleKey, Buck folds in
the RuleKey for each dependency, but for an input-based key, Buck folds in
a hash of the actual output file from the dependency. For example,
suppose <code>&#x2F;&#x2F;:foo</code> specifies <code>&#x2F;&#x2F;:bar</code> as a dependency.
The <em>RuleKey</em> for <code>&#x2F;&#x2F;:foo</code> folds in the RuleKey
for <code>&#x2F;&#x2F;:bar</code>, but the input-based key
for <code>&#x2F;&#x2F;:foo</code> folds in a hash of the output
from <code>&#x2F;&#x2F;:bar</code> that <code>&#x2F;&#x2F;:foo</code> takes as input.
</p>
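<p>
The difference between the two kinds of key can be sketched as follows. The
function names and the hashing scheme are assumptions made for illustration;
only the distinction between folding in a dependency's RuleKey and folding in a
hash of its output comes from the description above.
</p>

```python
import hashlib


def sha1_hex(data):
    return hashlib.sha1(data).hexdigest()


def standard_key(own_inputs, dep_rule_keys):
    """Standard RuleKey: folds in the RuleKey of each dependency."""
    joined = "|".join(sorted(dep_rule_keys)).encode("utf-8")
    return sha1_hex(own_inputs + b"|" + joined)


def input_based_key(own_inputs, dep_outputs):
    """Input-based key: folds in a hash of each dependency's output file."""
    output_hashes = sorted(sha1_hex(out) for out in dep_outputs)
    return sha1_hex(own_inputs + b"|" + "|".join(output_hashes).encode("utf-8"))


# bar is edited (a comment is added), so its RuleKey changes,
# but the rebuilt output happens to be byte-for-byte identical.
bar_key_v1 = standard_key(b"bar source v1", [])
bar_key_v2 = standard_key(b"bar source v1 plus a comment", [])
bar_output_v1 = b"same object code"
bar_output_v2 = b"same object code"

foo_std_same = (standard_key(b"foo source", [bar_key_v1])
                == standard_key(b"foo source", [bar_key_v2]))
foo_input_same = (input_based_key(b"foo source", [bar_output_v1])
                  == input_based_key(b"foo source", [bar_output_v2]))
print(foo_std_same)    # False: the standard key says foo must be rebuilt
print(foo_input_same)  # True: the input-based key says foo can be reused
```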
<h2>Buck uses only first-order dependencies for Java</h2>
<p>
When compiling Java, Buck uses first-order dependencies only, that is,
dependencies that you specify explicitly in the <code>deps</code> argument
of your build rule. This means that the compilation step in your build
sees only explicitly-declared dependencies, not other libraries that
those dependencies themselves depend on.
</p>
<p>
Using only first-order dependencies dramatically shrinks the set of APIs
that your Java code is exposed to, which in turn reduces the scope
of changes that will trigger a rebuild.
</p>
<p>
<strong>Note:</strong> If your rule does, in fact, depend on a dependency of one of your
explicitly-specified dependencies (that is, a <em>second-order</em> dependency), you
can make that dependency available to your rule by specifying it in an <code>exported_deps</code> argument
in the rule of the explicitly-specified dependency.
</p>
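<p>
One way to picture which targets a compilation step can see is the small model
below. The <code>compile_classpath</code> function and the target names are
hypothetical, and the sketch assumes that exports chain: if a visible dependency
re-exports a target, that target's own exports become visible too.
</p>

```python
def compile_classpath(target, deps, exported_deps):
    """Targets visible when compiling the given target.

    Only first-order deps are visible, plus anything those deps re-export.
    """
    visible = []
    pending = list(deps.get(target, []))
    while pending:
        dep = pending.pop(0)
        if dep in visible:
            continue
        visible.append(dep)
        # exported_deps of a visible dependency become visible as well.
        pending.extend(exported_deps.get(dep, []))
    return visible


deps = dict(app=["ui"], ui=["base", "widgets"])
exported = dict(ui=["widgets"])

print(compile_classpath("app", deps, exported))  # ['ui', 'widgets']
```

<p>
Here <code>app</code> can compile against <code>widgets</code> because
<code>ui</code> exports it, but <code>base</code> stays hidden, so a change to
the API of <code>base</code> does not by itself force <code>app</code> to rebuild.
</p>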
<h2>Buck uses dependency files to trim over-specified inputs</h2>
<p>
Buck's low-level build rules specify all inputs&mdash;such as source
files or the outputs from other build rules&mdash;that might contribute
to the output when the build rule is executed. Normally, changes to any of these inputs
result in a new RuleKey and therefore trigger a rebuild. However, in
practice, it's not uncommon for these build rules to <em>over-specify</em> their
inputs. A good example is Buck's C/C++
compilation rules. C/C++ compilation rules specify as inputs all
headers found from the transitive closure of C/C++ library dependencies,
even though in many cases only a small subset of these headers is
actually used. For example, a C/C++ source file might use only one of
many headers exported by a C/C++ library dependency. However, there's
not enough information available before running the build to know if any
given input is used, and so all inputs must be considered, which can
lead to unnecessary rebuilding.
</p>
<p>
In some cases, after the build completes, Buck can figure out the exact
subset of the listed inputs that were actually used.
In C/C++, compilers such as <code>gcc</code> provide a <code>-M</code> option, which produces a
dependency file. This file identifies the exact headers that were used during compilation.
For supported rules, Buck uses this dependency file before the build to
try to avoid unnecessary rebuilding:
</p>
<ul>
<li>
If the dependency file is available before the build, Buck reads
the file and uses it to filter out unused inputs when constructing
the RuleKey.
</li>
<li>
If no dependency file is available before the build,
Buck runs the build as normal and produces a dependency file.
The dependency file is then available for subsequent builds.
</li>
</ul>
<p>
Note that dependency files are used only if the standard RuleKey&mdash;which
considers all inputs&mdash;doesn't match. In cases where
the RuleKey matches, the output from the rule can be fetched from the
cache.
</p>
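<p>
The decision described in this section can be approximated with the sketch
below. The helper names and the comparison of old and new input sets are
simplifications assumed for this example; Buck's actual bookkeeping for
dependency files is more involved.
</p>

```python
import hashlib


def key_over(inputs, used=None):
    """Hash the inputs, optionally keeping only those named in a dep file."""
    h = hashlib.sha1()
    for path in sorted(inputs):
        if used is not None and path not in used:
            continue  # declared as an input but never read by the compiler
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha1(inputs[path]).digest())
    return h.hexdigest()


def needs_rebuild(old_inputs, new_inputs, dep_file):
    """Decide whether to rebuild, given the dep file from the last build."""
    if key_over(old_inputs) == key_over(new_inputs):
        return False  # the standard key matches; the dep file is not consulted
    if dep_file is None:
        return True   # no dep file yet: build normally and produce one
    return key_over(old_inputs, dep_file) != key_over(new_inputs, dep_file)


old = dict([("main.c", b"m"), ("used.h", b"u1"), ("unused.h", b"x1")])
dep_file = ["main.c", "used.h"]  # headers the compiler reported as used

unused_changed = dict(old)
unused_changed["unused.h"] = b"x2"
used_changed = dict(old)
used_changed["used.h"] = b"u2"

print(needs_rebuild(old, unused_changed, dep_file))  # False
print(needs_rebuild(old, used_changed, dep_file))    # True
print(needs_rebuild(old, unused_changed, None))      # True
```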
{/param}
{/call}
{/template}