blob: 0269fb159b5283d0661464eb41943bfb2c93456d [file] [log] [blame]
This is a rationale for why I think it's safe to run DX in-process in buckd.
Everything in here needs to be re-checked when DX is updated.
As an indirect argument, Eclipse apparently already does this.
The code is at
https://android.googlesource.com/platform/sdk/+/android-4.4.2_r2/eclipse/plugins/com.android.ide.eclipse.adt/src/com/android/ide/eclipse/adt/internal/build/DexWrapper.java
The code in DX is clearly meant to enable this.
See the first few lines of
https://github.com/android/platform_dalvik/blob/android-4.4.2_r1/dx/src/com/android/dx/command/dexer/Main.java#L210
and the commit message for
https://github.com/android/platform_dalvik/commit/7aa5ce7e990dc3766eba97cd0932b62e4de21503
However, just because Eclipse ADT does it, doesn't mean it's safe.
Plus, the use of statics in Main clearly won't support multiple *concurrent*
runs of DX. We need to go deeper.
A few possible problems are easy to rule out:
- System.exit() is not called from any method but the top-level entry points
(which we aren't using).
- System properties are not used.
- There are no instances of "synchronized (Foo.class)".
- Files.deleteOnExit is not used except in test code.
- There are not shutdown hooks used.
- There's no networking going on.
- There are no global loggers in use.
These were just checked with grep, but they don't seem like things
DX would be likely to use.
This leaves a few possible problems:
- Mutable statics and statics used as locks
- Failure to clean up external resources
- Static synchronized methods
- Use of System.out and System.err
Static synchronized methods are easy to find with static analysis.
There are none.
The only external resources used during dexing are the input and output files
and the thread pool (if we are doing multi-threaded dexing).
I added diffs to make sure these are cleaned up.
Most output in the dexer goes through DexConsole.
There are other parts of the codebase that use System.out/err,
but I don't think they are reachable from the dexer.
We could use Proguard to verify this.
Mutable statics are the tough one.
I wrote a static analysis tool to identify deeply immutable object types
and find static members that are (transitively) mutable.
I've annotated the output with justifications for why I think each is safe.
When upgrading dx, you can look for any new unsafe members by running
buck build //src/com/facebook/buck/tools/dxanalysis:dxanalysis && comm -23 <(java -jar buck-out/gen/src/com/facebook/buck/tools/dxanalysis/dxanalysis.jar build/dx.jar | sort) <(sort src/com/facebook/buck/tools/dxanalysis/in-process-checklist.txt)
Some of the members require a more detailed explanation, which I give here.
For context DX has two main phases:
- Collecting and transforming the inputs.
- Serializing dexed classes into its output.
One of my biggest areas for concern was that the internal ids
used in the dex files would leak into mutable statics.
The good news is that this id mapping is done entirely in the second phase
and is entirely contained in subclasses of Section, which do not use statics.
This makes the whole second section and the whole issue internal ids
nearly risk free, which is great.
The first phase turns out to be a much bigger problem.
It uses *lots* of mutable statics. However, after careful analysis and
a few diffs, I believe they are all safe.
A few points of DX's design help us here. The first phase is parallelizable
with the --num-threads argument to DX, so all of its structures are
concurrency-safe for threads operating on a single output dex file.
Since KitKat, a single DX process can also produce multiple output dex files
from a single input (though not in multi-threaded mode), so it would be
difficult these modes to work properly if any truly task-specific data
were leaking into statics.
The mutable statics take two major forms:
- Intern tables for objects that are interned.
- Static constants to provide quick access to well-known objects.
Let's start by discussing the class "Type". My analyzer identifies this class
as mutable, it is interned, and there are lots of static instances.
However, I believe it is "semantically immutable", and therefore safe to use
in multiple concurrent executions.
Type is effectively the parsed form of a Java type descriptor, so nothing
about it changes across multiple DX operations: everything about it is purely
a function of the descriptor that identifies it.
The only non-final fields of Type are references to other related types:
array and array component types, and normal ("initialized") types for types
that represent uninitialized objects (which contain no special data other than
the position in a method that they were created).
Therefore, I believe that the same global set of Type objects is safe to share
across all DX invocations. This means that both the static references to Type
objects and the Type intern table are safe.
Next, consider StdTypeList. This is just a list of Type objects, which we
already know are safe. It does expose a "set" method, but this is only
used on freshly-allocated StdTypeList instances that are being initialized.
Therefore, all of the fully initialized instances are effectively immutable,
and therefore safe.
Prototype is basically like Type, but with more fields.
It is basically the parsed form of a Java method descriptor.
It is safe to share between instances for the same reason as Type:
It just contains its descriptor, a Type for the return type,
and a StdTypeList for the arguments (plus a lazily computed transformed
parameter type list). All of these fields will be valid in any DX invocation.
Rop is a bundle containing final ints, strings, a Type, and TypeLists.
All of the static instances of Rop are constructed with static instances of
Type and StdTypeList, so they are all safe, and also safe to use in
RopToDop#MAP.
Several subclasses of TypedConstant (starting with "Cst") are
effectively immutable.
CstString is just a wrapper around a String, which also caches a ByteArray
representing the encoded value. ByteArray is technically mutable if it
is constructed with a byte array that someone else maintains a reference to,
but all of the CstString instances used in static contexts are constructed
with Strings, which makes them immutable.
CstType is just a wrapper around a Type, which also caches a CstString
representing the descriptor. The CstString is always constructed with a
String argument, so CstType is as safe as Type.
CstNat is just a bundle of two CstStrings.
CstMethodRef is just a Prototype, CstNat, and CstString, so it is safe.
This leaves five potentially unsafe static members
dx/rop/cst/CstType#interns
dx/rop/type/Prototype#internTable
dx/rop/type/Type#internTable
These are intern tables for Type, CstType, and Prototype.
As explained above, I think all of these types are effectively immutable
and contain no per-task information, so they are safe to intern across
invocations. The intern tables are already thread-safe.
I made them use weak values to avoid retaining lots of unnecessary memory
in buckd between invocations. This should be safe because all of these
objects will be reconstructed on-demand. Whenever one is collected,
there will be no remaining references to the old instance, so there is
no risk of a == comparison spuriously failing.
The maps provided by MapMaker support atomic operations,
but I'm not using them. More profiling might suggest that we could
get a perf win here.
After a large build, and doing a gc, buckd consumes about 50 extra MB of heap
with in-process DX vs without. If we want to reduce this, I think the first
step would be
(1) GC after build to collect weak references,
(2) do a "deep clean" of these maps (Guava currently doesn't support
doing a full clean or shrinking the hash tables), and
(3) doing another gc to collect the tables and interned strings.
com/android/dx/rop/code/RegisterSpec#theInterns
com/android/dx/rop/code/RegisterSpec#theInterningItem
I don't think these are safe to intern. They contain a TypeBearer,
which is implemented by lots of classes. I created a flag to disable interning.
This is potentially unsafe because it could violate an implicit assumption
that two instances are only equivalent if they are they same object.
Fortunately, this is unlikely because RegisterSpec.make's docblock
specifically says that it is not required to return a shared instance.
However, I wanted to be more sure.
IdentityHashMap and identityHashCode are not used with RegisterSpec.
The most likely issue would be a change in behavior because
two instances are compared with "==", which returns false instead of true.
I added a static analyzer to find all instances of RegisterSpec
(or its interfaces) being compared with ==.
See in-process-checklist.txt for justifications for the safety of each.