Fix diff regarding newline at end of file

Up to now, the Gerrit backend had been oblivious to newlines at the end
of a file when asked for a diff. As users do want to know about added
or deleted newlines at the end of a file, the GWT UI had special code
which worked around those quirks of the backend. Instead of adding a
similar workaround to PolyGerrit, we rather fix the behavior of the
backend.

Switching Gerrit's backend to consider newlines at the end isn't as
simple as it may sound. The reason is that Gerrit's diff code heavily
relies on JGit and JGit doesn't consider a newline at the end of a
file as starting another line. This means that a file with 4 lines
ending with newline characters is considered to be a 4 line file by
JGit and also git-core. The idea seems to be that lines aren't
*separated* but *terminated*.

Probably because of this reason, JGit limits all of its code to that
smaller assumed number of lines. For instance, it's not possible to
query for line 5 of the above example. Instead of returning an empty
string or something similar, JGit throws an exception. Another
consequence is that edits returned from a JGit diff computation for an
added/deleted newline at the end are 'cut off'. In the example above,
deleting the newline in the last line (which is index 3 for 4 lines)
results in a {3-4, 3-4} edit whereas deleting the newline in the line
with index 1 results in a {1-3, 1-2} edit. I personally would have
expected to get {3-5, 3-4} for the deleted newline at the end.

Whether it's a bug or a feature of JGit isn't known and is probably
difficult to change. As detailed above, all of this might just be the
result of consistently following git-core's decision to treat newlines
as terminators and not separators. The idea might also have been to map
git-core's "No newline at end of file" statements, which git-core does
print for various commands including "git diff", to an equivalent in
JGit but this feature seems to be missing (or hidden very well) for
JGit's diff computation.

For Gerrit, we want a more user-friendly and obvious behavior, which
also doesn't restrict users regarding their regular actions (e.g. adding
comments on added/deleted newline characters at the end of a file).

So far, user complaints especially focus on addition/deletion of those
characters as the UI presentation in PolyGerrit is confusing in those
situations. For a proper fix, we have to assess the whole representation
on the UI, though. If we somehow only explicitly indicated added/deleted
newlines at the end but didn't also reconsider to differently represent
unmodified newlines at the end in general, users might be confused and
think that those newlines were missing. For this reason, this change
attempts to address the situation as a whole.

Several options have been considered:
1) Represent a newline at the end with an empty string (= empty line
content) in the diff output, similar to intermediary empty lines.
2) Return lines including their newline characters in the diff
output. A missing newline at the end would be represented by a missing
newline character in the last line.
3) Add flags to the diff output indicating whether the newline at the
end is present/missing for each of the compared files.

Option 1 has the benefit that it directly works in PolyGerrit and fixes
the reported issue. Even possibly other existing UIs for Gerrit should
most likely keep working as they already need to handle empty
intermediary lines.

Option 2 could possibly break existing UIs. It would definitely require
further adjustments in PolyGerrit. In general, it would also require
that each UI needs to know how to handle all possible types of line
separators.

Option 3 is the least risky one as it doesn't affect already existing
fields of the diff output. On the other hand, it pushes the burden to
the UI. Any UI of Gerrit would need special code for those flags and
might get it wrong (e.g. not showing comments on such empty last lines).
In addition, the flags feel like a crutch as Gerrit's API should rather
directly do the right, least astonishing, and less error-prone thing.
JGit uses the flag approach and after playing around with it as a user
of that API as part of this change, I personally wouldn't want to force
any user of Gerrit's API to cope with this variant.

Given this consideration, option 1 was chosen as implementation for this
change.

Compared to before this change, the returned diff output now behaves
differently in the following ways:
- Added/Deleted newlines at the end of a file are indicated by an empty
string in the modified lines of side b/a and a missing empty string on
the other side.
- If a file has a newline at the end which isn't modified, an empty
string occurs as last entry of the common 'ab' lines.
- The number of lines of the complete files indicated in the diff output
counts a newline at the end as an additional line.

The last mentioned aspect was necessary for everything to work
correctly. It introduces a small discrepancy, though. Gerrit also has
a REST endpoint which outputs some details about a file, including the
number of added/deleted lines. For added/deleted files with newlines at
the end, the number of added/deleted lines from that endpoint is one
smaller than the number of lines of the complete file when requesting
the diff output for those files. Fixing this would require some
effort and make our code even more complex. It's questionable whether
users will notice the discrepancy and hence we punt on this for now.

Adding the empty string to the common lines has the benefit that users
can clearly see on the diff screen whether a file as a newline at the
end (= an empty line is present) or not (= the last line is filled).
This behavior is consistent with other text editors and can be
customized further in PolyGerrit if necessary.

There's another positive side-effect of this fix: Comments which were
added on the last empty line in the GWT UI or through the REST API now
show up also in PolyGerrit. Users have complained about this, too. It
also means that reviewers can directly comment on added/removed newlines
at the end in PolyGerrit.

To ensure that the diff REST endpoint is now working as expected
regarding missing/present newlines at the end, a lot of new tests were
added. When necessary, different whitespace settings were tested.
Other tests rely on the default whitespace behavior (which is
IGNORE_NONE) as covering all possible combinations would have resulted
in an explosion of the number of tests. All in all, our test coverage
for the diff REST endpoint has increased another step with this change.

Bug: Issue 5486
Change-Id: I1e15767fc139e58cd1b9855e4e9d325027a16e31
6 files changed
tree: 399d2e33f34939d902058e2056cbd65907d5283c
  1. .settings/
  2. antlr3/
  3. contrib/
  4. Documentation/
  5. gerrit-gwtdebug/
  6. gerrit-gwtui/
  7. gerrit-gwtui-common/
  8. gerrit-plugin-gwtui/
  9. java/
  10. javatests/
  11. lib/
  12. plugins/
  13. polygerrit-ui/
  14. prolog/
  15. prologtests/
  16. proto/
  17. resources/
  18. tools/
  19. webapp/
  20. .bazelproject
  21. .editorconfig
  22. .git-blame-ignore-revs
  23. .gitignore
  24. .gitmodules
  25. .mailmap
  26. .pydevproject
  27. 0001-Replace-native-http-git-_archive-with-Skylark-rules.patch
  28. 0002-Bump-Dagger-to-2.14.1-to-support-Java-9.patch
  29. BUILD
  30. COPYING
  31. INSTALL
  32. README.md
  33. SUBMITTING_PATCHES
  34. version.bzl
  35. WORKSPACE
README.md

Gerrit Code Review

Gerrit is a code review and project management tool for Git based projects.

Build Status

Objective

Gerrit makes reviews easier by showing changes in a side-by-side display, and allowing inline comments to be added by any reviewer.

Gerrit simplifies Git based project maintainership by permitting any authorized user to submit changes to the master Git repository, rather than requiring all approved changes to be merged in by hand by the project maintainer.

Documentation

For information about how to install and use Gerrit, refer to the documentation.

Source

Our canonical Git repository is located on googlesource.com. There is a mirror of the repository on Github.

Reporting bugs

Please report bugs on the issue tracker.

Contribute

Gerrit is the work of hundreds of contributors. We appreciate your help!

Please read the contribution guidelines.

Note that we do not accept Pull Requests via the Github mirror.

Getting in contact

The IRC channel on freenode is #gerrit. An archive is available at: echelog.com.

The Developer Mailing list is repo-discuss on Google Groups.

License

Gerrit is provided under the Apache License 2.0.

Build

Install Bazel and run the following:

    git clone --recursive https://gerrit.googlesource.com/gerrit
    cd gerrit && bazel build release

Install binary packages (Deb/Rpm)

The instruction how to configure GerritForge/BinTray repositories is here

On Debian/Ubuntu run:

    apt-get update & apt-get install gerrit=<version>-<release>

NOTE: release is a counter that starts with 1 and indicates the number of packages that have been released with the same version of the software.

On CentOS/RedHat run:

    yum clean all && yum install gerrit-<version>[-<release>]

On Fedora run:

    dnf clean all && dnf install gerrit-<version>[-<release>]

Use pre-built Gerrit images on Docker

Docker images of Gerrit are available on DockerHub

To run a CentOS 7 based Gerrit image:

    docker run -p 8080:8080 gerritforge/gerrit-centos7[:version]

To run a Ubuntu 15.04 based Gerrit image:

    docker run -p 8080:8080 gerritforge/gerrit-ubuntu15.04[:version]

NOTE: release is optional. Last released package of the version is installed if the release number is omitted.