Initial release of copyright scanner library
This copyright scanner library has been developed for the purpose of
creating a plugin to ensure necessary review when new revisions may
introduce copyrighted files that are not compliant with a project's
policies around copyright.
Initially targeted at the AOSP Gerrit instances, with the intent to
expand to other Google-owned instances, the plugin will be completely
configurable.
Best practices identify allowed copyrights as first-party or
third-party, and restrict third-party copyrights to specific locations.
See for example: https://opensource.google.com/docs/thirdparty/
Revisions containing only first-party code do not require special
review.
Revisions to files in locations where third-party code is allowed that
consist entirely of first-party code or appropriately licensed
third-party code do not require special review.
Revisions to files outside locations where third-party licenses are
allowed that appear to have third-party licenses require special
review: a qualified reviewer either verifies that the matches are false
positives or rejects the commit.
Changes to files that seem to introduce unknown or forbidden licenses
likewise require special review. A qualified reviewer may determine
that an unknown license has acceptable terms and allow it.
Alternatively, the reviewer may determine that the match is a false
positive, or reject the change.
See for example: https://opensource.google.com/docs/thirdparty/licenses/
and: https://opensource.google.com/docs/using/agpl-policy/
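The review rules above can be sketched roughly as follows. This is an
illustrative sketch only, not code from this commit: the class name
ReviewPolicySketch, the Finding enum, and the needsSpecialReview method
are hypothetical names chosen for the example, and the real library may
classify matches differently.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of the review rules; not part of the scanner library. */
public class ReviewPolicySketch {

  /** Illustrative kinds of match a scan might report for a file. */
  enum Finding { FIRST_PARTY, THIRD_PARTY_LICENSED, UNKNOWN, FORBIDDEN }

  /**
   * Returns true when a revised file needs special review.
   *
   * @param findings what the scan found in the file
   * @param inThirdPartyLocation whether the file is where third-party code is allowed
   */
  static boolean needsSpecialReview(Set<Finding> findings, boolean inThirdPartyLocation) {
    // Unknown or forbidden licenses always go to a qualified reviewer.
    if (findings.contains(Finding.UNKNOWN) || findings.contains(Finding.FORBIDDEN)) {
      return true;
    }
    // Third-party licenses outside the allowed locations need review.
    if (findings.contains(Finding.THIRD_PARTY_LICENSED) && !inThirdPartyLocation) {
      return true;
    }
    // First-party code anywhere, or appropriately licensed third-party
    // code in an allowed location, needs no special review.
    return false;
  }

  public static void main(String[] args) {
    System.out.println(needsSpecialReview(EnumSet.of(Finding.FIRST_PARTY), false));
    System.out.println(needsSpecialReview(EnumSet.of(Finding.THIRD_PARTY_LICENSED), true));
    System.out.println(needsSpecialReview(EnumSet.of(Finding.THIRD_PARTY_LICENSED), false));
    System.out.println(needsSpecialReview(EnumSet.of(Finding.UNKNOWN), true));
  }
}
```

The sketch only decides whether review is needed. Whether a flagged
match is a false positive, acceptable, or grounds for rejection remains
the qualified reviewer's call.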
This commit does not include the plugin. It releases the copyright
scanner library. The library has been extensively tested and used
internally for analyzing content on AOSP hosts.
The library and associated command-line tool support deep scans into
archive files (.zip, .jar, .apk, etc.); however, the plugin will perform
only shallow scans.
Change-Id: I8800cf011f392d7d0c848f43a8efa095dd68ad0a
diff --git a/.bazelignore b/.bazelignore
new file mode 100644
index 0000000..30f1613
--- /dev/null
+++ b/.bazelignore
@@ -0,0 +1 @@
+eclipse-out
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..52a5343
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,8 @@
+/.classpath
+/.primary_build_tool
+/.project
+/.settings/org.maven.ide.eclipse.prefs
+/.settings/org.eclipse.m2e.core.prefs
+/bazel-*
+/eclipse-out
+/target
diff --git a/BUILD b/BUILD
new file mode 100644
index 0000000..7585fef
--- /dev/null
+++ b/BUILD
@@ -0,0 +1,66 @@
+load("//tools/bzl:junit.bzl", "junit_tests")
+load("//tools/bzl:plugin.bzl", "PLUGIN_DEPS", "PLUGIN_TEST_DEPS", "gerrit_plugin")
+
+filegroup(
+ name = "testdata",
+ srcs = glob(["src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/**"]),
+)
+
+java_library(
+ name = "copyright_scanner",
+ srcs = glob(["src/main/java/**/*.java"]),
+ deps = PLUGIN_DEPS,
+)
+
+java_binary(
+ name = "scan_tool",
+ srcs = ["src/main/java/com/googlesource/gerrit/plugins/copyright/tools/ScanTool.java"],
+ main_class = "com.googlesource.gerrit.plugins.copyright.tools.ScanTool",
+ deps = [
+ ":copyright_scanner",
+ "@commons-compress//jar",
+ "@guava//jar",
+ ],
+)
+
+java_binary(
+ name = "android_scan",
+ srcs = ["src/main/java/com/googlesource/gerrit/plugins/copyright/tools/AndroidScan.java"],
+ main_class = "com.googlesource.gerrit.plugins.copyright.tools.AndroidScan",
+ deps = [":copyright_scanner"],
+)
+
+TEST_SRCS = "src/test/java/**/*Test.java"
+
+TEST_DEPS = PLUGIN_DEPS + PLUGIN_TEST_DEPS + [
+ ":copyright_scanner",
+ "@guava//jar",
+]
+
+junit_tests(
+ name = "copyright_scanner_tests",
+ testonly = 1,
+ srcs = glob([TEST_SRCS]),
+ tags = ["copyright"],
+ deps = TEST_DEPS,
+)
+
+sh_test(
+ name = "AndroidScanTest",
+ size = "small",
+ srcs = ["src/test/java/com/googlesource/gerrit/plugins/copyright/tools/AndroidScanTest.sh"],
+ data = [
+ ":android_scan",
+ ":testdata",
+ ],
+)
+
+sh_test(
+ name = "ScanToolTest",
+ size = "small",
+ srcs = ["src/test/java/com/googlesource/gerrit/plugins/copyright/tools/ScanToolTest.sh"],
+ data = [
+ ":scan_tool",
+ ":testdata",
+ ],
+)
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..11069ed
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+END OF TERMS AND CONDITIONS
+
+APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+Copyright [yyyy] [name of copyright owner]
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
diff --git a/WORKSPACE b/WORKSPACE
new file mode 100644
index 0000000..0bd8981
--- /dev/null
+++ b/WORKSPACE
@@ -0,0 +1,68 @@
+workspace(name = "copyright")
+load("//:bazlets.bzl", "load_bazlets")
+load_bazlets(
+ commit = "738ddb525810a50c792736d7115a1bb5289bd3d3",
+ #local_path = "/home/<user>/projects/bazlets",
+)
+
+load("//tools/bzl:maven_jar.bzl", "maven_jar")
+load("//:external_plugin_deps.bzl", "external_plugin_deps")
+
+# Snapshot Plugin API
+load(
+ "@com_googlesource_gerrit_bazlets//:gerrit_api_maven_local.bzl",
+ "gerrit_api_maven_local",
+)
+
+# Load snapshot Plugin API
+gerrit_api_maven_local()
+
+# Release Plugin API
+#load(
+# "@com_googlesource_gerrit_bazlets//:gerrit_api.bzl",
+# "gerrit_api",
+#)
+
+# Load release Plugin API
+#gerrit_api()
+
+
+external_plugin_deps()
+
+
+# When upgrading commons-compress, also upgrade tukaani-xz
+maven_jar(
+ name = "commons-compress",
+ artifact = "org.apache.commons:commons-compress:1.15",
+ sha1 = "b686cd04abaef1ea7bc5e143c080563668eec17e",
+)
+
+# Transitive dependency of commons-compress
+maven_jar(
+ name = "tukaani-xz",
+ artifact = "org.tukaani:xz:1.6",
+ sha1 = "05b6f921f1810bdf90e25471968f741f87168b64",
+)
+
+
+load("//lib:guava.bzl", "GUAVA_BIN_SHA1", "GUAVA_VERSION")
+
+maven_jar(
+ name = "guava",
+ artifact = "com.google.guava:guava:" + GUAVA_VERSION,
+ sha1 = GUAVA_BIN_SHA1,
+)
+
+# Transitive dependency of guava
+maven_jar(
+ name = "guava-failureaccess",
+ artifact = "com.google.guava:failureaccess:1.0.1",
+ sha1 = "1dcf1de382a0bf95a3d8b0849546c88bac1292c9",
+)
+
+# Transitive dependency of guava
+maven_jar(
+ name = "j2objc",
+ artifact = "com.google.j2objc:j2objc-annotations:1.1",
+ sha1 = "ed28ded51a8b1c6b112568def5f4b455e6809019",
+)
diff --git a/bazlets.bzl b/bazlets.bzl
new file mode 100644
index 0000000..f089af4
--- /dev/null
+++ b/bazlets.bzl
@@ -0,0 +1,18 @@
+load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
+
+NAME = "com_googlesource_gerrit_bazlets"
+
+def load_bazlets(
+ commit,
+ local_path = None):
+ if not local_path:
+ git_repository(
+ name = NAME,
+ remote = "https://gerrit.googlesource.com/bazlets",
+ commit = commit,
+ )
+ else:
+ native.local_repository(
+ name = NAME,
+ path = local_path,
+ )
diff --git a/external_plugin_deps.bzl b/external_plugin_deps.bzl
new file mode 100644
index 0000000..93746e1
--- /dev/null
+++ b/external_plugin_deps.bzl
@@ -0,0 +1,8 @@
+load("//tools/bzl:maven_jar.bzl", "maven_jar")
+
+def external_plugin_deps():
+ maven_jar(
+ name = "commons_io",
+ artifact = "commons-io:commons-io:1.4",
+ sha1 = "a8762d07e76cfde2395257a5da47ba7c1dbd3dce",
+ )
diff --git a/lib/BUILD b/lib/BUILD
new file mode 100644
index 0000000..d26b3c1
--- /dev/null
+++ b/lib/BUILD
@@ -0,0 +1,37 @@
+exports_files(glob([
+ "LICENSE-*",
+]))
+
+filegroup(
+ name = "all-licenses",
+ srcs = glob(
+ ["LICENSE-*"],
+ exclude = ["LICENSE-DO_NOT_DISTRIBUTE"],
+ ),
+ visibility = ["//visibility:public"],
+)
+
+java_library(
+ name = "guava-failureaccess",
+ data = ["//lib:LICENSE-Apache2.0"],
+ visibility = ["//visibility:public"],
+ exports = ["@guava-failureaccess//jar"],
+)
+
+java_library(
+ name = "j2objc",
+ data = ["//lib:LICENSE-Apache2.0"],
+ visibility = ["//visibility:public"],
+ exports = ["@j2objc//jar"],
+)
+
+java_library(
+ name = "guava",
+ data = ["//lib:LICENSE-Apache2.0"],
+ visibility = ["//visibility:public"],
+ exports = [
+ ":guava-failureaccess",
+ ":j2objc",
+ "@guava//jar",
+ ],
+)
diff --git a/lib/LICENSE-Apache2.0 b/lib/LICENSE-Apache2.0
new file mode 100644
index 0000000..d645695
--- /dev/null
+++ b/lib/LICENSE-Apache2.0
@@ -0,0 +1,202 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
diff --git a/lib/guava.bzl b/lib/guava.bzl
new file mode 100644
index 0000000..c36bf14
--- /dev/null
+++ b/lib/guava.bzl
@@ -0,0 +1,5 @@
+GUAVA_VERSION = "27.1-jre"
+
+GUAVA_BIN_SHA1 = "e47b59c893079b87743cdcfb6f17ca95c08c592c"
+
+GUAVA_DOC_URL = "https://google.github.io/guava/releases/" + GUAVA_VERSION + "/api/docs/"
diff --git a/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/Archive.java b/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/Archive.java
new file mode 100644
index 0000000..8d6ed0b
--- /dev/null
+++ b/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/Archive.java
@@ -0,0 +1,269 @@
+// Copyright (C) 2019 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.googlesource.gerrit.plugins.copyright.lib;
+
+import com.google.common.collect.ImmutableList;
+import java.io.BufferedInputStream;
+import java.io.IOException;
+import org.apache.commons.compress.archivers.ArchiveEntry;
+import org.apache.commons.compress.archivers.ArchiveException;
+import org.apache.commons.compress.archivers.ArchiveInputStream;
+import org.apache.commons.compress.archivers.ar.ArArchiveInputStream;
+import org.apache.commons.compress.archivers.arj.ArjArchiveInputStream;
+import org.apache.commons.compress.archivers.cpio.CpioArchiveEntry;
+import org.apache.commons.compress.archivers.cpio.CpioArchiveInputStream;
+import org.apache.commons.compress.archivers.dump.DumpArchiveEntry;
+import org.apache.commons.compress.archivers.dump.DumpArchiveInputStream;
+import org.apache.commons.compress.archivers.jar.JarArchiveEntry;
+import org.apache.commons.compress.archivers.jar.JarArchiveInputStream;
+import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
+import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
+import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
+import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;
+
+/** Encapsulates the differences among the known ArchiveInputStream/ArchiveEntry pairs. */
+public abstract class Archive {
+
+ /** The known archive file types. */
+ private static final ImmutableList<Archive> archives =
+ ImmutableList.of(
+ new ArFile(),
+ new ArjFile(),
+ new CpioFile(),
+ new DumpFile(),
+ new JarFile(),
+ new TarFile(),
+ new ZipFile());
+
+ /** Returns the Archive to use based on `fileName` or null if not a known archive file format. */
+ public static Archive getArchive(String fileName) {
+ for (Archive archive : archives) {
+ if (archive.isArchive(fileName)) {
+ return archive;
+ }
+ }
+ return null;
+ }
+
+ /** Returns true if `fileName` identifies an instance of the archive file type. */
+ protected abstract boolean isArchive(String fileName);
+
+ /** Wraps `source` with the `ArchiveInputStream` type for the archive file type. */
+ public abstract ArchiveInputStream newStream(BufferedInputStream source) throws ArchiveException;
+
+ /** Returns the next `ArchiveEntry` in `archive` for the archive file type. */
+ public abstract ArchiveEntry getNext(ArchiveInputStream archive)
+ throws ArchiveException, IOException;
+
+ /** Returns true if `entry` describes a regular file in the archive. */
+ public abstract boolean isRegularFile(ArchiveEntry entry);
+
+ /** Archives created with the `ar` command or equivalent. */
+ private static final class ArFile extends Archive {
+ @Override
+ protected boolean isArchive(String fileName) {
+ return fileName.endsWith(".a")
+ || fileName.endsWith(".deb")
+ || fileName.endsWith(".ar")
+ || fileName.endsWith("-ar")
+ || fileName.endsWith("-deb")
+ || fileName.endsWith("-a");
+ }
+
+ @Override
+ public ArchiveInputStream newStream(BufferedInputStream source) throws ArchiveException {
+ return new ArArchiveInputStream(source);
+ }
+
+ @Override
+ public ArchiveEntry getNext(ArchiveInputStream archive) throws ArchiveException, IOException {
+ return ((ArArchiveInputStream) archive).getNextArEntry();
+ }
+
+ @Override
+ public boolean isRegularFile(ArchiveEntry entry) {
+ return !entry.isDirectory();
+ }
+ }
+
+ /** Archives created with the `arj` command. */
+ private static final class ArjFile extends Archive {
+ @Override
+ protected boolean isArchive(String fileName) {
+ return fileName.endsWith(".arj");
+ }
+
+ @Override
+ public ArchiveInputStream newStream(BufferedInputStream source) throws ArchiveException {
+ return new ArjArchiveInputStream(source);
+ }
+
+ @Override
+ public ArchiveEntry getNext(ArchiveInputStream archive) throws ArchiveException, IOException {
+ return ((ArjArchiveInputStream) archive).getNextEntry();
+ }
+
+ @Override
+ public boolean isRegularFile(ArchiveEntry entry) {
+ return !entry.isDirectory();
+ }
+ }
+
+ /** Archives created with the `cpio` command. */
+ private static final class CpioFile extends Archive {
+ @Override
+ protected boolean isArchive(String fileName) {
+ return fileName.endsWith(".cpio");
+ }
+
+ @Override
+ public ArchiveInputStream newStream(BufferedInputStream source) throws ArchiveException {
+ return new CpioArchiveInputStream(source);
+ }
+
+ @Override
+ public ArchiveEntry getNext(ArchiveInputStream archive) throws ArchiveException, IOException {
+ return ((CpioArchiveInputStream) archive).getNextCPIOEntry();
+ }
+
+ @Override
+ public boolean isRegularFile(ArchiveEntry entry) {
+ return ((CpioArchiveEntry) entry).isRegularFile();
+ }
+ }
+
+ /** Archives created with the `dump` command. */
+ private static final class DumpFile extends Archive {
+ @Override
+ protected boolean isArchive(String fileName) {
+ return fileName.endsWith(".dump") || fileName.endsWith(".dmp");
+ }
+
+ @Override
+ public ArchiveInputStream newStream(BufferedInputStream source) throws ArchiveException {
+ return new DumpArchiveInputStream(source);
+ }
+
+ @Override
+ public ArchiveEntry getNext(ArchiveInputStream archive) throws ArchiveException, IOException {
+ return ((DumpArchiveInputStream) archive).getNextDumpEntry();
+ }
+
+ @Override
+ public boolean isRegularFile(ArchiveEntry entry) {
+ return ((DumpArchiveEntry) entry).isFile();
+ }
+ }
+
+ /** Java archives and equivalents. Internally structured as special cases of zip files. */
+ private static final class JarFile extends Archive {
+ @Override
+ protected boolean isArchive(String fileName) {
+ return fileName.endsWith(".jar")
+ || fileName.endsWith(".aar")
+ || fileName.endsWith(".apk")
+ || fileName.endsWith(".apex")
+ || fileName.endsWith(".war")
+ || fileName.endsWith(".rar")
+ || fileName.endsWith(".ear")
+ || fileName.endsWith(".sar")
+ || fileName.endsWith(".par")
+ || fileName.endsWith(".kar")
+ || fileName.endsWith("-jar");
+ }
+
+ @Override
+ public ArchiveInputStream newStream(BufferedInputStream source) throws ArchiveException {
+ return new JarArchiveInputStream(source);
+ }
+
+ @Override
+ public ArchiveEntry getNext(ArchiveInputStream archive) throws ArchiveException, IOException {
+ return ((JarArchiveInputStream) archive).getNextJarEntry();
+ }
+
+ @Override
+ public boolean isRegularFile(ArchiveEntry entry) {
+ JarArchiveEntry e = (JarArchiveEntry) entry;
+ return !e.isDirectory() && !e.isUnixSymlink();
+ }
+ }
+
+ /** Archives created with the `tar` command or equivalent. */
+ private static final class TarFile extends Archive {
+ @Override
+ protected boolean isArchive(String fileName) {
+ return fileName.endsWith(".tar") || fileName.endsWith("-tar") || fileName.endsWith(".pax");
+ }
+
+ @Override
+ public ArchiveInputStream newStream(BufferedInputStream source) throws ArchiveException {
+ return new TarArchiveInputStream(source);
+ }
+
+ @Override
+ public ArchiveEntry getNext(ArchiveInputStream archive) throws ArchiveException, IOException {
+ return ((TarArchiveInputStream) archive).getNextTarEntry();
+ }
+
+ @Override
+ public boolean isRegularFile(ArchiveEntry entry) {
+ return ((TarArchiveEntry) entry).isFile();
+ }
+ }
+
+ /**
+ * Archives created with the `zip` command or equivalent.
+ *
+ * <p>Many standard file types are internally structured as zip files.
+ */
+ private static final class ZipFile extends Archive {
+ @Override
+ protected boolean isArchive(String fileName) {
+ return fileName.endsWith(".zip")
+ || fileName.endsWith(".ZIP")
+ || fileName.endsWith("-zip")
+ || fileName.endsWith("-ZIP")
+ || fileName.endsWith(".sfx")
+ || fileName.endsWith(".docx")
+ || fileName.endsWith(".docm")
+ || fileName.endsWith(".xlsx")
+ || fileName.endsWith(".xlsm")
+ || fileName.endsWith(".pptx")
+ || fileName.endsWith(".pptm")
+ || fileName.endsWith(".odf")
+ || fileName.endsWith(".odt")
+ || fileName.endsWith(".odp")
+ || fileName.endsWith(".ods")
+ || fileName.endsWith(".odg");
+ }
+
+ @Override
+ public ArchiveInputStream newStream(BufferedInputStream source) throws ArchiveException {
+ return new ZipArchiveInputStream(source);
+ }
+
+ @Override
+ public ArchiveEntry getNext(ArchiveInputStream archive) throws ArchiveException, IOException {
+ return ((ZipArchiveInputStream) archive).getNextZipEntry();
+ }
+
+ @Override
+ public boolean isRegularFile(ArchiveEntry entry) {
+ ZipArchiveEntry e = (ZipArchiveEntry) entry;
+ return !e.isDirectory() && !e.isUnixSymlink();
+ }
+ }
+}
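The lookup in `Archive.getArchive` above is a first-match scan over a fixed list of suffix predicates, returning null when nothing matches. The following is a minimal, standard-library-only sketch of that shape. The class `SuffixDispatchSketch`, its `Kind` enum and the shortened suffix lists are hypothetical stand-ins for illustration; they are not part of the library, and the sketch leaves out the commons-compress stream handling entirely.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the suffix lookup in Archive.getArchive(); not library code. */
final class SuffixDispatchSketch {

  /** One archive family with a shortened, illustrative suffix list. */
  enum Kind {
    AR(".a", ".deb", ".ar"),
    JAR(".jar", ".apk", ".rar"),
    TAR(".tar", ".pax"),
    ZIP(".zip", ".docx");

    private final List<String> suffixes;

    Kind(String... suffixes) {
      this.suffixes = Arrays.asList(suffixes);
    }

    /** Returns true if fileName ends with one of this family's suffixes (case-sensitive). */
    boolean matches(String fileName) {
      for (String suffix : suffixes) {
        if (fileName.endsWith(suffix)) {
          return true;
        }
      }
      return false;
    }
  }

  /** Returns the first Kind whose suffix matches, or null if none does. */
  static Kind kindOf(String fileName) {
    for (Kind kind : Kind.values()) {
      if (kind.matches(fileName)) {
        return kind;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(kindOf("libfoo.a"));
    System.out.println(kindOf("app.apk"));
    System.out.println(kindOf("lib.JAR")); // suffix comparison is case-sensitive
    System.out.println(kindOf("data.tar.gz")); // compressed wrappers are not in the list
  }
}
```

Because the decision rests on the file name alone, a caller of this kind of lookup should probably be prepared for the stream to fail to parse: `.rar`, for instance, is listed under the jar family in the diff, but the same suffix is also used by an unrelated compression format.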
diff --git a/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightPatterns.java b/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightPatterns.java
new file mode 100644
index 0000000..390ffe2
--- /dev/null
+++ b/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightPatterns.java
@@ -0,0 +1,595 @@
+// Copyright (C) 2019 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.googlesource.gerrit.plugins.copyright.lib;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import java.util.NoSuchElementException;
+
+/** Constants declaring match patterns for common copyright licenses and owners. */
+public abstract class CopyrightPatterns {
+
+ // No instances
+ private CopyrightPatterns() {}
+
+ // All of the MAX parameters below were chosen empirically, like MATCH_SEARCH_LENGTH, to
+ // minimize computing cost while still catching virtually all of the important matches.
+
+ /** Maximum length of consecutive text characters to match. */
+ public static final int MAX_NAME_LENGTH = 30;
+ /** Maximum number of potential names to match. */
+ public static final int MAX_NAME_REPETITION = 35;
+ /** Maximum length of consecutive space/comment characters to match. */
+ public static final int MAX_SPACE_LENGTH = 47;
+ /** Maximum repetition of potential dates to match. Might have to revisit this in future. */
+ public static final int MAX_DATE_REPETITION = 30;
+
+ /** Regular expression matching whitespace or a comment character. */
+ public static final String WS = "[\\s*/#]";
+ /** Regular expression matching whitespace, a comment character, or punctuation. */
+ public static final String WSPCT = "[-,;.\\s*/#]";
+ /** Regular expression matching a text character. */
+ public static final String NAME_CHAR = "[-\\p{L}\\p{N}]"; // \p{L}->letter, \p{N}->numeral
+ /** Regular expression matching an UPPER CASE text character. */
+ public static final String UPPER_CHAR = "\\p{Lu}";
+ /** Regular expression matching a lower case text character. */
+ public static final String LOWER_CHAR = "\\p{Ll}";
+ /** Regular expression matching an email character. */
+ public static final String EMAIL_CHAR = "[-.\\p{L}\\p{N}_]";
+ /** Regular expression matching a URL character. */
+ public static final String URL_CHAR = "[-.\\p{L}\\p{N}_=%+]";
+ /** Regular expression matching a web address. */
+ public static final String URL =
+ "https?[:]/(?:[/?&]" // http://example.com/path/?var=val&var=val
+ + URL_CHAR
+ + "{1,"
+ + MAX_NAME_LENGTH
+ + "}){1,25}"
+ + "|www[.]" // www.domain or www.domain/path
+ + URL_CHAR
+ + "{1,"
+ + MAX_NAME_LENGTH
+ + "}(?:[.]com|[.]net|[.]org|[.]\\p{L}\\p{L})(?:[/?&#]"
+ + URL_CHAR
+ + "{0,"
+ + MAX_NAME_LENGTH
+ + "}){0,25}";
+ /** Regular expression matching a text or email address. */
+ public static final String NAME =
+ "(?:(?:"
+ + URL // web address
+ + "|[\\p{L}\\p{N}]" // regular text
+ + NAME_CHAR
+ + "{0,mnl}\\b"
+ + "|[<]?[\\p{L}\\p{N}]" // email@domain or <email@domain>
+ + EMAIL_CHAR
+ + "{1,mnl}[@][\\p{L}\\p{N}]"
+ + EMAIL_CHAR
+ + "{1,mnl}\\b[>]?"
+ + "|\\b\\p{N}{1,2}(?:[.]\\p{N}{1,2}){1,5}\\b" // version number
+ + "|\\p{Pi}[^\\p{Pf}]{0,65}\\p{Pf}" // quoted string
+ + "|(?-i:\\p{Lu}[-.]){1,5}" // Initials A.G. or S.A. etc. \\p{Lu} -> uppercase letter
+ + "|" // domain text
+ + EMAIL_CHAR
+ + "{1,mnl}(?:[.]com|[.]org|[.]net|[.]\\p{L}\\p{L})"
+ + ")[,;:]?)"; // punctuation following text
+ /** Regular expression matching an UPPERCASE text. */
+ public static final String UPPER_NAME = "(?-i:\\b" + UPPER_CHAR + "{1," + MAX_NAME_LENGTH + "})";
+ /** Regular expression matching a Proper Case text. */
+ public static final String PROPER_NAME =
+ "(?-i:\\b" + UPPER_CHAR + LOWER_CHAR + "{0," + MAX_NAME_LENGTH + "})";
+ /** Regular expression matching any text, email address, or quote character. */
+ public static final String ANY_CHAR = "[-.,\\p{L}\\p{N}]";
+ /** Regular expression matching any text, email address, or quoted string. */
+ public static final String ANY_WORD = "(?:" + ANY_CHAR + "{1," + MAX_NAME_LENGTH + "})";
+
+ /** Affero General Public License */
+ public static final Rule AGPL = license(ImmutableList.of("Affero"));
+
+ /** Android owned or licensed */
+ public static final Rule ANDROID =
+ new Rule(
+ ImmutableList.of("Android(?:-x86)? Open(?: |-)Source Project", "LK Trusty Authors"),
+ ImmutableList.of("Android Software Development Kit Licen[cs]e Agreement"));
+
+ /** Apache 2 owned or licensed */
+ public static final Rule APACHE2 =
+ new Rule(
+ ImmutableList.of(
+ ".*SPDX-License-Identifier: Apache-2.0",
+ "(?:by )?(?:The )?Apache Software Foundation.?",
+ "Apache Software Foundation.?"
+ + " This product includes software developed"
+ + " (?:by|at) The Apache Software Foundation"),
+ ImmutableList.of(
+ "http://www[.]apache[.]org/licenses/LICENSE-2[.]0",
+ "Apache 2[.]0 Licen[cs]e",
+ ".*Licen[cs]ed under (?:both )?(?:the )?Apache Licen[cs]e,?(?: version 2[.]?0?)?",
+ ".+Licen[cs]ed under (?:both )?(?:the )?Apache Licen[cs]e v2[.]?(?:[\\p{L}\\p{N}]+)?",
+ ".+licen[cs]ed under (?:the )?Apache 2.?",
+ ".+licen[cs]es this file to you under (?:the )?Apache Licen[cs]e,?",
+ "Apache Licen[cs]e Version 2[.]0",
+ "^apache2(?:-android)?",
+ "^the apache licen[cs]e",
+ "^terms of the Apache 2 licen[cs]e",
+ ".+under the terms of (?:either )the Apache Licen[cs]e[,.;]?"),
+ ImmutableList.of("owner as \\p{Pi}?Not a Contribution[.,;:]{0,3}\\p{Pf}?"));
+
+ /** The BEER-WARE License */
+ public static final Rule BEER_WARE = license(ImmutableList.of("\\bTHE BEER-WARE LICEN[CS]E"));
+
+ /** BSD licensed */
+ public static final Rule BSD =
+ license(
+ ImmutableList.of(
+ ".*SPDX-License-Identifier: BSD-2-Clause",
+ ".*SPDX-License-Identifier: BSD-2-Clause-FreeBSD",
+ ".*SPDX-License-Identifier: BSD-3-Clause",
+ ".*SPDX-License-Identifier: BSD-4-Clause",
+ "^BSD(?:[.]|, see LICEN[CS]E for (?:more )details[.])?",
+ ".*under the terms and conditions of the BSD Licen[cs]e.*",
+ ".*(?:\\p{N}-clause |a )?BSD (?:\\p{N}-clause )?licen[cs]e.*",
+ ".*Redistribution and use in source and binary forms,? with or without modification,?"
+ + " are permitted provided that the following conditions are met[:]?",
+ ".*This header is BSD licen[cs]ed so anyone can use the definitions to implement"
+ + " compatible drivers servers[.:]?.*",
+ ".*Redistribution and use is allowed according to the terms of the"
+ + " (?:\\p{N}-clause )?BSD licen[cs]e[.:]?.*",
+ ".*(?:[-\\p{L}\\p{N}] )?redistributions (?:of|in) source code must retain the"
+ + " (?:(?:above|accompanying) )?copyright notice(?: unmodified)?[,.:;]? this list"
+ + " of conditions[,.;:]? and the following disclaimers?[,.;:]?"
+ + " (?:[-\\p{L}\\p{N}] )?redistributions (?:in|of) binary form must reproduce the"
+ + " (?:above|accompanying) copyright notice[,.;:]? this list of conditions[,.:;]?"
+ + " and the following disclaimer in the documentation[,.;:]? and(?: |[/])or other"
+ + " materials(?: provided with the distribution)?[,.:;]{0,3}"));
+
+ /** Creative Commons Attribution -- allows commercial */
+ public static final Rule CC_BY_C =
+ license(
+ ImmutableList.of(
+ "\\bhttps?://[\\p{L}\\p{N}.]*creativecommons[.]org/licen[cs]es/by[-\\p{L}\\p{N}.]*",
+ "(?-i:\\bAttribution(?:-(?:Share ?Alike|NoDerivs)){0,2} )"));
+
+ /** Creative Commons Non-Commercial License */
+ public static final Rule CC_BY_NC =
+ license(
+ ImmutableList.of(
+ "\\bhttps?://[\\p{L}\\p{N}.]*creativecommons[.]org/licenses/by"
+ + "(?:-nd)?(?:-sa)?-nc[-/\\p{L}\\p{N}.]*",
+ "\\bAttribution(?:-NoDerivs)?(?:-Share ?Alike)?"
+ + "-NonCommercial(?:-NoDerivs)?(?:-Share ?Alike)?"));
+
+ /** Commons Clause License */
+ public static final Rule COMMONS_CAUSE = license(ImmutableList.of("\\bCommons Clause"));
+
+ /** Common Public Attribution License */
+ public static final Rule CPAL =
+ license(ImmutableList.of("\\bCommon Public Attribution Licen[cs]e"));
+
+ /** Eclipse Public License */
+ public static final Rule EPL =
+ license(
+ ImmutableList.of(
+ "^Eclipse Public Licen[cs]e[.]?",
+ ".*under (?:(?:the|this) )?(?:terms of )?(?:the )?eclipse"
+ + " (?:public )?licen[cs]e[,.;:]?.*",
+ ".*terms of (?:(?:the|this) )?eclipse public licen[cs]e[,.;:]?.*"));
+
+ /** European Union Public License */
+ public static final Rule EUPL = license(ImmutableList.of(" [(]?EUPL[)]? "));
+
+ /** Placeholder owner that appears in tests, much like example.com as a test domain. */
+ public static final Rule EXAMPLES = owner(ImmutableList.of("Your Company."));
+
+ /** Google owned */
+ public static final Rule GOOGLE = owner(ImmutableList.of("Google,? Inc."));
+
+ /** Generic GNU General Public License */
+ public static final Rule GPL =
+ license(
+ ImmutableList.of(
+ "\\bIn addition to the permissions in the GNU General Public License[,.;:]?",
+ "See the [\\[]?GNU[\\]]? General Public Licen[cs]e for more details[,.;:]?",
+ "\\bGNU General Public Licen[cs]e",
+ ".*gnu (?:library|lesser) general public licen[cs]e.*"),
+ ImmutableList.of(
+ "See the [\\[]?GNU[\\]]? General Public Licen[cs]e for more details[,.;:]?",
+ "In addition to the permissions in the GNU General Public License[,.;:]?"));
+
+ /** GNU General Public License v2 */
+ public static final Rule GPL2 =
+ license(
+ ImmutableList.of(
+ ".*SPDX-License-Identifier: GPL-2.0[+]?",
+ ".*SPDX-License-Identifier: GPL-2.0-only",
+ ".*SPDX-License-Identifier: GPL-2.0-or-later",
+ ".*[\\[]?GNU[\\]]? GPL[,;]? version 2[,.;:]?.*",
+ ".*[\\[]?GNU[\\]]? General Public Licen[cs]e[,;]? version 2[,.;:]?.*",
+ "See the [\\[]?GNU[\\]]? General Public Licen[cs]e for more details[,.;:]?",
+ "You should have received a copy of the [\\[]?GNU[\\]]? General Public Licen[cs]e",
+ ".*[\\[]?GNU[\\]]? General Public Licen[cs]e as published by the Free Software"
+ + " Foundation?(?:[']s)?[,.;:]? (?:either )?version 2.*"),
+ ImmutableList.of(
+ "See the [\\[]?GNU[\\]]? General Public Licen[cs]e for more details[,.;:]?",
+ "In addition to the permissions in the GNU General Public License[,.;:]?"));
+
+ /** GNU General Public License v3 */
+ public static final Rule GPL3 =
+ license(
+ ImmutableList.of(
+ ".*SPDX-License-Identifier: GPL-3.0[+]?",
+ ".*[\\[]?GNU[\\]]? GPL[,;]? version 3[,.;:]?.*",
+ ".*[\\[]?GNU[\\]]? General Public Licen[cs]e[,;]? version 3[,.;:]?.*",
+ "See the [\\[]?GNU[\\]]? General Public Licen[cs]e for more details[,.;:]?",
+ "You should have received a copy of the [\\[]?GNU[\\]]? General Public Licen[cs]e",
+ ".*[\\[]?GNU[\\]]? General Public Licen[cs]e as published by the Free Software"
+ + " Foundation?(?:[']s)?[,.;:]? (?:either )?version 3.*"),
+ ImmutableList.of(
+ "See the [\\[]?GNU[\\]]? General Public Licen[cs]e for more details[,.;:]?",
+ "In addition to the permissions in the GNU General Public License[,.;:]?"));
+
+ /** GNU Lesser or Library General Public License */
+ public static final Rule LGPL =
+ license(
+ ImmutableList.of(
+ ".*SPDX-License-Identifier: LGPL.*",
+ ".*LGPL.*",
+ ".*gnu (?:library|lesser) general public licen[cs]e.*"));
+
+ /** MIT licensed */
+ public static final Rule MIT =
+ license(
+ ImmutableList.of(
+ ".*SPDX-License-Identifier: MIT",
+ "http://www.opensource.org/licenses/mit-license.php",
+ "^the mit licen[cs]e(?:[:] http://www.opensource.org/licenses/mit-license.php)?",
+ "^MIT licen[cs]e[,.;:]? http://www.ibiblio.org/pub/Linux/LICENSE",
+ ".*under (?:(?:the|this) )?(?:terms of )?(?:the )?mit"
+ + " (?:open source )?licen[cs]e[,.;:]?.*",
+ ".*MIT licen[cs]ed",
+ ".*terms of (?:(?:the|this) )?mit licen[cs]e[,.;:]?.*",
+ ".*this code is licen[cs]ed under the mit licen[cs]e[,;.:]?.*",
+ ".*the mit or psf open source licen[cs]es[,.]?.*",
+ ".*Dual licen[cs]ed under the MIT or.*",
+ ".*Use of this software is governed by the MIT licen[cs]e[,.;:]?.*",
+ ".*This library is free software[,.;:]? you can redistribute it and or modify it"
+ + " under the terms of the MIT licen[cs]e[,.;:]?.*",
+ ".*may be distributed under the MIT or PSF open source licen[cs]es[,.;:]?.*",
+ ".*permission is (?:hereby )?granted[,;]? free of charge[,;]? to any person.*",
+ "(?:the mit licen[cs]e )?permission is (?:hereby )?granted[,;]? free of charge[,;]?"
+ + " to any person obtaining a copy of this software and associated documentation"
+ + " files [(]?the \\p{Pi}?software\\p{Pf}[)]?[,;]? to deal (?:in|with) the"
+ + " software without restriction[,;.:]? including without limitation the rights"
+ + " to use[,;]? copy[,;]? modify[,;]? merge[,;]? publish[,;]? distribute[,;]?"
+ + " sublicense[,;]? and(?: |[/])or sell copies of the software[,;]? and to permit"
+ + " persons to whom the software is furnished to do so[,;.:]? subject to the"
+ + " following conditions[,;.:]? the above copyright notice[,;]? and this"
+ + " permission notice shall be included in all copies[,;]? or substantial"
+ + " portions of the software[,;.:]?",
+ ".*permission to use[,;]? copy[,;]? modify[,;]? (?:and )?distribute"
+ + " (?:and sell )?this software (?:and its documentation )?(?:for any purpose )?"
+ + "(?:(?:and|with or) without fee)?is (?:hereby )?granted[,;]?"
+ + " (?:without fee )?provided that the above copyright notice.*",
+ ".*I hereby give permission[,;]? free of charge[,;]? to copy[,;]? modify[,;]? and"
+ + " redistribute this software[,;]? in source or binary form[,;]? provided that"
+ + " the above copyright notice and the following disclaimer are included.*"));
+
+ /** Generic non-commercial disclaimer. */
+ public static final Rule NON_COMMERCIAL =
+ license(ImmutableList.of("\\bNON-?COMMERCIAL LICEN[CS]E"));
+
+ /** Rejects distribution under APACHE */
+ public static final Rule NOT_A_CONTRIBUTION =
+ license(
+ ImmutableList.of(
+ ".*(?:"
+ + ANY_WORD
+ + " ){2}\\p{Pi}?" // 2 words to exclude false +ves
+ + "Not a Contribution[.,;:]{0,3}\\p{Pf}?.*"), // explicitly disclaims license
+ ImmutableList.of("owner as \\p{Pi}?Not a Contribution[.,;:]{0,3}\\p{Pf}?"));
+
+ /** Python Software Foundation */
+ public static final Rule PSF = owner(ImmutableList.of("Python Software Foundation"));
+
+ /** Python Software Foundation License */
+ public static final Rule PSFL =
+ license(
+ ImmutableList.of(
+ ".*Python Software Foundation license(?: version \\p{N})?[,.;:]?.*",
+ ".*Permission to use[,;]? copy[,;]? modify[,;]? and distribute this Python software"
+ + " and its associated documentation for any purpose.*"));
+
+ /** Sun Industry Standards Source License */
+ public static final Rule SISSL =
+ license(ImmutableList.of("\\bSun Industry Standards Source Licen[cs]e"));
+
+ /** Watcom-1.0 license */
+ public static final Rule WATCOM =
+ license(
+ ImmutableList.of(
+ ".*Sybase Open Watcom Public License.*",
+ ".*automatically without notice if You[,;]? at any time during the term of this"
+ + " Licen[cs]e[,;]? commence an action for patent infringement [(]?including as a"
+ + " cross claim or counterclaim[)]?.*"));
+
+ /** Do What The Fuck You Want To Public License */
+ public static final Rule WTFPL =
+ license(ImmutableList.of("\\bDo What The Fuck You Want To Public Licen[cs]e"));
+
+ @VisibleForTesting
+ static ImmutableMap<String, Rule> lookup =
+ ImmutableMap.<String, Rule>builder()
+ .put("AGPL", AGPL)
+ .put("ANDROID", ANDROID)
+ .put("APACHE2", APACHE2)
+ .put("BEER_WARE", BEER_WARE)
+ .put("BSD", BSD)
+ .put("CC_BY_C", CC_BY_C)
+ .put("CC_BY_NC", CC_BY_NC)
+ .put("COMMONS_CLAUSE", COMMONS_CAUSE)
+ .put("CPAL", CPAL)
+ .put("EPL", EPL)
+ .put("EUPL", EUPL)
+ .put("EXAMPLES", EXAMPLES)
+ .put("GOOGLE", GOOGLE)
+ .put("GPL", GPL)
+ .put("GPL2", GPL2)
+ .put("GPL3", GPL3)
+ .put("LGPL", LGPL)
+ .put("MIT", MIT)
+ .put("NON_COMMERCIAL", NON_COMMERCIAL)
+ .put("NOT_A_CONTRIBUTION", NOT_A_CONTRIBUTION)
+ .put("PSF", PSF)
+ .put("PSFL", PSFL)
+ .put("SISSL", SISSL)
+ .put("WATCOM", WATCOM)
+ .put("WTFPL", WTFPL)
+ .build();
+
+ /** Immutable set of copyright rules described as lists of regular expression strings. */
+ public static class RuleSet {
+ public final ImmutableList<String> firstPartyLicenses;
+ public final ImmutableList<String> thirdPartyLicenses;
+ public final ImmutableList<String> forbiddenLicenses;
+ public final ImmutableList<String> firstPartyOwners;
+ public final ImmutableList<String> thirdPartyOwners;
+ public final ImmutableList<String> forbiddenOwners;
+ public final ImmutableList<String> excludePatterns;
+
+ /** Returns a Builder object for the RuleSet class. */
+ public static Builder builder() {
+ return new Builder();
+ }
+
+ /** Implements the Builder pattern for CopyrightPatterns.RuleSet. */
+ public static class Builder {
+ private final ImmutableList.Builder<String> firstPartyLicenses =
+ ImmutableList.<String>builder();
+ private final ImmutableList.Builder<String> thirdPartyLicenses =
+ ImmutableList.<String>builder();
+ private final ImmutableList.Builder<String> forbiddenLicenses =
+ ImmutableList.<String>builder();
+ private final ImmutableList.Builder<String> firstPartyOwners =
+ ImmutableList.<String>builder();
+ private final ImmutableList.Builder<String> thirdPartyOwners =
+ ImmutableList.<String>builder();
+ private final ImmutableList.Builder<String> forbiddenOwners = ImmutableList.<String>builder();
+ private final ImmutableList.Builder<String> excludePatterns = ImmutableList.<String>builder();
+
+ private Builder() {}
+
+ /** Create a RuleSet reflecting the current state of this Builder. */
+ public RuleSet build() {
+ return new RuleSet(
+ firstPartyLicenses.build(),
+ thirdPartyLicenses.build(),
+ forbiddenLicenses.build(),
+ firstPartyOwners.build(),
+ thirdPartyOwners.build(),
+ forbiddenOwners.build(),
+ excludePatterns.build());
+ }
+
+ /** Look up `ruleName` and add it as a 1p rule type. */
+ public Builder addFirstParty(String ruleName) {
+ Rule pattern = lookup.get(ruleName);
+ if (pattern == null) {
+ throw new UnknownPatternName(ruleName);
+ }
+ if (pattern.licenses != null) {
+ firstPartyLicenses.addAll(pattern.licenses);
+ }
+ if (pattern.owners != null) {
+ firstPartyOwners.addAll(pattern.owners);
+ }
+ if (pattern.exclusions != null) {
+ excludePatterns.addAll(pattern.exclusions);
+ }
+ return this;
+ }
+
+ /** Add the regular expression `pattern` as a 1p owner. */
+ public Builder addFirstPartyOwner(String pattern) {
+ firstPartyOwners.add(pattern);
+ return this;
+ }
+
+ /** Add the regular expression `pattern` as a 1p license. */
+ public Builder addFirstPartyLicense(String pattern) {
+ firstPartyLicenses.add(pattern);
+ return this;
+ }
+
+ /** Look up `ruleName` and add it as a 3p rule type. */
+ public Builder addThirdParty(String ruleName) {
+ Rule pattern = lookup.get(ruleName);
+ if (pattern == null) {
+ throw new UnknownPatternName(ruleName);
+ }
+ if (pattern.licenses != null) {
+ thirdPartyLicenses.addAll(pattern.licenses);
+ }
+ if (pattern.owners != null) {
+ thirdPartyOwners.addAll(pattern.owners);
+ }
+ if (pattern.exclusions != null) {
+ excludePatterns.addAll(pattern.exclusions);
+ }
+ return this;
+ }
+
+ /** Add the regular expression `pattern` as a 3p owner. */
+ public Builder addThirdPartyOwner(String pattern) {
+ thirdPartyOwners.add(pattern);
+ return this;
+ }
+
+ /** Add the regular expression `pattern` as a 3p license. */
+ public Builder addThirdPartyLicense(String pattern) {
+ thirdPartyLicenses.add(pattern);
+ return this;
+ }
+
+ /** Look up `ruleName` and add it as a forbidden rule type. */
+ public Builder addForbidden(String ruleName) {
+ Rule pattern = lookup.get(ruleName);
+ if (pattern == null) {
+ throw new UnknownPatternName(ruleName);
+ }
+ if (pattern.licenses != null) {
+ forbiddenLicenses.addAll(pattern.licenses);
+ }
+ if (pattern.owners != null) {
+ forbiddenOwners.addAll(pattern.owners);
+ }
+ if (pattern.exclusions != null) {
+ excludePatterns.addAll(pattern.exclusions);
+ }
+ return this;
+ }
+
+ /** Add the regular expression `pattern` as a forbidden owner. */
+ public Builder addForbiddenOwner(String pattern) {
+ forbiddenOwners.add(pattern);
+ return this;
+ }
+
+ /** Add the regular expression `pattern` as a forbidden license. */
+ public Builder addForbiddenLicense(String pattern) {
+ forbiddenLicenses.add(pattern);
+ return this;
+ }
+
+ /** Look up `ruleName` and add it as a rule type to ignore completely. */
+ public Builder exclude(String ruleName) {
+ Rule pattern = lookup.get(ruleName);
+ if (pattern == null) {
+ throw new UnknownPatternName(ruleName);
+ }
+ if (pattern.licenses != null) {
+ excludePatterns.addAll(pattern.licenses);
+ }
+ if (pattern.owners != null) {
+ excludePatterns.addAll(pattern.owners);
+ }
+ if (pattern.exclusions != null) {
+ excludePatterns.addAll(pattern.exclusions);
+ }
+ return this;
+ }
+
+ /** Add the regular expression `pattern` to the list of patterns to ignore when found. */
+ public Builder excludePattern(String pattern) {
+ excludePatterns.add(pattern);
+ return this;
+ }
+ }
+
+ private RuleSet(
+ ImmutableList<String> firstPartyLicenses,
+ ImmutableList<String> thirdPartyLicenses,
+ ImmutableList<String> forbiddenLicenses,
+ ImmutableList<String> firstPartyOwners,
+ ImmutableList<String> thirdPartyOwners,
+ ImmutableList<String> forbiddenOwners,
+ ImmutableList<String> excludePatterns) {
+ this.firstPartyLicenses = firstPartyLicenses;
+ this.thirdPartyLicenses = thirdPartyLicenses;
+ this.forbiddenLicenses = forbiddenLicenses;
+ this.firstPartyOwners = firstPartyOwners;
+ this.thirdPartyOwners = thirdPartyOwners;
+ this.forbiddenOwners = forbiddenOwners;
+ this.excludePatterns = excludePatterns;
+ }
+ }
+
+ /** Initialize a pattern consisting of only a list of owner patterns. */
+ private static Rule owner(ImmutableList<String> owners) {
+ return new Rule(owners, null, null);
+ }
+
+ /** Initialize a pattern consisting of only a list of license patterns. */
+ private static Rule license(ImmutableList<String> licenses) {
+ return new Rule(null, licenses, null);
+ }
+
+ /** Initialize a pattern consisting of lists of license and exclusion patterns. */
+ private static Rule license(ImmutableList<String> licenses, ImmutableList<String> exclusions) {
+ return new Rule(null, licenses, exclusions);
+ }
+
+ /**
+ * A matching rule described by lists of regular expressions matching relevant licenses and
+ * owners, and a list of regular expressions matching hits to ignore when found.
+ *
+ * <p>e.g. The text "not a contribution" is important for Apache2 licensed code because it
+ * disclaims the terms of the otherwise described Apache2 license. However, this very text exists
+ * inside the Apache2 license to allow such disclaimers. An effective rule for /not a
+ * contribution/ will have to match /not a contribution/ but ignore /owner as "not a
+ * contribution"/ like it appears in the license itself.
+ */
+ @VisibleForTesting
+ static class Rule {
+ public final ImmutableList<String> exclusions;
+ public final ImmutableList<String> owners;
+ public final ImmutableList<String> licenses;
+
+ private Rule(ImmutableList<String> owners, ImmutableList<String> licenses) {
+ this(owners, licenses, null);
+ }
+
+ private Rule(
+ ImmutableList<String> owners,
+ ImmutableList<String> licenses,
+ ImmutableList<String> exclusions) {
+ this.owners = owners;
+ this.licenses = licenses;
+ this.exclusions = exclusions;
+ }
+ }
+
+ /** Thrown when requesting a pattern by a name that does not appear among the known patterns. */
+ public static class UnknownPatternName extends NoSuchElementException {
+ UnknownPatternName(String ruleName) {
+ super(
+ "Unknown pattern name: "
+ + ruleName
+ + "\nKnown pattern names include: "
+ + Joiner.on(", ").join(lookup.keySet()));
+ }
+ }
+}
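The `Rule` javadoc above describes exclusions with the "Not a Contribution" case: the phrase must be reported when it disclaims the license, but ignored where the Apache 2 license text itself quotes it. The sketch below illustrates that idea with plain `java.util.regex`. It is an assumption-laden illustration, not the scanner's implementation: the class `ExclusionSketch`, the method `reportable`, the simplified `LICENSE` pattern, and the choice to test each hit with a complete match are all hypothetical, and the sample text uses typographic quotes so that `\p{Pi}` and `\p{Pf}` apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of license patterns paired with exclusion patterns. */
final class ExclusionSketch {

  // Simplified stand-in for the NOT_A_CONTRIBUTION license pattern (assumption).
  static final Pattern LICENSE =
      Pattern.compile(".*Not a Contribution.*", Pattern.CASE_INSENSITIVE);

  // Exclusion pattern copied from the NOT_A_CONTRIBUTION rule in the diff.
  static final Pattern EXCLUSION =
      Pattern.compile(
          "owner as \\p{Pi}?Not a Contribution[.,;:]{0,3}\\p{Pf}?", Pattern.CASE_INSENSITIVE);

  /** Keeps hits that completely match LICENSE and do not completely match EXCLUSION. */
  static List<String> reportable(List<String> hits) {
    List<String> kept = new ArrayList<>();
    for (String hit : hits) {
      boolean isLicense = LICENSE.matcher(hit).matches();
      boolean isExcluded = EXCLUSION.matcher(hit).matches();
      if (isLicense && !isExcluded) {
        kept.add(hit);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> hits =
        Arrays.asList(
            "owner as \u201CNot a Contribution.\u201D", // wording from the license text
            "Portions are Not a Contribution.", // an actual disclaimer
            "Apache License"); // unrelated to this rule
    for (String hit : reportable(hits)) {
      System.out.println(hit);
    }
  }
}
```

In the diff, `RuleSet.Builder` collects the exclusions of every rule it adds into one shared `excludePatterns` list, so an exclusion declared on one rule presumably applies to hits from any rule in the set.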
diff --git a/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightScanner.java b/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightScanner.java
new file mode 100644
index 0000000..e5e2888
--- /dev/null
+++ b/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightScanner.java
@@ -0,0 +1,1038 @@
+// Copyright (C) 2019 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.googlesource.gerrit.plugins.copyright.lib;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Iterables;
+import com.google.common.collect.Streams;
+import java.io.IOException;
+import java.nio.CharBuffer;
+import java.util.ArrayList;
+import java.util.regex.MatchResult;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
+
+/**
+ * Immutable file scanner that finds copyright and license declarations and classifies each match.
+ *
+ * <p>In general, configure the first-party (1p) and forbidden owners, and any generic owner matches
+ * get classified as third-party (3p) automatically. Generally, only configure third-party (3p)
+ * owners that the generic pattern will not match for some reason.
+ *
+ * <p>Licenses are different. Unknown licenses get identified as unknown and treated the same as
+ * forbidden. Configure all of the known first-party (1p), third-party (3p) and forbidden licenses.
+ *
+ * <p>Configure the scanner using simplified regular expressions. The scanner will replace sequences
+ * of whitespace with a regular sub-expression matching sequences of whitespace or comment
+ * characters. Because the scanner makes this transformation, avoid including whitespace inside
+ * character classes.
+ *
+ * <p>e.g. use "Android Open(?: |-)Source Project" not "Android Open[- ]Source Project"
+ *
+ * <p>When classifying matches as 1p, 3p or forbidden, the scanner looks for complete matches of
+ * configured patterns, i.e. {@code Matcher.matches()} rather than {@code Matcher.find()}.
+ *
+ * <p>It's useful to include wildcards in configured patterns to match sub-sequences in generic
+ * matches, but these can cause excessive backtracking leading to performance problems or even stack
+ * exhaustion. The scanner replaces the wildcards '.*' and '.+' with expressions matching a more
+ * limited set of characters for a shorter length that will generally match what is expected.
+ *
+ * <p>This allows simple configuration patterns like ".*Licen[cs]ed under the Apache Licen[cs]e,?"
+ * without the risks normally caused by wildcard patterns.
+ */
+public final class CopyrightScanner {
+
+ private final Pattern copyright; // Full regular expression for scanner to match.
+ private final ImmutableList<Pattern> firstPartyLicenses; // Match 1p licenses.
+ private final ImmutableList<Pattern> thirdPartyLicenses; // Match 3p licenses.
+ private final ImmutableList<Pattern> forbiddenLicenses; // Match forbidden licences.
+ private final ImmutableList<Pattern> firstPartyOwners; // Match 1p authors/matches.
+ private final ImmutableList<Pattern> thirdPartyOwners; // Match 3p authors/matches.
+ private final ImmutableList<Pattern> forbiddenOwners; // Match forbidden authors.
+ private final ImmutableList<Pattern> contractWords; // Match license words.
+ private final ImmutableList<Pattern> excludePatterns; // Exclude when found.
+
+ // Most files that have a copyright or license declaration have 1 of them -- or at most 2 or 3.
+ // NOTICE files can have thousands all derived from other files in the repository. No need to find
+ // them all. Picked a small multiple of the expected number of licenses per file to catch any
+ // long-tail files without wasting much effort on derivative NOTICE files etc.
+ private static final int MATCH_THRESHOLD = 10;
+
+ // Determined empirically by scanning millions of files on several hosts and looking at the offset
+ // of the first matched copyright or license declaration. A couple .cpp files have copyright
+ // declarations near the end of the file for some function or class copied from a third party.
+ //
+ // The only files where the first match appeared later than 230k or so were a few multi-gigabyte
+ // build images derived entirely from other files in the repository. Picked a power of 2 large
+ // enough to report all or virtually all of the source files with copyright declarations; even if
+ // it doesn't report all of the declarations in the largest source files.
+ //
+ // There is an obvious trade-off for performance here. Increasing the maximum search length beyond
+ // this threshold makes little or no difference for detecting problematic licenses, but does
+ // increase scan durations at least linearly for larger files. Reducing the maximum search length
+ // significantly below this threshold increases the risk a problematic license will go undetected.
+ private static final int MAX_SEARCH_LENGTH = 256 * 1024;
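As a rough illustration of how this limit bounds the work done per file, the sketch below mirrors the buffer sizing that `findMatches` applies to its size hint. The class and method names are hypothetical and exist only for this example.

```java
final class SearchBufferSketch {
  // Same value as MAX_SEARCH_LENGTH in the scanner.
  static final int MAX_SEARCH_LENGTH = 256 * 1024;

  /** Returns the number of chars to allocate for a source with the given size hint. */
  static int bufferLength(long sizeHint) {
    // Unknown (< 1) or oversized hints fall back to the maximum search length.
    int searchLength =
        sizeHint < 1 || sizeHint > MAX_SEARCH_LENGTH ? MAX_SEARCH_LENGTH : (int) sizeHint;
    // The scanner always allocates at least 2 chars.
    return Math.max(searchLength, 2);
  }

  public static void main(String[] args) {
    System.out.println(bufferLength(-1)); // unknown size
    System.out.println(bufferLength(100)); // small file
    System.out.println(bufferLength(5L * 1024 * 1024 * 1024)); // multi-gigabyte image
  }
}
```

However large the input, only the first 256 Ki chars are ever searched, which is why scan time stops growing for very large files.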
+
+ // All of the MAX parameters below have been chosen empirically, similar to MAX_SEARCH_LENGTH, to
+ // minimize computing cost while still catching virtually all of the important matches.
+
+ /** Maximum length of consecutive text characters to match. */
+ private static final int MAX_NAME_LENGTH = CopyrightPatterns.MAX_NAME_LENGTH;
+ /** Maximum number of potential names to match. */
+ private static final int MAX_NAME_REPETITION = CopyrightPatterns.MAX_NAME_REPETITION;
+ /** Maximum length of consecutive space/comment characters to match. */
+ private static final int MAX_SPACE_LENGTH = CopyrightPatterns.MAX_SPACE_LENGTH;
+ /** Maximum repetition of potential dates to match. Might have to revisit this in future. */
+ private static final int MAX_DATE_REPETITION = CopyrightPatterns.MAX_DATE_REPETITION;
+
+ /** Regular expression matching whitespace or a comment character. */
+ private static final String WS = CopyrightPatterns.WS;
+ /** Regular expression matching whitespace, a comment character, or punctuation. */
+ private static final String WSPCT = CopyrightPatterns.WSPCT;
+ /** Regular expression matching a web address. */
+ private static final String URL = CopyrightPatterns.URL;
+ /** Regular expression matching a text or email address. */
+ public static final String NAME = CopyrightPatterns.NAME;
+ /** Regular expression matching an UPPERCASE text. */
+ public static final String UPPER_NAME = CopyrightPatterns.UPPER_NAME;
+ /** Regular expression matching a Proper Case text. */
+ public static final String PROPER_NAME = CopyrightPatterns.PROPER_NAME;
+ /** Regular expression matching any text, email address, or quote character. */
+ private static final String ANY_CHAR = CopyrightPatterns.ANY_CHAR;
+ /** Regular expression matching any text, email address, or quoted string. */
+ public static final String ANY_WORD = CopyrightPatterns.ANY_WORD;
+
+ /**
+ * Regular expressions to match arbitrary contract words.
+ *
+ * <p>Purposefully pushed the definition of common contract words to the lowest levels of the
+ * library to make it difficult--but not impossible--to customize the word list.
+ *
+ * <p>There are many words one can think of that are common to license contracts that do not
+ * appear here. For example, "grant" and "permission" lead to many false positives due to their
+ * use associated with ACLs and visibility etc. The word "contributed" appears so many times in
+ * .xml files in the Android code base that it adds significant latency and had to be removed.
+ *
+ * <p>Most license declarations will have multiple of these words so if a particular word causes a
+ * problem in a particular code base, it is probably okay to remove it for all code bases without
+ * too large a reduction in true positives. But please, check first.
+ *
+ * <p>Take care adding new words to make sure they do increase the number of true positives
+ * without causing other problems. Remember that the existing word list was arrived at empirically
+ * by adding many candidates and then pruning.
+ *
+ * <p>If the word lists really must diverge among different code bases, make the 2nd constructor
+ * public, and provide different word lists at a higher level.
+ */
+ private static final ImmutableList<String> CONTRACT_WORDS =
+ ImmutableList.of(
+ "agree(?:s|d|ment)?",
+ "amendments?",
+ "applicable laws?",
+ "any manner",
+ "auth?or(?:s|ed|ship)?:?(?-i: \\p{Lu}\\p{Ll}*){2,5}",
+ "breach",
+ "(?:(?:required|return|allocated|allowed|contributed|copyrighted|generated|provided"
+ + "|raised|understandable|used|written) )?by:? @[-\\p{L}\\p{N}._]+",
+ "(?:(?:required|return|allocated|allowed|contributed|copyrighted|generated|provided"
+ + "|raised|understandable|used|written) )?by:? [-\\p{L}\\p{N}._]+@[-\\p{L}\\p{N}._]+",
+ "(?:(?:required|return|allocated|allowed|contributed|copyrighted|generated|provided"
+ + "|raised|understandable|used|written) )?by:?(?-i: \\p{Lu}\\p{Ll}*){2,5}",
+ "charge for",
+ "constitut(?:e|es|ed|ing)",
+ "contract(?:s|ed|ing|ual|ually)?",
+ // contributed removed -- frequent appearance in large .xml files increases latency
+ "contribut(?:e|es|or|ors|ion|ions)",
+ "copyleft",
+ "\\p{L}+ copyright(?:able)? \\p{L}+",
+ "damages",
+ "derivative",
+ "disclaim(?:s|ed|er)?",
+ "endorsements?",
+ " [(]?EUPL[)]? ",
+ "exemplary",
+ "expressly",
+ "fitness",
+ "govern(?:s|ed|ing)?",
+ "here(?:by|under)",
+ "herein(?:after)?",
+ "however caused",
+ "incidental",
+ "infring(?:e|es|ed|ing)",
+ "injury",
+ "jurisdictions?",
+ "lawful",
+ "liable",
+ "liabilit(?:ies|y)",
+ "(?:re)?licen[cs](?:e(?![:])|es|ed|ing|or)",
+ "litigation",
+ "merchantability",
+ "must agree",
+ "negligen(?:ce|t)",
+ "no event",
+ "no provision",
+ "(?:non|un)enforce(?:s|d|able|ability)?",
+ "nonexclusive",
+ "notwithstanding",
+ "obligations?",
+ "otherwise agreed",
+ "perpetu(?:al|ity)",
+ "phonorecords?",
+ "prior written",
+ "provisions",
+ "public domain",
+ "(?-i:(?:" + UPPER_NAME + " ){0,5}PUBLIC LICEN[CS]E)",
+ "(?-i:(?:" + PROPER_NAME + " ){0,5}Public Licen[cs]e)",
+ "punitive",
+ "pursuant",
+ "redistribut(?:e|ion)",
+ "right to",
+ "royalties",
+ "set forth",
+ " [(]?SISSL[)]? ",
+ "SPDX-License-Identifier[:]?",
+ "stoppage",
+ "terms and conditions",
+ "the laws of",
+ "third party",
+ "tort(?:s|ious)?",
+ "trademark",
+ "waive(?:s|d|r)?",
+ "warrant(?:s|y|ee|ed|ing)?",
+ "whatsoever");
+
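A minimal sketch of how a contract word list like the one above can be joined into a single alternation and used to spot likely license text. It uses only three of the words, and the class and method names are hypothetical. The real pattern built in `buildPattern()` is more involved.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContractWordSketch {
  // A small subset of the contract words listed above.
  static final String[] WORDS = {
    "disclaim(?:s|ed|er)?", "liabilit(?:ies|y)", "warrant(?:s|y|ee|ed|ing)?"
  };

  // Join the words into one case-insensitive alternation bounded by word boundaries.
  static final Pattern ANY_CONTRACT_WORD =
      Pattern.compile("\\b(?:" + String.join("|", WORDS) + ")\\b", Pattern.CASE_INSENSITIVE);

  /** Counts the contract words found in the text. */
  static int countContractWords(String text) {
    Matcher m = ANY_CONTRACT_WORD.matcher(text);
    int count = 0;
    while (m.find()) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(
        countContractWords(
            "THE SOFTWARE IS PROVIDED WITHOUT WARRANTY; the authors disclaim all liability."));
    System.out.println(countContractWords("int grantCount = 0;"));
  }
}
```

Typical license text trips several words at once, while ordinary code rarely trips any. This is the property the comment above relies on when it says a single troublesome word can usually be dropped.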
+ public CopyrightScanner(
+ Iterable<String> firstPartyLicenses,
+ Iterable<String> thirdPartyLicenses,
+ Iterable<String> forbiddenLicenses,
+ Iterable<String> firstPartyOwners,
+ Iterable<String> thirdPartyOwners,
+ Iterable<String> forbiddenOwners,
+ Iterable<String> excludePatterns) {
+ this(
+ firstPartyLicenses,
+ thirdPartyLicenses,
+ forbiddenLicenses,
+ firstPartyOwners,
+ thirdPartyOwners,
+ forbiddenOwners,
+ excludePatterns,
+ CONTRACT_WORDS);
+ }
+
+ private CopyrightScanner(
+ Iterable<String> firstPartyLicenses,
+ Iterable<String> thirdPartyLicenses,
+ Iterable<String> forbiddenLicenses,
+ Iterable<String> firstPartyOwners,
+ Iterable<String> thirdPartyOwners,
+ Iterable<String> forbiddenOwners,
+ Iterable<String> excludePatterns,
+ Iterable<String> contractWords) {
+ ImmutableList.Builder<Pattern> b = ImmutableList.builder();
+ if (firstPartyLicenses != null) {
+ for (String license : firstPartyLicenses) {
+ b.add(patternizeKnownMatch(license));
+ }
+ }
+ this.firstPartyLicenses = b.build();
+ b = ImmutableList.builder();
+ if (thirdPartyLicenses != null) {
+ for (String license : thirdPartyLicenses) {
+ b.add(patternizeKnownMatch(license));
+ }
+ }
+ this.thirdPartyLicenses = b.build();
+ b = ImmutableList.builder();
+ if (forbiddenLicenses != null) {
+ for (String license : forbiddenLicenses) {
+ b.add(patternizeKnownMatch(license));
+ }
+ }
+ this.forbiddenLicenses = b.build();
+ b = ImmutableList.builder();
+ if (firstPartyOwners != null) {
+ for (String owner : firstPartyOwners) {
+ b.add(patternizeKnownMatch(owner));
+ }
+ }
+ this.firstPartyOwners = b.build();
+ b = ImmutableList.builder();
+ if (thirdPartyOwners != null) {
+ for (String owner : thirdPartyOwners) {
+ b.add(patternizeKnownMatch(owner));
+ }
+ }
+ this.thirdPartyOwners = b.build();
+ b = ImmutableList.builder();
+ if (forbiddenOwners != null) {
+ for (String owner : forbiddenOwners) {
+ b.add(patternizeKnownMatch(owner));
+ }
+ }
+ this.forbiddenOwners = b.build();
+ b = ImmutableList.builder();
+ for (String word : contractWords) {
+ b.add(patternizeKnownMatch(word));
+ }
+ this.contractWords = b.build();
+ Preconditions.checkArgument(!this.contractWords.isEmpty());
+ b = ImmutableList.builder();
+ if (excludePatterns != null) {
+ for (String pattern : excludePatterns) {
+ b.add(Pattern.compile(pattern)); // not transformed because applies to normalized matches
+ }
+ }
+ this.excludePatterns = b.build();
+ this.copyright = buildPattern();
+ }
+
+ /**
+ * Scans `source` for copyright notices returning found license/author/owner information.
+ *
+ * @param name Arbitrary string identifying the source. Usually a filename.
+ * @param size Hint regarding the expected size of the input source. Use -1 if unknown.
+ * @param source The source input stream with line endings indexed for lookup.
+ * @return the list of matches found in the input stream -- never null.
+ */
+ public ImmutableList<Match> findMatches(String name, long size, IndexedLineReader source)
+ throws IOException {
+ Preconditions.checkNotNull(name);
+ Preconditions.checkNotNull(source);
+
+ ImmutableList.Builder<Match> builder = ImmutableList.builder();
+
+ // Accumulates unknown licenses in case no known matches found.
+ ArrayList<Match> unknowns = new ArrayList<>();
+
+ // Allocate a character buffer using the size hint.
+ int searchLength = size < 1 || size > MAX_SEARCH_LENGTH ? MAX_SEARCH_LENGTH : (int) size;
+ char[] content = new char[searchLength > 2 ? searchLength : 2]; // minimum 2 chars required
+ CharBuffer cb = CharBuffer.wrap(content);
+
+ // Read the input into the character buffer.
+ source.read(cb);
+ cb.flip(); // Switch from tracking available space to read into to tracking amount read.
+
+ int numUnknown = 0; // track number of contract words from unknown licenses found
+ int numLicenses = 0; // track number of licenses versus owners added to the builder
+ int numLicenseGroups = // First 2 or 3 captured groups are licenses. Rest are author/owner.
+ firstPartyLicenses.isEmpty() && thirdPartyLicenses.isEmpty() && forbiddenLicenses.isEmpty()
+ ? 2
+ : 3;
+
+ Matcher matcher = copyright.matcher(cb);
+ while (matcher.find()) {
+ MatchResult mr = matcher.toMatchResult();
+ int numBuilt = 0; // track number of matches added to the builder
+ for (int i = 1; i <= mr.groupCount(); i++) { // group 0 is entire match not a specific group
+ String license = normalizeLicense(mr.group(i));
+ if (license == null || license.trim().isEmpty() || isExcluded(license)) {
+ continue;
+ }
+ String owner = normalizeOwner(license);
+ if (isForbiddenLicense(license)) {
+ builder.add(
+ new Match(
+ PartyType.FORBIDDEN,
+ MatchType.LICENSE,
+ normalizeLicense(mr.group()),
+ source.getLineNumber(mr.start(i)),
+ source.getLineNumber(mr.end(i)),
+ mr.start(i),
+ mr.end(i)));
+ numLicenses++;
+ } else if (isThirdPartyLicense(license)) {
+ builder.add(
+ new Match(
+ PartyType.THIRD_PARTY,
+ MatchType.LICENSE,
+ normalizeLicense(mr.group()),
+ source.getLineNumber(mr.start(i)),
+ source.getLineNumber(mr.end(i)),
+ mr.start(i),
+ mr.end(i)));
+ numLicenses++;
+ } else if (isFirstPartyLicense(license)) {
+ builder.add(
+ new Match(
+ PartyType.FIRST_PARTY,
+ MatchType.LICENSE,
+ normalizeLicense(mr.group()),
+ source.getLineNumber(mr.start(i)),
+ source.getLineNumber(mr.end(i)),
+ mr.start(i),
+ mr.end(i)));
+ numLicenses++;
+ } else if (i <= numLicenseGroups) { // first 2 or 3 groups are licenses
+ builder.add(
+ new Match(
+ PartyType.UNKNOWN, // unknown licenses classified as unknown
+ MatchType.LICENSE,
+ normalizeLicense(mr.group()),
+ source.getLineNumber(mr.start(i)),
+ source.getLineNumber(mr.end(i)),
+ mr.start(i),
+ mr.end(i)));
+ numLicenses++;
+ } else if (license.toLowerCase().contains("license")
+ || license.toLowerCase().contains("licence")) {
+ builder.add(
+ new Match(
+ PartyType.UNKNOWN, // unknown licenses classified as unknown
+ MatchType.LICENSE,
+ normalizeLicense(mr.group()),
+ source.getLineNumber(mr.start(i)),
+ source.getLineNumber(mr.end(i)),
+ mr.start(i),
+ mr.end(i)));
+ numLicenses++;
+ } else if (isForbiddenOwner(owner)) {
+ builder.add(
+ new Match(
+ PartyType.FORBIDDEN,
+ normalizeLicense(mr.group()),
+ source.getLineNumber(mr.start(i)),
+ source.getLineNumber(mr.end(i)),
+ mr.start(i),
+ mr.end(i)));
+ } else if (isThirdPartyOwner(owner)) {
+ builder.add(
+ new Match(
+ PartyType.THIRD_PARTY,
+ normalizeLicense(mr.group()),
+ source.getLineNumber(mr.start(i)),
+ source.getLineNumber(mr.end(i)),
+ mr.start(i),
+ mr.end(i)));
+ } else if (isFirstPartyOwner(owner)) {
+ builder.add(
+ new Match(
+ PartyType.FIRST_PARTY,
+ normalizeLicense(mr.group()),
+ source.getLineNumber(mr.start(i)),
+ source.getLineNumber(mr.end(i)),
+ mr.start(i),
+ mr.end(i)));
+ } else { // remainder of groups are owner/author copyrights
+ builder.add(
+ new Match(
+ PartyType.THIRD_PARTY, // unknown authors classified as third party.
+ normalizeLicense(mr.group()),
+ source.getLineNumber(mr.start(i)),
+ source.getLineNumber(mr.end(i)),
+ mr.start(i),
+ mr.end(i)));
+ }
+ numBuilt++;
+ }
+ // If no capture group has content, the entire match is a word from an unknown contract.
+ // Don't bother accumulating unknown contract matches after known patterns detected.
+ if (numLicenses == 0 && numBuilt == 0 && numUnknown <= MATCH_THRESHOLD) {
+ String license = normalizeLicense(mr.group());
+ if (license.matches("(?i)no copyright(?:able)?.*")) { // exclude negated match
+ continue;
+ }
+ if (isExcluded(license)) {
+ continue;
+ }
+ if (license.matches( // exclude common implementation comments using the word `by`
+ "(?i:required|return|allocated|allowed|generated|provided|raised|understandable"
+ + "|used) by .*")) { continue; }
+ int startLine = source.getLineNumber(mr.start());
+ int endLine = source.getLineNumber(mr.end());
+ String owner = normalizeOwner(license);
+ if (isForbiddenLicense(license)) {
+ builder.add(
+ new Match(
+ PartyType.FORBIDDEN,
+ MatchType.LICENSE,
+ license,
+ startLine,
+ endLine,
+ mr.start(),
+ mr.end()));
+ numBuilt++;
+ continue;
+ } else if (isThirdPartyLicense(license)) {
+ builder.add(
+ new Match(
+ PartyType.THIRD_PARTY,
+ MatchType.LICENSE,
+ license,
+ startLine,
+ endLine,
+ mr.start(),
+ mr.end()));
+ numBuilt++;
+ continue;
+ } else if (isFirstPartyLicense(license)) {
+ builder.add(
+ new Match(
+ PartyType.FIRST_PARTY,
+ MatchType.LICENSE,
+ license,
+ startLine,
+ endLine,
+ mr.start(),
+ mr.end()));
+ numBuilt++;
+ continue;
+ } else if (isForbiddenOwner(owner)) {
+ builder.add(
+ new Match(PartyType.FORBIDDEN, license, startLine, endLine, mr.start(), mr.end()));
+ numBuilt++;
+ continue;
+ } else if (isThirdPartyOwner(owner)) {
+ builder.add(
+ new Match(PartyType.THIRD_PARTY, license, startLine, endLine, mr.start(), mr.end()));
+ numBuilt++;
+ continue;
+ } else if (isFirstPartyOwner(owner)) {
+ builder.add(
+ new Match(PartyType.FIRST_PARTY, license, startLine, endLine, mr.start(), mr.end()));
+ numBuilt++;
+ continue;
+ }
+ Match priorMatch = !unknowns.isEmpty() ? Iterables.getLast(unknowns) : null;
+ // If close to an earlier match (within 6 lines or 300 chars), extend the match to include
+ // the new word.
+ if (priorMatch != null
+ && (startLine - priorMatch.endLine < 6 || mr.start() - priorMatch.end < 300)) {
+ priorMatch.text = priorMatch.text + "..." + license;
+ priorMatch.endLine = endLine;
+ priorMatch.end = mr.end();
+ } else {
+ // Otherwise, create a new match.
+ if (numUnknown < MATCH_THRESHOLD) {
+ unknowns.add(
+ new Match(
+ PartyType.UNKNOWN,
+ MatchType.LICENSE,
+ license,
+ startLine,
+ endLine,
+ mr.start(),
+ mr.end()));
+ }
+ numUnknown++;
+ }
+ }
+ // Stop the search early if enough known patterns already matched.
+ if (numBuilt >= MATCH_THRESHOLD) {
+ break;
+ }
+ }
+ // Return unknown contracts only when found and no known patterns matched.
+ if (numLicenses == 0) {
+ builder.addAll(unknowns);
+ }
+ return builder.build();
+ }
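The order of checks in `findMatches` can be summarized in a small sketch: forbidden patterns win over third-party, which win over first-party, and anything left falls back to a default (UNKNOWN for licenses, THIRD_PARTY for owners). The configured patterns and all names below are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

final class ClassificationSketch {
  enum Party { FIRST_PARTY, THIRD_PARTY, FORBIDDEN, UNKNOWN }

  static Pattern[] compile(String... regexes) {
    Pattern[] patterns = new Pattern[regexes.length];
    for (int i = 0; i < regexes.length; i++) {
      patterns[i] = Pattern.compile(regexes[i], Pattern.CASE_INSENSITIVE);
    }
    return patterns;
  }

  // Hypothetical configuration, not the plugin's real defaults.
  static final Pattern[] FORBIDDEN_PATTERNS = compile(".*Affero General Public License.*");
  static final Pattern[] THIRD_PARTY_PATTERNS = compile(".*MIT License.*");
  static final Pattern[] FIRST_PARTY_PATTERNS = compile(".*Apache License.*");

  static boolean matchesAny(Pattern[] patterns, String text) {
    for (Pattern p : patterns) {
      if (p.matcher(text).matches()) { // whole-string match, as in the scanner
        return true;
      }
    }
    return false;
  }

  /** Checks forbidden first, then third-party, then first-party, else the fallback. */
  static Party classify(String text, Party fallback) {
    if (matchesAny(FORBIDDEN_PATTERNS, text)) {
      return Party.FORBIDDEN;
    }
    if (matchesAny(THIRD_PARTY_PATTERNS, text)) {
      return Party.THIRD_PARTY;
    }
    if (matchesAny(FIRST_PARTY_PATTERNS, text)) {
      return Party.FIRST_PARTY;
    }
    return fallback;
  }

  public static void main(String[] args) {
    System.out.println(classify("GNU Affero General Public License", Party.UNKNOWN));
    System.out.println(classify("Custom Shareware License", Party.UNKNOWN));
    System.out.println(classify("Example Widgets Inc", Party.THIRD_PARTY));
  }
}
```

Checking the most restrictive category first means a text that happens to match both a forbidden and an allowed pattern is still flagged for review.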
+
+ /**
+ * Constructs the search pattern incorporating the known matches into the generic regular
+ * expression.
+ *
+ * <p>The first 2 or 3 match groups correspond to license matches. If the configuration specifies
+ * known license patterns (1p, 3p or forbidden), the 1st match group will include these matches.
+ *
+ * <p>If the configuration specifies no known license patterns, the 1st and 2nd match groups will
+ * include matches to the generic license pattern. Otherwise, the 2nd and 3rd match groups will
+ * include these.
+ *
+ * <p>Subsequent match groups are all copyright author/owner matches.
+ *
+ * <p>The arbitrary contract words expression uses a non-capturing group. If none of the other
+ * match groups contain any content, the entire match is treated as an unknown license word.
+ */
+ private Pattern buildPattern() {
+ StringBuilder words = new StringBuilder();
+ for (Pattern word : contractWords) {
+ if (words.length() > 0) {
+ words.append('|');
+ }
+ words.append(word);
+ }
+
+ StringBuilder owners = new StringBuilder();
+ owners.append("(?:by");
+ owners.append(WS);
+ owners.append("{1,msl})?(?:the");
+ owners.append(WS);
+ owners.append("{1,msl})?("); // owner expression always captured here
+ for (Pattern owner : thirdPartyOwners) {
+ String s = owner.toString();
+ int start = s.startsWith(".*") || s.startsWith(".+") ? 2 : 0;
+ int end = s.endsWith(".*") || s.endsWith(".+") ? s.length() - 2 : s.length();
+ owners.append(owner.toString().substring(start, end));
+ owners.append('|');
+ }
+ for (Pattern owner : firstPartyOwners) {
+ String s = owner.toString();
+ int start = s.startsWith(".*") || s.startsWith(".+") ? 2 : 0;
+ int end = s.endsWith(".*") || s.endsWith(".+") ? s.length() - 2 : s.length();
+ owners.append(owner.toString().substring(start, end));
+ owners.append('|');
+ }
+ for (Pattern owner : forbiddenOwners) {
+ String s = owner.toString();
+ int start = s.startsWith(".*") || s.startsWith(".+") ? 2 : 0;
+ int end = s.endsWith(".*") || s.endsWith(".+") ? s.length() - 2 : s.length();
+ owners.append(owner.toString().substring(start, end));
+ owners.append('|');
+ }
+ owners.append("(?:");
+ owners.append(NAME);
+ owners.append("(?:");
+ owners.append(WS);
+ owners.append("{1,msl}");
+ owners.append(NAME);
+ owners.append("){0,mnr}))"); // end of owner capture
+
+ // A frequent objection to regular expressions is that long or complex expressions are difficult
+ // to read, and they are. Avoid changes to the expressions below. Given a choice between making
+ // a change below and adding a few "known owner"/"known license" patterns to the configuration,
+ // bias toward configuration.
+ //
+ // If that is not possible, one of the most difficult tasks when maintaining these expressions
+ // is balancing the parentheses and braces at the appropriate parts. The author of the below
+ // expression added a System.err.println() statement to output:
+ // pattern.toString().replaceAll("([(](?:[?][:])?)", "$1\n").replaceAll("([)])", "\n$1")
+ // inserting newlines after opening parentheses and before closing parentheses. The output
+ // was then fed through an awk script to indent the nested expressions:
+
+ /* awk '
+ BEGIN {
+ p="";
+ }
+ $0 ~ /^[)].*$/ {
+ p=substr(p,1, length(p)-2);
+ }
+ {
+ print p $0;
+ }
+ $0 ~ /[(]([?][:])?$/ {
+ p=p " ";
+ }
+ '
+ */
+ // From that output, it was possible to see where parentheses balanced and what changes to make
+ // to edit the expression correctly. Not for the fainthearted.
+ StringBuilder sb = new StringBuilder();
+
+ // Optional known licence capture.
+ if (!firstPartyLicenses.isEmpty()
+ || !thirdPartyLicenses.isEmpty()
+ || !forbiddenLicenses.isEmpty()) {
+ sb.append("("); // start of optional 1st captured match group
+ sb.append(
+ Streams.concat(
+ thirdPartyLicenses.stream(),
+ firstPartyLicenses.stream(),
+ forbiddenLicenses.stream())
+ .map(
+ input -> {
+ if (input == null) {
+ return "";
+ }
+ String s = input.toString();
+ int start = s.startsWith(".*") || s.startsWith(".+") ? 2 : 0;
+ int end = s.endsWith(".*") || s.endsWith(".+") ? s.length() - 2 : s.length();
+ return input.toString().substring(start, end);
+ })
+ .collect(Collectors.joining("|")));
+ sb.append(")|"); // end of optional 1st captured group and | to introduce 2nd captured group.
+ }
+
+ // Other license captures. -- ends with License
+ sb.append("(?:is"); // not captured -- helps confirm license but interferes with matching 1p,3p
+ sb.append(WS);
+ sb.append("{1,msl}(?:distributed|provided)");
+ sb.append(WS);
+ sb.append("{1,msl}under(?:");
+ sb.append(WS);
+ sb.append("{1,msl}(?:the|this))?");
+ sb.append(WS);
+ sb.append("{1,msl}((?:"); // start of 1st or 2nd captured match group
+ sb.append(NAME);
+ sb.append(WS);
+ sb.append(
+ "{1,msl}){2,mnr}?licen[cs]e))[,.;]{0,3}(?![:])"); // end of 1st or 2nd captured match group
+
+ // Other license captures. -- Line starting with License:
+ sb.append("|(?-ms:licen[cs]e:\\s{1,msl}("); // start of the 2nd or 3rd captured match group
+ sb.append(NAME);
+ sb.append("(?:\\s{1,msl}");
+ sb.append(NAME);
+ sb.append("){0,mnr})\\n)"); // end of 2nd or 3rd captured match group
+
+ // "Author is" copyright capture.
+ sb.append("|\\b(?:(?:the"); // not captured--helps confirm but interferes with 1p, 3p, forbidden
+ sb.append(WS);
+ sb.append("{1,msl}author");
+ sb.append(WS);
+ sb.append("{1,msl}of");
+ sb.append(WS);
+ sb.append("{1,msl}this");
+ sb.append(WS);
+ sb.append("{1,msl}software");
+ sb.append(WS);
+ sb.append("{1,msl}is|\\b(?:(?:principal");
+ sb.append(WS);
+ sb.append("{1,msl})?author:?))");
+ sb.append(WS);
+ sb.append("{1,msl}");
+ sb.append(owners.toString()); // owner pattern includes capture group
+ sb.append(")");
+
+ // Copyright+year(s)+owner copyright capture.
+ sb.append("|(?:"); // not captured -- helps confirm but interferes with 1p, 3p, forbidden
+ sb.append(WS);
+ sb.append("{0,msl}(?:[(]c[)]|©|©)");
+ sb.append(WS);
+ sb.append("{0,msl})?(?:(?:copy(?:right|left)(?:");
+ sb.append(WS);
+ sb.append("{1,msl}notice)?(?:");
+ sb.append(WS);
+ sb.append("{0,msl}(?:[(]c[)]|©|©))?)|(?:[(]c[)]|©|©))");
+ sb.append(WS);
+ sb.append("{1,msl}(?:");
+ sb.append("[\\p{N}]{2,4}(?:"); // year(s)+owner
+ sb.append(WSPCT);
+ sb.append("{1,msl}(?:and");
+ sb.append(WSPCT);
+ sb.append("{1,msl})?[\\p{N}]{2,4}){0,mdr}(?:"); // allows pre-y2k 2-digit years
+ sb.append(WSPCT);
+ sb.append("{1,msl}(?:present|now))?");
+ sb.append(WSPCT);
+ sb.append("{1,msl}");
+ sb.append(owners.toString()); // owner pattern includes capture group
+ sb.append("|"); // owner+year(s)
+ sb.append(owners.toString()); // owner pattern includes capture group
+ sb.append(WS);
+ sb.append("{1,msl}[\\p{N}]{2,4}(?:"); // allows pre-y2k 2-digit years
+ sb.append(WSPCT);
+ sb.append("{1,msl}(?:and");
+ sb.append(WSPCT);
+ sb.append("{1,msl})?[\\p{N}]{2,4}){0,mdr}(?:"); // allows pre-y2k 2-digit years
+ sb.append(WSPCT);
+ sb.append("{1,msl}(?:present|now))?");
+ sb.append(")(?:(?:portions)?");
+ sb.append(WS);
+ sb.append("{0,msl}(?:[(]c[)]|©|©)?");
+ sb.append(WS);
+ sb.append("{1,msl}copy(?:right|left)(?:");
+ sb.append(WS);
+ sb.append("{0,msl}(?:[(]c[)]|©|©))?");
+ sb.append(WS);
+ sb.append("{1,msl}(?:");
+ sb.append("[\\p{N}]{2,4}(?:"); // year(s)+owner
+ sb.append(WSPCT);
+ sb.append("{1,msl}(?:and");
+ sb.append(WSPCT);
+ sb.append("{1,msl})?[\\p{N}]{2,4}){0,mdr}(?:"); // allows pre-y2k 2-digit years
+ sb.append(WSPCT);
+ sb.append("{1,msl}(?:present|now))?");
+ sb.append(WSPCT);
+ sb.append("{1,msl}");
+ sb.append(owners.toString()); // owner pattern (repeated) includes capture group
+ sb.append("|"); // owner+year(s)
+ sb.append(owners.toString()); // owner pattern (repeated) includes capture group
+ sb.append(WS);
+ sb.append("{1,msl}[\\p{N}]{2,4}(?:"); // allows pre-y2k 2-digit years
+ sb.append(WSPCT);
+ sb.append("{1,msl}(?:and");
+ sb.append(WSPCT);
+ sb.append("{1,msl})?[\\p{N}]{2,4}){0,mdr}(?:"); // allows pre-y2k 2-digit years
+ sb.append(WSPCT);
+ sb.append("{1,msl}(?:present|now))?");
+ sb.append(")){0,5}"); // captures 0 to 5 additional author/owner declarations
+
+ // Detect contract words to detect unknown licenses.
+ sb.append("|(?:(?:\\b|\\p{Pi})(?:"); // unknown licenses use non-capturing group
+ sb.append(words);
+ sb.append(")(?:");
+ sb.append(WS);
+ sb.append("(?:");
+ sb.append(words);
+ sb.append(")){0,mnr}(?:\\b|[,.;:\\p{Pf}]))");
+
+ return Pattern.compile(
+ sb.toString()
+ .replaceAll("[,]mnl[}]", "," + MAX_NAME_LENGTH + "}")
+ .replaceAll("[,]msl[}]", "," + MAX_SPACE_LENGTH + "}")
+ .replaceAll("[,]mnr[}]", "," + MAX_NAME_REPETITION + "}")
+ .replaceAll("[,]mdr[}]", "," + MAX_DATE_REPETITION + "}"),
+ Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.UNICODE_CASE | Pattern.DOTALL);
+ }
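The final step of `buildPattern()` swaps symbolic quantifier bounds such as `msl` and `mnr` for numeric limits before compiling. A minimal sketch of that substitution follows. The template is a shortened, hypothetical stand-in for the real expression, and the limits passed in are arbitrary example values rather than the constants from `CopyrightPatterns`.

```java
import java.util.regex.Pattern;

final class PlaceholderSketch {
  /** Replaces the symbolic bounds {..,msl} and {..,mnr} with numeric limits. */
  static String expand(String template, int maxSpaceLength, int maxNameRepetition) {
    return template
        .replaceAll("[,]msl[}]", "," + maxSpaceLength + "}")
        .replaceAll("[,]mnr[}]", "," + maxNameRepetition + "}");
  }

  public static void main(String[] args) {
    // Hypothetical, much simplified "License: <name>" template.
    String template = "licen[cs]e:\\s{1,msl}\\w+(?:\\s{1,msl}\\w+){0,mnr}";
    String regex = expand(template, 5, 3);
    System.out.println(regex);
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    System.out.println(p.matcher("License:  BSD 3 Clause").matches());
  }
}
```

Keeping the bounds symbolic while the expression is assembled lets the limits be tuned in one place without touching the many `append` calls.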
+
+ /** Returns true when `match` contains a match for any configured exclude pattern. */
+ private boolean isExcluded(String match) {
+ for (Pattern p : excludePatterns) {
+ if (p.matcher(match).find()) {
+ return true;
+ }
+ }
+ return false;
+ }
+
+ /** Returns true when `owner` matches any known first party owner. */
+ private boolean isFirstPartyOwner(String owner) {
+ if (owner == null || owner.isEmpty()) {
+ return false;
+ }
+ for (Pattern p : firstPartyOwners) {
+ if (p.matcher(owner).matches()) {
+ return true;
+ }
+ }
+ return false;
+ }
+
+ /** Returns true when `owner` matches any known forbidden owner. */
+ private boolean isForbiddenOwner(String owner) {
+ if (owner == null || owner.isEmpty()) {
+ return false;
+ }
+ for (Pattern p : forbiddenOwners) {
+ if (p.matcher(owner).matches()) {
+ return true;
+ }
+ }
+ return false;
+ }
+
+ /** Returns true when `owner` matches any known third party owner. */
+ private boolean isThirdPartyOwner(String owner) {
+ if (owner == null || owner.isEmpty()) {
+ return false;
+ }
+ for (Pattern p : thirdPartyOwners) {
+ if (p.matcher(owner).matches()) {
+ return true;
+ }
+ }
+ return false;
+ }
+
+ /** Returns true when `license` matches any known first party license. */
+ private boolean isFirstPartyLicense(String license) {
+ for (Pattern p : firstPartyLicenses) {
+ if (p.matcher(license).matches()) {
+ return true;
+ }
+ }
+ return false;
+ }
+
+ /** Returns true when `license` matches any known forbidden license. */
+ private boolean isForbiddenLicense(String license) {
+ for (Pattern p : forbiddenLicenses) {
+ if (p.matcher(license).matches()) {
+ return true;
+ }
+ }
+ return false;
+ }
+
+ /** Returns true when `license` matches any known third party license. */
+ private boolean isThirdPartyLicense(String license) {
+ for (Pattern p : thirdPartyLicenses) {
+ if (p.matcher(license).matches()) {
+ return true;
+ }
+ }
+ return false;
+ }
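The classification helpers above use `Matcher.matches()`, which requires the pattern to cover the whole input, whereas `isExcluded` uses `Matcher.find()`, which accepts a match anywhere. The sketch below contrasts the two. The class name and sample strings are illustrative only.

```java
import java.util.regex.Pattern;

final class MatchVsFindSketch {
  /** True only when the regex covers the entire text, like the is*License/is*Owner helpers. */
  static boolean fullMatch(String regex, String text) {
    return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).matches();
  }

  /** True when the regex matches anywhere in the text, like isExcluded. */
  static boolean partialMatch(String regex, String text) {
    return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
  }

  public static void main(String[] args) {
    String text = "Licensed under the Apache License, Version 2.0";
    System.out.println(fullMatch("Apache License", text)); // false: text has more around it
    System.out.println(partialMatch("Apache License", text)); // true: found inside the text
    System.out.println(fullMatch(".*Apache License.*", text)); // true: wildcards cover the rest
  }
}
```

This is why configured license and owner patterns usually need leading or trailing wildcards, while exclude patterns do not.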
+
+ /**
+ * Converts a known matching pattern written in a simplified regular expression language into a
+ * regular expression treating comment characters as whitespace and replacing unlimited wildcard
+ * expressions with expressions using a limited set of characters and a limited quantifier.
+ */
+ private static Pattern patternizeKnownMatch(String match) {
+ Preconditions.checkNotNull(match);
+ Preconditions.checkArgument(!match.isEmpty(), "Non-empty pattern required.");
+ // Disallow capture groups which will interfere with 1p, 3p, or forbidden classification.
+ Preconditions.checkArgument(
+ !match.matches("(?:^|.*[^\\[])[(][^?](?:[^:].*|$)"),
+ "Capturing group found in /" + match + "/. Use non-capturing (?:...) instead of (...).");
+ // Disallow spaces inside character classes because they will get replaced.
+ Preconditions.checkArgument(
+ !match.matches(".*\\[[^]]*\\s[]].*"),
+ "Character class with space in /" + match + "/. Use (?: |...) instead of space in [...].");
+ // Replace unlimited "any char" wildcards that can cost too much backtracking with patterns that
+ // match a smaller subset of characters with more limited quantifiers.
+ //
+ // Replace any sequence of whitespace with a regular expression to match any non-empty sequence
+ // of whitespace or comment characters.
+ String prefix = "";
+ if (match.startsWith(".*")) {
+ prefix = ".*";
+ } else if (match.startsWith(".+")) {
+ prefix = ".*";
+ }
+ String suffix = "";
+ if (match.endsWith(".*")) {
+ suffix = ".*";
+ } else if (match.endsWith(".+")) {
+ suffix = ".*";
+ }
+ return Pattern.compile(
+ prefix
+ + match
+ .substring(prefix.length(), match.length() - suffix.length())
+ .replaceAll(
+ "[.][*]",
+ ("(?: "
+ + ANY_CHAR
+ + "{1,"
+ + MAX_NAME_LENGTH
+ + "}){0,"
+ + MAX_NAME_REPETITION
+ + "}")
+ .replace("\\", "\\\\"))
+ .replaceAll(
+ "[.][+]",
+ ("(?: "
+ + ANY_CHAR
+ + "{1,"
+ + MAX_NAME_LENGTH
+ + "}){1,"
+ + MAX_NAME_REPETITION
+ + "}")
+ .replace("\\", "\\\\"))
+ .replaceAll("\\s+[?]", WS.replace("\\", "\\\\") + "{0," + MAX_SPACE_LENGTH + "}")
+ .replaceAll("\\s+", WS.replace("\\", "\\\\") + "{1," + MAX_SPACE_LENGTH + "}")
+ + suffix,
+ Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.UNICODE_CASE | Pattern.DOTALL);
+ }
+
+ /**
+ * Replaces sequences of whitespace and comment characters with a single space preserving URLs,
+ * which often contain `/` or `#` as non-comment characters.
+ */
+ private static String normalizeLicense(String match) {
+ if (match == null) {
+ return null;
+ }
+ StringBuilder sb = new StringBuilder();
+ Matcher m = Pattern.compile(URL).matcher(match);
+ int nextIndex = 0;
+ while (m.find()) {
+ int start = m.start();
+ if (nextIndex < start) {
+ sb.append(match.substring(nextIndex, start).replaceAll(WS + "+", " "));
+ }
+ sb.append(m.group());
+ nextIndex = m.end();
+ }
+ if (nextIndex < match.length()) {
+ sb.append(match.substring(nextIndex).replaceAll(WS + "+", " "));
+ }
+ return sb.toString().trim();
+ }
+
+ /**
+ * Strips common non-author/owner suffixes that get picked up unintentionally from a
+ * previously normalized license, i.e. one in which sequences of whitespace and comment
+ * characters have been replaced with a single space while preserving URLs, which often
+ * contain `/` or `#` as non-comment characters.
+ *
+ * <p>The generic license pattern always ends by matching the word `license` or stops at the end
+ * of the line so it does not pick up spurious additional text. The generic owner pattern does not
+ * end in a specific word so it often includes spurious additional words like #ifdef or #ifndef,
+ * which interfere when comparing the match against known author/owner patterns. This method
+ * strips the most common non-author/owner words from the end of the match.
+ */
+ private static String normalizeOwner(String license) {
+ if (license == null) {
+ return null;
+ }
+ return license
+ .split(
+ "(?i)[ ](?:all rights|(?:the|this) [^ ]+(?: [^ ]+){0,2} (?:is|assumes|may)"
+ + "|permission|copyright|version \\p{N}|for conditions|include |include$"
+ + "|modification|however|open source license|please (?:use|read)|libname"
+ + "|if defined|usage|this is free|added|generic|redistribution|ifdef|ifndef"
+ + "|for (?:more|terms)|copying and|you (?:may|can)|released under|see the"
+ + "|full source|freedom to use|this program and|distributed|https?|unit ?test"
+ + "|import|static|by obtaining|by using|by copying|example|namespace|config\\b"
+ + "|public (?:static|final|class)|package (?:org|com)|[^ ]+ is hereby)")[0];
+ }
+
+ /** Identifies the relevant party as 1p, 3p, forbidden, or unknown. */
+ public enum PartyType {
+ FIRST_PARTY,
+ THIRD_PARTY,
+ FORBIDDEN,
+ UNKNOWN,
+ }
+
+ /** Identifies whether text matched by author/owner pattern or by license pattern. */
+ public enum MatchType {
+ AUTHOR_OWNER,
+ LICENSE,
+ }
+
+ /**
+ * Describes a copyright author/owner or license `text` match found in the input stream.
+ *
+ * <p>Identifies the relevant party as `FIRST_PARTY`, `THIRD_PARTY`, `FORBIDDEN`, or `UNKNOWN`.
+ *
+ * <p>Identifies the match as `AUTHOR_OWNER` or `LICENSE`.
+ *
+ * <p>Includes a normalized version of the matched text and where it was found in the file.
+ */
+ public static class Match {
+ /** Classifies relevant party as 1p, 3p, forbidden, or unknown. */
+ public PartyType partyType;
+ /** Classifies match as author/owner or as license. */
+ public MatchType matchType;
+ /** Matched text with spaces and comment characters replaced by a single space. */
+ public String text;
+ /** The line number in the file where the match starts. */
+ public int startLine;
+ /** The line number in the file where the match ends. */
+ public int endLine;
+ /** The character offset into the file where the match starts. */
+ public int start;
+ /** The character offset into the file where the match ends. */
+ public int end;
+
+ Match(PartyType partyType, String text, int startLine, int endLine, int start, int end) {
+ this(partyType, MatchType.AUTHOR_OWNER, text, startLine, endLine, start, end);
+ }
+
+ Match(
+ PartyType partyType,
+ MatchType matchType,
+ String text,
+ int startLine,
+ int endLine,
+ int start,
+ int end) {
+ this.partyType = partyType;
+ this.matchType = matchType;
+ this.text = text;
+ this.startLine = startLine;
+ this.endLine = endLine;
+ this.start = start;
+ this.end = end;
+ }
+ }
+}
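Reviewer note: the rewriting that `patternizeKnownMatch` performs can be tried out in isolation. The sketch below is a simplified illustration, not the library's code. The character classes and limits (`WS`, `ANY_CHAR`, `MAX_NAME_LENGTH`, `MAX_NAME_REPETITION`, `MAX_SPACE_LENGTH`) are hypothetical stand-ins because their real definitions fall outside this part of the diff, the leading/trailing wildcard handling is omitted, and `Matcher.quoteReplacement` is used in place of the hand-written backslash escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: bound `.*` and whitespace in a simplified pattern to limit backtracking. */
class BoundedWildcardSketch {
  // Hypothetical stand-ins; the library defines its own values elsewhere.
  static final String WS = "[\\s*/#]"; // whitespace or a comment character
  static final String ANY_CHAR = "[^\\s*/#]"; // anything that is not WS
  static final int MAX_NAME_LENGTH = 15;
  static final int MAX_NAME_REPETITION = 5;
  static final int MAX_SPACE_LENGTH = 8;

  static Pattern bounded(String simplified) {
    // `.*` becomes at most MAX_NAME_REPETITION words, each preceded by a literal space.
    String words =
        "(?: " + ANY_CHAR + "{1," + MAX_NAME_LENGTH + "}){0," + MAX_NAME_REPETITION + "}";
    // Every run of whitespace, including the space just introduced, becomes a bounded WS run.
    String space = WS + "{1," + MAX_SPACE_LENGTH + "}";
    String regex =
        simplified
            .replaceAll("[.][*]", Matcher.quoteReplacement(words))
            .replaceAll("\\s+", Matcher.quoteReplacement(space));
    return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
  }

  public static void main(String[] args) {
    Pattern p = bounded("Copyright.* Example Corp");
    // The comment prefix " * " on the second line is absorbed by the WS class.
    System.out.println(p.matcher("Copyright (c) 2019\n * Example Corp").matches());
  }
}
```

Because each substituted word must be preceded by whitespace, a simplified pattern written as `Copyright.* Example Corp` (no space before `.*`) matches zero to five intervening words, while six or more words no longer match.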
diff --git a/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/IndexedLineReader.java b/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/IndexedLineReader.java
new file mode 100644
index 0000000..015863c
--- /dev/null
+++ b/src/main/java/com/googlesource/gerrit/plugins/copyright/lib/IndexedLineReader.java
@@ -0,0 +1,471 @@
+// Copyright (C) 2019 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.googlesource.gerrit.plugins.copyright.lib;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.BufferOverflowException;
+import java.nio.ByteBuffer;
+import java.nio.CharBuffer;
+import java.nio.charset.CharacterCodingException;
+import java.nio.charset.CharsetDecoder;
+import java.nio.charset.CoderResult;
+import java.nio.charset.CodingErrorAction;
+import java.nio.charset.MalformedInputException;
+import java.nio.charset.StandardCharsets;
+import java.nio.charset.UnmappableCharacterException;
+import java.util.ArrayList;
+import java.util.Collections;
+
+/**
+ * Class for reading character streams to scan for copyright declarations while indexing the
+ * newlines for quick line number lookups.
+ *
+ * <p>Interprets the bytes of the input source as UTF-8 when it can. Reinterprets some non-UTF-8
+ * bytes that empirically appear in or near copyrights. In many cases, these correspond to the
+ * low-byte of a UTF-16 character stored as-is without requisite escaping for UTF-8. In other cases,
+ * these are just characters from other arbitrary code pages.
+ *
+ * <p>Replaces all other non-UTF-8 (i.e. binary) bytes with '?' because it matches neither name,
+ * whitespace, nor comment characters and expresses appropriate uncertainty.
+ */
+public class IndexedLineReader implements Readable, Closeable {
+
+ public static final int BUFFER_SIZE = 2048;
+ private static final int INITIAL_LINES_CAPACITY = 1024;
+ private static final int FALLBACK_BUFFER_SIZE = 16;
+
+ private String name; // identifies input source
+ private InputStream source; // raw data (bytes) to read
+ private ByteBuffer bb; // io buffer
+
+ private CharBuffer cb; // Decoded but unread characters.
+
+ private int currChar; // Count of previously read chars.
+ private int currLine; // Count of previously read newlines.
+
+ private ArrayList<Integer> lineIndex; // Count of chars up to end of each line.
+
+ private CharsetDecoder decoder; // Converts UTF-8 bytes to chars.
+ private boolean atEof; // False until entire source is read.
+
+ public int firstBinary;
+ public int numBinary;
+
+ /**
+ * @param name Identifies the input source.
+ * @param size Hints number of bytes in source. Use -1 if unknown.
+ * @param source Input source of bytes (usually UTF-8 encoded) to scan.
+ */
+ public IndexedLineReader(String name, long size, InputStream source) {
+ this.name = name;
+ this.source = source;
+
+ int bufferSize = size < 1 || size > BUFFER_SIZE ? BUFFER_SIZE : (int) size;
+ bb = ByteBuffer.wrap(new byte[bufferSize > 8 ? bufferSize : 8]);
+ bb.flip();
+
+ cb = CharBuffer.allocate(FALLBACK_BUFFER_SIZE);
+ cb.flip();
+
+ currChar = 0;
+
+ int initialLines =
+ size < 30 || size > 30 * INITIAL_LINES_CAPACITY ? INITIAL_LINES_CAPACITY : (int) size / 30;
+ lineIndex = new ArrayList<>(initialLines);
+ lineIndex.add(0);
+
+ firstBinary = -1;
+ numBinary = 0;
+
+ decoder =
+ StandardCharsets.UTF_8
+ .newDecoder()
+ .onMalformedInput(CodingErrorAction.REPORT)
+ .onUnmappableCharacter(CodingErrorAction.REPORT);
+ }
+
+ /**
+ * Attempts to read characters into the specified character buffer. The buffer is used as a
+ * repository of characters as-is: the only changes made are the results of a put operation. No
+ * flipping or rewinding of the buffer is performed.
+ *
+ * @param dest The buffer into which the read characters are put.
+ * @return The number of {@code char} values added to the buffer, or -1 if this source of
+ * characters is at its end.
+ * @throws IOException if an I/O error occurs
+ * @throws NullPointerException if dest is null
+ * @throws java.nio.ReadOnlyBufferException if dest is a read only buffer
+ */
+ @Override
+ @SuppressWarnings("ReferenceEquality")
+ public int read(CharBuffer dest) throws IOException {
+ Preconditions.checkNotNull(dest);
+ Preconditions.checkArgument(dest.remaining() >= 2);
+ try {
+ int nPrev = 0;
+ if (atEof && !this.cb.hasRemaining() && !bb.hasRemaining()) {
+ // At end with nothing left in the buffers -- time to indicate EOF.
+ return -1;
+ }
+ if (!dest.hasRemaining()) {
+ throw new BufferOverflowException();
+ }
+ int nRead = 0;
+ if (this.cb.hasRemaining() && dest != this.cb) {
+ // Copy the previously decoded characters (either all of them or enough to fill dest) into
+ // dest.
+ nPrev = Math.min(dest.remaining(), this.cb.remaining());
+ dest.put(this.cb);
+ }
+ while (dest.hasRemaining()) {
+ int oldCharOffset = dest.position() - nPrev;
+ nPrev = 0;
+ CoderResult cr = decoder.decode(bb, dest, atEof);
+ nRead += dest.position() - oldCharOffset;
+ // Scan decoded characters to index the line endings.
+ for (int i = oldCharOffset; i < dest.position(); i++) {
+ char c = dest.array()[dest.arrayOffset() + i];
+ currChar++;
+ if (c == '\n') {
+ lineIndex.set(currLine, currChar);
+ currLine++;
+ lineIndex.add(currChar);
+ } else if (c == '&') {
+ if (!replaceAt(dest, i, "&quot;", '"')) {
+ nRead -= cutAt(dest, i);
+ return nRead;
+ }
+ if (!replaceAt(dest, i, "&#34;", '"')) {
+ nRead -= cutAt(dest, i);
+ return nRead;
+ }
+ } else if (c == '<') {
+ if (!replaceAt(dest, i, "<var>", '"')) {
+ nRead -= cutAt(dest, i);
+ return nRead;
+ }
+ if (!replaceAt(dest, i, "</var>", '"')) {
+ nRead -= cutAt(dest, i);
+ return nRead;
+ }
+ }
+ lineIndex.set(currLine, currChar);
+ }
+ if (cr.isUnderflow()) { // all bytes decoded -- read more if possible.
+ if (atEof) {
+ break;
+ }
+ bb.compact();
+ int n =
+ (numBinary > currLine)
+ ? -1
+ : source.read(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining());
+ if (n > 0) {
+ bb.position(bb.position() + n);
+ }
+ bb.flip();
+ if (n < 0) {
+ atEof = true;
+ }
+ decoder.reset();
+ continue;
+ } else if (cr.isOverflow()) {
+ // dest filled or dest has space for 1 character, but next byte sequence to decode is a
+ // surrogate pair requiring 2 characters to represent.
+ if (nRead == 0) {
+ // Presumably a surrogate pair -- need to buffer the un-read 2nd character of the pair.
+ this.cb.clear();
+ int oldPosition = bb.position();
+ decoder.reset();
+ cr = decoder.decode(bb, this.cb, false);
+ int n = bb.position() - oldPosition;
+ this.cb.flip();
+ if (n == 0 || !this.cb.hasRemaining()) {
+ // cr must be an error i.e. next byte not part of valid UTF-8 character.
+ dest.put('?');
+ bb.position(bb.position() + 1);
+ } else {
+ dest.put(this.cb.get());
+ }
+ nRead++;
+ }
+ break;
+ } else if (cr.isError()) {
+ // not valid utf-8 sequence -- binary file or other code page...
+ if (firstBinary < 0) {
+ firstBinary = currChar;
+ }
+ numBinary += cr.length();
+ nRead += cr.length();
+ if (!dest.hasRemaining()) {
+ break;
+ }
+ byte b = bb.array()[bb.arrayOffset() + bb.position()];
+ char c = '?'; // By default, replace binary data with '?'
+
+ // There is no need to try to translate all binary data -- some is just binary.
+ //
+ // Empirically the non-UTF-8 characters below sometimes appear in or near copyrights.
+ // In some cases, the file may be encoded with a different code page, or a UTF
+ // character above 128 may have been stored without proper escaping. Making these
+ // substitutions improves readability of extracted matches and licenses.
+ //
+ // The range U+00c0 to U+00ff is mostly accented characters, which require escaping in
+ // UTF-8. The low-order byte sometimes appears without escaping -- perhaps this
+ // corresponds to a different code page? In any case, just interpreting as chars works in
+ // files that include them in copyrights, and doesn't matter when they appear in other
+ // binary sequences...
+ if (b >= (byte) 0xc0 && b <= (byte) 0xff) {
+ c = (char) ('\u0000' | (b & 0xff));
+ }
+ switch (b) {
+ case (byte) 0: // preserve nul character
+ c = '\000';
+ break;
+ case (byte) 0x87: // sometimes appears where one might expect bullet
+ case (byte) 0xb7: // middle-dot could be bullet -- unescaped U+00b7
+ c = '*'; // treat bullets the same as comment character '*' -- ignored as whitespace
+ break;
+ case (byte) 0x85: // sometimes appears where one might expect (TM)
+ case (byte) 0x99: // sometimes appears where one might expect (TM)
+ c = '™';
+ break;
+ case (byte) 0xa0: // non-breaking space -- unescaped U+00a0
+ case (byte) 0xa7: // section symbol -- unescaped U+00a7
+ case (byte) 0xad: // soft hyphen -- unescaped U+00ad
+ case (byte) 0xb6: // pilcrow or paragraph symbol -- unescaped U+00b6
+ // treat as white space
+ c = ' ';
+ break;
+ case (byte) 0xa9: // copyright -- unescaped U+00a9
+ c = '©';
+ break;
+ case (byte) 0xae: // registered -- unescaped U+00ae
+ c = '®';
+ break;
+ case (byte) 0x94: // sometimes appears in place of ö in Björn
+ c = 'ö';
+ break;
+ }
+ dest.put(c);
+ bb.position(bb.position() + 1);
+ decoder.reset();
+ continue;
+ }
+ assert false : "Unexpected CoderResult state: " + cr.toString();
+ }
+ return nRead;
+ } catch (CharacterCodingException e) {
+ throw binaryFile(e);
+ } catch (IOException e) {
+ throw ioException(e);
+ }
+ }
+
+ /**
+ * Reads a string from the file up to the next delimiter `delim` (or until eof if no delimiter)
+ * appending the string to buffer `sb`.
+ *
+ * <p>Resulting string does not include the delimiter.
+ *
+ * @param delim The string delimiter. e.g. '\n' or '\000'
+ * @param sb A string builder into which the string is read without the delimiter.
+ * @return The number of characters read from the stream including the delimiter.
+ */
+ public int readString(char delim, StringBuilder sb) throws IOException {
+ char[] buf = new char[FALLBACK_BUFFER_SIZE];
+ CharBuffer cb = CharBuffer.wrap(buf);
+ if (this.cb.hasRemaining()) {
+ cb.put(this.cb);
+ }
+ cb.flip();
+ int nRead = 0;
+ int tries = 3;
+ while (true) {
+ while (cb.hasRemaining()) {
+ char c = cb.get();
+ nRead++;
+ if (c == delim) {
+ unput(cb);
+ return nRead;
+ }
+ sb.append(c);
+ }
+ cb.clear();
+ int n = read(cb);
+ cb.flip();
+ if (n < 0) {
+ if (nRead == 0) {
+ return -1;
+ }
+ break;
+ } else if (n == 0) {
+ tries--;
+ if (tries < 1) {
+ if (nRead == 0) {
+ return -1;
+ }
+ break;
+ }
+ }
+ }
+ return nRead;
+ }
+
+ @Override
+ public void close() throws IOException {
+ source.close();
+ }
+
+ /** Returns the line number containing the given character position, `charPosn`. */
+ public int getLineNumber(int charPosn) {
+ int index = Collections.binarySearch(lineIndex, charPosn);
+ if (index < 0) { // binarySearch returns inexact matches as negative indexes.
+ index = -index - 1;
+ }
+ return index + 1;
+ }
+
+ /** Wrap a CharacterCodingException with a BinaryFileException describing file, line, etc. */
+ private BinaryFileException binaryFile(CharacterCodingException cause) {
+ int lineNumber = getLineNumber(currChar);
+ int index = lineNumber - 1;
+ int column = (index == 0 ? currChar : currChar - lineIndex.get(index - 1)) + 1;
+ int length = 0;
+ if (cause instanceof MalformedInputException) {
+ MalformedInputException me = (MalformedInputException) cause;
+ length = me.getInputLength();
+ } else if (cause instanceof UnmappableCharacterException) {
+ UnmappableCharacterException ue = (UnmappableCharacterException) cause;
+ length = ue.getInputLength();
+ }
+ StringBuilder sb = new StringBuilder();
+ sb.append(name);
+ for (int i = 0; i < length; i++) {
+ sb.append(String.format(" %02x", bb.array()[bb.arrayOffset() + bb.position() + i]));
+ }
+ return new BinaryFileException(sb.toString(), currChar, lineNumber, column, cause);
+ }
+
+ /** Wrap an IOException with a description of the current file, line number and column number. */
+ private LineReaderIOException ioException(IOException cause) {
+ int lineNumber = getLineNumber(currChar);
+ int index = lineNumber - 1;
+ int column = (index == 0 ? currChar : currChar - lineIndex.get(index - 1)) + 1;
+ return new LineReaderIOException(
+ "IndexedLineReaderIOException " + cause.getMessage() + " " + name,
+ currChar,
+ lineNumber,
+ column,
+ cause);
+ }
+
+ /** Cut the current buffer `cb` at `position` putting the rest in `this.cb`. */
+ @SuppressWarnings("ReferenceEquality")
+ private int cutAt(CharBuffer cb, int position) {
+ if (cb == this.cb) {
+ throw new BufferOverflowException();
+ }
+ int nCut = cb.position() - position;
+ this.cb.clear();
+ this.cb.put(cb.array(), cb.arrayOffset() + position, nCut);
+ cb.position(position);
+ this.cb.flip();
+ return nCut;
+ }
+
+ /** Save the remaining characters from `cb` onto `this.cb` for later. */
+ private void unput(CharBuffer cb) {
+ if (!this.cb.hasRemaining()) {
+ this.cb.clear();
+ this.cb.put(cb);
+ this.cb.flip();
+ return;
+ }
+ // Shift `this.cb` and prepend `cb`
+ int len = cb.remaining();
+ if (this.cb.limit() + len > this.cb.capacity()) {
+ throw new BufferOverflowException();
+ }
+ this.cb.limit(this.cb.limit() + len);
+ for (int i = this.cb.limit() - len - 1; i >= this.cb.position(); i--) {
+ this.cb.put(i + len, this.cb.get(i));
+ }
+ for (int i = 0; i < len; i++) {
+ this.cb.put(this.cb.position() + i, cb.get());
+ }
+ }
+
+ /** Conditionally replaces `prefix` when found at `position` in `cb` with `replacement` char. */
+ private static boolean replaceAt(CharBuffer cb, int position, String prefix, char replacement) {
+ for (int i = 0; i < prefix.length(); i++) {
+ if (position + i >= cb.position()) {
+ return false;
+ }
+ if (cb.get(position + i) != prefix.charAt(i)) {
+ return true;
+ }
+ }
+ cb.put(position, replacement);
+ int dst = position + 1;
+ int src = position + prefix.length();
+ while (src < cb.position()) {
+ cb.put(dst, cb.get(src));
+ src++;
+ dst++;
+ }
+ cb.position(dst);
+ return true;
+ }
+
+ /** Describes an IO error at a specific location in a file. */
+ public static class LineReaderIOException extends IOException {
+ private int charPosn;
+ private int lineNumber;
+ private int column;
+
+ LineReaderIOException(
+ String message, int charPosn, int lineNumber, int column, Throwable cause) {
+ super(message, cause);
+ this.charPosn = charPosn;
+ this.lineNumber = lineNumber;
+ this.column = column;
+ }
+
+ @Override
+ public String getMessage() {
+ StringBuilder m = new StringBuilder();
+ m.append(super.getMessage())
+ .append(" line ")
+ .append(lineNumber)
+ .append(" col ")
+ .append(column)
+ .append(" offset ")
+ .append(charPosn);
+ return m.toString();
+ }
+ }
+
+ /** Thrown when a binary file is detected. */
+ public static class BinaryFileException extends LineReaderIOException {
+ BinaryFileException(
+ String fileName, int charPosn, int lineNumber, int column, Throwable cause) {
+ super("Binary file: " + fileName, charPosn, lineNumber, column, cause);
+ }
+ }
+}
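Reviewer note: `IndexedLineReader.getLineNumber` relies on a sorted index searched with `Collections.binarySearch`. A minimal sketch of the same idea follows. It is a simplified variant rather than the class above: it indexes a whole `String` up front and stores the offset at which each line starts (the reader instead records running character counts at line ends while streaming), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: map a 0-based character offset to a 1-based line number in O(log n). */
class LineIndexSketch {
  // Offsets at which each line starts; strictly increasing, so binary search is well defined.
  private final List<Integer> lineStarts = new ArrayList<>();

  LineIndexSketch(String text) {
    lineStarts.add(0);
    for (int i = 0; i < text.length(); i++) {
      if (text.charAt(i) == '\n') {
        lineStarts.add(i + 1); // the next line starts right after the newline
      }
    }
  }

  int lineNumber(int offset) {
    int index = Collections.binarySearch(lineStarts, offset);
    // An exact hit is the first character of line index + 1. Otherwise binarySearch returns
    // -(insertionPoint) - 1, and the insertion point equals the 1-based line number.
    return index >= 0 ? index + 1 : -index - 1;
  }

  public static void main(String[] args) {
    LineIndexSketch index = new LineIndexSketch("ab\ncd\nef");
    System.out.println(index.lineNumber(4));
  }
}
```

Storing line starts keeps the entries unique, which avoids the ambiguity `binarySearch` has when a list contains equal elements.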
diff --git a/src/main/java/com/googlesource/gerrit/plugins/copyright/tools/AndroidScan.java b/src/main/java/com/googlesource/gerrit/plugins/copyright/tools/AndroidScan.java
new file mode 100644
index 0000000..b5591a4
--- /dev/null
+++ b/src/main/java/com/googlesource/gerrit/plugins/copyright/tools/AndroidScan.java
@@ -0,0 +1,57 @@
+// Copyright (C) 2019 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.googlesource.gerrit.plugins.copyright.tools;
+
+import com.googlesource.gerrit.plugins.copyright.lib.CopyrightPatterns;
+import java.io.IOException;
+
+/** Runs the scan tool with patterns reflecting Android Open Source Project (AOSP) policies. */
+public class AndroidScan {
+
+ public static void main(String[] args) throws IOException {
+ ScanTool.toolName = "android_scan";
+ ScanTool.rules =
+ CopyrightPatterns.RuleSet.builder()
+ .exclude("EXAMPLES")
+ // 1p
+ .addFirstParty("APACHE2")
+ .addFirstParty("ANDROID")
+ .addFirstParty("GOOGLE")
+ .addFirstParty("EXAMPLES")
+ // 3p
+ .addThirdParty("BSD")
+ .addThirdParty("MIT")
+ .addThirdParty("EPL")
+ .addThirdParty("GPL2")
+ .addThirdParty("GPL3")
+ .addThirdParty("PSFL")
+ // Forbidden
+ .addForbidden("AGPL")
+ .addForbiddenLicense(".*(?:Previously|formerly) licen[cs]ed under.*")
+ .addForbidden("NOT_A_CONTRIBUTION")
+ .addForbidden("WTFPL")
+ .addForbidden("BEER_WARE")
+ .addForbidden("CC_BY_NC")
+ .addForbidden("NON_COMMERCIAL")
+ .addForbidden("COMMONS_CLAUSE")
+ .addForbidden("WATCOM")
+ .addForbidden("CC_BY_C")
+ .addForbidden("LGPL")
+ .addForbidden("GPL")
+ .build();
+
+ ScanTool.main(args);
+ }
+}
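Reviewer note: the free-form rule passed to `addForbiddenLicense` above is an ordinary regular expression, so its behavior can be checked with `java.util.regex` alone. The sketch below compiles the same pattern text directly. It borrows the case-insensitive and DOTALL flags seen in `patternizeKnownMatch`; whether the rule set routes this rule through that method is not shown in this part of the diff, and the sketch skips that method's whitespace rewriting, so it will not see through comment characters between words. `ForbiddenPhraseSketch` and `isForbidden` are hypothetical names.

```java
import java.util.regex.Pattern;

/** Sketch: exercise the "previously/formerly licensed under" rule on its own. */
class ForbiddenPhraseSketch {
  // Same pattern text as the AndroidScan rule; the flags are an assumption (see lead-in).
  static final Pattern FORMERLY_LICENSED =
      Pattern.compile(
          ".*(?:Previously|formerly) licen[cs]ed under.*",
          Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  static boolean isForbidden(String licenseText) {
    return FORMERLY_LICENSED.matcher(licenseText).matches();
  }

  public static void main(String[] args) {
    System.out.println(isForbidden("This file was previously licensed under other terms."));
    System.out.println(isForbidden("Licensed under the Apache License, Version 2.0"));
  }
}
```

The `licen[cs]ed` class covers both spellings, and the surrounding `.*` lets the phrase appear anywhere in the matched license text.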
diff --git a/src/main/java/com/googlesource/gerrit/plugins/copyright/tools/ScanTool.java b/src/main/java/com/googlesource/gerrit/plugins/copyright/tools/ScanTool.java
new file mode 100644
index 0000000..0a4bb04
--- /dev/null
+++ b/src/main/java/com/googlesource/gerrit/plugins/copyright/tools/ScanTool.java
@@ -0,0 +1,630 @@
+// Copyright (C) 2019 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.googlesource.gerrit.plugins.copyright.tools;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.base.Strings;
+import com.google.common.collect.ImmutableList;
+import com.googlesource.gerrit.plugins.copyright.lib.Archive;
+import com.googlesource.gerrit.plugins.copyright.lib.CopyrightPatterns;
+import com.googlesource.gerrit.plugins.copyright.lib.CopyrightScanner;
+import com.googlesource.gerrit.plugins.copyright.lib.CopyrightScanner.Match;
+import com.googlesource.gerrit.plugins.copyright.lib.IndexedLineReader;
+import java.io.BufferedInputStream;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.commons.compress.archivers.ArchiveEntry;
+import org.apache.commons.compress.archivers.ArchiveException;
+import org.apache.commons.compress.archivers.ArchiveInputStream;
+import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
+import org.apache.commons.compress.compressors.bzip2.BZip2Utils;
+import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import org.apache.commons.compress.compressors.gzip.GzipUtils;
+import org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream;
+import org.apache.commons.compress.compressors.lzma.LZMAUtils;
+import org.apache.commons.compress.compressors.xz.XZCompressorInputStream;
+import org.apache.commons.compress.compressors.xz.XZUtils;
+
+/** Command-line tool to scan files for copyright or license notices. */
+public class ScanTool {
+
+ public static String toolName = "scan_tool";
+
+ public static CopyrightPatterns.RuleSet rules =
+ CopyrightPatterns.RuleSet.builder().addFirstParty("EXAMPLES").build();
+
+ // Flag -f=<inputFile>
+ private static String inputFile = "";
+
+ // Flag --deep
+ private static boolean deepScan = false;
+
+ // Flag --skip=<pathPattern> (multiple allowed)
+ private static List<String> skipFiles = ImmutableList.of("[/][.]git[/]");
+
+ // Flag -0
+ private static boolean nulDelim = false;
+
+ // Flag -v or --verbose
+ private static boolean verbose = false;
+
+ private static final Pattern flagsPattern =
+ Pattern.compile("^[-][-]*(deep|skip(?:=.*)?|0|v(?:erbose)?|f(?:[-]|=.*))$");
+
+ private static final int CAPACITY = 32768; // how much memory to set aside for pending results
+ private static final int NUM_THREADS = 32; // amount of concurrency
+ private static final int MAX_DEPTH = 10; // how deep to go into archives containing archives etc.
+
+ private static WaitGroup wg;
+ private static long maxLatencyUs;
+ private static String maxLatencyName;
+
+ public static void usage() {
+ System.err.printf(
+ "%s <flags> {file-to-scan...}\n where flags are:\n"
+ + " -f=<filename> file named `filename` contains the list of files to scan\n"
+ + " use - as filename for list of files from stdin\n"
+ + " --deep scan files contained in archives (.zip, .jar etc.)\n"
+ + " --skip=<pattern> ignore files with names matching `pattern`\n"
+ + " defaults to \"[/][.]git[/]\"\n"
+ + " --skip flag may appear multiple times\n"
+ + " -v (or --verbose) output additional progress and status to err\n"
+ + " -0 with -f to use nul instead of newline to separate files\n",
+ toolName);
+ System.exit(1);
+ }
+
+ public static void main(String[] args) throws IOException {
+ Stopwatch entireSw = Stopwatch.createStarted();
+
+ ArrayList<String> skips = new ArrayList<>();
+ boolean hasSkips = false;
+ ArrayList<String> targets = new ArrayList<>();
+ for (int i = 0; i < args.length; i++) {
+ String arg = args[i];
+ Matcher flagMatcher = flagsPattern.matcher(arg);
+ if (flagMatcher.matches()) {
+ String flag = flagMatcher.group(1);
+ if ("deep".equals(flag)) {
+ if (deepScan) {
+ usage();
+ }
+ deepScan = true;
+ } else if ("0".equals(flag)) {
+ if (nulDelim) {
+ usage();
+ }
+ nulDelim = true;
+ } else if ("v".equals(flag) || "verbose".equals(flag)) {
+ if (verbose) {
+ usage();
+ }
+ verbose = true;
+ } else if ("skip".equals(flag)) {
+ if (++i >= args.length) {
+ usage();
+ }
+ skips.add(args[i]);
+ hasSkips = true;
+ } else if (flag.startsWith("skip=")) {
+ hasSkips = true;
+ flag = flag.substring(5);
+ if (!flag.isEmpty()) {
+ skips.add(flag);
+ }
+ } else if ("f".equals(flag)) {
+ if (++i >= args.length || !inputFile.equals("")) {
+ usage();
+ }
+ inputFile = args[i];
+ } else if ("f-".equals(flag)) {
+ if (!inputFile.equals("")) {
+ usage();
+ }
+ inputFile = "-";
+ } else if (flag.startsWith("f=")) {
+ if (!inputFile.equals("")) {
+ usage();
+ }
+ inputFile = flag.substring(2);
+ } else {
+ usage();
+ }
+ continue;
+ }
+ targets.add(args[i]);
+ }
+ if (hasSkips) {
+ skipFiles = ImmutableList.copyOf(skips);
+ }
+ long numFiles = 0;
+ CopyrightScanner s =
+ new CopyrightScanner(
+ rules.firstPartyLicenses,
+ rules.thirdPartyLicenses,
+ rules.forbiddenLicenses,
+ rules.firstPartyOwners,
+ rules.thirdPartyOwners,
+ rules.forbiddenOwners,
+ rules.excludePatterns);
+ ExecutorService pool = Executors.newFixedThreadPool(NUM_THREADS);
+ wg = new WaitGroup();
+ if (inputFile.isEmpty()) { // each command-line argument is a file to scan
+ if (targets.isEmpty()) {
+ usage(); // usage() exits with status 1
+ }
+ for (String target : targets) {
+ wg.startTask(target);
+ pool.execute(new ScanFile(pool, s, target));
+ numFiles++;
+ }
+ } else { // inputFile lists files to scan -- 1 per line. (use stdin if "-")
+ ArrayList<Pattern> skipPatterns = new ArrayList<>(skipFiles.size());
+ for (String pattern : skipFiles) {
+ skipPatterns.add(Pattern.compile(pattern));
+ }
+ if (verbose) {
+ for (Pattern p : skipPatterns) {
+ System.err.printf("Skip=%s\n", p.pattern());
+ }
+ }
+ IndexedLineReader ifr =
+ inputFile.trim().equals("-")
+ ? new IndexedLineReader("-", -1, System.in)
+ : new IndexedLineReader(
+ inputFile.trim(),
+ Paths.get(inputFile.trim()).toFile().length(),
+ new FileInputStream(inputFile.trim()));
+ char delim = nulDelim ? '\000' : '\n';
+ StringBuilder sb = new StringBuilder();
+ while (true) {
+ int nRead = ifr.readString(delim, sb);
+ if (nRead < 0) {
+ break;
+ }
+ String line = sb.toString();
+ sb.setLength(0);
+ boolean skip = false;
+ for (Pattern p : skipPatterns) {
+ if (p.matcher(line).find()) {
+ skip = true;
+ break;
+ }
+ }
+ if (skip) {
+ continue;
+ }
+ wg.startTask(line);
+ pool.execute(new ScanFile(pool, s, line));
+ numFiles++;
+ if ((numFiles & 0xffL) == 0) { // lots of files -- true on every 256th file
+ // Poll and drain any accumulating results.
+ wg.processResultsAndReturnRemaining();
+ }
+ }
+ }
+ // Poll the results until done.
+ while (wg.processResultsAndReturnRemaining() > 0) {
+ try {
+ Thread.sleep(60); // Faster than the blink of an eye -- or a screen refresh.
+ } catch (InterruptedException e) {
+ pool.shutdownNow();
+ Thread.currentThread().interrupt();
+ break;
+ }
+ }
+ entireSw.stop();
+ if (verbose) {
+ long elapsedS = entireSw.elapsed(TimeUnit.SECONDS);
+ System.err.printf(
+ "High water results: %d\nHigh water errors: %d\n", wg.highResults, wg.highErrors);
+ if (elapsedS > 1) {
+ System.err.printf(
+ "%d files in %ds -- %d files per second\n", numFiles, elapsedS, numFiles / elapsedS);
+ } else {
+ System.err.printf(
+ "%d files in %dms -- %d files per s\n",
+ numFiles,
+ entireSw.elapsed(TimeUnit.MILLISECONDS),
+ (numFiles * 1000) / entireSw.elapsed(TimeUnit.MILLISECONDS));
+ }
+ System.err.printf("Max latency: %dus %s\n", maxLatencyUs, maxLatencyName);
+ }
+ System.exit(wg.highErrors == 0 ? 0 : 2);
+ }
+
+ /* Runnable task that scans a file looking for copyright, authorship or license declarations. */
+ private static class ScanFile implements Runnable {
+ ExecutorService pool;
+ CopyrightScanner s;
+ ArrayList<String> fileNames;
+
+ int firstBinary = -1;
+ int numBinary = 0;
+
+ private ScanFile(ExecutorService pool, CopyrightScanner s, String fileName) {
+ this.pool = pool;
+ this.s = s;
+ this.fileNames = deepScan ? new ArrayList<>(MAX_DEPTH) : new ArrayList<>(1);
+ this.fileNames.add(fileName);
+ }
+
+ @Override
+ public void run() {
+ Stopwatch sw = Stopwatch.createStarted();
+ long size = 0;
+ try {
+ Path p = Paths.get(fileNames.get(0));
+ size = p.toFile().length();
+ try (InputStream source = Files.newInputStream(p)) {
+ scan(s, fileNames.get(0), size, source);
+ }
+ } catch (Exception e) {
+ wg.addError(new ScanError(fileNames.toArray(new String[0]), e));
+ } finally {
+ wg.finishTask(fileNames.get(0));
+ if (sw.isRunning()) {
+ sw.stop();
+ }
+ if (sw.elapsed(TimeUnit.MICROSECONDS) > maxLatencyUs) {
+ maxLatencyName = fileNames.get(0);
+ maxLatencyUs = sw.elapsed(TimeUnit.MICROSECONDS);
+ }
+ while (fileNames.size() > 1) {
+ popName();
+ }
+ if (verbose) {
+ System.err.printf(
+ "%d %d %d %d %s\n",
+ sw.elapsed(TimeUnit.MICROSECONDS), size, firstBinary, numBinary, formatFn());
+ }
+ }
+ }
+
+ /* Scan a possibly compressed, possibly embedded file. */
+ private void scan(CopyrightScanner s, String fileName, long size, InputStream source)
+ throws IOException, ArchiveException {
+ String rawFileName = fileName;
+ boolean isArchive = deepScan && this.fileNames.size() < MAX_DEPTH;
+ InputStream newSource = null;
+ BufferedInputStream bufferedSource = null;
+ try {
+ try {
+ if (BZip2Utils.isCompressedFilename(fileName)) {
+ newSource = new BZip2CompressorInputStream(source);
+ rawFileName = BZip2Utils.getUncompressedFilename(fileName);
+ } else if (GzipUtils.isCompressedFilename(fileName)) {
+ newSource = new GzipCompressorInputStream(source);
+ rawFileName = GzipUtils.getUncompressedFilename(fileName);
+ } else if (LZMAUtils.isLZMACompressionAvailable()
+ && LZMAUtils.isCompressedFilename(fileName)) {
+ newSource = new LZMACompressorInputStream(source);
+ rawFileName = LZMAUtils.getUncompressedFilename(fileName);
+ } else if (XZUtils.isXZCompressionAvailable() && XZUtils.isCompressedFilename(fileName)) {
+ newSource = new XZCompressorInputStream(source);
+ rawFileName = XZUtils.getUncompressedFilename(fileName);
+ }
+ } catch (Exception ignored) {
+ newSource = null;
+ rawFileName = fileName;
+ }
+ bufferedSource = new BufferedInputStream(newSource == null ? source : newSource);
+ bufferedSource.mark(IndexedLineReader.BUFFER_SIZE);
+ if (!isArchive) {
+ scanText(s, rawFileName, size, bufferedSource);
+ } else {
+ Archive archive = Archive.getArchive(rawFileName);
+ if (archive == null) {
+ isArchive = false;
+ scanText(s, rawFileName, size, bufferedSource);
+ } else {
+ scanArchive(s, archive, rawFileName, bufferedSource);
+ }
+ }
+ } catch (IOException e) {
+ if (isArchive && bufferedSource.markSupported()) {
+ bufferedSource.reset();
+ scanText(s, rawFileName, size, bufferedSource);
+ } else if (newSource != null) {
+ scanText(s, fileName, size, source);
+ } else {
+ throw e;
+ }
+ } catch (Exception e) {
+ if (isArchive && bufferedSource.markSupported()) {
+ bufferedSource.reset();
+ scanText(s, rawFileName, size, bufferedSource);
+ } else if (newSource != null) {
+ scanText(s, fileName, size, source);
+ } else {
+ throw e;
+ }
+ }
+ }
+
+ /* Scan a file as a regular, non-archive file. */
+ private void scanText(CopyrightScanner s, String fileName, long size, InputStream source)
+ throws IOException {
+ Stopwatch sw = Stopwatch.createStarted();
+ IndexedLineReader lr = new IndexedLineReader(fileName, size, source);
+ try {
+ ImmutableList<Match> matches = s.findMatches(fileName, size, lr);
+ sw.stop();
+ if (!matches.isEmpty()) {
+ wg.addResult(
+ new Result(
+ this.fileNames.toArray(new String[0]),
+ size,
+ matches,
+ sw.elapsed(TimeUnit.MICROSECONDS)));
+ }
+ } finally {
+ if (sw.isRunning()) {
+ sw.stop();
+ }
+ if (lr.firstBinary >= 0 && (firstBinary < 0 || lr.firstBinary < firstBinary)) {
+ firstBinary = lr.firstBinary;
+ }
+ numBinary += lr.numBinary;
+ }
+ }
+
+ /* Scan the files contained in an archive file, e.g. .zip, .tar, .jar, etc. */
+ private void scanArchive(
+ CopyrightScanner s, Archive archive, String fileName, BufferedInputStream source)
+ throws IOException, ArchiveException {
+ assert deepScan : "Must be deep scan to look inside archive file " + fileName;
+ int originalDepth = this.fileNames.size();
+ try {
+ ArchiveInputStream af = archive.newStream(source);
+ ArchiveEntry entry = archive.getNext(af);
+ int numTries = 3;
+ while (entry != null) {
+ if (archive.isRegularFile(entry)) {
+ String name = cleanName(entry.getName());
+ pushName(name);
+ scan(s, name, entry.getSize(), (InputStream) af);
+ popName();
+ }
+ // After at least 1 entry scans without error, ignore a limited number of bad entries.
+ try {
+ entry = archive.getNext(af);
+ } catch (IOException e) {
+ numTries--;
+ if (numTries < 1) {
+ throw e;
+ }
+ }
+ }
+ } finally {
+ while (this.fileNames.size() > originalDepth) {
+ this.fileNames.remove(this.fileNames.size() - 1);
+ }
+ }
+ }
+
+ /** Add another embedded filename to the stack. */
+ private void pushName(String name) {
+ fileNames.add(name);
+ }
+
+ /** Remove the deepest nested filename from the stack. */
+ private void popName() {
+ fileNames.remove(fileNames.size() - 1);
+ }
+
+ /** Remove unexpected characters from embedded filenames. */
+ private String cleanName(String name) {
+ return Pattern.compile(
+ "[^\\p{L}\\p{N}\\p{P}\\p{S}\\s].*[^\\p{L}\\p{N}\\p{P}\\p{S}\\s]",
+ Pattern.MULTILINE | Pattern.UNICODE_CASE | Pattern.DOTALL)
+ .matcher(name.replaceAll("^[^\\p{L}\\p{N}\\p{P}\\p{S}\\s]+", "_BINARY_"))
+ .replaceAll("_BINARY_");
+ }
+
+ /** Format nested filenames as inner<outer>, percent-encoding '%' and whitespace. */
+ private String formatFn() {
+ StringBuilder sb = new StringBuilder();
+ for (int i = fileNames.size() - 1; i > 0; i--) {
+ sb.append(fileNames.get(i)).append('<');
+ }
+ sb.append(fileNames.get(0));
+ if (fileNames.size() > 1) {
+ sb.append(Strings.repeat(">", fileNames.size() - 1));
+ }
+ return sb.toString()
+ .replaceAll("[%]", "%25")
+ .replaceAll("[ ]", "%20")
+ .replaceAll("[\\r]", "%0D")
+ .replaceAll("[\\n]", "%0A")
+ .replaceAll("[\\t]", "%09");
+ }
+ }
+
+ /** Format nested filenames as inner<outer>, percent-encoding '%' and whitespace. */
+ private static String formatFilenames(String[] fileNames) {
+ assert fileNames.length > 0 : "Root file required.";
+ StringBuilder sb = new StringBuilder();
+ for (int i = fileNames.length - 1; i > 0; i--) {
+ sb.append(fileNames[i]).append('<');
+ }
+ sb.append(fileNames[0]);
+ if (fileNames.length > 1) {
+ sb.append(Strings.repeat(">", fileNames.length - 1));
+ }
+ return sb.toString()
+ .replaceAll("[%]", "%25")
+ .replaceAll("[ ]", "%20")
+ .replaceAll("[\\r]", "%0D")
+ .replaceAll("[\\n]", "%0A")
+ .replaceAll("[\\t]", "%09");
+ }
+
+ private static class Result {
+ String[] fileName;
+ long size;
+ ImmutableList<Match> matches;
+ long elapsedUs;
+
+ private Result(String[] fileName, long size, ImmutableList<Match> matches, long elapsedUs) {
+ this.fileName = fileName;
+ this.size = size;
+ this.matches = matches;
+ this.elapsedUs = elapsedUs;
+ }
+ }
+
+ private static class ScanError {
+ String[] fileName;
+ Throwable e;
+
+ private ScanError(String[] fileName, Throwable e) {
+ this.fileName = fileName;
+ this.e = e;
+ }
+ }
+
+ /** Synchronizes scanning (i.e. child) and reading (i.e. main) tasks. */
+ private static class WaitGroup {
+ public HashMap<String, Integer> tasks; // outstanding tasks; done when empty; guarded by this
+ public ArrayList<Result> results; // accumulates results to be read; guarded by this
+ public ArrayList<ScanError> errors; // accumulates errors to be read; guarded by this
+
+ // To keep the critical section short, the main reader thread keeps 2 copies of results and
+ // errors. In the critical section, it swaps the output references with the `next` references;
+ // it then drains the prior output outside of the critical section. The members below are used
+ // by a single thread (the main reader thread) and do not require inter-thread synchronization.
+ private ArrayList<Result> nextResults; // referenced only by main thread -- swaps with results
+ private ArrayList<ScanError> nextErrors; // referenced only by main thread -- swaps with errors
+ private int highResults; // Maximum observed size of the results list.
+ private int highErrors; // Maximum observed size of the errors list.
+
+ private WaitGroup() {
+ tasks = new HashMap<>();
+ results = new ArrayList<>(CAPACITY);
+ nextResults = new ArrayList<>(CAPACITY);
+ errors = new ArrayList<>(16);
+ nextErrors = new ArrayList<>(16);
+ highResults = 0;
+ highErrors = 0;
+ }
+
+ // Call once for every new scan task created.
+ private synchronized void startTask(String name) {
+ if (tasks.containsKey(name)) {
+ tasks.put(name, tasks.get(name) + 1);
+ } else {
+ tasks.put(name, 1);
+ }
+ }
+
+ // Call once per scan task after the task has completed.
+ private synchronized void finishTask(String name) {
+ if (tasks.get(name) == 1) {
+ tasks.remove(name);
+ } else {
+ tasks.put(name, tasks.get(name) - 1);
+ }
+ }
+
+ // Append a Result to `results`.
+ private synchronized void addResult(Result result) {
+ results.add(result);
+ }
+
+ // Append a ScanError to `errors`.
+ private synchronized void addError(ScanError e) {
+ errors.add(e);
+ }
+
+ /* Process all available results and return the count of unfinished tasks. */
+ private int processResultsAndReturnRemaining() {
+ assert nextResults != null && nextResults.isEmpty();
+ assert nextErrors != null && nextErrors.isEmpty();
+ ArrayList<Result> currentResults = null;
+ ArrayList<ScanError> currentErrors = null;
+ int numRunning = 0;
+ synchronized (this) {
+ if (!this.results.isEmpty()) {
+ currentResults = results;
+ results = nextResults;
+ nextResults = null;
+ }
+ if (!this.errors.isEmpty()) {
+ currentErrors = errors;
+ errors = nextErrors;
+ nextErrors = null;
+ }
+ numRunning = tasks.size();
+ }
+ if (currentResults != null) {
+ assert nextResults == null;
+ if (currentResults.size() > highResults) {
+ highResults = currentResults.size();
+ }
+ for (Result result : currentResults) {
+ for (Match match : result.matches) {
+ System.out.printf(
+ "%s %s [%d,%d) [%d,%d) %d %dus %d %s %s\n",
+ match.partyType.name(),
+ match.matchType.name(),
+ match.startLine,
+ match.endLine,
+ match.start,
+ match.end,
+ match.end - match.start,
+ result.elapsedUs,
+ result.size,
+ formatFilenames(result.fileName),
+ match.text);
+ }
+ }
+ currentResults.clear();
+ nextResults = currentResults;
+ currentResults = null;
+ }
+ if (currentErrors != null) {
+ assert nextErrors == null;
+ if (currentErrors.size() > highErrors) {
+ highErrors = currentErrors.size();
+ }
+ for (ScanError error : currentErrors) {
+ System.err.printf(
+ "Error scanning %s: %s\n", formatFilenames(error.fileName), error.e.getMessage());
+ error.e.printStackTrace(System.err);
+ }
+ currentErrors.clear();
+ nextErrors = currentErrors;
+ currentErrors = null;
+ }
+ assert nextResults != null && nextResults.isEmpty();
+ assert nextErrors != null && nextErrors.isEmpty();
+ return numRunning;
+ }
+ }
+}
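The buffer swap described in the WaitGroup comments can be sketched in isolation. This is a minimal illustration rather than the code from this commit: the class SwapBufferSketch and its add/drain methods are hypothetical names, and plain strings stand in for Result objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for WaitGroup's result buffers; all names are illustrative. */
final class SwapBufferSketch {
  private ArrayList<String> shared = new ArrayList<>(); // guarded by this
  private ArrayList<String> spare = new ArrayList<>(); // touched only by the draining thread

  /** Called by worker threads to publish an item. */
  synchronized void add(String item) {
    shared.add(item);
  }

  /** Called by the single draining thread; returns the number of items moved to sink. */
  int drain(List<String> sink) {
    ArrayList<String> current = null;
    synchronized (this) {
      if (!shared.isEmpty()) {
        current = shared;
        shared = spare; // workers now append to the empty spare list
        spare = null;
      }
    }
    if (current == null) {
      return 0;
    }
    int drained = current.size();
    sink.addAll(current); // the slow part runs outside the lock
    current.clear();
    spare = current; // recycle the drained list for the next swap
    return drained;
  }

  public static void main(String[] args) {
    SwapBufferSketch buf = new SwapBufferSketch();
    Thread[] workers = new Thread[4];
    for (int i = 0; i < workers.length; i++) {
      final int id = i;
      workers[i] =
          new Thread(
              () -> {
                for (int j = 0; j < 1000; j++) {
                  buf.add(id + ":" + j);
                }
              });
      workers[i].start();
    }
    List<String> seen = new ArrayList<>();
    boolean anyAlive = true;
    while (anyAlive) {
      buf.drain(seen);
      anyAlive = false;
      for (Thread w : workers) {
        anyAlive |= w.isAlive();
      }
    }
    buf.drain(seen); // pick up anything added after the last poll
    System.out.println(seen.size()); // prints 4000
  }
}
```

Only the reference swap happens while the lock is held, so workers are never blocked behind printing or other slow consumers.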
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/ArchiveTest.java b/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/ArchiveTest.java
new file mode 100644
index 0000000..c5aeb8d
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/ArchiveTest.java
@@ -0,0 +1,56 @@
+// Copyright (C) 2019 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.googlesource.gerrit.plugins.copyright.lib;
+
+import static com.google.common.truth.Truth.assertThat;
+
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+@RunWith(JUnit4.class)
+public class ArchiveTest {
+ @Test
+ public void testGetArchive_NotAnArchive() {
+ String[] filenames = {"output.txt", "picture.jpg", "source.c", "header.h", "Class.java"};
+
+ for (String filename : filenames) {
+ assertThat(Archive.getArchive(filename)).isNull();
+ }
+ }
+
+ @Test
+ public void testGetArchive_Archive() {
+ String[] filenames = {
+ "compressed.zip",
+ "ball.tar",
+ "release.apk",
+ "library.jar",
+ "archive.ar",
+ "archive.arj",
+ "files.dump",
+ "r2d2.cpio",
+ "ms.docx",
+ "open.odt"
+ };
+
+ for (String filename : filenames) {
+ Archive archive = Archive.getArchive(filename);
+ assertThat(archive).isNotNull();
+ assertThat(archive.isArchive(filename)).isTrue();
+ assertThat(archive.isArchive("output.txt")).isFalse();
+ }
+ }
+}
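The tests above suggest that archive detection depends only on the filename. A minimal sketch of one lookup consistent with them follows, assuming a plain extension check. ArchiveNameSketch and looksLikeArchive are hypothetical names; the Archive class in this commit is defined elsewhere and may work differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical extension lookup; NOT the library's Archive class. */
final class ArchiveNameSketch {
  // Extensions taken from the filenames exercised by ArchiveTest.
  private static final Set<String> ARCHIVE_EXTENSIONS =
      new HashSet<>(
          Arrays.asList("zip", "tar", "apk", "jar", "ar", "arj", "dump", "cpio", "docx", "odt"));

  /** Returns true when the text after the last dot is a known archive extension. */
  static boolean looksLikeArchive(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot < 0 || dot == fileName.length() - 1) {
      return false; // no extension at all, or a trailing dot
    }
    String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    return ARCHIVE_EXTENSIONS.contains(ext);
  }

  public static void main(String[] args) {
    System.out.println(looksLikeArchive("release.apk")); // prints true
    System.out.println(looksLikeArchive("Class.java")); // prints false
  }
}
```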
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightPatternsTest.java b/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightPatternsTest.java
new file mode 100644
index 0000000..a7bd486
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightPatternsTest.java
@@ -0,0 +1,240 @@
+// Copyright (C) 2019 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.googlesource.gerrit.plugins.copyright.lib;
+
+import static com.google.common.truth.Truth.assertThat;
+
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+@RunWith(JUnit4.class)
+public class CopyrightPatternsTest {
+
+ @Test
+ public void testBuildEmpty() {
+ CopyrightPatterns.RuleSet.Builder builder = CopyrightPatterns.RuleSet.builder();
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ assertThat(rules.excludePatterns).isEmpty();
+ assertThat(rules.firstPartyLicenses).isEmpty();
+ assertThat(rules.firstPartyOwners).isEmpty();
+ assertThat(rules.thirdPartyLicenses).isEmpty();
+ assertThat(rules.thirdPartyOwners).isEmpty();
+ assertThat(rules.forbiddenLicenses).isEmpty();
+ assertThat(rules.forbiddenOwners).isEmpty();
+ }
+
+ @Test
+ public void testBuildExcludePattern() {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().excludePattern("pattern");
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ assertThat(rules.excludePatterns).containsExactly("pattern");
+ assertThat(rules.firstPartyLicenses).isEmpty();
+ assertThat(rules.firstPartyOwners).isEmpty();
+ assertThat(rules.thirdPartyLicenses).isEmpty();
+ assertThat(rules.thirdPartyOwners).isEmpty();
+ assertThat(rules.forbiddenLicenses).isEmpty();
+ assertThat(rules.forbiddenOwners).isEmpty();
+ }
+
+ @Test
+ public void testBuildFirstPartyLicense() {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().addFirstPartyLicense("pattern");
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ assertThat(rules.excludePatterns).isEmpty();
+ assertThat(rules.firstPartyLicenses).containsExactly("pattern");
+ assertThat(rules.firstPartyOwners).isEmpty();
+ assertThat(rules.thirdPartyLicenses).isEmpty();
+ assertThat(rules.thirdPartyOwners).isEmpty();
+ assertThat(rules.forbiddenLicenses).isEmpty();
+ assertThat(rules.forbiddenOwners).isEmpty();
+ }
+
+ @Test
+ public void testBuildFirstPartyOwner() {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().addFirstPartyOwner("pattern");
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ assertThat(rules.excludePatterns).isEmpty();
+ assertThat(rules.firstPartyLicenses).isEmpty();
+ assertThat(rules.firstPartyOwners).containsExactly("pattern");
+ assertThat(rules.thirdPartyLicenses).isEmpty();
+ assertThat(rules.thirdPartyOwners).isEmpty();
+ assertThat(rules.forbiddenLicenses).isEmpty();
+ assertThat(rules.forbiddenOwners).isEmpty();
+ }
+
+ @Test
+ public void testBuildThirdPartyLicense() {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().addThirdPartyLicense("pattern");
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ assertThat(rules.excludePatterns).isEmpty();
+ assertThat(rules.firstPartyLicenses).isEmpty();
+ assertThat(rules.firstPartyOwners).isEmpty();
+ assertThat(rules.thirdPartyLicenses).containsExactly("pattern");
+ assertThat(rules.thirdPartyOwners).isEmpty();
+ assertThat(rules.forbiddenLicenses).isEmpty();
+ assertThat(rules.forbiddenOwners).isEmpty();
+ }
+
+ @Test
+ public void testBuildThirdPartyOwner() {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().addThirdPartyOwner("pattern");
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ assertThat(rules.excludePatterns).isEmpty();
+ assertThat(rules.firstPartyLicenses).isEmpty();
+ assertThat(rules.firstPartyOwners).isEmpty();
+ assertThat(rules.thirdPartyLicenses).isEmpty();
+ assertThat(rules.thirdPartyOwners).containsExactly("pattern");
+ assertThat(rules.forbiddenLicenses).isEmpty();
+ assertThat(rules.forbiddenOwners).isEmpty();
+ }
+
+ @Test
+ public void testBuildForbiddenLicense() {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().addForbiddenLicense("pattern");
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ assertThat(rules.excludePatterns).isEmpty();
+ assertThat(rules.firstPartyLicenses).isEmpty();
+ assertThat(rules.firstPartyOwners).isEmpty();
+ assertThat(rules.thirdPartyLicenses).isEmpty();
+ assertThat(rules.thirdPartyOwners).isEmpty();
+ assertThat(rules.forbiddenLicenses).containsExactly("pattern");
+ assertThat(rules.forbiddenOwners).isEmpty();
+ }
+
+ @Test
+ public void testBuildForbiddenOwner() {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().addForbiddenOwner("pattern");
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ assertThat(rules.excludePatterns).isEmpty();
+ assertThat(rules.firstPartyLicenses).isEmpty();
+ assertThat(rules.firstPartyOwners).isEmpty();
+ assertThat(rules.thirdPartyLicenses).isEmpty();
+ assertThat(rules.thirdPartyOwners).isEmpty();
+ assertThat(rules.forbiddenLicenses).isEmpty();
+ assertThat(rules.forbiddenOwners).containsExactly("pattern");
+ }
+
+ @Test
+ public void testBuildEachNamedRuleExclusion() {
+ for (String ruleName : CopyrightPatterns.lookup.keySet()) {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().exclude(ruleName);
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ CopyrightPatterns.Rule rule = CopyrightPatterns.lookup.get(ruleName);
+
+ if (rule.exclusions != null) {
+ assertThat(rules.excludePatterns).containsAllIn(rule.exclusions);
+ }
+ if (rule.licenses != null) {
+ assertThat(rules.excludePatterns).containsAllIn(rule.licenses);
+ }
+ if (rule.owners != null) {
+ assertThat(rules.excludePatterns).containsAllIn(rule.owners);
+ }
+ }
+ }
+
+ @Test
+ public void testBuildEachNamedRuleFirstParty() {
+ for (String ruleName : CopyrightPatterns.lookup.keySet()) {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().addFirstParty(ruleName);
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ CopyrightPatterns.Rule rule = CopyrightPatterns.lookup.get(ruleName);
+
+ if (rule.exclusions != null) {
+ assertThat(rules.excludePatterns).containsExactlyElementsIn(rule.exclusions);
+ }
+ if (rule.licenses != null) {
+ assertThat(rules.firstPartyLicenses).containsExactlyElementsIn(rule.licenses);
+ }
+ if (rule.owners != null) {
+ assertThat(rules.firstPartyOwners).containsExactlyElementsIn(rule.owners);
+ }
+ }
+ }
+
+ @Test
+ public void testBuildEachNamedRuleThirdParty() {
+ for (String ruleName : CopyrightPatterns.lookup.keySet()) {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().addThirdParty(ruleName);
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ CopyrightPatterns.Rule rule = CopyrightPatterns.lookup.get(ruleName);
+
+ if (rule.exclusions != null) {
+ assertThat(rules.excludePatterns).containsExactlyElementsIn(rule.exclusions);
+ }
+ if (rule.licenses != null) {
+ assertThat(rules.thirdPartyLicenses).containsExactlyElementsIn(rule.licenses);
+ }
+ if (rule.owners != null) {
+ assertThat(rules.thirdPartyOwners).containsExactlyElementsIn(rule.owners);
+ }
+ }
+ }
+
+ @Test
+ public void testBuildEachNamedRuleForbidden() {
+ for (String ruleName : CopyrightPatterns.lookup.keySet()) {
+ CopyrightPatterns.RuleSet.Builder builder =
+ CopyrightPatterns.RuleSet.builder().addForbidden(ruleName);
+
+ CopyrightPatterns.RuleSet rules = builder.build();
+
+ CopyrightPatterns.Rule rule = CopyrightPatterns.lookup.get(ruleName);
+
+ if (rule.exclusions != null) {
+ assertThat(rules.excludePatterns).containsExactlyElementsIn(rule.exclusions);
+ }
+ if (rule.licenses != null) {
+ assertThat(rules.forbiddenLicenses).containsExactlyElementsIn(rule.licenses);
+ }
+ if (rule.owners != null) {
+ assertThat(rules.forbiddenOwners).containsExactlyElementsIn(rule.owners);
+ }
+ }
+ }
+}
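The named-rule tests above assert how a rule's three pattern lists are distributed: exclude(name) sends all of them to the exclusion list, while addFirstParty(name) keeps exclusions, licenses and owners separate. The sketch below restates that behavior under stated assumptions. RuleSetSketch and its nested Rule are hypothetical stand-ins, the patterns are made up, and the real CopyrightPatterns.RuleSet.Builder looks rules up by name rather than taking a Rule object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the builder behavior asserted by CopyrightPatternsTest. */
final class RuleSetSketch {
  /** A named rule bundles exclusion, license and owner patterns; any list may be null. */
  static final class Rule {
    final List<String> exclusions;
    final List<String> licenses;
    final List<String> owners;

    Rule(List<String> exclusions, List<String> licenses, List<String> owners) {
      this.exclusions = exclusions;
      this.licenses = licenses;
      this.owners = owners;
    }
  }

  final List<String> excludePatterns = new ArrayList<>();
  final List<String> firstPartyLicenses = new ArrayList<>();
  final List<String> firstPartyOwners = new ArrayList<>();

  /** Keeps the three lists of the rule separate. */
  RuleSetSketch addFirstParty(Rule rule) {
    addAll(excludePatterns, rule.exclusions);
    addAll(firstPartyLicenses, rule.licenses);
    addAll(firstPartyOwners, rule.owners);
    return this;
  }

  /** Treats every pattern of the rule as an exclusion. */
  RuleSetSketch exclude(Rule rule) {
    addAll(excludePatterns, rule.exclusions);
    addAll(excludePatterns, rule.licenses);
    addAll(excludePatterns, rule.owners);
    return this;
  }

  private static void addAll(List<String> target, List<String> source) {
    if (source != null) {
      target.addAll(source);
    }
  }

  public static void main(String[] args) {
    // Made-up patterns, for illustration only.
    Rule rule = new Rule(null, Arrays.asList("apache2"), Arrays.asList("Example Owner"));
    RuleSetSketch firstParty = new RuleSetSketch().addFirstParty(rule);
    RuleSetSketch excluded = new RuleSetSketch().exclude(rule);
    System.out.println(firstParty.firstPartyLicenses);
    System.out.println(excluded.excludePatterns);
  }
}
```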
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightScannerTest.java b/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightScannerTest.java
new file mode 100644
index 0000000..393b3b1
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/CopyrightScannerTest.java
@@ -0,0 +1,495 @@
+// Copyright (C) 2019 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.googlesource.gerrit.plugins.copyright.lib;
+
+import static com.google.common.truth.Truth.assertThat;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.ImmutableList;
+import com.google.common.truth.Correspondence;
+import com.googlesource.gerrit.plugins.copyright.lib.CopyrightPatterns.RuleSet;
+import com.googlesource.gerrit.plugins.copyright.lib.CopyrightScanner.Match;
+import com.googlesource.gerrit.plugins.copyright.lib.CopyrightScanner.MatchType;
+import com.googlesource.gerrit.plugins.copyright.lib.CopyrightScanner.PartyType;
+import com.sun.management.HotSpotDiagnosticMXBean;
+import java.io.ByteArrayInputStream;
+import java.io.InputStream;
+import java.lang.management.ManagementFactory;
+import java.nio.charset.StandardCharsets;
+import java.util.stream.Collectors;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+@RunWith(JUnit4.class)
+public class CopyrightScannerTest {
+
+ private static final Correspondence<String, String> CONTAINS_STRING =
+ new Correspondence<String, String>() {
+ @Override
+ public boolean compare(String actual, String expected) {
+ return actual.contains(expected);
+ }
+
+ @Override
+ public String toString() {
+ return "contains";
+ }
+ };
+
+ private CopyrightPatterns.RuleSet.Builder builder;
+
+ @Before
+ public void setUp() {
+ builder =
+ CopyrightPatterns.RuleSet.builder()
+ .exclude("EXAMPLES")
+ .addFirstParty("APACHE2")
+ .addFirstParty("ANDROID")
+ .addForbidden("NOT_A_CONTRIBUTION");
+ }
+
+ @Test
+ public void testFindMatch_firstPartyLicense() throws Exception {
+ CopyrightScanner scanner = newScanner();
+ ImmutableList<Match> actual =
+ scanner.findMatches(
+ "header.h",
+ -1,
+ readerFromString(
+ "/*\n * License: apache2\n */\n#ifndef HEADER_H\n#define HEADER_H\n#endif\n"));
+ assertThat(actual.stream().map(m -> m.matchType).collect(Collectors.toList()))
+ .containsExactly(MatchType.LICENSE);
+ assertThat(actual.stream().map(m -> m.partyType).collect(Collectors.toList()))
+ .containsExactly(PartyType.FIRST_PARTY);
+ assertThat(actual.stream().map(m -> m.text).collect(Collectors.toList()))
+ .comparingElementsUsing(CONTAINS_STRING)
+ .containsExactly("apache2");
+ }
+
+ @Test
+ public void testFindMatch_firstPartyOwner() throws Exception {
+ CopyrightScanner scanner = newScanner();
+ ImmutableList<Match> actual =
+ scanner.findMatches(
+ "Class.java",
+ -1,
+ readerFromString(
+ "/*\n * Copyright (C) 2019 Android Open Source Project\n"
+ + " */\npublic class Class {}\n"));
+ assertThat(actual.stream().map(m -> m.matchType).collect(Collectors.toList()))
+ .containsExactly(MatchType.AUTHOR_OWNER);
+ assertThat(actual.stream().map(m -> m.partyType).collect(Collectors.toList()))
+ .containsExactly(PartyType.FIRST_PARTY);
+ assertThat(actual.stream().map(m -> m.text).collect(Collectors.toList()))
+ .comparingElementsUsing(CONTAINS_STRING)
+ .containsExactly("Android Open Source Project");
+ }
+
+ @Test
+ public void testFindMatch_thirdPartyLicense() throws Exception {
+ builder.addThirdParty("BSD");
+ CopyrightScanner scanner = newScanner();
+ ImmutableList<Match> actual =
+ scanner.findMatches(
+ "script.sh",
+ -1,
+ readerFromString("#!/bin/sh\n# SPDX-License-Identifier: BSD-2-Clause\nexit 0\n"));
+ assertThat(actual.stream().map(m -> m.matchType).collect(Collectors.toList()))
+ .containsExactly(MatchType.LICENSE);
+ assertThat(actual.stream().map(m -> m.partyType).collect(Collectors.toList()))
+ .containsExactly(PartyType.THIRD_PARTY);
+ }
+
+ @Test
+ public void testFindMatch_thirdPartyOwner() throws Exception {
+ CopyrightScanner scanner = newScanner();
+ ImmutableList<Match> actual =
+ scanner.findMatches(
+ "Class.java",
+ -1,
+ readerFromString(
+ "/*\n * Copyright (C) 2019 Sarah W. Eng <swe@example.com>\n"
+ + " */\npublic class Class {}\n"));
+ assertThat(actual.stream().map(m -> m.matchType).collect(Collectors.toList()))
+ .containsExactly(MatchType.AUTHOR_OWNER);
+ assertThat(actual.stream().map(m -> m.partyType).collect(Collectors.toList()))
+ .containsExactly(PartyType.THIRD_PARTY);
+ }
+
+ @Test
+ public void testFindMatch_forbiddenLicense() throws Exception {
+ CopyrightScanner scanner = newScanner();
+ ImmutableList<Match> actual =
+ scanner.findMatches(
+ "script.sh", -1, readerFromString("#!/bin/sh\n# Not a contribution.\nexit 0\n"));
+ assertThat(actual.stream().map(m -> m.matchType).collect(Collectors.toList()))
+ .containsExactly(MatchType.LICENSE);
+ assertThat(actual.stream().map(m -> m.partyType).collect(Collectors.toList()))
+ .containsExactly(PartyType.FORBIDDEN);
+ }
+
+ @Test
+ public void testFindMatch_forbiddenOwner() throws Exception {
+ builder.addForbidden("GOOGLE");
+ CopyrightScanner scanner = newScanner();
+ ImmutableList<Match> actual =
+ scanner.findMatches(
+ "Class.java",
+ -1,
+ readerFromString(
+ "/*\n * Copyright (C) 2019 Google Inc.\n" + " */\npublic class Class {}\n"));
+ assertThat(actual.stream().map(m -> m.matchType).collect(Collectors.toList()))
+ .containsExactly(MatchType.AUTHOR_OWNER);
+ assertThat(actual.stream().map(m -> m.partyType).collect(Collectors.toList()))
+ .containsExactly(PartyType.FORBIDDEN);
+ }
+
+ @Test
+ public void testFindMatch_unknownLicense() throws Exception {
+ CopyrightScanner scanner = newScanner();
+ ImmutableList<Match> actual =
+ scanner.findMatches(
+ "Class.java",
+ -1,
+ readerFromString(
+ "/*\n * Licensed in the jurisdiction of New York.\n"
+ + " */\npublic class Class {}\n"));
+ assertThat(actual.stream().map(m -> m.matchType).collect(Collectors.toList()))
+ .containsExactly(MatchType.LICENSE);
+ assertThat(actual.stream().map(m -> m.partyType).collect(Collectors.toList()))
+ .containsExactly(PartyType.UNKNOWN);
+ assertThat(actual.stream().map(m -> m.text).collect(Collectors.toList()))
+ .containsExactly("Licensed...jurisdiction");
+ }
+
+ @Test
+ public void testFindMatch_exclusion() throws Exception {
+ builder.excludePattern("apache2");
+ CopyrightScanner scanner = newScanner();
+ ImmutableList<Match> foundMatches =
+ scanner.findMatches(
+ "header.h",
+ -1,
+ readerFromString(
+ "/*\n * License: apache2\n */\n#ifndef HEADER_H\n#define HEADER_H\n#endif\n"));
+ assertThat(foundMatches).isEmpty();
+ }
+
+ @Test
+ public void testFindMatch_largeFileNoMatches() throws Exception {
+ Stopwatch sw = Stopwatch.createUnstarted();
+ CopyrightScanner scanner = newScanner();
+ IndexedLineReader file = largeFile("");
+ sw.start();
+ ImmutableList<Match> foundMatches = scanner.findMatches("header.h", -1, file);
+ sw.stop();
+ assertThat(foundMatches).isEmpty();
+ // Fail if the protections against excessive backtracking appear not to work.
+ assertThat(sw.elapsed().getSeconds()).isLessThan(8L); // normally 1s
+ }
+
+ @Test
+ public void testFindMatch_largeFileWithMatch() throws Exception {
+ Stopwatch sw = Stopwatch.createUnstarted();
+ builder.addThirdParty("BSD");
+ CopyrightScanner scanner = newScanner();
+ IndexedLineReader file = largeFile("# SPDX-License-Identifier: BSD-3-Clause\n");
+ sw.start();
+ ImmutableList<Match> actual = scanner.findMatches("script.sh", -1, file);
+ sw.stop();
+ assertThat(actual.stream().map(m -> m.matchType).collect(Collectors.toList()))
+ .containsExactly(MatchType.LICENSE);
+ assertThat(actual.stream().map(m -> m.partyType).collect(Collectors.toList()))
+ .containsExactly(PartyType.THIRD_PARTY);
+ // Fail if the protections against excessive backtracking appear not to work.
+ assertThat(sw.elapsed().getSeconds()).isLessThan(8L); // normally 1s
+ }
+
+ @Test
+ public void testFindMatch_fullApache2License() throws Exception {
+ MatchType[] otherThanLicense = {MatchType.AUTHOR_OWNER};
+ PartyType[] otherThan1p = {PartyType.THIRD_PARTY, PartyType.FORBIDDEN, PartyType.UNKNOWN};
+ CopyrightScanner scanner = newScanner();
+ ImmutableList<Match> actual = scanner.findMatches("LICENSE", -1, fullApache2Text());
+ assertThat(actual.stream().map(m -> m.matchType).collect(Collectors.toList()))
+ .containsNoneIn(otherThanLicense);
+ assertThat(actual.stream().map(m -> m.partyType).collect(Collectors.toList()))
+ .containsNoneIn(otherThan1p);
+ }
+
+ private CopyrightScanner newScanner() {
+ RuleSet rules = builder.build();
+ return new CopyrightScanner(
+ rules.firstPartyLicenses,
+ rules.thirdPartyLicenses,
+ rules.forbiddenLicenses,
+ rules.firstPartyOwners,
+ rules.thirdPartyOwners,
+ rules.forbiddenOwners,
+ rules.excludePatterns);
+ }
+
+ private IndexedLineReader fullApache2Text() {
+ return new IndexedLineReader(
+ "LICENSE",
+ -1,
+ newInputStream(
+ "\n"
+ + " Apache License\n"
+ + " Version 2.0, January 2004\n"
+ + " http://www.apache.org/licenses/\n"
+ + "\n"
+ + " TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n"
+ + "\n"
+ + " 1. Definitions.\n"
+ + "\n"
+ + " \"License\" shall mean the terms and conditions for use, reproduction,\n"
+ + " and distribution as defined by Sections 1 through 9 of this document.\n"
+ + "\n"
+ + " \"Licensor\" shall mean the copyright owner or entity authorized by\n"
+ + " the copyright owner that is granting the License.\n"
+ + "\n"
+ + " \"Legal Entity\" shall mean the union of the acting entity and all\n"
+ + " other entities that control, are controlled by, or are under common\n"
+ + " control with that entity. For the purposes of this definition,\n"
+ + " \"control\" means (i) the power, direct or indirect, to cause the\n"
+ + " direction or management of such entity, whether by contract or\n"
+ + " otherwise, or (ii) ownership of fifty percent (50%) or more of the\n"
+ + " outstanding shares, or (iii) beneficial ownership of such entity.\n"
+ + "\n"
+ + " \"You\" (or \"Your\") shall mean an individual or Legal Entity\n"
+ + " exercising permissions granted by this License.\n"
+ + "\n"
+ + " \"Source\" form shall mean the preferred form for making modifications,\n"
+ + " including but not limited to software source code, documentation\n"
+ + " source, and configuration files.\n"
+ + "\n"
+ + " \"Object\" form shall mean any form resulting from mechanical\n"
+ + " transformation or translation of a Source form, including but\n"
+ + " not limited to compiled object code, generated documentation,\n"
+ + " and conversions to other media types.\n"
+ + "\n"
+ + " \"Work\" shall mean the work of authorship, whether in Source or\n"
+ + " Object form, made available under the License, as indicated by a\n"
+ + " copyright notice that is included in or attached to the work\n"
+ + " (an example is provided in the Appendix below).\n"
+ + "\n"
+ + " \"Derivative Works\" shall mean any work, whether in Source or Object\n"
+ + " form, that is based on (or derived from) the Work and for which the\n"
+ + " editorial revisions, annotations, elaborations, or other modifications\n"
+ + " represent, as a whole, an original work of authorship. For the purposes\n"
+ + " of this License, Derivative Works shall not include works that remain\n"
+ + " separable from, or merely link (or bind by name) to the interfaces of,\n"
+ + " the Work and Derivative Works thereof.\n"
+ + "\n"
+ + " \"Contribution\" shall mean any work of authorship, including\n"
+ + " the original version of the Work and any modifications or additions\n"
+ + " to that Work or Derivative Works thereof, that is intentionally\n"
+ + " submitted to Licensor for inclusion in the Work by the copyright owner\n"
+ + " or by an individual or Legal Entity authorized to submit on behalf of\n"
+ + " the copyright owner. For the purposes of this definition, \"submitted\"\n"
+ + " means any form of electronic, verbal, or written communication sent\n"
+ + " to the Licensor or its representatives, including but not limited to\n"
+ + " communication on electronic mailing lists, source code control systems,\n"
+ + " and issue tracking systems that are managed by, or on behalf of, the\n"
+ + " Licensor for the purpose of discussing and improving the Work, but\n"
+ + " excluding communication that is conspicuously marked or otherwise\n"
+ + " designated in writing by the copyright owner as \"Not a Contribution.\"\n"
+ + "\n"
+ + " \"Contributor\" shall mean Licensor and any individual or Legal Entity\n"
+ + " on behalf of whom a Contribution has been received by Licensor and\n"
+ + " subsequently incorporated within the Work.\n"
+ + "\n"
+ + " 2. Grant of Copyright License. Subject to the terms and conditions of\n"
+ + " this License, each Contributor hereby grants to You a perpetual,\n"
+ + " worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n"
+ + " copyright license to reproduce, prepare Derivative Works of,\n"
+ + " publicly display, publicly perform, sublicense, and distribute the\n"
+ + " Work and such Derivative Works in Source or Object form.\n"
+ + "\n"
+ + " 3. Grant of Patent License. Subject to the terms and conditions of\n"
+ + " this License, each Contributor hereby grants to You a perpetual,\n"
+ + " worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n"
+ + " (except as stated in this section) patent license to make, have made,\n"
+ + " use, offer to sell, sell, import, and otherwise transfer the Work,\n"
+ + " where such license applies only to those patent claims licensable\n"
+ + " by such Contributor that are necessarily infringed by their\n"
+ + " Contribution(s) alone or by combination of their Contribution(s)\n"
+ + " with the Work to which such Contribution(s) was submitted. If You\n"
+ + " institute patent litigation against any entity (including a\n"
+ + " cross-claim or counterclaim in a lawsuit) alleging that the Work\n"
+ + " or a Contribution incorporated within the Work constitutes direct\n"
+ + " or contributory patent infringement, then any patent licenses\n"
+ + " granted to You under this License for that Work shall terminate\n"
+ + " as of the date such litigation is filed.\n"
+ + "\n"
+ + " 4. Redistribution. You may reproduce and distribute copies of the\n"
+ + " Work or Derivative Works thereof in any medium, with or without\n"
+ + " modifications, and in Source or Object form, provided that You\n"
+ + " meet the following conditions:\n"
+ + "\n"
+ + " (a) You must give any other recipients of the Work or\n"
+ + " Derivative Works a copy of this License; and\n"
+ + "\n"
+ + " (b) You must cause any modified files to carry prominent notices\n"
+ + " stating that You changed the files; and\n"
+ + "\n"
+ + " (c) You must retain, in the Source form of any Derivative Works\n"
+ + " that You distribute, all copyright, patent, trademark, and\n"
+ + " attribution notices from the Source form of the Work,\n"
+ + " excluding those notices that do not pertain to any part of\n"
+ + " the Derivative Works; and\n"
+ + "\n"
+ + " (d) If the Work includes a \"NOTICE\" text file as part of its\n"
+ + " distribution, then any Derivative Works that You distribute must\n"
+ + " include a readable copy of the attribution notices contained\n"
+ + " within such NOTICE file, excluding those notices that do not\n"
+ + " pertain to any part of the Derivative Works, in at least one\n"
+ + " of the following places: within a NOTICE text file distributed\n"
+ + " as part of the Derivative Works; within the Source form or\n"
+ + " documentation, if provided along with the Derivative Works; or,\n"
+ + " within a display generated by the Derivative Works, if and\n"
+ + " wherever such third-party notices normally appear. The contents\n"
+ + " of the NOTICE file are for informational purposes only and\n"
+ + " do not modify the License. You may add Your own attribution\n"
+ + " notices within Derivative Works that You distribute, alongside\n"
+ + " or as an addendum to the NOTICE text from the Work, provided\n"
+ + " that such additional attribution notices cannot be construed\n"
+ + " as modifying the License.\n"
+ + "\n"
+ + " You may add Your own copyright statement to Your modifications and\n"
+ + " may provide additional or different license terms and conditions\n"
+ + " for use, reproduction, or distribution of Your modifications, or\n"
+ + " for any such Derivative Works as a whole, provided Your use,\n"
+ + " reproduction, and distribution of the Work otherwise complies with\n"
+ + " the conditions stated in this License.\n"
+ + "\n"
+ + " 5. Submission of Contributions. Unless You explicitly state otherwise,\n"
+ + " any Contribution intentionally submitted for inclusion in the Work\n"
+ + " by You to the Licensor shall be under the terms and conditions of\n"
+ + " this License, without any additional terms or conditions.\n"
+ + " Notwithstanding the above, nothing herein shall supersede or modify\n"
+ + " the terms of any separate license agreement you may have executed\n"
+ + " with Licensor regarding such Contributions.\n"
+ + "\n"
+ + " 6. Trademarks. This License does not grant permission to use the trade\n"
+ + " names, trademarks, service marks, or product names of the Licensor,\n"
+ + " except as required for reasonable and customary use in describing the\n"
+ + " origin of the Work and reproducing the content of the NOTICE file.\n"
+ + "\n"
+ + " 7. Disclaimer of Warranty. Unless required by applicable law or\n"
+ + " agreed to in writing, Licensor provides the Work (and each\n"
+ + " Contributor provides its Contributions) on an \"AS IS\" BASIS,\n"
+ + " WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n"
+ + " implied, including, without limitation, any warranties or conditions\n"
+ + " of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n"
+ + " PARTICULAR PURPOSE. You are solely responsible for determining the\n"
+ + " appropriateness of using or redistributing the Work and assume any\n"
+ + " risks associated with Your exercise of permissions under this License.\n"
+ + "\n"
+ + " 8. Limitation of Liability. In no event and under no legal theory,\n"
+ + " whether in tort (including negligence), contract, or otherwise,\n"
+ + " unless required by applicable law (such as deliberate and grossly\n"
+ + " negligent acts) or agreed to in writing, shall any Contributor be\n"
+ + " liable to You for damages, including any direct, indirect, special,\n"
+ + " incidental, or consequential damages of any character arising as a\n"
+ + " result of this License or out of the use or inability to use the\n"
+ + " Work (including but not limited to damages for loss of goodwill,\n"
+ + " work stoppage, computer failure or malfunction, or any and all\n"
+ + " other commercial damages or losses), even if such Contributor\n"
+ + " has been advised of the possibility of such damages.\n"
+ + "\n"
+ + " 9. Accepting Warranty or Additional Liability. While redistributing\n"
+ + " the Work or Derivative Works thereof, You may choose to offer,\n"
+ + " and charge a fee for, acceptance of support, warranty, indemnity,\n"
+ + " or other liability obligations and/or rights consistent with this\n"
+ + " License. However, in accepting such obligations, You may act only\n"
+ + " on Your own behalf and on Your sole responsibility, not on behalf\n"
+ + " of any other Contributor, and only if You agree to indemnify,\n"
+ + " defend, and hold each Contributor harmless for any liability\n"
+ + " incurred by, or claims asserted against, such Contributor by reason\n"
+ + " of your accepting any such warranty or additional liability.\n"
+ + "\n"
+ + " END OF TERMS AND CONDITIONS\n"
+ + "\n"
+ + " APPENDIX: How to apply the Apache License to your work.\n"
+ + "\n"
+ + " To apply the Apache License to your work, attach the following\n"
+ + " boilerplate notice, with the fields enclosed by brackets \"[]\"\n"
+ + " replaced with your own identifying information. (Don't include\n"
+ + " the brackets!) The text should be enclosed in the appropriate\n"
+ + " comment syntax for the file format. We also recommend that a\n"
+ + " file or class name and description of purpose be included on the\n"
+ + " same \"printed page\" as the copyright notice for easier\n"
+ + " identification within third-party archives.\n"
+ + "\n"
+ + " Copyright [yyyy] [name of copyright owner]\n"
+ + "\n"
+ + " Licensed under the Apache License, Version 2.0 (the \"License\");\n"
+ + " you may not use this file except in compliance with the License.\n"
+ + " You may obtain a copy of the License at\n"
+ + "\n"
+ + " http://www.apache.org/licenses/LICENSE-2.0\n"
+ + "\n"
+ + " Unless required by applicable law or agreed to in writing, software\n"
+ + " distributed under the License is distributed on an \"AS IS\" BASIS,\n"
+ + " WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n"
+ + " See the License for the specific language governing permissions and\n"
+ + " limitations under the License."));
+ }
+
+ private IndexedLineReader largeFile(String embeddedText) {
+ StringBuilder sb = new StringBuilder();
+ sb.append("                                                                "); // 64
+ sb.append(sb); // 128
+ sb.append(sb); // 256
+ sb.append(sb); // 512
+ sb.append(sb); // 1024
+ sb.append('\n');
+ String onek = sb.toString();
+ sb.setLength(0);
+ for (int i = 0; i < 256; i++) {
+ sb.append(String.format(" x%2x", i));
+ }
+ sb.append('\n');
+ String mixed1k = sb.toString();
+ for (int j = 0; j < 4; j++) {
+ for (int i = 0; i < 16; i++) {
+ sb.append(onek);
+ }
+ for (int i = 0; i < 48; i++) {
+ sb.append(onek, 0, 512 - 10 * i);
+ sb.append(mixed1k, 512 + 10 * i, mixed1k.length());
+ }
+ if (j == 1) {
+ sb.append(embeddedText);
+ sb.append('\n');
+ }
+ }
+
+ return new IndexedLineReader("big_file", -1, newInputStream(sb.toString()));
+ }
+
+ private IndexedLineReader readerFromString(String text) {
+ return new IndexedLineReader("test", -1, newInputStream(text));
+ }
+
+ private InputStream newInputStream(String text) {
+ return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
+ }
+}
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/IndexedLineReaderTest.java b/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/IndexedLineReaderTest.java
new file mode 100644
index 0000000..a337027
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/lib/IndexedLineReaderTest.java
@@ -0,0 +1,242 @@
+// Copyright (C) 2019 The Android Open Source Project
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package com.googlesource.gerrit.plugins.copyright.lib;
+
+import static com.google.common.truth.Truth.assertThat;
+
+import java.io.ByteArrayInputStream;
+import java.io.InputStream;
+import java.nio.CharBuffer;
+import java.nio.charset.StandardCharsets;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+@RunWith(JUnit4.class)
+public class IndexedLineReaderTest {
+
+ private static final int BUFSIZE = 1024;
+ private static final String I18N_STRING =
+ "2Îñţļ3国際化4\uD84F\uDCFE\uD843\uDE6D\uD84D\uDF3F8\uD83D\uDC69\uD83C\uDFFD";
+
+ private static final String AMPERSAND_QUOT_STRING = "\n&quot;I think therefore I am.&quot;\n";
+ private static final String HEX_QUOTE_STRING = "\n&#x22;I think therefore I am.&#x22;\n";
+ private static final String VAR_QUOTE_STRING = "\n<var>I think therefore I am.</var>\n";
+ private static final String ESCAPED_QUOTE = "\"I think therefore I am.\"";
+
+ private final char[] buf = new char[BUFSIZE];
+ private final CharBuffer cb = CharBuffer.wrap(buf);
+ private final StringBuilder sb = new StringBuilder();
+
+ private IndexedLineReader reader;
+
+ @Before
+ public void setUp() throws Exception {
+ cb.clear();
+ sb.setLength(0);
+ }
+
+ @After
+ public void tearDown() throws Exception {
+ if (reader != null) {
+ reader.close();
+ }
+ }
+
+ @Test
+ public void testEmptyStream_read() throws Exception {
+ reader = readerFromString("");
+ assertThat(reader.read(cb)).isEqualTo(0);
+ cb.flip();
+ assertThat(cb.toString()).isEmpty();
+ }
+
+ @Test
+ public void testEmptyStream_readString() throws Exception {
+ reader = readerFromString("");
+
+ assertThat(reader.readString('\n', sb)).isAtMost(0);
+ assertThat(sb.toString()).isEmpty();
+ }
+
+ @Test
+ public void testSimpleStream_read() throws Exception {
+ reader = readerFromString("Hello there!");
+ assertThat(reader.read(cb)).isEqualTo(12);
+ cb.flip();
+ assertThat(cb.toString()).isEqualTo("Hello there!");
+ }
+
+ @Test
+ public void testSimpleStream_readString() throws Exception {
+ reader = readerFromString("Hello there!");
+
+ assertThat(reader.readString('\n', sb)).isEqualTo(12);
+ assertThat(sb.toString()).isEqualTo("Hello there!");
+ sb.setLength(0);
+ assertThat(reader.readString('\n', sb)).isLessThan(0);
+ assertThat(sb.toString()).isEmpty();
+ }
+
+ @Test
+ public void testNulDelimitedStream_readString() throws Exception {
+ reader = readerFromString("line1\000line");
+
+ assertThat(reader.readString('\000', sb)).isEqualTo(6);
+ assertThat(sb.toString()).isEqualTo("line1");
+ sb.setLength(0);
+ assertThat(reader.readString('\000', sb)).isEqualTo(4);
+ assertThat(sb.toString()).isEqualTo("line");
+ sb.setLength(0);
+ assertThat(reader.readString('\000', sb)).isLessThan(0);
+ assertThat(sb.toString()).isEmpty();
+ }
+
+ @Test
+ public void testI18nStream_read() throws Exception {
+ reader = readerFromString(I18N_STRING);
+ assertThat(reader.read(cb)).isEqualTo(I18N_STRING.length());
+ cb.flip();
+ assertThat(cb.toString()).isEqualTo(I18N_STRING);
+ }
+
+ @Test
+ public void testI18nStream_readString() throws Exception {
+ reader = readerFromString(I18N_STRING);
+
+ assertThat(reader.readString('\n', sb)).isEqualTo(I18N_STRING.length());
+ assertThat(sb.toString()).isEqualTo(I18N_STRING);
+ sb.setLength(0);
+ assertThat(reader.readString('\n', sb)).isLessThan(0);
+ assertThat(sb.toString()).isEmpty();
+ }
+
+ @Test
+ public void testAmpersandQuotStream_read() throws Exception {
+ reader = readerFromString(AMPERSAND_QUOT_STRING);
+ assertThat(reader.read(cb)).isEqualTo(AMPERSAND_QUOT_STRING.length());
+ cb.flip();
+ assertThat(cb.toString()).isEqualTo("\n" + ESCAPED_QUOTE + "\n");
+ }
+
+ @Test
+ public void testAmpersandQuotStream_readString() throws Exception {
+ reader = readerFromString(AMPERSAND_QUOT_STRING);
+
+ assertThat(reader.readString('\n', sb)).isEqualTo(1);
+ assertThat(sb.toString()).isEmpty();
+ sb.setLength(0);
+ reader.readString('\n', sb);
+ assertThat(sb.toString()).isEqualTo(ESCAPED_QUOTE);
+ sb.setLength(0);
+ assertThat(reader.readString('\n', sb)).isLessThan(0);
+ assertThat(sb.toString()).isEmpty();
+ }
+
+ @Test
+ public void testHexQuoteStream_read() throws Exception {
+ reader = readerFromString(HEX_QUOTE_STRING);
+ assertThat(reader.read(cb)).isEqualTo(HEX_QUOTE_STRING.length());
+ cb.flip();
+ assertThat(cb.toString()).isEqualTo("\n" + ESCAPED_QUOTE + "\n");
+ }
+
+ @Test
+ public void testHexQuoteStream_readString() throws Exception {
+ reader = readerFromString(HEX_QUOTE_STRING);
+
+ assertThat(reader.readString('\n', sb)).isEqualTo(1);
+ assertThat(sb.toString()).isEmpty();
+ sb.setLength(0);
+ reader.readString('\n', sb);
+ assertThat(sb.toString()).isEqualTo(ESCAPED_QUOTE);
+ sb.setLength(0);
+ assertThat(reader.readString('\n', sb)).isLessThan(0);
+ assertThat(sb.toString()).isEmpty();
+ }
+
+ @Test
+ public void testVarQuoteStream_read() throws Exception {
+ reader = readerFromString(VAR_QUOTE_STRING);
+ assertThat(reader.read(cb)).isEqualTo(VAR_QUOTE_STRING.length());
+ cb.flip();
+ assertThat(cb.toString()).isEqualTo("\n" + ESCAPED_QUOTE + "\n");
+ }
+
+ @Test
+ public void testVarQuoteStream_readString() throws Exception {
+ reader = readerFromString(VAR_QUOTE_STRING);
+
+ assertThat(reader.readString('\n', sb)).isEqualTo(1);
+ assertThat(sb.toString()).isEmpty();
+ sb.setLength(0);
+ reader.readString('\n', sb);
+ assertThat(sb.toString()).isEqualTo(ESCAPED_QUOTE);
+ sb.setLength(0);
+ assertThat(reader.readString('\n', sb)).isLessThan(0);
+ assertThat(sb.toString()).isEmpty();
+ }
+
+ @Test
+ public void testBytes_read() throws Exception {
+ byte[] bytes = new byte[128];
+ for (int i = 0; i <= 127; i++) {
+ bytes[i] = (byte) (i + 0x80);
+ }
+
+ reader = readerFromByteArray(bytes);
+ assertThat(reader.read(cb)).isEqualTo(128);
+ cb.flip();
+
+ // Select malformed chars mapped to spaces, symbols or accented chars. Rest mapped to '?'.
+ assertThat(cb.toString())
+ .isEqualTo(
+ "?????™?*????????????ö????™?????? ?????? ?©??? ®??????? *????????ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ"
+ + "ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ");
+ }
+
+ @Test
+ public void testBytes_readString() throws Exception {
+ byte[] bytes = new byte[128];
+ for (int i = 0; i <= 127; i++) {
+ bytes[i] = (byte) (i + 0x80);
+ }
+
+ reader = readerFromByteArray(bytes);
+ assertThat(reader.readString('\n', sb)).isEqualTo(128);
+
+ // Select malformed chars mapped to spaces, symbols or accented chars. Rest mapped to '?'.
+ assertThat(sb.toString())
+ .isEqualTo(
+ "?????™?*????????????ö????™?????? ?????? ?©??? ®??????? *????????ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ"
+ + "ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ");
+ sb.setLength(0);
+ assertThat(reader.readString('\n', sb)).isLessThan(0);
+ }
+
+ private IndexedLineReader readerFromString(String text) {
+ return new IndexedLineReader("test", -1, newInputStream(text));
+ }
+
+ private IndexedLineReader readerFromByteArray(byte[] bytes) {
+ return new IndexedLineReader("test", -1, new ByteArrayInputStream(bytes));
+ }
+
+ private InputStream newInputStream(String text) {
+ return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
+ }
+}
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/archives/first_party.zip b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/archives/first_party.zip
new file mode 100644
index 0000000..4d322cc
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/archives/first_party.zip
Binary files differ
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/archives/forbidden.cpio b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/archives/forbidden.cpio
new file mode 100644
index 0000000..058c77a
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/archives/forbidden.cpio
Binary files differ
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/archives/third_party.tgz b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/archives/third_party.tgz
new file mode 100644
index 0000000..efdd7c3
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/archives/third_party.tgz
Binary files differ
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/AFFERO.txt.gz b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/AFFERO.txt.gz
new file mode 100644
index 0000000..dde7781
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/AFFERO.txt.gz
Binary files differ
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/ANDROID.txt b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/ANDROID.txt
new file mode 100644
index 0000000..71a183b
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/ANDROID.txt
@@ -0,0 +1,3 @@
+/*
+ * Copyright (C) 2019 Android Open-Source Project
+ */
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/APACHE2.txt b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/APACHE2.txt
new file mode 100644
index 0000000..d645695
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/APACHE2.txt
@@ -0,0 +1,202 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/BSD2.txt b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/BSD2.txt
new file mode 100644
index 0000000..453967f
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/BSD2.txt
@@ -0,0 +1,26 @@
+This code is made available under the BSD 2-Clause License
+==========================================================
+
+Copyright (c) 2019, Jane Doe <jdoe@example.com>
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+Redistributions of source code must retain the above copyright notice, this list
+of conditions and the following disclaimer.
+
+Redistributions in binary form must reproduce the above copyright notice, this
+list of conditions and the following disclaimer in the documentation and/or
+other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
+ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/MIT.txt b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/MIT.txt
new file mode 100644
index 0000000..af63709
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/MIT.txt
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) Jane Doe
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/UNKNOWN.txt b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/UNKNOWN.txt
new file mode 100644
index 0000000..67a5ebe
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata/licenses/UNKNOWN.txt
@@ -0,0 +1,4 @@
+#!/bin/sh
+# Copyright (C) 2019
+# Jane Doe <jdoe@example.com>
+exit 0
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/tools/AndroidScanTest.sh b/src/test/java/com/googlesource/gerrit/plugins/copyright/tools/AndroidScanTest.sh
new file mode 100755
index 0000000..bcaf29b
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/tools/AndroidScanTest.sh
@@ -0,0 +1,107 @@
+#!/bin/bash
+
+# Copyright (C) 2019 The Android Open Source Project
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+readonly scanner="${TEST_SRCDIR}/copyright/android_scan"
+readonly testdata="${TEST_SRCDIR}/copyright/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata"
+
+function die() {
+ echo -e "$@" >&2
+ exit 1
+}
+
+echo "Testing APACHE2 (1p) license"
+output=$(echo "${testdata}/licenses/APACHE2.txt" | "${scanner}" --f=- -v) \
+ || die "Failed scanning APACHE2 license."
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "FIRST_PARTY" ]]; then
+ die "Expected first party and only first party licenses in APACHE2 license but found ${output}"
+fi
+
+echo "Testing ANDROID (1p) owner"
+output=$("${scanner}" -v "${testdata}/licenses/ANDROID.txt") || die "Failed scanning ANDROID owner."
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "FIRST_PARTY" ]]; then
+ die "Expected first party and only first party owners in ANDROID owner but found ${output}"
+fi
+
+echo "Testing first_party.zip deep"
+output=$("${scanner}" --deep "${testdata}/archives/first_party.zip") \
+ || die "Failed deep scanning first_party.zip"
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "FIRST_PARTY" ]]; then
+ die "Expected first party and only first party owners in first_party.zip but found ${output}"
+fi
+
+echo "Testing BSD2 (3p) license"
+output=$("${scanner}" "${testdata}/licenses/BSD2.txt") || die "Failed scanning BSD2 license."
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party licenses in BSD2 license but found ${output}"
+fi
+
+echo "Testing MIT (3p) license"
+output=$("${scanner}" "${testdata}/licenses/MIT.txt") || die "Failed scanning MIT license."
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party licenses in MIT license but found ${output}"
+fi
+
+echo "Testing UNKNOWN (3p) owner"
+output=$("${scanner}" "${testdata}/licenses/UNKNOWN.txt") || die "Failed scanning UNKNOWN owner."
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party owners in UNKNOWN owner but found ${output}"
+fi
+
+echo "Testing third_party.tgz deep"
+output=$("${scanner}" --deep "${testdata}/archives/third_party.tgz") \
+ || die "Failed deep scanning third_party.tgz"
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party licenses in third_party.tgz but found ${output}"
+fi
+
+echo "Testing AFFERO (forbidden) license"
+output=$("${scanner}" "${testdata}/licenses/AFFERO.txt.gz") || die "Failed scanning AFFERO license."
+licenses=$(echo "${output}" | grep -F -v 'OWNER' | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "FORBIDDEN" ]]; then
+ die "Expected forbidden and only forbidden licenses in AFFERO license but found ${output}"
+fi
+licenses=$(echo "${output}" | grep -F OWNER | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party owners in AFFERO license but found ${output}"
+fi
+
+echo "Testing forbidden.cpio deep"
+output=$("${scanner}" --deep "${testdata}/archives/forbidden.cpio") \
+ || die "Failed deep scanning forbidden.cpio"
+licenses=$(echo "${output}" | grep -F -v 'OWNER' | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "FORBIDDEN" ]]; then
+ die "Expected forbidden and only forbidden licenses in forbidden.cpio (deep) but found ${output}"
+fi
+licenses=$(echo "${output}" | grep -F OWNER | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party owners in forbidden.cpio but found ${output}"
+fi
+
+echo "Testing scan of non-file error"
+"${scanner}" -v "${testdata}/licenses" 2>/dev/null \
+ && die "Expected directory to fail scan."
+
+echo "Testing scan of no files error"
+"${scanner}" 2>/dev/null && die "Expected scan of no files to fail."
+
+exit 0
diff --git a/src/test/java/com/googlesource/gerrit/plugins/copyright/tools/ScanToolTest.sh b/src/test/java/com/googlesource/gerrit/plugins/copyright/tools/ScanToolTest.sh
new file mode 100755
index 0000000..beb23cd
--- /dev/null
+++ b/src/test/java/com/googlesource/gerrit/plugins/copyright/tools/ScanToolTest.sh
@@ -0,0 +1,118 @@
+#!/bin/bash
+
+# Copyright (C) 2019 The Android Open Source Project
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+readonly scanner="${TEST_SRCDIR}/copyright/scan_tool"
+readonly testdata="${TEST_SRCDIR}/copyright/src/test/java/com/googlesource/gerrit/plugins/copyright/testdata"
+
+function die() {
+ echo -e "$@" >&2
+ exit 1
+}
+
+echo "Testing APACHE2 (unknown) license"
+output=$("${scanner}" "${testdata}/licenses/APACHE2.txt") || die "Failed scanning APACHE2 license."
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "UNKNOWN" ]]; then
+ die "Expected unknown and only unknown licenses in APACHE2 license but found ${output}"
+fi
+
+echo "Testing ANDROID (3p) owner"
+output=$("${scanner}" "${testdata}/licenses/ANDROID.txt") || die "Failed scanning ANDROID owner."
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party owners in ANDROID owner but found ${output}"
+fi
+
+echo "Testing first_party.zip deep"
+output=$("${scanner}" --deep "${testdata}/archives/first_party.zip") \
+ || die "Failed deep scanning first_party.zip"
+licenses=$(echo "${output}" | grep -F -v OWNER | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "UNKNOWN" ]]; then
+ die "Expected unknown and only unknown licenses in first_party.zip but found ${output}"
+fi
+licenses=$(echo "${output}" | grep -F OWNER | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party owners in first_party.zip but found ${output}"
+fi
+
+echo "Testing BSD2 (unknown) license"
+output=$("${scanner}" "${testdata}/licenses/BSD2.txt") || die "Failed scanning BSD2 license."
+licenses=$(echo "${output}" | grep -F -v OWNER | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "UNKNOWN" ]]; then
+ die "Expected unknown and only unknown licenses in BSD2 but found ${output}"
+fi
+licenses=$(echo "${output}" | grep -F OWNER | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party owners in BSD2 but found ${output}"
+fi
+
+echo "Testing MIT (unknown) license"
+output=$("${scanner}" "${testdata}/licenses/MIT.txt") || die "Failed scanning MIT license."
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "UNKNOWN" ]]; then
+ die "Expected unknown and only unknown licenses in MIT license but found ${output}"
+fi
+
+echo "Testing UNKNOWN (3p) owner"
+output=$("${scanner}" "${testdata}/licenses/UNKNOWN.txt") || die "Failed scanning UNKNOWN owner."
+licenses=$(echo "${output}" | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party owners in UNKNOWN owner but found ${output}"
+fi
+
+echo "Testing third_party.tgz deep"
+output=$("${scanner}" --deep "${testdata}/archives/third_party.tgz") \
+ || die "Failed deep scanning third_party.tgz"
+licenses=$(echo "${output}" | grep -F -v OWNER | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "UNKNOWN" ]]; then
+ die "Expected unknown and only unknown licenses in third_party.tgz but found ${output}"
+fi
+licenses=$(echo "${output}" | grep -F OWNER | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party owners in third_party.tgz but found ${output}"
+fi
+
+echo "Testing AFFERO (unknown) license"
+output=$("${scanner}" "${testdata}/licenses/AFFERO.txt.gz") || die "Failed scanning AFFERO license."
+licenses=$(echo "${output}" | grep -F -v 'OWNER' | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "UNKNOWN" ]]; then
+ die "Expected unknown and only unknown licenses in AFFERO license but found ${output}"
+fi
+licenses=$(echo "${output}" | grep -F OWNER | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party owners in AFFERO license but found ${output}"
+fi
+
+echo "Testing forbidden.cpio deep"
+output=$("${scanner}" --deep "${testdata}/archives/forbidden.cpio") \
+ || die "Failed deep scanning forbidden.cpio"
+licenses=$(echo "${output}" | grep -F -v 'OWNER' | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "UNKNOWN" ]]; then
+ die "Expected unknown and only unknown licenses in forbidden.cpio but found ${output}"
+fi
+licenses=$(echo "${output}" | grep -F OWNER | cut -d ' ' -f1 | sort -u)
+if [[ "${licenses}" != "THIRD_PARTY" ]]; then
+ die "Expected third party and only third party owners in forbidden.cpio but found ${output}"
+fi
+
+echo "Testing scan of non-file error"
+"${scanner}" "${testdata}/licenses" 2>/dev/null \
+ && die "Expected directory to fail scan."
+
+echo "Testing scan of no files error"
+"${scanner}" 2>/dev/null && die "Expected scan of no files to fail."
+
+exit 0
diff --git a/tools/bzl/BUILD b/tools/bzl/BUILD
new file mode 100644
index 0000000..c5ed0b7
--- /dev/null
+++ b/tools/bzl/BUILD
@@ -0,0 +1 @@
+# Empty file required by Bazel
diff --git a/tools/bzl/junit.bzl b/tools/bzl/junit.bzl
new file mode 100644
index 0000000..3af7e58
--- /dev/null
+++ b/tools/bzl/junit.bzl
@@ -0,0 +1,4 @@
+load(
+ "@com_googlesource_gerrit_bazlets//tools:junit.bzl",
+ "junit_tests",
+)
diff --git a/tools/bzl/maven_jar.bzl b/tools/bzl/maven_jar.bzl
new file mode 100644
index 0000000..2eabedb
--- /dev/null
+++ b/tools/bzl/maven_jar.bzl
@@ -0,0 +1 @@
+load("@com_googlesource_gerrit_bazlets//tools:maven_jar.bzl", "maven_jar")
diff --git a/tools/bzl/plugin.bzl b/tools/bzl/plugin.bzl
new file mode 100644
index 0000000..0b25d23
--- /dev/null
+++ b/tools/bzl/plugin.bzl
@@ -0,0 +1,6 @@
+load(
+ "@com_googlesource_gerrit_bazlets//:gerrit_plugin.bzl",
+ "PLUGIN_DEPS",
+ "PLUGIN_TEST_DEPS",
+ "gerrit_plugin",
+)