Teach SafeHtml.linkify() to ignore trailing ">"
Because linkify runs on the HTML-escaped text, a string such as
"<http://foo>" actually appears to our regex as the string
"&lt;http://foo&gt;". Since "&gt;" is a valid sequence of URL
characters, we were pulling the "&gt;" into the URL, when our
intent was to leave it out.

We now skip "&lt;" and "&gt;" within a URL, as the browser reads
these after parsing as "<" and ">", which are not considered to be
part of the URL.
Bug: GERRIT-277
Change-Id: Ide9a63c3c998eac6a3ce9f23066668c2e7a9aba6
Signed-off-by: Shawn O. Pearce <sop@google.com>
diff --git a/src/main/java/com/google/gwtexpui/safehtml/client/SafeHtml.java b/src/main/java/com/google/gwtexpui/safehtml/client/SafeHtml.java
index 4c61588..b19ad6c 100644
--- a/src/main/java/com/google/gwtexpui/safehtml/client/SafeHtml.java
+++ b/src/main/java/com/google/gwtexpui/safehtml/client/SafeHtml.java
@@ -70,13 +70,15 @@
/** Convert bare http:// and https:// URLs into <a href> tags. */
public SafeHtml linkify() {
+ final String part = "(?:" +
+ "[a-zA-Z0-9$_.+!*',%;:@=?#/-]" +
+ "|&(?!lt;|gt;)" +
+ ")";
return replaceAll(
"(https?://" +
- "[a-zA-Z0-9$_.+!*',%;:@&=?#/-]{2,}" +
- "([(]" +
- "[a-zA-Z0-9$_.+!*',%;:@&=?#/-]*" +
- "[)])*" +
- "[a-zA-Z0-9$_.+!*',%;:@&=?#/-]*" +
+ part + "{2,}" +
+ "(?:[(]" + part + "*" + "[)])*" +
+ part + "*" +
")",
"<a href=\"$1\">$1</a>");
}
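
The effect of the new pattern can be checked outside GWT with a small standalone sketch. This is an illustration only: the class name `LinkifyDemo` and its static `linkify` helper are hypothetical, and it uses `java.util.regex` instead of the client-side `replaceAll` that `SafeHtml` calls, on the assumption that both engines treat this pattern the same way. The regex itself is copied from the diff.

```java
import java.util.regex.Pattern;

// Hypothetical standalone harness; not part of SafeHtml.
final class LinkifyDemo {
  // One URL "part": a plain URL character, or an '&' that does not
  // begin the escaped forms "&lt;" or "&gt;".
  static final String PART = "(?:"
      + "[a-zA-Z0-9$_.+!*',%;:@=?#/-]"
      + "|&(?!lt;|gt;)"
      + ")";

  // Same structure as the pattern built in linkify() in the diff.
  static final Pattern URL = Pattern.compile(
      "(https?://"
      + PART + "{2,}"
      + "(?:[(]" + PART + "*" + "[)])*"
      + PART + "*"
      + ")");

  // Assumes the input is already HTML-escaped, as SafeHtml content is.
  static String linkify(String escapedHtml) {
    return URL.matcher(escapedHtml).replaceAll("<a href=\"$1\">$1</a>");
  }

  public static void main(String[] args) {
    // The escaped ">" stays outside the link.
    System.out.println(linkify("&lt;http://foo&gt;"));
    // An escaped "&" inside a query string is still part of the URL.
    System.out.println(linkify("http://foo/?a=1&amp;b=2"));
  }
}
```

The negative lookahead `&(?!lt;|gt;)` is what keeps `&amp;` usable inside query strings while stopping the match in front of `&lt;` and `&gt;`.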