manifest: refactor the filesystem checking logic for more reuse

This function is currently written with copyfile & linkfile in mind.
Generalize the logic & function arguments slightly so we can reuse
in more places that make sense.

This changes the validation logic slightly too in that we no longer
allow "." for the dest attribute with copyfile & linkfile, nor for
the src attribute with copyfile.  We already rejected those later on
when checking against the active filesystem, but now we reject them
a little sooner when parsing.

The empty path check isn't a new requirement exactly -- repo used to
crash on it, so it was effectively blocked, but now we diagnosis it.

Bug: https://crbug.com/gerrit/14156
Change-Id: I0fdb42a3da60ed149ff1997c5dd4b85da70eec3d
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/298442
Reviewed-by: Michael Mortensen <mmortensen@google.com>
Tested-by: Mike Frysinger <vapier@google.com>
diff --git a/manifest_xml.py b/manifest_xml.py
index ff8e061..d05f4d0 100644
--- a/manifest_xml.py
+++ b/manifest_xml.py
@@ -1121,8 +1121,33 @@
     return relpath, worktree, gitdir, objdir
 
   @staticmethod
-  def _CheckLocalPath(path, symlink=False):
-    """Verify |path| is reasonable for use in <copyfile> & <linkfile>."""
+  def _CheckLocalPath(path, dir_ok=False, cwd_dot_ok=False):
+    """Verify |path| is reasonable for use in filesystem paths.
+
+    Used with <copyfile> & <linkfile> elements.
+
+    This only validates the |path| in isolation: it does not check against the
+    current filesystem state.  Thus it is suitable as a first-past in a parser.
+
+    It enforces a number of constraints:
+    * No empty paths.
+    * No "~" in paths.
+    * No Unicode codepoints that filesystems might elide when normalizing.
+    * No relative path components like "." or "..".
+    * No absolute paths.
+    * No ".git" or ".repo*" path components.
+
+    Args:
+      path: The path name to validate.
+      dir_ok: Whether |path| may force a directory (e.g. end in a /).
+      cwd_dot_ok: Whether |path| may be just ".".
+
+    Returns:
+      None if |path| is OK, a failure message otherwise.
+    """
+    if not path:
+      return 'empty paths not allowed'
+
     if '~' in path:
       return '~ not allowed (due to 8.3 filenames on Windows filesystems)'
 
@@ -1165,12 +1190,12 @@
 
     # Some people use src="." to create stable links to projects.  Lets allow
     # that but reject all other uses of "." to keep things simple.
-    if parts != ['.']:
+    if not cwd_dot_ok or parts != ['.']:
       for part in set(parts):
         if part in {'.', '..', '.git'} or part.startswith('.repo'):
           return 'bad component: %s' % (part,)
 
-    if not symlink and resep.match(path[-1]):
+    if not dir_ok and resep.match(path[-1]):
       return 'dirs not allowed'
 
     # NB: The two abspath checks here are to handle platforms with multiple
@@ -1202,7 +1227,8 @@
 
     # |src| is the file we read from or path we point to for symlinks.
     # It is relative to the top of the git project checkout.
-    msg = cls._CheckLocalPath(src, symlink=element == 'linkfile')
+    is_linkfile = element == 'linkfile'
+    msg = cls._CheckLocalPath(src, dir_ok=is_linkfile, cwd_dot_ok=is_linkfile)
     if msg:
       raise ManifestInvalidPathError(
           '<%s> invalid "src": %s: %s' % (element, src, msg))
diff --git a/tests/test_manifest_xml.py b/tests/test_manifest_xml.py
index 40388f9..4b54514 100644
--- a/tests/test_manifest_xml.py
+++ b/tests/test_manifest_xml.py
@@ -87,6 +87,8 @@
   def test_bad_paths(self):
     """Make sure bad paths (src & dest) are rejected."""
     PATHS = (
+        '',
+        '.',
         '..',
         '../',
         './',