Special character handling in URLs

Bazaar allows locations to be specified in multiple ways, either:

  • Fully qualified URLs

  • File system paths, relative or absolute

Internally brz treats all locations as URLs. For any file system paths that are specified it will automatically determine the appropriate URL representation, and escape special characters where necessary.

There are a few characters which have special meaning in URLs and need careful handling to avoid ambiguities. Characters can be escaped with a % and a hex value in URLs. Any non-ASCII characters in a file path will automatically be urlencoded when the path is converted to a URL.

URLs represent non-ASCII characters in an encoding defined by the server, but usually UTF-8. The % escapes should be of the UTF-8 bytes. Bazaar tries to be generous in what it accepts as a URL and to print them in a way that will be readable.

For example, if you have a directory named ‘/tmp/%2False’ these are all valid ways of accessing the content (0x2F, or 47, is the ASCII code for forward slash):

cd /tmp
brz log /tmp/%2False
brz log %2False
brz log file:///tmp/%252False
brz log file://localhost/tmp/%252False
brz log file:%252False

These are valid but do not refer to the same file:

brz log file:///tmp/%2False (refers to a file called /tmp/\/alse)
brz log %252False (refers to a file called /tmp/%252False)

Comma also has special meaning in URLs, because it denotes segment parameters

segment parameters: http://www.ietf.org/rfc/rfc3986.txt (section 3.3)

Comma is also special in any file system paths that are specified. To use a literal comma in a file system path, specify a URL and URL encode the comma:

brz log foo,branch=bla # path "foo" with the segment parameter "branch" set to "bla"
brz log file:foo%2Cbranch=bla # path "foo,branch=bla"
brz log file:foo,branch=bla # path "foo" with segment parameter "branch" set to "bla"