Since I started writing code for web services’ authentication methods, I have often needed percent-encoding somewhere. I used to use Perl for the job:

% echo -n $'Encoded string has 中文\n' | perl -p -e 's/([^A-Za-z0-9-._~])/sprintf("%%%02X", ord($1))/seg' ; echo
Encoded%20string%20has%20%E4%B8%AD%E6%96%87%0A

If you want to do it in pure Bash, there are two issues:

  1. Bash’s builtin printf needs a numeric argument equivalent to what ord($1) provides in the Perl one-liner above, and
  2. Bash is usually Unicode-aware, so string operations work character by character, not byte by byte, and a single Unicode character may span several bytes.

The first key is to have ord() in Bash. If you search the Internet for bash chr ord, you will find something like this:

% printf "%%%02X" "'A" ; echo
%41

I looked at a few pages, but many seem to have just copied from one another, with no explanation of the single quote in "'A". I found the answer in info coreutils 'printf invocation' (Bash’s builtin printf follows the same rule):

  • If the leading character of a numeric argument is " or ' then its value is the numeric value of the immediately following character. Any remaining characters are silently ignored if the POSIXLY_CORRECT environment variable is set; otherwise, a warning is printed. For example, printf "%d" "'a" outputs 97 on hosts that use the ASCII character set, since a has the numeric value 97 in ASCII.
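The leading-quote rule is enough to build a small ord helper. The sketch below is only an illustration, not something taken from the pages mentioned above: the function name ord is my own choice, and the values in the comments assume an ASCII-compatible character set.

```shell
# ord: print the numeric value of the single character passed as $1.
# The leading ' tells printf to use the value of the character after it.
ord () {
  printf '%d' "'$1"
}

ord A ; echo    # 65
ord a ; echo    # 97
ord '~' ; echo  # 126
```

Passing exactly one character keeps the example clear of the "remaining characters" case described in the quote.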

The second key is to get one byte at a time. Amazingly, it’s fairly simple: just switch to a single-byte locale, e.g. LANG=en_US.ISO8859-1. Note that LC_ALL or LC_CTYPE, if set, take precedence over LANG.
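To see the effect of the locale, compare the length reported for the same string. This is a small sketch, assuming the script is saved as UTF-8. byte_len is a hypothetical helper name, and it uses LC_ALL=C rather than en_US.ISO8859-1, since the C locale is also single-byte and is always available.

```shell
# byte_len: print the length of $1 in bytes by forcing a single-byte locale.
# LC_ALL overrides LANG and LC_CTYPE; "local" limits the change to this function.
byte_len () {
  local LC_ALL=C
  printf '%d' "${#1}"
}

s='中文'
echo "${#s}"          # 2 under a UTF-8 locale: counted in characters
byte_len "$s" ; echo  # 6: each of these characters takes 3 bytes in UTF-8
```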

The complete code is:

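pe () {
  # A single-byte locale makes ${#1} and ${1:i:1} work on bytes, not characters
  local LANG=en_US.ISO8859-1 ch i
  for ((i=0;i<${#1};i++)); do
    ch="${1:i:1}"
    # Unreserved characters pass through; everything else becomes %XX
    [[ $ch =~ [._~A-Za-z0-9-] ]] && echo -n "$ch" || printf "%%%02X" "'$ch"
  done
}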

pe $'Encoded string has 中文\n' ; echo
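
If en_US.ISO8859-1 is not installed on a machine, the same idea can be tried with the C locale. The following is a variant sketch, not the function above: pe_c is a hypothetical name, LC_ALL=C replaces the LANG setting, a case pattern replaces the regex test (in the C locale, ranges such as A-Z cover only ASCII), and the & 0xFF mask is a defensive assumption of mine in case printf reports a high byte as a larger number on some systems.

```shell
#!/usr/bin/env bash
# pe_c: percent-encode $1 byte by byte (variant of pe using the C locale).
pe_c () {
  local LC_ALL=C ch i n
  for ((i = 0; i < ${#1}; i++)); do
    ch="${1:i:1}"
    case "$ch" in
      # Unreserved characters are copied as-is
      [A-Za-z0-9._~-]) printf '%s' "$ch" ;;
      *)
        # The leading ' makes printf yield the byte's numeric value
        printf -v n '%d' "'$ch"
        printf '%%%02X' "$((n & 0xFF))"
        ;;
    esac
  done
}

pe_c $'Encoded string has 中文\n' ; echo
# Encoded%20string%20has%20%E4%B8%AD%E6%96%87%0A
```

Using printf -v avoids forking a subshell for every byte, so the mask should cost little.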

The performance isn’t good: on my computer, the Bash function encodes about 3.84 kbytes/second, while Perl manages about 2.182 Mbytes/second. But it should be enough for general use, since you will usually feed it a short string rather than a file.