tl;dr Add two extra lines, and use printw or addwstr.

Displaying Unicode characters is hard to avoid these days; in the 21st century, pure ASCII sometimes just doesn’t have enough characters. Take DRAWILLE, which draws with Braille characters, as an example. It might not be suitable for scientific graphing, but it is certainly more precise than a graph made of * dots.

I have had a few encounters with Unicode and the ncurses library, and I usually have no problem adding support for Unicode characters. Still, there may be more to it than what I am about to show you in this post; it’s more like my personal notes on what works. If you have more insights or corrections, please don’t hesitate to tell me.

1   Setup

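It really only takes two extra lines: the locale.h header and the setlocale call, as seen below. Calling setlocale(LC_ALL, "") switches the program from the default "C" locale to the one set in the environment (LANG, LC_ALL and so on), so that ncurses knows the terminal uses UTF-8 and can treat a multibyte sequence as a single character.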


#include <locale.h>

int
main (int argc, char *argv[])
{
    setlocale(LC_ALL, "");
    /* ... */
}

2   Linking

There is also a common linking issue: the program should be linked against the wide-character build of the library, which has a file name like libncursesw.so (note the w in the filename). If it is linked against the wrong library, the wide character functions are missing, and even if you don’t use them, UTF-8 text comes out garbled, for example M-b~BM-, instead of €.

A safer way to build the binary is to use ncurses’ helper script, ncursesw5-config, for example:


gcc -o out input.c $(ncursesw5-config --cflags --libs)

As long as this helper is available, it provides the right flags and the right library to link against. Writing -l<curses-library-name> by hand may not be portable, not even across Linux distributions. Note that the flags come after the source file: linkers using --as-needed, the default on some distributions, drop libraries listed before the objects that need them.

3   Sample code


// as test.c
// % gcc -std=c99 -o test test.c $(ncursesw5-config --cflags --libs)

#define _XOPEN_SOURCE_EXTENDED 1  // declare the wide character functions

#include <locale.h>
#include <curses.h>
#include <stdlib.h>

int
main (int argc, char *argv[])
{
    setlocale(LC_ALL, "");

    initscr();

    printw("Euro\n");

    printw("€\n");            // literal Unicode
    printw("\u20ac\n");       // escaped Unicode (C99)

    printw("%lc\n", L'€');    // wint_t
    printw("%ls\n", L"€");    // wchar_t
    addwstr(L"\u20AC\n");     // wchar_t

    printw("\xe2\x82\xac\n"); // utf-8 encoded
    addstr("\xe2\x82\xac\n"); // utf-8 encoded

    for (int i = 0; i < 10; i++)
    {
        printw("%c %lc\n", '0' + i, L'0' + i);
    }

    getch();
    endwin();

    return EXIT_SUCCESS;
}

It’s pretty much like printing Unicode with printf, except that you use ncurses’ printw or other functions such as addwstr.

Frankly, I am not entirely sure about many of the details, but all of these work and I haven’t had any problems writing code this way. Nonetheless, I will try to explain them as best I can.

4   Unicode string

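My source code is always UTF-8 encoded, and so is my terminal. GCC has never complained about it; its default execution character set is UTF-8, so characters in string literals end up as UTF-8 bytes. I just put whatever Unicode characters I need into a narrow character string as before.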


printw("€\n"); // literal Unicode

If you are up for typing escaped Unicode by hand using \uXXXX, known as universal character names since C99, or even the byte representation of the UTF-8 encoded character, nothing stops you.


printw("\u20ac\n"); // escaped Unicode (C99)
printw("\xe2\x82\xac\n"); // utf-8 encoded
addstr("\xe2\x82\xac\n"); // utf-8 encoded

5   addstr to addwstr

addwstr takes a wchar_t string, that is, a wide character string, so you can pass it an L"" wide string literal.


addwstr(L"\u20AC\n"); // wchar_t

Note

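You need to define _XOPEN_SOURCE_EXTENDED before including curses.h; it makes ncurses define NCURSES_WIDECHAR, which enables the wide character function declarations. Without the macro the program may still compile, since older GCC versions only warn about implicitly declared functions such as addwstr, but GCC 14 and later reject implicit declarations by default.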

You should not feed a wide character string, L"", into addstr (no w in the function name), because addstr reads the wchar_t array as plain bytes; you will get a result like M-,. Vice versa, you should not feed a narrow character string, "", into addwstr.

6   printw to printw with modifier l


printw("%lc\n", L'€'); // wint_t
printw("%ls\n", L"€"); // wchar_t

This uses the l length modifier with the c and s conversion specifiers, %lc and %ls. From printf(3):

c
[…] If an l modifier is present, the wint_t (wide character) argument is converted to a multibyte sequence by a call to the wcrtomb(3) function, […]
s
[…] If an l modifier is present: The const wchar_t * argument is expected to be a pointer to an array of wide characters. Wide characters from the array are converted to multibyte characters […]

Note

For the details of wint_t and wchar_t, see Extended Characters.

Although the format string itself is a narrow character string, this actually simplifies things a lot for me: you can use printw for all printing, with any mix of narrow and wide characters or strings as arguments.

7   Operators on wchar_t

Just like char, you can use simple arithmetic operators on wchar_t, since it is an integer type.

Code:

for (int i = 0; i < 10; i++)
{
    printw("%c %lc\n", '0' + i, L'0' + i);
}

Output:

0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9

8   add_wch?

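You might ask: do we have to use strings for everything now? Is there a replacement for addch? There is one, add_wch, which takes a cchar_t. Like the other wide character declarations, cchar_t is only available with the _XOPEN_SOURCE_EXTENDED macro defined. In ncurses it looks roughly like this: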


#define CCHARW_MAX 5
typedef struct
{
    attr_t attr;
    wchar_t chars[CCHARW_MAX];
}
cchar_t;

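CCHARW_MAX counts wide characters, not UTF-8 bytes: chars holds one spacing character optionally followed by non-spacing (combining) characters that are drawn in the same cell, up to CCHARW_MAX wide characters in total, with unused slots set to L'\0'. A single character like € only needs the first slot, so it is fine to initialize chars directly from a wide string literal, as in the following example.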


const cchar_t wch = {A_NORMAL, L"€"};
add_wch(&wch);

/* or */

wadd_wch(stdscr, & (cchar_t) {A_NORMAL, L"€"});

As you can see, it’s not very friendly to use; I personally wouldn’t bother with this function. There might be something more suitable, I just can’t find it.

Alternatively, you could write out the array, which would be {L'€', L'\0'} in this case, but I doubt a normal human being would want to do that, or even think about using add_wch.

9   Conclusion

Displaying Unicode is not hard. You only need the two lines mentioned in Setup and the ncursesw library; then use printw as you would use printf, or addwstr for a simple string that needs no formatting.

Of course, you can also use the mv- and w-prefixed variants for cursor movement and for targeting a specific window, respectively, such as mvaddwstr, mvwaddwstr, or mvprintw.