Universidad de Costa Rica
Escuela de Ciencias de la
Computación e Informática

ztrcpy() and ztrins(): A few important extensions for <string.h>

Adolfo Di Mare




Abstract

Some string manipulation functions provided in the C language standard library are not safe. In this article, ztrcpy(), ztrins() and other functions are presented as an alternative that avoids some of the problems inherent in the corresponding standard library functions. This implementation should work in most C language environments as well as in C++.

Motivation

In his paper “Managed String Library for C”, Robert Seacord [1] describes several common string manipulation errors [2] and many approaches that people have used to work around the shortcomings of functions like strcpy(), which allow unbounded string copies. The companion standard function strncpy() takes an extra “size” argument, but it does not place the end of string mark in every case: when the source string is at least as long as “size”, the destination is left without a terminating zero. Hence, it makes sense to use a version of strcpy() that always leaves the result string zero terminated. I call this function ztrcpy(), where the leading “z” is a reminder that this function works in the same manner as strcpy() but its “size” parameter prevents unbounded memory overruns. Function ztrcpy() has the following signature:

char * ztrcpy(
    size_t size, /* sizeof(dest) */
    char * dest,
    const char * src );
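The following is a minimal sketch of how a function with the ztrcpy() contract could be written; it is not the code distributed in ztring.c. It assumes the same early exits used by ztrins() further below (a NULL 'dest' or a zero 'size' leaves everything untouched):

```c
#include <stddef.h>   /* size_t, NULL */
#include <string.h>   /* strlen(), memmove() */

/* Sketch of a 'size' checked strcpy(): copy at most size-1 characters of
   'src' into 'dest' and always leave 'dest' zero terminated. */
char *ztrcpy(size_t size, char *dest, const char *src) {
    size_t len;
    if (dest == NULL || size == 0) { return dest; }  /* no room even for '\0' */
    len = strlen(src);
    if (len > size - 1) { len = size - 1; }          /* truncate, never overrun */
    memmove(dest, src, len);                          /* memmove() tolerates overlap */
    dest[len] = '\0';
    return dest;                                      /* same return value as strcpy() */
}
```

With char s8[8], the call ztrcpy(sizeof(s8), s8, "0123456789") leaves "0123456" in s8 instead of writing past its end.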

Function strlcpy() (with an “L”) is an alternative to strcpy() because it always zero terminates strings. However, it returns, as a size_t number, the length of the source string, a value used to determine whether string truncation occurred when copying. This is confusing if one wants to “avoid unbounded copies” without having to deal, after the fact, with “the size the string should have”. Also, I do not like that the “size” parameter for strlcpy() is at the end, because it seems more natural to me to put it right next to the destination string parameter. Most people will not read the fine print; in my opinion, strlcpy() has some quirks that are tough to grasp.
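To make the contrast concrete, here is a minimal sketch of the strlcpy() contract as described in [6]: copy at most size-1 bytes, zero terminate when 'size' is not zero, and return strlen(src). It is named sketch_strlcpy() (a hypothetical name) because some C libraries already ship a real strlcpy():

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* strlen(), memcpy() */

/* Sketch of strlcpy() semantics: the return value is the length of 'src',
   not of what was stored, so truncation is detected after the fact. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size) {
    size_t srclen = strlen(src);                  /* always scans all of 'src' */
    if (size > 0) {
        size_t n = (srclen >= size) ? size - 1 : srclen;
        memcpy(dst, src, n);                      /* 'dst' and 'src' must not overlap */
        dst[n] = '\0';
    }
    return srclen;                                /* "the size the string should have" */
}
```

A caller has to remember that truncation happened exactly when sketch_strlcpy(buf, src, sizeof(buf)) >= sizeof(buf).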

There is another replacement for strcpy(), called strcpy_s(), but it returns an error code of the opaque errno_t type. Again: why force the programmer to check codes after each invocation? In my experience, what is important is to avoid memory overwrites (strings truncated due to lack of memory are easier to spot because they show up with fewer characters).

A while ago I used a C library that implemented function strins() to insert a string inside another. I used it in a few programs, but there were times when it would produce unbounded string copies (my fault: I used small strings!). At first, it was hard to spot the error because the program's behavior would be very strange (sometimes it would cycle back to the beginning of the routine, probably because of run time stack corruption). I have learned to trust the compiler more than myself, but those memory overruns were not fun to deal with. I wanted to “fix” strins(), but never took the time to do it, until I came across Seacord's article in Dr. Dobb's Journal. This is why I implemented ztrins() as a memory safe version of strins(). Function ztrins() has the following signature:

char * ztrins(
    size_t       size,      /* sizeof(dest) */
    char *       dest,
    size_t       n,         /* insertion point */
    const char * insert);   /* insertion string */


Functionality

I use C++ as my development tool. Since the functions I propose belong to the C language, I decided to write the whole implementation in C. At first I planned on implementing many functions, but after a little thinking I decided to write as few as possible. In the end, I implemented the “size” versions of strcpy() and strcat() [ ztrcpy(), ztrcat() ], three functions to insert, delete and extract substrings [ ztrins(), strdel(), ztrsub() ] (strdel() needs no “size” because deleting can only shorten a string), another three functions to remove leading and trailing characters [ strltrim(), strrtrim(), strtrim() ], one function to remove characters from a memory block [ memczap() ], a couple of functions to check for a prefix or a suffix in a string [ strpfx(), strsffx() ], and a couple of functions to remove the accent from the Latin 1 accented letters [ strxltn1(), strxacct() ].

The following example illustrates the usage of these functions (eqstr(a,b) compares two C strings):

{{  /* test::ztrins() */
    char s30[30];              /* 123456789.123456789.1 -> 21 chars */
    ztrcpy(     sizeof(s30),s30, "====!-----+.........+" );
    { { ztrins( sizeof(s30),s30,  4, "_2_4_"); } } /* [4] <-> s30+(4) */
                            /*   /!\   */
    assertTrue(  eqstr(s30,  "====_2_4_!-----+.........+") );
    assertTrue( 26 == strlen("====_2_4_!-----+.........+") );
    { { { assertTrue(  21+strlen("_2_4_") == strlen(s30) ); } } }

    {   /* replace JIM with ROMEO */
        char *p; char poem[] = "JIM, JIM, JIM ... Where are you?";
        while ( 0!=(p=strstr(poem,"JIM")) ) {
            strdel( p, strlen("JIM") );
            ztrins( sizeof(poem),poem, p-poem, "ROMEO" );
        }
        assertTrue( eqstr(poem,"ROMEO, ROMEO, ROMEO ... Where ar") );

        assertTrue( strlen("JIM")<strlen("ROMEO") ); /* -> truncation */
        assertTrue( strlen("ROMEO, ROMEO, ROMEO ... Where ar") ==
                    strlen("JIM, JIM, JIM ... Where are you?") );
    }

    ztrcpy(     sizeof(s30),s30, "====!-----+.........+" ); /* -> 21 chars */
    { { ztrins( sizeof(s30),s30, 00, "________18________"); } } /* [0] */
    assertTrue(  eqstr(s30,  "________18________====!-----+") );
    assertTrue( strlen(s30) == sizeof(s30)-1 ); /* max size */

    ztrcpy(     sizeof(s30),s30, "0123456789" );
    { { ztrins( /*size->*/1,s30, 0, "" ); } }
    assertTrue( eqstr(s30, "") ); /* (size==1) ==> (s30[0]==0) */
    assertTrue( eqstr(s30+1 ,     "123456789" ) );
}}

String 's30' can hold up to 29 characters. First, it is initialized using the memory safe ztrcpy() function. Then, ztrins() inserts "_2_4_" at position [4] and, as the result fits within the size of 's30', there is no truncation. Later, every "JIM" in string 'poem' is replaced with "ROMEO"; since the size of 'poem' is fixed at compile time, the last letters of 'poem' must be truncated to make room for the longer word. "ROMEO" is 2 letters longer than "JIM", and as three instances of "JIM" get substituted, the last 2×3 letters are dropped (the trailing substring "e you?" has 6 characters).

Maybe a novice programmer would have a little difficulty figuring out why ROMEO's poem got truncated, but at least no memory overwrite would happen. If the non size checking versions of these functions were used and the string was not big enough, the resulting bug would be hard to find. It is true that the implementations of ztrcpy() and ztrins() require more time to check parameters and boundaries, but in most applications the speed differences can only be measured in millionths of a second, which is negligible. Also, there is no standard strins() in the string.h header file. This is a summary of the implemented string routines:

/* ztring.h (C) 2014 adolfo@di-mare.com */

/* 'size' checked versions of 'strcpy()' && 'strcat()' */
char* ztrcpy( size_t size, char * dest, const char * src );
char* ztrcat( size_t size, char * dest, const char * src );

/* insert, delete and substring (with 'size' check) */
char* ztrins( size_t size, char * dest, size_t n, const char *insert );
char* strdel( char * dest, size_t len );
char* ztrsub( size_t size, char * dest, const char *src, size_t len );

/* trim 'str' left, right and both */
char* strltrim( const char * str , char tr );
char* strrtrim(       char * str , char tr );
char* strtrim(        char * str , char tr );

/* remove character 'ch' from memory block 'mem' */
size_t memczap( size_t size,void *mem, int ch );

/* string prefix and suffix (boolean) */
int strpfx(  const char *str, const char *prefix );
int strsffx( const char *str, const char *suffix );

/* Get span until character in character range '[a..z]' */
size_t strrspn( const char * str, char a, char z );

/* transform accented Latin 1 letters into plain ASCII letters */
char  strxltn1( char accented_latin_1 );
char* strxacct( char* str );
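The ROMEO example uses strdel(), whose declaration has no 'size' parameter. Assuming it deletes the first 'len' characters of the string that starts at 'dest' (which is how the example uses it), a minimal sketch could look like this; it is not the code in ztring.c:

```c
#include <stddef.h>   /* size_t, NULL */
#include <string.h>   /* strlen(), memmove() */

/* Sketch (assumed semantics): delete the first 'len' characters of the
   string that starts at 'dest', shifting the remainder left. No 'size'
   is needed because the string can only get shorter. */
char *strdel(char *dest, size_t len) {
    size_t destlen;
    if (dest == NULL) { return dest; }
    destlen = strlen(dest);
    if (len > destlen) { len = destlen; }            /* never read past the '\0' */
    memmove(dest, dest + len, destlen - len + 1);    /* +1 also moves the '\0' */
    return dest;
}
```

Clamping 'len' to the current length is what makes a call like strdel(p, 100) harmless on a short string.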

As the size parameter in these functions precedes the destination string, it is easy to tie them together. For example, the following macro ZS() can be used to make code written for strcpy() interchangeable with code written for ztrcpy():

#ifdef USE_ZTR
    #define ZS(x) sizeof(x),x /** Shortcut macro */
#else
    #define ZS(x) x
    /* convert ztrcpy(ZS(dest),src) -> strcpy(dest,src) */
    #define    ztrcpy(   dest, src)    strcpy(dest,src)
    /* convert ztrcat(ZS(dest),src) -> strcat(dest,src) */
    #define    ztrcat(   dest, src)    strcat(dest,src)
#endif
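To see the fallback branch at work, the sketch below leaves USE_ZTR undefined, so ztrcpy(ZS(buf),...) expands to strcpy(buf,...); with USE_ZTR defined, the very same line would become ztrcpy(sizeof(buf),buf,...). The helper greeting_length() is hypothetical:

```c
#include <string.h>   /* strcpy(), strcat(), strlen() */

/* USE_ZTR is deliberately left undefined in this sketch. */
#ifdef USE_ZTR
    #define ZS(x) sizeof(x),x
#else
    #define ZS(x) x
    #define ztrcpy(dest, src) strcpy(dest, src)
    #define ztrcat(dest, src) strcat(dest, src)
#endif

/* Hypothetical helper: 'buf' must be an array so that sizeof(buf) in ZS()
   is the buffer size; for a char* it would be the size of the pointer. */
size_t greeting_length(void) {
    char buf[32];
    ztrcpy(ZS(buf), "Hello, ");
    ztrcat(ZS(buf), "world");
    return strlen(buf);
}
```

This works because the preprocessor counts the arguments of ztrcpy(ZS(buf), src) before expanding ZS(), so the two-argument macro matches in both modes.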

Many have argued that using macros is a bad idea [3], but in this case a simple macro like ZS() helps the transition from the unsafe into the safer version of each function.


Implementation

Implementations of functions similar to ztrcpy() and ztrcat() are easy to find on the net:

http://google.com/search?as_qdr=all&num=100&as_q=strlcpy+code
http://google.com/search?as_qdr=all&num=100&as_q=strlcat+code

However, implementations to insert a string into another are not as plentiful:

http://google.com/search?as_qdr=all&num=100&as_q=c+string+insert+code
http://search.yahoo.com/search?n=100&p=c+string+insert+code
http://www.bing.com/search?q=c+string+insert+code

The implementation for ztrins() requires a little bit of care because it is easy to fall into an unbounded string copy. Moreover, special care is needed to handle many limit cases. This is a “size checked” C language implementation to insert one string into another:

#include <string.h> /* strlen(), memmove() */

char* ztrins( size_t size, char * dest, size_t n, const char * insert ) {
    if ( dest==NULL || size==0 ) { return dest; }
    else if ( size==1 ) { *dest=0; return dest; }
    else { /* ( size>=2 ) */
        size_t inslen, destlen = strlen( dest );
        --size; /* max length for 'dest' */
        if ( destlen>size ) { destlen = size; }
        if ( n>size || n>destlen || insert[0]==0 ) {
            dest[size] = 0; return dest;
        }
        inslen = strlen( insert );
        if ( size <= n+inslen ) { /* the whole 'insert' does not fit */
            memmove( &dest[n] , insert, (size-n) );
        }
        else { /* first move tail to the right */
            if ( size <= destlen+inslen ) { /* only a piece fits */
                memmove( &dest[n+inslen], &dest[n], (size-(n+inslen)) );
            }
            else { /* insert the whole thing */
                memmove( &dest[n+inslen], &dest[n], (destlen-n) );
                size =  destlen+inslen;
            }
            memmove( &dest[n] , insert , inslen ); /* insert */
        }
        dest[size] = 0;
    }
    return dest;
}


Mandatory string size compliance

The implementation of all the functions in ztring.c will force the string length to fit within its size. This means that strings that are too long will have their length adjusted according to the 'size' parameter received by the routine. The following code illustrates this:

char dest[15];
{
    ztrcpy( sizeof(dest),dest, "012345" );
    assertTrue(   eqstr( dest, "012345" ) );
    ztrins( /*size->*/ 1,dest,  0, "abc" );
    assertTrue( eqstr( dest, "" ) && eqstr( 1+dest, "12345" ) );
}
{
    ztrcpy( sizeof(dest),dest, "012345" );
    assertTrue(   eqstr( dest, "012345" ) );
    ztrins( /*size->*/ 3,dest,  0,"abc" );
    assertTrue( eqstr( dest, "ab" ) && eqstr( 3+dest,"345" ) );
}

In the first block of code, the 'size' parameter used to invoke ztrins() is '1' (one), which leaves no space to store any letters within the string. At run time, there is no way to figure out that the size for 'dest' is bigger than one, but nonetheless ztrins() zero terminates 'dest' to force it to be a string that can fit in a character array of 'size' characters.

In the second block, the value stored in 'dest' is a string that has more than '3' characters. When ztrins() is invoked with a 'size' parameter of '3', it puts the end of string marker at dest[2], which makes the value stored within 'dest' at most two characters long ("ab"). In this case, after invoking ztrins() the value stored in 'dest' is bounded by the 'size' parameter used in the invocation. This behavior can help to fix errors in some implementations.


Specification details

Both the Apache Portable Runtime (APR) [4] and the GNOME Library [5] provide functions similar to ztrcpy() (but none like ztrins()). The differences are subtle and deserve discussion. Both apr_cpystrn() and g_strlcpy() receive the 'size' parameter as their last argument, whereas ztrcpy() has it as its first parameter. Function g_strlcpy() is a portability wrapper used to call strlcpy(), and it always zero terminates the destination string. Function apr_cpystrn() returns a pointer to the end of string 'NUL' character, which can be used to check whether the copied string was truncated because it did not fit in the destination. To accomplish the same result with ztrcpy(), code similar to the following should be used:

{
    /* take_action() stands for whatever the program does on truncation */

    /* detect truncation using 2 invocations to strlen() */
    ztrcpy( sizeof(dest),dest, src );
    if ( strlen(dest) < strlen(src) ) {
        take_action( "truncation occurred" );
    }

    /*  apr_cpystrn() is faster because it requires only one invocation */
    /*  to strlen(), which always examines all characters in the string */
    if ( (size_t)(apr_cpystrn(dest,src,sizeof(dest)) - dest) < strlen(src) ) {
        take_action( "truncation occurred" );
    }

    /* g_strlcpy() returns the length of the source string */
    if ( g_strlcpy(dest,src,sizeof(dest)) >= sizeof(dest) ) {
        take_action( "truncation occurred" );
    }
}

Function g_strlcpy() does not return a pointer, but a number that can be used to detect truncation. It is hard to say which approach is best, but I decided to make ztrcpy() as similar as possible to strcpy() to help programmers replace strcpy() with ztrcpy(). The approach taken by g_strlcpy() seems less convoluted than that of apr_cpystrn(); further discussion can be found in [6].
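A variation on the first ztrcpy() test above avoids the second full scan of 'src': since ztrcpy() stores a prefix of 'src', truncation happened exactly when 'src' still has a character at position strlen(dest). The sketch assumes that contract, a 'size' of at least 1, and no overlap between 'src' and 'dest'. ztrcpy_truncated() is a hypothetical helper, and the ztrcpy() body is only a compact sketch written with a bounded loop, so neither function reads more than 'size' characters of 'src':

```c
#include <stddef.h>   /* size_t, NULL */
#include <string.h>   /* strlen() */

/* Compact sketch of ztrcpy(): copies at most size-1 characters and
   stops reading 'src' as soon as the destination is full. */
char *ztrcpy(size_t size, char *dest, const char *src) {
    size_t i = 0;
    if (dest == NULL || size == 0) { return dest; }
    while (i + 1 < size && src[i] != '\0') { dest[i] = src[i]; ++i; }
    dest[i] = '\0';
    return dest;
}

/* Hypothetical helper: call it right after ztrcpy(size, dest, src).
   strlen(dest) is bounded by size-1, so 'src' is never scanned to its end. */
int ztrcpy_truncated(const char *dest, const char *src) {
    return src[strlen(dest)] != '\0';
}
```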


Other useful functions

There are other functions that might be useful at times. The three trimming functions [ strltrim(), strrtrim(), strtrim() ] can help in removing leading or trailing characters. For simplicity, they take a single character as parameter, because trimming is usually done over blanks. Instead of shifting the string value to the left, strltrim() (trim left) returns a pointer just past all the trimmed characters; this pointer can be used to access the rest of the string. If the desired behavior is to move the remainder of the string to the left, a simple memmove() invocation can be used:

    {   /* the byte count must come from the trimmed pointer, not from 'str' */
        char *rest = strltrim(str,' ');
        memmove( str, rest, 1+strlen(rest) );
    }
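A sketch of both steps follows, assuming strltrim() returns a pointer to the first character that differs from 'tr' and leaves the string unchanged; strltrim_inplace() is a hypothetical name for the shift-left helper. The number of bytes handed to memmove() is taken from the trimmed pointer, because only the remainder (plus its '\0') has to move:

```c
#include <string.h>   /* strlen(), memmove() */

/* Sketch (assumed semantics): skip leading 'tr' characters without
   modifying 'str'. */
char *strltrim(const char *str, char tr) {
    if (tr == '\0') { return (char *)str; }   /* never skip past the terminator */
    while (*str == tr) { ++str; }
    return (char *)str;
}

/* Hypothetical helper: move the trimmed remainder to the start of 'str'. */
char *strltrim_inplace(char *str, char tr) {
    char *rest = strltrim(str, tr);
    memmove(str, rest, strlen(rest) + 1);     /* +1 copies the '\0' too */
    return str;
}
```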

There are also two functions to determine whether a string is the prefix or suffix of another. A little more interesting is function memczap(), which scans a block of memory and removes every occurrence of a character, moving the other characters to the left. For example, if a string contains "(*:**-*)", after removing the asterisk '*' the value stored in the string will be "(:-)". This is cute.
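The following sketches show one way to write these three functions under stated assumptions, and are not the code in ztring.c: memczap() packs the kept bytes to the left and returns how many bytes it kept (the text does not say what the real return value means), while strpfx() and strsffx() return nonzero when the second string is a prefix or a suffix of the first:

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* strlen(), strcmp(), strncmp() */

/* Sketch: remove every byte equal to 'ch' from the first 'size' bytes of
   'mem'. Returns the number of bytes kept; later bytes are left as-is. */
size_t memczap(size_t size, void *mem, int ch) {
    unsigned char *p = (unsigned char *)mem;
    unsigned char c = (unsigned char)ch;      /* compared as unsigned char, like memchr() */
    size_t from, to = 0;
    for (from = 0; from < size; ++from) {
        if (p[from] != c) { p[to++] = p[from]; }
    }
    return to;
}

/* Sketch: nonzero when 'prefix' is a prefix of 'str'. */
int strpfx(const char *str, const char *prefix) {
    return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Sketch: nonzero when 'suffix' is a suffix of 'str'. */
int strsffx(const char *str, const char *suffix) {
    size_t n = strlen(str), m = strlen(suffix);
    return m <= n && strcmp(str + (n - m), suffix) == 0;
}
```

Passing sizeof(face) for char face[] = "(*:**-*)" lets the terminating '\0' move left together with the other kept bytes; for the same reason, 'ch' should not be '\0' when the block holds a string.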


Testing

There are programmers who do not have problems using pointer arithmetic to manipulate strings (I am not one of them). I wrote test_ztring.c, a simple unit test program for these functions, but after getting all of them to do what I expected, I still was not sure whether my implementation was free from memory overrun errors. I looked around the net for tools to help make sure that my code did not have any unbounded string copies, but in the end I decided to write my own bounds checker as the C++ template class zchz<>. I tweaked my code to use only simple string declarations that can easily be transformed to use my template class. After that, I used Tormod Tjaberg's program GSAR [7] to transform each declaration, following a pattern like this:

C   → char dest[15]    char s15[15]    sizeof(dest)    sizeof(s15)
C++ → zchz<15> dest    zchz<15> s15    dest.strsz()    s15.strsz()

I named my class zchz<> to preserve the same spacing from program test_ztring.c into test_ztring.cpp (the C++ bounds checking version of the program). As sizeof cannot be overloaded in C++ [8], I included method zchz<>::strsz() to get the value that sizeof() would have returned if it could be overloaded (again, I named this method strsz(), for “STRing SiZe”, to preserve spacing within the test program source code).

Any zchz<> variable contains three memory blocks that can hold no more than '200' characters (this value is hard coded, but it can be changed if bigger strings are required for testing). The middle one is used to store a string value, and the other two are used to hold a bit pattern. Whenever an unbounded memory copy occurs, either the left or the right block gets corrupted: this event can be discovered the next time that any method of zchz<> is used. When running the test program step by step with the symbolic debugger, it is easy to pinpoint where each failure occurs. It would be more useful if zchz<> reported the exact location of the failure, but usually a test program displays output only on failure and, after all test failures get fixed, the test program no longer displays failure messages produced by methods of class zchz<>. If no failure messages get displayed, all test cases were successful.
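zchz<> itself is a C++ template, but the same guard-zone idea can be sketched in C. In this hypothetical guarded_buf, one array holds the left guard, the usable bytes and the right guard, so a small overrun lands in the right guard (still inside the array, hence no undefined behavior) and gb_intact() reports it. The capacity of 200 mirrors the text; the 16-byte guard width and the 0xA5 pattern are arbitrary choices for the sketch:

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* memset() */

enum { GB_CAP = 200, GB_GUARD = 16 };   /* hypothetical sizes */
#define GB_PATTERN 0xA5                 /* hypothetical fill byte */

/* Layout of raw[]: [left guard][size usable bytes][right guard][unused] */
typedef struct {
    size_t size;                                       /* usable bytes, <= GB_CAP */
    unsigned char raw[GB_GUARD + GB_CAP + GB_GUARD];
} guarded_buf;

void gb_init(guarded_buf *g, size_t size) {
    if (size > GB_CAP) { size = GB_CAP; }
    g->size = size;
    memset(g->raw, GB_PATTERN, sizeof g->raw);         /* paint both guards */
    memset(g->raw + GB_GUARD, 0, size);                /* start as an empty string */
}

/* Pointer and size handed to the code under test,
   e.g. ztrcpy(gb_size(&g), gb_str(&g), src). */
char *gb_str(guarded_buf *g) { return (char *)(g->raw + GB_GUARD); }
size_t gb_size(const guarded_buf *g) { return g->size; }

/* Returns 1 while both guard zones still hold the pattern. */
int gb_intact(const guarded_buf *g) {
    size_t i;
    for (i = 0; i < GB_GUARD; ++i) {
        if (g->raw[i] != GB_PATTERN) { return 0; }                      /* left guard */
        if (g->raw[GB_GUARD + g->size + i] != GB_PATTERN) { return 0; } /* right guard */
    }
    return 1;
}
```

As with zchz<>, the check only tells that an overrun happened at some point before gb_intact() was called, not where.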

Unit tests are designed to exercise every feature in a program, but they can never be exhaustive. Hence, when a test program finds no failure, it does not mean that the program is correct: it can still have bugs that were not uncovered by the test data. As we cannot work forever producing test cases, most of us stop testing when a reasonable number of test cases show no failure.


Conclusions

It is not very difficult to improve on some of the functions that cause many problems for C programmers. Moreover, the functions presented are very similar to their less safe versions, which might convince some hard-core programmers to take a look at them (they might even get enough recognition to become part of the standard language). If programmers decide to use other functions that are already included in other libraries, maybe they can take a look at function ztrins(), which is seldom implemented elsewhere. The source code is available here:

http://www.di-mare.com/adolfo/p/ztring/ztring.zip


Acknowledgments

      Alejandro Di Mare and David Chaves made valuable suggestions that helped improve earlier versions of this work. The spelling and grammar were checked using the http://spellcheckplus.com/ tool. The Graduate Program in Computación e Informática, the Escuela de Ciencias de la Computación e Informática and the Universidad de Costa Rica provided funding for this research.




Source code

ztring.zip source code:
http://www.di-mare.com/adolfo/p/ztring/ztring.zip
ztring.c: A few important extensions for <string.h>
http://www.di-mare.com/adolfo/p/ztring/ztring_8c.html
uUnit.h: assertTrue() && assertFalse()
http://www.di-mare.com/adolfo/p/ztring/uUnit_8h.html

Doxygen:
ftp://ftp.stack.nl/pub/users/dimitri/doxygen-1.8.6-setup.exe

References

[1] Seacord, Robert: Managed String Library for C, Dr. Dobb's: The World of Software Development, October 01, 2005.
      http://www.drdobbs.com/cpp/184402023
      http://www.drdobbs.com/article/print?articleId=184402023
[2] Seacord, Robert: Secure Coding in C and C++: Strings, Addison-Wesley Professional, SEI Series in Software Engineering, September 9, 2005. Chapter available in:
      http://www.informit.com/articles/article.aspx?p=430402&seqNum=2
[3] Stroustrup, Bjarne: So, what's wrong with using macros?, in Bjarne Stroustrup's C++ Style and Technique FAQ, 2012.
      http://www.stroustrup.com/bs_faq2.html#macro
[4] The Apache Software Foundation: Apache Portable Runtime, 2014.
      http://apr.apache.org/
[5] GNOME Developer: GNOME Library, 2014.
      http://developer.gnome.org/
      http://ftp.gnome.org/pub/gnome/sources/glib/
[6] Miller, Todd C. & de Raadt, Theo: strlcpy and strlcat - consistent, safe, string copy and concatenation, in 1999 USENIX Annual Technical Conference. Monterey, California, USA, June 6–11, 1999.
      http://static.usenix.org/event/usenix99/full_papers/millert/millert.pdf
      http://www.courtesan.com/todd/papers/strlcpy.html
[7] Tjaberg, Tormod: gsar121.zip: General Search And Replace on files, 2008.
      http://home.online.no/~tjaberg/
      http://home.online.no/~tjaberg/gsar121.zip
      http://gnuwin32.sourceforge.net/packages/gsar.htm
[8] Stroustrup, Bjarne: Why can't I overload dot, ::, sizeof, etc.?, in Bjarne Stroustrup's C++ Style and Technique FAQ, 2012.
      http://www.stroustrup.com/bs_faq2.html#overload-dot

Index

[-] Abstract
[1] Motivation
[2] Functionality
[3] Implementation
[4] Mandatory string size compliance
[5] Specification details
[6] Other useful functions
[7] Testing
[8] Conclusions
[9] Acknowledgments
[10] Source code

References
About the author
About this document

About the author

Adolfo Di Mare: Costa Rican researcher at the Escuela de Ciencias de la Computación e Informática [ECCI], Universidad de Costa Rica [UCR], where he is a full professor (Profesor Catedrático) and works on Internet and programming technologies. He is also a full professor (Catedrático) at the Universidad Autónoma de Centro América [UACA]. He obtained the Licenciatura at UCR, the Master of Science in Computer Science from the University of California, Los Angeles [UCLA], and the Ph.D. at the Universidad Autónoma de Centro América.
Adolfo Di Mare <adolfo@di-mare.com>

About this document

Reference: Di Mare, Adolfo: ztrcpy() and ztrins(): A few important extensions for <string.h>, Technical Report 2014-01-ADH, Escuela de Ciencias de la Computación e Informática, Universidad de Costa Rica, 2014.
Internet: http://www.di-mare.com/adolfo/p/ztring.htm
http://www.di-mare.com/adolfo/p/ztring.pdf
http://www.di-mare.com/adolfo/p/ztring/ztring.zip
See Also: http://www.drdobbs.com/cpp/232700238
http://www.drdobbs.com/article/print?articleId=232700238
Author: Adolfo Di Mare <adolfo@di-mare.com>
Contact: Apdo 4249-1000, San José, Costa Rica
Tel: (506) 2511-8000       Fax: (506) 2438-0139
Revision: ECCI-UCR, March 2014

Copyright © 2014 Adolfo Di Mare <adolfo@di-mare.com>. All rights reserved.