How Do I Copy Thee? Let Me Count the Ways

Over the years I've interviewed a lot of programmers and one of the questions I've always asked is for the candidate to implement the strcpy function. This article discusses why I asked this question, walks through the range of "solutions" I received, and takes a look back at what I learned about programmers from this one simple question.

Why strcpy?

First I should start off with a bit of a disclaimer: this isn't the only question I ask as part of an interview process. On the contrary, I put candidates through a phone screen, have them take a written quiz (of which this is one of about 40 questions), and follow all that up with a take-home programming problem. No, the strcpy question is just one data point I use when evaluating programmers ... but it is probably one of the best.

To review, strcpy is the function for copying one string to another that is part of the standard library of the C programming language. Sounds simple, but asking someone to code it up is a bit of a loaded question, and that's what makes it so great for interviews. To implement the function, you have to know what it does and how strings are represented in C, which gets to one of the important design choices of C's creators: there is no native string type in C. Instead, a string is represented as a sequence of bytes (char values) in memory terminated with a byte containing zero. A pointer to the first byte in the sequence represents the string, but its really more of a convention then a first-class data type (although the compiler helps you a little by automatically appending a zero to string constants in memory).

Then there is the design of strcpy itself. It was designed to have the feel of an assignment statement for strings so where you might say

int a, b;
...
a = b;

to copy the value from b into a, you would likewise use

char *a, *b;
...
strcpy(a, b);

to copy the string value from b to a, that is, the destination is the first parameter (left-hand side) and the source is the second parameter (right-hand side), just like an assignment.

Another feature of C is that assignment statements themselves have a well-defined value and this is often used in shortcuts like

int val;
if ((val = getval()) != -1) {
    // do something with val
}

where we can call a function that returns a value, assign it to a variable and test its value in one expression. A (little-known) feature of strcpy is that it returns the value of the destination pointer as a convenience to the caller so you can copy a string, and use the value in one expression as in

char *a, *b;
...
printf("%s\n", strcpy(a, b));

Admittedly, not a feature I've used much, but experienced C programmers are usually aware its there.

Another interesting thing about implementing strcpy is that the solution is introduced, discussed and refined in the C "bible": The C Programming Language by Brian Kernighan and Dennis Ritchie, more commonly referred to as the "K&R book." Anyone who really wanted to learn C would have at least heard of this book and hopefully been curious enough to read it. If they had (and they paid attention), then they would already know the answer. Experience has shown me, though, that it doesn't appear to be as popular as I would have thought - at least not with the candidates that have crossed my path.

Phrasing the Question

The strcpy question was one of a number of problems I put on a written quiz for interviewing programmers. I liked having candidates write it on paper because it avoided the problem of working on a computer where different programmers use different tools - everyone should be equally familiar with paper and pencil. It also avoided the distractions of a development environment where the person might be messing around with build settings and key bindings rather than focusing on the problem at hand.

As for how to phrase the question, initially I was wary of giving away too much with the signature of the function, and part of the reason for asking this specific question was to see if users were familiar with such a ubiquitous function, so my first version was simply

Implement the C standard library function: strcpy

I figured most people would know what it was and what it did so there wasn't really a need to be more explicit. And if they didn't know the function, then that told me a lot - maybe more than whether or not they could code it. You have to understand, the interviewee had already done well enough on a phone screen to be invited in for a round of interviews. I figured wrong. Asking the question in this way led to very few correct answers. Some people had not heard of the function, many people who knew what it did got the source and destination arguments mixed up and practically no one got the return value right. So for the second version I gave up on people knowing the correct signature from memory and rephrased the question to be more explicit (and more in line with ANSI C):

Implement the following C library function:

char* strcpy(char* s1, const char* s2) {
    ...
}

Note that I didn't give away the source and destination arguments by name but I thought most people would see the const modifier hint. I also expected the return type of the function to jog a few people's memories. It turns out that the const hint was not strong enough, however, as many people continued to confuse the source and destination arguments. A few people understood that they should be returning something, but even with the prototype, very few people actually got that part right.

I tweaked the question again to arrive at the third version. I simply gave up on the return type in order to simplify the problem so they could focus on the basic algorithm. I also named the arguments hoping to clear up the confusion:

Implement the following C library function:

void strcpy(char* dst, const char* src) {
    ...
}

So now I was leading them to the water, and was expecting many correct, or nearly correct solutions. Did I ever say I was a bit of an optimist? How foolish I was to think my new version of the question was clear enough. Granted, most people started off on the right track copying chars from the const src pointer to the dst pointer. (No one ever asked why its return type was not char * so I felt validated on that omission.) But wait, you can do it without calling strlen - think about it! And why are you calling malloc? Or memcpy? Okay, one more refinement and I think I've got it:

Implement the following C library function, without calling any other routines:

void strcpy(char* dst, const char* src) {
    ...
}

Whew!

Evaluating the Answer

There were five key things I looked for when evaluating an answer.

  1. Did they know what the function was supposed to do? That is, was their algorithm an attempt at an actual copy of a string?
  2. Was the algorithm correct? Did they copy all the characters? Did they copy the terminating null? (Later, when we went through the quiz, I'd ask the candidate to walk through their code for the string "Hi" to test/break it.)
  3. Did they follow my instructions - no system calls or other routines?
  4. Did they get the src and dst arguments right?
  5. Did they use arrays or pointers? This was more to see how comfortable they were with pointers.

I also considered a few other points for "extra credit": Did they know about the return value? Did they write defensively (check for null arguments)? As you can see, there is a lot to be learned from such a basic question. I also believe that people are creatures of habit: they tend to do things the same way in this setting as they might on the job. If they were sloppy and careless on the interview ... well caveat emptor.

The other thing I was always thinking about was the context they were in. Sure, interviewing can be stressful, and there were many other problems on the quiz so they couldn't spend all their time on this one, but this was an interview for a job! I expected people to be going a little over the top in order to give a good impression, so details were important. And while I wouldn't say that any one thing on an interview should be a sole determination if someone is hired or not, it was hard for me to get excited about hiring someone that couldn't get this right.

The Solution

Although there are many ways to solve this problem, we can look to the K&R book for the classic solution.

void strcpy(char* dst, const char* src)
{
    while (*dst++ = *src++)
       ;
}
Solution 1: The "classic" K&R answer

Its simple, concise and correct. We can quibble about where to put the semicolon or if we should use braces, but that is all just style - the statement that does all the work is in the test of the while. This solution uses a number of features of the C language to get a lot done in essentially one line of code:

  • A char is copied from one pointer to another using the dereferencing ("*") operator
  • Post-incrementing is used to march the pointers forward in memory after the char is copied
  • It takes advantage of the fact that the result of an assignment is the value of the statement of itself
  • The terminating zero of a C-string is used as a boolean test for the while loop

That's a lot packed into one simple expression! These are all features of C that its creators designed into the language for a purpose. This expression, and slight variations of it, are so common that they are idioms in C programming and any C programmer will likely come across code with statements like these time after time. They better fully understand what's going on here if they are going to master the language (or at least to read and understand other people's code).

I mentioned earlier that the "real" version of strcpy returns a copy of the destination pointer as a convenience. Thus, a more correct version of the function would be

char* strcpy(char* dst, const char* src)
{
    char* ret = dst;

    while (*dst++ = *src++)
        ;

    return ret;
}
Solution 2: Handle the return value correctly

That is, we save a copy of the destination pointer and return it after we've copied the string. While that is more correct, it doesn't demonstrate any real coding skills - its really more of a trivia question about strcpy. That's why I eventually rephrased my interview question to use a void return type (and of course because no one was getting that detail correct, anyway).

There are a couple of common variations on this solution: using a different looping construct, and using arrays rather than pointers. The following solution shows one variant using a for loop.

char* strcpy(char* dst, const char* src)
{
    int i;
    for(i = 0; src[i] != 0; i++) {
        dst[i] = src[i];
    }
    dst[i] = 0;

    return dst;
}
Solution 3: Using array notation

Since we never modify dst - we use i as an index to offset from the starting address - there is no reason to make a temporary copy for the return value. But since we aren't copying and testing for null in the same statement here, the terminating zero is not copied in the loop. Thus we need to terminate the destination string outside of the loop.

These are all correct solutions in that they copy the string including the terminating null. And although I personally prefer a version using pointers as it demonstrates a degree of understanding and confidence in using them, the array version is perfectly fine.

There are, of course, dozens of other variations that could be written. We could use a for with pointers, use a while loop with arrays, use a do loop - we could even use recursion if we wanted to be silly (and inefficient). But most often I've found good solutions will look a lot like one of the snippets above.

Answers

I've taken a not-so-random sample of some past solutions to show some of the classic mistakes people make. For dramatic effect, I'll start with the good ones and work my way down through the common mistakes people make, all the way to the really awful answers that I'm not even sure what they were trying to do. Remember, these are actual answers from actual programmers. Additionally, these were people who were often gainfully employed at the time of the interview, had been recommended by recruiters or other people, and had already been screened through a phone interview. All of these snippets are exactly as answered including syntax errors, comments, creative operators, etc.

void strcpy(char* dst, const char* src)
{
    for (; dst != Null && src != Null;)
        *dst++ = *src++;
}
Answer 1

So close! If they would have just replaced the test clause of the for loop with its body, they would have nailed it. But they got confused.

  • They assumed "Null" is some pre-defined constant for zero
  • They compared addresses and not the value pointed to by the address so this would not terminate
  • Even if they got the loop test correct, they wouldn't have copied the terminating null character

But all in all not a bad answer and the use of pointers for copy-and-increment is promising. You can tell that they've either read up on C or used it in the past, but probably aren't using it day to day.

void strcpy(char* dst, const char* src)
{
    if (src == NULL) return;
    if (dst == NULL) return;

    while (src[i++] != '\0') {
        dst[i] = src[i];
    }
    dst[i] = '\0';
}
Answer 2

Hey, defensive programming! I like that, and I see him copying the null at the end, could we have a winner? Oh, I'm sorry, you got the last char but forgot the first one (and you forgot to define i).

void strcpy(char* dst, const char* src)
{
    while(*src)
        *dst++ = *src++;
}
Answer 3

Another close one! They seem comfortable with pointers and their use in the copy-and-increment idiom. But the while test is going to kick you out of the loop before you copy the terminating 0 (which you would have noticed if you tested it). Sorry.

void strcpy(char* dst, const char* src)
{
    *dst = *src;
}
Answer 4

Well, it does work for the special case of strcpy(buf, ""), and I have to admit, it is fast, but I was looking for something a little more ... correct.

void strcpy(char* dst, const char* src)
{
    int counter = 0;
    while (*(src + counter) <> '/0') {
        *(dst + counter) = *(src + counter);
        counter++;
    }
}
Answer 5

Well the syntax for "not equals" smells a little like VB and there seems to be confusion on the syntax for an escape character. Overlooking those points, there is still the problem with the terminating null, and while the pointer expressions are valid, I find them less clear than the examples that increment the pointers.

At least in answers 1-5 you can see some familiarity with C and the use of pointers, plus its clear that people knew what strcpy did, if not exactly how it did it. Now let's move on to some more "creative" answers.

void strcpy(char* dst, const char* src)
{
    char i = null;
    do while i <> '\n'
    {
        i = &src;
        &dst = i;

        dst++;
        src++;
    }
}
Answer 6

There are a number of issues with this answer:

  • Confusion between the dereferencing and address operators (*/&)
  • Wrong syntax for a do/while loop
  • Wrong syntax for "not equals"
  • They are looking for a terminating end of line character, not a null
  • It doesn't copy the terminating null (or \n for that matter)
void strcpy(char* dst, const char* src)
{
    int idst, isrc;
    idst = 0; isrc = 0;
    while (src[isrc] != "\0")
        dst[idst++] = src[isrc++];
}
Answer 7

This time they confused the syntax of a char constant with a string constant. This is actually a very nasty bug (if the compiler didn't warn on type mismatches) as "0" evaluates to an address that stores a string constant so the while would never terminate. Oh, and they also forgot the null (are you beginning to see a theme?).

void strcpy(char* dst, const char* src)
{
    int ilen = 0;
    char *trav, *trav2;

    for(trav = src; *trav++; ilen++);
    free dst;
    dst = malloc(ilen + 1);
    trav2 = dst;
    for (trav = src; *trav2 = *trav++; trav2++);
}
Answer 8

Where do I start?

  • The first for is counting the length of src (we'll see why in a couple of lines)
  • Then they try to free dst (with a syntax error on the free call). That's going to surprise the caller!
  • Now they get (leak) a new chunk of memory that the caller will never know about
  • The second for then actually copies the string correctly! Yeah!

What's funny is if they would have deleted the first three lines (the for, free and malloc), this would be correct (although a bit overly confusing with inconsistent initialization and incrementing statements). Because those three lines are there, however, it copies the string to our newly allocated chunk of memory that no one will know about as soon as the function exits. But I'm being pedantic - it may never get that far since the free call would likely cause a crash. There's also the issue of not following instructions with the use of malloc (at least they didn't call strlen, though).

The use of malloc and free is particularly troubling. This demonstrates that the candidate misunderstands a couple of the basic design decisions that went into the C language: that it has no built-in memory operators (i.e. no new or delete), and that the developer is completely responsible for memory management. Allocating memory inside a function that advertises only to copy a string would violate this rule. More importantly, strcpy requires the user pass in the destination address, thus it should be clear that the user is responsible for making sure that address points to a valid destination with enough space to hold the string.

Note: The strdup function, on the other hand, does violate this rule but clearly states this in its documentation. You can also infer this from its design. It takes only one argument - the source string - and returns the address of a new duplicate of the string. Since the user cannot pass the destination address, the function must allocate it for them - and return it. It is essentially a memory allocator that fills the new chunk of memory for the user, and I think would have been more appropriately called salloc and put in the malloc.h include file rather than string.h. Although strdup may be convenient in certain cases, I still prefer not to use it. I prefer controlling how the memory for the copy is obtained - I might have a static char array that I reuse, or an array on the stack. I think it is much more clear to take responsibility for obtaining the block of memory and calling strcpy with a pointer to it, rather than calling strdup and having to remember to match up a free with each call to it - its difficult enough matching malloc and free calls.

void strcpy(char* dst, const char* src)
{
    int size = strlen(src);
    malloc(dst, size)  // syntax?

    for (int i = 0; i < size; i++) {
        dst[i] = src[i];
    }
    return dst;
}
Answer 9

I'm disappointed from the first line. They didn't follow instructions and made a call to strlen. And things only get worse with the call to malloc. But they returned dst! Nice! (Except of course its pointing to a non-terminated string and a different address then was passed in - assuming they would have eventually fixed the malloc call to: dst = malloc(size);.

void strcpy(char* dst, const char* src)
{
    free(dst);
    dst = malloc(sizeof(src));
    if (dst) {

    } else { // error; }
}
Answer 10

Let's see - system calls, freeing dst, and hopefully the string is 3 bytes (4 with the terminating zero) since they are allocating a chunk of memory for the size of an address, and not the length of the string it points to. But then the solution fizzles out. Come on, you weren't shy about using system calls. Why not round it out with a call to memcpy to at least finish it!

Looking Back

There are a number of things I learned about programmers from this one simple question. You can argue whether the question is a good one, whether my criteria for evaluating answers was fair, and whether this question is even relevent anymore. But one thing that you can't argue with is the value of asking people to demonstrate their knowledge by writing code, plus the comparitive value in asking the same question over and over. You learn a lot about people by asking them to solve something real that they should be familiar with and should understand.

First of all you learn how wildly one programmer's knowledge about a language can vary from another's. When someone says they "know" a language, what does that really mean? Does that mean they know the syntax? That they are comfortable with the standard library? That they understand the internal representation of data structures? That they are comfortable with pointers?

You also find out if people are curious or not. When they were learning C, did they find out about the K&R book and read it on their own? Sometimes if a candidate does well I'll have them come back for a second round of interviews and I might ask them, how would you do it now? I love it when they ran home, read up on C and learned how to do it better. Or they proactively just emailed me an improved version because they couldn't let go of it.

What about their style? Is the code legible? Organized? Were they careful? Did they bother to test with the most trivial example to see if it works? Granted, strcpy is too small to answer some of these questions but it wasn't the only programming problem they were given. The point is that looking at someone's code tells a lot about them - it is what you are hiring them to produce and you want to know how they go about creating it. Would you hire a photographer without looking at their portfolio?

Some people may find it disagreeable that I give someone a written quiz as it creates a stressful situation. I agree. I've always worked in a high pressure environment and want to make sure the candidates can handle that pressure (most of my career I worked on trading desks of Wall Street firms).

I've done enough of this to know that this isn't a statistical anomoly. The truth is that there are a lot of bad programmers out there. That's not to say that this simple question is the one true determinator of a good or bad programmer - its just one of many data points I've used in interviewing programmers. But I'd certainly make the argument that its positively correlated with a person's skills and knowledge and all other factors being equal, I'd much prefer hiring someone who nailed it then someone who took twenty lines to get it wrong.

|