How Do I Copy Thee? Let Me Count the Ways
Over the years I've interviewed a lot of
programmers and one of the questions I've always
asked is for the candidate to implement the
strcpy function. This article
discusses why I asked this question, walks through
the range of "solutions" I received, and takes a
look back at what I learned about programmers from
this one simple question.
Why strcpy?
First I should start off with a bit of a disclaimer: this isn't the only question I ask as part of an interview process. On the contrary, I put candidates through a phone screen, have them take a written quiz (of which this is one of about 40 questions), and follow all that up with a take-home programming problem. No, the strcpy question is just one data point I use when evaluating programmers ... but it is probably one of the best.
To review, strcpy is the function
for copying one string to another that is part of
the standard library of the C programming language.
Sounds simple, but asking someone to code it up is
a bit of a loaded question, and that's what makes
it so great for interviews. To implement the
function, you have to know what it does and how
strings are represented in C, which gets to one of
the important design choices of C's creators: there
is no native string type in C. Instead, a string is
represented as a sequence of bytes (char values) in
memory terminated with a byte containing zero. A
pointer to the first byte in the sequence
represents the string, but its really more of a
convention then a first-class data type (although
the compiler helps you a little by automatically
appending a zero to string constants in
memory).
Then there is the design of strcpy
itself. It was designed to have the feel of an
assignment statement for strings so where you might
say
int a, b; ... a = b;
to copy the value from b into
a, you would likewise use
char *a, *b; ... strcpy(a, b);
to copy the string value from b to
a, that is, the destination is the
first parameter (left-hand side) and the source is
the second parameter (right-hand side), just like
an assignment.
Another feature of C is that assignment statements themselves have a well-defined value and this is often used in shortcuts like
int val; if ((val = getval()) != -1) { // do something with val }
where we can call a function that returns a
value, assign it to a variable and test its value
in one expression. A (little-known) feature of
strcpy is that it returns the value of
the destination pointer as a convenience to the
caller so you can copy a string, and use the value
in one expression as in
char *a, *b; ... printf("%s\n", strcpy(a, b));
Admittedly, not a feature I've used much, but experienced C programmers are usually aware its there.
Another interesting thing about implementing
strcpy is that the solution is
introduced, discussed and refined in the C "bible":
The C Programming Language by Brian
Kernighan and Dennis Ritchie, more commonly
referred to as the "K&R book." Anyone who
really wanted to learn C would have at least heard
of this book and hopefully been curious enough to
read it. If they had (and they paid attention),
then they would already know the answer. Experience
has shown me, though, that it doesn't appear to be
as popular as I would have thought - at least not
with the candidates that have crossed my path.
Phrasing the Question
The strcpy question was one of a
number of problems I put on a written quiz for
interviewing programmers. I liked having candidates
write it on paper because it avoided the problem of
working on a computer where different programmers
use different tools - everyone should be equally
familiar with paper and pencil. It also avoided the
distractions of a development environment where the
person might be messing around with build settings
and key bindings rather than focusing on the
problem at hand.
As for how to phrase the question, initially I was wary of giving away too much with the signature of the function, and part of the reason for asking this specific question was to see if users were familiar with such a ubiquitous function, so my first version was simply
Implement the C standard library function: strcpy
I figured most people would know what it was and what it did so there wasn't really a need to be more explicit. And if they didn't know the function, then that told me a lot - maybe more than whether or not they could code it. You have to understand, the interviewee had already done well enough on a phone screen to be invited in for a round of interviews. I figured wrong. Asking the question in this way led to very few correct answers. Some people had not heard of the function, many people who knew what it did got the source and destination arguments mixed up and practically no one got the return value right. So for the second version I gave up on people knowing the correct signature from memory and rephrased the question to be more explicit (and more in line with ANSI C):
Implement the following C library function:
char* strcpy(char* s1, const char* s2) { ... }
Note that I didn't give away the source and
destination arguments by name but I thought most
people would see the const modifier
hint. I also expected the return type of the
function to jog a few people's memories. It turns
out that the const hint was not strong
enough, however, as many people continued to
confuse the source and destination arguments. A few
people understood that they should be returning
something, but even with the prototype, very few
people actually got that part right.
I tweaked the question again to arrive at the third version. I simply gave up on the return type in order to simplify the problem so they could focus on the basic algorithm. I also named the arguments hoping to clear up the confusion:
Implement the following C library function:
void strcpy(char* dst, const char* src) { ... }
So now I was leading them to the water, and was
expecting many correct, or nearly correct
solutions. Did I ever say I was a bit of an
optimist? How foolish I was to think my new version
of the question was clear enough. Granted, most
people started off on the right track copying chars
from the const src pointer to the
dst pointer. (No one ever asked why
its return type was not char * so I
felt validated on that omission.) But wait, you can
do it without calling strlen - think
about it! And why are you calling
malloc? Or memcpy? Okay,
one more refinement and I think I've got it:
Implement the following C library function, without calling any other routines:
void strcpy(char* dst, const char* src) { ... }
Whew!
Evaluating the Answer
There were five key things I looked for when evaluating an answer.
- Did they know what the function was supposed to do? That is, was their algorithm an attempt at an actual copy of a string?
- Was the algorithm correct? Did they copy all the characters? Did they copy the terminating null? (Later, when we went through the quiz, I'd ask the candidate to walk through their code for the string "Hi" to test/break it.)
- Did they follow my instructions - no system calls or other routines?
- Did they get the
srcanddstarguments right? - Did they use arrays or pointers? This was more to see how comfortable they were with pointers.
I also considered a few other points for "extra credit": Did they know about the return value? Did they write defensively (check for null arguments)? As you can see, there is a lot to be learned from such a basic question. I also believe that people are creatures of habit: they tend to do things the same way in this setting as they might on the job. If they were sloppy and careless on the interview ... well caveat emptor.
The other thing I was always thinking about was the context they were in. Sure, interviewing can be stressful, and there were many other problems on the quiz so they couldn't spend all their time on this one, but this was an interview for a job! I expected people to be going a little over the top in order to give a good impression, so details were important. And while I wouldn't say that any one thing on an interview should be a sole determination if someone is hired or not, it was hard for me to get excited about hiring someone that couldn't get this right.
The Solution
Although there are many ways to solve this problem, we can look to the K&R book for the classic solution.
void strcpy(char* dst, const char* src) { while (*dst++ = *src++) ; }
Solution 1: The "classic" K&R answer
Its simple, concise and correct. We can quibble
about where to put the semicolon or if we should
use braces, but that is all just style - the
statement that does all the work is in the test of
the while. This solution uses a number
of features of the C language to get a lot done in
essentially one line of code:
- A char is copied from one pointer to another using the dereferencing ("*") operator
- Post-incrementing is used to march the pointers forward in memory after the char is copied
- It takes advantage of the fact that the result of an assignment is the value of the statement of itself
- The terminating zero of a C-string is used as a boolean test for the while loop
That's a lot packed into one simple expression! These are all features of C that its creators designed into the language for a purpose. This expression, and slight variations of it, are so common that they are idioms in C programming and any C programmer will likely come across code with statements like these time after time. They better fully understand what's going on here if they are going to master the language (or at least to read and understand other people's code).
I mentioned earlier that the "real" version of
strcpy returns a copy of the
destination pointer as a convenience. Thus, a more
correct version of the function would be
char* strcpy(char* dst, const char* src) { char* ret = dst; while (*dst++ = *src++) ; return ret; }
Solution 2: Handle the return value correctly
That is, we save a copy of the destination
pointer and return it after we've copied the
string. While that is more correct, it doesn't
demonstrate any real coding skills - its really
more of a trivia question about
strcpy. That's why I eventually
rephrased my interview question to use a
void return type (and of course
because no one was getting that detail correct,
anyway).
There are a couple of common variations on this
solution: using a different looping construct, and
using arrays rather than pointers. The following
solution shows one variant using a for
loop.
char* strcpy(char* dst, const char* src) { int i; for(i = 0; src[i] != 0; i++) { dst[i] = src[i]; } dst[i] = 0; return dst; }
Solution 3: Using array notation
Since we never modify dst - we use
i as an index to offset from the
starting address - there is no reason to make a
temporary copy for the return value. But since we
aren't copying and testing for null in the same
statement here, the terminating zero is not copied
in the loop. Thus we need to terminate the
destination string outside of the loop.
These are all correct solutions in that they copy the string including the terminating null. And although I personally prefer a version using pointers as it demonstrates a degree of understanding and confidence in using them, the array version is perfectly fine.
There are, of course, dozens of other variations that could be written. We could use a for with pointers, use a while loop with arrays, use a do loop - we could even use recursion if we wanted to be silly (and inefficient). But most often I've found good solutions will look a lot like one of the snippets above.
Answers
I've taken a not-so-random sample of some past solutions to show some of the classic mistakes people make. For dramatic effect, I'll start with the good ones and work my way down through the common mistakes people make, all the way to the really awful answers that I'm not even sure what they were trying to do. Remember, these are actual answers from actual programmers. Additionally, these were people who were often gainfully employed at the time of the interview, had been recommended by recruiters or other people, and had already been screened through a phone interview. All of these snippets are exactly as answered including syntax errors, comments, creative operators, etc.
void strcpy(char* dst, const char* src) { for (; dst != Null && src != Null;) *dst++ = *src++; }
Answer 1
So close! If they would have just replaced the test clause of the for loop with its body, they would have nailed it. But they got confused.
- They assumed "Null" is some pre-defined constant for zero
- They compared addresses and not the value pointed to by the address so this would not terminate
- Even if they got the loop test correct, they wouldn't have copied the terminating null character
But all in all not a bad answer and the use of pointers for copy-and-increment is promising. You can tell that they've either read up on C or used it in the past, but probably aren't using it day to day.
void strcpy(char* dst, const char* src) { if (src == NULL) return; if (dst == NULL) return; while (src[i++] != '\0') { dst[i] = src[i]; } dst[i] = '\0'; }
Answer 2
Hey, defensive programming! I like that, and I
see him copying the null at the end, could we have
a winner? Oh, I'm sorry, you got the last char but
forgot the first one (and you forgot to define
i).
void strcpy(char* dst, const char* src) { while(*src) *dst++ = *src++; }
Answer 3
Another close one! They seem comfortable with pointers and their use in the copy-and-increment idiom. But the while test is going to kick you out of the loop before you copy the terminating 0 (which you would have noticed if you tested it). Sorry.
void strcpy(char* dst, const char* src) { *dst = *src; }
Answer 4
Well, it does work for the special case of
strcpy(buf, ""), and I have to admit,
it is fast, but I was looking for something a
little more ... correct.
void strcpy(char* dst, const char* src) { int counter = 0; while (*(src + counter) <> '/0') { *(dst + counter) = *(src + counter); counter++; } }
Answer 5
Well the syntax for "not equals" smells a little like VB and there seems to be confusion on the syntax for an escape character. Overlooking those points, there is still the problem with the terminating null, and while the pointer expressions are valid, I find them less clear than the examples that increment the pointers.
At least in answers 1-5 you can see some
familiarity with C and the use of pointers, plus
its clear that people knew what strcpy
did, if not exactly how it did it. Now let's move
on to some more "creative" answers.
void strcpy(char* dst, const char* src) { char i = null; do while i <> '\n' { i = &src; &dst = i; dst++; src++; } }
Answer 6
There are a number of issues with this answer:
- Confusion between the dereferencing and
address operators (
*/&) - Wrong syntax for a
do/whileloop - Wrong syntax for "not equals"
- They are looking for a terminating end of line character, not a null
- It doesn't copy the terminating null (or
\nfor that matter)
void strcpy(char* dst, const char* src) { int idst, isrc; idst = 0; isrc = 0; while (src[isrc] != "\0") dst[idst++] = src[isrc++]; }
Answer 7
This time they confused the syntax of a char
constant with a string constant. This is actually a
very nasty bug (if the compiler didn't warn on type
mismatches) as "0" evaluates to an address that
stores a string constant so the while
would never terminate. Oh, and they also forgot the
null (are you beginning to see a theme?).
void strcpy(char* dst, const char* src) { int ilen = 0; char *trav, *trav2; for(trav = src; *trav++; ilen++); free dst; dst = malloc(ilen + 1); trav2 = dst; for (trav = src; *trav2 = *trav++; trav2++); }
Answer 8
Where do I start?
- The first
foris counting the length ofsrc(we'll see why in a couple of lines) - Then they try to free
dst(with a syntax error on thefreecall). That's going to surprise the caller! - Now they get (leak) a new chunk of memory that the caller will never know about
- The second
forthen actually copies the string correctly! Yeah!
What's funny is if they would have deleted the
first three lines (the for, free and
malloc), this would be correct
(although a bit overly confusing with inconsistent
initialization and incrementing statements).
Because those three lines are there, however, it
copies the string to our newly allocated chunk of
memory that no one will know about as soon as the
function exits. But I'm being pedantic - it may
never get that far since the free call
would likely cause a crash. There's also the issue
of not following instructions with the use of
malloc (at least they didn't call
strlen, though).
The use of malloc and
free is particularly troubling. This
demonstrates that the candidate misunderstands a
couple of the basic design decisions that went into
the C language: that it has no built-in memory
operators (i.e. no new or delete), and that the
developer is completely responsible for memory
management. Allocating memory inside a function
that advertises only to copy a string would violate
this rule. More importantly, strcpy requires the
user pass in the destination address, thus it
should be clear that the user is responsible for
making sure that address points to a valid
destination with enough space to hold the
string.
Note: The strdup function, on the
other hand, does violate this rule but clearly
states this in its documentation. You can also
infer this from its design. It takes only one
argument - the source string - and returns the
address of a new duplicate of the string. Since the
user cannot pass the destination address, the
function must allocate it for them - and return it.
It is essentially a memory allocator that fills the
new chunk of memory for the user, and I think would
have been more appropriately called salloc and put
in the malloc.h include file rather
than string.h. Although
strdup may be convenient in certain
cases, I still prefer not to use it. I prefer
controlling how the memory for the copy is obtained
- I might have a static char array that I reuse, or
an array on the stack. I think it is much more
clear to take responsibility for obtaining the
block of memory and calling strcpy
with a pointer to it, rather than calling
strdup and having to remember to match
up a free with each call to it - its
difficult enough matching malloc and
free calls.
void strcpy(char* dst, const char* src) { int size = strlen(src); malloc(dst, size) // syntax? for (int i = 0; i < size; i++) { dst[i] = src[i]; } return dst; }
Answer 9
I'm disappointed from the first line. They
didn't follow instructions and made a call to
strlen. And things only get worse with
the call to malloc. But they returned
dst! Nice! (Except of course its
pointing to a non-terminated string and a different
address then was passed in - assuming they would
have eventually fixed the malloc call
to: dst = malloc(size);.
void strcpy(char* dst, const char* src) { free(dst); dst = malloc(sizeof(src)); if (dst) { } else { // error; } }
Answer 10
Let's see - system calls, freeing
dst, and hopefully the string is 3
bytes (4 with the terminating zero) since they are
allocating a chunk of memory for the size of an
address, and not the length of the string it points
to. But then the solution fizzles out. Come on, you
weren't shy about using system calls. Why not round
it out with a call to memcpy to at
least finish it!
Looking Back
There are a number of things I learned about programmers from this one simple question. You can argue whether the question is a good one, whether my criteria for evaluating answers was fair, and whether this question is even relevent anymore. But one thing that you can't argue with is the value of asking people to demonstrate their knowledge by writing code, plus the comparitive value in asking the same question over and over. You learn a lot about people by asking them to solve something real that they should be familiar with and should understand.
First of all you learn how wildly one programmer's knowledge about a language can vary from another's. When someone says they "know" a language, what does that really mean? Does that mean they know the syntax? That they are comfortable with the standard library? That they understand the internal representation of data structures? That they are comfortable with pointers?
You also find out if people are curious or not. When they were learning C, did they find out about the K&R book and read it on their own? Sometimes if a candidate does well I'll have them come back for a second round of interviews and I might ask them, how would you do it now? I love it when they ran home, read up on C and learned how to do it better. Or they proactively just emailed me an improved version because they couldn't let go of it.
What about their style? Is the code legible?
Organized? Were they careful? Did they bother to
test with the most trivial example to see if it
works? Granted, strcpy is too small to
answer some of these questions but it wasn't the
only programming problem they were given. The point
is that looking at someone's code tells a lot about
them - it is what you are hiring them to produce
and you want to know how they go about creating it.
Would you hire a photographer without looking at
their portfolio?
Some people may find it disagreeable that I give someone a written quiz as it creates a stressful situation. I agree. I've always worked in a high pressure environment and want to make sure the candidates can handle that pressure (most of my career I worked on trading desks of Wall Street firms).
I've done enough of this to know that this isn't a statistical anomoly. The truth is that there are a lot of bad programmers out there. That's not to say that this simple question is the one true determinator of a good or bad programmer - its just one of many data points I've used in interviewing programmers. But I'd certainly make the argument that its positively correlated with a person's skills and knowledge and all other factors being equal, I'd much prefer hiring someone who nailed it then someone who took twenty lines to get it wrong.