Saturday 19 December 2009

Minimal surprise in Java and Python

I've blogged previously about the idea that a programming language will be easier to learn if it causes minimal surprise to the learner. Let's consider a couple of examples here.

Integer Division


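In common with other popular languages like C, C++, Java, C# and Ruby, Python 2 performs integer division when both operands are integers, discarding the fractional part of the result - so evaluating 1/2 will yield 0, for example. (Strictly speaking, only Python and Ruby 'floor' the result by rounding toward negative infinity; the C family truncates toward zero. The two approaches agree whenever both operands are positive.) This is an unwelcome surprise for most programming novices, who naturally expect a language to implement the normal rules of arithmetic. In our experience of teaching Python 2 (and before that, Java and C++), most students make mistakes with integer division more than once before they learn to cope with this counterintuitive behaviour.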

Python 3 changes the / operator so that dividing one int by another returns a float value. The // operator can be used to obtain the old behaviour:
>>> 1 / 2
0.5
>>> 1 // 2
0
This change was first proposed in PEP 238 back in 2001, and it has been the source of much anguish ever since for some programmers whose expectations have been shaped by exposure to the C family of languages (as can be seen in comp.lang.python newsgroup discussion on the topic). Nevertheless, it removes what was clearly a significant hurdle for those new to programming.
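To make the difference between the two operators concrete, here is a small Python 3 sketch. The variable names are chosen purely for illustration. It also shows a negative operand, where floor division and truncation toward zero disagree:

```python
# Python 3: / is true division, // is floor division.
true_quotient = 1 / 2        # 0.5
floor_quotient = 1 // 2      # 0

# Floor division rounds toward negative infinity ...
floored = -7 // 2            # -4
# ... whereas truncating toward zero gives a different answer.
truncated = int(-7 / 2)      # -3

print(true_quotient, floor_quotient, floored, truncated)
```

For positive operands the floored and truncated results are identical, which is why the distinction rarely troubles beginners.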

Console I/O


In another post, I described the asymmetry in console I/O that exists in Python 2 as an example of redundancy, but it can also be viewed as a source of surprise. And there's something else, about console input specifically, that can be surprising to Python newbies.

For a long time, Python has had two built-in functions that read from the console: input and raw_input.

The first of these reads a string of characters and attempts to evaluate it as a Python expression, such that a sequence of digits yields an integer value, characters enclosed in quotes yield a string, etc. Unfortunately, this is less useful than it sounds. Consider the following simple Python 2 program:
# hello.py - a program that greets you

name = input('Who are you? ')
print 'Hello', name
Here are two attempts to run this program:
$ python hello.py
Who are you? nick
Traceback (most recent call last):
  File "hello.py", line 3, in <module>
    name = input('Who are you? ')
  File "<string>", line 1, in <module>
NameError: name 'nick' is not defined

$ python hello.py
Who are you? max
Hello <built-in function max>
In both cases, the student running the program has forgotten that string input should be enclosed in quotes, with the result that Python treats each input as a name to be looked up. The first attempt fails because there is no object named nick. The second attempt runs without error because Python has a built-in function named max, but it greets the function object rather than the user. Both results are surprising, even baffling, to programming novices.
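The behaviour can be reproduced without a console. Python 2's input is documented as equivalent to eval(raw_input(prompt)), so the Python 3 sketch below applies eval directly to the text a student might type. The helper name py2_style_input is hypothetical and exists only for this illustration:

```python
def py2_style_input(text):
    """Hypothetical helper: mimic Python 2's input() by evaluating the typed text."""
    return eval(text)

# Quoted text evaluates to a string, and digits evaluate to an int.
quoted = py2_style_input("'nick'")
number = py2_style_input("42")

# An unquoted name is looked up; 'max' resolves to the built-in function.
builtin = py2_style_input("max")

# A name that does not exist raises NameError.
try:
    py2_style_input("nick")
    error = None
except NameError as exc:
    error = str(exc)

print(repr(quoted), number, builtin, error)
```

Because eval will run any expression it is given, this style of input is also unsafe for anything beyond classroom experiments.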

The less surprising option in Python 2 is to use raw_input instead of input. The raw_input function returns console input as a string object and allows the programmer to decide exactly how this string should be handled. The string can be left alone in cases where text is expected (as in hello.py) or it can be converted explicitly to the required type:
number = float(raw_input('Enter a value: '))
We have seen many cases where students have used input in their Python 2 programs, having failed to recognise that raw_input is a safer, less confusing alternative (in spite of our attempts to explain this most carefully). Python 3 prevents such confusion by providing a single function named input, equivalent to the raw_input function of earlier versions. Evaluation of the input string is still possible, but it must be done explicitly, with code like eval(input()).
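As a rough illustration of explicit conversion in Python 3, the sketch below turns typed text into a float and reports failure instead of raising an exception. The helper name read_number is hypothetical, and returning None on bad input is just one possible design choice. It takes the text as an argument so that it can be tried without a console:

```python
def read_number(text):
    """Hypothetical helper: convert typed text to a float, or return None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        # The text is treated purely as data, so a word like 'max' is
        # never looked up as a name; it simply fails to convert.
        return None

valid = read_number("3.5")
invalid = read_number("max")
print(valid, invalid)
```

In a real program the argument would come from the console, for example read_number(input('Enter a value: ')).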

1 comment:

  1. The input/raw_input difference and the Python 3 fix is a nice example. But in my ideal programming language, 1/2 would be a rational number.
