Thursday 17 December 2009

Minimal redundancy in Java and Python (2)

Following on from Part 1 of this discussion, let's consider some more examples of minimal redundancy (or the lack thereof) in the Java and Python programming languages.

Integer Representation


The C & C++ languages support eight different primitive integer representations, or ten if you include char and unsigned char. Java has only four primitive integer types (lacking the unsigned types of C & C++) but also has wrapper classes for each of these, plus a BigInteger class for representing arbitrary-precision integers - giving a grand total of nine representations. There is too much choice here for the novice programmer, for whom considerations of optimisation and efficient memory usage are unimportant compared with the business of learning the fundamentals of programming.

Python has historically provided only two representations: a fixed-width int type implemented using the C long (and therefore at least 32 bits wide); and a long type similar to Java's BigInteger. Whilst two is certainly a big improvement on nine, there is still unnecessary redundancy here. The distinction between arbitrary-precision and fixed-length integer representations and the notion that CPU support makes the latter more efficient are worthy of discussion, certainly, but I would argue that such issues fall outside the scope of an introductory programming course.

Python 3 removes the redundancy completely by having a single integer type, complementing its single floating-point type.
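As a small illustration (a sketch assuming a Python 3 interpreter; the variable names are chosen purely for the example), the same int type covers values that would not fit in a 64-bit fixed-width integer:

```python
# Python 3: one integer type, whatever the magnitude of the value.
small = 42
big = 2 ** 64                 # too large for a 64-bit fixed-width integer

print(type(small).__name__)   # int
print(type(big).__name__)     # int
print(big + 1)                # arithmetic simply carries on, with no overflow
```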

Console I/O


It is common in introductory programming courses to use the console for input and output. Simple output has always been straightforward in Java, courtesy of System.out.println, but the language has yet to acquire a correspondingly simple System.in.readln method. Java SE 6 was the first release to improve console I/O significantly, via a Console class providing printf and readLine methods.

Python has also suffered from a lack of symmetry in its support for console I/O, albeit of a more subtle kind. Consider, for example, a simple program to greet its user by name:
name = raw_input('Who are you? ')
print 'Hello', name
Can you spot the conceptual redundancy in this code? Console input is handled with a function call, but console output involves a print statement. Two distinctly different concepts are being used here, when one would do. Fortunately, Python 3 corrects the design oversight by making print a function.
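One way to see the symmetry in Python 3 is that input and print are both ordinary functions and can be handled like any other value. The sketch below is only illustrative: the greet function and its read and write parameters are my own invention, not part of any library.

```python
def greet(read=input, write=print):
    """Ask for a name and greet the user (Python 3).

    Both console operations are plain function calls, so they can be
    passed in as arguments; the defaults are the real built-ins.
    """
    name = read('Who are you? ')
    write('Hello', name)
```

Calling greet() with no arguments talks to the real console; passing in replacement functions makes the same code usable without a console at all, which would not have been possible with the old print statement.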

I have to admit, I wasn't particularly convinced that this was a problem until a student asked me for help one day during a lab class. The exercise involved writing a program to convert a temperature from the Celsius scale to the Fahrenheit scale, and his solution, written for Python 2.5, looked something like this:
ctemp = float(raw_input('Enter temperature in Celsius: '))
ftemp = 9.0*ctemp / 5 + 32
print(ctemp, 'deg. C', 'is', ftemp, 'deg. F')
His question was quite simple: "Can you tell me how to get rid of the brackets and commas in the output?" Having written a function call to handle data input, he must have temporarily forgotten the syntax for printing and assumed, reasonably enough, that it would be handled symmetrically by Python, i.e. as a function call; instead, he ended up constructing a tuple by accident and passing it to the print statement! He might have realised his error, but for the fact that we hadn't yet discussed tuples at that point in the course... Thankfully, such confusion cannot arise in Python 3.
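The difference between the two interpretations can be reproduced in Python 3. This is a sketch with an assumed input of 100 degrees Celsius; the variable names as_tuple and as_args are hypothetical. The first string mimics what the Python 2 print statement displays when handed a parenthesised tuple, the second what the Python 3 print function displays when handed the same items as separate arguments.

```python
ctemp = 100.0                      # assumed sample input
ftemp = 9.0 * ctemp / 5 + 32

values = (ctemp, 'deg. C', 'is', ftemp, 'deg. F')

# Python 2 reading: the parentheses build a tuple, and the tuple is printed.
as_tuple = str(values)

# Python 3 reading: each item is a separate argument, joined by spaces.
as_args = ' '.join(str(v) for v in values)

print(as_tuple)   # (100.0, 'deg. C', 'is', 212.0, 'deg. F')
print(as_args)    # 100.0 deg. C is 212.0 deg. F
```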

You may have noticed the occurrence of raw_input in the two preceding code examples and perhaps wondered why the more obvious-sounding built-in function input isn't being used instead. It turns out that the latter can cause nasty surprises for novices, as I'll discuss in another post. For now, I'll simply point out that this is another example of redundancy in Python. The input function is entirely unnecessary, given that its behaviour can be duplicated by combining another built-in function, eval, with raw_input. Python 3 fixes things by having a single function called input that behaves just like the old raw_input function.
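The equivalence can be sketched in Python 3, where input plays the role of the old raw_input. The helper old_style_input and the stand-in reader below are hypothetical names used only for this illustration; note that eval executes whatever expression it is given, so this is not something to use on untrusted input.

```python
def old_style_input(prompt='', read=input):
    """Mimic Python 2's input(): read a line, then evaluate it.

    WARNING: eval runs arbitrary expressions typed by the user.
    """
    return eval(read(prompt))


# Stand-in for the console, so the example runs without typing anything.
fake_reader = lambda prompt: '2 + 3'

result = old_style_input(read=fake_reader)
print(result)                  # 5
print(type(result).__name__)   # int
```

The text typed by the user is treated as Python code rather than as a string, which is why the value that comes back here is the integer 5 and not the text '2 + 3'.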

Object-Oriented Programming


There is no redundancy to speak of in Java's object model. True, it gives us concrete classes, abstract classes and interfaces, but these things are distinctly different from each other. (The fact that Java forbids multiple inheritance means that a purely abstract class is not equivalent to an interface, for example.)

Things have not been so clear-cut in Python, due to the way in which Python's object model has evolved. Before Python 2.2 arrived in December 2001, there were significant differences between Python's built-in types and user-defined classes, such that it was impossible to create a user-defined subclass of a built-in type. Version 2.2 went a long way towards healing the class/type split by introducing new-style classes alongside the existing classic classes, and by turning most of the built-in types into these new-style classes. Since version 2.2, it has therefore been possible to begin class definitions in two ways:
# Old-style
class Foo:
    ...

# New-style
class Foo(object):
    ...
Students often fail to recognise that these examples are two different kinds of class. This could be a particular problem for those coming to Python from Java, where class Foo and class Foo extends Object are entirely equivalent ways of beginning a class definition.

Unfortunately, many of the books and online tutorials on Python published since the release of version 2.2 have done little to clarify the distinction between old- and new-style classes or provide adequate guidance to novice programmers on which kind of class should be used. Some (e.g., Norton et al., Lutz) have ignored new-style classes altogether, whilst others (e.g., Mount et al.) have acknowledged their existence but used old-style classes almost exclusively in example code. In some cases (e.g., Hetland) there is a balanced discussion of the two class types, and a few titles (e.g., Chun, Telles) have encouraged a more modern approach by concentrating almost exclusively on new-style classes. This lack of consistency can be very confusing for students.

Python 3 solves the problem by removing old-style classes entirely, leaving us with a single object model based on new-style classes. In Python 3, class definitions start off much as they did before, but with the essential difference that object is implicitly a superclass (as is the case in Java and C#). Thus, there is no longer any difference between the two styles of class definition shown above.
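This is easy to check in a Python 3 interpreter. In the sketch below, the class names Implicit and Explicit are hypothetical, chosen only to label the two styles of definition.

```python
class Implicit:            # no superclass listed
    pass


class Explicit(object):    # object listed explicitly
    pass


# In Python 3 both classes have object as their only base class...
print(Implicit.__bases__)  # (<class 'object'>,)
print(Explicit.__bases__)  # (<class 'object'>,)

# ...and both are instances of the same metaclass, type.
print(type(Implicit) is type(Explicit))  # True
```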

2 comments:

  1. Lua eliminates redundancy of number types altogether. It has just a single "number" type that handles the issue (http://rom.elrac.de/reference/lua/pil/2.3.html). :)

    There are ways to eliminate a clear distinction between classes and functions even. You just have to define an "entity" that's generic enough and provide some little extras. I designed a little language based on this observation. I still have the rough specs somewhere.

    By the way Lua doesn't provide OOP features natively. You have to implement them using metaprogramming. LISP goes even further of course!

  2. I keep meaning to take a look at Lua...
