Thursday 17 December 2009

Minimal redundancy in Java and Python (1)

I've blogged previously about the idea that minimal redundancy is important in making a programming language easy to learn; in essence, a language should avoid providing multiple, redundant ways of achieving the same goal. In this post, I'll present an example of how Java and Python compare when evaluated according to this principle.

Let's consider the concept of iteration through a sequence of things. In Java, the syntax required to implement this concept varies considerably, depending on the type of sequence being handled and the version of Java that we happen to be using. For example, if we wish to iterate through the characters of a string, printing each one on the console, we would accomplish this with code such as:
for (int i = 0; i < message.length(); ++i) {
  System.out.println(message.charAt(i));
}
The equivalent approach to indexing characters in an array is a little different, with length being an attribute of the array rather than a method call and square brackets being used in place of the charAt method:
for (int i = 0; i < messageChars.length; ++i) {
  System.out.println(messageChars[i]);
}
The equivalent for loop for iterating through a Vector of objects such as integers is again subtly different. In fact, it can be written in two ways. The first uses the elementAt method, which was prevalent before the Collections framework appeared in JDK 1.2. The second uses the get method, which was added to bring the Vector class into line with the other containers in the Collections framework:
for (int i = 0; i < vec.size(); ++i) {
  System.out.println(vec.elementAt(i));
}

for (int i = 0; i < vec.size(); ++i) {
  System.out.println(vec.get(i));
}
Of course, it is also possible to enumerate items in a Vector one at a time rather than indexing them, but yet again there is more than one approach to this. For example, we can use the Enumeration interface that predates JDK 1.2 or we can use the Iterator interface introduced with the Collections framework:
Enumeration<Integer> enumerator = vec.elements();
while (enumerator.hasMoreElements()) {
  System.out.println(enumerator.nextElement());
}

Iterator<Integer> iterator = vec.iterator();
while (iterator.hasNext()) {
  System.out.println(iterator.next());
}
Admittedly, JDK 1.5 helped the Java programmer by introducing a 'for each' loop syntax that greatly simplifies the Vector enumeration examples above and can even be used to enumerate array elements:
for (int number : vec) {
  System.out.println(number);
}

for (char character : messageChars) {
  System.out.println(character);
}
Useful though this undoubtedly is, it has yet to replace the older approaches as the obvious way to do things; all of those older approaches are still viable in Java, and continue to be described in textbooks and online tutorials.

Now consider a text file. This can be regarded as a sequence of lines, and Java provides a couple of obvious ways of reading such a sequence - but, unfortunately, the syntax is different yet again from the examples shown earlier:
// Before JDK 1.5
BufferedReader inputFile = new BufferedReader(new FileReader("foo.txt"));
String line = inputFile.readLine();
while (line != null) {
  System.out.println(line);
  line = inputFile.readLine();
}

// Since JDK 1.5
Scanner inputFile = new Scanner(new File("foo.txt"));
while (inputFile.hasNextLine()) {
  System.out.println(inputFile.nextLine());
}

We can summarise the preceding arguments by saying that characters in a string, objects in a container and lines in a file are all examples of a sequence of things, but Java requires different iteration syntax in each case. To make matters worse, in two of the three cases there are multiple options available!

Python is vastly simpler in comparison, supporting each of these scenarios with one obvious, consistent syntax:
for character in string:
    print character

for number in numbers:
    print number

input_file = open('foo.txt')
for line in input_file:
    print line,
True, we could also write these examples using while loops, but that would be a very obviously inferior way of doing things - so much so that I've never seen it advocated in any book or online tutorial on Python programming.
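Python can offer one for syntax in all three cases because strings, lists and file objects all support the same iterator protocol. The for statement asks the object for an iterator with iter(), then calls next() on it until StopIteration is raised. The sketch below is written in Python 3, unlike the Python 2 print statements above. It spells that loop out as the inferior while loop just mentioned, and adds a Countdown class to show that any object defining __iter__ works with the same syntax. Both items_via_protocol and Countdown are hypothetical names chosen for illustration.

```python
# Python 3 sketch of the iterator protocol behind the for statement.

def items_via_protocol(iterable):
    """Collect the items of any iterable using the same iter()/next()
    calls that 'for item in iterable:' makes behind the scenes."""
    items = []
    iterator = iter(iterable)        # ask the object for an iterator
    while True:
        try:
            item = next(iterator)    # fetch the next item...
        except StopIteration:        # ...until the iterator is exhausted
            break
        items.append(item)
    return items


class Countdown:
    """Hypothetical class: defining __iter__ is all it takes for an
    object to work with the same for syntax as strings, lists and files."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # A generator function automatically returns an iterator.
        n = self.start
        while n > 0:
            yield n
            n -= 1


print(items_via_protocol("abc"))         # ['a', 'b', 'c']
print(items_via_protocol([1, 2, 3]))     # [1, 2, 3]
for number in Countdown(3):              # prints 3, 2 and 1 on separate lines
    print(number)
```

Files fit the same pattern. A file object is its own iterator, and each next() call returns the next line, so the loop over input_file above needs no special syntax.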

4 comments:

  1. I should point out in fairness that Vector was chosen deliberately as a 'worst-case scenario'. ArrayList, which was introduced with the Collections Framework, supports only three distinct styles of iteration as opposed to the five options available for Vector. Nevertheless, the point that Python is much simpler and less redundant still stands.

  2. It might be neater to use "with" in the last example, as it takes care of closing the file for you. Something along these lines should do the trick (the with statement works in 2.6 and 3k; in 2.5 it needs 'from __future__ import with_statement'):

    with open('foo.txt') as f: data = f.read(); print data

    (Sorry for the lack of indentation, it just wouldn't work ok on the preview.)

  3. Good idea, Juho.

    I must admit that I've not got into the habit of using 'with' yet, mainly because I've recently been developing for a system running an older version of Python.

  4. I agree that 'with' is a superior way to handle file access, but the point holds that you are still free to use the 'for' syntax, so if a newbie chooses to do that, it will still work, which is a good thing.

    Also, I think we can let Python off with this; after all, the requirement to open and close a filehandle means it isn't really an ordinary sequence. I suppose you would use a 'for' loop embedded in a 'with' statement.
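Comment 2's 'with' suggestion and the post's line-by-line for loop can be combined by nesting the loop inside the with block. The file is still read one line at a time, but it is closed automatically when the block ends. Below is a minimal sketch in Python 3. The read_lines helper is hypothetical, and a temporary file stands in for foo.txt.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a text file without their trailing newlines.

    The for loop reads the file one line at a time; the with statement
    closes the file when the block exits, even if an exception occurs.
    """
    lines = []
    with open(path) as input_file:
        for line in input_file:
            lines.append(line.rstrip("\n"))
    return lines


# A throwaway temporary file stands in for 'foo.txt' here.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

print(read_lines(path))  # ['first line', 'second line']
os.remove(path)
```

Unlike the f.read() one-liner in comment 2, this version never holds the whole file in memory at once, which matters for large files.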
