Friday, April 24, 2009

Metaclasses and Extension Classes (a.k.a. “The Killer Joke”)

In Python’s original implementation, classes were first class objects that could be manipulated just like any other object. However, the process of creating a class object was something that was set in stone. Specifically, when you defined a class such as this
class ClassName(BaseClass, ...):
...method definitions...
The body of the class would be executed in its own local dictionary. The name of the class, a tuple of base classes, and this local dictionary would then be passed to an internal class creation function that was responsible for creating the class object. Since all of this machinery was hidden away behind the scenes, it was an implementation detail that users didn’t need to worry about.

Don Beaudry was the first to point out that this was a missed opportunity for expert users. Specifically, if classes were simply special kinds of objects, why couldn’t you create new kinds of classes that could be customized to behave in different ways? He then suggested a very slight modification to the interpreter that would allow new kinds of class objects to be created by C code extension modules. This modification, first introduced in 1995, has long been known as the “Don Beaudry hook” or “Don Beaudry hack”, where the ambiguity about the name was an intentional joke. Jim Fulton later generalized the modification where it remained a part of the language (albeit under-documented) until it was replaced by true support for metaclasses through the introduction of new-style classes in Python 2.2 (see below).

The basic idea behind the Don Beaudry hook is that expert users would be able to create custom class objects if there was some way to supply a user-supplied function in the final step of class creation. Specifically, if the class name, base classes, and local dictionary could be passed to a different construction function, then that function could do whatever it wanted with the information to create a class object. The only catch was that I didn't want to make any changes to the class syntax which had already been well-established.

To do this, the hook required you to create a new type object in C that was callable. Then, when an instance of such a callable type was used as a base class in a class statement, the class creation code would magically call that type object instead of creating a standard class object. The behavior of classes created this way (and their instances) was completely up to the extension module providing the callable type object.

To modern Python users it may sound strange that this was considered such a hack. But at the time, type objects were not callable -- e.g. 'int' was not a built-in type but a built-in function that returned an instance of the int object, and the int type was neither easily accessible nor callable. User-defined classes were of course callable, but this was originally special-cased by the CALL instruction, as these were implemented in a completely different way than built-in types. Don Beaudry is eventually responsible for planting the insight in my head that led first to metaclasses and later to new-style classes and the eventual death of classic classes.

Originally, Don Beaudry’s own set of Python extensions named MESS was the only user of this feature. However, by the end of 1996, Jim Fulton had developed a very popular third-party package called Extension Classes, which used the Don Beaudry hook. The Extension Classes package eventually became unnecessary after Python 2.2 introduced metaclasses as a standard part of the object machinery.

In Python 1.5, I removed the requirement to write a C extension in order to use the Don Beaudry hook. In addition to check for a callable base class type, the class creation code now also checked for an attribute named “__class__” on the base class, and would call this if present. I wrote an essay about this feature that was the first introduction to the idea of metaclasses for many Python users. Due to the head-exploding nature of the ideas presented therein, the essay was soon nicknamed “The Killer Joke” (a Monty Python reference).

Perhaps the most lasting contribution of the Don Beaudry hook is the API for the class creation function, which was carried over to the new metaclass machinery in Python 2.2. As described earlier, the class creation function is called with three arguments: a string giving the class name, a tuple giving the base classes (possibly empty or a singleton), and a dictionary giving the contents of the namespace in which the indented block with method definitions (and other class-level code) has been executed. The return value of the class creation function is assigned to a variable whose name is the class name.

Originally, this was simply the internal API for creating classes. The Don Beaudry hook used the same call signature and hence it became a public API. The most important aspect of this API is that the block containing the method definitions is executed before the class creation function is called. This places certain restrictions on the effectiveness of metaclasses, since a metaclass cannot influence the initial contents of the namespace in which the method definitions are executed.

This was changed in Python 3000, so that now a metaclass can provide an alternative mapping object in which the class body is executed. In order to support this, the syntax for specifying an explicit metaclass is also changed: it uses keyword argument syntax in the list of base classes, which was introduced for this purpose.

In the next episode I'll write more about how the idea of metaclasses led to the introduction of new-style classes in 2.2 (and the eventual demise of classic classes in 3.0).

Thursday, April 23, 2009

New! Now in Japanese!

There's now a Japanese translation of this blog. Yay! There is also a Spanish version, and a French version (sorry, I don't know the URL yet -- let me know if you do).

I don't read those languages (or not well enough) and am not in a position to "approve" them, but I'm fine with their existence. If you want to start another translation, be my guest -- send me a link to the blog so I can update this page. If you see a mistake in a translation, just contact the translator directly. (However, I do not want to see copies of my blogs appearing. There's no point for those, and typically sites that copy popular blogs are just trying to attract eyeballs to their ads by reproducing popular material without putting any creative work into it.)

Wednesday, April 22, 2009

And the Snake Attacks

Alright. Fine. In 1995, when I was first exposed to Python, any reference to "snake" was verboten. Python was named after Monty Python, not the reptile. If anybody was attacking, it was Knights who say Ni or possibly the Rabbit of Caerbannog.

In any case... back in 1994, I was battling fictitious baddies in the LPMUD scene. The Web was barely present, and broadband was unheard of. Low-bandwidth entertainment was the order of the day.

Wait. One more step back. 1979. My first computer was an Apple ][, and one of my favorite games was the Colossal Cave. Soon after that, I learned about and played Zork. I fell in love with the concept of interaction fiction and how a computer could lead you through these stories. These games hooked me, and led me into a life of computers. (and yes, you can imagine my ecstasy at meeting Don Woods some 25+ years later!)

So the MUD scene was quite interesting to me. But I wanted to help build these games. I met John Viega, a fellow LPMUD game author, coder, and designer. At the time, he was working at the University of Virginia in their Computer Graphics Lab, working on a system called Alice. Their system was intended for less computer-savvy and they wanted an easy-to-learn language for people to create animations. They chose Python for its clarity, power, and simplicity.

John was a huge fan, and pointed me at Python. "You have to learn this!" "Fine. Fine.", I said. The language was easy, yet powerful. It could do everything, without a lot of hassle.

That was February, 1995, and I haven't turned back.

Little did I know, at the time, just how pivotal Python would be to my career and my life. Thank you, Guido, for your creation.

Tuesday, April 21, 2009

Origins of Python's "Functional" Features

I have never considered Python to be heavily influenced by functional languages, no matter what people say or think. I was much more familiar with imperative languages such as C and Algol 68 and although I had made functions first-class objects, I didn't view Python as a functional programming language. However, earlier on, it was clear that users wanted to do much more with lists and functions.

A common operation on lists was that of mapping a function to each of the elements of a list and creating a new list. For example:
def square(x):
    return x*x

vals = [1, 2, 3, 4]
newvals = []
for v in vals:
    newvals.append(square(v))

In functional languages such as Lisp and Scheme, operations such as this were provided as built-in functions of the language. Thus, early users familiar with such languages found themselves implementing comparable functionality in Python. For example:
def map(f, s):
    result = []
    for x in s:
            result.append(f(x))
    return result

def square(x):
    return x*x

vals = [1, 2, 3, 4]
newvals = map(square,vals)

A subtle aspect of the above code is that many people didn't like the fact that you to define the operation that you were applying to the list elements as a completely separate function. Languages such as Lisp allowed functions to simply be defined "on-the-fly" when making the map function call. For example, in Scheme you can create anonymous functions and perform mapping operations in a single expression using lambda, like this:
(map (lambda (x) (* x x)) '(1 2 3 4))  

Although Python made functions first-class objects, it didn't have any similar mechanism for creating anonymous functions.

In late 1993, users had been throwing around various ideas for creating anonymous functions as well as various list manipulation functions such as map(), filter(), and reduce(). For example, Mark Lutz (author of "Programming Python") posted some code for a function that created functions using exec:
def genfunc(args, expr):
    exec('def f(' + args + '): return ' + expr)
    return eval('f')

# Sample usage
vals = [1, 2, 3, 4]
newvals = map(genfunc('x', 'x*x'), vals)

Tim Peters then followed up with a solution that simplified the syntax somewhat, allowing users to type the following:
vals = [1, 2, 3, 4]
newvals = map(func('x: x*x'), vals)

It was clear that there was a demand for such functionality. However, at the same time, it seemed pretty "hacky" to be specifying anonymous functions as code strings that you had to manually process through exec. Thus, in January, 1994, the map(), filter(), and reduce() functions were added to the standard library. In addition, the lambda operator was introduced for creating anonymous functions (as expressions) in a more straightforward syntax. For example:
vals = [1, 2, 3, 4]
newvals = map(lambda x:x*x, vals)

These additions represented a significant, early chunk of contributed code. Unfortunately I don't recall the author, and the SVN logs don't record this. If it's yours, leave a comment! UPDATE: As is clear from Misc/HISTORY in the repo these were contributed by Amrit Prem, a prolific early contributor.

I was never all that happy with the use of the "lambda" terminology, but for lack of a better and obvious alternative, it was adopted for Python. After all, it was the choice of the now anonymous contributor, and at the time big changes required much less discussion than nowadays, for better and for worse.

Lambda was really only intended to be a syntactic tool for defining anonymous functions. However, the choice of terminology had many unintended consequences. For instance, users familiar with functional languages expected the semantics of lambda to match that of other languages. As a result, they found Python’s implementation to be sorely lacking in advanced features. For example, a subtle problem with lambda is that the expression supplied couldn't refer to variables in the surrounding scope. For example, if you had this code, the map() function would break because the lambda function would run with an undefined reference to the variable 'a'.
def spam(s):
    a = 4
    r = map(lambda x: a*x, s)

There were workarounds to this problem, but they counter-intuitively involved setting default arguments and passing hidden arguments into the lambda expression. For example:
def spam(s):
    a = 4
    r = map(lambda x, a=a: a*x, s)

The "correct" solution to this problem was for inner functions to implicitly carry references to all of the local variables in the surrounding scope that are referenced by the function. This is known as a "closure" and is an essential aspect of functional languages. However, this capability was not introduced in Python until the release of version 2.2 (though it could be imported "from the future" in Python 2.1).

Curiously, the map, filter, and reduce functions that originally motivated the introduction of lambda and other functional features have to a large extent been superseded by list comprehensions and generator expressions. In fact, the reduce function was removed from list of builtin functions in Python 3.0. (However, it's not necessary to send in complaints about the removal of lambda, map or filter: they are staying. :-)

It is also worth nothing that even though I didn't envision Python as a functional language, the introduction of closures has been useful in the development of many other advanced programming features. For example, certain aspects of new-style classes, decorators, and other modern features rely upon this capability.

Lastly, even though a number of functional programming features have been introduced over the years, Python still lacks certain features found in “real” functional programming languages. For instance, Python does not perform certain kinds of optimizations (e.g., tail recursion). In general, because Python's extremely dynamic nature, it is impossible to do the kind of compile-time optimization known from functional languages like Haskell or ML. And that's fine.