Andres Valloud: Maths, virtual machines & books ~ ClubSmalltalk

Andres Valloud was born in Buenos Aires, Argentina. He has been working as a lead technical engineer at Cincom Systems. In this interview he answers questions about his current job at Cincom and about his recently published books.

CS: Andrés, how did you first come across Smalltalk, and what was your first impression?

AV: I first learned about Smalltalk while studying mathematics in college. A math instructor, Leandro Caniglia, had found out that I was also programming mathematics-related things in x86 assembler and the like, and he began insisting that I give Smalltalk a try instead. At first I basically thought "Smalltalk, what's that?" and dismissed the suggestion. But eventually he convinced me to go to his place on a Saturday afternoon. We had a three-hour session. Within an hour I knew all my work had become obsolete. The simplicity of the language was astonishing. All the inconveniences typical of other programming languages were nowhere to be seen. In fact, we spent more time thinking about trees and math problems than about Smalltalk itself. To me this was unquestionable proof that the language was designed to help people solve problems, as opposed to forcing people to solve programming language problems on top of their original issues.

CS: You have been working in the Smalltalk industry for a long time, and now you are working with virtual machines. How did you end up working on the lowest level of a Smalltalk environment?

AV: I think it has to do with the orientation with which I have gone through challenges in the past, and perhaps a sprinkle of personality traits. From the beginning I didn't feel it was acceptable to take everything at face value, and rather concentrated on figuring out how things fit together as a whole on my own. This was applied to things like x86 assembler, then to Smalltalk, and perhaps the inevitable consequence is that now the process requires knowledge of the VM in order to continue.

CS: What are the constraints when you are working at the VM level? And what are the usual requirements?

AV: Right now I am working on a VM that runs on 15 or so different platforms. The requirements are that the same source code should compile and run correctly on all of them, and that this implementation must provide the same visible behavior to the image. This turns out to be an unintended constraint, because it is here where you begin to see that standards look great on paper, and yet practice has all sorts of oddball and exceptional cases that must be addressed.

CS: If you have code that runs on so many platforms, each with its particular issues, how is a VM tested before being released?

AV: At Cincom we have a test suite with which we test each interim build we make, on every platform. Also, there are numerous tests that verify that image functionality works as expected. Finally there is also the Cincom Smalltalk developer program, which gives customers access to weekly builds so they can be evaluated in advance.

CS: The VisualWorks VM has been one of the fastest on the market. What are the secrets behind this VM?

AV: The techniques used are well known. For example, the VisualWorks VM uses a JIT approach to translate Smalltalk methods into native code. These translated methods do not have to abide by the usual C stack calling conventions, and so a lot of stack traffic is eliminated. Some primitives are also translated, and so they do not need the overhead of a C stack frame either. In addition to this, the compiled methods have polymorphic inline caches which for the most part avoid costly method lookups.
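
As a rough illustration of the inline caching idea mentioned above, here is a minimal Python sketch of a send site that remembers which method each receiver class resolved to. It is a toy model under stated assumptions, not how the VisualWorks VM is implemented: real polymorphic inline caches live in generated native code, and the names `CallSite`, `send`, `max_entries`, `Circle` and `Square` are hypothetical.

```python
class CallSite:
    """Toy model of one send site with a polymorphic inline cache (hypothetical)."""

    def __init__(self, selector, max_entries=4):
        self.selector = selector
        self.max_entries = max_entries
        self.cache = []        # (class, method) pairs, checked in order
        self.full_lookups = 0  # counts how often the slow path ran

    def _lookup(self, cls):
        # Slow path: walk the class hierarchy looking for the selector.
        self.full_lookups += 1
        for klass in cls.__mro__:
            if self.selector in vars(klass):
                return vars(klass)[self.selector]
        raise AttributeError(self.selector)

    def send(self, receiver, *args):
        cls = type(receiver)
        # Fast path: compare the receiver's class against the cached entries.
        for cached_cls, method in self.cache:
            if cached_cls is cls:
                return method(receiver, *args)
        # Cache miss: do the full lookup and remember the result if there is room.
        method = self._lookup(cls)
        if len(self.cache) < self.max_entries:
            self.cache.append((cls, method))
        return method(receiver, *args)


class Circle:
    def name(self):
        return "circle"


class Square:
    def name(self):
        return "square"


site = CallSite("name")
shapes = [Circle(), Square(), Circle(), Square(), Circle()]
results = [site.send(shape) for shape in shapes]
```

In this toy model only the first send for each receiver class takes the slow path; later sends are resolved by a class comparison against the cache.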

CS: If we talk about performance, what are your recommendations and what kind of code should we avoid?

AV: From the point of view of a Smalltalk image, arguably the number one performance offender is ifTrue:ifFalse:. It may not be evident at first sight, and perhaps it may even seem counterintuitive. However, typically what one does with ifTrue:ifFalse: is something like this:

    anObject hasSomeProperty
        ifTrue: [anObject doSomething]
        ifFalse: [anObject doSomethingElse]

But how could ifTrue:ifFalse: be a problem? Most, if not all, Smalltalks heavily optimize ifTrue:ifFalse:, so this does not look like it can be made faster. However, there is a way. The issue here is that the program is making a run time distinction that perhaps could be made at design time. In other words, if we made two classes, one for objects that have some property and another for objects that do not, then we would be able to rewrite the code above like this:

    ObjectWithSomeProperty>>doWhatIsAppropriate
        ^self doSomething

    ObjectWithOtherProperty>>doWhatIsAppropriate
        ^self doSomethingElse

Once we have these methods, we can simply replace our original ifTrue:ifFalse: with a single line of code:

    anObject doWhatIsAppropriate

So we have made the ifTrue:ifFalse: disappear. Where did it go? Into a cached message lookup, which in VisualWorks will resolve to a few assembler instructions to check for the class of the receiver in a polymorphic inline cache. In other words, the code will run faster, and in addition it will better express the knowledge available to developers while modeling the problem at hand.

Fast code does not have to be unreadable.

CS: If someone wants to start learning about Smalltalk's virtual machines, which books and resources would you recommend to start looking at?

AV: Personally I have found several loose resources to be useful. For example, there are presentations by people like Eliot Miranda which are available on the web. Furthermore, there are published papers that describe things like polymorphic inline caches and so on, particularly the Self papers such as the one here: http://research.sun.com/self/papers/pics.html.

CS: You wrote a book about hashing. Could you explain the importance of hashing in our daily work in a Smalltalk environment?

AV: Hashing is a technique to handle large amounts of data which offers O(1) access time regardless of the size of the data set. In today's world of increasingly large amounts of data that applications must handle, hashing becomes very attractive because of its O(1) behavior characteristics. Consider for example the need to detect duplicate objects. One could keep a sorted list of the unique ones seen so far, and then use binary search to determine whether any object should be added to the list or classified as a duplicate. While this could be reasonably fast for moderately sized data sets, hashing can do this in constant time for each object tested, and keep doing so regardless of the number of objects seen so far. Note that sorting is not necessary either. Compared to the sorted collection approach, hashing scales significantly better and will also execute the task in considerably less time.
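
The two approaches to duplicate detection described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the book: the function names and the sample data are made up, and the constant-time behavior of the hashed version is an average-case figure that assumes a hash function which distributes the objects well.

```python
import bisect


def duplicates_sorted(items):
    """Sorted-list approach: binary search, then insert to keep the list ordered."""
    unique, dups = [], []
    for item in items:
        i = bisect.bisect_left(unique, item)  # O(log n) search
        if i < len(unique) and unique[i] == item:
            dups.append(item)
        else:
            unique.insert(i, item)  # shifting elements makes this O(n)
    return dups


def duplicates_hashed(items):
    """Hashed approach: set membership is O(1) on average, no ordering needed."""
    seen, dups = set(), []
    for item in items:
        if item in seen:
            dups.append(item)
        else:
            seen.add(item)
    return dups


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
sorted_result = duplicates_sorted(data)
hashed_result = duplicates_hashed(data)
```

Both functions report the same duplicates in the order they are encountered. The hashed version needs only equality and a hash value, whereas the sorted version also requires the objects to be comparable.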

CS: You have written a mentoring book about Smalltalk. Could you tell us a little about the motivation behind this book and who should read it?

AV: Some of us are lucky to be mentored, but most of us will not have that opportunity because there are not many mentors in the first place. My main motivation was to write down things that I have learned from my mentors, to remove the luck factor in having access to this kind of material. I have found this knowledge invaluable, so I hope it proves useful to others too.

If I said ... would you answer:

Sports? Soccer.
Food? I have very few dislikes, and I enjoy a variety of different cuisines from all around the world.
Computer brand? Mac.
Operating system? *nix, Mac OS X.
Mobile phone? As simple as possible.
City? Portland, Oregon; Yosemite National Park / Mammoth Lakes. No country of preference so far.
Book? Concrete Mathematics, by Graham, Knuth and Patashnik.
Film? Akira Kurosawa's Dreams.
TV series? I don't watch TV anymore :).
Magazine? None.
Car? Honda.
Open source? I have no strong preference one way or the other. To me, the interesting distinction is whether software is useful or not, where its usefulness also depends on the licenses attached to it.