A reformatting for readability of http://www.infoq.com/interviews/armstrong-peyton-jones-erlang-haskell

I'm Sadek Drobi. I'm here at Erlang Factory with Simon Peyton Jones and Joe Armstrong. Can you please tell us about yourselves and what you've been busy with lately?

JA: I'm Joe Armstrong and I'm at Erlang Factory. I've just been to a very nice talk where Simon has told us about the birth of Haskell and Erlang and how they point along parallel routes solving the same problems. I think we can talk a bit about that, because in your lecture you said things about “We tried that and it was an absolute disaster”. We also tried - you know this making everything parallel - we did that as well.

SPJ: Was it a disaster?

JA: Yes. You know this bit about you deleted all the people's code? We've done that as well.

SPJ: I'm Simon Peyton Jones, I work at the Microsoft Research in Cambridge at the moment. Previously I was a professor at Glasgow University. I've been doing functional programming now for 30-odd years, because I got addicted to it when I was at university and never been able to give it up. It's like a drug, but a good drug. Haskell is a programming language evolved because a bunch of different academics were working on lazy functional programming languages around the world, but they were all different languages and we realized that if we'd just agree to a common language, then we would be able to share a lot more work together, so we got together and formed a committee, which wasn't necessarily a very promising way to go about designing an elegant and beautiful language. But actually the committee worked very well, because we had enough common goals. That was in the very early '90s. Haskell has come a long way since then. Many languages die fairly quickly after they've been designed, but Haskell has lasted 20 years, as has Erlang.

JA: Erlang had a different start than Haskell. It started in an industrial environment. It started in the computer science lab at Ericsson and the original goal was to find a better way of programming. It wasn't actually to make a programming language at all, it was somewhat after the event that we discovered we created a programming language - that was kind of accidental.

SPJ: But you also had a very specific application at least, didn't you? You really were telecoms.

JA: We wanted to just program a small telephone exchange in the best way possible and we wanted to just steal our ideas from different programming languages and put them together and just try them out and see if they work.

SPJ: You knew the application you needed and you knew that concurrency was key - concurrency and robustness was key.

JA: I remember back in the '80s going to conferences and talking about shared memory systems, and I always poked up my hand and asked the same question over again - “Well what happens if one of these shared memory things fails?” and they would say “Well it all fails”. That's what we didn't want, you know. We make products with… a standard Ericsson product should have a downtime of four minutes per year or something like that; or better than that. Our key problem is handling failure. Right from the early days, we thought “Well the only way to handle failure is no shared memory and no transaction memory or anything that locks things” and that's why we copy everything. That's why we have a sophisticated error handling. So, here we are back in the '80s, about 1984 - 1985, just thinking how we can put these things together and it was only later that we realized “Oh, we've made a programming language and better give it a name” and people started using it and such.

SPJ: It never struck me quite as positively as this before, but Erlang was born very much out of a demand pool. You had a telephone exchange, you wanted to program it and the language happened to come out of that. Haskell was the exact reverse. We were on a mission from God to explain why laziness was fantastic for the world, so it was very much technology push. We wanted to explain functional programming, lazy functional programming in particular, through the medium of a language. You were at this end and we were at this end.

JA: But in the same point in time we were doing the same things. I remember one of the things we did in Erlang, we… First of all we implemented it in Prolog and I did that. Later, Robert Virding came along and Robert was very into parallel concurrent languages. We made one of our first big mistakes, which was announcing what the performance of something will be before you've done it. You've made that one!

SPJ: No, no, no! We'd never do that!

JA: We cross-compiled Erlang to Strand, which was a parallel language and we implemented a language like that and we happily told everybody “It will be 6 times faster” before we'd actually done this.

SPJ: Was it?

JA: No. It didn't work.

SPJ: Didn't work at all! Naught times faster!

JA: The first thing we did, we wrote an Erlang program, we cross-compiled it to all this parallel stuff and we had 3 Erlang processes and it was as parallel as possible and suddenly there were 6 million processes in the machine, 6 million threads - we got far too much parallelism. At that was the same time you said, in your talk, “Well, we tried to make everything parallel and it was just far too much parallelism.” We did the same stuff at the same time.

SPJ: A nice remark I remember you making about Robert was that he came to you one day and asked to make a few small changes in the compiler. Then, the next line in your paper was “Robert is incapable of making a small change to anything, so he completely rewrote the whole thing.”

JA: He left a little comment at the top, “Joe first wrote this”, and then everything is completely different. We used to rewrite each others' code.

Haskell and Erlang have 2 distinct models of concurrency, right? Haskell is going to side-effect free, Erlang is about messaging. Can you talk a bit about this in contrast to both models of concurrency?

SPJ: I suppose Haskell initially wasn't a concurrent language at all. It was a purely functional language and we had the idea from the beginning that a purely functional language was a good substrate for doing concurrency on. But it turned out to be a lot harder to turn that into reality than I think we really expected because we thought “If you got e1+e2 then you can evaluate e1 at the same time as e2 and everything will be wonderful because they can't affect each other because it's pure.”

But it turned out to be very hard to turn that into actual wall clock speedups on processes, leaving aside all issues of robustness or that kind of stuff, because if you do that style of concurrency you get lots of very tiny fine-grained processes and you get no locality and you get very difficult scheduling problems. The overheads overwhelm the benefits, in short. From there, we've evolved generally towards giving the programmer some control over that kind of concurrency. That was the first form of concurrency that Haskell grew. It was a controlled form of this implicit level of concurrency. We added other things afterwards that we'll talk about later, but it was very different to Erlang's model of concurrency.

JA: Erlang started with just this pure process and copying everything. The reason for copying was error handling. Error handling was central. In my view of the world, the view of the world I've always had was all these little processes all talking to each other in a global namespace.

SPJ: But that already, as you know, was completely alien to Haskell, which is just a functional program which is evaluated, so a totally different model.

JA: In the first incarnation of Erlang, it was just… little black boxes were communicating, copying their messages - it's a mailbox model, copying to a mailbox. What we were doing was relational. We had Prolog processes inside the black boxes.

SPJ: But you saw the light.

JA: We had to. You are launching a rocket program, because… We wanted from a black box perspective, if you send a certain sequence of messages in, you want the same sequence of messages to come out. You want that to be reproducible and deterministic and Prolog isn't like that. It backtracks, and it has things like that. It just sort of became more natural to make it functional. We never made a decision about having types or not having types. That wasn't an issue. We started with Prolog. Prolog didn't have types, so we got the dynamic type system that Prolog had. The issues we were interested in were limiting errors, propagation of errors, restarting things, restarting bits of the system without taking down all the system, having things which may appear inconsistent while you are upgrading them, continuously evolving systems, not systems you stop and restart.

We can't stop our systems and globally check they are consistent and then relaunch them. We incrementally change bits and we recognize that they are inconsistent under short time periods and we live with that. Finding ways of living with failure, making systems that work, despite the fact they are inconsistent, despite the fact that failures occur. Error models are very sophisticated. When I see things like Scala or I see on the 'net there's this kind of “Erlang-like semantics”, that usually means mailboxes and message boxes, it doesn't mean all the error handling, it doesn't mean the live code upgrade. The live upgrade of code while you are running a system needs a lot of deep plumbing under the counter - it's not easy.

SPJ: I still think it's amazing you can make that work at all.

JA: Phil Wadler saw it… You know, you can have two versions of the same model with all the same names, he didn't believe it was possible until we showed it to him.

You mentioned some other language got inspired from Erlang and from Haskell, also about concurrency. For example, Scala. The dynamic part of Erlang actors in the same way that languages like F# and C# implement features coming from functional programming, more specifically Haskell. Can you talk a bit about this and how things get implemented in the mainstream or in other languages?

JA: I don't know. People say “What will happen to Erlang?” and I have no idea. There are certain languages which are very influential, but don't get widely used, like Smalltalk, for example. Smalltalk is the ultimate object-oriented language and it has influenced Objective-C and all sorts of languages. It has this core band of pure Smalltalk programmers who don't believe in anything else, but it wasn't destined to take over the world. Erlang is probably… I don't know, maybe it's in that category, it influences other languages… It comes from a funny place, it comes from Ericsson, it comes from a telecoms company and that's really not our main business, to make languages.

Microsoft's main business is to make languages, but you have the sort of muscle to support a language. We have languages supported by companies like Microsoft and Sun and Java and C# and things like that. Then, you have languages not supported by big companies like Ruby and Perl, and they have their own communities. Erlang is in between - it's not in the sense of Microsoft or Sun being supported in the big scale, but it's not living all by itself in an open source community. It has the financial resources to do the compiler and keep the core clean, which is very good. Where it goes from there, I don't know.

SPJ: But nevertheless, there is this company that supports it - there isn't for Haskell, curiously. Microsoft is generously supporting myself and Simon Marlow, but they are not supporting us to work on Haskell. I'm hired as a researcher and I happen to work on Haskell.

JA: It's a good change to work on C#.

SPJ: In principle I could. I don't think it's going to any time soon. I suppose Microsoft could turn around and tell me “Please don't work on Haskell any more.”

Like they did with Erik Meijer, right?

SPJ: Erik is a whole different ball of wax. Erik works in the developer division in Microsoft and that's much more product focused, but amazingly Erik has brilliantly found a way to meld ideas from functional programming in quite a product-focused kind of way and build a little research bubble in the developer division that they perceive as being actively valuable to them. I think that's amazing! Whereas I'm a mere parasite. I'm a researcher who isn't required on a month-by-month basis in the way that Erik is to produce immediate value to the company.

But I think Haskell is also a bit different to Erlang.

I think of Haskell as a… Its principle value is a kind of ideas factory. It's a laboratory in which ideas can grow and flow. It has a very active user community and a very active research community. Lots of research papers use Haskell as their substrate and indeed use GHC, the compiler that Simon and I built, as a substrate for their work. It's a laboratory in which ideas can go in that may then get transferred elsewhere. We've seen lots of examples of that, not perhaps specifically for Haskell or for functional programming in general, going right back to garbage collection, which is now taken as “Of course you have garbage collection”, but it was a long time before that was taken as wrote, it was a long time that people thought it would never be taken as wrote, but that grew up in the functional programming community.

Generics in Java and C# grew up in their functional programming type system world and are now taken for granted. LINQ, the language integrated query system in C#, is heavily informed by ideas of functional programming. I'm actually quite happy if Haskell serves as a role of generating ideas then move into the mainstream. Whether Haskell will itself ever become a truly mainstream language - it has hundreds of thousands of users, but not hundreds of millions of users, so it's a completely different kind of scale than Microsoft's product languages.

Whether it will ever become a language on that scale, I don't know and I wouldn't want to give up being an ideas factory in order to get that. I'm just delighted that my colleague, Don Syme, who is 3 doors down the corridor, has developed F#, in which now actually there's a kind of pipeline, so now he can get pipeline ideas out of Haskell into F# and out of F# into C#. And I'm also delighted that he successfully made the case to Microsoft to turn it into a product that the company does support in a way that Erlang is supported and Haskell is not. F# really is a Microsoft product. That's a huge breakthrough for a big mainstream company like Microsoft to support a functional language. But the ideas factory is the bit that I… that is the most important thing, the high order bit.

JA: That's fun, isn't it? It's the same in Erlang. I think in a way you know your graph, and it's like you have this initial enthusiasm, you get up to there and there's this sort of plateau where not much happens. I think that happened in Erlang as well. I don't understand why, it seemed to be very stop-go. You work on things, then suddenly something happens and then I think it's a time taken to ingest the ideas, these ideas are fermenting in people's brains for a long time.

When people started being permanently connected to the Internet with broadband connections - that's a quantitative change. You saw the file-sharing networks. It's only about 4-5 years ago that people started deploying distributed algorithms. In the first few years of that they said “What shall we do?” Nobody knows so they invented Bittorrent and things like that. You have social networking programs, you have things like that.

Google Wave will come along and replace email in a few years time. Suddenly people want to know how to write distributed programs, which they never wanted to do before. So, how do you do distributed transactions? How do you do consistency? It was always a pretty obscure part of computer science, these distributed algorithms. Then we said “We've got these to do parallel. How the heck do we program these things?” In Erlang or Haskell, these algorithms are just difficult, but in other languages they are…

SPJ: Downright impossible!

JA: Yeah, right! I mean even if you have a very clean, pure programming language and you take Paxton algorithms or something like that, they are complicated things. They make your head hurt. There is a whole branch of mathematics, a whole branch of computer science to understand distributed algorithms and they live at the bottom of these social networks and things, ticking along. So I don't think you are going to see Erlang replacing Javascript or anything like that in browsers, but where you are seeing them being deployed is in the infrastructures, in the clouds and things like that, to glue the bits together properly, because that's where you need the complex algorithms. We are seeing an infrastructure-building things, cloud computing and what you are doing inside modern massive multicores, how do you organize computations. There where it's being used a lot.

SPJ: Whenever you've got concurrency and multiple processes working, you need to be very careful about side effects. Otherwise it just does your head in. Something that Haskell and Erlang both share is being careful about effects. Haskell is sort of super-careful and Erlang is merely careful, but in both cases, we don't have this, unrestricted side effects all the time, the computational fabric being effectful. It seems to me it makes it jolly hard to write programs that exploit multithreads.

JA: I didn't really know what thread safety was in Java, so I wrote a little Java Swing thing and of a Java friend I asked: I wrote this Java process and it worked fine. I could create one window, and then I created 2 windows in a graphical program and I drew a rectangle in one and I drew a rectancle in the other and it crashed. And I said “Why did it crash?” And he said “Well the Swing library's not threadsafe”. Now, what does that mean? It means if you got one thing that works, you do 2 of them in parallel, they interact in strange ways. I thought “How can you program like that? It's impossible to program!”

If you got this non-thread safety, maybe you got this code that's threadsafe that reads stuff and this code that's threadsafe that writes stuff, you try to compose into a program that reads and writes and then you go “Oh, it didn't work!” Then you are scratching your head and thinking “Maybe something is wrong. Let's put a mutate or a lock around the whole thing” and it works for a while and then somebody else has forgotten to lock it. What happens when it crashes? How is the failure model integrated with the locking models? Because if you lock stuff it seems you don't fail and you release the lock and then you hide that in the libraries and nobody can see and you inherit stuff from there and you've got a mess.

SPJ: One of the ways that I speculate that functional languages may end up influencing the mainstream, is that mainstream languages will gain a larger and larger subset of purely functional programming, that will make it easier to write functional programs that don't use side effects a lot. So, you can get more and more computation done, useful and complicated computation, like the algorithms you were describing, without using side effects. I think that, as they get richer subsets - F# is an example of this because it's built on a substrate that is completely imperative. The .NET system is an imperative system, but F# makes it easier - it reduces the barrier to entry for writing functionally. I think that's a trend that we will see continue.

JA: What I noticed is that the Erlang libraries, they are pure libraries, and the pure functions are incredibly reusable. They just work forever. You do them, you test them and forget about them and you can reuse them in many different contexts that you hadn't thought about. The goal in system design is to divide your system in such a way that as much as possible of it is pure and the messing stuff's on the side.

SPJ: That's why you need those types, Joe!

JA: No, that's why you need good error recovery mechanism, because even with your types, you need to test it. First, I want to write a factorial program, my factorial program will say “factorial of n is 42” and the type inference then will say “This is a type int to int”, so it's OK, it's correct. You have to say factorial 3 is 6.

SPJ: You are misrepresenting me! I never said that types guarantee your program is correct, but you were talking about segregating events and types are a very good way to segregate things.

In the same topic, in Scala, which we say got inspired by Erlang, for the actor model, actually they don't copy memory, they don't copy data structure, but they rather share them and there are no constraints on the data structure, so it could be mutable and it could be shared. Martin Odersky, father of Scala wrote a paper about using types, be it inferred or not, type annotations for guaranteeing that the reference is not consumed concurrently. So, you can use mutable data structure, which is very nice for performance reasons, yet you get some guarantees of not having race conditions. Can you talk a little bit about that from the two perspectives?

SPJ: First thing to say, Scala is not just a knock-off of Erlang. It's a whole sophisticated language which rather elegantly combines functional and object oriented systems, in a rather innovative and unusual way. But I think you are right - one thing you can do with Scala is implement an Erlang-like model. Once you've got an imperative system, in which you can mutate things, it is possible to use type systems to constrain exactly how you mutate them so that, for example, ownership types in object oriented languages are an example of doing this.

I don't know about Scala's… I don't know the paper that you are describing of Martin's, but systems that allow mutual objects to flow around and be used once and be owned by different threads can work, but they tend to be rather complicated. My gut feel is that you need type systems that tread this narrow line between being expressive enough to say what you want and not being so complicated that nobody can understand them anymore.

Ownership types for a conventional object oriented language are a way of controlling some of these things, but that feels to me like taking a system that's already using too many effects and trying to retrospectively patch it up, is better to start - from my preferences - would be to start with a system in which effects are (a) few, because the functional subset is very big and (b) rather constrained. Now, Haskell has a rather crude distinction between effectful programs and not. That's not enough to just say it has some effects, might not be enough. That gets you back into the land of controlling effects more precisely, but at least it's a smaller piece of your program.

At the moment I'm agnostic about whether types are the right mechanism for trying to do this much more fine-grained control of effects in which you are saying “This reference is owned by that thread for this period and then the ownership moves to that.” That's a rather sophisticated thing to do. It's very clever, I'd be very interested in papers written about it.

JA: I don't like the idea of mixing things too much.

SPJ: What sort of mixing?

JA: If you look at Haskell, Erlang, Scala and F#, what do you see? If you look at Haskell, you see something which, within its context, within the little framework it sets up for its sandbox, it's very consistent, it's very beautiful. You look at Erlang, it kind of fits, the bits fit together nicely. Of course, they don't fit together nicely with the JVM or with .NET or anything like that. If you want to use all the nice things that are there, you can't use them, or you can use them, but it's difficult. So the other approach is, you say “Let's use the JVM and target lots of different languages, so that the different languages can use each other” or you can do that within the .NET framework, you get Scala and you get F#.

The benefit there is you can use all these other things that are available, but in order to do them, you have to break them, massively corrupt and break these abstraction boundaries and I don't like that. I think you are breaking abstraction boundaries in the wrong place. How I would like to see systems built is through communicating back boxes. And I would like to see the type systems applied to the definition of the protocols themselves and I haven't seen that done.

SPJ: That's been lots of work on that kind of stuff in the type systems for contracts.

JA: Yes, I've done some stuff myself, but it seems to have very little impact on how protocols are designed. Protocols are designed, typical ways are, if you write RFCs and things like that and they are defined in English with text and a lot of people think they are defining protocols, but they are in fact defining message structures and they don't say anything about ordering…

SPJ: You must send an A and then a B and they must be all applied.

JA: Yes. Classically, an API for files, say, you can open a file and you can close a file and you can read a file and you can write it. They don't tell you that you have to open it before you can read it, do they?

SPJ: The very idea that there is a thing that you do things to it is part of what gives rise to that problem.

JA: In a lot of programming languages, if you have a file open and read and write, the programmer is supposed to know just by magic that you should open it before you read it. That makes sense on one processor - what does it mean on a parallel program? What happens when you have 2 things that try to open the file at the same time? Do you get 2 copies and one of them fails? Is it like a transaction? Is the file put back to the old state? If you start trying to read the APIs, you can't find the answer in the API. John Hughes was saying “Did you know that if you open a file and write it and rewind it and do this and do this then this happens, which is very counterintuitive” and you try and look in the POSIX specs, and you can't find any reference to what they should do. This good check threading discovers things like this. If you are doing I/O properly, in the sense that you are doing it, you don't do it at all and save it all up to the end, then do it! A lot of those problems go away.

SPJ: We don't actually. It just looks like it. We certainly don't save it all up to the end!

JA: You're fibbing!

SPJ: We are tackling the awkward spots. There are very detailed operational semantics given that explains that I/O happens as you go. It's no good if you've got to print the message and then read the response and then print something else. You can't save that up to the end!

That takes us to a discussion about user's model. For me, as a user of Haskell, I have a mental model that everything is happening by the end. That's what the type system tells me - that everything is saved until the end.

SPJ: When you mean the user model, you mean the programmer? Not the user user?

Yes. This is in sync with how things happen, but sometimes it gets out of sync. It's the same thing with when you are dealing with processors. You are sending messages, but sometimes it's maybe not that cheap to send a message and also it hits barriers of using this mental model that you established. What do you think of this problem?

SPJ: I don't think I understood the question.

With any feature of the language, you define a mental model for the programmer, so that the programmer can reason about this feature. The implementation tries to get in sync with this mental model, but it's not always true. In some cases, it gets far from it. It's the same thing with processors, when you are sending messages. It's not exactly the same thing, because you need to copy or you don't need to copy. Sometimes it's not performant to do it that way, you do it another way. What do you think of this problem? The same thing with I/O, that as a programmer I think that is happening by the end, but it is actually happening on the fly, it's happening right away.

SPJ: It is happening right away and by design. I think the presentation that I gave today which said the main program is an I/O action, which is gotten by stitching together other things, that's a first approximation. I've taken quite a lot of trouble to add a tutorial that explains in rather more explicit details. I mean, that model doesn't work at all for a concurrent program, for example, when you are spawning threads that have got to talk to each other, of course you aren't going to save all that up to the end.

So no, the I/O monad is an action that runs in an interactive fashion with other I/O threads that are going on at the same time and it's possible give a much more precise model for what's happening than say you save up one big thing that's performed at the end. I think the truth of it then, the one big thing you do at the end is just an oversimplification you might do for your first program that just printed 3 things, but then you have to get a slightly more sophisticated understanding of what's going on.

Or, alternatively, just bail out into “well it looks like a C program and actually it runs like a C program”, you say “Do: print this, read that, send this message to this other process” and that's exactly what happens, just as if it were a C program. You can look at a monadic I/O Haskell program with 2 sets of spectacles. You can either think of it very much like an imperative program and that's a very accurate model actually, or you can think of it as the compiler does in this more functional way and that's what means that the compiler's optimizations are actually robust. I suppose that's the way I deal with the dissonance, is to try to say “that was an oversimplification to start with”.

JA: You don't tell all the truth.

SPJ: That's right, the truth in bits.

JA: For example, it might sound that when you send messages between Erlang processors, your mental model of what's happening is you copy everything. And indeed we do copy everything, apart from - now there's small print - if it's a big binary and you are on the same processor, then it's in a shared heap with a reference counting garbage collector.

SPJ: And it's immutable, of course.

JA: Yes, but it's like… What's it called in operating systems, when you have a process, if you don't write to the memory, you've got the same memory, but you do copy on write. So, when you write to the memory, you split it into 2. The binaries behave like that and they are in a shared heap.

SPJ: You have to be careful about your cost model. If somebody suddenly redistributed that, so it suddenly was on a different processor…

JA: If you distributed, then it is copied.

SPJ: That's what I mean - suddenly the program would run at a very different speed.

JA: Even that statement, that it's on a shared heap and binaries are not copied, is not true for small binaries because small binaries are copied, because it's just not worth sharing them and doing things. Even that statement is not quite true, because sometimes you keep fragmented ones - there is an awful lot of small print.

SPJ: It's a business of a runtime system to provide a simple abstraction for complicated stuff.

JA: Should a programmer know about that? No.

SPJ: No. But we are not talking about the cost model, right? Whereas you were talking about the understanding of what semantics of the program is.

Yes. When performance is a question, you need to think about performance. In the same sense, how much should the programmer know about what's actually happening with processors or processes? What we saw in your presentation is that you started with a quite transparent system where the programmer doesn't know on which thread things are executing, and you ended up with something very explicit saying “Your processor executed this, the other processor executed something that is similar to Erlang”. What do you think?

JA: I think when you are saying the programmer needs to think about performance, I wouldnt' say that - I'd say the programmer needs to measure the performance of the programs, because the one area where my intuition is really wrong is in guessing performance, apart from the fact that I know the input. Parsing stuff is usually the big bottleneck apart from that. When you put these, when you have virtual machines running on top of multicores with core affinities and you put virtual machines inside the virtual machines, your intuition about performance goes out of the window.

Sometimes you give some hints to the compiler like “This is a future, this is delay” or whatever it is and sometimes you are really explicit and say “I need a process to execute this”. Or, the same thing with data parallelism - I say “I want this list to be executed in several processors”, so I'm very explicit about this.

JA: I think that annotating your program with parallel things is very very difficult. Your intuition would probably be wrong.

SPJ: What you're saying is you need good profiling tools. Your own intuition about how sequential programs run is often wrong, too. It's silly to spend time optimizing code that isn't going to run very much, so you need good profiling tools. One of the things that GHC, our compiler, has not been good at until really quite recently is having good profiling tools for parallel programs. Because you really want to know, why isn't that concurrent? Or maybe it is concurrent but the granularity is too small, so why? You need good insight into what's going on and that's quite hard to do. My colleagues Satnam Singh and Simon Marlow are working on a device called ThreadScope, that for the first time gives some insight into that. Lots of other people have built profiling systems for parallel systems - we've been lagging a bit there, really.

JA: In Erlang everything is parallel because we start up with lots of processes. SPJ: Do you have any profiling tools?

JA: Yes, sure.

SPJ: So, you can see “Oh, this process is stalled.”

JA: What's interesting is when you are finding the sequentiality. Sometimes the system just degenerates to one sequential process and you think “Why? What's going wrong here?” That goes back to algorithms, because, a lot of the sequentiality is in this one place where you have to do allocation. If you're booking seats for an aircraft or something, you've got one allocator, but if you wanted to parallelize that, then maybe you've got 2 allocators and you have all the odd seats and you allocate all the even seats from there. You can't sit next to your wife or something. We might give half the seats to one guy and another, and you want better than that so you split it into 4. So probably you end up with this logarithmic allocator that's message passing to get rid of these bottlenecks.

SPJ: You've got to narrow it down in the first place. That's why you need the profiling for.

JA: I think the things you see in clusters - in a cluster you have variable latency, you have load balancing and things like that - you are going to have to put those in the underlying runtime system, sort of hide them under the counter and do exactly the same thing that people have done in clusters for years. Because a big multicore is a cluster and it's on a chip and the latencies are different - the different cores have different caches, so you can send messages between 2 cores very quickly and between 2 other cores very slowly, because of the memory cache architecture. I think that needs to be reflected in the internal load balancing and those sorts of things. A programmer shouldn't see that!

SPJ: They are not new problems. People have been working on them for quite a long time, but I think the difference is, in languages like Haskell and Erlang, is that the programming abstractions are rather high-level. The high-level programming abstractions are good for the programmer, because it means that they don't have to worry about things, but it means that the runtime system and so forth has to cross a bigger abstraction gap. So the programmer in effect has less control. The more control a programmer is prepared to take, well then everything becomes easy. Not only does the programmer make better decisions, perhaps, at the cost perhaps of a lot of work, but also if you do have profiling tools at a lower level, then those things are easier to measure and present to the programmer. At a high level it's quite hard to think what to measure or how to present it. I think we are still finding out how cross that abstraction barrier.

JA: With each new generation of chips, it's… We don't know what's going to happen.

SPJ: How you measure abstraction withing the hardware, yes, as well. You have much less control than you think, really.

JA: It's quite funny, because we got these new Nehalem processors and boards and I don't know what's going to happen. We'll make measurements on it. I think it's rather exciting now, because the last 10 years I knew what was going to happen more or less, I could predict, and then the big multicore start coming out and the Tilera chips and things like that. What is going to happen? I don't know, it's back to experimentation. Look at the Intel Polaris chips! 80 cores or so! A gigaflop at 11 watts!

SPJ: Some of it's a big toss-up though, because these machines have lots of layers of abstractions in them and they are all not quite under your control. So, you spend a lot of time measuring things and sometimes it goes fast and sometimes it doesn't, so you might discover that the reason it's going slowly is because 2 unrelated variables happen to be allocated in a place that was in the same cache line. There are better things to do with your life than figure out the 2 variables! That kind of work doesn't feel productive to me, but the more sophisticated the underlying hardware gets, the more you have to do that. So, I don't know quite what to do about that, except those systems that have some kind of built-in locality, that's the data parallelism stuff I was talking about. That's my best hope for dealing with these machines with very nonuniform memory access.

JA: In Erlang we might start migrating processes, you find that these 2 processes are talking to each other and just migrate and put them in the same core.

SPJ: That'd be lovely! It's difficult to do well. We have our own stacks and heaps, of course, so we can move things around very freely, but then we find out, “The reason things are going so slowly is because this processor keeps migrating around, it's never in the right place. It keeps migrating where it thinks it's meant to be, but actually it turns out it's always in the wrong place. It's really annoying when that happens and it always happens on the test program that the first student writes. Real systems, it's often more OK, it's like factorial. You write factorial and it doesn't perform well.

SPJ: So, I've told you what I most envy about Erlang. What do you most envy about Haskell?

JA: All the types. I mean they're very nice. I wish we had them. On the other hand, wouldn't you love to have all these generic turn-to-binary, these sort of things? How can you live without them?

SPJ: I have a little bit of residual envy about generics.

JA: You just take anything and compare it to the serializer and then send it? SPJ: That's sinfully easy, and shouldn't be allowed.

JA: And the garbage collector doesn't need to know about what types are generated.

SPJ: Oh the garbage collector, that's not a problem. Our garbage collectors are complicated, not because of types, it knows nothing of types. Garbage collector's fine, but I have to admit that, just walking over trees of other trees, generic programming, lack of types is very convenient. Haven't you enjoyed the whole cottage industry of papers, about typed generic programming, which is being extraordinary creative, people have done extremely clever things, so that would have never happened. Necessity is the mother of invention.

JA: I read “The Real World Haskell”, very nice. When I'm reading “The Real World Haskell”, I'm started off reading the simple programs - they were obvious. The more complicated programs - I was writing the Erlang program in a margin next to it to understand it. And at that point I saw how similar they are. The other thing is, you know you talk about the first thing a Haskell programmer must do is write down all the types, and then they're doing the design. That's exactly the first thing I do, in my head. And I write the odd comment down now and then.

SPJ: I just want you to have a way to do that explicitly!

JA: When you're developing Erlang, you don't have to write the entire program to get it to work, and you're not quite sure what the types are. In Erlang you just write some program here, and you evaluate something and it throws out this great big thing and you say: “Oh, that's what the type was”. Then you can say: “Well that wasn't quite right, let's change it to that”. But you're not restricted in having to get all the bits working before you can just do those experiments. Wouldn't you miss something like that?

SPJ: You still don't have to get all the bits working, you have to think of what type you might have, but you might evolve that type.

JA: Or you can just run it and see. We have things called parse transforms which allow you to change them. In a module you can say there's a parse transform, and the parse transform is given the parse tree of the program and it can turn it into another parse tree before it's given to the compiler. That's a way of doing deeply sinful things that you normally can't do. And the nice thing about that is, somebody has asked me “How do I get started with this?”

Well, there is a function, epp parse_form(), which just takes an Erlang module and it produces all the abstract forms in it. And then you dump it into a file. Then you just open it up in a text editor; you don't need to know anything about the types, you just read it. It's pretty obvious - it says “This is the function” and you can sort of guess what the stuff does.

SPJ: You're looking at the parse tree of a particular component…

JA: Of a module, yes. And because it's got this generic sort of structure, it's dumped it all out and you can look at it. You don't really need to know what the types are. You can guess “Well, I can see that's a function application so I can start transforming it. You write generic functions that walk down the whole parse tree and extract all the variables from it without knowing what form of tree it is - tiny little functions. Don't you miss things like that?

SPJ: I do, actually.

JA: Actually, we're biased. We should ask John Hughes or Simon Thompson who have more experience of both worlds.

SPJ: So John has promised me a paper on why static typing is harmful. Because he's someone who grew up with static types and he's done a lot of Erlang programming now, he has quite a visceral understanding of both, and he actually, he's told me that he will write a paper about this.

JA: So, to me, good functional programmers have come to Erlang. Both Phil Wadler and John Hughes, when they came to Erlang, wrote the most bizarre Erlang programs I've never seen. The first thing that they do is they define a macro, which is “freeze” or “force”, which makes it lazy because it just returns the func, and they are forced to evaluate it. And then they write their entire program which looks to be, from an Erlang point of view, like a Klein Bottle - the entire program is inside out, and it just sort of ends with this “force” or “freeze”. And they write completely incomprehensible Erlang code. It all works beautifully.

SPJ: I don't think that has anything to do with types though.

JA: No, it has to do with lazy evaluation, they force you to think in a different way.

SPJ: They want to write lazy Erlang programs.

JA: Yeah, they do. That's the first thing they do. It's really weird Erlang. But it works!

SPJ: When you are going to write a lazy program, you probably wouldn't choose Erlang as your first language. Going back to what you were saying, when you start thinking about your Erlang program, you do think types in your head, so the ability to write down in a formal way and have them checked… Perhaps, while not denying an exploratory program, but in fact several people, more than one of us, like Phil in the first place and Kostis or some of his colleagues have designed soft type systems for Erlang, but they never really caught on, and I'm never quite sure why.

JA: If you're talking about the Dialyzer, we do run that.

SPJ: You do use it on a regular basis.

JA: Absolutely, especially in big projects. It's even part of… We have automatic testing, we run through these, and we look through. But they actually… By the time they've been through the test suites, it picks up very few errors.

SPJ: But you use it as an error checker rather than a design tool. I mean you don't use it to write down your types is what I mean, you don't use it as part of your design flow. Because I think that's probably the most powerful thing about types is to use it as a design tool.

JA: I think we should have gone down type checking rather than type inference, where we start by writing the type signatures and then type check, rather than just inferring what we've got.

SPJ: You are saying that writing a type doesn't mean the program is correct, but in some says for me, that's the whole point. A type is a weak specification, it's a partial specification which means that it can be small, so I can say in one line something that is meaningful to me as a programmer, and that helps me glue my program together without having to specify lots of details about exactly what it does - it's called “lookup”, so I know what it does, more or less. Its type tells me a lot of detail.

JA: I must admit to a secret. When I'm thinking about a program, you know you've got all these mental dialogues before you write the program, and my mental dialogue is as if I'm writing a blog, a blog software. A blog is a sequence of chunks, and a chunk is a title or a paragraph and a paragraph is a something or a something. I'm very strictly having this little dialogue.

SPJ: So secretly you're writing a Haskell program in your head, but it comes out in Erlang.

JA: Then I type it in. And actually for difficult problems, I write these down in English or in formal notations.

  • armstrong_peyton_jones_interview.txt
  • Last modified: 2009-09-28 12:16
  • by