> *Parametric polymorphism does not simplify code, it obfuscates it as now you h...

nwmcsween · on July 1, 2014

> A function, polymorphic or otherwise, does not influence code that does not call it.

Obfuscate does not mean influence, it obfuscates it you now have to follow the rabbit hole to where the type information is (language detail but it is how almost all parametric polymorphism is) which makes reasoning about it a pain.

> That's an implementation detail, and mostly false anyway.

No it's not 'unbound' parametric polymorphism in a compiled language has to produce symbols for any visible (not internal) function as there would be no way to know what might get called.

> Most languages that make use of parametric polymorphism don't duplicate code.

Yes any compiled language does how on earth would you call a symbol that took a bool vs a size_t (I recommend you look at how ELF works). On a somewhat related note, a sufficiently smart compiler could 'deduplicate' common parts of the slow path within a generic function and create calls, but that's about all it could do for deduplication without performance hits.

loup-vaillant · on July 1, 2014

You need practice with an ML derivative. Fetch a Haskell tutorial, then go write a little project of your choice that requires a few dozen lines.

> Obfuscate does not mean influence, it obfuscates it you now have to—

Wow, slow down. And give me an example, or I won't know what you mean.

> No it's not 'unbound' parametric polymorphism in a compiled language has to produce symbols for any visible (not internal) function as there would be no way to know what might get called.

Your lack of punctuation is hard to parse.

Anyway, it doesn't work like that. C++, for instance, doesn't instantiate a polymorphic function for every possible type. Instead, it instantiates templates only as needed, for the types they are actually used with. This is why the compiler can't fully check template code until you actually use it. (Notice how some error messages only surface when you use template code?)
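A minimal sketch of that "only checked when used" behaviour. The names `length_of` and `greeting_length` are hypothetical, made up for illustration. The template body would be an error for `int`, since `int` has no `size()` member, yet the file compiles because only the `std::string` version is ever instantiated.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper. x.size() depends on T, so it is only checked when
// the template is instantiated for a concrete T. length_of<int> would not
// compile, but nothing here asks for it.
template<typename T>
std::size_t length_of(const T& x)
{
  return x.size();
}

// This call instantiates length_of<std::string>, and nothing else.
std::size_t greeting_length()
{
  return length_of(std::string("hello"));
}

// Uncommenting the next line is what would surface the error:
// std::size_t broken() { return length_of(42); }
```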

Let me give you an example (untested code):

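  #include <cstddef>
  #include <vector>

  // Precondition: vec is not empty.
  template<typename T>
  T sum(const std::vector<T>& vec)
  {
    T total = vec[0];
    for (std::size_t i = 1; i < vec.size(); i++)  // vec[0] is already counted
      total = total + vec[i];
    return total;
  }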

Why the generic code? Because I'm likely to perform sums on integers, floating point numbers, complex numbers and other fancy stuff. I fail to see how this approach obfuscates anything, since it lets me write less code.

Now my C++ compiler will not compile this function for integers, floats, and every user-defined class I have in my program. If my program only uses it on vectors of integers, it will only instantiate the integer version, even if I have floats in my program.
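A self-contained usage sketch of the `sum` template, written with the loop starting at index 1 so `vec[0]` is counted once (the `int_total` and `double_total` wrappers are hypothetical). Each distinct element type used yields one instantiation, so this file gets exactly two, `sum<int>` and `sum<double>`, and nothing for types it never passes in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic sum. Precondition: vec is not empty.
template<typename T>
T sum(const std::vector<T>& vec)
{
  assert(!vec.empty());
  T total = vec[0];
  for (std::size_t i = 1; i < vec.size(); i++)
    total = total + vec[i];
  return total;
}

// These two calls cause exactly two instantiations: sum<int> and sum<double>.
int int_total() { return sum(std::vector<int>{1, 2, 3}); }
double double_total() { return sum(std::vector<double>{0.5, 0.25}); }
```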

> Yes any compiled language does how on earth would you call a symbol that took a bool vs a size_t (I recommend you look at how ELF works).

Muhaha, you foolish mortal. Let me tell you how this works in OCaml.

Under the hood, OCaml data are one of two things: an integer, or a pointer to heap data. The runtime can tell them apart thanks to the least significant bit: 1 when it is an integer, 0 when it is a pointer.

This is possible because on most machines, heap pointers are aligned to word boundaries, and words are almost always 16 bits or more, so the least significant bit of a pointer is always 0. Integers, on the other hand, lose one bit. On a 32-bit machine, for instance, OCaml integers fit in the 31 most significant bits. More precisely, when the program sees a 32-bit word whose value is 43, it knows that it's an integer, and that the underlying integer is 21 (meaning, 43>>1).
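The tagging arithmetic, written out as a few C++ helpers. This is a sketch of the scheme as described, assuming a two's-complement machine; `is_int`, `tag_int` and `untag_int` are hypothetical names (the OCaml runtime's C headers provide its own macros for this, such as `Val_long`, `Long_val` and `Is_long`).

```cpp
#include <cassert>
#include <cstdint>

// One machine word: either a tagged integer or a heap pointer.
using word = std::intptr_t;

// Lowest bit set: integer. Lowest bit clear: (aligned) pointer.
constexpr bool is_int(word w) { return (w & 1) != 0; }

// Equivalent to (n << 1) | 1, written as n * 2 + 1 to avoid
// left-shifting a negative value. Example: 21 -> 43.
constexpr word tag_int(word n) { return n * 2 + 1; }

// Drop the tag with an arithmetic right shift: 43 -> 21.
// Assumes >> on negative values is arithmetic (guaranteed from C++20,
// and what mainstream compilers do anyway).
inline word untag_int(word w)
{
  assert(is_int(w));
  return w >> 1;
}
```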

You will note this looks suspiciously like a dynamic language's runtime tags. It is a bit cleverer than that. The code is statically checked, so it never performs any runtime check on this tag. The garbage collector, on the other hand, knows little about the program, and needs a way to distinguish raw data from heap pointers to do its job.
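The collector's side can be sketched the same way: given the fields of a heap block, it follows only the words whose low bit is clear. The `count_pointers` function below is a hypothetical toy standing in for a real scanning loop, not OCaml's actual GC; it relies on heap objects being at least 2-byte aligned, which holds for a `new`-allocated `long` on mainstream platforms.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using word = std::uintptr_t;

// A word is treated as a heap pointer when its low bit is 0.
constexpr bool looks_like_pointer(word w) { return (w & 1) == 0; }

// Toy stand-in for a GC scanning a heap block: count the fields it would
// have to follow. Tagged integers (low bit 1) are skipped without being
// interpreted any further.
std::size_t count_pointers(const std::vector<word>& fields)
{
  std::size_t n = 0;
  for (word w : fields)
    if (looks_like_pointer(w))
      ++n;
  return n;
}
```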

Now, polymorphic code. In OCaml, a polymorphic function knows nothing about its polymorphic arguments. This is important, because it means the code inside will never inspect or modify those values at runtime. Take this example:

  let app f x = f x         (* app: ('a -> 'b) -> 'a -> 'b *)

`app` is function application reified into a function. Yes, it's silly in most cases. Bear with me. Look at its type. It accepts two arguments (of type `'a -> 'b` and `'a` respectively), and returns a value (of type `'b`). As you may have noticed, we have no frigging clue what those `'a` and `'b` mean. That's what it means to be polymorphic.

Now let's call the function on actual arguments.

  app (fun x -> x + 1) 42   (* the result is 43 *)

Okay, so the first argument is a function from integers to integers, and the second argument is… 42 (an integer). And poof, magic, it works.

Under the hood, it's not complicated. `app` knows that its first argument is a function, and it knows that the type of its second argument is compatible with that function. Since the first argument is a function, at runtime it must be represented by a pointer to the heap. More specifically, it points to a closure on the heap. We don't know much about this closure:

  +---+-------+
  | f | Data… |
  +---+-------+

We don't know anything about that `Data` stuff, but we do know that `f` is a pointer to code that will accept at least one argument.

Then there is 42. In the CPU it will be represented as 85 ((42<<1) + 1). But it doesn't matter. `app` doesn't know whether it's an integer or a pointer into the heap: from where it stands, that word is just an opaque blob of data. The only safe thing it can do with it is copy it around. (And the static type checker ensures it does no more than that.)

So… `app` has 3 things: a pointer to a closure, a pointer to a piece of code, and an opaque blob of data (which happens to be an integer, but it doesn't know that). What it must do is clear:

  - Push the opaque blob of data onto the stack.
  - Push the pointer to the closure onto the stack.
  - Call `f`.

And voilà, we have polymorphic code at the assembly language level. By carefully not inspecting the data, it works on every kind of data. No need for de-duplication.
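A toy C++ model of that calling scheme, under simplifying assumptions and not what the OCaml compiler actually emits: every value is one opaque machine word, a closure is a struct whose first field is a code pointer (as in the diagram), and the names `Closure`, `app`, `add_one` and `identity` are hypothetical. The point is that there is exactly one compiled `app`, and it never looks at `x`.

```cpp
#include <cassert>
#include <cstdint>

// Every value is one machine word: a tagged int or a heap pointer.
using value = std::uintptr_t;

// Closure layout from the diagram: a code pointer, then captured data.
// The code receives its own closure plus the "real" argument.
struct Closure {
  value (*code)(const Closure* self, value arg);
  // captured data would follow here; the closures below capture nothing
};

// The single, polymorphic app: it never inspects x, only passes it along.
value app(const Closure* f, value x)
{
  return f->code(f, x);
}

// Monomorphic code for (fun x -> x + 1): it knows its argument is a
// tagged int, and tagged(n) + 2 == tagged(n + 1).
value add_one_code(const Closure* /*self*/, value x)
{
  assert((x & 1) == 1);  // must be a tagged integer
  return x + 2;
}
const Closure add_one{add_one_code};

// Hypothetical identity function, to show app also works on other data.
value identity_code(const Closure* /*self*/, value x) { return x; }
const Closure identity{identity_code};
```

Here `app(&add_one, 85)` yields 87 (the tagged 43), while `app(&identity, p)` hands a heap pointer back untouched: the same machine code for `app` handles both.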

Still, we don't have our result. We just called `f`, which has 2 arguments to contend with: its closure, and its "real" argument, 42. Now, as you can see in the source code, `f` is not polymorphic at all. It works on integers, so it knows about its argument. Actually, it knows two things: the `Data` part of the closure is empty, and its argument is an integer. So it just adds 1 to 42 (possibly using some clever CPU arithmetic involving the LEA instruction), pops 2 elements off the stack, and pushes its result (43, which we represent as 87).
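About that "clever CPU arithmetic": the tag can be fixed up with a constant, so tagged integers can be added without untagging them first, since (2a + 1) + (2b + 1) - 1 = 2(a + b) + 1. A minimal sketch with hypothetical helper names (`tag`, `tagged_add`, `tagged_succ`); it shows the arithmetic, not what a given OCaml compiler actually emits.

```cpp
#include <cstdint>

using value = std::intptr_t;

// Tagged representation: n is stored as 2n + 1.
constexpr value tag(value n) { return n * 2 + 1; }

// Adding two tagged ints without untagging them:
// (2a + 1) + (2b + 1) - 1 == 2(a + b) + 1.
// One add plus a constant offset, which x86 can express as a single LEA.
constexpr value tagged_add(value ta, value tb) { return ta + tb - 1; }

// x + 1 on a tagged x is therefore just x + 2.
constexpr value tagged_succ(value tx) { return tagged_add(tx, tag(1)); }

static_assert(tag(42) == 85, "42 is stored as 85");
static_assert(tagged_succ(tag(42)) == 87, "43 is stored as 87");
```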

Now we're back in our polymorphic `app`, which has this opaque blob 87 at the top of the stack. Well, it just returns it to its own caller, who hopefully will know how to handle that data.

---

As I have just illustrated, there is no need for duplication in the first place. Polymorphic code in OCaml compiles to polymorphic code at the assembly language level. And this was a naive compilation scheme: "dumb" turns out to be sufficiently smart. De-duplication out of the box, if you will.

As for that "slow path" (implying a fast path somewhere), which is typical of JIT compilation: we don't have that shit in statically typed functional languages. The "slow path" is already fast, since it doesn't perform any test at runtime!