I deleted the numerical checks a while back, after confirming the backward pass is correct, to keep the code base lean. Running https://github.com/markusheimerl/gpt/blob/main/transformer/a... is also somewhat of a confirmation that the backward pass is correct, since an analytically incorrect backward pass can't fit perfectly to synthetic data.
The point of automation is that it reduces the bargaining power of human workers, but they still have to trade their time and effort for wages. When automation improves, the wage proportion of the created value shrinks.
Housebuilding is motivated as much by need for housing as the prospect of increasing market value, "building equity" and so on. This prospect depends most strongly on location, which means the land price dominates the cost of construction.
Not entirely true. AstraZeneca and ABB are examples that remain partly Swedish but many companies were merged into big multinationals and eventually marginalised.
My understanding after scanning the code examples is the technique expands the dimensionality of each data point with a set consisting of the quadratic coefficients of its existing dimensions. I thought it sounded like kernel PCA.
It seems to be a common reflex of Rust advocates that, whenever an issue with using the language is asked about, the response is "That's just a garbage-collected code pattern" followed by "and therefore you shouldn't want it." It's happened multiple times in this thread. [Edit: and both the times I was thinking of were from you, so I need to weaken that conclusion]
Aside from having vibes of "I've chosen to get hit weekly in the face with a baseball bat, but have learned to like it, and so should you" it's also seldom true.
All three of these examples are also quite easy to do with C and C++. It's not about garbage collection.
"It's possible to write <lang-X> in <lang-Y>" is a common trope, but "It's possible to write <lang-X> in Rust" is painful and borderline impossible in my experience. I don't mean this as a defense of rust, I just think it's why the learning curve is so harsh.
Rust makes you be explicit about memory management. I guarantee you if you threw everything into a box inside an Arc, your copy closures would have still worked just fine and your Haskell idiom would translate cleanly. Only now everything is heap allocated and reference counted. Before LLMs took over the reins, this was the hallmark of beginner rust code because it WOULD work, just with unnecessary allocations and copying and pointer dereferencing.
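For readers who have not seen the "put it all behind an Arc" style, here is a minimal sketch of what that might look like for Haskell-style function composition. The `Shared` alias, `compose`, and `demo` are hypothetical names made up for illustration, and this is only one reading of the suggestion above, not the original poster's code. Every closure is heap allocated and reference counted, so cloning a handle is cheap and nothing needs a borrowed lifetime.

```rust
use std::sync::Arc;

// Hypothetical alias: a heap-allocated, reference-counted closure.
type Shared<A, B> = Arc<dyn Fn(A) -> B>;

// Compose two shared closures, roughly like `f . g` in Haskell.
fn compose<A: 'static, B: 'static, C: 'static>(
    f: Shared<B, C>,
    g: Shared<A, B>,
) -> Shared<A, C> {
    Arc::new(move |x: A| (*f)((*g)(x)))
}

fn demo() -> (i32, i32) {
    let add_one: Shared<i32, i32> = Arc::new(|x: i32| x + 1);
    let double: Shared<i32, i32> = Arc::new(|x: i32| x * 2);

    // Cloning an Arc only bumps a reference count; the closure is shared.
    let both = compose(Arc::clone(&add_one), Arc::clone(&double));
    let again = Arc::clone(&both);

    // add_one(double(3)) and add_one(double(10))
    ((*both)(3), (*again)(10))
}

fn main() {
    let (a, b) = demo();
    println!("{a} {b}");
}
```

The cost is the one described above: an allocation per closure and a pointer dereference per call.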
Rust makes the tradeoffs explicit, and Rust programmers tend to obsess over minimizing those tradeoffs to get abstractions that are zero-cost. So doing it “the rust way” is often very complicated and tricky to get right while satisfying the borrow checker and type system, but once found is lean, fast, clean, and safe.
Oy. It is also a common experience that, when I struggle in Rust to use a pattern that's common in nearly every other language or find another way to achieve the same goal, people who know of my Haskell background call it a "Haskell pattern," and thereby avoid facing the suggestion that their favorite language is missing some pretty basic affordances.
No, boxing everything does not magically make things more dyn-compatible. It will not magically solve the issue that tokio does a whole-program transformation that does its most restrictive checking only after all local checks have been resolved. It will not magically allow more reuse between datatypes. It will solve none of the problems I encountered... because if beginner-Rust could solve any of these problems, then they would have ceased to be problems for me by the time I became intermediate.
> Rust programmers tend to obsess over minimizing those tradeoffs to get abstractions that are zero-cost. So doing it “the rust way” is often very complicated and tricky to get right while satisfying the borrow checker and type system, but once found is lean, fast, clean, and safe.
You and I must be using very different definitions of "lean." For me, "complicated" and "lean" do not go together.
I am sorry that Rust isn't for you. There is beauty in a systems programming language, but you have to be willing to think as a systems programmer. That's not for everybody.
This is getting pretty funny. In this branch of the thread alone, I've seen the defenses of: (1) "Rust is fine, you're just expecting the affordances of a GC language." (2) "Rust is fine, you're just expecting the affordances of Haskell," and now (3) "Rust is fine, you're just not used to systems programming"
It's okay if your language has problems (I have plenty of criticisms of my favorite languages), but I find it odd and concerning how frequently I've seen Rust programmers try to deflect instead of engaging in criticism.
I actually have a huge systems programming background and identify as a systems programmer. C and C++ by and large do not have the problems I've written about. These things are Rust problems, not systems problems.
> difficult or impossible in Rust were to me pretty basic patterns for modularity
Many things are plainly not permitted, either because the borrow-checker isn't clever enough, or the pattern is unsafe (without garbage collection and so on).
Many functional/Haskell patterns simply can not be translated directly to Rust.
That "and so on" is doing a lot of work. You may accept rejecting garbage collection as a reasonable trade-off, but the bulk of the cost is coming from a much more aggressive tradeoff Rust is making which is at odds with the goals of most application code.
A deeply-baked assumption of Rust is that your memory layout is static. Dynamic memory layout is perfectly compatible with manual memory management, but Rust does not readily support it because of its demands for static memory layout.
A very easy place to see this is the difference in decorator types between Rust and other languages like Java. Java's legacy File/reader API has you write things like `new PrintWriter(new BufferedWriter(new FileWriter("foo.txt")))`, where each layer adds some functionality to the base layer. The resulting value has principal type `PrintWriter` and can be used through the `Writer` interface.
The equivalent code in Rust would give you a value of type `PrintWriter<BufferedWriter<FileWriter>>` which can only be passed to functions that expect exactly that type and not, say, a `PrintWriter<BufferedWriter<StringStream>>`. You would solve this by using a generic function that takes a `T where T: Writer` parameter and gets compiled separately for every concrete type it is used with, thus contributing to Rust's infamous slow build times.
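A rough sketch of the nesting described above, using the standard library's `LineWriter` and `BufWriter` as stand-ins for the hypothetical `PrintWriter` and `BufferedWriter` (the function names `write_greeting` and `greeting_via_nested_writers` are made up for illustration). The concrete type spells out every layer, and the generic function is instantiated once per distinct writer type it is called with.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

// Generic over the concrete writer: the compiler emits one copy per distinct `W`.
fn write_greeting<W: Write>(out: &mut W) -> io::Result<()> {
    out.write_all(b"hello\n")?;
    out.flush()
}

fn greeting_via_nested_writers() -> io::Result<Vec<u8>> {
    // Every decorator layer shows up in the static type.
    let mut out: LineWriter<BufWriter<Vec<u8>>> =
        LineWriter::new(BufWriter::new(Vec::new()));
    write_greeting(&mut out)?;
    // Peel the layers back off to reach the underlying buffer.
    Ok(out.get_ref().get_ref().clone())
}

fn main() -> io::Result<()> {
    let bytes = greeting_via_nested_writers()?;
    println!("{}", String::from_utf8_lossy(&bytes));

    // A different nesting is a different type, so this call is a second instantiation.
    let mut other: BufWriter<io::Sink> = BufWriter::new(io::sink());
    write_greeting(&mut other)
}
```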
It would be perfectly sane, and desirable for application code, to be able to pass around a PrintWriter value as an owned pointer to a PrintWriter struct which contains an owned pointer to a BufferedWriter struct which contains an owned pointer to a FileWriter struct. You could even have each pointer actually be to a Writer value of unknown size, and thus recover modularity.
In Rust, there is sometimes a painful and very fragile way to do this: have each writer type contain a `Box<dyn Writer>`, effectively the same as the Java solution above. This works, except that, if one day you want to add a method to the Writer trait that breaks dyn-compatibility, then you will no longer be able to do this, and will need to rewrite all code that uses this type.
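To make the boxed approach concrete, here is a minimal sketch assuming the intended type is an owned `Box<dyn Writer>`. The `Writer` trait and the `StringSink`, `Upper`, and `Prefixed` types are hypothetical, invented for this example; they are not a real library API. Each layer owns a pointer to "some writer" of unknown size, so callers only ever see `Box<dyn Writer>` no matter how many layers are stacked.

```rust
// Hypothetical trait standing in for the thread's `Writer`; not a std API.
trait Writer {
    fn write_str(&mut self, s: &str);
    // A `Box<Self>` receiver keeps this callable through `Box<dyn Writer>`.
    fn finish(self: Box<Self>) -> String;
}

#[derive(Default)]
struct StringSink {
    buf: String,
}

impl Writer for StringSink {
    fn write_str(&mut self, s: &str) {
        self.buf.push_str(s);
    }
    fn finish(self: Box<Self>) -> String {
        let this = *self;
        this.buf
    }
}

// Decorator: upper-cases everything before forwarding it.
struct Upper {
    inner: Box<dyn Writer>,
}

impl Writer for Upper {
    fn write_str(&mut self, s: &str) {
        self.inner.write_str(&s.to_uppercase());
    }
    fn finish(self: Box<Self>) -> String {
        let this = *self;
        this.inner.finish()
    }
}

// Decorator: writes a prefix before every chunk.
struct Prefixed {
    inner: Box<dyn Writer>,
    prefix: String,
}

impl Writer for Prefixed {
    fn write_str(&mut self, s: &str) {
        self.inner.write_str(&self.prefix);
        self.inner.write_str(s);
    }
    fn finish(self: Box<Self>) -> String {
        let this = *self;
        this.inner.finish()
    }
}

// Not generic: one compiled copy, works for any stack of layers.
fn emit(mut out: Box<dyn Writer>) -> String {
    out.write_str("hello");
    out.finish()
}

fn main() {
    let plain: Box<dyn Writer> = Box::new(StringSink::default());
    let layered: Box<dyn Writer> = Box::new(Prefixed {
        prefix: "> ".to_string(),
        inner: Box::new(Upper {
            inner: Box::new(StringSink::default()),
        }),
    });
    println!("{}", emit(plain));
    println!("{}", emit(layered));
}
```

The fragility mentioned above shows up as soon as `Writer` gains a method with a generic parameter or a by-value `Self` receiver without a `where Self: Sized` bound: at that point `Box<dyn Writer>` stops compiling.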
You can usually manage dyn compatibility issues in my experience by writing a base trait that is not dyn compatible and then an Ext trait that is, which is auto implemented for all implementers of the base trait. You see this pattern all over the place, including with several of the buffer traits you mentioned.
Mostly, this works out well enough: dyn compatibility pretty much just insists your methods can in fact work with just a reference to an unknown variant of the type.
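One way to read the two-trait pattern described above, as a small sketch: the base trait keeps its generic method (so it is not dyn-compatible), and a companion trait with only dyn-friendly methods is blanket-implemented for every implementer. The names `Writer`, `DynWriter`, `VecWriter`, and `log_to` are hypothetical. Real crates vary in which of the two traits is the dyn-compatible one, so treat this as an illustration rather than the canonical form.

```rust
// Base trait: the generic method makes it NOT dyn-compatible.
trait Writer {
    fn write_all<I: IntoIterator<Item = u8>>(&mut self, bytes: I);
}

// Companion trait: no generics, so `dyn DynWriter` is allowed.
trait DynWriter {
    fn write_slice(&mut self, bytes: &[u8]);
}

// Blanket impl: every `Writer` gets `DynWriter` for free.
impl<T: Writer> DynWriter for T {
    fn write_slice(&mut self, bytes: &[u8]) {
        self.write_all(bytes.iter().copied());
    }
}

struct VecWriter(Vec<u8>);

impl Writer for VecWriter {
    fn write_all<I: IntoIterator<Item = u8>>(&mut self, bytes: I) {
        self.0.extend(bytes);
    }
}

// Code that wants dynamic dispatch depends only on the companion trait.
fn log_to(out: &mut dyn DynWriter) {
    out.write_slice(b"ok");
}

fn main() {
    let mut w = VecWriter(Vec::new());
    log_to(&mut w);
    println!("{:?}", w.0);
}
```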
Good suggestion. I think I started doing that kind of thing towards the end of my days with Rust. It's been close to a year now, and I don't remember how well it worked out.
Some people ask me why I do not use Rust as opposed to C++ if it is already safer and more modern.
But I see the forums (and I also tried some toy stuff at times) plagued with rigidity problems that in C++ have obvious solutions.
For example, I am not going to fight a borrow-checker all the stack up to get a 0.0005% perf improvement, if any, when I can use smart pointers.
I am not going to use Result everywhere when I can throw an exception and get done with it instead of refactoring all the stack up for the intermediate return types (though I use expected and optional and like them, but it is a choice depending on what I am doing).
I am not going to elaborate safe interfaces for my arrays of data I need to send to a GPU: there is no value in it and I can get it wrong anyway, it is ceremony. I assume this kind of code is unsafe by nature.
I find C++ just more flexible. Yes, it has warts, but I use all warnings as errors, clang tidy and have a lot of flexibility. I use values to avoid any trace of dangling and when it is going to get bad, I can, most of the time, switch to smart pointers.
I really do not get why someone would use Rust except for very niche cases like absolutely no memory unsafety (but this is not free either, as some reports show: you need to really be careful about reviewing unsafe if your domain is unsafe by nature or uses bindings to keep Rust invariants or you write only safe code, in which case, if memory safety is critical, it does give you something).
But I do not see Rust good for writing general application code. At least not compared to well-written C++ nowadays.
“I’m not going to use Rust because I don’t like it” seems like what you’re saying, which is totally fine. Plenty of people, myself included, manage to write and enjoy writing general application code in Rust. You’re allowed to not get it, just like I’m allowed to dislike writing C++.
No. That is not what I am saying. I am saying there are contexts where you do not get value out of it and you can potentially decrease your productivity because it is more rigid. You have examples above if you want to read through.
In no way am I saying it is useless. I just see niche uses for it compared to alternatives.
I read most of your comment as phrasing the things that make rust unique as being additional burdens relative to what you would prefer, which is fair, but often they are what I appreciate about the language. Explicit result types are a great example.
Rigidity is a trade off: it can make initial development slower but refactors significantly easier, just as an example.
I don’t think any of your examples show it to be niche. It operates well in most of the space where C++ is a good option, and a bit beyond that (embedded, firmware, but also higher level things where you want performance but don’t want to worry about memory safety).
> but also higher level things where you want performance but don’t want to worry about memory safety).
Well, at the cost of having a straitjacket. Result without option for exception handling is an example. You need to refactor all the way up if you notice that suddenly when refactoring you needed a Result because a new error appears that could not happen before or you need to preventively spam Result everywhere since the start. You need to handle those all the stack up. The borrow checker is also rigid. I do know why it exists. I understand its value. I am just talking about the toll it imposes while coding, and wondering if it is a good default (I think most of the time it is not, but when you need it, it is invaluable, however these cases are a minority).
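A small sketch of the "all the stack up" effect, with made-up function names (`parse_port`, `load_config`, `start`). Suppose `parse_port` originally could not fail; once it can, every caller between it and the place that handles the error has to change its return type too, even though the `?` operator keeps each body short.

```rust
use std::num::ParseIntError;

// Leaf function: this is where the new failure mode appears.
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

// Intermediate layer: does nothing with the error, but its signature
// still has to mention it so `?` can pass it along.
fn load_config(raw: &str) -> Result<u16, ParseIntError> {
    let port = parse_port(raw)?;
    Ok(port)
}

// Another layer up: same story.
fn start(raw: &str) -> Result<String, ParseIntError> {
    let port = load_config(raw)?;
    Ok(format!("listening on {}", port))
}

fn main() {
    match start("8080") {
        Ok(msg) => println!("{}", msg),
        Err(e) => println!("bad config: {}", e),
    }
    if let Err(e) = start("not-a-port") {
        println!("bad config: {}", e);
    }
}
```

With exceptions, only the leaf and the handler would mention the failure; here the intermediate signatures carry it explicitly, which is the cost being described and also what the other side of the thread counts as a benefit.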
Another insight is that when you really go low-level, most of the time you are working with unsafe interfaces probably. At that time, you are using unsafe and now you have to satisfy Rust's borrow checker. How? By hand. So you lost part of the value proposition.
Can you recover it? Yes. How? By reviewing that code. But if I have to review that code anyway, why not choose a language (in this situation I mean, there are situations where Rust is the better choice) where I can understand the invariants in unsafe code better, and where I have linters and a lot of established guidelines that are not difficult to follow? And by not difficult to follow I mean they are embedded in tooling like clang-tidy, not that I can follow them because I know a lot.
So for me it is not so obvious at all, especially in the presence of quite a few unsafe blocks. If you want it safe, at that time, you are starting to compete with other unsafe languages: you need human review anyways... if there is tooling in Rust for unsafe blocks (I can imagine there could be something), that improves things competitively for Rust in unsafe blocks. But if you need careful review, you are stuck again in the non-magical real world: things are safe if you checked absolutely everything.
> Rigidity is a trade off: it can make initial development slower but refactors significantly easier, just as an example.
That is certainly true. It is also true that in areas where you put this extra effort and quickly refactor, it makes things more difficult.
Refactoring, if you mix it with unsafe, needs a much more strict review than just pretending things are safe because you refactored and put things behind an unsafe interface and present it as safe.
I am not convinced at all this is what you need in most scenarios. The productivity impact is relatively high IMHO.
OTOH, if I really want correctness (real correctness!) but not absolute full speed, I think I can reach for OCaml (very practical) or Haskell (this one is also a bit too rigid actually sometimes).
So I am left in a situation where Rust just seems to be appealing for places where the most absolute memory safety is needed. But memory safety is still a composed characteristic of a running program: you have to take into account unsafe interfaces, bindings, etc.
So the only way to get real safety is anyways to review everything (if that is what you really want to deliver), probably proving your code, which anyway requires human intervention. Did we ever (even if less often) see crashes for invariant violations in code advertised as safe in Rust? Certainly yes. I acknowledge it is usually an improvement, but still not a guarantee.
So if it is not a guarantee and I can reach other tools where anyway the guarantee is there through GC or other mechanisms and where it is not I am equal to Rust, then, why bother?
Probably the only place where I see Rust appealing is where you need both max. performance and absolute memory safety (but you will still need the kind of reviews I mentioned if you spam unsafe and interact with bindings anyway). Those are niche cases, not the norm.
I see it as a suboptimal choice to write much of the application code in Rust, even when you need speed, compared to C++. C++ has very good tools for compile-time programming, expression templates, good warnings and linters, a big ecosystem and it is way more versatile (exceptions and results can be used, invariants in unsafe code are easier to follow since a borrow checker does not need to be satisfied "by hand").
So I am not sure at all Rust is the answer for a more or less mainstream general-purpose application language.
There is no magic bullet here, but I do know that when coding in Rust, the productivity toll I am paying is not negligible and I can reach for tools and techniques that make me very close or equal to that productivity.
*`dyn Writer`. `impl Writer` can only be used in function parameters.
This was one of the example approaches I gave. This works...at first. The problem is that, if you want to add a new function to the Writer trait which makes Writer no longer dyn-compatible, such as, say, any async function, then you can no longer write `Box<dyn Writer>` and need to rewrite all code that uses it.
(although you can dig under the hood and specify a pinned-down Future type, covering one kind of awfulness with another)
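For anyone curious what "specify a pinned-down Future type" can look like, here is a hedged sketch. The `Writer` trait, `MemWriter`, and the tiny `run_ready` helper are hypothetical and written only for this example. As far as I know, on current stable Rust an `async fn` in a trait makes that trait not dyn-compatible, while a method that returns a boxed, pinned future keeps `dyn Writer` usable at the cost of an allocation per call and a noisier signature.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical trait. Writing `async fn write(...)` here would rule out
// `dyn Writer`; spelling out the boxed future type keeps it allowed.
trait Writer {
    fn write<'a>(&'a mut self, data: &'a str) -> Pin<Box<dyn Future<Output = usize> + 'a>>;
}

struct MemWriter {
    buf: String,
}

impl Writer for MemWriter {
    fn write<'a>(&'a mut self, data: &'a str) -> Pin<Box<dyn Future<Output = usize> + 'a>> {
        Box::pin(async move {
            self.buf.push_str(data);
            data.len()
        })
    }
}

// Waker that does nothing; enough for futures that never actually wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Toy executor: polls once and expects the future to be done already.
fn run_ready<T>(mut fut: Pin<Box<dyn Future<Output = T> + '_>>) -> T {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("sketch only handles futures that finish on the first poll"),
    }
}

// Dynamic dispatch still works because the trait stayed dyn-compatible.
fn write_twice(out: &mut dyn Writer) -> usize {
    let first = run_ready(out.write("ab"));
    let second = run_ready(out.write("cde"));
    first + second
}

fn main() {
    let mut w = MemWriter { buf: String::new() };
    let n = write_twice(&mut w);
    println!("{} bytes: {}", n, w.buf);
}
```

A real program would use an actual async runtime instead of `run_ready`; the helper exists only so the sketch runs with the standard library alone.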
The reason for exp(x) is that its derivative is exp(x), which makes it possible to express the gradient of s(x) in terms of s(x), or both in terms of exp(x). This simplifies the computation of backward pass.
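To spell that out, assuming s is softmax: the partial derivative of s_i with respect to x_j is s_i * (delta_ij - s_j), where delta_ij is 1 when i == j and 0 otherwise, so the backward pass only needs the forward output. A minimal sketch that compares this closed form against central finite differences (the function names are made up for illustration, and the finite-difference part is the same kind of numerical check people use to validate a hand-written backward pass):

```rust
// Numerically stable softmax: subtract the max before exponentiating.
fn softmax(x: &[f64]) -> Vec<f64> {
    let max = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = x.iter().map(|v| (v - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// Analytic Jacobian written purely in terms of the output s:
// jac[i][j] = s[i] * (delta_ij - s[j]).
fn softmax_jacobian(s: &[f64]) -> Vec<Vec<f64>> {
    (0..s.len())
        .map(|i| {
            (0..s.len())
                .map(|j| {
                    let delta = if i == j { 1.0 } else { 0.0 };
                    s[i] * (delta - s[j])
                })
                .collect()
        })
        .collect()
}

// Central finite differences, used only to check the analytic form.
fn numerical_jacobian(x: &[f64], eps: f64) -> Vec<Vec<f64>> {
    let n = x.len();
    let mut jac = vec![vec![0.0; n]; n];
    for j in 0..n {
        let mut plus = x.to_vec();
        let mut minus = x.to_vec();
        plus[j] += eps;
        minus[j] -= eps;
        let (sp, sm) = (softmax(&plus), softmax(&minus));
        for i in 0..n {
            jac[i][j] = (sp[i] - sm[i]) / (2.0 * eps);
        }
    }
    jac
}

// Largest element-wise absolute difference between two matrices.
fn max_abs_diff(a: &[Vec<f64>], b: &[Vec<f64>]) -> f64 {
    a.iter()
        .flatten()
        .zip(b.iter().flatten())
        .map(|(p, q)| (p - q).abs())
        .fold(0.0, f64::max)
}

fn main() {
    let x = [1.0, 2.0, 3.0];
    let s = softmax(&x);
    let diff = max_abs_diff(&softmax_jacobian(&s), &numerical_jacobian(&x, 1e-6));
    println!("max |analytic - numerical| = {:e}", diff);
}
```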
I agree that "it has nice derivatives" is a great empirical reason to use a specific function in ML, but it doesn't sufficiently prove that it's the best function to use. And even if a derivative term looks more complex, that doesn't necessarily imply that it is more computationally expensive to compute, so that can't be the only criterion to select a function.
Luckily, there are more axiomatic reasons for why softmax is the preferred way to map inputs to a probability distribution.