The problem in most of those cases isn't AI per se. Many of the issues you cited are specific to Anthropic, and many could have been avoided with better testing.
Yes, I am assuming the AI/LLM of choice you've implemented in your software engineering org is Claude, because as far as I can tell there aren't really alternatives that come close to its quality for software work.
The scenario you're describing seems like more of a language thing than a perception thing. We generally learn the names of colors by reference to common objects. I would argue that if people agree something is "red, like a strawberry, tomato, or apple," then it doesn't really matter what you're seeing; that color is red.
Our experience doesn’t become unimportant just because it’s lost in translation. It’s a paradox that we can’t know what X feels like to another person because communication is very lossy, but that does not warrant dismissal. We are not p-zombies, we do feel things.
In fact, the argument that "what we experience doesn't matter" is incongruous precisely because it is made by an entity that is experiencing something; the entity has no access to anything but experience.
I'm not saying our experience is unimportant. I'm talking about how we communicate what colors are. I'm not an expert by any means, but it seems like the way we communicate a shared understanding of what colors are is based on observing things that are the same color. I just don't think we have a way of communicating our subjective view of what a color looks like without reference to some other color.
This article[0] provides some details. Basically if you go through the lookup process on Apple's website and you don't have an existing D-U-N-S number, you can request one from D&B for free via Apple.
On this note, one thing I've found Codex to do is worry more than necessary about breaking changes for internal APIs. Maybe a bit more prompting would fix this, but I've found that even when iteratively implementing larger new features, it worries about breaking APIs that nothing but the new code uses yet.
One thing I've found super helpful for this is converting profiling results to Markdown and feeding them back into the agent in a loop. I've done it with a bit of manual orchestration, but it could probably be automated pretty well. Specifically, pprof-rs[0] and pprof-to-md[1] have worked pretty well for me, YMMV.
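If it helps, the capture side looks roughly like this. A minimal sketch assuming pprof-rs with its `flamegraph` feature enabled; `workload()` is a hypothetical stand-in for the code under test, and feeding pprof-to-md would presumably use the crate's protobuf output feature instead of the flamegraph shown here:

```rust
use std::fs::File;

fn workload() {
    // Placeholder work so the profiler has something to sample.
    let mut acc: u64 = 0;
    for i in 0..10_000_000u64 {
        acc = acc.wrapping_add(i * i);
    }
    std::hint::black_box(acc);
}

fn main() {
    // Sample the process at ~100 Hz while the workload runs.
    let guard = pprof::ProfilerGuard::new(100).unwrap();
    workload();

    // Build the report and dump a flamegraph; a converter like pprof-to-md
    // can then turn the profile into Markdown to feed back to the agent.
    if let Ok(report) = guard.report().build() {
        let file = File::create("flamegraph.svg").unwrap();
        report.flamegraph(file).unwrap();
    }
}
```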
Yes, but the problem is that the agent reads the profile and doesn't seem to really understand how to improve things. For example, it will see "cycles are spent in GC" and make up a bunch of reasons why that might be happening.
I worry about the costs from an energy and environmental impact perspective. I love that AI tools make me more productive, but I don't like the side effects.
The environmental impact of AI is greatly overstated. The average person would make a bigger positive impact on the environment by reducing their meat intake by 25% than by giving up flying and AI use combined.
Is this before or after you account for the initial training impact? Because that would need to be factored in for a good-faith calculation here, much as the companies would rather we didn't.
They wouldn't. Well, there's ETag and the like, but that's still a layer-7 round trip to the origin. The general pattern, though, is to declare in the response headers how long the content is good for, and to cache it for that duration. For example, a Bitcoin pricing aggregator might say the page is good for 60 seconds (with disclaimers on the page that this isn't market data), while My Little Town news might say an article is good for an hour (to allow updates) and the homepage is good for 5 minutes so a breaking news article doesn't appear too far behind.
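Concretely, that just means setting Cache-Control on the response. A minimal sketch using Rust's `http` crate; the function name `with_freshness` and the durations are purely illustrative:

```rust
use http::{header, Response};

// The origin declares how long the response is good for; any cache in front
// can then serve it without another origin round trip until it expires.
fn with_freshness(body: String, max_age_secs: u32) -> Response<String> {
    Response::builder()
        .header(header::CACHE_CONTROL, format!("public, max-age={max_age_secs}"))
        .header(header::CONTENT_TYPE, "text/html; charset=utf-8")
        .body(body)
        .expect("valid response parts")
}

fn main() {
    // Illustrative durations: 60s for a pricing page, 3600s for an article,
    // 300s for a homepage that wants breaking news reasonably fresh.
    let resp = with_freshness(String::from("<html>...</html>"), 60);
    println!("{:?}", resp.headers());
}
```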
Based on the post, it seems likely that they'd just delay per the robots.txt policy no matter what, and do a full browser render of the cached page to get the content. Probably overkill for lots and lots of sites. An HTML fetch + readability is really cheap.