airstrike 1 days ago [-]
I'll share the first-hand account I recently got from someone else.
> We've used it at work
> it is... not as hype as everyone is concerned about
> I'd argue the framework around it for security scanning is the more useful side of the tool, definitely doesn't take a huge model to get all the issues it flagged on our systems
> For us, it absolutely flooded us with noise
> I mean hundreds if not thousands of false positives or minor issues or not applicable
> For every one reasonable issue
> The biggest issue it created was the execs treated every issue it produced like it was a drop everything and fix the issue type deal
> I'm talking company wide drop all things "we need to patch nginx because this module that no one uses and is disabled by default has this RCE vulnerability"
> Or "all ec2 AMIs need to be upgraded because it flagged a version specific docker vulnerability", it flagged every single machine with docker regardless of if the actual vulnerability was relevant
> Vulnerability was with a very specific Auth plugin configuration you could enable with docker and specifically the Mosley docker compatible tool, but it is clear it only knew there was a vulnerability in docker, not if it was applicable or not
> Meanwhile dirtyfrag and friends not a single peep from btw despite it allowing for container escape
> Idk, I was underwhelmed with the quality of the reporting it gave really. If the company allowed me to get information about all the infrastructure in our entire organisation to run Claude over it repeatedly looking for recent CVEs I'm sure I could produce the same results...
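The complaint above is that the scanner matched on package name alone and never checked whether the vulnerable configuration was enabled. A minimal sketch of such an applicability check follows. Everything in it is hypothetical: the `Finding` shape, the `requires` precondition set, the flag names, and the placeholder CVE IDs are illustrative assumptions, not anything the tool is known to produce.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    host: str
    package: str
    cve: str
    # Hypothetical: config flags that must all be enabled for the CVE to matter.
    requires: frozenset = field(default_factory=frozenset)


def applicable(finding, enabled_by_host):
    """Return True only if every precondition is enabled on the finding's host."""
    enabled = enabled_by_host.get(finding.host, set())
    return finding.requires <= enabled


# Placeholder data; the CVE IDs and flag names are made up for illustration.
findings = [
    Finding("web-1", "docker", "CVE-EXAMPLE-1", frozenset({"authz-plugin"})),
    Finding("web-2", "docker", "CVE-EXAMPLE-1", frozenset({"authz-plugin"})),
    Finding("web-2", "nginx", "CVE-EXAMPLE-2", frozenset({"unused-module"})),
]
enabled_by_host = {"web-1": {"authz-plugin"}, "web-2": set()}

# Only findings whose preconditions hold are escalated.
actionable = [f for f in findings if applicable(f, enabled_by_host)]
for f in actionable:
    print(f.host, f.package, f.cve)
```

In this sketch only the `web-1` finding survives; the other two stay in the report but would not trigger a company-wide patch order.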
bgilroy26 1 days ago [-]
It seems like there is a genuine communication breakdown between management and engineering. Engineers know that there are vulnerabilities all over the place, that there have been for ages, and that, where the rubber hits the road, not every vulnerability represents a successful exploit by some nefarious actor.
Management can often treat cybersecurity like a black box that represents millions upon millions in liability. If Mythos represents an opportunity to bring management's understanding of the amount of "security vulnerability debt" everyone carries into the real world, it might be a good thing
torginus 22 hours ago [-]
I had a genuinely surreal conversation with the security team the past week, it went like:
'Hi, we are reaching out to you because our tool flagged a large data transfer between such and such services'
'Wait, the source endpoint is an internal service, the target endpoint is an internal S3 bucket (I was doing a routine DB backup). Neither is reachable from the internet. How is it a security issue?'
'Our tool has flagged it'
chillfox 19 hours ago [-]
Almost all the corporate security professionals I have dealt with have been tool runners with no more than Helpdesk level skills.
void-star 18 hours ago [-]
As someone with over 30 years experience in computer security, both in corporate and in boutique security and startup shops, who has been consistently fighting this trend, and recently bearing witness to and engaging in the current AI surge: I can say with absolute confidence that it is only getting worse, and it is going to get worse yet.
People like me who know there is a better way are getting pushed harder to lean on AI tooling even though we know that it is making things worse. This isn’t just because our founder/funding overlords are pressing us to do it. The sheer volume of new mission critical code being pumped out enabled by vibe coding is also leaving us little choice but to lean in too just to try and keep up.
We can all see it as clear as day: The tech isn’t ready for any of this. But nobody wants to hear that and everyone is marching off the cliff together anyway. We’re all going to land in the same waste pit together. Raise a glass and whimper.
jatora 17 hours ago [-]
AI is far better at security than the majority of security professionals. It is a net positive.
People constantly compare AI to this very rare expert human rather than the reality of who is already employed. Experts like you are a major culprit of this. And it puts you at odds with yourself to both admit the industry is full of subpar workers and then lament that they will be replaced with workers that are better, but still worse than you.
What is wrong with someone to make them think in this manner? Is it just a kneejerk response with little thought? Is it ego? Is it a coping mechanism? I find it very strange and interesting and annoying.
nullpoint420 13 hours ago [-]
I also don’t like your framing, here.
We need experts to know when AI is wrong, which it is all the time.
Earlier this week someone commented here that we shouldn’t expect a language model to know that you need to drive a car to a car wash, to wash a car.
So then, what do we expect it to know? Who’s responsible for when it’s wrong?
Also, why can’t Mythos just fix all these issues itself if it’s so smart. And test them to make sure they work?
scrollaway 11 hours ago [-]
> why can’t Mythos just fix all these issues itself if it’s so smart. And test them to make sure they work?
“Why”: because you didn’t ask it. It’s not its job in this case.
You don’t hire an accountant and tell them “why can’t you fix my cash-flow problems and make me money if you’re so smart”
nullpoint420 4 hours ago [-]
Ah ok, sure. The difference being the model should know how to do both based on what I’ve been told.
So why didn’t Anthropic ask it for me?
ikiris 15 hours ago [-]
That means you aren't high enough up to deal with the non helpdesk level security people.
torginus 14 hours ago [-]
True. It is a well-known fact that braincells per capita, and technical competence and understanding rapidly increase the higher you are on the management ladder.
deet 19 hours ago [-]
To be fair though, models might be changing the calculus for what constitutes a vulnerability that is too small / too obscure to care about.
If AI is reducing the cost of using the long tail of small vulnerabilities or is making it possible to chain them together into something more profound, then those small, less-concerning issues might require addressing in a way that was previously not required.
thewebguyd 1 days ago [-]
It won't bring understanding though is the problem. You get situations like the parent, where the execs don't have the knowledge, time, or care to learn beyond "vulnerability bad, must patch now"
Execs/Management types getting extra visibility into the technical side, in my experience, has only ever resulted in additional but meaningless work, like just checking boxes on a compliance/audit checklist without actually considering the impacts of those changes, or whether a company is actually vulnerable to the disclosed CVE.
It's along the same lines of the BS I deal with day to day from upper management arguing back with "But ChatGPT said..." meanwhile pasting some hallucinated crap that doesn't even apply to our environment.
LLMs are basically a Dunning-Kruger machine for management. Engineering is best left alone and trusted to do what they are being paid to do.
JumpCrisscross 22 hours ago [-]
Yeah, I’m getting the sense that Mythos is for cybersecurity what blockchain was for back-end finance. A bit useful. But mostly good for bringing attention to upgrading neglected systems.
yggyy 15 hours ago [-]
This doesn’t make any sense either.
Many systems in relation to banking are very old and will stay that way - the economics are not favourable.
rossjudson 15 hours ago [-]
I recommend "How to measure anything in cybersecurity risk". Really interesting read about putting actual value on security.
_puk 1 days ago [-]
[dead]
mohamedkoubaa 1 days ago [-]
In other words it is equivalent to spending a million dollars on an audit by a software security consulting company
lgpartman 1 days ago [-]
Or to RedHat for rewriting Python core 500 times.
The "humans do it too" argument gets tiresome. Even if the consulting company fails, the money goes back to employees and back into the real economy. Now it goes to Don Amodei.
The consulting company could be local, which provides a higher degree of confidence, though not proof, that no data is exfiltrated to the US.
And so on.
Almondioco 24 hours ago [-]
I think Opus 4.6 and Mythos, overall and marketing-wise, are key points because they told the world that LLMs are now a critical / useful tool for security audits.
It aligns with the significant jump in helpfulness in CTF.
But I think it's good to hear that it's not that crazy good. Everything slowing it down is good.
jr-throw 1 days ago [-]
I'm pretty impressed with regular Claude Code with Opus 4.7/4.8 in finding vulnerabilities in our code. Maybe 70% are false positives though. It's a lot of work to manually push back on the findings. Still worth it.
steve_adams_86 24 hours ago [-]
It's similar with performance optimizations.
One example was Claude thinking we could optimize converting vector tiles to raster by operating in float32 rather than float64. It turned out the library we have to use casts to float64 anyway, so the work of casting to 32 then to 64 rather than staying at 64 actually slowed the path down by 12%.
Yet it also finds the odd thing that isn't very intuitive but leads to marked improvements I never would have uncovered because... Well, as a human with only 24 hours in a day, there's no way I'll turn over every leaf and find these items on my own.
I'm totally fine with the false positives because they're so easy to verify.
AlexCoventry 23 hours ago [-]
I thought one of the advantages of Glasswing was that it could produce a PoC for you. Was it producing working PoCs?
0123456789ABCDE 22 hours ago [-]
why are folks looking at the output of the first pass?
my understanding, and experience, is that you 1. run a bunch of sessions with small permutations to create variety, 2. run more sessions to dedupe reports into a smaller collection of potential vulns, 3. run a handful of agents at max effort to write PoCs + write-ups, 4. rank findings, 5. finally look at what, if anything, was found. maybe ask questions, try and understand if the PoC is running against a realistic setup.
until you can confirm a vuln report is valid, you must assume it is invalid.
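The dedupe and rank steps above, plus the "invalid until confirmed" rule, can be sketched roughly as follows. This is an illustration under assumptions, not how any real tool works: the report fields (`session`, `file`, `function`, `kind`), the grouping key, and the sample data are all hypothetical.

```python
from collections import defaultdict


def dedupe(reports):
    """Group raw reports that point at the same (file, function, bug class)."""
    groups = defaultdict(list)
    for r in reports:
        groups[(r["file"], r["function"], r["kind"])].append(r)
    return groups


def rank(groups, poc_confirmed):
    """Confirmed findings first, then by how many distinct sessions agreed."""
    rows = []
    for key, members in groups.items():
        rows.append({
            "key": key,
            "sessions": len({m["session"] for m in members}),
            # Treated as invalid until a PoC has confirmed it.
            "valid": key in poc_confirmed,
        })
    return sorted(rows, key=lambda row: (not row["valid"], -row["sessions"]))


# Hypothetical raw output from three sessions.
reports = [
    {"session": 1, "file": "parse.c", "function": "read_header", "kind": "overflow"},
    {"session": 2, "file": "parse.c", "function": "read_header", "kind": "overflow"},
    {"session": 3, "file": "parse.c", "function": "read_header", "kind": "overflow"},
    {"session": 1, "file": "auth.c", "function": "check_token", "kind": "bypass"},
    {"session": 2, "file": "log.c", "function": "write_line", "kind": "format-string"},
    {"session": 3, "file": "log.c", "function": "write_line", "kind": "format-string"},
]
poc_confirmed = {("auth.c", "check_token", "bypass")}

ranked = rank(dedupe(reports), poc_confirmed)
for row in ranked:
    print(row["valid"], row["sessions"], row["key"])
```

Six raw reports collapse to three candidates here, and the one with a working PoC sorts to the top even though fewer sessions reported it.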
SpicyLemonZest 20 hours ago [-]
What Project Glasswing claimed at launch is that Mythos can "surpass all but the most skilled humans at finding and exploiting software vulnerabilities". What you're describing sounds more like making skilled humans more effective at penetration testing. That's cool, but it's not clear how much it matters, because most security teams were not previously bottlenecked on penetration testing capacity.
0123456789ABCDE 11 hours ago [-]
i wasn't thinking about pen-testing, but vulnerability-research, which seems to match that quote. but, you're right, gp is referring to "security scanning". i just feel like, even then whoever's running the research, should triage and validate results, before passing on to mgmt.
protocolture 18 hours ago [-]
Seems like there might be a market for a product that just prefixes "The AI Said" on emails to executives about security vulns.
jsisto 23 hours ago [-]
This reminds me of when I added Snyk to our CI/CD and brought development to a standstill
zwigglers 23 hours ago [-]
Same pattern. Scanners flag everything. The problem is there's no layer between findings and everyone's inbox. Prioritization is harder than detection.
Escapade5160 19 hours ago [-]
This is the same gripe I have over any LLM vulnerability tooling. 95% of what gets flagged is something that if taken by itself could be a vulnerability. However, the path to execute that specific vuln, in that specific function, is impossible in that particular code base and it just makes noise.
sbayg 21 hours ago [-]
In other words it creates work. In other words Jevons paradox.
I can’t wait for the first court case where an LLM surfaces a vuln, lazy devs ignore it, and someone later sues the company into oblivion for liability.
fn-mote 19 hours ago [-]
When was the last time you remember a company being sued into oblivion for a security breach?
The cost in the US is more like “one year of credit monitoring”.
jen20 18 hours ago [-]
> The biggest issue it created was the execs treated every issue it produced like it was a drop everything and fix the issue type deal
While this is definitely not the ideal end of the spectrum either, execs treating security issues as something serious instead of annoyances that should only be addressed if revenue can be tied to doing so is a welcome improvement.
huflungdung 17 hours ago [-]
[dead]
mekpro 1 days ago [-]
It’s clear that Anthropic has run out of the compute capacity needed to serve Mythos publicly.
They’re using security concerns to mask their inability to deliver the model at scale, while still trying to maintain their lead over OpenAI. As a result, they’ve chosen to release it privately under the banner of an “ethical” rollout.
mofeien 24 hours ago [-]
Jack Clark, co-founder of Anthropic said the following at an Oxford lecture last week ([0], at around 10 and 12 mins):
"It's a technology that we do not fully understand because it's more grown than made. And it is a technology that you can concoct plausible scenarios where it could kill every single person on the planet. So to think building this technology is without risk would be an act of hubris or insanity.
[...]
The technology is in fact so powerful that I should clearly state that if it was possible to elegantly slow the development of this technology to give ourselves more time as a species to deal with it, that would likely be a good thing. ... But in the absence of a coordinated global slowdown, we are left with the current situation, which is a powerful technology being developed at breakneck speed by a variety of actors and a variety of countries locked in a competition with one another where commercial and geopolitical rivalries are often drowning out the larger existential-to-the-species aspects of the technology being built. This isn't an ideal situation, but it's the one we find ourselves in."
They know they are in a race that no one will win.
>you can concoct plausible scenarios where it could kill every single person on the planet.
Idiots can scary black box their way to that concern. Plausible? Not so much.
graerg 17 hours ago [-]
It is already very plausible (and has been since the 1950s) without the advent of LLMs. This is just another layer on top of the preexisting and very plausible existential threats we already face.
protocolture 16 hours ago [-]
Detail it. Justify it.
Your comment about before LLMs is a non sequitur. Demonstrate that an LLM can kill everyone on the planet.
patcon 15 hours ago [-]
Task a squirrel with justifying the risk of a fox, but from the biomolecular level. That is the level of the task you are setting out.
There can be arms-races in domains that are unfathomable to the participants. A small mammal will die a billion times over before it understands the evolutionary mechanisms and the genetic playing field on which it loses. Actors are not necessarily privy to understand the means by which they will lose, and humans have only existed in a small window of time in which we fashioned a manicured garden, in which that full understanding was briefly possible. It is not favoured in the universe for us to fully understand our environment imho
If the risk must be exhaustively detailed before it is given credence, we are already doomed, and deservedly so
protocolture 14 hours ago [-]
>Task a squirrel with justifying the risk of a fox, but from the biomolecular level. That is the level of the task you are setting out.
That's a really deep thought for a 12 year old.
>There can be arms-races in domains that are unfathomable to the participants.
You can't even justify LLMs as being unfathomable. Oh watch out I am fathoming them. You can't stop me fathoming all over the place.
>A small mammal will die a billion times over before it understands the evolutionary mechanisms and the genetic playing field on which it loses. Actors are not necessarily privy to understand the means by which they will lose, and humans have only existed in a small window of time in which we fashioned a manicured garden, in which that full understanding was briefly possible. It is not favoured in the universe for us to fully understand our environment imho
Non Sequitur. One that sounds like it was made up for that "What the Bleep" garbage.
>If the risk must be exhaustively detailed before it is given credence, we are already doomed, and deservedly so
The risk needs to be justified as something more substantial than weird people writing wannabe edgy messages on the internet. If someone on the internet told you that we need to drastically reverse living standards because there's a risk that modern technology will summon King Kong any reasonable person would ask for the working out instead of running for a cave.
patcon 14 hours ago [-]
You're kind of an asshole. No thanks
protocolture 13 hours ago [-]
It's not like you handed me anything but woo to work with. There's really nothing less respectful than making up absolute nonsense and expecting a kind and thoughtful reply.
pertymcpert 12 hours ago [-]
No they're right. Regardless if one agrees with you or not, doesn't change the fact that your behavior was that of an asshole. I would know since I'm one too.
tomjakubowski 23 hours ago [-]
It's worth noting that Clark's career started in PR and journalism.
airstrike 19 hours ago [-]
It's worth remembering that people do not always say what they believe. Instead, they often say that which benefits them the most.
clbrmbr 21 hours ago [-]
Was a good watch, tho would have liked to be there in person. Props to Brenden & his Cosmos team for really setting the bar.
WarmWash 23 hours ago [-]
Thank you Mr. Altman for firing the starting gun when no one else wanted to race.
(The ambiguity of sarcasm is intentional here.)
sumedh 8 hours ago [-]
Didn't Google start the race with their paper?
WarmWash 6 hours ago [-]
Google (and other labs) wanted to keep the tech internal because of the obvious safety concerns. Once they were confident that the tech was understood and under control, the public could start being drip fed. Everyone on the ground back then was hyper cautious.
Then Altman made ChatGPT public, and the race began.
delusional 24 hours ago [-]
"Oh what peril we are in where I must get rich by killing all of you" Is a statement that should make you disregard anyone saying it at any time. Either they are liars, or they are so morally bankrupt that they are willing to sacrifice the species for short term satisfaction. Either option makes them more fit for a mental hospital than a stage.
NiloCK 1 days ago [-]
I find this line of reasoning highly dubious.
Yes, Anthropic is compute constrained, even after the SpaceX Colossus deal.
But supply constraints are the normal operating mode of any market. Anthropic could choose to serve whatever models it pleases at whatever price points it chooses and let the market decide where the value is.
If Mythos at $X overwhelms their capacity, they could just charge $X+1. If still overwhelmed, there are larger prices as well.
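The "keep raising the price until demand fits" argument can be sketched as a simple search. The linear demand curve, the capacity figure, and the starting price below are invented purely for illustration; nothing here reflects real demand or real pricing.

```python
def clearing_price(demand_at, capacity, start_price, step=1.0, max_price=1000.0):
    """Raise the price in fixed steps until demand no longer exceeds capacity."""
    price = start_price
    while price <= max_price:
        if demand_at(price) <= capacity:
            return price
        price += step
    return None  # no price in range brings demand under capacity


def demand_at(price):
    # Hypothetical linear demand curve: fewer units wanted as the price rises.
    return max(0.0, 500.0 - 10.0 * price)


price = clearing_price(demand_at, capacity=200.0, start_price=25.0)
print(price)
```

With these made-up numbers, demand at 25.0 is 250 units against a capacity of 200, and the loop settles at 30.0, where demand is exactly 200.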
OtherShrezzing 1 days ago [-]
During periods of market exuberance, it’s in the vendor’s interest not to reveal where exactly x+1 is. At the moment, everyone just guesstimates the company’s TAM. Bringing certainty to that guesstimate cuts Anthropic off from the most exuberant market participants, bringing their post-IPO price down unnecessarily.
JumpCrisscross 22 hours ago [-]
> Mythos at $X overwhelms their capacity, they could just charge $X+1
This may not be as valuable in the long term as getting committed customers hooked at a sustainable price.
deaton 1 days ago [-]
The question is, will anyone pay enough for Mythos to offset the opportunity cost of offering that much Opus? You don't want to end up in a spot where you don't have enough compute and your service's reliability degrades to an unusable state like xAI.
orrito 1 days ago [-]
I feel like there's always a demand for the very best models, even at insane prices. If the opportunity cost is x times opus, maybe few but there will always be companies willing to pay x+1 times opus.
suttontom 1 days ago [-]
Isn't that kind of what they're doing with this rollout? Except they're just hand picking the companies.
ikiris 14 hours ago [-]
Only if the price is under the competition, which does exist now.
tiahura 1 days ago [-]
Sort of, but valuation models depend on X being in a certain range. If it exceeds this range, revenue and therefore valuation are impacted.
mattnewton 1 days ago [-]
No insider info, but just wanted to mention that pricing signals things too. If Mythos is only servable at $X*Y dollars and isn’t Y times better than $X of compute at another provider, it’s quite possible that affects the IPO price negatively versus the halo of having the world’s most expensive model that is “too powerful to release” unpriced and unbenchmarked.
I think that most people at Anthropic are true believers from my interactions with them so I don’t believe this theory anecdotally. The simplest explanation is that it really is taking a while to gain confidence they won’t be used for a spree of bad cyber attacks. Knowing how long it takes institutions to fix security issues when filed by humans I would be more surprised if this wasn’t the case.
But I would forgive anyone who did think it was deliberately sandbagged; given the staggering sums at play, true believers might believe the ends justify the means to a little “marketing” like this.
malfist 1 days ago [-]
And then the bubble would collapse. Corps are already putting limits on token usage across the board because of costs. Increasing costs would significantly contract the hype bubble.
jb_briant 1 days ago [-]
It is not "clear", as your comment suggests, it's hidden. Which is semantically the opposite of clear. Regarding your theory, might be true, might be false. But it's highly speculative.
Forgeties79 1 days ago [-]
All of us, including you, know that he is not saying "they are being transparent." When someone says "it's clear that..." in this way they're saying "It's clear to us what is really happening here."
jb_briant 1 days ago [-]
It's not clear, there is no tangible proof that Mythos is not released because they don't have compute power to serve it. Saying that would imply that the "too dangerous" is a lie. Nobody has proof. It can feel "clear" for you, but it's not. Hence, I correct it.
jb_briant 1 days ago [-]
Yes I got how they used the phrase. And it was wrong, so I wanted to react. Thanks for your addition, it dissipates any doubt on the intention of OP: he thinks Anthropic is hiding the lack of power by pretending it's too dangerous. But he is wrong to assume that without proof, hence my reaction.
Forgeties79 1 days ago [-]
Agreed, but I'm talking about how they are, very clearly, using the phrase.
WhitneyLand 1 days ago [-]
The not clear comment is valid by either interpretation.
To a lot of us it’s not clear that’s what’s happening. It’s speculation and one possibility.
It may also be a secondary consideration and not the primary gating factor.
Anthropic has had their missteps but it’s still plausible to take what they say at face value.
jb_briant 1 days ago [-]
I agree, saying "it's clear" when at best, "it's plausible" doesn't let the conversation happen.
And pretending to know what is going on behind the scenes, as an anon on HN, is not credible.
baq 1 days ago [-]
I had to patch my Linux boxes daily at some point in the past couple months. I don’t want Mythos to be publicly released for as long as it is economically feasible for Anthropic. I hope they have a gentleman’s agreement with OpenAI and DeepMind about this, too.
Chinese labs will force their hands, until then let’s hope maximum number of projects get patched at a reasonable pace.
cmxch 16 hours ago [-]
I hope that such agreement gets broken hard and given the MSRC cold shoulder. If that means abliterated Qwen et al embarrasses Mythos to deliver a wider rollout, I’ll take that.
Trusting Anthropic to deliver is like asking Microsoft to pay out for bugs.
simonw 1 days ago [-]
They started Glasswing before they struck that $1.25B/month deal with xAI/SpaceX for their (notoriously dirty) Memphis data centers.
So they have a whole lot more compute now than they did last month.
mekpro 1 days ago [-]
Yes, 300 MW from SpaceX helps a lot, but I think that’s mainly to support Opus demand, which has grown faster than expected. If Mythos is roughly 5× more expensive to serve than Opus, as the pricing suggests, then 300 MW is nowhere near enough to enable large-scale deployment of Mythos.
As an ordinary developer who relies on a $20–$200/month subscription, I feel disappointed by the release of a paper describing a model that I can’t actually use.
aspenmartin 1 days ago [-]
Ok but they can easily upsell this to enterprise customers at a market price reflective of their capacity constraints. Big corps would pay it, this is clearly a major update.
nickthegreek 1 days ago [-]
But that compute might not be available to them long term. Hard to make big moves with a contract like that.
simonw 1 days ago [-]
I don't know if any of the big AI labs have confidence in planning for the long term.
For all they know they'll find a new optimization that lets them serve Opus class models for half the computing cost next month. Or someone will invent the next OpenClaw and demand will 10x overnight.
mrbluecoat 1 days ago [-]
While I tend to be cynical with big tech, if this statement is indeed true we owe them some thanks for staving off a zero day tsunami.
> 50 initial partners ... found more than 10,000 high- or critical-severity security flaws.
cobolcomesback 1 days ago [-]
So why is OpenAI also releasing 5.5-Cyber in a private manner? Are they also out of compute?
LiamPowell 1 days ago [-]
OpenAI has been pulling this marketing trick for years. Remember how GPT-3 was too dangerous to release? It's also probably bad PR if script kiddies have access to a GPT model with no guardrails even if it doesn't enable any significant attacks.
signatoremo 1 days ago [-]
I suppose you meant GPT-2, but for years? Did they say the same about subsequent models?
LiamPowell 1 days ago [-]
They did it for 2 and 3, however it looks like they didn't for 4 and 5.
For GPT-2 and GPT-3 it seems like the concern was that they hadn't yet figured out how to properly write safeguards for it:
> The company believes making its API generally available was made possible due to its progress with safeguards, and that opening up the API to all developers will help see applications developed faster. ...
> A large emphasis has been placed on safe use of the tool, which in the past has been criticised for a range of shortcomings, including racism and prejudices against specific genders and religions.
LiamPowell 12 hours ago [-]
Maybe, but they certainly used it for marketing too. At the time they contacted a bunch of publications and gave them access but told them they could only share snippets of the output [1]. The only reason to set restrictions like that is marketing.
> Now, OpenAI's terms of service don't let me give you the full list. I have to curate them, and show you a sample. Those are the terms and conditions I agreed to.
AgentME 24 hours ago [-]
GPT-4 was announced in March 2023 and wasn't made available to all developers until July 2023.
atleastoptimal 1 days ago [-]
Why do you think that? All these rumors about compute constraint just seem like speculation and not based on any data or information. All they would need to do is increase their prices to free up compute capacity.
notahacker 1 days ago [-]
The security concerns argument would have worked better if a forum full of people hadn't promptly obtained access by the extremely sophisticated tactic of guessing its URL...
pshirshov 1 days ago [-]
I bet Huawei and co would be happy to sell them some cheapo chips for inference!
Almondioco 24 hours ago [-]
Or they actually take the 'technology can kill' seriously.
lossolo 1 days ago [-]
Probably. This is an 8-12 trillion-parameter model, which is why it costs so much. That size is also a major reason, besides RL and synthetic data, why it suddenly gained these new capabilities. They claim it was not fine-tuned or trained specifically for cybersecurity, but is instead a general purpose model.
y0eswddl 1 days ago [-]
it's also a marketing ploy.
cute_boi 1 days ago [-]
Also, they just want to jack up the price by creating sensation.
benashford 1 days ago [-]
[dead]
ianm218 1 days ago [-]
In case the topic of memory safety is interesting to anyone I've been experimenting with using AI agents to port common web infra projects to safe/ performant Rust. Somewhat inspired by the Bun port - was thinking that at some point memory safety might be such a big deal that people just need drop in replacements.
- Valkey/ Redis port here https://github.com/ianm199/valdr (passes ~99% of single node test suite, real prod features like replication/ clustering/ HA early or not implemented)
- Further along port of Lua 5.1-5.5 https://github.com/ianm199/lua-rs-port/tree/main
- I have a less developed nginx version that would be the north star
- These projects are very alpha at the moment
If anyone is interested in getting involved in this or has done similar experiments I'd love to collaborate! There is so much variation in how you can run these large scale agent fleets I don't think anyone has a perfect system yet.
julianlam 1 days ago [-]
Respectfully, as an OSS maintainer (not to the scale of nginx or valkey, of course)... if a third-party used an AI agent to rewrite my software in a different language, that gives me absolutely no reason to support that new project.
It is in all respects foreign code in a language I may or may not be familiar with, and worse yet, if I were to take over, I'd be responsible for maintaining the whole black box forever more?
Thank you but no thanks.
baq 1 days ago [-]
I don’t think anyone expects you to TBH. If you show interest, great. If not, the robot will translate your work into a different form of expression anyway. If you’re releasing open source software under BSD-like licenses, it’s still better than some company taking your work and selling it with zero value contributed back.
rxhampton 1 days ago [-]
This kind of nihilistic argument worked in 2000 when open source software was a counter movement.
patates 1 days ago [-]
I don't understand how it's nihilistic and how it doesn't work in 2026.
mswphd 23 hours ago [-]
seems like it could be a decent opportunity for a bad actor to slip in something nefarious as well. We all know that nobody is reviewing a >>500k loc diff. For something like the Bun rewrite, where plausibly the person driving the agent(s) didn't want to sneak in something nefarious, things might be ok. That seems significantly less true when you cannot trust the person driving the agent(s)/producing the >>500k loc diff.
ianm218 1 days ago [-]
Yes I hope this can be separated from people who are inundating OSS maintainers with slop PRs - these are fully separate projects with zero expectation of involvement from maintainers. Valkey itself is forked off the original Redis.
There might be a world where people soon just find unsafe C code exposed to the web (i.e. nginx) an untenable situation and I hope it can be a helpful resource.
Anyway, I see open source code as positive sum. Maybe in the end only a small community who cares about cross compilation finds this helpful and that's a win!
dyauspitr 1 days ago [-]
Why would you have to take it over? Wouldn’t it just be a fork/different project entirely.
jspdown 1 days ago [-]
I find this kind of rewrite both disrespectful and completely useless. Useless because the difficulty isn't getting to a working state but maintaining it. You now have to build a community around it to make any of this worthwhile. What would this software be worth if security issues weren't patched and bugs weren't fixed? You can't do this alone.
And I find it disrespectful because people have spent decades building this, and you're taking all that collectively built knowledge to create something that will compete with the project itself.
I hope people will hold back from this, if only out of respect for the work that came before and in the name of good ethics. I fear it could do real damage to OSS. It would discourage the maintainers whose effort makes any of it possible.
ianm218 24 hours ago [-]
Hmm I view open source as purely positive sum. Valkey was forked from Redis in the first place.
But this is more about memory safety - you can have immense respect for the giants who built these tools but also be worried that memory safety might become an even bigger deal. If someone found a memory zero day in nginx or OpenSSL, for example, that is a very big deal!
I think this is one strategy we should look into, hopefully people in the C community look into other options like project Glasswing/ next generation fuzzers etc. When the world of security is changing so fast it is good to get a lot of shots on net.
0x000xca0xfe 23 hours ago [-]
And what if someone gets pwned by a bog standard logic or input validation bug in your slopped together "nginx" that is not present in the original?
pixl97 20 hours ago [-]
And what if they get owned by a memory safety issue that's in the original and not the rewrite?
I know many of these projects have been around for years but it's time for developers to put on their big boy panties and start taking memory safe languages seriously. Watching the same attacks again and again for 30 years is getting droll.
ianm218 20 hours ago [-]
If someone is running projects with a big "alpha" tag in production, exposed to the web they very well might get pwned haha!
overfeed 1 days ago [-]
Are you preserving the original software licenses, or AI-laundering the code in the manner[1] of https://malus.sh?
1. AI-rewrites are not clean room implementations.
15155 18 hours ago [-]
Do you have any specific case citations to substantiate this idea?
overfeed 3 hours ago [-]
"AI-laundering" is a moral judgement, based on not being a dick.
Regarding the footnote: pick any of the litigated cases, then consider if AI rewrites meet the bar of a clean-room.
ianm218 24 hours ago [-]
Yes original licenses are preserved
safercplusplus 1 days ago [-]
If the source language is C++, another option might be to use AI agents to port to a memory-safe subset of C++ [1]. For the most part, this involves surgical changes and glorified find-and-replace operations. And I'm guessing way fewer tokens :)
If the source language is legacy C, then another option might be (deterministic) transpilation to a memory-safe subset of C++ [2]. The resulting code wouldn't necessarily be performance-optimal, but it can be used for the majority of code that isn't really performance-sensitive.
I love Rust, but porting others' software to Rust (or any language) is a mixed bag. I'm a strong believer that good software requires deep domain knowledge to build and maintain. Porting code you don't understand by hand already risks still not understanding it afterwards; doing it in an automated fashion all but guarantees it.
All that to say I think these automated ports are interesting experiments. However if you want to build something people can trust, the people need to be able to trust that you fully understand what is built, and why it's built the way it is.
rxhampton 1 days ago [-]
If someone is porting such disparate projects as Valkey and Lua it is just for show and will be pre-alpha forever.
No one wants Bun in Rust, no one wants the rsync vibe code additions. This is just the only pro-AI comment, so the AI people voted it to the top.
ianm218 1 days ago [-]
Hmm my reasoning was that in order to have Nginx work in Rust you want to expose scripting so having a decent Lua in Rust is key to not call out to C for that. And then Valkey/ Redis is a lot simpler than nginx so it was a good way to learn how some of this works.
And I'd disagree on no one wants - Lua is quite helpful since it is easily used in WASM. There has been some interest from people in the Bevy community - a game engine in Rust - since you can't have Lua scripting in browser games easily with the C version.
Lua is often heavily used in Redis/Valkey, if you are interested in porting Valkey, it makes sense to also port Lua, they aren't "disparate".
baq 1 days ago [-]
I don’t care if bun is written in zig, rust, go, f# or sql. If it works, it works.
I also don’t care if it’s written by humans or LLMs or robot overlords from Alpha Centauri. Again, if it works, it works.
The operative word here is ‘works’. Code is now cheap, QA still isn’t. Since people don’t really like doing the same thing twice, specs for working code have never been written. Nowadays there is no reason to not create a spec detailed enough for robots to make no mistakes (pun intended) when filling in the gaps when converting from spec space to code space. As long as this remains true, I don’t care who or what does the boring parts.
jspdown 1 days ago [-]
Who is going to maintain all these forks? The person forking it has no community and can't realistically be an expert in all of these domains.
So no, it doesn't "work", it just worked at some point in the past.
overfeed 1 days ago [-]
> Who is going to maintain all these forks?
"Claude" would probably be the response of a typical tokenmaxxer. I have no desire to use software that was drive-by forked by someone who doesn't understand the trade-offs made by a project, or the values underpinning them. I'll take an AI-assisted domain expert, or an experienced maintainer who stumbled into the role, over someone with no grounding in the domain being the human in the loop for AI agents.
dingnuts 1 days ago [-]
[dead]
mentalgear 1 days ago [-]
Here's my big fear: Even IF (and that's a BIG if) we get all critical vulnerabilities fixed in tech (before adversarial/state-actors turn up with open attack models) - we still have (in at least a year) models that will be so good in social engineering that they can still (given enough tokens) gain access to whatever system they want.
If society can't trust banks and other institutions to safely control their data, what follows ?
Do we collectively switch off the internet?
protocolture 18 hours ago [-]
>Here's my big fear: Even IF (and that's a BIG if) we get all critical vulnerabilities fixed in tech (before adversarial/state-actors turn up with open attack models) - we still have (in at least a year) models that will be so good in social engineering that they can still (given enough tokens) gain access to whatever system they want.
I was working at the fruit company when they just hard stopped people from recovering their fruitcloud accounts via phone support due to social engineering.
Social Engineering risk just increases the burden on the consumer/internal support services. The risk is that not everyone has pulled up stumps to protect these services. After a few high profile fuck ups they will. The herd loses 2 beasts and the rest wander away from that water hole.
It's much like how after bitlocker we don't have user access to backup server disks anymore. The lesson was learned and we moved on. Lots of high profile fuckups but we don't get those anymore. CTOs were forced, basically at gunpoint, to adapt or die.
colechristensen 1 days ago [-]
Social engineering as a problem goes away when anybody can get a model to do it for them for $5. It stops being possible, it's really the bank's problem when they can't have a minimum wage call center or a robot responsible for people's data.
p-e-w 1 days ago [-]
Yes. There will be a few high-profile incidents, and then institutions will be forced to stop performing administrative actions based on people’s word.
applfanboysbgon 1 days ago [-]
This outcome is massively detrimental to humanity at large. By eliminating the human factor from support, you make it impossible to get support in edge cases that fall outside of the pre-planned bureaucratic process. Everyone already hates that Google can arbitrarily ban anybody they please with no way to get in contact with a human, and you want to extend that to banks in control of people's life savings?
hallway_monitor 1 days ago [-]
I don't think anyone is saying that. You will just need to be authenticated before giving any commands to the bank. Maybe some type of TOTP that you can use over the phone or in person.
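A minimal sketch of the TOTP idea mentioned above, for anyone who hasn't seen it spelled out. It follows the RFC 6238 / RFC 4226 construction using only Python's standard library. The function names, the 30-second step, and the one-step drift window in `verify_spoken_code` are illustrative assumptions, not any bank's actual procedure.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over an 8-byte big-endian counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble picks the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with the counter derived from Unix time."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_spoken_code(secret: bytes, spoken: str, at: Optional[float] = None,
                       step: int = 30, window: int = 1) -> bool:
    """Hypothetical call-center check: accept a code read over the phone
    if it matches the current time step or a neighbour (clock drift)."""
    now = time.time() if at is None else at
    counter = int(now // step)
    for drift in range(-window, window + 1):
        if counter + drift < 0:
            continue  # no time steps before the epoch
        if hmac.compare_digest(hotp(secret, counter + drift), spoken):
            return True
    return False
```

The caller reads out six digits and the agent checks them against a secret enrolled beforehand. Note this only proves possession of the enrolled secret; it does not address what happens when the device holding it is lost.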
applfanboysbgon 1 days ago [-]
That is the exact problem. You have identification tied to your device. Your device is lost or stolen. Now you can't access your bank account. Human support can help you out by finding flexible ways to ascertain your identity. This is the angle social engineers exploit, tricking employees trying to be helpful to abuse that area of flexibility. You can take away human judgment and all flexibility in the system, and that will make the system more secure, but it also results in a deeply uncaring system that makes life harder for people. Rigid bureaucracy doesn't do a good job of accounting for a house fire destroying everything you own or your e-mail provider shutting down; these are fringe cases but they do happen and there are positive resolutions available as long as human discretion is involved.
DANmode 1 days ago [-]
No.
You don’t tie it to “your device”.
You tie it to your security key.
Which is treated like a credit card.
and your extended family, friends, or volunteers can act as social proof to allow you back into your accounts,
if your key burns up, it breaks and you were too cool to provision a backup, etc.
pesus 1 days ago [-]
Credit cards are lost and stolen all the time, and it isn't really a big deal when it happens, since charges can usually be easily reversed. This does not sound like the same scenario. It also doesn't account for people who lack friends/family nearby or at all.
> it breaks and you were too cool to provision a backup
If we're relying on the average person to back things up properly, this idea is doomed from the start.
DANmode 1 days ago [-]
> If we're relying on the average person to back things up properly, this idea is doomed from the start.
The average person is relying on the average person, for everything, and I agree, they are doomed from the start.
Tech-related items inclusive.
DANmode 1 days ago [-]
Yes, you wouldn’t offer your private key to a random food truck.
Just new banks.
Same as people being unafraid of their car key being cloned - because they don’t hand it around the general public.
repeekad 1 days ago [-]
> Everyone already hates that Google can arbitrarily ban people
Yet they’re still the predominant search engine. Sadly, the concerns of the few don’t interest monopolistic profit seekers without forced regulation. Think how airlines are legally required to give refunds for delayed flights; there’s a reason it required legislation.
insanitybit 1 days ago [-]
A lot of social engineering attacks die the second you have domain bound 2FA. Not everything, but a lot.
But the idea that we'll squash all of the critical vulns is simply nonsense, despite the weird Firefox blog posts that indicate otherwise.
MostlyStable 1 days ago [-]
We don't need to squash all of them, we need to squash all of them that are practically findable by current and very near term frontier models.
insanitybit 24 hours ago [-]
Also impossible imo so it's moot.
MostlyStable 22 hours ago [-]
Because you think that current models can, in a practical sense, find an infinite number of vulnerabilities, or you think that they can find so many that it isn't possible to fix them?
In other words: do you think that the impossibility lies in exhausting the number of finds, or does the impossibility lie in fixing them?
In either case, do you think that this was also true pre-AI? That is to say: it was not possible to, given some set of practical resource constraints, find and fix all the vulnerabilities that a similarly-resourced group would find?
If so, then would you say that you just fundamentally don't believe in secure software and the only defense is lack of attention?
insanitybit 17 hours ago [-]
I think that there are, practically, infinite vulnerabilities in common and critical software - browsers, operating systems, etc. So discovering all of them is not tractable, and even if we 100x our rate of discovery it won't matter.
> In either case, do you think that this was also true pre-AI? That is to say: it was not possible to, given some set of practical resource constraints, find and fix all the vulnerabilities that a similarly-resourced group would find?
Yes.
> If so, then would you say that you just fundamentally don't believe in secure software and the only defense is lack of attention?
I believe in secure software, few people are building it though and the majority of relevant attack surface is dogshit for security.
Squashing vulns via discovery is irrelevant to security. If we want safer software it has to be built to be safer.
UltraSane 1 days ago [-]
If things really get that bad then everything will require FIDO keys or push authorization using a phone app and possibly an initial registration code sent to a physical address. This is how Epic MyChart works.
lern_too_spel 1 days ago [-]
The government should be in charge of ID Provider infrastructure and has local offices (postal) that can establish physical identity (and already do for people who need to travel abroad), but the religiously affiliated NWO conspiracy theorists have made this politically infeasible in the US, so we have unsavory private sector providers like World ID stepping in.
waffleiron 1 days ago [-]
Not so sure I would want a company that does not see any issues with mass surveillance of my country [1] to have access to critical infrastructure or its source code where I live.
> But using these systems for mass domestic surveillance is incompatible with democratic values.
827a 1 days ago [-]
GPT-5.5-Cyber has already at least hit if not surpassed Mythos capability in cyber tasks. The only reason they're holding back is because once it's out everyone would realize that its capabilities were a step change in March, but are not anymore, yet it costs significantly more and is much slower.
john_strinlai 1 days ago [-]
how did you go about assessing this?
chis 15 hours ago [-]
But GPT-5.5-Cyber is also not released publicly?
jansan 1 days ago [-]
So you believe one marketing department more than the other?
I believe the correct way to interpret AISI’s findings is that both Mythos and 5.5-Cyber are capable of solving their full benchmark (the only two models that can); Mythos does it with fewer tokens and more consistently.
Two things of note: 5.5-Cyber is likely to be substantially cheaper than Mythos, given it is priced around Opus. Additionally: AISI has never tested OpenAI’s best public model and actual Mythos competitor: 5.5-Pro.
strictnein 18 hours ago [-]
Work in a top tier security org at a Fortune 50. We still can't get access to this stuff, even though we've reached out repeatedly.
I mention this because if you're frustrated that you can't access it, you're not alone. Even with our company's heft and a security org that is very well known in the industry we're getting nowhere.
aliljet 1 days ago [-]
Is this just one giant marketing plot?
hasteg 1 days ago [-]
There's a lot of speculation that it is indeed a marketing plot and the model is just a step improvement over current capabilities... and the real reason they aren't releasing the model is they are compute constrained and cannot serve the model. To my knowledge there's no proof of this however, but given the fact that literally 60 days ago they made Mythos out to be the end of the world and last Friday they announced that they will release the model in a few weeks, I feel like it was indeed something along those lines (marketing ploy).
kspacewalk2 1 days ago [-]
Their IPO is coming up soon. It would be interesting if Mythos remained mythical right up until then, wouldn't it?
basch 1 days ago [-]
Or just control of supply and demand. If they can charge twice as much serving half as many customers, that leaves a lot of potential future customers leftover.
protocolture 18 hours ago [-]
yes
datakan 1 days ago [-]
The week before they released Mythos to governments they had all their source code stolen. It's all about improving their image and creating propaganda.
pixelesque 1 days ago [-]
It wasn't "all their source code", it was the source code to Claude Code: not really any of their internal secret sauce, at least directly.
0123456789ABCDE 22 hours ago [-]
it wasn't stolen either. an employee accidentally included a source map file with the release.
aspectop 1 days ago [-]
i think anthropic is being performative here, creating a hype for mythos and not releasing. i guess this is all a marketing thing to sell a security specialized AI to enterprise and startups at a way larger cost coz security market is deep in money.
skybrian 1 days ago [-]
This is just cover for being sore that you don’t have access yet <- see what I did there?
People and organizations can have mixed motivations. It’s often not “just” one thing.
So, they expand the program to US "ally" governments and corporations.
These entities will now give all their IP to an American company that only promises not to spy on Americans.
Subsequently, the NSA can audit the leaked sources manually and find real exploits.
yalogin 22 hours ago [-]
We are in the early stages of monetizing the AI stack/service and Anthropic is set to take it in. Not sure if it’s cost effective for them or not, but they are the clear winner here. They have created this awareness among executives about the value and need for AI, and that is what matters; it will be budgeted accordingly. They are positioning it as a must-have not just for productivity but also beyond that, as a Swiss army tool. Pretty smart.
bushido 1 days ago [-]
This feels more and more like a marketing/scarcity play for the largest global corps.
Will likely give them time to expand capacity as well. And make them harder to dislodge in these orgs.
aspenmartin 1 days ago [-]
To me this makes little sense — I can’t imagine the orgs they have limited this rollout to don’t already have Claude subscriptions and integrations. And sure, this may play nicely into branding and build a mystique around the model, but ultimately they are missing out on a ton of revenue and risking being totally front-run now that model performance parameters are out and people have firsthand experience. Feels more like a fairly genuine attempt to be responsible. They could have easily rolled out an update and done some PR to absolve themselves of responsibility.
jb_briant 1 days ago [-]
Urgency x scarcity, unbeatable marketing move.
bushido 1 days ago [-]
It is really good. Will also cut through the common procurement, legal and change management processes seen at these orgs.
jb_briant 1 days ago [-]
Genius^2
merrvk 1 days ago [-]
Got to say, Anthropic have hell of a marketing team.
CephalopodMD 1 days ago [-]
This is either a chuffed up PR move or an extremely generous AlphaFold "publish all the proteins" moment
saidnooneever 1 days ago [-]
PR. their models dont do anything more than others. neither do their tools. they have very aggressive and misleading marketing.
work in cyber for 15+ years, worked at largest CS vendor globally and fortune5 companies. (top of the list).
only ppl using AI are shops with clueless ppl getting flooded in nonsense.
You will see the same sentiment in a different flavor (different products) in OSS security mailing lists now.
do not fall for it. for every one of these things. Test it, verify it, for yourself. dont take anyone's word. 100B+ valuation makes external voices generally worthless (easy to buy)
Im not saying its all worthless or will never amount to anything. But test yourself. please. a lot of money is getting chucked around.
Look at it this way.
They have a stake in having companies create not very good software.
They have a stake in supplying engineers to fix that.
They have a stake in tools to fix problems that it creates.
you can guess the next service offering...
yanis_t 1 days ago [-]
Is there any evidence Mythos is qualitatively better than the Opus 4.x?
I'm afraid that the usual mantra that "we just need more scale" that worked well for attracting investments, is not working anymore - bigger models provide marginal improvements while naturally get much more expensive to run.
Is this why both Anthropic and OpenAI are rushing for IPOs this year?
wrsh07 1 days ago [-]
It is quantitatively better at finding and exploiting vulnerabilities. Pretty wild that everyone here is just in denial about that, when folks who have used it say it's as good as the hype
From what I've read so far it's less about Mythos being much better at tasks in isolation.
Security wise, it's about being able to find and chain multiple vulnerabilities to actually create viable exploits.
So I would imagine that if you were using it for regular software development you may not feel that it's that different unless used in a particular way?
pixl97 19 hours ago [-]
There seems to be some number of people here on HN that make their money in old style cyber security that seem to be under the delusion that LLMs are just going to go away and it's going to go back to business as usual for their cash cow.
I work with a number of people in security that have come around, and while they still think LLMs are rather garbage at architecture, they see how well current models we can access now are at finding security issues. They can chain together wildly different concepts and turn them into working exploits.
aspenmartin 1 days ago [-]
> I'm afraid that the usual mantra that "we just need more scale" that worked well for attracting investments, is not working anymore - bigger models provide marginal improvements while naturally get much more expensive to run.
It's super interesting to hear this refrain on HN, it is alarmingly common. Anthropic released benchmark numbers on Mythos, as they have for all of their models. Once models become public, people evaluate them in a myriad of ways. We have had reliable scaling laws for years and they still hold. Epoch capability index continues to grow exactly as expected. Where does this idea come from?
As for cost, the cost per token at a given level of performance drops up to 40x per year.
cmxch 17 hours ago [-]
Mythos numbers are effectively irreproducible aside from cherry-picked approvals.
aspenmartin 15 hours ago [-]
Yes absolutely. However, the benchmarks they did release numbers for are key ones and the leaps were very large. Absolutely possible that they either lied or that the full picture is much muddier, but based on the numbers they do show it’s hard for me to imagine a likely scenario that would produce that.
rfgplk 1 days ago [-]
It probably isn't, at least in terms of security or memory safety. The current models can already sniff out all memory vulnerabilities with relative ease, you can't really beat that.
JacobAsmuth 22 hours ago [-]
Have you read Firefox's findings? They found it to be qualitatively improved over Opus, and have published several of the resulting CVEs as well as more detailed numbers.
bcrosby95 22 hours ago [-]
They also seem to point to it being more the harness than the model itself.
JacobAsmuth 4 hours ago [-]
Really? They mention that Opus 4.7 in the same harness found like 1% of the bugs that Mythos found.
cassianoleal 1 days ago [-]
In the meantime, not everyone with actual access to the model is all that impressed.
“Cybersecurity weather person and award winning shitposter.” why are they someone we should pay attention to the opinion of?
jchw 1 days ago [-]
That's someone who is confident enough to have an evidently successful enough career to be able to access Mythos in its currently-limited rollout and yet not take themselves terribly seriously online.
Realistically their opinion deserves to hold more weight than the median HN comment.
dymk 1 days ago [-]
I dunno, I trust the engineers working on Firefox or the Linux kernel more than some random pseudo-anonymous Mastodon account -
I would prefer a pseudo-anonymous account if possible. Obviously if this is a marketing stunt the very not anonymous feedback is called into question immediately.
That said: I already was aware of Mozilla's account and despite what you are thinking, it essentially confirms everything.
> The biggest differentiating factor was the use of an agent harness, a piece of code that wraps around an LLM to guide it through a series of specific tasks. For such a harness to be useful, it requires significant resources to customize it to the project-specific semantics, tooling, and processes it will be used for.
Yep. Sounds exactly right. So the question is do we really need Mythos for this or can almost any reasonably close to frontier AI model accomplish similar results with a sufficiently advanced harness?
Jury's out but my vote is "probably most of the way". After all, alongside all of the splashy zero days dropped by eager AI companies, Greg Kroah-Hartman has been posting many useful, if minor patches to the Linux kernel produced by nothing more than a single 128 GiB Framework Desktop. So apparently, even small models can be very useful if you can find a way to get the noise out.
Mythos could still be very useful and effective and still be mostly a marketing ploy, and that's because until very recently investment in trying to make LLMs work for security auditing has been underserved. Without more substantial information, it's difficult to tell how much better at security research Mythos is vs say, Opus or DeepSeek 4 coupled with a good agent harness would be.
And in that sense, it's the same sort of crap as the GPT-2 and GPT-3 releases. A lot of hooplah about how dangerous it is to humanity. Then it turns out it's only dangerous enough that it needs to be gated behind an additional monthly subscription.
slopinthebag 1 days ago [-]
I definitely don't.
ofjcihen 22 hours ago [-]
The most intelligent person involved in the highest level projects at my current company introduces themselves as an out of work circus clown.
There is an incredible amount of competency signaled by someone who was given access to this model but doesn’t treat their online presence like a professional resume.
What's currently an open source project which comes closest to Mythos capabilities?
adrian_b 1 days ago [-]
No single open weights model comes close to either Mythos or GPT 5.5.
Nonetheless, running many of the open weights models over a codebase, with an appropriate harness, can provide about the same vulnerability coverage (i.e. each of the open weights models would find a subset of what Mythos or GPT 5.5 could find, but the subsets are not the same).
Despite needing more runs and more time, this may be significantly cheaper, especially if the models are self hosted.
Based on what Anthropic said about Mythos, they also use a quite elaborate harness for finding bugs and vulnerabilities, i.e. not a simple prompt like "find the bugs".
They repeatedly run Mythos on each file of the codebase, many times. They start with more generic prompts, used to determine whether a more thorough analysis of that file is worthwhile. Then they use more specific prompts, to detect various classes of bugs. After it becomes probable that a certain bug exists, they do a final run where the prompt requests a confirmation of the already known bug, perhaps together with a proposed patch or a PoC exploit.
Therefore the efficiency of finding vulnerabilities depends a lot on the harness, not only on the LLM. Also, searching vulnerabilities in a big codebase when paying per token is very expensive, because it requires many runs of the LLM.
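The staged process described above can be sketched roughly as follows. This is only an illustration of the triage, class-specific hunt, and confirmation stages. The `Model` interface, the prompt keywords, the bug class list, and `fake_model` are all hypothetical stand-ins, not Anthropic's actual harness or any real API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model interface: takes a prompt string, returns text.
Model = Callable[[str], str]

# Illustrative bug classes; a real harness would be project-specific.
BUG_CLASSES = ["memory safety", "input validation", "auth bypass"]


@dataclass
class Finding:
    path: str
    bug_class: str
    confirmation: str


def scan_file(model: Model, path: str, source: str,
              triage_runs: int = 3) -> List[Finding]:
    # Stage 1: generic prompt, repeated, to decide if the file is worth more runs.
    votes = sum(
        model(f"TRIAGE {path}\n{source}").strip().upper().startswith("YES")
        for _ in range(triage_runs)
    )
    if votes == 0:
        return []

    findings: List[Finding] = []
    for bug_class in BUG_CLASSES:
        # Stage 2: more specific prompt, one per bug class.
        suspect = model(f"HUNT {bug_class} in {path}\n{source}")
        if suspect.strip().upper().startswith("NONE"):
            continue
        # Stage 3: final run asking for confirmation of the suspected bug.
        verdict = model(f"CONFIRM {bug_class} in {path}: {suspect}\n{source}")
        if verdict.strip().upper().startswith("CONFIRMED"):
            findings.append(Finding(path, bug_class, verdict))
    return findings


def scan_codebase(model: Model, files: Dict[str, str]) -> List[Finding]:
    results: List[Finding] = []
    for path, source in files.items():
        results.extend(scan_file(model, path, source))
    return results


def fake_model(prompt: str) -> str:
    """Offline stand-in for an LLM call so the sketch runs without a network."""
    if prompt.startswith("TRIAGE"):
        return "YES" if "strcpy" in prompt else "NO"
    if prompt.startswith("HUNT memory safety"):
        return "possible overflow via strcpy"
    if prompt.startswith("CONFIRM"):
        return "CONFIRMED: unbounded strcpy into fixed buffer"
    return "NONE"
```

Even in this toy version a flagged file costs seven model calls and a clean one costs three, which is why per-token pricing adds up quickly on a large codebase.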
andrewjneumann 1 days ago [-]
They keep writing like they stand to profit from this or something. Too many “coulds” in there for me too, this could be an amazing advancement and it could be nothing… normally we look at data and last headline I saw was 25 “high” vulnerabilities at the cost of $1 million in tokens.
No comparison to human teams, and I’m sure that $1 million in tokens was used by humans, in a team. So like most AI, they’ve developed a tool that capable people can use to be better, but unlike most tools, they’re claiming this to be outright magic. The magic is the hype train.
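For scale, a quick back-of-the-envelope check of the figures quoted above, taking the headline numbers at face value (they are recalled from a headline, not verified). The function name is just for illustration.

```python
def cost_per_finding(total_cost_usd: float, findings: int) -> float:
    """Average spend per reported finding."""
    if findings <= 0:
        raise ValueError("need at least one finding")
    return total_cost_usd / findings


# $1 million in tokens, 25 "high" vulnerabilities (figures from the comment above)
print(cost_per_finding(1_000_000, 25))  # 40000.0
```

That is about $40,000 per "high" finding before counting the human time spent driving and triaging the runs, which is the number a human-team comparison would need to be set against.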
jofzar 1 days ago [-]
> The organizations in this new group are based in more than 15 countries
I mean most nasdaq tech companies would be in 13+ countries, why are they writing this like it's a big number? It's hilariously small.
newtonsmethod 1 days ago [-]
I assume they're using a more candid definition where they're not counting all the countries a company may be based, but rather the primary country they're based in.
I don't think they're trying to flex this as a large number. They don't want to give an exact number, as that may change etc / is fuzzy, but also want to give you an idea of the scale.
They say "In the future, we intend to expand our geographical reach much further". I imagine this commentary is somewhat related to the concerns that AI will create an even worse "global underclass". AI developments are first accessible to Americans, then allies, and then later the whole world.
SpicyLemonZest 1 days ago [-]
They're writing it in contrast to the previous scope, which doesn't seem to have been available to any organizations based outside the US. (There was news a few weeks ago about how Japanese banks were going to gain access, but based on the timing I think this announcement is that access.)
aplthrowaway67 1 days ago [-]
How "altruistic" of them. If only Anthropic extended this level of care to the environment or the economy.
aspenmartin 1 days ago [-]
Why do you think the impact to the economy is bad? Also, if you're talking about the environment from the lens of data centers, I agree they CAN be incredibly problematic and that should be regulated and pushed back against when they engage in problematic behaviors (stressing water supply in a drought area e.g.). But not blindly -- data centers can be a clear win-win in a lot of ways, especially if done right. "data centers = bad" is way way way too simplistic a picture
aplthrowaway67 57 minutes ago [-]
These are extremely common and well-discussed opinions, you can disagree with them but I would venture that asking "WHY do you think AI will negatively impact the economy?" is so ignorant it's not even a good-faith argument.
fontain 1 days ago [-]
“Mythos Preview continues a long-term trend that we’ve been warning about for some time: within 6 to 12 months […]”
The only trend Mythos continues is Anthropic’s trend of warning that disaster is always 6 to 12 months away.
jb_briant 1 days ago [-]
Step 1: claim you created a tool so dangerous you can't release it
Step 2: offer to test it, but only for the biggest companies in the world
Step 3: onboard those big players on your tooling and product
Step 4: profit
This is genius.
estearum 1 days ago [-]
And all you have to do is demonstrate unique value during the pilot phase!
Err... wait... that was already the hard part... hmm
jb_briant 1 days ago [-]
Genius marketing move doesn't mean there is no value.
It means that even if the value you offer is similar to your competitors', you are the one conquering the market.
That's the only way to avoid becoming a commodity.
geodel 1 days ago [-]
With a trillion dollars at stake they can hire the best of the best in sales and marketing. And unlike some hardcore hackers, whose ethics do not always point in the direction of more money, sales and marketing people are highly motivated by opportunities to make more money.
jb_briant 1 days ago [-]
Our game is to craft shit, their game is to sell shit. You gotta respect the different tastes in nature!
geodel 1 days ago [-]
Yeah, companies buy shit and their employees eat shit. The Lion King would say it is the great circle of life.
jb_briant 1 days ago [-]
Here spoke the wise man!
cyanydeez 1 days ago [-]
<stop hiring people>
Don't you understand, if they really did do the <ai magic> they don't need to hire anyone, IT SELLS ITSELF
skybrian 1 days ago [-]
It’s true that providing security services to so many organizations will likely put them in a position to earn lots of money. It makes them an essential service, sort of like what happened with Cloudflare and denial-of-service attacks. (There are competitors, but they’re the first company people think of.)
But I think that downplays the importance of having a good product. If the product didn’t work, this would be a good way to lose trust with a lot of organizations in a hurry.
sandeepkd 1 days ago [-]
This is a circular economy that makes everyone look good. Almost all of these enterprise companies are sitting on top of so much tech debt that in any realistic scenario they can't really patch vulnerabilities if they number even in the double digits. A lot of these companies would not even let code they consider valuable enough be ingested by LLMs.
At this phase no company would risk their brand by calling the product ineffective. The big players are in it together and the small ones have no option but to play along.
Nevertheless, collecting the historical wisdom and running it at machine scale does have a lot of benefits for sure. The only question is the signal-to-noise ratio: the machine is doing what humans did, just at a multiplied speed and with a lot more context than a normal human can hold.
jb_briant 1 days ago [-]
Yeah and apparently, Mythos is pretty effective at finding critical issues. So it seems to be a good product served with a genius offer. Anthropic founding engineers are already comfortable, they will end up rich.
They did produce great value, Claude Code and Opus 4.5 are a singularity in software engineering.
The job we practiced for decades simply doesn't exist anymore.
skybrian 1 days ago [-]
Yes, it flips it from “don’t use a remote LLM on our code because security” to “we must run an LLM on our code because security.”
aspenmartin 1 days ago [-]
These companies are surely already onboarded…? They claim like 10k verified and high severity CVEs. Would you have preferred they just rolled it out like another Opus update? You wouldn’t be insinuating in that situation that they were careless and reckless? They risk missing a boatload of revenue if OpenAI front runs them for a public launch. In what world is this some sort of scam??
jb_briant 1 days ago [-]
Where did I use the word scam?
Marketing move doesn't mean scam. It describes the ability to sell people on a narrative and surpass your competitor in market share. And that's exactly what is happening.
My post is a "tribute" to the efficiency of Anthropic's communication.
I never complained about anything, called it a scam, or said they should have released Mythos to the public instead of rolling it out to a selected cohort.
You tried to expand my words to make me say something I didn't, because my post wasn't giving you a clear conclusion of my opinion regarding their private release.
aspenmartin 1 days ago [-]
Ok you’re totally right, I read this as a cynical “this is all marketing” post ==> a scammy connotation. Without that read, your points are fairly valid, but are you still implying this is all a pure marketing tactic? If so I would still argue against that as a necessity but surely marketing could be heavily involved. But still: this could easily be a footgun. OpenAI will easily release the same model and now that Anthropic has taken the initiative to do a slower more contained rollout they wouldn’t need to do any of that. So from a business perspective I would still argue this whole Glasswing initiative would make their sales and marketing department pretty nervous. I mean in a second-order branding sense sure this plays into the “we are ethical” ethos but it hardly seems worth the risk.
jb_briant 1 days ago [-]
I don't have enough information to conclude whether the world would collapse if Mythos was released publicly without Glasswing.
Neither publicly nor in my internal reasoning. I rarely conclude without proof or a very intense and clear intuition.
From a strategic PoV it makes sense to check if their model is dangerous, I wouldn't want to have my brand name associated with "NK hacker team find zero day in all linux servers of the web and ..."
baggachipz 1 days ago [-]
Seems like they're not even close to step 4.
wslh 1 days ago [-]
And put Chris Olah, Anthropic co-founder, sitting next to Pope Leo XIV presenting his first encyclical, Magnifica Humanitas, at the Vatican.
cyanydeez 1 days ago [-]
>can't release it
can't release it to the plebs
jb_briant 1 days ago [-]
Unsure about that, Opus was already insanely good and we got it for a fractional cost via subscriptions.
They want the plebs, they want the mass.
cyanydeez 1 days ago [-]
I don't think so bro; Anthropic wants B2B cash.
cmxch 1 days ago [-]
That’s fine as long as I can identify and reject any Mythos derived patch as being irreproducible.
IanCal 1 days ago [-]
Why would it not be reproducible?
cmxch 16 hours ago [-]
Its analysis from prompt/harness to end products.
IanCal 9 hours ago [-]
You can't do that for me either.
Why would you refuse to use a patch that deals with a valid PoC exploit?
If a random contributor posted an explanation of an exploit, showed it worked in an executable way, presented a patch and you could see that the exploit no longer worked - would you refuse to use the fix until the contributor showed how they figured it out?
cmxch 6 hours ago [-]
Given where Mythos alleges to go, reproducibility far beyond a hash promise, an alleged (but not really proven) existence of a PoC, and “Trust me bro” is necessary.
When an ungated (or even abliterated) public model can repeatedly, easily, and accurately embarrass Anthropic’s models, that might change.
astrange 1 days ago [-]
How can a patch be "reproducible"? The testcases are reproducible.
cmxch 16 hours ago [-]
How Mythos’s mysterymeat got there from front to back.
philipwhiuk 1 days ago [-]
It would have been nice to have a list of the 150, but I guess it would make them a hacking target?
maipen 1 days ago [-]
I don't get how this is even front page of HN.
dyauspitr 24 hours ago [-]
Grow up, what else would be? This is about as relevant as it gets to this forum.
catigula 1 days ago [-]
I still find it funny that GPT-5.5 is just as good as Mythos and yet Anthropic likes to make things worse than they actually are.
aspenmartin 1 days ago [-]
What's your basis for this?
andai 1 days ago [-]
[dead]
frays 1 days ago [-]
[flagged]
Jtarii 1 days ago [-]
Thanks for your input Claude.
devmor 1 days ago [-]
I see that as not just a spam post, but a generated addition to the dead internet - a real win for us algorithms.
jwpapi 1 days ago [-]
Ragebait god
cyanydeez 1 days ago [-]
Expanding Project Glasswing (IPO)
3sk_ask8 1 days ago [-]
Anthropic has the marketing of a weight loss product.
- They still claim 10000 issues, but they found only one in curl.
- They did not find rsync issues but Claude rather introduced rsync issues.
- Facebook is a member of this cult program but Mythos did not find the account takeover flaw.
- Mythos did not find the issues in Anthropic's own Bun rewrite.
They will not release Mythos because it would be exposed as a fraud before the IPO.
rfgplk 1 days ago [-]
It's just pure marketing, and most people are falling for it. The primary issue stems from their definition of "vulnerability". Most C code will be _swimming_ in vulnerabilities depending on how you analyze it (i.e. a function that accepts a pointer but doesn't validate it -> potential vulnerability right there). The only thing that matters is whether it's de facto exploitable or not.
poemxo 1 days ago [-]
To be fair the curl author is excellent at what he does.
testfrequency 1 days ago [-]
Mythos gives BIG Tesla FSD energy, I’m over it
atleastoptimal 1 days ago [-]
What does that even mean?
UltraSane 1 days ago [-]
All hype no substance.
atleastoptimal 1 days ago [-]
Mythos was announced a few months ago and has actually been demoed at many companies who have all reported its abilities, supporting the claims made by Anthropic. How is this in any way similar to the FSD situation?
testfrequency 1 days ago [-]
Tesla does* the same thing with “influencers”, close enough perhaps?
Everyone either doesn’t have access, or always has the “bad version” and the “trust me it’s 10x better” version is always Coming Soon™
atleastoptimal 24 hours ago [-]
You're conflating the protracted promises for full-self-driving with the current rollout of autonomous driving features in Teslas, a feature people are using today. I've driven with multiple people who use their Tesla self-driving and report on its quality/accuracy; this isn't some overpromised future feature.
And I mean, it's not like Anthropic is a zero-product company that is only offering gated access to their only product, Opus 4.7/4.8 are very good and are driving billions in revenue. Anyone can use it and see how good it is, and it is clear that it is a very good model at many things. It is no huge leap to imagine that a model that is 10x bigger is also better at many of the tasks that Opus is good at.
They are gating the release because of cybersecurity/misuse concerns, which makes sense because
1. Existing models are already being used to find exploits and hack into systems
2. We don't know the effects of releasing a tool which can autonomously exploit systems, especially in a world driven by a "security through obscurity" philosophy. It makes sense to give a heads-up to patch up software that affects billions of users before releasing it.
Imagining that this delayed rollout is all a big marketing scheme, that they have gotten dozens of multi-national companies to play along, and that Anthropic is somehow now just patently being dishonest about something while they have every incentive to not be dishonest (especially when they are neck and neck with OpenAI and their relative success depends on verified claims about model abilities), is pure conspiratorial thinking, driven more by a motivated cynicism about AI companies than by a reasoned examination of the claims being made.
conradludgate 1 days ago [-]
Continuously saying "FSD will be ready next year" for the last N years
atleastoptimal 1 days ago [-]
Mythos was announced a few months ago and has actually been demoed at many companies who have all reported its abilities, supporting the claims made by Anthropic. How is this in any way similar to the FSD situation?
mrbonner 1 days ago [-]
Maybe it is just me: I feel Anthropic's most recent product announcements resemble more and more what IBM's tactics were at its height. For instance, the Watson AI hype after it defeated Kasparov. The difference is that IBM actually wanted and let businesses buy and use Watson, as opposed to the timed release Anthropic does to boost the hype even higher.
3sk_ask8 1 days ago [-]
Deep Blue defeated Kasparov. The Watson hype was about winning Jeopardy, which is still kind of the only use case for current AI.
Management can often treat cybersecurity like a black box that represents millions upon millions in liability. If Mythos represents an opportunity to bring management's understanding of the amount of "security vulnerability debt" everyone carries into the real world, it might be a good thing.
'Hi, we are reaching out to you because our tool flagged a large data transfer between such and such services'
'Wait, the source endpoint is an internal service, the target endpoint is an internal S3 bucket (I was doing a routine DB backup). Neither is reachable from the internet. How is it a security issue?'
'Our tool has flagged it'
People like me who know there is a better way are getting pushed harder to lean on AI tooling even though we know that it is making things worse. This isn’t just because our founder/funding overlords are pressing us to do it. The sheer volume of new mission critical code being pumped out enabled by vibe coding is also leaving us little choice but to lean in too just to try and keep up.
We can all see it as clear as day: The tech isn’t ready for any of this. But nobody wants to hear that and everyone is marching off the cliff together anyway. We’re all going to land in the same waste pit together. Raise a glass and whimper.
People constantly compare AI to this very rare expert human rather than the reality of who is already employed. Experts like you are a major culprit of this. And it puts you at odds with yourself to both admit the industry is full of subpar workers and then lament that they will be replaced with workers that are better, but still worse than you.
What is wrong with someone to make them think in this manner? Is it just a kneejerk response with little thought? Is it ego? Is it a coping mechanism? I find it very strange and interesting and annoying.
We need experts to know when AI is wrong, which it is all the time.
Earlier this week someone commented here that we shouldn’t expect a language model to know that you need to drive a car to a car wash, to wash a car.
So then, what do we expect it to know? Who’s responsible for when it’s wrong?
Also, why can’t Mythos just fix all these issues itself if it’s so smart. And test them to make sure they work?
“Why”: because you didn’t ask it. It’s not its job in this case.
You don’t hire an accountant and tell them “why can’t you fix my cash-flow problems and make me money if you’re so smart”
So why didn’t Anthropic ask it for me?
If AI is reducing the cost of using the long tail of small vulnerabilities, or is making it possible to chain them together into something more profound, then those small, less-concerning issues might require addressing in a way that was previously not required.
Execs/Management types getting extra visibility into the technical side, in my experience, has only ever resulted in additional but meaningless work, like just checking boxes on a compliance/audit checklist without actually considering the impacts of those changes, or whether a company is actually vulnerable to the disclosed CVE.
It's along the same lines of the BS I deal with day to day from upper management arguing back with "But ChatGPT said..." meanwhile pasting some hallucinated crap that doesn't even apply to our environment.
LLMs are basically a Dunning-Kruger machine for management. Engineering is best left alone and trusted to do what they are being paid to do.
Many systems in relation to banking are very old and will stay that way - the economics are not favourable.
The "humans do it too" argument gets tiresome. Even if the consulting company fails, the money goes back to employees and back into the real economy. Now it goes to Don Amodei.
The consulting company could be local, which provides a higher degree of confidence, though not proof, that no data is exfiltrated to the US.
And so on.
It aligns with the significant jump in helpfulness in CTFs.
But I think it's good to hear that it's not that crazy good. Everything slowing it down is good.
One example was Claude thinking we could optimize converting vector tiles to raster by operating in float32 rather than float64. It turned out the library we have to use casts to float64 anyway, so the work of casting to 32 then to 64 rather than staying at 64 actually slowed the path down by 12%.
Yet it also finds the odd thing that isn't very intuitive but leads to marked improvements I never would have uncovered because... Well, as a human with only 24 hours in a day, there's no way I'll turn over every leaf and find these items on my own.
I'm totally fine with the false positives because they're so easy to verify.
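A minimal sketch of why the float32 idea above backfires, assuming a library that always computes in float64. `library_scale` is a hypothetical stand-in, not the actual tile library, and the sketch does not try to reproduce the 12% figure; it only shows the extra narrowing pass and the precision it costs.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def library_scale(values, factor):
    """Hypothetical stand-in for a library that always works in float64."""
    # Python floats are float64, so float32-rounded inputs are widened here anyway.
    return [float(v) * factor for v in values]


coords = [0.1, 0.2, 0.3]

# Path A: stay in float64 the whole way.
direct = library_scale(coords, 2.0)

# Path B: narrow to float32 first. The library widens again, so this adds a
# conversion pass over the data and also throws away precision.
narrowed = library_scale([to_float32(c) for c in coords], 2.0)

print(direct[0], narrowed[0])
```

Path B does strictly more work than Path A before the library even starts, which is the kind of thing a quick benchmark catches.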
my understanding, and experience, is that you 1. run a bunch of sessions with small permutations to create variety, 2. run more sessions to dedupe reports into a smaller collection of potential vulns, 3. run a handful of agents at max effort to write PoCs + write-ups, 4. rank findings, 5. finally look at what, if anything, was found. maybe ask questions, and try to understand if the PoC is running against a realistic setup.
until you can confirm a vuln report is valid, you must assume it is invalid.
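The dedupe, rank, and "invalid until confirmed" steps above could be sketched roughly like this. The `Finding` fields, the severity scale, and the sample reports are all made up for illustration; real reports would need fuzzier matching than a lowercase key.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    component: str               # e.g. "nginx"
    weakness: str                # short label for the suspected bug class
    severity: int                # 1 (low) .. 5 (critical), as claimed by the report
    poc_confirmed: bool = False  # set only after someone reproduces the PoC


def dedupe(findings):
    """Merge reports naming the same component and weakness."""
    merged = {}
    for f in findings:
        key = (f.component.lower(), f.weakness.lower())
        if key not in merged:
            merged[key] = Finding(f.component, f.weakness, f.severity, f.poc_confirmed)
        else:
            kept = merged[key]
            kept.severity = max(kept.severity, f.severity)
            kept.poc_confirmed = kept.poc_confirmed or f.poc_confirmed
    return list(merged.values())


def rank(findings):
    """Confirmed findings first, then by claimed severity, highest first."""
    return sorted(findings, key=lambda f: (f.poc_confirmed, f.severity), reverse=True)


def actionable(findings):
    """Unconfirmed reports are treated as invalid until proven otherwise."""
    return [f for f in findings if f.poc_confirmed]


# Hypothetical reports from several sessions.
raw = [
    Finding("nginx", "heap overflow", 5),
    Finding("NGINX", "Heap Overflow", 4),
    Finding("docker", "auth plugin bypass", 5),
    Finding("backup-job", "path traversal", 3, poc_confirmed=True),
]
unique = dedupe(raw)
ranked = rank(unique)
todo = actionable(ranked)
print([f.component for f in todo])
```

Ranking confirmed findings above unconfirmed ones, regardless of claimed severity, is a deliberate choice here: a severity number from an unverified report is only a claim.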
I can’t wait for the first court case where an LLM surfaces a vuln, lazy devs ignore it, and someone later sues the company into oblivion for liability.
The cost in the US is more like “one year of credit monitoring”.
While this is definitely not the ideal end of the spectrum either, execs treating security issues as something serious instead of annoyances that should only be addressed if revenue can be tied to doing so is a welcome improvement.
They’re using security concerns to mask their inability to deliver the model at scale, while still trying to maintain their lead over OpenAI. As a result, they’ve chosen to release it privately under the banner of an “ethical” rollout.
[0] https://www.youtube.com/watch?v=8zIcP5WlShw
Idiots can scary black box their way to that concern. Plausible? Not so much.
Your comment about before LLMs is a non sequitur. Demonstrate that an LLM can kill everyone on the planet.
There can be arms-races in domains that are unfathomable to the participants. A small mammal will die a billion times over before it understands the evolutionary mechanisms and the genetic playing field on which it loses. Actors are not necessarily privy to understand the means by which they will lose, and humans have only existed in a small window of time in which we fashioned a manicured garden, in which that full understanding was briefly possible. It is not favoured in the universe for us to fully understand our environment imho
If the risk must be exhaustively detailed before it is given credence, we are already doomed, and deservedly so
That's a really deep thought for a 12 year old.
>There can be arms-races in domains that are unfathomable to the participants.
You can't even justify LLMs as being unfathomable. Oh watch out I am fathoming them. You can't stop me fathoming all over the place.
>A small mammal will die a billion times over before it understands the evolutionary mechanisms and the genetic playing field on which it loses. Actors are not necessarily privy to understand the means by which they will lose, and humans have only existed in a small window of time in which we fashioned a manicured garden, in which that full understanding was briefly possible. It is not favoured in the universe for us to fully understand our environment imho
Non Sequitur. One that sounds like it was made up for that "What the Bleep" garbage.
>If the risk must be exhaustively detailed before it is given credence, we are already doomed, and deservedly so
The risk needs to be justified as something more substantial than weird people writing wannabe edgy messages on the internet. If someone on the internet told you that we need to drastically reverse living standards because there's a risk that modern technology will summon King Kong any reasonable person would ask for the working out instead of running for a cave.
(The ambiguity of sarcasm is intentional here.)
Then Altman made ChatGPT public, and the race began.
Yes, Anthropic is compute constrained, even after the SpaceX Colossus deal.
But supply constraints are the normal operating mode of any market. Anthropic could choose to serve whatever models it pleases at whatever price points it chooses and let the market decide where the value is.
If Mythos at $X overwhelms their capacity, they could just charge $X+1. If still overwhelmed, there are larger prices as well.
This may not be as valuable in the long term as getting committed customers hooked at a sustainable price.
I think that most people at Anthropic are true believers from my interactions with them so I don’t believe this theory anecdotally. The simplest explanation is that it really is taking a while to gain confidence they won’t be used for a spree of bad cyber attacks. Knowing how long it takes institutions to fix security issues when filed by humans I would be more surprised if this wasn’t the case.
But I would forgive anyone who did think it was deliberately sandbagged; given the staggering sums at play, true believers might believe the ends justify the means to a little “marketing” like this.
To a lot of us it’s not clear that’s what’s happening. It’s speculation and one possibility.
It may also be a secondary consideration and not the primary gating factor.
Anthropic has had their missteps but it’s still plausible to take what they say at face value.
Chinese labs will force their hands, until then let’s hope maximum number of projects get patched at a reasonable pace.
Trusting Anthropic to deliver is like asking Microsoft to pay out for bugs.
So they have a whole lot more compute now than they did last month.
As an ordinary developer who relies on a $20–$200/month subscription, I feel disappointed by the release of a paper describing a model that I can’t actually use.
For all they know they'll find a new optimization that lets them serve Opus class models for half the computing cost next month. Or someone will invent the next OpenClaw and demand will 10x overnight.
> 50 initial partners ... found more than 10,000 high- or critical-severity security flaws.
GPT-2: https://slate.com/technology/2019/02/openai-gpt2-text-genera...
GPT-3: https://www.itpro.com/technology/artificial-intelligence-ai/...
> The company believes making its API generally available was made possible due to its progress with safeguards, and that opening up the API to all developers will help see applications developed faster. ...
> A large emphasis has been placed on safe use of the tool, which in the past has been criticised for a range of shortcomings, including racism and prejudices against specific genders and religions.
[1]: https://youtu.be/TfVYxnhuEdU?t=102
Transcript of the timestamped part:
> Now, OpenAI's terms of service don't let me give you the full list. I have to curate them, and show you a sample. Those are the terms and conditions I agreed to.
- Valkey/Redis port here: https://github.com/ianm199/valdr (passes ~99% of single node test suite; real prod features like replication/clustering/HA early or not implemented)
- Further along port of Lua 5.1-5.5: https://github.com/ianm199/lua-rs-port/tree/main
- I have a less developed nginx version that would be the north star
- These projects are very alpha at the moment
If anyone is interested in getting involved in this or has done similar experiments I'd love to collaborate! There is so much variation in how you can run these large scale agent fleets I don't think anyone has a perfect system yet.
It is in all respects foreign code in a language I may or may not be familiar with, and worse yet, if I were to take over, I'd be responsible for maintaining the whole black box forever more?
Thank you but no thanks.
There might be a world where people soon just find unsafe C code exposed to the web (i.e. nginx) an untenable situation and I hope it can be a helpful resource.
Anyway, I see open source code as positive sum. Maybe in the end only a small community who cares about cross compilation finds this helpful and that's a win!
I hope people will restrain themselves from doing this, at least in the name of good ethics. I fear this is going to hurt OSS a lot.
I hope people will hold back from this, if only out of respect for the work that came before. I fear it could do real damage to OSS. It would discourage the maintainers whose effort makes any of it possible.
But this is more about memory safety - you can have immense respect for the giants who built these tools but also be worried that memory safety might become an even bigger deal. If someone found a memory-safety zero day in nginx or OpenSSL, for example, that is a very big deal!
I think this is one strategy we should look into, hopefully people in the C community look into other options like project Glasswing/ next generation fuzzers etc. When the world of security is changing so fast it is good to get a lot of shots on net.
I know many of these projects have been around for years but it's time for developers to put on their big boy panties and start taking memory safe languages seriously. Watching the same attacks again and again for 30 years is getting droll.
1. AI-rewrites are not clean room implementations.
Regarding the footnote: pick any of the litigated cases, then consider if AI rewrites meet the bar of a clean-room.
If the source language is legacy C, then another option might be (deterministic) transpilation to a memory-safe subset of C++ [2]. The resulting code wouldn't necessarily be performance-optimal, but it can be used for the majority of code that isn't really performance-sensitive.
[1] https://github.com/duneroadrunner/scpp_code_migration
[2] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...
All that to say I think these automated ports are interesting experiments. However if you want to build something people can trust, the people need to be able to trust that you fully understand what is built, and why it's built the way it is.
No one wants Bun in Rust, no one wants the rsync vibe code additions. This is just the only pro-AI comment, so the AI people voted it to the top.
And I'd disagree on no one wants - Lua is quite helpful since it is easily used in WASM. There has been some interest from people in the Bevy community - a game engine in Rust - since you can't have Lua scripting in browser games easily with the C version.
But anyway if people want it or not memory safety might become much more important so I think it is a good area to explore. Some people think large C codebases are inherently unsecurable https://alexgaynor.net/2020/may/27/science-on-memory-unsafet...
Lua is often heavily used in Redis/Valkey, if you are interested in porting Valkey, it makes sense to also port Lua, they aren't "disparate".
I also don’t care if it’s written by humans or LLMs or robot overlords from Alpha Centauri. Again, if it works, it works.
The operative word here is ‘works’. Code is now cheap, QA still isn’t. Since people don’t really like doing the same thing twice, specs for working code have never been written. Nowadays there is no reason to not create a spec detailed enough for robots to make no mistakes (pun intended) when filling in the gaps when converting from spec space to code space. As long as this remains true, I don’t care who or what does the boring parts.
So no, it doesn't "work", it just worked at some point in the past.
"Claude" would probably be the response of a typical tokenmaxxer. I have no desire to use software that was drive-by forked by someone who doesn't understand the trade-offs made by a project, or the values underpinning them. I'll take an AI-assisted domain expert, or an experienced maintainer who stumbled into the role, over someone with no grounding in the domain being the human in the loop for AI agents.
If society can't trust banks and other institutions to safely control their data, what follows?
Do we collectively switch off the internet?
I was working at the fruit company when they just hard stopped people from recovering their fruitcloud accounts via phone support due to social engineering.
Social engineering risk just increases the burden on the consumer/internal support services. The risk is that not everyone has pulled up stumps to protect these services. After a few high profile fuck ups they will. The herd loses 2 beasts and the rest wander away from that water hole.
It's much like how after BitLocker we don't have user access to backup server disks anymore. The lesson was learned and we moved on. Lots of high profile fuckups but we don't get those anymore. CTOs were forced, basically at gunpoint, to adapt or die.
You don’t tie it to “your device”.
You tie it to your security key.
Which is treated like a credit card.
and your extended family, friends, or volunteers can act as social proof to allow you back into your accounts,
if your key burns up, it breaks and you were too cool to provision a backup, etc.
> it breaks and you were too cool to provision a backup
If we're relying on the average person to back things up properly, this idea is doomed from the start.
The average person is relying on the average person, for everything, and I agree, they are doomed from the start.
Tech-related items inclusive.
Just new banks.
Same as people being unafraid of their car key being cloned - because they don’t hand it around the general public.
Yet they’re still the predominant search engine. Sadly, the concerns of the few don’t interest monopolistic profit seekers without forced regulations. Think how airlines are legally required to give refunds for delayed flights; there’s a reason it required legislation.
But the idea that we'll squash all of the critical vulns is simply nonsense, despite the weird Firefox blog posts that indicate otherwise.
In other words: do you think that the impossibility lies in exhausting the number of finds, or does the impossibility lie in fixing them?
In either case, do you think that this was also true pre-AI? That is to say: it was not possible to, given some set of practical resource constraints, find and fix all the vulnerabilities that a similarly-resourced group would find?
If so, then would you say that you just fundamentally don't believe in secure software and the only defense is lack of attention?
> In either case, do you think that this was also true pre-AI? That is to say: it was not possible to, given some set of practical resource constraints, find and fix all the vulnerabilities that a similarly-resourced group would find?
Yes.
> If so, then would you say that you just fundamentally don't believe in secure software and the only defense is lack of attention?
I believe in secure software; few people are building it though, and the majority of relevant attack surface is dogshit for security.
Squashing vulns via discovery is irrelevant to security. If we want safer software it has to be built to be safer.
[1] https://www.anthropic.com/news/statement-department-of-war :
> But using these systems for mass domestic surveillance is incompatible with democratic values.
They seem pretty close, in both average and "best run" scores. And, in a highly verifiable domain, "best run" or pass@n is what you're looking for.
Two things of note: 5.5-Cyber is likely to be substantially cheaper than Mythos, given it is priced around Opus. Additionally: AISI has never tested OpenAI’s best public model and actual Mythos competitor: 5.5-Pro.
I mention this because if you're frustrated that you can't access it, you're not alone. Even with our company's heft and a security org that is very well known in the industry we're getting nowhere.
People and organizations can have mixed motivations. It’s often not “just” one thing.
https://www.0xsid.com/blog/meta-account-takeover-fiasco
These entities will now give all their IP to an American company that only promises not to spy on Americans.
Subsequently, the NSA can audit the leaked sources manually and find real exploits.
Will likely give them time to expand capacity as well. And make them harder to dislodge in these orgs.
I've worked in cyber for 15+ years, at the largest CS vendor globally and at Fortune 5 companies (top of the list).
The only people using AI are shops with clueless people getting flooded with nonsense.
You will see the same sentiment in a different flavor (different products) in OSS security mailing lists now.
Do not fall for it. For every one of these things: test it, verify it for yourself. Don't take anyone's word. A 100B+ valuation makes external voices generally worthless (easy to buy).
I'm not saying it's all worthless or will never amount to anything. But test it yourself, please. A lot of money is getting chucked around.
Look at it this way.
They have a stake in having companies create not very good software. They have a stake in supplying engineers to fix that. They have a stake in tools to fix problems that it creates.
you can guess the next service offering...
I'm afraid that the usual mantra that "we just need more scale", which worked well for attracting investment, is not working anymore: bigger models provide marginal improvements while naturally getting much more expensive to run.
Is this why both Anthropic and OpenAI are rushing for IPOs this year?
Cloudflare wrote a genuinely good piece and found a bunch of bugs: https://blog.cloudflare.com/cyber-frontier-models/
wolfSSL is security-focused, and it found a novel exploit: https://www.wolfssl.com/how-claude-mythos-preview-helped-har...
You can pretend that it's all smoke and mirrors, but that just doesn't match up with reality: https://www.paloaltonetworks.com/blog/2026/05/defenders-guid...
Security wise, it's about being able to find and chain multiple vulnerabilities to actually create viable exploits.
So I would imagine that if you were using it for regular software development you may not feel that it's that different unless used in a particular way?
I work with a number of people in security who have come around, and while they still think LLMs are rather garbage at architecture, they see how good the models we can access now are at finding security issues. They can chain together wildly different concepts and turn them into working exploits.
It's super interesting to hear this refrain on HN, it is alarmingly common. Anthropic released benchmark numbers on Mythos, as they have for all of their models. Once models become public, people evaluate them in a myriad of ways. We have had reliable scaling laws for years and they still hold. Epoch capability index continues to grow exactly as expected. Where does this idea come from?
As for cost, the cost per token at a given level of performance drops up to 40x per year.
https://cyberplace.social/@GossiTheDog/116679693992983945
Realistically their opinion deserves to hold more weight than the median HN comment.
https://arstechnica.com/information-technology/2026/05/mozil...
https://www.theregister.com/software/2026/03/26/linux-kernel...
That said: I already was aware of Mozilla's account and despite what you are thinking, it essentially confirms everything.
> The biggest differentiating factor was the use of an agent harness, a piece of code that wraps around an LLM to guide it through a series of specific tasks. For such a harness to be useful, it requires significant resources to customize it to the project-specific semantics, tooling, and processes it will be used for.
Yep. Sounds exactly right. So the question is: do we really need Mythos for this, or can almost any reasonably close-to-frontier AI model accomplish similar results with a sufficiently advanced harness?
Jury's out, but my vote is "probably most of the way". After all, alongside all of the splashy zero days dropped by eager AI companies, Greg Kroah-Hartman has been posting many useful, if minor, patches to the Linux kernel produced by nothing more than a single 128 GiB Framework Desktop. So apparently, even small models can be very useful if you can find a way to get the noise out.
Mythos could be very useful and effective and still be mostly a marketing ploy, because until very recently there has been little investment in making LLMs work for security auditing. Without more substantial information, it's difficult to tell how much better at security research Mythos is than, say, Opus or DeepSeek 4 coupled with a good agent harness.
And in that sense, it's the same sort of crap as the GPT-2 and GPT-3 releases. A lot of hoopla about how dangerous it is to humanity. Then it turns out it's only dangerous enough that it needs to be gated behind an additional monthly subscription.
There is an incredible amount of competency signaled by someone who was given access to this model but doesn’t treat their online presence like a professional resume.
https://www.linkedin.com/in/kevin-beaumont-security/
Nonetheless, running many of the open weights models over a codebase, with an appropriate harness, can provide about the same vulnerability coverage (i.e. each of the open weights models would find a subset of what Mythos or GPT 5.5 could find, but the subsets are not the same).
Despite needing more runs and more time, this may be significantly cheaper, especially if the models are self hosted.
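To make the "subsets are not the same" point concrete, here is a minimal sketch of how one might measure it. Everything in it is made up for illustration: the finding IDs, the model names and the `coverage` helper are hypothetical, not real results from any model.

```python
# Hypothetical finding IDs; none of these are real results.
# `reference` stands in for what a frontier model is assumed to find.
reference = {"V1", "V2", "V3", "V4", "V5"}

# Assumed findings of three open-weights models run over the same codebase.
open_models = {
    "model_a": {"V1", "V2"},
    "model_b": {"V2", "V3", "V4"},
    "model_c": {"V1", "V5"},
}


def coverage(found, reference):
    """Fraction of the reference findings that also appear in `found`."""
    return len(found & reference) / len(reference)


# Each model alone covers only part of the reference set...
per_model = {name: coverage(found, reference) for name, found in open_models.items()}

# ...but the union of their findings can cover much more, because the
# subsets overlap only partially.
combined = set().union(*open_models.values())
combined_coverage = coverage(combined, reference)

print(per_model)
print(combined_coverage)
```

Whether the union really approaches frontier coverage on a real codebase is exactly the empirical question; the sketch only shows how you would count it.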
Based on what Anthropic said about Mythos, they also use a quite elaborate harness for finding bugs and vulnerabilities, i.e. not a simple prompt like "find the bugs".
They run Mythos repeatedly on each file of the codebase. They start with more generic prompts, used to determine whether a more thorough analysis of that file is worthwhile. Then they use more specific prompts, to detect various classes of bugs. Once it becomes probable that a certain bug exists, they do a final run where the prompt requests a confirmation of the already known bug, perhaps together with a proposed patch or a PoC exploit.
Therefore the efficiency of finding vulnerabilities depends a lot on the harness, not only on the LLM. Also, searching for vulnerabilities in a big codebase when paying per token is very expensive, because it requires many runs of the LLM.
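A rough sketch of that three-stage loop, as I understand the description. This is not Anthropic's harness: `scan_file`, `BUG_CLASSES`, the prompt headers and the `fake_model` stand-in are all hypothetical names invented here, and a real harness would call an actual LLM where `ask` is used.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative bug classes only; a real harness would have its own list.
BUG_CLASSES = ["memory-safety", "injection", "auth-bypass"]


@dataclass
class Finding:
    path: str
    bug_class: str
    confirmed: bool


def said_yes(answer: str) -> bool:
    return answer.strip().lower().startswith("yes")


def scan_file(path: str, source: str, ask: Callable[[str], str],
              triage_runs: int = 3) -> List[Finding]:
    # Stage 1: repeated generic triage; continue only on a majority of "yes".
    votes = sum(said_yes(ask(f"TRIAGE\n{source}")) for _ in range(triage_runs))
    if votes * 2 <= triage_runs:
        return []

    findings = []
    for bug_class in BUG_CLASSES:
        # Stage 2: one class-specific prompt per bug class.
        if not said_yes(ask(f"CLASS:{bug_class}\n{source}")):
            continue
        # Stage 3: a final run asking to confirm the suspected bug.
        confirmed = said_yes(ask(f"CONFIRM:{bug_class}\n{source}"))
        findings.append(Finding(path, bug_class, confirmed))
    return findings


def fake_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call, so the sketch runs offline."""
    header, _, body = prompt.partition("\n")
    if header in ("TRIAGE", "CLASS:memory-safety", "CONFIRM:memory-safety"):
        return "yes" if "strcpy" in body else "no"
    return "no"


print(scan_file("a.c", "strcpy(dst, src);", fake_model))
print(scan_file("b.c", "return 0;", fake_model))
```

Even in this toy version, a flagged file costs at least `triage_runs` plus one call per bug class, which is where the per-token bill comes from.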
No comparison to human teams, and I’m sure that $1 million in tokens was used by humans, in a team. So like most AI, they’ve developed a tool that capable people can use to be better, but unlike most tools, they’re claiming this to be outright magic. The magic is the hype train.
I mean, most Nasdaq tech companies would be in 13+ countries. Why are they writing this as if it's a big number when it's hilariously small?
I don't think they're trying to flex this as a large number. They don't want to give an exact number, as that may change etc / is fuzzy, but also want to give you an idea of the scale.
They say "In the future, we intend to expand our geographical reach much further". I imagine this commentary is somewhat related to the concerns that AI will create an even worse "global underclass". AI developments are first accessible to Americans, then allies, and then later the whole world.
The only trend Mythos continues is Anthropic’s trend of warning that disaster is always 6 to 12 months away.
Step 2: offer to test it, but only for the biggest companies in the world
Step 3: onboard those big players on your tooling and product
Step 4: profit
This is genius.
Err... wait... that was already the hard part... hmm
It means that even if the value you offer is similar to your competitors', you are the one conquering the market.
That's the only way to avoid becoming a commodity.
Don't you understand, if they really did do the <ai magic> they don't need to hire anyone, IT SELLS ITSELF
But I think that downplays the importance of having a good product. If the product didn’t work, this would be a good way to lose trust with a lot of organizations in a hurry.
At this phase, no company would risk its brand by calling the product ineffective. The big players are in it together and the small ones have no option but to play along.
Nevertheless, collecting the historical wisdom and running it at machine scale does have a lot of benefits, for sure. The only question is the signal-to-noise ratio: the machine is doing what humans did, just at a multiplied speed and with a lot more context than a normal human can hold.
They did produce great value; Claude Code and Opus 4.5 are a singularity in software engineering.
The job we practiced for decades simply doesn't exist anymore.
A marketing move doesn't mean a scam. It describes the ability to sell people on a narrative and surpass your competitors in market share. And that's exactly what is happening.
My post is a "tribute" to the efficiency of Anthropic's communication. I never complained about anything, called it a scam, or said they should have released Mythos to the public instead of rolling it out to a selected cohort.
You tried to expand my words to make me say something I didn't, because my post wasn't giving you a clear conclusion of my opinion regarding their private release.
Neither publicly nor in my internal reasoning. I rarely conclude without proof or a very intense and clear intuition.
From a strategic PoV it makes sense to check whether their model is dangerous. I wouldn't want to have my brand name associated with "NK hacker team finds zero day in all Linux servers of the web and ..."
can't release it to the plebs
They want the plebs, they want the mass.
Why would you refuse to use a patch that deals with a valid PoC exploit?
If a random contributor posted an explanation of an exploit, showed it worked in an executable way, presented a patch and you could see that the exploit no longer worked - would you refuse to use the fix until the contributor showed how they figured it out?
When an ungated (or even abliterated) public model can repeatedly, easily, and accurately embarrass Anthropic’s models, that might change.
- They still claim 10000 issues, but they found only one in curl.
- They did not find rsync issues but Claude rather introduced rsync issues.
- Facebook is a member of this cult program but Mythos did not find the account takeover flaw.
- Mythos did not find the issues in Anthropic's own Bun rewrite.
They will not release Mythos because it would be exposed as a fraud before the IPO.
Everyone either doesn’t have access, or always has the “bad version” and the “trust me it’s 10x better” version is always Coming Soon™
And I mean, it's not like Anthropic is a zero-product company that is only offering gated access to its only product. Opus 4.7/4.8 are very good and are driving billions in revenue. Anyone can use them and see how good they are, and it is clear that they are very good at many things. It is no huge leap to imagine that a model that is 10x bigger is also better at many of the tasks that Opus is good at.
They are gating the release because of cybersecurity/misuse concerns, which makes sense because:
1. Existing models are already being used to find exploits and hack into systems
2. We don't know the effects of releasing a tool which can autonomously exploit systems, especially in a world driven by a "security through obscurity" philosophy. It makes sense to give a heads-up to patch up software that affects billions of users before releasing it.
Imagining that this delayed rollout is all a big marketing scheme, that they have gotten dozens of multinational companies to play along, and that Anthropic is somehow just being patently dishonest about something while it has every incentive not to be (especially when it is neck and neck with OpenAI and its relative success depends on verified claims about model abilities), is pure conspiratorial thinking, driven more by a motivated cynicism about AI companies than by a reasoned examination of the claims being made.