Title: User Comment Replies — LessWrong
Description: A community blog devoted to refining the art of rationality

All of Joseph Miller's Comments + Replies

Wei Dai's Shortform
Joseph Miller (3d):
> Does this still seem wrong to you?
Yes. I plan to write down my views properly at some point, but roughly I subscribe to non-cognitivism. Moral questions are not well defined because they are written in ambiguous natural language, so they are not truth-apt. Now, you could argue that many reasonable questions are also ambiguous in this sense. E.g. the question "how many people live in Sweden?" is ultimately ambiguous because it is not written in a formal system (i.e. the borders of Sweden are not defined down to the atomic level). But you could in theory define the... (read more)

Wei Dai (2d): Ok, I see where you're coming from, but I think you're being overconfident about non-cognitivism. My current position is that non-cognitivism is plausible, but we can't be very sure that it is true, and making progress on this meta-ethical question also requires careful philosophical reasoning.
These two posts of mine are relevant on this topic: Six Plausible Meta-Ethical Alternatives, Some Thoughts on Metaphilosophy.

Explaining British Naval Dominance During the Age of Sail
Joseph Miller (3d):
> The unemployment pool that resulted from this efficiency wage made it easier to discipline officers by moving them back to the captains list.
I don't understand this point or how it explains captains' willingness to fight.

Arjun Panickssery (3d): That part encourages captains to avoid shirking in general (rather than to use aggressive tactics in particular), because it increases the costs of job loss (due to high compensation) and because there are captains in reserve who can replace them quickly.

Wei Dai's Shortform
Joseph Miller (3d):
> the One True Form of Moral Progress
Have you written about this? This sounds very wrong to me.

Wei Dai (3d): The One True Form of Moral Progress (according to me) is using careful philosophical reasoning to figure out what our values should be, what morality consists of, where our current moral beliefs are wrong, or, generally, the contents of normativity (what we should and shouldn't do). Does this still seem wrong to you? The basic justification for this is: for any moral "progress" or change that is not based on careful philosophical reasoning, how can we know that it's actually a change for the better? I don't think I've written a post specifically about this, but Morality is Scary is related, in that it complains that most other kinds of moral change seem to be caused by status games amplifying random aspects of human values or motivation.

Tracing the Thoughts of a Large Language Model
Joseph Miller (4d):
DeepMind says boo SAEs, now Anthropic says yay SAEs![1] Reading this paper pushed me a fair amount in the yay direction.
We may still be at the unsatisfying level where we can only say "this cluster of features seems to roughly correlate with this type of thing" and "the interaction between this cluster and this cluster seems to mostly explain this loose group of behaviors". But it looks like we're actually pointing at real things in the model, and therefore we are beginning to be able to decompose the computation of LLMs in meaningful ways. The Addition Ca... (read more)

Mateusz Bagiński (3d): The most straightforward synthesis[1] of these two reports is that SAEs find some sensible decomposition of the model's internals into computational elements (concepts, features, etc.), which circuits then operate on. It's just that these computational elements don't align with human thinking as nicely as humans would like. E.g. SAE-based concept probes don't work well OOD because the models were not optimized to have concepts that generalize OOD. This is perfectly consistent with linear probes being able to detect the concept from model activations (the model retains enough information about the concept, such as "harmful intent", for the probe to latch onto, even if the concept itself (or rather, its OOD-generalizing version) is not privileged in the model's ontology).
ETA: I think this would (weakly?) predict that SAE generalization failures should align with model performance dropping on some tasks. Or at least that the model would need to have some other features that get engaged OOD so that the performance doesn't drop? Investigating this is not my priority, but I'd be curious to know if something like this is the case.
1. ^ Not to say that I believe it strongly; it's just a tentative/provisional synthesis/conclusion.

Joseph Miller's Shortform
Joseph Miller (5d):
Claude 3.7's annoying personality is the first example of accidentally misaligned AI making my life worse.
Claude 3.5/3.6 was renowned for its superior personality, which made it more pleasant to interact with than ChatGPT. 3.7 has an annoying tendency to do what it thinks you should do, rather than following instructions. I've run into this frequently in two coding scenarios: In Cursor, I ask it to implement some function in a particular file. Even when explicitly instructed not to, it guesses what I want to do next and changes other parts of the code as well... (read more)

Will Jesus Christ return in an election year?
Joseph Miller (7d):
> This means that the Jesus Christ market is quite interesting! You could make it even more interesting by replacing it with "This Market Will Resolve No At The End Of 2025": then it would be purely a market on how much Polymarket traders will want money later in the year.
It's unclear how this market would resolve. I think you meant something more like a market on "2+2=5"?

justinpombrio (6d): The market named "This Market Will Resolve No At The End Of 2025" will resolve to No at the end of 2025, like it says in its title. What's unclear about this?

trevor's Shortform
Joseph Miller (11d):
I read this and still don't understand what an acceptable target slot is.

sjadler (11d): I think they mean heuristics for who is ok to dehumanize / treat as "other" or harm.

trevor (11d): My apologies, this post was pointing/grasping in a general direction and I didn't put much trouble into editing it; there was a typo at the beginning where I seem to have used the wrong word to refer to the slot concept. I just fixed it. Did that help?

Joseph Miller's Shortform
Joseph Miller (12d):
Then it will often confabulate a reason why the correct thing it said was actually wrong. So you can never really trust it; you have to think about what makes sense and test your model against reality. But to some extent that's true for any source of information.
LLMs are correct about a lot of things and you can usually guess which things they're likely to get wrong.

Joseph Miller's Shortform
Joseph Miller (13d):
LLM hallucination is good epistemic training. When I code, I'm constantly asking Claude how things work and what things are possible. It often gets things wrong, but it's still helpful. You just have to use it to help you build up a gears-level model of the system you are working with. Then, when it confabulates some explanation, you can say "wait, what?? that makes no sense" and it will say "You're right to question these points - I wasn't fully accurate" and give you better information.

Gurkenglas (13d): What if you say that when it was fully accurate?

Against Yudkowsky's evolution analogy for AI x-risk [unfinished]
Joseph Miller (13d):
See No convincing evidence for gradient descent in activation space.

kave's Shortform
Joseph Miller (14d):
It's not really feasible for the feature to rely on people reading this PSA to work well. The correct usage needs to be obvious.

kave (13d): I'm inclined to agree, but at least this is an improvement over it only living in Habryka's head. It may be that this + moderation is basically sufficient, as people seem to have mostly caught on to the intended patterns.

Joseph Miller's Shortform
Joseph Miller (15d):
When I go on LessWrong, I generally just look at the quick takes and then close the tab. Quick takes cause me to spend more time on LessWrong but spend less time reading actual posts. On the other hand, sometimes quick takes are very high quality and I read them and get value from them when I may not have read the same content as a full post.

habryka (15d): Interesting. I am concerned about this effect, but I do really like a lot of quick takes.
I wonder whether maybe this suggests a problem with how we present posts.

leogao's Shortform
Joseph Miller (1mo):
> I find it very annoying that standard reference culture seems to often imply giving extremely positive references unless someone was truly awful, since it makes it much harder to get real info from references
Agreed, but also most of the world does operate in this reference culture. If you choose to take a stand against it, you might screw over a decent candidate by providing only a quite positive recommendation.

Neel Nanda (1mo): Agreed. If I'm talking to someone who I expect to be able to recalibrate, I just explain that I think the standard norms are dumb and the norms I actually follow, and then give an honest and balanced assessment. If I'm talking to someone I don't really know, I generally give a positive but not very detailed reference, or don't reply, depending on context.

How To Do Patching Fast
Joseph Miller (1mo):
Hey, long time no see! Thanks, I've corrected it:

$$\frac{\partial F(e_\alpha)}{\partial \alpha} = \frac{\partial F(e_\alpha)}{\partial e_\alpha}\frac{\partial e_\alpha}{\partial \alpha} = \frac{\partial F(e_\alpha)}{\partial e_\alpha}\frac{\partial\left[e_{\text{clean}} + \alpha \times (e_{\text{corr}} - e_{\text{clean}})\right]}{\partial \alpha} = \frac{\partial F(e_\alpha)}{\partial e_\alpha}\left(e_{\text{corr}} - e_{\text{clean}}\right)$$

Set $\alpha = 0$, i.e. $e_\alpha = e_{\text{clean}}$:

$$\frac{\partial F(e_\alpha)}{\partial \alpha}\bigg|_{\alpha=0} = \left(e_{\text{corr}} - e_{\text{clean}}\right)\frac{\partial F(e_{\text{clean}})}{\partial e_{\text{clean}}}$$

Campbell Hutcheson's Shortform
Joseph Miller (1mo):
It's surprising he bought the gun so long in advance. There should be footage of him buying it, I think, as required by California law.

ACCount (1mo): A lot of suicides are impulse decisions, and access to firearms is a known suicide risk factor. People often commit suicide with weapons they bought months, years, or even decades ago - not because they planned their suicide that far ahead, but because they used a firearm that was already available. The understanding is that, without a gun at hand, suicidal people often opt for other suicide methods - ones that take much longer to set up and are far less reliable.
This gives them more time and sometimes more chances to reconsider - and many of them do.

Rebecca (1mo): He may have bought it originally for protection?

Campbell Hutcheson's Shortform
Joseph Miller (1mo):
You can see what he's referring to in the pictures Webb published of the scene.

LoganStrohl's Shortform
Joseph Miller (1mo):
What is prospective memory training?

Raemon (1mo): This says "remembering to do things in the future."

leogao's Shortform
Joseph Miller (1mo):
I think there's a spectrum between great man theory and structural forces theory, and I would classify your view as much closer to the structural forces view, rather than a combination of the two. The strongest counter-example might be Mao. It seems like one man's idiosyncratic whims really did set the trajectory for hundreds of millions of people. Of course, as soon as he died most of that power vanished, but surely China and the world would be extremely different today without him.

Viliam (1mo): A synthesis between the structural forces theory and "pulling the rope sideways": the economic and other forces determine the main direction, a leader who already wanted to go in that direction gets elected and starts going in that direction, and his idiosyncratic whims get implemented as a side effect. Like, instead of Hitler, there would be another German leader determined to change the post-WW1 world order, but he would probably be less obsessed about the Jews.
Also, he might make different alliances.

leogao's Shortform
Joseph Miller (1mo):
> The Duke of Wellington said that Napoleon's presence on a battlefield "was worth forty thousand men".
This would be about 4% of France's military size in 1812.

Joseph Miller's Shortform
Joseph Miller (1mo):
I first encountered it in chapter 18 of The Looming Tower by Lawrence Wright. But here's an easily linkable online source: https://ctc.westpoint.edu/revisiting-al-qaidas-anthrax-program/

Joseph Miller's Shortform
Joseph Miller (1mo):
"Despite their extreme danger, we only became aware of them when the enemy drew our attention to them by repeatedly expressing concerns that they can be produced simply with easily available materials."
- Ayman al-Zawahiri, former leader of Al-Qaeda, on chemical/biological weapons.
I don't think this is a knock-down argument against discussing CBRN risks from AI, but it seems worth considering.

Eric Neyman (1mo): Do you have a link/citation for this quote? I couldn't immediately find it.

quetzal_rainbow (1mo): The trick is that chem/bio weapons can't, actually, "be produced simply with easily available materials", if we talk about military-grade stuff, not "kill several civilians to create a scary picture on TV".

Literature Review of Text AutoEncoders
Joseph Miller (1mo):
This is great, thanks. I think these could be very helpful for interpretability.

A History of the Future, 2025-2040
Joseph Miller (1mo):
Thanks, I enjoyed this. The main thing that seems wrong to me, similar to some of your other recent posts, is that AI progress seems to mysteriously decelerate around 2030. I predict that things will look much more sci-fi after that point than in your story (if we're still alive).

L Rudolf L (1mo): The scenario does not say that AI progress slows down.
What I imagined to be happening is that after 2028 or so, there is AI research being done by AIs at unprecedented speeds, and this drives raw intelligence forward more and more, but (1) the AIs still need to run expensive experiments to make progress sometimes, and (2) basically nothing is bottlenecked by raw intelligence anymore, so you don't really notice it getting even better.

Noosphere89 (1mo): The big reason why such a slowdown could happen is that the hyper-fast scaling trends can't last beyond 2030. Scaling has been the main driver of AI progress, and I still expect it to be the main driver to 2030; if there's no real way for AI systems to get better past that point through algorithmic advances, then this story becomes much more plausible.

Purplehermann (1mo): It's more that it stops being relevant to humans, as keeping humans in the loop slows down the exponential growth. I do think VR and neuralink-like tech will be a very big deal though, especially in regards to allowing people experiences that would otherwise be expensive in atoms.

Joseph Miller's Shortform
Joseph Miller (1mo):
xAI claims to have a cluster of 200k GPUs, presumably H100s, online for long enough to train Grok 3. I think this is faster datacenter scaling than any predictions I've heard. Source: https://x.com/xai/status/1891699715298730482

Vladimir_Nesov (1mo): They don't claim that Grok 3 was trained on 200K GPUs, and that can't actually be the case from other things they say. The first 100K H100s were done early Sep 2024, and the subsequent 100K H200s took them 92 days to set up, so early Dec 2024 at the earliest if they started immediately, which they didn't necessarily. But pretraining of Grok 3 was done by Jan 2025, so there wasn't enough time with the additional H200s. There is also a plot where Grok 2 compute is shown slightly above that of GPT-4, so maybe 3e25 FLOPs. And Grok 3 compute is said to be either...
(read more)

Rasool (1mo): The 200k GPU number has been mentioned since October (Elon tweet, Nvidia announcement), so are you saying that the fact that they managed to get the model trained so fast is what beat the predictions you heard?

Murder plots are infohazards
Joseph Miller (1mo):
DM'd

Murder plots are infohazards
Joseph Miller (1mo):
In that case I would consider applying for EA funds if you are willing to do the work professionally, or set up a charity to do it. I think you could make a strong case that it meets the highest bar for important, neglected and tractable work.

Chris Monteiro (1mo): Do you know anyone who could guide me through this process?

Murder plots are infohazards
Joseph Miller (1mo):
How long does it take you to save one life on average? GiveWell's top charities save a life for about $5000. If you can get close to that, there should be many EA philanthropists willing to fund you or a charity you create. And I think they should be willing to go up to at least $10-20k, because murders are probably especially bad deaths in terms of their effects on the world.

Chris Monteiro (1mo): It varies depending on how powerful the law enforcement agency is and whether they understand it or not, with the FBI and German Federal Police being the most effective. It's not all saving lives; often it's protecting people from stalking, physical and mental abuse, child custody disputes and the like, because in many cases (especially so with women perpetrators) they would never actually turn to violence themselves. I have not been party to all of the journalists' hard costs for local investigators, but I think they were paying at least $3,000 initially per major case, and they would go higher when it looked like it would turn into a full podcast episode. There is also an issue where cases without payment are considered less serious by the police than those with payment, and require more up-front investigation to understand.
As a result, far more of the 'payer' cases were investigated compared to the non-payer ones, at least in the US. Sometimes, such as in the US, the police would then move fast, but in places like Spain the journalist had to act as a victim advocate extensively for years, and in Italy the cases collapsed on technicalities. Frankly, beyond my personal experience, I REALLY don't want to live in a world where people can order commodity killings anonymously, as my data shows that all sorts of people would. I consider this analogous to the psychological effect that terrorism has on society, despite it not being a large source of actual violence, relatively speaking. But yeah, murder is bad actually, and should be given higher priority than other causes of death in my opinion.

interpreting GPT: the logit lens
Joseph Miller (2mo):
I just found the paper BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT, which precedes this post by a few months and invents essentially the same technique as the logit lens. So consider also citing that paper when citing this post. As an aside, I would guess that this is the most cited LessWrong post in the academic literature, but it would be cool if anyone had stats on that.

Viliam's Shortform
Joseph Miller (2mo):
Yeah, I guess, but actually the more I think about it, the more impractical it seems.

Viliam's Shortform
Joseph Miller (2mo):
I think the solution would be something like adopting a security mindset with respect to preventing community members going off the rails. The costs would be great, because then everyone would be under suspicion by default, but maybe it would be worth it.

Mateusz Bagiński (2mo): What exactly do you have in mind?
Semi-regular check-ins with every member to see what they're up to, what their thinking processes are, what recently piqued their interest, what rabbit holes they've gone into?

Joseph Miller's Shortform
Joseph Miller (2mo):
The next international PauseAI protest is taking place in one week in London, New York, Stockholm (Sunday 9th Feb), Paris (Mon 10th Feb) and many other cities around the world. We are calling for AI Safety to be the focus of the upcoming Paris AI Action Summit. If you're on the fence, take a look at Why I'm doing PauseAI.

TsviBT's Shortform
Joseph Miller (2mo):
For those in Europe, Tomorrow Biostasis makes the process a lot easier, and they have people who will talk you through it step by step.

Reality has a surprising amount of detail
Joseph Miller (2mo):
A good example of surprising detail I just read: it turns out that the UI for a simple handheld calculator is a large design space with no easy solutions. https://lcamtuf.substack.com/p/ui-is-hell-four-function-calculators

Thane Ruthenis's Shortform
Joseph Miller (2mo):
> Following OpenAI Twitter freakouts is a colossal, utterly pointless waste of your time and you shouldn't do it ever.
I feel like, for the same reasons, this shortform is kind of an engaging waste of my time. One reason I read LessWrong is to avoid Twitter garbage.

Thane Ruthenis (2mo): Valid; I was split on whether it was worth posting or whether it'd just be me taking part in spreading this nonsense.
But it seemed to me that a lot of people, including LW regulars, might've been fooled, so I erred on the side of posting.

Leon Lang's Shortform
Joseph Miller (2mo):
> we thought that forecasting AI trends was important to be able to have us taken seriously
This might be the most dramatic example ever of forecasting affecting the outcome. Similarly, I'm concerned that a lot of alignment people are putting work into evals and benchmarks which may be having some accelerating effect on the AI capabilities they are trying to understand. "That which is measured improves. That which is measured and reported improves exponentially."

I'm offering free math consultations!
Joseph Miller (3mo):
Just did a debugging session IRL with Gurkenglas and it was very helpful!

Turning up the Heat on Deceptively-Misaligned AI
Joseph Miller (3mo):
> correctness and beta-coherence can be rolled up into one specific property
Is that rolling up two things into one, or is that just beta-coherence?

Activation space interpretability may be doomed
Joseph Miller (3mo):
I agree that the ultimate goal is to understand the weights. It seems pretty unclear whether trying to understand the activations is a useful stepping stone towards that, and it's hard to be sure how relevant theoretical toy examples are to that question.

Joseph Miller's Shortform
Joseph Miller (3mo):
> Ilya Sutskever had two armed bodyguards with him at NeurIPS.
Some people are asking for a source on this. I'm pretty sure I've heard it from multiple people who were there in person, but I can't find a written source. Can anyone confirm or deny?

Joseph Miller's Shortform
Joseph Miller (3mo):
> Well, it seems quite important whether the DROS registration could possibly have been staged.
That would be difficult. To purchase a gun in California you have to provide photo ID[1], proof of address[2] and a thumbprint[3].
Also, it looks like the payment must be trackable[4], and gun stores have to maintain video surveillance footage for up to a year.[5] My guess is that the police haven't actually investigated this as a potential homicide, but if they did, there should be very strong evidence that Balaji bought a gun. Potentially a very sophisticated ac... (read more)

Nina Panickssery's Shortform
Joseph Miller (3mo):
> land in space will be less valuable than land on earth until humans settle outside of earth (which I don't believe will happen in the next few decades).
Why would it take so long? Is this assuming no ASI?

Review: Planecrash
Joseph Miller (3mo):
Wow, that's great, thanks. @L Rudolf L you should link this in this post.

L Rudolf L (3mo): Thanks for the heads-up, that looks very convenient. I've updated the post to link to this instead of the scraper repo on GitHub.

Joseph Miller's Shortform
Joseph Miller (3mo):
> As in, this is also what the police say?
Yes, edited to clarify. The police say there was no evidence of foul play. All parties agree he died in his bathroom of a gunshot wound.
> Did the police find a gun in the apartment? Was it a gun Suchir had previously purchased himself according to records? Seems like relevant info.
The only source I can find on this is Webb, so take it with a grain of salt. But yes, they found a gun in the apartment. According to Webb, the DROS registration information was on top of the gun case[1] in the apartment, so presumably ther... (read more)

Daniel Kokotajlo (3mo): Well, it seems quite important whether the DROS registration could possibly have been staged. If e.g. there is footage of Suchir buying a gun 6+ months prior, using his ID, etc., then the assassins would have had to sneak in and grab his own gun from him etc., which seems unlikely. Is the interview with the NYT going to be published?
Is any of the police behavior actually out of the ordinary?

Joseph Miller's Shortform
Joseph Miller (3mo):
This is an attempt to compile all publicly available primary evidence relating to the recent death of Suchir Balaji, an OpenAI whistleblower. This is a tragic loss and I feel very sorry for the parents. The rest of this piece will be unemotive, as it is important to establish the nature of this death as objectively as possible. I was prompted to look at this by a surprising conversation I had IRL suggesting credible evidence that it was not suicide. The undisputed facts of the case are that he died of a gunshot wound in his bathroom sometime around November 2... (read more)

Joseph Miller (3mo):
> Ilya Sutskever had two armed bodyguards with him at NeurIPS.
Some people are asking for a source on this. I'm pretty sure I've heard it from multiple people who were there in person, but I can't find a written source. Can anyone confirm or deny?

Sheikh Abdur Raheem Ali (3mo): I don't understand how Ilya hiring personal security counts as evidence, especially at large events like a conference. Famous people often attract unwelcome attention, and having professional protection close by can help deescalate or deter random acts of violence; it is a worthwhile investment in safety if you can afford it. I see it as a very normal thing to do. Ilya would be vulnerable to potential assassination attempts even during his tenure at OpenAI.

RationalElf (3mo): Thank you, this is very interesting, and it seems like you did a valuable public service in compiling it. What do you think of the motive that he was counterfactually going to testify in a very damaging way, or that he had very damaging evidence/data that was deleted?

Daniel Kokotajlo (3mo):
> The undisputed facts of the case are that he died of a gunshot wound in his bathroom sometime around November 26 2024.
> The police ruled it as a suicide with no evidence of foul play.
As in, this is also what the police say? Did the police find a gun in the apartment? Was it a gun Suchir had previously purchased himself, according to records? Seems like relevant info.

Review: Planecrash
Joseph Miller (3mo):
Has someone made an ebook that I can easily download onto my Kindle? I'm unclear whether a good ebook should include all the pictures from the original version.

dirk (3mo): There's https://www.mikescher.com/blog/29/Project_Lawful_ebook (which includes versions both with and without the pictures, so take your pick; the pictures are used in-story sometimes, but rarely enough that you can IMO skip them without much issue if you'd rather).

Joseph Miller's Shortform
Joseph Miller (3mo):
LLMs can pick up a much broader class of typos than spelling mistakes. For example, in this comment I wrote "Don't push the frontier of regulations" when from context I clearly meant to say "Don't push the frontier of capabilities". I think an LLM could have caught that.

Joseph Miller's Shortform
Joseph Miller (3mo):
LessWrong LLM feature idea: typo checker. It's becoming a habit for me to run anything I write through an LLM to check for mistakes before I send it off. I think the hardest part of implementing this feature well would be getting it to comment only on things that are definitely mistakes/typos. I don't want a general LLM writing-feedback tool built into LessWrong.

Kaj_Sotala (3mo): Don't most browsers come with spellcheck built in?
At least Chrome automatically flags my typos.

evhub's Shortform
Joseph Miller (3mo):
The ideal version of Anthropic would:
- Make substantial progress on technical AI safety
- Use its voice to make people take AI risk more seriously
- Support AI safety regulation
- Not substantially accelerate the AI arms race
In practice I think Anthropic has:
- Made a little progress on technical AI safety
- Used its voice to make people take AI risk less seriously[1]
- Obstructed AI safety regulation
- Substantially accelerated the AI arms race
What I would do differently:
- Do better alignment research; idk, this is hard.
- Communicate in a manner that is consistent with the apparent be... (read more)

MichaelDickens (3mo):
> Don't push the frontier of regulations. Obviously this is basically saying that Anthropic should stop making money and therefore stop existing. The more nuanced version is that for Anthropic to justify its existence, each time it pushes the frontier of capabilities should be earned by substantial progress on the other three points.
I think I have a stronger position on this than you do. I don't think Anthropic should push the frontier of capabilities, even given the tradeoff it faces. If their argument is "we know arms races are bad, but we have to accele... (read more)

BrianTan (3mo): My typo reaction may have glitched, but I think you meant "Don't push the frontier of capabilities" in the last bullet?

davekasten's Shortform
Joseph Miller (3mo):
The ARENA curriculum is very good.

Probability of death by suicide by a 26 year old
Joseph Miller (4mo):
It does seem pretty suspicious. I'm like 98% confident this was not foul play, partly because I doubt whatever evidence he had would be that important to the court case, and obviously his death is going to draw far more attention to his views. However, 98% is still quite worrying, and I wish I could be >99% confident. I will be interested to see if there is further evidence.
Given OpenAI's very shady behavior with the secret non-disparagement agreements that came out a few months ago, it doesn't seem completely impossible they might do this (but still very very ... (read more)

tylerjohnston (4mo): I agree with your odds, or perhaps mine are a bit higher (99.5%?). But if there were foul play, I'd sooner point the finger at the national security establishment than at OpenAI. As far as I know, intelligence agencies committing murder is much more common than companies doing so. And OpenAI's progress is seen as critically important to both.
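Appendix: the chain-rule identity in the "How To Do Patching Fast" comment above can be sanity-checked numerically. The sketch below uses a toy scalar metric F and random vectors for the clean and corrupted activations; all names here are illustrative and not from the original thread, and the specific F is an arbitrary choice.

```python
import numpy as np

# Toy differentiable metric F over an "activation" vector e.
def F(e):
    return np.sum(np.sin(e) + e ** 2)

rng = np.random.default_rng(0)
e_clean = rng.normal(size=5)  # stand-in for clean activations
e_corr = rng.normal(size=5)   # stand-in for corrupted activations

# Interpolated activation: e_alpha = e_clean + alpha * (e_corr - e_clean).
def e_alpha(alpha):
    return e_clean + alpha * (e_corr - e_clean)

# Analytic gradient of F at e_clean: dF/de = cos(e) + 2e (for this toy F).
grad_F_clean = np.cos(e_clean) + 2 * e_clean

# Chain rule at alpha = 0: dF/dalpha = (e_corr - e_clean) . grad F(e_clean).
analytic = np.dot(e_corr - e_clean, grad_F_clean)

# Finite-difference estimate of dF/dalpha at alpha = 0.
h = 1e-6
numeric = (F(e_alpha(h)) - F(e_alpha(0.0))) / h

# The two agree to within finite-difference error.
assert abs(analytic - numeric) < 1e-4
```

In practice the same check is usually done with autograd on the real model rather than a hand-derived gradient; the point is only that the derivative at alpha = 0 reduces to a dot product with the patch direction.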