
Reading list

Links (25)

Title: [Link] Training Compute-Optimal Large Language Models — LessWrong

Score: 0.951138058188195

User feedback: None

Out links: 22661 Raw text: 22661

https://www.lesswrong.com/posts/4dbK5dPiqHCgNdKnq/link-training-compute-optimal-large-language-models

Title: [Link] Training Compute-Optimal Large Language Models — LessWrong Description: New LM scaling paper from DeepMind (abs, pdf). • Abstract (my emphasis): … Keywords: No keywords Text content: [Link] Training Compute-Optimal Large Language Models — LessWrong ...

Title: Birdie: Advancing State Space Models with Reward-Driven Objectives and Curricula

Score: 0.9320205281364848

User feedback: None

Out links: 14601 Raw text: 14601

https://arxiv.org/html/2411.01030v3

Title: Birdie: Advancing State Space Models with Reward-Driven Objectives and Curricula Description: No description Keywords: No keywords Text content: Birdie: Advancing State Space Models with Reward-Driven Objectives and Curricula 1 Introduction 2 Background and Related Work 2.1 S...

Title: Christiano, Cotra, and Yudkowsky on AI progress — AI Alignment Forum

Score: 0.9310096943496186

User feedback: None

Out links: 722175 Raw text: 722175

https://alignmentforum.org/posts/7MCqRnZzvszsxgtJi/christiano-cotra-and-yudkowsky-on-ai-progress

Title: Christiano, Cotra, and Yudkowsky on AI progress — AI Alignment Forum Description: This post is a transcript of a discussion between Paul Christiano, Ajeya Cotra, and Eliezer Yudkowsky on AGI forecasting, following up on Paul and El… Keywords: No keywords Text content: Christiano, Cotra, and Y...

Title: chinchilla's wild implications — LessWrong

Score: 0.9306693656339415

User feedback: None

Out links: 130192 Raw text: 130192

http://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications

Title: chinchilla's wild implications — LessWrong Description: (Colab notebook here.) • This post is about language model scaling laws, specifically the laws derived in the DeepMind paper that introduced Chinchil… Keywords: No keywords Text content: chinchilla's wild implications — LessWrong This ...

Title: AXRP Episode 31 - Singular Learning Theory with Daniel Murfet — AI Alignment Forum

Score: 0.9202684099963365

User feedback: None

Out links: 76158 Raw text: 76158

https://www.alignmentforum.org/posts/q6Tky4RzEmTwfGndB/axrp-episode-31-singular-learning-theory-with-daniel-murfet

Title: AXRP Episode 31 - Singular Learning Theory with Daniel Murfet — AI Alignment Forum Description: YouTube link • What’s going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is… Keywords: No keywords Text content: AXRP Episod...

Title: 2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens

Score: 0.9199937989846819

User feedback: None

Out links: 4819631 Raw text: 4819631

https://dblalock.substack.com/i/53277599/learning-to-merge-tokens-in-vision-transformers

Title: 2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens Description: Mixture-of-Experts with Expert Choice Routing Keywords: No keywords Text content: 2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens Davis Sum...

Title: New Scaling Laws for Large Language Models — LessWrong

Score: 0.9185957983729204

User feedback: None

Out links: 22810 Raw text: 22810

https://www.lesswrong.com/posts/midXmMb2Xg37F2Kgn/new-scaling-laws-for-large-language-models

Title: New Scaling Laws for Large Language Models — LessWrong Description: On March 29th, DeepMind published a paper, "Training Compute-Optimal Large Language Models", that shows that essentially everyone -- OpenAI, DeepMind… Keywords: No keywords Text content: New Scaling Laws for Large Language Mo...

Title: Learning with not Enough Data Part 3: Data Generation | Lil'Log

Score: 0.9173636169462535

User feedback: None

Out links: 56544 Raw text: 56544

https://lilianweng.github.io/posts/2022-04-15-data-gen/

Title: Learning with not Enough Data Part 3: Data Generation | Lil'Log Description: Here comes the Part 3 on learning with not enough data (Previous: Part 1 and Part 2). Let’s consider two approaches for generating synthetic data for training. Augmented data. Given a set of existing training samples...

Title: 2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot instances

Score: 0.9143604751418759

User feedback: None

Out links: 4581732 Raw text: 4581732

https://dblalock.substack.com/i/53504615/vicreg-variance-invariance-covariance-regularization-for-self-supervised-learning

Title: 2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot instances Description: An Extendable, Efficient and Effective Transformer-based Object Detector Keywords: No keywords Text content: 2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot ins...

Title: AXRP Episode 10 - AI’s Future and Impacts with Katja Grace — AI Alignment Forum

Score: 0.9121156936373886

User feedback: None

Out links: 600729 Raw text: 600729

https://www.alignmentforum.org/s/2owK4Wra9acMaDyBo/p/xbABZRxoSTAnsf8os

Title: AXRP Episode 10 - AI’s Future and Impacts with Katja Grace — AI Alignment Forum Description: YouTube link • This podcast is called AXRP, pronounced axe-urp and short for the AI X-risk Research Podcast. Here, I (Daniel Filan) have conversation… Keywords: No keywords Text content: AXRP Episode ...

Title: Implied "utilities" of simulators are broad, dense, and shallow — LessWrong

Score: 0.9092234889332874

User feedback: None

Out links: 146014 Raw text: 146014

https://www.lesswrong.com/posts/k48vB92mjE9Z28C3s/implied-utilities-of-simulators-are-broad-dense-and-shallow

Title: Implied "utilities" of simulators are broad, dense, and shallow — LessWrong Description: This is a quick attempt at deconfusion similar to instrumentality. Same ideas, different angle. … Keywords: No keywords Text content: Implied "utilities" of simulators are broad, dense, and shallow — Less...

Title: The Platonic Representation Hypothesis

Score: 0.9068351110070376

User feedback: None

Out links: 4844 Raw text: 4844

https://arxiv.org/html/2405.07987v5

Title: The Platonic Representation Hypothesis Description: No description Keywords: Machine Learning, Representation, Artificial Intelligence, Multimodality Text content: The Platonic Representation Hypothesis 1 Introduction 2 Representations are converging Preliminaries 2.1 Dif...

Title: None

Score: 0.904689282392706

User feedback: None

Out links: 2123855 Raw text: 2123855

http://www.cs.toronto.edu/~hinton/absps/OnlineDistillation.pdf

Published as a conference paper at ICLR 2018: "Large Scale Distributed Neural Network Training through Online Distillation". Rohan Anil (Google), Robert Ormandi (Google), Gabriel Pereyra (Google DeepMind), George E. Dahl (Google Brain)...

Title: Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup

Score: 0.9035635948106884

User feedback: None

Out links: 4562470 Raw text: 4562470

https://dblalock.substack.com/i/129327559/tune-as-you-scale-hyperparameter-optimization-for-compute-efficient-training

Title: Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup Description: This newsletter made possible by MosaicML. Keywords: No keywords Text content: Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup Davis Summarizes Papers ...

Title: Richard Ngo's Shortform — AI Alignment Forum

Score: 0.9028782127731001

User feedback: None

Out links: 345436 Raw text: 345436

https://alignmentforum.org/posts/FuGfR3jL3sw6r8kB4/richard-ngo-s-shortform

Title: Richard Ngo's Shortform — AI Alignment Forum Description: A collection of shorter posts by AI Alignment Forum user Richard_Ngo Keywords: No keywords Text content: Richard Ngo's Shortform — AI Alignment Forum ...

Title: EAI Alignment Speaker Series #1: Challenges for Safe & Beneficial Brain-Like Artificial General Intelligence with Steve Byrnes — AI Alignment Forum

Score: 0.9002267267526528

User feedback: None

Out links: 526958 Raw text: 526958

https://www.alignmentforum.org/posts/ajhtyKxtmmErTwH5t/eai-alignment-speaker-series-1-challenges-for-safe-and

Title: EAI Alignment Speaker Series #1: Challenges for Safe & Beneficial Brain-Like Artificial General Intelligence with Steve Byrnes — AI Alignment Forum Description: A couple months ago EleutherAI started an alignment speaker series, some of these talks have been recorded. This is the first instal...

Title: None

Score: 0.8990216042421004

User feedback: None

Out links: 200006 Raw text: 200006

https://gwern.net/doc/www/arxiv.org/79528489ccea8d598189f1f980c963c6c2ee576a.pdf

Published as a conference paper at ICLR 2021: "Large Batch Simulation for Deep Reinforcement Learning" (arXiv:2103.07013v1 [cs.LG], 12 Mar 2021). Brennan Shacklett (1), Erik Wijmans (2), Aleksei Petrenko (3,4), Manolis Savva (5), Dhruv Batra (2), Vladlen Koltun (3), Kayvon Fatahalian (1). 1 Stanford University, 2 Georgia Inst...

Title: EIS IX: Interpretability and Adversaries — AI Alignment Forum

Score: 0.897539792522355

User feedback: None

Out links: 605710 Raw text: 605710

https://www.alignmentforum.org/s/a6ne2ve5uturEEQK7/p/kYNMXjg8Tmcq3vjM6

Title: EIS IX: Interpretability and Adversaries — AI Alignment Forum Description: Part 9 of 12 in the Engineer’s Interpretability Sequence. • Thanks to Nikolaos Tsilivis for helpful discussions.  … Keywords: No keywords Text content: EIS IX: Interpretability and Adversaries — AI Alignment Forum Th...

Title: AXRP Episode 22 - Shard Theory with Quintin Pope — AI Alignment Forum

Score: 0.8958312660193717

User feedback: None

Out links: 600755 Raw text: 600755

https://www.alignmentforum.org/s/2owK4Wra9acMaDyBo/p/4rmvMThJYNcCptAya

Title: AXRP Episode 22 - Shard Theory with Quintin Pope — AI Alignment Forum Description: YouTube link • What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will su… Keywords: No keywords Text content: AXRP Episode 22 - Shard...

Title: AXRP Episode 22 - Shard Theory with Quintin Pope — AI Alignment Forum

Score: 0.8957749251145293

User feedback: None

Out links: 583297 Raw text: 583297

https://www.alignmentforum.org/posts/4rmvMThJYNcCptAya/axrp-episode-22-shard-theory-with-quintin-pope

Title: AXRP Episode 22 - Shard Theory with Quintin Pope — AI Alignment Forum Description: YouTube link • What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will su… Keywords: No keywords Text content: AXRP Episode 22 - Shard...

Title: Instrumentality makes agents agenty — LessWrong

Score: 0.8932405632053488

User feedback: None

Out links: 146033 Raw text: 146033

https://www.lesswrong.com/posts/EBKJq2gkhvdMg5nTQ/instrumentality-makes-agents-agenty

Title: Instrumentality makes agents agenty — LessWrong Description: You could describe the behavior of untuned GPT-like model[1] using a (peculiar) utility function. The fact that the loss function and training didn't… Keywords: No keywords Text content: Instrumentality makes agents agenty — LessWro...

Title: Steven Byrnes - LessWrong

Score: 0.8917688357306803

User feedback: None

Out links: 182046 Raw text: 182046

https://www.lesswrong.com/users/steve2152

Title: Steven Byrnes - LessWrong Description: Steven Byrnes's profile on LessWrong — A community blog devoted to refining the art of rationality Keywords: No keywords Text content: Steven Byrnes - LessWrong ...

Title: 2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better

Score: 0.8903061041652235

User feedback: None

Out links: 4692917 Raw text: 4692917

https://dblalock.substack.com/i/53497927/on-the-representation-collapse-of-sparse-mixture-of-experts

Title: 2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better Description: ⭐ Merging of neural networks Keywords: No keywords Text content: 2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better Davis Summarizes Papers ...

Title: None

Score: 0.889493896864104

User feedback: None

Out links: 199485 Raw text: 199485

https://gwern.net/doc/www/arxiv.org/515af527150fbaa01ed9fe175750286a6a23f108.pdf

Published as a conference paper at ICLR 2023: "Broken Neural Scaling Laws" (arXiv:2210.14891v11 [cs.LG], 29 Mar 2023). Ethan Caballero (Mila, McGill University), Kshitij Gupta (Mila, University of Montreal), Irina Rish (Mila, University of Montr...

Title: 2022-10-16 arXiv roundup: Augmentation scaling, Better CNN initialization, Transformer sparsity

Score: 0.8848739385679035

User feedback: None

Out links: 4562393 Raw text: 4562393

https://dblalock.substack.com/i/78823108/how-much-data-are-augmentations-worth-an-investigation-into-scaling-laws-invariance-and-implicit-regularization

Title: 2022-10-16 arXiv roundup: Augmentation scaling, Better CNN initialization, Transformer sparsity Description: This newsletter made possible by MosaicML. Also thanks to @andrey_kurenkov (author of The Gradient) for recommending this newsletter on Twitter! Keywords: No keywords Text content: 202...