Reading list
Links 25
Score: 0.951138058188195
User feedback: None
Out links: 22661 Raw text: 22661
Title: [Link] Training Compute-Optimal Large Language Models — LessWrong Description: New LM scaling paper from DeepMind (abs, pdf). • Abstract (my emphasis): … Keywords: No keywords Text content: [Link] Training Compute-Optimal Large Language Models — LessWrong ...
Score: 0.9320205281364848
User feedback: None
Out links: 14601 Raw text: 14601https://arxiv.org/html/2411.01030v3
Title: Birdie: Advancing State Space Models with Reward-Driven Objectives and Curricula Description: No description Keywords: No keywords Text content: Birdie: Advancing State Space Models with Reward-Driven Objectives and Curricula 1 Introduction 2 Background and Related Work 2.1 S...
Score: 0.9310096943496186
User feedback: None
Out links: 722175 Raw text: 722175https://alignmentforum.org/posts/7MCqRnZzvszsxgtJi/christiano-cotra-and-yudkowsky-on-ai-progress
Title: Christiano, Cotra, and Yudkowsky on AI progress — AI Alignment Forum Description: This post is a transcript of a discussion between Paul Christiano, Ajeya Cotra, and Eliezer Yudkowsky on AGI forecasting, following up on Paul and El… Keywords: No keywords Text content: Christiano, Cotra, and Y...
Score: 0.9306693656339415
User feedback: None
Out links: 130192 Raw text: 130192http://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications
Title: chinchilla's wild implications — LessWrong Description: (Colab notebook here.) • This post is about language model scaling laws, specifically the laws derived in the DeepMind paper that introduced Chinchil… Keywords: No keywords Text content: chinchilla's wild implications — LessWrong This ...
Score: 0.9202684099963365
User feedback: None
Out links: 76158 Raw text: 76158
Title: AXRP Episode 31 - Singular Learning Theory with Daniel Murfet — AI Alignment Forum Description: YouTube link • What’s going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is… Keywords: No keywords Text content: AXRP Episod...
Score: 0.9199937989846819
User feedback: None
Out links: 4819631 Raw text: 4819631https://dblalock.substack.com/i/53277599/learning-to-merge-tokens-in-vision-transformers
Title: 2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens Description: Mixture-of-Experts with Expert Choice Routing Keywords: No keywords Text content: 2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens Davis Sum...
Score: 0.9185957983729204
User feedback: None
Out links: 22810 Raw text: 22810https://www.lesswrong.com/posts/midXmMb2Xg37F2Kgn/new-scaling-laws-for-large-language-models
Title: New Scaling Laws for Large Language Models — LessWrong Description: On March 29th, DeepMind published a paper, "Training Compute-Optimal Large Language Models", that shows that essentially everyone -- OpenAI, DeepMind… Keywords: No keywords Text content: New Scaling Laws for Large Language Mo...
Score: 0.9173636169462535
User feedback: None
Out links: 56544 Raw text: 56544https://lilianweng.github.io/posts/2022-04-15-data-gen/
Title: Learning with not Enough Data Part 3: Data Generation | Lil'Log Description: Here comes the Part 3 on learning with not enough data (Previous: Part 1 and Part 2). Let’s consider two approaches for generating synthetic data for training. Augmented data. Given a set of existing training samples...
Score: 0.9143604751418759
User feedback: None
Out links: 4581732 Raw text: 4581732
Title: 2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot instances Description: An Extendable, Efficient and Effective Transformer-based Object Detector Keywords: No keywords Text content: 2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot ins...
Score: 0.9121156936373886
User feedback: None
Out links: 600729 Raw text: 600729https://www.alignmentforum.org/s/2owK4Wra9acMaDyBo/p/xbABZRxoSTAnsf8os
Title: AXRP Episode 10 - AI’s Future and Impacts with Katja Grace — AI Alignment Forum Description: YouTube link • This podcast is called AXRP, pronounced axe-urp and short for the AI X-risk Research Podcast. Here, I (Daniel Filan) have conversation… Keywords: No keywords Text content: AXRP Episode ...
Score: 0.9092234889332874
User feedback: None
Out links: 146014 Raw text: 146014
Title: Implied "utilities" of simulators are broad, dense, and shallow — LessWrong Description: This is a quick attempt at deconfusion similar to instrumentality. Same ideas, different angle. … Keywords: No keywords Text content: Implied "utilities" of simulators are broad, dense, and shallow — Less...
Score: 0.9068351110070376
User feedback: None
Out links: 4844 Raw text: 4844https://arxiv.org/html/2405.07987v5
Title: The Platonic Representation Hypothesis Description: No description Keywords: Machine Learning, Representation, Artificial Intelligence, Multimodality Text content: The Platonic Representation Hypothesis 1 Introduction 2 Representations are converging Preliminaries 2.1 Dif...
Score: 0.904689282392706
User feedback: None
Out links: 2123855 Raw text: 2123855http://www.cs.toronto.edu/~hinton/absps/OnlineDistillation.pdf
Published as a conference paper at ICLR 2018. Large Scale Distributed Neural Network Training through Online Distillation. Rohan Anil, Google; Robert Ormandi, Google; Gabriel Pereyra∗, Google DeepMind; George E. Dahl, Google Brain ...
Score: 0.9035635948106884
User feedback: None
Out links: 4562470 Raw text: 4562470
Title: Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup Description: This newsletter made possible by MosaicML. Keywords: No keywords Text content: Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup Davis Summarizes Papers ...
Score: 0.9028782127731001
User feedback: None
Out links: 345436 Raw text: 345436https://alignmentforum.org/posts/FuGfR3jL3sw6r8kB4/richard-ngo-s-shortform
Title: Richard Ngo's Shortform — AI Alignment Forum Description: A collection of shorter posts by AI Alignment Forum user Richard_Ngo Keywords: No keywords Text content: Richard Ngo's Shortform — AI Alignment Forum ...
Score: 0.9002267267526528
User feedback: None
Out links: 526958 Raw text: 526958
Title: EAI Alignment Speaker Series #1: Challenges for Safe & Beneficial Brain-Like Artificial General Intelligence with Steve Byrnes — AI Alignment Forum Description: A couple months ago EleutherAI started an alignment speaker series, some of these talks have been recorded. This is the first instal...
Score: 0.8990216042421004
User feedback: None
Out links: 200006 Raw text: 200006https://gwern.net/doc/www/arxiv.org/79528489ccea8d598189f1f980c963c6c2ee576a.pdf
Published as a conference paper at ICLR 2021. Large Batch Simulation for Deep Reinforcement Learning. arXiv:2103.07013v1 [cs.LG] 12 Mar 2021. Brennan Shacklett1∗ Erik Wijmans2 Aleksei Petrenko3,4 Manolis Savva5 Dhruv Batra2 Vladlen Koltun3 Kayvon Fatahalian1 1 Stanford University 2 Georgia Inst...
Score: 0.897539792522355
User feedback: None
Out links: 605710 Raw text: 605710https://www.alignmentforum.org/s/a6ne2ve5uturEEQK7/p/kYNMXjg8Tmcq3vjM6
Title: EIS IX: Interpretability and Adversaries — AI Alignment Forum Description: Part 9 of 12 in the Engineer’s Interpretability Sequence. • Thanks to Nikolaos Tsilivis for helpful discussions. … Keywords: No keywords Text content: EIS IX: Interpretability and Adversaries — AI Alignment Forum Th...
Score: 0.8958312660193717
User feedback: None
Out links: 600755 Raw text: 600755https://www.alignmentforum.org/s/2owK4Wra9acMaDyBo/p/4rmvMThJYNcCptAya
Title: AXRP Episode 22 - Shard Theory with Quintin Pope — AI Alignment Forum Description: YouTube link • What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will su… Keywords: No keywords Text content: AXRP Episode 22 - Shard...
Score: 0.8957749251145293
User feedback: None
Out links: 583297 Raw text: 583297
Title: AXRP Episode 22 - Shard Theory with Quintin Pope — AI Alignment Forum Description: YouTube link • What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will su… Keywords: No keywords Text content: AXRP Episode 22 - Shard...
Score: 0.8932405632053488
User feedback: None
Out links: 146033 Raw text: 146033https://www.lesswrong.com/posts/EBKJq2gkhvdMg5nTQ/instrumentality-makes-agents-agenty
Title: Instrumentality makes agents agenty — LessWrong Description: You could describe the behavior of untuned GPT-like model[1] using a (peculiar) utility function. The fact that the loss function and training didn't… Keywords: No keywords Text content: Instrumentality makes agents agenty — LessWro...
Score: 0.8917688357306803
User feedback: None
Out links: 182046 Raw text: 182046https://www.lesswrong.com/users/steve2152
Title: Steven Byrnes - LessWrong Description: Steven Byrnes's profile on LessWrong — A community blog devoted to refining the art of rationality Keywords: No keywords Text content: Steven Byrnes - LessWrong ...
Score: 0.8903061041652235
User feedback: None
Out links: 4692917 Raw text: 4692917https://dblalock.substack.com/i/53497927/on-the-representation-collapse-of-sparse-mixture-of-experts
Title: 2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better Description: ⭐ Merging of neural networks Keywords: No keywords Text content: 2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better Davis Summarizes Papers ...
Score: 0.889493896864104
User feedback: None
Out links: 199485 Raw text: 199485https://gwern.net/doc/www/arxiv.org/515af527150fbaa01ed9fe175750286a6a23f108.pdf
Published as a conference paper at ICLR 2023. arXiv:2210.14891v11 [cs.LG] 29 Mar 2023. Broken Neural Scaling Laws. Ethan Caballero, Mila, McGill University; Kshitij Gupta, Mila, University of Montreal; Irina Rish, Mila, University of Montr...
Score: 0.8848739385679035
User feedback: None
Out links: 4562393 Raw text: 4562393
Title: 2022-10-16 arXiv roundup: Augmentation scaling, Better CNN initialization, Transformer sparsity Description: This newsletter made possible by MosaicML. Also thanks to @andrey_kurenkov (author of The Gradient) for recommending this newsletter on Twitter! Keywords: No keywords Text content: 202...