Reading list
Links: 25
Score: 0.9199937989846819
User feedback: None
Out links: 4819631
Raw text: https://dblalock.substack.com/i/53277599/learning-to-merge-tokens-in-vision-transformers
Title: 2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens
Description: Mixture-of-Experts with Expert Choice Routing
Text content: 2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens Davis Sum...
Score: 0.9143604751418759
User feedback: None
Out links: 4581732
Title: 2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot instances
Description: An Extendable, Efficient and Effective Transformer-based Object Detector
Text content: 2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot ins...
Score: 0.904689282392706
User feedback: None
Out links: 2123855
Raw text: http://www.cs.toronto.edu/~hinton/absps/OnlineDistillation.pdf
Published as a conference paper at ICLR 2018: "Large Scale Distributed Neural Network Training through Online Distillation". Rohan Anil (Google), Robert Ormandi (Google), Gabriel Pereyra (Google DeepMind), George E. Dahl (Google Brain)...
Score: 0.9035635948106884
User feedback: None
Out links: 4562470
Title: Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup
Description: This newsletter made possible by MosaicML.
Text content: Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup Davis Summarizes PapersSubscri...
Score: 0.8903061041652235
User feedback: None
Out links: 4692917
Raw text: https://dblalock.substack.com/i/53497927/on-the-representation-collapse-of-sparse-mixture-of-experts
Title: 2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better
Description: ⭐ Merging of neural networks
Text content: 2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better Davis Summarizes PapersSubscri...
Score: 0.8848739385679035
User feedback: None
Out links: 4562393
Title: 2022-10-16 arXiv roundup: Augmentation scaling, Better CNN initialization, Transformer sparsity
Description: This newsletter made possible by MosaicML. Also thanks to @andrey_kurenkov (author of The Gradient) for recommending this newsletter on Twitter!
Text content: 202...
Score: 0.8804583587688676
User feedback: None
Out links: 896606
Raw text: https://www.alignmentforum.org/posts/xMQ7vwFACQX3gZouv/reframing-inner-alignment
Title: Reframing inner alignment — AI Alignment Forum
Description: The standard frame (Evan Hubinger, 2021) is: …
Text content: Reframing inner alignment — AI Alignment Forum ...
Score: 0.8800227433525747
User feedback: None
Out links: 205126
Raw text: https://gwern.net/doc/www/arxiv.org/a986ec6fafa88a1a4f52523a902c22652e30d36a.pdf
OmniNet: Omnidirectional Representations from Transformers. Yi Tay*, Mostafa Dehghani*, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler. arXiv:2103.01075v1 [cs.CV], 1 Mar 2021. Abstract: This paper proposes Omnidirectional Representations from ...
Score: 0.8792908258465023
User feedback: None
Out links: 799868
Title: Brain Efficiency: Much More than You Wanted to Know — LessWrong
Description: What if the brain is highly efficient? To be more specific, there are several interconnected key measures of efficiency for physical learning machine…
Text content: Brain Efficiency: Much More t...
Score: 0.8785942109407829
User feedback: None
Out links: 205120
Raw text: https://gwern.net/doc/www/arxiv.org/eeba4103b71baddb951cdde4962993257f5d6f07.pdf
Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, and Christopher Ré (Department of Computer Science, Stanford University). arXiv:2111.00396v2 [cs.LG], 4 Mar 2022. Abstract: A central goal of sequence modeling ...
Score: 0.8744192655701442
User feedback: None
Out links: 975489
Raw text: https://alignmentforum.org/posts/yQSmcfN4kA7rATHGK/many-arguments-for-ai-x-risk-are-wrong
Title: Many arguments for AI x-risk are wrong — AI Alignment Forum
Description: The following is a lightly edited version of a memo I wrote for a retreat. It was inspired by a draft of Counting arguments provide no evidence for A…
Text content: Many arguments for AI x-risk are ...
Score: 0.8739845625117002
User feedback: None
Out links: 205037
Raw text: https://gwern.net/doc/www/arxiv.org/d1278072a7a1822674440ddd0c6c820abc5b2e19.pdf
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute. Tao Lei (ASAPP, Inc.). Abstract: Large language models have become increasingly difficult to train because of the growing computation time and cost. In this work, we present SRU++, a highly-ef...
Score: 0.873906160599627
User feedback: None
Out links: 4838785
Raw text: https://scottlocklin.wordpress.com/category/tools/page/3/
Title: tools | Locklin on science | Page 3
Description: Posts about tools written by Scott Locklin
Text content: tools | Locklin on science | Page 3 ... Why e...
Score: 0.8640588116809524
User feedback: None
Out links: 4575672
Raw text: https://dblalock.substack.com/i/80235726/ul-unifying-language-learning-paradigms
Title: 2022-10-23 arXiv roundup: CNNs dominating sequence modeling, SetFit, Google finetuning trifecta
Description: This newsletter made possible by MosaicML.
Text content: 2022-10-23 arXiv roundup: CNNs dominating sequence modeling, SetFit, Google finetuning trifecta ...
Score: 0.8635366494770073
User feedback: None
Out links: 4581680
Raw text: https://dblalock.substack.com/i/63471989/insights-into-pre-training-via-simpler-synthetic-tasks
Title: 2022-7-10 arXiv roundup: DeepSpeed inference, Simpler detection backbones, Spatial sparsification
Description: This post made possible by MosaicML. If you like it, consider forwarding it to a friend!
Text content: 2022-7-10 arXiv roundup: DeepSpeed inference, Simpler det...
Score: 0.8624372849187085
User feedback: None
Out links: 4629572
Title: 2022-5-15: T-Few, Task scaling, Gato - by Davis Blalock
Description: These summaries made possible by MosaicML. If you find them helpful, the best way to thank me is by checking out + starring Composer, our open-source library for faster model training.
Text content: 202...
Score: 0.8616537922290168
User feedback: None
Out links: 993625
Title: Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain — AI Alignment Forum
Description: I argue that an entire class of common arguments against short timelines is bogus, and provide weak evidence that anchoring to the human-brain-human-…
Score: 0.8612758353119582
User feedback: None
Out links: 4562468
Title: 2022-8-14 arXiv roundup: Branch-Train-Merge, Model patching, lots of LLM papers
Description: This newsletter made possible by MosaicML.
Text content: 2022-8-14 arXiv roundup: Branch-Train-Merge, Model patching, lots of LLM papers Davis Summarizes...
Score: 0.8596563113643442
User feedback: None
Out links: 4581724
Raw text: https://dblalock.substack.com/i/74944665/mega-moving-average-equipped-gated-attention
Title: 2022-9-25 arXiv roundup: Metadata archaeology, Decaying pruning, 200x faster RL
Description: This newsletter made possible by MosaicML. Btw, we’re looking for early customers who spend a lot on training neural nets for computer vision or NLP in PyTorch on cloud GPUs.
Score: 0.8584212557932718
User feedback: None
Out links: 205013
Raw text: https://gwern.net/doc/www/arxiv.org/c639528ca3cdba458c1e52f61e42863dce9599d7.pdf
Adaptive Multi-Resolution Attention with Linear Complexity. Yao Zhang*, Yunpu Ma*, Thomas Seidl, Volker Tresp (Institute of Informatics, LMU Munich; Corporate Technology, Siemens AG). arXiv:2108.04962v1 [cs.LG], 10 Aug 2021...
Score: 0.855323210951469
User feedback: None
Out links: 4692951
Title: 2022-6-5 arXiv roundup: SAM for free, FlashAttention, Supervised MAE
Description: This newsletter made possible by MosaicML. Relatedly, do you have any friends who are {ML, cloud, platform} engineers and who might be open to a new job? If so, it would be great if you could send them our caree...
Score: 0.8539219668745571
User feedback: None
Out links: 5323770
Raw text: https://a16zcrypto.com/posts/podcast/trends-2025-stablecoins-app-stores-infrastructure-ux-more/
Title: Talking trends 2025 (part 1): Stablecoins, app stores, UX, and more - a16z crypto
Description: a16z crypto is a venture capital fund that has been investing in crypto and web3 startups — across all stages — since 2013.
Text content: Talking trends 2025 (part 1): Stableco...
Score: 0.8537339802575455
User feedback: None
Out links: 4581616
Title: 2022-6-12: 7x Faster ResNet-50, BIG-Bench, Neural corpus indexer, DeepSpeed & fp8 quantization
Description: Blazingly Fast Computer Vision Training with the Mosaic ResNet and Composer
Text content: 2022-6-12: 7x Faster ResNet-50, BIG-Bench, Neural corpus indexer, DeepSpe...
Score: 0.8528051177613354
User feedback: None
Out links: 199630
Raw text: https://www.lesswrong.com/posts/65qmEJHDw3vw69tKm/proposal-scaling-laws-for-rl-generalization
Title: Proposal: Scaling laws for RL generalization — LessWrong
Description: In this post, we (Alexander Meulemans, David Lindner, Florian Dorner) propose to study scaling laws relating the generalization capability of Reinfor…
Text content: Proposal: Scaling laws for RL genera...
Score: 0.8505010518497781
User feedback: None
Out links: 4562366
Title: 2022-10-2 arXiv roundup: GPT-3 for $500k + Do we even need pretrained models?
Description: Mosaic LLMs (Part 2): GPT-3 quality for <$500k
Text content: 2022-10-2 arXiv roundup: GPT-3 for $500k + Do we even need pretrained models? Davis Summarizes...
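How the descending "Score" values above are computed is not stated anywhere in this export. A minimal sketch of one plausible scheme, assuming the scores are cosine similarities between an embedding of the reader's interests and an embedding of each link's text; the function names, vector size, and example.com URLs below are hypothetical, not taken from this list:

```python
# Hypothetical sketch, not the actual scoring code behind this export.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_links(query_vec, link_vecs):
    # Score every link against the query embedding and sort best-first,
    # mirroring the descending "Score:" ordering of the list above.
    scored = [(url, cosine_similarity(query_vec, vec))
              for url, vec in link_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy usage: random vectors stand in for real text embeddings.
rng = np.random.default_rng(0)
links = {f"https://example.com/{i}": rng.normal(size=64) for i in range(5)}
for url, score in rank_links(rng.normal(size=64), links):
    print(f"Score: {score:.4f}  {url}")
```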