Links 25

Title: 2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens

Score: 0.9199937989846819

User feedback: None

Out links: 4819631 Raw text: 4819631

https://dblalock.substack.com/i/53277599/learning-to-merge-tokens-in-vision-transformers

Title: 2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens Description: Mixture-of-Experts with Expert Choice Routing Keywords: No keywords Text content: 2022-2-27: Flash, Expert Choice Routing, Effective MoE, Merging inputs and tokens Davis Sum...

Title: 2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot instances

Score: 0.9143604751418759

User feedback: None

Out links: 4581732 Raw text: 4581732

https://dblalock.substack.com/i/53504615/vicreg-variance-invariance-covariance-regularization-for-self-supervised-learning

Title: 2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot instances Description: An Extendable, Efficient and Effective Transformer-based Object Detector Keywords: No keywords Text content: 2022-5-1: PolyLoss, Subquadratic loss landscapes, Large-scale training on spot ins...

Title: None

Score: 0.904689282392706

User feedback: None

Out links: 2123855 Raw text: 2123855

http://www.cs.toronto.edu/~hinton/absps/OnlineDistillation.pdf

Published as a conference paper at ICLR 2018. Large Scale Distributed Neural Network Training through Online Distillation. Rohan Anil (Google, [email protected]), Robert Ormandi (Google, [email protected]), Gabriel Pereyra* (Google DeepMind, [email protected]), George E. Dahl (Google Brain, [email protected])...

Title: Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup

Score: 0.9035635948106884

User feedback: None

Out links: 4562470 Raw text: 4562470

https://dblalock.substack.com/i/129327559/tune-as-you-scale-hyperparameter-optimization-for-compute-efficient-training

Title: Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup Description: This newsletter made possible by MosaicML. Keywords: No keywords Text content: Have we hit a statistical wall in LLM scaling? - 2023-6-18 arXiv roundup Davis Summarizes PapersSubscri...

Title: 2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better

Score: 0.8903061041652235

User feedback: None

Out links: 4692917 Raw text: 4692917

https://dblalock.substack.com/i/53497927/on-the-representation-collapse-of-sparse-mixture-of-experts

Title: 2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better Description: ⭐ Merging of neural networks Keywords: No keywords Text content: 2022-4-24: Merging networks, Wall of MoE papers, Diverse models transfer better Davis Summarizes PapersSubscri...

Title: 2022-10-16 arXiv roundup: Augmentation scaling, Better CNN initialization, Transformer sparsity

Score: 0.8848739385679035

User feedback: None

Out links: 4562393 Raw text: 4562393

https://dblalock.substack.com/i/78823108/how-much-data-are-augmentations-worth-an-investigation-into-scaling-laws-invariance-and-implicit-regularization

Title: 2022-10-16 arXiv roundup: Augmentation scaling, Better CNN initialization, Transformer sparsity Description: This newsletter made possible by MosaicML. Also thanks to @andrey_kurenkov (author of The Gradient) for recommending this newsletter on Twitter! Keywords: No keywords Text content: 202...

Title: Reframing inner alignment — AI Alignment Forum

Score: 0.8804583587688676

User feedback: None

Out links: 896606 Raw text: 896606

https://www.alignmentforum.org/posts/xMQ7vwFACQX3gZouv/reframing-inner-alignment

Title: Reframing inner alignment — AI Alignment Forum Description: The standard frame (Evan Hubinger, 2021) is: … Keywords: No keywords Text content: Reframing inner alignment — AI Alignment Forum This website requires javascript to properly function. Consider activating javascript to get access t...

Title: OmniNet: Omnidirectional Representations from Transformers

Score: 0.8800227433525747

User feedback: None

Out links: 205126 Raw text: 205126

https://gwern.net/doc/www/arxiv.org/a986ec6fafa88a1a4f52523a902c22652e30d36a.pdf

OmniNet: Omnidirectional Representations from Transformers. Yi Tay*, Mostafa Dehghani*, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler. arXiv:2103.01075v1 [cs.CV] 1 Mar 2021. Abstract: This paper proposes Omnidirectional Representations from ...

Title: Brain Efficiency: Much More than You Wanted to Know — LessWrong

Score: 0.8792908258465023

User feedback: None

Out links: 799868 Raw text: 799868

https://www.alignmentforum.org/posts/xwBuoE9p8GE7RAuhd/brain-efficiency-much-more-than-you-wanted-to-know

Title: Brain Efficiency: Much More than You Wanted to Know — LessWrong Description: What if the brain is highly efficient? To be more specific, there are several interconnected key measures of efficiency for physical learning machine… Keywords: No keywords Text content: Brain Efficiency: Much More t...

Title: None

Score: 0.8785942109407829

User feedback: None

Out links: 205120 Raw text: 205120

https://gwern.net/doc/www/arxiv.org/eeba4103b71baddb951cdde4962993257f5d6f07.pdf

Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, and Christopher Ré, Department of Computer Science, Stanford University ({albertgu,krng}@stanford.edu, [email protected]). arXiv:2111.00396v2 [cs.LG] 4 Mar 2022. Abstract: A central goal of sequence modeling ...

Title: Many arguments for AI x-risk are wrong — AI Alignment Forum

Score: 0.8744192655701442

User feedback: None

Out links: 975489 Raw text: 975489

https://alignmentforum.org/posts/yQSmcfN4kA7rATHGK/many-arguments-for-ai-x-risk-are-wrong

Title: Many arguments for AI x-risk are wrong — AI Alignment Forum Description: The following is a lightly edited version of a memo I wrote for a retreat. It was inspired by a draft of Counting arguments provide no evidence for A… Keywords: No keywords Text content: Many arguments for AI x-risk are ...

Title: None

Score: 0.8739845625117002

User feedback: None

Out links: 205037 Raw text: 205037

https://gwern.net/doc/www/arxiv.org/d1278072a7a1822674440ddd0c6c820abc5b2e19.pdf

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute. Tao Lei, ASAPP, Inc. ([email protected]). Abstract: Large language models have become increasingly difficult to train because of the growing computation time and cost. In this work, we present SRU++, a highly-ef...

Title: tools | Locklin on science | Page 3

Score: 0.873906160599627

User feedback: None

Out links: 4838785 Raw text: 4838785

https://scottlocklin.wordpress.com/category/tools/page/3/

Title: tools | Locklin on science | Page 3 Description: Posts about tools written by Scott Locklin Keywords: No keywords Text content: tools | Locklin on science | Page 3 Skip to content Skip to search - Accesskey = s Locklin on science Why e...

Title: 2022-10-23 arXiv roundup: CNNs dominating sequence modeling, SetFit, Google finetuning trifecta

Score: 0.8640588116809524

User feedback: None

Out links: 4575672 Raw text: 4575672

https://dblalock.substack.com/i/80235726/ul-unifying-language-learning-paradigms

Title: 2022-10-23 arXiv roundup: CNNs dominating sequence modeling, SetFit, Google finetuning trifecta Description: This newsletter made possible by MosaicML. Keywords: No keywords Text content: 2022-10-23 arXiv roundup: CNNs dominating sequence modeling, SetFit, Google finetuning trifecta ...

Title: 2022-7-10 arXiv roundup: DeepSpeed inference, Simpler detection backbones, Spatial sparsification

Score: 0.8635366494770073

User feedback: None

Out links: 4581680 Raw text: 4581680

https://dblalock.substack.com/i/63471989/insights-into-pre-training-via-simpler-synthetic-tasks

Title: 2022-7-10 arXiv roundup: DeepSpeed inference, Simpler detection backbones, Spatial sparsification Description: This post made possible by MosaicML. If you like it, consider forwarding it to a friend! Keywords: No keywords Text content: 2022-7-10 arXiv roundup: DeepSpeed inference, Simpler det...

Title: 2022-5-15: T-Few, Task scaling, Gato - by Davis Blalock

Score: 0.8624372849187085

User feedback: None

Out links: 4629572 Raw text: 4629572

https://dblalock.substack.com/i/54969325/few-shot-parameter-efficient-fine-tuning-is-better-and-cheaper-than-in-context-learning

Title: 2022-5-15: T-Few, Task scaling, Gato - by Davis Blalock Description: These summaries made possible by MosaicML. If you find them helpful, the best way to thank me is by checking out + starring Composer, our open-source library for faster model training. Keywords: No keywords Text content: 202...

Title: Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain — AI Alignment Forum

Score: 0.8616537922290168

User feedback: None

Out links: 993625 Raw text: 993625

https://www.alignmentforum.org/posts/HhWhaSzQr6xmBki8F/birds-planes-brains-and-ai-against-appeals-to-the-complexity

Title: Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain — AI Alignment Forum Description: I argue that an entire class of common arguments against short timelines is bogus, and provide weak evidence that anchoring to the human-brain-human-… Keyw...

Title: 2022-8-14 arXiv roundup: Branch-Train-Merge, Model patching, lots of LLM papers

Score: 0.8612758353119582

User feedback: None

Out links: 4562468 Raw text: 4562468

https://dblalock.substack.com/i/68679449/quality-not-quantity-on-the-interaction-between-dataset-design-and-robustness-of-clip

Title: 2022-8-14 arXiv roundup: Branch-Train-Merge, Model patching, lots of LLM papers Description: This newsletter made possible by MosaicML. Keywords: No keywords Text content: 2022-8-14 arXiv roundup: Branch-Train-Merge, Model patching, lots of LLM papers Davis Summarizes...

Title: 2022-9-25 arXiv roundup: Metadata archaeology, Decaying pruning, 200x faster RL

Score: 0.8596563113643442

User feedback: None

Out links: 4581724 Raw text: 4581724

https://dblalock.substack.com/i/74944665/mega-moving-average-equipped-gated-attention

Title: 2022-9-25 arXiv roundup: Metadata archaeology, Decaying pruning, 200x faster RL Description: This newsletter made possible by MosaicML. Btw, we’re looking for early customers who spend a lot on training neural nets for computer vision or NLP in PyTorch on cloud GPUs. Keywords: No keywords Tex...

Title: None

Score: 0.8584212557932718

User feedback: None

Out links: 205013 Raw text: 205013

https://gwern.net/doc/www/arxiv.org/c639528ca3cdba458c1e52f61e42863dce9599d7.pdf

Adaptive Multi-Resolution Attention with Linear Complexity. Yao Zhang*, Yunpu Ma*, Thomas Seidl, Volker Tresp. Institute of Informatics, LMU Munich; Corporate Technology, Siemens AG ([email protected], [email protected], [email protected]...). arXiv:2108.04962v1 [cs.LG] 10 Aug 2021.

Title: 2022-6-5 arXiv roundup: SAM for free, FlashAttention, Supervised MAE

Score: 0.855323210951469

User feedback: None

Out links: 4692951 Raw text: 4692951

https://dblalock.substack.com/i/58144356/fast-benchmarking-of-accuracy-vs-training-time-with-cyclic-learning-rates

Title: 2022-6-5 arXiv roundup: SAM for free, FlashAttention, Supervised MAE Description: This newsletter made possible by MosaicML. Relatedly, do you have any friends who are {ML, cloud, platform} engineers and who might be open to a new job? If so, it would be great if you could send them our caree...

Title: Talking trends 2025 (part 1): Stablecoins, app stores, UX, and more - a16z crypto

Score: 0.8539219668745571

User feedback: None

Out links: 5323770 Raw text: 5323770

https://a16zcrypto.com/posts/podcast/trends-2025-stablecoins-app-stores-infrastructure-ux-more/

Title: Talking trends 2025 (part 1): Stablecoins, app stores, UX, and more - a16z crypto Description: a16z crypto is a venture capital fund that has been investing in crypto and web3 startups — across all stages — since 2013. Keywords: No keywords Text content: Talking trends 2025 (part 1): Stableco...

Title: 2022-6-12: 7x Faster ResNet-50, BIG-Bench, Neural corpus indexer, DeepSpeed & fp8 quantization

Score: 0.8537339802575455

User feedback: None

Out links: 4581616 Raw text: 4581616

https://dblalock.substack.com/i/59092391/blazingly-fast-computer-vision-training-with-the-mosaic-resnet-and-composer

Title: 2022-6-12: 7x Faster ResNet-50, BIG-Bench, Neural corpus indexer, DeepSpeed & fp8 quantization Description: Blazingly Fast Computer Vision Training with the Mosaic ResNet and Composer Keywords: No keywords Text content: 2022-6-12: 7x Faster ResNet-50, BIG-Bench, Neural corpus indexer, DeepSpe...

Title: Proposal: Scaling laws for RL generalization — LessWrong

Score: 0.8528051177613354

User feedback: None

Out links: 199630 Raw text: 199630

https://www.lesswrong.com/posts/65qmEJHDw3vw69tKm/proposal-scaling-laws-for-rl-generalization

Title: Proposal: Scaling laws for RL generalization — LessWrong Description: In this post, we (Alexander Meulemans, David Lindner, Florian Dorner) propose to study scaling laws relating the generalization capability of Reinfor… Keywords: No keywords Text content: Proposal: Scaling laws for RL genera...

Title: 2022-10-2 arXiv roundup: GPT-3 for $500k + Do we even need pretrained models?

Score: 0.8505010518497781

User feedback: None

Out links: 4562366 Raw text: 4562366

https://dblalock.substack.com/i/76150421/downstream-datasets-make-surprisingly-good-pretraining-corpora

Title: 2022-10-2 arXiv roundup: GPT-3 for $500k + Do we even need pretrained models? Description: Mosaic LLMs (Part 2): GPT-3 quality for <$500k Keywords: No keywords Text content: 2022-10-2 arXiv roundup: GPT-3 for $500k + Do we even need pretrained models? Davis Summarizes...