Reading list
Links 25
Score: 0.9235692522269203
User feedback: None
Out links: 199769 Raw text: https://gwern.net/doc/www/arxiv.org/ba4384efc1bf12de84e047795780b517cfac7ac6.pdf
Learning to Learn with Generative Models of Neural Network Checkpoints. William Peebles*, Ilija Radosavovic*, Tim Brooks, Alexei A. Efros, Jitendra Malik (University of California, Berkeley). arXiv:2209.12892v1 [cs.LG] 26 Sep 2022. Abstract: We explore a data-driven approach for learning to opt...
Score: 0.904689282392706
User feedback: None
Out links: 2123855 Raw text: http://www.cs.toronto.edu/~hinton/absps/OnlineDistillation.pdf
Published as a conference paper at ICLR 2018. Large Scale Distributed Neural Network Training Through Online Distillation. Rohan Anil (Google, [email protected]), Robert Ormandi (Google, [email protected]), Gabriel Pereyra* (Google DeepMind, [email protected]), George E. Dahl (Google Brain, [email protected])...
Score: 0.8990216042421004
User feedback: None
Out links: 200006 Raw text: https://gwern.net/doc/www/arxiv.org/79528489ccea8d598189f1f980c963c6c2ee576a.pdf
Published as a conference paper at ICLR 2021. Large Batch Simulation for Deep Reinforcement Learning. arXiv:2103.07013v1 [cs.LG] 12 Mar 2021. Brennan Shacklett1*, Erik Wijmans2, Aleksei Petrenko3,4, Manolis Savva5, Dhruv Batra2, Vladlen Koltun3, Kayvon Fatahalian1. 1 Stanford University, 2 Georgia Inst...
Score: 0.889493896864104
User feedback: None
Out links: 199485 Raw text: https://gwern.net/doc/www/arxiv.org/515af527150fbaa01ed9fe175750286a6a23f108.pdf
Published as a conference paper at ICLR 2023. arXiv:2210.14891v11 [cs.LG] 29 Mar 2023. Broken Neural Scaling Laws. Ethan Caballero (Mila, McGill University, [email protected]), Kshitij Gupta (Mila, University of Montreal), Irina Rish (Mila, University of Montr...
Score: 0.8800227433525747
User feedback: None
Out links: 205126 Raw text: https://gwern.net/doc/www/arxiv.org/a986ec6fafa88a1a4f52523a902c22652e30d36a.pdf
OmniNet: Omnidirectional Representations from Transformers. Yi Tay*1, Mostafa Dehghani*2, Vamsi Aribandi1,3, Jai Gupta1, Philip Pham1, Zhen Qin1, Dara Bahri1, Da-Cheng Juan1, Donald Metzler1. arXiv:2103.01075v1 [cs.CV] 1 Mar 2021. Abstract: This paper proposes Omnidirectional Representations from ...
Score: 0.8785942109407829
User feedback: None
Out links: 205120 Raw text: https://gwern.net/doc/www/arxiv.org/eeba4103b71baddb951cdde4962993257f5d6f07.pdf
Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, and Christopher Ré (Department of Computer Science, Stanford University). arXiv:2111.00396v2 [cs.LG] 4 Mar 2022. {albertgu,krng}@stanford.edu, [email protected]. Abstract: A central goal of sequence modeling ...
Score: 0.8754801461471287
User feedback: None
Out links: 199826 Raw text: https://gwern.net/doc/www/arxiv.org/0d038e2911d09a1965e6741e6418a062b65901bb.pdf
Offline Q-Learning on Diverse Multi-Task Data Both Scales and Generalizes. Aviral Kumar1,2, Rishabh Agarwal1, Xinyang Geng2, George Tucker*,1, Sergey Levine*,1,2. 1 Google Research, Brain Team, 2 UC Berkeley. arXiv:2211.15144v2 [cs.LG] 17 Apr 2023. {aviralk, young.geng, svlevine}@eecs.berkeley.edu...
Score: 0.8739845625117002
User feedback: None
Out links: 205037 Raw text: https://gwern.net/doc/www/arxiv.org/d1278072a7a1822674440ddd0c6c820abc5b2e19.pdf
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute. Tao Lei (ASAPP, Inc., [email protected]). Abstract: Large language models have become increasingly difficult to train because of the growing computation time and cost. In this work, we present SRU++, a highly-ef...
Score: 0.8699225284191421
User feedback: None
Out links: 3268773 Raw text: https://cs.stanford.edu/~diyiy/docs/acl21_hiddencut.pdf
HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalization. Jiaao Chen, Dinghan Shen1, Weizhu Chen1, Diyi Yang. Georgia Institute of Technology, 1 Microsoft Dynamics 365 AI. {jchen896,dyang888}@gatech.edu, {dishen,wzchen}@microsoft.com. Abstract: Fine-tuning large...
Score: 0.8686906632095005
User feedback: None
Out links: 1193705 Raw text: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1244/final-projects/ChiYoTsaiJayMartin.pdf
Outrageously Fast LLMs: Faster Inference and Fine-Tuning with Moefication and LoRA. Stanford CS224N Custom Project. Mentor: Tony Wang. Chi Tsai* (Department of Computer Science, Stanford University, [email protected]), Jay Martin† (Department of Computer Science, Stanford University, [email protected])...
Score: 0.8653833446523154
User feedback: None
Out links: 199554 Raw text: https://gwern.net/doc/www/arxiv.org/404233e99de98f699e9d84e002e2099b202d2959.pdf
Measuring Progress in Deep Reinforcement Learning Sample Efficiency. Florian E. Dorner (Institute of Science, Technology and Policy, ETH Zurich, [email protected]). arXiv:2102.04881v1 [cs.LG] 9 Feb 2021. Abstract: Sampled environment transitions are a critical input to deep reinforcement learni...
Score: 0.8584212557932718
User feedback: None
Out links: 205013 Raw text: https://gwern.net/doc/www/arxiv.org/c639528ca3cdba458c1e52f61e42863dce9599d7.pdf
Adaptive Multi-Resolution Attention with Linear Complexity. Yao Zhang*1, Yunpu Ma*1, Thomas Seidl1, Volker Tresp1,2. 1 Institute of Informatics, LMU Munich, 2 Corporate Technology, Siemens AG. arXiv:2108.04962v1 [cs.LG] 10 Aug 2021. [email protected], [email protected], [email protected]...
Score: 0.8480798445298792
User feedback: None
Out links: 205011 Raw text: https://gwern.net/doc/www/arxiv.org/0701285b128d3a748a3bf37d457c72010d08fe46.pdf
Finetuning Pretrained Transformers into RNNs. Jungo Kasai♡*, Hao Peng♡, Yizhe Zhang♣, Dani Yogatama♠, Gabriel Ilharco♡, Nikolaos Pappas♡, Yi Mao♣, Weizhu Chen♣, Noah A. Smith♡♢. ♡ Paul G. Allen School of Computer Science & Engineering, University of Washington; ♣ Microsoft; ♠ DeepMind; ♢ Allen Institute for ...
Score: 0.8459963594865105
User feedback: None
Out links: 205034 Raw text: https://gwern.net/doc/www/arxiv.org/86693d8a9469f413a8b2735801feaa1a9d0dc50c.pdf
RWKV: Reinventing RNNs for the Transformer Era. Bo Peng1*, Eric Alcaide2,3,4*, Quentin Anthony2,5*, Alon Albalak, Samuel Arcadinho2,7, Huanqi Cao8, Xin Cheng9, Michael Chung10, Matteo Grella11, Kranthi Kiran GV12, Xuzheng He2, Haowen Hou13, Przemysław Kazienko14, Jan Kocoń14, Jiaming Kong15, Bartłomiej Koptyra14, H...
Score: 0.8459471281443147
User feedback: None
Out links: 205201 Raw text: https://gwern.net/doc/www/openreview.net/45f3d6c27e2b3f5b53fe6ecd14b4b122a8470ac6.pdf
Under review as a conference paper at ICLR 2022. A Dot Product Attention Free Transformer. Anonymous authors, paper under double-blind review. Abstract: We introduce Dot Product Attention Free Transformer (DAFT), an efficient variant of Transformers (Vaswani et al., 2017) that eliminates the query...
Score: 0.8456152222090448
User feedback: None
Out links: 205044 Raw text: https://gwern.net/doc/www/arxiv.org/9aaf30e79a8b51c86a764b0b8eb725004fbddd32.pdf
Current Limitations of Language Models: What You Need is Retrieval. arXiv:2009.06857v1 [cs.CL] 15 Sep 2020. Aran Komatsuzaki (Georgia Institute of Technology, EleutherAI, [email protected]). Abstract: We classify and re-examine some of the current approaches to improve the performance-com...
Score: 0.8450740707172638
User feedback: None
Out links: 205046 Raw text: https://gwern.net/doc/www/arxiv.org/2c84075b5f7b38e98ad6ee0739e9c30f23ab3778.pdf
Luna: Linear Unified Nested Attention. Chunting Zhou (LTI, CMU, [email protected]), Xiang Kong* (LTI, CMU, [email protected]), Jonathan May (ISI, USC, [email protected]), Sinong Wang* (Facebook AI, [email protected]), Hao Ma, Luke Zettlemoyer (Facebook AI, {haom, lsz}@fb.com). Abstract: The quadratic computational and...
Score: 0.8448275767669189
User feedback: None
Out links: 205076 Raw text: https://gwern.net/doc/www/arxiv.org/6cab03ecf704e10f4f43f732577daf01daa03a1b.pdf
arXiv:2006.11527v2 [cs.CL] 16 Feb 2021. Memory Transformer. Mikhail S. Burtsev (Neural Networks and Deep Learning Lab, Moscow Institute of Physics and Technology, Dolgoprudny, Russia, [email protected]), Yuri Kuratov (Neural Networks and Deep Learning Lab, Moscow Institute of Physics and Technology, Dolgo...
Score: 0.8430499062025062
User feedback: None
Out links: 352812 Raw text: http://proceedings.mlr.press/v80/kamnitsas18a/kamnitsas18a.pdf
Semi-Supervised Learning via Compact Latent Space Clustering. Konstantinos Kamnitsas1,2, Daniel C. Castro1,2, Loic Le Folgoc2, Ian Walker2, Ryutaro Tanno1,3, Daniel Rueckert2, Ben Glocker2, Antonio Criminisi1, Aditya Nori1. Abstract: We present a novel cost function for semi-supervised learning of ne...
Score: 0.8426350711444989
User feedback: None
Out links: 3885218 Raw text: https://www.mit.edu/~gfarina/2023/escher_iclr23/2206.04122.pdf
ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret. arXiv:2206.04122v2 [cs.GT] 11 Oct 2022. Stephen McAleer (Carnegie Mellon University, [email protected]), Gabriele Farina (Carnegie Mellon University, [email protected]), Marc Lanctot (DeepMi...
Score: 0.8418470832430927
User feedback: None
Out links: 205114 Raw text: https://gwern.net/doc/www/openreview.net/e998caa668bfed59cb006c4f3cd8de1b4620cc05.pdf
Published as a conference paper at ICLR 2021. Random Feature Attention. Hao Peng♠*, Nikolaos Pappas♠, Dani Yogatama♣, Roy Schwartz♥, Noah A. Smith♠♦, Lingpeng Kong♦*. ♠ Paul G. Allen School of Computer Science & Engineering, University of Washington; ♣ DeepMind; ♦ Allen Institute for Artificial Intelligenc...
Score: 0.8415860748933458
User feedback: None
Out links: 3010548 Raw text: https://homepages.inf.ed.ac.uk/csutton/publications/nota-ir403.pdf
Fast, Piecewise Training for Discriminative Finite-state and Parsing Models. Charles Sutton and Andrew McCallum (Department of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003 USA, {casutton,mccallum}@cs.umass.edu). Abstract: Discriminative models for sequences and trees, such as lin...
Score: 0.8405262601175294
User feedback: None
Out links: 2832902 Raw text: https://homepages.inf.ed.ac.uk/imurray2/pub/15rnnopt/rnnopt.pdf
On the Efficiency of Recurrent Neural Network Optimization Algorithms. Ben Krause, Liang Lu, Iain Murray, Steve Renals (University of Edinburgh, Department of Informatics; [email protected], [email protected], [email protected], [email protected]). Abstract: This study compares the sequential an...
Score: 0.8367095466920836
User feedback: None
Out links: 352345 Raw text: http://proceedings.mlr.press/v80/lucas18a/lucas18a.pdf
Mixed batches and symmetric discriminators for GAN training. Thomas Lucas*1, Corentin Tallec*2, Jakob Verbeek1, Yann Ollivier3. Abstract: ...vincing source of samples of natural images (Karras et al., 2018). GANs consist of a generator and a discriminator network. The generator maps samples from a lat...
Score: 0.8366056065500114
User feedback: None
Out links: 205074 Raw text: https://gwern.net/doc/www/arxiv.org/f63a0b34378396bff253d974efc8664d5620489c.pdf
Sub-Linear Memory: How to Make Performers SLiM. Valerii Likhosherstov1, Krzysztof Choromanski2,3, Jared Davis4,5, Xingyou Song2, Adrian Weller1,6. arXiv:2012.11346v1 [cs.LG] 21 Dec 2020. Abstract: The Transformer architecture has revolutionized deep learning on sequential data, becoming ubiquitous i...
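Each entry above pairs a retrieval score with a "Raw text" source URL, and the list is ordered by descending score. A minimal sketch for machine-reading this plain-text record format, assuming the exact "Score:" / "Raw text:" field labels shown above (the function name and regexes are illustrative, not part of any existing tool):

```python
import re

def parse_entries(text):
    """Extract (score, url) pairs from the plain-text reading-list format."""
    scores = [float(s) for s in re.findall(r"Score: ([0-9.]+)", text)]
    urls = re.findall(r"Raw text: (\S+)", text)
    return list(zip(scores, urls))

sample = """Score: 0.9235692522269203
User feedback: None
Out links: 199769 Raw text: https://gwern.net/doc/www/arxiv.org/ba4384efc1bf12de84e047795780b517cfac7ac6.pdf
Score: 0.904689282392706
User feedback: None
Out links: 2123855 Raw text: http://www.cs.toronto.edu/~hinton/absps/OnlineDistillation.pdf
"""

entries = parse_entries(sample)
# The log is already ranked: each score should be >= the next one.
assert all(a[0] >= b[0] for a, b in zip(entries, entries[1:]))
```

This relies only on the one-field-per-line layout; if a URL ever contained whitespace, the `\S+` pattern would need tightening.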