Reading list

Links 25

Title: Learning to Learn with Generative Models of Neural Network Checkpoints

Score: 0.9235692522269203

User feedback: None

Out links: 199769 Raw text: 199769

https://gwern.net/doc/www/arxiv.org/ba4384efc1bf12de84e047795780b517cfac7ac6.pdf

Learning to Learn with Generative Models of Neural Network Checkpoints. William Peebles∗, Ilija Radosavovic∗, Tim Brooks, Alexei A. Efros, Jitendra Malik (University of California, Berkeley). arXiv:2209.12892v1 [cs.LG] 26 Sep 2022. Abstract: We explore a data-driven approach for learning to opt...

Title: Large Scale Distributed Neural Network Training through Online Distillation

Score: 0.904689282392706

User feedback: None

Out links: 2123855 Raw text: 2123855

http://www.cs.toronto.edu/~hinton/absps/OnlineDistillation.pdf

Published as a conference paper at ICLR 2018. Large Scale Distributed Neural Network Training through Online Distillation. Rohan Anil (Google), Robert Ormandi (Google), Gabriel Pereyra∗ (Google DeepMind), George E. Dahl (Google Brain)...
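The snippet cuts off before the method, but the codistillation idea this paper is known for is simple: each worker adds a distillation term pulling its predictions toward a (possibly stale) peer replica's, so workers teach each other without synchronizing gradients. A minimal NumPy sketch of that loss; the two-replica setup, the weight `alpha`, and the toy batch are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def codistillation_loss(logits, labels, peer_logits, alpha=0.5):
    """Cross-entropy on the true labels plus a distillation term that
    pulls this worker's predictions toward a peer replica's soft targets."""
    p = softmax(logits)
    n = len(labels)
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    q = softmax(peer_logits)                 # peer predictions, held fixed
    distill = -(q * np.log(p + 1e-12)).sum(axis=-1).mean()
    return ce + alpha * distill

# toy batch: 4 examples, 3 classes
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
peer_logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 0])
print(codistillation_loss(logits, labels, peer_logits))
```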

Title: Large Batch Simulation for Deep Reinforcement Learning

Score: 0.8990216042421004

User feedback: None

Out links: 200006 Raw text: 200006

https://gwern.net/doc/www/arxiv.org/79528489ccea8d598189f1f980c963c6c2ee576a.pdf

Published as a conference paper at ICLR 2021. Large Batch Simulation for Deep Reinforcement Learning. arXiv:2103.07013v1 [cs.LG] 12 Mar 2021. Brennan Shacklett¹∗, Erik Wijmans², Aleksei Petrenko³⁴, Manolis Savva⁵, Dhruv Batra², Vladlen Koltun³, Kayvon Fatahalian¹; ¹Stanford University, ²Georgia Inst...
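The core systems idea is to step many environments in lockstep so that simulation, inference, and learning all operate on large batches. A toy sketch of that pattern only; `BatchedEnv` and the linear policy are hypothetical stand-ins, not the paper's GPU-resident simulator:

```python
import numpy as np

class BatchedEnv:
    """Hypothetical stand-in for a batched simulator that steps N
    environments in lockstep; this only shows the batching pattern."""
    def __init__(self, n_envs, obs_dim, seed=0):
        self.n, self.d = n_envs, obs_dim
        self.rng = np.random.default_rng(seed)
    def reset(self):
        return self.rng.normal(size=(self.n, self.d))
    def step(self, actions):
        obs = self.rng.normal(size=(self.n, self.d))
        reward = self.rng.random(self.n)
        done = self.rng.random(self.n) < 0.01
        return obs, reward, done

def policy(obs, w):
    return (obs @ w).argmax(axis=-1)   # one batched forward pass for every env

env = BatchedEnv(n_envs=1024, obs_dim=8)
w = np.random.default_rng(1).normal(size=(8, 4))
obs = env.reset()
for _ in range(100):   # rollout: simulation and inference both stay batched
    obs, reward, done = env.step(policy(obs, w))
```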

Title: Broken Neural Scaling Laws

Score: 0.889493896864104

User feedback: None

Out links: 199485 Raw text: 199485

https://gwern.net/doc/www/arxiv.org/515af527150fbaa01ed9fe175750286a6a23f108.pdf

Published as a conference paper at ICLR 2023. Broken Neural Scaling Laws. arXiv:2210.14891v11 [cs.LG] 29 Mar 2023. Ethan Caballero (Mila, McGill University), Kshitij Gupta (Mila, University of Montreal), Irina Rish (Mila, University of Montr...
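The paper's contribution is a functional form: a power law multiplied by factors that smoothly change its slope at each "break". A sketch of that form as published (up to notation); the parameter values below are made up for illustration, and no fitting is shown:

```python
import numpy as np

def bnsl(x, a, b, c0, breaks):
    """Broken neural scaling law: a power law whose log-log slope changes
    smoothly at each break; `breaks` is a list of (c_i, d_i, f_i) tuples
    (slope change, break location, break sharpness)."""
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y = y * (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y

x = np.logspace(0, 9, 10)   # e.g. training compute
print(bnsl(x, a=0.1, b=2.0, c0=0.05, breaks=[(0.3, 1e4, 0.5)]))
```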

Title: OmniNet: Omnidirectional Representations from Transformers

Score: 0.8800227433525747

User feedback: None

Out links: 205126 Raw text: 205126

https://gwern.net/doc/www/arxiv.org/a986ec6fafa88a1a4f52523a902c22652e30d36a.pdf

OmniNet: Omnidirectional Representations from Transformers. Yi Tay∗¹, Mostafa Dehghani∗², Vamsi Aribandi¹³, Jai Gupta¹, Philip Pham¹, Zhen Qin¹, Dara Bahri¹, Da-Cheng Juan¹, Donald Metzler¹. arXiv:2103.01075v1 [cs.CV] 1 Mar 2021. Abstract: This paper proposes Omnidirectional Representations from ...
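The omnidirectional idea, as far as the truncated abstract indicates, is to let tokens attend to the representations of every layer rather than only the current one. A plain-softmax sketch of that receptive field; OmniNet itself relies on efficient attention kernels to keep this affordable, and projections and heads are omitted here:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def omnidirectional_attend(layer_states, Q):
    """Queries attend over the tokens of every layer at once
    (L layers x T tokens of keys/values), not just the final layer."""
    KV = np.concatenate(layer_states, axis=0)       # (L*T, d)
    A = softmax(Q @ KV.T / np.sqrt(Q.shape[1]))     # (T, L*T)
    return A @ KV

rng = np.random.default_rng(0)
layer_states = [rng.normal(size=(32, 16)) for _ in range(6)]   # 6 layers
Q = rng.normal(size=(32, 16))
print(omnidirectional_attend(layer_states, Q).shape)           # (32, 16)
```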

Title: Efficiently Modeling Long Sequences with Structured State Spaces

Score: 0.8785942109407829

User feedback: None

Out links: 205120 Raw text: 205120

https://gwern.net/doc/www/arxiv.org/eeba4103b71baddb951cdde4962993257f5d6f07.pdf

Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, and Christopher Ré (Department of Computer Science, Stanford University; {albertgu,krng}@stanford.edu). arXiv:2111.00396v2 [cs.LG] 4 Mar 2022. Abstract: A central goal of sequence modeling ...
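S4's starting point is the discretized linear state space recurrence x_k = Ā x_{k-1} + B̄ u_k, y_k = C x_k. The sketch below runs that recurrence naively with a bilinear discretization; S4's actual contribution, the structured parameterization that makes this fast and stable, is not shown:

```python
import numpy as np

def discretize(A, B, dt):
    """Bilinear (Tustin) discretization of a continuous-time SSM (A, B)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - dt / 2 * A)
    return inv @ (I + dt / 2 * A), (inv @ B) * dt

def ssm_scan(A_bar, B_bar, C, u):
    """Naive O(L N^2) scan of x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar * u_k
        ys.append(C @ x)
    return np.array(ys)

rng = np.random.default_rng(0)
A = -np.eye(4) + 0.1 * rng.normal(size=(4, 4))   # a stable-ish toy A
B, C = rng.normal(size=4), rng.normal(size=4)
A_bar, B_bar = discretize(A, B, dt=0.1)
y = ssm_scan(A_bar, B_bar, C, u=np.sin(np.linspace(0, 6, 50)))
print(y.shape)   # (50,)
```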

Title: Offline Q-Learning on Diverse Multi-Task Data Both Scales and Generalizes

Score: 0.8754801461471287

User feedback: None

Out links: 199826 Raw text: 199826

https://gwern.net/doc/www/arxiv.org/0d038e2911d09a1965e6741e6418a062b65901bb.pdf

Offline Q-Learning on Diverse Multi-Task Data Both Scales and Generalizes. Aviral Kumar¹², Rishabh Agarwal¹, Xinyang Geng², George Tucker∗¹, Sergey Levine∗¹²; ¹Google Research, Brain Team; ²UC Berkeley. arXiv:2211.15144v2 [cs.LG] 17 Apr 2023. {aviralk, young.geng, svlevine}@eecs.berkeley.edu...

Title: When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute

Score: 0.8739845625117002

User feedback: None

Out links: 205037 Raw text: 205037

https://gwern.net/doc/www/arxiv.org/d1278072a7a1822674440ddd0c6c820abc5b2e19.pdf

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute. Tao Lei (ASAPP, Inc.). Abstract: Large language models have become increasingly difficult to train because of the growing computation time and cost. In this work, we present SRU++, a highly-ef...
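SRU++ layers attention on top of the SRU recurrence, whose selling point is that the expensive matrix products depend only on the inputs and batch across time, leaving a cheap elementwise recurrence. A single-example NumPy sketch of the base SRU cell as I understand it; the attention component of SRU++ is omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru(x, W, Wf, vf, bf, Wr, vr, br):
    """Base SRU cell for one sequence. The projections (W @ x_t etc.)
    have no dependence on the state, so in practice they are computed
    for all timesteps at once; only the elementwise recurrence on c
    is sequential. SRU++ feeds an attention output in place of x."""
    c = np.zeros(x.shape[1])
    hs = []
    for x_t in x:
        f = sigmoid(Wf @ x_t + vf * c + bf)   # forget gate, uses c_{t-1}
        r = sigmoid(Wr @ x_t + vr * c + br)   # reset gate, uses c_{t-1}
        c = f * c + (1 - f) * (W @ x_t)       # elementwise state update
        hs.append(r * c + (1 - r) * x_t)      # highway output
    return np.array(hs)

rng = np.random.default_rng(0)
T, d = 10, 6
x = rng.normal(size=(T, d))
W, Wf, Wr = (0.3 * rng.normal(size=(d, d)) for _ in range(3))
vf, vr = rng.normal(size=d), rng.normal(size=d)
bf, br = np.zeros(d), np.zeros(d)
print(sru(x, W, Wf, vf, bf, Wr, vr, br).shape)   # (10, 6)
```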

Title: HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalization

Score: 0.8699225284191421

User feedback: None

Out links: 3268773 Raw text: 3268773

https://cs.stanford.edu/~diyiy/docs/acl21_hiddencut.pdf

HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalization. Jiaao Chen, Dinghan Shen¹, Weizhu Chen¹, Diyi Yang (Georgia Institute of Technology; ¹Microsoft Dynamics 365 AI). {jchen896,dyang888}@gatech.edu, {dishen,wzchen}@microsoft.com. Abstract: Fine-tuning large...
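HiddenCut's augmentation drops a contiguous span of hidden vectors during fine-tuning, rather than independent elements as in dropout. A minimal sketch; the paper additionally uses attention scores to pick informative spans, whereas this version picks the span uniformly at random:

```python
import numpy as np

def hiddencut(hidden, cut_ratio=0.2, rng=None):
    """Zero out one contiguous span of token hidden states. Unlike
    elementwise dropout, a whole chunk of the sequence's hidden
    vectors disappears at once, forcing less redundant features."""
    rng = rng or np.random.default_rng()
    h = hidden.copy()
    L = h.shape[0]
    span = max(1, int(L * cut_ratio))
    start = rng.integers(0, L - span + 1)
    h[start:start + span] = 0.0
    return h

h = np.random.default_rng(0).normal(size=(16, 8))        # (seq_len, hidden_dim)
print(np.count_nonzero(hiddencut(h).any(axis=1)))        # 13 of 16 rows survive
```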

Title: Outrageously Fast LLMs: Faster Inference and Fine-Tuning with MoEfication and LoRA

Score: 0.8686906632095005

User feedback: None

Out links: 1193705 Raw text: 1193705

https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1244/final-projects/ChiYoTsaiJayMartin.pdf

Outrageously Fast LLMs: Faster Inference and Fine-Tuning with MoEfication and LoRA. Stanford CS224N custom project (mentor: Tony Wang). Chi Tsai∗ and Jay Martin† (Department of Computer Science, Stanford University). [email protected], [email protected] ...
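The LoRA half of this project is the standard low-rank adapter: the frozen weight W is augmented with a trainable update (alpha/r)·B·A, and only A and B are fine-tuned. A minimal sketch of that piece only; the MoEfication half, splitting FFNs into experts, is not shown:

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A;
    during fine-tuning only A and B would receive gradients."""
    def __init__(self, W, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                   # frozen
        self.A = 0.01 * rng.normal(size=(r, d_in))   # small random init
        self.B = np.zeros((d_out, r))                # zero init: update starts at 0
        self.scale = alpha / r
    def __call__(self, x):
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

W = np.random.default_rng(1).normal(size=(32, 64))
layer = LoRALinear(W)
print(layer(np.ones(64)).shape)   # (32,), identical to W @ x until B is trained
```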

Title: Measuring Progress in Deep Reinforcement Learning Sample Efficiency

Score: 0.8653833446523154

User feedback: None

Out links: 199554 Raw text: 199554

https://gwern.net/doc/www/arxiv.org/404233e99de98f699e9d84e002e2099b202d2959.pdf

Measuring Progress in Deep Reinforcement Learning Sample Efficiency. Florian E. Dorner (Institute of Science, Technology and Policy, ETH Zurich). arXiv:2102.04881v1 [cs.LG] 9 Feb 2021. Abstract: Sampled environment transitions are a critical input to deep reinforcement learni...

Title: Adaptive Multi-Resolution Attention with Linear Complexity

Score: 0.8584212557932718

User feedback: None

Out links: 205013 Raw text: 205013

https://gwern.net/doc/www/arxiv.org/c639528ca3cdba458c1e52f61e42863dce9599d7.pdf

Adaptive Multi-Resolution Attention with Linear Complexity. Yao Zhang∗¹, Yunpu Ma∗¹, Thomas Seidl¹, Volker Tresp¹²; ¹Institute of Informatics, LMU Munich; ²Corporate Technology, Siemens AG. arXiv:2108.04962v1 [cs.LG] 10 Aug 2021. ...

Title: Finetuning Pretrained Transformers into RNNs

Score: 0.8480798445298792

User feedback: None

Out links: 205011 Raw text: 205011

https://gwern.net/doc/www/arxiv.org/0701285b128d3a748a3bf37d457c72010d08fe46.pdf

Finetuning Pretrained Transformers into RNNs. Jungo Kasai♡∗, Hao Peng♡, Yizhe Zhang♣, Dani Yogatama♠, Gabriel Ilharco♡, Nikolaos Pappas♡, Yi Mao♣, Weizhu Chen♣, Noah A. Smith♡♢; ♡Paul G. Allen School of Computer Science & Engineering, University of Washington; ♣Microsoft; ♠DeepMind; ♢Allen Institute for ...
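The T2R recipe swaps softmax attention for a kernelized (linear) attention whose causal form is an RNN: a running outer-product state replaces the growing key/value cache. A sketch of that recurrence; the feature map here is an illustrative stand-in, where the paper instead learns a small MLP feature map during finetuning:

```python
import numpy as np

def linear_attention_rnn(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Causal linear attention run as an RNN: with a positive feature map
    phi replacing softmax, attention needs only a running outer-product
    state S and a normalizer z, constant memory per step."""
    S = np.zeros((K.shape[1], V.shape[1]))   # sum of phi(k_t) v_t^T
    z = np.zeros(K.shape[1])                 # sum of phi(k_t)
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        S += np.outer(fk, v)
        z += fk
        fq = phi(q)
        out.append((fq @ S) / (fq @ z + 1e-9))
    return np.array(out)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(10, 4)) for _ in range(3))
print(linear_attention_rnn(Q, K, V).shape)   # (10, 4)
```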

Title: RWKV: Reinventing RNNs for the Transformer Era

Score: 0.8459963594865105

User feedback: None

Out links: 205034 Raw text: 205034

https://gwern.net/doc/www/arxiv.org/86693d8a9469f413a8b2735801feaa1a9d0dc50c.pdf

RWKV: Reinventing RNNs for the Transformer Era. Bo Peng¹∗, Eric Alcaide²³⁴∗, Quentin Anthony²⁵∗, Alon Albalak, Samuel Arcadinho²⁷, Huanqi Cao⁸, Xin Cheng⁹, Michael Chung¹⁰, Matteo Grella¹¹, Kranthi Kiran GV¹², Xuzheng He², Haowen Hou¹³, Przemysław Kazienko¹⁴, Jan Kocoń¹⁴, Jiaming Kong¹⁵, Bartłomiej Koptyra¹⁴, H...
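At RWKV's core is the WKV recurrence: an exponentially decaying average of values weighted by e^k, with a bonus u for the current token. A numerically naive single-head sketch of my reading of that recurrence; real implementations track a running maximum exponent for stability, and the full block adds token-shift, receptance gating, and channel mixing:

```python
import numpy as np

def wkv(w, u, K, V):
    """Naive single-head WKV recurrence: per-channel decay w (> 0) and a
    bonus u applied only to the current token. No overflow guard here."""
    a = np.zeros(V.shape[1])   # decayed sum of e^{k_i} * v_i
    b = np.zeros(V.shape[1])   # decayed sum of e^{k_i}
    out = []
    for k, v in zip(K, V):
        out.append((a + np.exp(u + k) * v) / (b + np.exp(u + k)))
        a = np.exp(-w) * a + np.exp(k) * v
        b = np.exp(-w) * b + np.exp(k)
    return np.array(out)

rng = np.random.default_rng(0)
T, d = 12, 8
K, V = rng.normal(size=(T, d)), rng.normal(size=(T, d))
print(wkv(w=0.5 * np.ones(d), u=np.zeros(d), K=K, V=V).shape)   # (12, 8)
```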

Title: A Dot Product Attention Free Transformer

Score: 0.8459471281443147

User feedback: None

Out links: 205201 Raw text: 205201

https://gwern.net/doc/www/openreview.net/45f3d6c27e2b3f5b53fe6ecd14b4b122a8470ac6.pdf

Under review as a conference paper at ICLR 2022. A Dot Product Attention Free Transformer. Anonymous authors, paper under double-blind review. Abstract: We introduce Dot Product Attention Free Transformer (DAFT), an efficient variant of Transformers (Vaswani et al., 2017) that eliminates the query...
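DAFT descends from the Attention Free Transformer family, where keys compete through a softmax over positions and the query merely gates the pooled values, so no T x T attention matrix is formed. A sketch of the simplest such variant (AFT-simple, without position biases), not of DAFT itself:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def aft_simple(Q, K, V):
    """AFT-simple mixing: keys compete through a per-channel softmax over
    positions, values are pooled elementwise, and the sigmoid-gated
    query only modulates the pooled result."""
    w = np.exp(K - K.max(axis=0, keepdims=True))   # per channel, over positions
    pooled = (w * V).sum(axis=0) / w.sum(axis=0)   # (d,)
    return sigmoid(Q) * pooled                     # broadcast over positions

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(10, 4)) for _ in range(3))
print(aft_simple(Q, K, V).shape)   # (10, 4)
```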

Title: Current Limitations of Language Models: What You Need is Retrieval

Score: 0.8456152222090448

User feedback: None

Out links: 205044 Raw text: 205044

https://gwern.net/doc/www/arxiv.org/9aaf30e79a8b51c86a764b0b8eb725004fbddd32.pdf

Current Limitations of Language Models: What You Need is Retrieval. arXiv:2009.06857v1 [cs.CL] 15 Sep 2020. Aran Komatsuzaki (Georgia Institute of Technology, EleutherAI). Abstract: We classify and re-examine some of the current approaches to improve the performance-com...

Title: Luna: Linear Unified Nested Attention

Score: 0.8450740707172638

User feedback: None

Out links: 205046 Raw text: 205046

https://gwern.net/doc/www/arxiv.org/2c84075b5f7b38e98ad6ee0739e9c30f23ab3778.pdf

Luna: Linear Unified Nested Attention. Chunting Zhou (LTI, CMU), Xiang Kong∗ (LTI, CMU), Jonathan May (ISI, USC), Sinong Wang∗ (Facebook AI), Hao Ma and Luke Zettlemoyer (Facebook AI; {haom, lsz}@fb.com). Abstract: The quadratic computational and...
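Luna's nesting: a small set of p packed vectors attends over the full sequence, and the sequence then attends back over the packed result, so both attention maps are linear in sequence length. A projection-free, single-head sketch of that pack/unpack structure:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend(Q, K, V):
    return softmax(Q @ K.T / np.sqrt(Q.shape[1])) @ V

def luna_style(X, P):
    """Pack then unpack: p packed vectors attend over the L-token
    sequence (p x L), and the sequence attends back over the packed
    result (L x p), so the cost is linear in L."""
    P_packed = attend(P, X, X)                       # pack:   (p, d)
    return attend(X, P_packed, P_packed), P_packed   # unpack: (L, d)

rng = np.random.default_rng(0)
X, P = rng.normal(size=(64, 16)), rng.normal(size=(8, 16))
Y, P_next = luna_style(X, P)
print(Y.shape, P_next.shape)   # (64, 16) (8, 16)
```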

Title: Memory Transformer

Score: 0.8448275767669189

User feedback: None

Out links: 205076 Raw text: 205076

https://gwern.net/doc/www/arxiv.org/6cab03ecf704e10f4f43f732577daf01daa03a1b.pdf

Memory Transformer. Mikhail S. Burtsev (Neural Networks and Deep Learning Lab, Moscow Institute of Physics and Technology, Dolgoprudny, Russia; [email protected]) and Yuri Kuratov (Neural Networks and Deep Learning Lab, Moscow Institute of Physics and Technology, Dolgo... arXiv:2006.11527v2 [cs.CL] 16 Feb 2021.
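The mechanism is easy to state: learnable memory vectors are concatenated in front of the token embeddings and processed by an otherwise unmodified Transformer, which reads and writes them through ordinary self-attention. A minimal sketch of the input construction:

```python
import numpy as np

def with_memory_tokens(token_emb, mem):
    """Prepend learnable memory vectors to the token embeddings; the
    unmodified Transformer then reads and writes these slots through
    ordinary self-attention."""
    return np.concatenate([mem, token_emb], axis=0)

rng = np.random.default_rng(0)
mem = rng.normal(size=(10, 64))      # 10 trainable memory slots
tokens = rng.normal(size=(128, 64))  # a 128-token input
print(with_memory_tokens(tokens, mem).shape)   # (138, 64), fed to the encoder
```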

Title: Semi-Supervised Learning via Compact Latent Space Clustering

Score: 0.8430499062025062

User feedback: None

Out links: 352812 Raw text: 352812

http://proceedings.mlr.press/v80/kamnitsas18a/kamnitsas18a.pdf

Semi-Supervised Learning via Compact Latent Space Clustering. Konstantinos Kamnitsas¹², Daniel C. Castro¹², Loic Le Folgoc², Ian Walker², Ryutaro Tanno¹³, Daniel Rueckert², Ben Glocker², Antonio Criminisi¹, Aditya Nori¹. Abstract: We present a novel cost function for semi-supervised learning of ne...

Title: ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

Score: 0.8426350711444989

User feedback: None

Out links: 3885218 Raw text: 3885218

https://www.mit.edu/~gfarina/2023/escher_iclr23/2206.04122.pdf

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret. arXiv:2206.04122v2 [cs.GT] 11 Oct 2022. Stephen McAleer (Carnegie Mellon University), Gabriele Farina (Carnegie Mellon University), Marc Lanctot (DeepMi...

Title: Random Feature Attention

Score: 0.8418470832430927

User feedback: None

Out links: 205114 Raw text: 205114

https://gwern.net/doc/www/openreview.net/e998caa668bfed59cb006c4f3cd8de1b4620cc05.pdf

Published as a conference paper at ICLR 2021. Random Feature Attention. Hao Peng♠∗, Nikolaos Pappas♠, Dani Yogatama♣, Roy Schwartz♥, Noah A. Smith♠♦, Lingpeng Kong♦∗; ♠Paul G. Allen School of Computer Science & Engineering, University of Washington; ♣DeepMind; ♦Allen Institute for Artificial Intelligenc...
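RFA approximates the softmax kernel with random Fourier features: after l2-normalizing queries and keys, exp(q·k) matches a Gaussian kernel up to constants, and the Gaussian kernel is the expectation of a sin/cos feature dot product. A non-causal sketch without RFA's gating; the kernel estimate is noisy for small feature counts:

```python
import numpy as np

def rff(x, W):
    """Random Fourier features: with rows of W drawn from N(0, I),
    E[rff(x) . rff(y)] = exp(-||x - y||^2 / 2), the Gaussian kernel."""
    proj = x @ W.T
    feats = np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)
    return feats / np.sqrt(W.shape[0])

def rfa(Q, K, V, W):
    """Replace exp(q . k) with rff(q) . rff(k) and reassociate the
    matrix products so nothing of size L x L is ever formed."""
    Qn = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=-1, keepdims=True)
    pq, pk = rff(Qn, W), rff(Kn, W)
    num = pq @ (pk.T @ V)             # (L, d) via a (2D, d) intermediate
    den = pq @ pk.sum(axis=0) + 1e-9
    return num / den[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(32, 16)) for _ in range(3))
W = rng.normal(size=(64, 16))         # 64 random directions -> 128 features
print(rfa(Q, K, V, W).shape)          # (32, 16)
```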

Title: Fast, Piecewise Training for Discriminative Finite-state and Parsing Models

Score: 0.8415860748933458

User feedback: None

Out links: 3010548 Raw text: 3010548

https://homepages.inf.ed.ac.uk/csutton/publications/nota-ir403.pdf

Fast, Piecewise Training for Discriminative Finite-state and Parsing Models. Charles Sutton and Andrew McCallum (Department of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003 USA; {casutton,mccallum}@cs.umass.edu). Abstract: Discriminative models for sequences and trees, such as lin...

Title: On the Efficiency of Recurrent Neural Network Optimization Algorithms

Score: 0.8405262601175294

User feedback: None

Out links: 2832902 Raw text: 2832902

https://homepages.inf.ed.ac.uk/imurray2/pub/15rnnopt/rnnopt.pdf

On the Efficiency of Recurrent Neural Network Optimization Algorithms. Ben Krause, Liang Lu, Iain Murray, Steve Renals (University of Edinburgh, Department of Informatics; [email protected], [email protected], [email protected], [email protected]). Abstract: This study compares the sequential an...

Title: Mixed batches and symmetric discriminators for GAN training

Score: 0.8367095466920836

User feedback: None

Out links: 352345 Raw text: 352345

http://proceedings.mlr.press/v80/lucas18a/lucas18a.pdf

Mixed batches and symmetric discriminators for GAN training. Thomas Lucas∗¹, Corentin Tallec∗², Jakob Verbeek¹, Yann Ollivier³. Abstract: ...vincing source of samples of natural images (Karras et al., 2018). GANs consist of a generator and a discriminator network. The generator maps samples from a lat...

Title: Sub-Linear Memory: How to Make Performers SLiM

Score: 0.8366056065500114

User feedback: None

Out links: 205074 Raw text: 205074

https://gwern.net/doc/www/arxiv.org/f63a0b34378396bff253d974efc8664d5620489c.pdf

Sub-Linear Memory: How to Make Performers SLiM. Valerii Likhosherstov¹, Krzysztof Choromanski²³, Jared Davis⁴⁵, Xingyou Song², Adrian Weller¹⁶. arXiv:2012.11346v1 [cs.LG] 21 Dec 2020. Abstract: The Transformer architecture has revolutionized deep learning on sequential data, becoming ubiquitous i...