Reading list
Links 25
Score: 0.8699225284191421
User feedback: None
Out links: 3268773 Raw text: 3268773 https://cs.stanford.edu/~diyiy/docs/acl21_hiddencut.pdf
HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalization Jiaao Chen, Dinghan Shen1 , Weizhu Chen1 , Diyi Yang Georgia Institute of Technology, 1 Microsoft Dynamics 365 AI {jchen896,dyang888}@gatech.edu {dishen,wzchen}@microsoft.com Abstract Fine-tuning large...
Score: 0.8556674907722507
User feedback: None
Out links: 1271410 Raw text: 1271410 Few-shot Classification of Disaster-related Tweets Stanford CS224N Custom Project Jubayer Ibn Hamid Department of Computer Science Stanford University [email protected] Jitendra Nath Pandey Department of Computer Science Stanford University [email protected] Sheikh Rifayat Daiyan Srijon De...
Score: 0.8426350711444989
User feedback: None
Out links: 3885218 Raw text: 3885218 https://www.mit.edu/~gfarina/2023/escher_iclr23/2206.04122.pdf
ESCHER: ESCHEWING IMPORTANCE SAMPLING IN GAMES BY COMPUTING A HISTORY VALUE FUNCTION TO ESTIMATE REGRET arXiv:2206.04122v2 [cs.GT] 11 Oct 2022 Stephen McAleer Carnegie Mellon University [email protected] Gabriele Farina Carnegie Mellon University [email protected] Marc Lanctot DeepMi...
Score: 0.8270818663345989
User feedback: None
Out links: 2124217 Raw text: 2124217 http://www.cs.toronto.edu/~hinton/absps/googlerectified.pdf
ON RECTIFIED LINEAR UNITS FOR SPEECH PROCESSING M.D. Zeiler1∗ , M. Ranzato2 , R. Monga2 , M. Mao2 , K. Yang2 , Q.V. Le2 , P. Nguyen2 , A. Senior2 , V. Vanhoucke2 , J. Dean2 , G.E. Hinton3 1 New York University, USA 2 Google Inc., USA ABSTRACT Deep neural networks have recently become the gold st...
Score: 0.8073852083678074
User feedback: None
Out links: 2124224 Raw text: 2124224 http://www.cs.toronto.edu/~hinton/absps/uai_crbms.pdf
Conditional Restricted Boltzmann Machines for Structured Output Prediction Volodymyr Mnih Department of Computer Science University of Toronto Toronto, Canada Hugo Larochelle ∗ Département d’informatique Université de Sherbrooke Sherbrooke, Canada Abstract Conditional Restricted Boltzmann Machi...
Score: 0.8040562140858766
User feedback: None
Out links: 1271380 Raw text: 1271380 Contrastive Learning for Sentence Embeddings in BERT and its Smaller Variants Stanford CS224N Custom Project Vrishab Krishna Department of Computer Science Stanford University [email protected] Rohan Bansal Department of Computer Science Stanford University [email protected] Abstract Contr...
Score: 0.799613010172785
User feedback: None
Out links: 3885237 Raw text: 3885237 https://www.mit.edu/~gfarina/2022/human_like_pikl_icml22/human_like_pikl.icml22.pdf
Modeling Strong and Human-Like Gameplay with KL-Regularized Search Athul Paul Jacob * 1 2 David J. Wu * 1 Gabriele Farina * 3 Adam Lerer 1 Hengyuan Hu 1 Anton Bakhtin 1 Jacob Andreas 2 Noam Brown 1 arXiv:2112.07544v2 [cs.MA] 17 Feb 2022 Abstract We consider the task of building strong but humanli...
Score: 0.7986178776498492
User feedback: None
Out links: 3268749 Raw text: 3268749 https://cs.stanford.edu/~diyiy/docs/naacl_treemix.pdf
TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding Le Zhang Fudan University [email protected] Zichao Yang CMU [email protected] Diyi Yang Georgia Tech [email protected] Abstract Data augmentation is an effective approach to tackle over-fittin...
Score: 0.7974628078721803
User feedback: None
Out links: 3885228 Raw text: 3885228 https://www.mit.edu/~gfarina/2024/dtp_iclr24/dtp_iclr24.pdf
Published as a conference paper at ICLR 2024 THE UPDATE-EQUIVALENCE FRAMEWORK FOR DECISION-TIME PLANNING Samuel Sokota†1 Gabriele Farina†2 David J. Wu† Hengyuan Hu3 Kevin A. Wang†4 J. Zico Kolter1,5 Noam Brown†6 † Work done at Meta AI 1 Carnegie Mellon University 2 Massachusetts Institu...
Score: 0.7902377174914134
User feedback: None
Out links: 2124194 Raw text: 2124194 http://www.cs.toronto.edu/~hinton/absps/dropout.pdf
Improving neural networks by preventing co-adaptation of feature detectors G. E. Hinton∗ , N. Srivastava, A. Krizhevsky, I. Sutskever and R. R. Salakhutdinov Department of Computer Science, University of Toronto, 6 King’s College Rd, Toronto, Ontario M5S 3G4, Canada ∗ To whom correspondence should ...
Score: 0.7881641899084061
User feedback: None
Out links: 655841 Raw text: 655841 https://www.usenix.org/system/files/atc21-ren-jie.pdf
ZeRO-Offload: Democratizing Billion-Scale Model Training Jie Ren, UC Merced; Samyam Rajbhandari, Reza Yazdani Aminabadi, and Olatunji Ruwase, Microsoft; Shuangyan Yang, UC Merced; Minjia Zhang, Microsoft; Dong Li, UC Merced; Yuxiong He, Microsoft https://www.usenix.org/conference/atc21/presentation/...
Score: 0.7812106818675081
User feedback: None
Out links: 1271320 Raw text: 1271320 Looking Outside the Context Window: In-Context Learning with Up to Hundreds of Examples Stanford CS224N Custom Project Linden Li Department of Computer Science Stanford University [email protected] Varun Shenoy Department of Electrical Engineering Stanford University [email protected] Abs...
Score: 0.7811602910197608
User feedback: None
Out links: 3268784 Raw text: 3268784 https://cs.stanford.edu/~diyiy/docs/naacl21_cl.pdf
Continual Learning for Text Classification with Information Disentanglement Based Regularization Yufan Huang∗, Yanzhe Zhang∗ , Jiaao Chen, Xuezhi Wang1 , Diyi Yang Georgia Institute of Technology, 1 Google {yhuang704, jiaaochen, dyang888}@gatech.edu, 1 [email protected] Abstract Continual learning...
Score: 0.780504441497556
User feedback: None
Out links: 1271406 Raw text: 1271406 Finetuning minBERT Model for Multiple Downstream Tasks Stanford CS224N Default Project Yuan Wang Department of Computer Science Stanford University [email protected] Abstract Pre-trained Large Language Models, such as BERT and GPT, contain rich token embeddings that are useful for various downs...
Score: 0.7784315761978045
User feedback: None
Out links: 2124221 Raw text: 2124221 http://www.cs.toronto.edu/~hinton/absps/multiframe.pdf
Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models Navdeep Jaitly1 , Vincent Vanhoucke2 , Geoffrey Hinton1,2 1 University of Toronto 2 Google Inc. [email protected], [email protected] Abstract We describe a simple but effective way of using multi...
Score: 0.7775682348559958
User feedback: None
Out links: 353141 Raw text: 353141 http://proceedings.mlr.press/v80/chen18l/chen18l.pdf
DRACO: Byzantine-resilient Distributed Training via Redundant Gradients Lingjiao Chen 1 Hongyi Wang 1 Zachary Charles 1 Dimitris Papailiopoulos 1 Abstract Distributed model training is vulnerable to byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to...
Score: 0.7759493070114146
User feedback: None
Out links: 2124151 Raw text: 2124151 http://www.cs.toronto.edu/~hinton/absps/tics.pdf
Review TRENDS in Cognitive Sciences Vol.11 No.10 Learning multiple layers of representation Geoffrey E. Hinton Department of Computer Science, University of Toronto, 10 King’s College Road, Toronto, M5S 3G4, Canada To achieve its impressive performance in tasks such as speech perception or objec...
Score: 0.7759423578852973
User feedback: None
Out links: 1271352 Raw text: 1271352 Multi-task Learning with BERT in NLP Stanford CS224N Default Project Fan Wang Department of Computer Science Stanford University [email protected] Abstract In natural language processing, while deep learning techniques have achieved remarkable success in many different problems, these models ar...
Score: 0.7701090216789104
User feedback: None
Out links: 1271438 Raw text: 1271438 Does Learning Syntax Help Models Learn Language? Stanford CS224N Custom Project Lian Wang Department of Computer Science Stanford University [email protected] Abstract Papadimitriou and Jurafsky (2020) showed that LSTMs trained on nonlinguistic structural data performed significantly better th...
Score: 0.769124031716929
User feedback: None
Out links: 2124169 Raw text: 2124169 http://www.cs.toronto.edu/~hinton/absps/Outrageously.pdf
Published as a conference paper at ICLR 2017 OUTRAGEOUSLY LARGE NEURAL NETWORKS: THE SPARSELY-GATED MIXTURE-OF-EXPERTS LAYER Noam Shazeer1 , Azalia Mirhoseini∗†1 , Krzysztof Maziarz∗2 , Andy Davis1 , Quoc Le1 , Geoffrey Hinton1 and Jeff Dean1 1 Google Brain, {noam,azalia,andydavis,qv...
Score: 0.7672189088172875
User feedback: None
Out links: 2124228 Raw text: 2124228 http://www.cs.toronto.edu/~hinton/absps/DNN-2012-proof.pdf
Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, [Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, and Brian Kingsbury] Deep Neural Networks for Acoustic Modeling in Speech Recognition [Four research groups share their views] <AU:...
Score: 0.766606258594767
User feedback: None
Out links: 1271394 Raw text: 1271394 Exploring Multi-Task Learning for Robust Language Encoding with BERT Stanford CS224N Default Project Alejandro Lozano Department of Biomedical Data Science Stanford University [email protected] Laura Bravo Department of Biomedical Data Science Stanford University [email protected] Abstract ...
Score: 0.7652138330119215
User feedback: None
Out links: 2124186 Raw text: 2124186 http://www.cs.toronto.edu/~hinton/absps/mcimage.pdf
Generating more realistic images using gated MRF’s Marc’Aurelio Ranzato Volodymyr Mnih Geoffrey E. Hinton Department of Computer Science University of Toronto {ranzato,vmnih,hinton}@cs.toronto.edu Abstract Probabilistic models of natural images are usually evaluated by measuring performance on rat...
Score: 0.7630162712282244
User feedback: None
Out links: 1271416 Raw text: 1271416 SerBERTus: A SMART Three-Headed BERT Ensemble Stanford CS224N Default Project Matthew Hayes Department of Computer Science Stanford University [email protected] Mentor: Gabriel Poesia No External Collaborators No shared project Abstract We examine different architectures, learning methods, an...
Score: 0.7604815735654534
User feedback: None
Out links: 353143 Raw text: 353143 http://proceedings.mlr.press/v80/hartford18a/hartford18a-supp.pdf
Deep Models of Interactions Across Sets Jason Hartford * 1 Devon R Graham * 1 Kevin Leyton-Brown 1 Siamak Ravanbakhsh 1 Abstract We use deep learning to model interactions across two or more sets of objects, such as user–movie ratings, protein–drug bindings, or ternary user-item-tag interactions. T...