I am a post-doc working with Kyunghyun Cho and Krzysztof Geras at New York University. I am also an advisor at Molecule.one, where I will start as Chief Scientist in September. I do my best to contribute to the broader machine learning community; currently, I serve as an area chair for NeurIPS 2020 and ICLR 2020. I received my PhD from Jagiellonian University, co-supervised by Jacek Tabor and Amos Storkey (University of Edinburgh). During my PhD, I spent two summers as a visiting researcher with Yoshua Bengio and collaborated with Google Research in Zurich.

My email is staszek.jastrzebski (on gmail).

News

  • (3.2020) Our paper on multimodal learning for breast cancer screening has been accepted to MIDL 2020 as a Spotlight!
  • (12.2019) Our paper on the early phase of the optimization trajectory has been accepted to ICLR 2020 as a Spotlight!
  • (09.2019) Our paper Large Scale Structure of Neural Network Loss Landscapes has been accepted to NeurIPS 2019!
  • (06.2019) Received top 5% reviewer award for ICML 2019 :)
  • (03.2019) Our paper on parameter efficient training of BERT was accepted to ICML 2019!
  • (01.2019) Papers accepted to ICLR 2019, and AISTATS 2019! Also, our preprint on Neural Architecture Search is online.

My main research goal is to develop optimization techniques for the next generation of deep learning and AI, with applications to automatic scientific discovery and, in particular, drug discovery.

Current students

  • [MSc] Sławomir Mucha (UJ) - Pretraining for deep learning in cheminformatics
  • [MSc] Tobiasz Ciepliński (UJ) - Evaluating generative models in chemistry using docking simulators
  • [BSc] Michał Zmysłowski (UW) - Is the noisy quadratic model of deep neural network training realistic enough?

Previous students

  • [MSc] Olivier Astrand (NYU) - Memorization in deep learning
  • [MSc] Tomasz Wesołowski (UJ) - Relevance of enriching word embeddings in modern deep natural language processing
  • [MSc] Andrii Krutsylo (UJ) - Physics-aware representations for drug discovery
  • [BSc] Michał Soboszek (UJ) - Evaluating word embeddings
  • [MSc] Jakub Chłędowski (UJ) - Representation learning for textual entailment
  • [MSc] Mikołaj Sacha (UJ) - Meta learning and sharpness of the minima

Selected Publications

For a full list please see my Google Scholar profile.

The Break-Even Point on the Optimization Trajectories of Deep Neural Networks

S. Jastrzębski, M. Szymczak, S. Fort, D. Arpit, J. Tabor, K. Cho*, K. Geras*

International Conference on Learning Representations 2020 (Spotlight)
paper talk

Large Scale Structure of Neural Network Loss Landscapes

S. Fort, S. Jastrzębski

Neural Information Processing Systems 2019
paper poster

Molecule Attention Transformer

Ł. Maziarka, T. Danel, S. Mucha, K. Rataj, J. Tabor, S. Jastrzębski

Preprint (accepted at Graph Representation Learning workshop at NeurIPS 2019)

On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length

S. Jastrzębski, Z. Kenton, N. Ballas, A. Fischer, Y. Bengio, A. Storkey

International Conference on Learning Representations 2019
code paper poster

Three Factors Influencing Minima in SGD

S. Jastrzębski*, Z. Kenton*, D. Arpit, N. Ballas, A. Fischer, Y. Bengio, A. Storkey

International Conference on Artificial Neural Networks 2018 (oral), International Conference on Learning Representations 2018 (workshop)

Residual Connections Encourage Iterative Inference

S. Jastrzębski*, D. Arpit*, N. Ballas, V. Verma, T. Che, Y. Bengio

International Conference on Learning Representations 2018
paper poster

A Closer Look at Memorization in Deep Networks

D. Arpit*, S. Jastrzębski*, N. Ballas*, D. Krueger*, T. Maharaj, E. Bengio, A. Fischer, A. Courville, S. Lacoste-Julien, Y. Bengio

International Conference on Machine Learning 2017
paper poster slides