I serve as Chief Scientific Officer at Molecule.one. I am also an Assistant Professor at Jagiellonian University (member of GMUM.net). Prior to that, I was a post-doc with Kyunghyun Cho and Krzysztof Geras at New York University. I do my best to contribute to the broader machine learning community. Currently, I serve as an area chair for NeurIPS 2021 and ICLR 2021 (before that, NeurIPS 2020, ICML 2020 & 2021, and ICLR 2020), and I coordinate the speaker team for Virtual MLinPL 2021. I received my PhD from Jagiellonian University, co-supervised by Jacek Tabor and Amos Storkey (University of Edinburgh). During my PhD, I spent two summers as a visiting researcher with Yoshua Bengio and collaborated with Google Research in Zurich.

My email is staszek.jastrzebski (on gmail). My main research goal is to develop deep learning methods for autonomous scientific discovery, mainly in the context of drug discovery.


  • (6.2021) I was awarded the START stipend for our work on understanding optimization in deep learning. The stipend is awarded to the top 100 young scientists (selected from around 1000) in Poland.
  • (6.2021) Molecule.one has closed a $4.6M seed round; read the news on TechCrunch!
  • (5.2021) Our Catastrophic Fisher Explosion paper has been accepted to ICML!
  • (5.2021) The Huggingmolecules package, including our Molecule Attention Transformer, is online.
  • (4.2021) I have been included among the top ten nominees for the Ambassadors of Polish Innovation Award.
  • (11.2020) Started as Chief Scientific Officer at Molecule.one and Assistant Professor at Jagiellonian University.
  • (3.2020) Our paper on multimodal learning for breast cancer screening has been accepted to MIDL 2020 as a Spotlight!
  • (12.2019) Our paper on the early phase of the optimization trajectory has been accepted to ICLR 2020 as a Spotlight!
  • (09.2019) Our Large Scale Structure of Neural Network Loss Landscapes paper has been accepted to NeurIPS 2019!
  • (06.2019) Received a top 5% reviewer award for ICML 2019 :)
  • (03.2019) Our paper on parameter efficient training of BERT was accepted to ICML 2019!
  • (01.2019) Papers accepted to ICLR 2019, and AISTATS 2019! Also, our preprint on Neural Architecture Search is online.

Current students

  • [PhD] Łukasz Maziarka (UJ), co-advised with Jacek Tabor
  • [PhD] Maciej Szymczak (UJ), co-advised with Jacek Tabor
  • [PhD] Jakub Chłędowski (UJ), co-advised with Krzysztof Geras
  • [MSc] Przemysław Kaleta (PW) - Speeding-up retrosynthesis, co-advised with Piotr Miłoś
  • [MSc] Aleksandra Talar (UJ) - Out-of-distribution generalization in molecule property prediction
  • [MSc] Piotr Gaiński (UJ) - Low data prediction in molecule property prediction

Previous students

  • [MSc] Sławomir Mucha (UJ) - Pretraining in deep learning in cheminformatics
  • [MSc] Tobiasz Ciepliński (UJ) - Evaluating generative models in chemistry using docking simulators
  • [BSc] Michał Zmysłowski (UW) - Is the noisy quadratic model of deep neural network training realistic enough?
  • [MSc] Olivier Astrand (NYU) - Memorization in deep learning
  • [MSc] Tomasz Wesołowski (UJ) - Relevance of enriching word embeddings in modern deep natural language processing
  • [MSc] Andrii Krutsylo (UJ) - Physics aware representation for drug discovery
  • [BSc] Michał Soboszek (UJ) - Evaluating word embeddings
  • [MSc] Jakub Chłędowski (UJ) - Representation learning for textual entailment
  • [MSc] Mikołaj Sacha (UJ) - Meta learning and sharpness of the minima

Selected Publications

For a full list, please see my Google Scholar profile.


The Break-Even Point on the Optimization Trajectories of Deep Neural Networks

S. Jastrzębski, M. Szymczak, S. Fort, D. Arpit, J. Tabor, K. Cho*, K. Geras*

International Conference on Learning Representations 2020 (Spotlight)
paper talk


Large Scale Structure of Neural Network Loss Landscapes

S. Fort, S. Jastrzębski

Neural Information Processing Systems 2019
paper poster


Molecule Attention Transformer

L. Maziarka, T. Danel, S. Mucha, K. Rataj, J. Tabor, S. Jastrzębski

Preprint (accepted at Graph Representation Learning workshop at NeurIPS 2019)


On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length

S. Jastrzębski, Z. Kenton, N. Ballas, A. Fischer, Y. Bengio, A. Storkey

International Conference on Learning Representations 2019
code paper poster


Three Factors Influencing Minima in SGD

S. Jastrzębski*, Z. Kenton*, D. Arpit, N. Ballas, A. Fischer, Y. Bengio, A. Storkey

International Conference on Artificial Neural Networks 2018 (oral), International Conference on Learning Representations 2018 (workshop)


Residual Connections Encourage Iterative Inference

S. Jastrzębski*, D. Arpit*, N. Ballas, V. Verma, T. Che, Y. Bengio

International Conference on Learning Representations 2018
paper poster


A Closer Look at Memorization in Deep Networks

D. Arpit*, S. Jastrzębski*, N. Ballas*, D. Krueger*, T. Maharaj, E. Bengio, A. Fischer, A. Courville, S. Lacoste-Julien, Y. Bengio

International Conference on Machine Learning 2017
paper poster slides