Welcome Dwarkesh fans,

Dylan Patel and Asianometry

This episode was particularly relevant to our work. Choosing hardware and using it efficiently is an important part of any machine learning effort. Compared to typical LLM training runs, we have larger datasets with a different structure, and in some cases much stricter latency requirements on the inference side. As a result, we end up with a mix of CPUs, GPUs, and FPGAs, and are exploring yet more esoteric technologies.

To support these efforts, we’re hiring experts in CUDA, FPGA programming, and datacenter engineering. People here move between parts of the stack and collaborate across these boundaries, but we do have separate job postings depending on whether you’re most interested in FPGAs, CUDA and performance optimization, or mostly the Python layer. Don’t stress too much about where to apply; we’ll get you on the right track once we get to know you.

We'll go into more detail about the parts of our work most related to this episode in the next edition of Signals & Threads, featuring our own Sylvain Gugger, co-author of "Deep Learning for Coders with fastai and PyTorch" and of the Hugging Face Accelerate library, among other things. Check back in a couple of weeks for that.

Listen and subscribe:

Jane Street’s Kaggle competition, launching in October

Financial markets are deeply complex and constantly evolving, offering a unique opportunity to explore the intricate dynamics that shape trading decisions. Kagglers will build a model using real-world data derived from production systems, which offers a glimpse into the daily challenges of successful trading. This competition highlights the difficulties in modeling financial markets, including fat-tailed distributions, non-stationary time series, and sudden shifts in market behavior.
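To make the fat-tails point concrete, here is a small, hypothetical illustration (not from the competition data): it compares simulated Gaussian "returns" with a fat-tailed Student's t alternative at the same variance, and counts how often each produces extreme moves.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Gaussian "returns" vs. fat-tailed Student's t (3 degrees of freedom),
# rescaled so both series have unit variance.
normal = rng.standard_normal(n)
t3 = rng.standard_t(df=3, size=n)
t3 /= t3.std()

def excess_kurtosis(x):
    # Sample excess kurtosis: 0 for a Gaussian, large for fat-tailed data.
    z = (x - x.mean()) / x.std()
    return (z**4).mean() - 3.0

# At equal variance, the fat-tailed series produces far more 4-sigma moves
# than the Gaussian -- the kind of behavior market models must handle.
for name, x in [("normal", normal), ("student-t(3)", t3)]:
    print(f"{name}: excess kurtosis {excess_kurtosis(x):.2f}, "
          f"moves beyond 4 sigma: {(np.abs(x) > 4).sum()}")
```

A model validated only against Gaussian assumptions will badly underestimate how often extreme moves occur in data like the fat-tailed series above.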

So, what could you do here?

ML Engineers help drive the direction of an ML platform that is used daily by traders and researchers. The work is wide-ranging, including things like developing libraries for automating ML workflows and experiment evaluation, digging into the internals of open‑source ML tools, and optimizing our platform to match the needs of our trading systems.

ML Performance Engineers optimize the performance of our models. This work focuses on efficient large-scale training, low-latency inference in real-time systems, and high-throughput inference in research. Engineers take a whole-systems approach, spanning storage, networking, and host- and GPU-level considerations.

ML Researchers are responsible for building models to price securities and execute trades in live trading systems. Blending trading and software engineering, this work involves analyzing large datasets, building and testing models, creating new trading strategies, and writing the code that implements them.

ML Interns are paired with full-time mentors, collaborating on real-world projects and learning how Jane Street applies advanced machine learning and statistical techniques to model and predict moves in financial markets. Through a series of classes and activities, they analyze real trading data on our growing GPU cluster of thousands of A100s and H100s. Over the course of the program, interns gain an understanding of the differences between textbook machine learning and its application to noisy financial data.