Modern particle physics experiments are among the most computationally demanding scientific efforts. They generate vast amounts of data that must be processed in real time to capture rare and interesting events. One of the hardest challenges is reconstructing the paths of charged particles (known as "tracking") as they move through detectors. This step is so complex that it can limit how much high-quality data experiments such as CMS at the Large Hadron Collider can record in the first place.

Tracking in this setting is unlike typical trajectory problems. Particle collisions happen millions of times per second, and each one produces thousands of particles. These particles move so fast that no timing information is available. Instead of a continuous trail, we observe only 5–15 individual "hits" in different detector layers. The task resembles an extremely difficult 3D connect-the-dots puzzle, because the number of possible ways to connect the hits grows combinatorially. Starting from a cloud of scattered points, we must infer on the order of a thousand particle trajectories for each collision event.
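The following Python sketch puts a rough number on "extremely difficult". It counts how many hit pairs would have to be considered if every pair were a candidate connection, and compares that with how many pairs are actually consecutive hits on the same track. For simplicity, it assumes that every particle leaves exactly 10 hits, whereas real tracks have roughly 5–15. `edge_counts` is a hypothetical helper and is not part of the project code.

```python
def edge_counts(n_particles: int, hits_per_particle: int) -> tuple[int, int]:
    """Compare all possible hit pairs with the true track segments.

    Simplifying assumption: every particle leaves the same number of hits.
    """
    n_hits = n_particles * hits_per_particle
    # Every unordered pair of hits is a candidate connection.
    candidate_edges = n_hits * (n_hits - 1) // 2
    # A track with k hits is a chain of k - 1 consecutive segments.
    true_segments = n_particles * (hits_per_particle - 1)
    return candidate_edges, true_segments


candidates, true_segments = edge_counts(n_particles=1000, hits_per_particle=10)
print(f"{candidates:,} candidate pairs, {true_segments:,} true segments")
print(f"fraction of true pairs: {true_segments / candidates:.1e}")
```

With 1000 particles this gives 49,995,000 candidate pairs but only 9,000 true segments. Fewer than two in ten thousand candidates are correct, and the candidate count grows quadratically with the number of hits.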

The computational cost of traditional algorithms grows steeply as the number of particles and simultaneous collisions increases. Recent advances in machine learning offer a promising alternative. In this project, we explore the use of graph neural networks (GNNs) to tackle this problem more efficiently.

Traditional tracking algorithms are built around Kalman filters. This project instead uses graph neural networks, with the goal of significant speed-ups. A conceptually simple way to turn tracking into a machine learning task is to build a fully connected graph of all hits. An edge classifier is then trained to reject every edge that does not connect two hits from the same particle. Only the individual trajectories remain, as the connected components of the initially fully connected graph.

In this project, we instead explore object condensation, or learned clustering. A network maps every hit into a latent space and learns to place hits from the same track close to each other. A simple clustering step can then recover the hits that belong to each track.
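The edge-classification baseline described above can be sketched in a few lines of Python. This is a toy that rests on several assumptions. `find_tracks` is a hypothetical helper. The trained edge classifier is replaced by a perfect one built from ground-truth particle IDs. Tracks are read off as connected components using a union-find structure. This is not the project's implementation.

```python
from itertools import combinations


def find_tracks(n_hits, edges, scores, threshold=0.5):
    """Keep edges scoring at least `threshold`; return the connected components.

    `edges` is a list of (i, j) hit-index pairs and `scores` holds the
    matching classifier outputs in [0, 1].
    """
    parent = list(range(n_hits))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), score in zip(edges, scores):
        if score >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    components = {}
    for hit in range(n_hits):
        components.setdefault(find(hit), []).append(hit)
    return sorted(components.values())


# Toy event: particle ID of each hit (unknown to the algorithm in practice).
particle_ids = [0, 0, 1, 1, 1, 2, 0, 2]
n_hits = len(particle_ids)

# Fully connected graph: every unordered pair of hits is a candidate edge.
edges = list(combinations(range(n_hits), 2))

# Stand-in for a trained edge classifier: a perfect one that outputs 1.0
# for same-particle pairs and 0.0 otherwise.
scores = [1.0 if particle_ids[i] == particle_ids[j] else 0.0 for i, j in edges]

print(find_tracks(n_hits, edges, scores))
```

Any single edge scored above the threshold joins two components. One misclassified edge is therefore enough to merge two tracks, so the threshold trades merged tracks against split ones.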

Charged particle tracking as an embedding task: The left panel shows a t-SNE embedding of the input hit features, with the hits of a few randomly selected particles colored. Our learned embedding maps hits that belong to the same particle to the same location (right panel), so tracks can be recovered with a simple clustering operation.
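A minimal NumPy sketch can illustrate the clustering step from the caption. It assumes the network has already produced latent coordinates. Here these are faked by scattering each toy particle's hits around a hypothetical fixed center, which is roughly what a well-trained embedding should produce. Hits closer than a radius `eps` are linked, and each connected group becomes a track candidate. This is essentially DBSCAN with `min_samples=1`. The project's actual clustering algorithm and parameters may differ, and `cluster_embedding` is a hypothetical name.

```python
import numpy as np


def cluster_embedding(latent, eps):
    """Label hits by connected components of the eps-neighbourhood graph.

    latent: array of shape (n_hits, latent_dim) from the embedding network.
    Returns one integer cluster label per hit.
    """
    n_hits = len(latent)
    # Pairwise Euclidean distances in latent space, shape (n_hits, n_hits).
    dists = np.linalg.norm(latent[:, None, :] - latent[None, :, :], axis=-1)
    neighbours = dists <= eps

    labels = np.full(n_hits, -1)
    current = 0
    for seed in range(n_hits):
        if labels[seed] != -1:
            continue
        # Flood-fill every unlabeled hit reachable from the seed via short links.
        stack = [seed]
        labels[seed] = current
        while stack:
            hit = stack.pop()
            for other in np.flatnonzero(neighbours[hit] & (labels == -1)):
                labels[other] = current
                stack.append(other)
        current += 1
    return labels


rng = np.random.default_rng(0)
particle_ids = np.repeat(np.arange(4), 6)  # 4 toy particles, 6 hits each
# Hypothetical latent-space location for each particle.
centers = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
latent = centers[particle_ids] + rng.normal(scale=0.05, size=(24, 3))

labels = cluster_embedding(latent, eps=0.5)
print(labels)
```

In this example, each toy particle ends up with its own label. The all-pairs distance matrix needs O(n²) memory. For events with many thousands of hits, a spatial index or a library implementation such as scikit-learn's `DBSCAN` would be the practical choice.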