Flow matching models have emerged as a powerful method for generative modeling on domains like images or videos, and also on irregular or unstructured data such as 3D point clouds and protein structures. These models are commonly trained in two stages: first, a data compressor is trained, and in a subsequent training stage a flow matching generative model is trained in the latent space of the data compressor. This two-stage paradigm creates obstacles to unifying models across data domains, as hand-crafted compressor architectures are used for different data modalities. To this end, we introduce INRFlow, a domain-agnostic approach to learn flow matching transformers directly in ambient space. Drawing inspiration from implicit neural representations (INRs), we introduce a conditionally independent point-wise training objective that enables INRFlow to make predictions continuously in coordinate space. Our empirical results demonstrate that INRFlow effectively handles different data modalities such as images, 3D point clouds, and protein structure data, achieving strong performance in different domains and outperforming comparable approaches. INRFlow is a promising step towards domain-agnostic flow matching generative models that can be trivially adopted in different data domains.
- † Work done while at Apple
Figure 1: (a) High-level overview of INRFlow using the image domain as an example. Our model can be interpreted as an encoder-decoder model where the decoder makes predictions independently for each coordinate-value pair given $z_{f_t}$. For different data domains, the coordinate and value dimensionality changes, but the model is kept the same. (b) Samples generated by INRFlow trained on ImageNet 256×256. (c) Image-to-3D point clouds generated by training INRFlow on Objaverse (Deitke et al., 2023). (d) Protein structures generated by INRFlow trained on SwissProt (Boeckmann et al., 2003). Ground-truth (GT) protein structures are depicted in green, while the structures generated by INRFlow are shown in orange.
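To make the idea of a point-wise objective over coordinate-value pairs concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a linear interpolation path between Gaussian noise and data with a velocity regression target, which is one common flow matching formulation; the text above does not specify the path or the loss. The names `pointwise_flow_matching_loss` and `toy_velocity_fn` are hypothetical, and `toy_velocity_fn` is only a stand-in for the encoder-decoder transformer.

```python
import numpy as np


def toy_velocity_fn(coords, noisy_values, t):
    """Hypothetical stand-in for the encoder-decoder model (an assumption).

    A global context vector (the mean of the noisy values) plays the role of
    the latent. Each point is then predicted independently from that context
    and its own noisy value; coords and t are accepted but unused here.
    """
    context = noisy_values.mean(axis=0, keepdims=True)  # shape (1, d_value)
    return context - noisy_values                       # shape (N, d_value)


def pointwise_flow_matching_loss(velocity_fn, coords, values, rng, num_points=None):
    """Point-wise loss for one sample given as coordinate-value pairs.

    coords: (N, d_coord) array, values: (N, d_value) array.
    Assumes a linear noise-to-data path; this choice is an assumption.
    """
    n = coords.shape[0]
    # The loss is a mean over points, so a random subset of the
    # coordinate-value pairs gives an estimate of the same quantity.
    if num_points is not None and num_points < n:
        idx = rng.choice(n, size=num_points, replace=False)
        coords, values = coords[idx], values[idx]

    t = rng.uniform()                              # time in [0, 1)
    noise = rng.standard_normal(values.shape)
    noisy_values = (1.0 - t) * noise + t * values  # point on the linear path
    target = values - noise                        # velocity of that path

    pred = velocity_fn(coords, noisy_values, t)
    # Squared error per point, then averaged over points.
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))


rng = np.random.default_rng(0)

# Image-like sample: 2-D pixel coordinates, 3-D RGB values.
image_coords = rng.uniform(size=(64, 2))
image_values = rng.uniform(size=(64, 3))
loss_image = pointwise_flow_matching_loss(
    toy_velocity_fn, image_coords, image_values, rng, num_points=16
)

# Point-cloud-like sample: only the array dimensionality changes.
cloud_coords = rng.uniform(size=(128, 3))
cloud_values = rng.standard_normal((128, 3))
loss_cloud = pointwise_flow_matching_loss(
    toy_velocity_fn, cloud_coords, cloud_values, rng
)
```

The same loss function is called for both samples, with only the coordinate and value dimensionality differing, which mirrors the caption's remark that the model is kept the same across domains.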