We’re building a model that goes beyond surface-level recognition of what something is, toward the underlying patterns that explain why people prefer what they prefer.
Quantifying taste requires ingesting and continuously updating large-scale, heterogeneous data. Our infrastructure treats products, brands, restaurants, and cultural signals as a living system, not a static snapshot.
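To make the "living system" framing concrete, here is a minimal sketch of the record-level ingestion it implies; the `Signal` schema and `ingest` helper are hypothetical illustrations, not our production pipeline.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical ingestion record; field names are illustrative, not our schema.
@dataclass
class Signal:
    entity_id: str    # product, brand, or restaurant the signal describes
    modality: str     # "text", "image", "review", "menu", ...
    payload: str
    observed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Each entity accumulates a timeline of signals rather than one
# overwritten record, so the corpus stays a living system, not a snapshot.
store: dict[str, list[Signal]] = {}

def ingest(sig: Signal) -> None:
    store.setdefault(sig.entity_id, []).append(sig)
```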
The backbone is a transformer-based multimodal model built on Qwen-Omni, fine-tuned so its attention layers attend to the language people actually use to describe products, brands, and experiences.
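As a rough sketch of how downstream heads consume the backbone, here is one way to pull the full sequence of hidden states. A small text-only Qwen checkpoint stands in for the fine-tuned Qwen-Omni backbone, whose exact variant isn't specified here.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Stand-in checkpoint: a text-only Qwen model, NOT the fine-tuned
# Qwen-Omni backbone described above.
CKPT = "Qwen/Qwen2.5-0.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModel.from_pretrained(CKPT)
model.eval()

# Encode a product description and keep the full sequence of hidden
# states, which the taste projections below operate over.
text = "hand-thrown stoneware mug, matte glaze, wide handle"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, d_model)
```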
We construct a geometry of taste by operating over the full sequence of hidden states, projecting them into structured 128-dimensional representations that capture style, brand identity, product form, materiality, and function.
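A minimal sketch of such a projection, assuming the five facets share a single 128-D vector; the facet widths and the mean-pooling choice are invented for illustration.

```python
import torch
import torch.nn as nn

class TasteProjection(nn.Module):
    """Pool the backbone's hidden states and project into a structured
    128-D taste vector. Facet widths are illustrative assumptions."""
    FACETS = {"style": 32, "brand": 24, "form": 24, "materiality": 24, "function": 24}

    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, sum(self.FACETS.values()))  # 128 total

    def forward(self, hidden: torch.Tensor) -> dict[str, torch.Tensor]:
        pooled = hidden.mean(dim=1)      # mean-pool over the token sequence
        z = self.proj(pooled)            # (batch, 128)
        out, start = {}, 0
        for name, width in self.FACETS.items():
            out[name] = z[:, start:start + width]  # named slice of the geometry
            start += width
        return out
```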
Personalization is achieved with lightweight, per-user models operating in that 128-dimensional space, learning attraction and aversion without retraining or fragmenting the backbone. The omnimodal LM remains the foundation of it all.
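Here is a sketch of what "lightweight, per-user" could mean in practice: a linear head per user over frozen 128-D embeddings, updated online from like/dislike signals. The single-linear-layer choice is our assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UserTasteHead(nn.Module):
    """Per-user preference model over frozen 128-D taste embeddings.
    One linear layer (~129 parameters) stands in for 'lightweight';
    the backbone is never touched."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.score(z).squeeze(-1)  # >0 attraction, <0 aversion

# Online update from a single like/dislike event; only the tiny head trains.
head = UserTasteHead()
opt = torch.optim.SGD(head.parameters(), lr=1e-2)

z = torch.randn(1, 128)         # frozen embedding from the backbone
liked = torch.tensor([1.0])     # 1.0 = attraction, 0.0 = aversion
loss = F.binary_cross_entropy_with_logits(head(z), liked)
opt.zero_grad(); loss.backward(); opt.step()
```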
We extend the same backbone into grounded language via VQA-style supervision, unifying representation, personalization, and explanation in a single system.
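For flavor, here is one plausible shape for a VQA-style supervision record and how it might flatten into a (prompt, target) pair; the schema, prompt template, and example content are assumptions, not the actual training data.

```python
# Illustrative VQA-style record; the real dataset and prompt template
# used for grounded-language fine-tuning are not described here.
example = {
    "item_image": "ceramic_mug_0412.jpg",
    "question": "Why might someone drawn to wabi-sabi interiors pick this mug?",
    "answer": "Its matte, hand-thrown finish and irregular glaze read as natural and imperfect.",
}

def to_training_pair(ex: dict) -> tuple[str, str]:
    # Flatten into (prompt, target) text for supervised fine-tuning of
    # the same backbone, keeping representation and explanation unified.
    prompt = f"<image:{ex['item_image']}>\nQ: {ex['question']}\nA:"
    return prompt, " " + ex["answer"]
```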
As a result, Taste connects products, restaurants, brands, people, and more, and the system learns those relationships within a single unified representation.