From picking to making: the recommender that creates your feed
For decades, feeds answered one question: of everything that exists, what should we show you? This vision paper argues that the next question is: what should we make for you?
Every recommender system you have ever used does one thing: it sorts a fixed shelf. Netflix lays out a row of shows, a short-video app picks the next clip, a shop suggests products — and in every case the machine is reaching into a catalogue of things other people already made, and ranking them for you. This paper asks what happens when the machine stops sorting and starts making.
The shelf, and why it bounds you
The traditional design is what the authors call the retrieval-based paradigm, and the key word is retrieval. Picture a giant library of existing items — every film, every song, every video that humans have already made and uploaded. That library is the item corpus. The recommender's whole job is to reach in and pull out the handful of items it predicts you, specifically, will most enjoy.
To make that prediction it learns from your behaviour: what you clicked, what you watched to the end, what you skipped, when and how long you engaged. The classic trick is to notice patterns across many users at once — if thousands of people who behaved like you went on to love a certain film, the system bets that you will too. That is the intuition behind collaborative filtering: your future is predicted from the crowd whose past resembles yours. It is powerful and it scales beautifully. But notice what it fundamentally does. It matches you to existing items based on patterns of clicks. The system never asks what you want in words, and it never makes anything. Everything clever happens in the ranking; nothing happens in the creating.
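To make the crowd-based intuition concrete, here is a minimal sketch of user-based collaborative filtering. It is not taken from the paper: the toy `ratings` data, the user and film names, and the choice of cosine similarity are all assumptions made for illustration. Real systems use far larger data and learned models, but the shape of the idea is the same.

```python
import math

# Toy interaction data (hypothetical): user -> {item: preference in [0, 1]}.
ratings = {
    "ana": {"film_a": 1.0, "film_b": 0.8, "film_c": 0.9},
    "ben": {"film_a": 0.9, "film_b": 0.7, "film_d": 0.2},
    "you": {"film_a": 1.0, "film_b": 0.9},
}


def cosine(u, v):
    """Cosine similarity between two sparse preference dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(target, ratings, k=1):
    """Rank items the target has not seen by the similarity-weighted
    average preference of the other users."""
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], theirs)
        if sim <= 0:
            continue
        for item, value in theirs.items():
            if item in ratings[target]:
                continue  # only suggest items the target has not interacted with
            scores[item] = scores.get(item, 0.0) + sim * value
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:k]


top = recommend("you", ratings, k=2)
```

Note what the function can and cannot return: every candidate comes from some other user's history, so the output is always a reordering of items that already exist.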
The authors point to two cracks in this success. First, the perfect item for you might simply not exist in the corpus. Suppose you love a song but would really like to hear it performed in the style of a different singer you admire. No such recording exists, so a retrieval-based system is helpless — it can only offer the nearest existing thing. The catalogue bounds what you can be offered.
Second, the way you communicate with the system is desperately thin. Mostly you do not communicate at all, at least not directly. You express preferences through passive feedback — clicks, watch time, the occasional thumbs-up. You cannot easily tell the system, in words, "I want something calmer today," or "less of this topic, more of that one." You just click, and hope it infers. The authors call this passive and inefficient, and once they say it, it is hard to unsee: we have trained ourselves to steer these systems by twitching, when we could just talk.
A second loop, between you and a generator
Now bring in the development that turns both cracks into openings: generative AI. Systems have become startlingly good at producing content (images, text, audio and video, gathered under the umbrella term AIGC, for AI-generated content), and large language models gave us machines that can interpret and produce natural language fluently. Put those together and a possibility opens up. If the perfect item does not exist, maybe the system can generate it. And instead of inferring your needs from clicks, the system can let you say them, in a back-and-forth conversation.
That is the leap. The authors propose a paradigm they call GeneRec, short for generative recommender, and the elegant part is that it does not throw the old system away. In the traditional loop, humans upload items to the corpus and the system ranks them. GeneRec adds a second loop, between you and an AI generator: you give instructions and feedback, it works out what you want and produces personalised content, and that content is either shown to you directly or dropped into the corpus to be ranked alongside the human-made items. The catalogue stops being fixed. It can grow, on demand, shaped to you.
Three modules: instructor, editor, creator
The authors break GeneRec into three parts, and the division is intuitive.
- The instructor listens and interprets. Its input is your instructions — text, or even multimodal, mixing speech, images and video — plus ordinary feedback like clicks. Its job is twofold: decide whether to fire up the generator at all (an explicit request, or a run of rejected options, is the signal), and translate your messy instructions into clean guidance the generator can act on. It is the bridge between human intent and machine creation.
- The AI editor repurposes. It takes an item that already exists and edits it to suit you — cutting a long video into themed clips, or restyling a piece of content to match your taste. It creates nothing from nothing; it adapts what is there. That is valuable both for users, who get a personalised version, and for original creators, who can spin out many tailored variants of their work.
- The AI creator is the bold one: making genuinely new items from scratch. Given your guidance, plus relevant facts pulled from the web, it produces a brand-new item for your need. Return to the music example: the creator might note you love a particular composer, take your instruction that you want it sung by a specific other artist, learn that artist's style from material on the web, and generate a music video that never existed before — made for an audience of one.
So the full system can do three things rather than one: retrieve, repurpose, or create.
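The three-way choice can be sketched as a small routing function. This is an illustration under stated assumptions, not the authors' design: the `Session` fields, the `REJECTION_THRESHOLD` value, and the rule that an existing base item means "repurpose" rather than "create" are all hypothetical. The only part drawn from the description above is the trigger for generation: an explicit request, or a run of rejected options.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cut-off for "a run of rejected options"; the paper gives no number.
REJECTION_THRESHOLD = 3


@dataclass
class Session:
    instruction: Optional[str] = None  # what the user asked for, if anything
    recent_rejections: int = 0         # consecutive items the user skipped or rejected
    base_item: Optional[str] = None    # an existing item the user wants adapted


def instructor_route(session: Session) -> str:
    """Decide whether to retrieve, repurpose (AI editor) or create (AI creator)."""
    wants_generation = (
        session.instruction is not None
        or session.recent_rejections >= REJECTION_THRESHOLD
    )
    if not wants_generation:
        return "retrieve"   # the traditional loop: rank the existing corpus
    if session.base_item is not None:
        return "repurpose"  # edit an item that already exists
    return "create"         # generate a new item from the guidance


route = instructor_route(
    Session(instruction="same song, sung in another artist's style", base_item="song_1")
)
```

In a real system the instructor would also rewrite the instruction into guidance for the generator; this sketch covers only the decision of which path to take.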
Trust is not the footnote — it is the design
The moment you let an AI invent the content people consume, you inherit a serious problem: trust. To their credit, the authors spend real time here rather than hand-waving, insisting on what they call fidelity checks. There are six:
- Bias and fairness: the generator learns from data that may be skewed, so its output must not entrench stereotypes.
- Privacy: an item generated from one person's data must not leak that person's private information to others.
- Safety: no content that could harm someone, and protection against people gaming the system.
- Authenticity: in domains like news, the claims in generated content must be verifiably true, or you have built a misinformation machine.
- Legal compliance: including the thorny question of copyright and who owns AI-generated work.
- Identifiability: watermarking AI-generated items so you can always tell what a machine made versus a human.
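One way to picture the six checks is as a release gate that every generated item must clear. The sketch below is an assumption-heavy illustration: the metadata keys (`flagged_biased`, `claims_verified`, `watermark` and so on) are hypothetical stand-ins for whatever real classifiers, fact-checkers or rights databases would sit behind each check, and nothing here reflects how the authors implement them.

```python
# Each check maps a name to a predicate over an item's metadata dict.
# The metadata keys are hypothetical placeholders for real detectors.
FIDELITY_CHECKS = {
    "bias_and_fairness": lambda item: not item.get("flagged_biased", False),
    "privacy": lambda item: not item.get("leaks_private_data", False),
    "safety": lambda item: not item.get("flagged_harmful", False),
    "authenticity": lambda item: item.get("claims_verified", False),
    "legal_compliance": lambda item: item.get("rights_cleared", False),
    "identifiability": lambda item: bool(item.get("watermark")),
}


def failed_checks(item):
    """Return the names of every fidelity check the item does not pass."""
    return [name for name, passes in FIDELITY_CHECKS.items() if not passes(item)]


def can_release(item):
    """An item is shown to the user only if no check fails."""
    return not failed_checks(item)


generated_news = {
    "claims_verified": True,
    "rights_cleared": True,
    "watermark": None,  # the generator forgot to mark the item as AI-made
}
problems = failed_checks(generated_news)
```

Treating every check as a hard veto is a design choice made here for simplicity; a production system might weight checks differently by domain, as the news example suggests.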
To measure whether any of this works, they propose evaluating two sides. Item-side evaluation looks at the artifact: is it high quality, is it relevant to what you asked, does it pass the fidelity checks? User-side evaluation looks at you: are you satisfied, judged by explicit feedback like a rating or a spoken "I love this," or by the same implicit signals we already use, like whether you watched it to the end.
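The two sides can be sketched as two small records and a function that scores each. The field names, the 0-to-1 scales, the equal weighting of quality and relevance, and the rule that explicit feedback overrides implicit signals are all assumptions for illustration; the paper names the two sides but this is not its scoring scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemSide:
    quality: float          # 0..1, e.g. from a quality model or a human rater
    relevance: float        # 0..1, how well the item matches the instruction
    passed_fidelity: bool   # outcome of the fidelity checks


@dataclass
class UserSide:
    explicit_rating: Optional[float]  # 0..1 if the user gave a rating, else None
    watch_fraction: float             # 0..1, share of the item actually watched


def user_satisfaction(user: UserSide) -> float:
    """Prefer explicit feedback when present; otherwise use the implicit signal."""
    if user.explicit_rating is not None:
        return user.explicit_rating
    return user.watch_fraction


def evaluate(item: ItemSide, user: UserSide) -> dict:
    """Score both sides separately; failing fidelity zeroes the item-side score."""
    item_score = (item.quality + item.relevance) / 2 if item.passed_fidelity else 0.0
    return {"item_side": item_score, "user_side": user_satisfaction(user)}


report = evaluate(ItemSide(1.0, 0.5, True), UserSide(None, 0.5))
```

Keeping the two scores separate matters: an item can be well made and on-topic yet leave the user cold, or delight the user while failing a fidelity check.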
The paper then zooms out to a roadmap, watching three things evolve together: how we interact (passive clicks moving toward rich multimodal conversation, the authors literally invoking a Jarvis-like assistant); how content is made (expert-generated, then user-generated, now AI-generated); and the underlying algorithm.
On that last point, a quick distinction helps. A discriminative model tells things apart or scores them, which is exactly what traditional rankers do, while a generative model produces new things. For most of recommendation's history the discriminative side did all the work. The authors argue generative models are now being pulled into the centre, and speculate that one day a single large language model might unify retrieval, repurposing and creation in one system.
In specific domains the examples get tangible: personalised daily news articles with heavy fidelity checks; custom fashion designs whose most popular results could be sent to factories for real production; personalised music tracks wrestling with copyright; and short video, the hardest case, where one clip combines video, subtitles, cover image and sound.
To show it is not pure speculation, the authors built a demonstration of the editor and creator on micro-video generation, and report early but promising results.
The honest caveats
This is a vision and position paper from a few years ago, not a finished, deployed system with benchmark-beating numbers. Its contribution is the framing — naming the limits of retrieval, proposing the generative paradigm, decomposing it into instructor, editor and creator, and insisting trust is built in from the start. The open problems are real and still largely unsolved: teaching the instructor to reliably understand intent, deciding when to generate versus retrieve, extracting your true preferences from noisy clicks, and building those domain-specific fidelity checks.
Read now, with a few years of hindsight, the unease writes itself. A feed that generates bespoke content tuned to keep you engaged is a more powerful thing than a feed that merely picks from what exists, for delight and for manipulation alike. The authors flag tensions they cannot fully resolve: ownership, if an AI creates a song in the style of a real artist learned from that artist's work; authenticity at scale, when no two people see the same content and some was conjured on the spot; and the quiet risk of homogenisation, a system that optimises hard for engagement converging on a narrow, comfortable loop. None of these mean the idea is wrong. They are the questions that decide whether it is built responsibly, which is precisely why the authors put trust at the centre.
Why it matters
For decades recommender systems answered one question: of all the things that exist, which should we show you? This paper argues the next question is different — what should we make for you? That move, from retrieving to generating, is small to state and enormous in consequence. It promises feeds tailored not just in selection but in substance, and it places a heavy bet that we can keep such systems honest, fair and safe while we hand them the power to create what we see.