2026-04-30
8 min read

Building an AI Astrologer: Personas, Rolling Summaries, and Tone-Matched Replies

An LLM that does astrology readings sounds like a gimmick. Building one that feels like a different astrologer for every user is a real engineering problem.

LLM · Personalization · Product

A user opens our app at 11pm. Their boss just gave them a weird performance review. They want to know if Saturn is messing with them.

That's the problem the AI Astrologer is trying to solve. It sounds frivolous. It's not — it's one of the most demanding LLM applications I've worked on, because the bar is "feels like a real, knowledgeable person talking to me at 11pm."

And here's the thing: that "real person" can't be the same for every user. A 22-year-old asking about a job offer needs a different tone, different references, different kind of advice than a 45-year-old asking about a marriage decision.

Where It Lives in the Stack

A note on the architecture before we get into the substance: Mynaksh's backend is mostly Node.js / Express microservices. The AI Astrologer is the exception — it lives in a Python service that the Node services call into when they need LLM-shaped work. That separation is deliberate. The Python service owns prompts, retrieval, model routing, and eval orchestration; the Node services own everything else.
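As a minimal sketch of that boundary, assuming an internal HTTP API (the route, the payload shape, and the FastAPI choice here are illustrative, not our actual contract):

# Illustrative only: the Node services call endpoints shaped like this.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ReadingRequest(BaseModel):
    user_id: str
    question: str

@app.post('/v1/reading')   # hypothetical route
def generate_reading(req: ReadingRequest) -> dict:
    # Prompt assembly, retrieval, model routing, and eval hooks all live
    # behind this call; the Node side never touches them directly.
    reading_text = '...'   # stand-in for the LLM pipeline output
    return {'reading': reading_text}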

The rest of this post is about the Python side.

The Easy Version Doesn't Work

If you build the obvious thing — a system prompt that says "you are an astrologer, give a reading based on this birth chart" — you get something that's fine. It says vaguely true astrology-shaped things. It is forgettable.

We built that version. Users finished one reading and didn't come back.

What Made Users Come Back

Three things, roughly in order of impact:

  1. User personas. Different reply styles, vocabulary, cultural references for different users. Not just "tone" — the LLM gets a structured persona that shapes how it thinks about each user.
  2. Rolling summaries. The AI remembers previous sessions. Not by feeding the entire history into the context window — by maintaining a compact rolling summary that captures what matters.
  3. Reply-format-per-user. Some users want bullet points. Some want flowing paragraphs. Some want a single short verdict and a question back. We learned each user's preferred shape and matched it.

The rest of this post walks through how we built each.

User Personas

A persona, in our system, is a structured object derived from what we know about the user — birth details, history of asks, demographic signals, language preferences, prior reactions to readings.

In the Python service, the schema looks roughly like this:

from typing import Literal, TypedDict

class CulturalContext(TypedDict):
    region: str
    tradition: str

class EngagementStats(TypedDict):
    avg_session_length: float
    return_rate: float

class UserPersona(TypedDict):
    user_id: str
    preferred_communication_style: Literal['formal', 'casual', 'elder', 'peer']
    likely_cultural_context: CulturalContext
    preferred_language: str
    reading_depth_preference: Literal['brief', 'standard', 'detailed']
    typical_question_categories: list[str]
    sensitivity_topics: list[str]
    prior_session_engagement: EngagementStats

Some fields come from explicit signals (preferred language from app settings, region from sign-up data). Some are derived from behavior (engagement signals from past sessions). Some require explicit user input — sensitivity_topics is where users can mark things they don't want the AI to bring up.

The persona doesn't go directly into the LLM prompt. It gets translated into a "voice profile" that shapes the system prompt for that user. So one user's reading might start with "Looking at your chart..." and another's might start with "Beta, listen..." — and both feel right for that user.
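As a sketch of that translation, using the UserPersona schema above (the helper name and the style snippets are illustrative, not our production prompt text):

STYLE_SNIPPETS = {
    'formal': "Address the user respectfully; avoid slang.",
    'casual': "Speak like a friendly peer; contractions are fine.",
    'elder': "Speak with warm, elder-like familiarity.",
    'peer': "Speak as an equal who shares the user's frame of reference.",
}

def build_voice_profile(persona: UserPersona) -> str:
    """Render persona fields into the system-prompt fragment for one user."""
    lines = [
        STYLE_SNIPPETS[persona['preferred_communication_style']],
        f"Reply in {persona['preferred_language']}.",
        f"Reading depth: {persona['reading_depth_preference']}.",
    ]
    if persona['sensitivity_topics']:
        topics = ', '.join(persona['sensitivity_topics'])
        lines.append(f"Never raise these topics unprompted: {topics}.")
    return '\n'.join(lines)

The "Beta, listen..." opener falls out of an elder-style snippet plus the cultural context; the profile text changes per user, the pipeline doesn't.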

The hardest part wasn't building the persona system. It was deciding where the line is between "personalization" and "stereotyping." We have explicit rules for what the persona can and can't infer, and an override layer where users can correct us. ("No, don't speak to me like an aunty.")

Rolling Summaries

Context window management is one of those problems that sounds easy and isn't.

If you just shove the user's entire history into the prompt, you blow the context budget on the third session. If you don't include enough history, every reading feels like the first time the AI has met the user.

The compromise is rolling summaries. After every session, a summarization pass compresses the conversation into a structured object:

from datetime import datetime
from typing import TypedDict

class SessionSummary(TypedDict):
    session_id: str
    when: datetime
    what_was_asked: str                  # one sentence
    core_answer_given: str               # 2-3 sentences
    emotional_signal: str                # optional, e.g. "anxious about work"
    open_threads: list[str]              # things to follow up on
    reading_topics_touched: list[str]

The summarization pass is itself an LLM call, with a tightly scoped prompt. We don't trust the LLM to compress arbitrarily: the schema is fixed, and we validate the output before persisting. Each summary is around 150 tokens, so a power user with 30 sessions adds only ~4,500 tokens of history and still fits comfortably in the prefix alongside the persona, the chart, and the new question.
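A minimal sketch of the validate-before-persist step (parse_summary and the specific checks are illustrative; the real layer does more):

import json

REQUIRED_FIELDS = {'session_id', 'when', 'what_was_asked', 'core_answer_given',
                   'open_threads', 'reading_topics_touched'}

def parse_summary(raw: str) -> dict:
    """Reject summarizer output that doesn't match the fixed schema."""
    data = json.loads(raw)                     # raises on malformed JSON
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f'summary missing fields: {sorted(missing)}')
    if not isinstance(data['open_threads'], list):
        raise ValueError('open_threads must be a list')
    return data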

The trick is summarizing both content and emotional context. A reading where the user was clearly anxious lands differently the next session than a reading where they were curious. The next session's prompt sees both.
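Putting the pieces together, the next session's prefix is assembled roughly like this (a sketch; build_prompt_prefix and the section labels aren't our production template):

def build_prompt_prefix(persona: UserPersona, chart: str,
                        summaries: list[SessionSummary], question: str) -> str:
    """Persona voice, then chart, then compressed history, then the question."""
    history = '\n'.join(
        f"- {s['when']:%Y-%m-%d}: {s['what_was_asked']} "
        f"(felt: {s['emotional_signal'] or 'neutral'}; "
        f"open: {', '.join(s['open_threads']) or 'none'})"
        for s in summaries
    )
    return '\n\n'.join([
        build_voice_profile(persona),
        f'Birth chart:\n{chart}',
        f'Previous sessions:\n{history}',
        f'New question: {question}',
    ])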

Reply Format Per User

Different users want different shapes. We don't ask them — we observe.

We measure preferred format from a few signals:

  • Time spent in the reading view per character of output. (Skimmers vs readers behave differently.)
  • Whether the user asks a follow-up question, and how it's phrased.
  • Whether they re-engage within 7 days.
  • Explicit thumbs-up / thumbs-down on individual responses.

Format detection is conservative — we only update a user's format profile when we have at least three sessions of data. New users get a sensible default (medium length, mixed prose-and-structure). The profile updates are weighted, not jumpy: one outlier session doesn't override the pattern.
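A sketch of that update rule (the 0.3 weight and the example fields are illustrative):

MIN_SESSIONS = 3   # don't trust the signal before this
ALPHA = 0.3        # weight on the newest observation

def update_format_profile(profile: dict[str, float],
                          observed: dict[str, float],
                          n_sessions: int) -> dict[str, float]:
    """Exponentially weighted update: one outlier session moves each
    preference score by at most ALPHA of the gap to the observation."""
    if n_sessions < MIN_SESSIONS:
        return profile   # new users keep the sensible default
    return {k: (1 - ALPHA) * profile[k] + ALPHA * observed[k] for k in profile}

Here profile might hold normalized scores like {'target_length': 0.5, 'prefers_bullets': 0.2}, each in [0, 1]; the prompt template reads the scores, and the update logic never touches the prompt.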

The format also shifts with the question type. A "should I take this job" question gets a different shape than a "what does this birth chart say about my personality" question. The persona governs style; the question type governs structure. The prompt template combines both.
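Concretely, the two axes feed one template; a sketch (the question-type names are made up):

STRUCTURE_TEMPLATES = {
    'decision': 'Lead with a short verdict, then the reasoning, '
                'then one follow-up question.',
    'personality': 'Give a flowing reading organized around the chart; '
                   'no verdict needed.',
}

def build_system_prompt(persona: UserPersona, question_type: str) -> str:
    """Persona governs style; question type governs structure."""
    return '\n\n'.join([
        build_voice_profile(persona),          # style axis
        STRUCTURE_TEMPLATES[question_type],    # structure axis
    ])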

It sounds elaborate. It is. But the alternative is a one-size-fits-all reading, and that is exactly what users told us they didn't want.

Evals That Aren't Garbage

You cannot have an LLM grade an AI astrologer's output. The thing being measured is "does this feel insightful and personal" — that needs human judgment.

We use real astrologers as evaluators. They sample readings from the AI and rate them on a 1-5 craft scale, with structured flags for specific failure modes (wrong tone, factual error in the chart reading, unhelpful framing, off-persona).

We sample roughly 2% of AI-generated readings, with stratified sampling so we don't only see the easy cases — we deliberately oversample edge cases (very long sessions, very short sessions, sessions where the user gave a thumbs-down). At our volume, that's several thousand readings a month for review, and we pay astrologers for the review time. It's the largest single line item in the AI budget after model costs, and the most valuable.
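A sketch of the stratified draw (the per-stratum rates are invented for illustration; only the ~2% base rate comes from the numbers above):

import random

SAMPLE_RATES = {
    'thumbs_down': 0.25,          # oversample sessions users disliked
    'very_long_session': 0.10,
    'very_short_session': 0.10,
    'default': 0.02,              # the base rate
}

def sample_for_review(stratum: str) -> bool:
    """Decide whether a reading enters the astrologer-review queue."""
    rate = SAMPLE_RATES.get(stratum, SAMPLE_RATES['default'])
    return random.random() < rate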

The eval data feeds back into prompt updates and persona refinements weekly. Persona definitions and prompt templates are version-controlled; eval scores are tracked per version, so we can see whether a change improved or regressed craft quality before it ships to all users.

This is also where the persona work shows up most clearly. Astrologer-evaluators know what a good reading sounds like for a 22-year-old vs. a 45-year-old, and the eval scores reflect that. A reading that's "good" technically but "wrong tone" gets dinged — exactly the signal we want.

What I'd Build Next

Three things on the roadmap:

  • Voice readings. A synthesized voice for users who want to hear a reading rather than read it. The persona system already knows how to vary tone; voice is the next step.
  • Multi-tradition support. Western astrology, Chinese astrology — same retrieval architecture, different knowledge bases. Users who follow multiple traditions could get readings that draw from each.
  • Cross-session goal tracking. For users working through a long-term question (a career decision, a relationship arc) over multiple months, an explicit "thread" concept would let the AI track the goal across sessions instead of treating each one as standalone.