AI Product Management: From Model Selection to Data Architecture

    Senior Technical AI Product Manager & Machine Learning Architect

    Objective: A comprehensive, step-by-step educational guidebook designed to transition a Product Manager into a technical lead for AI-driven products, with a specific focus on high-stakes industries like Fintech and Travel.


    Module 1: Model Taxonomy & Selection

    Understanding the landscape of AI models is the foundation of any technical PM's toolkit.

    Model Types

    • SLMs (Small Language Models): Lightweight, fast, cost-effective. Ideal for classification, summarization, and on-device inference. Examples: Phi-3, Gemma.
    • LLMs (Large Language Models): Broad knowledge, strong generalization. Best for complex generation, multi-turn conversation, and creative tasks. Examples: GPT-4, Claude, Gemini.
    • Reasoning Models (o1-style): Optimized for multi-step logical reasoning, chain-of-thought problem solving. Best for complex analysis, code generation, and mathematical reasoning.
    • Vision Models: Process and understand images alongside text. Essential for document OCR, visual inspection, and multimodal applications.

    Decision Matrix

    Use Case            | Fintech                                | Travel                                  | Recommended Model Type
    Fraud Detection     | Transaction pattern analysis           | Booking anomaly detection               | SLM / Reasoning Model
    Customer Support    | Account inquiries, compliance Q&A      | Booking changes, travel advisories      | LLM
    Document Processing | KYC document verification              | Passport/visa OCR                       | Vision Model
    Complex Analysis    | Risk assessment, regulatory compliance | Dynamic pricing, itinerary optimization | Reasoning Model
    Content Generation  | Report generation                      | Itinerary descriptions, travel guides   | LLM
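
    As a minimal sketch, the matrix can be encoded as a lookup so a team can reference it from planning scripts. The dictionary and the `recommend_model` helper are hypothetical names introduced for illustration; the values are transcribed from the matrix, not derived independently.

```python
# Hypothetical lookup table transcribed from the decision matrix.
RECOMMENDED_MODEL = {
    "fraud detection": "SLM / Reasoning Model",
    "customer support": "LLM",
    "document processing": "Vision Model",
    "complex analysis": "Reasoning Model",
    "content generation": "LLM",
}


def recommend_model(use_case: str) -> str:
    """Return the matrix's recommended model type, or flag the gap."""
    key = use_case.strip().lower()
    return RECOMMENDED_MODEL.get(key, "Not in matrix: run the PM checklist")


print(recommend_model("Document Processing"))  # Vision Model
```

    Use cases that are not in the matrix fall through to the checklist rather than receiving a guessed recommendation.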

    Checklist for PMs

    • Define the primary task (classification, generation, reasoning, vision)
    • Assess latency requirements (real-time vs. batch)
    • Evaluate cost constraints and volume expectations
    • Determine if domain-specific knowledge is critical
    • Consider regulatory and compliance requirements

    Module 2: Hosting & Infrastructure

    Local Hosting

    Tools: Ollama, vLLM, llama.cpp

    Hardware Requirements (rough guides, assuming quantized weights; see Quantization below):

    • 7B parameter models: 8GB+ RAM, consumer GPU (RTX 3060+)
    • 13B parameter models: 16GB+ RAM, high-end consumer GPU (RTX 3090+)
    • 70B+ parameter models: 64GB+ RAM, enterprise GPU (A100, H100)

    When to use: Prototyping, data-sensitive applications, air-gapped environments, cost optimization at scale.

    Cloud Hosting

    Hugging Face Ecosystem:

    • Spaces: Quick demos and prototypes with Gradio/Streamlit
    • Inference Endpoints: Production-grade, auto-scaling model serving
    • Pros: Vast model library, community support, flexible pricing

    Managed Providers (OpenAI, Anthropic, Google):

    • Pros: Lowest time-to-production, managed infrastructure, enterprise SLAs
    • Cons: Data privacy concerns, vendor lock-in, less customization

    Quantization

    Quantization reduces model precision (e.g., FP16 to INT8 or INT4) to decrease memory usage and increase speed.

    • FP16: Half precision (16-bit), the usual unquantized baseline; highest quality, highest memory cost
    • INT8: ~50% memory reduction, minimal quality loss
    • INT4: ~75% memory reduction, noticeable quality trade-off

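    Impact: A 70B model at FP16 requires ~140GB of VRAM for its weights alone. At INT4, the weights fit in ~35GB, making the model runnable on a single 80GB A100 with headroom for the KV cache (a 40GB A100 would be tight).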
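
    The arithmetic behind these figures is parameters times bytes per parameter. The sketch below assumes weights only (no KV cache, activations, or framework overhead) and 1 GB = 1e9 bytes; `weight_memory_gb` is a hypothetical helper name.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight memory in GB: billions of params x bytes per param."""
    return params_billions * BYTES_PER_PARAM[precision]


for precision in ("FP16", "INT8", "INT4"):
    print(precision, weight_memory_gb(70, precision))
# FP16 140.0
# INT8 70.0
# INT4 35.0
```

    Real deployments need extra memory on top of this estimate, so treat the result as a lower bound when sizing hardware.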

    Checklist for PMs

    • Calculate expected query volume and latency SLAs
    • Evaluate data residency and privacy requirements
    • Compare total cost of ownership: cloud API vs. self-hosted
    • Plan for scaling: auto-scaling endpoints vs. fixed infrastructure
    • Assess team capability for infrastructure management
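
    For the total-cost-of-ownership comparison, a rough break-even calculation is often enough to start the conversation. The sketch below is illustrative only: every price, the 730 hours-per-month figure, and the function names are assumptions, and it treats self-hosted cost as fixed regardless of volume (it ignores capacity limits and scaling steps).

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def api_cost_per_query(tokens_per_query: int, price_per_million_tokens: float) -> float:
    """Cost of one query on a pay-per-token API."""
    return tokens_per_query / 1_000_000 * price_per_million_tokens


def monthly_self_hosted_cost(gpu_hourly_rate: float, gpus: int, ops_overhead: float) -> float:
    """Fixed monthly cost: always-on GPUs plus staffing/tooling overhead."""
    return gpu_hourly_rate * gpus * HOURS_PER_MONTH + ops_overhead


def break_even_queries(tokens_per_query: int, price_per_million_tokens: float,
                       self_hosted_monthly: float) -> float:
    """Monthly query volume above which self-hosting becomes cheaper."""
    return self_hosted_monthly / api_cost_per_query(tokens_per_query, price_per_million_tokens)


# Hypothetical inputs: 1,500 tokens/query, $5 per million tokens,
# two GPUs at $2/hour, $3,000/month of operations overhead.
self_hosted = monthly_self_hosted_cost(2.0, 2, 3000.0)
volume = break_even_queries(1500, 5.0, self_hosted)
print(f"Self-hosted: ${self_hosted:,.0f}/month, break-even at ~{volume:,.0f} queries/month")
```

    Below the break-even volume the managed API is cheaper under these assumptions; above it, self-hosting starts to pay off, provided the team can run the infrastructure.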

    Module 3: The Optimization Decision Tree

    Framework

    Start Here: Is the base model's knowledge sufficient?
    |
    +-- YES --> Is the output format/style correct?
    |           |
    |           +-- YES --> Use as-is (maybe light Prompt Engineering)
    |           +-- NO  --> Prompt Engineering (system prompts, few-shot examples)
    |
    +-- NO  --> Does the model need access to YOUR data?
                |
                +-- YES, and data changes frequently --> RAG
                +-- YES, and it's stable domain knowledge --> Fine-tuning
                +-- Need a completely new capability --> Full Pre-training (rare, expensive)
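
    The same tree can be written as a small function, which is handy for documenting decisions consistently across projects. This is one reading of the framework, not a definitive rule: the parameter names are hypothetical, and the final branch assumes that "does not need your data" means a new capability is required.

```python
def choose_optimization(knowledge_sufficient: bool,
                        format_correct: bool = True,
                        needs_your_data: bool = False,
                        data_changes_frequently: bool = False) -> str:
    """Walk the optimization decision tree and return the suggested approach."""
    if knowledge_sufficient:
        if format_correct:
            return "Use as-is (maybe light prompt engineering)"
        return "Prompt engineering (system prompts, few-shot examples)"
    if needs_your_data:
        # Frequently changing data favors retrieval; stable knowledge favors training.
        return "RAG" if data_changes_frequently else "Fine-tuning"
    return "Full pre-training (rare, expensive)"


# Example: the model lacks knowledge of fare rules that change weekly.
print(choose_optimization(knowledge_sufficient=False,
                          needs_your_data=True,
                          data_changes_frequently=True))  # RAG
```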
    

    Key Technical Distinction

    Updating Model Weights (Fine-tuning):

    • Permanently changes the model's behavior
    • Requires training data and compute
    • The knowledge becomes "baked in"
    • Like teaching someone a new skill

    Updating a Knowledge Base (RAG):

    • Model behavior stays the same
    • New information is retrieved at query time
    • Knowledge is external and easily updated
    • Like giving someone a reference book
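
    The "reference book" idea can be illustrated with a toy example: the knowledge lives in a list that is edited directly, and only the prompt changes. The keyword-overlap retriever below is a deliberately simple stand-in for real retrieval, and `retrieve`, `build_prompt`, and the sample policy text are all hypothetical.

```python
def retrieve(question: str, knowledge_base: list[str], top_k: int = 1) -> list[str]:
    """Toy retriever: rank documents by how many question words they share."""
    q_words = set(question.lower().split())
    ranked = sorted(knowledge_base,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, knowledge_base: list[str]) -> str:
    """Assemble the prompt the (unchanged) model would receive."""
    context = "\n".join(retrieve(question, knowledge_base))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


kb = ["The cancellation fee for economy fares is 50 USD."]
question = "What is the cancellation fee for economy fares?"
prompt_before = build_prompt(question, kb)

# Policy changes: update the knowledge base, not the model weights.
kb[0] = "The cancellation fee for economy fares is 75 USD."
prompt_after = build_prompt(question, kb)
print(prompt_after)
```

    With fine-tuning, the same policy change would require new training data and another training run before the model reflected it.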

    Comparison Table

    Approach           | Cost      | Effort     | Best For
    Prompt Engineering | Low       | Hours      | Format, tone, simple task guidance
    RAG                | Medium    | Days-Weeks | Dynamic data, company docs, FAQs
    Fine-tuning        | High      | Weeks      | Domain expertise, consistent style
    Pre-training       | Very High | Months     | Entirely new language or domain

    Checklist for PMs

    • Start with prompt engineering before escalating
    • Document when prompt engineering hits its limits
    • For RAG: identify data sources and update frequency
    • For fine-tuning: prepare at least 1,000 high-quality examples
    • Always benchmark against the base model

    Module 4: Data Architecture

    When is a Traditional Database (SQL/JSON) Sufficient?

    • Structured, tabular data with known schemas
    • Exact-match queries (user profiles, transactions, bookings)
    • ACID compliance requirements (financial records)
    • Simple filtering, sorting, and aggregation

    Example: Customer booking history, transaction ledgers, user preferences.
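
    A minimal sketch of the exact-match and aggregation queries described above, using Python's built-in `sqlite3` module with an in-memory database. The `bookings` schema and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bookings (
        booking_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        destination TEXT NOT NULL,
        amount_usd  REAL NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO bookings (customer_id, destination, amount_usd) VALUES (?, ?, ?)",
    [(1, "Lisbon", 420.0), (1, "Tokyo", 980.0), (2, "Lisbon", 450.0)],
)
conn.commit()


def booking_history(customer_id: int) -> list[tuple]:
    """Exact-match lookup: all bookings for one customer, oldest first."""
    return conn.execute(
        "SELECT destination, amount_usd FROM bookings "
        "WHERE customer_id = ? ORDER BY booking_id",
        (customer_id,),
    ).fetchall()


def spend_by_destination() -> list[tuple]:
    """Aggregation: total spend per destination."""
    return conn.execute(
        "SELECT destination, SUM(amount_usd) FROM bookings "
        "GROUP BY destination ORDER BY destination"
    ).fetchall()


print(booking_history(1))      # [('Lisbon', 420.0), ('Tokyo', 980.0)]
print(spend_by_destination())  # [('Lisbon', 870.0), ('Tokyo', 980.0)]
```

    None of these queries involves meaning or similarity, which is why no embedding model or vector index is needed here.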

    When is a Vector Database Necessary?

    Vector databases are essential when you need semantic search -- finding information by meaning rather than exact keywords.

    Step-by-step process:

    1. Choose an Embedding Model: Convert text/images into numerical vectors (e.g., OpenAI text-embedding-3-small or text-embedding-3-large, Sentence Transformers)
    2. Generate Embeddings: Process your documents through the embedding model
    3. Index Vectors: Store in a vector DB (Pinecone, Weaviate, Qdrant, pgvector)
    4. Configure Retrieval: Set similarity metrics (cosine, dot product) and top-k results
    5. Integrate with LLM: Pass retrieved context to the model (RAG pattern)
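
    The retrieval steps can be sketched in plain Python. Everything here is a toy under stated assumptions: `embed` is a hand-written stand-in that counts topic words (a real system would call an embedding model), the "index" is an in-memory list rather than a vector database, and the reviews are invented.

```python
import math

# Hand-picked topic vocabularies; a stand-in for a learned embedding space.
TOPICS = {
    "beach": {"beach", "beachfront", "ocean", "seaside"},
    "romance": {"romantic", "couples", "honeymoon"},
    "food": {"food", "restaurant", "dining", "breakfast"},
    "business": {"conference", "wifi", "airport"},
}


def embed(text: str) -> list[float]:
    """Steps 1-2 (toy): map text to a vector of per-topic word counts."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return [float(sum(w in vocab for w in words)) for vocab in TOPICS.values()]


def cosine(a: list[float], b: list[float]) -> float:
    """Step 4: cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: str, index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Step 4: return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


reviews = [
    "Seaside resort, perfect for couples on honeymoon, superb dining.",
    "Close to the airport with fast wifi and a conference center.",
    "Quiet ocean view rooms and a great breakfast.",
]
index = [(doc, embed(doc)) for doc in reviews]  # Step 3 (toy in-memory index)
results = top_k("romantic beachfront hotels with good food", index)
print(results)  # Step 5 would pass these to the LLM as context
```

    The first review ranks highest even though it shares no keywords with the query, which is the behavior exact-match search cannot provide.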

    Example: Searching travel reviews for "romantic beachfront hotels with good food" or finding similar fraud patterns across transactions.

    When Does a Knowledge Graph Outperform?

    Knowledge graphs excel when:

    • Complex relationships matter
    • Multi-hop reasoning is required
    • Explainability is critical
    • Data has rich, interconnected structure
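
    Multi-hop reasoning and explainability can be illustrated with a tiny graph of (subject, relation, object) triples and a breadth-first search that returns the chain of hops. The entities, relations, and function names below are hypothetical; a production system would typically use a graph database rather than an in-memory dictionary.

```python
from collections import deque

# Hypothetical fraud-investigation graph as directed triples.
TRIPLES = [
    ("account_A", "owned_by", "customer_1"),
    ("customer_1", "shares_device_with", "customer_2"),
    ("customer_2", "owns", "account_B"),
    ("account_B", "flagged_for", "chargeback_fraud"),
]


def build_graph(triples):
    """Index triples by subject for fast neighbor lookup."""
    graph = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the list of hops from start to goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


graph = build_graph(TRIPLES)
path = find_path(graph, "account_A", "chargeback_fraud")
for subject, relation, obj in path:
    print(f"{subject} --{relation}--> {obj}")
```

    The returned path doubles as the explanation: each hop is a named relationship that an analyst or auditor can inspect.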

    Checklist for PMs

    • Map your data types: structured, unstructured, or both?
    • Identify query patterns: exact match, semantic search, or relational?
    • For vector DBs: estimate embedding dimensions and storage needs
    • For knowledge graphs: map entity types and relationship types
    • Consider hybrid approaches: SQL + Vector DB is increasingly common