# 23 — Data Migration & Platform Import Engine

Fogbreak replaces Follow Up Boss, ShowingTime, Sisu, Paperless Pipeline, DocuSign, and RealScout. No brokerage will adopt Fogbreak if they lose years of client data, transaction history, showing feedback, signed documents, and analytics in the switch. This instruction builds a migration engine that imports from all six platforms — plus a universal CSV mapper for any other source.

**This is the #1 onboarding blocker for every new tenant.** Without it, Fogbreak is a greenfield-only product. With it, any brokerage can switch in a weekend.

---

## Dependencies

- **Requires:** Instruction 02 (PostgreSQL schema) — migration targets the new schema
- **Recommended:** Instructions 04–09 (platform replacements) should be at least partially built so imported data has destination tables
- **No hard dependency on AI layer** — migration is data movement, not inference

---

## Architecture

Create `app/api/migrate_import.php` — the unified migration engine.

### Three Import Tiers

| Tier | Method | Platforms | Complexity |
|------|--------|-----------|------------|
| **Tier 1: API-Direct** | Authenticated REST API pull | Follow Up Boss, DocuSign, Sisu | Full automation possible |
| **Tier 2: CSV/Export File** | Upload exported file, map columns | Paperless Pipeline, ShowingTime, RealScout, any CRM | Semi-automated with column mapping UI |
| **Tier 3: Archive Ingest** | Bulk file upload (PDFs, images) | DocuSign (completed docs), any legacy files | File processing + metadata extraction |

### Core Tables

```sql
-- Migration job tracking
CREATE TABLE IF NOT EXISTS migration_jobs (
    id SERIAL PRIMARY KEY,
    tenant_id INT NOT NULL,
    source_platform VARCHAR(50) NOT NULL,  -- 'follow_up_boss', 'showingtime', 'sisu', 'paperless_pipeline', 'docusign', 'realscout', 'csv_universal'
    tier VARCHAR(20) NOT NULL,             -- 'api_direct', 'csv_upload', 'archive_ingest'
    status VARCHAR(20) DEFAULT 'pending',  -- 'pending', 'running', 'mapping', 'importing', 'validating', 'completed', 'failed'
    total_records INT DEFAULT 0,
    imported_records INT DEFAULT 0,
    skipped_records INT DEFAULT 0,
    error_records INT DEFAULT 0,
    config JSON,                           -- API keys, file paths, column mappings
    error_log JSON,                        -- Array of per-record errors
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    created_by INT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Column mapping definitions (for CSV tier)
CREATE TABLE IF NOT EXISTS migration_field_maps (
    id SERIAL PRIMARY KEY,
    tenant_id INT NOT NULL,
    job_id INT NOT NULL REFERENCES migration_jobs(id),
    source_field VARCHAR(255) NOT NULL,    -- Column name from source file
    target_table VARCHAR(100) NOT NULL,    -- Fogbreak destination table
    target_field VARCHAR(100) NOT NULL,    -- Fogbreak destination column
    transform VARCHAR(50),                 -- 'none', 'date_format', 'phone_normalize', 'email_lowercase', 'currency_parse', 'status_map'
    transform_config JSON,                 -- e.g., {"format": "MM/DD/YYYY"} or {"map": {"Hot": "active", "Cold": "dormant"}}
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Deduplication tracking
CREATE TABLE IF NOT EXISTS migration_dedup_log (
    id SERIAL PRIMARY KEY,
    tenant_id INT NOT NULL,
    job_id INT NOT NULL REFERENCES migration_jobs(id),
    source_record JSON,                    -- Original record from source
    matched_record_id INT,                 -- Existing Fogbreak record that matched
    matched_table VARCHAR(100),
    match_type VARCHAR(50),                -- 'email_exact', 'phone_exact', 'name_fuzzy', 'address_exact'
    resolution VARCHAR(20),                -- 'merged', 'skipped', 'created_new', 'flagged'
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

---

## Platform-Specific Import Specifications

### 1. Follow Up Boss (Tier 1: API-Direct)

**API Access:** REST API at `https://api.followupboss.com/v1/`
- Auth: HTTP Basic Authentication (API Key as username, password blank)
- API Key location: FUB Admin → API → Create API Key
- Rate limits: Standard REST rate limiting applies
- Pagination: Offset-based (`?offset=0&limit=100`)

**Available Endpoints & Data:**

| FUB Endpoint | Data | Fogbreak Target Table | Notes |
|---|---|---|---|
| `GET /v1/people` | All contacts (name, email, phone, tags, stage, source, assigned agent, custom fields) | `clients` | Primary migration. Includes lead source attribution. |
| `GET /v1/notes` | Notes per contact (text, author, timestamp) | `interactions` | Map as interaction type 'note' |
| `GET /v1/calls` | Call logs (duration, direction, recording URL, outcome) | `interactions` | Map as interaction type 'call'. Recording URLs may expire. |
| `GET /v1/emails` | Email history per contact | `interactions` | Map as interaction type 'email'. Cross-reference with `email_messages` to avoid duplicates. |
| `GET /v1/tasks` | Tasks (title, due date, assigned to, status) | `tasks` | Map task status to Fogbreak phase system |
| `GET /v1/actionPlans` | Action plans (drip-like sequences) | `drip_campaigns` + `drip_steps` | Map plan steps to drip steps. Enrollment data via `/v1/actionPlansPeople`. |
| `GET /v1/customFields` | Custom field definitions | `client_custom_fields` (new) | Preserve all custom data. Don't discard what you don't recognize. |
| `GET /v1/dealStages` | Pipeline stage definitions | Reference data for status mapping | Map to Fogbreak's 0-9 phase system |
| `GET /v1/deals` | Active deals with stage, value, close date | `transactions` | Map deal stage → Fogbreak phase |
| `GET /v1/users` | Agent/team member profiles | `admins` | Map roles: admin, agent, ISA, lender partner |
| `GET /v1/smartLists` | Saved filter configurations | `smart_lists` (new) | Preserve segmentation logic |
| `GET /v1/tags` | Tag definitions | `client_tags` | Tags are critical for segmentation |
| `GET /v1/webhooks` | Active webhook configurations | Log only | Document existing integrations for reference |

**Migration Logic:**

```
1. Authenticate with FUB API key
2. Pull /v1/users → create admin accounts (skip if email already exists)
3. Pull /v1/customFields → create custom field definitions
4. Pull /v1/people (paginated, all pages) → for each:
   a. Dedup check: match by email (exact) or phone (normalized)
   b. If match found: merge (update missing fields, append tags, don't overwrite existing)
   c. If no match: create new client record
   d. Map FUB stage → Fogbreak lifecycle (lead/active/closed/repeat)
   e. Preserve: source, source URL, assigned agent, tags, custom fields
5. Pull /v1/notes, /v1/calls, /v1/emails → attach to migrated client records
6. Pull /v1/actionPlans → create drip campaigns + steps
7. Pull /v1/deals → create transactions (map stage to phase)
8. Pull /v1/tasks → attach to deals or clients
9. Validation pass: count records per table, flag orphans
```

**What you CAN'T get from FUB API:**
- Lead routing rules (round-robin config, weighting) — must be manually recreated
- Speed-to-lead metrics (historical response times) — not exposed via API
- Email template HTML — available via `/v1/emailTemplates` but formatting may not transfer cleanly
- Automation triggers — FUB's automation rules aren't exported via API

**Estimated records per brokerage:** 500–50,000 contacts, 2,000–200,000 interactions

---

### 2. ShowingTime (Tier 2: CSV/Export + Tier 1 where available)

**API Access:** ShowingTime does NOT offer a public REST API for individual agents/brokerages. Data access is MLS-level via Bridge Interactive (RESO-certified API), which serves listing data — not showing history.

**Export Options:**
- **Agent-level:** Showing history reports (Listing Activity, Agent Activity, Showings I Have Requested) can be exported from the ShowingTime web dashboard as CSV/PDF
- **Broker-level:** Front Desk provides agent activity, listing activity, and feedback reports — exportable as CSV
- **MLS-level:** Bridge Interactive API provides listing data (not showing data) via RESO Web API

**Available Data via Export:**

| ShowingTime Report | Data | Fogbreak Target | Export Format |
|---|---|---|---|
| Listing Activity Report | Showing date/time, showing agent, listing, status | `showings` | CSV |
| Agent Activity Report | Agent showing counts, listings shown, feedback given | `showing_analytics` | CSV |
| Feedback Report | Per-showing feedback (rating, comments, buyer interest level) | `showing_feedback` | CSV |
| Showing Requests | Request history (requested, confirmed, cancelled, no-show) | `showings` | CSV |
| Seller Report | Activity on a listing (views, showings, feedback summary) | `listing_analytics` | PDF (parse) or manual entry |

**Migration Logic:**

```
1. User exports showing reports from ShowingTime dashboard (CSV)
2. Upload CSV to Fogbreak migration engine
3. Column mapping UI: map ShowingTime columns → Fogbreak schema
4. Default mappings provided (ShowingTime's CSV format is consistent):
   - "Showing Date" → showings.showing_date
   - "Showing Agent" → showings.showing_agent (fuzzy match to admins table)
   - "Listing Address" → showings.property_id (match by address to properties table)
   - "Status" → showings.status
   - "Feedback" → showing_feedback.comments
   - "Interest Level" → showing_feedback.buyer_interest
5. Dedup: match by (property_id + showing_date + showing_agent)
6. Import showing records + feedback
7. Flag unmatched properties/agents for manual review
```

**What you CAN'T get from ShowingTime:**
- Availability windows / scheduling preferences — not in export
- SmartRoute optimization history — proprietary algorithm data
- Lockbox access codes — security-restricted
- Real-time showing notifications config — must be recreated
- Seller portal settings — not exportable

**Estimated records per brokerage:** 200–10,000 showings, 100–5,000 feedback entries

---

### 3. Sisu (Tier 1: API-Direct + Tier 2: CSV)

**API Access:** Sisu has an open REST API (documented via partnership with API Nation / datalabz.re).
- Auth: API key-based
- Integration typically routed through API Nation middleware
- Direct API access available for team owners

**Available Data:**

| Sisu Data | Fogbreak Target | Method | Notes |
|---|---|---|---|
| Transaction data (address, price, stage, dates, agents) | `transactions` + `commissions` | API or CSV export | Primary data. Export from Sisu's transaction screen. |
| Agent metrics (dials, conversations, appointments, contracts, closings) | `agent_metrics` (new) | API via datalabz | Historical KPI data. Critical for coaching baselines. |
| Goals (per agent, per period) | `agent_goals` (new) | CSV export | Goal definitions and pacing history |
| Leaderboard history | `leaderboard_snapshots` (new) | CSV export | Competitive history. Useful for gamification continuity. |
| Commission data (GCI, splits, agent share) | `commissions` + `commission_splits` | API or CSV | Cross-reference with Fogbreak's existing commission tables |
| Custom reports | Reference only | CSV export | Preserve report configurations as documentation |

**Migration Logic:**

```
1. If API access: authenticate via API Nation / direct key
   - Pull transactions (paginated)
   - Pull agent metrics (date-ranged)
   - Pull goals
2. If CSV: user exports from Sisu transaction screen
   - Upload to migration engine
   - Column mapping (Sisu's export format is well-structured)
3. For each transaction:
   a. Dedup: match by (address + close_date) or (MLS number if available)
   b. Map Sisu stage → Fogbreak phase (0-9)
   c. Import commission data (GCI, splits)
4. For agent metrics:
   a. Create historical metric records
   b. Establish baselines for coaching engine
5. Validation: total GCI from Sisu should match total GCI imported
```

**What you CAN'T get from Sisu:**
- Badge/achievement history — gamification data is internal
- Coaching conversation logs — not exported
- Custom dashboard layouts — must be recreated
- Agent ranking algorithm weights — proprietary

**Estimated records per brokerage:** 100–5,000 transactions, 1,000–50,000 metric data points

---

### 4. Paperless Pipeline (Tier 2: CSV Only)

**API Access:** Paperless Pipeline does NOT have a public API. Integration is limited to Zapier triggers (outbound only — new transaction, task completed, etc.). There is no way to programmatically pull historical data.

**Export Options:**
- **Transaction list export:** Admins can download a CSV of all transactions with full details
- **Monthly backup:** Each month, a backup CSV of transactions added/closed/expired/cancelled that month
- **Document download:** Individual document download from each transaction (no bulk export)

**Available Data via Export:**

| Paperless Pipeline Export | Data | Fogbreak Target | Notes |
|---|---|---|---|
| Transaction List CSV | Address, type, stage, dates, agents, price, escrow info | `transactions` | Primary migration source |
| Transaction Details | Checklist items, task completion status, notes | `compliance_items` + `tasks` | May require per-transaction export |
| Documents | PDFs attached to each transaction | `documents` | Manual download required — no bulk API |
| Contacts | Buyer, seller, agent, escrow, title contacts per deal | `clients` + transaction associations | Embedded in transaction data |
| Monthly Backup CSV | Same as transaction list, filtered by month | `transactions` | Alternative to full export |

**Migration Logic:**

```
1. Admin exports full transaction list from Paperless Pipeline
2. Upload CSV to Fogbreak migration engine
3. Column mapping:
   - "Property Address" → transactions.address
   - "Transaction Type" → transactions.type (map: "Listing", "Buyer", "Dual" etc.)
   - "Status" → transactions.phase (map PP stages to Fogbreak 0-9)
   - "Listing Agent" / "Selling Agent" → transactions.agent_id (fuzzy match)
   - "List Price" / "Sale Price" → transactions.price
   - "Close Date" → transactions.close_date
   - "Escrow Company" → transactions.escrow_company
4. For compliance items:
   a. If per-transaction export available: map checklist items → compliance_items
   b. If not: create compliance items from jurisdiction templates (they'll start unchecked)
5. Documents: instruct user to bulk download from PP, then use Archive Ingest (Tier 3)
6. Dedup: match by (address + close_date)
```

**What you CAN'T get from Paperless Pipeline:**
- Task dependency graphs — not in export format
- Automation rules / triggers — no export
- Template configurations — must be recreated
- Audit trail of who completed what task and when — limited in CSV export
- Document version history — only current version downloadable

**Critical limitation:** Documents must be manually downloaded transaction-by-transaction. For a brokerage with 500+ closed deals, this is a significant manual effort. Consider building a "document migration checklist" that prioritizes active/pending transactions and archives closed deals as lower priority.

**Estimated records per brokerage:** 200–5,000 transactions, 1,000–25,000 tasks/checklist items

---

### 5. DocuSign (Tier 1: API-Direct + Tier 3: Archive)

**API Access:** DocuSign eSignature REST API is mature and well-documented.
- Auth: OAuth 2.0 (Authorization Code Grant for user-level, JWT for server-level)
- Base URL: `https://www.docusign.net/restapi/v2.1/` (production)
- Sandbox: `https://demo.docusign.net/restapi/v2.1/`
- Rate limits: 1,000 requests/hour (standard plan)
- Requires DocuSign developer account and API integration key

**Available Endpoints & Data:**

| DocuSign Endpoint | Data | Fogbreak Target | Notes |
|---|---|---|---|
| `GET /accounts/{id}/envelopes` | List all envelopes with filters (status, date range) | `esign_envelopes` | Supports date filtering. Paginated. |
| `GET /envelopes/{id}/documents` | Download signed PDFs (individual or combined) | `documents` | Can get combined PDF or ZIP of all docs |
| `GET /envelopes/{id}/form_data` | Extract field values from completed envelopes | `esign_field_data` (new) | Tab/field data: names, dates, amounts signed |
| `GET /envelopes/{id}/audit_events` | Full audit trail (SHA-256 hashed, timestamped) | `esign_audit_log` | 15+ event types. Immutable record. |
| `GET /envelopes/{id}/recipients` | Signer details (name, email, status, signed date) | `esign_recipients` | Includes routing order, status per signer |
| `GET /accounts/{id}/templates` | Template definitions | `esign_templates` | Can preserve template library |
| `GET /envelopes/{id}/documents/certificate` | Certificate of Completion | `documents` | Legal proof of signing |

**Migration Strategy — Two Modes:**

**Mode A: Full Historical Archive (recommended for compliance)**
```
1. Authenticate via OAuth 2.0 (requires user consent)
2. GET /envelopes?from_date=2020-01-01&status=completed (paginated)
3. For each completed envelope:
   a. Download combined PDF → store in documents table
   b. Download Certificate of Completion → store alongside
   c. Extract form_data → store structured field values
   d. Pull audit_events → store in esign_audit_log
   e. Pull recipients → store signer details
4. Link to Fogbreak transaction by:
   - Matching envelope subject/property address to transaction
   - Or: manual association via migration UI
5. Validation: count envelopes in DocuSign vs imported
```

**Mode B: Active Envelopes Only (faster, for quick cutover)**
```
1. Same as Mode A but filter: status=sent OR status=created
2. Import only in-progress envelopes
3. Note: CANNOT resume signing in Fogbreak — active DocuSign envelopes must be completed in DocuSign
4. New envelopes go through Fogbreak's esign.php
```

**What you CAN'T migrate from DocuSign:**
- Active signing sessions — cannot transfer mid-signature envelopes to another platform
- Branding/logo configurations — must be recreated
- PowerForms (self-service signing URLs) — must be recreated
- Connect (webhook) configurations — must be recreated
- Signing groups — member lists can be pulled, but the group logic must be recreated

**Important legal note:** DocuSign's audit trail is a legal record. When archiving, store the original audit trail alongside Fogbreak's own records. Never modify or reformat the DocuSign audit data — it's evidence.

**Estimated records per brokerage:** 100–10,000 envelopes, 200–30,000 documents

---

### 6. RealScout (Tier 2: CSV + Tier 1: API where available)

**API Access:** RealScout has an API but it's primarily designed for MLS/brokerage-level integrations, not individual data export. Direct API integrations exist with Follow Up Boss, Cloze, MoxiWorks, and ReChat.

**Export Options:**
- **Client CSV export:** Go to CLIENTS tab → scroll to bottom → "EXPORT ALL CLIENTS (CSV)"
- **Includes:** Name, email, phone, subscription status, deleted status
- **Does NOT include via CSV:** Saved search criteria, property preferences, reaction history (likes/dislikes), engagement analytics

**Available Data:**

| RealScout Data | Fogbreak Target | Method | Notes |
|---|---|---|---|
| Client contacts (name, email, phone) | `clients` | CSV export | Basic contact data. Dedup against FUB import. |
| Subscription status | `clients.alert_status` | CSV export | Track who unsubscribed |
| Saved searches | `saved_searches` | API only (if available) or manual recreation | **Critical gap** — CSV does not include search criteria |
| Property preferences (beds, baths, features, style) | `client_preferences` (new) | API only or manual | Feature preferences are the core value of RealScout |
| Property reactions (liked, disliked, commented) | `property_reactions` (new) | NOT exportable | This is RealScout's proprietary AI training data |
| Engagement analytics (opens, views, clicks) | `client_engagement_log` | NOT exportable | Behavioral data stays in RealScout |
| Listing alerts sent | Reference only | NOT exportable | Historical alert records |

**Migration Logic:**

```
1. Export client CSV from RealScout
2. Upload to Fogbreak migration engine
3. Column mapping (RealScout CSV is straightforward):
   - "First Name" / "Last Name" → clients.first_name / clients.last_name
   - "Email" → clients.email
   - "Phone" → clients.phone
   - "Is deleted" → skip if true (or import as inactive)
   - "Unsubscribed" → clients.alert_opt_out = true
4. Dedup: match by email against existing clients (likely already imported from FUB)
   - If match: merge RealScout subscription status
   - If no match: create new client (these are RealScout-only contacts)
5. For saved searches / preferences:
   a. If API access: pull search criteria per client → create saved_searches
   b. If no API: generate "preference questionnaire" workflow
      - For each imported client, create a task: "Confirm property preferences"
      - Agent manually enters preferences (or sends client a preference form)
6. Flag: property reactions and AI-learned preferences DO NOT transfer
   - Fogbreak's AI matching (instruction 09) will need to learn from scratch
   - Communicate this clearly to the brokerage during onboarding
```

**What you CAN'T get from RealScout:**
- AI-learned preference models — this is RealScout's proprietary machine learning data
- Property reaction history (likes, dislikes, ratings, comments) — not in CSV export
- Engagement behavioral data (what they clicked, how long they viewed) — not exportable
- Saved search criteria via CSV — only available via API (and Zapier explicitly excludes this)
- Curated collection contents — not exportable

**This is the hardest migration.** RealScout's value is the behavioral data it accumulates over time. Clients will start "cold" in Fogbreak's matching engine. Mitigate this by: (a) importing explicit preferences where possible, (b) prompting agents to confirm/update preferences during first interaction post-migration, and (c) communicating to the brokerage that AI matching improves with engagement over the first 30-60 days.

**Estimated records per brokerage:** 200–20,000 contacts, 100–10,000 saved searches (if API available)

---

## Universal CSV Import (Any Source)

For brokerages using CRMs not in the above list (kvCORE, BoomTown, Sierra Interactive, LionDesk, Wise Agent, CINC, etc.), build a universal CSV mapper.

### Column Mapping UI

Add to `fogbreak.html` (new tab or modal in admin settings):

```
Migration Wizard Steps:
1. Select source platform (dropdown: FUB, ShowingTime, Sisu, PP, DocuSign, RealScout, Other)
2. Upload file(s) — CSV, XLSX, or JSON
3. Auto-detect columns + suggest mappings
4. User reviews/adjusts column mappings
5. Preview: show first 10 rows mapped to Fogbreak schema
6. Configure dedup rules (match by: email, phone, address, name)
7. Run import (with progress bar)
8. Review results: imported, skipped (dupes), errored, flagged for review
```

### Auto-Detection Logic

```php
// Smart column detection for common CRM exports
$column_signatures = [
    // Follow Up Boss
    'firstName' => ['target' => 'clients.first_name'],
    'lastName' => ['target' => 'clients.last_name'],
    'stage' => ['target' => 'clients.lifecycle_stage', 'transform' => 'status_map'],
    'leadSource' => ['target' => 'clients.lead_source'],

    // Common variations
    'First Name' => ['target' => 'clients.first_name'],
    'Last Name' => ['target' => 'clients.last_name'],
    'Email' => ['target' => 'clients.email'],
    'Phone' => ['target' => 'clients.phone'],
    'Cell Phone' => ['target' => 'clients.phone'],
    'Mobile' => ['target' => 'clients.phone'],
    'Address' => ['target' => 'clients.address'],
    'Property Address' => ['target' => 'transactions.address'],
    'List Price' => ['target' => 'transactions.list_price', 'transform' => 'currency_parse'],
    'Sale Price' => ['target' => 'transactions.sale_price', 'transform' => 'currency_parse'],
    'Close Date' => ['target' => 'transactions.close_date', 'transform' => 'date_format'],
    'Closing Date' => ['target' => 'transactions.close_date', 'transform' => 'date_format'],
    'MLS Number' => ['target' => 'properties.mls_number'],
    'MLS #' => ['target' => 'properties.mls_number'],
    'Tags' => ['target' => 'client_tags', 'transform' => 'tag_split'],
    'Notes' => ['target' => 'interactions.notes'],
    'Source' => ['target' => 'clients.lead_source'],
    'Lead Source' => ['target' => 'clients.lead_source'],
    'Agent' => ['target' => 'clients.assigned_agent', 'transform' => 'agent_match'],
    'Assigned To' => ['target' => 'clients.assigned_agent', 'transform' => 'agent_match'],
];
```

### Data Transforms

```php
$transforms = [
    'date_format' => function($value, $config) {
        // Try multiple date formats: MM/DD/YYYY, YYYY-MM-DD, DD-Mon-YY, etc.
        $formats = ['m/d/Y', 'Y-m-d', 'd-M-y', 'm-d-Y', 'n/j/Y', 'Y/m/d'];
        foreach ($formats as $fmt) {
            $date = DateTime::createFromFormat($fmt, trim($value));
            if ($date) return $date->format('Y-m-d');
        }
        return null; // Flag for manual review
    },
    'phone_normalize' => function($value) {
        // Strip to digits, format as +1XXXXXXXXXX
        $digits = preg_replace('/\D/', '', $value);
        if (strlen($digits) === 10) return '+1' . $digits;
        if (strlen($digits) === 11 && $digits[0] === '1') return '+' . $digits;
        return $digits; // Non-US or unrecognized — store as-is
    },
    'currency_parse' => function($value) {
        return (float) preg_replace('/[^0-9.]/', '', $value);
    },
    'email_lowercase' => function($value) {
        return strtolower(trim($value));
    },
    'status_map' => function($value, $config) {
        $map = $config['map'] ?? [];
        return $map[trim($value)] ?? trim($value);
    },
    'tag_split' => function($value) {
        // Tags may be comma-separated, semicolon-separated, or pipe-separated
        return array_map('trim', preg_split('/[,;|]/', $value));
    },
    'agent_match' => function($value, $config) {
        // Fuzzy match agent name to admins table
        // Returns admin_id or null (flagged for manual assignment)
    },
];
```

---

## Deduplication Engine

Dedup is critical — most brokerages will import from multiple sources (FUB contacts + RealScout contacts + Paperless Pipeline contacts). Without dedup, they'll get 3x the records.

### Match Strategy (in priority order)

1. **Email exact match** — Most reliable. Lowercase + trim before comparing.
2. **Phone exact match** — After normalization to +1XXXXXXXXXX format.
3. **Name + address fuzzy match** — Levenshtein distance ≤ 2 on name, exact on address.
4. **MLS number match** — For property/transaction dedup.
5. **Address normalization match** — "123 Main St" = "123 Main Street" = "123 main st."

### Merge Rules

When a duplicate is found:
- **Don't overwrite existing data** — only fill in blanks
- **Append tags** — union of both tag sets
- **Keep both interaction histories** — create interaction records from both sources
- **Use most recent contact info** — if phone/email differs, prefer the newer source
- **Flag for review** — if names don't match but email does (possible shared email)
- **Log everything** — every merge decision goes in `migration_dedup_log`

---

## Migration UI

Add a "Data Migration" section under Admin Settings in `fogbreak.html`:

### Migration Dashboard

```
┌─────────────────────────────────────────────────────┐
│ Data Migration                                       │
├─────────────────────────────────────────────────────┤
│                                                      │
│  Import From:                                        │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐            │
│  │ Follow   │ │ Showing  │ │   Sisu   │            │
│  │ Up Boss  │ │  Time    │ │          │            │
│  │ [API]    │ │ [CSV]    │ │ [API/CSV]│            │
│  └──────────┘ └──────────┘ └──────────┘            │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐            │
│  │Paperless │ │ DocuSign │ │RealScout │            │
│  │ Pipeline │ │          │ │          │            │
│  │ [CSV]    │ │ [API]    │ │ [CSV/API]│            │
│  └──────────┘ └──────────┘ └──────────┘            │
│  ┌──────────┐                                       │
│  │ Other    │                                       │
│  │ [CSV]    │                                       │
│  └──────────┘                                       │
│                                                      │
│  Recent Migrations:                                  │
│  ─────────────────────────────────────               │
│  ✅ FUB Import — 2,847 contacts — Mar 15            │
│  ✅ DocuSign Archive — 342 envelopes — Mar 15        │
│  ⚠️  RealScout — 1,203 contacts (47 flagged) — Mar 16│
│  🔄 Sisu Import — Running (67%) — Now                │
│                                                      │
└─────────────────────────────────────────────────────┘
```

### Column Mapping Interface

For CSV imports, show a two-column mapping table:

```
Source Column (from file)    →    Fogbreak Field
─────────────────────────────────────────────────
"First Name"                 →    Client: First Name    ✓ auto-detected
"Last Name"                  →    Client: Last Name     ✓ auto-detected
"Email Address"              →    Client: Email         ✓ auto-detected
"Cell"                       →    Client: Phone         ✓ auto-detected
"Lead Status"                →    Client: Lifecycle     ⚠️ needs status map
"Custom Field 1"             →    [Select target ▼]     ⚠️ unmapped
"Notes"                      →    Interaction: Note     ✓ auto-detected

[Preview Import] [Run Import]
```

---

## Recommended Migration Order for New Tenants

This is the onboarding playbook for any brokerage switching to Fogbreak:

```
Day 1: Foundation
  1. Follow Up Boss → Fogbreak (API)     [~30 min automated]
     - All contacts, notes, calls, emails, tags, deals
     - This is the CRM backbone — do it first

  2. DocuSign → Fogbreak (API)           [~1 hr automated]
     - Archive all completed envelopes + audit trails
     - Active envelopes stay in DocuSign until completed

Day 2: Transaction History
  3. Paperless Pipeline → Fogbreak (CSV)  [~1 hr semi-automated]
     - Transaction history, checklists, compliance items
     - Documents: prioritize active deals, archive closed

  4. Sisu → Fogbreak (API/CSV)           [~30 min]
     - KPI history, commission data, agent metrics
     - Sets baselines for coaching engine

Day 3: Enrichment
  5. ShowingTime → Fogbreak (CSV)         [~30 min]
     - Showing history + feedback
     - Lower priority — historical context, not active workflows

  6. RealScout → Fogbreak (CSV)           [~30 min + manual follow-up]
     - Client contacts (dedup against FUB import)
     - Flag: saved searches need manual recreation or preference confirmation

Day 4: Validation & Cutover
  7. Run validation report:
     - Total contacts: FUB count ≈ Fogbreak count (after dedup)
     - Total transactions: PP count ≈ Fogbreak count
     - Total envelopes: DocuSign count ≈ archived count
     - GCI totals: Sisu ≈ Fogbreak commission records
  8. Agent walkthrough: 30-min per agent to verify their data
  9. Cutover: set Fogbreak as primary, deactivate legacy platforms
```

---

## API Endpoints for migrate_import.php

```php
// Migration job management
GET    ?action=list_jobs                    // List all migration jobs for tenant
GET    ?action=job_detail&id={id}           // Job status + stats + error log
POST   ?action=create_job                   // Create new migration job
POST   ?action=start_api_import&id={id}     // Start Tier 1 (API) import
POST   ?action=upload_file&id={id}          // Upload CSV/XLSX for Tier 2
POST   ?action=save_mappings&id={id}        // Save column mappings
POST   ?action=preview_import&id={id}       // Preview first 10 rows mapped
POST   ?action=run_import&id={id}           // Execute the import
POST   ?action=cancel_job&id={id}           // Cancel running job
GET    ?action=dedup_report&id={id}         // Show duplicates found + resolutions
POST   ?action=resolve_dedup&id={id}        // Manually resolve flagged duplicates
GET    ?action=validation_report&id={id}    // Post-import validation stats
POST   ?action=rollback&id={id}            // Undo an import (delete imported records)

// Platform-specific
POST   ?action=test_fub_connection          // Verify FUB API key
POST   ?action=test_docusign_connection     // Verify DocuSign OAuth
POST   ?action=test_sisu_connection         // Verify Sisu API key
GET    ?action=fub_record_count             // Preview how many records will import
GET    ?action=docusign_envelope_count      // Preview envelope count by date range
```

---

## Security & Tenant Isolation

- **All migration jobs are tenant-scoped.** `tenant_id` on every query. No cross-tenant data leaks.
- **API keys for source platforms are encrypted at rest** (AES-256 in config, not plain text in migration_jobs.config).
- **Source platform API keys are single-use.** After migration completes, prompt user to revoke the API key in the source platform.
- **Rollback capability:** Every imported record is tagged with `migration_job_id`. Rollback deletes all records with that job ID.
- **Rate limiting:** API imports respect source platform rate limits. FUB: pause between paginated calls. DocuSign: 1,000 req/hr max.
- **No data is deleted from source platforms.** Migration is copy-only. The brokerage decides when to deactivate legacy tools.

---

## Error Handling

```php
// Per-record error logging
$error_types = [
    'INVALID_EMAIL'       => 'Email format invalid — record skipped',
    'DUPLICATE_RESOLVED'  => 'Duplicate found — merged with existing record',
    'DUPLICATE_FLAGGED'   => 'Possible duplicate — flagged for manual review',
    'MISSING_REQUIRED'    => 'Required field missing (name or email) — skipped',
    'AGENT_NOT_FOUND'     => 'Assigned agent not found in Fogbreak — unassigned',
    'DATE_PARSE_FAILED'   => 'Could not parse date — stored as null, flagged',
    'PROPERTY_NOT_FOUND'  => 'Property address not matched — showing orphaned',
    'API_RATE_LIMITED'    => 'Source API rate limit hit — pausing 60s',
    'API_AUTH_FAILED'     => 'Source API authentication failed — job halted',
    'FILE_PARSE_ERROR'    => 'CSV/XLSX parsing error on row N — skipped',
];
```

Every error is logged to `migration_jobs.error_log` (JSON array) with row number, source data, and error type. The validation report surfaces these for manual review.

---

## Testing Checklist

- [ ] `migration_jobs`, `migration_field_maps`, `migration_dedup_log` tables created via auto-migration
- [ ] FUB API import: test with sandbox API key, verify contacts + interactions imported correctly
- [ ] DocuSign API import: test with sandbox account, verify envelope PDFs + audit trails archived
- [ ] CSV universal import: test with sample exports from each platform
- [ ] Column auto-detection: verify ≥80% of columns auto-mapped for standard exports
- [ ] Dedup engine: test email-match, phone-match, and fuzzy-name scenarios
- [ ] Merge logic: verify existing data not overwritten, tags appended, interactions preserved
- [ ] Rollback: verify all records from a job can be cleanly deleted
- [ ] Tenant isolation: verify migration job in tenant A cannot access tenant B data
- [ ] Rate limiting: verify FUB/DocuSign imports pause appropriately
- [ ] Validation report: totals match source platform counts
- [ ] Migration UI: column mapping, preview, progress bar, error review all functional
- [ ] Error handling: invalid records logged, job continues (doesn't halt on single bad record)
- [ ] Large dataset test: import 10,000+ contact CSV without timeout or memory exhaustion

---

## File Manifest

| File | Purpose | Lines (est.) |
|---|---|---|
| `app/api/migrate_import.php` | Migration engine: API imports, CSV processing, dedup, validation | ~2,500 |
| `app/js/migration-wizard.js` | Frontend: platform selector, file upload, column mapping, progress | ~800 |
| `app/fogbreak.html` | Add Migration tab/section under admin settings | +200 |
| `db/seeds/migration_column_signatures.json` | Default column mappings for all 6 platforms | ~300 |

**Estimated total:** ~3,800 lines (core) + ~1,200 lines (email scraper)

---

## Supplement: Gmail Email Scraper (Tier 2B)

For platforms with poor or no APIs (ShowingTime, Paperless Pipeline, RealScout), the user's Gmail inbox is a rich secondary data source. Every one of these platforms sends structured transactional emails — confirmations, reminders, alerts, feedback requests — that contain parseable data fields. With user consent and Gmail IMAP access (which Fogbreak already has), this data can be mined automatically.

**This is NOT a replacement for API or CSV import.** It's a supplemental recovery layer that fills gaps the other methods can't reach — especially historical data that may no longer be exportable from the source platform.

### How It Works

```
1. User grants Gmail access (Fogbreak already has IMAP sync via email.php)
2. Migration engine searches Gmail for emails from known platform sender addresses
3. Platform-specific parsers extract structured data from email subject lines and HTML bodies
4. Extracted records are deduplicated against any data already imported via API/CSV
5. New records are inserted; existing records are enriched with additional fields
```

### Platform Sender Addresses

```php
$platform_senders = [
    'showingtime' => [
        'callcenter@showingtime.com',
        'support@showingtime.com',
        'showings@showingtime.com',
        'noreply@showingtime.com',
    ],
    'paperless_pipeline' => [
        'help@paperlesspipeline.com',
        // Note: From header shows "Tracy Travaille <help@paperlesspipeline.com>"
        // The actual sender name is the TC, not the platform
    ],
    'realscout' => [
        'no-reply@agentmail.realscout.com',
        'no-reply@realscout.com',
        'support@realscout.com',
    ],
];
```

---

### ShowingTime Email Parser

**Verified from actual Gmail data.** ShowingTime sends templated HTML emails from `callcenter@showingtime.com` with highly consistent structure. ~201 emails found in Kean's inbox.

**Email Types & What They Contain:**

| Subject Pattern | Email Type | Extractable Data |
|---|---|---|
| `YOUR SHOWING CONFIRMED \| {address} {date} {time}` | Confirmation | Address, city, ZIP, date, time, MLS#, price, status, listing agent, showing agent, co-listing agent, brokerage, phone numbers |
| `CANCELLED \| Showing \| {address} \| {date} {time}` | Cancellation | Same as confirmation + cancellation reason |
| `FEEDBACK REQUESTED \| {address}` | Feedback Request | Address, MLS#, price, status, listing agent, showing date/time, feedback link |
| `Extra info for {address} (MLS# {id})` | Supplemental Docs | Address, MLS#, attached documentation |

**Parser: Subject Line (regex)**

```php
// CONFIRMED: "YOUR SHOWING CONFIRMED | 475 El Caminito, CARMEL VALLEY 93924 Sat, 12/4 11:00 AM"
$confirmed_pattern = '/YOUR SHOWING CONFIRMED \| (.+?), ([A-Z\s]+) (\d{5})\s+(.+)/';
// Captures: [1] address, [2] city, [3] zip, [4] date/time string

// CANCELLED: "CANCELLED | Showing | 475 El Caminito, CARMEL VALLEY 93924 | Wed, 12/8 10:00 AM"
$cancelled_pattern = '/CANCELLED \| Showing \| (.+?), ([A-Z\s]+) (\d{5}) \| (.+)/';

// FEEDBACK: "FEEDBACK REQUESTED | 475 El Caminito, CARMEL VALLEY 93924"
$feedback_pattern = '/FEEDBACK REQUESTED \| (.+?), ([A-Z\s]+) (\d{5})/';
```

**Parser: HTML Body (verified field locations)**

The HTML body follows ShowingTime's "Template V3" format with consistent structure:

```php
function parseShowingTimeBody($html) {
    $data = [];

    // Strip HTML to text for extraction
    $text = strip_tags(html_entity_decode($html));

    // MLS Number: always formatted as "MLS# ML81863579"
    if (preg_match('/MLS#\s*(ML\d+)/', $text, $m)) {
        $data['mls_number'] = $m[1];
    }

    // Price: "$1,200,000" — appears after ZIP code
    if (preg_match('/\$([0-9,]+)\s*\|/', $text, $m)) {
        $data['price'] = (float) str_replace(',', '', $m[1]);
    }

    // Status: "ACTIVE", "CONTINGENT", "PENDING" — between price and MLS#
    if (preg_match('/\|\s*(ACTIVE|CONTINGENT|PENDING|SOLD|WITHDRAWN)\s*\|/', $text, $m)) {
        $data['listing_status'] = $m[1];
    }

    // Presented by (listing team): "Presented by: William Smith Luxury Group"
    if (preg_match('/Presented by:\s*(.+?)(?:\s*You have|$)/s', $text, $m)) {
        $data['listing_team'] = trim($m[1]);
    }

    // Showing Agent: "Kean Matthams (Showing Agent)"
    if (preg_match('/(\w[\w\s]+?)\s*\(Showing Agent\)/', $text, $m)) {
        $data['showing_agent'] = trim($m[1]);
    }

    // Listing Agent: extracted from "Georgia Byrne (Co-Listing Agent)" pattern
    if (preg_match('/(\w[\w\s]+?)\s*\((?:Co-)?Listing Agent\)/', $text, $m)) {
        $data['listing_agent'] = trim($m[1]);
    }

    // Brokerage: "Coldwell Banker Realty" — appears after listing agent
    if (preg_match('/\((?:Co-)?Listing Agent\)\s*(\w[\w\s]+?(?:Realty|Brokerage|Real Estate|Properties|Group))/i', $text, $m)) {
        $data['brokerage'] = trim($m[1]);
    }

    // Phone numbers: "(831) 238-0544 (Mobile Phone)"
    preg_match_all('/\((\d{3})\)\s*(\d{3})-(\d{4})\s*\(([\w\s]+)\)/', $text, $phones, PREG_SET_ORDER);
    foreach ($phones as $phone) {
        $data['phones'][] = [
            'number' => "+1{$phone[1]}{$phone[2]}{$phone[3]}",
            'type' => trim($phone[4]),
        ];
    }

    // Showing date/time: "Showing Wednesday, December 8, 2021 10:00 AM - 10:30 AM"
    // Also: "Showing Sat, December 4, 2021 11:00 AM - 11:30 AM"
    if (preg_match('/Showing\s+(?:\w+,?\s+)?(\w+ \d+, \d{4})\s+(\d+:\d+ [AP]M)\s*-\s*(\d+:\d+ [AP]M)/', $text, $m)) {
        $data['showing_date'] = $m[1];
        $data['showing_start'] = $m[2];
        $data['showing_end'] = $m[3];
    }

    // Buyer info: "Buyer info not provided" or actual buyer name
    if (preg_match('/Buyer info (.+?)(?:\s*\+|\.|\s{2})/', $text, $m)) {
        $data['buyer_info'] = trim($m[1]);
    }

    // Subdivision
    if (preg_match('/Subdivision:\s*(.+?)(?:\s*Presented|$)/s', $text, $m)) {
        $data['subdivision'] = trim($m[1]);
    }

    return $data;
}
```

**Fogbreak Target Tables:**

| Extracted Field | Target |
|---|---|
| Address + City + ZIP | `showings.property_address` → match to `properties.id` |
| MLS# | `showings.mls_number` → match to `properties.mls_number` |
| Showing date/time | `showings.showing_date`, `showings.start_time`, `showings.end_time` |
| Status (confirmed/cancelled) | `showings.status` |
| Showing agent | `showings.showing_agent_id` → match to `admins.name` |
| Listing agent | `showings.listing_agent` (text) or match to `admins` |
| Price | `properties.price` (update if different) |
| Listing status | `properties.status` |
| Buyer info | `showings.buyer_notes` |

**Estimated yield:** With ~201 ShowingTime emails, expect 50-100 unique showing records (many emails are follow-ups for the same showing — confirmation, feedback request, cancellation are separate emails for one event). Dedup by (address + date + showing_agent).

---

### Paperless Pipeline Email Parser

**Verified from actual Gmail data.** Paperless Pipeline sends plain-text emails from `help@paperlesspipeline.com` with the TC's name in the From field. ~201 emails found in Kean's inbox. Emails are NOT templated HTML — they're plain text with a consistent header structure.

**Email Structure (every PP email follows this exact format):**

```
------------------------------------------------------------------------
       TO RESPOND, REPLY TO ALL
------------------------------------------------------------------------

{TC Name} sent you a note about {Address}, {City}, {State} {ZIP} ({Transaction Type}):
{link to transaction}

{TC Name} said:

{Subject/Task Name}

{Free-form message body}

..........................................................................

This note was emailed to: {Recipient 1}, {Recipient 2}, and {Recipient 3}
View this note and transaction at:
{link to transaction}
```

**Parser:**

```php
function parsePaperlessPipelineEmail($subject, $body, $from, $cc, $date) {
    $data = [];

    // Subject always: "{Task/Note Title} - {Address}, {City}, {State} {ZIP}"
    // Examples:
    //   "Loan Contingency Reminder - 0 Monte Verde 2 SE of 10th Street, Carmel, CA 939…"
    //   "Homelight Link to Disclosures - 218 14th St, Pacific Grove, CA 93950"
    //   "City Report - Pacific Grove - 218 14th St, Pacific Grove, CA 93950"
    //   "Contingency Removal Reminder (All) - 218 14th St, Pacific Grove, CA 93950"
    //   "Initial Deposit is Due - 218 14th St, Pacific Grove, CA 93950"
    //   "*Listing information needed - 2409 Schooner Avenue, Marina, CA 93933"

    if (preg_match('/^[\*]?(.+?)\s*-\s*(.+?,\s*(?:CA|California)\s*\d{5})/', $subject, $m)) {
        $data['task_name'] = trim($m[1]);
        $data['property_address'] = trim($m[2]);
    }

    // TC name from body: "{Name} sent you a note about"
    if (preg_match('/^(.+?) sent you a note about/', $body, $m)) {
        $data['tc_name'] = trim($m[1]);
    }

    // Transaction type: "(Listing)" or "(Purchase)" after address in body
    if (preg_match('/\((\w+)\):/', $body, $m)) {
        $data['transaction_type'] = $m[1]; // "Listing", "Purchase", etc.
    }

    // Recipients: "This note was emailed to: Kean Matthams and Sharon Matthams"
    // Or: "This note was emailed to: Kean Matthams, Sharon Matthams, and sig.ponce..."
    if (preg_match('/This note was emailed to:\s*(.+)/i', $body, $m)) {
        $names = preg_split('/,\s*(?:and\s+)?|\s+and\s+/', trim($m[1]));
        $data['recipients'] = array_map('trim', $names);
    }

    // CC list contains all parties on the transaction
    // Typical CC: team@fogbreak.io, sharon@carmelrealtycompany.com,
    //             tracy@travailletransactions.com, escrow@titlecompany.com
    if (!empty($cc)) {
        $data['cc_parties'] = array_map('trim', explode(',', $cc));
    }

    // Attachments (e.g., timeline.pdf) — referenced in body
    if (preg_match('/The following doc has been attached.*?:\s*\n\n- (.+)/s', $body, $m)) {
        $data['attachments'] = array_map('trim', explode("\n-", $m[1]));
    }

    // Date of the note
    $data['note_date'] = $date;

    // Infer transaction milestone from task name
    $milestone_map = [
        'Loan Contingency Reminder' => 'loan_contingency',
        'Contingency Removal Reminder' => 'contingency_removal',
        'Disclosure reminder' => 'disclosures',
        'Initial Deposit is Due' => 'initial_deposit',
        'Homelight Link' => 'disclosures',
        'City Report' => 'city_inspection',
        'Listing information needed' => 'listing_prep',
        'AVID Reminder' => 'avid_report',
    ];

    foreach ($milestone_map as $keyword => $milestone) {
        if (stripos($data['task_name'] ?? '', $keyword) !== false) {
            $data['milestone'] = $milestone;
            break;
        }
    }

    return $data;
}
```

**What This Recovers That CSV Can't:**

| Data Point | Available in CSV? | Available in Email? |
|---|---|---|
| Transaction timeline (what happened when) | Partial | **Yes — each email is a timestamped event** |
| Which parties are on the deal (CC list) | No | **Yes — escrow, title, co-agents visible in CC** |
| TC task names and reminders | Limited | **Yes — every task generates an email** |
| Attached documents (timeline.pdf, etc.) | Manual download only | **Yes — attachments can be saved directly** |
| Free-form TC notes per transaction | No | **Yes — full message body** |

**Fogbreak Target Tables:**

| Extracted Field | Target |
|---|---|
| Property address | `transactions.address` → match to existing transaction |
| Transaction type (Listing/Purchase) | `transactions.type` |
| Task name | `tasks.title` or `compliance_items.name` |
| Note date | `interactions.created_at` (type: 'tc_note') |
| Message body | `interactions.notes` |
| CC parties | Cross-reference `clients` table; create associations |
| Milestone | `transactions.phase` (if it advances the timeline) |
| Attachments | `documents` table (save to storage, link to transaction) |

**Estimated yield:** ~201 emails containing transaction notes, deadline reminders, and document links. Each email maps to one interaction record and potentially one task/milestone update. Unique transactions: likely 10-30 (many emails per deal).

---

### RealScout Email Parser

**Verified from actual Gmail data.** RealScout sends from `no-reply@agentmail.realscout.com`. ~201 emails found in Kean's inbox. These are the emails that fill the biggest gap — they contain the property match data and search criteria that RealScout's CSV export doesn't include.

**Email Types & What They Contain:**

| Subject Pattern | Email Type | Extractable Data |
|---|---|---|
| `Alert for {Client}: Price Drop: {address} now ${price}` | Price drop alert | Client name, address, price, beds, baths, sqft, city, ZIP, days on site, alert type |
| `Alert for {Client}: A {City} listing at ${price} and N more` | Multi-match alert | Client name, multiple properties with full details, city preferences |
| `Alert for {Client}: Back on Market: {address} at ${price}` | Status change alert | Client name, address, price, status change type |
| `New listings to approve for "{Search Name}"` | Agent approval queue | Client name, search name, number of matches needing approval |

**Parser:**

```php
function parseRealScoutEmail($subject, $body, $date) {
    $data = [];

    // Subject: "Alert for Eric Cabral: Price Drop: 1043 Mission Rd now $2.75M"
    if (preg_match('/Alert for (.+?):\s*(.+)/', $subject, $m)) {
        $data['client_name'] = trim($m[1]);
        $data['alert_description'] = trim($m[2]);
    }

    // Subject: "New listings to approve for "My Saved Search""
    if (preg_match('/New listings to approve for "(.+?)"/', $subject, $m)) {
        $data['search_name'] = $m[1];
    }

    // Client name from body: "We've notified {Client Name} about"
    // Or: "There are N matches for {Client Name} that need your approval"
    if (preg_match('/(?:notified|matches for)\s+(.+?)\s+(?:about|that need)/', $body, $m)) {
        $data['client_name'] = trim($m[1]);
    }

    // Property listings in body — highly structured:
    // "• Price Drop" or "• New Listing" or "• Back on Market"
    // "$2,745,000"
    // "4 Beds | 2.5 Baths | 2,503 Sqft"
    // "1043 Mission Rd"
    // "Pebble Beach, CA, 93953"
    // "168 days on site" or "New on site"

    $properties = [];
    // Split body into property blocks by looking for price patterns
    $blocks = preg_split('/•\s*(New Listing|Price Drop|Back on Market|Price Increase|Status Change)/', $body, -1, PREG_SPLIT_DELIM_CAPTURE);

    for ($i = 1; $i < count($blocks); $i += 2) {
        $prop = [];
        $prop['alert_type'] = trim($blocks[$i]); // "Price Drop", "New Listing", etc.
        $block = $blocks[$i + 1] ?? '';

        // Price: "$2,745,000" or "$9,995,000" or "$2.75M"
        if (preg_match('/\$([\d,]+(?:\.\d+)?(?:M)?)/', $block, $pm)) {
            $price_str = str_replace(',', '', $pm[1]);
            if (strpos($price_str, 'M') !== false) {
                $prop['price'] = (float) str_replace('M', '', $price_str) * 1000000;
            } else {
                $prop['price'] = (float) $price_str;
            }
        }

        // Beds/Baths/Sqft: "4 Beds | 2.5 Baths | 2,503 Sqft"
        if (preg_match('/(\d+)\s*Beds?\s*\|\s*([\d.]+)\s*Baths?\s*\|\s*([\d,]+)\s*Sqft/', $block, $pm)) {
            $prop['beds'] = (int) $pm[1];
            $prop['baths'] = (float) $pm[2];
            $prop['sqft'] = (int) str_replace(',', '', $pm[3]);
        }

        // Address: line before "City, CA, ZIP"
        if (preg_match('/\n\s*(.+?)\n\s*([A-Za-z\s]+),\s*CA,?\s*(\d{5})/', $block, $pm)) {
            $prop['address'] = trim($pm[1]);
            $prop['city'] = trim($pm[2]);
            $prop['zip'] = $pm[3];
        }

        // Days on site: "168 days on site" or "New on site"
        if (preg_match('/(\d+)\s*days on site/', $block, $pm)) {
            $prop['days_on_site'] = (int) $pm[1];
        } elseif (strpos($block, 'New on site') !== false) {
            $prop['days_on_site'] = 0;
        }

        if (!empty($prop['address'])) {
            $properties[] = $prop;
        }
    }

    $data['properties'] = $properties;
    $data['alert_date'] = $date;

    return $data;
}
```

**What This Recovers That CSV Can't — The Critical Gap:**

| Data Point | In CSV? | In Email? | Value |
|---|---|---|---|
| Client ↔ property interest mapping | **No** | **Yes** | Know which client is interested in which properties |
| Saved search names | **No** | **Yes** | "My Saved Search" — can reconstruct search intent |
| Property match criteria (implicit) | **No** | **Yes** | Price range, city, beds/baths inferred from matched properties |
| Alert types (new listing, price drop, back on market) | **No** | **Yes** | Client engagement trigger types |
| Historical property data at time of alert | **No** | **Yes** | Price, days on site, status at the moment client was notified |
| Which listings needed agent approval | **No** | **Yes** | Shows agent curation behavior |

**Reverse-Engineering Saved Searches:**

This is the real payoff. By analyzing all RealScout alert emails for a given client, you can reconstruct their implicit search criteria:

```php
function inferSearchCriteria($client_name, $alert_emails) {
    $properties = [];
    foreach ($alert_emails as $email) {
        $parsed = parseRealScoutEmail($email['subject'], $email['body'], $email['date']);
        foreach ($parsed['properties'] as $prop) {
            $properties[] = $prop;
        }
    }

    if (empty($properties)) return null;

    // Infer search criteria from the properties that were matched
    $criteria = [];

    // Price range: min/max of matched properties (with 10% buffer)
    $prices = array_column($properties, 'price');
    $criteria['min_price'] = floor(min($prices) * 0.9);
    $criteria['max_price'] = ceil(max($prices) * 1.1);

    // Beds: min of matched properties
    $beds = array_filter(array_column($properties, 'beds'));
    if (!empty($beds)) {
        $criteria['min_beds'] = min($beds);
    }

    // Baths: min of matched properties
    $baths = array_filter(array_column($properties, 'baths'));
    if (!empty($baths)) {
        $criteria['min_baths'] = min($baths);
    }

    // Cities: all unique cities seen
    $cities = array_unique(array_column($properties, 'city'));
    $criteria['cities'] = $cities;

    // Sqft range
    $sqfts = array_filter(array_column($properties, 'sqft'));
    if (!empty($sqfts)) {
        $criteria['min_sqft'] = floor(min($sqfts) * 0.85);
        $criteria['max_sqft'] = ceil(max($sqfts) * 1.15);
    }

    return $criteria;
}
```

**Example from Kean's actual data:**

From the emails we can already see:
- **Eric Cabral** — interested in Pebble Beach properties, $2.75M–$4.95M range, price drops and back-on-market alerts
- **Renjie Yuan** — interested in Pebble Beach, $4.95M–$10M range, new listings
- **Vishal Khadake** — active buyer with pending approvals, Pebble Beach area
- **Nathan Olivas** — active buyer with pending approvals
- **Zak Bagans** — active buyer with pending approvals

That's five active clients with city preferences, price ranges, and engagement patterns — **none of which would be in the CSV export.**

**Fogbreak Target Tables:**

| Extracted Field | Target |
|---|---|
| Client name | `clients.name` → match to existing client or create new |
| Properties shown to client | `property_reactions` (type: 'alerted') |
| Alert type (price drop, new listing, etc.) | `property_alerts.alert_type` |
| Property details (price, beds, baths, sqft) | `properties` (create or update) |
| Inferred search criteria | `saved_searches` (auto-generated from analysis) |
| Alert date | `property_alerts.sent_at` |

---

### Email Scraper Implementation

Add to `app/api/migrate_import.php`:

```php
// Additional endpoints for email scraper
POST   ?action=scan_gmail&platform={platform}    // Search Gmail for platform emails
GET    ?action=email_scan_preview&id={id}        // Preview what was found
POST   ?action=import_from_email&id={id}         // Import parsed email data
```

**Gmail Search Queries (per platform):**

```php
$gmail_queries = [
    'showingtime' => [
        'from:callcenter@showingtime.com subject:"SHOWING CONFIRMED"',
        'from:callcenter@showingtime.com subject:"CANCELLED"',
        'from:callcenter@showingtime.com subject:"FEEDBACK REQUESTED"',
        'from:callcenter@showingtime.com subject:"FEEDBACK RECEIVED"',
    ],
    'paperless_pipeline' => [
        'from:help@paperlesspipeline.com',
        // All PP emails follow the same template — no need for subject filters
    ],
    'realscout' => [
        'from:no-reply@agentmail.realscout.com subject:"Alert for"',
        'from:no-reply@agentmail.realscout.com subject:"New listings to approve"',
    ],
];
```

**Consent & Security:**

- **Explicit user consent required.** Migration wizard includes a checkbox: "I authorize Fogbreak to scan my Gmail inbox for emails from [Platform] to extract transaction and showing data for import."
- **Read-only.** Emails are parsed, not modified or deleted.
- **Scoped search.** Only emails matching known platform sender addresses are accessed. No broad inbox scanning.
- **No email content stored raw.** Only the structured extracted fields are saved. The original email body is not persisted.
- **One-time scan.** Not ongoing monitoring. User initiates the scan, reviews results, approves import.

### Email Scraper Testing Checklist

- [ ] ShowingTime parser: correctly extracts address, MLS#, price, agents, date/time from confirmed/cancelled/feedback emails
- [ ] Paperless Pipeline parser: correctly extracts address, transaction type, task name, TC notes, CC parties
- [ ] RealScout parser: correctly extracts client name, property details, alert type, inferred search criteria
- [ ] Search criteria inference: produces reasonable min/max ranges from 5+ alert emails per client
- [ ] Dedup: email-imported showings don't duplicate CSV-imported showings
- [ ] Dedup: email-imported clients match existing CRM contacts by name/email
- [ ] Gmail consent flow: user must explicitly approve before scan begins
- [ ] Large inbox: handles 1,000+ platform emails without timeout
- [ ] Attachments: PP email attachments (timeline.pdf, etc.) are saved to documents table

### Updated File Manifest

| File | Purpose | Lines (est.) |
|---|---|---|
| `app/api/migrate_import.php` | Migration engine (add email scraper endpoints) | +600 |
| `app/js/migration-wizard.js` | Frontend (add email scan UI, consent flow, preview) | +300 |
| `app/api/email_parsers/showingtime.php` | ShowingTime email parser | ~200 |
| `app/api/email_parsers/paperless_pipeline.php` | Paperless Pipeline email parser | ~150 |
| `app/api/email_parsers/realscout.php` | RealScout email parser + search criteria inference | ~300 |

---

## What This Unlocks

With the full migration engine — API imports, CSV mapping, and Gmail email scraping — Fogbreak's onboarding story becomes:

> "Connect your Follow Up Boss API key and your DocuSign account. Upload your ShowingTime and Paperless Pipeline exports. Then let us scan your Gmail — we'll reconstruct your showing history from ShowingTime confirmations, your transaction timeline from Paperless Pipeline notes, and your buyers' search preferences from RealScout alerts. Every contact, every deal, every signed document, every showing, every buyer preference. Nothing gets left behind."

The email scraper is the differentiator. Every competing platform migration tool stops at "upload a CSV." Fogbreak goes further — it mines the data trail that these platforms left in the agent's inbox. For platforms with no API (Paperless Pipeline) or APIs that don't export behavioral data (RealScout), email is the only recovery path.

That's the difference between a product demo and a product adoption.
