# 24 — Contact Data Enrichment Engine

**Status:** Design phase | **Priority:** High (feeds lead scoring, deal detection, hidden deals) | **Dependencies:** 00 (MASTER-PLAN), 03 (SELF-HOSTED-AI), 06 (Sisu replacement)

**Completion Target:** 6 weeks | **Effort:** 280 hours | **Owner:** Lead scoring + CRM lifecycle integration

---

## OVERVIEW

The Contact Data Enrichment Engine automatically surfaces hidden deals by continuously enriching CRM contacts with external data signals: life events (job changes, home purchases, marriages, business formation), net worth indicators (property ownership, LinkedIn seniority), social media activity, and demographic trends. The design is inspired by Luxury Presence's Presence® CRM, which reportedly processes 15 billion data points across 700 million interactions to identify high-probability buyer/seller windows.

**What It Replaces:** Luxury Presence's background enrichment (historical + real-time), RealScout's life event detection, Sisu's hidden deal scoring.

**What It Adds:**
- Real-time life event detection (job change, home sale, marriage, business formation)
- Net worth range estimation (from property ownership + LinkedIn + business registrations)
- "Deal probability score" (0-100) — surfaces contacts most likely to buy/sell within 90 days
- Enrichment pipeline with configurable data sources (county assessor, LinkedIn, Facebook, business registrations, Clearbit/People Data Labs)
- Privacy-first architecture (CCPA/GDPR compliant data retention, per-contact opt-out, full deletion)
- Integration with lead scoring (js/lead-scoring.js), daily action feed (js/daily-action-feed.js), drip enrollment

**Why It Matters:** Real estate is driven by life events. Job changes (career advancement often means relocation), home sales (downsizing, typically by older or empty-nest owners), business formation (a successful startup often leads to an upgrade), and marriages (new couples tend to buy together) each open a window in which a contact is most likely to transact. By detecting these events, Fogbreak can notify agents before competitors do. Deal probability scoring moves leads from "maybe interested" to "statistically about to move."

---

## DATABASE SCHEMA

### contact_enrichment

```sql
CREATE TABLE contact_enrichment (
  id INT AUTO_INCREMENT PRIMARY KEY,
  tenant_id INT,
  client_id INT NOT NULL,
  enriched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  source_data JSON,  -- { "linkedin": {...}, "property_records": {...}, "business": {...}, "social": {...} }
  net_worth_estimate_low DECIMAL(12, 2),  -- e.g., 500000
  net_worth_estimate_high DECIMAL(12, 2),  -- e.g., 1500000
  net_worth_confidence INT,  -- 0-100, confidence in estimate
  estimated_income_range VARCHAR(50),  -- "75k-150k", "150k-500k", "500k-1m", etc.
  last_enriched_at TIMESTAMP NULL,
  next_enrichment_scheduled TIMESTAMP NULL,
  enrichment_status ENUM('pending', 'processing', 'complete', 'failed') DEFAULT 'pending',
  privacy_opt_out BOOLEAN DEFAULT FALSE,
  data_retention_expires_at TIMESTAMP NULL,  -- CCPA/GDPR expiry
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  FOREIGN KEY (tenant_id) REFERENCES tenants(id),
  FOREIGN KEY (client_id) REFERENCES clients(id),
  INDEX (tenant_id, client_id),
  INDEX (enrichment_status),
  INDEX (last_enriched_at)
);
```

### enrichment_events

```sql
CREATE TABLE enrichment_events (
  id INT AUTO_INCREMENT PRIMARY KEY,
  tenant_id INT,
  client_id INT NOT NULL,
  event_type ENUM('job_change', 'home_purchase', 'home_sale', 'business_formation', 'marriage', 'divorce', 'relocation', 'promotion', 'business_acquisition', 'patent_filed'),
  event_date DATE,  -- When the event occurred (may be retroactive)
  detected_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- When Fogbreak detected it
  source VARCHAR(100),  -- 'linkedin_job_change', 'county_assessor', 'business_records', 'facebook_announcement', etc.
  event_data JSON,  -- { "job_title": "VP Engineering", "company": "Google", "location": "Mountain View", "salary_range": "200k-300k", ... }
  confidence INT,  -- 0-100, confidence in detection
  action_taken VARCHAR(50),  -- 'none', 'drip_enrollment', 'daily_feed', 'task_created', 'notification_sent'
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  FOREIGN KEY (tenant_id) REFERENCES tenants(id),
  FOREIGN KEY (client_id) REFERENCES clients(id),
  INDEX (tenant_id, event_type),
  INDEX (detected_at),
  INDEX (event_date)
);
```

### enrichment_sources

```sql
CREATE TABLE enrichment_sources (
  id INT AUTO_INCREMENT PRIMARY KEY,
  tenant_id INT,
  source_type ENUM('county_assessor', 'linkedin', 'facebook', 'business_records', 'clearbit', 'people_data_labs', 'zillow_comps', 'custom_api'),
  is_enabled BOOLEAN DEFAULT TRUE,
  api_key VARCHAR(1000),  -- Stored encrypted; the encryption key is loaded from an env var via config.php
  api_endpoint VARCHAR(500),
  rate_limit_per_day INT,  -- e.g., 1000 for Clearbit free tier
  cost_per_request DECIMAL(6, 3),  -- Cost to enrich one contact
  monthly_budget DECIMAL(10, 2),
  monthly_spend DECIMAL(10, 2) DEFAULT 0,
  last_sync TIMESTAMP NULL,
  sync_frequency ENUM('hourly', 'daily', 'weekly', 'monthly'),
  documentation_url VARCHAR(500),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  FOREIGN KEY (tenant_id) REFERENCES tenants(id),
  INDEX (tenant_id, source_type)
);
```
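
The `rate_limit_per_day`, `cost_per_request`, `monthly_budget`, and `monthly_spend` columns imply a guard that runs before every outbound call. A minimal sketch of such a check, in JavaScript for illustration only (the production check would presumably live in api/enrichment.php). `canCallSource` and the `requestsToday` counter are hypothetical; the schema above does not store a per-day request count.

```javascript
// Hypothetical guard: does one more request fit this source's limits?
// `source` mirrors a row of enrichment_sources.
function canCallSource(source, requestsToday) {
  if (!source.is_enabled) return { allowed: false, reason: "disabled" };
  if (source.rate_limit_per_day != null && requestsToday >= source.rate_limit_per_day) {
    return { allowed: false, reason: "rate_limit" };
  }
  // Compare in thousandths of a dollar to avoid floating-point drift;
  // cost_per_request is DECIMAL(6, 3), so three decimal places are enough.
  const toMills = (amount) => Math.round(Number(amount) * 1000);
  const projected = toMills(source.monthly_spend) + toMills(source.cost_per_request);
  if (source.monthly_budget != null && projected > toMills(source.monthly_budget)) {
    return { allowed: false, reason: "budget" };
  }
  return { allowed: true, reason: null };
}
```

Returning a reason alongside the boolean lets the caller record why a source was skipped, for example in `enrichment_queue.error_message`.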

### enrichment_queue

```sql
CREATE TABLE enrichment_queue (
  id INT AUTO_INCREMENT PRIMARY KEY,
  tenant_id INT,
  client_id INT NOT NULL,
  priority INT DEFAULT 50,  -- 0-100, higher = sooner. High-value contacts get 90+, new leads get 70
  job_type ENUM('initial_enrichment', 'refresh_enrichment', 'event_detection', 'net_worth_update'),
  requested_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  started_at TIMESTAMP NULL,
  completed_at TIMESTAMP NULL,
  status ENUM('pending', 'processing', 'complete', 'failed') DEFAULT 'pending',
  error_message VARCHAR(500),
  retry_count INT DEFAULT 0,
  max_retries INT DEFAULT 3,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  FOREIGN KEY (tenant_id) REFERENCES tenants(id),
  FOREIGN KEY (client_id) REFERENCES clients(id),
  INDEX (tenant_id, status),
  INDEX (priority DESC, requested_at ASC)
);
```
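
The `(priority DESC, requested_at ASC)` index reflects how workers are expected to pull jobs: highest priority first, oldest request first within a priority. A small in-memory sketch of that ordering, assuming jobs are plain objects shaped like `enrichment_queue` rows; `nextJobs` is a hypothetical name.

```javascript
// Select the next batch of pending jobs: priority DESC, then requested_at ASC.
// filter() returns a new array, so the caller's queue is not reordered.
function nextJobs(queue, limit) {
  return queue
    .filter((job) => job.status === "pending")
    .sort((a, b) =>
      b.priority - a.priority ||
      Date.parse(a.requested_at) - Date.parse(b.requested_at))
    .slice(0, limit);
}
```

In SQL the same selection would be a `WHERE status = 'pending' ORDER BY priority DESC, requested_at ASC LIMIT ?` query against this table.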

### deal_probability_scores

```sql
CREATE TABLE deal_probability_scores (
  id INT AUTO_INCREMENT PRIMARY KEY,
  tenant_id INT,
  client_id INT NOT NULL,
  score INT,  -- 0-100
  confidence INT,  -- 0-100, how confident in this score
  deal_type ENUM('buyer', 'seller', 'investor', 'upsell', 'downsize'),
  recalculated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  factors JSON,  -- { "life_events_weight": 35, "engagement_weight": 25, "property_ownership_weight": 20, "demographic_weight": 15, "behavior_weight": 5, "total": 72 }
  days_to_likely_transaction INT,  -- Estimated days until transaction (0-365)
  notification_sent BOOLEAN DEFAULT FALSE,
  notification_sent_at TIMESTAMP NULL,
  agent_assigned VARCHAR(100),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  FOREIGN KEY (tenant_id) REFERENCES tenants(id),
  FOREIGN KEY (client_id) REFERENCES clients(id),
  INDEX (tenant_id, score DESC),
  INDEX (deal_type),
  INDEX (notification_sent)
);
```

---

## API ENDPOINTS (api/enrichment.php)

### enrichContact

**POST /api/enrichment.php?action=enrichContact**

Immediately enrich a single contact. If privacy_opt_out = true, skip enrichment. Return enrichment_events and updated net_worth_estimate.

```json
REQUEST:
{
  "client_id": 12345,
  "force_refresh": false  // If true, re-enrich even if recently enriched
}

RESPONSE (success):
{
  "success": true,
  "data": {
    "client_id": 12345,
    "enrichment_status": "complete",
    "net_worth_estimate_low": 500000,
    "net_worth_estimate_high": 1500000,
    "net_worth_confidence": 78,
    "estimated_income_range": "150k-500k",
    "events_detected": [
      {
        "event_type": "job_change",
        "event_date": "2026-03-15",
        "source": "linkedin_job_change",
        "event_data": { "job_title": "VP Engineering", "company": "Google", "salary_range": "200k-300k" },
        "confidence": 95
      }
    ],
    "last_enriched_at": "2026-03-29T14:23:45Z",
    "next_enrichment_scheduled": "2026-04-05T14:23:45Z"
  }
}
```
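
The skip rules for enrichContact can be sketched as a single decision function. This is an illustration, not the endpoint's actual logic: `enrichmentDecision` is a hypothetical name, and treating "recently enriched" as "`next_enrichment_scheduled` is still in the future" is an assumption, since the spec does not define the freshness window.

```javascript
// Decide what enrichContact should do for one contact_enrichment row.
// Returns "enrich", "skip_opted_out", or "skip_recent".
function enrichmentDecision(record, { forceRefresh = false, now = new Date() } = {}) {
  // Opt-out wins over everything, including force_refresh.
  if (record && record.privacy_opt_out) return "skip_opted_out";
  // No row yet, or never enriched: always enrich.
  if (!record || !record.last_enriched_at) return "enrich";
  if (forceRefresh) return "enrich";
  // Assumption: a contact counts as recent until its next scheduled run is due.
  const nextDue = record.next_enrichment_scheduled
    ? new Date(record.next_enrichment_scheduled)
    : null;
  if (nextDue && nextDue > now) return "skip_recent";
  return "enrich";
}
```

Checking the opt-out flag first means `force_refresh` can never override a privacy opt-out.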

### getEnrichmentHistory

**GET /api/enrichment.php?action=getEnrichmentHistory&client_id=12345&limit=20**

Return paginated enrichment history (all enrichment_events for a contact). Order by detected_at DESC.

```json
RESPONSE (success):
{
  "success": true,
  "data": {
    "client_id": 12345,
    "total_events": 8,
    "events": [
      { "id": 1, "event_type": "job_change", "detected_at": "2026-03-15T10:00:00Z", "event_date": "2026-03-12", ... },
      ...
    ],
    "earliest_event_date": "2025-11-03",
    "latest_event_date": "2026-03-15"
  }
}
```

### getEnrichmentEvents

**GET /api/enrichment.php?action=getEnrichmentEvents&event_type=job_change&days=30**

Return all enrichment_events of a given type, across all contacts, detected within the last `days` days. Used for bulk drip enrollment or notifications.

```json
RESPONSE (success):
{
  "success": true,
  "data": {
    "event_type": "job_change",
    "period_days": 30,
    "count": 23,
    "events": [
      { "client_id": 101, "event_type": "job_change", "confidence": 95, "detected_at": "2026-03-20T..." },
      ...
    ]
  }
}
```

### getDealProbabilities

**GET /api/enrichment.php?action=getDealProbabilities&sort=score&limit=50&min_score=60**

Return top deal probability scores. Used for daily action feed and "hidden deals" dashboard widget.

```json
RESPONSE (success):
{
  "success": true,
  "data": {
    "scores": [
      {
        "client_id": 456,
        "client_name": "John Doe",
        "score": 89,
        "confidence": 87,
        "deal_type": "seller",
        "days_to_likely_transaction": 14,
        "factors": { "life_events_weight": 35, "engagement_weight": 25, ... },
        "notification_sent": false,
        "top_event": { "event_type": "home_sale", "detected_at": "2026-03-20T..." }
      },
      ...
    ],
    "total_count": 147
  }
}
```
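
The `min_score`, `sort=score`, and `limit` parameters map onto a filter, a sort, and a slice. A sketch over in-memory rows, assuming `min_score` is inclusive and `total_count` counts every matching row before the limit is applied (the spec states neither); `selectTopDeals` is a hypothetical name.

```javascript
// Filter by minimum score, sort by score descending, then apply the limit.
function selectTopDeals(rows, { minScore = 0, limit = 50 } = {}) {
  const matching = rows
    .filter((row) => row.score >= minScore)
    .sort((a, b) => b.score - a.score);
  return { scores: matching.slice(0, limit), total_count: matching.length };
}
```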

### configureEnrichmentSources

**POST /api/enrichment.php?action=configureEnrichmentSources**

Enable/disable data sources. Set API keys, rate limits, budgets. Admin only.

```json
REQUEST:
{
  "sources": [
    { "source_type": "linkedin", "is_enabled": true, "api_key": "***", "monthly_budget": 500 },
    { "source_type": "clearbit", "is_enabled": true, "api_key": "***", "monthly_budget": 200 }
  ]
}

RESPONSE (success):
{
  "success": true,
  "data": {
    "sources_configured": 2,
    "total_monthly_budget": 700,
    "monthly_spend": 0
  }
}
```

### triggerBulkEnrichment

**POST /api/enrichment.php?action=triggerBulkEnrichment**

Enqueue contacts for enrichment. If `client_ids` is omitted, enqueue all stale contacts (contacts not enriched in >30 days, or high-value contacts not enriched in >7 days). Admin only.

```json
REQUEST:
{
  "client_ids": [101, 102, 103],  // Optional; if omitted, enqueue all stale
  "priority": 70
}

RESPONSE (success):
{
  "success": true,
  "data": {
    "queued": 47,
    "already_processing": 3,
    "skipped_opted_out": 2
  }
}
```
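
The three counters in the response correspond to a three-way partition of the requested contacts. A sketch of that partition, assuming `contacts` is a `Map` keyed by client_id and `activeClientIds` is a `Set` of contacts that already have a pending or processing job; both inputs and the name `planBulkEnqueue` are hypothetical.

```javascript
// Partition requested client_ids into the three buckets the response reports.
function planBulkEnqueue(clientIds, contacts, activeClientIds) {
  const result = { queued: [], already_processing: [], skipped_opted_out: [] };
  for (const id of clientIds) {
    const contact = contacts.get(id);
    if (!contact) continue; // assumption: unknown ids are silently ignored
    if (contact.privacy_opt_out) result.skipped_opted_out.push(id);
    else if (activeClientIds.has(id)) result.already_processing.push(id);
    else result.queued.push(id);
  }
  return result;
}
```

The response counters are then the lengths of the three arrays.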

### deleteEnrichmentData

**POST /api/enrichment.php?action=deleteEnrichmentData**

GDPR/CCPA compliance. Delete all enrichment data for a contact. Irreversible.

```json
REQUEST:
{
  "client_id": 12345,
  "reason": "CCPA right-to-delete"
}

RESPONSE (success):
{
  "success": true,
  "data": {
    "client_id": 12345,
    "deleted_records": 23,
    "deleted_at": "2026-03-29T14:30:00Z"
  }
}
```
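
Deletion has to reach every enrichment table while leaving the client record and ordinary CRM interactions alone. A sketch against an in-memory stand-in for the database, assuming the four per-contact tables defined in this document are the full set; `db` as an object of arrays is purely illustrative.

```javascript
// Assumption: these are the only tables holding per-contact enrichment data.
const ENRICHMENT_TABLES = [
  "contact_enrichment",
  "enrichment_events",
  "enrichment_queue",
  "deal_probability_scores",
];

// Remove all enrichment rows for one contact and report how many were deleted.
function deleteEnrichmentData(db, clientId, now = new Date()) {
  let deleted = 0;
  for (const table of ENRICHMENT_TABLES) {
    const rows = db[table] || [];
    const kept = rows.filter((row) => row.client_id !== clientId);
    deleted += rows.length - kept.length;
    db[table] = kept;
  }
  return {
    client_id: clientId,
    deleted_records: deleted,
    // Drop milliseconds to match the timestamp format used in the examples.
    deleted_at: now.toISOString().replace(/\.\d{3}Z$/, "Z"),
  };
}
```

In the real module the deletes would run inside one transaction so a partial failure cannot leave orphaned enrichment rows.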

### getEnrichmentStats

**GET /api/enrichment.php?action=getEnrichmentStats**

Dashboard metrics. Total contacts enriched, events detected this month, deal probability distribution, source breakdown, spend.

```json
RESPONSE (success):
{
  "success": true,
  "data": {
    "total_enriched": 523,
    "total_opted_out": 12,
    "events_this_month": 47,
    "event_breakdown": { "job_change": 18, "home_sale": 12, "business_formation": 8, ... },
    "avg_deal_probability_score": 58,
    "deals_scored_above_75": 34,
    "source_breakdown": { "linkedin": 234, "county_assessor": 178, "facebook": 89, ... },
    "monthly_spend": 342.50,
    "monthly_budget": 800,
    "next_budget_reset": "2026-04-01T00:00:00Z"
  }
}
```

---

## IMPLEMENTATION STEPS

### Phase 1: Infrastructure (Weeks 1-2)

1. **Create database tables** — contact_enrichment, enrichment_events, enrichment_sources, enrichment_queue, deal_probability_scores.
2. **Build enrichment.php API module** — auth, routing, standard response pattern.
3. **Implement enrichment queue system** — priority-based job queue, fetched by cron.
4. **Add config.php support** — store API keys for LinkedIn, Clearbit, county assessor APIs. Use env vars for secrets.

### Phase 2: Data Source Integration (Weeks 2-4)

1. **LinkedIn API integration** (LinkedIn API v2):
   - Job history enrichment (job_change detection)
   - Profile scraping (title, seniority, employment history)
   - Public URL monitoring
   - Rate limit: 100 calls/day (free tier), 10,000/day (premium); planning assumptions, to be confirmed against LinkedIn's current API access terms
   - Store result in contact_enrichment.source_data['linkedin']

2. **County Assessor API integration** (varies by county; use Zillow or ATTOM as fallback):
   - Property ownership lookup by address + name
   - Sale price, acquisition date, square footage
   - Detect home_purchase and home_sale events
   - Store in contact_enrichment.source_data['property_records']
   - Fire home_purchase event if property acquired in last 60 days

3. **Business Records integration** (Secretary of State, business databases):
   - Business formation/acquisition detection
   - Ownership stake, business type, filing date
   - Fire business_formation event
   - Store in contact_enrichment.source_data['business']

4. **Clearbit or People Data Labs API** (demographic/income estimation):
   - Income range estimation
   - Net worth proxy (from income + property data)
   - Industry, company size, employment tenure
   - Store in contact_enrichment and update net_worth_estimate fields

5. **Social media monitoring** (Facebook API + Instagram Graph API):
   - Life event announcements (marriage, engagement, relocation)
   - Post frequency monitoring (engagement indicator)
   - Store in contact_enrichment.source_data['social']

### Phase 3: Life Event Detection (Weeks 3-5)

1. **Build event_type detection logic** — each data source maps to 1+ event types:
   - LinkedIn job_change → job_change event (confidence = 90)
   - County assessor purchase → home_purchase event (confidence = 100)
   - Facebook marriage announcement → marriage event (confidence = 100)
   - Business formation filing → business_formation event (confidence = 100)

2. **Implement confidence scoring**:
   - Official source (county assessor) = 100
   - Secondary source (LinkedIn API) = 90
   - Inferred source (Clearbit) = 70
   - ML inference (pattern detection) = 50

3. **Create enrichment_events** for each detected event. Store source, confidence, event_data.

4. **Handle duplicate detection** — don't fire home_purchase twice for same property. Check existing events within 90 days.
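
The confidence tiers and the 90-day duplicate check above can be sketched together. Assumptions: the source keys `clearbit` and `ml_inference` are invented for illustration (the spec only names `linkedin_job_change`, `county_assessor`, `business_records`, and `facebook_announcement`), the 90-day window is measured on `event_date`, and `dedupe_key` is a hypothetical field identifying the subject of the event, such as a property address.

```javascript
// Confidence per source, following the mappings listed above.
const CONFIDENCE_BY_SOURCE = new Map([
  ["county_assessor", 100],
  ["business_records", 100],
  ["facebook_announcement", 100],
  ["linkedin_job_change", 90],
  ["clearbit", 70],      // inferred source (key name is an assumption)
  ["ml_inference", 50],  // pattern detection (key name is an assumption)
]);

function confidenceFor(source) {
  return CONFIDENCE_BY_SOURCE.get(source) ?? null; // null for unknown sources
}

const DAY_MS = 24 * 60 * 60 * 1000;

// True if an equivalent event already exists within the window.
function isDuplicate(candidate, existingEvents, windowDays = 90) {
  const candidateTime = Date.parse(candidate.event_date);
  return existingEvents.some((event) =>
    event.client_id === candidate.client_id &&
    event.event_type === candidate.event_type &&
    event.dedupe_key === candidate.dedupe_key &&
    Math.abs(Date.parse(event.event_date) - candidateTime) <= windowDays * DAY_MS);
}
```

Matching on a subject key as well as the event type keeps two genuine purchases of different properties from collapsing into one event.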

### Phase 4: Net Worth Estimation (Weeks 4-5)

1. **Build net_worth_estimate algorithm**:
   - Property ownership value (from a Zillow Zestimate or an ATTOM automated valuation)
   - Income range (from Clearbit)
   - Business equity (if founder, from business records)
   - Weighted average with confidence scoring

2. **Store as range (low/high)**, not exact number. Never expose exact estimate to contact.

3. **Create estimated_income_range buckets**: "<50k", "50k-75k", "75k-150k", "150k-500k", "500k-1m", "1m+" (non-overlapping, so each income maps to exactly one bucket).
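
Bucket assignment is a lookup against ordered upper bounds. A sketch, assuming the top of the range is split at 1m so that no two buckets overlap, and that each bucket includes its lower bound and excludes its upper bound; `incomeBucket` is a hypothetical name.

```javascript
// Ordered buckets; each covers [previous upper, upper).
const INCOME_BUCKETS = [
  { label: "<50k", upper: 50000 },
  { label: "50k-75k", upper: 75000 },
  { label: "75k-150k", upper: 150000 },
  { label: "150k-500k", upper: 500000 },
  { label: "500k-1m", upper: 1000000 },
  { label: "1m+", upper: Infinity },
];

// Map an estimated annual income in dollars to its bucket label.
function incomeBucket(annualIncome) {
  if (!Number.isFinite(annualIncome) || annualIncome < 0) return null;
  return INCOME_BUCKETS.find((bucket) => annualIncome < bucket.upper).label;
}
```

Returning `null` for missing or invalid input keeps `estimated_income_range` empty rather than defaulting a contact into the lowest bucket.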

### Phase 5: Deal Probability Scoring (Week 5)

1. **Build scoring algorithm** — weighted factors:
   - Life events (35%) — job_change, home_sale, marriage, business_formation each +15-20 points
   - CRM engagement (25%) — email open rate, click rate, interaction frequency, property views
   - Property ownership (20%) — does contact own real estate already? Location relevant?
   - Demographic fit (15%) — income range, age (if available), family status
   - Behavior pattern (5%) — time on site, form fills, saved search activity
   - Formula: score = (life_events_pct * 0.35) + (engagement_pct * 0.25) + (ownership_pct * 0.20) + (demographic_pct * 0.15) + (behavior_pct * 0.05)

2. **Detect deal_type** (buyer/seller/investor/upsell/downsize):
   - If home_sale event detected + high engagement = likely seller
   - If job_change + relocation + income_increase = likely buyer
   - If business_formation + high_net_worth = likely investor
   - Default to buyer if ambiguous

3. **Estimate days_to_likely_transaction** — based on event recency + engagement velocity:
   - Home sale detected 7 days ago + high CRM engagement = 14-30 days
   - Job change detected 3 days ago = 60-90 days (relocation planning window)
   - Engagement ramping up (exponential view/click increase) = 30-60 days

4. **Write to deal_probability_scores**. Recalculated weekly, or immediately after new life event.
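
The weighted formula and the deal_type rules above can be sketched as two small functions. Each component is assumed to be a 0-100 percentage that has already been computed elsewhere; the `signals` shape (`event_types`, `high_engagement`, `income_increase`, `high_net_worth`) is hypothetical, and only the three rules spelled out above are implemented.

```javascript
// Weights from the scoring step, kept as integers so the arithmetic is exact.
const WEIGHTS = { life_events: 35, engagement: 25, ownership: 20, demographic: 15, behavior: 5 };

// score = sum(weight * component_pct) / 100, with each component clamped to 0-100.
function dealProbabilityScore(components) {
  let weighted = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const pct = Math.min(100, Math.max(0, Number(components[name]) || 0));
    weighted += weight * pct;
  }
  return Math.round(weighted / 100);
}

// Apply the documented rules in order; default to buyer when ambiguous.
function detectDealType(signals) {
  const events = new Set(signals.event_types);
  if (events.has("home_sale") && signals.high_engagement) return "seller";
  if (events.has("job_change") && events.has("relocation") && signals.income_increase) return "buyer";
  if (events.has("business_formation") && signals.high_net_worth) return "investor";
  return "buyer";
}
```

For example, components of 80, 70, 60, 50, and 40 give (2800 + 1750 + 1200 + 750 + 200) / 100 = 67.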

### Phase 6: Integration with Existing Modules (Week 6)

1. **Feed into js/lead-scoring.js**:
   - Modify lead score calculation to incorporate deal_probability_score
   - High deal prob score (>75) boosts overall lead score by 25 points
   - Store in clients table as enrichment_score

2. **Feed into js/daily-action-feed.js**:
   - Surface deals with score >75 as "Hidden Deal" items (priority 90+)
   - Include top life event and estimated transaction window
   - Action: "Follow up — likely to transact in 14 days"

3. **Auto-enroll in drips**:
   - Home_purchase event → buyer nurture drip (if not already enrolled)
   - Home_sale event → seller nurture drip
   - Job_change + relocation → relocation nurture drip
   - Query drips.php and call enrollmentAction for auto-enrollment

4. **Create interaction records**:
   - Each life event creates an interaction record (type = 'enrichment_event') for activity history
   - Populated by cron job: update interactions table after each enrichment_event creation

5. **Generate admin notifications**:
   - Create tasks in tasks table when deal_probability_score >75
   - Task: "Follow up — contact likely to buy/sell (deal probability score 89)"
   - Due date = estimated transaction date
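
The integration rules above reduce to a threshold, a fixed boost, and an event-to-drip mapping. A sketch under stated assumptions: the drip identifiers (`buyer_nurture`, `seller_nurture`, `relocation_nurture`) and the feed item shape are hypothetical, and the boosted lead score is not capped because the spec does not say whether it should be.

```javascript
const HIDDEN_DEAL_THRESHOLD = 75;
const LEAD_SCORE_BOOST = 25;

// A deal probability score above the threshold adds a fixed boost.
function boostedLeadScore(baseLeadScore, dealScore) {
  return dealScore > HIDDEN_DEAL_THRESHOLD ? baseLeadScore + LEAD_SCORE_BOOST : baseLeadScore;
}

// Map detected event types to the drips a contact should be enrolled in.
function dripsForEvents(eventTypes) {
  const types = new Set(eventTypes);
  const drips = [];
  if (types.has("home_purchase")) drips.push("buyer_nurture");
  if (types.has("home_sale")) drips.push("seller_nurture");
  if (types.has("job_change") && types.has("relocation")) drips.push("relocation_nurture");
  return drips;
}

// Build a "Hidden Deal" feed item, or null if the score is not high enough.
function hiddenDealFeedItem(scoreRow) {
  if (scoreRow.score <= HIDDEN_DEAL_THRESHOLD) return null;
  return {
    type: "hidden_deal",
    priority: 90,
    client_id: scoreRow.client_id,
    top_event: scoreRow.top_event,
    action: `Follow up: likely to transact in ${scoreRow.days_to_likely_transaction} days`,
  };
}
```

The "if not already enrolled" check is left to the caller, since it depends on the existing drip_enrollments data.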

---

## CRON JOBS (api/cron.php)

### nightly_enrichment

**Runs:** 2 AM daily | **Duration:** ~30 minutes | **Auth:** API key

```php
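// Fetch next 100 pending jobs from enrichment_queue (ordered by priority DESC, requested_at ASC)
// For each job:
//   1. Set queue status to 'processing' and started_at = NOW()
//   2. Call enrichContact($client_id), which:
//        a. Skips the contact if privacy_opt_out is set
//        b. Queries each enabled source from enrichment_sources
//        c. Calls the respective API (LinkedIn, assessor, etc.)
//        d. Stores results in contact_enrichment
//        e. Detects life events and writes them to enrichment_events
//   3. On success: set queue status to 'complete' and completed_at = NOW()
//   4. On failure: store error_message and increment retry_count; set status to 'failed'
//      once retry_count reaches max_retries, otherwise return it to 'pending'
//   5. Log to cron_log
// Return summary: {queued: N, completed: M, failed: K, errors: [...]}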
```
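
The per-job loop, including the retry bookkeeping that `retry_count` and `max_retries` imply, can be sketched as follows. This is an in-memory illustration in JavaScript, not the PHP cron itself: `enrichFn` stands in for enrichContact, jobs are plain objects shaped like `enrichment_queue` rows, and returning a failed job to 'pending' until `max_retries` is reached is an assumption.

```javascript
// Process a batch of queue jobs and return the cron summary.
// `failed` counts attempts that threw during this run.
function runEnrichmentBatch(jobs, enrichFn, now = () => new Date()) {
  const summary = { queued: jobs.length, completed: 0, failed: 0, errors: [] };
  for (const job of jobs) {
    job.status = "processing";
    job.started_at = now();
    try {
      enrichFn(job.client_id);
      job.status = "complete";
      job.completed_at = now();
      summary.completed += 1;
    } catch (error) {
      job.retry_count += 1;
      job.error_message = String(error.message).slice(0, 500); // column is VARCHAR(500)
      // Assumption: retry on a later run until max_retries is exhausted.
      job.status = job.retry_count >= job.max_retries ? "failed" : "pending";
      summary.failed += 1;
      summary.errors.push({ client_id: job.client_id, message: job.error_message });
    }
  }
  return summary;
}
```

Catching errors per job keeps one failing source from aborting the rest of the batch.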

### deal_probability_calc

**Runs:** 3 AM daily | **Duration:** ~15 minutes | **Auth:** API key

```php
// For each contact with recent enrichment_events (last 7 days):
//   1. Recalculate deal_probability_score
//   2. Determine deal_type
//   3. Estimate days_to_likely_transaction
//   4. Write to deal_probability_scores
//   5. If score >75 AND notification_sent=false:
//        a. Create interaction record (type='enrichment_event')
//        b. Create task for agent
//        c. Set notification_sent=true, notification_sent_at=NOW()
// Return summary: {recalculated: N, new_deals_over_75: M}
```
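
Step 5 is an idempotent notification gate: a contact is flagged once, and later runs skip it because `notification_sent` is already true. A sketch, assuming the task due date is the run time plus `days_to_likely_transaction`; the returned `interaction` and `task` shapes and the task title wording are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const NOTIFY_THRESHOLD = 75;

// Returns the records to create for a newly surfaced hidden deal, or null.
// Mutates scoreRow so the same deal is not notified twice.
function notifyIfHiddenDeal(scoreRow, now = new Date()) {
  if (scoreRow.score <= NOTIFY_THRESHOLD || scoreRow.notification_sent) return null;
  const due = new Date(now.getTime() + scoreRow.days_to_likely_transaction * DAY_MS);
  scoreRow.notification_sent = true;
  scoreRow.notification_sent_at = now.toISOString();
  return {
    interaction: { client_id: scoreRow.client_id, type: "enrichment_event" },
    task: {
      client_id: scoreRow.client_id,
      title: `Follow up: contact likely to buy/sell (score ${scoreRow.score})`,
      due_date: due.toISOString().slice(0, 10), // YYYY-MM-DD in UTC
    },
  };
}
```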

### stale_contact_requeue

**Runs:** 6 AM Monday | **Duration:** ~5 minutes | **Auth:** API key

```php
// Find all contacts whose last_enriched_at is more than 30 days old
// Find high-value contacts whose last_enriched_at is more than 7 days old
// Enqueue for enrichment with priority=50 (low-value) or 80 (high-value)
// Respect privacy_opt_out flag; skip opted-out contacts
// Return summary: {requeued_low_value: N, requeued_high_value: M}
```
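
The staleness rules and priorities can be sketched as one pass over the contacts. Assumptions: `high_value` is a hypothetical boolean (the spec does not define what makes a contact high-value), and contacts that have never been enriched are skipped here on the assumption that `initial_enrichment` jobs cover them.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Build refresh jobs for stale contacts, plus the cron summary.
function planRequeue(contacts, now = new Date()) {
  const jobs = [];
  const summary = { requeued_low_value: 0, requeued_high_value: 0 };
  for (const contact of contacts) {
    if (contact.privacy_opt_out || !contact.last_enriched_at) continue;
    const ageDays = (now.getTime() - Date.parse(contact.last_enriched_at)) / DAY_MS;
    if (contact.high_value && ageDays > 7) {
      jobs.push({ client_id: contact.client_id, priority: 80, job_type: "refresh_enrichment" });
      summary.requeued_high_value += 1;
    } else if (!contact.high_value && ageDays > 30) {
      jobs.push({ client_id: contact.client_id, priority: 50, job_type: "refresh_enrichment" });
      summary.requeued_low_value += 1;
    }
  }
  return { jobs, summary };
}
```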

---

## TESTING CRITERIA

### Unit Tests

1. **enrichContact**:
   - Test with client_id that has no prior enrichment
   - Test with force_refresh=true (re-enriches even if recent)
   - Test with privacy_opt_out=true (should skip)
   - Test with no enabled sources (should return status 'failed')
   - Test response includes net_worth_estimate_low/high and all detected events

2. **Event detection**:
   - Verify job_change detected from LinkedIn data
   - Verify home_purchase detected from county assessor data
   - Verify business_formation detected from business records
   - Verify confidence scoring (100 for official sources, 90 for API, 70 for inferred)
   - Verify duplicate detection (same event within 90 days not re-created)

3. **Net worth estimation**:
   - Verify low/high range is wide enough to reflect estimate uncertainty (e.g., 500k-1.5m, not 500k-600k)
   - Verify confidence score reflects data sources used
   - Verify income_range bucket assignment

4. **Deal probability scoring**:
   - Verify weighted formula (35% life_events + 25% engagement + 20% ownership + 15% demographic + 5% behavior)
   - Verify score range 0-100
   - Verify deal_type assignment (buyer/seller/investor/upsell/downsize)
   - Verify days_to_likely_transaction is reasonable (0-365 range)
   - Test with contact that has multiple recent events (score should be high)
   - Test with contact with no events (score should be lower)

5. **Privacy & compliance**:
   - Verify deleteEnrichmentData removes all records
   - Verify privacy_opt_out flag prevents enrichment
   - Verify data_retention_expires_at is set and honored (data auto-deleted after expiry)

### Integration Tests

1. **Lead scoring integration**:
   - Enrich a contact
   - Verify enrichment_score appears in clients table
   - Verify lead_score calculation includes enrichment_score

2. **Daily action feed integration**:
   - Create a contact with deal_probability_score = 85
   - Verify it appears in js/daily-action-feed.js as "Hidden Deal" (priority 90+)
   - Verify top life event is displayed

3. **Drip enrollment**:
   - Create enrichment_event of type 'home_sale'
   - Verify contact auto-enrolled in seller nurture drip
   - Verify drip_enrollments record created

4. **Cron execution**:
   - Queue 10 contacts for enrichment
   - Run nightly_enrichment cron
   - Verify all 10 contacts enriched
   - Verify contact_enrichment and enrichment_events populated
   - Verify cron_log has entry with summary

### End-to-End Tests

1. **Real contact enrichment flow**:
   - Create new contact with email only
   - Trigger enrichContact
   - Verify LinkedIn, property records, business records queried
   - Verify life events detected
   - Verify net_worth_estimate populated
   - Verify deal_probability_score calculated
   - Verify interaction records created
   - Verify daily action feed updated

2. **Hidden deal notification**:
   - Enrich contact that triggers deal_probability_score = 87
   - Verify agent notification sent (via task creation)
   - Verify notification_sent_at timestamp set
   - Verify follow-up drip enrolled

3. **Privacy deletion**:
   - Enrich contact thoroughly (LinkedIn, property, business data)
   - Call deleteEnrichmentData
   - Verify all enrichment data deleted
   - Verify client record still exists
   - Verify CRM interactions preserved (only enrichment data removed)

### Performance Tests

1. **Enrichment speed**: Single contact enrichment should complete in <5 seconds (with API rate limits)
2. **Bulk enrichment**: 1,000 contacts queued should process in <30 minutes (respecting API rate limits)
3. **Scoring calculation**: Recalculate 5,000 deal probability scores in <2 minutes
4. **Query performance**: getDealProbabilities (limit 50) should return in <500ms

### API Security Tests

1. **Authentication**: All endpoints require valid session or API key
2. **Tenant isolation**: Contact enrichment only visible to owner tenant
3. **Admin-only actions**: configureEnrichmentSources, triggerBulkEnrichment, deleteEnrichmentData require admin role
4. **SQL injection**: Test with malicious client_id, API key values; verify PDO prepared statements prevent injection
5. **API rate limiting**: Enrich same contact 100x rapidly; verify rate limiting applied

---

## PRIVACY & COMPLIANCE NOTES

### CCPA (California Consumer Privacy Act)
- Contacts have the right to know what data is enriched. Provide the getEnrichmentHistory endpoint for self-service access.
- Contacts have the right to delete their data. Implement deleteEnrichmentData.
- Set data_retention_expires_at on all enrichment records. Auto-delete after 1 year.

### GDPR (EU General Data Protection Regulation)
- Same deletion/transparency as CCPA.
- For EU contacts, require explicit opt-in before enriching (not default opt-out).
- No processing of sensitive categories (health, politics, religion) — filter out in data source APIs.

### Implementation
- Add privacy_opt_out flag to contact_enrichment. If true, skip all enrichment for this contact.
- Add data_retention_expires_at. Cron job runs monthly to delete expired records.
- Admin UI: per-contact opt-out button. Bulk privacy controls in tenant settings.
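
The retention and consent rules can be sketched as three small helpers. Assumptions: the one-year expiry is computed from the enrichment timestamp in UTC, records without an expiry are kept rather than purged, and `is_eu_resident` and `explicit_opt_in` are hypothetical fields, since the schema above only defines `privacy_opt_out`.

```javascript
// Expiry is one calendar year after enrichment (UTC).
function retentionExpiry(enrichedAt) {
  const expires = new Date(enrichedAt);
  expires.setUTCFullYear(expires.getUTCFullYear() + 1);
  return expires;
}

// Split records into kept and expired, as the monthly purge would.
function purgeExpired(records, now = new Date()) {
  const kept = records.filter((record) =>
    !record.data_retention_expires_at || // assumption: no expiry set means keep
    new Date(record.data_retention_expires_at) > now);
  return { kept, deleted: records.length - kept.length };
}

// Opt-out always blocks; EU contacts additionally need an explicit opt-in.
function consentAllowsEnrichment(contact) {
  if (contact.privacy_opt_out) return false;
  if (contact.is_eu_resident) return contact.explicit_opt_in === true;
  return true;
}
```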

---

## INTEGRATION CHECKPOINTS

- **Instruction 03 (Self-Hosted AI)**: Use Mistral 7B for event type inference if APIs fail (fallback confidence: 50)
- **Instruction 06 (Sisu Replacement)**: Embed deal_probability_score in coaching dashboards. Surface hidden deals for agent review.
- **Instruction 10 (AI Copy Engine)**: Use detected life events in outreach copy. "Congratulations on your promotion at Google!"
- **Instruction 18 (MLS Integration)**: Cross-reference enriched property ownership with MLS sold data. Verify home_sale event.
- **Instruction 20 (Advanced Analytics)**: Report on enrichment ROI. Track leads from hidden deal scoring to closed transactions.

---

## ESTIMATED COSTS

- **LinkedIn API**: $0 (public profiles, 100 calls/day free tier). Premium: $5/month.
- **Clearbit**: $25/month (up to 500 enrichments/month). Scale: $150/month for 5,000/month.
- **People Data Labs**: $99/month (unlimited enrichments).
- **County assessor (ATTOM)**: $0 (batch pricing), or $0.10/lookup for on-demand.
- **Facebook Graph API**: $0 (free tier). Premium features: $10+/month.
- **Zillow/Zestimate**: $0 (public estimates), or $250+/month for API access.

**Typical tenant spend**: $200-400/month for full enrichment (LinkedIn + Clearbit + property + business).

---

## DEPLOYMENT CHECKLIST

- [ ] Create database tables
- [ ] Build api/enrichment.php module
- [ ] Configure API keys in config.php (LinkedIn, Clearbit, assessor API)
- [ ] Test each data source integration separately
- [ ] Test life event detection logic
- [ ] Build deal probability scoring algorithm
- [ ] Integrate with js/lead-scoring.js
- [ ] Integrate with js/daily-action-feed.js
- [ ] Add cron jobs to cron.php
- [ ] Test nightly enrichment cron
- [ ] Test privacy deletion flows
- [ ] Load test with 1,000+ enrichments
- [ ] Document API in ARCHITECTURE.html
- [ ] Deploy to Bluehost

---

**Owner:** Kean | **Reviewed by:** Architecture working group | **Next:** Instruction 25 (Home Valuation & CMA)
