
data-analytics v1.0.0

Data analysis workflows, SQL query patterns, dashboard design, KPI frameworks, and data storytelling for business intelligence.


Data Analytics

Workflow

1. Define the Question

Before writing any query, articulate:

  • What decision will this analysis inform?
  • What metric answers the question?
  • What timeframe is relevant?
  • What segments matter?

Bad: "How are we doing?" → Good: "What's our 30-day retention rate by acquisition channel for Q1 cohorts?"

2. KPI Framework Selection

Framework        | Best for          | Core metrics
AARRR (Pirate)   | Growth-stage SaaS | Acquisition, Activation, Retention, Revenue, Referral
HEART            | Product/UX teams  | Happiness, Engagement, Adoption, Retention, Task success
NSM (North Star) | Company alignment | One metric that captures core value delivery
OKR              | Goal tracking     | Objectives + measurable Key Results

Choose an NSM first for company alignment, then layer on AARRR for operational metrics and HEART for product teams.

3. SQL Patterns

The queries below use PostgreSQL date syntax (DATE_TRUNC, INTERVAL); adjust the date functions for other dialects.

Funnel analysis:

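-- One row per user active in the last 30 days; MAX(CASE ...) flags each funnel step reached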
WITH funnel AS (
  SELECT
    user_id,
    MAX(CASE WHEN event = 'signup' THEN 1 ELSE 0 END) AS signed_up,
    MAX(CASE WHEN event = 'onboarding_complete' THEN 1 ELSE 0 END) AS onboarded,
    MAX(CASE WHEN event = 'first_value_action' THEN 1 ELSE 0 END) AS activated,
    MAX(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS converted
  FROM events
  WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT
  COUNT(*) AS total_users,
  SUM(signed_up) AS signups,
  SUM(onboarded) AS onboarded,
  SUM(activated) AS activated,
  SUM(converted) AS converted,
  ROUND(100.0 * SUM(onboarded) / NULLIF(SUM(signed_up), 0), 1) AS signup_to_onboard_pct,
  ROUND(100.0 * SUM(activated) / NULLIF(SUM(onboarded), 0), 1) AS onboard_to_activate_pct,
  ROUND(100.0 * SUM(converted) / NULLIF(SUM(activated), 0), 1) AS activate_to_convert_pct
FROM funnel;

Cohort retention:

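-- Bucket users into signup-week cohorts, then count who has a session in later weeks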
WITH cohort AS (
  SELECT
    user_id,
    DATE_TRUNC('week', MIN(created_at)) AS cohort_week
  FROM events
  WHERE event = 'signup'
  GROUP BY user_id
),
activity AS (
  SELECT
    user_id,
    DATE_TRUNC('week', created_at) AS activity_week
  FROM events
  WHERE event = 'session_start'
)
SELECT
  c.cohort_week,
  COUNT(DISTINCT c.user_id) AS cohort_size,
  COUNT(DISTINCT CASE WHEN a.activity_week = c.cohort_week + INTERVAL '1 week' THEN c.user_id END) AS week_1,
  COUNT(DISTINCT CASE WHEN a.activity_week = c.cohort_week + INTERVAL '2 weeks' THEN c.user_id END) AS week_2,
  COUNT(DISTINCT CASE WHEN a.activity_week = c.cohort_week + INTERVAL '4 weeks' THEN c.user_id END) AS week_4,
  COUNT(DISTINCT CASE WHEN a.activity_week = c.cohort_week + INTERVAL '8 weeks' THEN c.user_id END) AS week_8
FROM cohort c
LEFT JOIN activity a ON c.user_id = a.user_id
GROUP BY c.cohort_week
ORDER BY c.cohort_week;
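
Divide each week_n count by cohort_size to get retention rates; raw counts mislead when cohort sizes differ.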

LTV calculation:

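-- Sum successful payments per user by month, then average lifetime revenue and active months across users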
WITH monthly_revenue AS (
  SELECT
    user_id,
    DATE_TRUNC('month', payment_date) AS month,
    SUM(amount) AS mrr
  FROM payments
  WHERE status = 'succeeded'
  GROUP BY user_id, DATE_TRUNC('month', payment_date)
),
user_ltv AS (
  SELECT
    user_id,
    SUM(mrr) AS total_revenue,
    COUNT(DISTINCT month) AS months_active,
    MIN(month) AS first_payment,
    MAX(month) AS last_payment
  FROM monthly_revenue
  GROUP BY user_id
)
SELECT
  ROUND(AVG(total_revenue), 2) AS avg_ltv,
  ROUND(AVG(months_active), 1) AS avg_lifetime_months,
  ROUND(AVG(total_revenue / NULLIF(months_active, 0)), 2) AS avg_arpu
FROM user_ltv;

Churn detection:

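-- Bucket users by days since last session: active, at_risk (>14 days), churned (>30 days)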
SELECT
  user_id,
  MAX(created_at) AS last_active,
  CURRENT_DATE - MAX(created_at)::date AS days_since_active,
  CASE
    WHEN CURRENT_DATE - MAX(created_at)::date > 30 THEN 'churned'
    WHEN CURRENT_DATE - MAX(created_at)::date > 14 THEN 'at_risk'
    ELSE 'active'
  END AS status
FROM events
WHERE event = 'session_start'
GROUP BY user_id
ORDER BY days_since_active DESC;
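
Tune the 14- and 30-day thresholds to your product's usage cadence; a weekly-use product needs looser cutoffs than a daily-use one.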

4. Dashboard Design

Layout rules:

  • Top row: 3-4 KPI cards (current value + trend arrow + % change; see the sketch after this list)
  • Second row: Primary chart (line/area for trends, bar for comparisons)
  • Third row: Breakdown tables or secondary charts
  • Filters: Date range, segment, channel — always at top

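A minimal sketch, in Python, of the math behind a KPI card from the first rule above; kpi_card and the sample numbers are illustrative, not part of this skill:

def kpi_card(current: float, previous: float) -> str:
    """Format a KPI card: current value, trend arrow, and % change vs. the prior period."""
    if previous == 0:
        return f"{current:,.0f} (no prior period)"  # avoid division by zero for new metrics
    pct_change = (current - previous) / previous * 100
    arrow = "▲" if pct_change > 0 else ("▼" if pct_change < 0 else "▬")
    return f"{current:,.0f} {arrow} {pct_change:+.1f}%"

# Example: weekly active users, this week vs. last
print(kpi_card(12450, 11800))  # 12,450 ▲ +5.5%
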
Chart selection:

Data type                    | Chart
Trend over time              | Line chart
Part of whole                | Stacked bar or donut
Comparison across categories | Horizontal bar
Distribution                 | Histogram
Correlation                  | Scatter plot
Funnel stages                | Funnel chart
Geographic                   | Choropleth map

5. Statistical Analysis

A/B test significance:

from scipy import stats

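# Observed conversions and sample sizes per arm (example numbers)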
control_conversions, control_total = 120, 1000
variant_conversions, variant_total = 145, 1000

# Two-proportion z-test
p1 = control_conversions / control_total
p2 = variant_conversions / variant_total
p_pool = (control_conversions + variant_conversions) / (control_total + variant_total)
se = (p_pool * (1 - p_pool) * (1/control_total + 1/variant_total)) ** 0.5
z_score = (p2 - p1) / se
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

print(f"Lift: {((p2/p1) - 1) * 100:.1f}%")
print(f"p-value: {p_value:.4f}")
print(f"Significant: {'Yes' if p_value < 0.05 else 'No'}")

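This z-test assumes the sample size was fixed before the experiment started; peeking repeatedly and stopping at the first significant result inflates false positives.
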
Sample size calculation:

from scipy.stats import norm

def sample_size(baseline_rate, mde, alpha=0.05, power=0.8):
    z_alpha = norm.ppf(1 - alpha/2)  # two-sided significance threshold
    z_beta = norm.ppf(power)         # power threshold
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde)   # expected variant rate (mde is a relative lift)
    p_bar = (p1 + p2) / 2            # pooled rate under the null hypothesis
    n = ((z_alpha * (2*p_bar*(1-p_bar))**0.5 + z_beta * (p1*(1-p1) + p2*(1-p2))**0.5) / (p2 - p1)) ** 2
    return int(n) + 1

# Example: 5% baseline, detect 10% relative lift
print(f"Need {sample_size(0.05, 0.10)} users per variant")

6. Data Storytelling

Structure every analysis as:

  1. Context — Why are we looking at this? (1 sentence)
  2. Finding — What did we discover? (lead with the insight, not the method)
  3. Evidence — Show the chart/table that proves it
  4. Implication — So what? What should we do?
  5. Recommendation — Specific next action with expected impact

Rules:

  • One insight per slide/section
  • Annotate charts (mark events, callout anomalies)
  • Compare to benchmarks or previous periods
  • Quantify impact in dollars or users, not just percentages