WITH
metrics as (...),
exposures as (...),
joined_data as (
SELECT
exposures.unit_id,
exposures.experiment_id,
exposures.group_id,
metrics.timestamp,
metrics.value
FROM exposures
JOIN metrics
ON (
exposures.unit_id = metrics.unit_id
AND metrics.timestamp >=
exposures.first_timestamp
)
)
SELECT
group_id,
SUM(value) as value
FROM joined_data
GROUP BY group_id;
```python
## 3. Exposure duplication
Exposure data must be de-duplicated before joining to ensure a single record per user. Many vendors further manage crossover users (users present in more than one experiment group), removing them from analysis and/or alerting them if this occurs with high frequency.
```sql expandable
SELECT
unit_id,
experiment_id,
MIN(timestamp) as first_timestamp,
COUNT(distinct group_id) as groups
FROM <exposures_table>
GROUP BY
unit_id,
experiment_id,
group_id
HAVING COUNT(distinct group_id) = 1;