4. Retention Analysis for a Fintech App

The table below records various feature usages on a fintech app. Each time a customer uses a product feature, a row is inserted in the table.

Table Name - ACTIVITY

\begin{tabular}{|l|l|l|}
\hline
USER_ID & PRODUCT & DATETIME \\
\hline
ABC & DEPOSIT & \begin{tabular}{l}
[tex]$2021-01-01$[/tex] \\
[tex]$10:26$[/tex]
\end{tabular} \\
\hline
ABC & PAYMENT & \begin{tabular}{l}
[tex]$2021-01-03$[/tex] \\
[tex]$12:56$[/tex]
\end{tabular} \\
\hline
PRQ & INVEST & \begin{tabular}{l}
[tex]$2021-01-03$[/tex] \\
[tex]$10:56$[/tex]
\end{tabular} \\
\hline
PRQ & INVEST & \begin{tabular}{l}
[tex]$2021-01-03$[/tex] \\
[tex]$11:58$[/tex]
\end{tabular} \\
\hline
PRQ & PAYMENT & \begin{tabular}{l}
[tex]$2021-01-03$[/tex] \\
[tex]$11:25$[/tex]
\end{tabular} \\
\hline
\end{tabular}

Assume a user is considered retained for a date-product if the same user uses the same product again within the next 7 days (not including the same day). For example, if user ABC used the product DEPOSIT on April 10, ABC will be considered retained on April 10 for the DEPOSIT feature only if they use DEPOSIT again between April 11 and April 17 (inclusive).

Question:
Which date-product combination has the maximum retention in the given dataset?

Note:
- There should be at least 5 users for the feature on that day for the retention calculation to be meaningful. Discard date-product combinations with less than 5 users.

Output Format:
The output should be in the following format and should have only one row corresponding to the maximum retention:

\begin{tabular}{|l|l|l|l|}
\hline
date & product & total_users_this_date & total_retained_users \\
\hline
\begin{tabular}{l}
yyyy-mm-dd
\end{tabular} & & & \\
\hline
\end{tabular}

This task is to be solved using MySQL syntax. If you face issues, keep in mind to check for MySQL syntax.



Answer :

Sure, let's break down the solution to the problem step-by-step:

1. Extract Date and Unique Users:
- Extract the date from the DATETIME.
- For each date-product combination, count the number of unique users.

2. Filter Date-Product Combinations with at Least 5 Users:
- We need to focus only on those date-product combinations where there are at least 5 unique users.

3. Calculate Retention:
- For each valid date-product combination, determine if users re-use the same product within 7 days (not including the same day).
- Calculate the total retained users.

4. Determine the Date-Product Combination with Maximum Retention:
- Identify the combination with the highest number of retained users.

Since I need to provide a MySQL solution, here is the step-by-step MySQL syntax to achieve the solution:

```sql
WITH activity_dates AS (
SELECT
USER_ID,
PRODUCT,
DATE(DATETIME) AS DATE
FROM
ACTIVITY
),

valid_combinations AS (
SELECT
DATE,
PRODUCT,
COUNT(DISTINCT USER_ID) AS total_users_this_date
FROM
activity_dates
GROUP BY
DATE,
PRODUCT
HAVING
COUNT(DISTINCT USER_ID) >= 5
),

retention AS (
SELECT
ad1.DATE AS start_date,
ad1.PRODUCT,
COUNT(DISTINCT ad1.USER_ID) AS total_users_this_date,
COUNT(DISTINCT ad1.USER_ID) AS total_retained_users
FROM
activity_dates ad1
JOIN
activity_dates ad2
ON
ad1.USER_ID = ad2.USER_ID
AND ad1.PRODUCT = ad2.PRODUCT
AND ad2.DATE > ad1.DATE
AND ad2.DATE <= DATE_ADD(ad1.DATE, INTERVAL 7 DAY)
AND ad2.DATE != ad1.DATE
WHERE
(ad1.DATE, ad1.PRODUCT) IN (
SELECT
DATE, PRODUCT
FROM
valid_combinations
)
GROUP BY
ad1.DATE,
ad1.PRODUCT
)

SELECT
start_date AS date,
PRODUCT AS product,
total_users_this_date,
total_retained_users
FROM
retention
ORDER BY
total_retained_users DESC
LIMIT 1;
```

### Explanation:

1. activity_dates:
- This common table expression (CTE) extracts the distinct `USER_ID`, `PRODUCT`, and `DATE` from the `ACTIVITY` table.

2. valid_combinations:
- This CTE calculates the number of unique users for each `DATE` and `PRODUCT` combination and filters those combinations with at least 5 users.

3. retention:
- This CTE joins the `activity_dates` table with itself to check for users who reuse the same product within 7 days (excluding the same day). It then groups by the initial `DATE` and `PRODUCT` to count the total users and those retained.

4. Final Selection:
- The main query selects the result from the `retention` CTE, sorting by the number of retained users in descending order and limiting the output to the top row.

By running this SQL query, you should get the date-product combination with the maximum retention based on the given conditions.