A random sample of 395 people in a country were surveyed to find out if gender is independent of education level. Each person was asked to report the highest education level they had obtained. The data from the survey is summarized in the following table:

\begin{tabular}{|l|r|r|r|r|r|}
\hline & High School & Bachelors & Masters & Ph.D. & Total \\
\hline Female & 60 & 54 & 46 & 41 & 201 \\
\hline Male & 40 & 44 & 53 & 57 & 194 \\
\hline Total & 100 & 98 & 99 & 98 & 395 \\
\hline
\end{tabular}

Formulate the hypothesis and test whether gender and education level are dependent at the [tex]$5\%$[/tex] level of statistical significance.



Answer:

To determine whether gender and education level are independent, we perform a Chi-Square Test for Independence on the observed counts. Here’s a step-by-step solution:

Step 1: Formulate the Hypotheses

- Null Hypothesis ([tex]\(H_0\)[/tex]): Gender and education level are independent.
- Alternative Hypothesis ([tex]\(H_a\)[/tex]): Gender and education level are not independent.

Step 2: Construct the Contingency Table

Based on the provided data, the contingency table is:

| | High School | Bachelors | Masters | Ph.D. | Total |
|---------------|-------------|-----------|---------|-------|-------|
| Female | 60 | 54 | 46 | 41 | 201 |
| Male | 40 | 44 | 53 | 57 | 194 |
| Total | 100 | 98 | 99 | 98 | 395 |

Step 3: Calculate Expected Frequencies

The expected frequency for each cell is calculated using:
[tex]\[ E_{ij} = \frac{(\text{row total of cell}) \times (\text{column total of cell})}{\text{total sample size}} \][/tex]

For example, the expected frequency for females with High School education:
[tex]\[ E_{11} = \frac{201 \times 100}{395} \approx 50.886 \][/tex]

We perform this calculation for all cells to get the expected frequency table:

| | High School | Bachelors | Masters | Ph.D. |
|-----------|-------------|-----------|---------|--------|
| Female| 50.886 | 49.868 | 50.377 | 49.868 |
| Male | 49.114 | 48.132 | 48.623 | 48.132 |
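The expected-frequency calculation can be checked with a short script. This is a minimal sketch in plain Python; the variable names (`observed`, `row_totals`, `col_totals`, `expected`) are illustrative choices, not part of the original problem.

```python
# Observed counts from the survey table.
# Rows: Female, Male. Columns: High School, Bachelors, Masters, Ph.D.
observed = [
    [60, 54, 46, 41],
    [40, 44, 53, 57],
]

row_totals = [sum(row) for row in observed]        # totals per gender
col_totals = [sum(col) for col in zip(*observed)]  # totals per education level
grand_total = sum(row_totals)                      # total sample size

# E_ij = (row total) * (column total) / (total sample size)
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for label, row in zip(["Female", "Male"], expected):
    print(label, [round(e, 3) for e in row])
```

Rounded to three decimals, the printed rows should match the expected frequency table above.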

Step 4: Calculate the Chi-Square Statistic

The Chi-Square statistic is given by:
[tex]\[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \][/tex]

Using the observed ([tex]\(O_{ij}\)[/tex]) and expected ([tex]\(E_{ij}\)[/tex]) values:

For each cell:

- Female, High School:
[tex]\[ \frac{(60 - 50.886)^2}{50.886} \approx 1.632 \][/tex]
- Female, Bachelors:
[tex]\[ \frac{(54 - 49.868)^2}{49.868} \approx 0.342 \][/tex]
- Female, Masters:
[tex]\[ \frac{(46 - 50.377)^2}{50.377} \approx 0.380 \][/tex]
- Female, Ph.D.:
[tex]\[ \frac{(41 - 49.868)^2}{49.868} \approx 1.577 \][/tex]
- Male, High School:
[tex]\[ \frac{(40 - 49.114)^2}{49.114} \approx 1.691 \][/tex]
- Male, Bachelors:
[tex]\[ \frac{(44 - 48.132)^2}{48.132} \approx 0.355 \][/tex]
- Male, Masters:
[tex]\[ \frac{(53 - 48.623)^2}{48.623} \approx 0.394 \][/tex]
- Male, Ph.D.:
[tex]\[ \frac{(57 - 48.132)^2}{48.132} \approx 1.634 \][/tex]

Adding these values gives the Chi-Square statistic (the rounded terms sum to 8.005; carrying full precision through the calculation gives 8.006):
[tex]\[ \chi^2 \approx 1.632 + 0.342 + 0.380 + 1.577 + 1.691 + 0.355 + 0.394 + 1.634 \approx 8.006 \][/tex]
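The sum can also be verified numerically without intermediate rounding. The following is a minimal sketch in plain Python; the names `contributions` and `chi_square` are illustrative.

```python
# Observed counts. Rows: Female, Male.
# Columns: High School, Bachelors, Masters, Ph.D.
observed = [
    [60, 54, 46, 41],
    [40, 44, 53, 57],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Per-cell contribution (O - E)^2 / E, with E = row total * column total / n.
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]

# The test statistic is the sum over all eight cells.
chi_square = sum(sum(row) for row in contributions)
print(round(chi_square, 3))
```

With unrounded expected frequencies, the script should print 8.006.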

Step 5: Determine the Degrees of Freedom

The degrees of freedom (dof) for the test is calculated as:
[tex]\[ \text{dof} = (\text{number of rows} - 1) \times (\text{number of columns} - 1) \][/tex]

In this case:
[tex]\[ \text{dof} = (2 - 1) \times (4 - 1) = 3 \][/tex]

Step 6: Determine the p-value

The p-value is the probability, assuming the null hypothesis is true, of obtaining a Chi-Square statistic at least as large as the one calculated, for the given degrees of freedom.

For [tex]\(\chi^2 = 8.006\)[/tex] and [tex]\(\text{dof} = 3\)[/tex], the p-value is approximately [tex]\(0.0459\)[/tex].
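One way to reproduce this p-value without a statistics table is sketched below. It assumes the closed-form upper-tail probability of the Chi-Square distribution that holds specifically for 3 degrees of freedom, [tex]\(P(X \ge x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}\)[/tex]. The helper name `chi2_sf_3dof` is hypothetical, and for other degrees of freedom a statistics library would be the usual choice.

```python
import math

def chi2_sf_3dof(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable
    with exactly 3 degrees of freedom (closed form, valid for x >= 0)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

# Test statistic from Step 4.
p_value = chi2_sf_3dof(8.006)
print(round(p_value, 4))
```

The script should print 0.0459, in agreement with the value above.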

Step 7: Compare the p-value to the Significance Level

The significance level [tex]\(\alpha\)[/tex] is [tex]\(0.05\)[/tex].

Since [tex]\( p \approx 0.0459 < 0.05\)[/tex]:
- We reject the null hypothesis.
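The same decision can be reached by comparing the test statistic with the critical value for [tex]\(\alpha = 0.05\)[/tex] and 3 degrees of freedom. The sketch below finds that critical value by bisection on the upper-tail probability; it relies on the closed form for 3 degrees of freedom, and the helper names (`chi2_sf_3dof`, `critical_value_3dof`) are hypothetical.

```python
import math

def chi2_sf_3dof(x: float) -> float:
    """Upper-tail probability for a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

def critical_value_3dof(alpha: float, lo: float = 0.0, hi: float = 100.0) -> float:
    """Find x with P(X >= x) = alpha by bisection.
    The upper-tail probability decreases as x grows, so the root is bracketed."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_3dof(mid) > alpha:
            lo = mid   # tail still too large: move right
        else:
            hi = mid   # tail at or below alpha: move left
    return (lo + hi) / 2

critical = critical_value_3dof(0.05)
reject_h0 = 8.006 > critical  # reject when the statistic exceeds the critical value
print(round(critical, 3), reject_h0)
```

The critical value comes out at about 7.815, and since 8.006 exceeds it, this approach also rejects the null hypothesis.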

Conclusion:

At the 5% significance level, there is sufficient evidence to conclude that gender and education level are not independent; the association between them is statistically significant.