Ana is a data analyst at a public health organization that is studying the factors that influence the occurrence of communicable diseases in different communities. While analyzing counts of new cases reported per community, Ana observes that a large proportion of communities reported no new cases, resulting in a considerable number of zeros in the dataset. She understands that these zeros can represent either truly unaffected communities or failures in case detection or reporting.

Given that Ana's objective is to model the number of reported cases as a function of variables such as population density, access to health care, and hygiene practices, which approach should she adopt to adequately handle the high number of zeros in the dataset?
Applying a multiple linear regression model, treating the number of reported cases as a continuous dependent variable and ignoring the excess of zeros.
Using a logistic regression model, treating the occurrence of at least one reported case as a binary event (yes/no).
Implementing a zero-inflated model, suitable for count data when there is an excess of zero counts.
Employing cluster analysis to group communities into categories based on the presence or absence of reported cases, and analyzing the factors separately in each group.
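To make the zero-inflated option concrete, here is a minimal sketch of a zero-inflated Poisson (ZIP) model, which treats each zero as coming either from a "structural zero" process (for example, a community with no detection or reporting) or from an ordinary Poisson count that happened to be zero. Everything in it is an illustrative assumption rather than part of the question: the function names (`zip_pmf`, `fit_zip_em`, `simulate_counts`), the simulated data, and the intercept-only simplification that omits covariates such as population density. In practice, a library implementation such as `ZeroInflatedPoisson` in statsmodels would likely be used to include predictors.

```python
import math
import random


def zip_pmf(k, pi, lam):
    """P(Y = k) under a zero-inflated Poisson: with probability pi the count is
    a structural zero; otherwise it is drawn from Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (1.0 if k == 0 else 0.0) + (1.0 - pi) * poisson


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(n, pi, lam, seed=0):
    """Hypothetical data: n communities, a share pi of which always report 0."""
    rng = random.Random(seed)
    return [0 if rng.random() < pi else sample_poisson(lam, rng) for _ in range(n)]


def fit_zip_em(counts, iters=200):
    """Intercept-only ZIP fit by expectation-maximization. Returns (pi, lam)."""
    n = len(counts)
    total = sum(counts)
    pi, lam = 0.5, max(total / n, 1e-6)
    for _ in range(iters):
        # E-step: posterior probability that an observed zero is structural.
        p_structural = pi / (pi + (1.0 - pi) * math.exp(-lam))
        z_sum = sum(p_structural for y in counts if y == 0)
        # M-step: update the mixing weight and the Poisson mean.
        pi = z_sum / n
        lam = total / (n - z_sum)
    return pi, lam


if __name__ == "__main__":
    counts = simulate_counts(5000, pi=0.4, lam=3.0, seed=0)
    pi_hat, lam_hat = fit_zip_em(counts)
    print(f"share of zeros: {counts.count(0) / len(counts):.3f}")
    print(f"estimated pi: {pi_hat:.3f}, estimated lambda: {lam_hat:.3f}")
```

The point of the sketch is that the model estimates two quantities at once: the share of structural zeros and the mean of the count process. Neither linear regression nor logistic regression on a yes/no outcome separates these two sources of zeros.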



Answer:
