Literature Review Week 3

Literature review for Week 3 of the course IDC-6940, Fall 2025

Author: Kristina Kusem

Affiliation: Master of Data Science Program @ The University of West Florida (UWF)

Article 1

Title: Determinants of coexistence of undernutrition and anemia among under-five children in Rwanda; evidence from 2019/20 demographic health survey: Application of bivariate binary logistic regression model. [1]

Authors: Abebew Aklog Asmare, Yitateku Adugna Agmas

Problem:

- Malnutrition and anemia in children under five have been a persistent problem in many African countries. This paper focuses on Rwanda, where rates of both conditions decreased as the country was rebuilt after its civil war. Improvement was uneven across the country, however, and it is unclear which factors are associated with lower rates. The study aims to identify key predictors of these conditions so that areas still experiencing high rates can receive appropriately targeted aid.
- The introduction cites several studies that identify strong correlates of malnutrition and anemia in children under five. These include the child’s age, parents’ education level, household economic class, geographic location, household food availability, child birth size, family size, and maternal age.
- The introduction argues that the topic is relevant because, although there is existing research on malnutrition and anemia in African countries, little of it studies the relationship between the two conditions. This study analyzes that relationship in addition to identifying strong predictors of each condition.

Data:

- Data come from the 2019/20 Rwanda Demographic and Health Survey. The researchers obtained the samples themselves, and the sampling method is described in depth. They chose 500 clusters from areas across the country, and households were then selected at random within these clusters to be surveyed. The resulting dataset contained 3,205 records on children under the age of 5.

Solution to problem:

- The researchers used a bivariate binary logistic regression model, which describes two outcome variables jointly: presence of malnutrition and presence of anemia. Both outcomes are binary, taking a value of 1 for present or 0 for not present. The predictors consist of about 26 variables covering the child’s health conditions, family information, details about the parents, and relevant geographic information.
- Three models are presented. The first is the bivariate binary logistic regression model. The second is the equivalent model written in terms of the log odds. The third is the odds ratio, which is used to assess the association between the two binary outcomes.
- The researchers used SPSS and R to perform the analyses.
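For orientation, one common textbook formulation of a bivariate binary logistic model is sketched below. It is not copied from the paper, whose notation and parameterization may differ. The symbols are assumptions introduced here: $Y_1$ and $Y_2$ are the malnutrition and anemia indicators, $p_{jk} = P(Y_1 = j, Y_2 = k \mid x)$ are the joint probabilities, $p_{1+} = p_{11} + p_{10}$ and $p_{+1} = p_{11} + p_{01}$ are the marginal probabilities, $x$ is the predictor vector, and $\psi$ is the odds ratio between the two outcomes.

$$
\begin{aligned}
\operatorname{logit}(p_{1+}) &= \log\frac{p_{1+}}{1 - p_{1+}} = x^\top \beta_1,\\
\operatorname{logit}(p_{+1}) &= \log\frac{p_{+1}}{1 - p_{+1}} = x^\top \beta_2,\\
\log\psi &= \log\frac{p_{11}\,p_{00}}{p_{10}\,p_{01}} = x^\top \beta_3.
\end{aligned}
$$

Under this formulation, $\psi = 1$ corresponds to the two outcomes being independent given the predictors, and $\psi > 1$ to the conditions tending to occur together.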

Results:

- Nearly half of the study participants had anemia, and about one fifth had malnutrition.
- Six significant predictors were found: mother’s age, drinking water, other children in household, child gender, birth order, and gender of household head.
- The estimated odds ratio differed from 1, which indicates that the two outcome variables are not statistically independent. The relationship between them is significant.
- Goodness of fit was assessed as the proportion of correct predictions out of the number of observations. This proportion was about 89%, so the researchers conclude that the model was a good fit.
- The paper discusses possible causes of the significant relationship found between malnutrition and anemia in children under 5, citing other facts and figures on why these conditions are strongly correlated.
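The two summary quantities mentioned above can be illustrated with a minimal sketch. It computes an unadjusted sample odds ratio from the 2x2 cross-tabulation of two binary outcomes, plus a simple proportion of correct predictions. This is an assumption-laden simplification: the paper's odds ratio is model-based, the function names `outcome_odds_ratio` and `proportion_correct` are hypothetical, and all data below are made up for illustration.

```python
from collections import Counter


def outcome_odds_ratio(y1, y2):
    """Unadjusted odds ratio between two binary (0/1) outcomes."""
    counts = Counter(zip(y1, y2))
    both, neither = counts[(1, 1)], counts[(0, 0)]
    only_first, only_second = counts[(1, 0)], counts[(0, 1)]
    if only_first == 0 or only_second == 0:
        raise ValueError("odds ratio undefined: an off-diagonal cell is empty")
    return (both * neither) / (only_first * only_second)


def proportion_correct(observed, predicted):
    """Share of observations whose predicted label matches the observed one."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


# Made-up illustration data, NOT taken from the survey.
malnourished = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
anemic = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]
predicted_anemic = [1, 1, 0, 0, 0, 0, 0, 1, 0, 1]

odds_ratio = outcome_odds_ratio(malnourished, anemic)
hit_rate = proportion_correct(anemic, predicted_anemic)
```

An odds ratio well above 1, as in this toy data, means the two conditions co-occur more often than independence would imply. A value near 1 would suggest little association.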

Conclusion:

- Increasing maternal education, supplementing with vitamin A and nutrient-dense foods, providing a healthy, clean, and safe environment, and decreasing maternal anemia may help reduce rates of malnutrition and anemia in children.

Limitations of the study:

- The only limitation discussed is that the collected data may be prone to errors. As a result, the researchers can conclude that strong correlations exist, but they cannot state that any of the relationships are causal.

Article 2

Article Title: Using Binary Logistic Regression to Detect Health Insurance Fraud. [2]

Author: Baraah Samara, Ph.D. Student

Problem:

- The paper addresses fraud in the health insurance industry, specifically involving patients treated at private clinics or hospitals.
- The introduction explains why the topic is relevant. Between 1965 and 2008, the cost of healthcare increased significantly, and health insurance fraud increased as a result. Effective tools for detecting this fraud are needed.
- Reducing fraud would help the economy as a whole, help insurance companies, and lower the premiums paid by customers.
- Fraud is grouped into three types by the entity committing it: consumer, provider, and payer fraud.
- The literature review cites common predictors of health insurance fraud: diagnoses, service cost, number of claims from an individual, highest-cost claim, probability of anomaly, excessive charges by care facilities, and more.

Data:

- The original dataset contained 26 independent variables.
- Data were collected from January 2022 through November 2022.
- The dataset has about 123,000 data points with no missing values.
- The predictors are of varying types, including numerical, categorical, and binary values.
- The dependent variable is fraud, with a value of 1 for fraud present or 0 for no fraud present.

Solution to Problem:

Why they selected this model:

- The goal is to build a binary logistic regression model to detect health insurance fraud.
- Logistic regression is selected as the analytic technique because the author wants to assess the effects of categorical variables on a categorical dependent variable. The author also cites logistic regression as the most accurate type of regression model for the kind of classification performed in this study.
- Fraud detection commonly employs binary prediction models.
- The logistic regression model also provides estimates between 0 and 1, which help investigators estimate the probability of fraud.
- The author provides the equation used to calculate the log odds of the event of interest (occurrence of fraud).
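To make the link between log odds and a probability between 0 and 1 concrete, here is a minimal sketch of the standard logistic transformation. The function name `fraud_probability`, the intercept, the coefficients, and the feature values are all hypothetical placeholders, not estimates from the paper.

```python
import math


def fraud_probability(intercept, coefficients, features):
    """Map a linear predictor (the log odds) to a probability in (0, 1)."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients and cost values, for illustration only.
example_probability = fraud_probability(
    intercept=-4.0,
    coefficients=[0.01, 0.02],
    features=[150.0, 80.0],
)
```

A log odds of 0 corresponds to a probability of exactly 0.5. Larger positive values push the estimate toward 1 and negative values toward 0, which is why the output can be read as an estimated probability of fraud.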

The method:

- The author calculates the likelihood of an individual committing fraud.
- The total cost accrued by an individual is calculated, and the logistic regression is then performed using this calculation.
- The model identifies outliers and classifies them as potentially fraudulent activity.
- Before testing the model, the author hypothesizes a positive relationship between the overall cost accrued by a patient and the likelihood of fraud. Costs include doctor visit costs, prescription drug costs, lab costs, costs of medical symptoms, and the total cost of expensive prescriptions.
- When the model was run, none of the coefficients were zero, which the paper takes to mean that a significant relationship exists between the outcome and the predictor variables.
- Predictors were tested for multicollinearity before the model was run. Pearson correlation coefficients were obtained, and any predictors with a Pearson value greater than 0.8 were excluded from the model. Only eight predictors remained after the strongly correlated ones were removed.
- Different models were constructed using only the most important predictors. Each time the least important predictor was removed, the log likelihood was calculated to assess the accuracy of the model. The best-performing model contained six of the original predictors.
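The multicollinearity screen described above can be sketched as follows. This is a minimal illustration that assumes a greedy rule (keep a predictor only if its absolute Pearson correlation with every already-kept predictor is at most 0.8). The paper does not say which member of a correlated pair was dropped, so that rule, the function names, and the column names and values are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the variance is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.8):
    """Greedily keep predictors that are not strongly correlated with any kept one."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictor columns, for illustration only.
columns = {
    "visit_cost": [10, 20, 30, 40, 50],
    "total_cost": [12, 22, 29, 41, 52],  # nearly collinear with visit_cost
    "lab_cost": [5, 1, 4, 2, 3],
}
kept = drop_correlated(columns)
```

With this rule the result depends on column order, because the first predictor of a correlated pair is the one retained. A real analysis would usually choose which one to keep on substantive grounds.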

Results:

- Six predictors were found to be significant in predicting health insurance fraud: office visit cost, prescription costs, lab costs, symptom cost, and two expensive prescription drug costs.
- The paper gives a thorough interpretation of the model slopes. For example, “the likelihood of fraud increases by .005188 for every unit increase in pharmacy cost.” Interpretations for the most significant predictors are given in this way.
- A chi-square test for independence was used to determine whether at least one predictor was significantly related to the outcome. It was concluded that at least one of the six predictors was significant.
- The model was found to be about 99% accurate when predicting no fraud, but only about 76% accurate when predicting fraud.
- An example shows how to calculate the probability that an individual will commit insurance fraud, given values for the six predictors in the equation.
- The study concludes by stating the importance and relevance of continuing to develop new fraud detection models.
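Two of the reported results can be illustrated with a short sketch. First, in a standard logistic model a slope is a change in log odds per unit of the predictor, so exponentiating it gives the multiplicative change in the odds. Whether the paper intends the quoted slope this way is an assumption. Second, the split between accuracy on non-fraud and fraud cases is read here as accuracy within each actual class, which is also an assumption since the wording could instead refer to predictive values. Function names and the labels below are hypothetical.

```python
import math


def odds_multiplier(coefficient):
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(coefficient)


def per_class_accuracy(actual, predicted):
    """Share of correctly classified cases within each actual class (0 and 1)."""
    rates = {}
    for label in (0, 1):
        preds = [p for a, p in zip(actual, predicted) if a == label]
        rates[label] = sum(p == label for p in preds) / len(preds)
    return rates


# The slope quoted in the results; roughly a 0.5% increase in the odds per unit.
pharmacy_multiplier = odds_multiplier(0.005188)

# Made-up labels, NOT from the paper's dataset.
actual = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 0, 1, 1, 1, 0]
rates = per_class_accuracy(actual, predicted)
```

Reporting accuracy per class matters for fraud detection because fraud cases are typically rare. A model can score very well on the majority class while still missing a meaningful share of fraud cases.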

Limitations:

- No limitations are explicitly stated in this paper. However, it can be considered a limitation that only data from Middle Eastern countries was used. To make the results more generalizable, a more comprehensive study should include data from other regions of the world.

References

1. Asmare, A. A., & Agmas, Y. A. (2024). Determinants of coexistence of undernutrition and anemia among under-five children in Rwanda; evidence from 2019/20 demographic health survey: Application of bivariate binary logistic regression model. PLOS ONE, 19(4), e0290111.

2. Samara, B. (2024). Using binary logistic regression to detect health insurance fraud. Pakistan Journal of Life & Social Sciences, 22(2).