Welcome to the first week of the second course in Coursera’s Data Management and Visualization specialization. To use ANOVA and post hoc testing, I needed to examine a different explanatory variable than in the previous course. **This analysis examines whether a person’s gender and race are associated with their support of the death penalty as punishment for murder.** I am still using the Outlook on Life (OOL) surveys, made available by ICPSR for Coursera students.

**Hypotheses to be tested:**

Null hypothesis: the death penalty is supported equally among all gender-racial groups.

Alternate hypothesis: support for the death penalty varies among gender-racial groups.

The categorical response variable (W2_QK3) records each respondent’s preferred punishment for murder. This variable has two categories: the death penalty and life in prison.

The new categorical explanatory variable contains four (4) categories: white men, white women, men of color, and women of color. The gender choices were extremely limited in the dataset: male, female, and no response. Because it was not possible to know if a “no response” was a refusal to answer or identifying as a transgender or nonbinary person, these individuals were omitted from the sample. For more comprehensive data analysis, the survey should have used “masculine” and “feminine” instead of “male” and “female,” and included options for transgender, nonbinary, and possibly other gender identities.

An analysis of variance (ANOVA) revealed that, in this sample, an individual’s gender and race (collapsed into 4 categories as the categorical explanatory variable) are significantly associated with a preference for the death penalty. Using an ordinary least squares (OLS) model, the following results were obtained: F-statistic = 32.57, p = 2.10e-20. Tukey’s Honestly Significant Difference post hoc test was conducted to determine which groups differed significantly from each other. **There was no significant difference between white men and white women, so we fail to reject the null hypothesis for that pair; however, there were significant differences between white women and men of color, white men and men of color, men of color and women of color, white women and women of color, and white men and women of color, so we reject the null hypothesis in favor of the alternate hypothesis for these pairs.**

The punishment preferences among the groups are as follows: 66.7% of white men favor the death penalty, 58.8% of white women favor the death penalty, 55.6% of men of color favor imprisonment, and 64.7% of women of color favor imprisonment.
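
The per-group percentages can also be read from a single table. The following is a minimal sketch, assuming a data frame shaped like the one in this post; the column names `ETH_GEN_LABELS` and `W2_QK3` come from the code further down, but the rows are made-up toy data, not the OOL survey.

```python
import pandas

# Toy rows, invented for illustration only (NOT the OOL survey data).
toy = pandas.DataFrame({
    'ETH_GEN_LABELS': ['White Men', 'White Men', 'White Men',
                       'Women of Color', 'Women of Color',
                       'Women of Color', 'Women of Color'],
    'W2_QK3': ['Death', 'Death', 'Prison',
               'Death', 'Prison', 'Prison', 'Prison'],
})

# normalize='index' makes each row (each group) sum to 1,
# so every cell is the share of that group giving that answer.
shares = pandas.crosstab(toy['ETH_GEN_LABELS'], toy['W2_QK3'], normalize='index')

# Show the shares as percentages with one decimal place.
print((shares * 100).round(1))
```

Each row of the resulting table is one gender-racial group, and the two columns are the death penalty and prison shares for that group.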

I was unable to calculate standard deviations for these results. I do not think this is possible, because the explanatory variable has 4 categories and the response variable has 2 categories: neither is quantitative. I spent many hours (at least 16!) trying to find and code quantitative variables relevant to my original thesis, without success. I understand the code involved in calculating means and standard deviations, but I was unable to show that in this assignment. If any of my classmates have input or resources, I’d welcome the assistance. I would very much like to calculate the standard deviation for each of the four gender-ethnic groups.
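
One possible workaround, offered as a sketch rather than a definitive answer: a two-category response can be recoded as a 0/1 indicator, which is numeric, so a mean (the share favoring the death penalty) and a standard deviation can be computed per group. The column `FAVORS_DEATH` is a hypothetical name introduced here, the coding 1 = death penalty and 2 = prison is an assumption, and the rows are made-up toy data rather than the OOL survey.

```python
import pandas

# Toy rows, invented for illustration only (NOT the OOL survey data).
toy = pandas.DataFrame({
    'ETH_GEN_LABELS': ['White Men'] * 3 + ['Women of Color'] * 4,
    # assumed coding: 1 = death penalty, 2 = life in prison
    'W2_QK3': [1, 1, 2, 1, 2, 2, 2],
})

# Hypothetical indicator column: 1 if the respondent favors the death penalty, else 0.
toy['FAVORS_DEATH'] = (toy['W2_QK3'] == 1).astype(int)

# Mean of a 0/1 indicator is the group's share; std is the sample standard deviation.
group_stats = toy.groupby('ETH_GEN_LABELS')['FAVORS_DEATH'].agg(['mean', 'std', 'count'])
print(group_stats)
```

With real data, the same `groupby` call on the analysis subset would give one mean, one standard deviation, and one count for each of the four groups.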

**OUTPUT:**

=====================

Is there a relationship between the gender and race of people who

favor the death penalty as punishment for murder?

=====================

All responses, death penalty vs. life in prison:

count 1535

unique 2

top Prison

freq 791

Name: W2_QK3, dtype: object

=====================

Preferences by ethnicity-gender subsets:

WHITE MEN:

Death 0.666667

Prison 0.333333

Name: W2_QK3, dtype: float64

=====================

WHITE WOMEN:

Death 0.587814

Prison 0.412186

Name: W2_QK3, dtype: float64

=====================

MEN OF COLOR:

Death 0.443925

Prison 0.556075

Name: W2_QK3, dtype: float64

=====================

WOMEN OF COLOR:

Death 0.352713

Prison 0.647287

Name: W2_QK3, dtype: float64

=====================

Ordinary Least Squares:

OLS Regression Results

==============================================================================

Dep. Variable: W2_QK3 R-squared: 0.060

Model: OLS Adj. R-squared: 0.058

Method: Least Squares F-statistic: 32.57

Date: Thu, 21 Sep 2017 Prob (F-statistic): 2.10e-20

Time: 10:23:36 Log-Likelihood: -1065.9

No. Observations: 1535 AIC: 2140.

Df Residuals: 1531 BIC: 2161.

Df Model: 3

Covariance Type: nonrobust

===================================================================================

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 1.3333 | 0.027 | 48.542 | 0.000 | 1.279 | 1.387 |
| C(ETH_GEN)[T.2] | 0.0789 | 0.040 | 1.972 | 0.049 | 0.000 | 0.157 |
| C(ETH_GEN)[T.3] | 0.2227 | 0.036 | 6.167 | 0.000 | 0.152 | 0.294 |
| C(ETH_GEN)[T.4] | 0.3140 | 0.035 | 9.023 | 0.000 | 0.246 | 0.382 |

==============================================================================

Omnibus: 1.131 Durbin-Watson: 1.938

Prob(Omnibus): 0.568 Jarque-Bera (JB): 196.108

Skew: -0.066 Prob(JB): 2.60e-43

Kurtosis: 1.254 Cond. No. 5.32

==============================================================================

Warnings:

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

===================================

Post hoc test:

Multiple Comparison of Means – Tukey HSD,FWER=0.05

| group1 | group2 | meandiff | lower | upper | reject |
|---|---|---|---|---|---|
| MOC | WM | -0.2227 | -0.3156 | -0.1299 | True |
| MOC | WOC | 0.0912 | 0.0096 | 0.1728 | True |
| MOC | WW | -0.1439 | -0.2399 | -0.0479 | True |
| WM | WOC | 0.314 | 0.2245 | 0.4034 | True |
| WM | WW | 0.0789 | -0.024 | 0.1817 | False |
| WOC | WW | -0.2351 | -0.3278 | -0.1424 | True |

end.
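
As a cross-check on how to read the OLS table, the coefficients can be rebuilt from the group shares printed earlier in the output. This is a small sketch that assumes W2_QK3 is coded 1 = death penalty and 2 = prison (which matches the order of the category labels in the code) and that white men are the reference group; the shares are copied from the output, and nothing here touches the survey file.

```python
# Death-penalty shares copied from the OUTPUT section of this post.
death_share = {'WM': 0.666667, 'WW': 0.587814, 'MOC': 0.443925, 'WOC': 0.352713}

# Assumed coding: 1 = death penalty, 2 = life in prison,
# so a group's mean response is 1 * share + 2 * (1 - share).
group_mean = {g: 1 * p + 2 * (1 - p) for g, p in death_share.items()}

# The intercept is the mean of the reference group (white men);
# each other coefficient is that group's mean minus the reference mean.
intercept = group_mean['WM']
coef = {g: group_mean[g] - intercept for g in ('WW', 'MOC', 'WOC')}

print(round(intercept, 4))
for g, value in coef.items():
    print(g, round(value, 4))
```

The rebuilt values agree with the Intercept and C(ETH_GEN) rows of the regression table to four decimal places, and the same differences appear (up to sign) in the WM rows of the Tukey table.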

**PYTHON CODE:**

```
import warnings

import numpy
import pandas
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

# setting up to work with the data:
warnings.simplefilter(action="ignore", category=FutureWarning)
pandas.set_option('display.max_columns', None)
pandas.set_option('display.max_rows', None)
pandas.set_option('display.float_format', lambda x: '%f' % x)
data = pandas.read_csv('ool_pds.csv', low_memory=False)

print('=====================')
print('Is there a relationship between the gender and race of people who \nfavor the death penalty as punishment for murder?')
print('=====================')
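# categorical response variable, death-vs-prison:
# (pandas.to_numeric stands in for convert_objects, which newer pandas no longer provides)
data['W2_QK3'] = pandas.to_numeric(data['W2_QK3'], errors='coerce')
data['W2_QK3'] = data['W2_QK3'].replace(-1, numpy.nan)
# rows with missing values are dropped later, when the analysis subset is built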
# make a subset:
sub1 = data.copy()
# categorical explanatory variables, race/ethnicity and gender:
# plz note: this is incredibly cis/binary and trans-/nb-exclusive :(
# codes -1 and -2 are treated as missing
sub1['PPETHM'] = pandas.to_numeric(sub1['PPETHM'], errors='coerce').replace([-1, -2], numpy.nan)
sub1['PPGENDER'] = pandas.to_numeric(sub1['PPGENDER'], errors='coerce').replace([-1, -2], numpy.nan)
# create a new categorical explanatory variable based on gender and race
# white women, white men, women of color, men of color
def ETH_GEN(row):
    # rows missing either value stay missing, so they are omitted from the sample
    if pandas.isnull(row['PPETHM']) or pandas.isnull(row['PPGENDER']):
        return numpy.nan
    if row['PPETHM'] == 1:  # white
        if row['PPGENDER'] == 1:
            return 1  # white male
        else:
            return 2  # white female
    else:  # POC
        if row['PPGENDER'] == 1:
            return 3  # men of color
        else:
            return 4  # women of color


sub1['ETH_GEN'] = sub1.apply(ETH_GEN, axis=1)
sub1['ETH_GEN'] = pandas.to_numeric(sub1['ETH_GEN'], errors='coerce')
# make a subset of individuals for whom the relevant data is available
# (.copy() so the column assignments below act on an independent frame)
sub2 = sub1[['W2_QK3', 'ETH_GEN']].dropna().copy()
#recoding group names
recode1 = {1: 'White Men', 2: 'White Women', 3: 'Men of Color', 4: 'Women of Color'}
sub2['ETH_GEN_LABELS']= sub2['ETH_GEN'].map(recode1)
sub2['ETH_GEN_LABELS']= sub2['ETH_GEN_LABELS'].astype('category')
# look at some data
print('All responses, death penalty vs. life in prison:')
sub2['W2_QK3'] = sub2['W2_QK3'].astype('category')
sub2['W2_QK3'] = sub2['W2_QK3'].cat.rename_categories(['Death', 'Prison'])
desc = sub2['W2_QK3'].describe()
print(desc)
print('=====================')
print('Preferences by ethnicity-gender subsets:')
# one block per group: share of each answer within that group
# (W2_QK3 is already a labelled category here, so no numeric conversion is needed)
for label in ['White Men', 'White Women', 'Men of Color', 'Women of Color']:
    print(label.upper() + ':')
    subset = sub2[sub2['ETH_GEN_LABELS'] == label].dropna()
    print(subset['W2_QK3'].value_counts(sort=False, normalize=True))
    print('=====================')
# now let's run some tests!
# ANOVA: ordinary least squares
print('\nOrdinary Least Squares:')
# fit on sub1, where W2_QK3 is still numeric (1 = death penalty, 2 = prison);
# in sub2 it has been converted to a labelled category
model1 = smf.ols(formula='W2_QK3 ~ C(ETH_GEN)', data=sub1)
results1 = model1.fit()
print(results1.summary())
print('===================================')
print('\nPost hoc test:')
# post hoc, Tukey's Honestly Significant Difference Test
sub4 = sub1[['ETH_GEN', 'W2_QK3']].dropna()
sub4['ETH_GEN'] = sub4['ETH_GEN'].astype('category')
sub4['ETH_GEN'] = sub4['ETH_GEN'].cat.rename_categories(['WM', 'WW', 'MOC', 'WOC'])
mc1 = multi.MultiComparison(sub4['W2_QK3'], sub4['ETH_GEN'])
res1 = mc1.tukeyhsd()
print(res1.summary())
print('\nend.')
```