week 3: managing data

The goal for this week was to identify and perform any data management that will help to answer the question clarified in week 2:

Do people who know someone who has been accused or convicted of a crime favor the death penalty over life in prison as a punishment for murder, and does this preference differ from people who have never known anyone accused or convicted of a crime?

I did not realize that I was jumping ahead when performing some data management in the previous assignment. In order to answer this question, I needed to combine those who answered “yes” to one or both of the following questions: “has anyone in your household ever been arrested for a crime?” and “do you have any friends or relatives having a criminal conviction?” into one group, and combine those who answered “no” to both of these questions into a second group.
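The grouping rule described above can be sketched as a small function. This is only an illustration, not the code used for the assignment: the name assign_group is hypothetical, and it assumes the coding used by the program further down (1 = yes, 2 = no).

```python
def assign_group(household_arrest, friend_conviction):
    """Return 1, 2, or None for one respondent.

    Assumes responses are coded 1 = yes and 2 = no; anything else
    (refused, blank) is treated as not usable for grouping.
    """
    # Group 1: "yes" to at least one of the two questions.
    if household_arrest == 1 or friend_conviction == 1:
        return 1
    # Group 2: "no" to both questions.
    if household_arrest == 2 and friend_conviction == 2:
        return 2
    # Everything else cannot be placed in either group.
    return None


print(assign_group(1, 2))     # yes + no
print(assign_group(2, 2))     # no + no
print(assign_group(2, None))  # no + blank
```

Under this rule a respondent with one "yes" and one blank still lands in Group 1, which is also how the `|` filter in the program behaves.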

After the responses are converted with convert_objects(convert_numeric=True), blank responses are treated as missing values and drop out of the frequency counts, so individuals with missing responses were excluded. Individuals who refused to answer the death penalty question are included in the analysis, because their responses were coded as -1. At this level of investigation, however, it was impossible to determine why other responses are missing from the dataset. Because those answers cannot be inferred, the individuals were omitted from the analysis.
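To illustrate how blank responses drop out of the counts, here is a minimal sketch on made-up values (not the survey data). It uses pandas.to_numeric with errors='coerce' as an assumed stand-in for convert_objects(convert_numeric=True), which is no longer available in recent pandas releases.

```python
import pandas

# Made-up responses for illustration: ' ' stands for a blank answer.
raw = pandas.Series(['1', '2', ' ', '-1', '2'])

# Values that cannot be parsed as numbers become NaN.
numeric = pandas.to_numeric(raw, errors='coerce')

# value_counts() leaves NaN out by default, so the blank disappears
# from the counts while the -1 (refused) is kept.
counts = numeric.value_counts()
shares = numeric.value_counts(normalize=True)

print(counts)
print(shares)
```

Because the blank is dropped before the shares are computed, the percentages are based on recorded answers only.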

Summary:

With blank responses omitted from the sample, the total sample size is 2,201 (out of the dataset’s 2,294 records).

The sample includes 1,086 people who answered “yes” to one or both of the following questions: “has anyone in your household ever been arrested for a crime?” and “do you have any friends or relatives having a criminal conviction?” and 1,115 people who answered “no” to both of these questions.

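Examining all respondents first: of the 1,601 records with a recorded answer to the death penalty question, 46.5% favored the death penalty as punishment for murder, 49.4% favored life in prison, and 4.1% refused to answer. These combined figures are computed from the whole dataset rather than from the 2,201-person sample, and the percentages exclude records where the question was left blank.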

Group 1 includes those who know someone who has been accused or convicted of a crime (n = 1,086). Of the 749 with a recorded answer to the death penalty question, 44.5% favored the death penalty, 53.1% favored life in prison, and 2.4% refused to answer.

Group 2 includes those who do not know anyone accused or convicted of a crime (n = 1,115). Of the 795 with a recorded answer to the death penalty question, 50.1% favored the death penalty, 46.7% favored life in prison, and 3.3% refused to answer.
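The percentages can be checked by hand from the counts in the program output. The sketch below recomputes them; percentages is a hypothetical helper, and the counts are copied from the output rather than read from the survey file.

```python
# Counts copied from the program output: 1 = death penalty,
# 2 = life imprisonment, -1 = refused to answer.
counts = {
    'all respondents': {1: 744, 2: 791, -1: 66},
    'group 1': {1: 333, 2: 398, -1: 18},
    'group 2': {1: 398, 2: 371, -1: 26},
}


def percentages(group_counts):
    """Return (denominator, {code: percent rounded to one decimal})."""
    total = sum(group_counts.values())
    shares = {code: round(100 * n / total, 1)
              for code, n in group_counts.items()}
    return total, shares


for label, group_counts in counts.items():
    total, shares = percentages(group_counts)
    print(label, total, shares)
```

The denominators that come out are the numbers of recorded answers to the death penalty question, which are smaller than the group sizes because blank answers are not counted.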

PROGRAM OUTPUT:

runfile('/Users/ghost/PycharmProjects/Coursera-Data/data2.py', wdir='/Users/ghost/PycharmProjects/Coursera-Data')
=====================
This analysis will examine responses to the Outlook On Life (OOL) survey question:
Which is the better penalty for murder: death or life in prison? [W2_QK3]

The dataset includes 2294 total observations, and 436 variables.
=====================
Creating the subsets:

GROUP 1 will include individuals who answered yes to either of the following two questions:

Q1: [W1_P9] Has anyone in your household ever been arrested for a crime?
Q2: [W1_P10] Do you have any friends or relatives having a criminal conviction?

GROUP 2 will include individuals who answered NO to both of these questions.

Records in which either of these questions were left blank were omitted.

Total sample size: 2201

Total number who know someone arrested OR convicted: 1086

Total number who do NOT know someone arrested or convicted: 1115

=====================
*** Calculating Frequency Distributions ***
=====================

BOTH GROUPS COMBINED:
n = 3
1 = death penalty
2 = life imprisonment
-1 = refused to answer
-----------
COUNTS:
1.0 744
2.0 791
-1.0 66
Name: W2_QK3, dtype: int64
-----------
PERCENTAGES:
1.0 0.464710
2.0 0.494066
-1.0 0.041224
Name: W2_QK3, dtype: float64

=====================

GROUP 1: People who know someone arrested OR convicted:
n = 1086
1 = death penalty
2 = life imprisonment
-1 = refuse
-----------
COUNTS:
1.0 333
2.0 398
-1.0 18
Name: W2_QK3, dtype: int64
-----------
PERCENTAGES:
1.0 0.444593
2.0 0.531375
-1.0 0.024032
Name: W2_QK3, dtype: float64

=====================

GROUP 2: People who do NOT know someone arrested or convicted:
n = 1115
1 = death penalty
2 = life imprisonment
-1 = refuse
-----------
COUNTS:
1.0 398
2.0 371
-1.0 26
Name: W2_QK3, dtype: int64
-----------
PERCENTAGES:
1.0 0.500629
2.0 0.466667
-1.0 0.032704
Name: W2_QK3, dtype: float64

=====================

PYTHON CODE:


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pandas
import numpy
import warnings 
warnings.simplefilter(action = "ignore", category = FutureWarning)
data = pandas.read_csv('ool_pds.csv', low_memory=False)

print('=====================')
print('This analysis will examine responses to the Outlook On Life (OOL) survey question: \nWhich is the better penalty for murder: death or life in prison? [W2_QK3]')
print()
print('The dataset includes {} total observations, and {} variables.'.format(len(data), len(data.columns)))

print('=====================')
# subset: people who know someone arrested and/or convicted of a crime:
sub1 = data[ ((data['W1_P9'] == 1) | (data['W1_P10'] == 1)) ]
# subset: people who don't know someone arrested OR convicted:
xsub = data[ ((data['W1_P9'] == 2) & (data['W1_P10'] == 2)) ]

print('Creating the subsets:')
print()
print('GROUP 1 will include individuals who answered yes to either of the following two questions:\n')
print('Q1: [W1_P9] Has anyone in your household ever been arrested for a crime?')
print('Q2: [W1_P10] Do you have any friends or relatives having a criminal conviction?')
print()
print('GROUP 2 will include individuals who answered NO to both of these questions.')
print()
print('Records in which either of these questions were left blank were omitted.')
print()

print('Total sample size: {}'.format(len(sub1) + len(xsub)))
print()
print('Total number who know someone arrested OR convicted: {}'.format(len(sub1)))
print()
print('Total number who do NOT know someone arrested or convicted: {}'.format(len(xsub)))

print()
print('=====================')
print('*** Calculating Frequency Distributions *** ')
print('=====================')
# counts and frequencies for ALL records in the dataset (not only the two subsets):
ctotaldata = data['W2_QK3'].convert_objects(convert_numeric=True).value_counts(sort = False)
ptotaldata = data['W2_QK3'].convert_objects(convert_numeric=True).value_counts(sort = False, normalize = True)
print('\nBOTH GROUPS COMBINED:')
print(' n = {}'.format(len(ctotaldata))) # note: this is the number of response categories (3), not the number of records
print(' 1 = death penalty\n 2 = life imprisonment\n-1 = refused to answer')
print('     -----------     ')
print('COUNTS:')
print(ctotaldata)
print('     -----------     ')
print('PERCENTAGES:')
print(ptotaldata)
print()
print('=====================')

# counts and frequencies for those who know someone arrested or convicted
sub1 = sub1.convert_objects(convert_numeric=True)
csub1 = sub1['W2_QK3'].convert_objects(convert_numeric=True).value_counts(sort = False)
psub1 = sub1['W2_QK3'].convert_objects(convert_numeric=True).value_counts(sort = False, normalize = True)
print('\nGROUP 1: People who know someone arrested OR convicted:')
print(' n = {}'.format(len(sub1)))
print(' 1 = death penalty\n 2 = life imprisonment\n-1 = refuse')
print('     -----------     ')
print('COUNTS:')
print(csub1)
print('     -----------     ')
print('PERCENTAGES:')
print(psub1)
print()
print('=====================')

# counts and frequencies for those who do NOT know someone arrested or convicted
xcsub = xsub['W2_QK3'].convert_objects(convert_numeric=True).value_counts(sort = False)
xpsub = xsub['W2_QK3'].convert_objects(convert_numeric=True).value_counts(sort = False, normalize = True)
print('\nGROUP 2: People who do NOT know someone arrested or convicted:')
print(' n = {}'.format(len(xsub)))
print(' 1 = death penalty\n 2 = life imprisonment\n-1 = refuse')
print('     -----------     ')
print('COUNTS:')
print(xcsub)
print('     -----------     ')
print('PERCENTAGES:')
print(xpsub)
print()
print('=====================')
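The three near-identical counting blocks above could be folded into one helper. The sketch below is one possible way to do that, not the code that produced the output: summarize is a hypothetical name, pandas.to_numeric is used as an assumed replacement for convert_objects, the data is made up, and the combined figures are taken from the two subsets together, which is an assumption about the intent.

```python
import pandas


def summarize(responses):
    """Return (counts, proportions) for one column of coded responses."""
    numeric = pandas.to_numeric(responses, errors='coerce')
    counts = numeric.value_counts(sort=False)
    proportions = numeric.value_counts(sort=False, normalize=True)
    return counts, proportions


# Made-up data for illustration only (not the survey file).
demo = pandas.DataFrame({
    'W1_P9':  [1, 2, 2, 2, 1],
    'W1_P10': [2, 2, 1, 2, 2],
    'W2_QK3': [1, 2, 2, -1, 1],
})
group1 = demo[(demo['W1_P9'] == 1) | (demo['W1_P10'] == 1)]
group2 = demo[(demo['W1_P9'] == 2) & (demo['W1_P10'] == 2)]
# Combined sample = the two subsets together, not the whole dataset.
combined = pandas.concat([group1, group2])

for label, frame in [('combined', combined),
                     ('group 1', group1),
                     ('group 2', group2)]:
    counts, proportions = summarize(frame['W2_QK3'])
    print(label, 'n =', len(frame))
    print(counts)
    print(proportions)
```

Printing len(frame) reports the number of records in each group, while the counts and proportions cover only the recorded answers.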
