week 2: writing your first program (in python)

SUMMARY:

While I had an understanding in my head of what I wanted to examine last week, my last blog post still felt muddy. It wasn’t until I dove into the data itself and started isolating it and working with it that I really got to understand the dataset well enough to refine my research question:

Do people who know someone who has been accused or convicted of a crime favor the death penalty over life in prison as a punishment for murder, and does this preference differ from people who have never known anyone accused or convicted of a crime?

Blank responses were omitted from the sample, leaving a total sample size of 2,201 out of the dataset’s total 2,294 records. This sample includes 1,086 people who answered “yes” to one or both of the following questions: “has anyone in your household ever been arrested for a crime?” and “do you have any friends or relatives having a criminal conviction?” and 1,115 people who answered “no” to both of these questions.
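The grouping rule described above can be sketched on a few made-up records. This is only an illustration under stated assumptions: the `toy` DataFrame and its values are hypothetical (1 = yes, 2 = no, NaN = blank), and only the two column names and the filter conditions come from my actual program.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; 1 = yes, 2 = no, NaN = left blank.
toy = pd.DataFrame({
    "W1_P9":  [1, 2, 2, np.nan, 1, 2],
    "W1_P10": [2, 1, 2, 2, np.nan, np.nan],
})

# Group 1: "yes" to at least one of the two questions.
group1 = toy[(toy["W1_P9"] == 1) | (toy["W1_P10"] == 1)]
# Group 2: "no" to both questions.
group2 = toy[(toy["W1_P9"] == 2) & (toy["W1_P10"] == 2)]
# Every remaining row belongs to neither group and is left out of the sample.
omitted = toy.drop(group1.index.union(group2.index))

print(len(group1), len(group2), len(omitted))  # prints: 3 1 2
```

One thing the toy data makes visible: with these conditions, a row with one "yes" and one blank still lands in Group 1, while a row with one "no" and one blank is dropped.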

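Examining both groups combined first (the program computes this from every record with an answer to the question, 1,601 in all, rather than only the 2,201 in the sample): 46.5% favored the death penalty as punishment for murder, 49.4% favored life in prison, and 4.1% refused to answer.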

Group 1 includes those who know someone who has been accused or convicted of a crime (n = 1,086, of whom 749 have a recorded answer): 44.5% favored the death penalty, 53.1% favored life in prison, and 2.4% refused to answer.

Group 2 includes those who do not know anyone accused or convicted of a crime (n = 1,115, of whom 795 have a recorded answer): 50.1% favored the death penalty, 46.7% favored life in prison, and 3.3% refused to answer.
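As a quick check on the group percentages, they can be recomputed by hand from the counts in the program output below. The counts are copied from that output; the `percentages` helper and the category labels are names made up for this sketch. Note that the counts add up to 749 and 795, so each percentage is a share of the people in the group who have a recorded answer, not of the full group size.

```python
# Counts copied from the program output in this post.
counts = {
    "Group 1": {"death penalty": 333, "life in prison": 398, "refused": 18},
    "Group 2": {"death penalty": 398, "life in prison": 371, "refused": 26},
}

def percentages(group_counts):
    """Return each category's share of the group's recorded answers, in percent."""
    total = sum(group_counts.values())
    return {k: round(100 * v / total, 1) for k, v in group_counts.items()}

shares = {name: percentages(c) for name, c in counts.items()}
for name, pct in shares.items():
    # Print the group name, the number of recorded answers, and the shares.
    print(name, sum(counts[name].values()), pct)
```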

PROGRAM OUTPUT:

runfile('/Users/ghost/PycharmProjects/Coursera-Data/data2.py', wdir='/Users/ghost/PycharmProjects/Coursera-Data')
=====================
This analysis will examine responses to the Outlook On Life (OOL) survey question:
Which is the better penalty for murder: death or life in prison? [W2_QK3]

The dataset includes 2294 total observations, and 436 variables.
=====================
Creating the subsets:

GROUP 1 will include individuals who answered yes to either of the following two questions:

Q1: [W1_P9] Has anyone in your household ever been arrested for a crime?
Q2: [W1_P10] Do you have any friends or relatives having a criminal conviction?

GROUP 2 will include individuals who answered NO to both of these questions.

Records in which either of these questions were left blank were omitted.

Total sample size: 2201

Total number who know someone arrested OR convicted: 1086

Total number who do NOT know someone arrested or convicted: 1115

=====================
*** Calculating Frequency Distributions ***
=====================

BOTH GROUPS COMBINED:
n = 3
1 = death penalty
2 = life imprisonment
-1 = refused to answer
-----------
COUNTS:
1.0 744
2.0 791
-1.0 66
Name: W2_QK3, dtype: int64
-----------
PERCENTAGES:
1.0 0.464710
2.0 0.494066
-1.0 0.041224
Name: W2_QK3, dtype: float64

=====================

GROUP 1: People who know someone arrested OR convicted:
n = 1086
1 = death penalty
2 = life imprisonment
-1 = refuse
-----------
COUNTS:
1.0 333
2.0 398
-1.0 18
Name: W2_QK3, dtype: int64
-----------
PERCENTAGES:
1.0 0.444593
2.0 0.531375
-1.0 0.024032
Name: W2_QK3, dtype: float64

=====================

GROUP 2: People who do NOT know someone arrested or convicted:
n = 1115
1 = death penalty
2 = life imprisonment
-1 = refuse
-----------
COUNTS:
1.0 398
2.0 371
-1.0 26
Name: W2_QK3, dtype: int64
-----------
PERCENTAGES:
1.0 0.500629
2.0 0.466667
-1.0 0.032704
Name: W2_QK3, dtype: float64

=====================

PYTHON CODE:


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pandas
import numpy
import warnings 
warnings.simplefilter(action = "ignore", category = FutureWarning)
data = pandas.read_csv('ool_pds.csv', low_memory=False)

print('=====================')
print('This analysis will examine responses to the Outlook On Life (OOL) survey question: \nWhich is the better penalty for murder: death or life in prison? [W2_QK3]')
print()
print('The dataset includes {} total observations, and {} variables.'.format(len(data), len(data.columns)))

print('=====================')
# subset: people who know someone arrested and/or convicted of a crime:
sub1 = data[ ((data['W1_P9'] == 1) | (data['W1_P10'] == 1)) ]
# subset: people who don't know someone arrested OR convicted:
xsub = data[ ((data['W1_P9'] == 2) & (data['W1_P10'] == 2)) ]

print('Creating the subsets:')
print()
print('GROUP 1 will include individuals who answered yes to either of the following two questions:\n')
print('Q1: [W1_P9] Has anyone in your household ever been arrested for a crime?')
print('Q2: [W1_P10] Do you have any friends or relatives having a criminal conviction?')
print()
print('GROUP 2 will include individuals who answered NO to both of these questions.')
print()
print('Records in which either of these questions were left blank were omitted.')
print()

print('Total sample size: {}'.format(len(sub1) + len(xsub)))
print()
print('Total number who know someone arrested OR convicted: {}'.format(len(sub1)))
print()
print('Total number who do NOT know someone arrested or convicted: {}'.format(len(xsub)))

print()
print('=====================')
print('*** Calculating Frequency Distributions *** ')
print('=====================')
# counts and frequencies for ALL records in the dataset (not only the two groups):
# pandas.to_numeric stands in for convert_objects, which recent pandas releases no longer provide
qk3_all = pandas.to_numeric(data['W2_QK3'], errors='coerce')
ctotaldata = qk3_all.value_counts(sort = False)
ptotaldata = qk3_all.value_counts(sort = False, normalize = True)
print('\nBOTH GROUPS COMBINED:')
print(' n = {}'.format(len(ctotaldata))) # note: this is the number of answer categories (3), not the number of records
print(' 1 = death penalty\n 2 = life imprisonment\n-1 = refused to answer')
print('     -----------     ')
print('COUNTS:')
print(ctotaldata)
print('     -----------     ')
print('PERCENTAGES:')
print(ptotaldata)
print()
print('=====================')

# counts and frequencies for those who know someone
# (pandas.to_numeric stands in for convert_objects, which recent pandas releases no longer provide)
qk3_sub1 = pandas.to_numeric(sub1['W2_QK3'], errors='coerce')
csub1 = qk3_sub1.value_counts(sort = False)
psub1 = qk3_sub1.value_counts(sort = False, normalize = True)
print('\nGROUP 1: People who know someone arrested OR convicted:')
print(' n = {}'.format(len(sub1)))
print(' 1 = death penalty\n 2 = life imprisonment\n-1 = refuse')
print('     -----------     ')
print('COUNTS:')
print(csub1)
print('     -----------     ')
print('PERCENTAGES:')
print(psub1)
print()
print('=====================')

# counts and frequencies for those who do NOT know someone
# (pandas.to_numeric stands in for convert_objects, which recent pandas releases no longer provide)
qk3_xsub = pandas.to_numeric(xsub['W2_QK3'], errors='coerce')
xcsub = qk3_xsub.value_counts(sort = False)
xpsub = qk3_xsub.value_counts(sort = False, normalize = True)
print('\nGROUP 2: People who do NOT know someone arrested or convicted:')
print(' n = {}'.format(len(xsub)))
print(' 1 = death penalty\n 2 = life imprisonment\n-1 = refuse')
print('     -----------     ')
print('COUNTS:')
print(xcsub)
print('     -----------     ')
print('PERCENTAGES:')
print(xpsub)
print()
print('=====================')
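A more compact way to get the same kind of per-group table might be `pandas.crosstab`, which builds the counts and the row proportions in one call each. This is only a sketch on made-up toy data: the `group` column and its labels are hypothetical and not part of the OOL dataset, and it is not what the program above does.

```python
import pandas as pd

# Hypothetical toy data: a group label per person and their W2_QK3 code
# (1 = death penalty, 2 = life imprisonment, -1 = refused to answer).
toy = pd.DataFrame({
    "group":  ["knows", "knows", "knows", "knows",
               "does_not", "does_not", "does_not", "does_not"],
    "W2_QK3": [1, 2, 2, -1, 1, 1, 2, -1],
})

# Rows are groups, columns are answer codes, cells are counts.
counts = pd.crosstab(toy["group"], toy["W2_QK3"])
# normalize="index" turns each row into proportions that sum to 1.
shares = pd.crosstab(toy["group"], toy["W2_QK3"], normalize="index")

print(counts)
print(shares)
```

Putting both groups in one table makes the comparison easier to read than two separate printouts, at the cost of first needing a single column that says which group each record belongs to.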
