Howard College Linear Association between Marriages and Wage Report

Intro: Linear regression attempts to model the relationship between two quantitative variables by fitting a linear equation to observed data. Before attempting to fit a linear model to observed data, however, we first need to determine whether a linear model is appropriate, that is, whether a linear model would predict the response variable with reasonable accuracy. If there is no association or only a weak association between the explanatory and response variables, then a linear regression model will not be useful. A scatterplot and the correlation coefficient are useful tools for determining the strength of an association and whether a linear model could be used to make reasonable predictions.

In this project you will choose a data set that interests you from the list below, investigate the strength of a linear association between two quantitative variables within that data set, and determine whether a linear regression model is appropriate.


To produce a successful project you must:

  • Read and follow the instructions carefully.
  • Give yourself sufficient time to work on the project.
  • Write clearly, using appropriate statistical terminology and correct mathematical notation. College-level writing is expected, as is the use of proper grammar.
  • Use StatCrunch to complete all calculations and graphs.
  • Create original work.
  • Submit a professional report that is typed, well formatted, and well organized.


STEP 1: Choose a data set and two quantitative variables within that data set to investigate.

For example, I could say, “I chose to investigate the linear association between Home Team Goals and Attendance from the data set titled FIFA World Cup Match Results (1930-2014).”

STEP 2: Use StatCrunch to create a scatterplot for your variables.  

For my example, I will create a basic scatterplot with Attendance as the explanatory variable and Home Team Goals as the response variable. I will then copy the graph into my document.

STEP 3: Use StatCrunch to calculate the correlation coefficient and report the result.

For my example, I will only need to compute the correlation coefficient for the variables Home Team Goals and Attendance (not all of the variables in the table as is shown in the video). I will then copy the result into my document.
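
As a cross-check on the StatCrunch result, the correlation coefficient can also be computed by hand. The short Python sketch below is one way to do that, assuming the two variables have been copied into two equal-length lists. The function name pearson_r and the sample values are hypothetical and are not taken from the FIFA data set. StatCrunch remains the required tool for the project; this is only meant to show what the number measures.

```python
import math


def pearson_r(xs, ys):
    """Return the sample correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only; NOT real World Cup data.
attendance = [20000, 35000, 41000, 56000, 68000]
home_team_goals = [1, 3, 0, 2, 4]

r = pearson_r(attendance, home_team_goals)
print(f"r = {r:.3f}")
```

The result always falls between -1 and 1. Values near either end indicate a strong linear association, and values near 0 indicate a weak one or none.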

STEP 4: Referencing the scatterplot and the correlation coefficient, describe the form and strength of the association you are investigating, and be sure to thoroughly discuss any possible outliers. Then draw a conclusion about whether a linear model would be appropriate for the association you are investigating.

For this step, write a thoughtful paragraph that gives a detailed description of the association and a reasoned conclusion about whether a linear model is appropriate for your case using the language and concepts involved with linear regression. This step is where you show that you thoroughly understand this concept and therefore it carries the most points towards your grade for this project. 
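
To see why outliers deserve a thorough discussion, it can help to compare the correlation coefficient with and without a suspected outlier. The Python sketch below illustrates the idea on made-up numbers. The helper pearson_r and the data are hypothetical, and the comparison is only a way to gauge how much one point influences r, not a recommendation to delete points from your data set.

```python
import math


def pearson_r(xs, ys):
    """Return the sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration: the first five points lie close to a line,
# and the last point (6, 1.0) sits far from that pattern.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 1.0]

r_all = pearson_r(x, y)                    # every point included
r_without = pearson_r(x[:-1], y[:-1])      # suspected outlier left out

print(f"r with the suspected outlier:    {r_all:.3f}")
print(f"r without the suspected outlier: {r_without:.3f}")
```

In this made-up case a single point pulls r from nearly 1 down to a weak positive value, which is the kind of effect your paragraph should describe when the scatterplot shows a point far from the overall pattern.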

Data Sets:

Below is a list of data sets – choose one for the project.

U.S. CBP Drug Seizure Statistics:…
This data set summarizes the pounds of drugs seized at ports of entry and between points of entry by the U.S. Customs and Border Protection Agency.…

U.S. Presidential Data:…
This data set contains information on the U.S. Presidents from 1789-2019.

Fatal Encounters Updated September 2018:…
This data set contains information on fatal encounters. Fatal Encounters is a non-profit organization that collects data on police-involved deaths. Note: This is a volunteer organization whose data comes from people scouring news articles for evidence of these fatal encounters. Thus, this is not a complete population of fatal encounters, only a large sample.

College Basketball Arenas:…
This data set contains information on college basketball arenas throughout the country.

Marriage vs. the Economy:…
This data set compares the number of marriages in the last 30 years to several factors of the economy.

Medical Costs:…
This data set contains a variety of personal data related to medical costs.

MLB August 2019 Batting:…
This data set contains MLB batter statistics, which are year-to-date as of August 18, 2019.

Sample College Data:…
This data set contains a variety of data for colleges and universities in Delaware, DC, Maryland, Pennsylvania, Virginia, and West Virginia. Data is for the year 2011.

Fast Food Nutritional Data:…
This data set contains nutritional information on a variety of fast food items. Data was collected in January 2017 from online sources for each restaurant.

NFL Player Data 2016:…
This data set lists the 2,764 NFL players for all team rosters as of July 22, 2016.

Car Details 2019 Models:…
This data set contains information on the 2019 models of well-known, widely sold cars. MSRP stands for Manufacturer Suggested Retail Price and MPG stands for Miles Per Gallon.
