
Predictive linear regression method Individual Project

Coursework 2: Predictive linear regression method
Individual Project
• Using the ‘TaykoData Set’ for analysis, you will perform a predictive linear regression.
• The data is provided in Excel format on Blackboard. ‘Tayko Software catalog’ is a firm that sells games and educational software.
• In this project, we focus on applying multiple linear regression to predict the ‘Spending’ of customers, based on a number of explanatory variables, listed below.
• There are 23 variables. For the source variable, there are 15 different catalogues that the games and educational software can be ordered from.
• The class/dependent variable is SPENDING.
Develop a multiple regression model for predicting spending among the purchasers.
1. Partition this data set into training and validation partitions on the basis of the partition variable.
2. Develop a best model for predicting spending using multiple linear regression on the training data.
   a. For this, consider which independent variables will produce the best model.
   b. You will do this based on statistical estimations such as:
      i. Checking of multicollinearity assumptions
      ii. Analysis of statistical indications of the best model
   c. Three regressions need to be completed:
      i. Forward
      ii. Backward
      iii. Stepwise
   d. For each regression, remember to select the statistics that will help you complete part b.
   e. Describe the subset selection method, then explain and justify your choice.
3. Discuss a variety of statistical measures (i.e. goodness-of-fit measures in SPSS) that would allow you to validate the performance of the model.
   a. For higher marks, relate these back to the model you have been developing in part 2. This is independent learning and will not be covered in class.
4. Perform a final regression on the validation data.
   a. This should be the best model as determined from using the test data.
   b. Report on the model accuracy as well as overall performance.
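
The forward selection and multicollinearity checks asked for in part 2 can be sketched outside SPSS as follows. This is only an illustrative sketch under stated assumptions: the data are synthetic (the real Tayko file is not reproduced here), the column `Noise` is a made-up irrelevant predictor, the helper names `fit_ols`, `adjusted_r2`, `forward_select` and `vif` are hypothetical, and adjusted R² is used as the entry criterion in place of the F-to-enter test that SPSS applies.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def forward_select(X, y, names):
    """Add one predictor at a time while adjusted R^2 keeps improving.

    Assumption: adjusted R^2 stands in for SPSS's F-to-enter criterion.
    """
    n = len(y)
    chosen, remaining = [], list(range(X.shape[1]))
    best = -np.inf
    while remaining:
        scores = []
        for j in remaining:
            cols = chosen + [j]
            _, r2 = fit_ols(X[:, cols], y)
            scores.append((adjusted_r2(r2, n, len(cols)), j))
        score, j = max(scores)
        if score <= best:  # no candidate improves the model: stop
            break
        best = score
        chosen.append(j)
        remaining.remove(j)
    return [names[j] for j in chosen], best

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R_j^2)."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, r2 = fit_ols(others, X[:, j])
        out.append(1.0 / (1.0 - r2))
    return out

# Synthetic stand-in for the training partition (NOT the Tayko data).
rng = np.random.default_rng(0)
n = 500
freq = rng.poisson(2, n).astype(float)
web_order = rng.integers(0, 2, n).astype(float)
noise = rng.normal(size=n)  # hypothetical irrelevant predictor
y = 10 + 90 * freq + 20 * web_order + rng.normal(0, 30, n)

names = ["Freq", "Web_order", "Noise"]
X = np.column_stack([freq, web_order, noise])

selected, score = forward_select(X, y, names)
vifs = vif(X)
print("selected:", selected, "adjusted R^2:", round(score, 3))
print("VIF:", dict(zip(names, (round(v, 3) for v in vifs))))
```

A VIF close to 1 indicates that a predictor is nearly uncorrelated with the others; a common rule of thumb treats values above about 10 as a sign of problematic multicollinearity. Backward elimination follows the same pattern in reverse, starting from the full model and dropping one predictor at a time.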

1. Data Description
The original data set contains 23 variables and 1000 observations. We divided it into a training set and a validation set of 500 observations each. The variables are as follows:
US: equals 1 if it is a US address, otherwise 0
Source: Source catalogue for the record, coded as 15 dummy variables (source_a to source_w), each equal to 1 if the record comes from that source, otherwise 0
Freq: Number of transactions in last year at source catalogue
last_update_days_ago: Number of days since the last update to the customer record
first_update_days_ago: Number of days since the first update to the customer record
Web_order: Customer placed at least 1 order via web
Gender: equals 1 when customer is male, otherwise 0
Purchase: Person made purchase in test mailing
Spending: Amount spent by customer in test mailing ($)
Partition: Variable indicating which partition the record will be assigned to
We will use Spending as our dependent variable and build a linear model on the other variables to find out which of them have a linear relationship with spending. We will then use the model for prediction.
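
The workflow described above (split on Partition, fit on the training rows, predict on the validation rows) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic rather than the Tayko file, only two predictors (Freq and Web_order) are used, and the partition codes "t" and "v" are a hypothetical coding of the Partition variable.

```python
import numpy as np

# Synthetic stand-in for the 1000-row data set (NOT the Tayko data).
rng = np.random.default_rng(1)
n = 1000
freq = rng.poisson(2, n).astype(float)
web_order = rng.integers(0, 2, n).astype(float)
spending = 5 + 80 * freq + 15 * web_order + rng.normal(0, 40, n)

# Hypothetical coding: "t" = training row, "v" = validation row.
partition = np.where(np.arange(n) < 500, "t", "v")
train = partition == "t"
valid = partition == "v"

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), freq, web_order])

# Fit on the training partition only.
beta, *_ = np.linalg.lstsq(X[train], spending[train], rcond=None)

# Evaluate on the held-out validation partition.
pred = X[valid] @ beta
err = spending[valid] - pred
rmse = float(np.sqrt(np.mean(err ** 2)))
ss_tot = float(((spending[valid] - spending[valid].mean()) ** 2).sum())
r2_valid = 1.0 - float(err @ err) / ss_tot

print("coefficients (intercept, Freq, Web_order):", np.round(beta, 2))
print("validation RMSE:", round(rmse, 2), "validation R^2:", round(r2_valid, 3))
```

Reporting RMSE and R² on the validation rows, rather than on the rows used for fitting, gives a less optimistic picture of how the model would predict spending for new customers.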
2. Exploratory Data Analysis
We have the partial plots of the dependent variable against each independent variable: