Homework 2
Prof. Mohammad Dehghani
Assignment Guidelines
1. Students need to complete the assignment individually.
2. All the assignments are required to be done in RStudio.
3. Provide necessary comments using ‘#’ for better understanding of your script.
4. The code should follow the tidyverse style guide (https://style.tidyverse.org/index.html). The guide sets style standards for naming objects, indentation, and breaking long lines of code, to name a few.
5. If you take help from any external sources, cite them in the references. Violating academic integrity policies may result in zero credit on the work.
6. The assignment report needs to include the following sections:
• Problem statement: A brief statement of your understanding of the assignment questions (maximum 3 lines)
• Result: What you found after writing the code and running it in R. This section may include:
– Graphs / charts / plots
– Final data frame for your result
– Results obtained
• Conclusion: The statistical inferences and observations drawn from the results obtained.
7. The assignment report has to be included in the .rmd file.
Deliverables:
1. Please submit a *.rmd file that includes your code and can be knit into a PDF (recommended).**
** Please contact the TAs during office hours to learn how to knit the files into a PDF.
2. The above-mentioned file has to be labeled as: ‘HW # – IE 6600 – Sec # – Student Name’
3. Submit your HW deliverables via Canvas.
4. Deadline: 22nd September, 23:59
Homework 2, IE6600: Computation and Data Visualisation
Direct-to-consumer marketing is an effective strategy for distributing agricultural and farm products to consumers. Farmers markets form an important link between farmers and consumers and help foster farmer-consumer relationships. The United States Department of Agriculture (USDA) has recognized the importance of farmers markets. Through its many programs, USDA has helped the growth of farmers markets across the country. To date, 8,791 farmers markets are listed in USDA’s National Farmers Market Directory. The data is stored in fm.csv.
The data file contains the following details:
1. Variables indicating the geographical location of the farmers market (latitude, longitude, street, county, state, etc.)
2. Variables indicating types of products (herbs, vegetables, seafood etc.)
3. Variables indicating type of payment accepted (cash, WIC, SNAP, SFMNP etc.)
4. Variables indicating online social media presence
5. Variables indicating date and time
The directory of farmers markets across the US is given in the file. Answer the following questions using the dataset fm.csv.
Task 1
Write code to compute the number of farmers markets by city in the state of California and arrange the cities in descending order of the number of farmers markets. Omit NA values.
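The filter, count, and sort pattern described above can be sketched on a small toy data frame. This is only an illustration, not a reference solution: the column names `city` and `State` and all of the toy rows are assumptions and may not match fm.csv.

```r
library(dplyr)

# Toy stand-in for fm.csv; `city` and `State` are assumed column names.
fm_toy <- data.frame(
  city  = c("Oakland", "Oakland", "Fresno", NA, "Boston"),
  State = c("California", "California", "California",
            "California", "Massachusetts"),
  stringsAsFactors = FALSE
)

ca_city_counts <- fm_toy %>%
  filter(State == "California", !is.na(city)) %>%  # keep California, drop NA cities
  count(city, name = "n_markets") %>%              # one row per city
  arrange(desc(n_markets))                         # largest count first

ca_city_counts
```

On the real data, the same pipeline would start from the data frame read with read.csv("fm.csv") or readr::read_csv("fm.csv").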
Task 2
Filter by the state of Massachusetts and generate the following table. The first column should contain the year (extracted from the column ‘Season1Date’) and the second column should contain the number of farmers markets. Omit NA values.
Sample output:
The table below is only a reference for how the output should look. Students need to generate the entire table with these two columns.
Year    No. of Farmers Markets
2012    2162
2017    4366
2011    2605
2016    3086
2019    3397
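One possible way to pull a year out of a Season1Date string is to take its first run of four digits. The sketch below does that on toy data. The `State` column name, the toy rows, and the choice of the first four-digit run (that is, the start year) are assumptions, not requirements stated in the assignment.

```r
library(dplyr)
library(stringr)

# Toy stand-in for fm.csv; only Season1Date is named in the assignment.
fm_toy <- data.frame(
  State = c("Massachusetts", "Massachusetts", "Massachusetts",
            "Massachusetts", "New York"),
  Season1Date = c("05/05/2015 to 10/27/2015", "06/01/2015 to 10/31/2015",
                  "06/14/2014 to 10/18/2014", NA,
                  "05/01/2015 to 11/01/2015"),
  stringsAsFactors = FALSE
)

ma_by_year <- fm_toy %>%
  filter(State == "Massachusetts") %>%
  mutate(Year = str_extract(Season1Date, "\\d{4}")) %>%  # first four-digit run
  filter(!is.na(Year)) %>%                               # drop rows without a year
  count(Year, name = "n_markets")

ma_by_year
```

Entries that contain no four-digit year (for example, month names only) yield NA and are dropped by the second filter.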
Task 3
Write code to compute the number of farmers markets by state and display the top 15 states.
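A minimal sketch of a "top n" listing, assuming a `State` column and using toy rows. Here the top 2 are taken so the result is easy to check by hand; on the full data the cutoff would be 15.

```r
library(dplyr)

# Toy stand-in for fm.csv; `State` is an assumed column name.
fm_toy <- data.frame(
  State = c("Texas", "Ohio", "Texas", "Iowa", "Texas", "Ohio"),
  stringsAsFactors = FALSE
)

top_states <- fm_toy %>%
  count(State, name = "n_markets", sort = TRUE) %>%  # sort = TRUE orders by count, descending
  head(2)                                            # head(15) on the full data

top_states
```

Note that head() cuts at a fixed number of rows, so states tied at the cutoff are dropped arbitrarily; dplyr::slice_max() is an alternative that keeps ties.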
Task 4
Filter by the state of New York and generate the following table using a pivot function. The first column should contain the payment system, the second column should list the type of product, and the third column should contain the number of farmers markets offering that payment service. For the payment system, consider the columns “Credit”, “WIC”, “WICcash”, and “SNAP” from the original farmers market data.
Sample output:
The table below is only a reference for how the output should look. Students need to generate the entire long form table.
Payment System    Product       #Farmers Market
Credit            Organic       2162
Credit            Bakedgoods    4366
Credit            Cheese        2605
Credit            Crafts        3086
Credit            Flowers       3397
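The long form above can be reached by pivoting twice, once over the payment columns and once over the product columns, and then counting the pairs. The sketch below shows the idea on toy data with two payment and two product columns. It assumes the columns are coded "Y"/"N" and that the state column is called `State`; both are assumptions about fm.csv, and the toy rows are invented.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for fm.csv; the "Y"/"N" coding is an assumption.
fm_toy <- data.frame(
  State   = c("New York", "New York", "New York", "Ohio"),
  Credit  = c("Y", "Y", "N", "Y"),
  SNAP    = c("N", "Y", "Y", "Y"),
  Organic = c("Y", "N", "Y", "Y"),
  Cheese  = c("Y", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

ny_long <- fm_toy %>%
  filter(State == "New York") %>%
  # payment columns -> (payment_system, accepts) pairs
  pivot_longer(c(Credit, SNAP),
               names_to = "payment_system", values_to = "accepts") %>%
  # product columns -> (product, sells) pairs
  pivot_longer(c(Organic, Cheese),
               names_to = "product", values_to = "sells") %>%
  filter(accepts == "Y", sells == "Y") %>%  # market accepts the payment and sells the product
  count(payment_system, product, name = "n_markets")

ny_long
```

On the full data, the first pivot_longer() would list “Credit”, “WIC”, “WICcash”, and “SNAP”, and the second would list every product column.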
Task 5
Create two new columns and add them to the farmers market data frame. The first column should be named “Startdate” and the second column must be named “Enddate”. Most entries in the Season1Date column have the form “05/05/2015 to 10/27/2015”. Split the date entries of Season1Date and assign the first value to Startdate and the second value to Enddate.
Sample output:
The table below is only a reference for how the output should look. Students need to generate the entire long form table.
Season1Date                 Startdate     Enddate
05/05/2015 to 10/27/2015    05/05/2015    10/27/2015
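As a hedged illustration of the split, tidyr::separate() can break a string column on a fixed separator while keeping the original column. The toy rows below are invented; how entries that do not follow the "start to end" pattern should be handled is left open by the assignment, and this sketch simply leaves them as NA.

```r
library(tidyr)

# Toy stand-in for the Season1Date column of fm.csv.
fm_toy <- data.frame(
  Season1Date = c("05/05/2015 to 10/27/2015", "06/01/2014 to 09/30/2014", NA),
  stringsAsFactors = FALSE
)

fm_split <- separate(
  fm_toy,
  Season1Date,
  into   = c("Startdate", "Enddate"),
  sep    = " to ",
  remove = FALSE,   # keep the original Season1Date column
  fill   = "right"  # entries without " to " get NA in Enddate
)

fm_split
```

The new columns are still character strings; converting them to dates, for example with as.Date(Startdate, format = "%m/%d/%Y"), would be a separate step.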
Task 6
From the NY collision data in nycollision.csv, compute and tabulate the following for each borough:
• Number of pedestrians injured in each borough, with all stats (total, min, max, mean, median, mode, quartiles). All the stats have to be calculated in a single line of code. (10 points)
• List the number of accidents by the type of vehicles involved in each borough (5 points)
• List the factors responsible for the accidents in each borough in descending order (5 points)
• List the number of accidents by each hour of the day (5 points)
• Give the monthly number of accidents by month and year (5 points)
• For Brooklyn, list the number of persons injured and killed, pedestrians injured and killed, cyclists injured and killed, and motorists injured and killed, in long form with two columns (Borough, type of outcome, i.e., injured/killed, number). Do not include rows with empty values.
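For the first item in the list above, note that base R has no built-in function for the statistical mode, so a small helper is needed. The sketch below computes all the requested statistics in one summarise() call on toy data. The helper name `stat_mode`, the column names `borough` and `pedestrians_injured`, and the toy values are all hypothetical. Whether a single summarise() call counts as "a single line of code" is an assumption about the intended reading.

```r
library(dplyr)

# Hypothetical helper: most frequent value (first one wins on ties).
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Toy stand-in for nycollision.csv; column names are assumptions.
collisions_toy <- data.frame(
  borough = c("BROOKLYN", "BROOKLYN", "BROOKLYN", "BROOKLYN",
              "QUEENS", "QUEENS", "QUEENS"),
  pedestrians_injured = c(0, 0, 1, 3, 0, 2, 2),
  stringsAsFactors = FALSE
)

ped_stats <- collisions_toy %>%
  group_by(borough) %>%
  summarise(
    total_val  = sum(pedestrians_injured),
    min_val    = min(pedestrians_injured),
    max_val    = max(pedestrians_injured),
    mean_val   = mean(pedestrians_injured),
    median_val = median(pedestrians_injured),
    mode_val   = stat_mode(pedestrians_injured),
    q1         = unname(quantile(pedestrians_injured, 0.25)),
    q3         = unname(quantile(pedestrians_injured, 0.75)),
    .groups    = "drop"
  )

ped_stats
```

If the real column contains missing values, each summary function would need na.rm = TRUE; table() already ignores NA by default.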