Assignment Task
Part 1 – Data Exploration and Manipulation
Tasks 1-4 use a dataset of tweets about COVID-19 vaccination, in which each row represents a tweet and each column records a piece of information about it. The data was taken from Kaggle¹ and is available as “vaccination_all_tweets.csv” on iLearn, in the Assignment 2 folder.
Task 1
• Report the number of rows and the number of columns in the dataset.
• Report the structure of the data and calculate descriptive statistics for all numeric columns. Descriptive statistics include the mean, standard deviation, maximum and minimum.
• Print the number of unique values in each column of type object, sorted in ascending order.
• Print the proportion of True values in each column of type bool, sorted in descending order.
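The Task 1 computations can be sketched in plain Python so that each step is explicit. This is a minimal illustration under stated assumptions, not a model answer: the rows are hypothetical, the column names are assumed to mirror part of the Kaggle file, and values are assumed to be already parsed with no missing entries.

```python
import statistics

# Hypothetical rows, already parsed into Python values; the column names are
# assumed to mirror a few columns of vaccination_all_tweets.csv.
rows = [
    {"source": "Twitter for iPhone", "user_location": "London", "retweets": 3,
     "user_followers": 150, "user_verified": False, "is_retweet": False},
    {"source": "Twitter Web App", "user_location": "Sydney", "retweets": 0,
     "user_followers": 0, "user_verified": True, "is_retweet": False},
    {"source": "Twitter for iPhone", "user_location": "Toronto", "retweets": 12,
     "user_followers": 4000, "user_verified": True, "is_retweet": False},
]


def shape(rows):
    """Return (number of rows, number of columns), like DataFrame.shape."""
    n_cols = len(rows[0]) if rows else 0
    return len(rows), n_cols


def columns_where(rows, predicate):
    """Names of columns whose every value satisfies `predicate`."""
    return [col for col in rows[0] if all(predicate(r[col]) for r in rows)]


def describe(rows):
    """Mean, sample standard deviation, min and max of each numeric column."""
    # type() is used instead of isinstance() because bool is a subclass of int.
    numeric = columns_where(rows, lambda v: type(v) in (int, float))
    summary = {}
    for col in numeric:
        vals = [r[col] for r in rows]
        summary[col] = {
            "mean": statistics.mean(vals),
            "std": statistics.stdev(vals),  # sample std (ddof=1), as pandas uses
            "min": min(vals),
            "max": max(vals),
        }
    return summary


def unique_counts_ascending(rows):
    """Number of distinct values in each string column, smallest first."""
    text_cols = columns_where(rows, lambda v: type(v) is str)
    counts = {col: len({r[col] for r in rows}) for col in text_cols}
    return sorted(counts.items(), key=lambda kv: kv[1])


def true_proportions_descending(rows):
    """Share of True values in each bool column, largest first."""
    bool_cols = columns_where(rows, lambda v: type(v) is bool)
    props = {col: sum(r[col] for r in rows) / len(rows) for col in bool_cols}
    return sorted(props.items(), key=lambda kv: kv[1], reverse=True)
```

With pandas, the same quantities would typically come from `df.shape`, `df.info()`, `df.describe()`, `df.select_dtypes(include="object").nunique().sort_values()` and `df.select_dtypes(include="bool").mean().sort_values(ascending=False)`; the mean of a boolean column is its proportion of True values.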
Task 2
• Create a new column tweet_length equal to the number of characters in each tweet. Create a histogram of this column using 25 bins, with appropriate customisations, and comment on the plot.
• Create a new column retweet_prop equal to the number of retweets divided by the number of followers. Inspect this column. What problem has occurred?
• For tweets where retweet_prop is less than or equal to 1, create a histogram of retweet_prop coloured by whether the user is verified. Add reasonable customisations to the plot and interpret it.
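A minimal plain-Python sketch of the two new Task 2 columns follows, using hypothetical tweets and assumed field names (text, retweets, user_followers, user_verified). It mimics how pandas handles division by zero, which points to one likely answer to the question about retweet_prop: accounts with zero followers produce infinite or undefined ratios.

```python
import math

# Hypothetical sample tweets; field names are assumed to mirror the Kaggle file.
tweets = [
    {"text": "Got my first Pfizer dose today!", "retweets": 4,
     "user_followers": 200, "user_verified": False},
    {"text": "Vaccines work.", "retweets": 0,
     "user_followers": 0, "user_verified": False},
    {"text": "New data on Moderna efficacy", "retweets": 9,
     "user_followers": 0, "user_verified": True},
    {"text": "Queue at the clinic", "retweets": 30,
     "user_followers": 10, "user_verified": False},
    {"text": "AstraZeneca update", "retweets": 50,
     "user_followers": 100000, "user_verified": True},
]


def retweet_prop(retweets, followers):
    """Retweets per follower, mimicking NumPy/pandas float division by zero."""
    if followers == 0:
        # pandas yields NaN for 0/0 and inf for a positive count divided by 0.
        return math.nan if retweets == 0 else math.inf
    return retweets / followers


for t in tweets:
    t["tweet_length"] = len(t["text"])  # characters, including spaces
    t["retweet_prop"] = retweet_prop(t["retweets"], t["user_followers"])

# Comparisons with NaN are always False and inf > 1, so this filter drops both,
# as well as genuine ratios above 1.
kept = [t for t in tweets if t["retweet_prop"] <= 1]

# Split the remaining ratios by verification status for an overlaid histogram.
by_verified = {True: [], False: []}
for t in kept:
    by_verified[t["user_verified"]].append(t["retweet_prop"])
```

In pandas, `df["text"].str.len()` gives tweet_length, and `df["retweets"] / df["user_followers"]` produces inf and NaN in the same places. The two lists in `by_verified` could then be drawn as overlaid histograms with matplotlib's `plt.hist(values, alpha=0.5, label=...)` followed by `plt.legend()`.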
Task 3
• Create a new column called account_year reporting the year the account associated with each tweet was created. Create a stacked bar chart showing the number of verified and unverified accounts created each year. Include appropriate customisations.
• The source column takes many different values, but only a few are common. Identify the 7 most common values of source, then replace all other values in the column with “Other”.
• Create a pie chart showing the breakdown of categories in the modified source column, displaying the percentage of each category on the chart. Interpret the plot.
• Create a column giving the number of minutes elapsed since midnight when each tweet was posted (e.g., a tweet at 17:52:03 becomes 1072.05). Create a histogram of these tweet times with appropriate customisations.
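The Task 3 transformations can be sketched as small plain-Python functions. The timestamp format `%Y-%m-%d %H:%M:%S` and the field names user_created and user_verified are assumptions about the file, and `collapse_rare` and `category_percentages` are hypothetical helper names chosen for this illustration.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format of the user_created and date columns.
FMT = "%Y-%m-%d %H:%M:%S"


def account_year(user_created):
    """Year in which the account was created."""
    return datetime.strptime(user_created, FMT).year


def verified_counts_by_year(rows):
    """{year: [verified, unverified]}, sorted by year, for a stacked bar chart."""
    counts = {}
    for r in rows:
        pair = counts.setdefault(account_year(r["user_created"]), [0, 0])
        pair[0 if r["user_verified"] else 1] += 1
    return dict(sorted(counts.items()))


def collapse_rare(values, keep=7, other="Other"):
    """Keep the `keep` most frequent values and replace the rest with `other`."""
    # Ties at the cut-off are broken by first appearance (Counter.most_common).
    top = {v for v, _ in Counter(values).most_common(keep)}
    return [v if v in top else other for v in values]


def category_percentages(values):
    """Percentage of each category, largest first, as a pie chart would show."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: 100 * n / total for k, n in counts.most_common()}


def minutes_since_midnight(timestamp):
    """Minutes elapsed in the day when the tweet was posted."""
    t = datetime.strptime(timestamp, FMT)
    return t.hour * 60 + t.minute + t.second / 60
```

A pandas version might use `pd.to_datetime(df["user_created"]).dt.year` for account_year, `pd.crosstab(df["account_year"], df["user_verified"]).plot(kind="bar", stacked=True)` for the stacked bar chart, `df["source"].where(df["source"].isin(top7), "Other")` with `top7 = df["source"].value_counts().index[:7]` for the recoding, and `autopct="%1.1f%%"` in `plt.pie` to print percentages on the slices.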
Task 4
• What is the maximum number of characters in a single hashtag? For example, the ‘Pfizer’ hashtag has 6 characters.
• Create four columns indicating whether each tweet mentions Pfizer, Moderna, AstraZeneca, or Sinovac. Then create a line plot with four lines, one per vaccine, showing the number of tweets that mention it over time, with the date on the x-axis. You may use either daily or weekly tweet counts.
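A plain-Python sketch of the Task 4 steps follows. It assumes, without confirmation from this brief, that the hashtags column stores list-like strings such as "['Pfizer', 'vaccine']" and that dates use the format `%Y-%m-%d %H:%M:%S`. A vaccine counts as mentioned when its name appears anywhere in the tweet text, ignoring case; this simple rule also matches hashtags such as #PfizerBioNTech but misses misspellings.

```python
import ast
from collections import Counter
from datetime import datetime

VACCINES = ["Pfizer", "Moderna", "AstraZeneca", "Sinovac"]
FMT = "%Y-%m-%d %H:%M:%S"  # assumed format of the date column


def parse_hashtags(cell):
    """Turn a cell such as "['Pfizer', 'vaccine']" into a list; blanks give []."""
    if not cell:
        return []
    return list(ast.literal_eval(cell))


def max_hashtag_length(cells):
    """Length of the longest single hashtag across all cells."""
    return max((len(tag) for cell in cells for tag in parse_hashtags(cell)), default=0)


def mention_flags(text):
    """Case-insensitive substring test for each vaccine name."""
    lowered = text.lower()
    return {name: name.lower() in lowered for name in VACCINES}


def daily_mentions(rows):
    """{day: Counter(vaccine -> number of tweets mentioning it)}, sorted by day."""
    per_day = {}
    for r in rows:
        day = datetime.strptime(r["date"], FMT).date()
        hits = [name for name, hit in mention_flags(r["text"]).items() if hit]
        per_day.setdefault(day, Counter()).update(hits)
    return dict(sorted(per_day.items()))
```

In pandas, each flag could be built with `df["text"].str.contains("pfizer", case=False)`, and daily totals obtained by grouping the four flag columns on `df["date"].dt.date` and summing, before drawing one `plt.plot` line per vaccine.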