Subject Code: MA3022 – MA4022 – MA7022
MA3022 / MA4022 / MA7022 Data Mining and Neural Networks

Due by 03.02.2025
100 marks available

Theoretical Background and Two Mini-Research Projects


Theoretical Background (20 marks)

Give a description of classification and clustering (5 marks).
What is the difference between them? (5 marks)
Describe the KNN approach and Hart’s algorithm for data reduction (5 marks).
Describe the K-means algorithm (5 marks).
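Hart’s algorithm (condensed nearest neighbour, CNN) can be summarised as follows. Start the store with one sample. Scan the remaining samples and add to the store any sample that the 1-NN rule, using only the current store, misclassifies. Repeat full passes until one pass adds nothing. Below is a minimal Python sketch under stated assumptions: Euclidean distance, 2-D points given as tuples, and a random presentation order fixed by `seed`. The names `nearest_label` and `condense` are illustrative only and are not taken from the knn.jar application.

```python
import math
import random


def nearest_label(x, store, points, labels):
    """1-NN rule: label of the stored prototype closest to x (Euclidean)."""
    j = min(store, key=lambda k: math.dist(x, points[k]))
    return labels[j]


def condense(points, labels, seed=0):
    """Hart's condensed nearest neighbour (CNN).

    Returns the indices of a subset (the "store") that classifies every
    training point correctly under the 1-NN rule.
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # the selected store depends on this order
    store = [order[0]]                  # start with a single sample
    in_store = {order[0]}
    changed = True
    while changed:                      # repeat passes until nothing is added
        changed = False
        for i in order:
            if i in in_store:
                continue
            if nearest_label(points[i], store, points, labels) != labels[i]:
                store.append(i)         # misclassified: add it to the store
                in_store.add(i)
                changed = True
    return store


# Demo: two well-separated convex classes (uniform in two unit squares).
rng = random.Random(1)
pts = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(100)]
pts += [(rng.uniform(3, 4), rng.uniform(0, 1)) for _ in range(100)]
lab = [0] * 100 + [1] * 100
store = condense(pts, lab)
print(len(store), "prototypes retained out of", len(pts))
```

Each class here lies in a unit square, and the squares are two units apart. So every point is within sqrt(2) of any same-class prototype and at least 2 away from any other-class prototype, and the store ends with exactly two prototypes, one per class. Less separated or non-convex classes need more prototypes, which is what Project 1 asks you to measure with the application.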


Project 1: Condensed Nearest Neighbour for Data Reduction in Nearest Neighbour Classifier (40 marks)

Go to web page:
https://github.com/Mirkes/Data_Mining_Softbook/wiki/KNN-and-potential-ene

Read the text, then download the application:
https://github.com/Mirkes/Data_Mining_Softbook/blob/master/knn/knn.jar

Task 1 (10 marks)

Study how the number of prototypes depends on the number of points for two convex well-separated classes.

Task 2 (10 marks)

Prepare a series of examples with more sophisticated non-convex shapes of well-separated classes. Study how the number of prototypes depends on the number of points in these classes.

Task 3 (10 marks)

Study how the numbers of prototypes and outliers depend on the number of points for two well-separated classes with added uniformly distributed background noise (option: random).

Task 4 (10 marks)

In conclusion, discuss the results and propose a hypothesis for further study.

Do not forget to save and submit the configurations of the classes and prototypes as figures!


Project 2: Dynamics of K-means Clustering (40 marks)

Go to web page:
https://github.com/Mirkes/Data_Mining_Softbook/wiki/k-means-and-k-medoids

Read the text, then download the application:
https://github.com/Mirkes/Data_Mining_Softbook/blob/master/kmeans/KMeansKMedoids.jar

Task 1 (10 marks) – Exploration

Find the final K-means configurations for a series of datasets and various initial generations of centroids. How many different configurations did you observe? How frequently did they appear? How many iterations were required?
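One way to organise this exploration is sketched below in Python as a rough, hedged example. It runs Lloyd's K-means many times on one dataset from different random initial centroids, then records how many distinct final configurations appear, how often each one occurs, and how many iterations each run needed. Several choices here are assumptions: initial centroids are drawn as k random data points, the dataset is 50 uniform points in the unit square, k = 3, and there are 200 runs. The application KMeansKMedoids.jar may initialise centroids differently. The names `kmeans` and `explore` are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, rng, max_iter=1000):
    """Lloyd's K-means with centroids initialised at k random data points.

    Returns the final partition (a frozenset of clusters, each a frozenset
    of point indices) and the number of centroid updates performed.
    """
    centroids = rng.sample(points, k)
    labels = None
    for iteration in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break                        # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    clusters = frozenset(
        frozenset(i for i, lab in enumerate(labels) if lab == c) for c in range(k))
    return clusters, iteration


def explore(points, k, runs, seed=0):
    """Run K-means from `runs` random initialisations and tally the outcomes."""
    rng = random.Random(seed)
    counts, iters = Counter(), []
    for _ in range(runs):
        config, n_iter = kmeans(points, k, rng)
        counts[config] += 1
        iters.append(n_iter)
    return counts, iters


data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(50)]
runs = 200
counts, iters = explore(points, k=3, runs=runs)
print(len(counts), "distinct final configurations")
for config, freq in counts.most_common():
    print("cluster sizes", sorted(len(c) for c in config), "frequency", freq / runs)
print("mean iterations:", sum(iters) / runs)
```

The code identifies a configuration by its partition of point indices, not by centroid coordinates. This makes two runs that reach the same clusters compare equal regardless of cluster numbering or floating-point round-off.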

Task 2 (10 marks)

Formulate a hypothesis about the number of different final K-means configurations and their frequencies. Analyse how they depend on the number of data points. Check this hypothesis on random sets of equidistributed points.

Task 3 (10 marks)

Formulate a hypothesis about the convergence rate of K-means and its dependence on the number of data points. Check this hypothesis on the random sets of equidistributed points (use the same series of experiments as in Task 2).
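Tasks 2 and 3 share one series of experiments on random equidistributed points. A hedged Python sketch of such a series follows. For each sample size N it draws N points uniformly in the unit square and reruns K-means from random initial centroids. It then records the number of distinct final configurations, the frequency of the most common one, and the mean number of iterations. Several choices are illustrative assumptions, not part of the assignment: "equidistributed" is read as uniform in the unit square, and the code uses k = 3, 50 runs per N, and the listed sample sizes. A compact Lloyd's K-means is included so the block runs on its own. The names `kmeans` and `sweep` are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, rng, max_iter=1000):
    """Lloyd's K-means from k random data points; returns (partition, updates)."""
    centroids = rng.sample(points, k)
    labels = None
    for iteration in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break                        # converged: assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    partition = frozenset(
        frozenset(i for i, lab in enumerate(labels) if lab == c) for c in range(k))
    return partition, iteration


def sweep(sizes, k=3, runs=50, seed=0):
    """For each N in `sizes`, draw N uniform points in the unit square,
    rerun K-means `runs` times and summarise the outcomes."""
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        points = [(rng.random(), rng.random()) for _ in range(n)]
        counts, total_updates = Counter(), 0
        for _ in range(runs):
            partition, updates = kmeans(points, k, rng)
            counts[partition] += 1
            total_updates += updates
        top_freq = max(counts.values()) / runs
        rows.append((n, len(counts), top_freq, total_updates / runs))
    return rows


results = sweep([20, 50, 100, 200])
for n, n_configs, top_freq, mean_updates in results:
    print(f"N={n:4d}  configurations={n_configs:3d}  "
          f"most frequent={top_freq:.2f}  mean iterations={mean_updates:.1f}")
```

Each row of `results` gives one point on the curves that the hypotheses in Tasks 2 and 3 concern. Averaging over several independent point sets for each N would reduce the noise in these curves.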

Task 4 (10 marks)

In conclusion, discuss the results and propose a hypothesis for further study.

Do not forget to save and submit the final configurations of the clusters and centroids as figures!
