I chose to cluster the movies in the .txt file by the rating they received. I may have made a mistake somewhere, because the resulting dendrogram came out in alphabetical order. Even so, I understand the concept behind hierarchical clustering: start with each item as its own group, merge the two most similar groups, and repeat until all of the data is in a single cluster. This is how I got my dendrogram.
I first ran the command:
rows,columns,data=clusters.readfile('C:\Python26\Lib\movies.txt')
Followed by:
clust=clusters.hcluster(data)
This command took quite a while to run: I had to wait around 30 minutes for it to finish.
Hierarchical clustering takes the data and builds a hierarchy of groups by repeatedly merging the two most similar groups until only one group remains.
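To make the merge loop described above concrete, here is a minimal sketch in plain Python. It is not the code inside the clusters module; the names euclidean and hcluster_sketch, the choice of Euclidean distance, the averaging of merged vectors, and the sample ratings are all assumptions made for illustration.

```python
# Minimal sketch of agglomerative (bottom-up) hierarchical clustering.
# All names here are illustrative; this is not the clusters module's API.
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hcluster_sketch(rows, distance=euclidean):
    """Merge the two closest clusters until only one remains.

    Each cluster is a (vector, tree) pair, where tree is either a row
    index (a leaf) or a (left_tree, right_tree) tuple (a merge).
    """
    clusters = [(list(vec), i) for i, vec in enumerate(rows)]
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (vec_a, tree_a), (vec_b, tree_b) = clusters[i], clusters[j]
        # Assumption: represent the merged cluster by the average vector.
        merged_vec = [(x + y) / 2.0 for x, y in zip(vec_a, vec_b)]
        # Delete j first (j > i) so that index i is still valid.
        del clusters[j]
        del clusters[i]
        clusters.append((merged_vec, (tree_a, tree_b)))
    return clusters[0][1]


# Made-up one-dimensional "ratings", one row per movie.
ratings = [[1.0], [1.5], [8.0], [9.0]]
tree = hcluster_sketch(ratings)
print(tree)
```

Every pass compares every remaining pair of groups, so the work grows quickly with the number of rows, which is probably part of why a large file takes so long.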
I then printed out the resulting dendrogram:
clusters.printclust(clust,labels=rows)
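For anyone curious how a text dendrogram can be produced from a cluster tree, here is a rough sketch. It is not the source of printclust; the function format_tree, the nested-tuple tree layout, and the movie labels are hypothetical and only meant to show the idea of indenting each level of the hierarchy.

```python
def format_tree(tree, labels, depth=0):
    """Return the dendrogram as a list of indented text lines."""
    indent = "  " * depth
    if isinstance(tree, tuple):
        # A merge node: mark it, then render both branches one level deeper.
        left, right = tree
        return ([indent + "-"]
                + format_tree(left, labels, depth + 1)
                + format_tree(right, labels, depth + 1))
    # A leaf: show the label for this row index.
    return [indent + labels[tree]]


# Hypothetical tree and labels, for illustration only.
example_tree = ((0, 1), (2, 3))
example_labels = ["Movie A", "Movie B", "Movie C", "Movie D"]
lines = format_tree(example_tree, example_labels)
print("\n".join(lines))
```

Items that were merged early end up deeply indented next to each other, which is what makes similar movies easy to spot in the output.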
This is a small section of the dendrogram:

This is clearly incorrect: the movies simply come out in alphabetical order.
I decided to try again using different fields from the .txt file. I found that some fields simply work better than others. I ran the same Python commands as before, but this time on a file called movies4.txt. This file contains the movies, the ratings they received, and a field called r1 (I'm not sure what it means). This time I got a more convincing-looking dendrogram.
Here is a piece of that dendrogram:
