A list of puns related to "Cluster analysis"
I come from a more data science background, where cluster analysis is used very liberally.
But when I read research papers that rely more on econometrics (e.g. business-related ones), I rarely see cluster analysis being used. I've even heard people saying econometricians tend not to trust results from clustering.
104 school clusters (Sept 7-Oct 14), 21% involved staff.
Staff were the index cases in 11.5% of total clusters.
"Students, rather than staff, are majority of cases and primarily drive in-school transmission."
https://preview.redd.it/x3le9jhfbaw71.png?width=1773&format=png&auto=webp&s=7a6655fb2d53874cd7c66d1d9ff81fba6cee3f20
https://newwestschools.ca/wp-content/uploads/2021/10/StaffVaccinePresentation.pdf
I'm doing cluster analysis for a retailer that operates across multiple countries. It is a Gulf retailer, so most of the shoppers are foreigners. The request was to create customer personas from customer data (aggregated transaction metrics and demographics) starting from January 2018. I'm worried that this may not be possible, because the period includes COVID and the data may not be a good fit for clustering. Is there any way to check whether my data will give good boundaries for the clusters we create? I don't think an elbow plot or silhouette scores make sense here; correct me if I'm wrong. Is there any way to control the boundary conditions of the clusters?
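For reference, this is a minimal sketch of what a silhouette score measures, assuming Euclidean distance on already-scaled features. The toy points and the two labelings are hypothetical, not the retailer's data. For each point it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b); values near 1 suggest well-separated boundaries, values near 0 or below suggest the boundaries are arbitrary.

```python
from math import dist


def silhouette_scores(points, labels):
    """Return the silhouette coefficient of every point (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the closest other cluster
        b = min(
            sum(dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


# Hypothetical toy customers described by two scaled features.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # split that follows the two visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # arbitrary split ignoring the groups

good = sum(silhouette_scores(points, good_labels)) / len(points)
bad = sum(silhouette_scores(points, bad_labels)) / len(points)
print(f"mean silhouette, sensible split:  {good:.2f}")
print(f"mean silhouette, arbitrary split: {bad:.2f}")
```

A low mean score on the real data would not prove that personas are impossible, only that the chosen features and number of clusters do not produce clean boundaries.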
I was wondering if someone could perhaps help explain the difference between these two methods and when it might be more appropriate to use one vs the other for cluster/hotspot analysis?
I understand that local Moran's I allows you to identify both statistically significant clusters (areas of high value surrounded by high-value neighbors, aka hotspots, and areas of low value surrounded by low-value neighbors, aka coldspots) and statistically significant outliers (areas of high value surrounded by low-value neighbors and areas of low value surrounded by high-value neighbors).
By contrast Getis-Ord Gi* only seems to find statistically significant clusters/hotspots/coldspots and not outliers like local Moran's I does.
What are some other differences between these methods that are relevant to deciding which to use to study the prevalence of a disease by census tract? Is it just whether you'd like to find outliers or not? If local Moran's I has no disadvantages, since it can find both outliers and clusters, why does anyone use Getis-Ord Gi* if it can only find clusters and not outliers? What are its advantages? Is the way neighbors are found different for one vs. the other?
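To make the difference between the two statistics concrete, here is a small sketch that computes both on a hypothetical chain of eight census tracts, using the textbook formulas with binary adjacency weights. The tract values and the weights matrix are made up for illustration, and no significance testing (permutations or multiple-testing correction) is done. Local Moran's I multiplies a tract's own deviation by the deviations of its neighbors, so a high tract among low neighbors gets a negative value. Gi* sums the values in the neighborhood including the tract itself, so it only reports whether the neighborhood as a whole is high or low.

```python
from math import sqrt


def local_morans_i(x, w):
    """Local Moran's I per unit: (z_i / m2) * sum_j w_ij * z_j, excluding j == i."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(v * v for v in z) / n
    return [
        z[i] / m2 * sum(w[i][j] * z[j] for j in range(n) if j != i)
        for i in range(n)
    ]


def getis_ord_gi_star(x, w):
    """Getis-Ord Gi* per unit, returned as a z-score."""
    n = len(x)
    mean = sum(x) / n
    s = sqrt(sum(v * v for v in x) / n - mean ** 2)
    result = []
    for i in range(n):
        wi = list(w[i])
        wi[i] = 1.0  # the "star" variant counts the unit as its own neighbor
        sw = sum(wi)
        sw2 = sum(v * v for v in wi)
        numerator = sum(wi[j] * x[j] for j in range(n)) - mean * sw
        denominator = s * sqrt((n * sw2 - sw ** 2) / (n - 1))
        result.append(numerator / denominator)
    return result


# Hypothetical prevalence values for 8 tracts arranged in a line.
# Tract 5 is a high value sitting between two low tracts.
x = [9, 9, 9, 1, 1, 9, 1, 1]
n = len(x)
# Binary adjacency: each tract neighbors the tracts directly next to it.
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

moran = local_morans_i(x, w)
gi_star = getis_ord_gi_star(x, w)
for i in range(n):
    print(f"tract {i}: value={x[i]}  local I={moran[i]:+.2f}  Gi*={gi_star[i]:+.2f}")
```

In this toy case tract 1 comes out positive on both statistics (a high-high cluster), while tract 5 gets a clearly negative local I (a high-low outlier) but only a mildly negative Gi*, because the neighborhood sum is dominated by its low neighbors.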
What do you think about a cluster analysis to segment customers? I feel like a manual segmentation is often times better, especially when it comes to more personalized marketing.
I think that clustering makes sense when there are distinct groups in the population. However, I think that these groups can easily be identified by EDA (finding thresholds of certain variables) in most cases. There is a possibility for identifying relevant groups only through cluster analysis, but I think that a) those cases are rare and b) the identified clusters are more complex and not suitable for a segmentation with the objective of a more personalized communication.
Does anybody have a success story where unsupervised clustering led to a customer segmentation that offered a business value (e.g. because of more personalized communication)? I am struggling to imagine a scenario where unsupervised clustering comes up with better clusters for personalized communications compared to manually building clusters by thresholds/criteria for clusters.
I have conducted a multiple choice survey, and now I want to analyze it using a cluster analysis.
Since it is a multiple choice, multiple variant survey, k-means clustering or fuzzy k-means clustering seemed like the obvious choice. However, since I am completely new to clustering, I am not sure if it is the best approach after all.
Furthermore, the multiple answers are giving me a hard time, as it does not allow a clear clustering of one-to-one answers. Should I produce distinct data sets for each multiple answer? It would expand the data set way beyond 250 entries, and I am not even sure if it would provide any useful answers because it won't be able to represent when multiple answers were given.
Here is a link to my data sheet. In the data multiple answers are distinguished by ";" and *blank answers* are indicated by NULL.
What I am trying to achieve is to find clusters in my participants or at the very least a correlation between answers (e.g. If Answer 1 was answered in question 1, then participants also were likely to answer option 4 in question number 17).
Are there any algorithms that can deal with the data? How would you approach this data set?
Edit: added passage of what I want to achieve with the analysis and updated the link
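One possible way to handle the multiple answers without duplicating rows is to turn every (question, option) pair into its own 0/1 column and compare participants with a set-based measure such as Jaccard dissimilarity. The sketch below assumes cells that use ";" between answers and the text NULL for blanks, as described above. The question names and example rows are hypothetical, since the actual data sheet is not reproduced here.

```python
def parse_answers(cell):
    """Split a cell such as '1;3' into a set of options; NULL or empty gives an empty set."""
    if cell is None or cell.strip() in ("", "NULL"):
        return set()
    return {part.strip() for part in cell.split(";") if part.strip()}


def multi_hot(rows):
    """Encode rows of {question: cell} as 0/1 indicators, one column per (question, option)."""
    parsed = [{q: parse_answers(c) for q, c in row.items()} for row in rows]
    columns = sorted(
        {(q, opt) for row in parsed for q, opts in row.items() for opt in opts}
    )
    matrix = [
        [1 if opt in row.get(q, set()) else 0 for q, opt in columns]
        for row in parsed
    ]
    return columns, matrix


def jaccard_dissimilarity(a, b):
    """1 - |shared options| / |options chosen by either participant|."""
    both = sum(1 for u, v in zip(a, b) if u and v)
    either = sum(1 for u, v in zip(a, b) if u or v)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical participants; the real sheet would have one dict per respondent.
rows = [
    {"Q1": "1;3", "Q17": "4"},
    {"Q1": "1", "Q17": "4;2"},
    {"Q1": "2", "Q17": "NULL"},
]

columns, matrix = multi_hot(rows)
print(columns)
for row in matrix:
    print(row)
print(jaccard_dissimilarity(matrix[0], matrix[1]))
print(jaccard_dissimilarity(matrix[0], matrix[2]))
```

The resulting pairwise dissimilarities could be fed to a method that accepts a precomputed matrix (for example hierarchical clustering or k-medoids), and the 0/1 columns can also be cross-tabulated to look for answer pairs that tend to occur together.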
Hi! I'd like to ask whether we could use a dissimilarity measure instead of a distance measure in any cluster analysis algorithm (e.g. k-means, hierarchical, or k-medians), for cases where the distance measure does not capture the true difference between the objects. If not, why not? Thanks
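As an illustration of the question: algorithms that only ever look up pairwise values, such as k-medoids or hierarchical linkage, can run on any dissimilarity matrix, whereas k-means needs to average coordinates and so is tied to squared Euclidean distance. Below is a minimal k-medoids sketch (simple alternating assignment and medoid update, not the full PAM swap procedure) that takes a precomputed matrix. The matrix values and the naive choice of starting medoids are assumptions for the example.

```python
def k_medoids(d, k, max_iter=100):
    """Cluster n objects given an n x n dissimilarity matrix d; returns (medoids, labels)."""
    n = len(d)
    medoids = list(range(k))  # naive deterministic start: the first k objects

    def assign(current):
        # Each object joins the medoid it is least dissimilar to.
        return [min(range(k), key=lambda c: d[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid if a cluster empties
                continue
            # New medoid: the member with the smallest total dissimilarity to the others.
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical dissimilarities between 5 objects; any symmetric matrix with a
# zero diagonal works, and it does not have to satisfy the triangle inequality.
d = [
    [0, 1, 2, 9, 8],
    [1, 0, 1, 9, 9],
    [2, 1, 0, 8, 9],
    [9, 9, 8, 0, 1],
    [8, 9, 9, 1, 0],
]

medoids, labels = k_medoids(d, k=2)
print("medoids:", medoids)
print("labels: ", labels)
```

No coordinates or means are ever computed here, only lookups in d, which is why a custom dissimilarity can be swapped in freely.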
Hello! I'm a health researcher, and I was wondering if someone here could perhaps help explain the difference between these two methods and when it might be more appropriate to use one vs the other for cluster/hotspot analysis (I'm studying diabetes and prediabetes prevalence by census tract)? I imagine these spatial/geographic methods are used quite extensively in environmental and geosciences so I thought I'd try asking here!
I understand that local Moran's I allows you to identify both statistically significant clusters (areas of high value surrounded by high-value neighbors, aka hotspots, and areas of low value surrounded by low-value neighbors, aka coldspots) and statistically significant outliers (areas of high value surrounded by low-value neighbors and areas of low value surrounded by high-value neighbors).
By contrast Getis-Ord Gi* only seems to find statistically significant clusters/hotspots/coldspots and not outliers like local Moran's I does.
What are some other differences between these methods that are relevant to deciding which to use to study the prevalence of a disease by census tract? Is it just whether you'd like to find outliers or not? If local Moran's I has no disadvantages, since it can find both outliers and clusters, why does anyone use Getis-Ord Gi* if it can only find clusters and not outliers? What are its advantages? Is the way neighbors are found different for one vs. the other?