24 Hilarious Cluster analysis Puns

Why is cluster analysis so rare in econometrics?

I come from a more data science background, where cluster analysis is used very liberally.

But when I read research papers that rely more on econometrics (e.g. business-related ones), I rarely see cluster analysis being used. I've even heard people saying econometricians tend not to trust results from clustering.

👍︎ 9

👤︎ u/micky04

📅︎ Nov 30 2021

Okinawa fears link between 1st Omicron case and base cluster | The Asahi Shimbun: Breaking News, Japan News and Analysis asahi.com/ajw/articles/14…

👍︎ 22

👤︎ u/Setagaya-Observer

📅︎ Dec 18 2021

Fraser Health Cluster Analysis, Sept 7-Oct 14: 104 school clusters. "Students, rather than staff, are majority of cases and primarily drive in-school transmission"

https://www.newwestrecord.ca/coronavirus-covid-19-local-news/fraser-health-schools-saw-2000-covid-19-cases-in-six-weeks-and-most-of-those-were-students-4699352

104 school clusters (Sept 7-Oct 14), 21% involved staff.

Staff were the index cases in 11.5% of total clusters.

"Students, rather than staff, are majority of cases and primarily drive in-school transmission."

https://preview.redd.it/x3le9jhfbaw71.png?width=1773&format=png&auto=webp&s=7a6655fb2d53874cd7c66d1d9ff81fba6cee3f20

https://newwestschools.ca/wp-content/uploads/2021/10/StaffVaccinePresentation.pdf

👍︎ 5

👤︎ u/sereniti81

📅︎ Oct 29 2021

Cluster analysis goodness of fit

I'm doing cluster analysis for a retailer spread across multiple countries. This is a Gulf retailer, so most of the shoppers here are foreigners. With customer data (aggregated transaction metrics and demographics) from January 2018, the request was to create customer personas. I'm worried that this is not possible, because the data spans the COVID period and may not be a good fit for creating clusters. Is there any way to check whether my data will give good boundaries for the clusters we create? Elbow plots and silhouette scores don't make sense here, I think; correct me if I'm wrong. Is there any way to control the boundary conditions of the clusters?
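For readers unfamiliar with the silhouette score mentioned above, here is a minimal sketch of how it measures cluster separation. It is written in plain Python with Euclidean distance. The toy points and both labelings are made-up illustrations, not the retailer data, and whether the score is meaningful for that data remains an open question.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_silhouette(points, labels):
    """Mean silhouette coefficient; values near 1 suggest well-separated clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two tight groups far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]    # matches the visible groups
mixed_labels = [0, 1, 0, 1, 0, 1]   # ignores the visible groups

print(mean_silhouette(points, good_labels))
print(mean_silhouette(points, mixed_labels))
```

A labeling that follows real structure scores close to 1, while a labeling that cuts across it scores much lower. A low score for every choice of k would be one hint that the data has no clear boundaries.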

👍︎ 5

👤︎ u/nitz_d_blitz

📅︎ Oct 28 2021

Genetic Bio-Ancestry and Social Construction of Racial Classification in Social Surveys in the Contemporary United States [self-reported as white, 99.5 % were assigned into the “white” category by the cluster analysis. Of those who self-reported as black, 99.3 % were classified as “black.” ] ncbi.nlm.nih.gov/pmc/arti…

👍︎ 2

👤︎ u/razznick

📅︎ Dec 08 2021

The Earth has a pulse -- a 27.5-million-year cycle of geological activity: Analysis of 260 million years of major geological events finds recurring clusters 27.5 million years apart eurekalert.org/pub_releas…

👍︎ 599

👤︎ u/DoremusJessup

📅︎ Jun 19 2021

Local Moran’s I vs Getis-Ord Gi* for Cluster/Hotspot Analysis

I was wondering if someone could perhaps help explain the difference between these two methods and when it might be more appropriate to use one vs the other for cluster/hotspot analysis?

I understand that local Moran's I allows you to identify both statistically significant clusters (areas of high value surrounded by high-value neighbors, aka hotspots, and areas of low value surrounded by low-value neighbors, aka coldspots) and statistically significant outliers (areas of high value surrounded by low-value neighbors, and areas of low value surrounded by high-value neighbors).

By contrast, Getis-Ord Gi* only seems to find statistically significant clusters (hotspots and coldspots), not outliers as local Moran's I does.

What are some other differences between these methods that are relevant to deciding which to use to study the prevalence of a disease by census tract? Is it just whether you'd like to find outliers or not? If local Moran's I has no disadvantages, since it can find both outliers and clusters, why does anyone use Getis-Ord Gi*, which can only find clusters and not outliers? What are its advantages? Is the way neighbors are found different for one vs. the other?
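To make the comparison concrete, here is a minimal sketch of both statistics on a made-up chain of seven tracts. It assumes binary adjacency weights and omits the permutation or z-score significance tests that real analyses rely on. The values and neighbor lists are hypothetical. Local Moran's I multiplies a tract's own deviation by the summed deviations of its neighbors, so a high tract among low neighbors comes out negative. Gi* pools the tract with its neighbors and compares that local sum to the global mean, so it only reports whether the neighborhood as a whole is high or low.

```python
import math

def local_morans_i(values, neighbors):
    """Local Moran's I with binary weights (self excluded)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # variance-like scaling term
    return [dev[i] / m2 * sum(dev[j] for j in neighbors[i]) for i in range(n)]

def getis_ord_gi_star(values, neighbors):
    """Getis-Ord Gi* z-scores with binary weights (self included)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    out = []
    for i in range(n):
        window = set(neighbors[i]) | {i}   # Gi* counts the tract itself
        w_sum = len(window)                # binary weights: sum(w) == sum(w^2)
        num = sum(values[j] for j in window) - mean * w_sum
        den = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        out.append(num / den)
    return out

# Hypothetical prevalence values along a chain of tracts 0..6.
values = [10, 10, 10, 1, 1, 1, 10]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]

moran = local_morans_i(values, neighbors)
gi = getis_ord_gi_star(values, neighbors)
for i in range(len(values)):
    print(i, round(moran[i], 2), round(gi[i], 2))
```

In this toy case tract 6 (high value, low neighbor) gets a negative local Moran's I, which marks it as an outlier candidate, while its Gi* value is simply close to zero. Tracts 1 and 4 sit inside a high and a low run respectively, and both statistics agree on them.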

👍︎ 10

👤︎ u/Confident_Proposal

📅︎ Sep 06 2021

Cluster analysis goodness of fit (r/DataScience) reddit.com/r/datascience/…

👍︎ 3

👤︎ u/Peerism1

📅︎ Oct 29 2021

We analyzed 3,154 Elasticsearch clusters, and discovered that many people are making the same mistakes. Want to learn more about the most common mistakes you can avoid? Link to the full analysis in the first comment.

👍︎ 16

👤︎ u/OpsterHQ

📅︎ Aug 05 2021

Fraser Health: COVID-19 School Cluster and Transmission Analysis May 7, 2021 (Leaked) drive.google.com/file/d/1…

👍︎ 60

👤︎ u/sereniti81

📅︎ May 11 2021

Scientists identify two pathways to self-harm, using self-organising maps and cluster analysis in data from early childhood to adolescence doi.org/10.1016/j.jaac.20…

👍︎ 151

👤︎ u/DrDalmaijer

📅︎ Jun 15 2021

OLYMPICS/ COVID-19 staff cluster in Olympic hotel hosting Brazilian delegation | The Asahi Shimbun: Breaking News, Japan News and Analysis asahi.com/sp/ajw/articles…

👍︎ 27

👤︎ u/Combini_chicken

📅︎ Jul 14 2021

[OC] Cluster analysis of parliament's divisions - see which MPs vote similarly by their proximity

👍︎ 138

👤︎ u/930913

📅︎ Feb 28 2021

Cluster analysis demonstrates how cringe this community is (click the dot) anvaka.github.io/map-of-r…

👍︎ 41

👤︎ u/johannesalthusius

📅︎ Apr 12 2021

Implications of COVID-19 vaccination and public health countermeasures on SARS-CoV-2 variants of concern in Canada: evidence from a spatial hierarchical cluster analysis medrxiv.org/content/10.11…

👍︎ 23

👤︎ u/icloudbug

📅︎ Jul 05 2021

Cluster Analysis for Customer Segmentation

What do you think about using cluster analysis to segment customers? I feel like manual segmentation is often better, especially when it comes to more personalized marketing.

I think that clustering makes sense when there are distinct groups in the population. However, I think that these groups can easily be identified by EDA (finding thresholds of certain variables) in most cases. There is a possibility for identifying relevant groups only through cluster analysis, but I think that a) those cases are rare and b) the identified clusters are more complex and not suitable for a segmentation with the objective of a more personalized communication.

Does anybody have a success story where unsupervised clustering led to a customer segmentation that offered a business value (e.g. because of more personalized communication)? I am struggling to imagine a scenario where unsupervised clustering comes up with better clusters for personalized communications compared to manually building clusters by thresholds/criteria for clusters.

👍︎ 3

👤︎ u/tstr2609

📅︎ Jun 14 2021

Italian virologist research on virus genome by Massimo Galli, head of Sacco Hospital, Milan: Lodi cluster came from China via Germany: other clusters directly from China. The virus jumped from animal to human around 23rd October 2019 according to the virus genome analysis and its mutations. ilgiornale.it/news/cronac…

👍︎ 5k

👤︎ u/cottoncandy240

📅︎ Mar 21 2020

Multivariate Cluster Analysis

I have conducted a multiple choice survey, and now I want to analyze it using a cluster analysis.

Since it is a multiple-choice, multivariate survey, k-means clustering or fuzzy k-means clustering seemed like the obvious choice. However, since I am completely new to clustering, I am not sure if it is the best approach after all.

Furthermore, the multiple answers are giving me a hard time, as it does not allow a clear clustering of one-to-one answers. Should I produce distinct data sets for each multiple answer? It would expand the data set way beyond 250 entries, and I am not even sure if it would provide any useful answers because it won't be able to represent when multiple answers were given.

Here is a link to my data sheet. In the data multiple answers are distinguished by ";" and *blank answers* are indicated by NULL.

What I am trying to achieve is to find clusters among my participants, or at the very least a correlation between answers (e.g. if a participant chose option 1 in question 1, they were also likely to choose option 4 in question 17).

Are there any algorithms that can deal with the data? How would you approach this data set?

Edit: added passage of what I want to achieve with the analysis and updated the link
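One possible way to keep multiple answers together, sketched here under assumptions: treat each respondent as a set of (question, option) pairs and compare respondents with the Jaccard distance, instead of splitting rows per answer. The parsing follows the ";" and NULL conventions described above. The sample rows and question names are hypothetical and not taken from the linked data sheet.

```python
def parse_answers(cell):
    """Split a raw cell such as '1;3' into a set of options; NULL or blank -> empty set."""
    if cell is None or cell.strip() in ("", "NULL"):
        return set()
    return {part.strip() for part in cell.split(";") if part.strip()}

def encode_respondent(row):
    """Turn {question: raw cell} into a set of (question, option) pairs."""
    return {(q, opt) for q, cell in row.items() for opt in parse_answers(cell)}

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical respondents.
r1 = encode_respondent({"Q1": "1;3", "Q17": "4"})
r2 = encode_respondent({"Q1": "1", "Q17": "4"})
r3 = encode_respondent({"Q1": "NULL", "Q17": "2"})

print(jaccard_distance(r1, r2))  # shares two of three distinct answers
print(jaccard_distance(r1, r3))  # shares nothing
```

The resulting pairwise distances could then be fed to a method that accepts a precomputed dissimilarity matrix, such as hierarchical clustering or k-medoids. k-means is a poorer fit here because it averages coordinates and assumes Euclidean geometry.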

👍︎ 11

👤︎ u/Diveboi

📅︎ May 21 2021

From NSW Health: Analysis of cases linked to Avalon cluster by exposure event

👍︎ 106

👤︎ u/iknowitall322

📅︎ Jan 01 2021

It's Even More Of a Cluster... Bomb Than I Thought. Further musings and even some analysis on what's really going on with Mark 20 Rockeyes in DCS. youtu.be/qdjiZ6Gluxk

👍︎ 35

👤︎ u/sidekick65

📅︎ Jan 26 2021

[Q] Can we use dissimilarity measure in cluster analysis instead of distance measure?

Hi! I'd like to ask whether we could use a dissimilarity measure instead of a distance measure in any cluster analysis algorithm (e.g. k-means, hierarchical, or k-medians), for cases where the distance measure does not capture the true difference between the objects. If not, then why not? Thanks
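As an illustration of the question, here is a minimal sketch of a clustering method that needs only a precomputed dissimilarity matrix: a simple alternating k-medoids loop. It never averages coordinates, so it does not require a true metric. k-means, by contrast, computes means and is tied to squared Euclidean distance. The matrix, the function name, and the deterministic start from the first k objects are all assumptions made for the example.

```python
def k_medoids(dissim, k, max_iter=100):
    """Alternating k-medoids on a precomputed dissimilarity matrix."""
    n = len(dissim)
    medoids = list(range(k))  # assumption: start from the first k objects

    def assign(current):
        # Each object joins the cluster of its least dissimilar medoid.
        return [min(range(k), key=lambda c: dissim[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # New medoid: the member with the smallest total dissimilarity to its cluster.
            best = min(members, key=lambda m: sum(dissim[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

# Hypothetical dissimilarities: objects 0 and 1 are alike, as are 2 and 3.
dissim = [
    [0, 1, 9, 8],
    [1, 0, 9, 9],
    [9, 9, 0, 2],
    [8, 9, 2, 0],
]
medoids, labels = k_medoids(dissim, k=2)
print(medoids, labels)
```

Hierarchical clustering with single, complete, or average linkage can likewise run on an arbitrary dissimilarity matrix.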

👍︎ 3

👤︎ u/zeeshas901

📅︎ Apr 25 2021

Local Moran’s I vs Getis-Ord Gi* for Cluster/Hotspot Analysis

Hello! I'm a health researcher, and I was wondering if someone here could perhaps help explain the difference between these two methods and when it might be more appropriate to use one vs the other for cluster/hotspot analysis (I'm studying diabetes and prediabetes prevalence by census tract)? I imagine these spatial/geographic methods are used quite extensively in environmental and geosciences so I thought I'd try asking here!

I understand that local Moran's I allows you to identify both statistically significant clusters (areas of high value surrounded by high-value neighbors, aka hotspots, and areas of low value surrounded by low-value neighbors, aka coldspots) and statistically significant outliers (areas of high value surrounded by low-value neighbors, and areas of low value surrounded by high-value neighbors).

By contrast, Getis-Ord Gi* only seems to find statistically significant clusters (hotspots and coldspots), not outliers as local Moran's I does.

What are some other differences between these methods that are relevant to deciding which to use to study the prevalence of a disease by census tract? Is it just whether you'd like to find outliers or not? If local Moran's I has no disadvantages, since it can find both outliers and clusters, why does anyone use Getis-Ord Gi*, which can only find clusters and not outliers? What are its advantages? Is the way neighbors are found different for one vs. the other?

👍︎ 2

👤︎ u/Confident_Proposal

📅︎ Sep 06 2021
