Why is cluster analysis so rare in econometrics?

I come from a data science background, where cluster analysis is used very liberally.

But when I read research papers that rely more on econometrics (e.g. business-related ones), I rarely see cluster analysis used. I've even heard people say that econometricians tend not to trust results from clustering.

πŸ‘︎ 9
πŸ’¬︎
πŸ‘€︎ u/micky04
πŸ“…︎ Nov 30 2021
🚨︎ report
Okinawa fears link between 1st Omicron case and base cluster (The Asahi Shimbun) asahi.com/ajw/articles/14…
πŸ‘︎ 22
πŸ’¬︎
πŸ‘€︎ u/Setagaya-Observer
πŸ“…︎ Dec 18 2021
🚨︎ report
Fraser Health Cluster Analysis, Sept 7-Oct 14: 104 school clusters. "Students, rather than staff, are majority of cases and primarily drive in-school transmission"

https://www.newwestrecord.ca/coronavirus-covid-19-local-news/fraser-health-schools-saw-2000-covid-19-cases-in-six-weeks-and-most-of-those-were-students-4699352

104 school clusters (Sept 7-Oct 14); 21% involved staff.

Staff were the index cases in 11.5% of total clusters.

"Students, rather than staff, are majority of cases and primarily drive in-school transmission."

https://preview.redd.it/x3le9jhfbaw71.png?width=1773&format=png&auto=webp&s=7a6655fb2d53874cd7c66d1d9ff81fba6cee3f20

https://newwestschools.ca/wp-content/uploads/2021/10/StaffVaccinePresentation.pdf

πŸ‘︎ 5
πŸ’¬︎
πŸ‘€︎ u/sereniti81
πŸ“…︎ Oct 29 2021
🚨︎ report
Cluster analysis goodness of fit

I'm doing cluster analysis for a retailer spread across multiple countries. It is a Gulf retailer, so most of its shoppers are foreigners. Using customer data (aggregated transaction metrics and demographics) from January 2018, the request was to create customer personas. I'm worried this isn't possible, because the period includes COVID and the data may not be a good fit for clustering. Is there any way to check whether my data will give good boundaries for the clusters we create? I don't think an elbow plot or silhouette scores make sense here, but correct me if I'm wrong. Is there any way to control the boundary conditions of the clusters?
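
On the silhouette question: the silhouette coefficient scores how much closer each point is to its own cluster than to the nearest other cluster (range -1 to 1), so it does speak directly to whether boundaries are clear. Below is a minimal plain-Python sketch, assuming numeric features and Euclidean distance (mixed demographic data would need a different dissimilarity); the toy points are made up. One hedged heuristic for the "will my data have good boundaries" question is to compare the score on the real data with the score on a copy whose columns have been shuffled independently: if the real score is not clearly higher, the clusters probably have weak boundaries. scikit-learn's `sklearn.metrics.silhouette_score` computes the same quantity on arrays.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a labelled point set, using Euclidean distance.

    For each point: a = mean distance to the other members of its own cluster,
    b = lowest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(pts, [0, 0, 1, 1]))  # about 0.90: the split follows the two groups
print(silhouette_score(pts, [0, 1, 0, 1]))  # about -0.45: the split cuts across them
```

A high score only says the chosen partition is well separated; it cannot tell you whether the segments are meaningful personas, which still needs a business check.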

πŸ‘︎ 5
πŸ’¬︎
πŸ‘€︎ u/nitz_d_blitz
πŸ“…︎ Oct 28 2021
🚨︎ report
Genetic Bio-Ancestry and Social Construction of Racial Classification in Social Surveys in the Contemporary United States [Of those who self-reported as white, 99.5% were assigned to the “white” category by the cluster analysis. Of those who self-reported as black, 99.3% were classified as “black.”] ncbi.nlm.nih.gov/pmc/arti…
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/razznick
πŸ“…︎ Dec 08 2021
🚨︎ report
The Earth has a pulse -- a 27.5-million-year cycle of geological activity: Analysis of 260 million years of major geological events finds recurring clusters 27.5 million years apart eurekalert.org/pub_releas…
πŸ‘︎ 599
πŸ’¬︎
πŸ‘€︎ u/DoremusJessup
πŸ“…︎ Jun 19 2021
🚨︎ report
Local Moran’s I vs Getis-Ord Gi* for Cluster/Hotspot Analysis

I was wondering if someone could perhaps help explain the difference between these two methods and when it might be more appropriate to use one vs the other for cluster/hotspot analysis?

I understand that local Moran's I can identify both statistically significant clusters (areas of high value surrounded by high-value neighbors, i.e. hotspots, and areas of low value surrounded by low-value neighbors, i.e. coldspots) and statistically significant outliers (areas of high value surrounded by low-value neighbors, and areas of low value surrounded by high-value neighbors).

By contrast, Getis-Ord Gi* seems to find only statistically significant clusters (hotspots and coldspots), not the outliers that local Moran's I detects.

What other differences between these methods matter when deciding which one to use to study the prevalence of a disease by census tract? Is it just a question of whether you want to find outliers? If local Moran's I finds both clusters and outliers and has no disadvantages, why would anyone use Getis-Ord Gi*, which finds only clusters? What are its advantages? Are neighbors defined differently in the two methods?
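
A small plain-Python sketch of both statistics with binary contiguity weights may make the difference concrete. The formulas follow common textbook forms (Anselin's local Moran's I, and the z-score form of Gi* commonly used in GIS software), but the toy data and function names are hypothetical, and no significance test is included: tools such as PySAL's `esda` package add permutation-based pseudo p-values. The point the code shows is how the neighbourhood is used: local Moran's I multiplies area i's deviation by its neighbours' deviations, so its sign says "similar" or "dissimilar", whereas Gi* includes area i in its own neighbourhood and only asks whether the local sum is unusually high or low.

```python
import math


def local_morans_i(x, neighbors):
    """Local Moran's I with binary weights: I_i = (z_i / m2) * sum of z_j over i's neighbours.

    z = deviations from the mean, m2 = sum(z^2) / n. Area i is NOT its own neighbour.
    Positive I_i: i resembles its neighbours (high-high or low-low).
    Negative I_i: i differs from them (high-low or low-high outlier).
    """
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(d * d for d in z) / n
    return [z[i] / m2 * sum(z[j] for j in neighbors[i]) for i in range(n)]


def getis_ord_gi_star(x, neighbors):
    """Getis-Ord Gi* as a z-score with binary weights (one common formulation).

    Area i IS included in its own neighbourhood (that is the '*'), so Gi* asks whether
    the local sum around i is unusually high or low, not whether i differs from its neighbours.
    """
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean * mean)
    scores = []
    for i in range(n):
        hood = set(neighbors[i]) | {i}
        w_sum = len(hood)  # sum of binary weights; also the sum of their squares
        num = sum(x[j] for j in hood) - mean * w_sum
        den = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(num / den)
    return scores


# Hypothetical toy data: five areas in a row, each bordering the next.
x = [1, 1, 10, 1, 1]
nbrs = [[1], [0, 2], [1, 3], [2, 4], [3]]
I = local_morans_i(x, nbrs)
G = getis_ord_gi_star(x, nbrs)
print([round(v, 2) for v in I])  # [0.25, -0.75, -2.0, -0.75, 0.25]
print([round(v, 2) for v in G])  # [-0.82, 0.82, 0.82, 0.82, -0.82]
```

In this toy output, areas 1 and 3 come out as low-high outliers under local Moran's I but lean towards a hot spot under Gi*, because Gi* pools each area with its neighbours while local Moran's I contrasts them. The values here are small; the example is about signs, not significance.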

πŸ‘︎ 10
πŸ’¬︎
πŸ“…︎ Sep 06 2021
🚨︎ report
Cluster analysis goodness of fit (r/DataScience) reddit.com/r/datascience/…
πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/Peerism1
πŸ“…︎ Oct 29 2021
🚨︎ report
We analyzed 3,154 Elasticsearch clusters and discovered that many people make the same mistakes. Want to learn about the most common ones and how to avoid them? Link to the full analysis in the first comment.
πŸ‘︎ 16
πŸ’¬︎
πŸ‘€︎ u/OpsterHQ
πŸ“…︎ Aug 05 2021
🚨︎ report
Fraser Health: COVID-19 School Cluster and Transmission Analysis May 7, 2021 (Leaked) drive.google.com/file/d/1…
πŸ‘︎ 60
πŸ’¬︎
πŸ‘€︎ u/sereniti81
πŸ“…︎ May 11 2021
🚨︎ report
Scientists identify two pathways to self-harm, using self-organising maps and cluster analysis in data from early childhood to adolescence doi.org/10.1016/j.jaac.20…
πŸ‘︎ 151
πŸ’¬︎
πŸ‘€︎ u/DrDalmaijer
πŸ“…︎ Jun 15 2021
🚨︎ report
OLYMPICS/ COVID-19 staff cluster in Olympic hotel hosting Brazilian delegation (The Asahi Shimbun) asahi.com/sp/ajw/articles…
πŸ‘︎ 27
πŸ’¬︎
πŸ‘€︎ u/Combini_chicken
πŸ“…︎ Jul 14 2021
🚨︎ report
[OC] Cluster analysis of parliament's divisions - see which MPs vote similarly by their proximity
πŸ‘︎ 138
πŸ’¬︎
πŸ‘€︎ u/930913
πŸ“…︎ Feb 28 2021
🚨︎ report
Cluster analysis demonstrates how cringe this community is (click the dot) anvaka.github.io/map-of-r…
πŸ‘︎ 41
πŸ’¬︎
πŸ‘€︎ u/johannesalthusius
πŸ“…︎ Apr 12 2021
🚨︎ report
Implications of COVID-19 vaccination and public health countermeasures on SARS-CoV-2 variants of concern in Canada: evidence from a spatial hierarchical cluster analysis medrxiv.org/content/10.11…
πŸ‘︎ 23
πŸ’¬︎
πŸ‘€︎ u/icloudbug
πŸ“…︎ Jul 05 2021
🚨︎ report
Cluster Analysis for Customer Segmentation

What do you think about using cluster analysis to segment customers? I feel that manual segmentation is often better, especially for more personalized marketing.

I think clustering makes sense when there are distinct groups in the population. However, in most cases I think these groups can easily be identified through EDA (by finding thresholds on certain variables). Some relevant groups may be identifiable only through cluster analysis, but I think that a) those cases are rare and b) such clusters are more complex and less suitable for segmentation aimed at more personalized communication.

Does anybody have a success story where unsupervised clustering led to a customer segmentation that delivered business value (e.g. through more personalized communication)? I struggle to imagine a scenario where unsupervised clustering produces better segments for personalized communication than segments built manually from thresholds or other criteria.
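
A toy sketch of the trade-off, with hypothetical data: customers described by (visits per month, monthly spend). A single spend cutoff mixes the two behavioural groups, while plain k-means, looking at both variables at once, separates them. In this toy case a cutoff on visits would work just as well, which is the poster's point; clustering mainly earns its keep when the separating direction involves several variables at once. The farthest-first seeding is a simplification chosen for reproducibility, and real features should be scaled (e.g. standardised) first, because k-means is distance-based.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means on tuples, with deterministic farthest-first seeding.

    The seeding is a simplification chosen so the toy example is reproducible;
    libraries usually use randomised k-means++ seeding with several restarts.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # next seed: the point farthest from its nearest existing seed
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: each centroid moves to the mean of its members
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def threshold_segment(customer, spend_cutoff):
    """Hypothetical manual rule: split on a single variable (spend) at a chosen cutoff."""
    return "high spend" if customer[1] >= spend_cutoff else "low spend"


# Hypothetical customers: (visits per month, monthly spend).
customers = [(2, 50), (3, 60), (2, 55), (20, 60), (21, 55), (22, 50)]
labels, centres = kmeans(customers, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]: infrequent vs frequent visitors
print([threshold_segment(c, 55) for c in customers])  # mixes the two groups
```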

πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/tstr2609
πŸ“…︎ Jun 14 2021
🚨︎ report
Italian virologist Massimo Galli, head of Sacco Hospital, Milan, on research into the virus genome: the Lodi cluster came from China via Germany; other clusters came directly from China. According to the genome analysis and its mutations, the virus jumped from animal to human around 23 October 2019. ilgiornale.it/news/cronac…
πŸ‘︎ 5k
πŸ’¬︎
πŸ‘€︎ u/cottoncandy240
πŸ“…︎ Mar 21 2020
🚨︎ report
Multivariate Cluster Analysis

I have conducted a multiple choice survey, and now I want to analyze it using a cluster analysis.

Since it is a multiple-choice, multiple-variant survey, k-means or fuzzy k-means clustering seemed like the obvious choice. However, since I am completely new to clustering, I am not sure it is the best approach after all.

Furthermore, the questions with multiple answers are giving me a hard time, because they don't fit a one-answer-per-question layout that can be clustered cleanly. Should I produce a separate data set for each multiple answer? That would expand the data set far beyond 250 entries, and I am not even sure it would provide useful answers, because it couldn't represent when multiple answers were given together.

Here is a link to my data sheet. In the data, multiple answers are separated by ";" and *blank answers* are indicated by NULL.

What I am trying to achieve is to find clusters among my participants or, at the very least, correlations between answers (e.g. participants who chose answer 1 on question 1 were also likely to choose option 4 on question 17).

Are there any algorithms that can handle this data? How would you approach this data set?

Edit: added a passage on what I want to achieve with the analysis and updated the link.
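
A minimal sketch of one common approach, assuming the sheet looks as described (one row per participant, one cell per question, options joined by ";" and NULL for blanks): keep each participant as a single row and turn every chosen (question, option) pair into a token, so multiple answers need no extra rows. Jaccard distance between two participants' token sets counts shared choices but ignores options both left unchosen, and the resulting matrix can feed hierarchical clustering or k-medoids rather than k-means, which expects numeric coordinates. `conditional_rate` is the "confidence" measure from association-rule mining and addresses the question-1/question-17 example. All function names and the toy rows are hypothetical.

```python
from itertools import combinations


def parse_response(row):
    """Turn one participant's cells into a set of (question_index, option) tokens.

    Assumes options are separated by ';' and that 'NULL' (or an empty cell) means blank.
    """
    tokens = set()
    for q, cell in enumerate(row):
        if cell is None or not cell.strip() or cell.strip().upper() == "NULL":
            continue
        for opt in cell.split(";"):
            opt = opt.strip()
            if opt:
                tokens.add((q, opt))
    return tokens


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; two empty answer sets are treated as identical."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def distance_matrix(rows):
    """Symmetric pairwise Jaccard distances between participants."""
    sets = [parse_response(r) for r in rows]
    n = len(sets)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jaccard_distance(sets[i], sets[j])
    return d


def conditional_rate(rows, given, then):
    """Share of participants choosing `then` among those choosing `given` ((q, option) tokens)."""
    sets = [parse_response(r) for r in rows]
    with_given = [s for s in sets if given in s]
    if not with_given:
        return None
    return sum(then in s for s in with_given) / len(with_given)


# Hypothetical rows: two questions per participant.
rows = [["A;B", "C"], ["A", "C"], ["NULL", "D"]]
d = distance_matrix(rows)
print(d[0][1])  # about 0.33: two of three distinct choices shared
print(conditional_rate(rows, (0, "A"), (1, "C")))  # 1.0
```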

πŸ‘︎ 11
πŸ’¬︎
πŸ‘€︎ u/Diveboi
πŸ“…︎ May 21 2021
🚨︎ report
From NSW Health: Analysis of cases linked to Avalon cluster by exposure event
πŸ‘︎ 106
πŸ’¬︎
πŸ‘€︎ u/iknowitall322
πŸ“…︎ Jan 01 2021
🚨︎ report
It's Even More Of a Cluster... Bomb Than I Thought. Further musings and even some analysis on what's really going on with Mark 20 Rockeyes in DCS. youtu.be/qdjiZ6Gluxk
πŸ‘︎ 35
πŸ’¬︎
πŸ‘€︎ u/sidekick65
πŸ“…︎ Jan 26 2021
🚨︎ report
[Q] Can we use a dissimilarity measure instead of a distance measure in cluster analysis?

Hi! I'd like to ask whether we could use a dissimilarity measure instead of a distance measure in any cluster analysis algorithm (e.g. k-means, hierarchical, or k-medians), for example when the distance measure does not capture the true difference between the objects. If not, why not? Thanks
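
In general, yes for some algorithms and no for others. Hierarchical (agglomerative) clustering and k-medoids only ever look up pairwise values, so they can run on any symmetric dissimilarity matrix. k-means is different: its update step averages coordinates and it minimises squared Euclidean distance to those means, so an arbitrary dissimilarity gives it no mean to move to (k-medians is the analogous method for Manhattan distance with coordinate-wise medians). The plain-Python sketch below, using a made-up 4×4 matrix, runs average-linkage clustering and finds a medoid from nothing but the matrix. In practice, SciPy's `scipy.cluster.hierarchy.linkage` accepts a condensed dissimilarity matrix (see `scipy.spatial.distance.squareform`), though its centroid, median and Ward methods assume Euclidean distances.

```python
def average_linkage(d, k):
    """Agglomerative clustering (average linkage) on a precomputed dissimilarity matrix.

    Only the pairwise values d[i][j] are used, so any symmetric dissimilarity works,
    metric or not. Repeatedly merges the closest pair of clusters until k remain.
    """
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [d[i][j] for i in clusters[a] for j in clusters[b]]
                score = sum(pairs) / len(pairs)  # average pairwise dissimilarity
                if best is None or score < best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # b > a, so deleting b leaves index a valid
        del clusters[b]
    return clusters


def medoid(d, members):
    """The member with the smallest total dissimilarity to the others (k-medoids' 'centre')."""
    return min(members, key=lambda i: sum(d[i][j] for j in members))


# Made-up dissimilarities: items 0 and 1 are alike, as are items 2 and 3.
d = [
    [0, 1, 9, 9],
    [1, 0, 9, 9],
    [9, 9, 0, 1],
    [9, 9, 1, 0],
]
print(average_linkage(d, 2))  # [[0, 1], [2, 3]]
print(medoid(d, [0, 1, 2]))  # 0
```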

πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/zeeshas901
πŸ“…︎ Apr 25 2021
🚨︎ report
Local Moran’s I vs Getis-Ord Gi* for Cluster/Hotspot Analysis

Hello! I'm a health researcher, and I was wondering if someone here could help explain the difference between these two methods and when it might be more appropriate to use one or the other for cluster/hotspot analysis. I'm studying diabetes and prediabetes prevalence by census tract. I imagine these spatial/geographic methods are used quite extensively in the environmental and geosciences, so I thought I'd try asking here!

πŸ‘︎ 2
πŸ’¬︎
πŸ“…︎ Sep 06 2021
🚨︎ report