A list of puns related to "Descriptive Statistics"
So, I have a sample with a table of lots of different characteristics, e.g. age, sex, depression, anxiety, exercise
I know how to find, e.g., the mean age of the whole sample, but what if I wanted to find the mean age of those in the sample who have depression, or those who have anxiety? How do I code this to create that subgroup?
Going beyond that, how would I find the mean (or other descriptive stats) for depression only, anxiety only, depression and anxiety, and so on?
Thank you so so much. You are all amazing
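This kind of subgroup filtering is one line per group in pandas. A minimal sketch, with made-up column names and values (adapt to your own table):

```python
import pandas as pd

# Hypothetical sample data; in practice this would be your own table.
df = pd.DataFrame({
    "age":        [25, 40, 31, 58, 47, 33],
    "depression": [1, 0, 1, 1, 0, 0],   # 1 = has depression
    "anxiety":    [0, 0, 1, 1, 1, 0],   # 1 = has anxiety
})

# Mean age of the whole sample
print(df["age"].mean())

# Mean age of the depression subgroup
print(df.loc[df["depression"] == 1, "age"].mean())

# Depression AND anxiety
print(df.loc[(df["depression"] == 1) & (df["anxiety"] == 1), "age"].mean())

# Depression ONLY (depression without anxiety)
print(df.loc[(df["depression"] == 1) & (df["anxiety"] == 0), "age"].mean())
```

The same boolean masks work for `.median()`, `.std()`, or `.describe()` in place of `.mean()`.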
Hey everyone,
Just trying to figure out what the precise differences between summary statistics and descriptive statistics are. I'd appreciate any insights.
Thanks,
Hi all
I came across this article in The Economist saying that 'Squid game' was 100x more popular than the average show.
https://www.economist.com/graphic-detail/2021/10/15/squid-game-is-only-the-latest-netflix-hit-to-break-the-language-barrier
Could someone enlighten me on how to calculate such descriptive stat? Thanks!
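One common way to get a figure like that is to divide a show's viewing total by the average across other shows. A sketch of the arithmetic with entirely made-up numbers (the article's actual data and methodology may differ):

```python
# Hypothetical viewing figures in hours viewed (made-up numbers,
# just to show the arithmetic behind "Nx more popular than average").
shows = {
    "Squid Game": 1_650_000_000,
    "Show B": 20_000_000,
    "Show C": 12_000_000,
    "Show D": 16_000_000,
}

# Average over the other shows, then take the ratio.
others = [v for k, v in shows.items() if k != "Squid Game"]
avg_other = sum(others) / len(others)
ratio = shows["Squid Game"] / avg_other

print(f"Squid Game was {ratio:.0f}x as popular as the average show")
```

Whether the benchmark average includes or excludes the hit show itself changes the number a lot, which is worth checking in any such claim.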
Good morning!
I'm back again with what's probably some basic math that I may be overthinking but just can't seem to make it work!
We're working with a table of data regarding 200 vehicles with their list and final sale price plus the days to sell. I've been able to do the majority of the assignment; however, I can't seem to figure out how to calculate the expected number of days to sell. Can someone point me to the equations I can use, or to anything else in the table that would help?
For the data I'm working with, I found an average list price of 32.162K versus an average sale price of 29.743K. The average days to sell is 32.905 (versus a median of 31 and a range of 69, if it matters). The question I'm working with: if I have a vehicle with a list price of 30.0K, what can I expect the sale price to be, and what is the expected number of days to sell?
Thank you for your help! I can provide more information if needed but am hoping there's something basic I'm missing and it can be determined from what's above!
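Averages alone can't give a prediction that depends on the list price; the usual tool for "given X, what do I expect Y to be?" is a simple least-squares regression line fitted to the table. A rough sketch with made-up numbers standing in for the 200-vehicle table:

```python
import numpy as np

# Made-up stand-in for the vehicle table
# (list price in $K, sale price in $K, days to sell).
list_price = np.array([28.0, 30.5, 32.0, 34.5, 36.0, 31.0])
sale_price = np.array([26.1, 28.3, 29.5, 31.9, 33.2, 28.8])
days       = np.array([40,   35,   33,   28,   25,   34])

# Fit least-squares lines: sale ~ list and days ~ list.
b_sale, a_sale = np.polyfit(list_price, sale_price, 1)
b_days, a_days = np.polyfit(list_price, days, 1)

# Expected sale price and days-to-sell for a 30.0K listing.
pred_sale = a_sale + b_sale * 30.0
pred_days = a_days + b_days * 30.0
print(round(pred_sale, 2), round(pred_days, 1))
```

With the real table, the same two fits would give the expected sale price and expected days to sell at any list price, including 30.0K.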
Hello Babies and Ladies,
[DISCLAIMER: if you are a baby, please learn to read first. This is not a quality statistical analysis, it's the result of me having fun with the RedditAPI and some Python. Also, everything refers to the Top 100 Posts of all time!]
One thing unites all the people in this community: Moon Farming. So I went out on an epic quest to find the recipe for the maximum amount of upvotes on your daily shitposts. In this post, I will present to you the results of an analysis of the top 100 posts of all time on r/CryptoCurrency.
I used the RedditAPI and Python to scrape the top 100 posts of all time (100 was the maximum number I could scrape using the API). Next, I deleted polls and posts with a "MOD" flair, which brings our dataset down to 91 remaining posts. In the next step, I did some preprocessing by setting all flairs to uppercase so that "COMEDY" and "Comedy" are treated as the same. I checked whether the posts or titles contained external links, and finally counted the words in the titles and posts.
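The preprocessing steps above can be sketched in plain Python. The post dicts below are hypothetical stand-ins for what the API returns; the real scrape would come from the Reddit API:

```python
import re

# Hypothetical scraped posts (made-up; the real ones come from the Reddit API).
posts = [
    {"flair": "Comedy", "title": "To the MOON", "text": "See https://example.com for proof"},
    {"flair": "COMEDY", "title": "Diamond hands", "text": ""},
]

link_pattern = re.compile(r"https?://\S+")

for p in posts:
    # Uppercase flairs so "Comedy" and "COMEDY" collapse into one category.
    p["flair"] = (p["flair"] or "").upper()
    # Flag posts whose title or body contains an external link.
    p["has_link"] = bool(link_pattern.search(p["title"] + " " + p["text"]))
    # Word counts for title and body (links stripped before counting).
    p["title_words"] = len(p["title"].split())
    p["text_words"] = len(link_pattern.sub("", p["text"]).split())
```

A post with an empty body then gets `text_words == 0`, which matches the zero-word posts mentioned below.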
Let's dive right into some descriptive statistics! First up: Upvotes
https://preview.redd.it/owx3n9t55ye71.png?width=3135&format=png&auto=webp&s=8a060b57a43638036966edd04420de4d7746f521
In this boxplot you can see that the top post of all time had around 53,000 upvotes. Apart from the other 5 outliers, the number of upvotes ranges from about 12,000 to 27,000. The median number of upvotes is roughly 17,000. You can also see that the data are slightly right-skewed.
We see a similar distribution for Text Lengths:
https://preview.redd.it/tcvr7amm6ye71.png?width=3135&format=png&auto=webp&s=9b89bedcbed7843260497f9223912ded835403fd
One madlad really put in some effort and cranked out 3,600 words! Many of you will notice that there are posts with 0 words; that means the post had only a title and maybe a link.
https://preview.redd.it/c82612hp7ye71.png?width=3164&format=png&auto=webp&s=fd5d9f6a4ae915c5d42b3dc4242f8d36bc4dd69e
Titles are distributed pretty evenly with only a single outlier at 50 words. Nothing special here.
Next, let's check out the Flairs:
https://preview.redd.it/nhydyu4v7ye71.png?width=3409&format=png&auto=webp&s=7d2e4b4351f2f0ba99131a0d935dc3cc0023d42c
Trading was used 12 times, followed by Finance, Strategy, Comedy and Focussed Discussion.
I also counted the most common Nouns/Proper Nouns in both the titles and the texts:
There's a lot of buzz around prediction and data science, which has the potential to be learned or even be a focus area in an MS in Stats. But what about the descriptive side of things?
I have a dataframe that looks like this:
'data.frame': 200005 obs. of 23 variables:
$ ID : chr "A16000" "A17000" "A17000" "A17000" "A18000"...
$ Date : Date, format: "2018-04-10" "2017-03-21" "2017-04-22" "2017-05-09" ...
$ Educ : num 0 1 0 0 1 NA NA 1 NA NA ...
$ Returned : chr "0" "0" "0" "0" ...
$ Burrowed : chr "7" "45" "10" "10" ...
$ Freq : chr "1" "10" "10" "2" ...
$ Grp : chr NA "A" "A" "A" "A" "A" "B" "A" ...
and I want to answer some descriptive-statistics questions with it, but I don't know what the best code would be. For example, I want to know: 1) does group A have more Returned than group B? 2) do customers with higher Freq have higher Burrowed values? 3) are customers with a Freq of 10 or higher more likely to have Educ?
I have tried using different tables, but they bring back weird values I can't make heads or tails of. Here's an example of what I've tried:
# Comparison of returns by group
# Note: Returned, Burrowed and Freq are stored as character ("chr"),
# so convert them first, e.g. df$Returned <- as.numeric(df$Returned)
xtabs(~ Returned + Grp, data = df)
rowPerc(xtabs(~ Returned + Grp, data = df))  # numerical summary (row percentages)
bargraph(~ Grp, groups = Returned, data = df, type = "percent")  # graphical summary
favstats(Burrowed ~ Grp, data = df)  # numerical summary of a numeric variable by group
bwplot(Burrowed ~ Grp, data = df)  # graphical summary (boxplots by group)
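Outside of R, the same three questions can be answered with a few groupby calls in pandas. A sketch with made-up data mimicking the structure above, including the numeric columns arriving as strings (which is one common source of "weird" table output):

```python
import pandas as pd

# Made-up stand-in for the real data; Returned/Burrowed/Freq arrive as
# strings, matching the "chr" columns shown by str(df).
df = pd.DataFrame({
    "Grp":      ["A", "A", "B", "B", "A", "B"],
    "Returned": ["0", "1", "0", "0", "1", "1"],
    "Burrowed": ["7", "45", "10", "10", "30", "5"],
    "Freq":     ["1", "10", "10", "2", "12", "1"],
    "Educ":     [0, 1, 0, 0, 1, 1],
})

# Convert character columns to numbers first.
for col in ["Returned", "Burrowed", "Freq"]:
    df[col] = pd.to_numeric(df[col])

# 1) Does group A return more than group B?  (share of Returned == 1 per group)
print(df.groupby("Grp")["Returned"].mean())

# 2) Do customers with higher Freq have higher Burrowed?  (correlation)
print(df["Freq"].corr(df["Burrowed"]))

# 3) Are customers with Freq >= 10 more likely to have Educ?
print(df.groupby(df["Freq"] >= 10)["Educ"].mean())
```

Each call reduces the table to one or two comparable numbers per group, which is exactly what the cross-tabs above are meant to do.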
Hello hello. I'm able to calculate the difference between rows using the following calculation: ZN(SUM([Measurename])) - LOOKUP(ZN(SUM([Measurename])), -1) This works for when I render a table. What I'd like to do next is explore these differences with some descriptive statistics and charts but having a little trouble. For example, I'd like to calculate the average and variance for all these differences, and plot a histogram of these differences. Any ideas on how this can be accomplished? Many thanks
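If exporting the data is an option, the row-over-row difference that `ZN(SUM([Measurename])) - LOOKUP(ZN(SUM([Measurename])), -1)` computes in Tableau has a direct pandas analogue, after which the descriptive stats are one call each. A sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical measure values, one row per period.
measure = pd.Series([100, 103, 101, 108, 110, 107])

# Row-over-row differences; the pandas analogue of the Tableau
# LOOKUP(..., -1) calculation. The first row has no predecessor.
diffs = measure.diff().dropna()

print(diffs.tolist())  # [3.0, -2.0, 7.0, 2.0, -3.0]
print(diffs.mean())    # average difference
print(diffs.var())     # sample variance

# For a histogram of the differences (requires matplotlib):
# diffs.plot(kind="hist")
```

Within Tableau itself, the table-calculation route would be to wrap the difference field in window aggregates, but the export-and-compute approach above is often the quicker way to explore.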
From baseball to economics, the most basic task when working with data is to summarize a great deal of information. There are some 330 million residents in the United States. A spreadsheet with the name and income history of every American would contain all the information we could ever want about the economic health of the country--yet it would also be so unwieldy as to tell us nothing at all. The irony is that more data can often present less clarity. So we simplify. We perform calculations that reduce a complex array of data into a handful of numbers that describe data, just as we might encapsulate a complex, multifaceted Olympic gymnastics performance with one number: 9.8.
The good news is that these descriptive statistics give us a manageable and meaningful summary of the underlying phenomenon. The bad news is that any simplification invites abuse. Descriptive statistics can be like online dating profiles: technically accurate and yet pretty darn misleading (p. 17, 2013).
I feel that it's obvious that the government has detailed covid-19 data such as the exact number of Maltese vs tourists testing positive for Covid every day, where they are likely to have contracted it from etc, yet there is very little transparency on this and the public should have access to this information. I hear people talking about the covid situation in Malta as if they have it all figured out, and they know what's causing the numbers to remain up way too often. So are people being given detailed statistics somewhere and I don't know about it, or are they talking out of their asses? You can't really form a worthwhile opinion unless you look at the data. Where is the data?
Is it OK to use descriptive statistics on just a sample and not on the whole population? Or is taking a sample just an inferential statistics thing?
Hey, I've created a tutorial on how to calculate descriptive statistics using the summary() function in the R programming language: https://statisticsglobe.com/summary-function-in-r/
Hi everyone,
I built a ruby gem (C++ native extension) to compute descriptive statistics (min, max, mean, median, quartiles and standard deviation) on multivariate datasets (2D arrays) in ruby. It is ~11x faster at computing these summary stats than an optimal algorithm in hand-written ruby and ~4.7x faster than the next fastest native extension available as a gem. The high performance is achieved by leveraging native code and SIMD intrinsics (on platforms where they are available) to parallelize computations on the CPU while still being effectively single threaded.
Altogether it was mostly a fun way to explore writing a native ruby extension, as well as hand optimising C++ code using SIMD intrinsics. Let me know what you think! I'm also not really a C++ expert, so any review/suggestions are welcome.