R: How to do baseline descriptive statistics of subgroups of a sample in a table?

So, I have a sample with a table of lots of different characteristics, e.g. age, sex, depression, anxiety, exercise

I know how to find e.g. mean age of whole sample, but what if I wanted to find mean of those in sample who have depression, or those who have anxiety? How do I code this to create that subgroup?

Going beyond that, what about how to find mean (or other descriptive stats) of depression only, anxiety only, depression and anxiety...

Thank you so so much. You are all amazing

👍︎ 3

💬︎

👤︎ u/MarinaDoro

📅︎ Jan 16 2022

🚨︎ report

[Fink] Say what you will about its utility, but FIP is a descriptive statistic. Its inputs are only events that happened on the field. There is no estimation there. twitter.com/DevanFink/sta…

👍︎ 44

💬︎

👤︎ u/Rah_Rah_RU_Rah

📅︎ Nov 18 2021

🚨︎ report

Difference bw summary and descriptive statistics?

Hey everyone,

Just trying to figure out what the precise differences between summary statistics and descriptive statistics are. I'd appreciate any insights.

Thanks,

👍︎ 3

💬︎

👤︎ u/feizys

📅︎ Dec 21 2021

🚨︎ report

Descriptive statistics: comparing vs average

Hi all

I came through this article on the economist saying that ‘Squid game’ was 100x more popular than average show.

https://www.economist.com/graphic-detail/2021/10/15/squid-game-is-only-the-latest-netflix-hit-to-break-the-language-barrier

Could someone enlighten me on how to calculate such descriptive stat? Thanks!

👍︎ 2

💬︎

👤︎ u/Impressive-Choice138

📅︎ Dec 27 2021

🚨︎ report

Calculating Expected Days to Sell From Table/Descriptive Statistics

Good morning!

I'm back again with what's probably some basic math that I may be overthinking but just can't seem to make it work!

We're working with a table of data regarding 200 vehicles with their list and final sale price plus the days to sell. I've been able to do the majority of the assignment; however, I can't seem to figure out how to calculate the expected number of days to sell. Can someone point me to either what equations I can utilize or if there is something else from the table?

For the data I'm working with, I found an average list price of 32.162K versus sale of 29.743K. The average days to sell is 32.905 (versus a median of 31 and range of 69 if it matters). The question I'm working with: if I have a vehicle with a list price of 30.0K what can I expect the sale price to be and expected number of days to sell.

Thank you for your help! I can provide more information if needed but am hoping there's something basic I'm missing and it can be determined from what's above!

👍︎ 2

💬︎

👤︎ u/NOLAGruber

📅︎ Dec 16 2021

🚨︎ report

Cracking the Code to Moon Farming - Descriptive Statistics of the 100 Top Posts

Hello Babies and Ladies,

[DISCLAIMER: if you are a baby, please learn to read first. This is not a quality statistical analysis, it's the result of me having fun with the RedditAPI and some Python. Also, everything refers to the Top 100 Posts of all time!]

one thing unites all the people in this community: Moon Farming. So I went out on an epic quest to find the recipe for the maximum amount of upvotes on your daily shit posts. In this post, I will present to you the result of an analysis of the top 100 posts of all time on r/CryptoCurrency.

The Data

I used the RedditAPI and Python to scrape the top 100 posts of all time. Next, I deleted polls and posts with a "MOD"-flair. That brings our dataset down to 91 remaining posts. 100 was the maximum amount I could scrape using the API. In the next step, I did some preprocessing by setting all Flairs to uppercase so that "COMEDY" and "Comedy" will be treated as the same. I checked whether the posts or titles contained external links and finally counted the words in the titles and posts.

Some Stats

Let's dive right into some descriptive statistics! First up: Upvotes

https://preview.redd.it/owx3n9t55ye71.png?width=3135&format=png&auto=webp&s=8a060b57a43638036966edd04420de4d7746f521

In this boxplot you can see that the top post of all time had around 53.000 upvotes. Apart from the other 5 outliers, the number of upvotes range from about 12.000 to 27.000. The median number of upvotes is roughly 17.000. You can also see that the data are slighty right-skewed.

We see a similar distribution for Text Lengths:

https://preview.redd.it/tcvr7amm6ye71.png?width=3135&format=png&auto=webp&s=9b89bedcbed7843260497f9223912ded835403fd

One madlad really put in some effort and cranked out 3600 words! Many of you will notice that there are posts with 0 words, that means there was only a title and maybe a link.

https://preview.redd.it/c82612hp7ye71.png?width=3164&format=png&auto=webp&s=fd5d9f6a4ae915c5d42b3dc4242f8d36bc4dd69e

Titles are distributed pretty evenly with only a single outlier at 50 words. Nothing special here.

Next, let's check out the Flairs:

https://preview.redd.it/nhydyu4v7ye71.png?width=3409&format=png&auto=webp&s=7d2e4b4351f2f0ba99131a0d935dc3cc0023d42c

Trading was used 12 times, followed by Finance, Strategy, Comedy and Focussed Discussion.

I also counted the most common Nouns/Proper Nouns in both the titles and the texts:

... keep reading on reddit ➡

👍︎ 116

💬︎

👤︎ u/CaptainLysander

📅︎ Aug 02 2021

🚨︎ report

[E] Is a MS in Stats useful for a career focusing on descriptive statistics (i.e. Business Intelligence/Analytics)?

There's a lot of buzz around prediction and data science, which has the potential to be learned or even be a focus area in an MS in Stats. But what about the descriptive side of things?

👍︎ 23

💬︎

👤︎ u/Tender_Figs

📅︎ Sep 17 2021

🚨︎ report

Descriptive Statistics comparing groups in R

I have a dataframe that looks like this:

'data.frame': 200005 obs. of 23 variables:

$ ID : chr "A16000" "A17000" "A17000" "A17000" "A18000"...

$ Date : Date, format: "2018-04-10" "2017-03-21" "2017-04-22" "2017-05-09" ...

$ Educ : num 0 1 0 0 1 NA NA 1 NA NA ...

$ Returned : chr "0" "0" "0" "0" ...

$ Burrowed : chr "7" "45" "10" "10" ...

$ Freq : chr "1" "10" "10" "2" ...

$ Grp : chr NA "A" "A" "A" "A" "A" "B" "A" ...

and I want to find some descriptive statistics answers using it but I don't know what would be the best codes. For example, I want to know: 1: does group A Return more than Group B? 2: Do customers with higher Freq burrow more? 3: Are customers with 10 or higher Freq more likely to have Educ?

I have tried using different tables, but they bring back wierd values I can't make heads or tails of. Example of what I've tried:

#Comparison of returns by Groups

xtabs(~df$Returned+df$Grp,data=df)

rowPerc(xtabs(~df$Returned+df$Grp,data=df)) #numerical summary of variable

bargraph(~df$Grp,groups=df$Returned,data=df,type="percent") #graphical summary of variable

favstats(~df$Grp|df$Returned,data=df) #numerical summary of variable

bwplot(df$Returned~df$Grp,data=df) :

👍︎ 2

💬︎

👤︎ u/Cryhyena

📅︎ Nov 08 2021

🚨︎ report

Help request, descriptive statistics on differences between rows

Hello hello. I'm able to calculate the difference between rows using the following calculation: ZN(SUM([Measurename])) - LOOKUP(ZN(SUM([Measurename])), -1) This works for when I render a table. What I'd like to do next is explore these differences with some descriptive statistics and charts but having a little trouble. For example, I'd like to calculate the average and variance for all these differences, and plot a histogram of these differences. Any ideas on how this can be accomplished? Many thanks

👍︎ 3

💬︎

👤︎ u/Carrageous

📅︎ Oct 26 2021

🚨︎ report

"Descriptive Statistics Can Be Like Online Dating Profiles" by Charles Wheelan

From baseball to economics, the most basic task when working with data is to summarize a great deal of information. There are some 330 million residents in the United States. A spreadsheet with the name and income history of every American would contain all the information we could ever want about the economic health of the country--yet it would also be so unwieldy as to tell us nothing at all. The irony is that more data can often present less clarity. So we simplify. We perform calculations that reduce a complex array of data into a handful of numbers that describe data, just as we might encapsulate a complex, multifaceted Olympic gymnastics performance with one number: 9.8.

The good news is that these descriptive statistics give us a manageable and meaningful summary of the underlying phenomenon. The bad news is that any simplification invites abuse. Descriptive statistics can be like online dating profiles: technically accurate and yet pretty darn misleading (pg 17, 2013).

-

A baseballtól a közgazdaságtanig az adatokkal való munka során a legalapvetőbb feladat a rengeteg információ összefoglalása. Az Egyesült Államokban mintegy 330 millió lakos él. Egy táblázat, amely minden amerikai nevét és jövedelemtörténetét tartalmazza, minden olyan információt tartalmaz, amelyet valaha is akarhatunk az ország gazdasági állapotáról-ugyanakkor olyan nehézkes is lenne, hogy semmit sem mond nekünk. Az irónia az, hogy több adat gyakran kevésbé egyértelmű. Tehát leegyszerűsítjük. Olyan számításokat hajtunk végre, amelyek egy összetett adatsort egy maroknyi számra redukálnak, amelyek leírják az adatokat, ahogyan egy komplex, sokoldalú olimpiai gimnasztikai teljesítményt egy számmal is lefoglalnánk: 9.8.

A jó hír az, hogy ezek a leíró statisztikák jól kezelhető és értelmes összefoglalót adnak a mögöttes jelenségről. A rossz hír az, hogy minden egyszerűsítés visszaélést von maga után. A leíró statisztikák olyanok lehetnek, mint az online társkereső profilok: technikailag pontosak és mégis nagyon félrevezetőek (2013. o. 17.)

👍︎ 7

💬︎

👤︎ u/asearchforyou

📅︎ Oct 11 2021

🚨︎ report

Is there any way to see more descriptive covid-19 Malta statistics?

I feel that it's obvious that the government has detailed covid-19 data such as the exact number of Maltese vs tourists testing positive for Covid every day, where they are likely to have contracted it from etc, yet there is very little transparency on this and the public should have access to this information. I hear people talking about the covid situation in Malta as if they have it all figured out, and they know what's causing the numbers to remain up way too often. So are people being given detailed statistics somewhere and I don't know about it, or are they talking out of their asses? You can't really form a worthwhile opinion unless you look at the data. Where is the data?

👍︎ 8

💬︎

👤︎ u/Loriol_13

📅︎ Aug 15 2021

🚨︎ report

[Statistics] Can we use Descriptive Statistics on a sample?

Is it ok to use descriptive statistics on just sample and not on the whole population? Or is getting the sample just a inferential statistics thing?

👍︎ 2

💬︎

👤︎ u/SlowTortol

📅︎ Sep 18 2021

🚨︎ report

Complete Guide To Descriptive Statistics in Python for Beginners analyticsindiamag.com/com…

👍︎ 4

💬︎

👤︎ u/analyticsindiam

📅︎ Aug 04 2021

🚨︎ report

Tutorial on how to calculate descriptive statistics in R

Hey, I've created a tutorial on how to calculate descriptive statistics using the summary() function in the R programming language: https://statisticsglobe.com/summary-function-in-r/

👍︎ 78

💬︎

👤︎ u/JoachimSchork

📅︎ Feb 24 2021

🚨︎ report

High performance descriptive statistics computation in ruby

Hi everyone,

I built a ruby gem (C++ native extension) to compute descriptive statistics (min, max, mean, median, quartiles and standard deviation) on multivariate datasets (2D arrays) in ruby. It is ~11x faster at computing these summary stats than an optimal algorithm in hand-written ruby and ~4.7x faster than the next fastest native extension available as a gem. The high performance is achieved by leveraging native code and SIMD intrinsics (on platforms where they are available) to parallelize computations on the CPU while still being effectively single threaded.

Altogether it was mostly a fun way to explore writing a native ruby extension, as well as hand optimising C++ code using SIMD intrinsics. Let me know what you think! I'm also not really a C++ expert, so any review/suggestions are welcome.

https://github.com/Martin-Nyaga/fast_statistics

👍︎ 58

💬︎

👤︎ u/RegularLayout

📅︎ Mar 23 2021

🚨︎ report

Complete Guide To Descriptive Statistics in Python for Beginners analyticsindiamag.com/com…

👍︎ 2

💬︎

👤︎ u/analyticsindiam

📅︎ Aug 04 2021

🚨︎ report