A list of puns related to "Many to many (data model)"
I am trying to link two tables in the Data Model. I get an error message, "The relationship cannot be created because each column contains duplicate values. Select at least one column that contains only unique values."
This should be a one-to-many relationship. The unique side of that relationship is just a concatenation of country name + ZIP code. I've trimmed/cleaned, removed duplicates, pivoted and counted, Group-By counted, loaded to a Table and tested in Excel, etc. Basically, I've tested for duplicates in several ways, and these values are absolutely unique.
Any ideas? I doubt this is an error on Excel's part, since I would imagine this feature is pretty damn robust.
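If it helps to rule things out, here is a minimal pandas sketch for double-checking the concatenated key outside of Excel (the file name and column names are assumptions); stray whitespace or case differences are the usual culprits when a key that "looks" unique still trips the duplicate-values error:

import pandas as pd

# Minimal sketch: check the concatenated Country + ZIP key on the lookup
# (one-side) table for duplicates, normalising whitespace and case first.
lookup = pd.read_excel("lookup_table.xlsx")              # hypothetical file
key = (lookup["Country"].str.strip().str.upper()
       + lookup["ZIP"].astype(str).str.strip())
dupes = lookup[key.duplicated(keep=False)]
print(len(dupes), "rows share a key with another row")   # 0 means the key really is unique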
I'm trying to write an application where players can be members of many leagues (a has_many through). I've managed to get that far. Conversely, a league has many users. Not too difficult for the most part. After that, a league has many seasons and a season has many games. This is where I'm getting confused. A game should have many players, but only players that are a subset of the league they're in. So, a league might have 6 players total, but a game might only have 4 of those players. I'm stuck trying to figure out how to make this association work. Should game have many players through a join table called "game_players" and have many "game_players" through the league_memberships join table? I think I need a separate join table, but I'm not 100% sure.
EDIT: I should add that the players of the game will "has_one" "score", which will keep track of statistics for that game.
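For what it's worth, here is the join-table structure the question is circling around, sketched as Django-style Python models purely for illustration (matching the only code elsewhere in this thread); a Rails app would mirror the same shape with has_many :through and an explicit GamePlayer join model. All names here are assumptions.

from django.db.models import CASCADE, ForeignKey, Model, OneToOneField

class League(Model):
    pass                                     # league attributes omitted

class Player(Model):
    pass                                     # player attributes omitted

class LeagueMembership(Model):               # player <-> league join table
    league = ForeignKey(League, on_delete=CASCADE)
    player = ForeignKey(Player, on_delete=CASCADE)

class Season(Model):
    league = ForeignKey(League, on_delete=CASCADE)

class Game(Model):
    season = ForeignKey(Season, on_delete=CASCADE)

class GamePlayer(Model):                     # game <-> player join table (the subset who actually play)
    game = ForeignKey(Game, on_delete=CASCADE)
    player = ForeignKey(Player, on_delete=CASCADE)

class Score(Model):                          # per-player, per-game statistics ("has_one score")
    game_player = OneToOneField(GamePlayer, on_delete=CASCADE)

Nothing in the schema itself forces a GamePlayer to belong to the game's league; that check usually lives in validation code rather than in the table structure.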
My work gives me access to Lynda/LinkedIn Learning, and after completing several courses I decided I'd start an Excel Business Analytics course. Its main focuses are Power Query (Get & Transform), Data Models, and finally Power Pivot and DAX.
After the brief intro covering what they are all capable of, I was kind of blown away. Outside of a single person, no one in my organisation that I interact with uses this stuff, and it seems incredibly powerful. My mind was instantly flooded with existing workflows and new projects that could use this.
I seem to see a lot of people mentioning Power Query on this sub, but do many of you use the Data Model? It seems great for setting up smaller projects that wouldn't warrant using a real database. What are your thoughts? Do you find this (as well as Power Query / Power Pivot / DAX) useful enough to use day to day?
This is the best tl;dr I could make, original reduced by 97%. (I'm a bot)
> Numerous clinical trials have confirmed that the method is effective, and in 2001 Sinclair published a paper in the journal Alcohol and Alcoholism reporting a 78 percent success rate in helping patients reduce their drinking to about 10 drinks a week.
> His ideas came to be illustrated by a chart showing how alcoholics progressed from occasionally drinking for relief, to sneaking drinks, to guilt, and so on until they hit bottom and then recovered.
> Researchers at the National Council on Alcoholism charged that the news would lead alcoholics to falsely believe they could drink safely.
> If Betty Ford and Elizabeth Taylor could declare that they were alcoholics and seek help, so too could ordinary people who struggled with drinking.
> These changes gradually bring about a crucial shift: instead of drinking to feel good, the person ends up drinking to avoid feeling bad. Alcohol also damages the prefrontal cortex, which is responsible for judging risks and regulating behavior, one reason some people keep drinking even as they realize that the habit is destroying their lives.
> In a follow-up study two years later, the patients had fewer days of heavy drinking, and more days of no drinking, than did a group of 20 alcohol-dependent patients who were told to abstain from drinking entirely.
Ok, I am trying to figure out how best to structure this incoming data to make analysis later possible/easy.
Imagine you will get data on students with exam grades in many subjects. You also have data on universities with lower limits of acceptance for a set of subjects (each U has a different set, but many subjects overlap).
In the analysis later, you will want to select a university and get insights into which students are passing in the different subjects, the deviation from the lower limits in the U's set of subjects, rank the students by grade in that university's set of subjects, etc. So the main thing is that you want to filter subjects by U, and analyse/use the lower limit in each subject of that U.
I think what's messing with my head is that the only thing linking the students and the universities is the subjects. So my idea is to build two tables: one with students and their grades in subjects, and one with universities and their lower limits in subjects. But doing it this way I will have a many-to-many relationship, as there are many repeating subjects in both the student table and the universities table.
This could lead to many-to-many issues (I have not tried that before), and maybe that is the way to go, but somehow I feel that I'm missing something obvious here and that there is a much simpler solution to this? Maybe Excel is a better place to solve this?
Ideas?
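For whatever it's worth, one common way out of the many-to-many trap is a shared Subjects dimension that both tables relate to, so "filter subjects by university" becomes a straight lookup/merge. A rough pandas sketch of the shape (file names, column names, and the example university are all assumptions):

import pandas as pd

# Assumed wide inputs: one row per student / per university, one column per subject.
student_grades = pd.read_excel("students.xlsx")      # Student, Math, Physics, ...
uni_limits = pd.read_excel("universities.xlsx")      # University, Math, Physics, ...

# Reshape to long form so Subject becomes a shared key between both tables.
grades_long = student_grades.melt(id_vars="Student", var_name="Subject", value_name="Grade")
limits_long = uni_limits.melt(id_vars="University", var_name="Subject", value_name="LowerLimit")

# For a chosen university: which students clear the limit in each of its subjects?
chosen = limits_long[limits_long["University"] == "Some University"]
result = grades_long.merge(chosen, on="Subject")
result["Passes"] = result["Grade"] >= result["LowerLimit"]
result["Deviation"] = result["Grade"] - result["LowerLimit"]
print(result.head())

The same long-form idea carries over to the Data Model: a Subjects table in the middle, with the student-grade and university-limit tables each relating to it one-to-many.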
I am making a database for a collectible card game in an effort to keep track of cards owned and a possible wishlist.
When I started designing the data model I was thinking of 2 tables. A CARD table that has the information for each individual card, and a COLLECTION table that would hold the information for quantity owned of a specific card. The CARD table is where I am having the most questions and could use some outside input.
Originally, since there are multiple card types (e.g. Character, Equipment, Plot-Twist, and Action), I was thinking that a CARD_TYPE table may be needed for the sake of normalization. The CARD_TYPE table would have a one-to-many relationship with the CARD table.
Digging deeper into it, I find that where I am having the most issues is with the fact that each card type listed above has its own set of specific attributes (e.g. Character has defense and attack, Equipment has a bonus, Plot-Twists have a duration, etc.). I would think that these attributes should be in the CARD table since they hold information related to a specific card. However, this would mean each record in the CARD table would have every attribute a card could have, and those that didn't pertain (e.g. duration for a Character card) would be null. This doesn't seem right to me from a structural/normalization standpoint, but I am not sure; maybe I am overthinking it.
The other option I had thought of was to create a table for each specific card type, but that seems like it would make querying for an entire collection rather messy and error prone.
In any case if someone has some other thoughts or ideas they are for sure welcome. I am no expert in this arena so any input is appreciated.
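Not an authoritative answer, but the trade-off described above is the classic supertype/subtype problem, and one way to avoid NULL-padded columns is a subtype table per card type. A rough sketch in Django-style Python models, purely to illustrate the table layout (all names are assumptions):

from django.db.models import (
    CASCADE, CharField, ForeignKey, IntegerField, Model, OneToOneField,
    PositiveIntegerField,
)

class CardType(Model):
    name = CharField(max_length=30)          # Character, Equipment, Plot-Twist, Action

class Card(Model):                           # shared attributes only
    name = CharField(max_length=100)
    card_type = ForeignKey(CardType, on_delete=CASCADE)

class CharacterCard(Model):                  # type-specific attributes, no NULL padding
    card = OneToOneField(Card, on_delete=CASCADE)
    attack = IntegerField()
    defense = IntegerField()

class EquipmentCard(Model):
    card = OneToOneField(Card, on_delete=CASCADE)
    bonus = IntegerField()

class Collection(Model):                     # quantity owned per card
    card = ForeignKey(Card, on_delete=CASCADE)
    quantity = PositiveIntegerField(default=0)

Querying an entire collection then touches only CARD and COLLECTION; the per-type tables are joined in only when their specific attributes are needed.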
I said, "I'm not sure. It's so hard to keep track."
Everything is pretty much in the title, so this body is just a placeholder to not get whacked by the AutoModerator.
I am completely confused about how to add data to a db with a many-to-many relationship.
What I'm working on is a small flashcard app that allows users to learn a list of words (language learning).
But I'm not sure how I should add words for a particular Student Course combo.
models.py
from django.contrib.auth.models import User
from django.db.models import (
    CASCADE, PROTECT, SET_NULL,
    CharField, DateTimeField, ForeignKey, IntegerField,
    ManyToManyField, Model, TextField,
)

class Language(Model):
    name = CharField(max_length=60)

class Course(Model):
    name = CharField(max_length=60)
    description = TextField()
    language = ForeignKey(Language, on_delete=SET_NULL, null=True)

class Word(Model):
    word = CharField(max_length=60)
    language = ForeignKey(Language, on_delete=PROTECT)

class StudentCourse(Model):  # a student's enrollment in a course
    course = ForeignKey(Course, on_delete=CASCADE)
    student = ForeignKey(User, on_delete=CASCADE)
    words = ManyToManyField(Word, through='courses.WordProgress')

class WordProgress(Model):  # through table: per-student, per-word review state
    student_course = ForeignKey(StudentCourse, on_delete=CASCADE)
    word = ForeignKey(Word, on_delete=CASCADE)
    last_review = DateTimeField(auto_now_add=True)
    next_review_sec = IntegerField(default=60)
I have added data to Language, Course, Word, and User. But where I get stuck is how I can initialize a course with a set of words. How is ManyToManyField populated? I've seen some suggestions that it should be an array? In my case, should it be an array of words, i.e. {course: 1, user: 1, words: [1, 2, 3]}? Or is this populated in some other way? Also, does through have an impact on any of this?
Thanks for your help -- it is greatly appreciated!
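Not a definitive answer, but with a through model the usual pattern is to create rows on the through table itself rather than assigning an array to the ManyToManyField. A rough sketch assuming the models above (the enroll_student helper and the courses.models import path are made up for illustration; the app name is only guessed from the 'courses.WordProgress' string):

from courses.models import StudentCourse, Word, WordProgress   # assumed app path

def enroll_student(student, course):
    # Create the enrollment, then one WordProgress row per word in the
    # course's language; the M2M is populated via these through rows.
    student_course = StudentCourse.objects.create(course=course, student=student)
    words = Word.objects.filter(language=course.language)
    WordProgress.objects.bulk_create(
        [WordProgress(student_course=student_course, word=w) for w in words]
    )
    return student_course

After that, student_course.words.all() returns that student's words for the course, and through is exactly what lets each word carry its own last_review / next_review_sec state instead of being a bare link.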
I'm working on a data set that is decently large, >2 million rows. One of the features is a serial number for a product (so probably decently important). Usually, I would OneHotEncode them, but this isn't really possible because having 5000 columns would break my model. Any suggestions for how you would deal with this? Thanks.
Edit: Some clarification. This is a pretty massive dataset, with over two million rows. I have two variables of concern. One is the ad campaign, of which there are 5704 unique values. The other is the product number of the game being played when the advertisement launches, which has >29k unique values. When predicting clicking on an ad in a mobile game, I think both are probably too important to drop. The other thing is, I'm aware that it isn't realistic or super intelligent to have these as features, but this is a take-home technical interview, so it's part of the challenge to make do. Thanks again everyone for your thoughtful answers.
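One family of alternatives to one-hot encoding for columns like these is frequency or target (mean) encoding, which collapses each high-cardinality column into a single numeric one. A minimal pandas sketch with assumed file and column names; the same treatment would apply to the product-number column, and target encoding should be fit on the training fold only to avoid leakage:

import pandas as pd

df = pd.read_csv("ads.csv")                     # hypothetical file

# Frequency encoding: how often each campaign appears.
freq = df["campaign_id"].value_counts(normalize=True)
df["campaign_freq"] = df["campaign_id"].map(freq)

# Target (mean) encoding: average click rate per campaign.
# In practice, compute this on the training split only (or per fold).
click_rate = df.groupby("campaign_id")["clicked"].mean()
df["campaign_click_rate"] = df["campaign_id"].map(click_rate)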
I am analyzing a survey of about 30 questions. All of the survey questions are on a Likert scale from 1 to 10. The demographic information is categorical. For example, age is "18-25", "25-45", ... not numeric.
The goals of this analysis are unclear, but thankfully it is exploratory, so I am not overly concerned about controlling error rates. The goal, though, is to explain support for several types of policy (measured on a Likert scale). The PI would like to see why people support certain types of policy (not a causal claim). I do not think this is possible, but maybe I am wrong.
I have no idea how to actually model this. Traditional regression modeling seems out of the question. There are around ten types of policy they want explained. I would have to run 10 regression models, changing the dependent variable each time. I am not concerned with strictly controlling the Type I error because this is exploratory, but I do not trust results from so many regression models. I will definitely chase a false positive. Another complication is that everything is either 0-to-10 Likert or categorical.
Does anyone have any similar experience? The high-level problem is analyzing Likert survey data with many variables of interest.
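Not advice so much as a sketch of the mechanical side: if the ten exploratory regressions do get run, a false-discovery-rate correction across them is cheap to add. A rough statsmodels sketch (file, column, and predictor names are assumptions, and treating 0-10 Likert responses as numeric outcomes is itself a debatable modeling choice):

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

df = pd.read_csv("survey.csv")                          # hypothetical file
policy_cols = [f"policy_{i}" for i in range(1, 11)]     # the ten policy-support outcomes

pvals = []
for col in policy_cols:
    # One exploratory model per outcome; predictors here are placeholders.
    model = smf.ols(f"{col} ~ C(age_group) + trust_gov + econ_attitude", data=df).fit()
    pvals.append(model.f_pvalue)                        # overall model p-value

# Benjamini-Hochberg correction across the ten models.
reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(pd.DataFrame({"outcome": policy_cols, "p_adj": adjusted, "reject": reject}))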
We are still actively working on developing a DB diagram for the site and once we have that, we are going to implement the appropriate models and get those into the DB. Then we are going to work hard on getting the endpoints working so the front-end developers can start working with data and ensuring that it all works as it is supposed to. This is just a brief overview of what we are doing so you have an idea of what we are working on before coming on-board.
The skills I am looking for, though not required, are:
If you think you can help out, even if you have never touched any of this before, please let us know! Some of these concepts are not terribly difficult to learn and would require a few hours of playing around with the code to get an idea of how things work in addition to reading up on the docs for FastAPI.
You can view FastAPI here: https://fastapi.tiangolo.com/features/
I look forward to working with you, and thank you for any help or assistance you can provide to help get this project off the ground.
Please take a look at the pinned post for onboarding instructions and how to get started. Thank you!
Discord Link - Join Us: https://discord.gg/Ctu3c3w
Why does it have to be a group activity?
I said, "Canβt say for sure, itβs so hard to keep track!"
At least on some subs, including this one, it has become almost impossible to engage in objective conversations that are not loaded with preemptive ideas, subjective emotions, and assertions.
This post presents an analysis of how many Rotten Tomatoes user reviews are likely sufficient for a given movie's audience score to be reasonably close to the final audience score. For analysis purposes, I'll make the possibly unwarranted assumption that the arrival of user reviews over time is random. This assumption allows us to model the user reviews that exist at an earlier point in time as a random sampling of the user reviews that exist at a later time. More specifically, let's address the question "Assuming that the arrival of user reviews over time is random, at what number of user ratings is it likely sufficient that the audience score will be within 1 percentage point of the final audience score?"
The displayed audience score appears to always be an integer, which I'll assume is rounded to the nearest integer. Let the variable FR = the displayed rounded final audience score, which is an integer. Let the variable FP = the pre-rounded final audience score, which is a real number known to Rotten Tomatoes but unknown to the public. Let the variable SR = the displayed rounded audience score at some point in time, which is an integer. Let the variable SP = the pre-rounded value of SR, which is a real number known to Rotten Tomatoes but unknown to the public. Our question therefore can be stated as "Assuming that the arrival of user reviews over time is random, at what number of user ratings for SR is it likely sufficient that (FR - 1) <= SR <= (FR + 1)?" Since the random sampling theory that we will be using uses real numbers, to take into account rounding, the real number range that we're interested in is (FR - 1) - 0.5 <= SP <= (FR +1) + 0.4999999999. Our question therefore can be stated as "Assuming that the arrival of user reviews over time is random, at what number of user ratings for SP is it likely sufficient that (FR - 1.5) <= SP <= (FR + 1.4999999999)?"
The random sampling theory that we will use requires specification of the desired margin of error percentage for the calculation of the sample size needed to have a specified confidence level that the population value is within the margin of error percentage of the sample value. In our case the population is the user reviews that FP is calculated from, the population value is FP, the sample is the user reviews that SP is calculated from, and the sample value is SP. The value of FP for which the difference between FP and (FR - 1.5) [the left side of our desir
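The post is cut off above, but the calculation it is building toward is the standard sample size for a proportion within a margin of error. A rough Python sketch under the post's own assumptions (random arrival of reviews, worst case p = 0.5, margin of roughly ±1.5 pre-rounded points); the example population size is made up:

from math import ceil
from statistics import NormalDist

def sample_size(margin, confidence=0.95, p=0.5, population=None):
    # Sample size for a proportion within `margin` (e.g. 0.015 for 1.5 points).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population:                              # finite population correction
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)

print(sample_size(0.015))                       # about 4269 ratings when the final pool size is unknown
print(sample_size(0.015, population=20000))     # fewer once the final number of ratings is known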
So every year, according to this site, there are at least a handful of teams that are 53% or above against the spread, meaning if you had bet on them every single game you would have profited. Most years there are teams hovering around 60% too (for example, this year the Kings are 62.1% ATS; last year Boston was 62.6%). That's a whole lot of potential profit.
Trends can be viewed here: https://www.teamrankings.com/nba/trends/ats_trends/?range=yearly_2017_2018
My question is after how many games at the start of an NBA season do these trends usually appear where they can be considered reliable? 20 games? To the point where you can blindly lay money down on a team the remainder of the season and be pretty sure of profit?
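As a rough way to frame "when is it more than luck", one can ask how often a pure coin-flip team would sit at or above 60% against the spread after n games. A short scipy sketch (the 60% threshold is just the example rate mentioned above, and this ignores the vig):

from math import ceil
from scipy.stats import binom

for n in (10, 20, 41, 82):
    wins_needed = ceil(0.60 * n)
    p = binom.sf(wins_needed - 1, n, 0.5)       # P(X >= wins_needed) under p = 0.5
    print(f"{n} games: P(>=60% ATS by pure chance) = {p:.2f}")

If that probability is still large at 20 games, a 60% ATS record at that point is not yet distinguishable from luck.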
Discuss
Thanks
It means a lot.
According to Luca Zaia, Governor of Veneto, everyone in Vo' Euganeo has been tested for Coronavirus. The positivity rate is 1.7%.
Vo' Euganeo is the town in which the first cases of Coronavirus in Veneto appeared.
Only 2. But the real question is, how did they get in there?
If any. The field progresses so quickly that it's hard to keep up to date with everything. How do you do it?
Hey there, I'm working on a website dedicated to classic cars, and I'm looking for data on the number of cars "still alive" by model and year.
As an example: I know that exactly 318 Jaguar E-Type Series 1 4.2L Fixed-head Coupe right-hand-drive cars were produced in 1966.
How can I find out how many have been destroyed and how many are still around, on a global scale?
Any ideas to share? Best, Piem
Many have correctly identified the problem of government and the corporation. What is missing is a sound game theoretical design to manifest a peer-to-peer solution in an adversarial environment. If we are going to achieve the peer-to-peer vision, we must do so through economic persuasion. Neither government nor the corporation is going to willfully give up its power. If we are to succeed, we must create a new solution which makes the old obsolete. Fortunately there is a way to do this.
The core value proposition of all business models is trust/identity. All businesses match a buyer and a seller and collect a fee for doing so. The reason we are reliant on centralized businesses is because we cannot establish trust among two peers without going through a centralized intermediary. If we solve this problem, then we have made the centralized business obsolete.
A system of peer-to-peer trust can be rather easily built on the blockchain. Users register their identity to a designated Bitcoin address and link all of their profile characteristics/data to this identity. Attestation from peers is used to build a public profile. Using this public immutable profile, instead of the individual broadcasting trade requests to a single monopoly provider like Facebook, Uber, eBay, Netflix, etc, the user broadcasts trade requests to thousands of competing matchmakers. Each matchmaker recreates a complete copy of the network. All prospective counterparties can verify the trustworthiness of each other on the blockchain. They need not trust the monopolized intermediary to verify each other's reputation. By solving peer-to-peer trust, you have eliminated the very thing that allows corporations to attain monopolies -- the network effect.
By eliminating the network effect of trust, the centralized business is made obsolete. By making the centralized business obsolete, government is indirectly made obsolete. Government has no power without the centralized business implementing its decrees out of fear of getting shut down. It simply cannot control an economy of billions of individuals transacting peer-to-peer. By making the centralized business and government obsolete, the cost of trade is reduced 40%+. This is because the cost of the middleman, regulation, and extortion is removed from trade as it cannot be enforced in a peer-to-peer framework. Users will join this system not out of ideological persuasion, but because they personally profit from it.
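The post stays abstract about the data involved, but the identity-plus-attestation record it describes might look something like the following sketch; every field name here is invented for illustration, since the post specifies no concrete format:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Attestation:
    attester_address: str    # Bitcoin address of the peer vouching
    claim: str               # e.g. "completed trade as agreed"
    signature: str           # attester's signature over the claim

@dataclass
class Identity:
    address: str                                     # the designated Bitcoin address
    profile: dict = field(default_factory=dict)      # self-asserted characteristics
    attestations: List[Attestation] = field(default_factory=list)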
It ended my Korea.
Ten tickles....
I'm not sorry.