A list of puns related to "Database normalization"
Sorry for the click-bait-y title.
Given: a columnar storage system, such as an OLAP database like Snowflake, or even a columnar storage format like Parquet.
If you had a column that stored time as a text string, e.g. "11:30", would it improve storage to split that into one field with "11", one with ":", and one with "30"? My thought is that the number of unique values you need to store in the original case is 720, assuming a 12-hr format, or 1440, assuming a 24-hr format. In the split case, even though you would have more column objects, the number of unique values you have to carry is only 73 (12 + 1 + 60), assuming a 12-hr format, or 85 (24 + 1 + 60), assuming a 24-hr format.
Wouldn't that compress better, without degrading the information being persisted?
This assumes a dataset that is written to significantly more than it's read, so joining the fields back at query time isn't a big deal.
If so, would it be possible to write a program that does this basic level of scanning through a database and finds ways to optimize its storage?
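As a back-of-envelope check, the cardinality arithmetic above can be sketched in Python (assuming a 24-hr "HH:MM" format):

```python
# Sketch of the cardinality argument, assuming 24-hr "HH:MM" strings.
times = [f"{h:02d}:{m:02d}" for h in range(24) for m in range(60)]

# One combined column: every distinct time is its own dictionary entry.
combined_cardinality = len(set(times))  # 1440

# Split into hour / separator / minute columns.
hours = {t[:2] for t in times}      # 24 unique values
seps = {t[2] for t in times}        # 1 unique value (":")
minutes = {t[3:] for t in times}    # 60 unique values
split_cardinality = len(hours) + len(seps) + len(minutes)  # 85

print(combined_cardinality, split_cardinality)  # 1440 85
```

Note that the ":" column is a constant and carries no information, so two columns (or a single integer of minutes-since-midnight) would do the same job; and whether the split actually compresses better in a given system depends on that engine's encodings (dictionary, run-length, etc.), so this is only the dictionary-size part of the argument.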
Hi Reddit!
I am a full stack junior web/mobile developer and I am having a lot of doubts about databases.
The only database I know how to use is DynamoDB, which is a NoSQL database, and I am frequently searching for and learning about the differences between NoSQL and SQL. My question is about data normalization, which is a "feature" of SQL databases, right? But isn't the data validation and structure enforced by the frontend strictly followed? I mean, if my application will always use/create the same data structure, why should I be concerned about data normalization?
I know that you can send requests from outside the application, but with simple validation you can block those requests. I think I am getting so used to AWS's way of creating APIs and databases that I am mixing everything up in my mind.
Thank you in advance!!!
Hello,
I have done a fair amount of reading on database normalization. I feel as though I understand the concept and why it is implemented. However, I struggle with putting it into practice (particularly at the 2nd and 3rd normal forms). I think I may just have the wrong thought process. From the second you are tasked with normalizing a database (from the 2nd form onwards), what are you specifically looking for? Does anybody have tips or tricks?
Thanks in advance
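One concrete thing to look for: for 2NF, non-key attributes that depend on only part of a composite key; for 3NF, non-key attributes that depend on other non-key attributes. A small illustrative check (all table and column names here are made up for the example):

```python
# Hypothetical OrderItems rows with composite key (order_id, product_id).
rows = [
    # (order_id, product_id, product_name, customer_id, customer_city)
    (1, 10, "widget", 100, "Oslo"),
    (1, 11, "gadget", 100, "Oslo"),
    (2, 10, "widget", 101, "Bergen"),
]

def determines(rows, src_idx, dst_idx):
    """True if the source column(s) functionally determine the target column."""
    seen = {}
    for r in rows:
        key = tuple(r[i] for i in src_idx)
        if seen.setdefault(key, r[dst_idx]) != r[dst_idx]:
            return False
    return True

# 2NF smell: product_name depends on product_id alone (part of the key),
# so it belongs in a Products table.
print(determines(rows, [1], 2))  # True

# 3NF smell: customer_city depends on customer_id, a non-key attribute,
# so it belongs in a Customers table.
print(determines(rows, [3], 4))  # True
```

In practice you rarely scan data like this; you reason about what each attribute is "really about" (the product, the customer, or the order line itself) and move it to the table whose key determines it.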
For example, say we have an application where the user can store API URLs along with headers and parameters. For instance, the user could save a custom Google search for the query "cats", where the parameters would be q=cats.
Design #1: one table named "Api" storing the URL, with JSON columns for headers and parameters.
Design #2: four tables, with separate tables for Api, headers, params, and KeyValuePairs (the last is optional, but I see it in the code I'm inheriting).
---
Which is the better design?
My hunch is that design #1 (denormalization) is a lot simpler to work with as a developer and likely more performant as well. I don't see any benefit to the complexity of the additional database tables in this case, aside from maybe making it easier to run analytics on the data we're storing (which is a moot point in this scenario).
But that being said I come from more of a frontend background, so tell me if I'm off here.
---
EDIT: The database is Postgres, so it supports Json. This isn't a side project so I can't just swap out the database.
EDIT: Actually, the more I think about it, #2 (separate tables) doesn't sound that bad (minus the KeyValuePairs table, which I think is useless). It's a bit more work upfront, but it keeps things more robust, particularly since I'm working on a team with other people. With a looser schema, we could, say, hire a contractor who inputs the JSON incorrectly and causes everything to fail.
EDIT: Leaning back towards denormalization since the APIs aren't going to change much, and we likely won't be querying by headers/parameters so there's little benefit to normalization here. Thanks for the answers!
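For reference, design #1 is tiny to implement. A sketch using sqlite3 as a stand-in for Postgres (in Postgres the headers/params columns would be jsonb, which can still be indexed with GIN later if querying by parameters ever becomes necessary):

```python
import json
import sqlite3

# Design #1: a single table with JSON columns (stored as text in sqlite).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE api (id INTEGER PRIMARY KEY, url TEXT, headers TEXT, params TEXT)"
)
con.execute(
    "INSERT INTO api (url, headers, params) VALUES (?, ?, ?)",
    (
        "https://www.googleapis.com/customsearch/v1",   # illustrative URL
        json.dumps({"Accept": "application/json"}),
        json.dumps({"q": "cats"}),
    ),
)

# Reading it back is one row fetch plus a JSON parse -- no joins.
url, params = con.execute("SELECT url, params FROM api").fetchone()
print(url, json.loads(params)["q"])
```

The trade-off is exactly as described in the thread: the database no longer enforces the shape of headers/params, so any validation has to live in application code.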
Trying to normalize first database for practice.
Company ID, Company Name, Amenity 1, Amenity 2, Amenity 3, Address 1, Address 2, City, State, Postal Code
How would you normalize this?
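One conventional answer, sketched with sqlite3 (key names and constraints are my assumptions): the repeating Amenity 1/2/3 columns violate 1NF, so amenities get their own table plus a junction table; the address columns can stay on the company row if each company has exactly one address.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Repeating Amenity 1..3 columns become rows in their own tables.
CREATE TABLE company (
    company_id INTEGER PRIMARY KEY,
    company_name TEXT NOT NULL,
    address_1 TEXT, address_2 TEXT,
    city TEXT, state TEXT, postal_code TEXT
);
CREATE TABLE amenity (
    amenity_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE company_amenity (  -- many-to-many junction
    company_id INTEGER REFERENCES company(company_id),
    amenity_id INTEGER REFERENCES amenity(amenity_id),
    PRIMARY KEY (company_id, amenity_id)
);
""")

con.execute("INSERT INTO company (company_id, company_name, city) VALUES (1, 'Acme', 'Oslo')")
con.execute("INSERT INTO amenity (amenity_id, name) VALUES (1, 'Pool'), (2, 'Gym')")
con.executemany("INSERT INTO company_amenity VALUES (?, ?)", [(1, 1), (1, 2)])

# A company can now have any number of amenities, not just three.
n = con.execute("SELECT COUNT(*) FROM company_amenity WHERE company_id = 1").fetchone()[0]
print(n)
```

If addresses can be shared or a company can have several, the address columns would similarly move to their own table with a foreign key.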
How we denormalized our table to get rid of complex and slow queries.
https://medium.com/@taukeer/de-normalization-of-database-for-read-performance-220cd50ac827
Hello all, I'm working on a database schema for a small program I'm making. The program's goal is to take an item we manufacture and create a list of components (a Bill of Materials). I have three database tables that seem to accomplish this well, but the third table isn't in a normal form and I don't know how to fix it.
Manufactured Unit
PK: UnitID
VarChar(255): Unit Name
Boolean: NeedsPump
Equipment
PK: EquipmentID
VarChar(255): Cut Sheet
VarChar(255): Equipment Name
UnitEquipment
FK: UnitID
FK: EquipmentID
Int: Quantity
The natural fit seems to be combining UnitID and EquipmentID into a primary key and then using it. That said, I'm not sure of the underlying mechanism for how the two IDs are combined, and I have concerns. For instance, if UnitID is 1 and EquipmentID is 12, a concatenated PK would be '112'; this would be the same as a UnitID of 12 and an EquipmentID of 1. Are these concerns unfounded? Does making the primary key a combination of the two FKs put this table in 3NF? Sorry if this question is extremely simple; I'm not an SQL expert.
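The concern is unfounded: a composite primary key is a tuple of the two columns, not a string concatenation, so (1, 12) and (12, 1) are distinct keys. A quick demonstration with sqlite3 (any SQL database behaves the same way):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE unit_equipment (
    unit_id INTEGER,
    equipment_id INTEGER,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (unit_id, equipment_id)
)
""")

con.execute("INSERT INTO unit_equipment VALUES (1, 12, 4)")
con.execute("INSERT INTO unit_equipment VALUES (12, 1, 2)")  # no conflict with (1, 12)

try:
    # Only an exact repeat of the (unit_id, equipment_id) pair is rejected.
    con.execute("INSERT INTO unit_equipment VALUES (1, 12, 9)")
except sqlite3.IntegrityError as e:
    print("rejected duplicate key:", e)

count = con.execute("SELECT COUNT(*) FROM unit_equipment").fetchone()[0]
print(count)
```

With (UnitID, EquipmentID) as the key and Quantity depending on the whole key, this junction table is a perfectly standard many-to-many design.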
Hi, new to DBs here. I have always been fascinated by how various categories of data can be organized and normalized. Is there any online site, similar to freeCodeCamp, that lets you normalize tables from scratch and tells you if it's correct or not? I feel that's the best way to learn and get better at it. Please feel free to mention other good ways to practice. Thanks.
https://imgur.com/a/hJVTuFH. Using Access Bible 2016. Trying normalization. Beginner on Ch. 8 Queries
Customer Contact is parent. Contact and Company are children?
Tried separate tables for account manager and commodity.
So why not leave them in the main table?
I am trying to figure out how Python classes work with data tables that I want to save to a PostgreSQL database.
For example, I am trying to make a (practice) program that will keep track of workers in a large company (corporation incorporated) including their name, phone number(s), email address, and physical address. (Basically one could enter information about an employee and have it update the database, read info off the database, or erase an employee.)
One way I could do this is to make an Employee class which has all of these items as attributes. For example:
class Employee:
    def __init__(self, f_name, l_name, phone, email, st_address, postal_code, employee_id):
        ...  # with the definition of each of these items under it, etc.
But when I write these items to a database, I start getting data normalization issues.
For example: What if someone has 3 phone numbers? Or several emails? Or if two of the employees have the same address and gasp shared landline?
If I were just normalizing a (relational) database, I would have a separate table for each of these things. For example, I would have a phone number table (with a phone number ID as the primary key) which would have an employee_id associated with it as a foreign key. This way, two employees that (gasp) share a landline won't create issues: when one of these employees retires and I erase the phone number associated with them, I don't lose data for the other employee.
I am confused about how to set up objects in Python to deal with this data.
Should I have a separate class for each table in the database? This seems like it creates an awful lot of files. Maybe that's ok. Is that the convention?
That's how I am thinking of handling it currently. For example, PhoneNumber is a class that has a phone_id, a type (mobile, landline, etc.), and an owner's employee_id, all stored in a table that mirrors the class.
Or am I thinking about classes wrong? Am I thinking about classes in the way that I think about tables because I have a little background in databases and no background in OOP?
Is there a different convention for how one structures classes for data that is to be stored and read from a database? (If so what is it, and could you explain why it is done?)
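On the convention question: one class per table is in fact the common pattern (ORMs such as SQLAlchemy and Django formalize exactly this mapping), and the classes usually live together in one module rather than one file each. A minimal hand-rolled sketch, with all names assumed:

```python
from dataclasses import dataclass

# One class per table, mirroring the normalized schema described above.
@dataclass
class Employee:
    employee_id: int
    f_name: str
    l_name: str
    st_address: str
    postal_code: str

@dataclass
class PhoneNumber:
    phone_id: int
    number: str
    kind: str          # "mobile", "landline", ...
    employee_id: int   # foreign key back to Employee

alice = Employee(1, "Alice", "Ng", "1 Main St", "12345")
bob = Employee(2, "Bob", "Ng", "1 Main St", "12345")

# The (gasp) shared landline: two rows, one per owner, so deleting
# Alice's row (phone_id 7) leaves Bob's row (phone_id 8) intact.
shared = PhoneNumber(7, "555-0100", "landline", alice.employee_id)
also_shared = PhoneNumber(8, "555-0100", "landline", bob.employee_id)
print(shared.number == also_shared.number)
```

So thinking about classes the way you think about tables is not wrong here; an ORM just automates the load/save plumbing between the two.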
Me: Well, I don't remember. I just ensure I don't have to store redundant data when designing schemas.
Interviewer: Gives a wry smile.
Me: (Fuck, I've messed up the interview. I guess I won't get this job...)
I come back home and read up on normalization.
Five Hours Later:
Interviewer: Congratulations, you got the job!!
On the first day, I install MySQL and insert a subset of the production data. Every query needs an is_active clause to filter out archived data. I realized why I got the job.
I'm trying to normalize the following conceptual model. How have I done?
I can't quite understand or describe why I'm having such a hard time with this, but after dozens of attempts it still hasn't clicked. 1NF makes perfect sense in most cases, but as soon as "functional dependency" kicks in, my mind somehow blanks out and I'm never able to consistently get it right.
Do you have any recommendations for a proper source with good examples and solutions? Preferably one I can try in a MySQL database, to see why doing it a different way would lead to inconsistencies.
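One way to "see" a functional dependency going wrong is to create the anomaly deliberately. A sketch using sqlite3 (the same SQL runs in MySQL; table and column names are made up for the example):

```python
import sqlite3

# dept_name depends on dept_id, not on the key employee_id. Storing it
# per-employee lets the copies drift apart: the classic update anomaly.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE emp (employee_id INTEGER PRIMARY KEY, dept_id INTEGER, dept_name TEXT)"
)
con.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, 10, "Sales"), (2, 10, "Sales")],
)

# Rename the department for only one employee: the database happily allows it.
con.execute("UPDATE emp SET dept_name = 'Marketing' WHERE employee_id = 1")

names = {row[0] for row in con.execute("SELECT dept_name FROM emp WHERE dept_id = 10")}
print(names)  # dept 10 now has two conflicting names
```

Moving dept_name into its own departments table, keyed by dept_id, makes this inconsistency impossible rather than merely unlikely, which is the whole point of chasing functional dependencies.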
This might just be a misconception on my part about DB architecture, so I am looking for a good strategy for normalizing my database structure.
The issue I'm having is that some of the models I'm building share many attributes with entities that are not necessarily related to one another. Here's my example (albeit simplified). I have the following models:
I also have models for the following:
A profile belongs to a user; an organization does not. Both organizations and profiles have locations, emails, and phones. An application is associated with a profile. I need to allow the profile to be updated by the User (or Admin), but the application must remain frozen once submitted. So an application is really just a representation of the state of several aggregated models at the point of submission. Similarly, I would like the profile to simply represent the state of other associated models. Thus, if a user updates their profile, I retain the state of their original profile by creating a new profile record and a new address record. So the profile table would just be a series of foreign keys that reference a combination of records.
The issue is that Location, Email, Phone, etc. are associated with multiple models; Organizations and Profiles both reference them. However, Rails seems to always put the foreign key on the child model. In my particular situation, it would seem to make more sense to put the foreign key on the parent model, so that Organization records contain a foreign key for email, as do Profiles, or any other models that use that table.
Is it bad to put the foreign key on the parent model? Will ActiveRecord allow me to do this? Is there a better way to set up these associations?