ELI5: Master Data Management and Product Information Management

I recently joined a data consulting firm, and they've explained MDM and PIM to me many times, but I'm still not getting it. Can anyone here break it down in a way so that I can understand it and explain it to others?

👍︎ 3
📅︎ Jun 29 2020
Live Stream - Advanced SQL, Theory, Modeling - Master Data Management

I used to post here weekly with my "Master Data Management in SQL" videos. I got to a point in the system where there was more code to write before I could create a decent 10-15 minute summary.

I decided to start live streaming as I work on some of these massive modules. Basically talking out loud while writing data-driven code, then making more concise videos of the work done.

Example:

I'm currently working on a data-driven ETL process, purely in SQL. The idea is to get AdventureWorks2017 into this hyper-normalized / abstracted model.

Normalizing the ETL idea, I discovered I needed a couple of modules:

A script generating module

A loading / mapping module

The last live stream I did, I was able to create a table structure that will generate scripts, gave it some data, and tested it.

I say scripts and not T-SQL because, well, most languages are just rules: items inside of containers.

In past videos I made MSSQL generate some MySQL for me, and some JavaScript for my Node API. Theoretically, this module can do both, possibly at the same time, and maybe even execute some Python to adjust files / call MySQL... thus creating a system that can create other systems.
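
To make the "scripts from rules" idea concrete, here is a minimal T-SQL sketch of a template-and-token approach. The table, token names, and sample values are hypothetical, not the ones from the videos:

    -- Hypothetical template store: one row per script pattern, with {tokens}.
    CREATE TABLE dbo.tblScriptTemplate
    (
        TemplateId     INT IDENTITY PRIMARY KEY,
        TargetLanguage VARCHAR(20)   NOT NULL,   -- 'TSQL', 'MySQL', 'JavaScript', ...
        Body           NVARCHAR(MAX) NOT NULL
    );

    INSERT dbo.tblScriptTemplate (TargetLanguage, Body)
    VALUES ('MySQL', N'CREATE TABLE {Schema}.{Table} ({Columns});');

    -- "Rendering" a script is just token replacement driven by metadata.
    DECLARE @script NVARCHAR(MAX);

    SELECT @script = REPLACE(REPLACE(REPLACE(Body,
                     '{Schema}',  'mdm'),
                     '{Table}',   'Customer'),
                     '{Columns}', 'CustomerId INT, FullName VARCHAR(200)')
    FROM dbo.tblScriptTemplate
    WHERE TargetLanguage = 'MySQL';

    PRINT @script;   -- or EXEC sys.sp_executesql @script for T-SQL targets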

No SSIS, no third-party tools (except DevArt's SQL Complete and SQL Monitor)... pure MSSQL 2019, plus Keynote for visuals. I can create files using BCP, and MSSQL can execute Python, so I think there isn't much I can't do.

Anyway, that's the crazy picture; baby steps first. AdventureWorks to a hyper-normalized MDM system. Data-driven script creation (that executes to obtain records / datasets), then an ingestion / mapping module, and we'll see where it takes me. Maybe start ingesting r/datasets using a pure SQL solution.

Live streams are not on a schedule; tomorrow ~4 PM PST, I'll be continuing with this script generator, getting it configured to start pulling AdventureWorks datasets.

What is on a schedule: Saturdays, 1 PM to ~3 PM PST, I'll be doing some Q&A or chatting depending on the crowd, or continuing with this project. BYOData if you have specific modeling / query questions.

TL;DR: Quarantine got me bored.

https://www.youtube.com/elricsims-dataarchitect

👍︎ 35
📅︎ May 01 2020
Master Data Management in Salesforce?

Hi, I am a project manager at a university IT shop, and we are looking at implementing a Master Data Management system for Contacts (mostly bio-demo data) in Salesforce. Does anyone have any experience doing this? It's early stages, but we are looking at:

  1. ClearMDM https://www.clearmdm.com/ (seems new with few reviews, maybe a one person shop?)
  2. Informatica Cloud 360 for Salesforce https://video.informatica.com/detail/videos/master-data-management/video/5829423117001/informatica-cloud-customer-360-for-salesforce-demo?autoStart=true
  3. Can this be done with Apsona or Demand Tools?

Any advice or guidance would be appreciated. Thanks!

👍︎ 12
👤︎ u/vogonpoem
📅︎ Nov 27 2019
SQL Master Data Management Tutorial

UPDATED: (sprint 1 uploaded). Back once again for the renegade master.

Back in October, I started a series describing master data management in SQL, constructing it from an empty SQL instance with unexpected support from this community.

I was new to YouTube, to teaching this model, to editing and recording. The original series was long format (30-120 minutes/vid) and packed with information that might have been difficult to consume all at once.

Since then I took a step back and analyzed how I could present this information, these concepts, the system-wide normalization, and end-to-end development.

So I'm taking another stab at it.

I present to you all, a new hands on series for the same model.

I am approaching this with feature atomicity: 10-minute videos (not including intro/title/end screen) with very specific actions, driven by a public Trello kanban board, with an agile mindset.

10-video sprints, where episodes 1-8 will be development, 9 will be tech-debt removal, and 10 will be the retrospective and planning for the next sprint. So that's 90 minutes of code (9 episodes) and one retro/planning.

For those architects wanting to understand this model, and all developers and analysts who want to understand this pattern-based architecture, the retro videos are where you should ask questions (what went well, what didn't, what should we commit to). That's why I'm posting this now.

I'm pretty excited, and a little impulsive, so I'm sharing the playlist now. I have the intro and first 5 up, 6-10 (finishing the first sprint) at home, recorded, but in editing.

I'm still thinking about a release cycle. I want to give enough time to record and edit the videos and enough time to consume them, so I may be doing full-sprint drops to allow for comments on the retro, and change accordingly. Possibly 2-week gaps, mimicking an average sprint.

Hope you all enjoy, and see you again soon.

SQL Master Data Management Tutorial: https://www.youtube.com/playlist?list=PLPI9hmrj2Vd_ntg2HACiHYeYl7iRvrgPb

👍︎ 64
📅︎ Jun 27 2019
How many technologies does your business have? Found on LinkedIn, from a master data management consultancy.
👍︎ 20
📅︎ Aug 16 2019
SQL Master Data Management - Next Steps?

I can't believe it's finally coming to an 'end'!

I'm on Sprint 4; today at 2 PM a 'Data Entry for a data-driven table creator' video will release, tomorrow the code.

It was pretty cool seeing SQL create 5 tables, 100+ defaults, 10 indexes, 25 views, loading all data into our personal information schema... in one second.
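
A minimal sketch of what a data-driven table creator can look like, assuming a hypothetical metadata table (the real series uses its own shapes and naming):

    -- Hypothetical metadata: the columns each generated table should have.
    CREATE TABLE dbo.tblTableDefinition
    (
        TableName  SYSNAME       NOT NULL,
        ColumnName SYSNAME       NOT NULL,
        DataType   NVARCHAR(128) NOT NULL,
        Ordinal    INT           NOT NULL
    );

    INSERT dbo.tblTableDefinition VALUES
        (N'tblSubject', N'SubjectId',   N'INT IDENTITY(1,1) PRIMARY KEY', 1),
        (N'tblSubject', N'SubjectName', N'NVARCHAR(200) NOT NULL',        2);

    -- Build one CREATE TABLE statement per defined table, then execute the batch.
    DECLARE @sql NVARCHAR(MAX);

    SELECT @sql = STRING_AGG(CONVERT(NVARCHAR(MAX),
               'CREATE TABLE dbo.' + QUOTENAME(TableName) + ' (' + Cols + ');'), CHAR(10))
    FROM (SELECT TableName,
                 STRING_AGG(QUOTENAME(ColumnName) + ' ' + DataType, ', ')
                     WITHIN GROUP (ORDER BY Ordinal) AS Cols
          FROM dbo.tblTableDefinition
          GROUP BY TableName) AS d;

    EXEC sys.sp_executesql @sql;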

But that means... the system part of the system is basically done.

Sprint 5 will contain the last few table shapes: History, Archive, Snapshot, Domain, and Shared Attributes... but then... well, it's time to get real data in there.

There will be a 'Sprint 6 and Beyond', where I will need to add system functionality, but it really becomes feature-add instead of 'we need to do this before we proceed'.

So, what is next, my fellow SQLians? What are you interested in / want to understand more?

Because the next few series are either very short or require time to mature (training / learning), I want to keep myself busy. So let me know!

SQL and Python, Social Media Ingestion - The next series will be a short one, just because the foundation is pretty simple.

  • Using SQL to generate Python that calls APIs and stores data (see the sketch after this list). Focuses on 4 subreddits: SQL, MSSQL, DataSets, and 20Questions.
  • Branching out to Twitter, LinkedIn
  • Hopefully... pulling and loading datasets from r/DataSets
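
A rough sketch of the SQL-generates-Python idea, assuming SQL Server Machine Learning Services (Python) is installed and allowed outbound network access. The subreddit variable would come from a config table; everything here is illustrative, not the series' actual code:

    -- SQL assembles the Python line by line, exactly the "SQL generates code" theme.
    DECLARE @subreddit SYSNAME = N'SQL';   -- hypothetical config value

    DECLARE @py NVARCHAR(MAX) =
          N'import urllib.request, json' + CHAR(10)
        + N'import pandas as pd' + CHAR(10)
        + N'req = urllib.request.Request("https://www.reddit.com/r/' + @subreddit
        + N'/new.json?limit=5", headers={"User-Agent": "mdm-demo"})' + CHAR(10)
        + N'posts = json.load(urllib.request.urlopen(req))["data"]["children"]' + CHAR(10)
        + N'OutputDataSet = pd.DataFrame([(p["data"]["id"], p["data"]["title"]) '
        + N'for p in posts], columns=["PostId", "Title"])';

    -- Machine Learning Services returns the DataFrame as a result set.
    EXEC sys.sp_execute_external_script
         @language = N'Python',
         @script   = @py
    WITH RESULT SETS ((PostId VARCHAR(20), Title NVARCHAR(300)));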

SQL and Python, Posting via APIs - If we can pull data, we can push data.

  • Post to Reddit, Twitter, LinkedIn
  • Post to DokuWiki (Self documenting database)
  • Post to Wordpress

Decisions, Decisions, Decisions - Very quick series. The Decision structure will be the base for ML / ANN / Calcs.

  • Using decisions to classify SQL/MSSQL posts in various formats. You could consider this clustering, but it's not deep. Just an introduction to Decisions and the power they wield.

Machine Learning and ANNs, 20 Questions in SQL - This series will take time to mature, because it is data dependent.

  • What good is a bot if you can't interact with it? Let's make SQL play 20 Questions.
  • Using the decision structure with dynamic weighting and loopback (or backprop, whatever you want to call it) to allow SQL to learn new questions to ask. SQL needs to continually adjust its probability of success, choosing the next question based on what is known and which question will get it to the final answer.
  • This is where r/20Questions comes into play. I need the 'reddit bot' to start pulling data, and to write a transformation script to remove the po
... keep reading on reddit ➡

👍︎ 15
📅︎ Aug 07 2019
[HIRING] Help designing architecture for master data management/predictive analytics platform

We have aspirations to develop a platform that integrates data from 30+ companies (part of an association that works together to provide complementary products) to produce a common view of customers by piecing together information collected across different sources. The information will be used to generate recommendations, directed to the customer, for other products they might be interested in from other companies, and to help the companies better understand their market share relative to their target populations.

Our goal is to offer this as a service to allow other companies to join the partnership and build interfaces to share their data in exchange for insights, while providing a user interface to allow users to manage how their data may be collected and used.

OUR ASK: We intend to build this solution incrementally and are seeking help with developing an architecture that will support this growth, and with identifying potential open-source solutions that can be leveraged to meet our long-term needs.

REQUIRED EXPERIENCE/KNOWLEDGE: PHP/Python stacks; strong understanding of master data management disciplines; big data technologies; experience with developing and maintaining recommendation systems

PREFERRED EXPERIENCE/KNOWLEDGE: Experience with customer data management and/or data management platforms; experience with Facebook and Google advertising platforms

Must be based in the US, Canada, or Mexico due to IP restrictions

Deliverables: High-level solution architecture; product recommendations to meet requirements; rough level of effort for each phase; conversations to discuss alternatives

Rate: up to $50/hr

Total budget: $500 (looking for a starting point on the deliverables)

👍︎ 2
👤︎ u/DC-er
📅︎ Dec 04 2019
The DATE Dimension - SQL Master Data Management

Super awesome master data management date dimension.

Can store:

  • Multiple calendars.
  • Multiple date firsts.
  • Tons of FORMAT and DATENAME options to reduce code and unify practices.
  • Support for campaigns, calendar-to-calendar translations, event ranges, single-day holidays, and range holidays.

Bonus: a little sneak peek of what's to come. I used SQL to execute Python and return JSON, calling Holidata.net, and was able to import ~500 holidays from different locales into this dimension.
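
For a flavor of how the returned JSON might be shredded on the SQL side, a hedged sketch using OPENJSON; the field names assume Holidata-style records and are illustrative only:

    -- Illustrative records in a Holidata-like shape (locale / date / description).
    DECLARE @json NVARCHAR(MAX) = N'[
      {"locale":"en-US","date":"2019-07-04","description":"Independence Day"},
      {"locale":"fr-FR","date":"2019-07-14","description":"F\u00eate nationale"}]';

    -- Shred into rows ready to INSERT into the date dimension holiday table.
    SELECT j.Locale, j.HolidayDate, j.Description
    FROM OPENJSON(@json)
    WITH (Locale      VARCHAR(10)   '$.locale',
          HolidayDate DATE          '$.date',
          Description NVARCHAR(200) '$.description') AS j;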

Wahbam: https://youtu.be/t-aayfZJJ8c

👍︎ 26
📅︎ Aug 16 2019
Master Data Management more than De-duplication?

Not sure if this is the right place, but I am a project manager at an organization which is currently implementing a Master Data Management system. We are thinking about going with SSB's "Central Intelligence" product to manage our Master Data Management system. We are using it solely for consolidating bio-demo contact records from multiple internal organizations.

It is a pricey product, but as far as I can tell all it does is de-duplication of multiple contacts for the same person to create a "composite" or "golden" record, and you don't get to choose the rules used to do this (for instance, weighting DOB over same-middle-name for a merge).

Am I missing something? It seems like a Python script could do the same thing. For the price of the product for a year, we could hire two devs for a year, and there wouldn't be recurring contract fees.
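
For what it's worth, the core match-and-merge scoring is expressible in plain SQL. A hedged sketch of rule-weighted candidate scoring (hypothetical schema, illustrative weights; commercial MDM adds survivorship rules, stewardship workflow, and lineage on top):

    -- Hypothetical contact table; weights and threshold are illustrative only.
    SELECT a.ContactId AS KeepId, b.ContactId AS MergeId, v.MatchScore
    FROM dbo.tblContact AS a
    JOIN dbo.tblContact AS b
      ON a.ContactId < b.ContactId               -- evaluate each pair once
     AND a.LastName  = b.LastName                -- cheap blocking key
    CROSS APPLY (SELECT
           CASE WHEN a.DateOfBirth = b.DateOfBirth THEN 40 ELSE 0 END
         + CASE WHEN a.Email       = b.Email       THEN 35 ELSE 0 END
         + CASE WHEN a.MiddleName  = b.MiddleName  THEN 10 ELSE 0 END
         + CASE WHEN SOUNDEX(a.FirstName) = SOUNDEX(b.FirstName) THEN 15 ELSE 0 END
           AS MatchScore) AS v
    WHERE v.MatchScore >= 60;                    -- candidate merge threshold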

Any advice to help me sound smart at my next meeting would be appreciated.

👍︎ 3
👤︎ u/vogonpoem
📅︎ Nov 21 2019
Part 2: Master Data Management in SQL from an empty instance

Hey all,

Just uploaded another 4 hours of video. Let's just call it day 2.

On day one, we built 64 databases and 512 files, defined 34 datatypes, created the first shape, and started filling the System tables. The journey continues.

Day 2

008 - The Datapoint Subject: filling DataPoint CFT-B (JSON use)

009 - RCFT-R The Relationship Shape: A new table shape for holding relationships.

010 - Synonyms: a short video on creating synonyms so we are not using the database-name part in code (a sketch follows this list).

011 - SpecificDataSetNumber: Making sure the SpecificDataSetNumber is accurate, so we can start to use Relationships.

012 - The Process Subject: Filling Process and Process Parameters so we can call/relate functions and procedures to concepts / actions.

013 - Definitions - DataPointType Defaults: Getting prepared to get rid of some of the scripts we used in 004 and 005. Setting and modifying defaults are now data-driven. (Relationship Shape use)
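
A minimal sketch of the synonym idea from 010; the object name is hypothetical, though dbSystemMain appears later in the series:

    -- Local name for an object in another database; code stops hard-coding the DB.
    CREATE SYNONYM dbo.DataPoint FOR dbSystemMain.dbo.tblDataPoint;

    SELECT TOP (10) * FROM dbo.DataPoint;
    -- Moving the object later only means recreating the synonym; callers are untouched.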

Thanks all for the awesome feedback so far.

Day 3 will be about Definitions (clustered, unique, and nonclustered indices) for proper physical storage (no table has a clustered index right now), plus Data Stewardship and Change Management (views for controlling data flow, insert/update/delete procedures, and functions for consistent access to data).

Playlist (starts at 001): https://www.youtube.com/watch?v=pf5gWPLfWNo&list=PLPI9hmrj2Vd8m_w3By7pI7xlkXMRzNYzS

👍︎ 40
📅︎ Oct 21 2018
Building a Master Data Management system in SQL from an empty instance.

Hey all,

Just started making YouTube videos on some knowledge I've gained over the years. I'm building some very interesting data patterns and normalizing databases, files, code, and as much else as possible.

Here are the first 7 videos I have created. These are videos of me coding and talking, and I'm trying to do everything on video with very little offline work.

From the Beginning - Master Data Management in SQL: https://www.youtube.com/playlist?list=PLPI9hmrj2Vd8m_w3By7pI7xlkXMRzNYzS

001 - building 64 databases

002 - normalizing storage / 512 files

003 - defining data, 34 datatypes across all databases

004 - the first shape (of 4) and the subject... subject

005 - the system subject and data driven table creation

006 - the database subject

007 - the dataset subject

I believe all skill levels will be able to take something away from this series.

Thanks all.

👍︎ 48
📅︎ Oct 15 2018
History, Snapshot, and Archive data. Master data management in SQL

Can you take a snapshot of a date / range well after that time has passed? You should be able to. You never know when you'll wish you had snapped a certain point in time. You should also be able to access that data whenever you please.

That's the power and flexibility that comes with understanding these master data management design patterns.

This video explains how we will be setting up the archive, history, and snapshot tables, what columns they contain, and what data we need so that these tables are created automatically when we create our primary subject tables.
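
A hedged sketch of the validity-window pattern that makes after-the-fact snapshots possible; the table and columns are hypothetical, not the series' actual shapes:

    -- Hypothetical history shape: every version of a row, with a validity window.
    CREATE TABLE dbo.tblCustomerHistory
    (
        CustomerId INT           NOT NULL,
        FullName   NVARCHAR(200) NOT NULL,
        ValidFrom  DATETIME2     NOT NULL,
        ValidTo    DATETIME2     NOT NULL DEFAULT '9999-12-31'  -- open-ended current row
    );

    -- A "snapshot" of any past moment is then just a filter; no need to have
    -- planned the snapshot in advance.
    DECLARE @asOf DATETIME2 = '2019-01-15';

    SELECT CustomerId, FullName
    FROM dbo.tblCustomerHistory
    WHERE @asOf >= ValidFrom AND @asOf < ValidTo;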

https://youtu.be/U5lmWlRmuhs

The code behind the creation of these tables? 2 PM PST today.

Sprint 5 is here. We are finishing up the system. Friday: 14-table Date dimension (SQL-calling-Python demo); Saturday: procedures that will version data; Sunday and beyond: the final two shapes (Domain, Shared Attributes).

Only a week or two away from making this system download the internet. Starting with a Reddit bot in SQL that constructs and executes Python (instead of the usual Python saving to SQL).

Full tutorial: https://www.youtube.com/playlist?list=PLPI9hmrj2Vd_ntg2HACiHYeYl7iRvrgPb

👍︎ 9
📅︎ Aug 15 2019
SQL Master Data Management Tutorial - Sprint 2

Hey all,

Sprint 1 is up; you can find the complete SQL MDM Tutorial playlist here: https://www.youtube.com/watch?v=TmUrH8C9vus&list=PLPI9hmrj2Vd_ntg2HACiHYeYl7iRvrgPb

The second sprint is on a scheduled release.

011 - Building the Information Schema, our more powerful version of SQL's INFORMATION_SCHEMA: https://www.youtube.com/watch?v=qXCiTLr-TXY

012 - Classifying and filling the Database subject: https://www.youtube.com/watch?v=PLoZ8c3EqNo

013 - Defining Subjects (groups of tables) and classifying them: https://www.youtube.com/watch?v=YTX35SKL9QQ

014 - Classifying and filling the DataSet subject: is on a scheduled release for Monday.

015 - Classifying and Typing DataPoints: is on a scheduled release for Tuesday. We will be using FOR JSON in dynamic SQL to get an array of columns for a dataset and iterate through that list to fill our tables appropriately.

016 - Classifying and filling Process subject: is on a scheduled release for Wednesday. We will be creating an 'ad-hoc' classifying procedure instead of trying to program/extract intelligence into our object names. Just another approach to classifying data.

017 - Classifying and filling the ProcessParameter subject: is on a scheduled release for Thursday. We will be creating a procedure that accepts JSON and fills the ProcessParameter table. This is a great example of why our Information Schema (dbSystemMain) is more powerful than SQL's: we will be able to query parameters in table-valued functions and know what datatype a scalar function returns. (A JSON-shredding sketch follows this post.)

018 - The Relationship Shape / Subject: is on a scheduled release for Friday. One relationship table to relate any record in any table to any record in any table. This table will help us get prepared for the next sprint, where we will be focusing on Definitions: clustered-index builders, unique-index builders, nonclustered-index builders, and how we can create abstract containers for tables/columns to dynamically build our tables based on that Subject's classification, etc.

019 - Tech Debt 2: is on a scheduled release for Saturday. Just trying to keep the system healthy while we code it. We can probably replace some @table variables now that we have the information schema up.

020 - Sprint

... keep reading on reddit ➡
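
A hedged sketch of the 017 idea: a procedure that accepts JSON and shreds it with OPENJSON. The target table, columns, and JSON paths are hypothetical:

    -- Hypothetical target table and JSON paths; the real shape differs.
    CREATE OR ALTER PROCEDURE dbo.uspFillProcessParameter
        @json NVARCHAR(MAX)
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.tblProcessParameter (ProcessName, ParameterName, DataType, Ordinal)
        SELECT j.ProcessName, j.ParameterName, j.DataType, j.Ordinal
        FROM OPENJSON(@json)
        WITH (ProcessName   NVARCHAR(128) '$.process',
              ParameterName NVARCHAR(128) '$.name',
              DataType      NVARCHAR(128) '$.type',
              Ordinal       INT           '$.ordinal') AS j;
    END;
    GO

    EXEC dbo.uspFillProcessParameter
         @json = N'[{"process":"uspLoadDataSet","name":"@DataSetId","type":"int","ordinal":1}]';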

👍︎ 42
📅︎ Jul 07 2019
Day 3 - Master Data Management in SQL

Hey all,

Day 1 and Day 2 (~8 hours of coding) have produced purposeful databases, files, tables, synonyms, and a way to manage defaults and table shapes. Day 3 (~4 hours) has produced all of the views and Insert/Update/Delete procedures that I need to effectively implement a Change Management flow for the data. We can turn the act of recording Archive and History data on and off, and, due to the consistency of the data management system, I was able to create a wiki.

This wiki displays the dependencies of datasets and processes, and even rips the comments out of my procedures. The comment extraction lets me document a change log and communicate pseudocode without needing to access SQL.

Day 4 will consist of error handling and procedurally generating the functions we need (removing the large WHERE IN() chains). We will take a look at two more shapes (Domain and Shared Attribute), which will start to bring dbCore into our development instead of just dbSystem and dbInterface.

Hope you all enjoy.

Here is the Day 3 Playlist.

014 - Clustered Indexes - Creating a Definition to control what goes in a clustered index.

015 - Base and Select views - The first two views for data governance.

016 - Insert Update and Delete views - the last 3 views for data governance.

017 - Archive and History tables - Creating tables to house History and Archive data.

018 - The Insert Update and Delete Procedures - Getting away from insert statements to get a hold on Change Management

019 - The Wiki - "Self-documenting" data management system. Using bcp to create a Wiki (a sketch follows this list).
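
A hedged sketch of the bcp-to-wiki idea; the view, database, and file path are hypothetical, and xp_cmdshell must be explicitly enabled (DokuWiki stores pages as plain .txt files, which is what makes this work):

    -- Dump a generated wiki page to disk with bcp; the service account needs file rights.
    DECLARE @cmd VARCHAR(4000) =
        'bcp "SELECT WikiText FROM dbWiki.dbo.vWikiPages WHERE PageName = ''start''" '
      + 'queryout "C:\wiki\data\pages\start.txt" -c -T -S localhost';

    EXEC master..xp_cmdshell @cmd;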

👍︎ 39
📅︎ Oct 28 2018
Sprint 3 (halfway) - SQL Master Data Management Tutorial

Hey all,

Just wanted to stop by and talk about where we are (for those building it at home), and try to coerce others into watching this series. When I start importing AdventureWorks, WWI, and data from you guys... you might be sad that you hadn't started building one of these at home.

Sprint 3 playlist is here

The important pieces of this sprint (so far)...

021 - SpecificDataSetNumber - The most underrated piece of the entire puzzle. Something as simple as a table-unique value in every row of said table (the same value, not an identity) being the first column in your clustered index dramatically improves performance. How? Well, it pushes the table's data together on the physical layer (pages/extents), instead of the shuffled deck of cards you get when you cluster on the PK or any non-table-unique value. Plus it's a must when attempting to relate any record in any table to any record in any table... using the same relationship/junction/bridge/xref table. (A sketch of both ideas follows this post.)

022 - The Base View - Honestly... nothing too special in here. The base view converts our consistent column names into system-unique column names. If we wanted to (and we shouldn't), we could CREATE VIEW AS SELECT * FROM table, table, table(n) without getting an ambiguous-column error. This allows us to create data-driven dynamic queries without worrying about aliases.

023 - Select, Insert, Update, Delete View Creator - Data Entry - That single relationship table... we're filling it with data using my second-favorite database tool in existence: Google Sheets. Copying data from Sheets and pasting it directly into an EDIT TOP # table window has saved me so much time. Could I have queried it using OPENDATASOURCE? Yes. Will I create a dynamic procedure that parameterizes OPENDATASOURCE so it knows what columns are in the file / which drivers to use / header-first-row? Yes, but not this early. This video gives some decent insight into classifying data and a ton of insight into the Relationship table... because we are relating tblRefDefinitions to tblRefDataPointTypes to control what columns are allowed in what views!

024 - Select, Insert, Update, Delete View Creator - Code - Now that the data is there, we can create a proced

... keep reading on reddit ➡
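
A hedged sketch of both ideas from 021: a table-constant value leading the clustered index, and a single relationship table keyed by (dataset, record) pairs. All names are hypothetical:

    -- Every row of this table carries the same SpecificDataSetNumber, and it leads
    -- the clustered index so the table's rows pack together on pages/extents.
    CREATE TABLE dbo.tblCustomer
    (
        SpecificDataSetNumber INT           NOT NULL,
        CustomerId            INT IDENTITY  NOT NULL,
        FullName              NVARCHAR(200) NOT NULL,
        CONSTRAINT PK_tblCustomer PRIMARY KEY NONCLUSTERED (CustomerId)
    );
    CREATE CLUSTERED INDEX CX_tblCustomer
        ON dbo.tblCustomer (SpecificDataSetNumber, CustomerId);

    -- One relationship table for everything: (dataset, record) pairs on both sides.
    CREATE TABLE dbo.tblRelationship
    (
        FromDataSetNumber INT NOT NULL,
        FromRecordId      INT NOT NULL,
        ToDataSetNumber   INT NOT NULL,
        ToRecordId        INT NOT NULL,
        RelationshipType  INT NOT NULL
    );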

👍︎ 8
📅︎ Jul 21 2019
Advanced MSSQL Master Data Management Tutorial

Hey all,

I hang out in r/SQL mostly, career data architect, and I thought this sub would be a great place to blatantly self-promote a series I'm working on.

I rebooted a series dedicated to an end-to-end, Microsoft SQL-based master data management system and would like to share it with those looking for a hands-on architecture project.

Unlike the previous series (long format / information overload), I will be following an agile development methodology: 10 episodes a sprint (8 development, 1 tech-debt removal, 1 retro/planning), 10 minutes per video.

There is some information that can be used in a non-MDM environment, like the physical-model normalization techniques. However, this series will be normalizing everything from databases to datatypes, from naming to accessing data.

Below is the introduction to the series, plus the first sprint is already uploaded.

The unfortunate side of this series is that you will need a bit of experience working with SQL, or at least be comfortable with dynamic SQL and data-driven logic.

Thanks for reading and I hope you enjoy.

https://youtu.be/TmUrH8C9vus

👍︎ 7
📅︎ Jun 28 2019
[September 3, 2019] Upcoming Webinar: Master Data Management and Your Enterprise community.intersystems.co…
👍︎ 2
📅︎ Aug 19 2019
Day 4 - Master Data Management in SQL

Hey all,

Day 4 is here and some things have changed.

  1. I added a third-party tool to my development (which appears heavily in 021): DevArt's SQL Complete.
  2. I added a new member to the team, Steve.

Just like in the real world, adding a new team member slows production down a little bit. I think it was worth it. Steve has been following along and building a system like this, so I thought it was a great idea to have someone with a ton of SQL experience (but not with this system) add commentary and ask questions while I code (pair programming).

So here is Day 4

020 - Taxonomy Functions - Procedurally creating 4 functions per subject to reduce code and remove the WHERE IN () chains in our code. These functions also take care of our CleansableFlag and CleansedTo logic, which allows us to repair records without changing code. (A sketch of the idea follows this list.)

021 - Tech Debt 001 - There will be several more of these videos coming out. This particular Tech Debt video showcases how we will be using the Change Management Procedures and Taxonomy Functions, and demos the History table (because we fixed the Table_Finder procedure and updated our DataSetTypes). We started the series with tokenized scripts, and as the system develops we will be removing those scripts and converting them to data-driven options.

022 - Domain and Shared Attribute Shapes - Creating the final two major shapes in our system. Having Steve around made this video very long; however, the questions asked and comments made brought significant value to the process. Not only did we create and discuss what a Domain and a Shared Attribute are, but I also demo some multi-tenancy ideas and how to utilize both shapes. We create the Group domain and the Name shared attribute, which means we finally have a way to create tenants and templates of data (to remove the rest of our scripts), and are very close to being able to grab AdventureWorks and Wide World Importers and load them into the system.
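
A hedged sketch of the taxonomy-function idea from 020: one inline table-valued function per subject replaces the repeated WHERE IN () chains. Table and column names are hypothetical:

    -- One inline TVF per subject; callers join it instead of repeating IN () chains.
    CREATE OR ALTER FUNCTION dbo.fnDataSetsOfType (@typeName NVARCHAR(100))
    RETURNS TABLE
    AS RETURN
        SELECT ds.DataSetId, ds.DataSetName
        FROM dbo.tblDataSet AS ds
        JOIN dbo.tblRefDataSetType AS t
          ON t.DataSetTypeId = ds.DataSetTypeId
        WHERE t.TypeName = @typeName;
    GO

    SELECT d.DataSetName FROM dbo.fnDataSetsOfType(N'Reference') AS d;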

Thanks all for tuning in. Only a few more days left in the construction of the system part of this system.

👍︎ 25
📅︎ Nov 04 2018
Master data management - give me all data

Long story short: I run a master data management YouTube series (MSSQL).

I'm about 5 weeks from completion and I want all the data.

I have geo (8 million cities), date (multiple-calendar support), and unit of measurement (8,500 units)...

I'm looking for the most complete sets this sub has, and will be loading them all over time in my series.

Recipes, food + calories, weapon / military equipment, game-related data, languages (etymology), race/ethnicity, whatever... it just has to be as complete as possible.

Difficulty: no health data.

👍︎ 3
📅︎ Jul 16 2019
