35 Hilarious Data Architecture Puns

Hey guys! Below is the cosmos architecture overview. For @sentinel.co Read more here https://t.co/G9qbXZoIrN $dVPN #data #Web3 v.redd.it/hodis4modba81

👍︎ 11

💬︎

👤︎ u/ElvisAzubuike69

📅︎ Jan 07 2022

🚨︎ report

Architecture help for Data portfolio project

Hello everyone,

I'm working on a project with a web app in Flask (A dashboard), which obtains a hashtag from the user and then connects to the Twitter Streaming API to obtain in real-time all the tweets that are being generated with that #. Later, they are sent to a PySpark Streaming service to do certain operations with them and finally, return the results to the dashboard client.

So far, I've been using python sockets to do all the communication (Including threading for sending/receiving the tweets and all), but I would like to know if there are other professional tools I might be missing and could make the project more professional and practical (Considering that multiple clients are expected to use the service at the same time)

Thanks!

👍︎ 30

💬︎

👤︎ u/javrom95

📅︎ Dec 31 2021

🚨︎ report

What is the best approach for sharing data among different services in Microservice architecture?

Hello Guys & Gals, Hope everyone is doing well,
The reason I'm raising this question is that we are redesigning some of our applications based on Microservices approach but we are struggling with how to share our data between different services where each service has its own database, to summon it into a picture we are implementing the below system:

Figure 1 - Database per service

For instance, we have a table in Service B called "orders" and this table has a column called "user_id" which points to the end user who submits the order, now we have another table in Service A called "users" which stores the user credentials along with other related data; the challenge here is to show the username along with its orders in Service B.

In a simple application with shared database the query would have been like:

SELECT ORDER.NAME``, ORDER.PRICE, USER.NAME FROM ORDERS

JOIN USERS ON USERS.ID = ORDERS.USER_ID

Now since we have individual databases for each single service, we won't be able to do it like above.
A dummy solution to this major issue is to collect "user_id"s and send it all via HTTP request to Service A and fetch the users' usernames and join the fetched data with "orders" table in Service B.
Though above is a basic example of what we may face during our development but in fact, it may even get into more complicated scenarios if other services are involved (A complicated service like CRM).

We found some solutions like:

Shared Database Architecture
Database Replication

In "Shared Database" and "Database Replication" architectures, although the implementation would be much easier but we will lose the significant benefit of Microservices which is "loosely coupling" like if we change the structure of one table in our database we have to change our code base in other services accordingly, along with not being able to use different database types as well.

What are your thoughts?
Is there any final and definitive solution to the above issues?
Any books, suggestions to help us deal with the problem would be much appreciated.

Thanks in advance.

👍︎ 7

💬︎

👤︎ u/alisri_2021

📅︎ Dec 27 2021

🚨︎ report

Data Base Architecture

I know this is a loaded question but its my first project using mondgo db and I am overwhelmed with how to handle the database portion. Not sure if this type of question is allowed but any help or resources is appreciated.

Data

Doctor Office 1 (where multiple doctors will need an account to access everything below)
- N Patients
  - Every patient has some basic data
  - Each patient has anywhere from 0 - 100 tests
  - Each test will have anywhere from 1 - 100 data points
Doctor Office 2
- etc

Any idea how to organize this the best? Do I use a separate DB for each doctor office? Do I create a collection for "patients" and a new collection for each and every test? How do I relate the tests to a patient?

I was thinking the following:

1 data base per doctor office
There will be a patients collection which holds basic data + references to any of the 0-100 tests
There will be a separate collection for each of the 100 tests
Patients collection will point to a specific test via an id

Any recommendation are appreciated

👍︎ 2

💬︎

👤︎ u/Sliffcak

📅︎ Jan 13 2022

🚨︎ report

Looking for data architecture, migration and troubleshooting resources

So I'm studying for the GCP PDE exam and don't have a huge amount of hands on DE experience. I finished the 6 part Coursera course that's mostly aimed at preparing a DE for working within the GCP ecosystem. It's been helpful but I've been going through question banks/practice tests and realized I'm missing a fair bit of the hands on knowledge about migrating from legacy setups and how to properly troubleshoot things when standing up a new pipeline/tool.

I know a lot of this comes down to hours spent on the job but do any of you have recommendations on resources that were helpful for things like going from Hadoop to Spark workflows, updating legacy SQL code, when to push or pull new data or use a publish/subscribe model etc. Basically looking for resources on data architecture and common troubleshooting.

I'm currently a few chapters into Designing Data Intensive Applications and it's been great at helping me build a mental model of how some of these tools work but so far it seems a bit general. Also this sub has been great too for exposing me to new topics and y'all are great!

TLDR: Looking for resources on data architecture, troubleshooting and hands on stuff. GCP specific would be nice but AWS or open source works too. Halp plz!

👍︎ 3

💬︎

👤︎ u/rlew631

📅︎ Jan 19 2022

🚨︎ report

Core Data MVVM Architecture (CRUD Method) using Combine Framework in SwiftUI youtube.com/watch?v=mpCGU…

👍︎ 3

💬︎

👤︎ u/HaarisIqubal

📅︎ Jan 07 2022

🚨︎ report

Proud to present before you MyTube, a YouTube clone built with Kotlin and Youtube data API, using the MVVM app architecture. v.redd.it/a8y1nma4b6y71

👍︎ 202

💬︎

👤︎ u/PomegranateSudden930

📅︎ Nov 07 2021

🚨︎ report

Core Data MVVM Architecture (CRUD Method) using Combine Framework in SwiftUI youtube.com/watch?v=mpCGU…

👍︎ 13

💬︎

👤︎ u/HaarisIqubal

📅︎ Jan 07 2022

🚨︎ report

Data Engineering in the Microservice architecture ...

Hi everyone,

wanted to get your opinion on the subject on how to best manage ETLs at my current job.

Basically we have not so bad architecture of dozens of micro services, all deployed to AWS, infrastructure managed by terraform.

But when it comes to the ETLs I think it is very cumbersome what we basically do:

all our micro services have GraphQL API exposed internally
in order to run an ETL we fetch data from different sources (it can be S3/API/Google Sheets)
than we enrich data with lots of different data fetched from sometimes dozens of our internal APIs, the practice here is that usually we use DataFrames for that

Most of the time this is basically fetching and joining data - which could be easily done in any DWH or even database (our data is ~2TB at most) but we have everything spread between dozens of postgres databases. Most of the jobs are batch ETLs. Than with that data we do something (can be pushing to some other database, running some ML models or maybe sending emails, standard stuff).

So now there are two things:

The only advantage of having this architecture (AFAIK) is that the schema of the API and logic behind is decoupled.
On the other hand if we would bring the data into single DWH or something extracting data in the ETL would take few lines of SQL (but it schema changes could result in breaking the ETLs).

Let me know what do you think is the way to go here.

👍︎ 2

💬︎

👤︎ u/zoso

📅︎ Jan 11 2022

🚨︎ report

The future OS kernel will be a data-oriented scheduler (with Computer hardware and software integration architecture diagram) github.com/linpengcheng/P…

👍︎ 75

💬︎

👤︎ u/linpengcheng

📅︎ Nov 19 2021

🚨︎ report

wavejumper/rehook: ClojureScript React library enabling data-driven architecture github.com/wavejumper/reh…

👍︎ 21

💬︎

👤︎ u/dustingetz

📅︎ Jan 04 2022

🚨︎ report

What distributed data store to learn the architecture concepts?

Hello

could someone please share insights on what implementation of distributed data store to choose to learn in-depth on the Architecture concepts?

The list consists of Apache Cassandra, MongoDB, Amazon DynamoDB.

I'd like to hear why you consider one over the other when it comes to learning the concepts and architecture. My hypothesis is all three share a lot of common characteristics in Architecture.

Please share your thoughts. Thanks

👍︎ 10

💬︎

👤︎ u/git_world

📅︎ Dec 11 2021

🚨︎ report

Core Data MVVM Architecture (CRUD Method) using Combine Framework in SwiftUI youtube.com/watch?v=mpCGU…

👍︎ 16

💬︎

👤︎ u/HaarisIqubal

📅︎ Jan 07 2022

🚨︎ report

A deep dive to our data stack architecture, including a healthy dose of dogfooding

Hey everyone.

We wrote a post, as part of a series of posts, about our data stack. It includes some dogfooding for obvious reasons and hopefully not a lot of inevitable bias. But I believe it will be helpful for anyone who's interested in designing/implementing a data stack.

You can find the post here: https://rudderstack.com/blog/rudderstacks-data-stack-deep-dive

Feedback is more than welcome and happy to answer any questions.

👍︎ 11

💬︎

👤︎ u/cpardl

📅︎ Jan 05 2022

🚨︎ report

' Turing Cross-architecture benchmarking framework for data science algorithms'

https://www.turing.ac.uk/research/research-projects/turing-benchmarking-framework

Project Status: Finished.

Data structures

Numerical (Algorithms)

Software framework development

Hardware optimisation (FPGA/GPU)

Statistical methods & theory

Optimisation

Computing networks

Parallel computing Neural networks

Machine learning

👍︎ 2

💬︎

👤︎ u/Calculation-Rising

📅︎ Jan 01 2022

🚨︎ report

75vm's 30TB data and some MSSQL. Architecture recommendations

looking for some real world experience in implementing a new veeam solution.

Do you use a proxy? proxy vm/physical with vm/physical backup server ?

Put it all on one physical machine?

I've read that the backup server should be at the DR site. Has anyone used a vm backup server and replicated to DR site?

love to hear peoples suggestions/anecdotes

👍︎ 2

💬︎

👤︎ u/3DPrintedVoter

📅︎ Dec 14 2021

🚨︎ report

The 69th edition of Data Engineering Weekly highlights one year of dbt, LakeHouse Architecture Reference, Airflow to Apache Dolphin comparison, and more!! dataengineeringweekly.com…

👍︎ 3

💬︎

👤︎ u/vananth22

📅︎ Jan 10 2022

🚨︎ report

The ideal data architecture

If you could build a real-time data architecture from scratch, what tools would you use?

Context:

Structured data only
Real-time data
Process 1M records per day

Needs:

Data has to be sourced from DB2
General insights via dashboard
Open-source tools only

👍︎ 4

💬︎

👤︎ u/Minimum-Membership-8

📅︎ Nov 17 2021

🚨︎ report

Is data architecture or ETL more difficult to learn than business analysis?

I am currently a college student. I have a course that will make me do ETL and some data architecture. I am not really that aware of ETL or data architecture and want some information on it. Is ETL or data architecture the most difficult thing in the data science field to understand the concept of it. If you need to a comparison, would it be more difficult to understand those concepts compared to learning business analysis? Is data architecture anything like regular architecture?

👍︎ 10

💬︎

👤︎ u/No_Helicopter9361

📅︎ Nov 12 2021

🚨︎ report

Core Data MVVM Architecture (CRUD Method) using Combine Framework in SwiftUI youtube.com/watch?v=mpCGU…

👍︎ 2

💬︎

👤︎ u/HaarisIqubal

📅︎ Jan 07 2022

🚨︎ report

What approach do you use for designing a startup's data architecture?

Hello everyone!

As the title says, I would like to know your process. The reason for this is that I am still a bit lost when it comes to design solutions. There is a lot of information out there and seems like a there are different ways to create a solution.

How do you decide whether the data should go into a lake or a warehouse?

Is the data architecture oriented to the startup's growth system? e.g.: product-led, marketing-led, etc.

https://preview.redd.it/lhzm7w9ca1z71.png?width=1024&format=png&auto=webp&s=12d26c6d76aa5a1e2321ceacf37ddf728d26db8b

For example, in this architecture we load raw-data to datalake and then a warehouse, but I have heard that it is better to upload the data to a lake and transform it without having to move it to a warehouse. In what situation is it better to upload the data to a warehouse than to a lake?

Another thing I want to ask is why many people use pipelines made with python when there are solutions like Airbyte that allow you to automate this? Is there something I'm not seeing?

Sorry for my bad English and I hope the point of this post is clear.

👍︎ 15

💬︎

👤︎ u/Nagoya_

📅︎ Nov 11 2021

🚨︎ report

Jensen copycat - "GTC Wrapup: NVIDIA CEO Outlines Vision for Accelerated Computing, Data Center Architecture, AI, Robotics, Omniverse Avatars and Digital Twins in Keynote" - and the gullibles believe? LOL

So Jensen presented his keynote today and fools are quick to sell AMD's shares believing. Jensen was using "Accelerated Datacenter" see title on nVidia's official link below!

https://blogs.nvidia.com/blog/2021/11/09/nvidia-ceo-accelerated-computing-ai-omniverse-avatars-robots-gtc/

But it's just words, a "vision", the "future"... does he have an answer NOW to cDNA 2 MI200 datacenters GPUs? No! Does he have a way to use EPYC Milan-X or future Genoa to connect his GPUs COHERENTLY? No. Infinity Fabric only available to connect between AMD's chips and chiplets!

Enough said!

👍︎ 11

💬︎

👤︎ u/TOMfromYahoo

📅︎ Nov 09 2021

🚨︎ report

How EQT Group Uses Agile Data Architecture in Its Organization for Superior Returns - Snowflake Blog snowflake.com/blog/how-eq…

👍︎ 2

💬︎

👤︎ u/fhoffa

📅︎ Jan 07 2022

🚨︎ report

Core Data MVVM Architecture (CRUD Method) using Combine Framework in SwiftUI youtube.com/watch?v=mpCGU…

👍︎ 2

💬︎

👤︎ u/HaarisIqubal

📅︎ Jan 07 2022

🚨︎ report

Deleting Data in a Microservices Architecture bennorthrop.com/Essays/20…

👍︎ 26

💬︎

👤︎ u/r0st0v

📅︎ Nov 05 2021

🚨︎ report

Generate diagram of the current architecture from structured data

I am looking for a software or method for generating architectural diagrams (system diagrams, component diagrams) from structured information on systems and their relationships. Let me explain.

Let's imagine an enterprise context with hundreds of components shared between dozens of departments and projects.

Structured information by component: a table in a db, an excel, a json array, etc. With a practical way of importing, adding and modifying information.
Relationships: dependencies between components such as REST APIs, SOAP Web Service, Kafka Event, SFTP, import, export, etc.
Ability to add tags to systems and relationships, e.g. FE Layer, BE Layer, Data Layer, Deprecated, Dismissed, DapartmentA, DepartmentB, Hosted externally, etc.

The final objective is to generate a diagram of the current architecture (dedicated to the specific use-case) from the filters applied, for example:

All components that have relations of type X with component Y, up to depth N.
All components with tags: DepartmentA (and its relations up to depth N), FE Layer, BE Layer.
Possibility of choosing which component properties to display in the diagram, e.g.: Name, ID, description, language (Java, .NET, NodeJS, etc.).
If a component needs to be decommissioned, I want to immediately see with which and how many other components it has integrations

Do you use or know of anything like this?

👍︎ 4

💬︎

👤︎ u/Foo_01

📅︎ Nov 28 2021

🚨︎ report

How to design data architecture for fast ad-hoc analytics if the data model is not known a priori?

We have several clients for whom we need to create a BI tool. Looks like good opportunity to look ahead and try to build a SaaS tool, since we already have the clients to pilot with.

The idea at its simplest: a client connects their datasource and our tool shows dashboard with insights. We already have a MVP which our clients like, but works only with manually uploaded CSV files with small amounts of data. Our next step is to design a scalable data architecture. We have very little hands-on experience with this, currently we are just doing some research.

The problem we see is that we do not know data schemas a priori. Each client has different dataset with a different schema. So we cannot simply design a normalized data model for fast ad-hoc queries which will work with everybody. Or can we? What are some good approaches to this?

Datasources may be:

log of HTTP requests / event tracking (JSON)
relational database
csv file
google analytics, facebook insights, ...
...

Challenges we identified so far:

a client's data schema may change in time (change of datatypes, renaming/adding/removing columns, etc...)
we need to keep track of changes in time (eg. changes of attributes for an entity)
... ?

Forgive me if I use incorrect terminology, still learning.

👍︎ 2

💬︎

👤︎ u/unskilledexplorer

📅︎ Nov 22 2021

🚨︎ report

building architecture of data centres (any special ones around)?

Xmas shopping at a bookshop, in the architecture section was a book on data centres - WTF? It was shrink-wrapped, so couldn't see what they included. The data centres I have seen are rectangular with hardly any windows, can't see the attraction.

👍︎ 3

💬︎

👤︎ u/VS2ute

📅︎ Dec 04 2021

🚨︎ report

Which service should store which data in a microservice architecture?

Hello,

in my team, we currently have a monolith and want to move to a microservice architecture (for various reasons). One problem we however face is to figure out which service should save which data.

For the sake of the example, let's say that we have an entity which is a simple Word Document.

So we have one service for creating, editing, saving, deleting etc. the Word document (among other things like listing and other CRUD operations)

This entity has various states, such as

Draft
Open
In Progress
Completed

Now these states are being reached through workflows. The workflow however can be configured for each user individually, so one user can move an entity from Draft to Open, another can just move it from Draft to Completed (and is missing Open/In Progress state). Also, when a specific state is reached, we send emails (through message queue which is listend to by another microservice)

-----

So what we want to do, is to create a new microservice which saves the configuration of the workflow and actually performs the state changes. So it has the ID of the entity and the workflow configuration and then decides in which state it should get next.

I hope I gave you enough insight on how we want to build the microservice architecture. I know it could be quite hard to grasp the whole concept from this small example.

The problem we are facing however now, is which service should save the ACTUAL state of the entity.

So the WordDocumentService has it's own Database (for saving the data in general, such as the title, filename, content) and the WorkflowService has it's own Database (for saving the workflows and which entity has which workflow assigned)

So should the current state (Draft, Open, etc.) be saved

in the WorkflowService?
- what we are wondering is, if we have to like query all entities in state draft for example, the API Aggregator would need to make multiple calls. First get all entity ids in a specific state and then query all entities by id
in the WordDocumentService?
- Then the workflow service would just hold the information about which state it should be? But I FEEL like this should not be done in here, because the microservices should store their own data and states, right? Otherwise there's no advantage
in both services?
- This seems like a bad idea, because then you don't have a single sourcce of truth and it could potentially cause data inconsistency

So in my opinion, everything related to that should

... keep reading on reddit ➡

👍︎ 8

💬︎

👤︎ u/mstknb

📅︎ Nov 09 2021

🚨︎ report

"The Commission plans to invest €2 billion to foster the development of data processing infrastructures, tools, architectures and mechanisms for data sharing." EU Data Governance Act, EU Commission. You know what that means for Data Unions like Swash... 🚀🚀

👍︎ 6

💬︎

👤︎ u/swashapp

📅︎ Dec 16 2021

🚨︎ report

Insert Consistent Sample Data for Users in Microservice Architecture?

Currently in the process of building up my SaaS product and some users would like to have sample data so that they can explore the product right after signing up.

Now I’m using a microservices architecture where each service manages its own database so I’d have to insert data on pretty much every service.

I thought of a few solutions:

Have every service deal with sample data itself, maybe an endpoint where you can insert it for the user (cons: high coupling if data is dependent on each other, changes to the data would be pretty hard)
Have a central service/function deal with managing the sample data, it will then just call every service and insert the required data (con: what if Interfaces of services change?)

What do you guys think? Are there any existing solutions or patterns for this problem? Quite a lot of products offer this service but I was not able to find much on it

👍︎ 6

💬︎

👤︎ u/UmpireAfter7476

📅︎ Nov 03 2021

🚨︎ report

Take home assignment for a data engineering role. Build web scraper, develop data model, data architecture, and then presentation. Lol ok.

👍︎ 106

💬︎

👤︎ u/Maximum-Builder8464

📅︎ Sep 14 2021

🚨︎ report

Architecture suggestion for Alerting System in retail data.

Hello All. I am currently working for a retail analytics company as a Data Engineer + Data Scientist. As of now I have built a simple alerting system. It checks for alert subscriber and alert type during change calculation in ETL ( for example price change) and sends email if match is found. Its done via python scripts with somewhat reconfigurable alert sink ( SQS, email etc) and alert data ( count of product with change or all products list,). However its time to make it more scalable ( upto 1000 alerts per minute) and separate it out from main ETL pipeline. I always prefer simple solution. Any suggestion on any aspect of the process is welcomed. Thank you

👍︎ 3

💬︎

👤︎ u/snarkj

📅︎ Dec 16 2021

🚨︎ report

You are about to build a product analytics tool. What is your data architecture design?

I started a new job as a data engineer this week, before that I was a data analyst. I have strong technical background* but I have a very little hands-on experience with data engineering.

I am supposed to design a data architecture for a product analytics tool (SaaS) similar to Amplitude or Fivetran. So the vision is to build a tool with automated data integration, support for adhoc querying, dashboards, etc. **

Where should I start?

I do not know what questions should I consider, what components in the stack should I research for, etc. I started to do some research today, you will laugh but the only thing I was able to do was to compare Elasticsearch and TimescaleDB, and I have found SingleStore which looks like something I should give my attention to, but I do not know what should I do next.

I will appreciate all kinds of advice.

___
edit:

* according to Dataengineer wiki, I would be somewhere in between of data engineer and senior data engineer. which is probably funny since I said that I have a little hands-on experience. What I meant is that I have never designed or discussed data architecture but I have quite rich experience with full-stack software development (mostly web apps) and some experience with data science.

** our customers will be from different business scopes with 10s to 100s GB large datasets (custom schemas). there will be common analytic use cases but each of the customers will have their own unique use cases.

👍︎ 8

💬︎

👤︎ u/unskilledexplorer

📅︎ Nov 08 2021

🚨︎ report

ETL and data architecture

I am currently a college student. I have a course that will make me do ETL and some data architecture. I am not really that aware of ETL or data architecture and want some information on it. Is ETL or data architecture the most difficult thing in the data science field to understand the concept of it. If you need to a comparison, would it be more difficult to understand those concepts compared to learning data analytics or database management systems? Is data architecture anything like regular architecture?

👍︎ 3

💬︎

👤︎ u/No_Helicopter9361

📅︎ Nov 12 2021

🚨︎ report