47 Hilarious Data model Puns - Punstoppable 🛑

Zuckerberg’s Meta Endgame Is Monetizing All Human Behavior | Exploiting data to manipulate human behavior has always been Facebook’s business model. The metaverse will be no different. vice.com/en/article/88g9v…

👍︎ 48k

💬︎

👤︎ u/chrisdh79

📅︎ Nov 02 2021

🚨︎ report

I created an algorithm that collected wallstreetbets posts and market data, and then utilized a machine learning model to try and calculate an edge of of WSB posts. It worked exactly how you expect it would... v.redd.it/vs88b3lgf2b81

👍︎ 1k

💬︎

👤︎ u/cj6464

📅︎ Jan 11 2022

🚨︎ report

YSK: The popular Bluetooth tracker company Tile was recently bought out by Life360, a company whose business model was primarily rooted in selling the location data of its 31 million customers including children

Source

Why YSK: Many people bought Tiles over the years because they wanted a good quality convenient bluetooth tracking device for their keys or valuables. With the introduction of Apple's Airtags and Galaxy SmartTags, Tile has been under a lot of pressure with their extremely limited network. Now that their parent company has switched to one whose philosophy is radically different, those who bought Tiles in the past should reconsider if they want to continue using those products and potentially lose out on a large part of their privacy.

Also something Tile customers may want to know is that Mark Zuckerberg's sister sits on Life360's board of directors.

It is worth noting that when asked about the parent company change, a Tile representative stated, "Tile does not sell/monetize personal data and we have Life360’s full support and commitment to continue that," but that remains to be seen.

👍︎ 8k

💬︎

👤︎ u/TheRavenSayeth

📅︎ Dec 25 2021

🚨︎ report

3D model using elevation data of Tonga volcano v.redd.it/xrepcdv4dcc81

👍︎ 1k

💬︎

👤︎ u/Chode_of_Justice

📅︎ Jan 18 2022

🚨︎ report

Just been asked to spend 5 mins sorting an issue with a PowerBI report of a recent leaver. This is the data model....

👍︎ 196

💬︎

👤︎ u/DryBinWetSinkElseLoo

📅︎ Jan 05 2022

🚨︎ report

[P] FFCV: Accelerated Model Training via Fast Data Loading

Hi r/MachineLearning! Today we released FFCV (ffcv.io), a library that speeds up machine learning model training by accelerating the data loading and processing pipeline. With FFCV we were able to:

Train an ImageNet model (ResNet-18) to 66.9% accuracy *on one GPU* in 35 minutes (98¢/model on AWS)
Train an ImageNet model (ResNet-50) to 75.6% accuracy *on one AWS machine* in 20 minutes ($4.5/model on AWS)
Train a CIFAR-10 model on one GPU in 36 seconds (2¢/model on AWS)

The best part about FFCV is how easy it is to use: chances are you can make your training code significantly faster by converting your dataset to FFCV format and changing just a few lines of code! To illustrate just how easy it is, we also made a minimal ImageNet example that gets high accuracies at SOTA speeds: https://github.com/libffcv/ffcv-imagenet.

Let us know what you think!

👍︎ 120

💬︎

👤︎ u/andrew_ilyas

📅︎ Jan 18 2022

🚨︎ report

I’m developing an app and data backed verification model to hold companies accountable. If you’re interested to be in test phase, DM me please.

👍︎ 108

💬︎

👤︎ u/JobSeekersAngel

📅︎ Jan 28 2022

🚨︎ report

Here is the U-23 AHL Defenceman Game Score/GP leaderboard. Data: InStatHockey, ahltracker Game Score Model: domluszczyszyn

👍︎ 93

💬︎

👤︎ u/Alvaratz

📅︎ Jan 18 2022

🚨︎ report

[Capitalists] If the supply and demand model represents the economy in real life, how can we actually see it using real world data?

The supply and demand model is one of the most fundamental models of modern economics that has been taught to students from highschool on. Basically everyone with even a little knowledge of economics has seen it, and most people view the model as an accurate description of how the economy works in terms of the behavior of buyers and sellers. It has become so dominant in fact, that criticism on it can hardly be found, eventhough it very well exists. While I used to adhere to the model as well, I have become more and more critical of it over time, and I am now at the point where I almost would argue that we should do away with the model entirely, for how misleading and unrepresentative of real economic behavior it is, in my opinion.

However, before I make that conclusion, I want to know from capitalists (and other people who consider themselves advocates of market economies) how we can use real world data (so data that we can acquire) to plot this model. Specifically, I want to know the following elements of the supply and demand model from real world data:

The shape / the formula of the supply curve
The shape / the formula of the demand curve

I argue that this is impossible with real world data, because the only data we have are prices (of which we dont know whether they are at equillibrium or not) and the quantities of goods sold (of which we would neither know whether they are at equillibrium or not). Not to mention that those supply and demand curves can be moving with every second in time that passed. Cockshott illustrated that rather well in this video, wherein he showed that the scientific predictive power of supply and demand curves was even lower than him litterally plotting a random line through a time series of real prices of oil.

If this turns out impossible to plot, then supply and demand curves dont exist, or at least we couldnt know whether they exist or not (just like we cant know whether God is exists or not). If we then claim to be scientific, we then thus should not assume their existence, and quit propagating their existence in economic debate. We should then forget about using the supply and demand model to explain any economic phenomenon. Which would lead to the breakdown of practically the entire doctrine of neoclassical economics.

That is not to say that supply and demand have no effect in the e

... keep reading on reddit ➡

👍︎ 17

💬︎

👤︎ u/Squadrist1

📅︎ Dec 31 2021

🚨︎ report

What is a model in regards to data analytics?

My coworker constantly uses this term when talking about reports he's creating. "I'm working on my model". His reports are generally just kpis - no regression, hypothesis testing, or anything like that. We're not doing anything in the realm of what I would consider "data science". He has a database in access that he imports data to from various sources. He queries off that for all of our data (Which has made it a little challenging for me to understand where everything is coming from and what it is).

Is he just using jargon to make it sound complicated or is there something more to this?

👍︎ 59

💬︎

👤︎ u/pard0nme

📅︎ Jan 16 2022

🚨︎ report

Here's a look at which U-23 AHL Forwards (min 15 GP) have the highest Game Score/GP this season. Data: InStatHockey, ahltracker Game Score Model: domluszczyszyn

👍︎ 88

💬︎

👤︎ u/Alvaratz

📅︎ Jan 18 2022

🚨︎ report

Woman holding a spool of magnetic tape used for data storage on early model desktop computer, 1970s

👍︎ 141

💬︎

👤︎ u/eaglemaxie

📅︎ Jan 07 2022

🚨︎ report

Heizou data found another dataminer Mia mentions he uses teen model reddit.com/gallery/r27wsb

👍︎ 1k

💬︎

👤︎ u/Bwlsoty

📅︎ Nov 25 2021

🚨︎ report

Who creates the ML Models? Data Scientists or ML Engineers? [D]

Who creates or trains the preliminary models like xgboost, random forest, etc? I used to think ML engineers did that but was told by an ML engineer that the Data scientists at his company create the models & then he gets them ready for production. Is this typically the case or does it depend on the company?

👍︎ 9

💬︎

👤︎ u/Careless_Drummer_208

📅︎ Jan 24 2022

🚨︎ report

How do you model data traffic and throughput? Do you know any paper describing how to simulate it?

Sorry if the question is confused. I can give more details in the comments if you ask specific questions.

Keeping the argument abstract: let's say that I have some agents in my fpga design that source samples with a certain throughput. Then I have interconnects that arbiter the data that passes through. And eventually I have consumers, which my be sinks like DDR or data reducers like decimators.

How would you model mathematically such system? I would like to improve in this field to create designs backed up b mathematical results to see if I can meet specific deadlines.

The approach would be similar to what can be done with RTOS task design.

Thanks a lot!

👍︎ 16

💬︎

👤︎ u/FrAxl93

📅︎ Jan 28 2022

🚨︎ report

Can we use data available for "non-commercial use" for training a DL model which will be deployed as a subsystem in a commercial product? [D]

I am having a lot of difficulties getting the right data for my task since most of the datasets out there including ImageNet restrict the usage of the data to only educational and non-commercial research. In this case, should I reach out to them to get usage permission? What are the chances that they will allow/respond and would there be a significant cost associated with this? Thanks!

👍︎ 64

💬︎

👤︎ u/PinPitiful

📅︎ Jan 08 2022

🚨︎ report

Quantum processor swapped in for a neural network — Quantum processor used to model weather based on sparse data arstechnica.com/science/2…

👍︎ 2k

💬︎

👤︎ u/swingadmin

📅︎ Dec 06 2021

🚨︎ report

Why build data science models when there are better ways to get the same info

Posting about data science here as the question is from the perspective of my being a BI/Data Analyst

The Data Scientist team in our org is building a model to predict certain customer characteristics.
These characteristics can be collected by asking customers - say, a survey, or after completing an order, or after signing-up. A question such as, say, how many employees the customer has. Instead the team is building a model, to predict these attributes.
There is a team specifically to answer questions around customer attributes.

I cannot understand why the org is spending the money on salaries of these data scientists working on this project, when we can pursue other model building. Are we not better served by trying to ask customers this information?

The answer of data scientists is: our website development team is too busy to create an interface to ask customers this question . The answer of manager is: Models don’t get built overnight, they take time, so we need to start now, even though this information can be obtained in other ways.

Neither of the answers make sense to me.
Can you please help me understand.

Thank you.

👍︎ 25

💬︎

👤︎ u/Signal_Explorer8071

📅︎ Jan 17 2022

🚨︎ report

[P] Deepchecks: an open-source tool for high standards validations for ML models and data.

Hey everyone!

I wanted to share with you an open-source tool we've been building for a while. Deepchecks is an open-source tool for validating & testing models and data efficiently:

https://github.com/deepchecks/deepchecks

Deepchecks is a python package, implementing validations and tests needed in order to trust an ML pipeline. It contains many built-in checks, such as verifying the data integrity, inspecting its distributions, validating data splits, evaluating your model, and comparing between different models.

In addition, it contains test suites, similar to the test suites in software programs, that can accompany you through all building blocks of the ML pipeline development. Each test suite contains checks necessary for the specific part in the pipeline.

The suite result looks something like this:

Suite result

The suites and checks have a simple syntax and are highly customizable.

If you want to jump right in, you can try it out in the quick start notebook:

https://docs.deepchecks.com/en/stable/examples/guides/quickstart_in_5_minutes.html

What do you think? I’ll be happy to hear your thoughts and feedback.

👍︎ 202

💬︎

👤︎ u/EuphoricMeal8344

📅︎ Jan 06 2022

🚨︎ report

Data Quirk (line on top) due to capping (y axis, values greater than 500k are set to 500k) - what is a good way to handle this? Removing the data points? Would like to feed clean data into my model. Help is much appreciated! Thanks in advance!

👍︎ 19

💬︎

👤︎ u/ratonmic

📅︎ Jan 21 2022

🚨︎ report

11/29/21 one day before INOVIO PR Omicron.....WHO established Animal modeling, the below doc details WHO requests to vaccine makers to obtain b.1 data-tested data...a trust but verify model..as you can see the first meeting goals were to establish NEW blueprints.

https://preview.redd.it/pazddmpisge81.png?width=1150&format=png&auto=webp&s=5a7167780674a4b3e2a4411471bd059e53cadd70

https://preview.redd.it/6m4mq84ssge81.png?width=1150&format=png&auto=webp&s=1ce6bbb286309939e4084b0f815de1a2f57d69ed

https://preview.redd.it/ifjnirtctge81.png?width=1201&format=png&auto=webp&s=cbb20f36238ff0516f9aa554a99c7beecd7ce54e

First meetings very early on with a much quicker pathway. WHO is fully funding SVT trial, its stands to reason they may request certain VOC data as it emerges.

https://preview.redd.it/myw2xasxvge81.png?width=1137&format=png&auto=webp&s=fd47c64335023f1451883daa44266fea14b977f6

https://preview.redd.it/czwei7x1wge81.png?width=1138&format=png&auto=webp&s=ec200d03f6da98619f3b37ed29ad2317e10b3457

Something to consider when comparing against 11/30 PR.

Its a good feeling knowing INO will not face Congress, well not because of a scientific coverup, we are well proven for future success where other current Gen are blocking courts to suppress 50000++ pages of data.

As far as INOs eff% ....we are boosting China, I am pretty sure its solid, also rumored 11/30/21 start date. all the data will come out with orders. imo

https://cdn.who.int/media/docs/default-source/blue-print/who_assays_amodels-summary-of-emergency-meeting-omicron_29nov2021_rd.pdf?sfvrsn=bbaf683f_7&download=true 11/29/21 blue print.

https://cdn.who.int/media/docs/default-source/blue-print/who-working-group-animal-models.pdf?sfvrsn=219c1dab_1&download=true 2020

👍︎ 22

💬︎

👤︎ u/INOcuredcancer

📅︎ Jan 28 2022

🚨︎ report

I got dataset already split (train & test). I threw a few models at it and got at best 82.5% test accuracy. On training data, I get 100% accuracy. I tried shuffling all the data (test & train together) to check my suspicions, & now I get 98% accuracy on the test set. Different distributions right?

After seeing my models get 100% accuracy on the training data but struggle to pass 80% on the test data I found that suspicious, especially after trying several regularization methods and several (dozens of models and NN architecture changes).

I began visualizing the data (it's 500 features, highly correlated) and noticed that some classes in the test set looked very different compared to the training set.

I asked my manager about how this data was generated. He said the embedded systems took measures over 3 full days and then a week later they took a full day. The first 3 days were the training data and the 1 day (a week) later was the test day.

I told him, you can't expect a model that is training on a different time period to be tested on one random day.

I told him we probably need several days sampled at various times, across the year even to get a good representative balanced dataset.

He said it was impossible and that a competing client got 97.5% accuracy on the test data.

I explained my findings when shuffling the data, but he said we can't do that.

I'm a tad frustrated now, is there something I'm overlooking here, or am I fundamentally right?

Most of the datasets I work with are created by my team and I, I rarely ever get a random dataset like this.

I've tried so many models, from simple to complex NNs, tried a bunch of regularisation methods, tried PCA (made things much worse at 60s% test accuracy), tried data augmentation techniques, clustering (which is how I found the differences between Class A in one dataset vs Class A in the other).

I also suspect some classes in the test data and mistakenly labeled, but he said the client assured him they are not.

Thoughts?

👍︎ 23

💬︎

👤︎ u/load_more_commments

📅︎ Jan 17 2022

🚨︎ report

'Cobra over weekend will discuss whether new restrictions are needed after latest dire data from scientists 1st image here is UKHSA today: all indicators bad 2nd image is latest Imperial model — which has been wrong before — forecasts 1000s of deaths a day Q is do SAGE/PM agree' twitter.com/alexwickham/s…

👍︎ 19

💬︎

👤︎ u/thegreatblazed

📅︎ Dec 17 2021

🚨︎ report

[D]You're working against a tight deadline and need to deliver a project quickly. Would you consider speeding up your model training by using only a subset of the available training data?

View Poll

👍︎ 15

💬︎

👤︎ u/rrpelgrim

📅︎ Jan 10 2022

🚨︎ report

MA COVID-19 Update as of December 24, with daily cases & restrictions. Latest estimate: cases = 10,040 new and 32 deaths. Plus SIR model projections. (image with Interactive Data Dashboard)

👍︎ 63

💬︎

👤︎ u/redit202

📅︎ Dec 24 2021

🚨︎ report

Am I FIREproof? Seeking tools to model "worst case" from historical data with a specific stock portfolio

I'm wondering: "I think I'm FIREproof, but what would happen to my finances, if my portfolio was crushed for years, like 2008 GFC?"

To answer these questions, I built a spreadsheet that lets me simulate my portfolio's performance during any period in history (that google finance has historical data for). I'd like to error-check my spreadsheet, by comparing its results to some other tool.

Do you know of a tool that lets you:

Enter your exact portfolio as of today (enter symbol, quantity for each holding)
pick a start date from in history (e.g. 1992), and number of years (eg your age at death).
enter annual expenses / drawdowns (year by year, ideally, since I can forecast some changes over my life - ie downsize city house for sea change, super payout, kids college, assisted living, etc)
change start date to see timing: best case, worst case, average case

RESULT: a graph of your net work , so you can see success ("die with money") with a variety of market conditions.

Know of any tool like this?

cheers

👍︎ 18

💬︎

👤︎ u/inventorj

📅︎ Jan 17 2022

🚨︎ report

Extending User Model After creating migrations but before there is any data worth saving?

Hello! I'm fairly new to Django, and working on my first "real" project, where I'm building a tool to help automate some processes at work. I'm still in the process of building out all of the forms, views and logic to do this, and have about a dozen models created as well as some data I've loaded in as I've built things just to ensure that everything is working as intended. But, There is nothing in there that I'd be worried about losing if that's the easiest path.

I've largely built this based on following tutorials at a high level of following the steps and substituting the tutorials steps (usually a to-do or library type of app) with the models, views and templates that I'm needing. Though I initially skipped what I am now realizing was a critical step, I never extended the User Model when I started, since I didn't understand the "Why". My bad, and a rookie mistake.

I'd like to do this now before I get any further. I'm using Sqlite on my dev environment, Am I able just follow the process of:

Delete the Sqlite database,
Delete the migration files,
Extend the User Model
Create and run migrations
Create new Superuser
Continue on as I was.

My thinking is, this would be just like doing it right at the start as if I had created this first thing anyway.

Is there any weird side effects of doing this? I know I'll lose any data I've created, and that's perfectly ok at this point. I suppose I would lose the migration history (I have probably half a dozen migrations, as I've extended models and changed things) but I think if anything that would just clean up things and make it simpler.

Should I take the rest of my models out before running the user extend migration? Or is it ok to leave those in and have a migrate that does of all of that?

Anything else I should be looking out for before doing this?

Thanks in advance!

👍︎ 5

💬︎

👤︎ u/KhausTO

📅︎ Jan 29 2022

🚨︎ report

Help needed on how to set up my data for a linear mixed model (or GLMM?)

Hey! I need some basic help I can't seem to find answers to. Here is a quick set up of my experiment:

I divided n mice in 8 different groups, based on two explanatory variables (a behavioural one with 3 levels ('task a', 'task b', 'task c'), and then drug/vehicle. So, the groups are 'task a + drug', 'task b + vehicle', 'task a + drug', 'task b + vehicle', 'task c + drug', 'task c + veh'. I also have a niave pair of groups ('no task + drug' and 'no task + veh')

After my experiment I make slices of each mouse brain, and image around 3 slices from each mice. I have a range of response variables that are morphological features of the cells that appear on the images (for simplicity, lets say 'cell volume', which is normally distributed). On each slice I measure around 30-80 cells, depending on how many are there. I don't want to just average out all the cells and the all the images on each mice, so I need to fit a LMM or GLMM to account for the lack of independence in my measurments.

So, this is how data ends up being nested: mice->slice->cell

mice_id	behavior	drug	slice	cell volume
1	task a	drug	1	12
1	task a	drug	1	22
1	task a	drug	1	18
1	task a	drug	1	34
...
1	task a	drug	3	8
2	task a	veh	1	54

I want to fit a LMM, with animal and slice as random factors. Is the following pseudo-code correct:

model1 &lt;- lmer(cell_volume ~ behavior * drug + (1|mice_id) + (1|slice), dataframe)

Here are a couple of my questions:

After running my model, and checking my assumptions, how do I perform a factorial anova to get the simple and interaction effects? Is it something like runnnig 'anova(model1)'?
And after the anova (assuming there is a significant effect), what are the recommended libraries to perform post-hoc comparisons?
Any advice on how should I set up the dataframe? should I colapse the behavior and drug columns into a 'group' column (A-DRUG, A-VEH, etc) Is there a problem is there are multiple slice '1' (each animal has it's own 'slice 1'), or should every slice have a unique id?

Any advice is appreciated, I'm pretty outside of my area of expertise with this kind of analyses

thanks!

👍︎ 7

💬︎

👤︎ u/atlanticlotus

📅︎ Jan 20 2022

🚨︎ report

Opinions on massive data models

I have a customer who wants to put all his data and tables into one model. There are about 100 tables, lots of columns (about 600 total). The biggest table have about 2m rows added every day, 3 years worth of data. Basically 300 shops, each product listed in a shop (inventory), their barcodes etc for each day. I estimate the model to be more than 15GB. At least the tables have clear one-to-many architecture and I don't anticipate any ambiguity in the relationships.

They are planning to put on a P1 premium capacity, with about 500 daily users. Do you have any experience with models like this? Is it going o work?

View Poll

👍︎ 20

💬︎

👤︎ u/jackassik

📅︎ Dec 31 2021

🚨︎ report

Update to "The Limits to Growth": Comparing the World3 model with empirical data | TLDR; Current models confirm and support the models from The Limits to Growth, collapse is coming dash.harvard.edu/handle/1…

👍︎ 280

💬︎

👤︎ u/mouthofreason

📅︎ Dec 02 2021

🚨︎ report

QED is a decentralized oracle protocol with a robust economic model that connects blockchains, smart contract platforms, and off-chain data resources. dailycryptopost.com/qed-l…

👍︎ 17

💬︎

👤︎ u/inkbarsap

📅︎ Jan 06 2022

🚨︎ report

Infra for Web3 & MetaVerse is still in its infancy.🌧️ A looming question is how to build user centric network to support Web3 & MetaVerse Apps ?🤔 MetaBlox, developed from Decentralized ID (DID) and Verifiable Credentials (VC) data model solves this issue.✅

👍︎ 5

💬︎

👤︎ u/sleep-over661

📅︎ Jan 28 2022

🚨︎ report

New 'Stop Facebook' Campaign Demands Ban on Data Harvesting and Corporate Surveillance | "Facebook's surveillance capitalist business model is fundamentally incompatible with basic human rights." commondreams.org/news/202…

👍︎ 3k

💬︎

👤︎ u/chrisdh79

📅︎ Oct 13 2021

🚨︎ report

PowerBI noob looking for some help in making a properly functioning data model from an SQL server

As the title states, I’m a noob in the back end data modelling. I have been building reports from an excisting database for a while, but now my company asked if I would be willing to do some back end stuff in powerBI as well.

I do know how to connect to SQL servers and how to select the tables that I’d like to use for my report. However for example I do not know to combine data from table A with table B in a visual and how to build a smart data model.

Do any of you have tips for a good “mentor” who can teach me this (preferably for free) or advice on how to build a proper data model?

Thanks a bunch!

👍︎ 2

💬︎

👤︎ u/BasVlijmen

📅︎ Jan 24 2022

🚨︎ report

How would you data model anonymous like counts in Firestore?

I'm in a bit of a predicament. I need to add a like/dislike system into my app and I can't find a way to do so without exposing the users who voted a specific way since firestore sends an entire document and cannot keep certain parts hidden.

👍︎ 3

💬︎

👤︎ u/TheOneWhoWil

📅︎ Jan 16 2022

🚨︎ report

[REQUEST]: Stats: Data and Models, Fourth Canadian Edition, 4th edition, ISBN-13: 9780137364626

by De Veaux, R., P. Velleman, D. Bock, A. Vukov, and A Wong

This a bit of a shot in the dark, but many thanks nonetheless! : )

👍︎ 2

💬︎

👤︎ u/manhattansnice

📅︎ Jan 08 2022

🚨︎ report

What type of models could I make with this data?

👍︎ 4

💬︎

👤︎ u/ImJuicedBro

📅︎ Jan 21 2022

🚨︎ report

How can I turn echo-cardiogram data into a printable model of my heart?

My doctor wants to check on the structure of my heart, so I'm going in to get an echo in a few weeks. I think it would be awesome to be able to print a full scale model of my heart.

I have absolutely no idea how to do this, does anyone here? I would pay for data processing and for a clear SLA print by the same person, if I'm that lucky.

👍︎ 16

💬︎

👤︎ u/AethericEye

📅︎ Jan 08 2022

🚨︎ report

Quick question - I was wondering could I use a hammerhead model but field it as a devilfish. Obviously I’d use the devilfish data sheet and rules but I dont really want to spend money on multiple devilfish and multiple hammerheads seems a waste when they are such similar models :)

👍︎ 7

💬︎

👤︎ u/Jksfown

📅︎ Jan 21 2022

🚨︎ report

Non-data USB-C Ports and Missing Games for recent '22 Model Y?

Does anyone else who just received the Model Y recently have this changed?

- USB-C ports in the center console are only for charging. Game controllers or SSD don't work.

- Missing Sonic, Cuphead or any controller-required games.

👍︎ 13

💬︎

👤︎ u/brucehhlo

📅︎ Dec 28 2021

🚨︎ report

Data scientist Tyler Morgan-Wall has created a 3D model of the Earth that shows the undersea optical cables forming a communications network around the planet. v.redd.it/lzqa816122y71

👍︎ 979

💬︎

👤︎ u/Thryloz

📅︎ Nov 06 2021

🚨︎ report

Infra for Web3 & MetaVerse is still in its infancy.🌧️ A looming question is how to build user centric network to support Web3 & MetaVerse Apps ?🤔 MetaBlox, developed from Decentralized ID (DID) and Verifiable Credentials (VC) data model solves this issue.

👍︎ 8

💬︎

👤︎ u/sleep-over661

📅︎ Jan 28 2022

🚨︎ report

Going to take a look at this Shopsmith in a couple days, for only $100. No good pics of a data plate or anything, can you tell anything about the model or anything?

👍︎ 9

💬︎

👤︎ u/Kleinstein17

📅︎ Jan 03 2022

🚨︎ report

More experienced data scientists: how important is the model deployment aspect for you?

Hey everyone,

so I recently got rejected after a pretty long take home (10 days) for some tech unicorn, which sucks of course because I'm doing this next to work and my own research, but hey things can't always work out.

I have 4 YOE and a research background in time series and econometrics. I've worked with some NLP + classification tasks, but would not consider myself an expert here. Since everything is machine learning nowadays I'm brushing up my knowledge currently. I'm studying ISLR, and more advanced literature if I want to dig deeper, but I'm wondering if that's the right move atm. I don't have a lot of time so I have to think carefully about my priorities :/

The feedback I received from the company got me wondering a bit. I'm working as a data scientist in one of the big 4 companies (but you know, these auditing companies, not tech). For several reasons I would like to switch into a more tech oriented company.

In my job I've been exposed to a pretty wide range of projects, everything from explaining regressions to Karen from accounting to providing the model component for large projects that are deployed in some cloud and get out to the customers.

However, I usually understood my role to be more focussed on the actual modeling part, i.e. we have some data source and we have a problem that we want to solve, what's the best model and what insights can it provide? And then usually I team up with engineering and they would make sure this model is deployed e.g. in Kubernetes.

This specific position was for a senior data scientist and the reason why they rejected me is that they weren't convinced of my software engineering skills and thought that my assignment indicated a lack of experience in putting things to production. Fair enough, whatever.

The good thing about being in a consultancy is that I can easily find new projects if I want to develop certain skills, but now I'm wondering if those are very role specific requirements or if I misunderstood my role so far / the role that senior data scientists have. While it's of course always a good idea to learn new skills, they also mean project and time commitments in which I can't focus on other skills.

I would be very happy to learn how important the deployment component for you guys on the more experienced side usually is. Do you work in teams with engineering, do you actually handle it yourself and are part of the production team? Ofc this can't be generalized but hoping for some discussions and

... keep reading on reddit ➡

👍︎ 16

💬︎

👤︎ u/flodase

📅︎ Dec 15 2021

🚨︎ report

Infra for Web3 & MetaVerse is still in its infancy.🌧️ A looming question is how to build user centric network to support Web3 & MetaVerse Apps ?🤔 MetaBlox, developed from Decentralized ID (DID) and Verifiable Credentials (VC) data model solves this issue.✅

👍︎ 5

💬︎

👤︎ u/sleep-over661

📅︎ Jan 28 2022

🚨︎ report

[D] Perhaps Leo Brieman was not right in his criticism about 'Data Models' in his famous paper "Statistical Modeling : The two cultures"

Often the Auto ML or low code Data science library creators take refuge in the famous Leo Brieman's paper "Statistical Modeling : The two cultures". In it, he argues that statistician's fixation with Data models has led to :

Led to irrelevant theory and questionable scientific conclusions
Kept statisticians from using more suitable algorithmic models
Prevented statisticians from working on exciting new problems

In his famous paper Leo Brieman gives an impression that Prediction is everything or as I infer it, one should just worry about predictive accuracy and perhaps one should not waste too much time trying to understand the Data generating process.

Fast forward 21 yrs, we find ourselves with cases like Zillow and other umpteen cases of Data science project failures where the goal was simply to " find an algorithm f(x) such that for future x in a test set, f(x) will be a good predictor of y.

I feel the mere fixation of trying to find the algorithm f(x) leads to people not focusing on the 'how' and 'Why' questions. The questions 'how' and 'why' are only asked post fact i.e. when the models fail miserably on the ground.

Given this background, don't you think Leo Brieman was wrong in his criticism about 'Data models'?

Would be happy to hear perspectives on both sides.

👍︎ 71

💬︎

👤︎ u/venkarafa

📅︎ Dec 12 2021

🚨︎ report

Who creates the ML Models? Data Scientists or ML Engineers? [D]

Who creates the preliminary models like xgboost, random forest, etc? I used to think ML engineers did that but was told by an ML engineer that the Data scientists at his company create the models & then he gets them ready for production. So is this typically the case or does it just depend on the company?

👍︎ 6

💬︎

👤︎ u/Careless_Drummer_208

📅︎ Jan 24 2022

🚨︎ report