Why YSK: Many people bought Tiles over the years because they wanted a good-quality, convenient Bluetooth tracking device for their keys or valuables. With the introduction of Apple's AirTags and Samsung's Galaxy SmartTags, Tile has been under a lot of pressure given its comparatively limited network. Now that Tile's parent company has changed to one whose philosophy is radically different, those who bought Tiles in the past should reconsider whether they want to continue using those products and potentially give up a large part of their privacy.
Also something Tile customers may want to know is that Mark Zuckerberg's sister sits on Life360's board of directors.
It is worth noting that when asked about the change of ownership, a Tile representative stated, "Tile does not sell/monetize personal data and we have Life360's full support and commitment to continue that," but that remains to be seen.
Hi r/MachineLearning! Today we released FFCV (ffcv.io), a library that speeds up machine learning model training by accelerating the data loading and processing pipeline. With FFCV we were able to:
The best part about FFCV is how easy it is to use: chances are you can make your training code significantly faster by converting your dataset to FFCV format and changing just a few lines of code! To illustrate just how easy it is, we also made a minimal ImageNet example that gets high accuracies at SOTA speeds: https://github.com/libffcv/ffcv-imagenet.
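For a feel of the workflow, converting an existing dataset and swapping in the FFCV loader looks roughly like the sketch below (names follow the quickstart pattern; `my_pytorch_dataset` is a placeholder for any indexed (image, label) dataset, and the exact field/decoder options are in the docs):

    # Sketch of the two steps: write the dataset once, then load it with FFCV.
    # (Check the ffcv.io quickstart for the exact field/decoder options.)
    from ffcv.writer import DatasetWriter
    from ffcv.fields import RGBImageField, IntField
    from ffcv.loader import Loader, OrderOption
    from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder
    from ffcv.transforms import ToTensor

    # 1) One-off conversion of an existing (image, label) dataset to .beton format
    writer = DatasetWriter('train.beton', {
        'image': RGBImageField(max_resolution=256),
        'label': IntField(),
    })
    writer.from_indexed_dataset(my_pytorch_dataset)  # placeholder dataset object

    # 2) Drop-in replacement for the PyTorch DataLoader in the training loop
    loader = Loader('train.beton', batch_size=512, num_workers=8,
                    order=OrderOption.RANDOM,
                    pipelines={'image': [SimpleRGBImageDecoder(), ToTensor()],
                               'label': [IntDecoder(), ToTensor()]})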
Let us know what you think!
The supply and demand model is one of the most fundamental models of modern economics, taught to students from high school on. Basically everyone with even a little knowledge of economics has seen it, and most people view the model as an accurate description of how the economy works in terms of the behavior of buyers and sellers. It has become so dominant, in fact, that criticism of it can hardly be found, even though such criticism very well exists. While I used to adhere to the model as well, I have become more and more critical of it over time, and I am now at the point where I would almost argue that we should do away with the model entirely, for how misleading and unrepresentative of real economic behavior it is, in my opinion.
However, before I draw that conclusion, I want to know from capitalists (and other people who consider themselves advocates of market economies) how we can use real-world data (that is, data we can actually acquire) to plot this model. Specifically, I want to know the following elements of the supply and demand model from real-world data:
I argue that this is impossible with real-world data, because the only data we have are prices (of which we don't know whether they are at equilibrium or not) and the quantities of goods sold (of which we likewise don't know whether they are at equilibrium or not). Not to mention that those supply and demand curves can shift with every second that passes. Cockshott illustrated this rather well in this video, in which he showed that the predictive power of supply and demand curves was even lower than that of literally plotting a random line through a time series of real oil prices.
If this turns out to be impossible to plot, then supply and demand curves don't exist, or at least we couldn't know whether they exist or not (just like we can't know whether God exists or not). If we then claim to be scientific, we should not assume their existence, and should quit propagating their existence in economic debate. We should then forget about using the supply and demand model to explain any economic phenomenon, which would lead to the breakdown of practically the entire doctrine of neoclassical economics.
That is not to say that supply and demand have no effect in the e
My coworker constantly uses this term when talking about reports he's creating: "I'm working on my model." His reports are generally just KPIs: no regression, hypothesis testing, or anything like that. We're not doing anything in the realm of what I would consider "data science". He has an Access database that he imports data into from various sources, and he queries off that for all of our data (which has made it a little challenging for me to understand where everything is coming from and what it is).
Is he just using jargon to make it sound complicated or is there something more to this?
Who creates or trains the preliminary models like XGBoost, random forest, etc.? I used to think ML engineers did that, but an ML engineer told me that the data scientists at his company create the models and then he gets them ready for production. Is this typically the case, or does it depend on the company?
Sorry if the question is confusing. I can give more details in the comments if you ask specific questions.
Keeping the argument abstract: let's say that I have some agents in my FPGA design that source samples with a certain throughput. Then I have interconnects that arbitrate the data that passes through. And eventually I have consumers, which may be sinks like DDR or data reducers like decimators.
How would you model such a system mathematically? I would like to improve in this area so I can create designs backed by mathematical results and see whether I can meet specific deadlines.
The approach would be similar to what can be done with RTOS task design.
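One very rough starting point, analogous to an RTOS utilization test, is a back-of-the-envelope check that the aggregate producer bandwidth fits within the consumer's budget; the sketch below uses entirely made-up agent names and numbers:

    # Rough feasibility check for a shared interconnect/consumer, in the spirit of
    # RTOS utilization analysis. All rates and the budget below are hypothetical.
    producers = {
        # name: (samples_per_cycle, bits_per_sample)
        "adc_stream": (1.0, 16),
        "sensor_agg": (0.25, 32),
    }

    consumer_bits_per_cycle = 64  # e.g. assumed DDR budget at the fabric clock

    demand = sum(rate * width for rate, width in producers.values())
    utilization = demand / consumer_bits_per_cycle

    print(f"aggregate demand: {demand} bits/cycle, utilization: {utilization:.2%}")
    if utilization > 1.0:
        print("sustained throughput cannot be met; decimate or widen the datapath")
    else:
        # Burstiness still matters: FIFO depth must cover the worst-case backlog
        # while the arbiter services other agents (a network-calculus style bound).
        print("average rate is fine; size FIFOs for worst-case arbitration latency")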
Thanks a lot!
I am having a lot of difficulty getting the right data for my task, since most of the datasets out there, including ImageNet, restrict usage of the data to educational and non-commercial research. In this case, should I reach out to them to get usage permission? What are the chances that they will allow/respond, and would there be a significant cost associated with this? Thanks!
Posting about data science here as the question is from my perspective as a BI/Data Analyst.
The data scientist team in our org is building a model to predict certain customer characteristics.
These characteristics could be collected simply by asking customers, say in a survey, after completing an order, or after signing up; a question such as, say, how many employees the customer has. Instead, the team is building a model to predict these attributes.
There is a team specifically to answer questions around customer attributes.
I cannot understand why the org is spending money on the salaries of these data scientists working on this project when we could pursue other model building. Are we not better served by simply asking customers for this information?
The data scientists' answer is that our website development team is too busy to create an interface to ask customers this question. The manager's answer is that models don't get built overnight, they take time, so we need to start now, even though this information can be obtained in other ways.
Neither of these answers makes sense to me.
Can you please help me understand?
Thank you.
Hey everyone!
I wanted to share with you an open-source tool we've been building for a while. Deepchecks is an open-source tool for validating & testing models and data efficiently:
https://github.com/deepchecks/deepchecks
Deepchecks is a Python package implementing the validations and tests needed in order to trust an ML pipeline. It contains many built-in checks, such as verifying data integrity, inspecting distributions, validating data splits, evaluating your model, and comparing different models.
In addition, it contains test suites, similar to test suites in software projects, that can accompany you through all the building blocks of ML pipeline development. Each test suite contains the checks needed for a specific part of the pipeline.
The suite result looks something like this:
The suites and checks have a simple syntax and are highly customizable.
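For a feel of that syntax, running a full suite on a tabular dataset looks roughly like the sketch below (the dataframes, column names, and model are placeholders, and the exact imports are version-dependent, so see the quickstart linked next):

    # Rough usage sketch; newer releases namespace these under deepchecks.tabular.
    from deepchecks.tabular import Dataset
    from deepchecks.tabular.suites import full_suite

    # train_df, test_df, and fitted_model are placeholders for your own objects
    train_ds = Dataset(train_df, label="target", cat_features=["city"])
    test_ds = Dataset(test_df, label="target", cat_features=["city"])

    suite = full_suite()
    result = suite.run(train_dataset=train_ds, test_dataset=test_ds, model=fitted_model)
    result.save_as_html("suite_report.html")  # or result.show() in a notebook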
If you want to jump right in, you can try it out in the quick start notebook:
https://docs.deepchecks.com/en/stable/examples/guides/quickstart_in_5_minutes.html
What do you think? I'll be happy to hear your thoughts and feedback.
Screenshots of example suite and check output from the post:
https://preview.redd.it/pazddmpisge81.png?width=1150&format=png&auto=webp&s=5a7167780674a4b3e2a4411471bd059e53cadd70
https://preview.redd.it/6m4mq84ssge81.png?width=1150&format=png&auto=webp&s=1ce6bbb286309939e4084b0f815de1a2f57d69ed
https://preview.redd.it/ifjnirtctge81.png?width=1201&format=png&auto=webp&s=cbb20f36238ff0516f9aa554a99c7beecd7ce54e
https://preview.redd.it/myw2xasxvge81.png?width=1137&format=png&auto=webp&s=fd47c64335023f1451883daa44266fea14b977f6
https://preview.redd.it/czwei7x1wge81.png?width=1138&format=png&auto=webp&s=ec200d03f6da98619f3b37ed29ad2317e10b3457
Something to consider when comparing against the 11/30 PR.
It's a good feeling knowing INO will not face Congress, and not because of a scientific cover-up; we are well positioned for future success, while other current-gen companies are fighting in court to suppress 50,000+ pages of data.
As far as INO's efficacy %: we are boosting in China, and I am pretty sure it's solid; there is also a rumored 11/30/21 start date. All the data will come out with orders, IMO.
https://cdn.who.int/media/docs/default-source/blue-print/who_assays_amodels-summary-of-emergency-meeting-omicron_29nov2021_rd.pdf?sfvrsn=bbaf683f_7&download=true 11/29/21 blueprint.
After seeing my models get 100% accuracy on the training data but struggle to pass 80% on the test data, I found that suspicious, especially after trying several regularization methods and dozens of model and NN architecture changes.
I began visualizing the data (500 features, highly correlated) and noticed that some classes in the test set looked very different from the training set.
I asked my manager how this data was generated. He said the embedded systems took measurements over 3 full days, and then a week later they took one more full day of measurements. The first 3 days became the training data and the single day a week later became the test data.
I told him you can't expect a model trained on one time period to be tested on a single random day from another.
I told him we probably need several days sampled at various times, even across the year, to get a good, representative, balanced dataset.
He said it was impossible and that a competing client got 97.5% accuracy on the test data.
I explained my findings when shuffling the data, but he said we can't do that.
I'm a tad frustrated now. Is there something I'm overlooking here, or am I fundamentally right?
Most of the datasets I work with are created by my team and me; I rarely ever get a random dataset like this.
I've tried so many models, from simple ones to complex NNs; tried a bunch of regularisation methods; tried PCA (which made things much worse, around 60% test accuracy); tried data augmentation techniques; and tried clustering (which is how I found the differences between Class A in one dataset vs Class A in the other).
I also suspect some classes in the test data are mislabeled, but he said the client assured him they are not.
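For what it's worth, the quickest generic check I know for this kind of train/test mismatch is a train-vs-test discriminator; the sketch below is not tied to the real data, and X_train / X_test just stand in for the 500-feature matrices:

    # Covariate-shift diagnostic: can a classifier tell which split a row came from?
    # AUC near 0.5 means the splits look alike; AUC near 1.0 means a strong shift.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X = np.vstack([X_train, X_test])                      # placeholder feature matrices
    y = np.r_[np.zeros(len(X_train)), np.ones(len(X_test))]

    auc = cross_val_score(RandomForestClassifier(n_estimators=200),
                          X, y, cv=5, scoring="roc_auc").mean()
    print(f"train-vs-test discriminator AUC: {auc:.3f}")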
Thoughts?
I'm wondering: "I think I'm FIREproof, but what would happen to my finances if my portfolio was crushed for years, like in the 2008 GFC?"
To answer this question, I built a spreadsheet that lets me simulate my portfolio's performance during any period in history (that Google Finance has historical data for). I'd like to error-check my spreadsheet by comparing its results to some other tool.
Do you know of a tool that lets you:
Enter your exact portfolio as of today (enter symbol, quantity for each holding)
pick a start date in history (e.g. 1992) and a number of years (e.g. your age at death)
enter annual expenses / drawdowns (year by year, ideally, since I can forecast some changes over my life, e.g. downsizing the city house for a sea change, super payout, kids' college, assisted living, etc.)
change the start date to see timing: best case, worst case, average case
RESULT: a graph of your net worth, so you can see success ("die with money") under a variety of market conditions.
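For context, the core yearly calculation my spreadsheet does is roughly the sketch below (a toy version with made-up returns and expenses; the real sheet pulls returns from Google Finance history):

    # Toy net-worth simulation: grow the portfolio by each year's return, then
    # subtract that year's expenses. All numbers here are purely illustrative.
    def simulate(start_value, yearly_returns, yearly_expenses):
        value, path = start_value, []
        for r, spend in zip(yearly_returns, yearly_expenses):
            value = value * (1 + r) - spend
            path.append(round(value))
        return path

    returns = [0.05, -0.38, 0.23, 0.12]            # e.g. a GFC-style sequence (made up)
    expenses = [60_000, 62_000, 64_000, 66_000]    # year-by-year drawdowns (made up)
    print(simulate(1_000_000, returns, expenses))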
Know of any tool like this?
cheers
Hello! I'm fairly new to Django and working on my first "real" project, where I'm building a tool to help automate some processes at work. I'm still in the process of building out all of the forms, views, and logic to do this, and have about a dozen models created, as well as some data I've loaded in as I've built things just to ensure everything is working as intended. But there is nothing in there that I'd be worried about losing, if that's the easiest path.
I've largely built this by following tutorials at a high level, following the steps and substituting the tutorial's models, views, and templates (usually a to-do or library type of app) with the ones I need. Though I initially skipped what I now realize was a critical step: I never extended the User model when I started, since I didn't understand the "why". My bad, and a rookie mistake.
I'd like to do this now before I get any further. I'm using SQLite in my dev environment. Am I able to just follow this process:
1. Delete the SQLite database
2. Delete the migration files
3. Extend the User model (see the sketch below)
4. Create and run migrations
5. Create a new superuser
6. Continue on as I was
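For step 3, the usual pattern (per the Django docs' advice to swap in a custom user model before the first real migrations) looks roughly like this; "accounts" is just an example app name:

    # accounts/models.py  ("accounts" is an example app name)
    from django.contrib.auth.models import AbstractUser

    class User(AbstractUser):
        # start with no extra fields; add them later as normal model changes
        pass

    # settings.py
    AUTH_USER_MODEL = "accounts.User"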
My thinking is, this would be just like doing it right at the start as if I had created this first thing anyway.
Are there any weird side effects of doing this? I know I'll lose any data I've created, and that's perfectly OK at this point. I suppose I would also lose the migration history (I have probably half a dozen migrations, as I've extended models and changed things), but I think that would, if anything, just clean things up and make it simpler.
Should I take the rest of my models out before running the migration that extends the user model? Or is it OK to leave those in and have one migration that does all of that?
Anything else I should be looking out for before doing this?
Thanks in advance!
Hey! I need some basic help I can't seem to find answers to. Here is a quick setup of my experiment:
I divided n mice into 8 different groups based on two explanatory variables: a behavioural one with 3 levels ('task a', 'task b', 'task c'), and then drug/vehicle. So the groups are 'task a + drug', 'task a + vehicle', 'task b + drug', 'task b + vehicle', 'task c + drug', 'task c + vehicle'. I also have a naive pair of groups ('no task + drug' and 'no task + vehicle').
After the experiment I make slices of each mouse brain and image around 3 slices from each mouse. I have a range of response variables that are morphological features of the cells that appear in the images (for simplicity, let's say 'cell volume', which is normally distributed). On each slice I measure around 30-80 cells, depending on how many are there. I don't want to just average all the cells and all the images within each mouse, so I need to fit an LMM or GLMM to account for the lack of independence in my measurements.
So, this is how the data ends up being nested: mouse -> slice -> cell
| mice_id | behavior | drug | slice | cell volume |
|---|---|---|---|---|
| 1 | task a | drug | 1 | 12 |
| 1 | task a | drug | 1 | 22 |
| 1 | task a | drug | 1 | 18 |
| 1 | task a | drug | 1 | 34 |
| ... | ... | ... | ... | ... |
| 1 | task a | drug | 3 | 8 |
| 2 | task a | veh | 1 | 54 |
I want to fit an LMM with animal and slice as random factors. Is the following pseudo-code correct?

    library(lme4)  # assuming lme4's lmer
    model1 <- lmer(cell_volume ~ behavior * drug + (1|mice_id) + (1|slice), data = dataframe)
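Written out, the model I have in mind for the mouse -> slice -> cell nesting is something like the following (my own sketch, so please correct me if the notation is off):

$$ y_{ijk} = \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i + u_{ij} + \varepsilon_{ijk}, \qquad u_i \sim N(0,\sigma^2_{\text{mouse}}), \quad u_{ij} \sim N(0,\sigma^2_{\text{slice}}), \quad \varepsilon_{ijk} \sim N(0,\sigma^2_e) $$

where $y_{ijk}$ is the volume of cell $k$ on slice $j$ of mouse $i$, and $\mathbf{x}_i$ encodes behavior, drug, and their interaction. Since my slices are numbered 1-3 within each mouse, I suspect the $u_{ij}$ term corresponds to the nested syntax (1|mice_id/slice) rather than (1|slice) with shared slice IDs, but that is exactly the kind of thing I'd like checked.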
Here are a couple of my questions:
Any advice is appreciated; I'm pretty far outside my area of expertise with this kind of analysis.
thanks!
I have a customer who wants to put all his data and tables into one model. There are about 100 tables and lots of columns (about 600 in total). The biggest table has about 2 million rows added every day, with 3 years' worth of data: basically 300 shops, with each product listed in a shop (inventory), its barcodes, etc., for each day. I estimate the model to be more than 15 GB. At least the tables have a clear one-to-many architecture and I don't anticipate any ambiguity in the relationships.
They are planning to put it on a P1 Premium capacity, with about 500 daily users. Do you have any experience with models like this? Is it going to work?
As the title states, I'm a noob at back-end data modelling. I have been building reports from an existing database for a while, but now my company asked if I would be willing to do some back-end work in Power BI as well.
I do know how to connect to SQL servers and how to select the tables that I'd like to use for my report. However, I do not know, for example, how to combine data from table A with table B in a visual, or how to build a smart data model.
Do any of you have tips for a good "mentor" who can teach me this (preferably for free), or advice on how to build a proper data model?
Thanks a bunch!
I'm in a bit of a predicament. I need to add a like/dislike system to my app, and I can't find a way to do so without exposing the users who voted a specific way, since Firestore sends an entire document and cannot keep certain parts of it hidden.
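The closest workaround I've come up with so far: each user's vote lives in its own document that security rules restrict to that user, and clients only ever read aggregate counters on the post document, which are maintained by a trusted backend or Cloud Function. A rough sketch with the Python server client and made-up collection names:

    # Server-side vote handler (sketch; collection/field names are hypothetical).
    # Clients write only their own vote doc and read only the aggregate counters,
    # so individual voters are never exposed through the post document.
    from google.cloud import firestore

    db = firestore.Client()

    def record_vote(post_id: str, uid: str, value: int) -> None:
        """value is +1 for a like, -1 for a dislike."""
        post_ref = db.collection("posts").document(post_id)
        vote_ref = post_ref.collection("votes").document(uid)

        if not vote_ref.get().exists:  # ignore repeat votes (a transaction would be safer)
            vote_ref.set({"value": value})
            field = "likes" if value > 0 else "dislikes"
            post_ref.update({field: firestore.Increment(1)})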
by De Veaux, R., P. Velleman, D. Bock, A. Vukov, and A. Wong
This is a bit of a shot in the dark, but many thanks nonetheless! :)
My doctor wants to check on the structure of my heart, so I'm going in to get an echo in a few weeks. I think it would be awesome to be able to print a full scale model of my heart.
I have absolutely no idea how to do this, does anyone here? I would pay for data processing and for a clear SLA print by the same person, if I'm that lucky.
Does anyone else who just received a Model Y recently see these changes?
- USB-C ports in the center console are only for charging; game controllers and SSDs don't work.
- Sonic, Cuphead, and other controller-required games are missing.
Hey everyone,
so I recently got rejected after a pretty long take-home (10 days) for a tech unicorn, which sucks of course because I'm doing this alongside work and my own research, but hey, things can't always work out.
I have 4 YOE and a research background in time series and econometrics. I've worked on some NLP and classification tasks, but would not consider myself an expert there. Since everything is machine learning nowadays, I'm currently brushing up my knowledge. I'm studying ISLR, plus more advanced literature when I want to dig deeper, but I'm wondering if that's the right move at the moment. I don't have a lot of time, so I have to think carefully about my priorities :/
The feedback I received from the company got me wondering a bit. I'm working as a data scientist in one of the big 4 companies (but you know, these auditing companies, not tech). For several reasons I would like to switch into a more tech oriented company.
In my job I've been exposed to a pretty wide range of projects, everything from explaining regressions to Karen from accounting to providing the model component for large projects that are deployed in some cloud and get out to the customers.
However, I usually understood my role to be more focussed on the actual modeling part, i.e. we have some data source and we have a problem that we want to solve, what's the best model and what insights can it provide? And then usually I team up with engineering and they would make sure this model is deployed e.g. in Kubernetes.
This specific position was for a senior data scientist and the reason why they rejected me is that they weren't convinced of my software engineering skills and thought that my assignment indicated a lack of experience in putting things to production. Fair enough, whatever.
The good thing about being in a consultancy is that I can easily find new projects if I want to develop certain skills, but now I'm wondering if those are very role specific requirements or if I misunderstood my role so far / the role that senior data scientists have. While it's of course always a good idea to learn new skills, they also mean project and time commitments in which I can't focus on other skills.
I would be very happy to learn how important the deployment component for you guys on the more experienced side usually is. Do you work in teams with engineering, do you actually handle it yourself and are part of the production team? Ofc this can't be generalized but hoping for some discussions and
Often the AutoML or low-code data science library creators take refuge in Leo Breiman's famous paper "Statistical Modeling: The Two Cultures". In it, he argues that statisticians' fixation with data models has led to:
In his famous paper, Leo Breiman gives the impression that prediction is everything, or, as I infer it, that one should just worry about predictive accuracy and perhaps should not waste too much time trying to understand the data-generating process.
Fast forward 21 years, and we find ourselves with cases like Zillow and umpteen other data science project failures where the goal was simply to "find an algorithm f(x) such that for future x in a test set, f(x) will be a good predictor of y."
I feel the mere fixation on trying to find the algorithm f(x) leads to people not focusing on the 'how' and 'why' questions. The 'how' and 'why' questions are only asked after the fact, i.e. when the models fail miserably on the ground.
Given this background, don't you think Leo Breiman was wrong in his criticism of 'data models'?
Would be happy to hear perspectives on both sides.