DAX Measure to Link Tables and Query Optimization

Hello everyone,

I have been tasked with automating some reports for my organization. I'm relatively new to DAX, so I'd appreciate help with this one. Apologies if this runs long.

I have a sales table with 1000s of products spread across lakhs (hundreds of thousands) of rows.

And a stock table like this:

https://preview.redd.it/1pwawr6mt2c81.png?width=712&format=png&auto=webp&s=8c52ee71151e2ea7d5f32cc476dffb903374026c

I need to prepare a top 30 sales report as the main report, and for reconciliation the remaining products should be numbered 31 and named "Others". So what I did was:

  1. Took the data via Power Query
  2. Created a separate table for Top 30 with an index (Let's call it Index Table)
  3. Merged with the existing tables shown above
  4. Replaced the blanks (products other than the Top 30) with a Rank of 31 and the Name "Others"
  5. Linked the Index table with the above 2 tables via relationships in Power Pivot.

The issue is that I need a subsidiary report with the top 5 supplier sales for the top 30 products, along with the stock held by the respective suppliers. Image below for reference:

https://preview.redd.it/0ecubk5qt2c81.png?width=655&format=png&auto=webp&s=7a34ea43a70a83d6f8cb3d2ce79a277cc90b96e1

I took the first two columns from the Index Table and the rest from the linked tables. However, I couldn't get the stock values as intended; the error cites something related to relationships. I also couldn't create another relationship on the supplier column because of the many duplicates under "Others".

It's okay if this report doesn't have the 31st product. Top 30 is fine. However, I don't want to create a separate file for this altogether.

I want to know whether I can write a DAX measure to get the stock values. Also, is there any way to optimize how I arrived at the top 30 products?
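
In case it helps, the usual workaround when a physical relationship can't be created is a "virtual relationship" inside a measure: TREATAS applies the supplier values from the sales side as a filter on the stock table. Below is a minimal sketch assuming hypothetical column names Sales[Supplier], Stock[Supplier] and Stock[Stock_Qty] (the real names aren't visible here) and a DAX engine recent enough to have TREATAS (older Excel Power Pivot builds would need a FILTER/CONTAINS pattern instead):

    Supplier Stock :=
    CALCULATE (
        SUM ( Stock[Stock_Qty] ),                                -- stock quantity to aggregate
        TREATAS ( VALUES ( Sales[Supplier] ), Stock[Supplier] )  -- apply the suppliers visible in the
                                                                 -- current sales context as a filter on Stock
    )

For the top 30 itself, a RANKX measure over total sales (with everything ranked 31 or lower relabelled "Others" in a calculated column) is a common alternative to the Power Query merge, since it avoids maintaining a separate index table.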

Thank you.

πŸ‘︎ 5
πŸ‘€︎ u/Warlock_22
πŸ“…︎ Jan 16 2022
SQL Query Optimization with Indexing selectfrom.dev/sql-query-…
πŸ‘︎ 44
πŸ‘€︎ u/payopt
πŸ“…︎ Dec 27 2021
Query Optimization Question

I am trying to understand whether there is a more performant way to achieve this outcome. I have a table (called SRC.Prescription_Fills) containing 1,954,935,946 records, all bogus data. I partitioned the table so that each year/month combination is in its own partition.

I want to get a record count for each year and month. I created the following query to generate this output.

    SELECT [Month] AS [Month],
        [2014], [2015], [2016], [2017], [2018], [2019], [2020]
    FROM
    (
        SELECT
            YEAR(Fill_Date_Time) AS [Year], 
            MONTH(Fill_Date_Time) AS [Month],
            Fill_Identifier
        FROM 
            SRC.Prescription_Fills WITH (NOLOCK)
    ) AS Prescription_Fills
    PIVOT
    (
        COUNT_BIG(Fill_Identifier)
        FOR [Year] IN ([2014], [2015], [2016], [2017], [2018], [2019], [2020])
    ) AS PVT
    ORDER BY 
        [Month];

The query plan for this query is https://www.brentozar.com/pastetheplan/?id=B1j0YbG9K.

The output of this query looks like this:

https://preview.redd.it/4865dh6nfw481.png?width=805&format=png&auto=webp&s=acd49703a93fb321bf287bd0bde033eb72110cf6

This query takes up to 16 minutes to run. Any ideas on what I can do to improve it?
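
One direction worth testing (a sketch, not a definitive fix): the plan has to compute YEAR()/MONTH() and aggregate across all ~2 billion rows, so a nonclustered columnstore index over the two columns involved lets that scan and aggregation run in batch mode, and pre-aggregating to year/month before pivoting means the PIVOT only ever touches at most 84 rows. The index below is an assumption about what you are allowed to create (SQL Server 2016+ for an updatable nonclustered columnstore), and building it on a table this size will itself take time and space:

    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Prescription_Fills_Dates
        ON SRC.Prescription_Fills (Fill_Date_Time, Fill_Identifier);

    -- Aggregate first, then pivot the small (month x year) result
    SELECT [Month],
           [2014], [2015], [2016], [2017], [2018], [2019], [2020]
    FROM
    (
        SELECT
            YEAR(Fill_Date_Time)  AS [Year],
            MONTH(Fill_Date_Time) AS [Month],
            COUNT_BIG(Fill_Identifier) AS Fill_Count
        FROM
            SRC.Prescription_Fills
        GROUP BY
            YEAR(Fill_Date_Time), MONTH(Fill_Date_Time)
    ) AS Monthly
    PIVOT
    (
        SUM(Fill_Count)
        FOR [Year] IN ([2014], [2015], [2016], [2017], [2018], [2019], [2020])
    ) AS PVT
    ORDER BY
        [Month];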

πŸ‘︎ 4
πŸ‘€︎ u/kentmaxwell
πŸ“…︎ Dec 11 2021
Need help in Query Optimization
    WITH release_cte AS
    (
        SELECT release_nbr,
               base_div_name,
               country_code,
               house_nbr,
               xref_count,
               effective_release_date,
               ROW_NUMBER() OVER (PARTITION BY release_nbr
                                  ORDER BY create_ts ASC) AS row_nbr
        FROM status_table
        WHERE create_ts >= '2021-03-27 18:43:50.307'
          AND house_nbr = 32612
          AND country_code = 'US'
          AND process_status_code IN (16, 4096)
          AND release_nbr >= 0
          AND release_nbr NOT IN
          (
              SELECT DISTINCT release_nbr
              FROM status_table
              WHERE create_ts >= '2021-03-27 18:43:50.307'
                AND house_nbr = 32612
                AND country_code = 'US'
                AND item_xref_id = -1
          )
    )
    SELECT release_nbr,
           base_div_name,
           country_code,
           house_nbr,
           xref_count,
           effective_release_date,
           MAX(row_nbr) AS row_nbr
    FROM release_cte
    GROUP BY release_nbr,
             base_div_name,
             country_code,
             house_nbr,
             xref_count,
             effective_release_date;
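
Since there's no accompanying detail, one common first step to try (a sketch, not a guaranteed win) is rewriting the NOT IN (SELECT ...) exclusion as a correlated NOT EXISTS: it avoids the NULL pitfalls of NOT IN and often gives the optimizer a cleaner anti-join, while the rest of the query stays the same:

    WITH release_cte AS
    (
        SELECT release_nbr,
               base_div_name,
               country_code,
               house_nbr,
               xref_count,
               effective_release_date,
               ROW_NUMBER() OVER (PARTITION BY release_nbr
                                  ORDER BY create_ts ASC) AS row_nbr
        FROM status_table AS s
        WHERE s.create_ts >= '2021-03-27 18:43:50.307'
          AND s.house_nbr = 32612
          AND s.country_code = 'US'
          AND s.process_status_code IN (16, 4096)
          AND s.release_nbr >= 0
          AND NOT EXISTS                      -- anti-join instead of NOT IN
          (
              SELECT 1
              FROM status_table AS x
              WHERE x.create_ts >= '2021-03-27 18:43:50.307'
                AND x.house_nbr = 32612
                AND x.country_code = 'US'
                AND x.item_xref_id = -1
                AND x.release_nbr = s.release_nbr
          )
    )
    SELECT release_nbr,
           base_div_name,
           country_code,
           house_nbr,
           xref_count,
           effective_release_date,
           MAX(row_nbr) AS row_nbr
    FROM release_cte
    GROUP BY release_nbr,
             base_div_name,
             country_code,
             house_nbr,
             xref_count,
             effective_release_date;

If the platform allows it, an index on (house_nbr, country_code, create_ts) that also covers release_nbr, item_xref_id and process_status_code would support both passes over status_table.
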
πŸ‘︎ 8
πŸ‘€︎ u/AnalogyOverSixes
πŸ“…︎ Nov 19 2021
EXPLAIN (ANALYZE) needs BUFFERS to improve the Postgres query optimization process postgres.ai/blog/20220106…
πŸ‘︎ 6
πŸ“…︎ Jan 07 2022
SQL Query Optimization: Understanding Key Principle hinty.io/devforth/sql-que…
πŸ‘︎ 2k
πŸ‘€︎ u/Zaiden-Rhys1
πŸ“…︎ May 27 2021
Indexing and query optimization

Hello everyone!

I'm a newbie to MongoDB and I'm doing some query profiling and index creation on an Amazon dataset for a university project. I have a question about indexing that I couldn't resolve on my own; I hope it's not too trivial:

How can I choose a good index to speed up a very slow $lookup query? I can create indexes on both collections, on fields being queried, but is this really useful?
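
For reference, the index that usually matters for $lookup is the one on the foreignField of the from collection, because that is the equality match the stage performs for every input document; an index on the outer collection's localField doesn't speed up the lookup itself. A minimal mongosh sketch with hypothetical collection and field names:

    // Index the foreignField of the "from" collection used by $lookup
    db.products.createIndex({ product_id: 1 })

    db.orders.aggregate([
      { $match: { category: "Books" } },   // filter early so fewer documents reach the lookup
      { $lookup: {
          from: "products",
          localField: "product_id",
          foreignField: "product_id",      // this equality match is what the index above serves
          as: "product_info"
      } }
    ])

Running the aggregation through explain("executionStats") will confirm whether the index is actually used.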

πŸ‘︎ 2
πŸ‘€︎ u/MATTIOLATO
πŸ“…︎ Nov 16 2021
Pro-tips for query optimization in a data warehouse?

Hi r/SQL,

I'm in a bit of uncharted waters currently. I've recently changed companies, and the amount of data I sort through has gone from localized servers for individual clients to a full-blown data warehouse with billions of rows in each and every table (MSP -> large client).

The ad hoc report I've been working on is not difficult or fancy. However, I'm having to reference and join about 10 tables with an astounding (to me) amount of data.

My question: how do I tackle this? This simple query takes 2-3 hours to run, and even breaking it down into individual selects with simple conditions (e.g. SELECT x FROM y WHERE ...) takes an hour per query.

Do I need to just run these queries off the clock or on a weekend? Any solutions I could try or that you'd recommend?

Edit: I asked my boss the same question and he hit me with "Welcome to my world" hahaha
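
A tactic that often helps in this situation, sketched below with made-up table and column names (T-SQL syntax assumed): filter each large table down to just the slice the report needs into a temp table first, ideally on indexed columns, so the ten-way join runs over millions of rows instead of billions.

    -- Stage only the rows the report needs from the largest table
    SELECT order_id, customer_id, order_date
    INTO #recent_orders
    FROM dbo.Orders
    WHERE order_date >= '2021-07-01';      -- push the date filter down before any join

    -- Join the much smaller staged set to the remaining tables
    SELECT ro.order_id, ro.order_date, c.customer_name
    FROM #recent_orders AS ro
    INNER JOIN dbo.Customers AS c
        ON c.customer_id = ro.customer_id;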

πŸ‘︎ 22
πŸ‘€︎ u/assblaster68
πŸ“…︎ Aug 09 2021
Good resources for database query optimization and schema design?

Title says it all! I need good resources on both topics

πŸ‘︎ 6
πŸ“…︎ Sep 26 2021
Who handles query optimization in large organisations?

I'm interested in understanding which job profile handles database-related issues like query optimization, query performance tuning, etc. in large organisations.

πŸ‘︎ 30
πŸ‘€︎ u/drwho1990
πŸ“…︎ Jun 01 2021
Join Query Optimization

I have two tables,

BookingMetaData and BookingDetails (MySQL)

Both are huge tables.

So if I do something like

SELECT * FROM BookingMetaData bm INNER JOIN BookingDetails bd ON bm.id = bd.id WHERE bm.id > 5M; (assuming there are currently slightly more than 5M records)

or, instead of the WHERE clause,

if I put ORDER BY bm.id DESC LIMIT 100;

then:

will MySQL try to join the tables first (5M records) and then filter, or will it be able to optimize by only merging a few records, in some binary-search-like way?

If not, how can I perform such an operation efficiently? (I am not allowed to change the tables.)

Any help is greatly appreciated. Thank you
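
In case it's useful: if id is indexed on both tables (e.g. as the primary key, which is an assumption here), MySQL can usually satisfy ORDER BY bm.id DESC LIMIT 100 with a backward index scan and then do 100 point lookups into BookingDetails instead of materializing a 5M-row join. Making that explicit with a derived table is a common pattern (a sketch):

    -- Pick the 100 newest ids first, then join only those rows
    SELECT bm.*, bd.*
    FROM (
        SELECT id
        FROM BookingMetaData
        ORDER BY id DESC
        LIMIT 100
    ) AS latest
    INNER JOIN BookingMetaData AS bm ON bm.id = latest.id
    INNER JOIN BookingDetails  AS bd ON bd.id = latest.id;

Running EXPLAIN on both forms will show whether the optimizer already limits before the join or scans the full tables.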

πŸ‘︎ 2
πŸ‘€︎ u/Express_Rest_3753
πŸ“…︎ Sep 26 2021
Query Optimization in SQL

I am trying to figure out which of these queries would be faster (lower time complexity) and why.

Question 1: I am trying to find the top 10 distances.

Query 1

SELECT
    user_id,
    distance
FROM
    travel_table
ORDER BY
    distance DESC
LIMIT 10; 

Query 2

SELECT
    user_id,
    distance
FROM (
    SELECT
        user_id,
        distance,
        RANK() OVER (ORDER BY distance DESC) AS distance_rank
    FROM travel_table
) AS ranked
WHERE
    distance_rank <= 10;

Question 2: Finding top users

Query 1: Subquery

SELECT
    name,
    total_distance
FROM
(
    SELECT
        user_id,
        SUM(distance) AS total_distance
    FROM
        lyft_rides_log
    GROUP BY
        user_id
    ORDER BY
        total_distance DESC 
    LIMIT 10
) a
INNER JOIN 
    lyft_users b
ON 
    a.user_id = b.id;

Query 2: Inner join

SELECT
    name,
    SUM(distance) as total_distance
FROM
    lyft_rides_log r
INNER JOIN
    lyft_users u 
ON r.user_id = u.id
GROUP BY
    name
ORDER BY
    total_distance DESC
LIMIT 10;
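
Rather than reasoning about it abstractly, you can ask the engine directly; EXPLAIN ANALYZE below is PostgreSQL / MySQL 8 syntax (an assumption, since the engine isn't stated), and the same comparison works for the Question 2 pair:

    EXPLAIN ANALYZE
    SELECT
        user_id,
        distance
    FROM
        travel_table
    ORDER BY
        distance DESC
    LIMIT 10;

In general, the ORDER BY ... LIMIT form lets the engine keep only a 10-row top-N buffer (or read straight off an index on distance), while the window-function form has to rank every row before it can filter.
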
πŸ‘︎ 5
πŸ‘€︎ u/brownstrom
πŸ“…︎ Jul 15 2021
Snowflake: Query Optimization Using Materialized View clark-perucho.medium.com/…
πŸ‘︎ 4
πŸ‘€︎ u/fhoffa
πŸ“…︎ Sep 06 2021
Query Optimization in SQL /r/dataengineering/commen…
πŸ‘︎ 5
πŸ‘€︎ u/brownstrom
πŸ“…︎ Jul 15 2021
Who handles query optimization in large organisations? /r/SQL/comments/npn0hz/wh…
πŸ‘︎ 3
πŸ‘€︎ u/drwho1990
πŸ“…︎ Jun 01 2021
query optimization

I am trying to remove duplicate rows from my select query. By "duplicate" I mean: if the three columns below match a row in TableB, that record should be removed from the selected results.

I am trying this:

    SELECT column1, column2, column3, column4, column5, column6
    FROM TableA
    WHERE NOT EXISTS (
        SELECT 1
        FROM TableB
        WHERE TableB.columnX = TableA.column1
          AND TableB.columnY = TableA.column2
          AND TableB.columnZ = TableA.column3
    );

This is not giving the results I expect.
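
The same anti-join can also be written as a LEFT JOIN, which is sometimes easier to debug because you can inspect the matched rows before filtering them out (a sketch using the same placeholder column names):

    SELECT a.column1, a.column2, a.column3, a.column4, a.column5, a.column6
    FROM TableA AS a
    LEFT JOIN TableB AS b
           ON b.columnX = a.column1
          AND b.columnY = a.column2
          AND b.columnZ = a.column3
    WHERE b.columnX IS NULL;   -- keep only TableA rows with no match in TableB

Also note that NULLs never compare equal, so rows where any of the three columns is NULL will not be treated as matching in either form.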

πŸ‘︎ 3
πŸ‘€︎ u/anacondaonline
πŸ“…︎ Jun 09 2021
[DEMO]: Using changelog streams for Flink SQL query optimization ververica.com/blog/sql-qu…
πŸ‘︎ 10
πŸ‘€︎ u/Marksfik
πŸ“…︎ May 12 2021
[DEMO]: Using changelog streams for Flink SQL query optimization ververica.com/blog/sql-qu…
πŸ‘︎ 12
πŸ‘€︎ u/Marksfik
πŸ“…︎ May 12 2021
Rule-based Query Optimization querifylabs.com/blog/rule…
πŸ‘︎ 10
πŸ‘€︎ u/devozerov
πŸ“…︎ May 08 2021
[DEMO]: Using changelog streams for Flink SQL query optimization ververica.com/blog/sql-qu…
πŸ‘︎ 3
πŸ‘€︎ u/Marksfik
πŸ“…︎ May 12 2021
Now Generally Available, Snowflake’s Search Optimization Service Accelerates Queries Dramatically | Snowflake Blog snowflake.com/blog/now-ge…
πŸ‘︎ 6
πŸ‘€︎ u/fhoffa
πŸ“…︎ Mar 08 2021
[DEMO]: Using changelog streams for Flink SQL query optimization ververica.com/blog/sql-qu…
πŸ‘︎ 2
πŸ‘€︎ u/Marksfik
πŸ“…︎ May 12 2021
I modified an SQL query from 24 mins down to 2 seconds - A tale of query optimization parallelthoughts.xyz/2019…
πŸ‘︎ 3k
πŸ‘€︎ u/NaeblisEcho
πŸ“…︎ May 19 2019
