23 Hilarious Query optimization Puns

DAX Measure to link and Query Optimization

Hello everyone,

So, I have been tasked with automating some reports for my organization. I'm relatively new to DAX so I request help with this one. Apologies if this goes very long.

I have a sales table like this:

1000s of products with lakhs of rows

And a stock table like this:

https://preview.redd.it/1pwawr6mt2c81.png?width=712&format=png&auto=webp&s=8c52ee71151e2ea7d5f32cc476dffb903374026c

I need to prepare a top 30 sales report as the main report. And for reconciliation, the rest of the products should be numbered as 31 and named as Others. So what I did is:

Took the data via Power Query
Created a separate table for Top 30 with an index (Let's call it Index Table)
Merged with the existing tables shown above
Changed the blanks (Products other than the Top 30) with Rank as 31 and Name as Others
Linked the Index table with the above 2 tables via relationships in Power Pivot.

The issue is, I need a subsidiary report with the top 5 supplier sales of the top 30 units along with the stock present with the respective suppliers. Below image for reference:

https://preview.redd.it/0ecubk5qt2c81.png?width=655&format=png&auto=webp&s=7a34ea43a70a83d6f8cb3d2ce79a277cc90b96e1

I took the first two columns from Index Table and the rest from the linked tables. However, I couldn't get the stock values as intended. It cites something related to relationships. I couldn't create any other relationship related to the supplier due to many duplicates present under "Others".

It's okay if this report doesn't have the 31st product. Top 30 is fine. However, I don't want to create a separate file for this altogether.

I want to know whether I could write any DAX measure to get the stock values. Also, is there any way I could optimize the way I arrived at the top 30 products?

Thank you.

👍︎ 5

💬︎

👤︎ u/Warlock_22

📅︎ Jan 16 2022

🚨︎ report

SQL Query Optimization with Indexing selectfrom.dev/sql-query-…

👍︎ 44

💬︎

👤︎ u/payopt

📅︎ Dec 27 2021

🚨︎ report

Query Optimization Question

I am trying to understand if there is a more performant way to achieve this outcome. I have a table (called SRC.Prescription_Fills) containing 1,954,935,946 records, all bogus data. I partitioned the table so each month and year is within its partition.

I want to get a record count for each year and month. I created the following query to generate this output.

    SELECT [Month] AS [Month],
        [2014], [2015], [2016], [2017], [2018], [2019], [2020]
    FROM
    (
        SELECT
            YEAR(Fill_Date_Time) AS [Year], 
            MONTH(Fill_Date_Time) AS [Month],
            Fill_Identifier
        FROM 
            SRC.Prescription_Fills WITH (NOLOCK)
    ) AS Prescription_Fills
    PIVOT
    (
        COUNT_BIG(Fill_Identifier)
        FOR [Year] IN ([2014], [2015], [2016], [2017], [2018], [2019], [2020])
    ) AS PVT
    ORDER BY 
        [Month];

The query plan for this query is https://www.brentozar.com/pastetheplan/?id=B1j0YbG9K.

The output of this query looks like this:

https://preview.redd.it/4865dh6nfw481.png?width=805&format=png&auto=webp&s=acd49703a93fb321bf287bd0bde033eb72110cf6

This query takes up to 16 minutes to run. Any ideas of what I can do to improve it?

👍︎ 4

💬︎

👤︎ u/kentmaxwell

📅︎ Dec 11 2021

🚨︎ report

Need help in Query Optimization

WITH release_cte AS
(SELECT release_nbr,
        base_div_name,
        country_code,
        house_nbr,
        xref_count,
        effective_release_date,
        ROW_NUMBER() OVER (PARTITION BY release_nbr
        ORDER BY create_ts ASC) AS row_nbr
FROM status_table
WHERE create_ts &gt;= '2021-03-27 18:43:50.307'
        AND house_nbr=32612
        AND country_code='US'
        AND process_status_code in (16, 4096)
        AND release_nbr &gt;= 0
        AND release_nbr NOT in
        (
            SELECT DISTINCT release_nbr
            FROM status_table
            WHERE create_ts &gt;= '2021-03-27 18:43:50.307'
            AND house_nbr=32612
            AND country_code='US'
            AND item_xref_id= -1
        ) 
)

SELECT release_nbr,
        base_div_name,
        country_code,
        house_nbr,
        xref_count,
        effective_release_date,
        MAX(row_nbr) AS row_nbr
FROM release_cte
GROUP BY release_nbr,
        base_div_name,
        country_code,
        house_nbr,
        xref_count,
        effective_release_date;

👍︎ 8

💬︎

👤︎ u/AnalogyOverSixes

📅︎ Nov 19 2021

🚨︎ report

EXPLAIN (ANALYZE) needs BUFFERS to improve the Postgres query optimization process postgres.ai/blog/20220106…

👍︎ 6

💬︎

👤︎ u/ImprovementBig3186

📅︎ Jan 07 2022

🚨︎ report

SQL Query Optimization: Understanding Key Principle hinty.io/devforth/sql-que…

👍︎ 2k

💬︎

👤︎ u/Zaiden-Rhys1

📅︎ May 27 2021

🚨︎ report

Indexing and query optimization

Hello everyone!

I'm newbie to MongoDB and I'm trying to do some query profiling and indexes creation on an Amazon dataset, for University project purpose. I have a doubt about indexing I couldn't clarify, hoping that this is not too trivial:

How can I choose a good index to speed up a very slow $lookup query? I can create indexes on both collections, on fields being queried, but is this really useful?

👍︎ 2

💬︎

👤︎ u/MATTIOLATO

📅︎ Nov 16 2021

🚨︎ report

Pro-tips for query optimization in a data warehouse?

Hi r/SQL,

I’m in a bit of uncharted waters currently. I’ve recently changed companies, and the amount of data I sort through has gone from localized servers for individual clients, to a full blown data warehouse with billions of rows in each and all tables. (MSP->large client)

The ad hoc report I’ve been working on is not difficult or fancy. However, I’m having to reference and join to about 10 tables with an astounding (To me) amount of data.

My question: How do I tackle this? This simple query is taking 2-3 hours to run, and even breaking it down further into individual selects with simple conditions is taking an hour to run individually. (Ex. Select X from Y where;)

Do I need to just run these queries off the clock or on a weekend? Any solutions I could try or that you’d recommend?

Edit: asked my boss the same question and he hit me with “Welcome to my world” hahaha

👍︎ 22

💬︎

👤︎ u/assblaster68

📅︎ Aug 09 2021

🚨︎ report

Good resources for database query optimization and schema design?

Title says it all! I need good resources on both topics

👍︎ 6

💬︎

👤︎ u/theprogrammingsteak

📅︎ Sep 26 2021

🚨︎ report

Who handles query optimization in large organisations?

I’m interested to understand which job profile handles database related issues like query optimization, query performance tuning etc. in large organisations?

👍︎ 30

💬︎

👤︎ u/drwho1990

📅︎ Jun 01 2021

🚨︎ report

Join Query Optimization

I have two tables,

BookingMetaData and BookingDetails (MySql)

Both are huge tables.

So if i do something like

SELECT * FROM BookingMetaData bm INNER JOIN BookingDetails bd on bm.id = bd.id WHERE bm.id > 5M ; (Assuming currently there are slightly more than 5M records)

or instead of WHERE,

If I put ORDER BY bm.id DESC limit 100;

then,

Will MySql try to join the tables first (5M records) and then filter, or will it be able to do some optimisation after merging a few records in some binary search way.

If not, how can I do such operation efficiently. ( I am not allowed to change the tables)

Any help is greatly appreciated. Thank you

👍︎ 2

💬︎

👤︎ u/Express_Rest_3753

📅︎ Sep 26 2021

🚨︎ report

Query Optimization in SQL

I am trying to figure out which one of these queries would be much faster (has less time complexity) and why.

Question 1: I am trying to find the top 10 distances.

Query 1

SELECT
    user_id,
    distance
FROM
    travel_table
ORDER BY
    distance
LIMIT 10;

Query 2

SELECT
    user_id,
    distance,
    rank() OVER (partition by distnace DESC)
FROM
    travel_table
WHERE
    rank &lt; 10;

Question 2: Finding top users

Query 1: Subquery

SELECT
    name,
    total_distance
FROM
(
    SELECT
        user_id,
        SUM(distance) AS total_distance
    FROM
        lyft_rides_log
    GROUP BY
        user_id
    ORDER BY
        total_distance DESC 
    LIMIT 10
) a
INNER JOIN 
    lyft_users b
ON 
    a.user_id = b.id;

Query 2: Inner join

SELECT
    name,
    SUM(distance) as total_distance
FROM
    lyft_rides_log r
INNER JOIN
    lyft_users u 
ON r.user_id = u.id
GROUP BY
    name
ORDER By
    total_distance DESC
LIMIT 10

👍︎ 5

💬︎

👤︎ u/brownstrom

📅︎ Jul 15 2021

🚨︎ report

Snowflake: Query Optimization Using Materialized View clark-perucho.medium.com/…

👍︎ 4

💬︎

👤︎ u/fhoffa

📅︎ Sep 06 2021

🚨︎ report

Query Optimization in SQL /r/dataengineering/commen…