Close Menu
    Main Menu
    • Home
    • News
    • Tech
    • Robotics
    • ML & Research
    • AI
    • Digital Transformation
    • AI Ethics & Regulation
    • Thought Leadership in AI

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    ShinyHunters Claims 1 Petabyte Information Breach at Telus Digital

    March 14, 2026

    Easy methods to Purchase Used or Refurbished Electronics (2026)

    March 14, 2026

    Rent Gifted Offshore Copywriters In The Philippines

    March 14, 2026
    Facebook X (Twitter) Instagram
    UK Tech InsiderUK Tech Insider
    Facebook X (Twitter) Instagram
    UK Tech InsiderUK Tech Insider
    Home»Machine Learning & Research»Most Candidates Fail These SQL Ideas in Information Interviews
    Machine Learning & Research

    Most Candidates Fail These SQL Ideas in Information Interviews

    Oliver ChambersBy Oliver ChambersSeptember 6, 2025No Comments20 Mins Read
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    Most Candidates Fail These SQL Ideas in Information Interviews
    Share
    Facebook Twitter LinkedIn Pinterest Email Copy Link


    Most Candidates Fail These SQL Concepts in Data Interviews
    Picture by creator | Canva

     

    An interviewer’s job is to seek out essentially the most appropriate candidates for the marketed place. In doing so, they’ll gladly arrange SQL interview inquiries to see if they’ll catch you off guard. There are a number of SQL ideas at which candidates typically fail.

    Hopefully, you’ll be a kind of who keep away from that future, as I’ll clarify these ideas intimately under, full with examples of find out how to resolve sure issues accurately.

     
    Most Candidates Fail These SQL Concepts in Data InterviewsMost Candidates Fail These SQL Concepts in Data Interviews
     

    # 1. Window Capabilities

     
    Why It’s Exhausting: Candidates memorize what every window operate does however don’t actually perceive how window frames, partitions, or ordering really work.

    Frequent Errors: A standard mistake just isn’t specifying ORDER BY in rating window capabilities or worth window capabilities, resembling LEAD() or LAG(), and anticipating the question to work or for the end result to be deterministic.

    Instance: In this instance, it’s essential discover customers who made a second buy inside 7 days of any earlier buy.

    You may write this question.

    WITH ordered_tx AS (
      SELECT user_id,
             created_at::date AS tx_date,
             LAG(created_at::DATE) OVER (PARTITION BY user_id) AS prev_tx_date
      FROM amazon_transactions
    )
    
    SELECT DISTINCT user_id
    FROM ordered_tx
    WHERE prev_tx_date IS NOT NULL AND tx_date - prev_tx_date <= 7;

     

    At first look, every part might sound proper. The code even outputs one thing which may seem like an accurate reply.

     
    Window FunctionsWindow Functions
     

    To start with, we’re fortunate that the code works in any respect! This occurs just because I’m writing it in PostgreSQL. In another SQL flavors, you’d get an error since ORDER BY is obligatory in rating and analytical window capabilities.

    Second, the output is unsuitable; I highlighted some rows that shouldn’t be there. Why do they seem, then?

    They seem as a result of we didn’t specify an ORDER BY clause within the LAG() window operate. With out it, the row order is bigoted. So, we’re evaluating the present transaction to some random earlier row for that consumer, not the one which occurred instantly earlier than it in time.

    This isn’t what the query asks. We have to examine every transaction to the earlier one by date. In different phrases, we have to specify this explicitly within the ORDER BY clause throughout the LAG() operate.

    WITH ordered_tx AS (
      SELECT user_id,
             created_at::date AS tx_date,
             LAG(created_at::DATE) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_tx_date
      FROM amazon_transactions
    )
    
    SELECT DISTINCT user_id
    FROM ordered_tx
    WHERE prev_tx_date IS NOT NULL AND tx_date - prev_tx_date <= 7;

     

    # 2. Filtering With Aggregates (Particularly HAVING vs. WHERE)

     
    Why It’s Exhausting: Individuals typically don’t perceive the execution order in SQL, which is: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY. This order signifies that WHERE filters rows earlier than aggregation, and HAVING filters after. That additionally, logically, means you can’t use mixture capabilities within the WHERE clause.

    Frequent Mistake: Attempting to make use of mixture capabilities in WHERE in a grouped question and getting an error.

    Instance: This interview query asks you to seek out the overall income made by every vineyard. Solely wineries the place 90 is the bottom variety of factors for any of their varieties must be thought-about.

    Many will see this as a straightforward query and unexpectedly write this question.

    SELECT vineyard,
           selection,
           SUM(value) AS total_revenue
    FROM winemag_p1
    WHERE MIN(factors) >= 90
    GROUP BY vineyard, selection
    ORDER BY vineyard, total_revenue DESC;

     

    Nevertheless, that code will throw an error stating that mixture capabilities are usually not allowed within the WHERE clause. This gorgeous a lot explains every part. The answer? Transfer the filtering situation from WHERE to HAVING.

    SELECT vineyard,
           selection,
           SUM(value) AS total_revenue
    FROM winemag_p1
    GROUP BY vineyard, selection
    HAVING MIN(factors) >= 90
    ORDER BY vineyard, total_revenue DESC;

     

    # 3. Self-Joins for Time-Primarily based or Occasion-Primarily based Comparisons

     
    Why It’s Exhausting: The thought of becoming a member of a desk with itself is kind of unintuitive, so candidates typically neglect it’s an possibility.

    Frequent Mistake: Utilizing subqueries and complicating the question when becoming a member of a desk with itself could be less complicated and sooner, particularly when filtering by dates or occasions.

    Instance: Right here’s a query asking you to indicate the change of each foreign money’s change fee between 1 January 2020 and 1 July 2020.

    You’ll be able to resolve this by writing an outer correlated subquery that fetches the July 1 change charges, then subtracts the January 1 change charges, which come from the internal subquery.

    SELECT jan_rates.source_currency,
      (SELECT exchange_rate 
       FROM sf_exchange_rate 
       WHERE source_currency = jan_rates.source_currency AND date="2020-07-01") - jan_rates.exchange_rate AS distinction
    FROM (SELECT source_currency, exchange_rate
          FROM sf_exchange_rate
          WHERE date="2020-01-01"
    ) AS jan_rates;

     

    This returns an accurate output, however such an answer is unnecessarily difficult. A a lot less complicated answer, with fewer strains of code, entails self-joining a desk with itself after which making use of two date filtering circumstances within the WHERE clause.

    SELECT jan.source_currency,
           jul.exchange_rate - jan.exchange_rate AS distinction
    FROM sf_exchange_rate jan
    JOIN sf_exchange_rate jul ON jan.source_currency = jul.source_currency
    WHERE jan.date="2020-01-01" AND jul.date="2020-07-01";

     

    # 4. Subqueries vs. Frequent Desk Expressions (CTEs)

     
    Why It’s Exhausting: Individuals typically get caught on subqueries as a result of they study them earlier than Frequent Desk Expressions (CTEs) and proceed utilizing them for any question with layered logic. Nevertheless, subqueries can get messy in a short time.

    Frequent Mistake: Utilizing deeply nested SELECT statements when CTEs could be a lot less complicated.

    Instance: Within the interview query from Google and Netflix, it’s essential discover the highest actors primarily based on their common film score throughout the style by which they seem most steadily.

    The answer utilizing CTEs is as follows.

    WITH genre_stats AS
      (SELECT actor_name,
              style,
              COUNT(*) AS movie_count,
              AVG(movie_rating) AS avg_rating
       FROM top_actors_rating
       GROUP BY actor_name,
                style),
                
    max_genre_count AS
      (SELECT actor_name,
              MAX(movie_count) AS max_count
       FROM genre_stats
       GROUP BY actor_name),
         
    top_genres AS
      (SELECT gs.*
       FROM genre_stats gs
       JOIN max_genre_count mgc ON gs.actor_name = mgc.actor_name
       AND gs.movie_count = mgc.max_count),
         
    top_genre_avg AS
      (SELECT actor_name,
              MAX(avg_rating) AS max_avg_rating
       FROM top_genres
       GROUP BY actor_name),
       
    filtered_top_genres AS
      (SELECT tg.*
       FROM top_genres tg
       JOIN top_genre_avg tga ON tg.actor_name = tga.actor_name
       AND tg.avg_rating = tga.max_avg_rating),
         ranked_actors AS
      (SELECT *,
              DENSE_RANK() OVER (
                                 ORDER BY avg_rating DESC) AS rank
       FROM filtered_top_genres),
       
    final_selection AS
      (SELECT MAX(rank) AS max_rank
       FROM ranked_actors
       WHERE rank <= 3)
       
    SELECT actor_name,
           style,
           avg_rating
    FROM ranked_actors
    WHERE rank <=
        (SELECT max_rank
         FROM final_selection);
    

     

    It’s comparatively difficult, nevertheless it nonetheless consists of six clear CTEs, with the code’s readability enhanced by clear aliases.

    Curious what the identical answer would appear to be utilizing solely subqueries? Right here it’s.

    SELECT ra.actor_name,
           ra.style,
           ra.avg_rating
    FROM (
        SELECT *,
               DENSE_RANK() OVER (ORDER BY avg_rating DESC) AS rank
        FROM (
            SELECT tg.*
            FROM (
                SELECT gs.*
                FROM (
                    SELECT actor_name,
                           style,
                           COUNT(*) AS movie_count,
                           AVG(movie_rating) AS avg_rating
                    FROM top_actors_rating
                    GROUP BY actor_name, style
                ) AS gs
                JOIN (
                    SELECT actor_name,
                           MAX(movie_count) AS max_count
                    FROM (
                        SELECT actor_name,
                               style,
                               COUNT(*) AS movie_count,
                               AVG(movie_rating) AS avg_rating
                        FROM top_actors_rating
                        GROUP BY actor_name, style
                    ) AS genre_stats
                    GROUP BY actor_name
                ) AS mgc
                ON gs.actor_name = mgc.actor_name AND gs.movie_count = mgc.max_count
            ) AS tg
            JOIN (
                SELECT actor_name,
                       MAX(avg_rating) AS max_avg_rating
                FROM (
                    SELECT gs.*
                    FROM (
                        SELECT actor_name,
                               style,
                               COUNT(*) AS movie_count,
                               AVG(movie_rating) AS avg_rating
                        FROM top_actors_rating
                        GROUP BY actor_name, style
                    ) AS gs
                    JOIN (
                        SELECT actor_name,
                               MAX(movie_count) AS max_count
                        FROM (
                            SELECT actor_name,
                                   style,
                                   COUNT(*) AS movie_count,
                                   AVG(movie_rating) AS avg_rating
                            FROM top_actors_rating
                            GROUP BY actor_name, style
                        ) AS genre_stats
                        GROUP BY actor_name
                    ) AS mgc
                    ON gs.actor_name = mgc.actor_name AND gs.movie_count = mgc.max_count
                ) AS top_genres
                GROUP BY actor_name
            ) AS tga
            ON tg.actor_name = tga.actor_name AND tg.avg_rating = tga.max_avg_rating
        ) AS filtered_top_genres
    ) AS ra
    WHERE ra.rank <= (
        SELECT MAX(rank)
        FROM (
            SELECT *,
                   DENSE_RANK() OVER (ORDER BY avg_rating DESC) AS rank
            FROM (
                SELECT tg.*
                FROM (
                    SELECT gs.*
                    FROM (
                        SELECT actor_name,
                               style,
                               COUNT(*) AS movie_count,
                               AVG(movie_rating) AS avg_rating
                        FROM top_actors_rating
                        GROUP BY actor_name, style
                    ) AS gs
                    JOIN (
                        SELECT actor_name,
                               MAX(movie_count) AS max_count
                        FROM (
                            SELECT actor_name,
                                   style,
                                   COUNT(*) AS movie_count,
                                   AVG(movie_rating) AS avg_rating
                            FROM top_actors_rating
                            GROUP BY actor_name, style
                        ) AS genre_stats
                        GROUP BY actor_name
                    ) AS mgc
                    ON gs.actor_name = mgc.actor_name AND gs.movie_count = mgc.max_count
                ) AS tg
                JOIN (
                    SELECT actor_name,
                           MAX(avg_rating) AS max_avg_rating
                    FROM (
                        SELECT gs.*
                        FROM (
                            SELECT actor_name,
                                   style,
                                   COUNT(*) AS movie_count,
                                   AVG(movie_rating) AS avg_rating
                            FROM top_actors_rating
                            GROUP BY actor_name, style
                        ) AS gs
                        JOIN (
                            SELECT actor_name,
                                   MAX(movie_count) AS max_count
                            FROM (
                                SELECT actor_name,
                                       style,
                                       COUNT(*) AS movie_count,
                                       AVG(movie_rating) AS avg_rating
                                FROM top_actors_rating
                                GROUP BY actor_name, style
                            ) AS genre_stats
                            GROUP BY actor_name
                        ) AS mgc
                        ON gs.actor_name = mgc.actor_name AND gs.movie_count = mgc.max_count
                    ) AS top_genres
                    GROUP BY actor_name
                ) AS tga
                ON tg.actor_name = tga.actor_name AND tg.avg_rating = tga.max_avg_rating
            ) AS filtered_top_genres
        ) AS ranked_actors
        WHERE rank <= 3
    );

     

    There may be redundant logic repeated throughout subqueries. What number of subqueries is that? I do not know. The code is unattainable to keep up. Despite the fact that I simply wrote it, I’d nonetheless want half a day to know it if I wished to alter one thing tomorrow. Moreover, the utterly meaningless subquery aliases don’t assist.

     

    # 5. Dealing with NULLs in Logic

     
    Why It’s Exhausting: Candidates typically assume that NULL is the same as one thing. It’s not. NULL isn’t equal to something — not even itself. Logic involving NULLs behaves in another way from logic involving precise values.

    Frequent Mistake: Utilizing = NULL as an alternative of IS NULL in filtering or lacking output rows as a result of NULLs break the situation logic.

    Instance: There’s an interview query by IBM that asks you to calculate the overall variety of interactions and the overall variety of contents created for every buyer.

    It doesn’t sound too tough, so that you may write this answer with two CTEs, the place one CTE counts the variety of interactions per buyer, whereas the opposite counts the variety of content material objects created by a buyer. Within the remaining SELECT, you FULL OUTER JOIN the 2 CTEs, and you’ve got the answer. Proper?

    WITH interactions_summary AS
      (SELECT customer_id,
              COUNT(*) AS total_interactions
       FROM customer_interactions
       GROUP BY customer_id),
       
    content_summary AS
      (SELECT customer_id,
              COUNT(*) AS total_content_items
       FROM user_content
       GROUP BY customer_id)
       
    SELECT i.customer_id,
      i.total_interactions,
      c.total_content_items
    FROM interactions_summary AS i
    FULL OUTER JOIN content_summary AS c ON i.customer_id = c.customer_id
    ORDER BY customer_id;

     

    Virtually proper. Right here’s the output. (By the best way, you see double citation marks (“”) as an alternative of NULL. That’s how the StrataScratch UI shows it, however belief me, the engine nonetheless treats them for what they’re: NULL values).

     
    5. Handling NULLs in Logic5. Handling NULLs in Logic
     

    The highlighted rows comprise NULLs. This makes the output incorrect. A NULL worth is neither the client ID nor the variety of interactions and contents, which the query explicitly asks you to indicate.

    What we’re lacking within the above answer is COALESCE() to deal with NULLs within the remaining SELECT. Now, all the purchasers with out interactions will get their IDs from the content_summary CTE. Additionally, for patrons that don’t have interactions, or content material, or each, we’ll now exchange NULL with 0, which is a sound quantity.

    WITH interactions_summary AS
      (SELECT customer_id,
              COUNT(*) AS total_interactions
       FROM customer_interactions
       GROUP BY customer_id),
       
    content_summary AS
      (SELECT customer_id,
              COUNT(*) AS total_content_items
       FROM user_content
       GROUP BY customer_id)
       
    SELECT COALESCE(i.customer_id, c.customer_id) AS customer_id,
           COALESCE(i.total_interactions, 0) AS total_interactions,
           COALESCE(c.total_content_items, 0) AS total_content_items
    FROM interactions_summary AS i
    FULL OUTER JOIN content_summary AS c ON i.customer_id = c.customer_id
    ORDER BY customer_id;

     

    # 6. Group-Primarily based Deduplication

     
    Why It’s Exhausting: Group-based deduplication means you’re choosing one row per group, e.g., “most up-to-date”, “highest rating”, and so on. At first, it sounds such as you solely want to choose one row per consumer. However you may’t use GROUP BY until you mixture. Alternatively, you typically want a full row, not a single worth that aggregation and GROUP BY return.

    Frequent Mistake: Utilizing GROUP BY + LIMIT 1 (or DISTINCT ON, which is PostgreSQL-specific) as an alternative of ROW_NUMBER() or RANK(), the latter if you’d like ties included.

    Instance: This query asks you to establish the best-selling merchandise for every month, and there’s no must separate months by yr. The most effective-selling merchandise is calculated as unitprice * amount.

    The naive strategy could be this. First, extract the sale month from invoicedate, choose description, and discover the overall gross sales by summing unitprice * amount. Then, to get the overall gross sales by month and product description, we merely GROUP BY these two columns. Lastly, we solely want to make use of ORDER BY to type the output from the very best to the worst-selling product and use LIMIT 1 to output solely the primary row, i.e., the best-selling merchandise.

    SELECT DATE_PART('MONTH', invoicedate) AS sale_month,
           description,
           SUM(unitprice * amount) AS total_paid
    FROM online_retail
    GROUP BY sale_month, description
    ORDER BY total_paid DESC
    LIMIT 1;

     

    As I stated, that is naive; the output considerably resembles what we want, however we want this for each month, not only one.

     
    Group-Based DeduplicationGroup-Based Deduplication
     

    One of many appropriate approaches is to make use of the RANK() window operate. With this strategy, we comply with an analogous technique to the earlier code. The distinction is that the question now turns into a subquery within the FROM clause. As well as, we use RANK() to partition the info by month after which rank the rows inside every partition (i.e., for every month individually) from the best-selling to the worst-selling merchandise.

    Then, in the principle question, we merely choose the required columns and output solely rows the place the rank is 1 utilizing the WHERE clause.

    SELECT month,
           description,
           total_paid
    FROM
      (SELECT DATE_PART('month', invoicedate) AS month,
              description,
              SUM(unitprice * amount) AS total_paid,
              RANK() OVER (PARTITION BY DATE_PART('month', invoicedate) ORDER BY SUM(unitprice * amount) DESC) AS rnk
       FROM online_retail
       GROUP BY month, description) AS tmp
    WHERE rnk = 1;

     

     

    # Conclusion

     
    The six ideas we’ve coated generally seem in SQL coding interview questions. Take note of them, observe interview questions that contain these ideas, study the proper approaches, and also you’ll considerably enhance your possibilities in your interviews.
     
     

    Nate Rosidi is a knowledge scientist and in product technique. He is additionally an adjunct professor instructing analytics, and is the founding father of StrataScratch, a platform serving to information scientists put together for his or her interviews with actual interview questions from high firms. Nate writes on the newest traits within the profession market, offers interview recommendation, shares information science tasks, and covers every part SQL.



    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Oliver Chambers
    • Website

    Related Posts

    5 Highly effective Python Decorators for Excessive-Efficiency Information Pipelines

    March 14, 2026

    What OpenClaw Reveals In regards to the Subsequent Part of AI Brokers – O’Reilly

    March 14, 2026

    mAceReason-Math: A Dataset of Excessive-High quality Multilingual Math Issues Prepared For RLVR

    March 14, 2026
    Top Posts

    Evaluating the Finest AI Video Mills for Social Media

    April 18, 2025

    Utilizing AI To Repair The Innovation Drawback: The Three Step Resolution

    April 18, 2025

    Midjourney V7: Quicker, smarter, extra reasonable

    April 18, 2025

    Meta resumes AI coaching utilizing EU person knowledge

    April 18, 2025
    Don't Miss

    ShinyHunters Claims 1 Petabyte Information Breach at Telus Digital

    By Declan MurphyMarch 14, 2026

    The Canadian telecoms large Telus is at present selecting up the items after a large…

    Easy methods to Purchase Used or Refurbished Electronics (2026)

    March 14, 2026

    Rent Gifted Offshore Copywriters In The Philippines

    March 14, 2026

    5 Highly effective Python Decorators for Excessive-Efficiency Information Pipelines

    March 14, 2026
    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    UK Tech Insider
    Facebook X (Twitter) Instagram
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms Of Service
    • Our Authors
    © 2026 UK Tech Insider. All rights reserved by UK Tech Insider.

    Type above and press Enter to search. Press Esc to cancel.