What is the difference between WHERE and HAVING in SQL?

WHERE filters rows before grouping and aggregation, while HAVING filters groups after aggregation has been applied. You cannot use aggregate functions in a WHERE clause but you can in HAVING. For example, `WHERE salary > 80000` filters individual employees before grouping, but `HAVING AVG(salary) > 80000` filters departments based on their aggregated average salary.

What is the difference between ROW_NUMBER, RANK, and DENSE_RANK?

ROW_NUMBER assigns a unique sequential number to every row with no ties (1, 2, 3, 4, 5). RANK assigns the same rank to ties and skips numbers (1, 2, 2, 4, 5). DENSE_RANK assigns the same rank to ties without skipping (1, 2, 2, 3, 4). For an Nth-highest-salary problem, DENSE_RANK is the most robust choice because it handles ties correctly.

When should I use a CTE versus a subquery in SQL?

CTEs improve readability for complex queries because they are named and defined at the top, can be referenced multiple times in the same query, and support recursion (subqueries cannot recurse). Most modern optimizers treat CTEs and subqueries similarly, though some databases like older PostgreSQL versions materialize CTEs. Reach for a CTE when the same intermediate result is used more than once or when the query is hard to follow inline.

What are the most common causes of slow SQL queries?

Missing indexes on WHERE/JOIN/ORDER BY columns; SELECT * instead of specific columns; using functions on indexed columns in WHERE clauses (e.g., `YEAR(order_date) = 2026` instead of a range); the N+1 query problem where you make N separate queries inside a loop instead of one JOIN; and missing LIMIT on large result sets. Use EXPLAIN ANALYZE to find Seq Scans where Index Scans should be.

By PhantomCode TeamPublished March 11, 2026Last reviewed April 29, 202613 min read

50 Essential SQL Interview Questions for Software Engineers (2026)

SQL remains one of the most important skills for software engineers, regardless of your specialization. Whether you are building backend services, working on data pipelines, or designing APIs that interact with databases, SQL proficiency is tested in interviews at virtually every technology company.

SQL Interview Questions

This guide covers 50 essential SQL interview questions organized by difficulty and topic. Each question includes a clear explanation and a working SQL example that you can practice with. We assume a standard relational database (PostgreSQL or MySQL syntax) throughout.

Foundational SQL Concepts (Questions 1-10)
JOINs and Relationships (Questions 11-20)
Aggregation and Grouping (Questions 21-28)
Subqueries and Nested Queries (Questions 29-34)
Window Functions (Questions 35-42)
Common Table Expressions (Questions 43-46)
Query Optimization and Indexing (Questions 47-50)
Tips for SQL Interviews
FAQ
Conclusion

Sample Database Schema

For the examples in this guide, we will use the following tables:

-- Employees table
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    salary DECIMAL(10, 2),
    hire_date DATE,
    manager_id INT
);
 
-- Departments table
CREATE TABLE departments (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    location VARCHAR(100)
);
 
-- Orders table
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    amount DECIMAL(10, 2),
    order_date DATE,
    status VARCHAR(20)
);
 
-- Products table
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10, 2)
);
 
-- Customers table
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    city VARCHAR(100),
    signup_date DATE
);

Foundational SQL Concepts (Questions 1-10)

Question 1: What is the difference between WHERE and HAVING?

Answer: WHERE filters rows before grouping and aggregation. HAVING filters groups after aggregation has been applied. You cannot use aggregate functions in a WHERE clause, but you can in a HAVING clause.

-- WHERE: filter individual rows
SELECT name, salary
FROM employees
WHERE salary > 80000;
 
-- HAVING: filter after aggregation
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 80000;

Question 2: What is the difference between DELETE, TRUNCATE, and DROP?

Answer:

DELETE removes specific rows from a table. It can include a WHERE clause and is logged per row (can be rolled back). It does not reset auto-increment counters.
TRUNCATE removes all rows from a table. It is faster than DELETE because it is minimally logged. It resets auto-increment counters. It cannot include a WHERE clause.
DROP removes the entire table, including its structure, data, indexes, and constraints.

DELETE FROM employees WHERE department_id = 5;
TRUNCATE TABLE employees;
DROP TABLE employees;

Question 3: What are the different types of SQL constraints?

Answer: Constraints enforce rules on data in a table:

PRIMARY KEY - Uniquely identifies each row. Cannot be NULL.
FOREIGN KEY - References the primary key of another table. Enforces referential integrity.
UNIQUE - Ensures all values in a column are distinct. Allows one NULL (varies by database).
NOT NULL - Prevents NULL values in a column.
CHECK - Validates that values meet a specified condition.
DEFAULT - Provides a default value when none is specified.

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE,
    salary DECIMAL(10, 2) CHECK (salary > 0),
    department_id INT,
    status VARCHAR(20) DEFAULT 'active',
    FOREIGN KEY (department_id) REFERENCES departments(id)
);

Question 4: What is the difference between UNION and UNION ALL?

Answer: UNION combines results from two queries and removes duplicate rows. UNION ALL combines results without removing duplicates. UNION ALL is faster because it skips the deduplication step.

-- UNION: removes duplicates
SELECT city FROM customers
UNION
SELECT location FROM departments;
 
-- UNION ALL: keeps duplicates
SELECT city FROM customers
UNION ALL
SELECT location FROM departments;

Use UNION ALL when you know there are no duplicates or when duplicates are acceptable. It performs significantly better on large datasets.

Question 5: Explain NULL handling in SQL.

Answer: NULL represents the absence of a value. It is not the same as zero, an empty string, or false. Any comparison with NULL returns NULL (not true or false), which is why you must use IS NULL or IS NOT NULL instead of = or !=.

-- Correct way to check for NULL
SELECT name FROM employees WHERE manager_id IS NULL;
 
-- This will NOT work as expected
SELECT name FROM employees WHERE manager_id = NULL;
 
-- COALESCE returns the first non-NULL value
SELECT name, COALESCE(manager_id, 0) AS manager
FROM employees;
 
-- NULLIF returns NULL if two expressions are equal
SELECT NULLIF(salary, 0) AS salary FROM employees;

Question 6: What is the order of execution of a SQL query?

Answer: SQL queries execute in a specific order that differs from the written order:

FROM and JOIN - Determine the data sources
WHERE - Filter individual rows
GROUP BY - Group rows by specified columns
HAVING - Filter groups
SELECT - Choose which columns to return
DISTINCT - Remove duplicate rows
ORDER BY - Sort the result set
LIMIT / OFFSET - Restrict the number of rows returned

Understanding this order explains why you cannot use a column alias defined in SELECT inside a WHERE clause (WHERE executes before SELECT).

Question 7: Write a query to find the second highest salary.

Answer:

-- Using LIMIT and OFFSET
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
 
-- Using a subquery
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
 
-- Using DENSE_RANK (most robust for Nth highest)
SELECT salary
FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
    FROM employees
) ranked
WHERE rank = 2;

Question 8: What is the difference between CHAR and VARCHAR?

Answer: CHAR(n) stores fixed-length strings. If the stored string is shorter than n, it is padded with spaces. VARCHAR(n) stores variable-length strings up to n characters without padding. Use CHAR when the length is always the same (like country codes). Use VARCHAR for variable-length data (like names or emails).

Question 9: Write a query to find duplicate records.

Answer:

-- Find duplicate emails
SELECT email, COUNT(*) AS occurrence_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;
 
-- Find and display all duplicate rows with details
SELECT c.*
FROM customers c
INNER JOIN (
    SELECT email
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1
) dupes ON c.email = dupes.email
ORDER BY c.email;

Question 10: What is the difference between IN and EXISTS?

Answer: Both can check for the presence of values, but they work differently:

IN compares a value against a list of values returned by a subquery. It materializes the subquery result.
EXISTS checks whether a correlated subquery returns any rows. It stops as soon as it finds a match.

-- Using IN
SELECT name FROM employees
WHERE department_id IN (
    SELECT id FROM departments WHERE location = 'New York'
);
 
-- Using EXISTS (often faster for large datasets)
SELECT e.name FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d
    WHERE d.id = e.department_id AND d.location = 'New York'
);

EXISTS is generally more efficient when the outer table is small and the inner table is large. IN can be more efficient when the subquery result set is small.

JOINs and Relationships (Questions 11-20)

Question 11: Explain the different types of JOINs.

Answer:

INNER JOIN - Returns only rows that have matching values in both tables.
LEFT JOIN (LEFT OUTER JOIN) - Returns all rows from the left table and matching rows from the right table. Non-matching rows get NULL for right table columns.
RIGHT JOIN (RIGHT OUTER JOIN) - Returns all rows from the right table and matching rows from the left table.
FULL OUTER JOIN - Returns all rows from both tables, with NULLs where there is no match.
CROSS JOIN - Returns the Cartesian product of both tables (every row combined with every other row).
SELF JOIN - A table joined with itself, used for hierarchical or comparison queries.

-- INNER JOIN: employees with their department names
SELECT e.name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
 
-- LEFT JOIN: all employees, including those without a department
SELECT e.name, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
 
-- FULL OUTER JOIN: all employees and all departments
SELECT e.name, d.name AS department
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id;

Question 12: Write a query to find employees who do not belong to any department.

-- Using LEFT JOIN
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.id IS NULL;
 
-- Using NOT EXISTS
SELECT e.name
FROM employees e
WHERE NOT EXISTS (
    SELECT 1 FROM departments d WHERE d.id = e.department_id
);
 
-- Using NOT IN (be careful with NULLs)
SELECT e.name
FROM employees e
WHERE e.department_id NOT IN (
    SELECT id FROM departments WHERE id IS NOT NULL
);

Question 13: Write a self-join to find each employee's manager name.

SELECT
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Question 14: Find departments that have more than 5 employees.

SELECT d.name, COUNT(e.id) AS employee_count
FROM departments d
INNER JOIN employees e ON d.id = e.department_id
GROUP BY d.name
HAVING COUNT(e.id) > 5
ORDER BY employee_count DESC;

Question 15: Write a query to find customers who have never placed an order.

SELECT c.name, c.email
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;

Question 16: Find employees who earn more than their manager.

SELECT
    e.name AS employee,
    e.salary AS employee_salary,
    m.name AS manager,
    m.salary AS manager_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Question 17: What is a natural join and why should you avoid it?

Answer: A NATURAL JOIN automatically joins tables on columns with the same name. It should generally be avoided because it is implicit and fragile. Adding a column to either table can silently change the join behavior and produce incorrect results.

-- Natural join (avoid in production code)
SELECT * FROM employees NATURAL JOIN departments;
 
-- Explicit join (preferred)
SELECT * FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

Question 18: Write a query to find products that have been ordered by customers in every city.

SELECT p.name
FROM products p
WHERE NOT EXISTS (
    SELECT DISTINCT c.city
    FROM customers c
    WHERE NOT EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.product_id = p.id
          AND o.customer_id = c.id
    )
);

This is a relational division query, one of the harder SQL patterns to write correctly.

Question 19: Explain the difference between ON and WHERE in a LEFT JOIN.

Answer: In a LEFT JOIN, conditions in the ON clause filter the right table before the join, while conditions in the WHERE clause filter the result after the join. This distinction does not matter for INNER JOIN but is critical for outer joins.

-- ON: keeps all employees, only joins active departments
SELECT e.name, d.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
    AND d.location = 'New York';
 
-- WHERE: filters out employees without a New York department
SELECT e.name, d.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.location = 'New York';

Question 20: Write a query to find the top 3 customers by total order value.

SELECT
    c.name,
    SUM(o.amount) AS total_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.name
ORDER BY total_spent DESC
LIMIT 3;

Aggregation and Grouping (Questions 21-28)

Question 21: What aggregate functions does SQL support?

Answer: The standard aggregate functions are:

COUNT() - Counts the number of rows or non-NULL values
SUM() - Calculates the total of numeric values
AVG() - Calculates the average of numeric values
MIN() - Finds the minimum value
MAX() - Finds the maximum value

SELECT
    department_id,
    COUNT(*) AS total_employees,
    SUM(salary) AS total_salary,
    AVG(salary) AS avg_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary
FROM employees
GROUP BY department_id;

Question 22: What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?

-- COUNT(*): counts all rows, including NULLs
SELECT COUNT(*) FROM employees;
 
-- COUNT(column): counts non-NULL values in the column
SELECT COUNT(manager_id) FROM employees;
 
-- COUNT(DISTINCT column): counts unique non-NULL values
SELECT COUNT(DISTINCT department_id) FROM employees;

Question 23: Find the department with the highest average salary.

SELECT d.name, AVG(e.salary) AS avg_salary
FROM departments d
INNER JOIN employees e ON d.id = e.department_id
GROUP BY d.name
ORDER BY avg_salary DESC
LIMIT 1;

Question 24: Write a query to calculate a running total of orders by date.

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders
ORDER BY order_date;

Question 25: Find the percentage of total salary each department represents.

SELECT
    d.name AS department,
    SUM(e.salary) AS dept_salary,
    ROUND(
        SUM(e.salary) * 100.0 / (SELECT SUM(salary) FROM employees),
        2
    ) AS percentage
FROM departments d
INNER JOIN employees e ON d.id = e.department_id
GROUP BY d.name
ORDER BY percentage DESC;

Question 26: Find months where total order value exceeded 100,000.

SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS monthly_total
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
HAVING SUM(amount) > 100000
ORDER BY month;

Question 27: Calculate year-over-year growth in order volume.

SELECT
    curr.year,
    curr.order_count,
    prev.order_count AS prev_year_count,
    ROUND(
        (curr.order_count - prev.order_count) * 100.0
        / NULLIF(prev.order_count, 0),
        2
    ) AS yoy_growth_pct
FROM (
    SELECT EXTRACT(YEAR FROM order_date) AS year, COUNT(*) AS order_count
    FROM orders
    GROUP BY EXTRACT(YEAR FROM order_date)
) curr
LEFT JOIN (
    SELECT EXTRACT(YEAR FROM order_date) AS year, COUNT(*) AS order_count
    FROM orders
    GROUP BY EXTRACT(YEAR FROM order_date)
) prev ON curr.year = prev.year + 1
ORDER BY curr.year;

Question 28: Find customers whose average order value is above the overall average.

SELECT
    c.name,
    AVG(o.amount) AS avg_order_value
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.name
HAVING AVG(o.amount) > (SELECT AVG(amount) FROM orders)
ORDER BY avg_order_value DESC;

Subqueries and Nested Queries (Questions 29-34)

Question 29: What are correlated vs non-correlated subqueries?

Answer: A non-correlated subquery runs independently of the outer query. It is executed once and its result is used by the outer query. A correlated subquery references columns from the outer query and is executed once for each row of the outer query.

-- Non-correlated: the subquery runs once
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
-- Correlated: the subquery runs for each employee
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = e.department_id
);

Question 30: Find employees who earn more than the average salary of their department.

SELECT e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
)
ORDER BY e.department_id, e.salary DESC;

Question 31: Find the most recent order for each customer.

-- Using a correlated subquery
SELECT o.*
FROM orders o
WHERE o.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
);
 
-- Using a JOIN with aggregation (often more efficient)
SELECT o.*
FROM orders o
INNER JOIN (
    SELECT customer_id, MAX(order_date) AS max_date
    FROM orders
    GROUP BY customer_id
) latest ON o.customer_id = latest.customer_id
    AND o.order_date = latest.max_date;

Question 32: Write a query using a subquery in the SELECT clause.

SELECT
    e.name,
    e.salary,
    (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) AS dept_avg,
    e.salary - (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) AS diff_from_avg
FROM employees e
ORDER BY diff_from_avg DESC;

Question 33: Find products that have never been ordered.

SELECT p.name
FROM products p
WHERE p.id NOT IN (
    SELECT DISTINCT product_id FROM orders WHERE product_id IS NOT NULL
);
 
-- More robust version using NOT EXISTS
SELECT p.name
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.product_id = p.id
);

Question 34: Find departments where all employees earn above 50,000.

SELECT d.name
FROM departments d
WHERE NOT EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.id
      AND e.salary <= 50000
)
AND EXISTS (
    SELECT 1 FROM employees e WHERE e.department_id = d.id
);

Window Functions (Questions 35-42)

Window functions are one of the most important advanced SQL topics for interviews. They perform calculations across a set of rows that are related to the current row without collapsing the result set.

Question 35: Explain the difference between ROW_NUMBER, RANK, and DENSE_RANK.

SELECT
    name,
    department_id,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK() OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

ROW_NUMBER() - Assigns a unique sequential number to each row. No ties: 1, 2, 3, 4, 5.
RANK() - Assigns the same rank to ties and skips numbers: 1, 2, 2, 4, 5.
DENSE_RANK() - Assigns the same rank to ties without skipping: 1, 2, 2, 3, 4.

Question 36: Find the top 3 earners in each department.

SELECT name, department_id, salary
FROM (
    SELECT
        name,
        department_id,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department_id
            ORDER BY salary DESC
        ) AS rank
    FROM employees
) ranked
WHERE rank <= 3
ORDER BY department_id, salary DESC;

Question 37: Calculate a moving average of order amounts over the last 7 days.

SELECT
    order_date,
    amount,
    AVG(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7day
FROM orders
ORDER BY order_date;

Question 38: Find the difference between each employee's salary and the next highest salary in their department.

SELECT
    name,
    department_id,
    salary,
    LEAD(salary) OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
    ) AS next_lower_salary,
    salary - LEAD(salary) OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
    ) AS salary_gap
FROM employees
ORDER BY department_id, salary DESC;

Question 39: What is the difference between ROWS and RANGE in window frames?

Answer: ROWS defines the frame based on physical row positions. RANGE defines the frame based on value ranges. The difference matters when there are duplicate values in the ORDER BY column.

-- ROWS: exactly 2 preceding rows
SELECT
    order_date,
    amount,
    SUM(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS sum_rows
FROM orders;
 
-- RANGE: all rows with order_date within range of current date
SELECT
    order_date,
    amount,
    SUM(amount) OVER (
        ORDER BY order_date
        RANGE BETWEEN INTERVAL '2' DAY PRECEDING AND CURRENT ROW
    ) AS sum_range
FROM orders;

Question 40: Calculate the cumulative distribution of salaries.

SELECT
    name,
    salary,
    CUME_DIST() OVER (ORDER BY salary) AS cumulative_dist,
    PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank,
    NTILE(4) OVER (ORDER BY salary) AS salary_quartile
FROM employees
ORDER BY salary;

Question 41: Find consecutive days where orders were placed.

WITH daily_orders AS (
    SELECT DISTINCT order_date
    FROM orders
),
grouped AS (
    SELECT
        order_date,
        order_date - ROW_NUMBER() OVER (ORDER BY order_date) * INTERVAL '1 day' AS grp
    FROM daily_orders
)
SELECT
    MIN(order_date) AS streak_start,
    MAX(order_date) AS streak_end,
    COUNT(*) AS consecutive_days
FROM grouped
GROUP BY grp
HAVING COUNT(*) > 1
ORDER BY streak_start;

Question 42: Calculate month-over-month growth using LAG.

WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(amount) AS revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
    ROUND(
        (revenue - LAG(revenue) OVER (ORDER BY month))
        * 100.0 / NULLIF(LAG(revenue) OVER (ORDER BY month), 0),
        2
    ) AS mom_growth_pct
FROM monthly_revenue
ORDER BY month;

Common Table Expressions (Questions 43-46)

Question 43: What is a CTE and when should you use one?

Answer: A Common Table Expression (CTE) is a temporary named result set defined within a WITH clause. It exists only during the execution of the query. CTEs improve readability, allow you to reference the same subquery multiple times, and enable recursive queries.

WITH high_earners AS (
    SELECT name, salary, department_id
    FROM employees
    WHERE salary > 100000
)
SELECT d.name AS department, COUNT(*) AS high_earner_count
FROM high_earners h
INNER JOIN departments d ON h.department_id = d.id
GROUP BY d.name
ORDER BY high_earner_count DESC;

Question 44: Write a recursive CTE to display an organizational hierarchy.

WITH RECURSIVE org_chart AS (
    -- Base case: top-level managers (no manager)
    SELECT
        id,
        name,
        manager_id,
        1 AS level,
        name AS path
    FROM employees
    WHERE manager_id IS NULL
 
    UNION ALL
 
    -- Recursive case: employees with managers
    SELECT
        e.id,
        e.name,
        e.manager_id,
        oc.level + 1,
        oc.path || ' > ' || e.name
    FROM employees e
    INNER JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT name, level, path
FROM org_chart
ORDER BY path;

Question 45: Use multiple CTEs to build a complex report.

WITH dept_stats AS (
    SELECT
        department_id,
        COUNT(*) AS emp_count,
        AVG(salary) AS avg_salary,
        MAX(salary) AS max_salary
    FROM employees
    GROUP BY department_id
),
dept_orders AS (
    SELECT
        e.department_id,
        COUNT(o.id) AS order_count,
        SUM(o.amount) AS total_revenue
    FROM employees e
    INNER JOIN orders o ON e.id = o.customer_id
    GROUP BY e.department_id
)
SELECT
    d.name AS department,
    ds.emp_count,
    ROUND(ds.avg_salary, 2) AS avg_salary,
    COALESCE(do2.order_count, 0) AS orders_handled,
    COALESCE(do2.total_revenue, 0) AS revenue_generated
FROM departments d
INNER JOIN dept_stats ds ON d.id = ds.department_id
LEFT JOIN dept_orders do2 ON d.id = do2.department_id
ORDER BY revenue_generated DESC;

Question 46: What is the difference between a CTE and a subquery?

Answer: Functionally, CTEs and subqueries can often achieve the same result. The key differences are:

Readability. CTEs are named and defined at the top of the query, making complex queries easier to read and maintain.
Reusability. A CTE can be referenced multiple times in the same query. A subquery must be duplicated if used in multiple places.
Recursion. Only CTEs support recursive queries. Subqueries cannot be recursive.
Performance. In most databases, the optimizer treats CTEs and subqueries similarly. However, some databases (like PostgreSQL before version 12) materialize CTEs, which can impact performance.

Query Optimization and Indexing (Questions 47-50)

Question 47: What is an index and how does it improve query performance?

Answer: An index is a data structure (typically a B-tree) that allows the database to find rows quickly without scanning the entire table. It works like a book's index: instead of reading every page to find a topic, you look up the topic in the index and go directly to the relevant page.

-- Create an index on the email column
CREATE INDEX idx_customers_email ON customers(email);
 
-- Composite index on multiple columns
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, order_date);
 
-- Unique index
CREATE UNIQUE INDEX idx_customers_email_unique
ON customers(email);

When indexes help:

WHERE clauses that filter on indexed columns
JOIN conditions on indexed columns
ORDER BY on indexed columns
GROUP BY on indexed columns

When indexes hurt:

INSERT, UPDATE, and DELETE operations become slower because the index must be maintained
Indexes consume additional storage space
Too many indexes can confuse the query optimizer

Question 48: Explain the EXPLAIN command and how to read a query plan.

Answer: EXPLAIN shows the execution plan that the database will use to run a query. It reveals whether indexes are being used, what join algorithms are chosen, and where the bottlenecks are.

EXPLAIN ANALYZE
SELECT e.name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id
WHERE e.salary > 80000;

Key things to look for in a query plan:

Seq Scan (Sequential Scan) - The database is reading the entire table row by row. This is slow for large tables and usually indicates a missing index.
Index Scan - The database is using an index to find relevant rows. This is efficient.
Hash Join / Merge Join / Nested Loop - The join algorithm being used. Hash joins are good for large tables. Nested loops are good when one table is small.
Sort - An explicit sort operation, which can be expensive for large result sets.
Actual rows vs estimated rows - A large discrepancy indicates that statistics are outdated and the optimizer is making suboptimal decisions.

Question 49: What are common causes of slow SQL queries and how do you fix them?

Answer:

Problem 1: Missing indexes.

-- Slow: full table scan
SELECT * FROM orders WHERE customer_id = 42;
 
-- Fix: add an index
CREATE INDEX idx_orders_customer ON orders(customer_id);

Problem 2: SELECT * instead of selecting specific columns.

-- Slow: retrieves all columns
SELECT * FROM orders WHERE status = 'completed';
 
-- Better: only retrieve needed columns
SELECT id, amount, order_date FROM orders WHERE status = 'completed';

Problem 3: Using functions on indexed columns in WHERE clauses.

-- Slow: the function prevents index usage
SELECT * FROM orders WHERE YEAR(order_date) = 2026;
 
-- Better: use a range condition
SELECT * FROM orders
WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01';

Problem 4: N+1 query problem.

Instead of making N separate queries inside a loop, use a single JOIN or IN clause to fetch all related data at once.

Problem 5: Not using LIMIT for large result sets.

-- Potentially slow: returns millions of rows
SELECT * FROM orders ORDER BY order_date DESC;
 
-- Better: limit the results
SELECT * FROM orders ORDER BY order_date DESC LIMIT 100;

Question 50: What is query optimization and what techniques should you know?

Answer: Query optimization is the process of rewriting queries to execute faster. Key techniques include:

Use EXISTS instead of IN for large subqueries:

-- Potentially slow for large subquery results
SELECT name FROM employees
WHERE department_id IN (SELECT id FROM departments WHERE location = 'NYC');
 
-- Often faster
SELECT name FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d
    WHERE d.id = e.department_id AND d.location = 'NYC'
);

Avoid DISTINCT when possible by rewriting with EXISTS or using proper JOINs that do not produce duplicates.

Use appropriate data types. Joining on integers is faster than joining on strings. Store dates as DATE type, not as strings.

Partition large tables to divide them into smaller, manageable segments based on a key like date or region.

Use covering indexes that include all columns needed by a query, eliminating the need to access the base table:

-- This query can be answered entirely from the index
CREATE INDEX idx_orders_covering
ON orders(customer_id, order_date, amount);
 
SELECT customer_id, order_date, amount
FROM orders
WHERE customer_id = 42;

Tips for SQL Interviews

Practice writing SQL by hand. Many interviews use a whiteboard or a simple text editor without auto-complete or syntax highlighting. Practice writing correct SQL without IDE assistance.
Understand the execution order. Knowing that WHERE runs before SELECT helps you avoid common mistakes like referencing aliases in WHERE clauses.
Always consider NULLs. Many SQL bugs stem from unexpected NULL behavior. When using NOT IN with a subquery, ensure the subquery cannot return NULLs, or use NOT EXISTS instead.
Think about edge cases. What happens when a table is empty? What if there are ties? What if there are NULL values in JOIN columns?
Explain your approach. Talk through your query design before writing it. State what tables you need, what joins are required, and what aggregations you will use.
Start simple and build up. Write the simplest version of the query first, verify it works, then add complexity (aggregations, window functions, subqueries).
Use Phantom Code for SQL practice. Phantom Code supports SQL interview preparation with AI-powered feedback, helping you identify errors in your queries and learn optimization techniques through interactive practice.
Know your window functions. Window functions are the most commonly tested advanced SQL topic. Be comfortable with ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and running aggregates.

FAQ

How important is SQL for software engineering interviews?

SQL is tested in interviews at most technology companies, especially for backend, full-stack, and data-related roles. Even frontend-focused roles sometimes include SQL questions. Companies like Meta, Amazon, Google, and Microsoft all include SQL components in their interview processes.

Should I learn PostgreSQL or MySQL?

For interview purposes, the differences between PostgreSQL and MySQL rarely matter. Standard SQL syntax covers 95% of interview questions. Learn standard SQL first, and then familiarize yourself with the specific syntax of the database your target company uses.

How many SQL questions should I practice?

Practicing 50 to 80 SQL problems across all difficulty levels provides strong coverage. Focus on JOINs, aggregation, subqueries, and window functions, as these are the most frequently tested topics.

Do I need to know about database internals?

For software engineering roles, you should understand indexes, basic query optimization, and execution plans at a conceptual level. Deep knowledge of database internals (write-ahead logs, buffer pools, lock managers) is typically required only for database engineering or infrastructure roles.

Are window functions commonly asked in interviews?

Yes, window functions have become one of the most popular advanced SQL interview topics. ROW_NUMBER, RANK, LAG, LEAD, and running aggregates appear frequently. Practice these until you can write them confidently without reference material.

Conclusion

SQL is a foundational skill that directly impacts your effectiveness as a software engineer and your performance in interviews. The 50 questions in this guide cover the full spectrum of topics you are likely to encounter, from basic SELECT statements to advanced window functions and query optimization.

Practice these questions systematically, starting with the foundational concepts and building up to the advanced topics. Use Phantom Code to get AI-assisted feedback on your SQL queries and accelerate your preparation. Pay special attention to JOINs, window functions, and CTEs, as these are the areas where interviewers tend to focus their most challenging questions.

The best SQL preparation combines writing queries by hand, understanding why certain approaches are more efficient than others, and being able to explain your reasoning clearly. With consistent practice and the strategies in this guide, you will be well-prepared to handle any SQL question that comes your way.

Frequently Asked Questions

What is the difference between WHERE and HAVING in SQL?: WHERE filters rows before grouping and aggregation, while HAVING filters groups after aggregation has been applied. You cannot use aggregate functions in a WHERE clause but you can in HAVING. For example, `WHERE salary > 80000` filters individual employees before grouping, but `HAVING AVG(salary) > 80000` filters departments based on their aggregated average salary.
What is the difference between ROW_NUMBER, RANK, and DENSE_RANK?: ROW_NUMBER assigns a unique sequential number to every row with no ties (1, 2, 3, 4, 5). RANK assigns the same rank to ties and skips numbers (1, 2, 2, 4, 5). DENSE_RANK assigns the same rank to ties without skipping (1, 2, 2, 3, 4). For an Nth-highest-salary problem, DENSE_RANK is the most robust choice because it handles ties correctly.
When should I use a CTE versus a subquery in SQL?: CTEs improve readability for complex queries because they are named and defined at the top, can be referenced multiple times in the same query, and support recursion (subqueries cannot recurse). Most modern optimizers treat CTEs and subqueries similarly, though some databases like older PostgreSQL versions materialize CTEs. Reach for a CTE when the same intermediate result is used more than once or when the query is hard to follow inline.
How do I find the second highest salary in SQL?: Three common approaches. Using LIMIT/OFFSET: `SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1`. Using a correlated subquery: `SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees)`. Most robust for the Nth highest is DENSE_RANK in a window function: `SELECT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rk FROM employees) ranked WHERE rk = 2`.
What are the most common causes of slow SQL queries?: Missing indexes on WHERE/JOIN/ORDER BY columns; SELECT * instead of specific columns; using functions on indexed columns in WHERE clauses (e.g., `YEAR(order_date) = 2026` instead of a range); the N+1 query problem where you make N separate queries inside a loop instead of one JOIN; and missing LIMIT on large result sets. Use EXPLAIN ANALYZE to find Seq Scans where Index Scans should be.

By PhantomCode TeamPublished March 11, 2026Last reviewed April 29, 202613 min read

50 Essential SQL Interview Questions for Software Engineers (2026)

SQL Interview Questions

Foundational SQL Concepts (Questions 1-10)
JOINs and Relationships (Questions 11-20)
Aggregation and Grouping (Questions 21-28)
Subqueries and Nested Queries (Questions 29-34)
Window Functions (Questions 35-42)
Common Table Expressions (Questions 43-46)
Query Optimization and Indexing (Questions 47-50)
Tips for SQL Interviews
FAQ
Conclusion

Sample Database Schema

For the examples in this guide, we will use the following tables:

-- Employees table
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    salary DECIMAL(10, 2),
    hire_date DATE,
    manager_id INT
);
 
-- Departments table
CREATE TABLE departments (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    location VARCHAR(100)
);
 
-- Orders table
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    amount DECIMAL(10, 2),
    order_date DATE,
    status VARCHAR(20)
);
 
-- Products table
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10, 2)
);
 
-- Customers table
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    city VARCHAR(100),
    signup_date DATE
);

Foundational SQL Concepts (Questions 1-10)

Question 1: What is the difference between WHERE and HAVING?

-- WHERE: filter individual rows
SELECT name, salary
FROM employees
WHERE salary > 80000;
 
-- HAVING: filter after aggregation
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 80000;

Question 2: What is the difference between DELETE, TRUNCATE, and DROP?

Answer:

DELETE removes specific rows from a table. It can include a WHERE clause and is logged per row (can be rolled back). It does not reset auto-increment counters.
TRUNCATE removes all rows from a table. It is faster than DELETE because it is minimally logged. It resets auto-increment counters. It cannot include a WHERE clause.
DROP removes the entire table, including its structure, data, indexes, and constraints.

DELETE FROM employees WHERE department_id = 5;
TRUNCATE TABLE employees;
DROP TABLE employees;

Question 3: What are the different types of SQL constraints?

Answer: Constraints enforce rules on data in a table:

PRIMARY KEY - Uniquely identifies each row. Cannot be NULL.
FOREIGN KEY - References the primary key of another table. Enforces referential integrity.
UNIQUE - Ensures all values in a column are distinct. Allows one NULL (varies by database).
NOT NULL - Prevents NULL values in a column.
CHECK - Validates that values meet a specified condition.
DEFAULT - Provides a default value when none is specified.

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE,
    salary DECIMAL(10, 2) CHECK (salary > 0),
    department_id INT,
    status VARCHAR(20) DEFAULT 'active',
    FOREIGN KEY (department_id) REFERENCES departments(id)
);

Question 4: What is the difference between UNION and UNION ALL?

-- UNION: removes duplicates
SELECT city FROM customers
UNION
SELECT location FROM departments;
 
-- UNION ALL: keeps duplicates
SELECT city FROM customers
UNION ALL
SELECT location FROM departments;

Use UNION ALL when you know there are no duplicates or when duplicates are acceptable. It performs significantly better on large datasets.

Question 5: Explain NULL handling in SQL.

-- Correct way to check for NULL
SELECT name FROM employees WHERE manager_id IS NULL;
 
-- This will NOT work as expected
SELECT name FROM employees WHERE manager_id = NULL;
 
-- COALESCE returns the first non-NULL value
SELECT name, COALESCE(manager_id, 0) AS manager
FROM employees;
 
-- NULLIF returns NULL if two expressions are equal
SELECT NULLIF(salary, 0) AS salary FROM employees;

Question 6: What is the order of execution of a SQL query?

Answer: SQL queries execute in a specific order that differs from the written order:

FROM and JOIN - Determine the data sources
WHERE - Filter individual rows
GROUP BY - Group rows by specified columns
HAVING - Filter groups
SELECT - Choose which columns to return
DISTINCT - Remove duplicate rows
ORDER BY - Sort the result set
LIMIT / OFFSET - Restrict the number of rows returned

Understanding this order explains why you cannot use a column alias defined in SELECT inside a WHERE clause (WHERE executes before SELECT).

Question 7: Write a query to find the second highest salary.

Answer:

-- Using LIMIT and OFFSET
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
 
-- Using a subquery
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
 
-- Using DENSE_RANK (most robust for Nth highest)
SELECT salary
FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
    FROM employees
) ranked
WHERE rank = 2;

Question 8: What is the difference between CHAR and VARCHAR?

Question 9: Write a query to find duplicate records.

Answer:

-- Find duplicate emails
SELECT email, COUNT(*) AS occurrence_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;
 
-- Find and display all duplicate rows with details
SELECT c.*
FROM customers c
INNER JOIN (
    SELECT email
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1
) dupes ON c.email = dupes.email
ORDER BY c.email;

Question 10: What is the difference between IN and EXISTS?

Answer: Both can check for the presence of values, but they work differently:

IN compares a value against a list of values returned by a subquery. It materializes the subquery result.
EXISTS checks whether a correlated subquery returns any rows. It stops as soon as it finds a match.

-- Using IN
SELECT name FROM employees
WHERE department_id IN (
    SELECT id FROM departments WHERE location = 'New York'
);
 
-- Using EXISTS (often faster for large datasets)
SELECT e.name FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d
    WHERE d.id = e.department_id AND d.location = 'New York'
);

EXISTS is generally more efficient when the outer table is small and the inner table is large. IN can be more efficient when the subquery result set is small.

JOINs and Relationships (Questions 11-20)

Question 11: Explain the different types of JOINs.

Answer:

INNER JOIN - Returns only rows that have matching values in both tables.
LEFT JOIN (LEFT OUTER JOIN) - Returns all rows from the left table and matching rows from the right table. Non-matching rows get NULL for right table columns.
RIGHT JOIN (RIGHT OUTER JOIN) - Returns all rows from the right table and matching rows from the left table.
FULL OUTER JOIN - Returns all rows from both tables, with NULLs where there is no match.
CROSS JOIN - Returns the Cartesian product of both tables (every row combined with every other row).
SELF JOIN - A table joined with itself, used for hierarchical or comparison queries.

-- INNER JOIN: employees with their department names
SELECT e.name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
 
-- LEFT JOIN: all employees, including those without a department
SELECT e.name, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
 
-- FULL OUTER JOIN: all employees and all departments
SELECT e.name, d.name AS department
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id;

Question 12: Write a query to find employees who do not belong to any department.

-- Using LEFT JOIN
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.id IS NULL;
 
-- Using NOT EXISTS
SELECT e.name
FROM employees e
WHERE NOT EXISTS (
    SELECT 1 FROM departments d WHERE d.id = e.department_id
);
 
-- Using NOT IN (be careful with NULLs)
SELECT e.name
FROM employees e
WHERE e.department_id NOT IN (
    SELECT id FROM departments WHERE id IS NOT NULL
);

Question 13: Write a self-join to find each employee's manager name.

SELECT
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Question 14: Find departments that have more than 5 employees.

SELECT d.name, COUNT(e.id) AS employee_count
FROM departments d
INNER JOIN employees e ON d.id = e.department_id
GROUP BY d.name
HAVING COUNT(e.id) > 5
ORDER BY employee_count DESC;

Question 15: Write a query to find customers who have never placed an order.

SELECT c.name, c.email
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;

Question 16: Find employees who earn more than their manager.

SELECT
    e.name AS employee,
    e.salary AS employee_salary,
    m.name AS manager,
    m.salary AS manager_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Question 17: What is a natural join and why should you avoid it?

-- Natural join (avoid in production code)
SELECT * FROM employees NATURAL JOIN departments;
 
-- Explicit join (preferred)
SELECT * FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

Question 18: Write a query to find products that have been ordered by customers in every city.

SELECT p.name
FROM products p
WHERE NOT EXISTS (
    SELECT DISTINCT c.city
    FROM customers c
    WHERE NOT EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.product_id = p.id
          AND o.customer_id = c.id
    )
);

This is a relational division query, one of the harder SQL patterns to write correctly.

Question 19: Explain the difference between ON and WHERE in a LEFT JOIN.

-- ON: keeps all employees, only joins active departments
SELECT e.name, d.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
    AND d.location = 'New York';
 
-- WHERE: filters out employees without a New York department
SELECT e.name, d.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.location = 'New York';

Question 20: Write a query to find the top 3 customers by total order value.

SELECT
    c.name,
    SUM(o.amount) AS total_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.name
ORDER BY total_spent DESC
LIMIT 3;

Aggregation and Grouping (Questions 21-28)

Question 21: What aggregate functions does SQL support?

Answer: The standard aggregate functions are:

COUNT() - Counts the number of rows or non-NULL values
SUM() - Calculates the total of numeric values
AVG() - Calculates the average of numeric values
MIN() - Finds the minimum value
MAX() - Finds the maximum value

SELECT
    department_id,
    COUNT(*) AS total_employees,
    SUM(salary) AS total_salary,
    AVG(salary) AS avg_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary
FROM employees
GROUP BY department_id;

Question 22: What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?

-- COUNT(*): counts all rows, including NULLs
SELECT COUNT(*) FROM employees;
 
-- COUNT(column): counts non-NULL values in the column
SELECT COUNT(manager_id) FROM employees;
 
-- COUNT(DISTINCT column): counts unique non-NULL values
SELECT COUNT(DISTINCT department_id) FROM employees;

Question 23: Find the department with the highest average salary.

SELECT d.name, AVG(e.salary) AS avg_salary
FROM departments d
INNER JOIN employees e ON d.id = e.department_id
GROUP BY d.name
ORDER BY avg_salary DESC
LIMIT 1;

Question 24: Write a query to calculate a running total of orders by date.

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders
ORDER BY order_date;

Question 25: Find the percentage of total salary each department represents.

SELECT
    d.name AS department,
    SUM(e.salary) AS dept_salary,
    ROUND(
        SUM(e.salary) * 100.0 / (SELECT SUM(salary) FROM employees),
        2
    ) AS percentage
FROM departments d
INNER JOIN employees e ON d.id = e.department_id
GROUP BY d.name
ORDER BY percentage DESC;

Question 26: Find months where total order value exceeded 100,000.

SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS monthly_total
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
HAVING SUM(amount) > 100000
ORDER BY month;

Question 27: Calculate year-over-year growth in order volume.

SELECT
    curr.year,
    curr.order_count,
    prev.order_count AS prev_year_count,
    ROUND(
        (curr.order_count - prev.order_count) * 100.0
        / NULLIF(prev.order_count, 0),
        2
    ) AS yoy_growth_pct
FROM (
    SELECT EXTRACT(YEAR FROM order_date) AS year, COUNT(*) AS order_count
    FROM orders
    GROUP BY EXTRACT(YEAR FROM order_date)
) curr
LEFT JOIN (
    SELECT EXTRACT(YEAR FROM order_date) AS year, COUNT(*) AS order_count
    FROM orders
    GROUP BY EXTRACT(YEAR FROM order_date)
) prev ON curr.year = prev.year + 1
ORDER BY curr.year;

Question 28: Find customers whose average order value is above the overall average.

SELECT
    c.name,
    AVG(o.amount) AS avg_order_value
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.name
HAVING AVG(o.amount) > (SELECT AVG(amount) FROM orders)
ORDER BY avg_order_value DESC;

Subqueries and Nested Queries (Questions 29-34)

Question 29: What are correlated vs non-correlated subqueries?

-- Non-correlated: the subquery runs once
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
-- Correlated: the subquery runs for each employee
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = e.department_id
);

Question 30: Find employees who earn more than the average salary of their department.

SELECT e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
)
ORDER BY e.department_id, e.salary DESC;

Question 31: Find the most recent order for each customer.

-- Using a correlated subquery
SELECT o.*
FROM orders o
WHERE o.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
);
 
-- Using a JOIN with aggregation (often more efficient)
SELECT o.*
FROM orders o
INNER JOIN (
    SELECT customer_id, MAX(order_date) AS max_date
    FROM orders
    GROUP BY customer_id
) latest ON o.customer_id = latest.customer_id
    AND o.order_date = latest.max_date;

Question 32: Write a query using a subquery in the SELECT clause.

SELECT
    e.name,
    e.salary,
    (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) AS dept_avg,
    e.salary - (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) AS diff_from_avg
FROM employees e
ORDER BY diff_from_avg DESC;

Question 33: Find products that have never been ordered.

SELECT p.name
FROM products p
WHERE p.id NOT IN (
    SELECT DISTINCT product_id FROM orders WHERE product_id IS NOT NULL
);
 
-- More robust version using NOT EXISTS
SELECT p.name
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.product_id = p.id
);

Question 34: Find departments where all employees earn above 50,000.

SELECT d.name
FROM departments d
WHERE NOT EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.id
      AND e.salary <= 50000
)
AND EXISTS (
    SELECT 1 FROM employees e WHERE e.department_id = d.id
);

Window Functions (Questions 35-42)

Question 35: Explain the difference between ROW_NUMBER, RANK, and DENSE_RANK.

SELECT
    name,
    department_id,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK() OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

ROW_NUMBER() - Assigns a unique sequential number to each row. No ties: 1, 2, 3, 4, 5.
RANK() - Assigns the same rank to ties and skips numbers: 1, 2, 2, 4, 5.
DENSE_RANK() - Assigns the same rank to ties without skipping: 1, 2, 2, 3, 4.

Question 36: Find the top 3 earners in each department.

SELECT name, department_id, salary
FROM (
    SELECT
        name,
        department_id,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department_id
            ORDER BY salary DESC
        ) AS rank
    FROM employees
) ranked
WHERE rank <= 3
ORDER BY department_id, salary DESC;

Question 37: Calculate a moving average of order amounts over the last 7 days.

SELECT
    order_date,
    amount,
    AVG(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7day
FROM orders
ORDER BY order_date;

Question 38: Find the difference between each employee's salary and the next highest salary in their department.

SELECT
    name,
    department_id,
    salary,
    LEAD(salary) OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
    ) AS next_lower_salary,
    salary - LEAD(salary) OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
    ) AS salary_gap
FROM employees
ORDER BY department_id, salary DESC;

Question 39: What is the difference between ROWS and RANGE in window frames?

Answer: ROWS defines the frame based on physical row positions. RANGE defines the frame based on value ranges. The difference matters when there are duplicate values in the ORDER BY column.

-- ROWS: exactly 2 preceding rows
SELECT
    order_date,
    amount,
    SUM(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS sum_rows
FROM orders;
 
-- RANGE: all rows with order_date within range of current date
SELECT
    order_date,
    amount,
    SUM(amount) OVER (
        ORDER BY order_date
        RANGE BETWEEN INTERVAL '2' DAY PRECEDING AND CURRENT ROW
    ) AS sum_range
FROM orders;

Question 40: Calculate the cumulative distribution of salaries.

SELECT
    name,
    salary,
    CUME_DIST() OVER (ORDER BY salary) AS cumulative_dist,
    PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank,
    NTILE(4) OVER (ORDER BY salary) AS salary_quartile
FROM employees
ORDER BY salary;

Question 41: Find consecutive days where orders were placed.

WITH daily_orders AS (
    SELECT DISTINCT order_date
    FROM orders
),
grouped AS (
    SELECT
        order_date,
        order_date - ROW_NUMBER() OVER (ORDER BY order_date) * INTERVAL '1 day' AS grp
    FROM daily_orders
)
SELECT
    MIN(order_date) AS streak_start,
    MAX(order_date) AS streak_end,
    COUNT(*) AS consecutive_days
FROM grouped
GROUP BY grp
HAVING COUNT(*) > 1
ORDER BY streak_start;

Question 42: Calculate month-over-month growth using LAG.

WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(amount) AS revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
    ROUND(
        (revenue - LAG(revenue) OVER (ORDER BY month))
        * 100.0 / NULLIF(LAG(revenue) OVER (ORDER BY month), 0),
        2
    ) AS mom_growth_pct
FROM monthly_revenue
ORDER BY month;

Common Table Expressions (Questions 43-46)

Question 43: What is a CTE and when should you use one?

WITH high_earners AS (
    SELECT name, salary, department_id
    FROM employees
    WHERE salary > 100000
)
SELECT d.name AS department, COUNT(*) AS high_earner_count
FROM high_earners h
INNER JOIN departments d ON h.department_id = d.id
GROUP BY d.name
ORDER BY high_earner_count DESC;

Question 44: Write a recursive CTE to display an organizational hierarchy.

WITH RECURSIVE org_chart AS (
    -- Base case: top-level managers (no manager)
    SELECT
        id,
        name,
        manager_id,
        1 AS level,
        name AS path
    FROM employees
    WHERE manager_id IS NULL
 
    UNION ALL
 
    -- Recursive case: employees with managers
    SELECT
        e.id,
        e.name,
        e.manager_id,
        oc.level + 1,
        oc.path || ' > ' || e.name
    FROM employees e
    INNER JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT name, level, path
FROM org_chart
ORDER BY path;

Question 45: Use multiple CTEs to build a complex report.

WITH dept_stats AS (
    SELECT
        department_id,
        COUNT(*) AS emp_count,
        AVG(salary) AS avg_salary,
        MAX(salary) AS max_salary
    FROM employees
    GROUP BY department_id
),
dept_orders AS (
    SELECT
        e.department_id,
        COUNT(o.id) AS order_count,
        SUM(o.amount) AS total_revenue
    FROM employees e
    INNER JOIN orders o ON e.id = o.customer_id
    GROUP BY e.department_id
)
SELECT
    d.name AS department,
    ds.emp_count,
    ROUND(ds.avg_salary, 2) AS avg_salary,
    COALESCE(do2.order_count, 0) AS orders_handled,
    COALESCE(do2.total_revenue, 0) AS revenue_generated
FROM departments d
INNER JOIN dept_stats ds ON d.id = ds.department_id
LEFT JOIN dept_orders do2 ON d.id = do2.department_id
ORDER BY revenue_generated DESC;

Question 46: What is the difference between a CTE and a subquery?

Answer: Functionally, CTEs and subqueries can often achieve the same result. The key differences are:

Readability. CTEs are named and defined at the top of the query, making complex queries easier to read and maintain.
Reusability. A CTE can be referenced multiple times in the same query. A subquery must be duplicated if used in multiple places.
Recursion. Only CTEs support recursive queries. Subqueries cannot be recursive.
Performance. In most databases, the optimizer treats CTEs and subqueries similarly. However, some databases (like PostgreSQL before version 12) materialize CTEs, which can impact performance.

Query Optimization and Indexing (Questions 47-50)

Question 47: What is an index and how does it improve query performance?

-- Create an index on the email column
CREATE INDEX idx_customers_email ON customers(email);
 
-- Composite index on multiple columns
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, order_date);
 
-- Unique index
CREATE UNIQUE INDEX idx_customers_email_unique
ON customers(email);

When indexes help:

WHERE clauses that filter on indexed columns
JOIN conditions on indexed columns
ORDER BY on indexed columns
GROUP BY on indexed columns

When indexes hurt:

INSERT, UPDATE, and DELETE operations become slower because the index must be maintained
Indexes consume additional storage space
Too many indexes can confuse the query optimizer

Question 48: Explain the EXPLAIN command and how to read a query plan.

Answer: EXPLAIN shows the execution plan that the database will use to run a query. It reveals whether indexes are being used, what join algorithms are chosen, and where the bottlenecks are.

EXPLAIN ANALYZE
SELECT e.name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id
WHERE e.salary > 80000;

Key things to look for in a query plan:

Seq Scan (Sequential Scan) - The database is reading the entire table row by row. This is slow for large tables and usually indicates a missing index.
Index Scan - The database is using an index to find relevant rows. This is efficient.
Hash Join / Merge Join / Nested Loop - The join algorithm being used. Hash joins are good for large tables. Nested loops are good when one table is small.
Sort - An explicit sort operation, which can be expensive for large result sets.
Actual rows vs estimated rows - A large discrepancy indicates that statistics are outdated and the optimizer is making suboptimal decisions.

Question 49: What are common causes of slow SQL queries and how do you fix them?

Answer:

Problem 1: Missing indexes.

-- Slow: full table scan
SELECT * FROM orders WHERE customer_id = 42;
 
-- Fix: add an index
CREATE INDEX idx_orders_customer ON orders(customer_id);

Problem 2: SELECT * instead of selecting specific columns.

-- Slow: retrieves all columns
SELECT * FROM orders WHERE status = 'completed';
 
-- Better: only retrieve needed columns
SELECT id, amount, order_date FROM orders WHERE status = 'completed';

Problem 3: Using functions on indexed columns in WHERE clauses.

-- Slow: the function prevents index usage
SELECT * FROM orders WHERE YEAR(order_date) = 2026;
 
-- Better: use a range condition
SELECT * FROM orders
WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01';

Problem 4: N+1 query problem.

Instead of making N separate queries inside a loop, use a single JOIN or IN clause to fetch all related data at once.

Problem 5: Not using LIMIT for large result sets.

-- Potentially slow: returns millions of rows
SELECT * FROM orders ORDER BY order_date DESC;
 
-- Better: limit the results
SELECT * FROM orders ORDER BY order_date DESC LIMIT 100;

Question 50: What is query optimization and what techniques should you know?

Answer: Query optimization is the process of rewriting queries to execute faster. Key techniques include:

Use EXISTS instead of IN for large subqueries:

-- Potentially slow for large subquery results
SELECT name FROM employees
WHERE department_id IN (SELECT id FROM departments WHERE location = 'NYC');
 
-- Often faster
SELECT name FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d
    WHERE d.id = e.department_id AND d.location = 'NYC'
);

Avoid DISTINCT when possible by rewriting with EXISTS or using proper JOINs that do not produce duplicates.

Use appropriate data types. Joining on integers is faster than joining on strings. Store dates as DATE type, not as strings.

Partition large tables to divide them into smaller, manageable segments based on a key like date or region.

Use covering indexes that include all columns needed by a query, eliminating the need to access the base table:

-- This query can be answered entirely from the index
CREATE INDEX idx_orders_covering
ON orders(customer_id, order_date, amount);
 
SELECT customer_id, order_date, amount
FROM orders
WHERE customer_id = 42;

Tips for SQL Interviews

Practice writing SQL by hand. Many interviews use a whiteboard or a simple text editor without auto-complete or syntax highlighting. Practice writing correct SQL without IDE assistance.
Understand the execution order. Knowing that WHERE runs before SELECT helps you avoid common mistakes like referencing aliases in WHERE clauses.
Always consider NULLs. Many SQL bugs stem from unexpected NULL behavior. When using NOT IN with a subquery, ensure the subquery cannot return NULLs, or use NOT EXISTS instead.
Think about edge cases. What happens when a table is empty? What if there are ties? What if there are NULL values in JOIN columns?
Explain your approach. Talk through your query design before writing it. State what tables you need, what joins are required, and what aggregations you will use.
Start simple and build up. Write the simplest version of the query first, verify it works, then add complexity (aggregations, window functions, subqueries).
Use Phantom Code for SQL practice. Phantom Code supports SQL interview preparation with AI-powered feedback, helping you identify errors in your queries and learn optimization techniques through interactive practice.
Know your window functions. Window functions are the most commonly tested advanced SQL topic. Be comfortable with ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and running aggregates.

FAQ

How important is SQL for software engineering interviews?

Should I learn PostgreSQL or MySQL?

How many SQL questions should I practice?

Practicing 50 to 80 SQL problems across all difficulty levels provides strong coverage. Focus on JOINs, aggregation, subqueries, and window functions, as these are the most frequently tested topics.

Do I need to know about database internals?

Are window functions commonly asked in interviews?

Conclusion

Frequently Asked Questions

What is the difference between WHERE and HAVING in SQL?: WHERE filters rows before grouping and aggregation, while HAVING filters groups after aggregation has been applied. You cannot use aggregate functions in a WHERE clause but you can in HAVING. For example, `WHERE salary > 80000` filters individual employees before grouping, but `HAVING AVG(salary) > 80000` filters departments based on their aggregated average salary.
What is the difference between ROW_NUMBER, RANK, and DENSE_RANK?: ROW_NUMBER assigns a unique sequential number to every row with no ties (1, 2, 3, 4, 5). RANK assigns the same rank to ties and skips numbers (1, 2, 2, 4, 5). DENSE_RANK assigns the same rank to ties without skipping (1, 2, 2, 3, 4). For an Nth-highest-salary problem, DENSE_RANK is the most robust choice because it handles ties correctly.
When should I use a CTE versus a subquery in SQL?: CTEs improve readability for complex queries because they are named and defined at the top, can be referenced multiple times in the same query, and support recursion (subqueries cannot recurse). Most modern optimizers treat CTEs and subqueries similarly, though some databases like older PostgreSQL versions materialize CTEs. Reach for a CTE when the same intermediate result is used more than once or when the query is hard to follow inline.
How do I find the second highest salary in SQL?: Three common approaches. Using LIMIT/OFFSET: `SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1`. Using a correlated subquery: `SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees)`. Most robust for the Nth highest is DENSE_RANK in a window function: `SELECT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rk FROM employees) ranked WHERE rk = 2`.
What are the most common causes of slow SQL queries?: Missing indexes on WHERE/JOIN/ORDER BY columns; SELECT * instead of specific columns; using functions on indexed columns in WHERE clauses (e.g., `YEAR(order_date) = 2026` instead of a range); the N+1 query problem where you make N separate queries inside a loop instead of one JOIN; and missing LIMIT on large result sets. Use EXPLAIN ANALYZE to find Seq Scans where Index Scans should be.

50 Essential SQL Interview Questions for Software Engineers (2026)

Table of Contents

Sample Database Schema

Foundational SQL Concepts (Questions 1-10)

Question 1: What is the difference between WHERE and HAVING?

Question 2: What is the difference between DELETE, TRUNCATE, and DROP?

Question 3: What are the different types of SQL constraints?

Question 4: What is the difference between UNION and UNION ALL?

Question 5: Explain NULL handling in SQL.

Question 6: What is the order of execution of a SQL query?

Question 7: Write a query to find the second highest salary.

Question 8: What is the difference between CHAR and VARCHAR?

Question 9: Write a query to find duplicate records.

Question 10: What is the difference between IN and EXISTS?

JOINs and Relationships (Questions 11-20)

Question 11: Explain the different types of JOINs.

Question 12: Write a query to find employees who do not belong to any department.

Question 13: Write a self-join to find each employee's manager name.

Question 14: Find departments that have more than 5 employees.

Question 15: Write a query to find customers who have never placed an order.

Question 16: Find employees who earn more than their manager.

Question 17: What is a natural join and why should you avoid it?

Question 18: Write a query to find products that have been ordered by customers in every city.

Question 19: Explain the difference between ON and WHERE in a LEFT JOIN.

Question 20: Write a query to find the top 3 customers by total order value.

Aggregation and Grouping (Questions 21-28)

Question 21: What aggregate functions does SQL support?

Question 22: What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?

Question 23: Find the department with the highest average salary.

Question 24: Write a query to calculate a running total of orders by date.

Question 25: Find the percentage of total salary each department represents.

Question 26: Find months where total order value exceeded 100,000.

Question 27: Calculate year-over-year growth in order volume.

Question 28: Find customers whose average order value is above the overall average.

Subqueries and Nested Queries (Questions 29-34)

Question 29: What are correlated vs non-correlated subqueries?

Question 30: Find employees who earn more than the average salary of their department.

Question 31: Find the most recent order for each customer.

Question 32: Write a query using a subquery in the SELECT clause.

Question 33: Find products that have never been ordered.

Question 34: Find departments where all employees earn above 50,000.

Window Functions (Questions 35-42)

Question 35: Explain the difference between ROW_NUMBER, RANK, and DENSE_RANK.

Question 36: Find the top 3 earners in each department.

Question 37: Calculate a moving average of order amounts over the last 7 days.

Question 38: Find the difference between each employee's salary and the next highest salary in their department.

Question 39: What is the difference between ROWS and RANGE in window frames?

Question 40: Calculate the cumulative distribution of salaries.

Question 41: Find consecutive days where orders were placed.

Question 42: Calculate month-over-month growth using LAG.

Common Table Expressions (Questions 43-46)

Question 43: What is a CTE and when should you use one?

Question 44: Write a recursive CTE to display an organizational hierarchy.

Question 45: Use multiple CTEs to build a complex report.

Question 46: What is the difference between a CTE and a subquery?

Query Optimization and Indexing (Questions 47-50)

Question 47: What is an index and how does it improve query performance?

Question 48: Explain the EXPLAIN command and how to read a query plan.

Question 49: What are common causes of slow SQL queries and how do you fix them?

Question 50: What is query optimization and what techniques should you know?

Tips for SQL Interviews

FAQ

How important is SQL for software engineering interviews?

Should I learn PostgreSQL or MySQL?

How many SQL questions should I practice?

Do I need to know about database internals?

Are window functions commonly asked in interviews?

Conclusion

Frequently Asked Questions

Ready to Ace Your Next Interview?

Related Articles

10 Things Great Candidates Do Differently in Technical Interviews

From 5 Rejections to a Google Offer: One Engineer's Story

Advanced SQL Interview Questions for Senior Engineers (2026)

50 Essential SQL Interview Questions for Software Engineers (2026)

Table of Contents

Sample Database Schema

Foundational SQL Concepts (Questions 1-10)

Question 1: What is the difference between WHERE and HAVING?

Question 2: What is the difference between DELETE, TRUNCATE, and DROP?