Concurrency Interview Questions for Software Engineers: A Deep Technical Guide
Concurrency is the single topic where senior candidates most often reveal themselves. Anyone can memorize a synchronized-block incantation, but only experienced engineers can explain why that incantation works on x86 but fails on ARM, or why a textbook double-checked locking pattern was broken for years in Java. This guide walks through the questions interviewers actually ask when they want to distinguish the engineers who understand concurrency from those who copy patterns.
The goal here is not to make you memorize answers. It is to give you mental models you can defend under pressure. Interviewers will push on edges: what if this runs on a weaker memory model, what if the thread pool is exhausted, what if a GC pause lands in the middle of your critical section. The strongest candidates reason from first principles rather than invoking magic keywords.
Table of Contents
- Threads vs Processes: The Fundamental Distinction
- Locks, Mutexes, and Semaphores
- Race Conditions and Atomicity
- Deadlock, Livelock, and Starvation
- The Memory Model and Happens-Before
- Lock-Free and Wait-Free Data Structures
- Async Patterns: Promises, Goroutines, Actors
- Thread Pools and Back-Pressure
- Common Mistakes Candidates Make
- FAQ
- Conclusion
Threads vs Processes: The Fundamental Distinction
Sample question: "Explain the difference between a thread and a process at the kernel level, and when you would pick one over the other."
A process owns its own virtual address space, file descriptor table, and signal handlers. A thread shares all of that with its siblings inside the same process. That is the entire distinction from the kernel's point of view on Linux, where the clone() syscall creates what we call a thread when passed CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, the flags that share the address space, filesystem context, file descriptors, and signal handlers. A clone() without those flags, which is essentially what fork() does, gives you a process.
The practical implications follow from that definition. Threads communicate through shared memory, which is fast but dangerous. Processes communicate through pipes, sockets, or shared memory segments that must be explicitly mapped. If one thread corrupts the heap, every thread in the process is doomed. If one process segfaults, its siblings keep running.
// Linux: creating a thread via pthread, which wraps clone()
#include <pthread.h>
#include <stdio.h>

void* worker(void* arg) {
    int id = *(int*)arg;
    printf("worker %d running\n", id);
    return NULL;
}

int main() {
    pthread_t t;
    int id = 1;
    pthread_create(&t, NULL, worker, &id);
    pthread_join(t, NULL); // wait for the worker before id's stack frame disappears
    return 0;
}

Pick processes when isolation matters more than communication cost: sandboxing untrusted code, running plugins, or exploiting multiple cores in languages with a global interpreter lock (CPython, MRI Ruby). Pick threads when you need low-latency shared state and you can afford the discipline that shared mutability demands.
A follow-up interviewers love: how expensive is a context switch? The direct cost is saving and restoring registers and updating the kernel's scheduler state, maybe a microsecond. The hidden cost is cache and TLB pollution. On a modern CPU that hidden cost can dwarf the direct cost by an order of magnitude, which is why thread pools sized equal to core count outperform pools with hundreds of threads on CPU-bound work.
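A quick sketch of what that sizing advice looks like in code. The availableProcessors idiom is standard Java; the 4x multiplier for I/O-bound pools is a common rule of thumb, not a law, and the field names are invented for this example.

// Sketch: size a CPU-bound pool to the core count to limit cache/TLB churn
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizing {
    // One thread per hardware core for CPU-bound work
    static final ExecutorService CPU_POOL =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // I/O-bound work tolerates more threads because most of them are parked waiting
    static final ExecutorService IO_POOL = Executors.newFixedThreadPool(
        4 * Runtime.getRuntime().availableProcessors()); // rule-of-thumb multiplier
}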
Locks, Mutexes, and Semaphores
Sample question: "Walk me through the difference between a mutex, a semaphore, and a condition variable, and give me a scenario where each is the right tool."
A mutex enforces mutual exclusion: one holder at a time. It has an owner, which means only the thread that locked it can unlock it. That ownership lets the kernel implement priority inheritance to prevent priority inversion.
A semaphore is a counter with atomic increment and decrement that blocks when it would go negative. It has no owner, so any thread can signal it. That makes semaphores perfect for bounded resources like a connection pool with N slots, or for signaling between producer and consumer.
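A minimal Java sketch of the N-slot pattern, assuming a hypothetical Connection type. The blocking queue alone could do the blocking here; the semaphore is shown to make the slot counting explicit, and its fairness flag also previews the starvation discussion below.

// Semaphore as a bounded resource: N connection slots
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Semaphore;

class ConnectionPool {
    private final Semaphore slots;                // counts free connections
    private final BlockingQueue<Connection> idle; // the connections themselves

    ConnectionPool(List<Connection> conns) {
        this.slots = new Semaphore(conns.size(), true); // fair, to avoid starvation
        this.idle = new ArrayBlockingQueue<>(conns.size(), true, conns);
    }

    Connection acquire() throws InterruptedException {
        slots.acquire();   // blocks when all N slots are taken
        return idle.take();
    }

    void release(Connection c) {
        idle.add(c);
        slots.release();   // no ownership: any thread may signal
    }

    interface Connection {} // hypothetical stand-in for a real connection type
}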
A condition variable lets a thread wait for a predicate to become true without spinning. It is always used with a mutex because the predicate check must be atomic with the wait.
// Classic bounded buffer with mutex + condition variables
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class BoundedBuffer<T> {
    private final Object[] buf;
    private int head = 0, tail = 0, size = 0;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    BoundedBuffer(int capacity) { buf = new Object[capacity]; }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (size == buf.length) notFull.await();
            buf[tail] = item;
            tail = (tail + 1) % buf.length;
            size++;
            notEmpty.signal();
        } finally { lock.unlock(); }
    }

    @SuppressWarnings("unchecked")
    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (size == 0) notEmpty.await();
            T item = (T) buf[head];
            head = (head + 1) % buf.length;
            size--;
            notFull.signal();
            return item;
        } finally { lock.unlock(); }
    }
}

Notice the while loop around await(). If you write if instead, you have a bug. Spurious wakeups are permitted by the specification, and even without spurious wakeups another thread can drain the buffer between your wake-up and your re-acquisition of the lock. Always re-check the predicate.
Race Conditions and Atomicity
Sample question: "Is counter++ atomic? Walk me through what actually happens at the CPU level."
counter++ is three operations: load, increment, store. Between the load and the store, another thread can load the same stale value, increment it, and store it back, and your increment is lost. The fix is either a lock or a hardware atomic instruction like LOCK XADD on x86.
// Buggy: two goroutines increment the same counter
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var counter int64
    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1_000_000; j++ {
                counter++ // RACE: load, increment, store
            }
        }()
    }
    wg.Wait()
    fmt.Println(counter) // almost certainly less than 2_000_000

    // Fixed with atomic (Go 1.19+)
    var counterAtomic atomic.Int64
    // ... same loops, but each increment becomes:
    counterAtomic.Add(1) // compiles to LOCK XADD on x86
}

Atomicity is not the only property you need. You also need visibility: the guarantee that a write from one thread becomes visible to another. On x86 the strong TSO memory model hides a lot of visibility issues, but on ARM or RISC-V you will ship a bug to production if you confuse atomicity with visibility. That is where the memory model discussion below becomes non-negotiable.
A good interviewer will probe whether you know the difference between a data race (undefined behavior in C++ and Java) and a race condition (a higher-level correctness bug). Every data race is a race condition, but not every race condition is a data race. Check-then-act bugs like "if file does not exist, create it" are race conditions even if every individual operation is atomic.
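A minimal Java sketch of that distinction, using a shared ConcurrentHashMap (class and method names invented). Every individual operation below is atomic and data-race-free, yet the check-then-act version is still a race condition.

// Race condition without a data race: check-then-act on an atomic map
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CheckThenAct {
    private final ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();

    // Broken: both operations are atomic, but the gap between them is not.
    // Two threads can both see the key missing and both compute the value.
    Integer getBroken(String key) {
        if (!cache.containsKey(key)) {             // check
            cache.put(key, expensiveCompute(key)); // act
        }
        return cache.get(key);
    }

    // Fixed: the check and the act are one atomic operation.
    Integer getFixed(String key) {
        return cache.computeIfAbsent(key, CheckThenAct::expensiveCompute);
    }

    private static Integer expensiveCompute(String key) { return key.length(); }
}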
Deadlock, Livelock, and Starvation
Sample question: "Give me the four Coffman conditions for deadlock, and tell me how you would detect a deadlock in a running production system."
The Coffman conditions are mutual exclusion, hold-and-wait, no preemption, and circular wait. Break any one and deadlock becomes impossible. The most common technique is to break circular wait by imposing a total order on locks: always acquire them in the same order.
# Breaks circular wait by ordering locks by id
def transfer(a, b, amount):
    first, second = (a, b) if id(a) < id(b) else (b, a)
    with first.lock:
        with second.lock:
            a.balance -= amount
            b.balance += amount

For detection in production, thread dumps are the classic tool: jstack on the JVM, py-spy dump for Python, go tool pprof for Go. The JVM even detects and logs classic deadlocks for you. For more subtle cases, tools like ThreadSanitizer (C++ and Go) find data races at runtime, and Helgrind does the same for pthreads.
Livelock is subtler: threads are making state changes but not forward progress. Classic example is two threads politely yielding to each other. Starvation means a thread never gets the resource because others keep winning. Fair locks (with queuing) prevent starvation at the cost of throughput.
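A hedged Java sketch of that polite-yielding livelock, using tryLock with retry (class and method names invented). Two threads calling transferPolitely with the locks in opposite orders can each grab their first lock, fail on the second, release, and retry in lockstep forever.

// Livelock: both threads make state changes (lock, unlock, retry) but no progress
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantLock;

class LivelockDemo {
    final ReentrantLock lockA = new ReentrantLock();
    final ReentrantLock lockB = new ReentrantLock();

    void transferPolitely(ReentrantLock first, ReentrantLock second)
            throws InterruptedException {
        while (true) {
            first.lock();
            try {
                if (second.tryLock()) { // politely give up instead of blocking
                    try { return; /* do the work */ } finally { second.unlock(); }
                }
            } finally { first.unlock(); }
            // Without this randomized backoff, transferPolitely(lockA, lockB)
            // and transferPolitely(lockB, lockA) can retry in lockstep forever.
            Thread.sleep(ThreadLocalRandom.current().nextInt(10));
        }
    }
}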
The Memory Model and Happens-Before
Sample question: "What does volatile mean in Java, and why is double-checked locking broken without it?"
A memory model is a contract between the language and the hardware about what reorderings are permitted. Without one, the compiler and CPU can reorder reads and writes so aggressively that perfectly reasonable-looking code becomes incorrect.
The Java Memory Model defines the happens-before relation. If action A happens-before action B, then B sees the effects of A. Key happens-before edges include:
- Each action in a thread happens-before every subsequent action in that same thread (program order).
- An unlock on a monitor happens-before every subsequent lock on the same monitor.
- A write to a volatile field happens-before every subsequent read of that field.
- The start of a thread happens-before any action in the started thread.
- All actions in a thread happen-before any other thread successfully returns from join() on it.
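Those edges are what make safe publication work. A minimal sketch, assuming one writer thread and one reader thread (class name invented): the volatile edge on ready guarantees that a reader observing true also sees the plain write to data.

// Safe publication via the volatile happens-before edge
class Publication {
    private int data;               // plain field
    private volatile boolean ready; // volatile flag

    void writer() {
        data = 42;    // (1) plain write
        ready = true; // (2) volatile write: (1) happens-before (2)
    }

    void reader() {
        if (ready) {           // (3) volatile read that sees (2)
            assert data == 42; // therefore guaranteed to see (1)
        }
    }
}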
Double-checked locking was broken before Java 5: under the original memory model even volatile did not provide the ordering guarantees to fix it. Since the JSR-133 revision it is correct with volatile and still broken without it, because the write that publishes the constructed object can be reordered so another thread sees a non-null reference to a partially-constructed object.
class Singleton {
    private static volatile Singleton instance; // volatile is mandatory

    public static Singleton getInstance() {
        Singleton local = instance;
        if (local == null) {
            synchronized (Singleton.class) {
                local = instance;
                if (local == null) {
                    instance = local = new Singleton();
                }
            }
        }
        return local;
    }
}

The local variable is not decorative. It turns two volatile reads in the happy path into one, which matters if this method is hot.
C++ has a similar story with std::memory_order. The defaults (sequentially consistent) are safe but slow. Acquire-release is almost always what you actually want for hand-off patterns. Relaxed is for counters and statistics where ordering does not matter.
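Java exposes the same ladder through VarHandle access modes since Java 9. A minimal sketch of the acquire-release hand-off (class and method names invented): setRelease plays the role of memory_order_release and getAcquire of memory_order_acquire.

// Acquire-release hand-off with VarHandle access modes (Java 9+)
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Handoff {
    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                .findVarHandle(Handoff.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private boolean ready; // accessed only through the VarHandle
    private int payload;   // plain field published by the release store

    void produce(int value) {
        payload = value;              // plain write
        READY.setRelease(this, true); // release store: payload is visible first
    }

    Integer tryConsume() {
        if ((boolean) READY.getAcquire(this)) { // acquire load pairs with the release
            return payload;                     // guaranteed to observe the write above
        }
        return null;
    }
}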
Lock-Free and Wait-Free Data Structures
Sample question: "Implement a lock-free stack using compare-and-swap. What is the ABA problem and how would you fix it?"
Lock-free means at least one thread makes progress at any time. Wait-free means every thread makes progress in a bounded number of steps. Wait-free is strictly stronger and much harder to achieve.
// Lock-free Treiber stack
#include <atomic>

template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next;
    };
    std::atomic<Node*> head{nullptr};

public:
    void push(T value) {
        Node* n = new Node{std::move(value), nullptr};
        n->next = head.load(std::memory_order_relaxed);
        while (!head.compare_exchange_weak(
                   n->next, n,
                   std::memory_order_release,
                   std::memory_order_relaxed)) {
            // n->next was updated by compare_exchange_weak on failure
        }
    }

    bool pop(T& out) {
        Node* old = head.load(std::memory_order_acquire);
        while (old && !head.compare_exchange_weak(
                          old, old->next,
                          std::memory_order_acquire,
                          std::memory_order_acquire)) {}
        if (!old) return false;
        out = std::move(old->value);
        delete old; // ABA hazard lives here
        return true;
    }
};

The ABA problem: thread 1 reads head = A, gets preempted. Thread 2 pops A, pushes B, pops B, pushes A back. Thread 1 resumes, CAS succeeds (head is still A), but the linked list underneath has changed. Fixes include tagged pointers (pack a version counter into unused pointer bits), hazard pointers, and epoch-based reclamation. On the JVM, the garbage collector eliminates most ABA risk for free because A will not be reclaimed while any thread still references it.
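On the JVM you can also make the tagged-pointer idea explicit with AtomicStampedReference, which pairs the reference with a version stamp that the CAS checks. A hedged sketch of a stamped Treiber push (class and node names invented):

// Tagged-pointer ABA defense in Java: CAS on (reference, stamp) pairs
import java.util.concurrent.atomic.AtomicStampedReference;

class StampedStack<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicStampedReference<Node<T>> head =
        new AtomicStampedReference<>(null, 0);

    void push(T value) {
        Node<T> n = new Node<>(value);
        int[] stamp = new int[1];
        while (true) {
            Node<T> old = head.get(stamp); // reads reference and stamp together
            n.next = old;
            // CAS succeeds only if BOTH the reference and the stamp are unchanged,
            // so an A -> B -> A sequence is caught by the bumped stamp.
            if (head.compareAndSet(old, n, stamp[0], stamp[0] + 1)) return;
        }
    }
}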
Async Patterns: Promises, Goroutines, Actors
Sample question: "Compare Node's Promise model, Go's goroutines, and Erlang's actors. When would you pick each?"
Promises are a continuation-passing style with structured error propagation. Node.js uses a single-threaded event loop, so CPU-bound work blocks everything. Goroutines are preemptively scheduled user-space threads (since Go 1.14 with asynchronous preemption) multiplexed onto OS threads by the runtime. Actors are isolated units with private state that communicate only through messages, the model used by Erlang, Akka, and Elixir.
// TypeScript: chained Promises
async function fetchUser(id: string) {
    const resp = await fetch(`/users/${id}`)
    if (!resp.ok) throw new Error(`HTTP ${resp.status}`)
    return resp.json()
}

// Go: fan-out, fan-in
func fanOut(urls []string) []Result {
    ch := make(chan Result, len(urls))
    for _, u := range urls {
        go func(u string) { ch <- fetch(u) }(u)
    }
    results := make([]Result, 0, len(urls))
    for range urls {
        results = append(results, <-ch)
    }
    return results
}

%% Erlang: an actor
loop(State) ->
    receive
        {get, From} -> From ! {ok, State}, loop(State);
        {set, NewVal} -> loop(NewVal);
        stop -> ok
    end.

Pick Promises when you have I/O-bound work and a single-threaded runtime. Pick goroutines when you want cheap concurrency with shared memory and channels as the primary synchronization. Pick actors when isolation and fault tolerance matter more than raw throughput, or when your domain is naturally conversational.
Thread Pools and Back-Pressure
Sample question: "Design a thread pool from scratch. How do you handle the case where the work queue fills up?"
A thread pool has three parts: a bounded queue, a fixed set of worker threads, and a rejection policy for when the queue is full. The rejection policy is the interesting part because it determines how your system behaves under overload.
ExecutorService pool = new ThreadPoolExecutor(
    8, 8, // core and max threads
    0, TimeUnit.MILLISECONDS,
    new ArrayBlockingQueue<>(1000), // bounded queue
    new ThreadPoolExecutor.CallerRunsPolicy() // back-pressure
);

CallerRunsPolicy is an elegant form of back-pressure. When the queue fills, the caller executes the task itself, which slows the producer and gives the pool time to catch up. The alternative, AbortPolicy, throws and pushes the decision to the caller. Never use an unbounded queue in production: it turns memory exhaustion into your load shedder, which is the worst possible failure mode.
A subtle point: if your tasks can block on each other (task A submits task B and waits for it), a fixed-size pool will deadlock. This is why you should never use the same pool for dependent work.
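A deliberately tiny Java demonstration makes the trap visible (pool size 1, names invented): the outer task occupies the only worker while waiting on the inner task, which is queued behind it and can never be scheduled.

// Thread-pool deadlock: a task waits on a task submitted to the same pool
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolDeadlock {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Future<?> outer = pool.submit(() -> {
            Future<String> inner = pool.submit(() -> "done"); // queued behind us
            try {
                return inner.get(); // blocks the only worker forever: deadlock
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        outer.get(); // never returns
    }
}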
Common Mistakes Candidates Make
Mistake one: treating synchronized as a magic wand. Synchronizing every method is not thread safety; it often makes things slower without fixing the actual invariants. Thread safety is about which state transitions are atomic, not about which methods are serialized.
Mistake two: forgetting that volatile gives visibility but not atomicity. volatile int counter; counter++; is still racy.
Mistake three: catching InterruptedException and swallowing it. Interruption is a cooperative cancellation signal. Swallowing it makes your thread uncancellable. Either re-throw or call Thread.currentThread().interrupt() to restore the flag.
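The correct shape, for reference (the worker class is invented for illustration): restoring the flag lets code further up the stack observe the cancellation.

// Mistake three, fixed: never swallow interruption
class PollingWorker implements Runnable {
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(1000); // blocking call that throws when interrupted
                // ... do one unit of work ...
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag
                return;                             // and exit cooperatively
            }
        }
    }
}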
Mistake four: assuming that because code works on your laptop it will work on the server. x86 has a strong memory model. Your laptop probably is x86. Your ARM server is not. Test on the target architecture or rely on the language's memory model guarantees.
Mistake five: building locking into domain objects. Concurrency should be a concern of the boundary, not of the entity. A BankAccount that locks itself cannot be composed into a multi-account transfer without deadlock risk.
FAQ
How do I prepare for concurrency rounds without months of study?
Pick one primitive (say, the Java ReentrantLock) and implement it from memory with AbstractQueuedSynchronizer. Then implement a bounded blocking queue on top of it. Those two exercises force you to confront every concept in this guide.
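As a starting point, here is a minimal sketch of that first exercise: a non-reentrant mutex built on AbstractQueuedSynchronizer (class name invented; owner tracking and reentrancy deliberately omitted).

// A minimal non-reentrant lock built on AQS: state 0 = free, 1 = held
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class SimpleMutex {
    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int unused) {
            return compareAndSetState(0, 1); // atomic free -> held
        }

        @Override
        protected boolean tryRelease(int unused) {
            setState(0); // held -> free
            return true; // wake a queued waiter, if any
        }

        @Override
        protected boolean isHeldExclusively() {
            return getState() == 1;
        }
    }

    private final Sync sync = new Sync();

    public void lock()   { sync.acquire(1); }
    public void unlock() { sync.release(1); }
}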
Do I need to know the exact memory barriers on x86?
You need to know that x86 is Total Store Order, which means stores are not reordered with stores, loads are not reordered with loads, but a store followed by a load can be reordered. That one fact explains why Peterson's algorithm needs an mfence between writing its own flag and reading the other thread's flag: without the barrier, both threads can slip past the entry protocol.
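For the curious, a sketch of Peterson's algorithm in Java (two threads only, class name invented), where volatile supplies the store-load barrier that raw x86 code would need mfence for:

// Peterson's lock for two threads; volatile provides the store-load barrier
class Peterson {
    private volatile boolean flag0, flag1; // "I want to enter"
    private volatile int victim;           // who yields on a tie

    void lock(int me) { // me is 0 or 1
        if (me == 0) {
            flag0 = true;                   // store: announce intent
            victim = 0;                     // store: offer to yield
            while (flag1 && victim == 0) {} // load: without a barrier between the
        } else {                            // stores and this load, both threads
            flag1 = true;                   // could enter; volatile forbids that
            victim = 1;
            while (flag0 && victim == 1) {}
        }
    }

    void unlock(int me) {
        if (me == 0) flag0 = false; else flag1 = false;
    }
}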
When is it okay to use synchronized versus java.util.concurrent?
synchronized is fine for uncontended or lightly-contended critical sections. java.util.concurrent primitives shine when you need try-lock, timeouts, interruptible lock acquisition, or read-write separation. Profile before switching: benchmarks rarely show synchronized itself as the bottleneck.
What is the relationship between concurrency and parallelism?
Concurrency is structuring a program as independent tasks. Parallelism is running them simultaneously. You can have concurrency without parallelism (a single-core machine runs goroutines concurrently by interleaving them). You rarely have parallelism without concurrency.
Conclusion
Concurrency interviews reward engineers who can reason from memory models and invariants rather than memorizing keywords. The practical mental model is simple: every shared piece of state has a set of legal transitions, and your job is to ensure no thread observes an illegal intermediate state. Locks, atomics, channels, and actors are just different tools for enforcing that property.
Before your interview, write one lock-free queue, one thread pool, and one actor. The act of implementing them end-to-end is worth a hundred multiple-choice questions. When the interviewer pushes, you will have the muscle memory to push back with concrete reasoning instead of handwaving.
If you can explain why double-checked locking needs volatile, why fork-join pools can deadlock on dependent tasks, and why ABA is about memory reclamation rather than about CAS itself, you will be in the top ten percent of candidates. Good luck.