
Interview Questions & Model Answers

Real questions. Real answers. Built from 20 years of actual hiring and being hired.

54 Total Questions · 3 Technologies · 3 Levels

Showing 9 questions · Intermediate · Python

PY-INT-001 What is a list comprehension and when should you NOT use one?
Python Core Python Intermediate
4/10
Answer

A list comprehension is a concise way to create a list with a single expression. Avoid one when the logic is complex enough that a regular loop is more readable.

Deep Explanation

List comprehensions follow the syntax [expression for item in iterable if condition]. In CPython they are usually somewhat faster than an equivalent for loop that calls list.append(), because the comprehension avoids the repeated method lookup and call. However, they are not always the right choice. Avoid them when: the logic requires multiple nested conditions; you need to handle exceptions inside the loop; the comprehension spans more than two lines when formatted; or you are consuming a large dataset once, where a generator expression would be more memory-efficient. Nested list comprehensions (list comprehensions inside list comprehensions) are almost always a readability mistake.
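
The trade-offs above can be sketched in a few lines. This is a minimal illustration, not a benchmark: the data and the parse_ints helper are made up for the example.

```python
import sys

# Hypothetical sample data: the even numbers below one million.
numbers = range(1_000_000)

# List comprehension: builds the whole result list in memory at once.
squares_list = [n * n for n in numbers if n % 2 == 0]

# Generator expression: same syntax in parentheses, yields items on demand.
squares_gen = (n * n for n in numbers if n % 2 == 0)

# The list holds a reference to every element; the generator object stays small.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# Consuming the generator once gives the same total without the big list.
total_from_gen = sum(squares_gen)
total_from_list = sum(squares_list)


def parse_ints(raw_values):
    """Loop form: easier to read once exception handling is involved."""
    parsed = []
    for raw in raw_values:
        try:
            parsed.append(int(raw))
        except ValueError:
            continue  # skip values that are not valid integers
    return parsed
```

If the result is only iterated once (summed, written out, filtered further), the generator form is usually the safer default for large inputs.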

Real-World Example

In a data processing pipeline: [user.email for user in users if user.is_active and user.verified] is clean and appropriate. But building a matrix transformation with three nested comprehensions is a maintainability trap — a regular loop with clear variable names is better for the next developer.

⚠ Common Mistakes

Nesting comprehensions three levels deep, making code unreadable. Using list comprehensions when you actually need a generator (you are iterating once over a large dataset). Adding side effects inside comprehensions (modifying external state), which is a major anti-pattern.

🏭 Production Scenario

A memory crash in a production data export service was traced to a list comprehension that processed 2 million records at once, loading everything into memory. Replacing it with a generator expression fixed the memory issue without changing any other code.

Follow-up Questions
What is the difference between a list comprehension and a generator expression? How do dict comprehensions and set comprehensions work? What is the performance difference between a comprehension and a map() call?
ID: PY-INT-001  ·  Difficulty: 4/10  ·  Level: Intermediate
PY-INT-005 How do you handle large files in Python without loading them entirely into memory?
Python Core Python Intermediate
4/10
Answer

Use generators, file iteration (files are iterators in Python), or chunk-based reading. Never call read() without a size argument, or readlines(), on large files: both load the entire file into memory.

Deep Explanation

Python file objects are iterators: you can iterate over them line by line without loading the entire file. For binary files, or files where line iteration is not appropriate, use file.read(chunk_size) to read fixed-size chunks in a loop. For CSV files, use csv.DictReader (which iterates lazily) or pandas with the chunksize parameter (pd.read_csv('file.csv', chunksize=10000) returns an iterator of DataFrames). For JSON, use ijson for streaming JSON parsing. The with statement ensures the file is properly closed. For very large files (100GB+), memory-mapped files (mmap module) allow treating file content as if it were in memory while the OS handles paging.
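
Line iteration and fixed-size chunk reading might look like the following. This is a sketch using only the standard library; the function names, the sample log content, and the "ERROR" marker are assumptions made for the example.

```python
import os
import tempfile


def count_error_lines(path, encoding="utf-8"):
    """Iterate line by line; only one line is held in memory at a time."""
    count = 0
    with open(path, encoding=encoding) as handle:
        for line in handle:
            if "ERROR" in line:
                count += 1
    return count


def read_in_chunks(path, chunk_size=64 * 1024):
    """Yield fixed-size byte chunks, for binary files or files without line structure."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            yield chunk


# Small demonstration on a temporary file (a real input would be far larger).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.log")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("INFO start\nERROR disk full\nERROR timeout\n")
    error_count = count_error_lines(sample_path)
    total_bytes = sum(len(chunk) for chunk in read_in_chunks(sample_path, chunk_size=8))
```

Both functions pass the encoding or binary mode explicitly, so the behavior does not depend on the system default encoding.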

Real-World Example

A log analysis system needed to process 50GB daily log files to extract error counts. Using open(file).read() caused OOM crashes. Refactoring to iterate line by line (for line in file) reduced memory usage from 50GB to under 10MB while processing the same file.

⚠ Common Mistakes

Using file.readlines(), which builds a complete list of all lines in memory. Using pd.read_csv() without chunksize on multi-GB files. Not closing files (always use a with statement). Forgetting to handle encoding explicitly: defaulting to the system encoding can silently corrupt non-ASCII data.

🏭 Production Scenario

A production data pipeline at a logistics company was crashing nightly when processing a 30GB shipment data CSV. The fix used pandas chunked reading: processing 50000 rows at a time, aggregating results, and writing summaries. This reduced peak memory from 45GB (crashing the server) to 2GB.

Follow-up Questions
What is the mmap module and when would you use it? How does ijson enable streaming JSON parsing? How do you process a large file in parallel in Python?
ID: PY-INT-005  ·  Difficulty: 4/10  ·  Level: Intermediate
PY-INT-006 How does pytest work and what makes a good unit test in Python?
Python Core Python Intermediate
4/10
Answer

pytest discovers and runs test functions automatically, providing rich assertion introspection, fixtures for dependency injection, and parametrize for data-driven tests. A good unit test is fast, isolated, deterministic, and tests one specific behavior.

Deep Explanation

pytest looks for files named test_*.py or *_test.py, functions named test_*, and classes named Test*. When an assert fails, pytest shows you exactly what the actual and expected values were, so there is no need for assertEqual(). Fixtures (@pytest.fixture) provide setup/teardown and dependency injection for tests: database connections, temporary files, mock objects. Parametrize (@pytest.mark.parametrize) runs the same test with multiple input/output combinations, eliminating test duplication. Mocking with unittest.mock.patch replaces real dependencies with controlled fakes, making tests fast and isolated. Good unit tests: test one behavior; run in milliseconds; do not hit databases, networks, or file systems (mock these); are deterministic (same result every run); and fail with clear messages.
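
As a rough illustration of "one behavior per test, external dependency replaced by a fake", the sketch below uses plain test_* functions that pytest would collect. It deliberately sticks to the standard library's unittest.mock so it runs without pytest installed; in a real suite the Mock would typically come from a fixture and the cases from @pytest.mark.parametrize. The get_display_name function and its fetch_user dependency are hypothetical.

```python
from unittest.mock import Mock


def get_display_name(user_id, fetch_user):
    """Return a display name; `fetch_user` stands in for a database or API call."""
    user = fetch_user(user_id)
    if user is None:
        return "unknown"
    return user["name"].strip().title()


def test_get_display_name_formats_name():
    # The fake dependency returns a controlled value, so no database is needed.
    fetch_user = Mock(return_value={"name": "  ada lovelace "})
    assert get_display_name(1, fetch_user) == "Ada Lovelace"
    fetch_user.assert_called_once_with(1)


def test_get_display_name_returns_unknown_for_missing_user():
    # A separate test for the second behavior, with a name that says what failed.
    fetch_user = Mock(return_value=None)
    assert get_display_name(42, fetch_user) == "unknown"
```

Saved in a file such as test_display_name.py (a hypothetical name), running pytest in that directory would discover and run both functions.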

Real-World Example

A FastAPI endpoint test: the test uses a pytest fixture providing a TestClient (an in-process HTTP test client), patches the database dependency with an in-memory mock, uses parametrize to test valid, invalid, and edge-case inputs, and has clear test names like test_create_user_returns_201_for_valid_input. Each test runs in under 5ms with no external dependencies.

⚠ Common Mistakes

Writing tests that test implementation details instead of behavior: tests should not break when you refactor internals. Not mocking external dependencies, making tests slow and flaky. Using a single large test function that tests multiple behaviors (impossible to tell which behavior failed). Asserting too broadly (assert response is not None) or too narrowly (asserting on exact internal state).

🏭 Production Scenario

A Django e-commerce platform's test suite took 45 minutes to run because 800 tests were hitting the actual test database. Refactoring to use pytest fixtures with database mocking and factory_boy for test data generation reduced the suite to 3 minutes, enabling CI to run on every commit.

Follow-up Questions
What is the difference between mocking and stubbing? How do you test async functions with pytest? What is property-based testing with Hypothesis?
ID: PY-INT-006  ·  Difficulty: 4/10  ·  Level: Intermediate
PY-INT-002 How do decorators work in Python and what is the functools.wraps issue?
Python Core Python Intermediate
5/10
Answer

A decorator is a function that wraps another function to add behavior. Without functools.wraps the wrapper loses the original function's metadata like __name__ and __doc__.

Deep Explanation

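Decorators work by taking a function as input and returning a new function that adds behavior before or after the original call. The syntax @decorator is syntactic sugar for function = decorator(function). The core problem is that the returned wrapper function has its own identity: its __name__ is 'wrapper', not the original function's name. This breaks logging, debugging, and documentation tools. functools.wraps(original_func) applied to the wrapper copies the original function's metadata (__name__, __qualname__, __doc__, __module__, and more) to the wrapper and sets __wrapped__ to point to the original. This is especially critical in Flask, where the endpoint name defaults to the view function's name: without wraps, every decorated view is registered under the endpoint name 'wrapper' and the endpoints collide (current Flask versions raise an AssertionError when a second view tries to reuse an existing endpoint name). It also matters in FastAPI, which reads the handler's signature to work out its parameters; inspect.signature() only sees the original signature through the __wrapped__ attribute that wraps sets.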
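
The metadata problem is easy to reproduce. In this small sketch the decorator and function names are invented for the example; the only library call involved is functools.wraps.

```python
import functools


def log_calls_naive(func):
    """Decorator WITHOUT functools.wraps: the wrapper keeps its own identity."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def log_calls(func):
    """Decorator WITH functools.wraps: metadata is copied from `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@log_calls_naive
def list_users():
    """Return all users."""
    return ["ada", "grace"]


@log_calls
def list_orders():
    """Return all orders."""
    return ["order-1"]


# The naive version reports the wrapper's name and loses the docstring.
naive_name = list_users.__name__    # 'wrapper'
naive_doc = list_users.__doc__      # None

# The wraps version looks like the original function to introspection tools.
wrapped_name = list_orders.__name__  # 'list_orders'
wrapped_doc = list_orders.__doc__    # 'Return all orders.'
```

Any framework that keys on __name__, as Flask does for default endpoint names, would see two functions decorated with log_calls_naive as the same name.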

Real-World Example

In a Flask application, a custom authentication decorator without functools.wraps caused all protected routes to map to the same endpoint name 'wrapper', making url_for() return wrong URLs and breaking the entire navigation system. Adding @functools.wraps(f) to the inner wrapper function fixed it immediately.

⚠ Common Mistakes

Forgetting @functools.wraps on the inner wrapper function. Decorators that do not preserve the function signature, breaking tools that inspect function parameters. Applying decorators in the wrong order when stacking multiple decorators.

🏭 Production Scenario

A production Flask API broke its authentication after a refactor added a logging decorator without functools.wraps. The route registration system saw multiple routes all named 'wrapper' and silently dropped all but one, making several API endpoints return 404 despite the code being correct.

Follow-up Questions
How do class-based decorators work? How do you write a decorator that accepts its own arguments? How does decorator stacking (applying multiple decorators) work in Python?
ID: PY-INT-002  ·  Difficulty: 5/10  ·  Level: Intermediate
PY-INT-004 How do context managers work and how do you create a custom one?
Python Core Python Intermediate
5/10
Answer

Context managers use __enter__ and __exit__ methods to manage setup and teardown of resources. The 'with' statement calls these automatically, ensuring cleanup even if an exception occurs.

Deep Explanation

When you use 'with open(file) as f', Python calls __enter__() on the object returned by open(file), binds the result to f, and calls __exit__() when the block ends, whether it ends normally or with an exception. You can create custom context managers in two ways: implement __enter__ and __exit__ in a class, or use the @contextmanager decorator from contextlib with a generator function that yields once. The __exit__ method receives the exception type, value, and traceback (all None if no exception occurred) and can suppress the exception by returning True. Context managers are the Pythonic way to handle any resource that needs guaranteed cleanup: database connections, locks, temporary directories, timers, and transaction management.
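
Both styles can be sketched side by side. To keep the example self-contained, a plain list stands in for a database connection and records the statements that would be issued; that stand-in and the names Transaction and transaction are assumptions for illustration.

```python
from contextlib import contextmanager


class Transaction:
    """Class-based context manager; `log` is a list standing in for a connection."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("BEGIN")
        return self.log  # this value is bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished without an exception.
        self.log.append("COMMIT" if exc_type is None else "ROLLBACK")
        return False  # False means: do not suppress the exception


@contextmanager
def transaction(log):
    """Generator-based equivalent; the code around `yield` plays both roles."""
    log.append("BEGIN")
    try:
        yield log
    except Exception:
        log.append("ROLLBACK")
        raise  # re-raise so the caller still sees the error
    else:
        log.append("COMMIT")


ok_log = []
with Transaction(ok_log) as conn:
    conn.append("INSERT")

failed_log = []
try:
    with transaction(failed_log):
        raise ValueError("boom")
except ValueError:
    pass  # the rollback has already been recorded at this point
```

After this runs, ok_log records a begin, the insert, and a commit, while failed_log records a begin followed by a rollback.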

Real-World Example

A database transaction context manager in a Django-like ORM: __enter__ begins the transaction; __exit__ commits if no exception occurred, or rolls back if one did. This pattern ensures no transaction is ever left open regardless of what happens inside the with block.

⚠ Common Mistakes

Not handling exceptions in __exit__, letting them propagate when they should be caught. Creating context managers with @contextmanager and forgetting to wrap the yield in try-finally, which skips cleanup on exceptions. Using try-finally everywhere instead of the cleaner with statement.

🏭 Production Scenario

A production PostgreSQL service had intermittent connection failures traced to database transactions being left open. The root cause was exception handling that bypassed the connection cleanup code. Refactoring to use a context manager with proper __exit__ eliminated the issue permanently.

Follow-up Questions
What is the contextlib module? How do nested context managers work? What is contextlib.ExitStack used for?
ID: PY-INT-004  ·  Difficulty: 5/10  ·  Level: Intermediate
PY-DS-001 What is the difference between pandas DataFrame.apply() and vectorized operations?
Python Data Science Intermediate
5/10
Answer

Vectorized operations (using NumPy/pandas built-ins) operate on entire arrays at once in optimized C code. apply() calls a Python function row by row or column by column in pure Python. Vectorized operations are often 10-1000x faster; use apply() only when no vectorized alternative exists.

Deep Explanation

pandas is built on NumPy, which stores data in contiguous memory arrays and performs operations in optimized C/Fortran code without per-element Python overhead. When you write df['price'] * 1.1, NumPy multiplies the entire array in C. When you write df.apply(lambda x: x['price'] * 1.1, axis=1), Python calls a function for every single row: potentially millions of function calls, each with Python overhead. The performance gap is enormous: for a 1M-row DataFrame, vectorized operations might take 10ms while apply() takes 10-30 seconds. Use apply() only for: operations that cannot be expressed as vectorized operations; complex multi-column operations with conditional logic; or applying a function that expects a Series object.
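
The two styles produce the same values, which a tiny DataFrame is enough to show. This sketch assumes pandas and NumPy are installed; the column names and data are made up, and three rows say nothing about timing, so the speed difference only becomes visible on large frames.

```python
import numpy as np
import pandas as pd

# Hypothetical data; real workloads would have millions of rows.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 0, 5]})

# Vectorized: one operation over the whole column, executed in compiled code.
vectorized = df["price"] * 1.1

# Row-wise apply: the lambda is called once per row in Python.
row_by_row = df.apply(lambda row: row["price"] * 1.1, axis=1)

# Vectorized conditional with np.where instead of an apply(axis=1) with if/else.
df["status"] = np.where(df["qty"] > 0, "in_stock", "sold_out")

# Vectorized string operation through the .str accessor.
df["is_sold_out"] = df["status"].str.contains("sold")
```

On a large frame, timing the first two expressions (for example with time.perf_counter) is a quick way to check whether an apply() call is worth rewriting.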

Real-World Example

A daily sales report generation for a retail chain was taking 45 minutes to run on a 5M-row transaction DataFrame. Profiling revealed three apply() calls doing price calculations that could be rewritten as vectorized operations. Replacing them reduced runtime to 90 seconds — a 30x speedup with no algorithmic change.

⚠ Common Mistakes

Using apply() for simple arithmetic that pandas/NumPy can do natively. Using apply(axis=1) to iterate rows for anything that can be done with vectorized conditionals (use np.where instead). Not knowing about str accessor methods (df['col'].str.contains()) which provide vectorized string operations avoiding apply() entirely.

🏭 Production Scenario

A pandas ETL pipeline at a financial data company was processing end-of-day data and regularly missing the 6 AM business deadline. Profiling showed apply() calls for currency conversion and date parsing were the bottleneck. Replacing with vectorized arithmetic and pd.to_datetime() reduced the pipeline from 4 hours to 18 minutes.

Follow-up Questions
What is the difference between apply() and applymap()? How does numpy.vectorize() differ from true vectorization? When should you use Polars instead of pandas?
ID: PY-DS-001  ·  Difficulty: 5/10  ·  Level: Intermediate
PY-INT-008 How do Python dictionaries work internally and what is their time complexity?
Python Core Python Intermediate
5/10
Answer

Python dictionaries are hash tables. Lookup, insertion, and deletion are O(1) average case. Hash collisions can degrade this to O(n) worst case, but Python's implementation makes this extremely rare. Python 3.7+ guarantees insertion-order preservation.

Deep Explanation

Dictionaries store key-value pairs in a hash table. When you set d[key] = value, Python computes hash(key), maps it to a slot in the table, and stores the key and value there. When you access d[key], Python recomputes the hash, goes to that slot, and compares keys to confirm the match: O(1) on average. Hash collisions (two different keys mapping to the same slot) are resolved via open addressing in CPython. CPython 3.6 introduced a compact dictionary representation that preserves insertion order as an implementation detail. Python 3.7 made insertion-order preservation part of the language. Only hashable objects can be dictionary keys: strings, integers, and tuples whose elements are all hashable qualify, but lists and dicts do not. dict.get(key, default) avoids KeyError for missing keys. collections.defaultdict automatically creates default values. collections.Counter counts hashable objects.
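
A few lines are enough to exercise the behaviors described above. The route table and word list here are invented sample data.

```python
from collections import Counter, defaultdict

# Hypothetical route table: lookups by key are O(1) on average.
routes = {"/users": "list_users", "/orders": "list_orders"}
handler = routes.get("/missing", "not_found")  # default instead of KeyError
route_order = list(routes)                     # keys come back in insertion order

# Tuples of hashable items can be keys; lists cannot.
point_names = {(0, 0): "origin"}
try:
    bad = {[0, 0]: "origin"}
except TypeError as exc:
    list_key_error = str(exc)  # message mentions "unhashable type"

# defaultdict creates the missing value on first access.
words = "error warn error info error".split()
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

# Counter is a dict subclass specialised for counting hashable objects.
counts = Counter(words)
most_common_word, most_common_count = counts.most_common(1)[0]
```

The defaultdict and Counter forms replace the verbose "if key in d ... else ..." pattern with a single statement per item.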

Real-World Example

In a word frequency counter processing millions of log lines, dict-based counting with Counter is asymptotically cheaper than sorting-based approaches: O(n) with a hash table vs O(n log n) for sort-then-count. In a URL routing system, a dict of {path: handler} enables O(1) average-case route lookup regardless of how many routes exist.

⚠ Common Mistakes

Using a list to check membership (if item in list is O(n); use a set or dict instead). Adding or removing keys while iterating over a dictionary (raises RuntimeError; iterate over list(d.items()) instead). Using mutable objects such as lists as dictionary keys (TypeError: unhashable type). Not using setdefault() or defaultdict and writing verbose if-key-in-dict patterns instead.

🏭 Production Scenario

A production request deduplication service was checking if a request ID had been seen using a list (if request_id in seen_list). At 10000 requests per second the O(n) membership check was consuming 60% of CPU time. Replacing with a set (O(1) lookup) reduced CPU usage to 2% with identical functionality.

Follow-up Questions
How does a Python set differ from a dict internally? What is the difference between dict and OrderedDict after Python 3.7? What is a dict comprehension and when should you use defaultdict instead?
ID: PY-INT-008  ·  Difficulty: 5/10  ·  Level: Intermediate
PY-INT-003 What is the GIL in Python and how does it affect multithreading?
Python Performance Intermediate
6/10
Answer

The Global Interpreter Lock (GIL) is a mutex that prevents multiple native threads from executing Python bytecode simultaneously. It makes Python threads unsuitable for CPU-bound parallelism.

Deep Explanation

CPython (the standard Python implementation) uses reference counting for memory management. The GIL protects this reference counting from race conditions by ensuring only one thread executes Python bytecode at a time. This means Python threads do NOT run in true parallel for CPU-bound tasks; they take turns. However, the GIL is released during blocking I/O operations (file reads, network calls, database queries), so threading IS effective for I/O-bound tasks. For true CPU parallelism, use the multiprocessing module, which spawns separate processes, each with its own interpreter and GIL, or use libraries like NumPy that release the GIL in their C extensions.
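
The split between I/O-bound and CPU-bound work can be sketched with concurrent.futures. In this example time.sleep stands in for a network wait (it releases the GIL, as blocking I/O does); fake_download, count_primes, and the example.com URLs are placeholders, not a real workload.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_download(url, delay=0.05):
    """Stand-in for a network call; sleeping releases the GIL like real I/O waits."""
    time.sleep(delay)
    return f"body of {url}"


def fetch_all(urls, max_workers=8):
    """I/O-bound work: threads overlap their waiting time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, urls))


def count_primes(limit):
    """CPU-bound pure-Python work; threads would only take turns on this."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def count_primes_in_processes(limits, max_workers=None):
    """CPU-bound work: separate processes, each with its own interpreter and GIL."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_primes, limits))


# Threaded I/O demo: eight 50 ms waits overlap instead of running back to back.
urls = [f"https://example.com/page/{i}" for i in range(8)]
start = time.perf_counter()
bodies = fetch_all(urls)
threaded_seconds = time.perf_counter() - start
```

count_primes_in_processes is defined but not called here: process pools should be started from under an `if __name__ == "__main__":` guard, because some platforms start workers by re-importing the main module.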

Real-World Example

A web scraper using threading to fetch 100 URLs runs significantly faster with threads because most time is spent waiting for network I/O (GIL released). The same approach for parsing and processing 100 large JSON files (CPU-bound) would see no speedup from threading — multiprocessing or concurrent.futures ProcessPoolExecutor should be used instead.

⚠ Common Mistakes

Using threading for CPU-intensive tasks and being confused when there is no performance improvement. Assuming multiprocessing will always be better: it has high overhead for process spawning and IPC. Not considering asyncio for I/O-bound tasks, which is more efficient than threading for high-concurrency scenarios.

🏭 Production Scenario

A production image processing service used Python threading expecting parallel image resizing. Performance was identical to single-threaded execution. The fix was switching to multiprocessing.Pool which reduced processing time by 75% on an 8-core server by actually utilizing all cores.

Follow-up Questions
What is the difference between threading, multiprocessing, and asyncio? When does Python release the GIL? Does Jython or PyPy have a GIL?
ID: PY-INT-003  ·  Difficulty: 6/10  ·  Level: Intermediate
PY-INT-007 How do you build a REST API with FastAPI and what makes it production-ready?
Python Data Science Intermediate
6/10
Answer

FastAPI uses Python type hints to automatically generate API validation, serialization, and OpenAPI documentation. Production-ready additions include async database access, dependency injection for auth, middleware for logging/CORS, rate limiting, and health check endpoints.

Deep Explanation

FastAPI is built on Starlette (ASGI framework) and Pydantic (data validation). You define endpoints as async (or plain) functions with type-annotated parameters; FastAPI automatically validates inputs, returns 422 for invalid data, and generates Swagger UI documentation. Pydantic models define request/response schemas with validation. Dependency injection (Depends()) handles shared logic: database sessions, authentication, rate limiting. For production: use async ORMs (SQLAlchemy async, Tortoise ORM); add middleware (CORS, request logging, timing); implement proper error handling with custom exception handlers; add health check endpoints for load balancer probes; use environment-based configuration (pydantic-settings); and containerize, running uvicorn behind nginx.

Real-World Example

A production API for a fintech app: Pydantic models validate all financial amounts (positive, correct decimal places); JWT authentication is injected via Depends() into protected routes; a PostgreSQL database is accessed via async SQLAlchemy; Prometheus middleware exports metrics; and a /health endpoint returns database connectivity status for the load balancer.

⚠ Common Mistakes

Using synchronous database drivers inside async route handlers (blocks the event loop, destroying performance). Not validating response models (can leak internal data). Forgetting to handle the database connection lifecycle: connections not closed properly exhaust the pool. Not implementing proper HTTP status codes, such as returning 200 for errors.

🏭 Production Scenario

A FastAPI service handling 500 req/s was experiencing periodic slowdowns. Investigation revealed synchronous calls to a third-party API inside async route handlers were blocking the event loop during each slow response. Replacing with httpx (async HTTP client) and proper timeout handling eliminated the slowdowns.

Follow-up Questions
What is ASGI vs WSGI? How does Pydantic validation work under the hood? What is the difference between FastAPI and Flask for production APIs?
ID: PY-INT-007  ·  Difficulty: 6/10  ·  Level: Intermediate