Interview Questions & Model Answers
Real questions. Real answers. Built from 20 years of actual hiring and being hired.
'break' exits the loop entirely. 'continue' skips the current iteration and moves to the next. 'pass' does nothing — it is a placeholder.
These three keywords control loop flow differently. 'break' immediately terminates the enclosing loop, and execution continues after the loop block. 'continue' stops the current iteration and jumps to the next one: a while loop re-checks its condition and a for loop fetches the next item. 'pass' is a null operation. It literally does nothing and is used when Python syntax requires a statement but you have no code to put there yet, such as in an empty class or function body during development. Misunderstanding these leads to infinite loops or skipped logic in data processing pipelines.
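A minimal sketch of the three keywords in one loop. The function name clean_rows and the "STOP" sentinel are hypothetical, chosen only for illustration.

```python
def clean_rows(rows):
    """Collect rows, skipping blanks and stopping at a sentinel value."""
    cleaned = []
    for row in rows:
        if row == "":
            continue  # skip this row and move on to the next one
        if row == "STOP":
            break  # leave the loop entirely; later rows are never seen
        if row.startswith("#"):
            pass  # does nothing: the row is still appended below
        cleaned.append(row)
    return cleaned


result = clean_rows(["a", "", "#note", "b", "STOP", "c"])
print(result)  # ['a', '#note', 'b']
```

Note that the "#note" row survives: 'pass' did not skip it, which is exactly the confusion the answer warns about.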
In a CSV data cleaning pipeline: 'continue' skips rows with missing values, 'break' stops processing if a critical error is found in the data, and 'pass' is used in an exception handler that acknowledges an error but intentionally takes no action (though this is usually bad practice in production).
Using 'pass' thinking it skips an iteration (it does not; use 'continue'). Using 'break' inside a nested loop thinking it exits all loops (it only exits the innermost one). Leaving 'pass' in production exception handlers, which silently swallows errors.
A data ingestion job was silently skipping thousands of records because a developer used 'pass' in an exception handler instead of logging the failure and then using 'continue'. The job appeared to complete successfully, but the database was missing 30% of expected records.
'self' refers to the specific instance of the class that a method is being called on. It gives each instance access to its own attributes and other methods.
When you define a method inside a class, Python does not automatically know which instance the method is operating on. 'self' is the conventional first parameter that receives a reference to the calling instance. When you call instance.method(), Python automatically passes the instance as the first argument, so you never pass 'self' explicitly when calling. Without that reference a method could not reach the attributes of the particular instance it was called on, and per-instance state would be impossible. The name 'self' is a convention, not a keyword: you could use any name, but deviating from the convention is considered bad practice.
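A short sketch of how the instance is passed automatically. The User class and its contact_line() method are hypothetical stand-ins for illustration.

```python
class User:
    def __init__(self, username, email):
        # Each instance gets its own copies of these attributes.
        self.username = username
        self.email = email

    def contact_line(self):
        # 'self' is whichever instance the method was called on.
        return f"{self.username} <{self.email}>"


alice = User("alice", "alice@example.com")
bob = User("bob", "bob@example.com")

# instance.method() is shorthand for Class.method(instance).
same = alice.contact_line() == User.contact_line(alice)
print(alice.contact_line())  # alice <alice@example.com>
print(bob.contact_line())    # bob <bob@example.com>
```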
In a User class for a web application, self.username and self.email store per-instance data. When the send_email() method is called on a specific user object, 'self' ensures the method sends to that user's email address, not to some global or shared value.
Forgetting to add 'self' as the first parameter of an instance method, causing a TypeError when called. Confusing instance methods (use self) with class methods (use cls) and static methods (use neither). Thinking 'self' is a keyword like 'this' in Java.
A production multi-tenant SaaS application had a bug where all tenants were seeing the same configuration because a developer defined tenant settings as class-level attributes instead of instance attributes set via self. Every update to one tenant's config overwrote all others.
F-strings (formatted string literals) are the modern Python way to embed expressions inside strings using f'text {expression}'. They are faster, more readable, and less error-prone than % formatting or str.format().
Introduced in Python 3.6, f-strings evaluate expressions inside curly braces at runtime. The 'f' prefix before the quote tells Python to treat the string as a formatted literal. You can embed any valid Python expression: variables, arithmetic, function calls, method calls, conditional expressions. They are generally the fastest string formatting method in CPython. Benchmarks often show f-strings running 40-70% faster than str.format() and noticeably faster than % formatting, because the literal is compiled directly to bytecode instead of going through a method call that parses the format string at runtime. Exact figures vary with Python version and workload. Python 3.12 added even more f-string capabilities, including reusing the same quote type inside expressions.
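A few f-string forms side by side, as a minimal sketch. The variable names are made up for illustration, and the '=' debugging specifier assumes Python 3.8 or later.

```python
user_id = 42
price = 19.991
items = ["a", "b", "c"]

# Any expression can go inside the braces, with an optional format spec.
summary = f"User {user_id} bought {len(items)} items for {price:.2f}"

# Doubled braces produce literal braces.
literal = f"{{user_id}} renders as {user_id}"

# The '=' specifier (Python 3.8+) prints the expression text and its value.
debug = f"{user_id=}"

print(summary)  # User 42 bought 3 items for 19.99
print(literal)  # {user_id} renders as 42
print(debug)    # user_id=42
```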
In a web application logging system, f-strings make messages clear and fast: f'User {user.id} ({user.email}) performed {action} on resource {resource_id} at {timestamp}' needs no string concatenation and is immediately readable during log review.
Using string concatenation with + instead of f-strings in high-frequency code paths. Forgetting that curly braces must be escaped as {{ and }} if you want literal braces. Using f-strings in logging calls when the string might never be formatted (use lazy % formatting for log messages to avoid building strings that are never logged at the configured log level).
A high-throughput data processing service was building millions of formatted strings per hour using str.format(). Profiling showed string formatting as a significant CPU cost. Switching to f-strings reduced the formatting overhead by 45%, contributing to a measurable throughput improvement.
Lists are mutable (changeable); tuples are immutable (fixed). Use tuples for data that should not change.
In Python, a list is defined with square brackets [] and can be modified after creation: you can append, remove, or change elements. A tuple is defined with parentheses () and cannot be modified after creation. This immutability makes tuples slightly faster to create and lets them be hashed (provided every element is itself hashable), so they can be used as dictionary keys or set members. CPython stores tuples more compactly than lists, so they consume less memory than equivalent lists. The immutability also serves as a signal to other developers that this data is not meant to change.
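The differences described above can be checked directly. This is a minimal sketch and the variable names are illustrative only.

```python
scores = [90, 85]
scores.append(70)  # lists can grow and change in place

point = (3, 4)
try:
    point[0] = 10  # tuples reject item assignment
    assignment_failed = False
except TypeError:
    assignment_failed = True

# A tuple of hashable elements can be used as a dictionary key.
distances = {point: 5.0}

# A tuple that contains a list is not hashable.
try:
    hash((1, [2, 3]))
    nested_hashable = True
except TypeError:
    nested_hashable = False

single = (42,)      # one-element tuple: the trailing comma is required
not_a_tuple = (42)  # just the integer 42 in parentheses
```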
Older Django project templates defined INSTALLED_APPS as a tuple because the value should be fixed at configuration time. Current templates use lists for settings such as ALLOWED_HOSTS and INSTALLED_APPS, which works the same way, but a tuple states the read-only intent more clearly to maintainers.
Using a list when the data never changes (wastes memory and loses semantic meaning). Trying to modify a tuple and getting a TypeError without understanding why. Forgetting that a tuple with one element needs a trailing comma: (42,) not (42).
A production API was returning inconsistent responses because a developer accidentally appended to what should have been a fixed configuration list. Switching to a tuple made the bug immediately visible as a TypeError on the next attempted modification.
'==' checks value equality. 'is' checks identity — whether two variables point to the exact same object in memory.
The == operator calls the __eq__ method and compares values. The 'is' operator compares object identity, which is equivalent to comparing id() values. Two objects can be equal in value but be different objects in memory. As an implementation detail, CPython caches small integers (-5 to 256) and interns some strings, which can make 'is' return True unexpectedly for these values and leads to subtle bugs if misused. You should almost never use 'is' to compare values. Reserve it for None checks (if x is None), where it is both correct and idiomatic.
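A small sketch of equality versus identity. It uses lists rather than integers or strings so the result does not depend on CPython's caching behavior; the names are illustrative.

```python
a = [1, 2, 3]
b = [1, 2, 3]  # equal contents, but a separate object
c = a          # a second name for the same object

equal_values = a == b    # True: __eq__ compares the contents
same_object = a is b     # False: two distinct list objects
alias_identity = a is c  # True: both names refer to one object


def describe(value):
    # 'is None' is the idiomatic check for the None singleton.
    if value is None:
        return "missing"
    return "present"
```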
In a user authentication system: 'if user_role == admin_role' correctly compares role names as strings. Using 'is' instead can appear to work on small test data because of string interning, but it silently fails in production when role strings come from a database and are different objects with the same value.
Using 'is' to compare strings or integers expecting value equality. Being confused by small integer caching making 'is' appear to work correctly during testing. Using == None instead of 'is None', which is slower, less Pythonic, and can be fooled by a custom __eq__.
A production bug was caused by comparing user permission strings with 'is' instead of '=='. Tests passed because short strings were interned, but in production, with database-fetched strings, the comparison always returned False, locking all users out of admin features.
*args collects extra positional arguments as a tuple. **kwargs collects extra keyword arguments as a dictionary. Both allow functions to accept a variable number of arguments.
When you define a function with *args, any positional arguments beyond the explicitly defined ones are packed into a tuple called args. With **kwargs, any keyword arguments not explicitly defined are packed into a dictionary called kwargs. The names args and kwargs are just convention; the * and ** operators are what matter. You can use *args and **kwargs together, and you can also use them when calling functions to unpack sequences and dictionaries into arguments. This pattern is heavily used in decorators, class inheritance, and API wrappers.
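The decorator use case can be sketched as follows. log_calls and create_user are hypothetical names, and the calls list stands in for a real logger.

```python
import functools


def log_calls(func):
    """Record every call, then forward all arguments unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # args is a tuple of positionals, kwargs a dict of keywords.
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)  # unpack them back into the call
    wrapper.calls = []
    return wrapper


@log_calls
def create_user(name, role="viewer"):
    return {"name": name, "role": role}


admin = create_user("ada", role="admin")

# * and ** also unpack a sequence and a dict at the call site.
editor = create_user(*["bob"], **{"role": "editor"})
```

Because the wrapper never names the wrapped function's parameters, adding a new parameter to create_user would not require touching the decorator.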
Django's class-based views use **kwargs extensively to pass URL parameters captured by the router into view methods. Generic wrappers around request handlers, such as decorators in a FastAPI or Flask application, use *args and **kwargs to forward calls without knowing the exact signature of the wrapped handler.
Confusing *args (tuple) with a list. Forgetting that *args must come before **kwargs in the function signature. Trying to access args by keyword or kwargs by position. Mutating args thinking it is a list.
A logging decorator in a production Flask app broke when a new endpoint added a keyword argument. The fix was changing the decorator to use *args and **kwargs so it would transparently forward any arguments to the wrapped function without needing updates every time a new parameter was added.
'try' runs code that might fail. 'except' catches specific errors. 'finally' always runs regardless of whether an error occurred — used for cleanup.
The try block contains the risky code. If an exception occurs, Python looks for a matching except clause. You can catch specific exception types (except ValueError) or use a bare except to catch everything (not recommended). The else clause (optional) runs only if no exception occurred. The finally clause always executes, even if there was an exception or a return statement inside try, making it essential for releasing resources like file handles, database connections, or locks. Multiple except clauses can handle different exception types differently.
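A minimal sketch showing the order in which the four clauses run. parse_int is a hypothetical helper, and the log list is only there to make the order visible.

```python
def parse_int(text, log):
    """Parse text as an int, recording which clauses ran in log."""
    try:
        value = int(text)
    except ValueError:
        log.append("invalid")  # only runs for the matching exception type
        return None
    else:
        log.append("parsed")   # only runs when no exception occurred
        return value
    finally:
        log.append("cleanup")  # runs in both cases, even after a return


ok_log, bad_log = [], []
ok = parse_int("12", ok_log)
bad = parse_int("twelve", bad_log)
print(ok, ok_log)    # 12 ['parsed', 'cleanup']
print(bad, bad_log)  # None ['invalid', 'cleanup']
```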
In a database write operation: the try block executes the INSERT query, the except block catches IntegrityError for duplicate keys and returns a meaningful error message, and the finally block always closes the database connection regardless of success or failure, preventing connection pool exhaustion.
Using a bare 'except:' that catches everything, including KeyboardInterrupt and SystemExit, making the program hard to stop. Not closing resources in finally, causing memory or connection leaks. Catching too broad an exception type and hiding real bugs.
A production API server ran out of database connections after 6 hours because a developer forgot to close connections in a finally block. The try block opened a connection, an exception occurred, the connection was never closed, and the pool was exhausted within hours under normal traffic.
A module is a single .py file containing Python code. A package is a directory containing multiple modules and an __init__.py file. Packages allow organizing related modules into a hierarchical namespace.
Any .py file is a module and can be imported with 'import filename' (without the .py extension). A package is a directory with an __init__.py file (which can be empty) that tells Python to treat the directory as a package. The __init__.py can import from submodules to define the package's public API. Modern Python (3.3+) supports namespace packages, directories without __init__.py, but an explicit __init__.py is still preferred for clarity. Import paths follow the directory structure: in a package 'myapp' with a subpackage 'utils' containing 'helpers.py', you import with 'from myapp.utils.helpers import my_function'. The __all__ list in __init__.py controls what 'from myapp import *' exports; without it, every public name defined in __init__.py is exported.
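To make the layout concrete, the sketch below builds the myapp/utils/helpers.py structure from the explanation in a temporary directory and imports from it. Creating a package on the fly like this is only for demonstration; the file contents are assumptions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build this layout in a scratch directory:
#   myapp/__init__.py
#   myapp/utils/__init__.py   (re-exports my_function as public API)
#   myapp/utils/helpers.py
root = Path(tempfile.mkdtemp())
utils_dir = root / "myapp" / "utils"
utils_dir.mkdir(parents=True)
(root / "myapp" / "__init__.py").write_text("")
(utils_dir / "__init__.py").write_text("from .helpers import my_function\n")
(utils_dir / "helpers.py").write_text("def my_function():\n    return 'hello'\n")

# Make the scratch directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The import path mirrors the directory structure.
from myapp.utils.helpers import my_function
# The same function is reachable through the package's __init__.py.
from myapp.utils import my_function as reexported

print(my_function())  # hello
```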
Django is structured as a package: the top-level 'django' directory contains __init__.py, and subpackages like 'django.db', 'django.http', and 'django.contrib' each have their own __init__.py. This allows clean imports like 'from django.db import models' while keeping the codebase organized across hundreds of files.
Forgetting __init__.py in package directories (this causes ImportError in Python 2; in Python 3 the directory may be picked up as a namespace package, which can cause confusing behavior). Circular imports between modules in the same package. Mixing relative imports (from . import module) with absolute imports: relative imports can fail when a module is run directly as a script.
A production Django application was growing to 50+ Python files in a single directory. Refactoring into packages (api/, models/, services/, utils/) with __init__.py files and clean public APIs reduced import statement complexity and made it possible to see the application structure at a glance.
A generator produces items one at a time using lazy evaluation — it only computes each item when requested. A list computes and stores all items immediately. Generators use far less memory for large sequences.
Generators are created using generator functions (functions with yield instead of return) or generator expressions (like list comprehensions but with parentheses). When you call a generator function, it returns a generator object without executing the body. Each call to next() on the generator runs the body until the next yield, which pauses execution and returns the value. The generator remembers its state between next() calls. Key advantage: memory. A list of 1 million items stores all 1 million in memory. A generator that yields 1 million items stores only the current item and the execution state. Generators are also composable: you can chain generators to build processing pipelines without intermediate memory allocation.
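A minimal sketch of lazy evaluation, exhaustion, and chaining. read_numbers and only_even are hypothetical pipeline stages.

```python
def read_numbers(limit):
    """Generator function: the body runs only as items are requested."""
    for n in range(limit):
        yield n


def only_even(numbers):
    """A second stage that consumes any iterable lazily."""
    for n in numbers:
        if n % 2 == 0:
            yield n


gen = read_numbers(5)
first = next(gen)  # runs the body up to the first yield
rest = list(gen)   # consumes the remaining items
again = list(gen)  # the generator is now exhausted

# Chained stages plus a generator expression: no intermediate lists.
total = sum(n * n for n in only_even(read_numbers(10)))

print(first, rest, again)  # 0 [1, 2, 3, 4] []
print(total)               # 120
```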
Processing a 10GB log file: reading the entire file into a list would require 10GB of RAM. A generator that yields one line at a time uses constant memory regardless of file size. In data pipelines: file_lines → filter_errors → parse_timestamps → aggregate — each step is a generator passing items to the next without intermediate storage.
Forgetting that a generator is exhausted after iteration, so you cannot iterate over it twice. Not recognizing that for loops and many Python builtins (sum, list, map) accept any iterable, including generators. Using a list comprehension when a generator expression would suffice (when you only need to iterate once). Confusing generator functions (use yield) with regular functions that return lists.
A data export API was timing out for large datasets because it built a complete list of 500,000 records before streaming. Refactoring to yield records one at a time from a generator allowed streaming the response immediately and eliminated the memory spike and timeout.
A list comprehension is a concise way to create lists using a single line expression. Avoid them when the logic is complex enough that a regular loop is more readable.
List comprehensions follow the syntax [expression for item in iterable if condition]. In CPython they are usually faster than the equivalent for loop with append() because the interpreter uses a dedicated bytecode for appending instead of looking up and calling list.append on every iteration. However, they are not always the right choice. Avoid them when: the logic requires multiple nested conditions, you need to handle exceptions inside the loop, the comprehension spans more than two lines when formatted, or you are consuming a large dataset where a generator expression would be more memory-efficient. Nested list comprehensions (list comprehensions inside list comprehensions) are almost always a readability mistake.
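The syntax above, next to the loop it replaces and a generator expression for the aggregate-only case. The users data is made up for illustration.

```python
users = [
    {"email": "a@example.com", "active": True},
    {"email": "b@example.com", "active": False},
    {"email": "c@example.com", "active": True},
]

# [expression for item in iterable if condition]
active_emails = [u["email"] for u in users if u["active"]]

# The equivalent explicit loop.
loop_emails = []
for u in users:
    if u["active"]:
        loop_emails.append(u["email"])

# When only an aggregate is needed, a generator expression avoids
# building the intermediate list at all.
total_length = sum(len(u["email"]) for u in users)

print(active_emails)  # ['a@example.com', 'c@example.com']
print(total_length)   # 39
```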
In a data processing pipeline: [user.email for user in users if user.is_active and user.verified] is clean and appropriate. But building a matrix transformation with three nested comprehensions is a maintainability trap — a regular loop with clear variable names is better for the next developer.
Nesting comprehensions three levels deep making code unreadable. Using list comprehensions when you actually need a generator (you are iterating once over a large dataset). Adding side effects inside comprehensions (modifying external state) which is a major anti-pattern.
A memory crash in a production data export service was traced to a list comprehension processing 2 million records at once, loading everything into memory. Replacing it with a generator expression fixed the memory issue without changing any other code.
Use generators, file iteration (files are iterators in Python), or chunk-based reading. Avoid calling read() with no size argument or readlines() on large files, because they load the entire file into memory.
Python file objects are iterators: you can iterate over them line by line without loading the entire file. For binary files, or files where line iteration is not appropriate, use file.read(chunk_size) to read fixed-size chunks in a loop. For CSV files, use csv.DictReader (which iterates lazily) or pandas with the chunksize parameter (pd.read_csv('file.csv', chunksize=10000) returns an iterator of DataFrames). For JSON, use ijson for streaming JSON parsing. The with statement ensures the file is properly closed. For very large files (100GB+), memory-mapped files (mmap module) allow treating file content as if it were in memory while the OS handles paging.
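Line iteration and fixed-size chunk reading can be sketched with the standard library alone. count_errors and iter_chunks are hypothetical helpers, the sample log content is invented, and the walrus operator assumes Python 3.8 or later.

```python
import os
import tempfile


def count_errors(path):
    """Iterate line by line; only one line is held in memory at a time."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if "ERROR" in line:
                count += 1
    return count


def iter_chunks(path, chunk_size=4096):
    """Yield fixed-size binary chunks until read() returns b''."""
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


# Small demonstration file standing in for a large log.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("INFO start\nERROR disk\nERROR net\n")
    log_path = tmp.name

error_count = count_errors(log_path)
bytes_seen = sum(len(chunk) for chunk in iter_chunks(log_path, chunk_size=8))
```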
A log analysis system needed to process 50GB daily log files to extract error counts. Using open(file).read() caused OOM crashes. Refactoring to iterate line by line (for line in file) reduced memory usage from 50GB to under 10MB while processing the same file.
Using file.readlines() which builds a complete list of all lines in memory. Using pd.read_csv() without chunksize on multi-GB files. Not closing files (always use with statement). Forgetting to handle encoding explicitly — defaulting to system encoding causes silent corruption on non-ASCII data.
A production data pipeline at a logistics company was crashing nightly when processing a 30GB shipment data CSV. The fix used pandas chunked reading: processing 50,000 rows at a time, aggregating results, and writing summaries, which reduced peak memory from 45GB (crashing the server) to 2GB.
pytest discovers and runs test functions automatically, providing rich assertion introspection, fixtures for dependency injection, and parametrize for data-driven tests. A good unit test is fast, isolated, deterministic, and tests one specific behavior.
pytest looks for files named test_*.py, functions named test_*, and classes named Test*. When an assert fails, pytest shows you exactly what the actual and expected values were, so there is no need for assertEqual(). Fixtures (@pytest.fixture) provide setup/teardown and dependency injection for tests: database connections, temporary files, mock objects. Parametrize (@pytest.mark.parametrize) runs the same test with multiple input/output combinations, eliminating test duplication. Mocking with unittest.mock.patch replaces real dependencies with controlled fakes, making tests fast and isolated. Good unit tests: test one behavior, run in milliseconds, do not hit databases/networks/file systems (mock these), are deterministic (same result every run), and fail with clear messages.
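The qualities listed above can be sketched with plain test functions that pytest would collect from a test_*.py file. The functions under test (apply_discount, notify_user) are hypothetical, and the sketch deliberately uses only plain asserts and unittest.mock so it also runs without pytest installed.

```python
from unittest.mock import Mock


def apply_discount(price, percent):
    """Hypothetical function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def notify_user(mailer, email):
    """Hypothetical function that depends on an external mail service."""
    mailer.send(email, "Welcome")
    return True


def test_apply_discount_reduces_price():
    assert apply_discount(200.0, 25) == 150.0


def test_apply_discount_rejects_invalid_percent():
    try:
        apply_discount(100.0, 150)
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_notify_user_sends_one_welcome_email():
    mailer = Mock()  # controlled fake: no network access
    assert notify_user(mailer, "a@example.com") is True
    mailer.send.assert_called_once_with("a@example.com", "Welcome")
```

In a real pytest suite the try/except would normally become a 'with pytest.raises(ValueError):' block, and several discount cases could share one test through @pytest.mark.parametrize.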
A FastAPI endpoint test: the test uses a pytest fixture providing a TestClient (an in-process HTTP client), patches the database dependency with an in-memory mock, uses parametrize to test valid/invalid/edge case inputs, and has clear test names like test_create_user_returns_201_for_valid_input. Each test runs in under 5ms with no external dependencies.
Writing tests that test implementation details instead of behavior — tests should not break when you refactor internals. Not mocking external dependencies making tests slow and flaky. Using a single large test function that tests multiple behaviors (impossible to tell which behavior failed). Asserting too broadly (assert response is not None) or too narrowly (asserting on exact internal state).
A Django e-commerce platform's test suite took 45 minutes to run because 800 tests were hitting the actual test database. Refactoring to use pytest fixtures with database mocking and factory_boy for test data generation reduced the suite to 3 minutes, enabling CI to run on every commit.
Context managers use __enter__ and __exit__ methods to manage setup and teardown of resources. The 'with' statement calls these automatically, ensuring cleanup even if an exception occurs.
When you use 'with open(file) as f', Python calls __enter__() on the file object to set up and __exit__() to clean up. You can create custom context managers two ways: implement __enter__ and __exit__ in a class, or use the @contextmanager decorator from contextlib with a generator function that yields once. The __exit__ method receives exception information and can suppress exceptions by returning True. Context managers are the Pythonic way to handle any resource that needs guaranteed cleanup: database connections, locks, temporary directories, timers, and transaction management.
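Both ways of writing a context manager, sketched around a pretend transaction. Transaction and transaction are hypothetical names, and the log list stands in for real begin/commit/rollback calls.

```python
from contextlib import contextmanager


class Transaction:
    """Class-based form: implements __enter__ and __exit__."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("begin")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append("rollback" if exc_type else "commit")
        return False  # do not suppress the exception


@contextmanager
def transaction(log):
    """Generator-based form: code around the single yield is setup/teardown."""
    log.append("begin")
    try:
        yield
    except Exception:
        log.append("rollback")
        raise  # re-raise so the caller still sees the error
    else:
        log.append("commit")


success_log, failure_log = [], []

with Transaction(success_log):
    pass  # no exception: commit

try:
    with transaction(failure_log):
        raise RuntimeError("write failed")
except RuntimeError:
    pass  # the rollback was already recorded before the error reached here
```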
A database transaction context manager in a Django-like ORM: __enter__ begins the transaction, and __exit__ commits if no exception occurred or rolls back if one did. This pattern ensures no transaction is ever left open regardless of what happens inside the with block.
Not handling exceptions in __exit__, letting them propagate when they should be caught. Creating context managers with @contextmanager and forgetting to wrap the yield in try-finally, which skips cleanup on exceptions. Using try-finally everywhere instead of the cleaner with statement.
A production PostgreSQL service had intermittent connection failures traced to database transactions being left open. The root cause was exception handling that bypassed the connection cleanup code. Refactoring to use a context manager with proper __exit__ eliminated the issue permanently.
Python dictionaries are hash tables. Lookup, insertion, and deletion are O(1) average case. Hash collisions can degrade this to O(n) worst case, but Python's implementation makes this extremely rare. Python 3.7+ guarantees insertion-order preservation.
Dictionaries store key-value pairs in a hash table. When you set d[key] = value, Python computes hash(key), maps it to a bucket, and stores the value. When you access d[key], Python recomputes the hash and looks up the bucket directly, which is O(1). Hash collisions (two different keys mapping to the same bucket) are resolved via open addressing in CPython. CPython 3.6 introduced a compact dictionary representation that preserves insertion order as a side effect. Python 3.7 made insertion order preservation an official language guarantee. Only hashable objects can be dictionary keys (typically immutable types: strings, integers, tuples of hashable items, but not lists or other dicts). dict.get(key, default) avoids KeyError for missing keys. collections.defaultdict automatically creates default values. collections.Counter counts hashable objects.
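The helpers named above in a short sketch. The sample words and config values are invented for illustration.

```python
from collections import Counter, defaultdict

words = ["error", "info", "error", "warn", "error"]

# Counter: one pass over the data, one hash lookup per item.
counts = Counter(words)

# defaultdict: missing keys get a fresh default value automatically.
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

# dict.get: return a fallback instead of raising KeyError.
config = {"host": "localhost"}
port = config.get("port", 5432)

# Lists are unhashable, so they cannot be used as keys.
try:
    bad = {[1, 2]: "x"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(counts.most_common(1))  # [('error', 3)]
print(list(by_length))        # [5, 4]  (insertion order is preserved)
```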
In a word frequency counter processing millions of log lines, dict-based counting with Counter outperforms sorting-based approaches: O(n) with a hash table versus O(n log n) for sort-then-count. In a URL routing system, a dict of {path: handler} enables O(1) route lookup regardless of how many routes exist.
Using a list to check membership (if item in list is O(n) — use a set or dict instead). Modifying a dictionary while iterating over it (raises RuntimeError — iterate over list(d.items()) instead). Using mutable objects as dictionary keys (unhashable type TypeError). Not using setdefault() or defaultdict() and writing verbose if-key-in-dict patterns instead.
A production request deduplication service was checking if a request ID had been seen using a list (if request_id in seen_list). At 10,000 requests per second, the O(n) membership check was consuming 60% of CPU time. Replacing with a set (O(1) lookup) reduced CPU usage to 2% with identical functionality.
Vectorized operations (using NumPy/pandas built-ins) operate on entire arrays at once in optimized C code. apply() calls a Python function row by row or column by column in pure Python. Vectorized operations are often 10-1000x faster; use apply() only when no vectorized alternative exists.
pandas is built on NumPy, which stores data in contiguous memory arrays and performs operations in optimized C/Fortran code without Python overhead. When you write df['price'] * 1.1, NumPy multiplies the entire array in C. When you write df.apply(lambda x: x['price'] * 1.1, axis=1), Python calls a function for every single row, potentially millions of function calls with Python overhead each time. The performance gap is enormous: for a 1M row DataFrame, vectorized operations might take 10ms while apply() takes 10-30 seconds. Use apply() only for: operations that cannot be expressed vectorially, complex multi-column operations with conditional logic, or when applying a function that expects a Series object.
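The two styles side by side on a tiny DataFrame, as a sketch that assumes pandas and NumPy are installed. The column names and values are invented; at this size the speed difference is not visible, only the equivalence of the results.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 0, 3]})

# Vectorized: one operation over the whole column, executed in C.
vectorized = df["price"] * 1.1

# Row-wise apply: one Python function call per row, same result.
row_wise = df.apply(lambda row: row["price"] * 1.1, axis=1)

# Vectorized conditional instead of apply(axis=1) with an if/else.
df["status"] = np.where(df["qty"] > 0, "in_stock", "sold_out")

# Vectorized string operation through the .str accessor.
df["is_sold_out"] = df["status"].str.contains("sold")
```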
A daily sales report generation for a retail chain was taking 45 minutes to run on a 5M-row transaction DataFrame. Profiling revealed three apply() calls doing price calculations that could be rewritten as vectorized operations. Replacing them reduced runtime to 90 seconds — a 30x speedup with no algorithmic change.
Using apply() for simple arithmetic that pandas/NumPy can do natively. Using apply(axis=1) to iterate rows for anything that can be done with vectorized conditionals (use np.where instead). Not knowing about str accessor methods (df['col'].str.contains()) which provide vectorized string operations avoiding apply() entirely.
A pandas ETL pipeline at a financial data company was processing end-of-day data and regularly missing the 6 AM business deadline. Profiling showed apply() calls for currency conversion and date parsing were the bottleneck. Replacing with vectorized arithmetic and pd.to_datetime() reduced the pipeline from 4 hours to 18 minutes.
PAGE 1 OF 2 · 23 QUESTIONS TOTAL