
Interview Questions & Model Answers

Real questions. Real answers. Built from 20 years of actual hiring and being hired.

54 Total Questions · 3 Technologies · 3 Levels

Showing 23 questions · Python

PY-BEG-004 What is the difference between ‘break’, ‘continue’, and ‘pass’ in Python loops?
Python Core Python Beginner
2/10
Answer

'break' exits the loop entirely. 'continue' skips the current iteration and moves to the next. 'pass' does nothing — it is a placeholder.

Deep Explanation

These three keywords control loop flow differently. 'break' immediately terminates the innermost enclosing loop, and execution continues after the loop block. 'continue' stops the current iteration and jumps to the next one: a while loop re-checks its condition, and a for loop advances to the next item. 'pass' is a null operation. It does nothing and is used where Python syntax requires a statement but you have no code to put there yet, such as an empty class or function body during development. Misunderstanding these keywords leads to infinite loops or skipped logic in data processing pipelines.
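The behaviour described above can be seen side by side in a small sketch. The function name clean_rows and the "FATAL" marker are hypothetical, chosen only for illustration.

```python
def clean_rows(rows):
    """Collect rows, skipping missing ones and stopping at a fatal marker."""
    cleaned = []
    for row in rows:
        if row is None:
            continue  # skip this row and move on to the next one
        if row == "FATAL":
            break  # leave the loop entirely; later rows are never seen
        if row == "":
            pass  # does nothing: execution falls through to append()
        cleaned.append(row)
    return cleaned


result = clean_rows(["a", None, "", "b", "FATAL", "c"])
print(result)
```

Note that the empty string is still appended: 'pass' did not skip it.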

Real-World Example

In a CSV data cleaning pipeline, 'continue' skips rows with missing values, 'break' stops processing if a critical error is found in the data, and 'pass' is used in an exception handler that acknowledges an error but intentionally takes no action (though this is usually bad practice in production).

⚠ Common Mistakes

Using 'pass' thinking it skips an iteration (it does not; use 'continue'). Using 'break' inside a nested loop thinking it exits all loops (it only exits the innermost one). Leaving 'pass' in production exception handlers, which silently swallows errors.

🏭 Production Scenario

A data ingestion job was silently dropping thousands of records because a developer used a bare 'pass' in an exception handler instead of logging the failure and moving on with an explicit 'continue'. The job appeared to complete successfully, but the database was missing 30% of the expected records.

Follow-up Questions
How do you break out of nested loops in Python? What is the for-else construct in Python? How does 'continue' interact with try-except blocks?
ID: PY-BEG-004  ·  Difficulty: 2/10  ·  Level: Beginner
PY-BEG-005 What is the purpose of ‘self’ in Python class methods?
Python Core Python Beginner
2/10
Answer

'self' refers to the specific instance of the class that a method is being called on. It gives each instance access to its own attributes and other methods.

Deep Explanation

When you define a method inside a class, Python does not automatically know which instance the method is operating on. 'self' is the conventional first parameter that receives a reference to the calling instance. When you call instance.method(), Python automatically passes the instance as the first argument, so you never pass 'self' explicitly when calling through an instance. Without this reference a method would have no way to reach per-instance state, and instances could only share class-level data. The name 'self' is a convention, not a keyword. You could use any name, but deviating from the convention is considered bad practice.
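A minimal sketch of the mechanism, using a hypothetical User class: calling a method through an instance is equivalent to calling it through the class and passing the instance yourself.

```python
class User:
    def __init__(self, username, email):
        # Each instance gets its own attributes through self.
        self.username = username
        self.email = email

    def contact_line(self):
        return f"{self.username} <{self.email}>"


alice = User("alice", "alice@example.com")
bob = User("bob", "bob@example.com")

# Python passes the instance as the first argument automatically,
# so these two calls do the same thing.
via_instance = alice.contact_line()
via_class = User.contact_line(alice)
print(via_instance, via_class, bob.contact_line())
```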

Real-World Example

In a User class for a web application, self.username and self.email store per-instance data. When the send_email() method is called on a specific user object, 'self' ensures the method sends to that user's email address, not to some global or shared value.

⚠ Common Mistakes

Forgetting to add 'self' as the first parameter of an instance method, causing a TypeError when called. Confusing instance methods (use self) with class methods (use cls) and static methods (use neither). Thinking 'self' is a keyword like 'this' in Java.

🏭 Production Scenario

A production multi-tenant SaaS application had a bug where all tenants were seeing the same configuration because a developer defined tenant settings as class-level attributes instead of instance attributes set via self. Every update to one tenant's config overwrote all others.

Follow-up Questions
What is the difference between instance attributes and class attributes? What is @classmethod versus @staticmethod? Can you call a method without an instance using the class directly?
ID: PY-BEG-005  ·  Difficulty: 2/10  ·  Level: Beginner
PY-BEG-007 What is an f-string in Python and why is it preferred over older formatting methods?
Python Core Python Beginner
2/10
Answer

F-strings (formatted string literals) are the modern Python way to embed expressions inside strings using f'text {expression}'. They are faster, more readable, and less error-prone than % formatting or str.format().

Deep Explanation

Introduced in Python 3.6, f-strings evaluate the expressions inside curly braces at runtime. The 'f' prefix before the opening quote tells Python to treat the string as a formatted literal. You can embed any valid Python expression: variables, arithmetic, function calls, method calls, and conditional expressions. They are generally the fastest string formatting method in Python. The benchmarks cited here put f-strings at 40-70% faster than str.format(), and typically faster than % formatting as well, although the exact figure depends on the workload and Python version. The speed comes from the compiler turning the literal into bytecode that builds the string directly, with no format-string parsing or method call at runtime. Python 3.12 relaxed several f-string restrictions, including allowing the same quote type to be reused inside expressions.
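A short sketch of the features mentioned above: embedded expressions, format specifiers, and escaped braces. The variable names and the "demo" logger name are illustrative assumptions. The last line shows the lazy %-style form usually recommended for logging calls.

```python
import logging

user_id = 42
total = 1234.5

# Format specifiers follow a colon; doubled braces produce literal braces.
line = f"user={user_id:05d} total={total:,.2f} {{literal braces}}"
print(line)

# For logging, pass arguments separately so the message is only
# formatted if the record is actually emitted at the configured level.
logger = logging.getLogger("demo")
logger.debug("user %s spent %.2f", user_id, total)
```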

Real-World Example

In a web application logging system, f-strings make log messages clear and fast: f'User {user.id} ({user.email}) performed {action} on resource {resource_id} at {timestamp}'. The message needs no string concatenation and is immediately readable during log review.

⚠ Common Mistakes

Using string concatenation with + instead of f-strings in high-frequency code paths. Forgetting that curly braces must be escaped as {{ and }} if you want literal braces. Using f-strings in logging calls when the string might never be formatted (use lazy % formatting for log messages to avoid building strings that are never logged at the configured log level).

🏭 Production Scenario

A high-throughput data processing service was building millions of formatted strings per hour using str.format(). Profiling showed string formatting as a significant CPU cost. Switching to f-strings reduced the formatting overhead by 45%, contributing to a measurable throughput improvement.

Follow-up Questions
What are the format specification mini-language options available in f-strings? How do f-strings handle multi-line expressions? What changed in Python 3.12 regarding f-strings?
ID: PY-BEG-007  ·  Difficulty: 2/10  ·  Level: Beginner
PY-BEG-001 What is the difference between a list and a tuple in Python?
Python Core Python Beginner
2/10
Answer

Lists are mutable (changeable); tuples are immutable (fixed). Use tuples for data that should not change.

Deep Explanation

In Python, a list is defined with square brackets [] and can be modified after creation: you can append, remove, or change elements. A tuple is defined with parentheses () and cannot be modified after creation. Immutability makes tuples slightly faster to create and, as long as every element is itself hashable, lets them be used as dictionary keys or set members. CPython also stores tuples more compactly, so they consume less memory than equivalent lists. The immutability also serves as a signal to other developers that this data is not meant to change.
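The following sketch illustrates the points above with made-up values: a tuple used as a dictionary key, the TypeError raised on mutation, a list rejected as a key, and the single-element tuple syntax.

```python
point = (3, 4)
labels = {point: "start"}  # a tuple of hashable items can be a dict key

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError:
    mutation_blocked = True
else:
    mutation_blocked = False

try:
    bad = {[3, 4]: "start"}  # lists are unhashable
except TypeError:
    list_key_rejected = True
else:
    list_key_rejected = False

single = (42,)      # one-element tuple needs the trailing comma
not_a_tuple = (42)  # just the integer 42 in parentheses
print(labels, mutation_blocked, list_key_rejected, single, not_a_tuple)
```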

Real-World Example

Django accepts either a list or a tuple for settings such as ALLOWED_HOSTS and INSTALLED_APPS, and recent project templates generate lists. A team that wants these values fixed at configuration time can declare them as tuples. A list works too but does not signal that intent to maintainers.

⚠ Common Mistakes

Using a list when the data never changes (wastes memory and loses semantic meaning). Trying to modify a tuple and getting a TypeError without understanding why. Forgetting that a tuple with one element needs a trailing comma: (42,) not (42).

🏭 Production Scenario

A production API was returning inconsistent responses because a developer accidentally appended to what should have been a fixed configuration list. Switching to a tuple made the bug immediately visible as a TypeError on the next attempted modification.

Follow-up Questions
Can a tuple contain mutable objects? What is the performance difference between list and tuple iteration? When would you use a named tuple?
ID: PY-BEG-001  ·  Difficulty: 2/10  ·  Level: Beginner
PY-BEG-002 What does the ‘is’ operator do versus ‘==’?
Python Core Python Beginner
3/10
Answer

'==' checks value equality. 'is' checks identity — whether two variables point to the exact same object in memory.

Deep Explanation

The == operator calls the __eq__ method and compares values. The 'is' operator compares object identity, which is equivalent to comparing id() values. Two objects can be equal in value but be different objects in memory. As an implementation detail, CPython caches small integers (-5 to 256) and interns some strings, which can make 'is' return True unexpectedly for these values and leads to subtle bugs if misused. You should almost never use 'is' to compare values. Reserve it for None checks (if x is None), where it is both correct and idiomatic.
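A minimal sketch of the distinction, using lists so the result does not depend on CPython's integer or string caching:

```python
a = [1, 2, 3]
b = [1, 2, 3]  # equal value, separate object
c = a          # second name for the same object

equal_values = a == b   # compares contents via __eq__
same_object = a is b    # compares identity
alias_identity = a is c

value = None
is_missing = value is None  # the idiomatic None check
print(equal_values, same_object, alias_identity, is_missing)
```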

Real-World Example

In a user authentication system, 'if user_role == admin_role' correctly compares role names as strings. Using 'is' instead can appear to work on small test data due to string interning, but it silently fails in production when role strings come from a database and are different objects with the same value.

⚠ Common Mistakes

Using 'is' to compare strings or integers expecting value equality. Being confused by small integer caching making 'is' appear to work correctly during testing. Using '== None' instead of 'is None', which is slower, less Pythonic, and can be fooled by a custom __eq__ method.

🏭 Production Scenario

A production bug was caused by comparing user permission strings with 'is' instead of '=='. Tests passed because the string literals in the test code were interned, but in production, with database-fetched strings, the comparison always returned False, locking all users out of admin features.

Follow-up Questions
What is object identity in Python? How does Python intern strings? Why is 'is None' preferred over '== None'?
ID: PY-BEG-002  ·  Difficulty: 3/10  ·  Level: Beginner
PY-BEG-003 What are *args and **kwargs in Python functions?
Python Core Python Beginner
3/10
Answer

*args collects extra positional arguments as a tuple. **kwargs collects extra keyword arguments as a dictionary. Both allow functions to accept a variable number of arguments.

Deep Explanation

When you define a function with *args, any positional arguments beyond the explicitly defined ones are packed into a tuple called args. With **kwargs, any keyword arguments not explicitly defined are packed into a dictionary called kwargs. The names args and kwargs are just convention; the * and ** operators are what matter. You can use *args and **kwargs together, and you can also use * and ** when calling functions to unpack sequences and dictionaries into arguments. This pattern is heavily used in decorators, class inheritance, and API wrappers.
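The decorator use case can be sketched as follows. The names log_calls and greet are hypothetical; the point is that the wrapper forwards whatever it receives without knowing the wrapped function's signature.

```python
import functools


def log_calls(func):
    """Record every call's arguments, then forward them unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # args is a tuple of positionals, kwargs a dict of keywords.
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)  # unpack them again when calling

    wrapper.calls = []
    return wrapper


@log_calls
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"


message = greet("Ada", greeting="Hi")
print(message, greet.calls)
```

Because the wrapper never names the parameters, adding a new keyword argument to greet() later requires no change to the decorator.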

Real-World Example

Django's class-based views use **kwargs extensively to pass URL parameters captured by the router into view methods. Framework decorators and wrappers commonly accept *args and **kwargs so they can forward a call without knowing the exact signature of the function they wrap.

⚠ Common Mistakes

Confusing *args (tuple) with a list. Forgetting that *args must come before **kwargs in the function signature. Trying to access args by keyword or kwargs by position. Mutating args thinking it is a list.

🏭 Production Scenario

A logging decorator in a production Flask app broke when a new endpoint added a keyword argument. The fix was changing the decorator to use *args and **kwargs so it would transparently forward any arguments to the wrapped function without needing updates every time a new parameter was added.

Follow-up Questions
How does ** unpacking work when calling a function? Can you have both *args and explicit keyword arguments? How are *args and **kwargs used in class __init__ with inheritance?
ID: PY-BEG-003  ·  Difficulty: 3/10  ·  Level: Beginner
PY-BEG-006 How does try-except-finally work in Python?
Python Core Python Beginner
3/10
Answer

'try' runs code that might fail. 'except' catches specific errors. 'finally' always runs regardless of whether an error occurred — used for cleanup.

Deep Explanation

The try block contains the risky code. If an exception occurs, Python looks for a matching except clause. You can catch specific exception types (except ValueError) or use a bare except to catch everything (not recommended). The optional else clause runs only if no exception occurred. The finally clause always executes, even if there was an exception or a return statement inside try, which makes it essential for releasing resources like file handles, database connections, or locks. Multiple except clauses can handle different exception types differently.
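The order in which the four clauses run can be traced with a small sketch. The function parse_port and the event labels are hypothetical; the list simply records which clauses executed.

```python
def parse_port(text, events):
    """Parse a port number, recording which clauses ran."""
    try:
        port = int(text)
    except ValueError:
        events.append("except")
        return None
    else:
        events.append("else")  # only when no exception was raised
        return port
    finally:
        events.append("finally")  # runs even though we return above


ok_events, bad_events = [], []
ok_result = parse_port("8080", ok_events)
bad_result = parse_port("abc", bad_events)
print(ok_result, ok_events, bad_result, bad_events)
```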

Real-World Example

In a database write operation, the try block executes the INSERT query, the except block catches IntegrityError for duplicate keys and returns a meaningful error message, and the finally block always closes the database connection regardless of success or failure, preventing connection pool exhaustion.

⚠ Common Mistakes

Using a bare 'except:' that catches everything, including KeyboardInterrupt and SystemExit, making the program hard to stop. Not closing resources in finally, causing memory or connection leaks. Catching too broad an exception type and hiding real bugs.

🏭 Production Scenario

A production API server ran out of database connections after 6 hours because a developer forgot to close connections in a finally block. The try block opened a connection, an exception occurred, and the connection was never closed, so the pool was exhausted within hours under normal traffic.

Follow-up Questions
What is the difference between except Exception and bare except? When does finally NOT execute? How do context managers (with statement) relate to try-finally?
ID: PY-BEG-006  ·  Difficulty: 3/10  ·  Level: Beginner
PY-BEG-008 What is the difference between a Python module and a package?
Python Core Python Beginner
3/10
Answer

A module is a single .py file containing Python code. A package is a directory containing multiple modules and an __init__.py file. Packages allow organizing related modules into a hierarchical namespace.

Deep Explanation

Any .py file is a module and can be imported with 'import filename' (without the .py extension). A package is a directory with an __init__.py file (which can be empty) that tells Python to treat the directory as a package. The __init__.py can import from submodules to define the package's public API. Modern Python (3.3+) supports namespace packages, which are directories without __init__.py, but an explicit __init__.py is still preferred for clarity. Import paths follow the directory structure: in a package 'myapp' with a subpackage 'utils' containing 'helpers.py', you import with 'from myapp.utils.helpers import my_function'. The __all__ list in __init__.py controls what 'from myapp import *' exports; without it, every name in __init__.py that does not start with an underscore is exported.
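To make the directory-to-import-path mapping concrete, the sketch below builds a throwaway package in a temporary directory and imports from it. The names demo_pkg, utils, helpers, and double are invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Layout: demo_pkg/__init__.py, demo_pkg/utils/__init__.py,
    #         demo_pkg/utils/helpers.py
    utils_dir = Path(root) / "demo_pkg" / "utils"
    utils_dir.mkdir(parents=True)
    (utils_dir.parent / "__init__.py").write_text("")
    (utils_dir / "__init__.py").write_text("")
    (utils_dir / "helpers.py").write_text("def double(x):\n    return x * 2\n")

    sys.path.insert(0, root)  # make the temporary root importable
    importlib.invalidate_caches()
    try:
        # The dotted path mirrors the directory structure.
        helpers = importlib.import_module("demo_pkg.utils.helpers")
        result = helpers.double(21)
    finally:
        sys.path.remove(root)

print(helpers.__name__, result)
```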

Real-World Example

Django is structured as a package: the top-level 'django' directory contains __init__.py, and subpackages like 'django.db', 'django.http', and 'django.contrib' each have their own __init__.py. This allows clean imports like 'from django.db import models' while keeping the codebase organized across hundreds of files.

⚠ Common Mistakes

Forgetting __init__.py in package directories (this causes ImportError in Python 2; in Python 3 the directory may be treated as a namespace package, which can cause confusing behavior). Circular imports between modules in the same package. Mixing up relative imports (from . import module) and absolute imports: relative imports fail when a module inside a package is run directly as a script.

🏭 Production Scenario

A production Django application was growing to 50+ Python files in a single directory. Refactoring into packages (api/, models/, services/, utils/) with __init__.py files and clean public APIs reduced import statement complexity and made it possible to see the application structure at a glance.

Follow-up Questions
What is the __all__ variable in Python modules? How does Python's import system search for modules (sys.path)? What is the difference between 'import module' and 'from module import name'?
ID: PY-BEG-008  ·  Difficulty: 3/10  ·  Level: Beginner
PY-BEG-009 What is a generator in Python and how does it differ from a list?
Python Core Python Beginner
3/10
Answer

A generator produces items one at a time using lazy evaluation — it only computes each item when requested. A list computes and stores all items immediately. Generators use far less memory for large sequences.

Deep Explanation

Generators are created using generator functions (functions that use yield) or generator expressions (like list comprehensions but with parentheses). When you call a generator function, it returns a generator object without executing the body. Each call to next() on the generator runs the body until the next yield, which pauses execution and returns the value. The generator remembers its state between next() calls. The key advantage is memory. A list of 1 million items stores all 1 million in memory. A generator that yields 1 million items stores only the current item and the execution state. Generators are also composable: you can chain them to build processing pipelines without intermediate memory allocation.
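A small sketch of a two-stage generator pipeline, with hypothetical stage names read_numbers and only_even. It also shows that a generator is exhausted after one pass.

```python
def read_numbers(lines):
    """First stage: convert each text line to an int, lazily."""
    for line in lines:
        yield int(line)


def only_even(numbers):
    """Second stage: pass through even numbers only."""
    for n in numbers:
        if n % 2 == 0:
            yield n


# Nothing runs yet; the stages are only wired together.
pipeline = only_even(read_numbers(["1", "2", "3", "4"]))

first = next(pipeline)  # runs both stages until the first even number
rest = list(pipeline)   # consumes whatever is left
again = list(pipeline)  # the generator is now exhausted
print(first, rest, again)
```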

Real-World Example

Processing a 10GB log file: reading the entire file into a list would require 10GB of RAM. A generator that yields one line at a time uses constant memory regardless of file size. In data pipelines: file_lines → filter_errors → parse_timestamps → aggregate — each step is a generator passing items to the next without intermediate storage.

⚠ Common Mistakes

Forgetting that a generator is exhausted after iteration: you cannot iterate over it twice. Not recognizing that for loops and many Python builtins (sum, list, map) accept any iterable, including generators. Using a list comprehension when a generator expression would suffice (when you only need to iterate once). Confusing generator functions (use yield) with regular functions that return lists.

🏭 Production Scenario

A data export API was timing out for large datasets because it built a complete list of 500,000 records before streaming. Refactoring to yield records one at a time from a generator allowed streaming the response immediately and eliminated the memory spike and timeout.

Follow-up Questions
What is the difference between yield and return in a generator? What is yield from and when do you use it? How do you convert a generator to a list and back?
ID: PY-BEG-009  ·  Difficulty: 3/10  ·  Level: Beginner
PY-INT-001 What is a list comprehension and when should you NOT use one?
Python Core Python Intermediate
4/10
Answer

A list comprehension is a concise way to create lists using a single line expression. Avoid them when the logic is complex enough that a regular loop is more readable.

Deep Explanation

List comprehensions follow the syntax [expression for item in iterable if condition]. In CPython they are usually faster than an equivalent for loop that calls append(), because the interpreter uses a dedicated bytecode for adding each element instead of a method lookup and call. However, they are not always the right choice. Avoid them when the logic requires multiple nested conditions, when you need to handle exceptions inside the loop, when the comprehension spans more than two lines when formatted, or when you are consuming a large dataset where a generator expression would be more memory-efficient. Nested list comprehensions (list comprehensions inside list comprehensions) are almost always a readability mistake.
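As a brief illustration with made-up user records: a list comprehension when the resulting list is needed, and a generator expression when the values are only consumed once.

```python
users = [
    {"email": "a@example.com", "active": True},
    {"email": "b@example.com", "active": False},
    {"email": "c@example.com", "active": True},
]

# List comprehension: builds the whole list, which we want to keep.
active_emails = [u["email"] for u in users if u["active"]]

# Generator expression: sum() consumes items one by one,
# so no intermediate list is created.
total_length = sum(len(u["email"]) for u in users)
print(active_emails, total_length)
```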

Real-World Example

In a data processing pipeline: [user.email for user in users if user.is_active and user.verified] is clean and appropriate. But building a matrix transformation with three nested comprehensions is a maintainability trap — a regular loop with clear variable names is better for the next developer.

⚠ Common Mistakes

Nesting comprehensions three levels deep making code unreadable. Using list comprehensions when you actually need a generator (you are iterating once over a large dataset). Adding side effects inside comprehensions (modifying external state) which is a major anti-pattern.

🏭 Production Scenario

A memory crash in a production data export service was traced to a list comprehension that processed 2 million records at once, loading everything into memory. Replacing it with a generator expression fixed the memory issue without changing any other code.

Follow-up Questions
What is the difference between a list comprehension and a generator expression? How do dict comprehensions and set comprehensions work? What is the performance difference between a comprehension and a map() call?
ID: PY-INT-001  ·  Difficulty: 4/10  ·  Level: Intermediate
PY-INT-005 How do you handle large files in Python without loading them entirely into memory?
Python Core Python Intermediate
4/10
Answer

Use generators, file iteration (file objects are iterators in Python), or chunk-based reading. Avoid calling read() without a size argument or readlines() on large files, because both load the entire file into memory.

Deep Explanation

Python file objects are iterators, so you can iterate over them line by line without loading the entire file. For binary files, or files where line iteration is not appropriate, use file.read(chunk_size) to read fixed-size chunks in a loop. For CSV files, use csv.DictReader (which iterates lazily) or pandas with the chunksize parameter: pd.read_csv('file.csv', chunksize=10000) returns an iterator of DataFrames. For JSON, use ijson for streaming JSON parsing. The with statement ensures the file is properly closed. For very large files (100GB+), memory-mapped files (the mmap module) allow treating file content as if it were in memory while the OS handles paging.
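The two standard-library techniques above, line iteration and fixed-size chunk reads, can be sketched as follows. The helper names count_lines and iter_chunks are hypothetical, and the tiny chunk size is only there to make the behaviour visible on a small temporary file.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Iterate line by line; only one line is held in memory at a time."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def iter_chunks(path, chunk_size=4):
    """Yield fixed-size binary chunks until read() returns b''."""
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.log"
    sample.write_bytes(b"a\nb\nc\n")
    line_count = count_lines(sample)
    chunks = list(iter_chunks(sample))

print(line_count, chunks)
```

In real use the chunk size would be much larger, for example 64 KiB or more.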

Real-World Example

A log analysis system needed to process 50GB daily log files to extract error counts. Using open(file).read() caused OOM crashes. Refactoring to iterate line by line (for line in file) reduced memory usage from 50GB to under 10MB while processing the same file.

⚠ Common Mistakes

Using file.readlines() which builds a complete list of all lines in memory. Using pd.read_csv() without chunksize on multi-GB files. Not closing files (always use with statement). Forgetting to handle encoding explicitly — defaulting to system encoding causes silent corruption on non-ASCII data.

🏭 Production Scenario

A production data pipeline at a logistics company was crashing nightly when processing a 30GB shipment data CSV. The fix used pandas chunked reading: processing 50,000 rows at a time, aggregating results, and writing summaries. This reduced peak memory from 45GB (which crashed the server) to 2GB.

Follow-up Questions
What is the mmap module and when would you use it? How does ijson enable streaming JSON parsing? How do you process a large file in parallel in Python?
ID: PY-INT-005  ·  Difficulty: 4/10  ·  Level: Intermediate
PY-INT-006 How does pytest work and what makes a good unit test in Python?
Python Core Python Intermediate
4/10
Answer

pytest discovers and runs test functions automatically, providing rich assertion introspection, fixtures for dependency injection, and parametrize for data-driven tests. A good unit test is fast, isolated, and deterministic, and it tests one specific behavior.

Deep Explanation

By default, pytest collects files named test_*.py or *_test.py, functions named test_*, and classes named Test*. When an assert fails, pytest shows you exactly what the actual and expected values were, so there is no need for assertEqual(). Fixtures (@pytest.fixture) provide setup/teardown and dependency injection for tests: database connections, temporary files, mock objects. Parametrize (@pytest.mark.parametrize) runs the same test with multiple input/output combinations, eliminating test duplication. Mocking with unittest.mock.patch replaces real dependencies with controlled fakes, making tests fast and isolated. Good unit tests test one behavior, run in milliseconds, do not hit databases, networks, or file systems (mock these), are deterministic (same result every run), and fail with clear messages.
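A minimal sketch of two pytest-style tests that use only the standard library, so it runs with or without pytest installed. The function under test, send_welcome, and its mailer argument are hypothetical. Saved in a file named test_*.py, both test functions would be collected by pytest; with pytest available, pytest.raises would normally replace the manual try/except.

```python
from unittest.mock import Mock


def send_welcome(user_email, mailer):
    """Hypothetical function under test: validates, then sends one email."""
    if "@" not in user_email:
        raise ValueError("invalid email")
    mailer.send(to=user_email, subject="Welcome")
    return True


def test_send_welcome_calls_mailer_once():
    mailer = Mock()  # stands in for a real mail client; no network involved
    assert send_welcome("ada@example.com", mailer) is True
    mailer.send.assert_called_once_with(to="ada@example.com", subject="Welcome")


def test_send_welcome_rejects_invalid_email():
    mailer = Mock()
    try:
        send_welcome("not-an-email", mailer)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    mailer.send.assert_not_called()  # nothing was sent on the error path
```

Each test checks one behaviour, needs no external service, and gives the same result on every run.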

Real-World Example

A FastAPI endpoint test: the test uses a pytest fixture providing a TestClient (an in-process HTTP client), replaces the database dependency with an in-memory fake, uses parametrize to test valid, invalid, and edge-case inputs, and has clear test names like test_create_user_returns_201_for_valid_input. Each test runs in under 5ms with no external dependencies.

⚠ Common Mistakes

Writing tests that test implementation details instead of behavior — tests should not break when you refactor internals. Not mocking external dependencies making tests slow and flaky. Using a single large test function that tests multiple behaviors (impossible to tell which behavior failed). Asserting too broadly (assert response is not None) or too narrowly (asserting on exact internal state).

🏭 Production Scenario

A Django e-commerce platform's test suite took 45 minutes to run because 800 tests were hitting the actual test database. Refactoring to use pytest fixtures with database mocking and factory_boy for test data generation reduced the suite to 3 minutes, enabling CI to run on every commit.

Follow-up Questions
What is the difference between mocking and stubbing? How do you test async functions with pytest? What is property-based testing with Hypothesis?
ID: PY-INT-006  ·  Difficulty: 4/10  ·  Level: Intermediate
PY-INT-004 How do context managers work and how do you create a custom one?
Python Core Python Intermediate
5/10
Answer

Context managers use __enter__ and __exit__ methods to manage setup and teardown of resources. The 'with' statement calls these automatically, ensuring cleanup even if an exception occurs.

Deep Explanation

When you use 'with open(file) as f', Python calls __enter__() on the file object to set up and binds the result to f, then calls __exit__() to clean up when the block ends. You can create custom context managers two ways: implement __enter__ and __exit__ in a class, or use the @contextmanager decorator from contextlib with a generator function that yields once. The __exit__ method receives exception information and can suppress exceptions by returning True. Context managers are the Pythonic way to handle any resource that needs guaranteed cleanup: database connections, locks, temporary directories, timers, and transaction management.
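Both approaches are sketched below with a toy transaction that records what happened in a list. The names Transaction and transaction are hypothetical and no real database is involved.

```python
from contextlib import contextmanager


class Transaction:
    """Class-based version: __exit__ decides between commit and rollback."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("begin")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append("rollback" if exc_type else "commit")
        return False  # do not suppress the exception


@contextmanager
def transaction(log):
    """Generator-based version: code after yield is the teardown."""
    log.append("begin")
    try:
        yield
    except Exception:
        log.append("rollback")
        raise
    else:
        log.append("commit")


ok_log, failed_log = [], []
with Transaction(ok_log):
    pass

try:
    with transaction(failed_log):
        raise ValueError("boom")
except ValueError:
    pass

print(ok_log, failed_log)
```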

Real-World Example

A database transaction context manager in a Django-like ORM: __enter__ begins the transaction, and __exit__ commits if no exception occurred or rolls back if one did. This pattern ensures no transaction is ever left open regardless of what happens inside the with block.

⚠ Common Mistakes

Ignoring the exception arguments in __exit__, so errors that should be handled there propagate unhandled (or returning True by accident and suppressing them). Creating context managers with @contextmanager and forgetting to wrap the yield in try-finally, skipping cleanup on exceptions. Using try-finally everywhere instead of the cleaner with statement.

🏭 Production Scenario

A production PostgreSQL service had intermittent connection failures traced to database transactions being left open. The root cause was exception handling that bypassed the connection cleanup code. Refactoring to use a context manager with proper __exit__ eliminated the issue permanently.

Follow-up Questions
What is the contextlib module? How do nested context managers work? What is contextlib.ExitStack used for?
ID: PY-INT-004  ·  Difficulty: 5/10  ·  Level: Intermediate
PY-INT-008 How do Python dictionaries work internally and what is their time complexity?
Python Core Python Intermediate
5/10
Answer

Python dictionaries are hash tables. Lookup, insertion, and deletion are O(1) in the average case. Hash collisions can degrade this to O(n) in the worst case, but Python's implementation makes this extremely rare. Python 3.7+ guarantees insertion-order preservation.

Deep Explanation

Dictionaries store key-value pairs in a hash table. When you set d[key] = value, Python computes hash(key), maps it to a slot, and stores the key and value. When you access d[key], Python recomputes the hash and looks up the slot directly, which is O(1) on average. Hash collisions (two different keys mapping to the same slot) are resolved via open addressing in CPython. CPython 3.6 introduced a compact dictionary representation that preserves insertion order as a side effect, and Python 3.7 made insertion-order preservation an official language guarantee. Only hashable objects can be dictionary keys: immutable types such as strings, integers, and tuples of hashable items, but not lists or other dicts. dict.get(key, default) avoids KeyError for missing keys. collections.defaultdict automatically creates default values. collections.Counter counts hashable objects.
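A short sketch of the helpers named above (Counter, defaultdict, and dict.get), run on a made-up list of log levels:

```python
from collections import Counter, defaultdict

words = "error warn error info error warn".split()

# Counter: one pass over the data, one hash lookup per item.
counts = Counter(words)

# defaultdict: missing keys get a fresh list automatically.
by_length = defaultdict(list)
for word in dict.fromkeys(words):  # dict.fromkeys keeps first-seen order
    by_length[len(word)].append(word)

# dict.get: a default instead of a KeyError for a missing key.
debug_count = counts.get("debug", 0)
print(counts.most_common(1), dict(by_length), debug_count)
```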

Real-World Example

In a word frequency counter processing millions of log lines, dict-based counting with Counter beats sort-then-count approaches: O(n) with a hash table versus O(n log n) for sorting. In a URL routing system with static paths, a dict of {path: handler} enables O(1) average route lookup regardless of how many routes exist.

⚠ Common Mistakes

Using a list to check membership ('if item in list' is O(n); use a set or dict instead). Adding or removing keys while iterating over a dictionary (raises RuntimeError; iterate over list(d.items()) instead). Using mutable objects such as lists as dictionary keys (TypeError: unhashable type). Not using setdefault() or defaultdict and writing verbose if-key-in-dict patterns instead.

🏭 Production Scenario

A production request deduplication service was checking if a request ID had been seen using a list (if request_id in seen_list). At 10,000 requests per second the O(n) membership check was consuming 60% of CPU time. Replacing it with a set (O(1) lookup) reduced CPU usage to 2% with identical functionality.

Follow-up Questions
How does Python set differ from dict internally? What is the difference between dict and OrderedDict after Python 3.7? What is dict comprehension and when should you use defaultdict instead?
ID: PY-INT-008  ·  Difficulty: 5/10  ·  Level: Intermediate
PY-DS-001 What is the difference between pandas DataFrame.apply() and vectorized operations?
Python Data Science Intermediate
5/10
Answer

Vectorized operations (using NumPy/pandas built-ins) operate on entire arrays at once in optimized C code. apply() calls a Python function row by row or column by column in pure Python. Vectorized operations are often 10-1000x faster; use apply() only when no vectorized alternative exists.

Deep Explanation

pandas is built on NumPy, which stores data in contiguous memory arrays and performs operations in optimized C/Fortran code without Python overhead. When you write df['price'] * 1.1, NumPy multiplies the entire array in C. When you write df.apply(lambda x: x['price'] * 1.1, axis=1), Python calls a function for every single row, potentially millions of function calls with Python overhead each time. The performance gap can be enormous: for a 1M-row DataFrame, a vectorized operation might take around 10ms while apply() takes 10-30 seconds. Use apply() only for operations that cannot be expressed vectorially, complex multi-column operations with conditional logic, or when applying a function that expects a Series object.
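The sketch below computes the same column three ways on a tiny, made-up DataFrame. It assumes pandas and NumPy are installed and is meant to show the equivalence of the results, not to benchmark them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 0, 3]})

# Vectorized: one operation over whole columns.
df["total_vec"] = df["price"] * df["qty"]

# Row-wise apply: a Python function call per row, same result.
df["total_apply"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Vectorized conditional: np.where instead of apply() with an if/else.
df["in_stock"] = np.where(df["qty"] > 0, "yes", "no")

print(df)
```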

Real-World Example

A daily sales report generation for a retail chain was taking 45 minutes to run on a 5M-row transaction DataFrame. Profiling revealed three apply() calls doing price calculations that could be rewritten as vectorized operations. Replacing them reduced runtime to 90 seconds — a 30x speedup with no algorithmic change.

⚠ Common Mistakes

Using apply() for simple arithmetic that pandas/NumPy can do natively. Using apply(axis=1) to iterate rows for anything that can be done with vectorized conditionals (use np.where instead). Not knowing about str accessor methods (df['col'].str.contains()) which provide vectorized string operations avoiding apply() entirely.

🏭 Production Scenario

A pandas ETL pipeline at a financial data company was processing end-of-day data and regularly missing the 6 AM business deadline. Profiling showed apply() calls for currency conversion and date parsing were the bottleneck. Replacing with vectorized arithmetic and pd.to_datetime() reduced the pipeline from 4 hours to 18 minutes.

Follow-up Questions
What is the difference between apply() and applymap()? How does numpy.vectorize() differ from true vectorization? When should you use Polars instead of pandas?
ID: PY-DS-001  ·  Difficulty: 5/10  ·  Level: Intermediate

PAGE 1 OF 2  ·  23 QUESTIONS TOTAL