Interview Questions & Model Answers
Real questions. Real answers. Built from 20 years of actual hiring and being hired.
'break' exits the loop entirely. 'continue' skips the current iteration and moves to the next. 'pass' does nothing — it is a placeholder.
These three keywords control loop flow differently. 'break' immediately terminates the enclosing loop, and execution continues after the loop block. 'continue' stops the current iteration and jumps to the next one: a while loop re-checks its condition, and a for loop fetches the next item. 'pass' is a null operation: it does nothing and is used when Python syntax requires a statement but you have no code to put there yet, such as in an empty class or function body during development. Misunderstanding these leads to infinite loops or skipped logic in data processing pipelines.
In a CSV data cleaning pipeline: 'continue' skips rows with missing values, 'break' stops processing if a critical error is found in the data, and 'pass' is used in an exception handler that acknowledges an error but intentionally takes no action (though this is usually bad practice in production).
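The CSV scenario above can be sketched as follows. This is a minimal illustration, not a production pipeline: the column names, the "CORRUPT" sentinel, and the `clean_rows` helper are all hypothetical. Note that the row marked with `pass` is still processed, which is exactly why `pass` cannot be used to skip an iteration.

```python
import csv
import io


def clean_rows(csv_text):
    """Return (name, amount) pairs, skipping incomplete rows.

    The column names and the "CORRUPT" sentinel are assumptions
    made for this example only.
    """
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["amount"] == "CORRUPT":
            # break: a critical error ends the whole loop immediately
            break
        if not row["name"] or not row["amount"]:
            # continue: skip this row only, then move to the next one
            continue
        if row["name"].startswith("#"):
            # pass: does nothing, so the append below still runs
            pass
        cleaned.append((row["name"], float(row["amount"])))
    return cleaned


SAMPLE = "name,amount\nalice,10\n,5\n#bob,7\ncarol,CORRUPT\ndave,3\n"
print(clean_rows(SAMPLE))
```

With this sample, the row with the empty name is skipped, the "#bob" row is kept despite the `pass`, and nothing after the corrupt row is read.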
Using 'pass' thinking it skips an iteration (it does not; use 'continue'). Using 'break' inside a nested loop thinking it exits all loops (it only exits the innermost one). Leaving 'pass' in production exception handlers, where it silently swallows errors.
A data ingestion job was silently skipping thousands of records because a developer used a bare 'pass' in an exception handler instead of logging the error before 'continue'. The job appeared to complete successfully, but the database was missing 30% of expected records.
'self' refers to the specific instance of the class that a method is being called on. It gives each instance access to its own attributes and other methods.
When you define a method inside a class, Python does not automatically know which instance the method is operating on. 'self' is the conventional first parameter that receives a reference to the calling instance. When you call instance.method(), Python automatically passes the instance as the first argument, so you never pass 'self' explicitly when calling. Without this reference, a method would have no way to read or update the state of one particular instance, and per-instance data would be impossible. The name 'self' is a convention, not a keyword: you could use any name, but deviating from convention is considered bad practice.
In a User class for a web application, self.username and self.email store per-instance data. When the send_email() method is called on a specific user object, 'self' ensures the method sends to that user's email address, not to some global or shared value.
Forgetting to add 'self' as the first parameter of an instance method, which causes a TypeError when the method is called. Confusing instance methods (use self) with class methods (use cls) and static methods (use neither). Thinking 'self' is a keyword like 'this' in Java.
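A minimal sketch of the User example, assuming a `send_email` method that only builds a string instead of sending anything (the method body and the example addresses are hypothetical):

```python
class User:
    def __init__(self, username, email):
        # Each instance stores its own data on self
        self.username = username
        self.email = email

    def send_email(self, subject):
        # Builds a description instead of sending real mail (illustration only)
        return f"To: {self.email} | Subject: {subject}"


alice = User("alice", "alice@example.com")
bob = User("bob", "bob@example.com")

# Python passes the instance as the first argument automatically,
# so these two calls are equivalent:
implicit = alice.send_email("Hi")
explicit = User.send_email(alice, "Hi")
```

The explicit form is rarely written in practice, but it shows what the method call does behind the scenes.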
A production multi-tenant SaaS application had a bug where all tenants were seeing the same configuration because a developer defined tenant settings as class-level attributes instead of instance attributes set via self. Every update to one tenant's config overwrote all others.
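The shared-configuration bug described above can be reproduced in a few lines. This is a sketch under assumed names (`SharedTenant`, `IsolatedTenant`, `set_option`), not the code from the incident:

```python
class SharedTenant:
    settings = {}  # class attribute: one dict shared by every instance

    def set_option(self, key, value):
        # Mutates the single class-level dict
        self.settings[key] = value


class IsolatedTenant:
    def __init__(self):
        self.settings = {}  # instance attribute: one dict per tenant

    def set_option(self, key, value):
        self.settings[key] = value


a, b = SharedTenant(), SharedTenant()
a.set_option("theme", "dark")   # b now sees the same setting

c, d = IsolatedTenant(), IsolatedTenant()
c.set_option("theme", "dark")   # d is unaffected
```

Creating the dict inside `__init__` is what gives each tenant its own state.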
AI is the broad field of making machines intelligent. Machine Learning is a subset of AI where systems learn from data. Deep Learning is a subset of ML using multi-layered neural networks. Each step is more specific and often more powerful, but also more data- and compute-intensive.
AI (Artificial Intelligence) encompasses any technique that enables machines to simulate human intelligence, including rule-based expert systems, search algorithms, and ML. Machine Learning is the AI approach where systems improve through experience: instead of being explicitly programmed, they learn patterns from data. Traditional ML algorithms (decision trees, SVMs, linear regression) require manual feature engineering, meaning humans decide what features to extract. Deep Learning uses neural networks with many layers that automatically learn hierarchical features from raw data. DL requires large amounts of data and GPU compute but achieves state-of-the-art performance on images, text, and audio. In 2025, when people say 'AI' in business contexts, they usually mean ML or DL, and specifically LLM-based systems.
A spam filter using keyword rules is rule-based AI. A spam filter using logistic regression on email features (word counts, sender history) is ML. A spam filter using a fine-tuned BERT model on raw email text is Deep Learning. All three are AI, each progressively more powerful and data-hungry.
Thinking AI = Deep Learning = LLMs. Missing that many production 'AI' systems are traditional ML (gradient boosting, random forests), which is often more interpretable, cheaper, and more appropriate for tabular data. Assuming more complex (deep learning) is always better: for structured/tabular data, gradient boosting often matches or outperforms neural networks.
A hospital wanted to predict patient readmission risk. A vendor proposed a deep learning solution requiring 10M training examples. The hospital had 50,000 records. A properly tuned gradient boosting model (traditional ML) achieved 0.82 AUC on the available data, while the deep learning approach overfit severely and reached only 0.68 AUC.
F-strings (formatted string literals) are the modern Python way to embed expressions inside strings using f'text {expression}'. They are faster, more readable, and less error-prone than % formatting or str.format().
Introduced in Python 3.6, f-strings evaluate expressions inside curly braces at runtime. The 'f' prefix before the quote tells Python to treat the string as a formatted literal. You can embed any valid Python expression: variables, arithmetic, function calls, method calls, conditional expressions. They are typically the fastest string formatting method in CPython (figures of 40-70% faster than str.format() are commonly quoted, though results vary by version and workload) because an f-string is compiled directly into bytecode instead of parsing a format string and calling a method at runtime. Python 3.12 added even more f-string capabilities, including reusing the same quote type inside expressions.
In a web application logging system, f-strings make log messages clear and fast: f'User {user.id} ({user.email}) performed {action} on resource {resource_id} at {timestamp}' needs no string concatenation and is immediately readable during log review.
Using string concatenation with + instead of f-strings in high-frequency code paths. Forgetting that curly braces must be escaped as {{ and }} if you want literal braces. Using f-strings in logging calls when the string might never be formatted (use lazy % formatting for log messages to avoid building strings that are never logged at the configured log level).
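The points above (embedded expressions, format specifiers, escaped braces, and lazy logging) can be seen together in a short sketch. The variable names and the "demo" logger name are made up for illustration:

```python
import logging

user_id, action = 42, "delete"
price = 1234.5

# Any expression can appear inside the braces, including method calls
msg = f"User {user_id} performed {action.upper()}"

# A format specifier follows a colon: thousands separator, two decimals
formatted = f"Total: {price:,.2f}"

# Doubled braces produce literal braces
literal = f"{{user_id}} -> {user_id}"

# For logging, pass arguments separately so the message is only
# formatted if the DEBUG level is actually enabled
logger = logging.getLogger("demo")
logger.debug("User %s performed %s", user_id, action)
```

The logging call uses %-style placeholders on purpose: the logging module defers formatting until it knows the record will be emitted.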
A high-throughput data processing service was building millions of formatted strings per hour using str.format(). Profiling showed string formatting as a significant CPU cost. Switching to f-strings reduced the formatting overhead by 45%, contributing to a measurable throughput improvement.
Supervised learning trains on labeled data (input-output pairs). Unsupervised learning finds patterns in unlabeled data with no predefined outputs.
In supervised learning, every training example has a correct answer (label). The algorithm learns to map inputs to outputs by minimizing prediction error. Examples: classification (spam/not spam), regression (predicting house prices). In unsupervised learning, data has no labels. The algorithm discovers hidden structure: clustering groups similar items, dimensionality reduction compresses features, anomaly detection finds outliers. There is also semi-supervised learning (small labeled dataset + large unlabeled dataset) and self-supervised learning (labels generated from the data itself, as in language model pretraining). Choosing the right paradigm depends on whether labeled data is available and how expensive it is to obtain.
A credit card fraud detection system: training on historical transactions labeled as 'fraud' or 'legitimate' is supervised learning. Discovering clusters of unusual spending behavior without predefined fraud labels is unsupervised (anomaly detection). Real production systems often use both: unsupervised to surface suspicious patterns, supervised to classify confirmed cases.
Thinking unsupervised learning is always worse because it has no labels — it is simply solving a different problem. Confusing clustering (unsupervised) with classification (supervised). Underestimating the cost and effort of labeling data for supervised learning at scale.
A retail company tried to build a supervised product recommendation model but had insufficient labeled purchase-intent data. Switching to unsupervised collaborative filtering (clustering users by purchase history) produced better recommendations in production without requiring explicit labels.
Lists are mutable (changeable); tuples are immutable (fixed). Use tuples for data that should not change.
In Python, a list is defined with square brackets [] and can be modified after creation: you can append, remove, or change elements. A tuple is usually written with parentheses () (strictly, it is the commas that create the tuple) and cannot be modified after creation. Immutability makes tuples slightly faster to create and, provided every element is itself hashable, makes them hashable, so they can be used as dictionary keys or set members. CPython also stores tuples more compactly than equivalent lists. The immutability also serves as a signal to other developers that this data is not meant to change.
Django settings illustrate the intent signal: older project templates defined settings such as INSTALLED_APPS as tuples because the values should be fixed at configuration time, while current templates use lists. Either type works there; the choice mainly signals intent to maintainers.
Using a list when the data never changes (wastes memory and loses semantic meaning). Trying to modify a tuple and getting a TypeError without understanding why. Forgetting that a tuple with one element needs a trailing comma: (42,) not (42).
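A short sketch of the behaviors mentioned above: tuples as dictionary keys, the TypeError on modification, the single-element trailing comma, and the limit on hashability. All variable names are illustrative:

```python
point = (3, 4)
distances = {point: 5.0}  # a tuple of hashable items works as a dict key

# Attempting to modify a tuple raises TypeError
try:
    point[0] = 10
except TypeError:
    modified = False
else:
    modified = True

not_a_tuple = (42)   # just the int 42 in parentheses
single = (42,)       # the trailing comma makes it a tuple

# A tuple is only hashable if everything inside it is hashable
try:
    hash(([1, 2], 3))
except TypeError:
    nested_hashable = False
else:
    nested_hashable = True
```

The last case is worth remembering in interviews: immutability of the tuple itself does not make a tuple containing a list usable as a dictionary key.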
A production API was returning inconsistent responses because a developer accidentally appended to what should have been a fixed configuration list. Switching to a tuple made the bug immediately visible as a TypeError on the next attempted modification.
Classification predicts a category (discrete output). Regression predicts a continuous numerical value.
In classification, the output is one of a fixed set of categories: spam/not spam, cat/dog/bird, disease/healthy. Binary classification has two classes; multiclass has more. The model output is typically a probability for each class, and a threshold or argmax converts it to a final prediction. In regression, the output is a continuous number: predicting tomorrow's temperature, estimating a house price, forecasting sales volume. The same algorithms often have both variants: linear regression vs logistic regression (despite the name, logistic regression is a classifier), decision tree regressor vs classifier. Evaluation metrics differ: accuracy/F1 for classification, RMSE/MAE/R² for regression.
A real estate platform uses regression to estimate property values (continuous output: $425,000) and classification to predict whether a property will sell within 30 days (binary output: yes/no). Both models are trained on the same property feature data but with different target variables and evaluation strategies.
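To make the threshold, argmax, and metric distinctions concrete, here is a small pure-Python sketch. The probabilities, prices, and helper names are invented for illustration; a real project would normally use a library such as scikit-learn for metrics.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error for continuous predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Binary classification: a threshold turns probabilities into labels
probs = [0.9, 0.4, 0.7, 0.2]
labels = [1 if p >= 0.5 else 0 for p in probs]
acc = accuracy([1, 0, 0, 0], labels)

# Multiclass classification: argmax picks the most probable class
class_probs = {"cat": 0.2, "dog": 0.7, "bird": 0.1}
predicted = max(class_probs, key=class_probs.get)

# Regression: the error is measured in the units of the target
err = rmse([400000, 250000], [410000, 240000])
```

Accuracy only makes sense for the discrete labels, and RMSE only for the continuous prices, which is the metric mismatch the common-mistakes list warns about.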
Using regression metrics (RMSE) to evaluate a classifier or vice versa. Treating a regression problem as classification by binning the output (losing information). Not recognizing that logistic regression IS a classifier despite the word 'regression' in its name.
A demand forecasting system incorrectly used a classifier to predict inventory needs by bucketing demand into Low/Medium/High. The loss of continuous information caused systematic over-ordering. Switching to a regression model that predicted exact units improved inventory efficiency by 23%.
An LLM is a neural network trained on vast amounts of text to predict and generate language. Unlike traditional software with explicit rules, LLMs learn statistical patterns from data and generate probabilistic outputs rather than deterministic ones.
Traditional software follows explicit if-then rules written by programmers, so the same input always produces the same output. LLMs are trained on hundreds of billions of text tokens using self-supervised learning (predicting the next token), developing internal representations of language, knowledge, and reasoning patterns. At inference time they generate text token by token, each token sampled from a probability distribution. This means: the same input can produce different outputs (non-deterministic); the model can generalize to tasks it was never explicitly programmed for; it can fail in unpredictable ways, unlike traditional software, which fails at known edge cases; and its 'knowledge' is frozen at training time. Key components: transformer architecture, attention mechanism, tokenization, and the pretraining + fine-tuning paradigm.
When you ask a traditional search engine for 'Python list comprehension examples', it retrieves pages containing those exact keywords. When you ask an LLM, it understands the intent, generates an explanation tailored to apparent context (beginner vs expert), provides examples, and can answer follow-up questions, all without having been explicitly programmed for your specific question.
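The "sampled from a probability distribution" step can be sketched in isolation. This toy example is not how any particular model is implemented: the three-word vocabulary, the scores (logits), and the temperature parameter are assumptions added for illustration. Temperature is a common sampling control that the text above does not mention.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab, logits, temperature, rng):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


vocab = ["cat", "dog", "bird"]   # toy vocabulary
logits = [2.0, 1.0, 0.1]         # made-up scores for the next token
rng = random.Random(0)           # seeded only to make the demo repeatable

samples = [sample_token(vocab, logits, 1.0, rng) for _ in range(5)]
```

Because each token is drawn at random, two runs with different seeds can produce different text from the same input. Lowering the temperature concentrates probability on the top-scoring token and makes output more repeatable.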
Treating LLMs like databases that return facts reliably (they hallucinate). Expecting deterministic behavior (they are probabilistic). Assuming they have real-time information (they have a training cutoff). Building systems that rely entirely on LLM output without validation or grounding.
A legal tech company built a contract review tool that used an LLM to check for specific clause types. In production the LLM occasionally hallucinated that clauses existed when they did not. The fix required adding a verification step that located the actual clause text in the document rather than trusting the LLM's claim.
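A verification step of the kind described above might look like the following. This is a sketch under stated assumptions, not the company's actual fix: it assumes the LLM is asked to return each claimed clause together with a verbatim quote, and the `verify_claims` helper and claim format are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return " ".join(text.lower().split())


def verify_claims(document, claims):
    """Split claims into those whose quote appears in the document and those that do not.

    Each claim is assumed to be a dict with "clause" and "quote" keys.
    """
    doc = normalize(document)
    verified, rejected = [], []
    for claim in claims:
        quote = normalize(claim.get("quote", ""))
        # An empty quote can never count as evidence
        if quote and quote in doc:
            verified.append(claim)
        else:
            rejected.append(claim)
    return verified, rejected


CONTRACT = "Either party may terminate this Agreement\nwith thirty (30) days written notice."
CLAIMS = [
    {"clause": "termination", "quote": "may terminate this agreement with thirty (30) days"},
    {"clause": "indemnification", "quote": "shall indemnify and hold harmless"},
    {"clause": "arbitration", "quote": ""},
]
verified, rejected = verify_claims(CONTRACT, CLAIMS)
```

Exact substring matching is deliberately strict. It rejects anything the model paraphrased or invented, at the cost of also rejecting quotes with minor transcription differences.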
Overfitting is when a model learns the training data too well, including its noise, and performs poorly on new data. Detect it by comparing training and validation accuracy. Prevent it with regularization, dropout, more data, or simpler models.
A model overfits when it memorizes training examples rather than learning generalizable patterns. The tell-tale sign is high training accuracy but significantly lower validation/test accuracy; the gap between them is your overfitting signal. Prevention techniques: regularization (L1/L2 add penalty terms for large weights), dropout (randomly deactivating neurons during training), early stopping (halt training when validation loss stops improving), data augmentation (artificially expand training data), and reducing model complexity. Cross-validation (rotating which portion of the data is held out, so every example is used for both training and validation) gives a more reliable estimate of that gap. The bias-variance tradeoff is the theoretical framework: overfitting is high variance, underfitting is high bias.
An image classification model for medical diagnostics achieved 99% training accuracy but only 71% on the validation set. Analysis showed it was memorizing specific image artifacts from the training hospital's scanner. Fixing this required data augmentation (random crops, flips, brightness changes) and L2 regularization, bringing validation accuracy to 89%.
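Early stopping and the train/validation gap are easy to express without any ML library. The sketch below works on a plain list of per-epoch validation losses; the `patience` parameter and both helper names are assumptions for illustration. Deep learning frameworks ship their own early-stopping utilities with more options.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs. If that never happens, the
    last epoch index is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


def generalization_gap(train_accuracy, val_accuracy):
    """The overfitting signal: how much better the model does on data it has seen."""
    return train_accuracy - val_accuracy


# Validation loss improves for three epochs, then starts rising
stop_at = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.7, 0.8], patience=2)
gap = generalization_gap(0.99, 0.71)
```

In a real training loop you would also keep a copy of the weights from the best epoch, since the model at the stopping epoch is already slightly worse.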
Evaluating model performance only on training data and reporting those numbers. Not setting aside a test set that is never touched during development. Using the validation set for hyperparameter tuning and then reporting validation accuracy as if it were test accuracy (data leakage).
A production churn prediction model was deployed with 94% training accuracy. In production it performed at 61%, barely better than always predicting 'no churn'. Investigation revealed no validation split was used and the model had memorized customer IDs that leaked into the feature set.
A generator produces items one at a time using lazy evaluation — it only computes each item when requested. A list computes and stores all items immediately. Generators use far less memory for large sequences.
Generators are created using generator functions (functions with yield instead of return) or generator expressions (like list comprehensions but with parentheses). When you call a generator function, it returns a generator object without executing the body. Each call to next() on the generator runs the body until the next yield, which pauses execution and returns the value. The generator remembers its state between next() calls. Key advantage: memory. A list of 1 million items stores all 1 million in memory. A generator that yields 1 million items stores only the current item and the execution state. Generators are also composable: you can chain generators to build processing pipelines without intermediate memory allocation.
Processing a 10GB log file: reading the entire file into a list would require 10GB of RAM. A generator that yields one line at a time uses constant memory regardless of file size. In data pipelines: file_lines → filter_errors → parse_timestamps → aggregate — each step is a generator passing items to the next without intermediate storage.
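The pipeline described above can be sketched with three small generator functions. The log format and function names are assumptions; in a real job the input would be an open file object, which is itself a lazy iterable of lines, instead of the in-memory list used here for the demo.

```python
def read_lines(lines):
    """Yield each line without its trailing newline."""
    for line in lines:
        yield line.rstrip("\n")


def filter_errors(lines):
    """Pass through only the lines that mention ERROR."""
    for line in lines:
        if "ERROR" in line:
            yield line


def parse_timestamps(lines):
    """Split each line into (timestamp, message) at the first space."""
    for line in lines:
        timestamp, _, message = line.partition(" ")
        yield timestamp, message


LOG = [
    "2024-01-01T00:00:00 INFO start\n",
    "2024-01-01T00:00:05 ERROR disk full\n",
    "2024-01-01T00:00:09 ERROR timeout\n",
]

# Nothing runs until a value is requested from the end of the chain
pipeline = parse_timestamps(filter_errors(read_lines(LOG)))
first = next(pipeline)   # pulls exactly one item through all three stages
rest = list(pipeline)    # consumes whatever is left
again = list(pipeline)   # the generator is now exhausted
```

Each stage holds only the item it is currently working on, so memory use stays flat no matter how long the input is. The final line shows the exhaustion behavior: a second pass over the same generator yields nothing.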
Forgetting that a generator is exhausted after iteration, so you cannot iterate over it twice. Not recognizing that for loops and many Python builtins (sum, list, map) accept any iterable, including generators. Using a list comprehension when a generator expression would suffice (when you only need to iterate once). Confusing generator functions (use yield) with regular functions that return lists.
A data export API was timing out for large datasets because it built a complete list of 500000 records before streaming. Refactoring to yield records one at a time from a generator allowed streaming the response immediately and eliminated the memory spike and timeout.
Hallucination is when an LLM generates confident-sounding but factually incorrect or fabricated information. It happens because LLMs are trained to produce plausible next tokens based on patterns — not to retrieve verified facts.
LLMs learn statistical patterns from training data and generate text that sounds fluent and coherent, but they have no mechanism for verifying that what they generate is factually true. The model predicts the most probable next token given context, which may not correspond to reality, especially for: obscure facts (low representation in training data), recent events (after training cutoff), precise numerical information (dates, statistics), citations and URLs (commonly fabricated), and complex multi-step reasoning (errors compound). Hallucination is not a bug; it is an inherent property of the probabilistic text generation approach. Mitigation strategies: RAG (ground the model in retrieved documents), chain-of-thought (forces the model to reason explicitly), output validation (verify claims against reliable sources), and citation requirements (ask the model to quote source text supporting claims).
A legal AI assistant was generating case citations that did not exist — fabricated case names and citations that looked completely plausible. Lawyers who did not verify sources submitted briefs with non-existent precedents. Implementing a verification layer that checked all citations against a legal database before displaying them eliminated the problem.
Believing LLM outputs are inherently factual. Not validating LLM outputs before acting on them, especially for medical, legal, or financial decisions. Using LLMs to recall specific numbers, dates, or citations without verification. Thinking that larger models do not hallucinate: they hallucinate less but still hallucinate.
A medical information chatbot was confidently providing incorrect drug dosage information that contradicted official guidelines. The information sounded authoritative and patients followed it. This resulted in a product recall and regulatory action. The fix required implementing RAG against official medical databases for all drug-related queries.
Prompt engineering is the practice of designing inputs to LLMs to reliably produce desired outputs. It matters in production because the same model with different prompts can produce dramatically different quality, format, and accuracy of responses.
LLMs are extremely sensitive to how questions and instructions are phrased. A vague prompt produces vague output. A well-structured prompt with context, constraints, examples, and a clear output format produces consistent, usable output. Key techniques: zero-shot prompting (just the instruction), few-shot prompting (instruction + examples), chain-of-thought prompting (asking the model to reason step by step), system prompts (persistent instructions that frame all interactions), output format specification (JSON, markdown, specific structure), role prompting (giving the model a persona), and constraint specification (word limits, forbidden content, required elements). In production, prompts are version-controlled, tested, and iterated on like code.
A customer intent classification system was achieving 67% accuracy with a simple prompt. Adding three labeled examples (few-shot), specifying the output as a JSON object with confidence scores, and adding a chain-of-thought instruction to 'explain your reasoning before giving the final category' raised accuracy to 89% on the same model.
Writing prompts that work once and assuming they will always work, when LLMs are sensitive to small wording changes. Not version-controlling prompts, which makes production debugging impossible. Using prompts that work on GPT-4 and assuming they work identically on GPT-3.5 or other models. Ignoring prompt injection vulnerabilities when building user-facing systems.
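Treating prompts like code usually starts with building them from a function instead of pasting strings around. The sketch below assembles a few-shot classification prompt; the `build_prompt` helper, the category names, and the example messages are all hypothetical, and no model is called.

```python
def build_prompt(instruction, labels, examples, query):
    """Assemble a few-shot classification prompt as plain text.

    examples is a list of (message, category) pairs shown to the model
    before the real query.
    """
    lines = [instruction, f"Allowed categories: {', '.join(labels)}", ""]
    for text, label in examples:
        lines += [f"Message: {text}", f"Category: {label}", ""]
    # The prompt ends where the model is expected to continue
    lines += [f"Message: {query}", "Category:"]
    return "\n".join(lines)


prompt = build_prompt(
    instruction="Classify the customer message into exactly one category.",
    labels=["billing", "technical", "other"],
    examples=[
        ("I was charged twice this month", "billing"),
        ("The app crashes when I log in", "technical"),
    ],
    query="How do I update my credit card?",
)
```

Because the prompt is the return value of a function, it can be unit-tested, diffed in code review, and versioned alongside the application that uses it.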
A content moderation system was incorrectly flagging safe content as harmful at a rate of 12%. Prompt analysis revealed the system prompt was ambiguous about edge cases. Adding 10 examples of borderline-safe content with explicit reasoning reduced false positive rate to 3% without model retraining.
A neural network is a series of connected layers of mathematical functions (neurons) that transform inputs into outputs. It learns by adjusting the connection weights using backpropagation — computing how much each weight contributed to the error and updating it to reduce the error.
A neural network has an input layer (receives features), hidden layers (learn representations), and an output layer (produces predictions). Each neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function (ReLU, sigmoid, tanh) to introduce non-linearity. Learning happens through: forward pass (compute prediction), loss computation (measure how wrong the prediction was using a loss function like cross-entropy or MSE), backpropagation (use the chain rule to compute the gradient of the loss with respect to each weight), and gradient descent (update weights in the direction that reduces loss). This cycle repeats over many batches and many full passes over the training data (epochs). The learning rate controls how large each weight update is.
Image classification: the input layer receives pixel values, early hidden layers learn to detect edges and colors, middle layers detect shapes and textures, later layers detect object parts, and the output layer assigns class probabilities. This hierarchical feature learning happens automatically through training, with no hand-engineering required.
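The forward pass, loss gradient, and weight update can be shown on the smallest possible network: a single sigmoid neuron learning the logical OR function. This is a teaching sketch in pure Python with assumed hyperparameters (2000 epochs, learning rate 0.5), not a realistic training setup. It relies on the standard result that for a sigmoid output with cross-entropy loss, the gradient with respect to the weighted sum is simply prediction minus target.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(data, epochs=2000, learning_rate=0.5):
    """Train one sigmoid neuron with two inputs using per-example gradient descent."""
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):          # one epoch = one full pass over the data
        for (x1, x2), y in data:
            # Forward pass: weighted sum, bias, activation
            p = sigmoid(w1 * x1 + w2 * x2 + b)
            # Backpropagation via the chain rule; for sigmoid plus
            # cross-entropy this collapses to (prediction - target)
            dz = p - y
            # Gradient descent: step against the gradient
            w1 -= learning_rate * dz * x1
            w2 -= learning_rate * dz * x2
            b -= learning_rate * dz
    return w1, w2, b


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1, w2, b = train_neuron(OR_DATA)
predictions = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for (x1, x2), _ in OR_DATA]
```

A multi-layer network repeats the same idea, with the chain rule carrying the error signal backward through each layer in turn.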
Using too high a learning rate, causing the loss to oscillate or diverge. Not normalizing inputs (neural networks are sensitive to input scale). Not having enough data: neural networks need more data than traditional ML algorithms to generalize. Using too many layers for a simple problem when a shallower network would suffice.
A production image recognition model for quality control on a manufacturing line was failing to converge during training. Investigation showed input images were not normalized — pixel values ranged 0-255 instead of 0-1. Adding a normalization layer as the first layer stabilized training and the model converged in 50 epochs.
Training set is used to fit the model. Validation set is used to tune hyperparameters and select the best model. Test set is held out completely and used only once to report final performance. Using only train/test leads to overfitting on the test set through repeated evaluation.
Without a separate validation set, developers tune hyperparameters (learning rate, tree depth, regularization strength) by evaluating on the test set. Each evaluation leaks information about the test set into the model selection process, so the final reported test accuracy is optimistically biased. A proper split: 70% training (model learns from this), 15% validation (used during development for hyperparameter tuning and model selection), 15% test (locked away, evaluated exactly once to report final performance). For small datasets, k-fold cross-validation replaces the validation set by rotating which portion of training data is held out. The test set must never be touched during any development decision.
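The 70/15/15 split can be sketched with the standard library alone. The function name and the fixed seed are choices made for this example. This simple version shuffles randomly, so it is not suitable as-is for imbalanced classes (which need stratification) or time-series data (which needs a chronological split).

```python
import random


def train_val_test_split(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then carve off the test and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
```

Fixing the seed matters more than it looks: if the split changes between runs, examples drift from the test set into training and the final number stops being trustworthy.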
An ML competition showed that teams who repeatedly submitted to the public leaderboard (which used the test set) were effectively overfitting to the test set through hundreds of submission cycles. Teams who maintained a strict held-out final test set reported the more realistic performance numbers.
Using the test set during hyperparameter tuning then reporting test set performance as if it were unbiased. Not stratifying the split for classification — random splits of imbalanced data can put almost no positive examples in the validation set. Time-series data: splitting randomly instead of chronologically leaks future information into training.
A production recommendation system was developed with 50 rounds of hyperparameter tuning, each evaluated on the same test set. Deployed performance was 15% lower than the reported test AUC. Post-mortem confirmed the test set had been evaluated 50 times during development, causing effective test set overfitting.
A module is a single .py file containing Python code. A package is a directory containing multiple modules and an __init__.py file. Packages allow organizing related modules into a hierarchical namespace.
Any .py file is a module: it can be imported with 'import filename' (the file name without the .py extension). A package is a directory with an __init__.py file (which can be empty) that tells Python to treat the directory as a package. The __init__.py can import from submodules to define the package's public API. Modern Python (3.3+) supports namespace packages, which are directories without __init__.py, but an explicit __init__.py is still preferred for clarity. Import paths follow the directory structure: in a package 'myapp' with a subpackage 'utils' containing 'helpers.py', you import with 'from myapp.utils.helpers import my_function'. What 'from myapp import *' exports is controlled by the __all__ list in __init__.py if one is defined; otherwise every name in the package namespace that does not start with an underscore is exported.
Django is structured as a package: the top-level 'django' directory contains __init__.py, and subpackages like 'django.db', 'django.http', and 'django.contrib' each have their own __init__.py. This allows clean imports like 'from django.db import models' while keeping the codebase organized across hundreds of files.
Forgetting __init__.py in package directories (this causes ImportError in Python 2; in Python 3 it sometimes works as a namespace package but can cause confusing behavior). Circular imports between modules in the same package. Mixing relative imports (from . import module) and absolute imports without understanding the difference: relative imports fail when a module is run directly as a script.
A production Django application was growing to 50+ Python files in a single directory. Refactoring into packages (api/, models/, services/, utils/) with __init__.py files and clean public APIs reduced import statement complexity and made it possible to see the application structure at a glance.
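The 'myapp.utils.helpers' layout from the explanation can be built and imported in a throwaway directory to see the mechanics end to end. This is only a demonstration harness: writing package files from a script and editing sys.path by hand are conveniences for the example, not how a real project should be laid out. The body of `my_function` is made up.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Create myapp/utils/helpers.py with the __init__.py files that make it a package."""
    utils = Path(root) / "myapp" / "utils"
    utils.mkdir(parents=True)
    (Path(root) / "myapp" / "__init__.py").write_text("")
    # The subpackage __init__.py re-exports the helper as its public API
    (utils / "__init__.py").write_text(
        "from .helpers import my_function\n__all__ = ['my_function']\n"
    )
    (utils / "helpers.py").write_text(
        "def my_function():\n    return 'hello from helpers'\n"
    )


with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    sys.path.insert(0, root)          # make the temp directory importable
    importlib.invalidate_caches()     # the files were created after startup
    try:
        from myapp.utils.helpers import my_function            # full path
        from myapp.utils import my_function as re_exported     # via __init__.py
        result = my_function()
    finally:
        sys.path.remove(root)
```

Both import lines resolve to the same function object. The second works only because the subpackage's __init__.py imports it, which is how a package defines a shorter public API over its internal file layout.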
PAGE 1 OF 2 · 19 QUESTIONS TOTAL