Machine Learning Interview Questions

Easy

  1. Company: Robinhood
    How would you approach building a binary classifier when dealing with an imbalanced dataset?
  2. Company: Square
    What distinctions do you anticipate between a model that minimizes squared error and one that minimizes absolute error? When would you consider each error metric suitable?
  3. Company: Facebook
    How do you choose k in k-means clustering?
  4. Company: Salesforce
    How can you make your models more robust to outliers?
  5. Company: AQR
    Say you’re conducting a multiple linear regression and suspect that some predictors are correlated. How will the results of the regression be affected if several are indeed correlated? How would you address this problem?
  6. Company: Point72
    Explain the rationale behind random forests. How do they enhance the performance of individual decision trees?
  7. Company: PayPal
    Given a large dataset of payment transactions, say we want to predict the likelihood of a given transaction being fraudulent. However, there are many rows with missing values for various columns. How would you deal with this?
  8. Company: Airbnb
    Say you are running a simple logistic regression to solve a problem but find the results to be unsatisfactory. What are some ways you might improve your model, or what other models might you look into using instead?
  9. Company: Two Sigma
    Say you were running a linear regression on a dataset but accidentally duplicated every data point. What happens to your beta coefficients?
  10. Company: PWC
    Compare gradient boosting and random forests.
  11. Company: DoorDash
    Say that DoorDash is launching in Singapore. For this new market, you want to predict the estimated time of arrival (ETA) for a delivery to reach a customer after an order has been placed on the app. An earlier beta test in Singapore produced 10,000 deliveries. Do you have enough training data to create an accurate ETA model?
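Question 9 (duplicated data points) can be checked numerically. The sketch below assumes NumPy and a synthetic dataset; the helper `ols_fit` and the data are illustrative, not from the source. It fits ordinary least squares before and after duplicating every row. The coefficients stay the same because both X'X and X'y simply double, while the reported standard errors shrink by roughly a factor of 1/sqrt(2).

```python
import numpy as np

def ols_fit(X, y):
    """Fit OLS with an intercept; return (coefficients, standard errors)."""
    X1 = np.column_stack([np.ones(len(X)), X])      # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares coefficients
    resid = y - X1 @ beta
    n, p = X1.shape
    sigma2 = resid @ resid / (n - p)                # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)         # sigma^2 (X'X)^{-1}
    return beta, np.sqrt(np.diag(cov))

# Synthetic data (illustrative only): y = 1 + 2*x1 - 3*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -3.0]) + rng.normal(scale=0.5, size=50)

beta, se = ols_fit(X, y)
# Duplicate every row of the design matrix and the response
beta_dup, se_dup = ols_fit(np.vstack([X, X]), np.concatenate([y, y]))

print(np.allclose(beta, beta_dup))  # True: the coefficients do not move
print(se_dup / se)                  # each ratio is sqrt(47/97), about 0.70
```

Because the duplicated fit believes it has 100 independent observations rather than 50, its standard errors are too small, so t-tests and confidence intervals become overconfident.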

Medium

  1. Company: Affirm
    Say we are running a binary classification loan model, and rejected applicants must be supplied with a reason why they were rejected. Without digging into the weights of features, how would you supply these reasons?
  2. Company: Google
    Say you are given a very large corpus of words. How would you identify synonyms?
  3. Company: Facebook
    What is the bias-variance tradeoff? How is it expressed using an equation?
  4. Company: Uber
    Define the cross-validation process. What is the motivation behind using it?
  5. Company: Salesforce
    How would you build a lead scoring algorithm to predict whether a prospective company is likely to convert into an enterprise customer?
  6. Company: Spotify
    How would you approach creating a music recommendation algorithm?
  7. Company: Amazon
    Define what it means for a function to be convex. Give an example of a machine learning algorithm that is not convex, and describe why that is so.
  8. Company: Microsoft
    Explain what information gain and entropy are in the context of a decision tree and walk through a numerical example.
  9. Company: Uber
    What are L1 and L2 regularization? What are the differences between the two?
  10. Company: Amazon
    Describe gradient descent and the motivations behind stochastic gradient descent.
  11. Company: Affirm
    Assume we have a classifier that produces a score between 0 and 1 for the probability of a particular loan application being fraudulent. Say that for each application’s score, we take the square root of that score. How would the ROC curve change? If it doesn’t change, what kinds of functions would change the curve?
  12. Company: IBM
    Say X is a univariate Gaussian random variable. What is the entropy of X?
  13. Company: Stitch Fix
    How would you build a model to calculate a customer’s propensity to buy a particular item? What are some pros and cons of your approach?
  14. Company: Citadel
    Compare Gaussian Naive Bayes and logistic regression. When would you use one over the other?

Hard

  1. Company: Walmart
    What loss function is used in k-means clustering given k clusters and n sample points? Compute the update formula for the mean of cluster k using (1) batch gradient descent and (2) stochastic gradient descent, with a learning rate ε.
  2. Company: Two Sigma
    Describe the kernel trick in SVMs and give a simple example. How do you decide what kernel to choose?
  3. Company: Morgan Stanley
    Say we have N observations for some variable, which we model as being drawn from a Gaussian distribution. What are your best guesses for the parameters of the distribution?
  4. Company: Stripe
    Say we are using a Gaussian mixture model (GMM) for anomaly detection of fraudulent transactions, classifying incoming transactions into K classes. Describe the model setup formulaically and explain how to evaluate the posterior probabilities and the log likelihood. How can we determine whether a new transaction should be deemed fraudulent?
  5. Company: Robinhood
    How would you build a model to predict whether a Robinhood user will churn?
  6. Company: Two Sigma
    Suppose you are running a linear regression and model the error terms as being normally distributed. Show that in this setup, maximizing the likelihood of the data is equivalent to minimizing the sum of the squared residuals.
  7. Company: Uber
    Describe the idea behind Principal Component Analysis (PCA) and give its formulation and derivation in matrix form. Next, go through the procedural description and solve the constrained maximization.
  8. Company: Citadel
    Describe the model formulation behind logistic regression. How do you maximize the log likelihood of a given model (using the two-class case)?
  9. Company: Spotify
    How would you approach creating a music recommendation algorithm for Discover Weekly (a 30-song weekly playlist personalized to an individual user)?
  10. Company: Google
    Derive the variance-covariance matrix of the least squares parameter estimates in matrix form.
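As an illustration of the kind of derivation Question 6 (Two Sigma) asks for, here is a sketch assuming the standard setup y_i = x_i^T β + ε_i, with the ε_i drawn i.i.d. from N(0, σ²) and σ² held fixed while optimizing over β:

```latex
\begin{aligned}
\mathcal{L}(\beta, \sigma^2)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2}\right) \\
\log \mathcal{L}(\beta, \sigma^2)
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 \\
\hat\beta_{\mathrm{MLE}}
  &= \arg\max_{\beta} \log \mathcal{L}(\beta, \sigma^2)
   = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2
   = \hat\beta_{\mathrm{OLS}}
\end{aligned}
```

The first term of the log-likelihood does not involve β, and the factor 1/(2σ²) is positive, so maximizing the log-likelihood over β is the same as minimizing the sum of squared residuals.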