Machine Learning Interview

Easy

Company: Robinhood
How would you approach building a binary classifier when dealing with an imbalanced dataset?
Company: Square
What distinctions do you anticipate between a model that minimizes squared error and one that minimizes absolute error? When would you consider each error metric suitable?
Company: Facebook
How do you choose k in k-means clustering?
Company: Salesforce
How can you make your models more robust to outliers?
Company: AQR
Say you’re conducting a multiple linear regression and suspect that some predictors are correlated. How will the results of the regression be affected if several are indeed correlated? How would you address this problem?
Company: Point72
Explain the rationale behind random forests. How do they enhance the performance of individual decision trees?
Company: PayPal
Given a large dataset of payment transactions, say we want to predict the likelihood of a given transaction being fraudulent. However, there are many rows with missing values for various columns. How would you deal with this?
Company: Airbnb
Say you are running a simple logistic regression to solve a problem but find the results to be unsatisfactory. What are some ways you might improve your model, or what other models might you look into using instead?
Company: Two Sigma
Say you were running a linear regression on a dataset but accidentally duplicated every data point. What happens to your beta coefficients?
Company: PWC
Compare gradient boosting vs random forests.
Company: DoorDash
Say that DoorDash is launching in Singapore. For this new market, you want to predict estimated time of arrival (ETA) for a delivery to reach a customer after an order has been placed on the app. From an earlier beta test in Singapore, there were 10,000 deliveries made. Do you have enough training data to create an accurate ETA model?
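
As a starting point for the duplicated-data question above (Two Sigma), here is a minimal sketch using synthetic data. The data-generating process (`y = 2 + 3x + noise`), the sample size, and the helper name `ols` are assumptions made for illustration, not part of the question. Duplicating every row doubles both sides of the normal equations, so the fitted coefficients should match.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 2 + 3x + noise
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)

def ols(x, y):
    """Return (intercept, slope) from an ordinary least squares fit."""
    X = np.column_stack([np.ones_like(x), x])  # add an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_once = ols(x, y)
# Fit again with every data point duplicated
beta_twice = ols(np.concatenate([x, x]), np.concatenate([y, y]))

print(beta_once, beta_twice)
```

The point estimates agree, but a naive standard-error calculation on the duplicated data would report tighter intervals even though no new information was added, which is worth raising in an answer.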

Medium

Company: Affirm
Say we are running a binary classification loan model, and rejected applicants must be supplied with a reason why they were rejected. Without digging into the weights of features, how would you supply these reasons?
Company: Google
Say you are given a very large corpus of words. How would you identify synonyms?
Company: Facebook
What is the bias-variance tradeoff? How is it expressed using an equation?
Company: Uber
Define the cross-validation process. What is the motivation behind using it?
Company: Salesforce
How would you build a lead scoring algorithm to predict whether a prospective company is likely to convert into being an enterprise customer?
Company: Spotify
How would you approach creating a music recommendation algorithm?
Company: Amazon
Define what it means for a function to be convex. Give an example of a machine learning algorithm that is not convex, and describe why that is so.
Company: Microsoft
Explain what information gain and entropy are in the context of a decision tree and walk through a numerical example.
Company: Uber
What are L1 and L2 regularization? What are the differences between the two?
Company: Amazon
Describe gradient descent and the motivations behind stochastic gradient descent.
Company: Affirm
Assume we have a classifier that produces a score between 0 and 1 for the probability of a particular loan application being fraudulent. Say that for each application’s score, we take the square root of that score. How would the ROC curve change? If it doesn’t change, what kinds of functions would change the curve?
Company: IBM
Say X is a univariate Gaussian random variable. What is the entropy of X?
Company: Stitch Fix
How would you build a model to calculate a customer’s propensity to buy a particular item? What are some pros and cons of your approach?
Company: Citadel
Compare Gaussian Naive Bayes and logistic regression. When would you use one over the other?
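
For the ROC question above (Affirm), one way to build intuition is to check numerically that a strictly increasing transform such as the square root preserves the ranking of scores. The sketch below uses AUC as a one-number summary of the ROC curve. The synthetic scores, labels, and the helper name `auc` are assumptions for illustration only.

```python
import numpy as np

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # all positive-minus-negative pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=200)
# Hypothetical scores in [0, 1): positives tend to score higher
scores = 0.3 * labels + rng.uniform(0, 0.7, size=200)

auc_raw = auc(scores, labels)
auc_sqrt = auc(np.sqrt(scores), labels)   # strictly increasing transform
auc_flipped = auc(1.0 - scores, labels)   # decreasing transform reverses the ranking

print(auc_raw, auc_sqrt, auc_flipped)
```

Under these assumptions the square-root scores give the same AUC as the raw scores, while a transform that reorders scores (here `1 - score`) changes it.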

Hard

Company: Walmart
What loss function is used in k-means clustering given k clusters and n sample points? Compute the update formula using (1) batch gradient descent and (2) stochastic gradient descent for the cluster mean for cluster k using a learning rate ε.
Company: Two Sigma
Describe the kernel trick in SVMs and give a simple example. How do you decide what kernel to choose?
Company: Morgan Stanley
Say we have N observations for some variable which we model as being drawn from Gaussian distribution. What are your best guesses for the parameters of the distribution?
Company: Stripe
Say we are using a Gaussian mixture model (GMM) for anomaly detection of fraudulent transactions to classify incoming transactions into K classes. Describe the model setup formulaically and how to evaluate the posterior probabilities and log likelihood. How can we determine if a new transaction should be deemed fraudulent?
Company: Robinhood
How would you build a model to predict whether a Robinhood user will churn?
Company: Two Sigma
Suppose you are running a linear regression and model the error terms as being normally distributed. Show that in this setup, maximizing the likelihood of the data is equivalent to minimizing the sum of the squared residuals.
Company: Uber
Describe the idea behind Principal Components Analysis (PCA) and describe its formulation and derivation in matrix form. Next, go through the procedural description and solve the constrained maximization.
Company: Citadel
Describe the model formulation behind logistic regression. How do you maximize the log likelihood of a given model (using the two-class case)?
Company: Spotify
How would you approach creating a music recommendation algorithm for Discover Weekly (a 30-song weekly playlist personalized to an individual user)?
Company: Google
Derive the variance-covariance matrix of the least squares parameter estimates in matrix form.
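
For the Gaussian parameter question above (Morgan Stanley), the maximum likelihood estimates are the sample mean and the variance computed with a divisor of N. The sketch below checks this numerically on a synthetic sample. The true parameters, the sample size, and the helper name `log_likelihood` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1000)  # hypothetical sample

mu_hat = x.mean()
sigma2_mle = np.mean((x - mu_hat) ** 2)  # divides by N (the MLE, biased)
sigma2_unbiased = x.var(ddof=1)          # divides by N - 1

def log_likelihood(x, mu, sigma2):
    """Gaussian log likelihood of the sample x for a given mean and variance."""
    n = x.size
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

best = log_likelihood(x, mu_hat, sigma2_mle)
shifted_mean = log_likelihood(x, mu_hat + 0.1, sigma2_mle)
inflated_var = log_likelihood(x, mu_hat, 1.1 * sigma2_mle)

print(mu_hat, sigma2_mle, sigma2_unbiased)
```

Perturbing either estimate lowers the log likelihood, which is consistent with the closed-form derivation. The unbiased variance estimate differs only in using N - 1 as the divisor.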
