Today Zarrar walks us through this question asked by Facebook about how to use machine learning to flag illegal items posted on …
Great insights to sample questions
Great insights, but the text data can be in various languages. He also suggested augmenting with some keywords for detection; can that work across languages, or would you train a different model per language? Just curious.
Hello. I think this was super helpful overall. I'm a little confused when he describes Gradient Boosting. For each successor tree, we should set new target labels for training errors in the predecessor, no? (and leave the weights alone)
Great videos! Where do you get the sample questions from shown at the start of the video?
F2 score will be better here I think 🤔
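For anyone comparing the two, here is a minimal sketch of the F-beta score computed from confusion-matrix counts. The counts are made up for illustration and do not come from the video. With beta = 2, recall is weighted more heavily than precision, which is the usual argument for F2 when missed illegal listings cost more than false alarms.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from confusion-matrix counts; beta > 1 weights recall more heavily."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

# Hypothetical counts: 80 illegal listings caught, 20 false alarms, 40 missed.
tp, fp, fn = 80, 20, 40
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = f_beta(tp, fp, fn, beta=1.0)
f2 = f_beta(tp, fp, fn, beta=2.0)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f} F2={f2:.3f}")
```

Because recall is lower than precision in this made-up example, F2 comes out below F1: the metric punishes the missed listings harder.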
I feel this video is a fantastic resource. Not only was the explanation great and very insightful, but I think you also asked the right questions, going the extra mile in the explanation/analysis… Thank you for sharing!
Typical end-to-end ML question:
Understand the problem, Data collection, Feature Engineering, Build Model, Train Model, Evaluate Performance (Confusion Matrix: Precision & Recall), Deploy Model, Rebuild Model if needed
This is a fantastic video for giving an idea for an ML system design interview ! Thanks for making this.
If the dataset is biased, why bother using accuracy as the metric to evaluate the model?
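Assuming "biased" here means class imbalance, a small sketch shows the concern. The 1% illegal rate and the always-"legal" baseline are hypothetical, chosen only to make the effect visible.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that the model actually flagged."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical data: 10 illegal listings (1) among 1000, i.e. a 1% positive rate.
y_true = [1] * 10 + [0] * 990
# A useless baseline that labels every listing as legal (0).
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

The baseline scores 99% accuracy while catching none of the illegal listings, which is why precision, recall, or an F-score is usually preferred on data like this.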
Very useful! Thanks for sharing. Do they ask about data pipelines and technologies that might be useful to scale the model (for the MLE role)? Would love to know more resources on it! as well as more mock interviews 🙂
Thanks for tuning in! If you're interested in learning more about machine learning, be sure to check out our machine learning course. It's designed to help you master the key concepts and skills needed to excel in machine-learning roles.
https://www.interviewquery.com/learning-paths/modeling-and-machine-learning
INFORMATIVE GOOD SIR
Re: whether or not to do CV on images. Shouldn't one do error analysis to check whether text and other features lacked predictive power and the signal was elsewhere (i.e. in the images), and use that to justify investing in extracting signals from images? The alternative, building a giant model with all features and running ablations to understand feature-class importance, seems quite expensive.
this is the best video ever
thanks Zarrar
Amazing! As a point to improve even more, I’d add as finishing touch fine-tuning the model with adversarial examples.
Would you use a whiteboard for the ML design architecture? Is whiteboarding helpful in the interview?
Around @12:00, the algorithm that upweights incorrect predictions is AdaBoost rather than GBM, right?
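For reference, here is a minimal sketch of the two update rules being contrasted in this thread, written from the standard textbook definitions rather than from anything shown in the video. AdaBoost (labels in {-1, +1}) reweights samples so that misclassified ones count more; gradient boosting with squared-error loss leaves the weights alone and fits the next tree to the residuals. The toy labels and predictions are hypothetical.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """One AdaBoost round: upweight misclassified samples, then renormalize.

    Assumes labels and predictions are in {-1, +1} and 0 < weighted error < 1.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this weak learner
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return [w / total for w in new], alpha

def gbm_residuals(y_true, current_pred):
    """One gradient-boosting round (squared-error loss): next tree's targets."""
    return [t - f for t, f in zip(y_true, current_pred)]

# Toy example: four equally weighted samples, the second one misclassified.
new_w, alpha = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
residuals = gbm_residuals([1.0, 0.0], [0.7, 0.2])
print(new_w, alpha, residuals)
```

In the toy run the single misclassified sample ends up carrying half of the total weight, whereas the gradient-boosting step only changes the targets.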
I would have suggested CNN as an alternative approach but ya agree. The listing is not only about an image but also text. Edge case where they have different text and different images then that won't get captured. Thank you.
It's also possible to use re-ranking or bagging approaches to combine xgboost model and vision/nlp model, which would most likely improve performance
Too staged, not like a real interview; everything was talk, without any drawing or writing.
I can never remember what Precision and Recall stand for. It is clearly visible that the interviewee was also confused, and the video is edited around that point.
Great video. I find all the quick cuts to be a bit disorienting though.
Why does he say that it is a better idea to use NN rather than gradient boosted trees if we need to continuously train/update the model with every new training label that we collect from the customer labeling team?
Wow this was so useful.
Great Interview Zarrar!
Excellent
GBM is fast to train?????
Sorry, where did you discuss the label generation part? There are multiple ways to generate labels with pros and cons:
1. User feedback: automatic, lots of data but noisy.
2. Manual annotation: accurate labels but not scalable. A very high proportion of examples would be tagged as negative.
3. Bootstrap: train a simple model and sample more examples based on model scores to get a higher proportion of positive examples.
4. Hybrid: manually annotate examples marked as "X" by users, where "X" can be tags like "illegal", "offensive", etc.
Wow, this guy is good. I really like how he starts from a model framework with a baseline model and points out the reasoning and key considerations, so we can evolve from there to more complicated models using the same kind of reasoning.
What does the following mean? TF-IDF: "We scale the values of each word based on its frequency in different postings"?
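One way to read that sentence, sketched below with one common TF-IDF variant (term frequency times log(N / document frequency)): a word that appears in every posting gets scaled down to zero, while a word that appears in only a few postings keeps a high score. The three postings are invented for illustration, and libraries often use smoothed versions of this formula, so exact numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dict per document using tf * log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many postings does each word appear?
    df = Counter(w for toks in tokenized for w in set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        scores.append(
            {w: (c / len(toks)) * math.log(n / df[w]) for w, c in counts.items()}
        )
    return scores

# Hypothetical postings: only the first mentions "rifle".
postings = [
    "brand new rifle for sale",
    "brand new sofa for sale",
    "brand new bike for sale",
]
scores = tf_idf(postings)
print(scores[0])
```

In the first posting, "sale" scores 0 because it appears everywhere, while "rifle" gets the highest score because it appears in only one posting.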
Two points that I would add for the end questions:
1. To overcome coded firearm words -> use transformer models like BERT, since the embeddings capture meaning (e.g. via cosine similarity), and keep the top-scoring matches
2. Computer vision on the images can be used as additional inference if the F1 score is low, but not always, as this type of inference is more expensive
Should've mentioned that people try to disguise the actual product description using proxy words.
Also, to include image analysis or not, I'd draw multiple samples and train models in A/B setting. Then run a t-test to see if the mean prediction metric is significantly different or not.
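A rough sketch of that comparison using Welch's t statistic, which does not assume equal variances. The F1 values below are hypothetical placeholders, not results from any real experiment. Only the statistic and the degrees of freedom are computed here; a p-value would still need a t-distribution lookup, for example via `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical F1 scores from five independent samples per variant.
with_images = [0.81, 0.83, 0.80, 0.84, 0.82]
text_only = [0.78, 0.79, 0.77, 0.80, 0.76]

t_stat, dof = welch_t(with_images, text_only)
print(f"t={t_stat:.2f} df={dof:.1f}")
```

One caveat: the test assumes the samples are independent, so scores from overlapping resamples of the same dataset would overstate the significance.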
@iqjayfeng I think Zarrar mistakenly mixed up False Pos and False Neg around the 2:00 mark. It would be ok if customer service received False Neg (model pred True but it's really False), not False Pos