Have you ever noticed that when you search on Google, the results often include links from Quora? Founded in 2009 as a question-and-answer website by Adam D’Angelo and Charlie Cheever, Quora was made available to the public in 2010. The website allows users to ask and answer questions and to vote and comment on answers given by other users. In 2020, it recorded 300 million unique visitors and ranked among the top 20 websites. The most searched topics were technology, movies, health, food and science.
Several machine learning algorithms work behind the scenes and have helped Quora remain one of the most popular websites more than a decade after its launch.
Classification of questions and answers
Every Quora user looking for information on a particular topic starts by posing a question, that is, by expressing a “need for information”. Machine learning algorithms drive a question-understanding process that extracts the exact information being sought from the question. The next step is to assess “question quality” with a classifier that distinguishes high-quality questions from low-quality ones.
At this point, the algorithms also determine the type of question. Once questions are categorized, the next step is question-topic tagging, where the model determines the bucket or topic under which the question should be listed. Here, the analysis is based on data describing the actions that “Quorans” take on the platform. To facilitate the analysis, Quora relies on a schematic relationship between users, questions, and topics. Unlike most topic modeling applications, which deal with large document text and a smaller topic ontology, Quora’s algorithms work with short question text and “over a million potential topics” to label the question.
When it comes to answers, Quora has a proprietary algorithm that ranks them. It is modeled similarly to Google’s PageRank, which counts the number and quality of links to a particular page to determine the website’s importance. The underlying belief is that important websites are more likely to have backlinks from other websites. Similarly, Quora ranks answers based on their usefulness. What counts as “useful” depends on factors such as the upvotes and downvotes on the answer, previous answers written by the author, whether the author is an expert in the subject, and the type and quality of the content, among others.
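To make the PageRank comparison concrete, here is a minimal sketch of the textbook PageRank power iteration on a toy link graph. This is not Quora's proprietary answer-ranking algorithm, whose details are not public; the function name, the damping factor of 0.85, and the toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: pages a, b and d all link to c, and c links back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
best_page = max(ranks, key=ranks.get)
```

In this toy graph, page c collects the most backlinks and ends up with the highest score, and page a benefits from being linked by an important page. An answer-ranking system in the same spirit would replace links with signals such as votes and author reputation.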
Quora applies machine-learned ranking in two specific settings: search ranking and personalized ranking. In search ranking, the questions that match the query are retrieved first, and these results are then ranked according to the probability of a click. In personalized ranking, Quora attempts to select and rank the most “interesting” answers based on the user’s usage patterns, as measured from their profile.
For the feed, Quora combines signals about a user’s interests with features of answers and questions. User actions are aggregated over different time windows and passed to the ranking algorithm. Quora continues to experiment with the personalized feed model.
Another important consideration for Quora in feed ranking is that it should be responsive to factors like user actions, impressions, and trending events. The challenge is that the collection of questions and answers keeps growing, and it may not be possible to rank all of it in real time for every user. To optimize the user experience, Quora implements a multi-stage ranking algorithm in which candidates are selected and ranked in an earlier stage before the final ranking is made.
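The multi-stage idea can be sketched as follows: a cheap scoring function trims the full pool to a handful of candidates, and a costlier scoring function orders only those candidates. Both scoring functions, their weights, the field names and the sample items are hypothetical stand-ins; Quora's actual stages and models are not described in the sources.

```python
import heapq


def cheap_score(item, user_topics):
    # Stage 1: a fast heuristic (topic overlap plus a small popularity bonus).
    return len(user_topics & item["topics"]) + 0.001 * item["upvotes"]


def expensive_score(item, user_topics):
    # Stage 2: stand-in for a heavier model, run only on the shortlist.
    overlap = len(user_topics & item["topics"]) / max(len(item["topics"]), 1)
    freshness = 1.0 / (1.0 + item["age_hours"])
    return 0.7 * overlap + 0.3 * freshness


def multi_stage_rank(items, user_topics, candidate_count=3, final_count=2):
    # Candidate generation: keep only the best items under the cheap score.
    candidates = heapq.nlargest(
        candidate_count, items, key=lambda it: cheap_score(it, user_topics)
    )
    # Final ranking: sort the shortlist with the expensive score.
    ranked = sorted(
        candidates, key=lambda it: expensive_score(it, user_topics), reverse=True
    )
    return ranked[:final_count]


# Hypothetical feed items.
items = [
    {"id": "q1", "topics": {"python", "ml"}, "upvotes": 10, "age_hours": 48},
    {"id": "q2", "topics": {"cooking"}, "upvotes": 500, "age_hours": 1},
    {"id": "q3", "topics": {"ml"}, "upvotes": 40, "age_hours": 2},
    {"id": "q4", "topics": {"ml", "movies", "health"}, "upvotes": 5, "age_hours": 0},
    {"id": "q5", "topics": {"travel"}, "upvotes": 3, "age_hours": 5},
]
feed = multi_stage_rank(items, user_topics={"ml", "python"})
feed_ids = [item["id"] for item in feed]
```

The point of the split is cost: the expensive function runs on three items here instead of five, and in a real system on hundreds instead of millions.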
One of the main considerations in maintaining the quality of discussions on Quora is filtering out duplicate content. To this end, Quora’s ML team detects different questions that have the same intent and merges them into a single canonical question. One of the techniques used is a random forest model with features such as the cosine similarity of the averaged word2vec embeddings of the tokens, the number of common words, the part-of-speech tags of the words, and the common topics tagged on the questions. Apart from that, Quora also uses several machine learning systems, alone and in combination, to fight spammy content. Additionally, machine learning algorithms, along with human moderators, help identify offensive, abusive, and hurtful content on the platform.
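As an illustration of the duplicate-detection features listed above, the sketch below computes three of them for a pair of questions: the cosine similarity of averaged token embeddings, the number of common words, and the number of common topics. The tiny three-dimensional embedding table is invented for the example (a real system would load trained embeddings), part-of-speech features are left out, and all function names are assumptions. In a full pipeline these feature values would be fed to a classifier such as a random forest.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "how": [0.1, 0.3, 0.0],
    "learn": [0.9, 0.1, 0.2],
    "study": [0.8, 0.2, 0.3],
    "python": [0.2, 0.9, 0.1],
    "bake": [0.0, 0.1, 0.9],
    "bread": [0.1, 0.0, 0.8],
}


def tokenize(text):
    # Lowercase, split on whitespace, and strip basic punctuation.
    words = (word.strip("?.,!").lower() for word in text.split())
    return [word for word in words if word]


def average_embedding(tokens):
    # Average the vectors of the tokens that have an embedding.
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_features(question_a, topics_a, question_b, topics_b):
    tokens_a, tokens_b = tokenize(question_a), tokenize(question_b)
    return {
        "embedding_cosine": cosine_similarity(
            average_embedding(tokens_a), average_embedding(tokens_b)
        ),
        "common_words": len(set(tokens_a) & set(tokens_b)),
        "common_topics": len(set(topics_a) & set(topics_b)),
    }


similar = pair_features(
    "How to learn Python?", {"python"},
    "How to study Python?", {"python", "programming"},
)
different = pair_features(
    "How to learn Python?", {"python"},
    "How to bake bread?", {"baking"},
)
```

The pair with the same intent scores higher on every feature than the unrelated pair, which is the kind of separation a downstream classifier can learn from.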
Until 2016, the platform was ad-free. According to Nikhil Dandekar, former head of engineering at Quora, the platform uses ad CTR prediction to ensure that the ads served are relevant to users and also provide value for money to advertisers.
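Ad CTR prediction means estimating the probability that a user clicks an ad when it is shown. A minimal sketch, assuming a logistic regression model trained with stochastic gradient descent on two made-up binary features, is given below. The features, the impression log and the hyperparameters are all hypothetical, and nothing here reflects the model Quora actually runs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_ctr(weights, bias, features):
    # Predicted click probability for one ad impression.
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(examples, epochs=500, learning_rate=0.1):
    # Plain stochastic gradient descent on the logistic (log) loss.
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, clicked in examples:
            error = predict_ctr(weights, bias, features) - clicked
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical impression log.
# Features: [ad topic matches the user's interests, ad shown at top of page]
# Label: 1 if the ad was clicked, 0 otherwise.
impressions = [
    ([1, 1], 1), ([1, 1], 1), ([1, 1], 1), ([1, 1], 0),
    ([1, 0], 1), ([1, 0], 0),
    ([0, 1], 1), ([0, 1], 0), ([0, 1], 0),
    ([0, 0], 0), ([0, 0], 0), ([0, 0], 0), ([0, 0], 1),
]
weights, bias = train(impressions)
relevant_ctr = predict_ctr(weights, bias, [1, 1])
irrelevant_ctr = predict_ctr(weights, bias, [0, 0])
```

With this toy data the model assigns a higher click probability to an ad that matches the user's interests than to one that does not, which is the signal an ad system can use to decide which ads are relevant enough to show.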
Overall, the major machine learning algorithms used at Quora include, but are not limited to, logistic regression, elastic nets, gradient boosted decision trees, random forests, neural networks, LambdaMART, matrix factorization, vector models and several other NLP techniques.