Learning to rank in Taobao


What problems does ranking need to solve? How to put the good items that users want at the front; how to adjust traffic among different sellers; how to give more traffic to items that are higher in quality but not cheap, so as to guide the market toward better norms. The problems to solve are complicated, and whether a ranking result is good is hard to judge.

For search results, the usual practice is to grade each result on a five-level scale:

Bad (poor)

Fair (average)

Good (good)

Excellent (very good)

Perfect


The search results are graded manually, and NDCG is computed from these five grades to judge the quality of the ranking.

This judgment depends heavily on experts and requires a large number of manually labeled results. The method is expensive, the evaluation of search results is hard to standardize, and the labeling would have to be redone for every newly added feature. In practice, we usually use online A/B tests for comparison: instead of judging search results directly, we evaluate a new ranking indirectly through user feedback.

Back to the nature of the problem: judging whether a whole ranking is good or bad is hard, but for a pair of items it is much easier to judge whether one item is better than the other. If we construct a large set of item pairs whose distribution is consistent with the online distribution, then as long as the order of the items in these pairs matches our expectations, we can consider the ranking good. This makes the definition of a good or bad ranking clearer and easier to optimize. With such item pairs, we use Learning to Rank techniques to optimize the ranking pairwise.

Learning to Rank (LTR) uses machine learning methods to solve the ranking problem in a search system. When the number of features or the scale of the ranking grows beyond a certain point, it is difficult to reach a good result with hand-written rules, and machine learning methods are needed to help optimize the ranking model.

Why use Learning to Rank in Taobao?

First, a brief history of Taobao search. The move from a database to a search engine solved the problem of a growing number of items. At that time, items were ranked purely by delisting time. As the number of items kept increasing, the results of this ordering got worse and worse, so category relevance and text relevance were added, followed by popularity factors. Later, seller factors were added to balance traffic, and then personalization, image quality, and other factors were added for a better user experience.

Factors such as category relevance and text relevance are features in the search ranking model. Each feature produces a score between 0 and 1 from its own model or rules, and the scores are combined by linear weighting. A good set of weights for these features has been found by running A/B tests continuously.
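The linear weighting described above can be sketched as follows. This is only an illustration: the feature names, weights, and item scores are hypothetical and are not taken from Taobao's actual model.

```python
from typing import Dict


def rank_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-feature scores (each in [0, 1]) by linear weighting."""
    for name, value in features.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"feature {name!r} must be in [0, 1], got {value}")
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical feature names and weights, for illustration only.
weights = {"category_relevance": 0.5, "text_relevance": 0.3, "popularity": 0.2}
items = {
    "item_a": {"category_relevance": 1.0, "text_relevance": 0.5, "popularity": 0.5},
    "item_b": {"category_relevance": 0.5, "text_relevance": 1.0, "popularity": 0.0},
}

# Sort items by their weighted score, highest first.
ranked = sorted(items, key=lambda k: rank_score(items[k], weights), reverse=True)
print(ranked)
```

Tuning the ranking then amounts to choosing the values in `weights`, which is the part that was done by hand through A/B tests.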

The current ranking model of Taobao search evolved step by step in this way, which was to some extent inevitable. However, every adjustment to the model takes a long time: reasonable candidate parameters have to be chosen from human experience and then tested repeatedly against user feedback to find a better set. This has the following shortcomings:

1. Testing takes a long time.

2. The number of tests is limited, so they generally may not reach an optimal result.

3. During continuous optimization, adjusting the parameters of existing features is often neglected.

Therefore, we use Learning to Rank to tune these parameters with a machine learning model. The project is named JAZZ. Its role is to adjust feature weight parameters automatically, not to generate new features. Newly added features will have their optimal parameters determined through this model.
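As a rough illustration of how weights for a linear ranking model can be learned from item pairs, here is a minimal sketch. It assumes a pairwise logistic loss trained with stochastic gradient descent; the article does not say which model or loss JAZZ uses, and the function names and toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# (features of the better item, features of the worse item)
Pair = Tuple[Sequence[float], Sequence[float]]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear ranking score: weighted sum of feature values."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(pairs: List[Pair], n_features: int,
                   lr: float = 0.1, epochs: int = 200, seed: int = 0) -> List[float]:
    """Learn weights so that score(better) > score(worse) for each pair.

    Assumption: minimizes log(1 + exp(-margin)) with plain SGD, where
    margin = score(better) - score(worse).
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    order = list(range(len(pairs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            better, worse = pairs[idx]
            diff = [b - c for b, c in zip(better, worse)]
            margin = score(w, diff)
            margin = max(-30.0, min(30.0, margin))  # keep exp() in a safe range
            # d/dw log(1 + exp(-margin)) = -diff / (1 + exp(margin))
            step = lr / (1.0 + math.exp(margin))
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w


# Toy pairs with two features; the first feature separates better from worse.
pairs = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.8, 0.5], [0.3, 0.5]),
    ([0.7, 0.1], [0.2, 0.4]),
]
weights = train_pairwise(pairs, n_features=2)
print(weights)
```

With these toy pairs, the learned weight of the first feature is positive, and every pair is ordered correctly by the learned scores.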

Advantages, disadvantages, difficulties

Taobao LTR

Taobao's LTR has the following special characteristics.

First, the features used in LTR are generally very basic, such as the features often mentioned for web page ranking.

Unlike LTR in the general sense, Taobao has accumulated many strong statistical features. These features have been optimized continuously and already work well in Taobao's existing ranking. We build on them and do not need to explore features from scratch as in web page ranking. Our LTR naturally uses the existing features and adjusts their weight parameters, so it is more of an optimization of ranking parameters. (Of course, this is only the first step; features will be added later depending on the results.)

Second, we use statistics of user feedback behavior instead of manual labeling. Many LTR papers require relevance to be manually graded into five levels: Bad, Fair, Good, Excellent, and Perfect. This is time-consuming work, and with manual labeling the standard is hard to unify. So we use users' clicks and purchases, instead of manually labeled relevance, to determine which item is better. Optimization of the feature parameters thus moves from manual adjustment to automated learning. This change is shown below.

The original parameter generation process of the item ranking system is shown by the green arrows in the following figure, and the new process is shown by the red arrows. The new process reduces human factors and allows the effect to be evaluated offline, which saves the time of repeated online A/B tests. This is the purpose of optimizing the parameters.

The overall architecture is as follows:

Producing training data using pairwise

Search has relatively many goals: it needs to consider the conversion rate of items and also their click-through rate, which suits building the training data pairwise. The difficulty of pairwise lies in constructing the training data. There are, of course, also listwise methods, methods that combine pairwise and listwise, and boosting methods.

Most papers mention several pairwise methods:

1. Click > Skip Above

2. Last Click > Skip Above

3. Click > Earlier Click

4. Last Click > Skip Previous

5. Click > No-Click Next

Click > Skip Above means that a clicked item in the search results page is considered better than the items the user viewed before it, that is, the items ranked above it that were not clicked. The first four methods follow similar ideas, but they are not applicable to a relatively mature ranking system like Taobao's, because the selected pairs are exactly the reverse of the original order. In the original ranking, the item ranked higher has the higher feature scores, while in the pairs selected by Click > Skip Above the preferred item is the one with the lower scores. Such pairs can be called counterexamples, and they drive all the feature weights of the trained model negative.
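To make the point concrete, here is a minimal sketch of Click > Skip Above pair extraction for a single result page. The function name and the toy page are hypothetical.

```python
from typing import List, Set, Tuple


def click_skip_above(ranked_items: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs: each clicked item is
    preferred over every unclicked item ranked above it."""
    pairs: List[Tuple[str, str]] = []
    skipped: List[str] = []  # unclicked items seen so far, in rank order
    for item in ranked_items:
        if item in clicked:
            pairs.extend((item, s) for s in skipped)
        else:
            skipped.append(item)
    return pairs


page = ["a", "b", "c", "d", "e"]  # original ranking, best first
pairs = click_skip_above(page, {"c", "e"})
print(pairs)

# In every pair, the preferred item sits below the other one in the original ranking.
all_reversed = all(page.index(better) > page.index(worse) for better, worse in pairs)
print(all_reversed)
```

Every extracted pair contradicts the original order, which is why a model trained only on such pairs learns to invert the existing feature scores.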

So the way we choose pairs is to generate pairs that represent item quality from users' click and purchase feedback.

Using click feedback to form pairs

For each query, the click-through rate of every item is computed and smoothed, and items whose click-through rates differ by more than a certain amount form a pair.
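A minimal sketch of this step for one query follows. The article does not specify the smoothing method or the threshold, so the additive smoothing toward a prior click-through rate, the parameters `prior_ctr`, `strength`, and `min_gap`, and the sample counts are all assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def smoothed_ctr(clicks: int, impressions: int,
                 prior_ctr: float = 0.05, strength: float = 20.0) -> float:
    """Additive smoothing: pull the raw CTR toward prior_ctr when data is scarce."""
    return (clicks + prior_ctr * strength) / (impressions + strength)


def ctr_pairs(stats: Dict[str, Tuple[int, int]], min_gap: float = 0.02) -> List[Tuple[str, str]]:
    """Build (better, worse) pairs from items whose smoothed CTRs differ by more than min_gap.

    stats maps item id -> (clicks, impressions) under one query.
    """
    ctr = {item: smoothed_ctr(c, n) for item, (c, n) in stats.items()}
    pairs: List[Tuple[str, str]] = []
    for a, b in combinations(ctr, 2):
        if ctr[a] - ctr[b] > min_gap:
            pairs.append((a, b))
        elif ctr[b] - ctr[a] > min_gap:
            pairs.append((b, a))
    return pairs


# Hypothetical per-query statistics: item -> (clicks, impressions).
stats = {"item_a": (30, 200), "item_b": (8, 180), "item_c": (1, 2)}
pairs = ctr_pairs(stats)
print(pairs)
```

In this toy data `item_c` has a raw click-through rate of 0.5 from only two impressions; after smoothing it ranks below `item_a`, which is the kind of noise the smoothing is meant to suppress.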

Using non-click data to form pairs

A portion of the samples is generated from the original ranking. The purpose is to keep the ranking parameters from changing too drastically and producing unexpected bad cases.

These samples are based on the original order of the items in the log: the 1st item is better than the 20th item, the 2nd item is better than the 21st item, and so on.
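The rule above can be sketched as follows. The offset of 19 positions follows from the 1st-versus-20th example in the text; the function name and item ids are hypothetical.

```python
from typing import List, Tuple


def original_order_pairs(logged_items: List[str], offset: int = 19) -> List[Tuple[str, str]]:
    """Pair each logged item with the item `offset` positions below it.

    With offset=19, position 1 is paired with position 20, position 2 with 21, and so on.
    The first element of each pair is treated as the better item.
    """
    return [
        (logged_items[i], logged_items[i + offset])
        for i in range(len(logged_items) - offset)
    ]


# Hypothetical log of 40 items in their original ranked order.
logged = [f"item_{pos}" for pos in range(1, 41)]
pairs = original_order_pairs(logged)
print(pairs[:2], len(pairs))
```

Because these pairs agree with the existing ranking, mixing them into the training set anchors the learned weights near the current ones.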

Mixing samples with stratified sampling

Sample production

1. Rank with the online parameters and compute NDCG.

a) For the NDCG calculation, an item with a click has relevance 1 and an item with a transaction has relevance 2.

b) When the PVLog is re-ranked, re-ranking is restricted to the current page; the case where items from later pages are raised to the first page after re-ranking is not considered.

c) DCG[i] = DCG[i-1] + G[i] / log_2(i + 1), where i is the rank position and G[i] is the relevance of the item at position i. IDCG is the DCG after sorting by relevance.

d) NDCG = DCG / IDCG

2. Adjust the sample mixing ratio or the sample selection strategy, and retrain.

3. Re-rank the PVLog with the parameters of the trained model and compute NDCG.

4. Compare it with the NDCG obtained with the online parameters to see whether it rises or falls, and by how much, to determine the gain from this adjustment.

5. Analyze the causes of any decline by finding the queries whose NDCG dropped.
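The NDCG calculation from step 1 and the comparison in step 4 can be sketched as follows. The relevance values (click = 1, transaction = 2) come from the text; the two example orderings are made up for illustration.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[int]) -> float:
    """DCG = sum over positions i (starting at 1) of G[i] / log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(relevances, start=1))


def ndcg(relevances: Sequence[int]) -> float:
    """NDCG = DCG / IDCG, where IDCG is the DCG of the relevance-sorted list."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance per position on one page: 0 = no click, 1 = click, 2 = transaction.
online_order = [0, 1, 0, 2]    # page as ranked by the online parameters
reranked_order = [2, 1, 0, 0]  # same page re-ranked by hypothetical new parameters

ndcg_online = ndcg(online_order)
ndcg_new = ndcg(reranked_order)
print(ndcg_online, ndcg_new, ndcg_new - ndcg_online)
```

Averaging this difference over the queries in the PVLog gives the offline gain of an adjustment, and sorting queries by the difference surfaces the ones whose NDCG dropped.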

Tip: This article is also published on the WeChat account {alibabatech}; you are welcome to follow it.