Article Information

Title: Why Do Platforms Use Ad Valorem Fees? Evaluating Two Alternative Explanations.
Author: Wang, Zhu
Journal: Economic Quarterly
Print ISSN: 1069-7225
Publication year: 2018
Issue: September
Publisher: Federal Reserve Bank of Richmond
Abstract: Platforms that intermediate transactions between sellers and buyers have become increasingly important in the economy. People are familiar with, for example, online marketplaces (such as Amazon and eBay), payment platforms (such as Visa, MasterCard, and PayPal), and hotel booking sites (such as Booking.com and Expedia). However, there has been a great pricing puzzle associated with these platforms in that they almost universally rely on ad valorem fees, under which platforms charge sellers fees proportional to the transaction value, sometimes plus small per-transaction fees. Given that these platforms do not incur significant costs that vary with transaction value, it is puzzling why ad valorem fees are so prevalent.

In this article, we review two alternative explanations on this pricing puzzle. One theory, provided by Shy and Wang (2011) and others, emphasizes the vertical relation between the platform and the sellers. It is shown that in the case where the platform (i.e., the upstream) and the sellers (i.e., the downstream) both have market power (i.e., so-called "double marginalization") (1), the platform extracts a higher profit by using a proportional fee than using a per-transaction fee. Another explanation, offered by Wang and Wright (2017), instead focuses on the price discrimination angle. The key idea is that for a platform dealing with transactions of many different goods that vary widely in their costs and values, ad valorem fees serve as an efficient form of price discrimination that increases the platform's profit. While these two explanations provide alternative views, we will show that they indeed complement each other in explaining the ad valorem fee puzzle.

Our article contributes to a growing literature on platforms and their fee structures. In fact, besides the two theories analyzed in this article, there are additional (competing or complementary) views on ad valorem platform fees. For example, Loertscher and Niedermayer (2012) consider a mechanism design approach in an independent private values setup with privately informed buyers and sellers, in which an intermediary's optimal fees converge to linear fees as markets become increasingly thin. Muthers and Wismer (2013) show that if a platform can commit to proportional fees, this can reduce a hold-up problem that arises from the platform wanting to compete with sellers after they have incurred costs to enter the platform. Hagiu and Wright (forthcoming) provide a theory that ad valorem contracts align the incentives between upstream firms (principals) and downstream firms (agents), which allows the principal to achieve the same profits as if it could observe the demand shocks and control price.

The article is organized as follows. In Section 1, we first lay out two simple models that each justify one of the two explanations: double marginalization versus price discrimination. In Section 2, we then study a generalized model that accommodates both explanations. Our findings suggest that, in reality, platforms may choose a simple ad valorem fee schedule that addresses both double marginalization and price discrimination considerations. In Section 3, we apply the generalized model to a calibration exercise using data on DVD sales on Amazon and quantify the relative importance of the two explanations. Finally, Section 4 offers concluding remarks.

1. TWO ALTERNATIVE EXPLANATIONS

In this section, we lay out two simple models that each highlight one of the two alternative explanations: double marginalization versus price discrimination.

Double Marginalization

We first study a model environment similar to Shy and Wang (2011), where double marginalization motivates the use of ad valorem fees. (2) Consider a monopoly seller that sells a good on a monopoly platform. The good is indexed by $c$, the per-unit cost of the good to the seller, which is known to everyone in the market. There is a unit mass of buyers, each of whom wants to purchase one unit of the good. The value of the good to a buyer is $c(1 + b)$, where $b \geq 0$ is a parameter that the buyer draws. (3) We assume that $1 + b$ is randomly distributed according to a cumulative distribution function $F$. Only buyers know their own $b$, while $F$ is public information.

For illustrative purposes, we assume that $F$ takes on a simple Pareto distribution

$$F(x) = 1 - x^{-\lambda}. \tag{1}$$

Accordingly, the number of transactions $Q_c$ for the good $c$ is the measure of buyers who obtain a nonnegative surplus from buying the good at price $p_c$, $\Pr(c(1 + b) - p_c \geq 0)$. Therefore, the demand function for good $c$ is

$$Q_c(p_c) = 1 - F(p_c/c) = (p_c/c)^{-\lambda}, \tag{2}$$

which has the constant elasticity $\lambda$. For the monopoly pricing problem to be well-defined, we require that $\lambda > 1$.

The platform incurs a cost of $d \geq 0$ per transaction, and it can potentially charge fees to either the buyer side or the seller side or both. Regardless of which side is charged, the final price faced by buyers will reflect any fees, and the buyer treats these the same whether she faces them directly or through sellers. Due to this standard result on the irrelevance of the incidence of taxes across the two sides, we can assume without loss of generality that only the seller side is charged.

In terms of timing, the platform moves first and announces the fee schedule it would charge the seller. Taking the fee schedule as given, the seller then decides the price of the good. Finally, buyers make purchase decisions.

Given the model setup, we are interested in the following question: If the platform can choose among a per-transaction fee, a proportional fee, or a mix of both fees, what type of fee schedule would the platform prefer?

To answer the question, we consider that the platform decides on an affine fee schedule, $T(p_c) = t_0 + t_1 p_c$, which covers all the possibilities listed above. We assume that the platform cannot subsidize the seller to operate by setting $t_0 < 0$. Doing so would likely create a perverse incentive: the seller could simply collect the subsidy without selling anything real. This imposes the requirement that $t_0 \geq 0$.

The model can be solved backward. Because the platform would make its fee decision by incorporating the seller's response, we solve the seller's problem first. The seller, taking the affine fee schedule $(t_0, t_1)$ charged by the platform as given, would choose $p_c$ to maximize her profit:

$$\max_{p_c} \; \left((1 - t_1) p_c - c - t_0\right) \left(\frac{p_c}{c}\right)^{-\lambda},$$

which implies

$$p_c^* = \frac{\lambda (c + t_0)}{(\lambda - 1)(1 - t_1)}. \tag{3}$$

Anticipating the seller's pricing decision $p_c^*$, the platform would then choose $t_0$ and $t_1$ to solve

$$\max_{t_0, t_1} \; \left(t_0 + t_1 p_c^* - d\right) \left(\frac{p_c^*}{c}\right)^{-\lambda}$$

subject to the constraint $t_0 \geq 0$. We can verify that the constraint $t_0 \geq 0$ is binding at the maximum, so the optimal affine fee schedule is just a proportional fee:

$$t_0 = 0, \qquad t_1 = \frac{c + d(\lambda - 1)}{\lambda c + d(\lambda - 1)}. \tag{4}$$

Given that $\lambda > 1$, we know $1 > t_1 > 0$.

This simple model yields several useful findings. First, in the presence of double marginalization (i.e., when both the platform and the seller have market power), the platform strictly prefers a proportional fee to a per-transaction fee. Note that the use of a proportional fee allows the platform to mitigate, but not eliminate, double marginalization. In fact, if the seller side has no market power (or the platform owns the seller), the platform, being the single monopoly in the market, would earn an even higher profit and would be indifferent between a proportional fee and a per-transaction fee, as we will show in the analysis coming next. Second, to implement the optimal proportional fee, the platform needs to know $c$ unless the marginal cost $d$ of the platform is zero, in which case the platform has a simple formula $t_1 = 1/\lambda$. Considering that $d$ is typically small in reality, a platform may use $t_1 = 1/\lambda$ as a good proxy even if it has no knowledge of $c$.
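As a rough numerical check of the first finding, the following Python sketch evaluates the platform's profit on a grid, using the seller's pricing rule (3) and the constant-elasticity demand (2), and compares the best pure proportional fee with the best pure per-transaction fee. The parameter values (cost 10, platform cost 0.5, elasticity 3), the grid resolutions, and all function names are illustrative assumptions and do not come from the article.

```python
# Illustrative check of equations (3) and (4); all parameter values are assumptions.

def seller_price(c, lam, t0, t1):
    """Monopoly seller's optimal price, equation (3)."""
    return lam * (c + t0) / ((lam - 1.0) * (1.0 - t1))

def platform_profit(c, d, lam, t0, t1):
    """Platform profit: (fee - cost d) times constant-elasticity demand (2)."""
    p = seller_price(c, lam, t0, t1)
    return (t0 + t1 * p - d) * (p / c) ** (-lam)

def closed_form_t1(c, d, lam):
    """Optimal proportional fee, equation (4)."""
    return (c + d * (lam - 1.0)) / (lam * c + d * (lam - 1.0))

c, d, lam = 10.0, 0.5, 3.0  # hypothetical good cost, platform cost, elasticity

# Best pure proportional fee (t0 = 0) on a grid of t1 in (0, 1).
t1_grid = [i / 10000 for i in range(1, 10000)]
best_t1 = max(t1_grid, key=lambda t1: platform_profit(c, d, lam, 0.0, t1))

# Best pure per-transaction fee (t1 = 0) on a grid of t0 in [0, 50].
t0_grid = [i / 100 for i in range(0, 5001)]
best_t0 = max(t0_grid, key=lambda t0: platform_profit(c, d, lam, t0, 0.0))

profit_proportional = platform_profit(c, d, lam, 0.0, best_t1)
profit_per_transaction = platform_profit(c, d, lam, best_t0, 0.0)

print("grid t1:", best_t1, "closed form (4):", closed_form_t1(c, d, lam))
print("profit with proportional fee:   ", profit_proportional)
print("profit with per-transaction fee:", profit_per_transaction)
```

Under these assumed values the grid optimum for the proportional fee should sit next to the closed form in (4), and the proportional fee should earn the platform more than the best per-transaction fee. The sketch does not search over mixed schedules, so it does not by itself confirm that the constraint on the fixed component binds.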

The model above serves as a simple illustrative example. As shown in Shy and Wang (2011) and others, the result holds in more general settings, including the cases where sellers engage in Cournot competition with or without free entry. (4)

Price Discrimination

In contrast to the double marginalization explanation, we now study an alternative model proposed by Wang and Wright (2017) where price discrimination motivates the use of ad valorem fees. In doing so, we consider the same model setup as above except for two things: (i) a variety of goods is being sold on the platform, with the costs $c$ differing widely across goods; and (ii) for each good $c$, there are multiple sellers who engage in Bertrand competition, so sellers have no market power. (5) The rest of the model specification remains unchanged: for each good $c$, there is a unit mass of buyers, each of whom wants to purchase one unit of the good. Buyers draw their benefit $1 + b$ from a simple Pareto distribution, and as a result sellers face constant-elasticity demand. The platform considers charging sellers an affine fee schedule, $T(p_c) = t_0 + t_1 p_c$, subject to the constraint $t_0 \geq 0$.

Assume $c$ takes on a finite number of distinct values in the set $C$. The probability distribution of $c$ on $C$ is denoted $g_c$, with $\sum_{c \in C} g_c = 1$. As before, we solve the sellers' problem first. For each good $c$, taking the affine fee schedule as given, Bertrand sellers compete by setting the lowest possible price just to break even, so that

$$p_c^* = c + t_0 + t_1 p_c^* \;\Longrightarrow\; p_c^* = \frac{c + t_0}{1 - t_1}.$$

Anticipating sellers' pricing decisions, the platform would then choose $t_0$ and $t_1$ to solve

$$\max_{t_0, t_1} \; \sum_{c \in C} g_c \left(t_0 + t_1 p_c^* - d\right) \left(\frac{p_c^*}{c}\right)^{-\lambda}. \tag{5}$$

To derive the solution to (5) intuitively, we first consider the hypothetical scenario where the platform could perfectly observe the cost and valuation for each good $c$ and set a different optimal fee $(t_0, t_1)$ for each as follows:

$$\max_{t_0, t_1} \; \left(t_0 + t_1 p_c^* - d\right) \left(\frac{p_c^*}{c}\right)^{-\lambda},$$

which is equivalent to solving

$$\max_{\frac{t_0 + c t_1}{1 - t_1}} \; \left(\frac{t_0 + c t_1}{1 - t_1} - d\right) \left(1 + \frac{1}{c} \cdot \frac{t_0 + c t_1}{1 - t_1}\right)^{-\lambda}.$$

The first-order condition implies a unique value of $(t_0^* + c t_1^*)/(1 - t_1^*)$ such that

$$\frac{t_0^* + c t_1^*}{1 - t_1^*} = \frac{c + \lambda d}{\lambda - 1}, \tag{6}$$

which could be potentially consistent with different fee schedules $(t_0^*, t_1^*)$. For example, the optimal fee could be a pure per-transaction fee or a pure proportional fee, but those fee schedules have to depend on $c$. However, one can verify that there is a unique affine fee

$$t_0^* = d; \qquad t_1^* = \frac{1}{\lambda} \tag{7}$$

that also satisfies condition (6) and does not depend on $c$. This means that the affine fee (7) maximizes the platform's overall profit (5) without requiring the platform to keep track of the goods traded.
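The claim that the affine fee (7) satisfies condition (6) for every good can be checked with a few lines of Python. This is a minimal sketch: the cost levels, the values of the platform cost and the elasticity, and the function names are assumptions chosen for illustration. For contrast, it also evaluates a flat per-transaction fee tuned to one particular good.

```python
def implied_fee(c, t0, t1):
    """Fee collected per transaction at the Bertrand price p = (c + t0) / (1 - t1)."""
    p = (c + t0) / (1.0 - t1)
    return t0 + t1 * p

def target_fee(c, d, lam):
    """Right-hand side of condition (6): the per-good optimal fee."""
    return (c + lam * d) / (lam - 1.0)

d, lam = 0.5, 3.0                   # hypothetical platform cost and elasticity
costs = [1.0, 5.0, 20.0, 100.0]     # hypothetical goods

# The affine fee (7), t0 = d and t1 = 1 / lam, hits the target for every cost level.
gaps_affine = [implied_fee(c, d, 1.0 / lam) - target_fee(c, d, lam) for c in costs]

# A pure per-transaction fee tuned to the good with c = 5 misses it for other goods.
t0_tuned = target_fee(5.0, d, lam)
gaps_flat = [implied_fee(c, t0_tuned, 0.0) - target_fee(c, d, lam) for c in costs]

print("gaps under affine fee (7):", gaps_affine)
print("gaps under flat fee:      ", gaps_flat)
```

The gaps under the affine fee should be zero up to floating-point error, while the flat fee is exact only for the good it was tuned to.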

This yields several new findings. First, for a given good, when the cost $c$ is known to the platform and sellers have no market power, the platform is indifferent between charging a proportional fee and a per-transaction fee. This contrasts with our finding above that a proportional fee is strictly preferred to a per-transaction fee when sellers do have market power. Second, the platform can maximize profit by implementing the affine fee (7) without conditioning on $c$, which is a great advantage. There are often a large number of goods being traded on a platform, and the platform may not be able to track each good's cost and value. In this case, using the affine fee (7) requires no information on $c$, so it can be easily used by the platform. This results in optimal price discrimination in the sense that charging ad valorem fees (7) allows the platform to achieve the same level of profit that could be obtained under third-degree price discrimination as if the platform could perfectly observe the cost and valuation for each good traded. Finally, note that the optimal affine fee (7) has a per-transaction term $t_0^* > 0$ only if the platform incurs a positive marginal cost $d$; otherwise, a proportional fee $t_1 = 1/\lambda$ is optimal. Again, considering that $d$ is typically small in reality, a simple proportional fee $t_1 = 1/\lambda$ can be a good proxy in practice.

The model is a simple illustrative example. Wang and Wright (2017) show that the result holds broadly, including when demand takes more general functional forms or involves unobserved random variations.

2. A GENERALIZED ANALYSIS

The two theories noted above provide alternative justifications for the use of ad valorem fees by platforms. However, these two theories are not necessarily mutually exclusive. In this section, we provide a generalized analysis that accommodates both explanations. We show that, in practice, a platform can choose a simple ad valorem fee that addresses both double marginalization and price discrimination considerations. The analysis and results in this section draw heavily from the online appendix of Wang and Wright (2017).

In the generalized analysis, we consider a variety of different goods being traded on a platform. We suppose that for each good there are $n_c \geq 1$ identical quantity-setting sellers on the platform (i.e., Cournot competitors). This covers different intensities of seller competition, including the two special cases discussed in Section 1: when $n_c = 1$, a good is sold by a monopoly seller; when $n_c \to \infty$, sellers are perfectly competitive. As before, each seller obtains the goods at a unit cost $c$ and sells them at a retail price $p_c$.

On the demand side, we assume as before that the value of good $c$ to a buyer drawing the benefit parameter $b \geq 0$ is $c(1 + b)$. To generalize the analysis, we now consider that $1 + b$ is distributed according to the broad family of generalized Pareto distributions (GPD), of which the simple Pareto distribution is a special case. Accordingly, the cumulative distribution function $F$ is defined as

$$F(x) = 1 - \left(1 + \lambda(\sigma - 1)(x - 1)\right)^{\frac{1}{1 - \sigma}}, \tag{8}$$

with $\lambda > 0$ being the scale parameter and $\sigma < 2$ being the shape parameter. Only buyers know their own $b$, while $F$ is public information.

Note that the generalized Pareto distribution implies the demand functions for sellers on the platform are defined by the class of demands with constant curvature of inverse demand (6)

$$Q_c(p_c) = 1 - F(p_c/c) = \left(1 + \frac{\lambda(\sigma - 1)(p_c - c)}{c}\right)^{\frac{1}{1 - \sigma}}. \tag{9}$$

The constant $\sigma$ is the curvature of inverse demand, defined as the elasticity of the slope of the inverse demand with respect to quantity. When $\sigma < 1$, the support of $F$ is $[1, 1 + 1/(\lambda(1 - \sigma))]$ and it has increasing hazard. Accordingly, the implied demand functions $Q_c(p_c)$ are log-concave and include the linear demand function ($\sigma = 0$) as a special case. Alternatively, when $1 < \sigma < 2$, the support of $F$ is $[1, \infty)$, and it has decreasing hazard. The implied demand functions are log-convex and include the constant elasticity demand function ($\sigma = 1 + 1/\lambda$) as a special case. When $\sigma = 1$, $F$ captures the left-truncated exponential distribution $F(x) = 1 - e^{-\lambda(x - 1)}$ on the support $[1, \infty)$, with a constant hazard rate $\lambda$. This implies the exponential (or log-linear) demand $Q_c(p_c) = e^{-\lambda(p_c - c)/c}$.

Taking as given that demand belongs to the generalized Pareto class, we allow $c$ to take on potentially many different values in $[c_L, c_H]$, with the set of all such values being denoted $C$. The cumulative distribution of $c$ on $C$ is denoted $G$, and $g_c$ is the probability corresponding to the realization $c$.

The platform incurs a cost of $d \geq 0$ per transaction. Without loss of generality, we assume that the platform only charges the seller side to maximize its profit.

Below, in Section 2.1, as a benchmark, we first derive the platform's optimal affine fee in a setting with generalized Pareto demand and Bertrand sellers (or equivalently, sellers engage in Cournot competition, but the number of sellers goes to infinity). This extends the results we derived in Section 1.2, and we name the resulting fee schedule the "Bertrand affine fee." In this general case, as in Section 1.2, the Bertrand affine fee achieves optimal price discrimination given that sellers have no market power. In Section 2.2, we show that in a setting where sellers have market power and engage in Cournot competition, the Bertrand affine fee continues to do well. Particularly, we show that without knowing each good's cost and how many sellers are competing, the platform can continue to use the Bertrand affine fee and earn a higher profit than if it knew everything and set the optimal per-transaction fee for each good. This is because the Bertrand affine fee now achieves more than price discrimination; it also mitigates double marginalization. We then derive analytical results for the case d = 0 and show that while the Bertrand affine fee is not necessarily the optimal affine fee when sellers have market power, it can be very close. Therefore, in practice, a platform can implement the Bertrand affine fee as a good proxy.

Bertrand Affine Fee

We start by deriving the Bertrand affine fee. Consider that the platform charges sellers the fee schedule $T(p_c)$. Assuming that sellers engage in Bertrand competition, the price $p_c$ for good $c$ solves

$$p_c = c + T(p_c). \tag{10}$$

Accordingly, the platform's profit is $\Pi_c = (T(p_c) - d)\, Q_c(p_c)$ for good $c$, where $Q_c(p_c)$ is given by (9). The platform's problem is to choose $T(p_c)$ to maximize

$$\sum_{c \in C} g_c \Pi_c = \sum_{c \in C} g_c \left(T(p_c) - d\right) Q_c(p_c). \tag{11}$$

In Wang and Wright (2017), it is shown that the optimal fee schedule is affine, given by

$$T(p_c) = \frac{\lambda d}{1 + \lambda(2 - \sigma)} + \frac{p_c}{1 + \lambda(2 - \sigma)}, \tag{12}$$

which maximizes (11). (7) Similar to our finding in Section 1.2, while the affine fee (12) does not condition on $c$, it achieves optimal price discrimination. To see this, note that the solution in (12) is equivalent to the platform charging the optimal per-transaction fee

$$T_c^* = \frac{\lambda d + c}{\lambda(2 - \sigma)} \tag{13}$$

for each different good $c$, which would be possible if the platform could identify each good $c$ and set its optimal per-transaction fee accordingly.

Our result in Section 1.2 is a special case of the Bertrand affine fee given by (12), with $\sigma = 1 + 1/\lambda$. In the general case, the platform's optimal affine fee again has a fixed per-transaction component only if there is a positive cost to the platform of handling each transaction (i.e., $d > 0$). Given $\lambda > 0$ and $\sigma < 2$, the fee schedule is increasing (higher prices imply higher fees are paid) but with a slope less than unity (this implies (10) has a unique solution for any given $c > 0$). The result in (12) also implies the platform can maximize its profit without tracking each individual good $c$ or knowing the distribution $G$ of goods that are traded.
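The equivalence between the affine schedule (12) and the per-good fee (13) can be verified directly: solve the break-even condition (10) for the price under (12) and compare the fee paid at that price with (13). The sketch below does this for a few demand shapes. The parameter values and function names are assumptions for illustration only.

```python
def bertrand_affine_coeffs(d, lam, sigma):
    """Intercept and slope (t0, t1) of the affine fee in equation (12)."""
    k = 1.0 + lam * (2.0 - sigma)
    return lam * d / k, 1.0 / k

def bertrand_price(c, t0, t1):
    """Break-even price solving p = c + t0 + t1 * p, as in equation (10)."""
    return (c + t0) / (1.0 - t1)

def per_good_fee(c, d, lam, sigma):
    """Optimal per-transaction fee for good c, equation (13)."""
    return (lam * d + c) / (lam * (2.0 - sigma))

d, lam = 0.5, 2.0  # hypothetical platform cost and scale parameter
max_gap = 0.0
# sigma = 0: linear; sigma = 1: exponential; sigma = 1 + 1/lam = 1.5: constant elasticity.
for sigma in (0.0, 1.0, 1.5):
    t0, t1 = bertrand_affine_coeffs(d, lam, sigma)
    for c in (1.0, 10.0, 50.0):
        p = bertrand_price(c, t0, t1)
        gap = abs((t0 + t1 * p) - per_good_fee(c, d, lam, sigma))
        max_gap = max(max_gap, gap)

print("largest gap between (12) at the Bertrand price and (13):", max_gap)
print("constant-elasticity coefficients:", bertrand_affine_coeffs(d, lam, 1.5))
```

In the constant-elasticity case the coefficients reduce to a fixed component equal to the platform's cost and a slope of one over the elasticity, which is the affine fee (7) from Section 1.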

Seller Market Power and Bertrand Affine Fee

We now study the platform's fee setting when sellers do have market power. We will show that, in the case of Cournot sellers, the platform can continue to use the Bertrand affine fee, which not only achieves price discrimination but also mitigates double marginalization. As a result, it leads to a higher platform profit than using optimal per-transaction fees.

Optimal per-transaction fees

To start, we consider the problem of a platform with full information on $c$ (i.e., each good's cost) and $n_c$ (i.e., the number of Cournot sellers) setting an optimal per-transaction fee for each good.

Suppose the platform charges a per-transaction fee $T_c$ for good $c$. Let $q_{c,i}$ denote the output sold by seller $i$ for good $c$. Each seller $i$ sets $q_{c,i}$ taking the output by competing sellers $q_{c,-i} = Q_c - q_{c,i}$ as given and maximizes its profit $(p_c - c - T_c)\, q_{c,i}$. Assuming $F$ follows the GPD distribution (8), the total demand for good $c$ is given by (9), which implies that the inverse demand is

$$p_c = c \left(1 + \frac{Q_c^{1 - \sigma} - 1}{\lambda(\sigma - 1)}\right).$$

Therefore, an individual seller's profit maximization problem is

$$\max_{q_{c,i}} \; \left(\frac{c\left((q_{c,-i} + q_{c,i})^{1 - \sigma} - 1\right)}{\lambda(\sigma - 1)} - T_c\right) q_{c,i}.$$

The first-order condition for good $c$ is

$$\frac{c\left(Q_c^{1 - \sigma} - 1\right)}{\lambda(\sigma - 1)} - T_c - \frac{c\, q_{c,i}\, Q_c^{-\sigma}}{\lambda} = 0.$$

In a symmetric Cournot equilibrium, $q_{c,i} = q_c$ for every seller, so the total sellers' output is $Q_c = n_c q_c$. We can then rewrite the first-order condition as

$$\frac{c (n_c q_c)^{1 - \sigma} - c}{\lambda(\sigma - 1)} = \frac{c (n_c q_c)^{1 - \sigma}}{n_c \lambda} + T_c$$

and derive

$$Q_c = n_c q_c = \left(\frac{c n_c + \lambda(\sigma - 1) T_c n_c}{c n_c - (\sigma - 1) c}\right)^{\frac{1}{1 - \sigma}}. \tag{14}$$

Accordingly, the price of good $c$ is

$$p_c = c + \frac{c + \lambda n_c T_c}{\lambda(n_c + 1 - \sigma)}. \tag{15}$$

The platform takes (14) as given and maximizes its profit by setting a per-transaction fee for good $c$ as follows

$$\max_{T_c} \; (T_c - d) \left(\frac{c n_c + \lambda(\sigma - 1) T_c n_c}{c n_c - (\sigma - 1) c}\right)^{\frac{1}{1 - \sigma}}.$$

The first-order condition implies the optimal per-transaction fee $T_c^f$:

$$T_c^f = \frac{\lambda d + c}{\lambda(2 - \sigma)}, \tag{16}$$

which is the same optimal per-transaction fee that we derive in the Bertrand seller setting (13). The optimal per-transaction fee does not depend on the number of sellers and so also holds for a monopoly seller. Note that to ensure a meaningful solution (i.e., $T_c^f > d$), it is required that

$$d(\sigma - 1) + \frac{c}{\lambda} > 0. \tag{17}$$

This is satisfied for the GPD demand specification: When demand is log-linear or log-convex, the GPD specification requires that $\sigma \geq 1$, so the condition in (17) holds. When demand is log-concave, the GPD specification requires that $\sigma < 1$ and $d < c/(\lambda(1 - \sigma))$, so the condition in (17) again holds.
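The statement that the optimal per-transaction fee (16) does not depend on the number of sellers can be illustrated numerically. The sketch below grid-searches the fee that maximizes the platform's profit, using the Cournot output in (14), for several seller counts and two demand shapes. The parameter values, the search range, and the function names are assumptions made for this illustration; the case where the shape parameter equals one is avoided because (14) is written for values different from one.

```python
def cournot_output(T, c, n, lam, sigma):
    """Total Cournot output from equation (14); zero if the fee chokes off demand."""
    base = (c * n + lam * (sigma - 1.0) * T * n) / (c * n - (sigma - 1.0) * c)
    return base ** (1.0 / (1.0 - sigma)) if base > 0.0 else 0.0

def best_fee(c, d, n, lam, sigma, T_max=20.0, steps=20000):
    """Grid search for the fee T in [d, T_max] maximizing (T - d) * Q(T)."""
    grid = [d + (T_max - d) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda T: (T - d) * cournot_output(T, c, n, lam, sigma))

def closed_form_fee(c, d, lam, sigma):
    """Optimal per-transaction fee, equation (16)."""
    return (lam * d + c) / (lam * (2.0 - sigma))

c, d = 10.0, 0.5  # hypothetical good cost and platform cost
results = {}
for lam, sigma in ((1.0, 0.0), (3.0, 1.5)):  # a linear and a log-convex case
    for n in (1, 2, 10):
        results[(lam, sigma, n)] = best_fee(c, d, n, lam, sigma)
    found = [results[(lam, sigma, n)] for n in (1, 2, 10)]
    print("lam", lam, "sigma", sigma, "closed form:", closed_form_fee(c, d, lam, sigma), "grid:", found)
```

For each demand shape, the grid optimum should be the same across seller counts, up to the grid spacing, and should agree with the closed form in (16).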

Substituting (16) into (14) and (15), we get

[mathematical expression not reproducible], (18)

and

[mathematical expression not reproducible]. (19)

As a result, the platform profit from good c is

[mathematical expression not reproducible].

Comparing Bertrand affine fee and optimal per-transaction fees

We now compare the Bertrand affine fee and the optimal per-transaction fees in the Cournot seller setting.

Consider Cournot sellers facing an affine fee schedule $T(p_c) = t_0 + t_1 p_c$ for each transaction. With GPD demand, the sellers' problem is to choose $q_{c,i}$ to maximize

$$\left((1 - t_1) p_c - c - t_0\right) q_{c,i}, \tag{20}$$

where

$$p_c = c \left(1 + \frac{(q_{c,-i} + q_{c,i})^{1 - \sigma} - 1}{\lambda(\sigma - 1)}\right). \tag{21}$$

In a symmetric Cournot equilibrium, $q_{c,i} = q_c$ for every seller, so the total sellers' output is $Q_c = n_c q_c$. The first-order condition then requires

[mathematical expression not reproducible]. (22)

Substituting the Bertrand affine fee from equation (12) into (22) gives the same price and output for a given c as we found above in (18) and (19) for the full information case. That is, the price and output for each good are identical to that implied by the optimal per-transaction fee (16). However, the per-transaction fee for good c implied by the Bertrand affine fee is now

[mathematical expression not reproducible],

which is strictly higher than the fee in (16) if and only if the condition (17) holds. This implies the platform earns a higher profit using the Bertrand affine fee than if it used the optimal per-transaction fee for each different good assuming full information. This result holds for any $n_c \geq 1$ and so also holds for monopoly sellers.

This result shows that the Bertrand affine fee can be used in this setting to solve the price discrimination problem. It delivers the same price and output for each good without using any information on each good's cost. At the same time, the Bertrand affine fee generates a higher profit for the platform because it mitigates the double marginalization problem associated with using the optimal per-transaction fee for each good, allowing the platform to collect a higher fee from each good while achieving the same level of final price and output.
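A small numerical experiment can illustrate this comparison for a monopoly seller. The sketch below finds the seller's profit-maximizing price by brute-force grid search, once under the optimal per-transaction fee (16) and once under the Bertrand affine fee (12), and then compares the platform's fee revenue and profit. It assumes linear demand (shape parameter zero) and hypothetical parameter values; the grid search is a simple stand-in for the first-order conditions and is not the method used in the article.

```python
def demand(p, c, lam, sigma):
    """GPD demand (9); zero once the price exceeds the top of the valuation support."""
    base = 1.0 + lam * (sigma - 1.0) * (p - c) / c
    return base ** (1.0 / (1.0 - sigma)) if base > 0.0 else 0.0

def monopoly_price(c, t0, t1, lam, sigma, p_max, steps=100000):
    """Grid search for the price maximizing seller profit ((1 - t1) p - c - t0) Q(p)."""
    grid = [c + (p_max - c) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: ((1.0 - t1) * p - c - t0) * demand(p, c, lam, sigma))

# Hypothetical values; sigma = 0 gives linear demand with prices between c and 2c.
c, d, lam, sigma = 10.0, 0.5, 1.0, 0.0

# Optimal per-transaction fee (16), which requires knowing c.
T_f = (lam * d + c) / (lam * (2.0 - sigma))
p_fee = monopoly_price(c, T_f, 0.0, lam, sigma, p_max=2.0 * c)
profit_fee = (T_f - d) * demand(p_fee, c, lam, sigma)

# Bertrand affine fee (12), which uses no information about c.
k = 1.0 + lam * (2.0 - sigma)
t0, t1 = lam * d / k, 1.0 / k
p_aff = monopoly_price(c, t0, t1, lam, sigma, p_max=2.0 * c)
fee_aff = t0 + t1 * p_aff
profit_aff = (fee_aff - d) * demand(p_aff, c, lam, sigma)

print("price under per-transaction fee:", p_fee, "under affine fee:", p_aff)
print("fee collected per unit:", T_f, "vs", fee_aff)
print("platform profit:", profit_fee, "vs", profit_aff)
```

With these assumed values the seller should choose the same price under both schedules, while the affine schedule collects a larger fee per unit and therefore a larger platform profit.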

Comparing Bertrand affine fee and optimal affine fee

We have so far shown that the Bertrand affine fee yields a higher platform profit than the optimal per-transaction fees when sellers have market power. In this section, assuming $d = 0$, we show that the Bertrand affine fee schedule (12) is indeed very close to the optimal affine fee schedule under Cournot sellers. (8) Note that given $d = 0$, the Bertrand affine fee (12) implies the proportional fee schedule

$$T^*(p_c) = \frac{p_c}{1 + (2 - \sigma)\lambda}. \tag{23}$$

We can then check whether this is the optimal affine fee schedule under Cournot sellers.

Consider a platform maximizing its profit by using an affine fee schedule $t_0 + t_1 p_c$. As before, we assume that the platform cannot subsidize sellers to operate by setting $t_0 < 0$. This imposes the requirement that $t_0 \geq 0$.

Cournot sellers take the platform affine fee schedule $T(p_c) = t_0 + t_1 p_c$ as given for each transaction. As shown above, with a GPD demand, the sellers' problem is given by (20) and (21), and the first-order condition for the seller's profit-maximizing problem is given by (22).

Anticipating sellers' responses, the platform then solves the following problem:

[mathematical expression not reproducible]

subject to the constraint $t_0 \geq 0$ as well as the conditions

$$p_c = c \left(1 + \frac{Q_c^{1 - \sigma} - 1}{\lambda(\sigma - 1)}\right) \tag{24}$$

and

[mathematical expression not reproducible], (25)

where (24) is given by the GPD demand and (25) is the first-order condition (22). We can verify that the constraint $t_0 \geq 0$ is binding at the maximum, so the optimal affine fee schedule is also just a proportional fee schedule. Moreover, given that $t_0 = 0$, $p_c/c$ does not depend on $c$, so the platform can solve for the optimal $t_1$ without knowing the distribution of $c$. The first-order condition on $t_1$ requires

[mathematical expression not reproducible]. (26)

The optimal proportional fee implied by (26) is in general not equal to the proportional fee implied by (23), but based on an examination of some common demand functions, it is very close and so are the profits, as discussed below.

Consider first the case of constant elasticity demand, where $\sigma = 1 + 1/\lambda$ and $\lambda > 1$. In this case, both (23) and (26) yield $t_1 = 1/\lambda$ and so have identical profits. Thus, in this case, the Bertrand affine fee coincides with the optimal affine fee schedule. This result confirms our findings in Sections 1.1 and 1.2 that when $d = 0$, the optimal affine fee under double marginalization (i.e., $t_0 = 0$, $t_1 = 1/\lambda$) coincides with that which achieves optimal price discrimination (which is again $t_0 = 0$, $t_1 = 1/\lambda$).

Next, consider the case of exponential demand where $\sigma = 1$. Then (26) implies the optimal proportional fee satisfies

$$(1 - t_1)^3 + \lambda(1 - t_1)(n_c - t_1) = n_c t_1 \lambda^2,$$

which has a unique solution. In contrast, (23) implies the proportional fee

$$t_1 = \frac{1}{1 + \lambda}.$$

The two fees are not exactly equal, but they are very close. For the empirically meaningful range where the proportional term $t_1$ of the Bertrand affine fee satisfies $t_1 \leq 50$ percent (or equivalently, $\lambda \geq 1$), the Bertrand affine fee can recover more than 98.5 percent of the profit under the optimal affine fee schedule when all sellers are monopolists (so $n_c = 1$ for all $c$). Moreover, the profit gap between using the Bertrand affine fee and using the optimal affine fee schedule decreases monotonically in $n_c$, and the two converge as the number of Cournot sellers gets large.
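The comparison for exponential demand can be reproduced approximately with a short script. The sketch below solves the cubic condition quoted above by bisection and compares the platform's profit under that fee with its profit under the Bertrand proportional fee. One ingredient is an assumption of this sketch rather than a formula stated in the text: the symmetric Cournot price under a pure proportional fee with exponential demand, taken here to be the cost divided by one minus the fee rate, plus the cost divided by the product of the seller count and the scale parameter. That expression was worked out for this illustration from the seller's problem (20) and (21) in the limit where the shape parameter goes to one. Function names and the parameter grid are likewise illustrative.

```python
import math

def foc_gap(t1, lam, n):
    """Left side minus right side of the optimality condition for sigma = 1."""
    return (1.0 - t1) ** 3 + lam * (1.0 - t1) * (n - t1) - n * t1 * lam ** 2

def optimal_t1(lam, n, tol=1e-12):
    """Bisection on (0, 1): the gap is positive at t1 = 0 and negative at t1 = 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc_gap(mid, lam, n) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def platform_profit(t1, lam, n, c=1.0):
    """Platform profit t1 * p * Q with d = 0 and exponential demand.

    Assumption of this sketch: symmetric Cournot price
    p = c / (1 - t1) + c / (n * lam), derived from (20)-(21) with t0 = 0.
    """
    p = c / (1.0 - t1) + c / (n * lam)
    return t1 * p * math.exp(-lam * (p - c) / c)

def profit_share(lam, n):
    """Profit under the Bertrand fee 1 / (1 + lam) relative to the optimal fee."""
    bertrand = platform_profit(1.0 / (1.0 + lam), lam, n)
    return bertrand / platform_profit(optimal_t1(lam, n), lam, n)

for lam in (1.0, 2.0, 5.0):
    for n in (1, 10):
        print("lam", lam, "n", n, "optimal t1:", optimal_t1(lam, n),
              "Bertrand t1:", 1.0 / (1.0 + lam), "profit share:", profit_share(lam, n))
```

Under these assumptions, the profit share for a monopoly seller at a scale parameter of one should come out slightly above 98.5 percent, and the share should move closer to one as the number of sellers grows.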

Finally, consider the case of linear demand where $\sigma = 0$. Then (26) implies that the optimal proportional fee satisfies

$$(1 - t_1)^2 (1 + \lambda)(1 - t_1 - t_1 \lambda) - t_1 (1 - t_1) \lambda (1 + \lambda) = n_c \left(2 t_1 \lambda^2 - \lambda(1 - t_1)\right),$$

which has a unique solution. In contrast, (23) implies the proportional fee

$$t_1 = \frac{1}{1 + 2\lambda}.$$

For the empirically meaningful range where the proportional term $t_1$ of the Bertrand affine fee satisfies $t_1 \leq 50$ percent (or equivalently, $\lambda \geq 0.5$), the Bertrand affine fee can recover more than 97.5 percent of the profit under the optimal affine fee schedule when all sellers are monopolists (so $n_c = 1$ for all $c$). Again, the profit gap between using the Bertrand affine fee schedule and using the optimal affine fee decreases monotonically in $n_c$, and the two converge as the number of Cournot sellers gets large.

The findings in Section 2 are summarized below.

Assume that the demand functions for sellers on the platform belong to the generalized Pareto class with [lambda] > 0 and [sigma] < 2 and that for each good c there are [n.sub.c] [greater than or equal to] 1 identical sellers that set quantities. Then we have the following results:

(i) the platform obtains a higher profit using the Bertrand affine fee than if it sets the optimal per-transaction fee for each good;

(ii) if sellers face constant elasticity demand ([sigma] = 1 + 1/[lambda] and [lambda] > 1) and d = 0, the Bertrand affine fee is the optimal affine fee schedule;

(iii) if sellers face exponential demand ([sigma] = 1), [lambda] > 1, and d = 0, the Bertrand affine fee can recover more than 98.5 percent of the profit under the optimal affine fee schedule;

(iv) if sellers face linear demand ([sigma] = 0), [lambda] > 0.5, and d = 0, the Bertrand affine fee can recover more than 97.5 percent of the profit under the optimal affine fee schedule.
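The proportional terms reported for the special cases in this section can be checked numerically. The sketch below is illustrative only: it assumes that the proportional term of the Bertrand affine fee takes the closed form 1/(1 + [lambda](2 - [sigma])), which is not stated in this section but is consistent with the exponential case, the linear case, and the calibration to a 15 percent fee. The function names are hypothetical.

```python
def bertrand_proportional_fee(lam: float, sigma: float) -> float:
    """Proportional term t1 of the Bertrand affine fee.

    ASSUMPTION: t1 = 1 / (1 + lam * (2 - sigma)). This closed form is
    inferred from the special cases in the text, not quoted from it.
    """
    if lam <= 0 or sigma >= 2:
        raise ValueError("the generalized Pareto class requires lam > 0 and sigma < 2")
    return 1.0 / (1.0 + lam * (2.0 - sigma))


def constant_elasticity_sigma(lam: float) -> float:
    """Curvature parameter for constant elasticity demand: sigma = 1 + 1/lam."""
    return 1.0 + 1.0 / lam


# Exponential demand (sigma = 1): t1 = 1/(1 + lam), so t1 <= 0.5 iff lam >= 1.
exponential_t1 = bertrand_proportional_fee(lam=1.0, sigma=1.0)

# Linear demand (sigma = 0): t1 = 1/(1 + 2*lam), so t1 <= 0.5 iff lam >= 0.5.
linear_t1 = bertrand_proportional_fee(lam=0.5, sigma=0.0)

# Constant elasticity demand: the assumed form collapses to t1 = 1/lam.
constant_elasticity_t1 = bertrand_proportional_fee(
    lam=4.0, sigma=constant_elasticity_sigma(4.0)
)
```

Under this assumed form, both boundary values of [lambda] yield a proportional term of exactly 50 percent, which matches the equivalences stated for the exponential and linear cases.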

3. A QUANTITATIVE EXERCISE

Finally, we may consider the general case in which d > 0 and compare the platform's profit from the Bertrand affine fee (12) with its profit from the optimal fee schedules, including nonlinear ones. This exercise was carried out in detail in Wang and Wright (2017), and we summarize the findings here.

Once we allow for a nonlinear fee schedule, the optimal fee schedule will depend on the distribution of goods G(c). This is also true for the optimal affine fee schedule once we allow d > 0. Therefore, to proceed, one needs to assume some realistic distribution for c and calculate the profitability of different fee schedules numerically. Wang and Wright (2017) use a distribution obtained by fitting a log-normal distribution to the actual distribution of sales, which is inferred from the sales ranks of DVDs sold on Amazon. (9) It is assumed that sellers face constant elasticity demand, with d = 1.35 and [sigma] = 1.15, so that the calibrated Bertrand fee schedule matches the actual fee schedule used by Amazon for DVDs (which is $1.35 + 15 percent). Sellers are assumed to be monopolists (i.e., [n.sub.c] = 1). (10)
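The calibration can be illustrated with a short sketch. It assumes that, under constant elasticity demand, the proportional term of the Bertrand fee equals 1/[lambda] (an assumption that is consistent with [sigma] = 1 + 1/[lambda] and the 15 percent rate, but is not stated in this section). The function names and the $10 example price are hypothetical.

```python
def lambda_from_sigma(sigma: float) -> float:
    """Invert the constant elasticity relation sigma = 1 + 1/lam."""
    if sigma <= 1.0:
        raise ValueError("constant elasticity demand requires sigma > 1")
    return 1.0 / (sigma - 1.0)


def affine_fee(price: float, fixed: float = 1.35, rate: float = 0.15) -> float:
    """Affine platform fee: a fixed per-transaction part plus a share of the price."""
    return fixed + rate * price


# Calibrated curvature reported in the text.
lam = lambda_from_sigma(1.15)

# ASSUMPTION: under constant elasticity the proportional term is 1/lam.
implied_rate = 1.0 / lam

# Fee on a hypothetical DVD listed at $10 under the $1.35 + 15 percent schedule.
fee_on_ten_dollars = affine_fee(10.0)
```

With [sigma] = 1.15 the implied [lambda] is roughly 6.67, and the implied proportional rate is roughly 15 percent, in line with the fee schedule the calibration targets.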

With these assumptions, it is found that the platform obtains a profit of 0.383 with a fixed per-transaction fee (i.e., without any price discrimination). (11) If the platform could observe each different good sold by the sellers, it could do better by setting the per-transaction fee that is optimal for each good c. This increases its profit by 17.7 percent to 0.457, which represents the gain due to price discrimination. Moreover, the benefits of price discrimination can be obtained by using the Bertrand fee schedule, which does not require any information on the values of c and has the added benefit of mitigating double marginalization. Indeed, the platform can increase its profit to 0.537, or a further 16.3 percent, by using the Bertrand fee schedule. Taking into account that sellers are monopolists and the particular distribution of c, the platform can increase its profit by a further 1.5 percent by moving to the optimal affine fee schedule.

Finally, Wang and Wright (2017) obtain the platform's profit for the optimal nonlinear fee schedule, which comes from solving for the optimal polynomial fee schedule of degree k, starting with k = 1 (the affine fee schedule) and considering higher and higher k until the platform's profit no longer increases. Compared with the optimal affine fee schedule, moving to the optimal nonlinear fee schedule only increases the platform's profit by a further 1.3 percent. The results are summarized in Table 1. The table also shows the results from repeating the exercise with linear demand.

Quantitatively, the results show that the platform loses little from restricting fee schedules to affine fee schedules or indeed to the Bertrand affine fee. In the constant elasticity demand case, price discrimination and double marginalization have similar quantitative effects in justifying the platform's use of the Bertrand affine fee: using the Bertrand affine fee increases the platform's profit by 33.8 percent compared with using a fixed per-transaction fee, where 17.7 percent comes from price discrimination and 16.3 percent comes from mitigating double marginalization. In the linear demand case, the effect of price discrimination turns out to be larger than that of double marginalization: using the Bertrand affine fee increases the platform's profit by 49.6 percent compared with using a fixed per-transaction fee, where 42.4 percent comes from price discrimination and 7.2 percent comes from mitigating double marginalization.

4. CONCLUSION

In this article, we review two alternative explanations for why platforms use ad valorem fees: double marginalization versus price discrimination. Using a generalized framework, we show that the two theories complement each other in explaining this pricing puzzle, and their relative importance is quantified in a calibration exercise.

Our findings set the stage for normative analysis. Given that platforms do not incur significant costs that vary with transaction prices, there have been policy concerns regarding their use of ad valorem fees. Using the framework discussed in this article, one could evaluate the welfare consequences of regulating platforms' use of ad valorem fees. In fact, Shy and Wang (2011) and Wang and Wright (forthcoming) have shown that banning platforms' use of ad valorem fees tends to reduce social welfare in the presence of double marginalization or price discrimination. Therefore, caution ought to be taken when policymakers consider intervening in platforms' use of ad valorem pricing.

REFERENCES

Aguirre, Inaki, Simon Cowan, and John Vickers. 2010. "Monopoly Price Discrimination and Demand Curvature." American Economic Review 100 (September): 1601-15.

Bulow, Jeremy, and Paul Pfleiderer. 1983. "A Note on the Effect of Cost Changes on Prices." Journal of Political Economy 91 (February): 182-85.

Bulow, Jeremy, and Paul Klemperer. 2012. "Regulated Prices, Rent Seeking, and Consumer Surplus." Journal of Political Economy 120 (February): 160-86.

Foros, Oystein, Hans Jarle Kind, and Greg Shaffer. 2013. "Turning the Page on Business Formats for Digital Platforms: Does Apple's Agency Model Soften Competition?" Working Paper.

Gaudin, Germain, and Alexander White. 2014. "On the Antitrust Economics of the Electronic Books Industry." Dusseldorf Institute for Competition Economics Discussion Paper 147 (May).

Hagiu, Andrei, and Julian Wright. Forthcoming. "The Optimality of Ad Valorem Contracts." Management Science.

Johnson, Justin. 2017. "The Agency Model and MFN Clauses." Review of Economic Studies 84 (July): 1151-85.

Loertscher, Simon, and Andras Niedermayer. 2012. "Fee-setting Mechanisms: On Optimal Pricing by Intermediaries and Indirect Taxation." Governance and the Efficiency of Economic Systems Discussion Paper 434 (October).

Miao, Chun-Hui. 2013. "Do Card Users Benefit from the Use of Proportional Fees?" Review of Network Economics 12 (September): 323-41.

Muthers, Johannes, and Sebastian Wismer. 2013. "Why Do Platforms Charge Proportional Fees? Commitment and Seller Participation." Working Paper.

Shy, Oz, and Zhu Wang. 2011. "Why Do Payment Card Networks Charge Proportional Fees?" American Economic Review 101 (June): 1575-90.

Wang, Zhu, and Julian Wright. 2017. "Ad Valorem Platform Fees, Indirect Taxes, and Efficient Price Discrimination." RAND Journal of Economics 48 (Summer): 467-84.

Wang, Zhu, and Julian Wright. Forthcoming. "Should Platforms Be Allowed to Charge Ad Valorem Fees?" Journal of Industrial Economics.

Weyl, Glen, and Michal Fabinger. 2013. "Pass-Through as an Economic Tool: Principles of Incidence under Imperfect Competition." Journal of Political Economy 121 (June): 528-83.

Research Department, Federal Reserve Bank of Richmond. Email: zhu.wang@rich.frb.org. I thank Eric LaRose, John Weinberg, Alexander Wolman, and Russell Wong for helpful comments. The views expressed are solely those of the author and do not necessarily reflect the views of the Federal Reserve Bank of Richmond or the Federal Reserve System.

(1) In the industrial organization literature, double marginalization refers to the phenomenon in which firms at different vertical levels in the supply chain (e.g., upstream and downstream) each have market power and apply their own markups to prices. For example, consider a firm with market power that buys an input from another firm that also has market power. The producer of the input will price above marginal cost when it sells the input to the other firm, which will then price above marginal cost again when it sells the final product that uses the input. This means the input is marked up above marginal cost twice, which is called double marginalization.

(2) In a similar vein, several studies (e.g., Foros et al. 2013; Gaudin and White 2014; and Johnson 2017) have explored the advantages of the so-called agency model used by mass retailers such as Amazon, where the retailer lets suppliers (i.e., sellers) set final prices and receive a share of the revenue, which is equivalent to using a percentage fee. Like Shy and Wang (2011), they also show that the revenue sharing used in the agency model has the advantage of mitigating double marginalization.

(3) A higher c (i.e., higher cost) implies in the model that the gains from trade are higher in expectation (due to the multiplicative connection between c and b). One interpretation for this specification, as shown in Wang and Wright (2017), is that such a platform reduces trading frictions, and as a result the value to buyers of using the platform (so that they can avoid the loss of using a less-efficient trade intermediary) is proportional to the cost or price of the goods traded. Note that the assumption b > 0 is an innocuous normalization because consumers whose valuation for a product is less than its cost can be ignored.

(4) Cournot competition refers to an oligopoly market structure in which multiple firms producing a homogeneous product compete by choosing outputs independently and simultaneously. Assuming a fixed number of Cournot sellers, Shy and Wang (2011) show that the platform earns a higher profit by using a proportional fee than a per-transaction fee. Miao (2013) shows that the result continues to hold under free entry of sellers.

(5) Bertrand competition is a model of competition in which multiple firms producing a homogeneous product compete by setting prices simultaneously, and consumers buy only from the firm with the lowest price.

(6) This class of demands has been considered by Bulow and Pfleiderer (1983), Aguirre et al. (2010), Bulow and Klemperer (2012), and Weyl and Fabinger (2013), among others.

(7) With this model setting, the optimal platform fee schedule is affine and does not condition on c if and only if the distribution of buyers' benefits F is the generalized Pareto distribution. See Wang and Wright (2017) for a detailed proof.

(8) If d > 0 the results will depend on the distribution of c. We discuss this case in Section 3.

(9) Using a web robot, Wang and Wright (2017) collected data on every DVD listed under "Movies & TV" on Amazon's marketplace in January 2014. Given that shipping fees are often not included in the listed price, the focus is on the items where the listed price included free shipping, resulting in a sample of 191,280 distinct items. The data collected include the title, the unique ASIN identifying the DVD, the price, and the sales rank of each DVD. Given that the sales of each DVD are not directly observable, a power law is used to infer them from the sales rank data, so [Q.sub.c] = a[[R.sub.c].sup.-[phi]], where [Q.sub.c] is the estimated sales of item c and [R.sub.c] is the corresponding sales rank. The parameter a does not affect the analysis, so it is normalized to a = 1. It is assumed that [phi] = 1.7, which is the number suggested by an experimental study on DVD sales on Amazon.

(10) This quantitative exercise evaluates how well the Bertrand affine fee performs under Cournot sellers. Assuming monopoly sellers is the most extreme alternative to Bertrand competition, so it provides the most conservative results.

(11) Note that because the sales of DVDs are inferred from data on sales ranks with scale normalized, only the relative (but not the absolute) value of the platform profit is meaningful for comparison.
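The power-law step described in note 9 can be sketched as follows. This is a minimal illustration using the stated values [phi] = 1.7 and a = 1; the function names and the example ranks are hypothetical, and the code is not the procedure used in Wang and Wright (2017).

```python
from typing import List, Sequence


def estimated_sales(rank: int, phi: float = 1.7, a: float = 1.0) -> float:
    """Infer sales from a sales rank via the power law Q = a * R ** (-phi)."""
    if rank < 1:
        raise ValueError("sales rank must be a positive integer")
    return a * rank ** (-phi)


def sales_shares(ranks: Sequence[int], phi: float = 1.7, a: float = 1.0) -> List[float]:
    """Relative sales weights; the scale parameter a cancels out here."""
    sales = [estimated_sales(r, phi=phi, a=a) for r in ranks]
    total = sum(sales)
    return [s / total for s in sales]


# Hypothetical ranks for three items; better-ranked items get larger weights.
example_shares = sales_shares([1, 10, 100])
```

Because only relative sales enter the shares, rescaling a leaves them unchanged, which is why the normalization a = 1 is harmless and why only relative profit values are meaningful.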

COPYRIGHT 2018 Federal Reserve Bank of Richmond