The Judicial Hierarchy

Crucial to understanding the behavior of judges and the outputs of courts is the institutional context in which they operate. One key component of courts’ institutional structure is that the judiciary is organized as a hierarchy, which creates both problems and opportunities for judges. For instance, one problem for judges at the top of a hierarchy is how to best exercise oversight of lower court judges, whose decisions are often not reviewed by higher courts. One opportunity is that higher courts can reverse errors by lower courts; another is that, as new legal issues emerge, hierarchy provides opportunities for judges to learn from one another.

Scholars of the judicial hierarchy have pursued two broad approaches. The “team perspective” begins by assuming that all judges in a hierarchy have the same values or principles, and thus care only about achieving the correct outcome in a given case. In the team approach, the key problem in adjudication is informational. All judges agree on the correct outcome of a case, conditional on understanding the relevant facts, but may lack this understanding due to resource constraints or informational advantages enjoyed by litigants. The agency approach, by contrast, assumes that judges in the hierarchy have differing preferences, and the key problem is how higher courts can ensure compliance by lower courts.

Despite these different foundational assumptions, the team and agency approaches have both been employed successfully to study core questions regarding the judicial hierarchy, including: why hierarchy exists; how higher courts can best oversee lower courts; how learning takes place both within and across the levels of the judiciary; and how collegiality influences judicial decision-making. Yet, while our understanding of the judicial hierarchy has greatly increased in recent years, many questions remain, such as how judges learn and how to measure legal doctrine.

Despite Alexander Hamilton’s well-worn characterization of the judiciary as the “least dangerous branch,” there can be little doubt today that courts exercise tremendous power, both in the United States and (increasingly) globally. As with any institution, key to understanding the behavior of judges and the outputs of courts is the institutional context in which they operate. Judiciaries are organized in a hierarchical structure, which provides judges with both potential problems and potential opportunities.

In this article, the literature on several features of the judicial hierarchy will be reviewed and evaluated, with an emphasis on the importance of hierarchical organization of courts in common law systems—namely the United States.1 To emphasize the cutting edge in research on the hierarchy, the focus is mainly on studies written after the publication of Kornhauser’s excellent review in 1999.2 In addition to both classic and recent work by political scientists, related and important work by legal scholars and economists is discussed. Recent theoretical developments in the study of the judicial hierarchy are covered, along with important empirical papers that speak to these developments. Some potential paths for moving forward in the empirical study of the hierarchy are suggested.

“The Structure of Judicial Hierarchy” discusses the basic structure of judicial hierarchy and how the hierarchy helps higher courts oversee lower courts. “Team Models and the Error Correction Function of Hierarchy” discusses the “team” approach to understanding the organization of hierarchy, and how the appellate system can mitigate the problem of incorrect decisions by lower courts. “Agency Models and the Problem of Noncompliance” introduces agency models of the hierarchy, which employ principal-agent analysis to understand how strategic oversight can ensure consistency in decision making. “Learning in the Judicial Hierarchy” discusses models in which higher court judges learn from lower court judges to help select legal rules in a new area of the law. “The Existence, Nature, and Scope of Precedent” discusses the importance of precedents and the structure of legal rules in the hierarchy. “The Importance of Collegiality” discusses the importance and effects of having multimember appellate courts decide cases. “Conclusion: Future Theoretical and Empirical Avenues” discusses fruitful avenues for theoretical and empirical development in the study of the judicial hierarchy.

The Structure of Judicial Hierarchy

In theory, judiciaries could be arranged non-hierarchically in a single tier. In practice, judicial organizations in common law systems are always arranged in a hierarchy. To understand why this is so, it is useful to begin with Posner’s (1985, p. 3) succinct definition of what courts are and what they do: “A court is a public body for resolving disputes in accordance with law. In every case the court must determine what the facts are and what their legal significance is. If the court determines their legal significance by applying an existing rule of law unchanged, it is engaged in pure dispute resolution. But if to resolve the dispute the court must create a new rule or modify an old one, that is law creation.”

The distinction between dispute resolution and law creation is critical for understanding the typical arrangement of a judicial hierarchy. At the bottom of the hierarchy sit trial courts, which are tasked with fact finding and dispute resolution. Above trial courts sit at least one and usually two levels of appellate courts, with a supreme court at the top and intermediate appellate courts (if they exist) between the trial and supreme courts.3 The appellate courts are tasked with both reviewing the decisions of the trial courts and with law creation. While the classic “fact/law” dichotomy is rigid more in theory than in fact, in general trial courts are responsible for determining the facts of a case, and appellate courts are responsible for reviewing the legal determinations of trial courts (and for law creation).4

Critical to the operation of a judicial hierarchy in common law systems is stare decisis, the doctrine that judges should be guided by legal precedent in reaching their decisions. There are two types of stare decisis to consider. Horizontal stare decisis exists when judges follow the prior decisions made by courts at the same level of the hierarchy. Vertical stare decisis exists when courts are bound by the decisions of courts above them. In the U.S. federal system, there exists strict vertical stare decisis. By contrast, horizontal stare decisis varies by the level of the hierarchy. District court judges are not constrained by horizontal stare decisis. Judges on the Courts of Appeals are bound by prior precedents issued by judges within their own circuit, but are not bound by the decisions of judges in other circuits. Finally, while in theory the U.S. Supreme Court is bound by horizontal stare decisis, both the form of this constraint from a normative standpoint and whether the justices are constrained in practice are heavily contested (see e.g., Segal & Spaeth, 1996; Richards & Kritzer, 2002; Friedman, 2006).

Team Models and the Error Correction Function of Hierarchy

Given the ubiquity of these institutional arrangements, a natural question is: What advantages do these structures provide for the operation of a judicial hierarchy? Some answers come from the “team perspective,” which begins by assuming that all judges in a hierarchy have the same values or principles, and thus care only about achieving the correct outcome in a given case. The dilemma is how best to organize the hierarchy to minimize the overall number of incorrect outcomes. In contrast, the “agency perspective” begins by viewing judges as political actors with (potentially) differing preferences over legal rules and case outcomes; the dilemma then is how to minimize “noncompliance” with higher court rules by lower court judges. Rather than being viewed as competing models, the team and agency perspectives have comparative advantages in asking different questions and providing different answers about the judicial hierarchy.5

In the team approach, the key problem in adjudication is informational. All judges agree on the correct outcome of a case, conditional on understanding the relevant facts, but may lack this understanding due to resource constraints or informational advantages enjoyed by litigants. A judge’s ability to reach the correct outcome is a function of the amount of effort she exerts in a given case. The goal then for the team is to maximize the number of correct outcomes across the overall distribution of cases, given these resource and informational constraints.

The informal model presented in Kornhauser (1995) shows that a hierarchy has several advantages over a flat organization for achieving this goal. First, hierarchy allows for specialization of labor—some judges can hear trials, others appeals. Second, trial court judges have to consult only cases decided by courts above them—that is, appellate courts—which means fewer wasted resources used in scanning the set of cases for precedential value. Third, hierarchy allows for the shifting of judicial resources toward important precedential cases; namely, toward appellate judges. From these advantages, it follows that judges in the hierarchy should follow strict vertical stare decisis, because appellate judges will make higher quality decisions and thus be more likely to reach the correct legal judgment than a trial court judge. Horizontal stare decisis, on the other hand, should not be followed at the trial level, because having independent decisions benefits the appellate court judges who establish precedents, and because trial court judges invest less time in their decision-making. Conversely, appellate judges should employ horizontal stare decisis, because the resources spent on reconsidering a prior appellate judge’s decision would be better allocated to cases of first impression.

Whereas Kornhauser (1995) suggests a role for both rule creation and dispute resolution in his team approach, other papers more explicitly focus on the importance of error correction in the hierarchy. That is, assuming some legal rule exists, how can hierarchy reduce the number of errors in a judicial system? A key piece of the puzzle is that losing litigants have the choice of appealing a trial court decision; moreover, litigants are likely to possess superior understanding of the underlying fact patterns of a case, and thus be able to detect “mistakes” by trial court judges. An appeal is thus a costly signal that can, under some circumstances, lead an appellate court to believe that the trial court’s decision should be reversed. In the first such model of error correction, Shavell (1995) shows that as long as the probability that an incorrect trial court decision will be reversed on appeal is greater than the probability that a correct decision will be reversed, only wrongly decided cases will be appealed. Shavell also demonstrates that relying on litigants to determine the supply of appeals is superior to a strategy of randomly reviewing trial court decisions.
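The separation logic can be illustrated with a minimal sketch. It assumes, purely for illustration, that a losing litigant appeals whenever the expected gain from reversal exceeds a fixed cost of appealing. The function names and numeric values are hypothetical and are not drawn from Shavell (1995).

```python
def appeals(p_reversal: float, gain: float, cost: float) -> bool:
    """Return True if a losing litigant's expected gain from appealing exceeds the cost."""
    return p_reversal * gain > cost


def separating_costs(p_wrong: float, p_right: float, gain: float):
    """Return the cost range [low, high) in which only wrongly decided cases are appealed.

    p_wrong: chance that an incorrect trial decision is reversed on appeal.
    p_right: chance that a correct trial decision is reversed on appeal.
    Returns None when p_wrong <= p_right, because no cost can then separate the groups.
    """
    if p_wrong <= p_right:
        return None
    return (p_right * gain, p_wrong * gain)


# Hypothetical values: reversal is three times as likely when the trial court erred.
low, high = separating_costs(p_wrong=0.75, p_right=0.25, gain=100.0)
cost = 50.0  # any cost with low <= cost < high separates the two groups
wrong_loser_appeals = appeals(0.75, 100.0, cost)  # loser of a wrongly decided case
right_loser_appeals = appeals(0.25, 100.0, cost)  # loser of a correctly decided case
```

Under these assumed numbers, the loser of a wrongly decided case appeals and the loser of a correctly decided case does not, so the appeal itself carries information about the trial court's error.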

Building off Shavell (1995), Cameron and Kornhauser (2006) present a formal theory in which a plaintiff and a defendant have asymmetric information before a trial court (which precludes a settlement); at the end of the trial, both litigants become fully informed about whether the defendant is liable, but the judges in the hierarchy have only a probabilistic understanding of this. The authors show that when there exists only a single appellate court (i.e., a two-tiered hierarchy), the “good” equilibrium of only incorrect decisions being appealed can be fragile. Specifically, it requires a sufficient level of accuracy by the trial courts, which is in turn a function of their caseload. If the trial courts are overburdened relative to their resources, the appeals court may become overwhelmed by “correct losers”—losing litigants who should have lost, based on the facts in the case—seeking to have their decisions overturned on appeal, meaning that both wrongly and rightly decided cases will be appealed. However, Cameron and Kornhauser (2006) show that—somewhat remarkably—all cases are resolved correctly once a third tier is added. With a second appellate court, correct losers do not have an incentive to appeal the trial court’s decision, because even if the intermediate appellate court ruled in their favor, the supreme appellate court would simply reverse. Thus, in equilibrium, only incorrect losers appeal initially, and their cases are (correctly) reversed by the intermediate court.6

There are two very useful features of the team perspective in general and Cameron and Kornhauser (2006) in particular. First, whereas the importance of litigants has been well studied in the law and economics literature, political scientists who have studied the judicial hierarchy have tended to assume away the importance of litigants.7 Second, the main result in Cameron and Kornhauser (2006) helps explain two pervasive institutional features of the judicial hierarchy. First, the model shows that a judicial organization never needs more than three tiers, which accords nicely with the empirical reality that most hierarchies, including the U.S. federal system, employ three tiers. Second, the number of cases heard by each successive level of the hierarchy decreases—trial courts hear the bulk of cases; intermediate appellate courts hear some proportion of appeals from those decisions; and supreme courts generally hear very few cases.8 In the model, the decrease is driven solely by the selection of appeals by litigants. Also important for considering the caseload of supreme courts is that most have discretionary dockets—such courts tend to hear only a small fraction of cases that are appealed to them (Eisenberg & Miller, 2009, p. 1459). The importance of this discretion is considered in agency models.

Agency Models and the Problem of Noncompliance

The point of departure for agency models is the classic principal-agent framework, which analyzes the dilemma that exists within most firms: How does the boss of a firm motivate and organize his employees to do what he wants, given that they may want to “shirk” and act in accordance with their own preferences, should the two conflict (Moe, 1984)? Higher courts (particularly the Supreme Court) are envisioned as the principal, and the lower courts as agents. The notion of error is thus very different from the team approach. In the latter, errors are simply mistakes that arise from imperfect information about a case. In the former, errors are willful decisions by a lower court judge to apply her preferred rule—that is, to engage in noncompliance—rather than implement the preferred rule of a higher court.

As such, the higher court faces a delegation problem: given time and resource constraints, it is impossible for a higher court to make every decision that reaches the judiciary’s door—just as it is impossible for the CEO of a large organization to make all of its decisions. Accordingly, the higher court “delegates” most of the work to lower court judges. As Baker and Kim (2012, p. 331) note, “for the vast majority of litigants, it is the decisions of the lower courts that give meaning and force to the pronouncements of the Supreme Court.” The problem is that the justices do not possess the traditional tools superiors in firms can use to incentivize their subordinates—namely the ability to reward or sanction them with things like higher salary or threats of termination, respectively (Songer, Segal, & Cameron, 1994). Given that the Supreme Court hears only a tiny fraction of the cases appealed to it each year, the Court’s enforcement problem would appear to be severe. Yet this does not appear to be the case—there is a long empirical literature in political science on compliance, and studies generally find widespread compliance by lower courts (Gruhl, 1980; Songer & Sheehan, 1990; Songer, Segal, & Cameron, 1994; Benesh & Reddick, 2002).9

What then explains the apparent discrepancy between the Supreme Court’s lack of formal tools and general compliance by the lower courts? Agency models have provided a range of answers. Such models usually assume that lower court judges suffer a utility loss from being reversed by a higher court; the threat of being reversed may compel compliance by lower court judges.10 In one of the earliest agency models, Cameron (1993) demonstrated that a supreme court could mitigate noncompliance among lower courts by placing them “in competition” with each other, with the higher court targeting the most outlying courts. McNollgast (1995) presents a similar but more complicated model in which the Supreme Court first establishes a range of acceptable decisions (around its ideal point); the lower courts decide whether to comply by adhering to this range; then the Supreme Court randomly audits a subset of the lower court decisions, and reverses those that are noncompliant. While these models rest on very strong assumptions, they nicely show that relatively simple review strategies by a Supreme Court with a discretionary docket, combined with placing the lower courts in what we might call structured competition, can reduce noncompliance in a hierarchy—even though the Court lacks formal tools to punish “disobedient” lower court judges.

As in the team approach, informational asymmetries play an important role in agency models. The form of asymmetry, however, differs. In team models, litigants usually enjoy an informational advantage over the judges. Agency theories, by contrast, typically assume away litigants (but see Songer, Cameron, & Segal, 1995); instead, information asymmetries arise within the judiciary, with judges at the lower level possessing greater information about a case by virtue of having adjudicated it. For example, in the influential model presented in Cameron, Segal, and Songer (2000) of strategic auditing by an appellate court with a discretionary docket, a lower court perfectly learns the case facts, some of which are publicly observable, and some of which are not. A higher court can learn the “private” facts only if it pays a cost to review the case. This informational asymmetry allows the lower court to get away with noncompliance in some cases where it would comply in the absence of private information. The higher court, in turn, chooses to review cases where noncompliance is most likely—for example, liberal rulings by very liberal lower courts (see also Spitzer & Talley, 2000). Cameron, Segal, and Songer (2000) present evidence in support of their theory using data from search-and-seizure cases. Moving beyond the Supreme Court, Giles, Walker, and Zorn (2006), Clark (2009), and Beim and Kastellec (2014) extend the logic of the theoretical predictions in Cameron, Segal, and Songer (2000) to full circuits’ review of three-judge panels on the U.S. Courts of Appeals via the en banc process—each finds strong support for the theory of strategic auditing.11
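The auditing intuition can be sketched in a one-dimensional case space. The sketch assumes a single case fact x (for example, the intrusiveness of a search), a liberal ruling whenever x exceeds a court's threshold, and a lower court that simply applies its own threshold. These simplifications and all names are illustrative assumptions, not the specification in Cameron, Segal, and Songer (2000).

```python
from typing import Optional, Tuple


def suspect_interval(
    lower_cut: float, higher_cut: float, ruling: str
) -> Optional[Tuple[float, float]]:
    """Return the range of case facts where the ruling conflicts with the higher court.

    Each court prefers a liberal ruling when the case fact x exceeds its threshold.
    Returns None when the observed ruling cannot conflict with the higher court's view.
    """
    if ruling not in ("liberal", "conservative"):
        raise ValueError("ruling must be 'liberal' or 'conservative'")
    if ruling == "liberal" and lower_cut < higher_cut:
        # For x in (lower_cut, higher_cut] the lower court rules liberal,
        # but the higher court would have ruled conservative.
        return (lower_cut, higher_cut)
    if ruling == "conservative" and lower_cut > higher_cut:
        # For x in (higher_cut, lower_cut] the lower court rules conservative,
        # but the higher court would have ruled liberal.
        return (higher_cut, lower_cut)
    return None


# Hypothetical thresholds: the lower court (3.0) is more liberal than the higher court (5.0).
liberal_ruling = suspect_interval(3.0, 5.0, "liberal")
conservative_ruling = suspect_interval(3.0, 5.0, "conservative")
```

In this sketch a conservative ruling by the more liberal lower court can never conflict with the higher court's preferences, so only its liberal rulings are candidates for review.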

While most of the literature on agency models has focused on the relationship of the upper levels of the hierarchy, agency theory also applies to the Courts of Appeals’ supervision of federal district court decisions (Haire, Lindquist, & Songer, 2003; Boyd & Spriggs, 2009). Yet, the fact/law distinction changes the nature of the principal-agent relationship. Because the Courts of Appeals have mandatory jurisdiction, every trial court decision that is appealed by the losing litigant will be reheard, providing the appellate court with the opportunity to reverse the district courts. However, because appellate courts generally review only legal determinations by district courts, and not determinations of fact, trial court judges may be able to insulate their decisions from review by “shading” the case facts in such a way that enables them to follow the appellate court’s precedent, while still reaching the trial court judge’s preferred disposition. Gennaioli and Shleifer (2008) present such a model of what they call “judicial fact discretion,” and show that the practice can have deleterious effects on the functioning of the appellate process.12 While anecdotes of judicial fact discretion abound (see Gennaioli & Shleifer, 2008, pp. 5–6), empirical studies of the phenomenon are rarer. One exception is Schanzenbach and Tiller (2007), who present evidence suggesting that federal district court judges engage in fact discretion in order to achieve their preferred outcomes in sentencing decisions, especially when their preferences differ from the reviewing Court of Appeals (see also Fischman & Schanzenbach, 2011, 2012).

Auditing for Policy in a Case-Space Setting

One limitation of auditing models is that they tend to focus on the error correction function of appellate courts. For some courts—such as intermediate appellate courts sitting en banc—this focus makes sense, because this procedure is designed to allow the full court to overturn a panel’s decision if it is “outlying.” But supreme courts—particularly the U.S. Supreme Court—tend to focus mostly on law creation, that is, on setting new precedents or clarifying existing law. The early review models of Cameron (1993) and McNollgast (1995) implicitly focused on the policy-making functions of the Supreme Court—but did so in a way that ignored the key institutional fact that courts make policy via the adjudication of cases, and not out of whole cloth, like legislatures.13

Carrubba and Clark (2012) bridge this gap by developing a model where lower courts issue both a disposition (i.e., which party wins and which loses) and a policy. A higher court with a discretionary docket can observe the disposition and the policy set by the lower court, but not the case facts—unless it pays a cost and reviews the case. In equilibrium, the lower court can induce the higher court not to review (and potentially reverse) its decision by either issuing a decision that the higher court favors but the lower court does not, or by writing a rule that is closer to the higher court’s ideal point. Clark and Carrubba (2012) extend the model to allow lower court judges to further “buy off” the higher court by writing a higher quality opinion, because all judges favor higher quality opinions (Lax & Cameron, 2007). Thus, in these models, lower and higher court judges seek to implement both their preferred rules and dispositions—the informational asymmetry over the case facts means that sometimes the lower court will be able to “shirk” and make policies and decisions that the higher court would not if it were deciding the case.

Learning in the Judicial Hierarchy

One striking feature of the models discussed previously—in both the team and agency perspective—is that they tend to consider a higher court overseeing a single lower court in a dyadic relationship.14 Importantly, however, the Supreme Court oversees many lower courts, with judges distributed across several jurisdictions throughout the United States. This institutional reality casts the Supreme Court’s position in a new light, and points toward two important questions. First, how can higher courts learn from multiple lower courts? (Of course, learning can occur in a dyadic relationship, but having multiple sources of information increases the opportunity for learning.) Second, how can a higher court craft legal rules in a manner that lower courts across the hierarchy can apply in a consistent manner? Whereas earlier models tended to study the interactions of higher and lower courts assuming some fixed rule in place, several recent studies have attempted to ask both how the Supreme Court learns from lower courts and the manner in which it uses precedent and crafts new rules.

Opportunities for learning exist throughout a common-law system, given that the common law largely consists of judge-made law. In particular, a number of recent studies have examined how legal rules are established in cases of first impression in the federal judiciary—that is, those where there is no clear existing Supreme Court precedent. New legal issues, in the parlance of Perry (1991), “percolate” through the lower courts, allowing judges with different viewpoints and backgrounds to weigh in on preferred legal rules for these issues. For example, one question the model presented in Carrubba and Clark (2012) raises is why the Supreme Court, rather than be mollified by a lower court decision that accommodates the Supreme Court’s preferred rule, would not simply decide to review that case and extend that rule to cover all federal courts in the country. This would seem to be especially true of high-quality opinions by lower court judges that the justices could “borrow” from in writing their opinions.15

In one of the most comprehensive studies of judicial learning, Klein (2002) examined how judges on the Courts of Appeals make law in cases of first impression. He found that judges were often persuaded by the reasoning of their colleagues in another circuit, leading judges across circuits to agree on the proper rule more often than not. Westerland, Segal, Epstein, Cameron, and Comparato (2010) reach a complementary conclusion in their comprehensive study of circuit courts’ treatment of Supreme Court precedent. They find that as the number of positive treatments of an existing Supreme Court precedent increases within a given circuit, future panels are more likely to treat the precedent favorably as well (a symmetric finding holds with respect to negative treatments of precedents). Relatedly, a number of papers have examined whether “herding behavior” across circuits occurs—that is, judges following the decisions of earlier circuits rather than relying on their independent information (Talley, 1999; Daughety & Reinganum, 1999; Baker & Malani, 2015).

Vertical learning is also an important part of hierarchical judicial politics. The Supreme Court, of course, can choose to weigh in on a new legal issue and set a national legal policy. As part of the voluminous literature on the Court’s case selection (which dates back at least to Tanenhaus, Schick, Muraskin, & Rosen, 1963), scholars have asked how the justices use the certiorari process to select the best cases to make legal policy. In his seminal book based on interviews with the justices and their clerks, Perry (1991) identified several strategies, including searching for cases that serve as “good vehicles” to make law and avoiding those with “bad facts.”

An additional criterion that Perry identifies—and perhaps the key one political scientists have focused on—is the presence of inter-circuit conflict, which substantially increases the likelihood that the Supreme Court will review a case (see e.g., Ulmer, 1984; Caldeira & Wright, 1988). One reason for this relationship is that conflict imposes costs on the legal system, given that the law means one thing in one circuit and another in a different circuit. Beyond this consideration, however, conflict provides the justices with an opportunity to learn about the desired legal rule in a new issue. Returning to Klein’s (2002) study, while judges sometimes are persuaded by colleagues in other circuits, other times they disagree with their fellow judges’ reasoning, and make the independent judgment to decide in the opposite direction, thereby creating a conflict (if one does not already exist). If multiple circuit courts weigh in on an issue, the justices can aggregate the decisions of the circuits when deciding which side to come down on. Indeed, Lindquist and Klein (2006) show that when the justices review circuit conflicts, they are more likely to come down on the side of the issue that was favored by a majority of the circuits, suggesting that the justices are engaging in vertical learning.
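One simple way to see why counting circuits can be informative is the logic of the Condorcet jury theorem. The sketch below assumes that each circuit independently picks the better rule with the same probability p. That independence assumption is an illustrative one, not a claim made by Lindquist and Klein (2006), and it is exactly what herding behavior across circuits would undermine.

```python
from math import comb


def majority_correct(n_circuits: int, p: float) -> float:
    """Probability that a strict majority of n independent circuits picks the better rule."""
    needed = n_circuits // 2 + 1  # smallest number of circuits that forms a strict majority
    return sum(
        comb(n_circuits, k) * p**k * (1 - p) ** (n_circuits - k)
        for k in range(needed, n_circuits + 1)
    )


single = 0.6  # hypothetical accuracy of one circuit
three = majority_correct(3, single)
seven = majority_correct(7, single)
```

With p = 0.6, a majority of three circuits is right with probability 0.648, and the figure rises further with seven circuits, so under these assumptions siding with the majority of circuits is a sensible way for the justices to aggregate information.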

The nature of this learning has implications for how the justices should decide when to issue a new rule, and how the justices might balance obtaining information against the costs of waiting to establish a key precedent. Clark and Kastellec (2013) present a model of certiorari in which the justices learn from independent decisions by lower courts, which choose to implement one of two possible rules. Conflict, should it arise, is costly to the justices. The justices thus face an optimal stopping problem. Allowing successive lower courts to weigh in on an issue provides the Supreme Court with more information and increases the likelihood that the Court will select the best rule. However, the existence of conflict is costly to the Supreme Court, and so the justices must weigh this cost against the benefit of further information from future lower court decisions. One prediction from the model is that the Court should be more likely to end a conflict immediately—rather than waiting for additional courts to weigh in—when a conflict emerges after several lower courts have already weighed in on a new legal issue. The authors present evidence in support of this result.
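The value of waiting can be illustrated with a simple Bayesian update. The sketch assumes two candidate rules, a prior belief that rule A is better, and lower courts that each independently choose the better rule with probability q. This binary-signal setup and the names below are illustrative assumptions rather than the specification in Clark and Kastellec (2013).

```python
def posterior_rule_a(votes_a: int, votes_b: int, q: float, prior: float = 0.5) -> float:
    """Posterior probability that rule A is better after observing lower court choices."""
    like_a = q**votes_a * (1 - q) ** votes_b  # likelihood of the votes if A is better
    like_b = (1 - q) ** votes_a * q**votes_b  # likelihood of the votes if B is better
    return prior * like_a / (prior * like_a + (1 - prior) * like_b)


# Hypothetical accuracy q = 0.75 for every lower court.
early_conflict = posterior_rule_a(1, 1, q=0.75)  # conflict appears after two courts
late_conflict = posterior_rule_a(4, 1, q=0.75)   # conflict appears after five courts
```

In this sketch an early one-to-one split leaves the Court at its prior of 0.5, so further decisions are valuable, whereas a split that appears after four courts have agreed still leaves the Court fairly confident (27/28 under these numbers), so little is gained by tolerating the conflict. This is consistent with the intuition behind the prediction described above.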

One limitation in this model is that the lower courts’ behavior is unmodeled. In contrast, Beim (2016) presents a theory in which the Supreme Court can learn from the types of arguments raised in lower court cases, which is an important component of appellate court decision making (Johnson, Wahlbeck, & Spriggs, 2006). In the theory, the Supreme Court chooses one of three doctrines—liberal, moderate, or conservative—in the presence of uncertainty about which doctrine is best. Two lower courts make decisions, but the Supreme Court can review at most one of these decisions. The Supreme Court cannot directly observe the arguments raised by the litigants but can make inferences about those arguments, based on the lower courts’ decisions. In equilibrium, the Supreme Court is most likely to review cases from the side of the conflict it eventually rules against, because these cases are most informative. This finding accords nicely with the empirical reality that the Supreme Court is more likely to reverse than affirm the cases it chooses to decide.

The Existence, Nature, and Scope of Precedent

Once a court decides to issue (or modify) a legal rule, it must decide on the form and structure of that rule. The importance of stare decisis in common law systems means that judges make and implement rules through the introduction and interpretation of legal precedents. One first-order question here is why judges with preferences over legal policy would choose to abide by a system of stare decisis in the first place. One potential answer comes from team theory—judges with shared conceptions of the law will not mind following the previous decisions of like-minded judges. But this is not so interesting from an analytical perspective. As Jerome Frank put it, “Stare decisis has no bite when it means merely that a court adheres to a precedent that it considers correct. It is significant only when a court feels constrained to stick to a former ruling although the Court has come to regard it as unwise or unjust” (Foo v. Shaughnessy, 234 F.2d 715 [1955]).

Accordingly, scholars have looked for theories in which judges endogenously benefit from a system of stare decisis—that is, “judges care about precedent because they care about policy” (Bueno de Mesquita & Stephenson, 2002, p. 755). One set of related papers argues that stare decisis emerges from an equilibrium in a repeated game between judges with divergent preferences (Rasmusen, 1994; O’Hara, 1993; Cameron & Kornhauser, 2015). In such models, Judge A follows precedents by Judge B that Judge A does not like, in exchange for the assurance that Judge B will follow her precedents, when Judge B would in fact prefer to break with precedent and rule in accordance with her own preferences.

A second explanation focuses on the informational value of precedent—and the costs of breaking precedent. In the model presented in Bueno de Mesquita and Stephenson (2002) of higher court–lower court communication, each additional decision in a line of precedent decreases the variance of a trial court’s decision, given that appellate court decisions provide only a noisy signal to the trial court of the appellate court’s preferred legal rule. Under some circumstances, an appellate court judge will be incentivized to maintain an existing precedent rather than break precedent and move the legal rule to her ideal point, because the latter strategy increases the chance that the trial court will make a decision that is quite distant from the appellate court’s ideal rule. In a related article, Gennaioli and Shleifer (2007) present a theory in which judges can distinguish existing precedents—but at some cost. The authors find that, in the presence of a hierarchy with ideologically polarized judges, the ability of judges to distinguish precedents imposes some costs on the legal system, as judges are able to implement their own “biased” preferences. However, there is a benefit, as continued precedential distinctions by judges with diverse preferences improve the quality of the law by making legal rules more precise.

Rules and Standards

Given the importance of stare decisis in the hierarchy, combined with the fact that most cases are adjudicated by lower courts, a key decision for a higher court when setting or modifying a legal rule is how much flexibility to give lower courts in applying a precedent. This dilemma is often modeled as a choice over “rules versus standards,” a question that has long been debated by legal academics (see e.g., Kaplow, 1992; Sullivan, 1992). A rule is more rigid in that it carefully prescribes the correct mapping between a set of case facts and the outcome in that case. For example, Miranda v. Arizona (384 U.S. 436) created a bright-line rule that required the police to explicitly warn suspects about their right to remain silent. A standard, on the other hand, allows for more discretion in the interpretation of the mapping between facts and outcomes. The Miranda warnings replaced a “totality of the circumstances” test in which lower courts were allowed to weigh a variety of factors in order to determine whether a confession was legally admissible.

With respect to hierarchical control, the choice of rules versus standards presents a policy-making higher court with a tradeoff. Rules mitigate the problem of lower court judges following their own preferred rule instead of that of the higher court, but at the cost of preventing lower courts from taking into account circumstances under which the bright-line rule leads to the wrong outcome from the perspective of the higher court, given those circumstances. Standards allow for such flexibility, but at the cost of potentially enabling greater noncompliance. Lax (2012) presents a model in which a higher court chooses a doctrine in an area of the law with two relevant case facts, where one is readily observable to the higher court but the other can only be discerned imperfectly (or has a subjective component), unless the higher court reviews a case. The higher court must choose between a bright-line rule that divides cases on the observable dimension and ignores the unobservable dimension, and a standard that incorporates both dimensions. Intuitively, standards are more appealing when the lower courts are more likely to share the preferences of the higher court. Lax also shows that the higher court is more likely to choose a standard as the transparency of the unobservable case fact increases, as the sensitivity of the higher court’s preferred rule to the unobservable case fact increases, and as the salience of the issue area increases. Jacobi and Tiller (2007) present a similar model in which a standard comprises a range of acceptable policies in a two-dimensional policy space. In addition to considering the likelihood of agreement between higher and lower courts, this model also allows for higher courts to have preferences that depend on litigant status, as well as the choice of rules versus standards over procedural dimensions.16
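
The tradeoff between rules and standards can be made concrete with a toy computation. The Python sketch below is only an illustration under stated assumptions and is not the model in Lax (2012): the integer fact scales, the higher court's preferred doctrine (2x + y > 20), the bright-line cutoff (x >= 8), and the lower court "shift" values are all hypothetical. It also assumes that a bright-line rule on the observable fact is applied exactly as written, while a standard is applied using the lower court's own threshold.

```python
from itertools import product

# All numbers below are hypothetical and chosen only for illustration.
FACT_SCALE = range(21)  # each case fact takes integer values 0..20


def higher_court_ideal(x, y):
    """Outcome the higher court would want, using both case facts."""
    return 2 * x + y > 20


def bright_line_rule(x, y):
    """Rule on the observable fact x alone; y is ignored by design."""
    return x >= 8


def standard_as_applied(x, y, shift):
    """Standard using both facts, applied with the lower court's own threshold.

    shift = 0 means the lower court shares the higher court's preferences.
    """
    return 2 * x + y > 20 + shift


def count_disagreements(doctrine):
    """Number of cases decided differently from the higher court's ideal."""
    return sum(
        doctrine(x, y) != higher_court_ideal(x, y)
        for x, y in product(FACT_SCALE, FACT_SCALE)
    )


rule_errors = count_disagreements(bright_line_rule)
aligned_errors = count_disagreements(lambda x, y: standard_as_applied(x, y, 0))
close_errors = count_disagreements(lambda x, y: standard_as_applied(x, y, 4))
distant_errors = count_disagreements(lambda x, y: standard_as_applied(x, y, 16))

print(rule_errors, aligned_errors, close_errors, distant_errors)
```

In this toy setup the standard produces fewer outcomes the higher court dislikes than the rule does when the lower court's threshold is close to the higher court's, and more when the lower court's threshold is far away, which mirrors the intuition that standards are more appealing when lower courts are likely to share the higher court's preferences.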

While valuable, these models have a few limitations: they do not explicitly model lower court behavior, they assume a one-shot game, and they assume the relevant case facts (i.e., the dimensions) are fixed. Baker and Kim (2012) develop a theory that explicitly includes lower courts in a repeated game where a new case factor may emerge. In the model, the Supreme Court announces a new precedent, which falls on a continuum ranging from a pure rule to a pure standard. Subsequently, cases arise containing a new case factor that the lower court believes should be included in the doctrine—the lower court, however, only probabilistically agrees with the Supreme Court that this case factor is relevant. The Supreme Court then decides whether to retain its original doctrine or alter it. The key insight of the model is that an equilibrium exists where the Supreme Court may oscillate between a rule-based doctrine and a standard-based doctrine as a function of the extent to which the lower courts mistakenly implement a standard set by the Supreme Court (based on its preferences on a new case factor). Baker and Kim (2012, pp. 350–352) argue that this oscillation can be seen in the Supreme Court’s doctrine regarding the Sixth Amendment’s confrontation clause in recent decades.17

The Scope of Precedent

A related but distinct question for higher courts when establishing precedents is determining what Kornhauser (1995, p. 1609) calls the scope of precedent: “the range of cases governed by a prior decision.” When judges apply precedents, they are effectively engaging in analogical reasoning—how closely tied are the facts in Case B to the facts in Case A, where a precedent was set? As Case B moves further away from Case A, the force of precedent diminishes. When issuing a doctrine, a higher court has a choice over the scope of precedent: it can choose a narrow scope in which only cases that are very near to the precedent-setting case are governed by that precedent (i.e., a results-based approach), or can rule more broadly by articulating a broader precedent that is intended to cover a wider range of cases (i.e., a rule or principle-based approach).

What, then, is the tradeoff between the two? Clark (2015) presents a model in which the Supreme Court strategically sets the precision of its doctrine, which determines how closely tied a given precedent is to the facts of the precedent-setting case. The tradeoff is that “narrowly-tailored” precedents—that is, ones with more precision—produce more certain outcomes by lower courts in cases near the precedent-setting case, but at the cost of allowing for greater variability in outcomes (from the perspective of the higher court’s preferred doctrines) when cases are distant from the original case; a more principle-based precedent with less precision accomplishes the reverse. The key explanatory variable in the model is the distribution of case facts across an issue area: when the precedent-setting case is more representative of the underlying distribution of case facts, the Court will write a more precise precedent, because it will cover a wider range of cases. Clark then analyzes the Court’s incentives to hear additional cases—as the Court’s willingness to hear additional cases increases, it becomes more likely to write narrower opinions, because it can craft its doctrine across a series of cases. Thus, areas of the law with less “investment” should see more frequent use of principle-based decisions (see also Callander & Clark, 2016).18

Badawi and Baker (2015) develop a nuanced theory that embeds the decision about the scope of a precedent within a larger game of appellate court lawmaking and trial court implementation in a judicial hierarchy. One nice feature of this model is that it not only incorporates strategic trial court judges (and thus situates itself in the agency theory literature), but also allows for two types—“legalist judges” who attempt to apply existing precedent as best they can, versus “realist judges” who rule simply based on their own preferences. In their model, precedent takes the form of partitions in a case-space: cases above some threshold are treated one way (e.g., the defendant in a tort case is held liable), cases below a lower threshold are decided the other way, and there is uncertainty about cases in the “middle” of the two thresholds.19 One of the many interesting results in the paper is that an appellate court can use “dicta”—that is, language in an opinion that is not essential to reaching the holding in the case—in order to stretch the scope of a precedent, which makes it more likely that the legalist trial court will implement the appellate court’s preferred doctrine.20
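
The idea of precedent as a partition of a case-space can be sketched in a few lines of Python. This is a simplified illustration and not the Badawi and Baker (2015) model itself: the one-dimensional case location, the threshold values, the judge's own cutoff, and the assumption that a legalist judge falls back on her own preference in the unsettled middle region are all hypothetical choices made for the example.

```python
def precedent_outcome(x, lower, upper):
    """Outcome dictated by precedent, or None if the case is unsettled."""
    if x > upper:
        return "liable"
    if x < lower:
        return "not liable"
    return None  # the "middle" region between the two thresholds


def own_preference(x, own_cutoff):
    """What the judge would decide with no precedent at all."""
    return "liable" if x > own_cutoff else "not liable"


def legalist_ruling(x, lower, upper, own_cutoff):
    """Follow precedent where it is settled; otherwise use own preference (assumed)."""
    settled = precedent_outcome(x, lower, upper)
    return settled if settled is not None else own_preference(x, own_cutoff)


def realist_ruling(x, own_cutoff):
    """Ignore precedent and rule on own preference."""
    return own_preference(x, own_cutoff)


# Hypothetical case and judge.
case = 0.6
judge_cutoff = 0.8

# Narrow precedent: the case falls in the unsettled middle region.
before = legalist_ruling(case, lower=0.3, upper=0.7, own_cutoff=judge_cutoff)

# Dicta "stretch" the precedent so that its upper threshold now covers the case.
after = legalist_ruling(case, lower=0.3, upper=0.5, own_cutoff=judge_cutoff)

realist = realist_ruling(case, judge_cutoff)

print(before, after, realist)
```

Under these assumptions, widening the region that the precedent covers changes the legalist judge's ruling but leaves the realist judge's ruling untouched, which is the sense in which stretching the scope of a precedent can matter for implementation.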

The Importance of Collegiality

Recall that the number of cases decreases as one moves up the hierarchy—thus, caseload can be viewed as a pyramid. Accordingly, the number of judges assigned to each level decreases as well.21 However, the hierarchy also has a reverse pyramidal feature: the number of judges hearing each case increases at each tier. Trials are overseen by a single judge; intermediate appeals courts usually sit in panels of three (chosen from among a larger group of judges assigned to a specific jurisdiction); finally, supreme courts usually comprise an odd number of judges larger than five and usually sit together (i.e., en banc) in every case. Thus, appellate courts are always multimember, or collegial.

Why is this the case? According to Posner (1985, p. 12), having multimember courts is necessary to “reduce the power of any individual, ameliorate the consequences of low-quality appointments, obtain the benefits of collective decision deliberations, and in all these ways enhance the legitimacy and hence the authority of the court’s decisions.” Having more judges hear cases at the highest level is thus desirable given the importance of the Supreme Court’s decisions. Yet, all of the theories discussed so far assume that both higher courts and lower appellate courts operate as unitary actors. In recent decades, scholars of the judicial hierarchy have moved away from this assumption to examine the effects of collegiality—in particular, how the existence of collegiality affects both the policy making of higher courts and compliance behavior of lower courts.

Collegiality on the Supreme Court

The Supreme Court communicates the scope of its precedent and choice of rules via judicial opinions. If the Court had a single judge, understanding what the Court wants would be relatively straightforward. However, because an opinion of the Court aggregates (somehow) the preferences of multiple justices, discerning the intent and scope of an opinion can be more complicated. Even more complicating is that, unlike legislation, judicial opinions do two things: they first announce which party wins and which party loses (i.e., the “disposition”), and then announce a rule or rationale justifying this decision. From the perspective of the judicial hierarchy, it is the rule that we (and generally the justices as well) care about, because the rule governs behavior by future litigants and decision-making by lower court judges, whereas the disposition applies only to the parties in a given case.

This distinction has been considered in one of the key debates in judicial politics: who controls the majority opinion on the U.S. Supreme Court? There are two ways to approach this debate. First, there is the narrower question of where would we expect an opinion to be located in opinion space, given the Court’s institutional procedures. Second, and more relevant here, is how the effects of collegiality pertain to the relationship between the rules established by the Supreme Court, and the lower courts that must interpret those rules.

With respect to the first question, a natural starting point for discerning the location of the majority opinion is the median justice on the Court. Overlaying the canonical median voter theorem onto the Court (assuming that the justices have single-peaked preferences on a single dimension) results in the prediction that every opinion of the Court should be located exactly at the ideal point of the median justice (Martin, Quinn, & Epstein, 2004). Yet, this would mean that opinion authorship (and hence, opinion assignment) is irrelevant, something few observers of the Court would claim. This apparent discrepancy has led scholars to produce a range of theories that suggest the majority opinion is not located at the median justice.22 One offering comes from Hammond, Bonneau, and Sheehan (2005), who present an agenda-setting model in which the opinion author presents a candidate opinion, and the median justice endorses this opinion if it is closer to her ideal point than some exogenous status quo. This theory, however, assumes too little power for the median justice, who can in reality always write an opinion that cannot be beaten, under the median voter theorem. Lax and Cameron (2007) present a theory in which authors can endogenously endow their opinions with quality—all justices value higher-quality opinions, because they are implemented with lower variance subsequent to the Court’s decision. The key result is that the location of the opinion usually falls between the author and the median justice, given that in equilibrium the author works hard enough to provide an opinion with sufficient quality in order to preempt any other justice from investing in writing an alternative opinion that the median justice would endorse.

Taking a different tack, Carrubba, Friedman, Martin, and Vanberg (2012) develop a theory that exploits the distinction between dispositions and rules. They begin by assuming that there exist some cases where the median justice will value the disposition sufficiently (say, in a death penalty case) that she will not be willing to “trade off” her dispositional vote to secure a rule closer to her ideal point. If so, then bargaining over the rule is restricted to the justices in the majority coalition. Hence, in such cases the opinion should be located at or near the median of the majority coalition (essentially, this theory shifts the prediction of the basic median voter theorem to the majority coalition). Using an innovative measure of opinion location based on citation patterns, Clark and Lauderdale (2010) show that the location of the median of the majority coalition does better at predicting opinion location than either the location of the median justice or the opinion author, providing support for Carrubba et al.’s theory.
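
The two competing predictions about opinion location, the median justice versus the median of the majority coalition, can be contrasted with a small Python sketch. The ideal points and the case cutpoint below are hypothetical, and the voting rule (each justice's dispositional vote is determined by which side of the cutpoint her ideal point falls on) is a simplifying assumption made for this example rather than a feature taken from Carrubba et al. (2012).

```python
from statistics import median

# Hypothetical ideal points for nine justices on a single left-right dimension.
ideal_points = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

# Hypothetical cutpoint: justices to the left vote for one disposition,
# justices at or to the right vote for the other (a simplifying assumption).
case_cutpoint = 0.25

# Prediction 1: the opinion sits at the median justice of the full Court.
court_median = median(ideal_points)

# Prediction 2: the opinion sits at the median of the majority coalition.
left_bloc = [p for p in ideal_points if p < case_cutpoint]
right_bloc = [p for p in ideal_points if p >= case_cutpoint]
majority = left_bloc if len(left_bloc) > len(right_bloc) else right_bloc
coalition_median = median(majority)

print(court_median, coalition_median)
```

With these made-up numbers the median justice joins a five-member majority, so the two predictions diverge: the full-Court median is the rightmost member of the coalition, while the coalition median lies well to her left.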

In terms of their insights into the Supreme Court’s bargaining processes, these papers are invaluable. But, returning to the second implication of the debate over the location of the majority opinion, some puzzles emerge when these theories are considered in the larger context of the judicial hierarchy. First, because the lower court’s implementation of a Supreme Court opinion is unmodeled in Lax and Cameron (2007), it is unclear how higher quality reduces the variance of lower court decisions. Second, the structure of the model in Carrubba et al. (2012) creates (as the authors recognize on p. 410) what we might call a “dynamic consistency problem.” Imagine a lower court applies a rule announced by the Court located at the median of the majority coalition. There will be a range of cases in which the Court’s stated rule implies dispositions that are disfavored by the median justice—and hence the lower court’s dispositional choice would be reversed if the Court reviewed the case. The lower court faces the choice of either faithfully applying the rule and risking reversal, or breaking with existing precedent by voting in accordance with the preference of the median justice.23 Both the agency and team models discussed suggest that such a game is not one that is beneficial for the operation of the hierarchy.

Returning to those agency models, the fact that the Supreme Court is multimember also has implications for theories of strategic auditing. Building off the auditing model presented in Cameron, Segal, and Songer (2000), Lax (2003) develops a theory that analyzes the Court’s “rule of four”—under which only the votes of a sub-majority of four justices are needed to grant cert and review a case. The model shows that the rule of four actually makes review of noncompliant decisions by lower courts more credible, and hence enhances the power of the median justice by reducing the incidence of noncompliance. This paper provides a nice example of the many non-obvious ways in which collegiality can intersect with the hierarchy to affect decision-making at each level.

Collegiality on Lower Courts

Intermediate appellate courts are also multimember, but the structure of collegiality differs from the Supreme Court in important ways. Whereas the same nine justices (within a natural court) hear every case on the Supreme Court, intermediate appellate cases are generally decided by rotating panels of judges—usually comprising three judges—chosen among the larger group of judges in the court’s jurisdiction. On the U.S. Courts of Appeals, for example, cases are heard by panels of three judges chosen among the judges of a given circuit, in a process that is effectively random.

This institutional structure has given rise to a large literature on “panel effects” on the Courts of Appeals, which examines how the voting behavior of judges varies depending on the characteristics of their colleagues in a given case. Perhaps the core finding in this literature, which dates back to Revesz (1997), is that the propensity of judges to vote liberally increases with every additional liberal judge that is added to a panel, and vice versa (see also Sunstein, Schkade, Ellman, & Sawicki, 2006; Kastellec, 2011b; Epstein, Landes, & Posner, 2013). A parallel line of inquiry has examined the effect of gender and racial diversity on panels—male judges are more likely to support female plaintiffs in employment discrimination and sex discrimination cases when they sit with even a single female colleague (Farhang & Wawro, 2004; Boyd, Epstein, & Martin, 2010), and white judges are more likely to vote in the pro-minority direction in voting rights and affirmative action cases when they sit with even a single African American colleague (Cox & Miles, 2008; Kastellec, 2013). These studies collectively demonstrate that having multimember panels instead of single judges hear intermediate appeals can lead to qualitatively different outcomes in a significant proportion of cases.

What explains the existence of panel effects? One set of explanations focuses on mechanisms that are internal to the panel. These include deliberation effects among the three judges on the panel (Sunstein et al., 2006); voting effects, where a judge may disagree with her colleagues in the panel majority, but rather than cast a dissenting vote (which may be a costly activity), she decides to go along with them (Fischman, 2011, 2015); and a “presence” effect, in which the presence of a judge who is different from the other two judges on some observable dimension (e.g., race) causes them to approach a case differently (Kastellec, 2013).

From the perspective of the larger judicial hierarchy, it is more relevant to consider the external explanation for panel effects, which focuses on the relationship between the preferences of the judges on a three-judge panel and those of the reviewing court(s) above them. In their seminal paper, Cross and Tiller (1998) introduced a theory of judicial “whistleblowing,” which argued that compliance by a lower court comprising a majority of judges opposed to a higher court’s doctrine was more likely if the panel contained a third judge who was aligned with the higher court. This judge could “blow the whistle” to the higher court via a judicial dissent if the panel majority engaged in noncompliance. The threat of dissent, in turn, is highly credible, because dissent has long been shown to be highly predictive of review by courts with discretionary dockets, due to its informational value (see e.g., Tanenhaus, Schick, Muraskin, & Rosen, 1963; Songer, 1986; Perry, 1991; Caldeira, Wright, & Zorn, 1999). Cross and Tiller found support for their theory by examining the D.C. Circuit’s compliance with the Supreme Court’s Chevron doctrine. Kastellec (2007) formalized the key insight from the whistleblowing model: the random assignment of a judge with preferences aligned with a higher court can reduce the likelihood of noncompliance by a lower court, because a dissent would lower the cost of review by a higher court, making discretionary review and subsequent reversal more likely.

The literature on judicial whistleblowing has been extended in a number of theoretical and empirical directions. First, Kim (2008) and Kastellec (2011a) argued that the panel’s relationship to both the full circuit—which can review panel decisions en banc—and the Supreme Court can affect the likelihood of whistleblower-based effects. Using data from numerous areas of the law, Kastellec (2011a) showed that a panel majority is most likely to be influenced by the presence of a potential whistleblowing judge when the majority is aligned against both the full circuit and the Supreme Court and the potential whistleblower is aligned with them. The upshot of this result is that the variance of decision-making across panels in terms of liberal versus conservative decisions is greatly reduced, compared to a world in which panel effects did not exist.

Whereas the model presented in Kastellec (2007) assumes perfect information, and hence dissents do not occur in equilibrium, Beim, Hirsch, and Kastellec (2014) develop a more nuanced theory of an interactive game between a unitary higher court and a multimember lower court. There are two types of imperfect information. First, the higher court cannot observe the case facts unless it pays to review the lower court’s decision. Second, the lower court does not know the higher court’s cost of review. The utility of all judges is a function of the distance between their indifference point in a case space and the location of the case facts. The key insight of the model is that dissent is most valuable to the higher court when it is most rare, because the higher court wants to save its resources for reviewing cases where noncompliance is most severe—that is, cases where it really prefers one disposition over the other. Because more ideologically extreme judges dissent “too often” from decisions they disagree with, panel effects (and hence compliance) are maximized with potential whistleblowers who are more aligned with the panel majority. A related result is that dissents by such “moderate” whistleblowers should be more likely to trigger review by the higher court. Beim, Hirsch, and Kastellec (2016) test this prediction on full circuits’ decisions to review panel judgments en banc, and find support for it: cases with dissents from judges who are closely aligned with the panel majority are much more likely to be reviewed en banc than cases with dissents from more ideologically extreme judges. In sum, then, the literature on collegiality on lower courts demonstrates that having multimember appeals courts has advantages even beyond the core benefits identified by Posner.

Conclusion: Future Theoretical and Empirical Avenues

The literatures discussed reveal that scholars have made tremendous strides in bolstering our understanding of the judicial hierarchy. Some brief (and potentially idiosyncratic) thoughts are offered on possible fruitful avenues for further theoretical and empirical advances.


It is only recently that scholars have begun to examine in depth how the judicial hierarchy enables learning, and further development on this front is surely needed. Because constructing the common law is inherently an incremental (i.e., case-by-case) process, it seems likely that one-shot models of the hierarchy can go only so far in helping us to understand how higher courts and lower courts learn from each other in an iterative fashion. One attractive feature of the work of Baker and his colleagues (Badawi & Baker, 2015; Baker & Kim, 2012) is that it explicitly places the Supreme Court and the lower courts in a repeated game, where the higher court simultaneously monitors and learns from lower courts. Whereas the error correction theories in both the agency and team camps have largely proceeded on parallel tracks from learning models of the hierarchy, this work suggests the two could profitably be merged to consider both the problems and opportunities of hierarchy for the Supreme Court in an integrated fashion. One promising empirical paper along these lines comes from Hansford, Spriggs, and Stenger (2013), who argue that the Supreme Court learns from how lower courts treat a newly established Supreme Court precedent.

Next, there is substantial room for further understanding the importance of collegiality on each of the appellate tiers in the hierarchy. With respect to the Supreme Court, as discussed previously, the leading theories of bargaining on the Court—which, recall, emphasize how the median justice is not all-powerful—raise questions about the relationship between the rule announced by the Court and how lower courts should interpret it. Relatedly, these theories largely ignore any possible role for concurring and dissenting opinions to influence lower court behavior—is an opinion issued by a divided Court less likely to be implemented faithfully by a lower court? While this possibility has found some support in empirical studies based on citations, theoretical understanding of the top-down effects of collegiality on lower court rule implementation remains limited.

Turning to the importance of collegiality on intermediate appeals courts, one limitation of the literatures discussed is that they exclusively revolve around agency models of noncompliance. While whistleblowing dynamics are certainly crucial to understanding compliance in the hierarchy, it is possible that heterogeneity on appellate panels may also contribute to learning. In a learning world, having multiple decision-makers evaluate new issues within the context of a single case may provide information to the Supreme Court as it resolves new issues—just as having multiple panels provides information to the Court in the learning models discussed previously. In particular, whether a decision by a three-judge panel is unanimous or split may be useful to the justices in terms of the likelihood of the Supreme Court adopting a particular doctrine. This possibility, however, has not been explored in the literature.


While a number of empirical works have been discussed in the context of the larger theoretical themes evaluated here, these citations do not do justice to the host of impressive empirical studies on the judicial hierarchy, stretching back to the early behavioralist days of judicial politics and continuing to the modern day. For example, as previously discussed, Tanenhaus, Schick, Muraskin, and Rosen (1963) presented one of the first statistical analyses of certiorari. The work of Sickels (1965) and Atkins (1973) on intermediate appellate courts can be viewed as a forerunner of the panel effects literature. The themes explored in Howard’s (1981) magisterial study of the Courts of Appeals continue to resonate today (see e.g., Hettinger, Lindquist, & Martinek, 2006). In addition, both the Supreme Court and Courts of Appeals databases created and maintained by Harold Spaeth and Donald Songer, respectively, have produced hundreds of invaluable studies on the judicial hierarchy.24 Finally, due to the innovation of Epstein, Martin, Segal, and Westerland (2007), there now exist measures of ideal points that place all federal judges on the same scale, which enables more precise testing of hierarchical theories (see e.g., Giles, Walker, & Zorn, 2006; Westerland, Segal, Epstein, Cameron, & Comparato, 2010; Carrubba & Clark, 2012).

The most successful empirical literatures tend (not surprisingly) to be based on outputs that are easily observable: quantities such as votes, dissents, cases granted certiorari, and reversals. The recent book by Epstein, Landes, and Posner (2013), which provides a comprehensive empirical examination of each level of the federal courts, is a nice example of what can be learned from the study of these actions. Among many things, the authors apply the logic of the attitudinal model—a highly influential model that itself is built on the study of dispositional votes by the Supreme Court—and show that ideology plays an increasingly important role as one moves up the hierarchy, which makes sense given the types of cases that are likely to reach the appellate courts (see also Zorn & Bowie, 2010). They also devote a chapter to an extensive examination of dissent and dissent aversion on the Courts of Appeals and the Supreme Court.

In contrast, measuring doctrine is much more difficult. Not surprisingly, then, the gap between theoretical models outlining the importance of legal doctrine and our ability to measure it is sizable. To be sure, political scientists do recognize the importance of studying doctrine, and not simply dispositional votes. This recognition can be seen in the voluminous and important literature on the transmission, maintenance, and influence of legal precedent at each level of the hierarchy, including state as well as federal courts, through the use of citations (see e.g., Caldeira, 1985; Spriggs & Hansford, 2001; Westerland et al., 2010; Hinkle, 2015). However, while precedents and doctrine are related, they are not identical. And it seems unlikely that examining precedents in isolation will allow us to evaluate some of the more nuanced theories of legal rules and doctrine discussed.

One research tradition that seems more promising (and one that has a long lineage in judicial politics—see Kort [1957] and Segal [1984]) is the study of fact-pattern analysis. In such studies, researchers examine the relationship between case facts and judicial outcomes, in an effort to quantitatively discern the mapping between case facts and case outcomes in a particular area of the law. This approach surely does capture doctrine, which is inherently about this mapping. Yet, in studies of the hierarchy, the typical approach has been to compare how the Supreme Court decides cases as the case facts vary, and then to see if the lower courts follow a similar pattern (see e.g., Songer, Segal, & Cameron, 1994; Benesh, 2002). There are two potential issues here. First, as Kastellec and Lax (2008) show, the Supreme Court’s discretionary docket potentially creates selection bias in inferring the justices’ preferred doctrine based solely on the cases they hear. Second, just evaluating the pure mapping between facts and outcomes does not necessarily translate into understanding and evaluating theories of doctrine.

For example, consider the question of whether, as an empirical matter, rules are actually more constraining than standards—something that most observers believe is true, but a proposition for which there is not much systematic evidence. Using the straightforward case-fact approach will not necessarily work here, since we need to also know how much discretion the lower courts should have over particular dimensions of a case, based on the Supreme Court’s choice of rules versus standards. In a clever research design, Smith and Todd (2015) exploit the Supreme Court’s shift from a rule to a standard in its doctrine regarding the admissibility of information supplied by confidential informants, in order to test whether lower courts were more constrained by a rule rather than a standard. While the empirical results are mixed, the design demonstrates how qualitative understandings of doctrine may be fused with empirical tests in order to understand lower courts’ treatment of Supreme Court rules. Similarly, the “jurisprudential regimes” theory of Richards and Kritzer (2002) relies on their understanding of transformative precedents in order to predict significant shifts in the Court’s doctrine, which subsequently has effects on lower court decision-making (Luse, McGovern, Martinek, & Benesh, 2009). While Lax and Rader (2010) call into question the statistical conclusions reached by Richards and Kritzer (2002), their design is another example of how thinking harder about doctrine may be profitable.

As is the case with the theoretical literature, it seems fair to say that our empirical understanding of judicial learning lags behind our understanding of compliance and implementation of existing rules. One reason for this is that the U.S. Supreme Court has received a disproportionate share of attention from scholars, relative to its caseload and (arguably) its importance, given that lower courts dispose of the vast majority of cases. Building off the excellent example set in Klein (2002), a bottom-up approach that focused on the structure and nature of rule-making in the district and appellate courts would facilitate the study of doctrinal learning, because there is much more data to work with and fewer issues of selection to deal with. For example, how does a circuit court’s choice of the scope of precedent affect later doctrinal development in a circuit? How much variability in doctrine exists across jurisdictions? Although circuit splits are a useful measure of doctrinal discrepancies, even circuits that broadly agree on the side of an issue may disagree about the proper scope of a rule and how it should be implemented by future courts in that jurisdiction. Finally, more flexible analytical tools such as classification and regression trees (Kastellec, 2010) may enable the uncovering of deeper structure in issue areas involving high-dimensional fact spaces.

In conclusion, scholars have made tremendous strides in enhancing our understanding of the judicial hierarchy. While many questions remain open, the research discussed strongly suggests that progress on both the theoretical and empirical front will continue in the coming years.


I thank Deborah Beim, Chuck Cameron, Tom Clark, Josh Fischman, and Lewis Kornhauser for helpful comments and suggestions.


