Datasets

Original Datasets Constructed or in Progress

Patent Court Caselaw Database of District Court and Federal Circuit Court of Appeals Decisions: In this database, I code more than 75,000 patent cases heard in the federal courts between 2000 and 2021 for case outcome, procedural posture, and various legal, economic, and political factors. I also code patent-specific information and the Federal Circuit’s treatment of each case that is appealed, and I add demographic information on the judges assigned to hear each case.

Patent Administrative Caselaw Database: In this database, I code over 12,000 cases heard in the inter partes review process before the Patent Trial and Appeal Board (“PTAB”). I separately code information on the administrative patent judges. Using FOIA, I obtained the judges’ resumes, and I analyze how their fields of expertise and past employment experience affect results in inter partes review cases. In other projects, I compare outcomes in PTAB cases with those in the district courts.

Copyright District Court Caselaw Database: I am constructing a database, modeled on the patent caselaw database, of the 48,000 copyright decisions issued by district courts from 2010 to 2020.

Trademark District Court Caselaw Database: I am constructing a database, modeled on the patent caselaw database, of the 58,000 trademark decisions issued by district courts from 2010 to 2020.

Immigration Court Database: I reconstructed over 100 CSV files from the Department of Justice to compile a dataset of all immigration court decisions from 1951 to the present. The dataset has over 9 million observations and several hundred variables drawn from the DOJ data and from supplementary datasets that I independently obtained or created. Each case has a host of case-related, noncitizen, judge demographic, political, economic, geographic, and institutional variables. I converted all the DOJ spreadsheets to Stata format; if you would like access to them, please email me. Later this summer, I will make available my Stata code showing how to use the immigration court database.

Board of Immigration Appeals Database: This database compiles all Board of Immigration Appeals (BIA) decisions from 1951 through the present, using records obtained from the Department of Justice and data obtained through a text analysis search of BIA decisions. It closely parallels the Immigration Court database: each case has a host of case-related, noncitizen, judge demographic, political, economic, geographic, and institutional variables.

National Labor Relations Board Database: This database consists of all NLRB decisions in unfair labor practice cases from 1993 to 2016. I code the outcome of each of the more than 3,000 cases, as well as various other legal, political, economic, and case-specific variables. In addition, I code each case for the statutory methodologies used by the Board.

National Labor Relations Board Appellate Court Database: In this database, I code all appellate court decisions arising from NLRB decisions, 1994-2018. I code legal, political, economic and case-specific variables for each of the more than 1,400 cases.

Environmental State Supreme Court Database: This dataset codes 1,000 environmental law cases heard in the state supreme courts from 1990 through 2015 for various political, legal, and case-specific factors. (Dataset available to the public; with Brandice Canes-Wrone and Tom Clark.)

Upon publication of my work, I intend to make these databases publicly available on this website and on the Harvard Dataverse/SSRN, subject to any constraints on third-party proprietary information. If you would like access now to any of the information I obtained from the government online or through FOIA requests, please email me. I converted many large DOJ and USPTO files available online from CSV/TSV to Stata or R format. Later this summer, I will also make available my Stata/R code for creating the immigration dataset, for anyone with statistical knowledge who wants to recreate it.