The longitudinal research database: status and research possibilities.
McGuckin, Robert H. ; Pascoe, George A., Jr.
Introduction
The Longitudinal Research Database (LRD) is a large micro
database(1) of establishment-level data constructed by pooling
information from the Census of Manufactures (CM) and the Annual Survey
of Manufactures (ASM). It is housed within the Census Bureau at the
Center for Economic Studies (the Center), which was established in 1982
to oversee the development of this database, to use the data to improve
future Census Bureau data collection and reports, and to make the data
available to outside users.
The construction of the database was itself a major achievement. It
contains linked data from 5 censuses and 11 annual surveys. There are
2,311,794 individual establishment year records currently in the file,
and it is updated as new data become available. Thus, the LRD is one of
the most ambitious and comprehensive data sets available for the study
of manufacturing, and it promises to provide an exciting and stimulating
research environment for many years. At the same time, the sheer
magnitude of the database, coupled with its complexity, means that
researchers must take the time to fully understand the structure of the
database before embarking on research. This paper outlines the
development of the database, its structure and current status, and the
possibilities for its use in economic research.
The discussion is organized into four sections. We begin with some
general observations on the characteristics that researchers desire in a
database. In particular, we focus on the need for micro-level detail to
adequately examine many economic issues. These observations provide the
framework for the more specific remarks in the remainder of the paper.
These remarks include a brief section outlining the origins of the LRD.
The main portion of the paper details the major components of the LRD,
the kinds of information included in the database, and the related data
sets available at the Center. Throughout, we try to describe the
research conducted at the Center as a way of providing concrete examples
of the kinds of activity the LRD will support. We then briefly discuss
access to the database and conclude with some observations intended to
provide an overall assessment of the usefulness and flexibility of the
LRD.
The Need For Detail in a Database
Economic analysis has a profound influence on data development.
Researchers often approach particular problems with a well-defined
theory, sophisticated econometric or statistical techniques, and data
that are inadequate or inappropriate for testing the theory. This
situation provides the incentive for developing new data. The theory
provides guidance and direction to the data development strategy.
Unfortunately, the need for better data often occurs when an answer to a
question is required in a timeframe too short to develop a new data set.
Even if there is time, the costs of developing new data are often
prohibitive. In these instances, the available data influence the
theory and the econometric procedures used. Thus, data development also
influences economic analysis.
In most research on production functions and total factor
productivity, data availability dictates the estimation procedures. The
absence of detailed data for specific producing units often causes
researchers to use aggregate data in econometric specifications.
Several recent papers using the LRD suggest the existence of substantial
aggregation bias in estimates of productivity relationships.(2)
Moreover, there are many productivity-related questions that simply
cannot be examined with aggregate data. John Solow (1987) argues
convincingly that it is impossible to determine whether energy is a
complement or substitute for other inputs using aggregate data (for
example, two-digit manufacturing industries).
As an example of the need for detailed data, consider the problem
of the measurement of trade flows and the technological leadership of
U.S. industry. Examinations of this problem have focused on the
high-tech trade balance defined in terms of trade flows measured at the
three-digit industry level. This level of aggregation was chosen
because high-tech industries are distinguished from low-tech industries
solely on the basis of research and development (R&D) to sales
ratios. Use of this procedure means that low-tech products are often
included in the high-tech industry category. For instance, the office
and computing equipment industry (Standard Industrial Classification
357) includes high-tech products, such as electronic computers and
peripheral computing equipment. It also includes low-tech products,
such as adding machines and coin counters. Conclusions based on such
aggregate numbers may be misleading.(3)
These examples show that the need for more detailed data is a
central feature of economic research. This need cuts across all applied
fields of economics. The LRD is a longitudinal micro database that
consists of individual establishment (plant) data and that provides a
substantial source of detailed data.
Other elements of data structure
Elements of data structure other than the level of aggregation are
also important for determining the usefulness of a data set to
researchers. Such elements are the aspects of the data used to classify individual records. Although it is unlikely that any list of categories
of economic data would satisfy all researchers, it is possible to list
typical categories that are required for most economic research. As
might be anticipated from the title of this paper, we view time as one
of the most important structural characteristics. Various
cross-sectional aspects of data are also regularly desired in economic
research. Although for some problems the plant may be the appropriate
unit for analysis, the firm or enterprise affiliation of the plant is
more important for other issues. The location, industry classification,
and size of the plant are other important aspects of the data structure
that are of particular interest to economic researchers. Each of these
variables has been made a part of the basic key structure of the LRD.
As the discussion proceeds, we will highlight these structural
characteristics of the LRD, but we will also emphasize that the LRD has
the flexibility to accommodate research requiring new key variables.
Origins of the LRD
In the late 1970's, the Census Bureau agreed to develop a
longitudinal database of individual establishments based on data
collected in the CM and the ASM. The project was carried out under the
direction of Richard and Nancy Ruggles of Yale University. Initial
funding was provided by the National Science Foundation (NSF), the Small
Business Administration, and the Census Bureau. The product of this
effort was the Longitudinal Establishment Database (LED), which contains
data for establishments for 1972 to 1981.
The Center was created to facilitate access to the LED file. Much
of the Center's early efforts at database development were focused
on a balanced panel of the LED file called the Time Series File.
However, it soon became obvious that a balanced panel strategy was
inappropriate. Exits due to plant closings continually reduced the
number of plants in the file. Adding to the decline in the number of
plants operating continuously were changes in the sample design used to
collect data in noncensus years. Furthermore, analysis of the births of
new plants and firms had extensive direct policy and research interest.
In particular, many of the questions of interest to researchers required
a focus on the firm, not simply on plants.
These factors led the Center to rethink its strategy in early 1987.
All CM data for 1963, 1967, 1972, 1977, and 1982 and ASM data for 1973
to 1985 were grouped into a distributed database, which was termed the
Longitudinal Research Database. The change of the database name from
LED to LRD was made to emphasize the new database structure used for
updating and extracting microdata; to focus attention on the primary use
of the data--research and analysis; and to eliminate any confusion that
may have existed, because the Time Series File and LED file had become
synonymous in the minds of some people. The main consequence of this
substantial undertaking is that it is now possible to generate extracts
of the data using a variety of selection keys, such as geographic
location, industry, size, firm, etc. Panels can be selected that meet
the needs of the researcher and that are not constrained to certain
years. Consequently, this paper focuses on the LRD--an unbalanced panel
from which various balanced and unbalanced time series may be obtained.
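The relationship between the unbalanced file and a balanced extract can be sketched in a few lines of Python. This is a minimal illustration only: the record layout (establishment identifier, year, value) and the function name are hypothetical and do not describe the Center's extraction software.

```python
from collections import defaultdict

def balanced_panel(records, years):
    """Return the records of establishments observed in every year of `years`.

    `records` is a list of (establishment_id, year, value) tuples; this
    layout is an assumption made for illustration.
    """
    wanted = set(years)
    seen = defaultdict(set)
    for est_id, year, _ in records:
        if year in wanted:
            seen[est_id].add(year)
    # Keep only establishments present in all requested years.
    keep = {est_id for est_id, ys in seen.items() if ys == wanted}
    return [r for r in records if r[0] in keep and r[1] in wanted]

# Hypothetical unbalanced data: plant "B" closes after 1977,
# and plant "C" is born in 1977.
records = [
    ("A", 1972, 10.0), ("A", 1977, 12.0), ("A", 1982, 15.0),
    ("B", 1972, 4.0), ("B", 1977, 3.5),
    ("C", 1977, 1.0), ("C", 1982, 2.0),
]
panel = balanced_panel(records, [1972, 1977, 1982])
```

Requesting a shorter span of years, such as 1977 and 1982 only, retains plant "C" as well, which shows how the choice of years determines the size of the balanced extract.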
Contents of the LRD
Determining whether the LRD is a useful data source requires a clear
understanding of what the LRD contains. The two principal components of
the LRD--the CM and the ASM--are fundamentally different. We will
discuss the CM first, and then we will contrast it with the ASM.
We want to alert the reader that our discussion concentrates on
methodological issues that the researcher must be careful about when
conducting research. Such a discussion has a tendency to emphasize
problems with the data. As already noted, the LRD has been successfully
employed in a wide range of studies. The results of these studies show
that the LRD is a rich data source with great potential as a research
tool.
The Census of Manufactures component
The CM is an enumeration of all establishments whose primary
activity is manufacturing, as classified by the Census Bureau according
to the Standard Industrial Classification System (SIC). An
establishment is defined as an economic unit, at a single location,
where business is conducted or where services or industrial operations
are performed. The basic unit of data collection is the establishment,
and accordingly, one of the primary data keys in the LRD is the
establishment.
Since 1954, the Census Bureau has obtained the mailing lists used
for data collection from the Internal Revenue Service (IRS) and the
Social Security Administration (SSA). For single-establishment
companies, these lists are usually sufficient for data collection
purposes. However, for multiestablishment companies, the Census Bureau
must request additional information, in particular, the name and address
of each of the company's establishments. (An interesting byproduct of this survey is a detailed description of the firm's legal form
of ownership, which we will discuss later in this article.) The
information from the Census Bureau survey of multiestablishment
companies is combined with the information from the IRS and the SSA to
form the Standard Statistical Establishment List, which forms the basis
for both the CM and the ASM.
Although the CM is a complete enumeration of all manufacturing
establishments, not all establishments actually report data to the
Census Bureau. Some data items for some establishments are obtained
from other Government agencies, and other data items for these
establishments are estimated. After the 1963 CM, it was decided to
reduce the reporting burden, particularly for small companies, by making
greater use of the data in the records obtained from the IRS and the
SSA. Beginning in 1967, some small companies were exempted from
reporting their data to the Census Bureau. Instead, census-type
statistics for these establishments were developed from IRS and SSA
records. The information obtained from these records includes the
firm's name and address, payroll, and gross business receipts.
Other statistics for these small firms are estimated using industry
averages in conjunction with this administrative information.
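One simple way to picture this kind of estimation is a ratio method, sketched below in Python. The article does not give the Census Bureau's actual formulas, so the functions, field names, and figures here are assumptions chosen for illustration.

```python
def industry_ratio(reporters, numerator, denominator):
    """Industry-level ratio of two items, computed from reporting establishments."""
    total_num = sum(r[numerator] for r in reporters)
    total_den = sum(r[denominator] for r in reporters)
    return total_num / total_den

def impute_from_payroll(payroll, ratio):
    """Estimate an unobserved item as administrative payroll times the industry ratio."""
    return payroll * ratio

# Hypothetical reporting establishments in one industry (values in $1,000).
reporters = [
    {"payroll": 200.0, "cost_of_materials": 500.0},
    {"payroll": 300.0, "cost_of_materials": 700.0},
]
ratio = industry_ratio(reporters, "cost_of_materials", "payroll")

# Administrative record case with payroll of 80 and no reported materials cost.
estimate = impute_from_payroll(80.0, ratio)
```

Every administrative record case in an industry receives the same ratio under this sketch, so the imputed items carry no plant-level variation beyond what payroll itself provides.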
In 1972, approximately 120,000 small single-establishment
manufacturing firms identified as having fewer than 10 employees were
designated administrative record cases and were excused from filing
reports. In 1977 and 1982, approximately 145,000 and 130,000 firms,
respectively, were designated administrative record cases. (See Appendix
A.) The impact of administrative record data on industry aggregates is
slight; for manufacturing as a whole, administrative record cases
accounted for only 1.2 percent of the value added in 1972, 1.7 percent
in 1977, and 1.3 percent in 1982. However, these data may be important
in particular industries and for certain research topics.
The information on sales and payrolls obtained from the IRS and the
SSA appears to be of high quality. Moreover, the estimation techniques
for the unobserved variables work well for aggregate data. However, the
methods used to estimate values for the unobserved variables in these
administrative record cases may produce less useful data for
microeconomic projects. Researchers must determine if the Census Bureau
estimation method or some alternative is more appropriate for their
projects.(4)
The treatment of the data collected from the approximately 220,000
remaining establishments reflects the demands of primary Census Bureau
users and the budget constraints. The Census Bureau's primary
objective for both the CM and ASM is to publish useful and accurate
current year aggregates. Consequently, the data are evaluated and
edited with the accuracy of the aggregate statistics in mind. Little
consideration is given to the time series or microaspects of the data.
In designing sampling plans and other collection procedures, the time
and expense required to edit the data for an individual establishment is
weighed against the probable effect that data for that particular
establishment will have on the aggregates. The result is that, during
editing, data for larger establishments receive more careful evaluation
and editing than the data for smaller establishments.
The Annual Survey of Manufactures component
There are two major differences between the CM and the ASM: In the
ASM, the number of establishments is smaller, and fewer data items are
collected.
The ASM is a sample of establishments drawn from the universe of
establishments in the CM. The sample is selected during the year
following each census and is used for data collection for 5 years. After
5 years, a new sample is drawn from the most recent CM.
The LRD contains data from the annual surveys for 1973 to 1985.
These data were collected from four separate ASM panels--the survey
samples drawn originally in 1969, 1974, 1979, and 1984. Although there
is substantial overlap in the establishments present in each ASM sample,
the correspondence is not perfect. Details of the sampling plan are
therefore important in evaluating the possibilities of using a
continuous panel of establishments. Moreover, since the sampling
methodology for the ASM has changed over time and since these changes
have a significant effect on the time series that can be derived from
the LRD, we describe them in some detail.
For the panels selected for 1969 and 1974, an establishment's
size, industry, and company affiliation determined the probability of
selection. If an establishment of a multiestablishment company was
included in the sample, all of the company's establishments were
also required to report their data, regardless of size. Thus, all firms
in the ASM sample for these years were complete in the sense that all
their manufacturing establishments were included.
The probability of selection for a company was related to the size
of its establishments.(5) All companies with a manufacturing
establishment with 250 employees or more were selected. These large
companies account for more than two-thirds of total manufacturing
employment in each of the censuses conducted from 1963 forward.
Companies with smaller establishments were assigned probabilities
proportional to their size.
In 1979, under severe budget pressure, the Census Bureau adopted a
new procedure for sample selection. The main change was that the
probability of selection for any establishment was now solely a function
of the size of the establishment itself. Company affiliation played no
part in the sample design. All establishments with 250 employees or
more in the 1977 Census of Manufactures were included in the 1979 sample
panel. Smaller establishments were still sampled with probabilities
proportional to their size, but the plants of multiestablishment
companies were not included in the sample automatically if one of the
company's other plants was chosen.
The 1979 panel captures about 91 percent of the total manufacturing
activity (measured by total value of shipments) captured by the previous
panel, but the number of sampled individual establishments was reduced
significantly--from about 75,000 to about 55,000. The major effect of
the change was that many small establishments of multiestablishment
companies were excluded from the ASM sample. In turn, the number of
companies for which complete data were collected was also substantially
reduced. Approximately 5,000 companies, roughly half of the total
number of companies in the ASM for which complete data would have been
available under the old sampling design, reported for only a portion of
their establishments under the 1979 sampling methodology. Consequently,
any time series research that requires complete information on the
activities of a company will have substantially fewer observations after
1979.
To compensate for the loss of information that resulted from the
1979 change, the 1984 ASM panel now includes all establishments of
companies with value of shipments of $500 million or more in 1982. As
before, establishments with 250 employees or more are always included in
the sample, regardless of company size, and smaller establishments are
selected with probabilities that are proportional to their size.
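The 1984 selection rules described above can be summarized in a short Python sketch. The two certainty thresholds come from the text; the proportionality constant `scale` is hypothetical, since the article does not state the actual sampling rates.

```python
CERTAINTY_EMPLOYMENT = 250            # establishments this large are always sampled
CERTAINTY_COMPANY_SHIPMENTS = 500e6   # 1982 company shipments threshold, in dollars

def selection_probability(employees, company_shipments, scale=250):
    """Illustrative selection probability under the 1984 panel rules.

    `scale` is an assumed constant used only to make the
    probability-proportional-to-size idea concrete.
    """
    if employees >= CERTAINTY_EMPLOYMENT:
        return 1.0
    if company_shipments >= CERTAINTY_COMPANY_SHIPMENTS:
        return 1.0
    return min(1.0, employees / scale)

large_plant = selection_probability(300, 1e6)
small_plant_big_company = selection_probability(20, 600e6)
small_plant_small_company = selection_probability(50, 1e6)
```

Dropping the second test reproduces the spirit of the 1979 design, in which company affiliation played no part in selection.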
It is important to note that the sampling design has implications
for analysis conducted on the basis of categorizations of the data other
than at the national level. Consider, for example, the establishment
location information in the LRD. The location of each establishment is
coded by state, standard metropolitan statistical area, county, and
place. A sample based on these codes permits analysis below the
national level. However, the selection probabilities for the ASM sample
make such analysis subject to potential error. Each ASM sample provides
sufficient sample points to develop estimates for national totals. But
since location is not a criterion used in determining the selection
probability for a particular establishment, totals derived from
aggregating the microdata may not be appropriate for subnational levels
of aggregation. For example, developing county or State totals in ASM
years requires reweighting the data. Similarly, irrespective of the
aggregations involved, the use of data from survey years requires
careful consideration of the sample selection process before estimating
microeconomic models. As part of the Center's software
development, we plan to provide data users with methods to account for
such selection biases.
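A standard form of reweighting is to weight each sampled establishment by the inverse of its selection probability. The Python sketch below illustrates that idea with hypothetical field names and figures; it is not a description of the methods the Center plans to provide.

```python
from collections import defaultdict

def weighted_state_totals(sample):
    """Estimate state totals by weighting each sampled plant by 1 / probability."""
    totals = defaultdict(float)
    for plant in sample:
        totals[plant["state"]] += plant["shipments"] / plant["prob"]
    return dict(totals)

# Hypothetical sampled plants with their selection probabilities.
sample = [
    {"state": "OH", "shipments": 900.0, "prob": 1.0},   # certainty case
    {"state": "OH", "shipments": 40.0, "prob": 0.25},   # stands in for 4 plants
    {"state": "VT", "shipments": 10.0, "prob": 0.1},    # stands in for 10 plants
]
totals = weighted_state_totals(sample)
```

Because location is not used in sample selection, a small area may be represented by only a handful of sampled plants, and a weighted total built on them can be very imprecise.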
Summary of CM and ASM coverage
The LRD contains data for all large establishments for every year
from 1972 to 1985. These data are likely to be of high quality due to
the attention they receive during collection and editing. The data for
smaller establishments are less reliable, because they receive less
attention during editing. However, the sales and payroll data for the
administrative record establishments are not subject to substantial
response error.
The ASM samples are less likely to contain small establishments
because of policies to reduce reporting burdens and costs. Moreover,
the composition of the sample of smaller establishments changes every 5
years. Establishments with 250 employees or more remain in the ASM
panels over time. Even though fewer firm-level time series are available
after 1979 than before, there are still over 6,000 complete
multiunit companies available for annual analysis, and there are
substantially more available than that for census years. Taken together,
these sampling procedures imply that time series over many years will
contain primarily large establishments. Finally, although the sampling
procedures limit the size of continuous panels available for research,
several current projects are utilizing continuous panels of over 20,000
establishments.
Data items in the CM and ASM
From every manufacturing establishment with one employee or more,
the CM collects data on the establishment's inputs of labor,
materials, and capital; its output of products and services; its
location; and the legal form of organization of the owning firm.
Associated with each establishment record is a permanent identification
number and location. Both of these items stay with the establishment
from its birth until it shuts down. In addition, each plant is linked
to a parent firm, and detailed status codes allow one to trace ownership
changes over time.
These establishment-firm codes were used to identify mergers among
the largest firms in each four-digit industry for the study of
conglomerate mergers by McGuckin and Andrews (1987). The same codes
were used for the Lichtenberg and Siegel (1987) study of ownership
changes in continuously operated plants. Lichtenberg and Siegel
examined the relationship between total factor productivity growth and
ownership changes using the time series panel. The McGuckin-Andrews
work examined the performance of acquired lines of business in the
period following their acquisition by a firm not previously operating in
the same industry. This study used census year data and includes
analysis of closed and opened plants. The Lichtenberg and Siegel work
used yearly observations on continuously operated plants derived from
the CM and the ASM.
The ASM collects the same basic measures of economic activity as
the CM, and, in addition, the ASM collects detailed information on
assets, capital expenditures, rental payments, supplemental labor costs,
retirements and depreciation (after 1976), and in selected years, the
cost of purchased services. In survey years, however, less detailed
information on materials consumption and the plant's product
outputs is collected. Data on individual materials consumption are not
requested in survey years. Additionally, in survey years, the value of
products shipped is recorded only in terms of approximately 1,500
product classes, instead of the roughly 11,000 individual products used
in census years.
A detailed description of the individual data items can be found in
the LED Technical Documentation (1987). A brief list of the data items
gives one a good idea of the breadth of coverage. On the input side,
the LRD contains the following: Total employment, number of production
workers, production worker hours, salaries and wages, supplemental labor
costs, cost of materials, inventory stocks for finished products,
work-in-process and materials, capital expenditures, rental payments,
capital stocks of buildings and equipment, depreciation, retirements,
and rents and repairs. Appendix B provides the complete list.
The output data include the value of shipments reported for each
seven-digit product in CM years and for each five-digit product class in
ASM years. Related information--such as value added, miscellaneous
receipts, value of resales, and receipts for contract work--is also
available for each establishment.
There are two important points to keep in mind when designing
research projects with the LRD. First, the reporting unit for data
collection is the establishment. The various inputs used by the
establishment are not allocated to the specific products produced by the
establishment. In most applications and for most Census Bureau
published tabulations, a plant is classified by the industry that
accounts for the plant's largest output. As noted, detailed
information on the value of shipments and physical output of products,
at the seven-digit level in census years and at the five-digit level in
survey years, is available for each plant. The other variables are
reported at the level of the entire establishment.
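The classification rule can be illustrated with a brief Python sketch. It assumes that the first four digits of a seven-digit product code identify the four-digit industry; the product codes and values shown are hypothetical.

```python
from collections import defaultdict

def primary_industry(product_shipments):
    """Return the four-digit industry with the largest total value of shipments.

    `product_shipments` maps seven-digit product codes (as strings) to the
    value of shipments; the first four digits are assumed to be the industry.
    """
    by_industry = defaultdict(float)
    for code, value in product_shipments.items():
        by_industry[code[:4]] += value
    return max(by_industry, key=by_industry.get)

# Hypothetical plant producing three products in two industries.
plant = {"3571011": 400.0, "3571022": 150.0, "3579033": 500.0}
industry = primary_industry(plant)
```

In this example the largest single product belongs to industry 3579, yet the plant is assigned to 3571 because that industry accounts for the larger share of total output. All of the plant's inputs are then attributed to 3571, including those used to make the 3579 product.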
Second, price data, in the form of unit values, are only collected
in census years.(6) The units (quantity) are not always well defined.
For example, the seven-digit level of detail does not distinguish
between a $200, 10-speed bicycle and a $1,000 racing bicycle. The
absence of even this information outside of census years means that
price series needed, for example, for deflation in production function
estimation must be obtained from non-Census Bureau sources for annual
time series analysis.
This problem was recognized early on by researchers studying total
factor productivity. Fortunately, the Bureau of Industrial Economics
(BIE) at the U.S. Department of Commerce published an SIC-based price
series based on Bureau of Labor Statistics (BLS) data. This series has
been used by several researchers working with the continuous panel.(7)
We want to make one final point with regard to the price data
available in census years: These unit value figures are obtained by
dividing total product (or establishment) value of shipments by the
quantity produced. They represent an average value for all the outputs
of the establishment or product class. They may represent the combined
outputs of the plant better than the BLS prices, which are based on
probability samples of products. There has been little research on the
relative usefulness of these alternative measures. We explicitly raise
this point, because there appears to be a tendency to deemphasize unit
value collection as a way to meet budget reductions, which may be very
shortsighted, since it is not clear that BLS price indexes are
appropriate in all cases.(8)
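The arithmetic behind a unit value, and the reason it is an average rather than a price for a well-defined product, can be shown with the bicycle example. The quantities in this Python sketch are hypothetical.

```python
def unit_value(value_of_shipments, quantity):
    """Average value per unit shipped; undefined when no quantity is reported."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return value_of_shipments / quantity

# Hypothetical product class that mixes two kinds of bicycles.
shipments = [
    {"price": 200.0, "units": 90},    # 10-speed bicycles
    {"price": 1000.0, "units": 10},   # racing bicycles
]
total_value = sum(s["price"] * s["units"] for s in shipments)
total_units = sum(s["units"] for s in shipments)
average = unit_value(total_value, total_units)
```

The resulting unit value matches neither bicycle's price, and it would change if the product mix shifted even with both prices held constant.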
Although there have been a number of specific research projects
using the LRD, an NSF-sponsored Resources for the Future study is
developing a complete data set for research into productivity issues.
Phase I of the study established the feasibility of producing a balanced
panel containing detailed output, price, and input data. Preliminary
analysis of the information developed for selected industries was
reported at the American Economic Association annual meeting in 1987.
The goal of phase II of this work is to develop a full-scale data set
incorporating the methodological lessons learned in phase I.
Unfortunately, budget cuts will probably prevent the completion of phase
II.
Related data files
The tendency for data availability to influence the development and
testing of economic models is evident in many of the research projects
undertaken at the Center and described previously. To most users, the
data development efforts associated with the Center's research
agenda are perhaps more interesting. In this section, we highlight
several projects involving extensions of the LRD that have been driven
by the requirements of particular research projects. Each of these
extensions involved linking the LRD to another database. Some of these
efforts, like the use of BIE price index data discussed previously,
involved outside databases. Other examples involved specialized Census
Bureau surveys.
In an extension of their 1987 paper, McGuckin and Andrews (1988)
are linking stock market premium data and other financial statistics for
a small sample of companies to LRD-based performance measures for
acquired lines of business (market share, profits, and productivity).
This effort is an attempt to reconcile the disparate findings regarding
the gains to takeovers found in the literature. Financial market studies
show substantial gains that are not observed in accounting studies.(9)
One future project, which could have big payoffs, would be the
development of an association between Census Bureau identification
numbers and numbers used to identify companies in public financial
databases. Such a step would improve research possibilities at the
Center. Currently, the linking of company-level data to LRD companies
in the McGuckin-Andrews study is being made by name matches. A similar
procedure has been used to match companies reporting R&D data in the
NSF-sponsored R&D survey to companies in the LRD. This latter
procedure has resulted in several published papers about large
firms.(10) Currently, with supplemental NSF support, the R&D and LRD
linking is being extended to small firms. Completion of this work will
mean that the entire R&D survey data will be linked to the LRD.
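Linking by name match generally requires some normalization of company names before comparison. The article does not describe the matching rules actually used, so the Python sketch below, including the suffix list, identifiers, and function names, is purely an illustrative assumption.

```python
import re

# Common legal suffixes dropped before comparison (an assumed list).
SUFFIXES = {"INC", "CORP", "CO", "COMPANY", "CORPORATION", "INCORPORATED", "LTD"}

def normalize(name):
    """Uppercase, replace punctuation with spaces, and drop legal suffixes."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def match_by_name(census_names, external_names):
    """Map external company ids to census company ids with equal normalized names."""
    index = {normalize(n): cid for cid, n in census_names.items()}
    matches = {}
    for eid, name in external_names.items():
        key = normalize(name)
        if key in index:
            matches[eid] = index[key]
    return matches

# Hypothetical identifiers and names.
census = {"C001": "Acme Tool Co."}
external = {"X9": "ACME TOOL COMPANY", "X10": "Zenith Widgets Inc"}
links = match_by_name(census, external)
```

Exact matching on normalized names misses abbreviations and misspellings and can link distinct companies that share a name, which is one reason a direct correspondence between identification numbers would be preferable.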
Supplementing the LRD by including the operations of firms outside
manufacturing would be useful in research.(11) Restricting analysis of a
firm to its manufacturing activities is unnecessarily limiting.
There are several areas in which the Center is working to expand
the LRD's compatibility with existing Census Bureau data. One
major area is foreign trade; the increasingly global nature of the
economy has made it necessary to merge foreign trade data with domestic
statistics. Because the foreign trade data are collected on a product
basis, it is sometimes difficult to reconcile these data with LRD data
collected under the SIC system. The Center is currently heading up a
task force at the Census Bureau that is examining the feasibility of
producing trade-adjusted concentration and market penetration statistics
for detailed product classes (five- and seven-digit). The project
includes CM, ASM, and Current Industrial Reports data. If the product
codes and firm identifiers can be successfully linked, then these data
can also be linked to the LRD. One of the first studies will examine
the impact of foreign imports on domestic markets. In turn, research
involving the linked data should help refine edit procedures and provide
for adjustments in collection procedures when necessary.
Finally, a major long-term interest of the Center is the
exploitation of individual data collected through the population
censuses and surveys. The Center has at least one project that will
make use of both LRD and demographic information.(12) The Center also
has recently become the repository for the relatively new Survey of
Characteristics of Business Owners (CBO). This survey was first
conducted in 1982, and there is hope that a new panel can be developed
for 1987. It is the only Census Bureau survey that directly links the
characteristics of business owners with the characteristics of the
business they operate. These data will greatly expand our ability to
examine the nature and characteristics of entrepreneurs.
Accessing the Data
Establishment data are collected by the Census Bureau under the
authority of Title 13 of the United States Code. To protect
confidentiality, Title 13 and the disclosure rules and regulations of
the Census Bureau prohibit the release of information that could be used
to identify or closely approximate the data for an individual
establishment or enterprise. In practice, the Census Bureau considers
disclosure protection a binding constraint, but it provides as much
public information as possible within this constraint. Although the
Census Bureau has well-defined procedures for evaluating and releasing
aggregate data and tabulations, it does not have similar procedures for
evaluating and releasing microdata files. As a result, only a limited
number of outside researchers working at the Census Bureau as special
sworn employees (such as NSF and Census Bureau research fellows and
associates) have access to the LRD.(13)
The practical considerations that make it impossible to accommodate
all demands for microdata by allowing outside researchers to work at the
Census Bureau have led to considerable interest in the development of
public use data files. The major structural characteristics of a public
use data file would be similar to those of the original data file so
that the important economic relationships among variables in the file
would be maintained. Ideally, the public use data file would preserve
the economic relationships with sufficient precision so that
elasticities and other parameters of interest could be directly obtained
without any need for processing by the Center.(14)
In line with the public use data concept, one way to increase access to
the LRD would be to provide researchers with a mock file that they could
use to debug programs written in SAS or other standard packages for
execution by the Center. For projects involving the new and relatively
clean CBO database,
we hope to be able to provide complete processing without the researcher
having to obtain special employee status. For LRD projects, until we
have developed better software for editing the data and have had more
experience with it, most researchers will still need to visit the Center
to examine the data.(15) Nonetheless, with the use of programs debugged
outside the Center, the time required at the Center would be
reduced. This means that research costs would be reduced and the Center
could accommodate more LRD users.
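The mock file idea can be illustrated with a minimal sketch. It assumes a hypothetical five-variable record layout (plant_id, year, sic, employment, shipments); the actual LRD variable names and layout are not given in this paper, and the values generated are random rather than derived from any Census Bureau data. The point is only that a file with the right structure and no real observations is enough to debug a program before it is submitted for execution at the Center.

```python
import csv
import io
import random

# Hypothetical record layout; the real LRD variable names are not given here.
MOCK_FIELDS = ["plant_id", "year", "sic", "employment", "shipments"]


def make_mock_records(n_plants, years, seed=0):
    """Return synthetic establishment-year records with the assumed layout."""
    rng = random.Random(seed)  # seeded so the mock file is reproducible
    records = []
    for plant in range(1, n_plants + 1):
        sic = rng.choice([2011, 2834, 3312, 3571])  # illustrative 4-digit codes
        employment = rng.randint(5, 2000)
        for year in years:
            # Let employment drift year to year so panel logic is exercised.
            employment = max(1, int(employment * rng.uniform(0.9, 1.1)))
            shipments = round(employment * rng.uniform(50.0, 150.0), 1)
            records.append({
                "plant_id": plant,
                "year": year,
                "sic": sic,
                "employment": employment,
                "shipments": shipments,
            })
    return records


def write_mock_csv(records, handle):
    """Write the mock records as CSV to any open text handle."""
    writer = csv.DictWriter(handle, fieldnames=MOCK_FIELDS)
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_mock_csv(make_mock_records(3, [1982, 1983, 1984]), buffer)
    print(buffer.getvalue())
```

Because the generator is seeded, a researcher and the Center staff could regenerate the same mock file independently when tracing a program error.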
Concluding Comment
We began our discussion by emphasizing the need for detailed
microdata in resolving important issues in economic research and policy.
In closing, we note that the limit on detail in the LRD is imposed by
the establishment collection unit. However, within this limit,
available computer technology makes it possible to classify and
aggregate the data in a variety of dimensions. Data collection and
dissemination no longer need to be tied to a single system. In contrast
to the past, when tabulations of the data were restricted to SIC
classifications and to particular localities, the intended use of the
data can now be the determining factor in classification.
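A small sketch may make the point about flexible classification concrete. The records, field names, and the 250-employee size threshold below are all hypothetical and chosen only for illustration; nothing here reflects actual LRD software or data. The same establishment-level observations are totaled under three different classification rules simply by swapping the function that assigns each record to a group.

```python
from collections import defaultdict

# Hypothetical establishment records; fields and values are illustrative only.
records = [
    {"plant_id": 1, "sic": 2011, "state": "IA", "employment": 120, "shipments": 9000.0},
    {"plant_id": 2, "sic": 2834, "state": "NJ", "employment": 400, "shipments": 52000.0},
    {"plant_id": 3, "sic": 2011, "state": "NE", "employment": 300, "shipments": 21000.0},
    {"plant_id": 4, "sic": 3312, "state": "PA", "employment": 80, "shipments": 6000.0},
    {"plant_id": 5, "sic": 2834, "state": "PA", "employment": 60, "shipments": 7000.0},
]


def aggregate(records, key_func, value_field):
    """Sum value_field over groups defined by an arbitrary classification rule."""
    totals = defaultdict(float)
    for rec in records:
        totals[key_func(rec)] += rec[value_field]
    return dict(totals)


def size_class(rec):
    """Assumed size rule: 250 or more employees counts as large."""
    return "large" if rec["employment"] >= 250 else "small"


# Three classifications of the same microdata.
by_industry = aggregate(records, lambda r: r["sic"] // 100, "shipments")  # 2-digit SIC
by_state = aggregate(records, lambda r: r["state"], "shipments")
by_size = aggregate(records, size_class, "shipments")

if __name__ == "__main__":
    print(by_industry)
    print(by_state)
    print(by_size)
```

Any other grouping rule, such as one based on input proportions, could be passed as key_func without changing the underlying records.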
This principle has been described recently in work conducted at the
Center involving the SIC system.(16) After recounting numerous
complaints and shortcomings that have been voiced about the SIC system,
Abbott and Andrews (1988) examined how well it classifies the data under
alternative conceptual frameworks that have been proposed as a basis for
the SIC system (markets, production compatibility, etc.). They found
that the current system is a compromise that satisfies no particular
objective. Extensions of the research to show (through the use of
cluster algorithms) how the LRD data would look under various
classification criteria are currently under way. But the real message
that we draw from their work is that the data are sufficiently detailed
and rich to support many classifications developed from objectively
determined criteria. One such criterion is the grouping of producers
based on the closeness of their production technologies, as judged by
input proportions.(17) There are other possibilities. Regardless of the
desired categorizations of the data, the Center is attempting to build
into the LRD software the flexibility to organize the raw observations
according to research needs.
(1.) A micro database is one composed of the individual observations collected in a survey (the establishment-level observations in the Annual Survey of Manufactures, for example). The term distinguishes such data from aggregations of the survey observations, such as employment or value of shipments for an industry.
(2.) Abbott (1988) shows that the use of aggregate industry price deflators leads to biased estimates of productivity growth and production functions estimated in first differences. Lichtenberg and Siegel (1987) found that failure to account for the diversified structure of a firm's production when applying price deflators has a substantial effect on estimates of the role of technical change in total factor productivity. Similar findings are also reported by Kokkelenberg and Nguyen (1987). Finally, in a recent theoretical paper, using examples from the Census Bureau's Survey of Plant Capacity and from earlier work performed under Center sponsorship, McGuckin and Zadrozny (1988) describe several econometric problems with existing work on capacity utilization, most of which employs aggregate data.
(3.) A comparison of trade balances derived from allocating aggregate industries to high-tech and low-tech categories with those derived by aggregating information on individual products separated into high-tech and low-tech categories showed substantial level and trend differences. See McGuckin and Monahan (1987) and Abbott, McGuckin, Herrick, and Norfolk (forthcoming).
(4.) To this end, the Center is developing software that will enable a researcher to select alternative estimation strategies.
(5.) In this section, we focus on the size of the reporting unit in determining its probability of selection. In practice, the sampling design is more complex, including factors such as the existence of the unit in the previous panel and industry affiliation. In the past, location may also have been included in the sample design. It is not currently a criterion variable.
(6.) Current Industrial Reports data are not linked to the LRD. These reports contain yearly and sometimes monthly unit value data for many detailed SIC classifications. The Center hopes eventually to link these data to the LRD.
(7.) See Lichtenberg and Siegel (1988) and Hazilla and Kopp (1986).
(8.) A recent paper by Lichtenberg and Griliches (1986) discusses these differences.
(9.) See, for example, the paper by Ravenscraft and Scherer (1987), which uses accounting data, and the ones by McGuckin, Warren-Boulton, and Waldstein (1988) and Guerin-Calvert, McGuckin, and Warren-Boulton (1987), both of which report premiums based on financial market data.
(10.) See Lichtenberg (1987) and Guerard, Bean, and Andrews (1987).
(11.) This could be accomplished in part by linking LRD companies to publicly available financial data. A better procedure, which the Center hopes to undertake, would be the development of longitudinal panels for census programs conducted outside manufacturing. Such a program is already under way for the agriculture census.
(12.) See Davis and Haltiwanger (1987).
(13.) The Center has begun to create public use microdata files. However, precise criteria for evaluating disclosure risk in economic microdata like those found in the LRD are not yet available. Masked microdata files of demographic data have been released by the Census Bureau. These files contain samples of 100,000 individuals or more. The skewed size distribution and the relatively small number of establishments in the LRD make the development of useful, disclosure-free public use files difficult.
(14.) See McGuckin and Nguyen (1988) for an extended discussion and several proposals.
(15.) In some cases, for projects involving data tabulations, arrangements can be made for the Center staff to undertake the data work directly.
(16.) See Abbott and Andrews (1988).
(17.) This type of procedure was used by Gollop and Monahan (1986) in constructing an index of diversification. They measured the closeness of products by the technologies of pure producers.
References
Abbott, Thomas A., III (1988), "Price Dispersion in U.S. Manufacturing," Center for Economic Studies Working Paper, U.S. Bureau of the Census.
Abbott, Thomas A., III, and Stephen H. Andrews (1988), "An Examination of the Standard Industrial Classification of Manufacturing Activity Using the Longitudinal Research Data Base," Center for Economic Studies Working Paper, U.S. Bureau of the Census.
Abbott, Thomas A., III, Robert H. McGuckin, Paul Herrick, and Leroy Norfolk, "Advanced Technology Products and the U.S. Trade Balance," Center for Economic Studies Discussion Paper. Forthcoming.
Davis, Steve J., and John Haltiwanger (1987), "Establishment-Specific Labor Demand Disturbances and Unemployment in U.S. Manufacturing Industries," Research proposal to the Center for Economic Studies, U.S. Bureau of the Census.
Dunne, Timothy (1988), "Firm Entry and Industry Evolution in the U.S. Manufacturing Sector: Measurement and Analysis," Center for Economic Studies Working Paper, U.S. Bureau of the Census.
Dunne, Timothy, and Mark J. Roberts (1986), "Measuring Firm Entry, Growth, and Exit with Census of Manufactures Data," Mimeo, Pennsylvania State University.
Dunne, Timothy, Mark J. Roberts, and Larry Samuelson (1987), "The Impact of Plant Failure on Employment Growth in the U.S. Manufacturing Sector," Mimeo, Pennsylvania State University.
Gollop, Frank M., and James L. Monahan (1986), "From Homogeneity to Heterogeneity: An Index of Diversification," Center for Economic Studies Working Paper, U.S. Bureau of the Census.
Griliches, Zvi (1984), "Data Problems in Econometrics," National Bureau of Economic Research Technical Paper No. 39, July 1984.
Guerard, John B., Alden Bean, and Stephen H. Andrews (1987), "R&D Management and Corporate Financial Policy," Management Science 33, No. 11 (November 1987).
Guerin-Calvert, Margaret E., Robert H. McGuckin, and Frederick R. Warren-Boulton (1987), "State and Federal Regulation in the Market for Corporate Control," Antitrust Bulletin 32, Spring 1987.
Hazilla, Michael, and Raymond Kopp (1986), "Plant Level Productivity 1972-81: Measurement Using a Large Panel of Manufacturing Establishments," Working Paper.
Kokkelenberg, Edward C., and Sang V. Nguyen (1987), "Forecasting Comparison of Three Flexible Functional Cost Forms," 1987 Proceedings of the Business and Economic Statistics Section, American Statistical Association.
The LRD Technical Documentation (1987), Center for Economic Studies, U.S. Bureau of the Census.
Lichtenberg, Frank R. (1987), "The Effects of R&D and Fixed Investment on Productivity," Paper presented at the Allied Social Science Association meeting, December 1987.
Lichtenberg, Frank R., and Zvi Griliches (1986), "Errors of Measurement in Output Deflators," National Bureau of Economic Research Working Paper Series 2000, August 1986.
Lichtenberg, Frank R., and Donald Siegel (1988), "Productivity and Changes in Ownership of Manufacturing Plants," Center for Economic Studies Working Paper, U.S. Bureau of the Census.
McGuckin, Robert H., and Stephen H. Andrews (1987), "The Performance of Lines of Business Purchased in Conglomerate Acquisitions," Paper presented at the American Economic Association meeting in Chicago, December 27-30, 1987.
McGuckin, Robert H., and Stephen H. Andrews (1988), "Post Acquisition Performance of Acquired Lines of Business: Do Stock Market and Accounting Data Tell Different Stories." Forthcoming.
McGuckin, Robert H., and James L. Monahan (1987), "High Technology Goods and the U.S. Trade Deficit," Internal Report, U.S. Bureau of the Census.
McGuckin, Robert H., and Sang V. Nguyen (1988), "Use of 'Surrogate' Files to Conduct Economic Studies with Longitudinal Microdata," Paper presented at the U.S. Bureau of the Census Fourth Annual Research Conference, March 1988.
McGuckin, Robert H., Frederick R. Warren-Boulton, and Peter Waldstein (1988), "Analysis of Mergers Using Stock Market Returns," Economic Analysis Group Discussion Paper No. EAG 88-1, U.S. Department of Justice, Antitrust Division.
McGuckin, Robert H., and Peter Zadrozny (1987), "Long Run Expectations and Capacity," Center for Economic Studies Working Paper, U.S. Bureau of the Census.
Nguyen, Sang V., and Edward C. Kokkelenberg (1987), "The Stock of Research and Development Knowledge and Multi-Factor Productivity Growth." Paper presented at the American Economic Association meeting in Chicago, December 27-30, 1987.
Ravenscraft, David J., and F.M. Scherer (1987), Mergers, Sell-Offs, and Economic Efficiency (Washington, DC: Brookings Institution).
Solow, John L. (1987), "The Capital-Energy Complementarity Debate Revisited," American Economic Review
77:605-614.