Open Access and NASA Astrophysics Data
Michael J. Kurtz (ADS Project Scientist)
16 Mar 2016
NASA Astrophysics is, and has always been, a leader in providing open access to scientific data. With small exceptions, all data from NASA missions are publicly available. Indeed, NASA has also been the leader in creating systems that make access to these data easy and useful.
The first requirement for providing access to data is that the data be kept. Historically, large data collections in astronomy, such as the Harvard College Observatory Plate Collection (500,000 glass photographic plates), have been very rare. Observational data, whether in the form of lab books, photographs, computer tapes, or electronic files, have traditionally been viewed as the personal property of the observer.
NASA changed this paradigm by deciding that data taken with NASA instruments are the property of NASA. To implement this policy NASA has created an infrastructure devoted to maintaining, preserving, and providing access to the highest quality archival data, beginning with the founding of the National Space Science Data Center (NSSDC) in 1966.
The philosophy of open access to data was fully ingrained in the NASA community by the time of the launch of the Einstein X-Ray Observatory in 1979. In announcing that they would relinquish all proprietary rights to the data, Giacconi et al. (1979) stated “We are convinced that participation by a broad segment of the astronomical community in the utilization of this facility will substantially enhance the scientific benefits of this mission.”
At the beginning of the 1980s NASA established three wavelength-specific data centers, at CfA/SAO (X-ray), Caltech/IPAC (infrared) and JHU/STScI (UV, optical). These projects have been spectacularly successful, and they now house data from Chandra, Spitzer and HST. Also at this time the SIMBAD database was born at the CDS in Strasbourg.
In 1987 NASA sponsored a series of workshops to chart the future path for NASA astrophysics data, the Astrophysics Data System Study. The report of this study (of which The Squibb Report is a summary) is a remarkably prescient document: it lays out in some detail the requirements for the networked astronomy information system we have today.
While the network system NASA built in the early 1990s was eventually replaced by the World Wide Web, the infrastructure created at that time persists to this day. The collaboration between NASA and the CDS and the creation of NED, HEASARC, and the ADS all occurred at this time. All these organizations exist today, and together with the three great NASA data centers, and with international partners at the CDS, CADC, ESO and ESA, form the core of our modern interconnected and interoperable open astronomical data environment.
NASA’s preeminence in astrophysics archiving has often been noted. The (U.S.) National Research Council in 2007 published the report “Portals to the Universe” describing this system; recently the Astronomy & Astrophysics Decadal Survey recommended “… NASA currently supports widely used curated data archives, and similar data curation models could be adopted by NSF and DOE.”
While most use of the ADS today centers on bibliographic information, the ADS has been designed from its very conception to provide links between the literature and the data. Today the ADS has about half a million links to external data sources, from more than 300,000 different papers. It is important to note that while the ADS provides access to these data, the data and links themselves are curated by the organizations that host them, and require a very substantial effort to create and maintain.
The indexing of data links in the ADS provides a lightweight but effective data discovery mechanism through its search capabilities. As an example of the power of the combined system, one can issue a single query across the ADS, SIMBAD and NED to find the 263 papers which have links to data on the X-ray emitting cluster of galaxies Abell 754. Links from these papers to datasets from Chandra, XMM, HST, ROSAT, BeppoSAX, EGRET, Fermi, Einstein, VLT, and Suzaku are provided with this bibliography, as are additional datasets stored in VizieR or at the HEASARC.
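The kind of combined query described above can be sketched in code. The example below is only an illustration, not the query used to obtain the count quoted here: it assumes the public ADS search API endpoint and the ADS search fields `object:` (object names resolved through SIMBAD and NED) and `property:data` (records that have data links). The function name `build_data_link_query` is hypothetical. The code only builds the request URL and does not contact the ADS.

```python
from urllib.parse import urlencode

# Assumption: endpoint of the public ADS search API; check the current ADS API documentation.
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_data_link_query(object_name, rows=10):
    """Return a search URL for papers about `object_name` that have data links.

    Hypothetical helper for illustration. It assumes `object:` triggers the
    SIMBAD/NED name resolution and `property:data` keeps only records with
    links to data products.
    """
    query = f'object:"{object_name}" property:data'
    params = {
        "q": query,              # the search expression itself
        "fl": "bibcode,title",   # fields to return for each paper
        "rows": rows,            # number of records per page
    }
    return f"{ADS_SEARCH_URL}?{urlencode(params)}"


url = build_data_link_query("Abell 754")
print(url)
```

Actually sending this request would require a personal ADS API token passed in an `Authorization: Bearer` header. The number of papers returned will differ from the figure quoted above as the literature and the curated links grow.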
A detailed description of the ADS’ plans and efforts on data linking was written for the decadal survey. A more recent update on the indexing of this content has since been published.
Tho’ much has been accomplished, much remains to be done. Aside from ESO and the huge surveys (SDSS, UKIDSS), few ground-based observatories have developed systems to effectively store, curate, and share their data. We hope to see this change in the coming years.