Crossref Event Data - Short Review



Product Overview: Crossref Event Data

Introduction

Crossref Event Data is a service that captures and distributes online mentions of content registered with Crossref. It monitors a range of sources, including social media, blogs, news outlets, Wikipedia, and other web pages, to track the activity around scholarly works and their impact.

Key Features

Data Collection and Distribution

Crossref Event Data collects events from sources such as Twitter, Reddit, newsfeeds, and web pages. Events are produced by Agents, operated by Crossref, DataCite, or their partners, that connect to external data sources and convert what they find into events.

Event Structure

Each event is represented as a subject-relation-object triple, for example a Wikipedia page (subject) that references (relation) a DOI (object). The main fields are `subj_id` (the subject of the relation), `relation_type_id` (the type of relation), `obj_id` (the object of the relation), `occurred_at` (the date and time the event occurred), and `id`, a unique identifier that allows the event and its evidence to be traced.
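
A minimal sketch of how a client might model one event, assuming events arrive as JSON objects that use the field names listed above. The `Event` class, the `parse_event` helper, and the sample values are hypothetical illustrations, not part of any Crossref library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One subject-relation-object triple plus its bookkeeping fields."""
    id: str
    subj_id: str
    relation_type_id: str
    obj_id: str
    occurred_at: datetime


def parse_event(raw: dict) -> Event:
    """Build an Event from a decoded JSON object (hypothetical helper)."""
    # Assumes ISO 8601 timestamps ending in "Z" (UTC); "Z" is replaced so
    # that datetime.fromisoformat also accepts them on Python < 3.11.
    occurred = datetime.fromisoformat(raw["occurred_at"].replace("Z", "+00:00"))
    return Event(
        id=raw["id"],
        subj_id=raw["subj_id"],
        relation_type_id=raw["relation_type_id"],
        obj_id=raw["obj_id"],
        occurred_at=occurred,
    )


# Illustrative values only; this is not a real event.
sample = {
    "id": "00000000-0000-0000-0000-000000000000",
    "subj_id": "https://en.wikipedia.org/wiki/Example",
    "relation_type_id": "references",
    "obj_id": "https://doi.org/10.5555/12345678",
    "occurred_at": "2017-03-01T12:00:00Z",
}

event = parse_event(sample)
print(event.subj_id, event.relation_type_id, event.obj_id)
```

Freezing the dataclass makes events hashable and immutable, which suits records that are meant to be stored and compared rather than edited.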

Query API

The service provides a REST API for querying events and downloading them in batches. Queries can be filtered by date, source, or DOI. The API offers two views, `occurred` and `collected`, which distinguish between the time an event happened and the time it was collected. Results in the `collected` view are stable: once an event is available, it does not change. Results in the `occurred` view can be updated over time, because an event may be collected well after it occurred, so new events can still arrive for a past date.
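
The sketch below shows how the filters described above might be combined into a request URL. It assumes the endpoint `https://api.eventdata.crossref.org/v1/events` and the parameter names `mailto`, `rows`, `source`, `obj-id`, `from-collected-date`/`until-collected-date`, and their `occurred` counterparts, which is how we understand the current API expresses the two views as date filters; treat all of these as assumptions to check against Crossref's documentation. `build_query_url` is a hypothetical helper.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Event Data Query API; verify against the docs.
BASE_URL = "https://api.eventdata.crossref.org/v1/events"


def build_query_url(
    mailto: str,
    view: str = "collected",
    from_date: Optional[date] = None,
    until_date: Optional[date] = None,
    source: Optional[str] = None,
    doi: Optional[str] = None,
    rows: int = 100,
) -> str:
    """Build a query URL filtered by date window, source and DOI.

    `view` chooses which timestamp the date window applies to:
    "collected" (stable once published) or "occurred" (may still grow).
    All parameter names are assumptions about the API.
    """
    if view not in ("collected", "occurred"):
        raise ValueError("view must be 'collected' or 'occurred'")
    params = {"mailto": mailto, "rows": rows}
    if from_date is not None:
        params[f"from-{view}-date"] = from_date.isoformat()
    if until_date is not None:
        params[f"until-{view}-date"] = until_date.isoformat()
    if source is not None:
        params["source"] = source
    if doi is not None:
        params["obj-id"] = doi
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url(
    mailto="you@example.org",
    from_date=date(2017, 3, 1),
    until_date=date(2017, 3, 31),
    source="wikipedia",
    doi="10.5555/12345678",
)
print(url)
```

Defaulting to the `collected` view reflects the stability guarantee: a query over a past collected window should return the same results each time it is run.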

Data Sources

Events are gathered from multiple sources, including Crossref-to-DataCite and DataCite-to-Crossref links, newsfeeds, social media platforms such as Twitter and Reddit, and general web pages. Each source is documented individually, and how its data is processed depends on the source and on the Agent that collects it.

Evidence and Metadata

The Evidence Registry stores supporting evidence for every event collected, so each event can be traced back to the data that produced it. The service also maintains logs and metadata that document how the system behaved and where each event came from.

Flexibility and Customization

Users can scan over all of the data or retrieve only the events that are new since their last visit. The API supports queries such as “all events for this DOI” or “all events for this tweet,” so retrieval can be narrowed to a single work or a single mention. Users can also build their own databases or graph representations from the event stream.
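
As one illustration of building a local representation from the event stream, the sketch below indexes events in an in-memory adjacency map, so that “all events for this DOI” becomes a dictionary lookup. It assumes events are plain dicts carrying the `subj_id`, `relation_type_id`, and `obj_id` fields described earlier. `EventGraph` is a hypothetical class, and the sample events are invented; a real deployment would more likely use a database or graph store.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (relation_type_id, id of the node at the other end)


class EventGraph:
    """Index events by object and by subject for quick lookups."""

    def __init__(self) -> None:
        self.incoming: Dict[str, List[Edge]] = defaultdict(list)
        self.outgoing: Dict[str, List[Edge]] = defaultdict(list)

    def add(self, event: dict) -> None:
        subj, rel, obj = event["subj_id"], event["relation_type_id"], event["obj_id"]
        self.outgoing[subj].append((rel, obj))
        self.incoming[obj].append((rel, subj))

    def add_all(self, events: Iterable[dict]) -> None:
        for event in events:
            self.add(event)

    def events_for_object(self, obj_id: str) -> List[Edge]:
        """All (relation, subject) pairs pointing at obj_id, e.g. a DOI."""
        # .get avoids creating empty entries in the defaultdict.
        return list(self.incoming.get(obj_id, []))

    def events_for_subject(self, subj_id: str) -> List[Edge]:
        """All (relation, object) pairs from subj_id, e.g. one post."""
        return list(self.outgoing.get(subj_id, []))


# Invented sample events for illustration.
doi = "https://doi.org/10.5555/12345678"
graph = EventGraph()
graph.add_all([
    {"subj_id": "https://en.wikipedia.org/wiki/Example",
     "relation_type_id": "references", "obj_id": doi},
    {"subj_id": "https://blog.example.org/post-1",
     "relation_type_id": "discusses", "obj_id": doi},
])
print(graph.events_for_object(doi))
```

Keeping both directions means the same structure answers the DOI-centred and the mention-centred questions without a second pass over the data.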

System Monitoring

The Status Service reports on activity within the system, including diagnostic reports and information on the availability and completeness of the data. This helps users understand how the system was behaving at any given point in time.

Functionality

Real-Time Monitoring

Crossref Event Data continuously monitors the web for new links to and mentions of registered content, producing an ongoing stream of events as new online activity is detected.

Pagination and Data Retrieval

The API paginates large result sets: users follow the ‘next’ link on each page until they have retrieved all the data they need. This keeps each response to a manageable size.
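
A sketch of the pagination loop, under stated assumptions. The text above speaks of ‘next’ links; as far as we know, the current API instead returns a `next-cursor` value inside `message`, which the client passes back as a `cursor` parameter. That response shape is an assumption, so the loop takes an injectable `fetch_page` function and can be tested offline. `iter_events` and `http_fetcher` are hypothetical helpers.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# A page fetcher takes the cursor of the page to fetch (None for the first
# page) and returns that page's decoded JSON body.
FetchPage = Callable[[Optional[str]], dict]


def iter_events(fetch_page: FetchPage, max_pages: int = 10_000) -> Iterator[dict]:
    """Yield events from every page, following the next-page cursor.

    Assumes pages look like {"message": {"events": [...], "next-cursor": "..."}}.
    Stops when a page has no cursor or no events, and raises instead of
    looping forever if max_pages is exceeded.
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):
        message = fetch_page(cursor)["message"]
        events = message.get("events", [])
        yield from events
        cursor = message.get("next-cursor")
        if not cursor or not events:
            return
    raise RuntimeError("page limit reached; results may be incomplete")


def http_fetcher(query_url: str) -> FetchPage:
    """Hypothetical fetcher for a query URL that already contains a '?'."""
    def fetch(cursor: Optional[str]) -> dict:
        url = query_url if cursor is None else f"{query_url}&{urlencode({'cursor': cursor})}"
        with urlopen(url) as response:
            return json.load(response)
    return fetch
```

With network access, `iter_events(http_fetcher(query_url))` would stream every event for a query URL carrying whatever filters are needed. Stopping on an empty page as well as on a missing cursor is a defensive choice, in case the last page still carries a cursor.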

Data Integrity

Results in the collected view do not change once they are available, so users get a stable dataset. This matters when a dataset is referenced or cited and must stay the same over time.
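
Because collected results do not change, a client can harvest day by day, remember the last collected date it has stored, and on its next visit request only the days after it. The sketch below computes those days. `days_to_fetch` is a hypothetical helper, and skipping the current day assumes its events may still be arriving. The `occurred` view would not support this kind of caching, since events for a past occurred date can still be added.

```python
from datetime import date, timedelta
from typing import List, Optional


def days_to_fetch(last_complete: Optional[date], today: date, start: date) -> List[date]:
    """Return the collected dates not yet harvested, oldest first.

    last_complete: the last collected date already stored locally, or None
    if nothing has been harvested yet. start: the earliest date of interest.
    The current day is excluded on the assumption that it is still filling.
    """
    if last_complete is None:
        first = start
    else:
        first = max(start, last_complete + timedelta(days=1))
    days = []
    day = first
    while day < today:
        days.append(day)
        day += timedelta(days=1)
    return days


# Last stored day was 28 March; on 2 April, fetch 29 March to 1 April.
pending = days_to_fetch(date(2017, 3, 28), today=date(2017, 4, 2), start=date(2017, 1, 1))
print(pending)
```

Each pending day can then be requested as a one-day collected window, and the checkpoint advanced only after that day has been stored, so an interrupted run resumes without gaps.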

In summary, Crossref Event Data tracks the online impact of scholarly content, offering a query API, diverse data sources, and detailed event metadata backed by evidence. Its combination of flexible queries and stable collected results makes it a valuable resource for researchers, publishers, and anyone interested in understanding the broader reach and influence of academic works.
