
Crossref Event Data - Detailed Review
Research Tools

Crossref Event Data - Product Overview
Introduction to Crossref Event Data
Crossref Event Data is a comprehensive service designed to capture and distribute online activity related to scholarly research. Here’s a breakdown of its primary function, target audience, and key features:
Primary Function
Crossref Event Data collects and records instances where research outputs, such as articles, datasets, and books, are mentioned, shared, or discussed online. This includes social media mentions, comments on blogs and websites, references in Wikipedia, and links from other web sources. The service acts as a hub for storing and distributing this data, providing visibility into the broader conversations around research.
Target Audience
The data is valuable to a diverse group of stakeholders, including:
- Publishers: To track how their articles are being shared and discussed.
- Authors: To see when and where their work is being mentioned.
- Researchers: For conducting bibliometrics research and assessing the impact of scholarly outputs.
- Developers and Analysts: To build tools and services using the collected data.
Key Features
Data Collection
Event Data gathers information from a variety of web sources, including social media platforms like Twitter and Reddit, newsfeeds, and websites with syndication feeds. It also includes data from DataCite, capturing citations between datasets and articles.
Event Structure
Each recorded event consists of several key fields:
- Subject: The thing doing the referencing or discussing, such as a Wikipedia article.
- Relation Type: The type of relationship, e.g., “references” or “discusses”.
- Object: The research output being referenced, usually identified by a DOI.
- Timestamps: The date and time the event occurred and was processed.
- Source: The data contributor, such as Twitter or Reddit.
- Optional Metadata: Additional bibliographic information about the subject and object.
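The fields above map naturally onto a small record type. The following is a minimal sketch, assuming the JSON field names documented in the Event Data user guide (`id`, `subj_id`, `relation_type_id`, `obj_id`, `source_id`, `occurred_at`, `timestamp`, `evidence_record`) and ISO 8601 timestamps ending in `Z`. The `Event` class, the helper functions, and the sample record are hypothetical illustrations, not part of any Crossref library.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class Event:
    """One Event Data record: subject --relation--> object, plus provenance."""
    id: str
    subj_id: str           # e.g. the Wikipedia page doing the referencing
    relation_type_id: str  # e.g. "references" or "discusses"
    obj_id: str            # usually a DOI expressed as a URL
    source_id: str         # the data contributor, e.g. "wikipedia"
    occurred_at: datetime  # when the event happened
    timestamp: datetime    # when Event Data processed it
    evidence_record: Optional[str] = None  # link to the Evidence Record, if any


def parse_time(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so replace it with an explicit UTC offset for older interpreters.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_event(raw: Dict[str, Any]) -> Event:
    """Convert one event dict from a Query API response into an Event."""
    return Event(
        id=raw["id"],
        subj_id=raw["subj_id"],
        relation_type_id=raw["relation_type_id"],
        obj_id=raw["obj_id"],
        source_id=raw["source_id"],
        occurred_at=parse_time(raw["occurred_at"]),
        timestamp=parse_time(raw["timestamp"]),
        evidence_record=raw.get("evidence_record"),
    )


# Hypothetical sample event; 10.5555 is a placeholder prefix used in Crossref examples.
sample = {
    "id": "00000000-0000-0000-0000-000000000001",
    "subj_id": "https://en.wikipedia.org/wiki/Example",
    "relation_type_id": "references",
    "obj_id": "https://doi.org/10.5555/12345678",
    "source_id": "wikipedia",
    "occurred_at": "2017-03-01T12:00:00Z",
    "timestamp": "2017-03-02T08:30:00Z",
}
event = parse_event(sample)
print(event.relation_type_id, event.timestamp - event.occurred_at)
```

Parsing both timestamps into `datetime` objects makes it straightforward to measure the gap between when something happened and when Event Data processed it.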
Query API
The service provides a Query API that allows users to retrieve events in bulk. Users can query events based on specific dates, DOIs, or sources. The API delivers data in JSON format and is free to use; Crossref has said it plans to offer a paid Service Level Agreement for more timely access.
Transparency and Evidence
Crossref Event Data emphasizes transparency and evidence. Each event is linked to an Evidence Record that documents its origin, selection, and processing. This ensures that the data is traceable and auditable.
Integration and Future Development
The service is part of a broader initiative to integrate event data with other metadata collected by Crossref. This includes the development of a Relationships API, which aims to capture the complex relationships between different research outputs and activities. By providing this comprehensive and transparent dataset, Crossref Event Data supports the assessment and visibility of scholarly research, fostering a more connected and transparent scholarly record.
Crossref Event Data - User Interface and Experience
The user interface and experience of Crossref Event Data are centered around providing a clear and accessible way to interact with and retrieve event-based data related to research outputs.
Query API and Data Access
The primary interface for accessing Crossref Event Data is through the Query API, which is a REST API. This API allows users to download batches of events based on various filters, such as specific date ranges, sources, or DOIs.
Ease of Use
- The API is relatively straightforward, with simple JSON responses that include essential fields like the subject, relation, object, and timestamp of each event.
- Users can issue queries to get all events collected within a specific date range or from particular sources, such as Wikipedia, Twitter, or web pages.
- The API supports pagination: each response carries a cursor that is used to request the next page, which makes it practical to retrieve large result sets in full.
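The querying and paging pattern described above can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the base URL `https://api.eventdata.crossref.org/v1/events`, the `mailto`, `rows` and `cursor` parameters, the 1,000-row cap, and a response shaped as `message.events` plus `message.next-cursor` reflect the Event Data user guide as we understand it and should be checked against the current documentation. The helper names are hypothetical, and the fetch function is injected so the paging logic can be exercised offline.

```python
import json
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and per-request row cap; verify against the current user guide.
BASE_URL = "https://api.eventdata.crossref.org/v1/events"
MAX_ROWS = 1000


def build_query_url(mailto: str, filters: Dict[str, str],
                    rows: int = MAX_ROWS, cursor: Optional[str] = None) -> str:
    """Build a Query API URL; filter keys are passed through unchanged."""
    params = {"mailto": mailto, "rows": str(min(rows, MAX_ROWS))}
    params.update(filters)
    if cursor is not None:
        params["cursor"] = cursor
    return BASE_URL + "?" + urlencode(params)


def http_fetch_json(url: str) -> Dict[str, Any]:
    """Fetch one page over HTTP and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_events(mailto: str, filters: Dict[str, str],
                fetch_json: Callable[[str], Dict[str, Any]] = http_fetch_json,
                rows: int = MAX_ROWS) -> Iterator[Dict[str, Any]]:
    """Yield every matching event, following the cursor page by page."""
    cursor: Optional[str] = None
    while True:
        message = fetch_json(build_query_url(mailto, filters, rows, cursor))["message"]
        events = message.get("events", [])
        yield from events
        cursor = message.get("next-cursor")
        # Stop on an empty page or when the server returns no further cursor.
        if not events or not cursor:
            break
```

Because `iter_events` is a generator, a caller such as `for ev in iter_events("you@example.org", {"source": "wikipedia"}): ...` processes events page by page instead of holding the whole result set in memory.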
Documentation and Support
- Crossref provides comprehensive documentation, including a Quick Start guide and detailed sections on each data source and the structure of event records. This documentation helps users get started quickly and understand the data they are working with.
- The community forum and user guides offer additional support, where users can find examples, ask questions, and share feedback.
User Experience
- The interface is designed to be flexible, allowing users to decide how they want to process and represent the event data. For instance, users can choose to build their own database using the event stream.
- The system is dynamic, reflecting the constantly changing web environment. The set of available events therefore changes over time; the Query API’s “collected” and “occurred” views let users choose between a stable record and the most up-to-date picture.
- Crossref is also introducing new features, such as the Relationships endpoint, which will eventually replace the Event Data API and provide better support for various types of research output links, including article citations, funding relationships, and authorship.
Engagement and Factual Accuracy
- The system emphasizes providing accurate and up-to-date information. Each event includes documentation of its provenance and links to supporting evidence, ensuring transparency and reliability.
- The community engagement aspect is strong, with Crossref seeking feedback from users to improve the service and ensure it meets the community’s needs.

Crossref Event Data - Key Features and Functionality
Crossref Event Data Overview
Crossref Event Data is a comprehensive service that captures and distributes various types of online activities related to research items identified by DOIs. Here are the key features and functionalities of this service:
Data Collection and Sources
Crossref Event Data collects events from a wide range of sources, including social media platforms (like Twitter and Reddit), blogs, news sites, Wikipedia, and other web activities. These sources are referred to as “data contributors,” and they provide data on how research items are being shared, discussed, and referenced online.
Event Structure and Content
Each event is structured as a relationship between a subject (e.g., a discussion on Reddit), a relation type (e.g., “discusses”), and an object (e.g., a DOI). The event record includes fields such as `subj_id`, `relation_type_id`, `obj_id`, `occurred_at` (the date and time the event occurred), and a unique `id` for the event. This structure allows for clear tracking of interactions with research items.
Query API
The Query API is a REST API that enables users to access event data in bulk. It supports various filters, such as querying events by date, source, or specific DOI. The API provides two main views: `occurred` and `collected`. The `occurred` view returns events according to when they happened, while the `collected` view returns events according to when the system collected them. The distinction matters because the set of events collected on a past date is final, whereas the set of events that occurred on a given date can keep growing as late-arriving events are collected.
Evidence and Transparency
Crossref Event Data follows an “Evidence First” approach, which means that every event is linked to an Evidence Record. This record documents the original data received, how it was processed, and how the event was created. This transparency helps bridge the gap between raw data from external sources and the resulting events, allowing users to verify the accuracy and context of each event.
Data Quality and Reliability
The service ensures data quality through detailed logging and monitoring. The Evidence Logs describe the activity undertaken during the process of creating events, including external API access. This ensures that users can trust the data and understand any discrepancies that might arise from different data providers.
Scalability and Maintenance
Crossref is continuously working to improve the stability, reliability, and scalability of the Event Data service. This includes modernizing server infrastructure, improving monitoring, and addressing scalability issues with Elasticsearch. These efforts are intended to let the service handle its high volume of events (tens of thousands per day) efficiently.
Use Cases
The data provided by Crossref Event Data is valuable for various stakeholders:
- Publishers: To see how their articles are being shared and discussed.
- Authors: To track mentions and discussions of their work.
- Researchers: For conducting bibliometrics research and analyzing the impact of research items.
Integration with Other Services
Crossref Event Data collaborates closely with other services like DataCite and the Scholix API, ensuring comprehensive coverage of research item interactions across different platforms. Recent integrations include the incorporation of retractions and corrections from Retraction Watch, enhancing the metadata with third-party data sources.
Conclusion
In summary, Crossref Event Data offers a powerful tool for tracking and analyzing online interactions with research items, providing transparent, reliable, and scalable data that can be accessed through a robust Query API. AI is not described as a core component of the service; instead, automated collection, processing, and logging of events underpin the accuracy and reliability of the data.
Crossref Event Data - Performance and Accuracy
Evaluating the Performance and Accuracy of Crossref Event Data
Evaluating the performance and accuracy of Crossref Event Data involves several aspects: data coverage, performance monitoring, error detection, and known limitations.
Data Collection and Sources
Crossref Event Data monitors a wide range of venues for scholarly discussion and tracks activity around articles, datasets, and other research outputs. The data is collected from various websites and APIs, giving broad coverage of scholarly activity.
Performance Metrics
The performance of Crossref Event Data can be assessed through several metrics:
- Daily Reports: The Status Service generates daily reports that include counts of events, events by DOI prefix, distinct DOIs, and events per source. These reports also track the lag between the event occurrence and collection dates, which can be useful for monitoring data freshness and integrity.
- Query Limits: To ensure reliability, Crossref has implemented query limits, reducing the number of rows in a single query from 10,000 to 1,000. This allows for multiple pages of results to be retrieved using cursors, which helps in managing large result sets.
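To make the daily-report metrics concrete, the sketch below computes similar counts (events per source, events per DOI prefix, distinct DOIs, and a median occurrence-to-processing lag) over a list of event dicts. It is an illustration only, not the Status Service’s code. It assumes the `source_id`, `obj_id`, `occurred_at` and `timestamp` field names and uses `timestamp` (processing time) as a stand-in for the collection date; `doi_prefix` and `daily_summary` are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime
from statistics import median
from typing import Any, Dict, List, Optional


def _parse_time(value: str) -> datetime:
    # Accept the trailing "Z" on all Python 3 versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def doi_prefix(obj_id: str) -> str:
    """Return the DOI prefix (e.g. "10.5555") from a DOI or doi.org URL."""
    doi = obj_id.lower().split("doi.org/", 1)[-1]
    return doi.split("/", 1)[0]


def daily_summary(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Counts similar in spirit to the Status Service daily reports."""
    lags_hours = [
        (_parse_time(e["timestamp"]) - _parse_time(e["occurred_at"])).total_seconds() / 3600
        for e in events
    ]
    median_lag: Optional[float] = median(lags_hours) if lags_hours else None
    return {
        "total": len(events),
        "per_source": dict(Counter(e["source_id"] for e in events)),
        "per_prefix": dict(Counter(doi_prefix(e["obj_id"]) for e in events)),
        # DOIs are case-insensitive, so compare them lowercased.
        "distinct_dois": len({e["obj_id"].lower() for e in events}),
        "median_lag_hours": median_lag,
    }
```

The median is used rather than the mean so that a handful of very late-collected events does not dominate the freshness figure.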
Accuracy and Quality
The accuracy of the data is a critical factor:
- Error Detection: The Status Service includes reports such as `doi-validity` and `reversal` that help identify issues like invalid DOIs, stray characters, and domains that could not be reversed into DOIs. These reports are essential for maintaining data quality.
- Error Reporting: Users can report problems with the event data, such as inaccuracies or unexpected results. This feedback mechanism helps Crossref identify and address issues in the data processing and display.
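A check in the spirit of the `doi-validity` report can be approximated with the regular expression Crossref has published as matching the large majority of modern DOIs (adapted here with the dot escaped). This is a heuristic sketch, not the Status Service’s actual logic, and `doi_problems` is a hypothetical helper. Trailing punctuation is flagged only as *possibly* stray, because characters such as `.` and `)` can legitimately occur in DOIs.

```python
import re
from typing import List

# Heuristic pattern adapted from Crossref's published DOI-matching regex;
# it is not the full DOI syntax specification.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
STRAY_TRAILING = ".,;:)"


def doi_problems(doi: str) -> List[str]:
    """Return a list of suspected problems with a bare DOI string."""
    problems: List[str] = []
    if doi != doi.strip():
        problems.append("surrounding whitespace")
    core = doi.strip()
    if not DOI_PATTERN.match(core):
        problems.append("does not match common DOI pattern")
    if core and core[-1] in STRAY_TRAILING:
        # Often left over when a DOI is extracted from the end of a sentence.
        problems.append("possible stray trailing character")
    return problems
```

Returning a list of reasons, rather than a single boolean, makes it easier to group flagged DOIs by failure type when reviewing them.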
Limitations and Areas for Improvement
Despite its strengths, there are some limitations and areas where improvements can be made:
- Data Inconsistencies: Users have reported finding errors and inaccuracies in the data, such as references from articles that are outside the expected domain. These issues highlight the need for continuous monitoring and improvement of data processing algorithms.
- Query Limitations: While the query limit reduction helps in managing large result sets, it may require users to adjust their query strategies to retrieve all necessary data, potentially increasing the complexity of data retrieval.
Engagement and Factual Accuracy
To improve factual accuracy, Crossref integrates additional data sources, such as retractions and corrections from Retraction Watch, into its existing metadata. This integration enhances the reliability and completeness of the data available through the API.
Conclusion
In summary, Crossref Event Data provides a comprehensive and regularly updated dataset with various tools for monitoring performance and accuracy. However, it is not immune to errors and inconsistencies, which users can report to help improve the service. By addressing these limitations and continuously enhancing data quality, Crossref Event Data remains a valuable resource for research and scholarly activities.

Crossref Event Data - Pricing and Plans
The Pricing Structure for Crossref Event Data
The pricing structure for Crossref Event Data is straightforward and user-friendly, particularly for those in the research community.
Free Access
The primary and most significant aspect of Crossref Event Data is that it is free of charge. The data available via the Query API can be accessed without any fees.
Features of Free Access
- Data Availability: All data collected by Crossref Event Data is available for free, including events from various sources such as Crossref, DataCite, Reddit, Twitter, Wikipedia, and more.
- JSON Format: The API data is provided in JSON format, making it easy to integrate into various applications.
- Evidence and Metadata: The data includes supporting evidence for every event, as well as metadata that documents the provenance and links to the original sources.
- No Data Expiration: Once data enters the Query API, it remains available and does not expire, except in extraordinary circumstances.
Future Paid Options
While the current service is free, Crossref plans to offer a paid-for Service Level Agreement (SLA) in the future. This SLA will provide more timely access to data, but details on the specific features and pricing of this tier are not yet available.
Summary
In summary, Crossref Event Data offers a comprehensive, free service for accessing event data, with no pricing tiers or additional costs at present, and the possibility of a paid SLA in the future for enhanced service levels.

Crossref Event Data - Integration and Compatibility
Crossref Event Data Integration and Compatibility
Crossref Event Data is designed to be highly integrable and compatible across various platforms and devices, ensuring seamless interaction with other research tools. Here are some key aspects of its integration and compatibility:
API and Data Access
Crossref Event Data provides a REST API that allows users to query and download events in a structured format. This API is updated daily, shortly after midnight, ensuring that the data is current and accessible. The API supports various filters and delivers batches of events based on queries, making it easy to integrate with other systems that can handle REST API calls.
Data Sources and Compatibility
Event Data collects and distributes events from multiple sources, including Crossref, DataCite, Twitter, Wikipedia, and now Retraction Watch. This diverse range of sources ensures that the data is comprehensive and can be integrated with various research tools that rely on these data sources.
Use of DOIs
Crossref Event Data uses DOIs (Digital Object Identifiers) to identify and refer to content items. This standardization ensures that the data can be easily linked and integrated with other systems that use DOIs. The DOIs are normalized into a standard form, which helps in maintaining consistency across different platforms.
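DOI normalization can be illustrated with a small function that strips common prefixes and emits one canonical form. The exact normal form Event Data uses is not spelled out here, so this sketch simply assumes a lowercased DOI under `https://doi.org/`; lowercasing is safe for comparison because DOIs are case-insensitive. `normalize_doi` is a hypothetical helper.

```python
from urllib.parse import unquote

# Assumed set of prefixes seen in the wild; extend as needed.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value: str) -> str:
    """Return a DOI as a lowercased https://doi.org/ URL, or raise ValueError."""
    s = unquote(value.strip())  # decode percent-escapes such as %2F
    lower = s.lower()
    for prefix in PREFIXES:
        if lower.startswith(prefix):
            s = s[len(prefix):]
            break
    if not s.startswith("10."):
        raise ValueError(f"not a DOI: {value!r}")
    return "https://doi.org/" + s.lower()
```

Normalizing every identifier to a single form before storing or joining data avoids treating `doi:10.5555/ABC` and `https://doi.org/10.5555/abc` as two different items.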
Evidence and Transparency
The “Evidence First” approach by Crossref Event Data ensures that every event is supported by an Evidence Record. This transparency helps in bridging the gap between external data and the resulting events, making it easier for different tools to compare and validate the data. This level of transparency enhances the reliability and compatibility of the data across different systems.
Open Source and Community Engagement
The Event Data code is open source and available on Crossref’s GitLab repository. This openness encourages community engagement and allows developers to contribute and adapt the code to their specific needs, enhancing compatibility with a wide range of tools and platforms.
Format and Pagination
The Query API provides data in a JSON format and uses pagination to manage large datasets. This makes it easier for other tools to handle and process the data, even when dealing with large volumes of events.
Conclusion
In summary, Crossref Event Data is highly integrable due to its standardized API, use of DOIs, diverse data sources, and transparent evidence records. These features ensure that it can be seamlessly incorporated into various research tools and platforms, enhancing its compatibility and usability.

Crossref Event Data - Customer Support and Resources
User Guides and Documentation
Crossref provides comprehensive user guides that detail how to use the Event Data service. These guides cover topics such as the types of data collected, how events are generated, and how to query the data using the REST API. The guides are available on the Crossref Event Data website and include sections on service overview, data sources, and query examples.
Query API and Data Access
The Query API is a key resource for accessing event data. It allows users to retrieve batches of events based on various filters, such as date, source, and DOI. The API is updated daily, and users can query data using different views (e.g., “occurred” or “collected”) to suit their needs.
Evidence Registry and Logs
For users needing to verify the authenticity of events, the Evidence Registry stores supporting evidence for every event collected. This includes links to Evidence Records and logs that document the system’s behavior and the provenance of each event.
Status Service and Monitoring
The Status Service provides real-time information about the system’s activity, availability, and data completeness. This is useful for diagnosing any issues or understanding the system’s behavior at a given time.
Community and Support Channels
Crossref offers several channels for support and community engagement:
- Event Data Enquiries on GitLab: Users can raise issues or ask questions through the GitLab platform.
- Crossref Status Page: This page provides updates on the system’s status and any maintenance or issues.
- Event Data Working Group: This group involves stakeholders and users in discussions about the service and its development.
- Education Documentation and Jupyter Notebooks: These resources help users learn how to use the service effectively.
Additional Resources
Other resources include:
- Product Dashboard: Provides an overview of the service’s performance and usage.
- Crossref Services Pages: General information about Crossref services, including Event Data.
- Code Examples: Available in the user guide to help users implement queries and integrate the API into their applications.
By leveraging these resources, users can effectively engage with Crossref Event Data, ensure data accuracy, and get the support they need to make the most out of the service.

Crossref Event Data - Pros and Cons
Advantages of Crossref Event Data
Crossref Event Data offers several significant advantages in the context of research tools and AI-driven products:
Comprehensive Monitoring
Crossref Event Data monitors a wide range of sources that are crucial for scholarly discussions. This includes various events such as citations, mentions, and social media posts, providing a holistic view of how research outputs are being engaged with and discussed.
Accessibility via API
The data is made available through an API, allowing users to easily access and integrate this information into their own systems. This facilitates the development of tools that can analyze and visualize the impact and engagement of research outputs.
Integration with Existing Metadata
Crossref Event Data is integrated with the existing metadata managed by Crossref, enhancing the richness and context of the data. For example, the recent inclusion of retractions and corrections from Retraction Watch adds another layer of important information to the metadata.
Transparency and Accountability
By including retractions and corrections, Crossref Event Data promotes transparency and accountability in scholarly research. This helps maintain the integrity of the academic record and ensures that users have access to accurate and up-to-date information.
Community Engagement
The service fosters community engagement by providing traceable information about the provenance and context of every event related to research outputs. This can help researchers understand how their work is being received and used by others.
Disadvantages of Crossref Event Data
While Crossref Event Data offers many benefits, there are some limitations and potential drawbacks:
Dependence on Data Sources
The quality and comprehensiveness of the data depend on the sources being monitored. If key sources are missed or if the monitoring is incomplete, the data may not provide a full picture of the engagement with research outputs.
Data Validation
Ensuring the accuracy and validity of the event data can be challenging. While Crossref integrates data from reputable sources like Retraction Watch, there may still be issues with data consistency and reliability.
Technical Requirements
Accessing and utilizing the data via the API may require technical expertise, which could be a barrier for some users. This might limit the accessibility of the data to those who are not familiar with API integration.
Scope Limitations
The service may not capture all types of events or engagement metrics that are relevant to every researcher or institution. There could be gaps in the data that do not fully reflect the broader impact of research outputs.
In summary, Crossref Event Data is a valuable tool for tracking the engagement and impact of research outputs, but it does come with some limitations related to data sources, validation, and technical requirements.
Crossref Event Data - Comparison with Competitors
When Comparing Crossref Event Data with Other Tools
When comparing Crossref Event Data with other tools in the research and academic analytics category, several unique features and differences become apparent.
Crossref Event Data
- Data Collection and Sources: Crossref Event Data collects and distributes events from various sources, including social media mentions, commentary, and citations from policy and technical documents. It integrates data from Crossref, DataCite, and other platforms, providing a comprehensive view of research-related online activity.
- API and Query System: The service offers a REST API that allows users to query and download events based on specific criteria, such as date ranges, sources, and DOIs. The API supports filters and pagination, making it versatile for different types of queries.
- Event Types and Metadata: Events captured by Crossref include discussions, citations, and mentions across different platforms like Reddit, Twitter, and blogs. Each event is detailed with metadata such as the subject and object of the relation, the type of relation, and the timestamp of when the event occurred.
- Evidence Registry: Crossref maintains an Evidence Registry that stores supporting evidence for each event, providing transparency and traceability of how events were generated.
Alternatives and Comparisons
Altmetric Services
- Unlike commercial altmetric services, Crossref Event Data is a raw data store with an API, requiring users to retrieve, process, and reformat the data themselves. This makes it more suitable for users who need detailed, customizable data rather than pre-curated metrics.
AI-Driven Research Tools
- Elicit: Elicit is an AI research assistant that helps with finding related papers, generating research questions, and optimizing database searches. While Elicit focuses on assisting researchers in finding relevant literature, it does not collect or analyze the broader online activity around research outputs like Crossref Event Data.
- Research Rabbit: This tool allows users to create collections of academic papers and visualize scholarly networks. It is more focused on personal research organization and recommendation rather than tracking and analyzing large-scale online engagement with research.
- ChatPDF: ChatPDF is an AI-powered tool for analyzing and summarizing research papers. It does not track or analyze online mentions or discussions about research, which is a core function of Crossref Event Data.
Key Differences
- Scope and Purpose: Crossref Event Data is primarily aimed at capturing and providing visibility to the broader conversations and engagements around research outputs, whereas tools like Elicit, Research Rabbit, and ChatPDF are more focused on assisting individual researchers in their literature search and analysis.
- Data Integration: Crossref Event Data integrates with other metadata services like DataCite, making it a unique resource for tracking data citations and other research-related activities across multiple platforms.
In summary, Crossref Event Data stands out for its comprehensive collection and distribution of research-related events from a wide range of sources, its detailed metadata, and its integration with other scholarly metadata services. While other tools like Elicit, Research Rabbit, and ChatPDF offer valuable assistance in research organization and analysis, they do not provide the same level of broad, event-based data that Crossref Event Data offers.

Crossref Event Data - Frequently Asked Questions
Here are some frequently asked questions about Crossref Event Data, along with detailed responses to each:
What is Crossref Event Data?
Crossref Event Data is a system designed to collect and distribute events related to scholarly content. It captures various types of online activities such as social media mentions, commentary, and citations, providing a broader picture of attention and activity around published research.
What types of events does Crossref Event Data collect?
Crossref Event Data collects a wide range of events, including mentions on social media platforms like Twitter and Reddit, links from web pages, references on Wikipedia and WordPress.com blogs, and citations from DataCite items. These events are gathered from multiple sources and include activities such as discussions, shares, and links to registered content items.
How is the data collected and processed?
The data is collected through Agents operated by Crossref, DataCite, or their partners. These Agents connect to external data sources, extract relevant information, and convert it into events. Each event includes a subject-relation-object triple and additional metadata to provide context on how, why, where, and by whom the event was created.
What is the structure of an Event in Crossref Event Data?
An Event in Crossref Event Data consists of several key fields:
- subj_id: The subject of the relation as a URI.
- relation_type_id: The type of relation (e.g., discusses, links).
- obj_id: The object of the relation as a URI, typically a DOI.
- occurred_at: The date and time when the event occurred.
- id: The unique ID of the event.
- source_id: The ID of the source.
- subj and obj: Optional subject and object metadata.
- timestamp: The date and time the event was processed by Event Data.
- evidence_record: A link to the evidence record for the event.
How can I access and query the Event Data?
You can access Crossref Event Data through the Query API, which is a JSON REST API. The API allows you to query events based on various filters such as date range, source, and DOI. Queries can be made using either the “collected” or “occurred” views, depending on whether you are interested in the time of collection or the time the event occurred. The API responses are paginated and can be quite large, so it is recommended to handle them as a stream.
What is the difference between the “collected” and “occurred” views in the Query API?
The “collected” view provides events based on when they were collected by the system, which is useful for ensuring you get all the data and for daily queries to build a complete database. The “occurred” view provides events based on when they actually occurred, which is useful if you are interested in a specific time period and are willing to re-issue queries for the given date range.
What is the Evidence Registry and how does it support Event Data?
The Evidence Registry stores supporting evidence for every event collected by Crossref. Each event can link to an Evidence Record, which documents how the event was generated, including logs of URLs visited and HTTP status codes received. This transparency helps users understand the origin and validity of the events.
Can I use Crossref Event Data to track connections between different types of research outputs?
Yes, Crossref Event Data is designed to capture connections between various types of research outputs, including journal articles, books, conference proceedings, preprints, and more. It also links these outputs to non-traditional sources like social media and web pages, providing a comprehensive view of research-related online activity.
How does Crossref ensure data transparency and trust in Event Data?
Crossref ensures transparency by providing open activity logs and evidence records that explain how the data was collected. The code for Event Data is also open source, promoting trust and helping users decide how to use the data effectively.
Can I build my own database using the Event Data?
Yes, you can build your own database using the event stream provided by Crossref Event Data. The API delivers events in a format that allows you to collect and store them as needed, enabling you to create a customized database of events.
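One simple way to build such a database is to key each row on the event’s unique `id`, so that re-harvesting an overlapping date range (as the “occurred” view may require) overwrites rows rather than duplicating them. The sketch below uses Python’s built-in `sqlite3`; the table layout and the `open_store` and `upsert_events` helpers are hypothetical, and the full event JSON is kept in a `raw` column so that no field is lost.

```python
import json
import sqlite3
from typing import Any, Dict, Iterable

# Hypothetical schema: a few queryable columns plus the raw event JSON.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id TEXT PRIMARY KEY,
    source_id TEXT NOT NULL,
    subj_id TEXT NOT NULL,
    relation_type_id TEXT NOT NULL,
    obj_id TEXT NOT NULL,
    occurred_at TEXT NOT NULL,
    raw TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local event store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_events(conn: sqlite3.Connection, events: Iterable[Dict[str, Any]]) -> int:
    """Insert events, replacing any row that already has the same event id."""
    rows = [
        (e["id"], e["source_id"], e["subj_id"], e["relation_type_id"],
         e["obj_id"], e["occurred_at"], json.dumps(e))
        for e in events
    ]
    with conn:  # commits the transaction on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

Keeping the raw JSON alongside a few indexed columns means new questions can be answered later without re-downloading the data.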
Crossref Event Data - Conclusion and Recommendation
Final Assessment of Crossref Event Data
Crossref Event Data is a valuable tool in the research tools category, particularly for those interested in tracking and analyzing the online activity and impact of scholarly works. Here’s a breakdown of its benefits and who would most benefit from using it:
Who Would Benefit Most
- Researchers and Authors: Event Data provides insights into how their work is being shared, discussed, and cited across various platforms, including social media, blogs, and other online sources. This helps authors understand the reach and impact of their research.
- Publishers: Publishers can use Event Data to monitor how their articles and other content are being engaged with online. This information can help in assessing the effectiveness of their publication strategies and identifying trends in reader engagement.
- Bibliometrics Researchers: For those conducting bibliometric studies, Event Data offers a rich source of data on citations, mentions, and other forms of online engagement, which can be crucial for evaluating the impact of research outputs.
- Institutional Administrators: Universities and research institutions can benefit from Event Data by gaining a broader view of how their affiliated researchers’ work is being received and discussed online, which can inform institutional assessment and funding decisions.
Key Features and Benefits
- Comprehensive Data Collection: Event Data collects activity from a variety of web sources, including social media platforms, Wikipedia, and researcher tools like Hypothes.is and F1000 Prime. This comprehensive approach ensures a wide range of online activities are captured.
- Transparency and Evidence: Each event is linked to an Evidence Record, which details where the data came from, why it was selected, and how it was processed. This transparency is crucial for maintaining trust in the data.
- API Access: The data is accessible via a Query API, allowing users to retrieve and process the data in bulk. This API supports various filters and views, such as “collected” and “occurred,” which help in managing the timing and consistency of the data.
- Integration with Other Metadata: Event Data is part of a larger effort to integrate various types of metadata, including data citations and other scholarly outputs. This integration enhances the utility of the data by providing a more holistic view of research activities.
Recommendations
- For Data Integrity: If you need a result set that does not grow after the fact, use the “collected” view. The events collected on a past date are final, so repeating the same query returns the same data.
- For Analysis by Event Date: If you are interested in when events actually happened, the “occurred” view is more suitable, although you may need to re-issue queries for the same date range as late-arriving events are collected.
- For Specific Use Cases: Use the various filters available in the Query API to tailor your data retrieval to specific sources, DOIs, or time periods. This helps in focusing on the most relevant data for your analysis.
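The “collected” recommendation suggests a simple daily harvesting routine: because the events collected on a completed day do not change, each finished day only needs to be fetched once. The sketch below works out which days are still missing and builds a one-day filter for each. It assumes the `from-collected-date` and `until-collected-date` filter names and `YYYY-MM-DD` dates, and it treats the current day as still being filled; both helpers are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def dates_to_harvest(last_harvested: date, today: date) -> List[date]:
    """Completed collection days after `last_harvested`, excluding `today`,
    whose collection is assumed to be still in progress."""
    days: List[date] = []
    day = last_harvested + timedelta(days=1)
    while day < today:
        days.append(day)
        day += timedelta(days=1)
    return days


def collected_day_filters(day: date) -> Dict[str, str]:
    """One-day window on the collected view (filter names are assumptions)."""
    return {
        "from-collected-date": day.isoformat(),
        "until-collected-date": day.isoformat(),
    }


for day in dates_to_harvest(date(2018, 1, 30), date(2018, 2, 2)):
    print(collected_day_filters(day))
```

Each filter set can then be passed to a paginated Query API request; recording the last harvested day only after that day’s events are stored makes the routine safe to re-run after a failure.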