| Title: | Data-Driven Search Strategy Development and Evidence Synthesis Reporting |
|---|---|
| Description: | Deduplicates bibliographic citations from multiple sources while preserving customizable metadata, supporting data-driven search strategy development and evidence synthesis reporting. Search results can be analyzed using plots and tables, and imported or exported in 'RIS' and 'CSV' formats. An interactive 'shiny' application is included for exploratory use. |
| Authors: | Trevor Riley [aut, cre] (ORCID: <https://orcid.org/0000-0002-6834-9802>), Kaitlyn Hair [aut] (ORCID: <https://orcid.org/0000-0003-0180-7343>), Lukas Wallrich [aut] (ORCID: <https://orcid.org/0000-0003-2121-5177>), Matthew Grainger [aut] (ORCID: <https://orcid.org/0000-0001-8426-6495>), Sarah Young [aut] (ORCID: <https://orcid.org/0000-0002-8301-5106>), Chris Pritchard [aut] (ORCID: <https://orcid.org/0000-0002-1143-9751>), Neal Haddaway [aut] (ORCID: <https://orcid.org/0000-0003-3902-2234>), Martin Westgate [cph] (Author of included synthesisr fragments), Eliza Grames [cph] (Author of included synthesisr fragments), Kaitlyn Hair [cph] (Author of included ASySD deduplication code), CAMARADES Group [cph] (Authors of ASySD (github.com/camaradesuk/ASySD)) |
| Maintainer: | Trevor Riley <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.0.0 |
| Built: | 2026-06-17 15:11:31 UTC |
| Source: | https://github.com/eshackathon/citesource |
The CiteSource package supports evidence aggregation by helping to process the results of searches run in different sources. It deduplicates results while retaining metadata on where each result was found, and then lets users compare the contribution of the different sources.
Maintainer: Trevor Riley [email protected] (ORCID)
Authors:
Trevor Riley [email protected] (ORCID)
Kaitlyn Hair [email protected] (ORCID)
Lukas Wallrich [email protected] (ORCID)
Matthew Grainger [email protected] (ORCID)
Sarah Young [email protected] (ORCID)
Chris Pritchard [email protected] (ORCID)
Neal Haddaway [email protected] (ORCID)
Other contributors:
Martin Westgate (Author of included synthesisr fragments) [copyright holder]
Eliza Grames (Author of included synthesisr fragments) [copyright holder]
Kaitlyn Hair (Author of included ASySD deduplication code) [copyright holder]
CAMARADES Group (Authors of ASySD (github.com/camaradesuk/ASySD)) [copyright holder]
Useful links:
Report bugs at https://github.com/ESHackathon/CiteSource/issues
This function processes a dataset: it expands the 'cite_source' column, filters on user-specified labels (if provided), and calculates detailed counts such as records imported, distinct records, unique records, non-unique records, and several percentage contributions for each citation source/method. It also adds a total row summarizing these counts.
calculate_detailed_records(unique_citations, n_unique, labels_to_include = NULL)
unique_citations |
A data frame containing unique citations.
The data frame must include the columns |
n_unique |
A data frame containing counts of unique records, typically filtered
by specific criteria (e.g., |
labels_to_include |
An optional character vector of labels to filter the citations. If provided, only citations matching these labels will be included in the counts. If 'NULL', all labels are included. Default is 'NULL'. |
The function first checks if the required columns are present in the input data frames.
It then expands the cite_source column, filters the data based on the provided labels (if any),
and calculates various counts and percentages for each citation source. The function also adds
a total row summarizing these counts across all sources.
A data frame with detailed counts for each citation source, including:
Records Imported: Total number of records imported.
Distinct Records: Number of distinct records after deduplication.
Unique Records: Number of unique records specific to a source.
Non-unique Records: Number of records found in other sources.
Source Contribution %: Percentage contribution of each source to the total distinct records.
Source Unique Contribution %: Percentage contribution of each source to the total unique records.
Source Unique %: Percentage of unique records within the distinct records for each source.
    # Example usage with a sample dataset
    unique_citations <- data.frame(
      cite_source = c("Source1, Source2", "Source2", "Source3"),
      cite_label = c("Label1", "Label2", "Label1"),
      duplicate_id = c(1, 2, 3)
    )
    n_unique <- data.frame(
      cite_source = c("Source1", "Source2", "Source3"),
      cite_label = c("search", "search", "search"),
      unique = c(10, 20, 30)
    )
    calculate_detailed_records(unique_citations, n_unique, labels_to_include = "search")
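The three percentage columns listed above can be read as simple ratios. The following is a minimal base R sketch of that arithmetic, not the package's implementation: the counts and column names are made up for illustration, and it assumes the totals are the sums of the per-source counts.

```r
# Hypothetical per-source counts (illustrative only, not CiteSource output)
counts <- data.frame(
  Source = c("Source1", "Source2"),
  distinct = c(90, 140),
  unique = c(50, 70)
)

# Source Contribution %: share of all distinct records found by each source
counts$contribution_pct <- round(100 * counts$distinct / sum(counts$distinct), 1)

# Source Unique Contribution %: share of all unique records found by each source
counts$unique_contribution_pct <- round(100 * counts$unique / sum(counts$unique), 1)

# Source Unique %: share of a source's distinct records that only it found
counts$unique_pct <- round(100 * counts$unique / counts$distinct, 1)

print(counts)
```

A source can contribute many distinct records and still have a low Source Unique % if most of its records are also found elsewhere.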
This function processes a dataset of unique citations, expands the cite_source column,
filters based on user-specified labels (if provided), and then calculates the number
of records imported and distinct records for each citation source. It also adds a
total row summarizing these counts.
calculate_initial_records(unique_citations, labels_to_include = NULL)
unique_citations |
A data frame containing the unique citations.
It must contain the columns |
labels_to_include |
An optional character vector of labels to filter the citations. If provided, only citations matching these labels will be included in the counts. Default is NULL, meaning no filtering will be applied. |
The function first checks if the required columns are present in the input data frame.
It then expands the cite_source column to handle multiple sources listed in a
single row and filters the dataset based on the provided labels (if any).
The function calculates the number of records imported (total rows) and the number
of distinct records (unique duplicate_id values) for each citation source.
Finally, a total row is added to summarize the counts across all sources.
A data frame containing the counts of Records Imported and Distinct Records
for each citation source. The data frame also includes a "Total" row summing
the counts across all sources.
    # Example usage with a sample dataset
    unique_citations <- data.frame(
      cite_source = c("Source1", "Source2", "Source3"),
      cite_label = c("Label1", "Label2", "Label3"),
      duplicate_id = c(1, 2, 3)
    )
    calculate_initial_records(unique_citations)
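The expansion and counting steps described for calculate_initial_records() can be approximated in base R. This is a rough sketch rather than the function's actual code: it assumes sources within a row are separated by a comma and optional space, and that a source is listed once for every original record it supplied to a merged row.

```r
# Hypothetical deduplicated data; the first row merges two Source1 records
# and one Source2 record (assumed ", " separator)
unique_citations <- data.frame(
  cite_source = c("Source1, Source1, Source2", "Source2", "Source3"),
  duplicate_id = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Expand cite_source so that each listed source gets its own row
source_lists <- strsplit(unique_citations$cite_source, ",\\s*")
expanded <- data.frame(
  cite_source = unlist(source_lists),
  duplicate_id = rep(unique_citations$duplicate_id, lengths(source_lists)),
  stringsAsFactors = FALSE
)

# Records imported: total rows per source after expansion
records_imported <- table(expanded$cite_source)

# Distinct records: number of different duplicate_id values per source
distinct_records <- tapply(
  expanded$duplicate_id, expanded$cite_source,
  function(ids) length(unique(ids))
)

records_imported
distinct_records
```

In this sketch Source1 has two imported records but only one distinct record, because both of its records were merged into the same duplicate_id.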
This function calculates counts for different phases, as well as precision and recall for each source, based on the unique citations and citations dataframes. The phases should be labeled as 'screened' and 'final' (case-insensitive) in the input dataframes. The function will give a warning if these labels are not present in the input dataframes.
calculate_phase_count(unique_citations, citations, db_colname)
unique_citations |
A dataframe containing unique citations with phase information. The phase information must be provided in a column named 'cite_label' in the dataframe. |
citations |
A dataframe containing all citations with phase information. The phase information must be provided in a column named 'cite_label' in the dataframe. |
db_colname |
The name of the column representing the source database. |
The function will give a warning if 'screened' and 'final' labels are not present in the 'cite_label' column of the input dataframes.
A dataframe containing distinct counts, counts for different phases, precision, and recall for each source, as well as totals.
    unique_citations <- data.frame(
      db_source = c("Database1", "Database1", "Database2", "Database3", "Database3", "Database3"),
      cite_label = c("screened", "final", "screened", "final", "screened", "final"),
      duplicate_id = c(102, 102, 103, 103, 104, 104),
      other_data = 1:6
    )
    citations <- data.frame(
      db_source = c("Database1", "Database1", "Database1", "Database2", "Database2", "Database3"),
      cite_label = c("screened", "final", "screened", "final", "screened", "final"),
      other_data = 7:12
    )
    result <- calculate_phase_count(unique_citations, citations, "db_source")
    result
This function calculates the distinct record counts, as well as screened and final record counts, for each citation source across different phases (e.g., "screened", "final"). It also calculates precision and recall metrics for each source.
calculate_phase_records(unique_citations, n_unique, db_colname)
unique_citations |
A data frame containing unique citations.
It must include the columns |
n_unique |
A data frame containing counts of unique records.
Typically filtered by specific criteria, such as |
db_colname |
The name of the column representing the citation source
in the |
The function starts by calculating the total distinct records, as well as the total "screened" and "final" records across all sources. It then calculates distinct counts for each source, followed by counts for "screened" and "final" records. Finally, it calculates precision and recall metrics and adds a total row summarizing these counts across all sources.
A data frame with phase counts and calculated precision and recall for each citation source, including:
Distinct Records: The count of distinct records per source.
screened: The count of records in the "screened" phase.
final: The count of records in the "final" phase.
Precision: The precision metric calculated as final / Distinct Records.
Recall: The recall metric calculated as final / Total final records.
    # Example usage with a sample dataset
    unique_citations <- data.frame(
      cite_source = c("Source1", "Source2", "Source3"),
      cite_label = c("screened", "screened", "final"),
      duplicate_id = c(1, 2, 3)
    )
    n_unique <- data.frame(
      cite_source = c("Source1", "Source2", "Source3"),
      unique = c(10, 20, 30)
    )
    calculate_phase_records(unique_citations, n_unique, "cite_source")
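The precision and recall definitions given for calculate_phase_records() reduce to two divisions. The sketch below works them through in base R with made-up counts. It assumes the metrics are reported as percentages and that no final record is shared between sources, so the total is a plain sum. Neither assumption is confirmed here.

```r
# Hypothetical phase counts per source (illustrative only)
phase_counts <- data.frame(
  Source = c("Source1", "Source2"),
  distinct = c(100, 150),
  final = c(80, 120)
)

# Total final records across all sources (assumes no overlap between sources)
total_final <- sum(phase_counts$final)

# Precision: final records as a share of the source's distinct records
phase_counts$precision <- round(100 * phase_counts$final / phase_counts$distinct, 1)

# Recall: final records as a share of all final records
phase_counts$recall <- round(100 * phase_counts$final / total_final, 1)

print(phase_counts)
```

Both sources have the same precision here, yet Source2 has the higher recall because it retrieved more of the final records.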
This function calculates the counts of distinct records, records imported, and unique records for each database source. It combines these counts into one dataframe and calculates several ratios and percentages related to the unique and distinct counts. It also calculates the total for each count type.
calculate_record_counts(unique_citations, citations, n_unique, db_colname)
unique_citations |
Dataframe. The dataframe for calculating distinct records count. |
citations |
Dataframe. The dataframe for calculating records imported count. |
n_unique |
Dataframe. The dataframe for calculating unique records count. |
db_colname |
Character. The name of the column containing the database source information. |
A dataframe with counts of distinct records, imported records, and unique records for each source, including total counts and several calculated ratios and percentages.
    unique_citations <- data.frame(
      db_source = c("Database1", "Database1", "Database2", "Database3", "Database3", "Database3"),
      other_data = 1:6
    )
    citations <- data.frame(
      db_source = c("Database1", "Database1", "Database1", "Database2", "Database2", "Database3"),
      other_data = 7:12
    )
    n_unique <- data.frame(
      cite_source = c("Database1", "Database2", "Database2", "Database3", "Database3", "Database3"),
      cite_label = c("search", "final", "search", "search", "search", "final"),
      unique = c(1, 0, 1, 1, 1, 0)
    )
    result <- calculate_record_counts(unique_citations, citations, n_unique, "db_source")
    print(result)
Create a summary table to show the contribution of each source and the overall performance of the search. For this to work, labels need to be used that contrast a "search" stage with one or more later stages.
citation_summary_table(citations, comparison_type = "sources", search_label = "search", screening_label = "final", top_n = NULL)
citations |
A deduplicated tibble as returned by |
comparison_type |
Either "sources" to summarise and assess sources or "strings" to consider strings. |
search_label |
One or multiple labels that identify initial search results (default: "search") - if multiple labels are provided, they are merged. |
screening_label |
One or multiple labels that identify screened records (default: "final") - if multiple are provided, each is compared to the search stage. |
top_n |
Number of sources/strings to display, based on the number of total records they contributed at the search stage. Note that calculations and totals will still be based on all citations. Defaults to NULL, then all sources/strings are displayed. |
A tibble containing the contribution summary table, which shows the contribution of each source and the overall performance of the search.
    if (interactive()) {
      # Load example data from the package
      examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
      examplecitations <- readRDS(examplecitations_path)

      # Deduplicate citations and compare sources
      unique_citations <- dedup_citations(examplecitations)

      unique_citations |>
        dplyr::filter(stringr::str_detect(cite_label, "final")) |>
        record_level_table(return = "DT")

      citation_summary_table(unique_citations, screening_label = c("screened", "final"))
    }
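The top_n behaviour described for citation_summary_table() limits only which rows are shown, while totals still cover all citations. The base R sketch below illustrates that idea. The data frame and its column names are hypothetical, and it does not reproduce the function's internals.

```r
# Hypothetical search-stage record counts per source
summary_rows <- data.frame(
  source = c("A", "B", "C", "D"),
  search_records = c(120, 45, 300, 10),
  stringsAsFactors = FALSE
)
top_n <- 2

# Totals are computed on all sources, before any rows are dropped
grand_total <- sum(summary_rows$search_records)

# Only the top_n sources by search-stage records are kept for display
ordered <- summary_rows[order(summary_rows$search_records, decreasing = TRUE), ]
displayed <- head(ordered, top_n)

displayed
grand_total
```

Because the total is computed first, the displayed rows need not add up to it.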
Compare duplicate citations across sources, labels, and strings
compare_sources(unique_data, comp_type = c("sources", "strings", "labels"), include_references = FALSE)
unique_data |
Merged unique rows with duplicate IDs, as produced by the ASySD-based deduplication (e.g. the output of dedup_citations()) |
comp_type |
Specify which fields are to be included. One or more of "sources", "strings" or "labels" - defaults to all. |
include_references |
Should bibliographic detail be included in return? |
dataframe with indicators of where a citation appears, with sources/labels/strings as columns
    if (interactive()) {
      # Load example data from the package
      examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
      examplecitations <- readRDS(examplecitations_path)

      # Deduplicate citations and compare sources
      dedup_results <- dedup_citations(examplecitations)
      compare_sources(dedup_results, comp_type = "sources")
    }
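The value returned by compare_sources() is described as a set of indicators with one column per source. A comparable structure can be built by hand in base R, as sketched below. The input data are made up, a comma separator is assumed, and the real output may differ in column names and types.

```r
# Hypothetical deduplicated records with their sources
unique_data <- data.frame(
  duplicate_id = c(1, 2, 3),
  cite_source = c("Source1, Source2", "Source2", "Source3"),
  stringsAsFactors = FALSE
)

# Split the source strings and collect every source name that occurs
source_lists <- strsplit(unique_data$cite_source, ",\\s*")
all_sources <- sort(unique(unlist(source_lists)))

# One logical column per source: TRUE where the record was found there
indicators <- t(vapply(
  source_lists,
  function(found) all_sources %in% found,
  logical(length(all_sources))
))
colnames(indicators) <- all_sources

comparison <- cbind(unique_data["duplicate_id"], indicators)

# A record is unique to one source if exactly one indicator is TRUE
comparison$unique_to_one_source <- rowSums(indicators) == 1

comparison
```

The last column shows how an indicator table of this kind also supports the unique versus non-unique distinction used elsewhere in the package.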
Count number of unique and non-unique citations from different sources, labels, and strings
count_unique(unique_data, include_references = FALSE)
unique_data |
Merged unique rows with duplicate IDs, as produced by the ASySD-based deduplication (e.g. the output of dedup_citations()) |
include_references |
Should bibliographic detail be included in return? |
dataframe with indicators of where a citation appears, with source/label/string as column
    # Load example data from the package
    examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
    examplecitations <- readRDS(examplecitations_path)

    # Deduplicate citations
    dedup_results <- dedup_citations(examplecitations)

    # Count unique and non-unique citations
    count_unique(dedup_results)
This function generates a formatted summary table using the gt package,
which displays detailed counts for each citation source. The table includes
columns for the number of records imported, distinct records, unique records,
non-unique records, and various contribution percentages. Data from the
function calculate_detailed_records is pre-formatted for this table.
create_detailed_record_table(data)
data |
A data frame containing the detailed counts for each citation source. The data frame must include the following columns:
|
A gt table object summarizing the detailed record counts for each citation source.
    sample_data <- data.frame(
      Source = c("Source1", "Source2", "Total"),
      `Records Imported` = c(100, 150, 250),
      `Distinct Records` = c(90, 140, 230),
      `Unique Records` = c(50, 70, 120),
      `Non-unique Records` = c(40, 70, 110),
      `Source Contribution %` = c("39.1%", "60.9%", "100%"),
      `Source Unique Contribution %` = c("41.7%", "58.3%", "100%"),
      `Source Unique %` = c("55.6%", "50%", "52.2%"),
      check.names = FALSE
    )
    create_detailed_record_table(sample_data)
This function generates a formatted table displaying the record counts for each citation source, including the number of records imported and the distinct records after deduplication.
create_initial_record_table(data)
data |
A data frame containing the record counts for each citation source.
It must include columns |
The function checks if the input data frame is empty and returns an empty gt table
if no data is present. Otherwise, it generates a formatted table with labeled columns
and adds footnotes explaining the meaning of each column.
A gt table object summarizing the record counts for each citation source.
    sample_data <- data.frame(
      Source = c("Source1", "Source2", "Source3"),
      Records_Imported = c(100, 150, 250),
      Distinct_Records = c(90, 140, 230)
    )
    create_initial_record_table(sample_data)
This function generates a formatted table that displays the precision and sensitivity (recall) metrics for each citation source, along with distinct records and phase-specific counts such as "screened" and "final".
create_precision_sensitivity_table(data)
data |
A data frame containing phase-specific counts and calculated metrics
for each citation source. It must include columns such as |
The function first checks whether all values in the screened column are zero.
If so, the column is removed from the table. The table is then generated
using the gt package, with labeled columns and footnotes explaining the metrics.
A gt table object summarizing the precision and sensitivity
metrics for each citation source, with relevant footnotes and labels.
    sample_data <- data.frame(
      Source = c("Source1", "Source2", "Total"),
      Distinct_Records = c(100, 150, 250),
      final = c(80, 120, 200),
      Precision = c(80.0, 80.0, 80.0),
      Recall = c(40.0, 60.0, 100.0),
      screened = c(90, 140, 230)
    )
    create_precision_sensitivity_table(sample_data)
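The check on the screened column described for create_precision_sensitivity_table() amounts to dropping that column when it carries no information. A minimal base R sketch of the idea follows. The data are hypothetical, and the function's real code may handle missing columns or NA values differently.

```r
# Hypothetical input in which no records were labelled "screened"
data <- data.frame(
  Source = c("Source1", "Source2", "Total"),
  Distinct_Records = c(100, 150, 250),
  screened = c(0, 0, 0),
  final = c(80, 120, 200)
)

# Drop the screened column when every value in it is zero
if (all(data$screened == 0)) {
  data$screened <- NULL
}

names(data)
```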
Deduplicates citation data. Duplicates are assumed to be published in the same journal, so pre-prints vs. their published versions will not be merged.
dedup_citations(raw_citations, manual = FALSE, show_unknown_tags = FALSE)
raw_citations |
Citation dataframe with relevant columns |
manual |
logical. If TRUE, return the full result list including potential pairs for manual review. Default is FALSE. |
show_unknown_tags |
When a label, source, or other merged field is missing, show it as "unknown"? Default FALSE. |
When manual = FALSE: a dataframe of unique citations. When
manual = TRUE: a list with $unique (unique citations),
$manual_dedup (potential pairs for review), and $auto_pairs
(pairs that were merged automatically - feed to dedup_log() together
with confirmed manual pairs to build a full provenance log).
    # Load example data from the package
    examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
    examplecitations <- readRDS(examplecitations_path)

    # Deduplicate citations
    dedup_results <- dedup_citations(examplecitations)

    # Return potential pairs for manual review
    dedup_results_manual <- dedup_citations(examplecitations, manual = TRUE)
Add manually identified duplicate pairs to a deduplicated dataset
dedup_citations_add_manual(unique_citations, additional_pairs)
unique_citations |
Unique citations returned by |
additional_pairs |
Dataframe of manually confirmed duplicate pairs
(a subset of the |
Updated unique citations dataframe with manual duplicates merged.
    # Load example data from the package
    examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
    examplecitations <- readRDS(examplecitations_path)

    # Deduplicate and retrieve manual pairs
    dedup_results <- dedup_citations(examplecitations, manual = TRUE)

    # (user reviews dedup_results$manual_dedup and sets result == "match" for true dups)
    # final <- dedup_citations_add_manual(dedup_results$unique, dedup_results$manual_dedup)
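The review step mentioned in the example comment, marking true duplicates with result == "match", can be followed by a simple filter to isolate the confirmed pairs. The sketch below uses a made-up candidate table. Only the result column is taken from the example above; the record_id1 and record_id2 columns are assumptions. Whether dedup_citations_add_manual() expects the full table or only the confirmed rows is not established here.

```r
# Hypothetical candidate pairs after manual review
manual_dedup <- data.frame(
  record_id1 = c("12", "40", "77"),
  record_id2 = c("18", "41", "93"),
  result = c("match", NA, "match"),
  stringsAsFactors = FALSE
)

# Keep only pairs that the reviewer marked as true duplicates
is_match <- !is.na(manual_dedup$result) & manual_dedup$result == "match"
confirmed_pairs <- manual_dedup[is_match, ]

confirmed_pairs
```

The explicit is.na() check keeps unreviewed pairs from turning into NA rows in the subset.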
Adds further citations (e.g. an additional database search) to a set that was
already deduplicated, and deduplicates the new records against both the
existing set and each other - without discarding the work already done. Each
existing unique record enters as a single row, so prior automatic and manual
merge decisions are preserved; the new records are integrated and full
provenance (the original record_ids behind every merged record) is carried
through.
dedup_citations_add_sources(existing_citations, new_citations, manual = FALSE, show_unknown_tags = FALSE)
existing_citations |
A previously deduplicated set (from
|
new_citations |
New raw citations to add, as returned by
|
manual |
logical. If TRUE, return the full result list including
|
show_unknown_tags |
When a label, source, or other merged field is missing, show it as "unknown"? Default FALSE. |
This is the incremental counterpart to running dedup_citations() on all
sources from scratch and, for the same data, produces the same unique set.
When manual = FALSE: a dataframe of unique citations across both
sets. When manual = TRUE: a list with $unique, $manual_dedup and
$auto_pairs (as in dedup_citations()). In both cases record_ids
retains the original record IDs behind every merged record.
dedup_citations(), dedup_citations_add_manual()
    if (interactive()) {
      # old_files, old_srcs, new_files and new_srcs are placeholders for
      # your own file paths and source names
      existing <- dedup_citations(read_citations(old_files, cite_sources = old_srcs))
      new_raw <- read_citations(new_files, cite_sources = new_srcs)
      combined <- dedup_citations_add_sources(existing, new_raw)
    }
Combines automatically merged pairs and user-confirmed manual pairs into a
single tibble with a method column ("auto" / "manual"). Useful for
reporting and auditing - e.g. as supplementary material for a systematic
review.
dedup_log(dedup_result, confirmed_manual_pairs = NULL)
dedup_result |
List returned by |
confirmed_manual_pairs |
Optional dataframe of manual pairs the user
confirmed as duplicates. Typically a subset of |
Tibble with columns method, record_id1, record_id2, and the
common bibliographic fields (title1/2, author1/2, year1/2,
journal1/2, doi1/2) when available.
    examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
    examplecitations <- readRDS(examplecitations_path)
    dedup_results <- dedup_citations(examplecitations, manual = TRUE)

    # Log of just the auto-merged pairs
    dedup_log(dedup_results)

    # Or include user-confirmed manual pairs
    # dedup_log(dedup_results, confirmed_manual_pairs = my_confirmed_pairs)
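The shape of the log produced by dedup_log(), automatic and manual pairs stacked together with a method column, can be pictured with a small base R sketch. The pair tables here are invented and contain only the ID columns. This illustrates the structure and is not the function's implementation.

```r
# Hypothetical automatically merged pairs and user-confirmed manual pairs
auto_pairs <- data.frame(
  record_id1 = c("1", "3"),
  record_id2 = c("2", "4"),
  stringsAsFactors = FALSE
)
confirmed_manual_pairs <- data.frame(
  record_id1 = "5",
  record_id2 = "6",
  stringsAsFactors = FALSE
)

# Tag each set with how the merge decision was made
auto_pairs$method <- "auto"
confirmed_manual_pairs$method <- "manual"

# Stack both sets into one log with a consistent column order
log_columns <- c("method", "record_id1", "record_id2")
log_sketch <- rbind(auto_pairs[log_columns], confirmed_manual_pairs[log_columns])

log_sketch
```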
Bibliographic data can be stored in a number of different file types, meaning that detecting consistent attributes of those files is necessary if they are to be parsed accurately. These functions attempt to identify some of those key file attributes. Specifically, detect_parser determines which parse_ function to use; detect_delimiter and detect_lookup identify different attributes of RIS files; and detect_year attempts to fill gaps in publication years from other information stored in a data.frame.
detect_parser(x)
detect_delimiter(x)
detect_lookup(tags)
detect_year(df)
x |
A character vector containing bibliographic data |
tags |
A character vector containing RIS tags. |
df |
a data.frame containing bibliographic data |
detect_parser and detect_delimiter return a length-1 character; detect_year returns a character vector listing estimated publication years; and detect_lookup returns a data.frame.
This function saves deduplicated citations as a BibTeX file with sources, labels and strings
included in the note field (if they were initially provided for any of the citations). Therefore,
beware that any note field that might be included in citations will be overwritten. Also note that
existing files are overwritten without warning.
export_bib(citations, filename, include = c("sources", "labels", "strings"))
citations |
Dataframe with unique citations, resulting from |
filename |
Name (and path) of file, should end in .bib |
include |
Character. One or more of sources, labels or strings |
No return value, called for side effects. Saves deduplicated citations as a 'BibTeX' file to the specified location.
    if (interactive()) {
      # Load example data from the package
      examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
      examplecitations <- readRDS(examplecitations_path)
      # With the default manual = FALSE, dedup_citations() returns the unique citations directly
      dedup_results <- dedup_citations(examplecitations)
      export_bib(dedup_results, tempfile(fileext = ".bib"), include = "sources")
    }
This function saves deduplicated citations as a CSV file for further analysis and/or reporting. Metadata can be separated into one column per source, label or string, which facilitates analysis. Note that existing files are overwritten without warning.
export_csv(unique_citations, filename, fields = "full", separate = NULL, trim_abstracts = 32000, manual_dedup_complete = FALSE)
unique_citations |
Dataframe with unique citations, resulting from |
filename |
Name (and path) of file, should end in .csv |
fields |
Controls which columns are included. Use |
separate |
Character vector indicating which (if any) of cite_source, cite_string and cite_label should be split into separate columns to facilitate further analysis. |
trim_abstracts |
Some databases may return full-text that is misidentified as an abstract. This inflates file size and may lead to issues with Excel, which cannot deal with more than 32,000 characters per field. Therefore, the default is to trim very long abstracts to 32,000 characters. Set a lower number to reduce file size, or NULL to retain abstracts as they are. |
manual_dedup_complete |
Logical. Records, in a |
No return value, called for side effects. Saves the deduplicated citations as a 'CSV' file to the specified location.
if (interactive()) {
  # Load example data from the package
  examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
  examplecitations <- readRDS(examplecitations_path)
  dedup_results <- dedup_citations(examplecitations, merge_citations = TRUE)
  export_csv(dedup_results, tempfile(fileext = ".csv"), separate = "cite_source")
  # Standard export for RELApp / screening tools (not reimportable into CiteSource):
  export_csv(dedup_results, tempfile(fileext = ".csv"), fields = "standard")
}
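The effect of `trim_abstracts` can be pictured with a minimal base R sketch. This is not the package's internal code: the helper `trim_long_text()` and the toy data are hypothetical, and the sketch simply assumes that trimming means keeping the first `max_chars` characters.

```r
# Hypothetical helper: truncate over-long text fields, keeping NA as NA.
trim_long_text <- function(x, max_chars = 32000) {
  if (is.null(max_chars)) {
    return(x) # NULL means: keep abstracts as they are
  }
  too_long <- !is.na(x) & nchar(x) > max_chars
  x[too_long] <- substr(x[too_long], 1, max_chars)
  x
}

# Toy data: a short abstract, a 40,000-character "abstract", and a missing one
abstracts <- c("Short abstract.", strrep("a", 40000), NA)
trimmed <- trim_long_text(abstracts)
nchar(trimmed)
```

Passing a smaller `max_chars` shrinks the output further, which mirrors the advice above to set a lower number when file size matters.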
Saves the candidate duplicate pairs returned as the $manual_dedup element
of dedup_citations(manual = TRUE) so that manual review can be completed
later. Combine with export_csv() to defer manual deduplication: export the
automatically deduplicated unique citations and these candidate pairs now,
then re-import both later with reimport_csv() and
reimport_dedup_candidates() to finish the review. Note that existing files
are overwritten without warning.
export_dedup_candidates(manual_dedup, filename)
manual_dedup |
Data frame of candidate pairs, i.e. the |
filename |
Name (and path) of file, should end in .csv |
No return value, called for side effects. Saves the candidate pairs as a 'CSV' file to the specified location.
reimport_dedup_candidates(), dedup_citations_add_manual()
if (interactive()) {
  examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
  examplecitations <- readRDS(examplecitations_path)
  dedup_results <- dedup_citations(examplecitations, manual = TRUE)
  export_dedup_candidates(dedup_results$manual_dedup, tempfile(fileext = ".csv"))
}
This function saves a data frame as a RIS file with specified columns mapped to RIS fields. Note that existing files are overwritten without warning.
export_ris(
  citations,
  filename,
  source_field = "DB",
  label_field = "C7",
  string_field = "C8"
)
citations |
Dataframe to be exported to RIS file |
filename |
Name (and path) of file, should end in .ris |
source_field |
Field in |
label_field |
Field in |
string_field |
Field in |
No return value, called for side effects. Saves the citations as a 'RIS' file to the specified location.
if (interactive()) {
  # Load example data from the package
  examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
  examplecitations <- readRDS(examplecitations_path)
  dedup_results <- dedup_citations(examplecitations, merge_citations = TRUE)
  export_ris(dedup_results$unique, tempfile(fileext = ".ris"))
}
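To make the field mapping concrete, here is a minimal base R sketch of what one RIS record with the default tags might look like. The helper `to_ris_record()` is hypothetical and much simpler than a real RIS writer; it assumes that the source, label and string metadata are each written as a single tagged line.

```r
# Hypothetical helper: format one citation as RIS lines ("XX  - value").
to_ris_record <- function(title, source, label, string,
                          source_field = "DB", label_field = "C7",
                          string_field = "C8") {
  tags <- c("TY", "TI", source_field, label_field, string_field, "ER")
  values <- c("JOUR", title, source, label, string, "")
  paste0(tags, "  - ", values)
}

ris_lines <- to_ris_record("An example title", "Scopus, WOS", "search", "string 1")
writeLines(ris_lines)
```

Changing `source_field` and the other two arguments only changes which tag carries the metadata, which is why the same values need to be supplied again when the file is read back in.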
Takes two or more data.frames with different column names or different column orders and binds them to a single data.frame.
merge_columns(x, y)
x |
Either a data.frame or a list of data.frames. |
y |
A data.frame, optional if x is a list. |
Returns a single data.frame with all the input data frames merged.
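The idea of binding data frames whose columns differ can be sketched in base R as follows. `bind_fill()` is a hypothetical stand-in written for illustration, not the implementation of `merge_columns()`; it assumes that columns missing from one input should be filled with `NA`.

```r
# Hypothetical helper: row-bind two data frames whose columns differ.
bind_fill <- function(x, y) {
  all_cols <- union(names(x), names(y))
  # Add each missing column as NA so both inputs share the same columns
  for (col in setdiff(all_cols, names(x))) x[[col]] <- NA
  for (col in setdiff(all_cols, names(y))) y[[col]] <- NA
  # Reorder both inputs to the same column order before binding
  rbind(x[all_cols], y[all_cols])
}

a <- data.frame(title = "Paper A", year = 2020)
b <- data.frame(year = 2021, author = "Smith", title = "Paper B")
merged <- bind_fill(a, b)
merged
```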
Text in standard bibliographic formats (for example, as imported via readLines) can be parsed with one of several format-specific parsers. Use detect_parser to determine which parser is most appropriate for your situation.
parse_pubmed(x)
parse_ris(x, tag_naming = "best_guess")
parse_bibtex(x)
parse_csv(x)
parse_tsv(x)
x |
A character vector containing bibliographic information in ris format. |
tag_naming |
What format are RIS tags in? Defaults to "best_guess". See |
Returns an object of class bibliography (ris, bib, or pubmed formats) or data.frame (csv or tsv).
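As a rough illustration of what parsing tagged text involves, the base R sketch below splits RIS-style lines into tags and values. It is a deliberately simplified assumption about the format (two-character tag, two spaces, hyphen, space) and does not reproduce what `parse_ris()` does with multi-line values or repeated tags.

```r
ris_text <- c(
  "TY  - JOUR",
  "TI  - An example title",
  "AU  - Smith, J.",
  "PY  - 2020",
  "ER  - "
)

# Keep only lines that look like "XX  - value" (two-character tag)
is_tagged <- grepl("^[A-Z][A-Z0-9]  - ", ris_text)
parsed <- data.frame(
  tag = substr(ris_text[is_tagged], 1, 2),
  # The value starts at character 7, after the tag and the "  - " separator
  value = trimws(substring(ris_text[is_tagged], 7))
)
parsed
```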
Create a faceted plot that shows unique contributions and duplicated records across two metadata dimensions. The most typical use case is to show the contributions of each source across different screening stages.
plot_contributions(
  data,
  facets = cite_source,
  bars = cite_label,
  color = type,
  center = FALSE,
  bar_order = "keep",
  facet_order = "keep",
  color_order = "keep",
  totals_in_legend = FALSE
)
data |
A tibble with one hit per row, with variables indicating meta-data of interest. |
facets |
Variable in data used for facets (i.e. sub-plots). Defaults to source (i.e. cite_source). Specify NULL to refrain from faceting. |
bars |
Variable in data used for bars. Defaults to label (i.e. cite_label) |
color |
Color used to fill bars. Defaults to |
center |
Logical. Should one color be above and one below the axis? |
bar_order |
Character. Order of bars within each facet, any levels not specified will follow at the end. If "keep", then this is based on factor levels (or the first value) in the input data. |
facet_order |
Character. Order of facets. Any levels not specified will follow at the end. |
color_order |
Character. Order of values on the color scale. |
totals_in_legend |
Logical. Should totals be shown in legend (e.g. as Unique (N = 1234)) |
A ggplot2 object showing source contributions as a faceted bar chart. The object can
be further customized using ggplot2 functions or saved with ggsave.
data <- data.frame(
  article_id = 1:100,
  cite_source = sample(c("DB 1", "DB 2", "DB 3"), 100, replace = TRUE),
  cite_label = sample(c("2020", "2021", "2022"), 100, replace = TRUE),
  type = c("unique", "duplicated")[rbinom(100, 1, .7) + 1]
)
plot_contributions(data,
  center = TRUE, bar_order = c("2022", "2021", "2020"),
  color_order = c("unique", "duplicated")
)
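Because the input has one hit per row, the quantities such a plot displays can be thought of as row counts per facet, bar and color combination. The base R sketch below tallies these counts for a small hand-made data set; it is an assumption about what is being visualised, not the function's internal code, and `hits` is made-up data.

```r
hits <- data.frame(
  cite_source = c("DB 1", "DB 1", "DB 2", "DB 2", "DB 2"),
  cite_label = c("2020", "2021", "2020", "2020", "2021"),
  type = c("unique", "duplicated", "unique", "unique", "duplicated")
)

# One row per facet/bar/color combination, with the number of hits in each
counts <- as.data.frame(table(hits), responseName = "n")
counts[counts$n > 0, ]
```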
Show overlap between different record sources, either by showing the number or the percentages of shared records between any pair of sources.
plot_source_overlap_heatmap(
  data,
  cells = "source",
  facets = NULL,
  plot_type = c("counts", "percentages"),
  sort_sources = TRUE,
  interactive = FALSE,
  show_labels = "auto",
  log_scale = FALSE
)
data |
A tibble with one record per row, an id column and then one column
per source indicating whether the record was found in that source (usually obtained from |
cells |
Variable to display in the cells. Should be 'source', 'label' or 'string' |
facets |
Variable in data used for facets (i.e. sub-plots). Should be NULL, 'source', 'label' or 'string' |
plot_type |
Either |
sort_sources |
Should sources be ordered by the number of records they contain? If FALSE, the order in data is retained. |
interactive |
Should the returned plot be interactive and enable the user to export the records underlying each field? |
show_labels |
Whether to show text labels in cells. |
log_scale |
Should the fill colour scale be log-transformed? Useful when counts
vary greatly across cells. Ignored when |
The requested plot, either as a ggplot2 object (when interactive = FALSE), which can then be
further formatted or saved using ggplot2::ggsave(), or as a plotly object (when interactive = TRUE).
data <- data.frame(
  article_id = 1:500,
  source__source1 = rbinom(500, 1, .5) == 1,
  source__source2 = rbinom(500, 1, .2) == 1,
  source__source3 = rbinom(500, 1, .1) == 1,
  source__source4 = rbinom(500, 1, .6) == 1,
  source__source5 = rbinom(500, 1, .7) == 1
)
plot_source_overlap_heatmap(data)
plot_source_overlap_heatmap(data, plot_type = "percentages")
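The pairwise numbers behind an overlap heatmap can be sketched with a cross-product of the source indicator columns. This base R example uses a small made-up data set; the percentage definition shown (share of the column source's records also found in the row source) is an assumption for illustration and may differ from how the package computes its percentages.

```r
records <- data.frame(
  article_id = 1:5,
  source__a = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  source__b = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  source__c = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Convert the logical indicator columns to a 0/1 matrix
m <- as.matrix(records[, -1]) * 1
# Cell [i, j] counts records found in both source i and source j;
# the diagonal holds each source's total number of records
overlap_counts <- crossprod(m)
# Share of each column source's records that the row source also contains
overlap_share <- sweep(overlap_counts, 2, diag(overlap_counts), "/")
overlap_counts
```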
Show records found in specific sets of sources to identify the unique contribution of each source and of any subsets
plot_source_overlap_upset(
  data,
  groups = "source",
  nsets = NULL,
  sets.x.label = "Number of records",
  mainbar.y.label = "Overlapping record count",
  order.by = c("freq", "degree"),
  ...
)
data |
A tibble with one record per row, an id column and then one column per source indicating whether the record was found in that source. |
groups |
Variable to use as groups. Should be 'source', 'label' or 'string' - defaults to source. |
nsets |
Number of sets to look at |
sets.x.label |
The x-axis label of the set size bar plot |
mainbar.y.label |
The y-axis label of the intersection size bar plot |
order.by |
How the intersections in the matrix should be ordered. Options include frequency (entered as "freq"), degree, or both in any order. |
... |
Arguments passed on to
|
No return value, called for side effects. Renders an UpSet plot showing record overlap between sources to the current graphics device.
Conway, J. R., Lex, A., & Gehlenborg, N. (2017). UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics.
data <- data.frame(
  article_id = 1:500,
  source__source1 = rbinom(500, 1, .5) == 1,
  source__source2 = rbinom(500, 1, .2) == 1,
  source__source3 = rbinom(500, 1, .1) == 1,
  source__source4 = rbinom(500, 1, .6) == 1,
  source__source5 = rbinom(500, 1, .7) == 1
)
plot_source_overlap_upset(data)
# To start with the records shared among the greatest number of sources, use
plot_source_overlap_upset(data, decreasing = c(TRUE, TRUE))
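An UpSet plot is built on counts of records per exact combination of sources. A minimal base R sketch of that tally, using made-up data and independent of UpSetR, could look like this:

```r
records <- data.frame(
  article_id = 1:6,
  source__a = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  source__b = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
  source__c = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

flags <- as.matrix(records[, -1])
# Label each record with the exact set of sources that contain it
combo <- apply(flags, 1, function(found) {
  paste(colnames(flags)[found], collapse = " & ")
})
# Number of records per combination, most frequent first
combo_counts <- sort(table(combo), decreasing = TRUE)
combo_counts
```

Combinations that contain a single source correspond to that source's unique contribution.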
This function imports RIS and BibTeX files with citations and merges them into one long tibble with one record per line.
read_citations(
  files = NULL,
  cite_sources = NULL,
  cite_strings = NULL,
  cite_labels = NULL,
  metadata = NULL,
  verbose = TRUE,
  tag_naming = "best_guess",
  only_key_fields = TRUE
)
files |
One or multiple RIS or BibTeX files with citations. Should be .bib or .ris files |
cite_sources |
The origin of the citation files (e.g. "Scopus", "WOS", "Medline") - vector with one value per file, defaults to file names. |
cite_strings |
Optional. The search string used (or another grouping to analyse) - vector with one value per file |
cite_labels |
Optional. An additional label per file, for instance the stage of search - vector with one value per file |
metadata |
A tibble with file names and metadata for each file. Can be specified as an alternative to files, cite_sources, cite_strings and cite_labels. |
verbose |
Should the number of references and the allocation of labels be reported? |
tag_naming |
Either a length-1 character stating how RIS tags should be replaced (see details for a list of options), or an object inheriting from class |
only_key_fields |
Should only key fields (e.g., those used by CiteSource) be imported? If FALSE, all RIS data is retained. Can also be a character vector of field names to retain (after they have been renamed by the import function) in addition to the essential ones. |
A tibble with one row per citation
if (interactive()) {
  # Import only key fields from the RIS files
  read_citations(c("res.ris", "res.bib"),
    cite_sources = c("CINAHL", "MEDLINE"),
    cite_strings = c("Search1", "Search2"),
    cite_labels = c("raw", "screened"),
    only_key_fields = TRUE
  )

  # or equivalently
  metadata_tbl_key_fields <- tibble::tribble(
    ~files, ~cite_sources, ~cite_strings, ~cite_labels, ~only_key_fields,
    "res.ris", "CINAHL", "Search1", "raw", TRUE,
    "res.bib", "MEDLINE", "Search2", "screened", TRUE
  )
  read_citations(metadata = metadata_tbl_key_fields)
}
This function calculates the counts of distinct records and records imported for each database source. It combines these counts into one dataframe and calculates the total for each count type.
record_counts(unique_citations, citations, db_colname)
unique_citations |
Dataframe. The dataframe for calculating distinct records count. |
citations |
Dataframe. The dataframe for calculating records imported count. |
db_colname |
Character. The name of the column containing the database source information. |
A dataframe with counts of distinct records and imported records for each source, including total counts.
# Create synthetic data for example
unique_citations <- data.frame(
  title = paste("Article", 1:10),
  db_source = sample(c("Database 1", "Database 2", "Database 3"), 10, replace = TRUE),
  stringsAsFactors = FALSE
)
citations <- data.frame(
  title = paste("Article", 1:20),
  db_source = sample(c("Database 1", "Database 2", "Database 3"), 20, replace = TRUE),
  stringsAsFactors = FALSE
)
# Use the synthetic data with the function
result <- record_counts(unique_citations, citations, "db_source")
result
Creates a per-record table that shows which sources (and/or labels/strings) each item was found in.
record_level_table(
  citations,
  include = "sources",
  include_empty = TRUE,
  return = c("tibble", "DT"),
  indicator_presence = NULL,
  indicator_absence = NULL
)
citations |
A deduplicated tibble as returned by |
include |
Which metadata should be included in the table? Defaults to 'sources', can be replaced or expanded with 'labels' and/or 'strings' |
include_empty |
Should records with empty metadata (e.g., no information on 'sources') be included in the table? Defaults to TRUE. |
return |
Either a |
indicator_presence |
How should it be indicated that a value is present in a source/label/string? Defaults to TRUE in tibbles and a tickmark in DT tables |
indicator_absence |
How should it be indicated that a value is not present in a source/label/string? Defaults to FALSE in tibbles and a cross in DT tables |
A tibble or DataTable containing the per-record table that shows which sources (and/or labels/strings) each item was found in.
# Load example data from the package
examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
examplecitations <- readRDS(examplecitations_path)
# Deduplicate citations and compare sources
unique_citations <- dedup_citations(examplecitations)
unique_citations |>
  dplyr::filter(stringr::str_detect(cite_label, "final")) |>
  record_level_table(return = "DT")
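The shape of a per-record table can be illustrated with a small base R sketch that expands a combined source field into one TRUE/FALSE column per source. The toy data and the assumption that sources are separated by ", " within `cite_source` are for illustration only; this is not the code used by `record_level_table()`.

```r
unique_citations <- data.frame(
  duplicate_id = c("1", "2", "3"),
  cite_source = c("Scopus, WOS", "WOS", "Medline, Scopus")
)

# Split the combined source field into one vector of sources per record
per_record <- strsplit(unique_citations$cite_source, ", ", fixed = TRUE)
all_sources <- sort(unique(unlist(per_record)))

# One logical column per source: TRUE if the record was found there
presence <- do.call(rbind, lapply(per_record, function(s) all_sources %in% s))
colnames(presence) <- all_sources
record_table <- cbind(unique_citations["duplicate_id"], presence)
record_table
```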
This function reimports a CSV file that was tagged and deduplicated by CiteSource.
It allows you to continue with further analyses without repeating that step, and also
lets you make manual corrections to tagging or deduplication. Note that
this function only works on CSV files that were written with export_csv(..., separate = NULL).
reimport_csv(filename)
filename |
Name (and path) of CSV file to be reimported, should end in .csv |
A data frame containing the imported citation data if all required columns are present.
if (interactive()) {
  citations <- reimport_csv("path/to/citations.csv")
}
Reads a CSV of candidate duplicate pairs previously written by
export_dedup_candidates() (i.e. the $manual_dedup element of
dedup_citations(manual = TRUE)). This supports a deferred workflow: run
automatic deduplication now, export both the unique citations and the
candidate pairs, and complete the manual review later after re-importing.
reimport_dedup_candidates(filename)
filename |
Name (and path) of the candidate-pairs CSV, should end in .csv |
After review, set the result column to "match" for confirmed duplicates
and pass the result, together with the reimported unique citations, to
dedup_citations_add_manual().
A data frame of candidate pairs with duplicate_id.x / duplicate_id.y
read as character (matching the unique citations from reimport_csv()), ready
for review and dedup_citations_add_manual().
export_dedup_candidates(), dedup_citations_add_manual()
if (interactive()) {
  candidates <- reimport_dedup_candidates("path/to/candidates.csv")
  # mark confirmed duplicates, then merge into the reimported unique set
  candidates$result <- ifelse(candidates$result == "match", "match", "no_match")
  final <- dedup_citations_add_manual(reimport_csv("unique.csv"), candidates)
}
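Why the ID columns are read as character can be seen in a small base R sketch. The file contents below are made up (in particular, the zero-padded IDs are a hypothetical case chosen to make the difference visible), and `read.csv()` stands in for whatever reader the package uses internally.

```r
# Toy candidate-pairs file; column names follow the text above, values are made up
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "duplicate_id.x,duplicate_id.y,result",
  "001,017,match",
  "002,030,"
), csv_path)

# Default parsing guesses integer IDs and drops the leading zeros
guessed <- read.csv(csv_path)
# Forcing character keeps the IDs exactly as written in the file
candidates <- read.csv(csv_path, colClasses = c(
  duplicate_id.x = "character",
  duplicate_id.y = "character"
))
str(candidates)
```

Keeping both tables' IDs in the same type avoids mismatches when the candidate pairs are later matched against the unique citations.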
This function reimports a RIS file that was tagged and deduplicated by CiteSource.
It allows you to continue with further analyses without repeating that step, and also
lets you make manual corrections to tagging or deduplication. The function
can also be used to replace the import step (for instance, if tags are to be added to
individual citations rather than entire files). In this case, just call dedup_citations()
after the import.
reimport_ris(
  filename = "citations.ris",
  source_field = "DB",
  label_field = "C7",
  string_field = "C8",
  duplicate_id_field = "C1",
  record_id_field = "C2",
  tag_naming = "ris_synthesisr",
  verbose = TRUE
)
filename |
Name (and path) of RIS file to be reimported, should end in .ris |
source_field |
Character. Which RIS field should cite_sources be read from? NULL to set to missing |
label_field |
Character. Which RIS field should cite_labels be read from? NULL to set to missing |
string_field |
Character. Which RIS field should cite_strings be read from? NULL to set to missing |
duplicate_id_field |
Character. Which RIS field should duplicate IDs be read from? NULL to recreate based on row number (note that neither duplicate nor record IDs directly affect CiteSource analyses - they can only allow you to connect processed data with raw data) |
record_id_field |
Character. Which RIS field should record IDs be read from? NULL to recreate based on row number |
tag_naming |
Synthesisr option specifying how RIS tags should be replaced with names. This should not
be changed when using this function to reimport a file exported from CiteSource. If you import your own
RIS, check |
verbose |
Should confirmation message be displayed? |
Note that this function's defaults are based on those in export_ris() so that these functions
can easily be combined.
A data frame containing the reimported citation data, with 'CiteSource' metadata columns (cite_source, cite_label, cite_string, duplicate_id, record_ids) restored from the 'RIS' fields.
if (interactive()) {
  dedup_results <- dedup_citations(citations, merge_citations = TRUE)
  tmp <- tempfile(fileext = ".ris")
  export_ris(dedup_results$unique, tmp)
  unique_citations2 <- reimport_ris(tmp)
}
Running this function will launch the CiteSource shiny app.
runShiny(app = "CiteSource", offer_install = interactive())
app |
Defaults to CiteSource - possibly other apps will be included in the future |
offer_install |
Should user be prompted to install required packages if they are missing? |
CiteSource shiny app
if (interactive()) {
  # To run the CiteSource Shiny app:
  runShiny()
}
Imports common bibliographic reference formats (i.e. .bib, .ris, or .txt).
synthesisr_read_refs(
  filename,
  tag_naming = "best_guess",
  return_df = TRUE,
  verbose = FALSE,
  select_fields = NULL
)

read_ref(
  filename,
  tag_naming = "best_guess",
  return_df = TRUE,
  verbose = FALSE,
  select_fields = NULL
)
filename |
A path to a filename or vector of filenames containing search results to import. |
tag_naming |
Either a length-1 character stating how RIS tags should be replaced (see details for a list of options), or an object inheriting from class |
return_df |
If TRUE (default), returns a data.frame; if FALSE, returns a list. |
verbose |
If TRUE, prints status updates (defaults to FALSE). |
select_fields |
Character vector of fields to be retained. If NULL, all fields from the RIS file are returned |
The default for argument tag_naming is "best_guess", which estimates which database's conventions were used for the RIS tags, then fills any gaps with generic tags. Any tags missing from the database (i.e. code_lookup) are passed unchanged. Other options are to use tags from Web of Science ("wos"), Scopus ("scopus"), Ovid ("ovid") or Academic Search Premier ("asp"). If a data.frame is given, then it must contain two columns: "code" listing the original tags in the source document, and "field" listing the replacement column/tag names. The data.frame may optionally include a third column named "order", which specifies the order of columns in the resulting data.frame; otherwise this will be taken as the row order. Finally, passing "none" to tag_naming suppresses tag replacement.
Returns a data.frame or list of assembled search results.
read_ref(): Import a single file
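The user-defined lookup described above can be sketched in base R. The data frame follows the documented "code" and "field" column names; the specific tags and the replacement logic are a simplified illustration rather than the package's own code.

```r
# User-defined lookup: "code" = original RIS tag, "field" = replacement name
tag_lookup <- data.frame(
  code = c("TI", "AU", "PY"),
  field = c("title", "author", "year")
)

# Tags as they might appear in an imported file; "N1" is not in the lookup
tags <- c("TI", "AU", "PY", "N1")

# Replace known tags; tags missing from the lookup pass through unchanged
idx <- match(tags, tag_lookup$code)
renamed <- ifelse(is.na(idx), tags, tag_lookup$field[idx])
renamed
```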
This function exports data.frames containing bibliographic information to either a .ris or .bib file.
write_bib(x)
write_ris(x, tag_naming = "synthesisr")
write_refs(x, format = "ris", tag_naming = "synthesisr", file = FALSE)
x |
Either a data.frame containing bibliographic information or an object of class bibliography. |
tag_naming |
What naming convention should be used to write RIS files? See details for options. |
format |
What format should the data be exported as? Options are ris or bib. |
file |
Either logical indicating whether a file should be written (defaulting to FALSE), or a character giving the name of the file to be written. |
Returns a character vector containing bibliographic information in the specified format if file is FALSE, or saves output to a file if TRUE.
write_bib(): Format a bib file for export
write_ris(): Format a ris file for export