BiSO#
- class dibisoplot.biso.AnrProjects(lab: str, year: int | None = None, **kwargs)#
Bases:
Biso
A class to fetch and plot data about ANR projects.
- Variables:
orientation – Orientation for plots (‘h’ for horizontal).
- __init__(lab: str, year: int | None = None, **kwargs)#
Initialize the AnrProjects class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
args – Additional positional arguments.
kwargs – Additional keyword arguments.
- fetch_data() dict[str, Any] #
Fetch data about ANR projects from the HAL API.
This method queries the API to get the list of ANR projects and their counts. The data is stored in the data attribute as a dictionary where keys are ANR project acronyms and values are their respective counts.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
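The shape of the data attribute described above (acronym mapped to count) can be illustrated with a small standalone sketch. It does not call dibisoplot or the HAL API: the helper name top_counts and the sample acronyms are made up for illustration, and truncating to max_plotted_entities at this stage is an assumption about where the limit is applied.

```python
from collections import Counter


def top_counts(acronyms, max_plotted_entities=25):
    """Hypothetical helper: count acronyms and keep the most frequent ones."""
    counts = Counter(acronyms)
    # most_common() sorts by decreasing count, so the dict keeps that order.
    return dict(counts.most_common(max_plotted_entities))


# Made-up acronyms standing in for values returned by an API query.
sample = ["ALPHA", "BETA", "ALPHA", "GAMMA", "ALPHA", "BETA"]
data = top_counts(sample, max_plotted_entities=2)
print(data)
```

A dictionary of this form maps directly onto a horizontal bar plot: keys become bar labels and values become bar lengths.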
- orientation = 'h'#
- class dibisoplot.biso.Biso(lab, year: int | None = None, barcornerradius: int = 10, dynamic_height: bool = True, dynamic_min_height: int | float = 150, dynamic_height_per_bar: int | float = 25, height: int = 600, language: str = 'fr', legend_pos: dict = None, main_color: str = 'blue', margin: dict = None, max_entities: int | None = 1000, max_plotted_entities: int = 25, scanr_api_password: str | None = None, scanr_api_port: int = 443, scanr_api_scheme: str | None = 'https', scanr_api_url: str | None = None, scanr_api_username: str | None = None, scanr_bso_index: str | None = None, scanr_bso_version: str = '2024Q4', scanr_chunk_size: int = 50, scanr_publications_index: str | None = None, template: str = 'simple_white', text_position: str = 'outside', title: str | None = None, width: int = 800)#
Bases:
object
Base class for generating plots and tables from data fetched from various APIs. The fetch methods are located in each child class. This class is not designed to be called directly but rather to provide general methods to the different plot types.
- Variables:
orientation – Orientation for plots (‘v’ for vertical, ‘h’ for horizontal).
figure_file_extension – File extension of the figure (pdf, tex…).
default_barcornerradius – Default corner radius for bars in plots.
default_dynamic_min_height – Default minimum height for plots when the height is set dynamically.
default_dynamic_height_per_bar – Default height per bar for plots when the height is set dynamically.
default_dynamic_bar_width – Default width for bars in plots when the height is set dynamically.
default_hal_cursor_rows_per_request – Default number of rows per request when using the cursor API.
default_height – Default height for plots.
default_legend_pos – Default position for the legend.
default_main_color – Default color for plots.
default_margin – Default margins for plots.
default_max_entities – Default maximum number of entities used to create the plot. Defaults to 1000. Set to None to disable the limit. This value caps the number of entities queried during analysis. For example, when creating the collaboration map, it caps the number of works queried from HAL from which the collaborating institutions are extracted.
default_max_plotted_entities – Maximum number of bars in the plot or rows in the table. Defaults to 25.
default_template – Default template for plots.
default_width – Default width for plots.
- __init__(lab, year: int | None = None, barcornerradius: int = 10, dynamic_height: bool = True, dynamic_min_height: int | float = 150, dynamic_height_per_bar: int | float = 25, height: int = 600, language: str = 'fr', legend_pos: dict = None, main_color: str = 'blue', margin: dict = None, max_entities: int | None = 1000, max_plotted_entities: int = 25, scanr_api_password: str | None = None, scanr_api_port: int = 443, scanr_api_scheme: str | None = 'https', scanr_api_url: str | None = None, scanr_api_username: str | None = None, scanr_bso_index: str | None = None, scanr_bso_version: str = '2024Q4', scanr_chunk_size: int = 50, scanr_publications_index: str | None = None, template: str = 'simple_white', text_position: str = 'outside', title: str | None = None, width: int = 800)#
Initialize the Biso class with the given parameters.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
barcornerradius (int, optional) – Corner radius for bars in plots.
dynamic_height (bool, optional) – Whether to use dynamic height for the plot. Only implemented for horizontal bar plots.
dynamic_min_height (int | float, optional) – Minimum height for the plot when the height is set dynamically.
dynamic_height_per_bar (int | float, optional) – Height per bar for plots when the height is set dynamically.
height (int, optional) – Height of the plot.
language (str, optional) – Language for the plot. Defaults to ‘fr’.
legend_pos (dict, optional) – Position of the legend.
main_color (str, optional) – Main color for the plot.
margin (dict, optional) – Margins for the plot.
max_entities (int | None, optional) – Maximum number of entities used to create the plot. Defaults to 1000. Set to None to disable the limit. This value caps the number of entities queried during analysis. For example, when creating the collaboration map, it caps the number of works queried from HAL from which the collaborating institutions are extracted.
max_plotted_entities (int, optional) – Maximum number of bars in the plot or rows in the table. Defaults to 25.
scanr_api_password (str | None, optional) – scanR API password.
scanr_api_port (int, optional) – scanR API port. Defaults to 443.
scanr_api_scheme (str | None, optional) – scanR API scheme. Defaults to ‘https’.
scanr_api_url (str | None, optional) – scanR API URL. If None, data won’t be queried.
scanr_api_username (str | None, optional) – scanR API username.
scanr_bso_index (str | None, optional) – scanR BSO index.
scanr_bso_version (str, optional) – Version of the BSO data. Defaults to “2024Q4”.
scanr_chunk_size (int, optional) – Number of publications to fetch at a time when using the scanR API. Defaults to 50.
scanr_publications_index (str | None, optional) – scanR publications index.
template (str, optional) – Template for the plot.
text_position (str, optional) – Position of the text on bars.
title (str | None, optional) – Title of the plot.
width (int, optional) – Width of the plot.
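How dynamic_height, dynamic_min_height and dynamic_height_per_bar interact is not spelled out above. One plausible reading is sketched below: the height grows with the number of bars but never drops below the minimum. The formula and the helper name dynamic_plot_height are assumptions for illustration, not the library's confirmed implementation.

```python
def dynamic_plot_height(n_bars: int,
                        dynamic_min_height: float = 150,
                        dynamic_height_per_bar: float = 25) -> float:
    """Hypothetical helper: scale the height with the bar count, with a floor."""
    return max(dynamic_min_height, n_bars * dynamic_height_per_bar)


print(dynamic_plot_height(3))   # few bars: the minimum height applies
print(dynamic_plot_height(25))  # many bars: height scales with the bar count
```

The default arguments mirror default_dynamic_min_height and default_dynamic_height_per_bar listed for this class.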
- connect_to_elasticsearch() Elasticsearch #
Connect to Elasticsearch using the provided credentials.
- dataframe_to_longtable(table_df, alignments: list | None = None, caption: str | None = None, label: str | None = None, vertical_lines: bool = True, classic_horizontal_lines: bool = False, minimal_horizontal_lines: bool = True, max_plotted_entities: int | None = None) str #
Convert a pandas DataFrame to LaTeX longtable code without document headers.
This function generates LaTeX code for a longtable from a pandas DataFrame. It handles various formatting options such as alignments, captions, labels, and lines between rows and columns.
- Parameters:
table_df (pd.DataFrame) – pandas DataFrame to convert.
alignments (list | None, optional) – List of column alignments (e.g., [‘l’, ‘c’, ‘r’]).
caption (str | None , optional) – Caption for the table.
label (str | None, optional) – Label for referencing the table.
vertical_lines (bool, optional) – Whether to include vertical lines between columns.
classic_horizontal_lines (bool, optional) – Whether to include horizontal lines between rows in a classic style.
minimal_horizontal_lines (bool, optional) – Whether to include minimal horizontal lines between rows.
max_plotted_entities (int | None, optional) – Maximum number of entities to show in the table. If None, show all entities in the table.
- Returns:
LaTeX code for the longtable (without document headers).
- Return type:
str
- Raises:
AttributeError – If both classic_horizontal_lines and minimal_horizontal_lines are True.
ValueError – If the number of alignments does not match the number of columns.
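The kind of output this method describes can be approximated with a standalone sketch that works on plain lists rather than a pandas DataFrame. The function rows_to_longtable is a hypothetical stand-in, not the library's code: it covers only alignments, caption, label and vertical lines, and it does not escape LaTeX special characters in cells.

```python
def rows_to_longtable(header, rows, alignments=None, caption=None, label=None,
                      vertical_lines=True):
    """Hypothetical stand-in for dataframe_to_longtable, using plain lists."""
    n_cols = len(header)
    if alignments is None:
        alignments = ["l"] * n_cols
    if len(alignments) != n_cols:
        raise ValueError("number of alignments must match number of columns")

    # Build the column specification, e.g. "|l|c|r|" or "lcr".
    sep = "|" if vertical_lines else ""
    col_spec = sep + sep.join(alignments) + sep

    lines = [r"\begin{longtable}{" + col_spec + "}"]
    if caption is not None:
        label_part = r"\label{" + label + "}" if label is not None else ""
        lines.append(r"\caption{" + caption + "}" + label_part + r" \\")
    lines.append(r"\hline")
    lines.append(" & ".join(header) + r" \\")
    lines.append(r"\hline")
    for row in rows:
        lines.append(" & ".join(str(cell) for cell in row) + r" \\")
    lines.append(r"\hline")
    lines.append(r"\end{longtable}")
    return "\n".join(lines)


table = rows_to_longtable(["Title", "Book"], [["A chapter", "A book"]],
                          alignments=["l", "r"], caption="Chapters")
print(table)
```

The result has no preamble or document environment, so it is meant to be included from a larger LaTeX document that loads the longtable package.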
- default_barcornerradius = 10#
- default_dynamic_bar_width = 0.7#
- default_dynamic_height_per_bar = 25#
- default_dynamic_min_height = 150#
- default_hal_cursor_rows_per_request = 10000#
- default_height = 600#
- default_language = 'fr'#
- default_legend_pos = {'x': 1, 'xanchor': 'right', 'y': 1, 'yanchor': 'top'}#
- default_main_color = 'blue'#
- default_margin = {'b': 15, 'l': 15, 'pad': 4, 'r': 15, 't': 15}#
- default_max_entities = 1000#
- default_max_plotted_entities = 25#
- default_scanr_bso_version = '2024Q4'#
- default_scanr_chunk_size = 50#
- default_template = 'simple_white'#
- default_text_position = 'outside'#
- default_width = 800#
- figure_file_extension = 'pdf'#
- generate_plot_info(hide_max_entities_reached_warning: bool = False, hide_n_entities_warning: bool = False)#
Generate the plot info. This information is used to print a warning in the report.
- Parameters:
hide_max_entities_reached_warning (bool, optional) – If True, the warning about the maximum number of entities processed is not displayed.
hide_n_entities_warning (bool, optional) – If True, the warning about the number of entities found is not displayed.
- get_all_ids_with_cursor(id_type='doi')#
Get the IDs of all works using cursor pagination. The id_type argument selects the kind of ID to collect (‘doi’ by default).
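Cursor pagination in Solr-style APIs generally works as follows: the first request sends the cursor "*", each response returns a next cursor, and iteration stops once the returned cursor equals the one that was sent. The sketch below shows that loop against an in-memory fake. The fetch_page callable, the page contents and the "doi" field name are all hypothetical, and whether the library follows exactly this pattern is an assumption.

```python
def collect_ids_with_cursor(fetch_page, id_field="doi"):
    """Collect IDs page by page.

    fetch_page(cursor) must return (docs, next_cursor); it is a hypothetical
    callable standing in for a real HTTP request.
    """
    ids = []
    cursor = "*"  # conventional starting cursor
    while True:
        docs, next_cursor = fetch_page(cursor)
        ids.extend(doc[id_field] for doc in docs if id_field in doc)
        if next_cursor == cursor:
            break  # the cursor did not advance: no more results
        cursor = next_cursor
    return ids


# Made-up pages keyed by the cursor that requests them.
PAGES = {
    "*": ([{"doi": "10.1/a"}, {"doi": "10.1/b"}], "c1"),
    "c1": ([{"doi": "10.1/c"}], "c2"),
    "c2": ([], "c2"),
}


def fake_fetch_page(cursor):
    return PAGES[cursor]


all_ids = collect_ids_with_cursor(fake_fetch_page)
print(all_ids)
```

Compared with offset-based paging, a cursor avoids re-scanning earlier results on every request, which matters when a single request may return up to default_hal_cursor_rows_per_request rows.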
- get_error_latex() str #
Create the error LaTeX code.
- get_error_plot() Figure #
Create the error plot.
- get_figure() Figure #
Generate a bar plot based on the fetched data.
- Returns:
The plotly figure.
- Return type:
go.Figure
- get_no_data_latex() str #
Create the LaTeX code shown when there is no data.
- get_no_data_plot() Figure #
Create the plot shown when there is no data.
- get_works_from_es_index_from_id(index: str, ids: list[str] | tuple[str], fields_to_retrieve: list[str] | tuple[str] | None = None, es: Elasticsearch = None) list[dict] #
Get works by their id from an Elasticsearch index.
- Parameters:
index (str) – Index to search in.
ids (list[str] | tuple[str]) – List of ids to fetch.
fields_to_retrieve (list[str] | tuple[str] | None) – List of fields to retrieve. If None, all fields are retrieved.
es (Elasticsearch | None) – Elasticsearch client. If None, a new client is created.
- Returns:
List of works.
- Return type:
list[dict]
- get_works_from_es_index_from_id_and_private_sector(index: str, ids: list[str] | tuple[str], fields_to_retrieve: list[str] | tuple[str] | None = None, es: Elasticsearch = None) list[dict] #
Get works from the private sector by their id from an Elasticsearch index.
- Parameters:
index (str) – Index to search in.
ids (list[str] | tuple[str]) – List of ids to fetch.
fields_to_retrieve (list[str] | tuple[str] | None) – List of fields to retrieve. If None, all fields are retrieved.
es (Elasticsearch | None) – Elasticsearch client. If None, a new client is created.
- Returns:
List of works.
- Return type:
list[dict]
- get_works_from_es_index_from_id_by_chunk(index: str, ids: list[str] | tuple[str], fields_to_retrieve: list[str] | tuple[str] | None = None, query_type: str | None = None) list[dict] #
Get works by their id from an Elasticsearch index, in chunks.
- Parameters:
index (str) – Index to search in.
ids (list[str] | tuple[str]) – List of ids to fetch.
fields_to_retrieve (list[str] | tuple[str] | None) – List of fields to retrieve. If None, all fields are retrieved.
query_type (str | None) – Type of query to use. If None, the query simply gets all the documents by their IDs.
- Returns:
List of works.
- Return type:
list[dict]
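Fetching by chunk amounts to splitting the ID list into slices of a fixed size and concatenating the per-slice results. A minimal sketch follows, with a fake fetch function in place of a real Elasticsearch query. The names iter_chunks, fetch_by_chunk and fake_fetch are hypothetical, and the default chunk size of 50 is taken from scanr_chunk_size.

```python
from typing import Callable, Iterator, Sequence


def iter_chunks(ids: Sequence[str], chunk_size: int = 50) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most chunk_size IDs."""
    for start in range(0, len(ids), chunk_size):
        yield ids[start:start + chunk_size]


def fetch_by_chunk(ids: Sequence[str],
                   fetch_chunk: Callable[[Sequence[str]], list],
                   chunk_size: int = 50) -> list:
    """Call fetch_chunk once per slice and merge the results."""
    works = []
    for chunk in iter_chunks(ids, chunk_size):
        works.extend(fetch_chunk(chunk))
    return works


def fake_fetch(chunk):
    # Stand-in for a real index query: one document per requested ID.
    return [{"id": work_id} for work_id in chunk]


ids = [f"doi{n}" for n in range(120)]
works = fetch_by_chunk(ids, fake_fetch, chunk_size=50)
print(len(works))
```

Keeping each request small bounds the size of the query body sent to the index, at the cost of more round trips.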
- orientation = 'v'#
- class dibisoplot.biso.Chapters(lab: str, year: int | None = None, **kwargs)#
Bases:
Biso
A class to fetch and generate a table of book chapters.
- Variables:
figure_file_extension – The file extension for the figures (“tex” for LaTeX file).
- __init__(lab: str, year: int | None = None, **kwargs)#
Initialize the Chapters class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
args – Additional positional arguments.
kwargs – Additional keyword arguments.
- fetch_data() dict[str, Any] #
Fetch data about book chapters from the HAL API.
This method queries the API to get the list of book chapters and their metadata. The data is stored in the data attribute as a pandas DataFrame with columns for title (title_s), book title (bookTitle_s), and publisher (publisher_s).
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
- figure_file_extension = 'tex'#
- get_figure() str #
Generate a LaTeX longtable of book chapters.
- Returns:
LaTeX code for the longtable representing the book chapters data.
- Return type:
str
- class dibisoplot.biso.CollaborationMap(lab: str, year: int | None = None, countries_land_color: str | None = None, countries_lines_color: str | None = None, countries_to_ignore: list[str] | None = None, frame_color: str | None = None, height: int | None = None, institutions_to_exclude: list[str] | None = None, map_zoom: bool = False, markers_scale_factor: float | int | None = None, resolution: int = 110, width: int | None = None, zoom_lat_range: list[float | int] | None = None, zoom_lon_range: list[float | int] | None = None, **kwargs)#
Bases:
Biso
A class to fetch and plot data about collaborations on a map.
- Variables:
default_countries_land_color – Default color for the land in the map.
default_countries_lines_color – Default color for the lines in the map.
default_frame_color – Default color for the frame of the map.
default_height – Default height for the map.
default_width – Default width for the map.
default_zoom_lat_range – Default latitude range for zoomed map.
default_zoom_lon_range – Default longitude range for zoomed map.
- __init__(lab: str, year: int | None = None, countries_land_color: str | None = None, countries_lines_color: str | None = None, countries_to_ignore: list[str] | None = None, frame_color: str | None = None, height: int | None = None, institutions_to_exclude: list[str] | None = None, map_zoom: bool = False, markers_scale_factor: float | int | None = None, resolution: int = 110, width: int | None = None, zoom_lat_range: list[float | int] | None = None, zoom_lon_range: list[float | int] | None = None, **kwargs)#
Initialize the CollaborationMap class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
countries_land_color (str | None, optional) – Color of the land in the map.
countries_lines_color (str | None, optional) – Color of the country lines in the map.
countries_to_ignore (list[str] | None, optional) – List of countries to ignore in the data.
frame_color (str | None, optional) – Color of the frame of the plot.
height (int | None, optional) – Height of the plot.
institutions_to_exclude (list[str] | None, optional) – List of institutions to exclude from the data.
map_zoom (bool, optional) – If set to True, zoom the map according to the ranges of coordinates defined by zoom_lat_range and zoom_lon_range.
markers_scale_factor (float | int | None, optional) – Scale factor for the markers. Defaults to 1. Larger values produce smaller markers. If not set and map_zoom is True, defaults to 0.5.
resolution (int, optional) – Resolution of the plot: can either be 110 (low resolution) or 50 (high resolution).
width (int | None, optional) – Width of the plot.
zoom_lat_range (list[float | int] | None, optional) – Latitude range of coordinates for the zoom map. If set to None, the zoom will be on Europe.
zoom_lon_range (list[float | int] | None, optional) – Longitude range of coordinates for the zoom map. If set to None, the zoom will be on Europe.
args – Additional positional arguments.
kwargs – Additional keyword arguments.
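The defaulting rules for the zoom-related parameters above can be summarised in a short sketch. The helper resolve_map_options and its returned dictionary layout are hypothetical. The numeric defaults are copied from the class attributes default_zoom_lat_range and default_zoom_lon_range, and the scale factor rule follows the markers_scale_factor description.

```python
DEFAULT_ZOOM_LAT_RANGE = [33.5, 71]     # Europe, per default_zoom_lat_range
DEFAULT_ZOOM_LON_RANGE = [-18.5, 39.5]  # Europe, per default_zoom_lon_range


def resolve_map_options(map_zoom=False, markers_scale_factor=None,
                        zoom_lat_range=None, zoom_lon_range=None):
    """Hypothetical helper: fill in unset zoom options with documented defaults."""
    if markers_scale_factor is None:
        markers_scale_factor = 0.5 if map_zoom else 1
    options = {"markers_scale_factor": markers_scale_factor}
    if map_zoom:
        options["lat_range"] = (zoom_lat_range if zoom_lat_range is not None
                                else DEFAULT_ZOOM_LAT_RANGE)
        options["lon_range"] = (zoom_lon_range if zoom_lon_range is not None
                                else DEFAULT_ZOOM_LON_RANGE)
    return options


print(resolve_map_options())
print(resolve_map_options(map_zoom=True))
```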
- default_countries_land_color = '#eaeaea'#
- default_countries_lines_color = '#999999'#
- default_frame_color = '#999999'#
- default_height = 500#
- default_height_zoom = 800#
- default_width = 1200#
- default_width_zoom = 1200#
- default_zoom_lat_range = [33.5, 71]#
- default_zoom_lon_range = [-18.5, 39.5]#
- fetch_data() dict[str, Any] #
Fetch data about collaborations from the HAL API and OpenAlex API.
This method queries the API to get the list of collaborations and their metadata. It processes the data to create a DataFrame with latitude, longitude, and other relevant information.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
- get_figure() Figure #
Plot a map with the number of collaborations per institution.
- Returns:
The plotly figure.
- Return type:
go.Figure
- class dibisoplot.biso.CollaborationNames(lab: str, year: int | None = None, countries_to_exclude: list[str] | None = None, **kwargs)#
Bases:
Biso
A class to fetch and plot data about the names of collaborating institutions.
- Variables:
orientation – Orientation for plots (‘h’ for horizontal).
- __init__(lab: str, year: int | None = None, countries_to_exclude: list[str] | None = None, **kwargs)#
Initialize the CollaborationNames class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
countries_to_exclude (list[str] | None, optional) – List of countries to exclude from the data. Use country code (e.g. ‘fr’ for France).
args – Additional positional arguments.
kwargs – Additional keyword arguments.
- fetch_data() dict[str, Any] #
Fetch data about collaboration names from the HAL API only.
This method queries the API to get the list of collaboration names and their counts. It processes the data to create a dictionary where keys are formatted structure names (including country flags) and values are their respective counts.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
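One common way to attach a country flag to a name is to map a two-letter country code to the corresponding pair of Unicode regional indicator symbols. The sketch below shows that technique. It is an assumption that the library formats flags this way (it may use a different mechanism, particularly for PDF output), and the helper names are hypothetical.

```python
def country_flag(country_code: str) -> str:
    """Return the flag emoji for a two-letter country code, or '' if invalid."""
    code = country_code.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        return ""
    # Regional indicator symbols start at U+1F1E6 for the letter 'A'.
    return "".join(chr(0x1F1E6 + ord(letter) - ord("A")) for letter in code)


def format_structure_name(name: str, country_code: str) -> str:
    """Prefix the structure name with its flag when a valid code is given."""
    flag = country_flag(country_code)
    return f"{flag} {name}" if flag else name


print(format_structure_name("CNRS", "fr"))
```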
- orientation = 'h'#
- class dibisoplot.biso.Conferences(lab: str, year: int | None = None, **kwargs)#
Bases:
Biso
A class to fetch and plot data about conferences.
- Variables:
orientation – Orientation for plots (‘h’ for horizontal).
- __init__(lab: str, year: int | None = None, **kwargs)#
Initialize the Conferences class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
args – Additional positional arguments.
kwargs – Additional keyword arguments.
- fetch_data() dict[str, Any] #
Fetch data about conferences from the HAL API.
This method queries the API to get the list of conferences and their counts. It processes the data to create a dictionary where keys are formatted conference names (including country flags) and values are their respective counts.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
- orientation = 'h'#
- class dibisoplot.biso.EuropeanProjects(lab: str, year: int | None = None, **kwargs)#
Bases:
Biso
A class to fetch and plot data about European projects.
- Variables:
orientation – Orientation for plots (‘h’ for horizontal).
- __init__(lab: str, year: int | None = None, **kwargs)#
Initialize the EuropeanProjects class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
args – Additional positional arguments.
kwargs – Additional keyword arguments.
- fetch_data() dict[str, Any] #
Fetch data about European projects from the HAL API.
This method queries the API to get the list of European projects and their counts. The data is stored in the data attribute as a dictionary where keys are European project acronyms and values are their respective counts.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
- orientation = 'h'#
- class dibisoplot.biso.Journals(lab: str, year: int | None = None, **kwargs)#
Bases:
Biso
A class to fetch and generate a table of journals.
- Variables:
figure_file_extension – The file extension for the figures (“tex” for LaTeX file).
- __init__(lab: str, year: int | None = None, **kwargs)#
Initialize the Journals class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
args – Additional positional arguments.
kwargs – Additional keyword arguments.
- fetch_data() dict[str, Any] #
Fetch data about journals from the API.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
- figure_file_extension = 'tex'#
- get_figure() str #
Generate a LaTeX longtable of journals.
- Returns:
LaTeX code for the longtable representing the journals data.
- Return type:
str
- class dibisoplot.biso.JournalsHal(lab: str, year: int | None = None, **kwargs)#
Bases:
Biso
A class to fetch and plot data about journals from HAL data.
- Variables:
orientation – Orientation for plots (‘h’ for horizontal).
- __init__(lab: str, year: int | None = None, **kwargs)#
Initialize the JournalsHal class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
args – Additional positional arguments.
kwargs – Additional keyword arguments.
- fetch_data() dict[str, Any] #
Fetch data about journals from the HAL API.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
- orientation = 'h'#
- class dibisoplot.biso.OpenAccessWorks(lab: str, year: int | None = None, year_range: tuple[int, int] | int | None = None, colors: Any = None, **kwargs)#
Bases:
Biso
A class to fetch and plot data about the open access status of works.
- Variables:
default_year_range_difference – Default difference in years for the range when no year range is provided.
default_oa_colors – Default colors for different open access statuses.
- __init__(lab: str, year: int | None = None, year_range: tuple[int, int] | int | None = None, colors: Any = None, **kwargs)#
Initialize the OpenAccessWorks class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year. Ignored if year_range is provided. If year_range is not provided, it is set to [year - self.default_year_range_difference, year].
year_range (tuple[int, int] | int | None, optional) – Range of years to fetch data for. If None, fetch the years from self.year - default_year_range_difference to self.year. If only one int is provided, it replaces self.year.
colors (Any, optional) – Colors for different open access statuses.
args – Additional positional arguments.
kwargs – Additional keyword arguments.
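The interplay between year and year_range described above can be written out as a small function. This is a sketch of one reading of the parameter descriptions, not the library's code: the name resolve_year_range is hypothetical, and treating a single int as the end of the range (start = end minus the default difference) is an interpretation of "it replaces self.year".

```python
def resolve_year_range(year, year_range=None, default_year_range_difference=4):
    """Hypothetical helper: return the (start, end) years to fetch."""
    if year_range is None:
        # No range given: go back the default number of years from `year`.
        return (year - default_year_range_difference, year)
    if isinstance(year_range, int):
        # A single int replaces `year` as the end of the range.
        return (year_range - default_year_range_difference, year_range)
    start, end = year_range
    return (start, end)


print(resolve_year_range(2024))
print(resolve_year_range(2024, year_range=2020))
print(resolve_year_range(2024, year_range=(2015, 2018)))
```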
- default_oa_colors = {'Closed access': '#C60B46', 'Full text in HAL': '#00807A', 'OA outside HAL': '#FEBC18'}#
- default_year_range_difference = 4#
- fetch_data() dict[str, Any] #
Fetch data about open access works from the HAL API.
This method queries the API to get the count of open access works for each year in the specified year range. The data is stored in the data attribute as a pandas DataFrame with counts for different open access statuses.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
- get_figure() Figure #
Plot the open access status of works.
- Returns:
The plotly figure.
- Return type:
go.Figure
- class dibisoplot.biso.PrivateSectorCollaborations(lab: str, year: int | None = None, **kwargs)#
Bases:
Biso
A class to fetch and generate a plot with the names of the private sector collaborations.
- Variables:
orientation – Orientation for plots (‘h’ for horizontal).
- __init__(lab: str, year: int | None = None, **kwargs)#
Initialize the PrivateSectorCollaborations class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
args – Additional positional arguments.
kwargs – Additional keyword arguments.
- fetch_data() dict[str, Any] #
Fetch data about private sector collaborations from the API.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
- orientation = 'h'#
- class dibisoplot.biso.WorksBibtex(lab, year: int | None = None, barcornerradius: int = 10, dynamic_height: bool = True, dynamic_min_height: int | float = 150, dynamic_height_per_bar: int | float = 25, height: int = 600, language: str = 'fr', legend_pos: dict = None, main_color: str = 'blue', margin: dict = None, max_entities: int | None = 1000, max_plotted_entities: int = 25, scanr_api_password: str | None = None, scanr_api_port: int = 443, scanr_api_scheme: str | None = 'https', scanr_api_url: str | None = None, scanr_api_username: str | None = None, scanr_bso_index: str | None = None, scanr_bso_version: str = '2024Q4', scanr_chunk_size: int = 50, scanr_publications_index: str | None = None, template: str = 'simple_white', text_position: str = 'outside', title: str | None = None, width: int = 800)#
Bases:
Biso
A class to fetch the works of a HAL collection and create the bibtex string.
- Variables:
figure_file_extension – The file extension for the figures (“bib” for LaTeX bibtex file).
- fetch_data() dict[str, Any] #
Fetch the data from HAL and convert it to bibtex. The bibtex data returned by HAL is not valid, so this class builds its own bibtex string. To avoid compilation errors, the generated bibtex does not support mathematical expressions or other LaTeX commands. This could be improved in the future to support mathematical expressions and other special characters.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]
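The trade-off described above can be illustrated with a sketch that builds a bibtex entry while escaping LaTeX special characters. Escaping every special character keeps the file compilable, but it also turns math delimiters such as $ into literal text, which is why mathematical expressions are lost. The functions escape_latex and make_bibtex_entry and the sample entry are hypothetical, and the library's actual escaping rules may differ.

```python
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape character by character so replacements are never re-escaped."""
    return "".join(LATEX_SPECIALS.get(char, char) for char in text)


def make_bibtex_entry(key: str, entry_type: str, fields: dict) -> str:
    """Hypothetical helper: format one bibtex entry from a dict of fields."""
    body = ",\n".join(
        f"  {name} = {{{escape_latex(value)}}}" for name, value in fields.items()
    )
    return f"@{entry_type}{{{key},\n{body}\n}}"


# Made-up entry used only to show the escaping.
entry = make_bibtex_entry("doe2024", "article",
                          {"title": "Cost & benefit of $x_1$"})
print(entry)
```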
- figure_file_extension = 'bib'#
- get_figure() str #
Generate a LaTeX bibtex file.
- Returns:
LaTeX bibtex string with all the references of the HAL collection.
- Return type:
str
- class dibisoplot.biso.WorksType(lab, year: int | None = None, **kwargs)#
Bases:
Biso
A class to fetch and plot data about work types.
- __init__(lab, year: int | None = None, **kwargs)#
Initialize the WorksType class.
- Parameters:
lab (str) – The HAL collection identifier. This usually refers to the lab acronym.
year (int | None, optional) – The year for which to fetch data. If None, uses the current year.
args – Additional positional arguments.
kwargs – Additional keyword arguments.
- fetch_data() dict[str, Any] #
Fetch data about work types from the HAL API.
This method queries the API to get the list of work types and their counts. It processes the data to create a dictionary where keys are work type names and values are their respective counts.
- Returns:
The info about the fetched data.
- Return type:
dict[str, Any]