Resources

Here are some useful resources for downloading MEDLINE and PubMed Open Access (PubMed OA) XML data.

Download PubMed OA figures

This section explains how to download the PubMed OA figures that correspond to the information parsed by the parse_pubmed_caption function.

  • In pubmed_parser, you can use parse_pubmed_caption to parse the figures (specifically, the figure_id of each one) and the captions that belong to a manuscript.

  • To download the images for a given PMC ID or PMID, first download the CSV file at ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_file_list.csv. The file has the columns PMID, Accession ID (PMC), and File. The File column gives the path to a tar file that contains the article XML and its figures, in the following format: oa_package/08/e0/PMC13900.tar.gz.

  • Append that path to ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/ to get the download URL of the tar file for a given PMID or PMC ID, for example ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/08/e0/PMC13900.tar.gz. To download all the tar files, browse ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/, which lists every package.
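
The lookup described above can be sketched in Python as follows. This is a minimal illustration, not part of pubmed_parser: the helper names (find_package_path, build_package_url, download_package) are hypothetical, and the CSV header names "Accession ID" and "File" are assumptions based on the column description above, so check them against the header row of the file you actually download.

```python
import csv
import io
import urllib.request

# Base URL taken from the download instructions above.
PMC_FTP_BASE = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/"


def find_package_path(csv_file, pmc_id, id_column="Accession ID", file_column="File"):
    """Return the File value of the first row whose accession ID equals pmc_id.

    csv_file is any open text file (or file-like object) holding the list.
    The default column names are assumptions; pass others if the header differs.
    Returns None when no row matches.
    """
    for row in csv.DictReader(csv_file):
        if row.get(id_column) == pmc_id:
            return row.get(file_column)
    return None


def build_package_url(file_path):
    """Join a relative File path with the FTP base URL."""
    return PMC_FTP_BASE + file_path.lstrip("/")


def download_package(file_path, destination):
    """Download the tar file for file_path to destination (needs network access)."""
    urllib.request.urlretrieve(build_package_url(file_path), destination)
    return destination


# Offline demonstration with a one-row sample instead of the real file list.
# The PMID value is a placeholder, not real data.
sample = io.StringIO(
    "File,Accession ID,PMID\n"
    "oa_package/08/e0/PMC13900.tar.gz,PMC13900,PMID_PLACEHOLDER\n"
)
path = find_package_path(sample, "PMC13900")
print(build_package_url(path))
```

With the real file list saved locally, you would open it with open("oa_file_list.csv", newline="") and pass the file object to find_package_path, then call download_package(path, "PMC13900.tar.gz") to fetch the archive. The list is scanned row by row instead of being loaded into memory, since it covers the whole Open Access subset and can be large.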

Alternative implementation of MEDLINE parsers

There are a few other implementations of MEDLINE parsers. If you are interested in these alternatives, see the list below.