=========
Resources
=========

Here are some useful resources for downloading MEDLINE and PubMed Open Access (PubMed OA) XML data.

Links to download PubMed OA and MEDLINE datasets
------------------------------------------------

The links below point to the PubMed OA and MEDLINE data.

- The PubMed Open-Access (OA) dataset is described at ``http://www.ncbi.nlm.nih.gov/pmc/tools/ftp/``.
  The FTP site linked from that page hosts the bulk dataset, and its ``oa_bulk`` folder
  lists all available tar files.
- The MEDLINE XML files are available at ``ftp://ftp.nlm.nih.gov/nlmdata/.medleasebaseline/gz/``.
- Weekly updates to the MEDLINE XML files are available at ``ftp://ftp.nlm.nih.gov/nlmdata/.medlease/gz/``.
- A MEDLINE Document Type Definition (DTD) file is also available.
  Use it to see which tags can appear in a given MEDLINE XML file.

Download PubMed OA figures
--------------------------

This section explains how to download the PubMed OA figures that correspond to the
information parsed by the ``parse_pubmed_caption`` function.

- In ``pubmed_parser``, use ``parse_pubmed_caption`` to parse the figures
  (specifically, ``figure_id``) and captions of a manuscript.
- To download the images for a given ``PMC`` or ``PMID``, first download the CSV file
  ``ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_file_list.csv``.
  The file has the columns ``PMID``, ``Accession ID`` (``PMC``), and ``File``.
  The ``File`` column gives the path to a tar file containing the XML and its figures,
  in the form ``oa_package/08/e0/PMC13900.tar.gz``.
- Append that path to ``ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/`` to get the download URL
  for a given ``PMID`` or ``PMC``, for example
  ``ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/08/e0/PMC13900.tar.gz``.
  To see all the tar files, browse ``ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/``.

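
The lookup described in this section can be sketched in a few lines of Python. This is a minimal illustration, not part of ``pubmed_parser``: the helper names ``find_package_url`` and ``download_package`` are hypothetical, the column names are assumed to match the header of ``oa_file_list.csv`` as described here, and the sample row uses a made-up PMID.

```python
import csv
import io
import urllib.request

# Base URL taken from the text above. The column names follow the description
# of oa_file_list.csv in this section and are assumed to match the real header.
PMC_BASE_URL = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/"


def find_package_url(csv_lines, accession_id,
                     id_column="Accession ID", file_column="File"):
    """Return the tar.gz URL for accession_id, or None if it is not listed."""
    for row in csv.DictReader(csv_lines):
        if row.get(id_column) == accession_id:
            return PMC_BASE_URL + row[file_column]
    return None


def download_package(url, destination):
    """Fetch a single package to a local path (hypothetical helper)."""
    urllib.request.urlretrieve(url, destination)


# Tiny in-memory stand-in for oa_file_list.csv (the PMID value is made up).
sample = io.StringIO(
    "File,Accession ID,PMID\n"
    "oa_package/08/e0/PMC13900.tar.gz,PMC13900,00000000\n"
)
url = find_package_url(sample, "PMC13900")
print(url)
```

With the real file, pass an open file handle for ``oa_file_list.csv`` instead of the sample, and call ``download_package(url, "PMC13900.tar.gz")`` for the individual packages you need rather than for the whole list.
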

PMC Copyright Notice
--------------------

When you use Pubmed Parser to parse information from the website, do not download articles in bulk.
Doing so might get your IP address banned. Please read the PMC copyright notice before you scrape
data from the website.

Alternative implementations of MEDLINE parsers
----------------------------------------------

There are a few other implementations for parsing the MEDLINE dataset.
If you are interested in these alternatives, see the list below.

- MEDLINE Kung-Fu, which uses medic to parse MEDLINE into a database
- MEDLINEXMLToJSON, implemented in JavaScript