I have hundreds of scientific papers stored on my hard disk drive. I try to organize them by author, by field, or by research group. However, many scientific papers
- have more than one author
- are written by more than one research group
- encompass more than one research field.
For example, let us imagine that I have this neat, fictional paper on quantum computing. What if I archive it by:
- research group? Problem: the paper was written by two research groups: one at Stanford and another one at Caltech.
- author? Problem: the paper is written by three “heavy-weight” researchers. Which one to choose?
- research field? Problem: the paper is about quantum computing, which is a multi-disciplinary area. I could archive it in the Quantum Physics folder, or in the Computer Science folder. I could, of course, store it in the Quantum Computing folder, but that may be a bit too specific. What if I have a paper on quantum error-control that is closely related to this quantum computing paper? Shouldn’t these papers be archived in the same folder?
Many questions, few answers. How should I organize my papers so that I can find them in an efficient manner whenever I need them?
Some months ago, I came across an interesting discussion on Nuclear Phynance about this topic. Some people on NP were using iTunes to organize their papers (no kidding! check this out) or some general-purpose document management systems. Nevertheless, these applications were not specifically tailored to manage scientific papers. No general-purpose application can perform well under all possible scenarios. I thus would love to have an application specifically designed to manage scientific papers.
What features should that application have? Well, it should work a bit like iTunes: in an easy, intuitive and efficient manner. One could have all papers stored in one folder. Or, if you prefer, in several folders: a folder for all 2007 papers, another one for all 2006 papers and so on (for example). Then, that application should allow one to add papers to the library (just like what happens in iTunes). It should allow one to specify several different fields for each paper, such as:
- type: is it a conference paper, a journal paper, a technical report?
- event: if it is a conference paper, then one should be able to specify which conference it was.
- title: this one is pretty self-explanatory.
- authors: one should be able to input a list of all the authors.
- keywords: instead of categorizing a given paper in a rigid manner, a fuzzy approach would be better. A given paper does not need to be a math paper or a physics paper, it can be both!
- abstract: this could be a logical extension to the “keywords” field.
- bibliography: this might seem irrelevant, but imagine that you could specify which other papers a given paper refers to! Even better: imagine that the application would immediately scan your library to try to find those papers mentioned in the bibliography!
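To make the field list above concrete, here is a minimal sketch in Python of what one library record might look like. Everything in it is made up for illustration (the `Paper` class, the `find` helper, the placeholder titles and author names); it does not describe any existing application.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """One library entry; the fields mirror the list above (names are hypothetical)."""
    title: str
    authors: list
    kind: str = "journal paper"   # the "type" field: conference paper, report, ...
    event: str = ""               # conference name, if it is a conference paper
    keywords: set = field(default_factory=set)
    abstract: str = ""
    bibliography: list = field(default_factory=list)  # titles of cited papers


def find(library, keyword=None, author=None):
    """Return the papers that match every filter that was given."""
    hits = []
    for paper in library:
        if keyword is not None and keyword not in paper.keywords:
            continue
        if author is not None and author not in paper.authors:
            continue
        hits.append(paper)
    return hits


# Placeholder entries, not real papers.
library = [
    Paper(
        title="A Fictional Paper on Quantum Computing",
        authors=["A. Author", "B. Author", "C. Author"],
        keywords={"quantum physics", "computer science", "quantum computing"},
    ),
    Paper(
        title="A Fictional Paper on Quantum Error Control",
        authors=["B. Author"],
        keywords={"quantum computing", "coding theory"},
    ),
]

# One paper can be reached through several keywords or authors,
# so it never has to live in a single "right" folder.
print([p.title for p in find(library, keyword="quantum computing")])
print([p.title for p in find(library, keyword="computer science")])
print([p.title for p in find(library, author="B. Author")])
```

The design choice here is that keywords form a set rather than a single category, which is the "fuzzy" classification described above.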
The problem is: I have many hundreds (if not a few thousands) of papers. Specifying the aforementioned fields for each and every paper on my hard disk drive would be prohibitively time-consuming. However, what if a new file format were created? That new format could encapsulate the file (in PDF or PS format, for instance) and all the metadata! To make things simple:
- the paper’s authors would input all the metadata at once, and then publish the files on their webpages.
- we would then simply download that file containing the paper plus all the metadata.
- we would allow the application to manage all the papers for us. That would be wonderful: the application would find the papers mentioned in the bibliography section, it would link to other papers from the same authors, etc.
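As a rough sketch of what such a file format could look like, the Python snippet below bundles a paper and its metadata into a single ZIP archive. This is only one possible design, assumed for illustration: the member names `paper.pdf` and `metadata.json`, the `pack` and `unpack` helpers, and the sample metadata are all hypothetical, and no such standard format is implied.

```python
import io
import json
import zipfile


def pack(pdf_bytes, metadata):
    """Bundle a paper and its metadata into one archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr("paper.pdf", pdf_bytes)
        bundle.writestr("metadata.json", json.dumps(metadata, indent=2))
    return buffer.getvalue()


def unpack(bundle_bytes):
    """Return (pdf_bytes, metadata) from an archive produced by pack()."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        pdf_bytes = bundle.read("paper.pdf")
        metadata = json.loads(bundle.read("metadata.json"))
    return pdf_bytes, metadata


# Placeholder content: the bytes below stand in for a real PDF.
fake_pdf = b"%PDF-1.4 placeholder bytes, not a real paper"
metadata = {
    "type": "journal paper",
    "title": "A Fictional Paper on Quantum Computing",
    "authors": ["A. Author", "B. Author", "C. Author"],
    "keywords": ["quantum physics", "computer science"],
    "bibliography": ["A Fictional Paper on Quantum Error Control"],
}

bundle = pack(fake_pdf, metadata)
pdf_again, metadata_again = unpack(bundle)
print(metadata_again["title"], "by", ", ".join(metadata_again["authors"]))
```

The authors would run something like `pack` once, and a library application would run something like `unpack` on every downloaded file to fill in its fields automatically.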
Does anyone have any ideas on this? Does such an application already exist? The only application I know of that resembles the one I have in mind is Papers (too bad it only runs on Mac OS).
Constructive comments are very much appreciated. Thanks in advance.
October 2, 2007 at 22:55 |
Google Desktop Search is my simple solution after trying many hard ones.
It parses pdf files and, with some additional plug-ins, it even parses all of my LaTeX files. Type in an author, title, or even a snippet of text I think I remember and there they are.
October 3, 2007 at 01:57 |
It is definitely overkill (and vaporware, at this point and for the near future), but there’s a project I’ve been working on that might help you with this (http://scen-connect.sourceforge.net/) by letting you add tags and scores virtually without constraints.
October 3, 2007 at 02:44 |
That iTunes hack is interesting, I’ll have to check it out. There’s something like iTunes for research papers for macs, if you have one (I don’t) – it’s called Papers.
I’ll tell you what I use – Zotero. It’s great. Two disadvantages: it’s only for Firefox (it’s a plug-in), which is fine for me, and as of now, they don’t offer a way to store papers online (although they say they’re working on it).
You may have already come across it. I was faced with the exact same problem as you and found Zotero not too long back.
October 3, 2007 at 07:27 |
I have the same problem and I have used Google Desktop for this purpose before. Problem was that GD was giving my disk a hard time and was making my system slow.
Another annoying thing was that, although it does indeed parse PDF files, some characters produced a system beep while it did so, so my machine kept beeping once every few minutes, quite a paranormal phenomenon until I found out what was going on…
Didn’t go back to GD since then. I’ve been classifying papers in a directory tree structure according to subject or research theme… will check out Zotero, definitely.
October 2, 2007 at 23:38 |
Mike,
Thanks for your input. Indeed Google Desktop Search can be quite useful! I used to rely on it until I found that there was a critical bug in Google’s Desktop application … then I uninstalled it. I suppose they have fixed the bug by now, so maybe I should install it again.
I didn’t know that Google Desktop Search could parse PDF and LaTeX files. Uhmm… I feel highly tempted to try that out now! :)
October 3, 2007 at 07:38 |
Have you tried Bibdesk? It works well for me, but I have only just under a hundred papers to deal with.
I actually don’t find something like google desktop search (well I use spotlight but same difference) to work that well, because it can’t distinguish between say the title of a paper and a reference. So searching by title gives a big bunch of papers that refer to the one I’m after.
Anyway Bibdesk is ok.
October 3, 2007 at 23:59 |
I have built a web site running on Mandriva Linux with Apache server. I have installed MediaWiki with Semantic wiki extensions. Now I have everything: a version control system for storing, semantic web for searching by keywords (attributes), and everything available over the Internet… An ideal solution (I spent a year investigating what to do… and then did it in about a month. The web server PC runs a fanless processor with the quietest hard disk on the market, so it just sits still with no noise, and the case is of Mini-ITX form factor with a good design). You should pay me for this advice :-)
October 4, 2007 at 01:53 |
The most important independent variable is time. Sort by date.
October 4, 2007 at 08:06 |
Have you tried CiteULike yet? It is an online service to organize your papers. It is quite interesting as:
- You can import your existing bibtex database there.
- You can export all papers collected as bibtex.
- It supports the format of e.g. ACM websites, so if you feed your CiteULike bookmark with an ACM URL to a paper you are interested in, CiteULike will scrape all author, conference, and title information.
- You can tag papers at your will.
- You can leave notes tagged to papers.
- You can decide whether to share your collection with others or keep it visible only to your account.
- You see (if they made the information public) which other users of CiteULike also read the paper you are looking at and can find out about other papers they also read.
Of course the site does not allow for any full text search, and I am not sure whether you can search in the references of a paper.
I mostly like the feature of scraping websites commonly publishing papers and sharing information about interesting papers with others.
October 9, 2007 at 04:28 |
I organize my papers (preferably PDF, otherwise PS) in directories by the first author’s last name. If two authors share the same last name they go in the same directory. I don’t sort by field. However I don’t indiscriminately add every paper I read. This does not seem to cause any confusion. Nothing gets printed out anymore and any paper can be found in 30 seconds. Also they used to fit on a CD which I could burn for newbie entrants to the field. Now it takes two CDs. I also maintain a bibtex file with all these papers. That way I can quickly use emacs or grep to search and add keywords, notes, and opinions which I enter where the abstract is supposed to go. In general the real “index” system is memorized though – usually by first author and year and sometimes journal e.g. W_ and W_ ApJ Supplements 1981. This method is very portable and makes citing papers a cinch when writing papers.
October 21, 2007 at 21:23 |
Did you try JabRef?
It’s for managing / editing / searching a bibtex file in a convenient way.
It also includes a way to manage pdf files, but I do not use it.
Pierre.
December 11, 2007 at 04:17 |
Endnote is a great piece of software.
December 20, 2008 at 11:01 |
Try the new Evernote (www.evernote.com). It has several great features including the ability to add multiple tags, lightning-fast search with highlighting and syncing between multiple computers.
I use this tool to store my collection of tech papers but also to organize all my notes, to-do lists, etc. It’s great.
January 25, 2009 at 08:02 |
I had the same problem as you, Rod, and I have to say that JabRef is the best solution that I have found. This is how JabRef meets your needs (and mine, and others’):
* type:
It allows you to select if the document is an Article, Book, Conference, Master Thesis, etc…
* event
For instance, if the document is an article, you can enter in which Journal it appears. Other document types have different information accordingly
* title
Obviously you can enter the full title of the document
* authors
You can enter a list of authors
* keywords
There is a “general” tab where you can enter a lot of extra information such as a list of keywords
* abstract
There is a special tab for the abstract
* bibliography
It allows you to search a lot of sites such as JSTOR, CiteSeer, ACM Portal, IEEEXplore, etc., for citations and articles citing the ones in your reference. This way you can just copy the data to your database without the need to enter it yourself.
Also, one of the best things is that you can add a reference to your local pdf, also references to other filetypes, such as images, and you can include a URL (author’s homepage for instance). So you only have to click and the page is loaded in your browser, or just click and the pdf appears with your pdf reader. Very handy! Also, it can write the data as meta-data in the pdf.
It is written in Java, therefore it can be run on Windows, Linux, etc…
I think this is the best tool available for the moment (and it’s free). Cheers!
March 14, 2009 at 10:34 |
I had been struggling with the same problem for quite a while until I decided to write my own piece of software to organize papers and other documents. I’ve recently released it hoping that someone will be willing to help in further development. It’s cross-platform (written in C++ using Qt4) and should be stable enough to be used (I’ve been using it for about a month, but some features are still missing). It helps to store, organize, browse, view and cite scientific documents (articles, books etc.) and other types of multimedia (e.g. videos). Keywords, authors, document types, journals etc. can be defined and documents can be grouped and filtered according to them.
Cheers,
Andrzej
March 17, 2009 at 14:41 |
That’s pretty cool! Kudos for the initiative. I have thought about writing my own piece of software too, but I have no time and no stamina to start from zero. I might be interested in collaborating in an on-going project, though. I will take a look at your project as soon as I find some time. Cheers!
April 5, 2009 at 02:09 |
Sure, you are very much welcome to contribute to the project if you find it interesting. Just send me an email to pronobis at users.sourceforge.net . Any contribution is welcome!
Cheers,
Andrzej
May 19, 2009 at 08:14 |
Interesting read through everyone’s comments. I have more than a thousand technical papers in pdf files on my computer and don’t have the will power or the time to enter author, title, source, etc. about each one into some database. I’ve looked at “Papers” – it would be perfect, as it searches various on-line databases for you, and automatically fills in the metadata – the problem is that it favours indexes of medical and pure scientific papers, but doesn’t access any index of engineering papers.
My solution so far is to create a relatively flat directory structure by “Subject” on my hard drive, and place my PDF files in that (sometimes they go into more than one folder). That works reasonably well – I still used to spend some time searching a particular folder for a relevant paper (when I used Windows), but 2 years ago I switched to a Mac, and that problem has mostly gone away. Using Mac’s “coverflow view” or “quickview” features, you can very quickly scan through the files in a particular folder and get a good idea of their contents. If you are not familiar with these features, have a look at the Mac web site or talk to someone you know with a Mac – you’ll be astounded at how useful they are (reason enough to switch to a Mac, as if you didn’t already have 1000 other good reasons – I’ll never go back to Windows :>)
I have seen an interesting program called Yep by Ironic Software that looks promising. I may try that out, but it will have to be easy to use and help me organize things without too much work, or I’ll just stay with my present system. It does automatically assign tags to each file based on where the file resides in your directory structure – that’s a good start for me. For info on Yep see:
Cheers,
John
May 29, 2009 at 10:44 |
The problem is that PDF files (even scientific ones) have no unified metadata that could be extracted to help PDF management software download complete reference data from a reliable internet database. Many publishers are now including a DOI, which is good, but some (like PLoS) include many DOIs, for the whole paper and for each figure. What a mess! Most bibliography managers thus rely on various hacks to extract metadata from the Internet. Needless to say, the reliability is low.
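To illustrate the kind of hack involved, here is a small Python sketch that pulls DOI-like strings out of text already extracted from a PDF and guesses which one belongs to the whole paper. It rests on stated assumptions: the regular expression is a heuristic rather than the official DOI syntax, the idea that figure DOIs extend the article DOI is a guess about how some publishers build them, and the DOIs in `sample` are made-up placeholders.

```python
import re

# Heuristic pattern for DOI-like strings; an assumption, not the official syntax.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_dois(text):
    """Return distinct DOI-like strings in order of first appearance."""
    seen = []
    for match in DOI_PATTERN.finditer(text):
        doi = match.group().rstrip(".,;)")  # drop trailing sentence punctuation
        if doi not in seen:
            seen.append(doi)
    return seen


def guess_article_doi(dois):
    """Heuristic: prefer the DOI that all other candidates extend."""
    for candidate in dois:
        others = [d for d in dois if d != candidate]
        if others and all(d.startswith(candidate) for d in others):
            return candidate
    return dois[0] if dois else None


# Made-up text standing in for the contents of a PDF.
sample = (
    "Figure 1. doi:10.9999/journal.example.0000001.g001\n"
    "Citation: ... doi:10.9999/journal.example.0000001.\n"
    "Figure 2. doi:10.9999/journal.example.0000001.g002\n"
)

dois = find_dois(sample)
print(dois)
print(guess_article_doi(dois))
```

When the candidates do not share a common prefix, the sketch simply falls back to the first DOI found, which is exactly the sort of unreliable guess described above.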
Another problem is that many reliable internet repositories are for-profit and don’t have any APIs for free programmatic access. The few exceptions are PubMed, PubMed Central, NASA ADS, arXiv, SPIRES. If you know another one I would like to know about it.
I don’t know what scientific field you are working in, but if you are in biology, medicine or physics, I will recommend Librarian.
Each record may be assigned multiple custom keywords called Categories; that is your fuzzy logic. It is web-based and multi-platform. It also allows searching words inside PDF files (not available on Mac).
June 1, 2009 at 20:46 |
After reading through all the posts here and looking at all the cited programs, I find I like Zotero a lot. The program is fairly simple, powerful, and allows you to easily store information about material you find on web sites, as well as any of your own references. And the latest version (2.0) also allows sharing of reference lists with other people in any group you define. And it’s all open source software!!
June 17, 2009 at 23:40 |
One more interesting add-on could be for the list of references in the current paper to link to the existing papers on the hard drive: something like the current paper being the root node and the referenced papers being the child nodes.
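A minimal Python sketch of that root-and-children idea follows. The `citations` mapping, the paper names, and both helper functions are hypothetical; the sketch assumes the library already knows, for each paper on disk, the titles it cites.

```python
def reference_tree(title, citations, _seen=None):
    """Return (title, [child trees]) for the cited papers that are in the library."""
    seen = (_seen or set()) | {title}
    children = [
        reference_tree(cited, citations, seen)
        for cited in citations.get(title, [])
        # Skip papers that are not on disk, and papers already on this
        # branch (two papers may cite each other).
        if cited in citations and cited not in seen
    ]
    return (title, children)


def render(tree, depth=0):
    """Flatten a tree into indented lines, one per paper."""
    title, children = tree
    lines = ["  " * depth + title]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


# Keys are papers in the library; values are the titles they cite.
citations = {
    "Paper A": ["Paper B", "Paper C", "Paper not on disk"],
    "Paper B": ["Paper C"],
    "Paper C": ["Paper A"],
}

print("\n".join(render(reference_tree("Paper A", citations))))
```

With "Paper A" as the root, "Paper B" and "Paper C" appear as child nodes, and "Paper C" appears again under "Paper B" because that paper cites it too.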
January 6, 2010 at 11:53 |
I know this isn’t useful for organizing large quantities of files automatically, but it is useful for sorting through a few new papers on a particular subject. In particular, I require that my students use this format for performing a literature search so that I can quickly look through what they have found. I use a low-tech bibtex bst file I call pdfabstract.bst to create a list of papers on a particular topic. I copy/paste bibtex citations from Google Scholar, add a field for the abstract and copy/paste it from the document (or the publisher’s website if it’s given in plaintext). Given a location of the local pdf, it also provides a link to the file using hyperref. Then I simply type ‘make’ and pdflatex/bibtex does the rest. Sample is at http://www.math.oregonstate.edu/~gibsonn/pdfabstract.zip .
June 4, 2010 at 23:26 |
You should definitely give Mendeley a try. It is really easy to use and cross-platform. It is the best software that I have used for organizing academic papers. It is actually better than JabRef, which I commented on previously.
June 28, 2010 at 10:53 |
Papers is amazingly wonderful. It’s like iTunes built specifically for research papers. However it is Mac-only. It’s probably the main reason why I’m stuck with Apple, even though I really want to switch back to PC.
August 3, 2011 at 11:42 |
Hi Rod
I found this thread today… and I am wondering what you finally chose as a “scientific papers database software” that satisfies your needs.
Thanks!!!
Alex Nozik
August 3, 2011 at 12:36 |
I use del.icio.us. It’s not perfect, but since it allows me to use tags, I can organize my papers reasonably well. And since it’s online, I have access to it from any computer.
January 17, 2012 at 19:51 |
Mendeley is really an awesome piece of software to manage research papers. You can organize them in folders, link other papers to a particular paper, add tags, write your own notes/comments which stay with the paper (you don’t have to maintain a different file for your comments/notes; I found this feature very useful), etc…
Worth giving a try…
August 20, 2012 at 10:55 |
Readcube is built specifically to organize scientific papers. It does everything you mentioned here and more. Organize your articles any way you want, highlight and make notes, the articles are full-text searchable, and you can export citations easily. There’s even a feature that automatically recommends the most relevant papers to you based on a specific part of your library, and the enhanced PDF feature makes everything so accessible (as can be seen from their webreader on nature.com). Best of all it’s free! You can see how I use readcube to save me time and help me read better here.