The power of notes in research data

Very few research databases or data collections that are populated through human labor can do without a field or some other opportunity to enter notes. Note that what follows applies not only to ‘real’ databases, but to any collection of data. It may be a folder with interview transcriptions. In such a case the notes do not go into a ‘field’ but into a separate section of the transcription, or into comments in the word processing file. At least five possible uses of notes can be identified.

Five possible uses

First of all, and perhaps the least surprising use: the notes can be used to point out outstanding values, interesting findings, or remarkable cases. ‘Hey, that is funny! Here there is …’ is typically the stuff that discoveries are made of. Usually, the notes will contain the more ‘mundane’ findings, but they are still important enough to write down. Often the point of writing down the notes is simply to review and become acquainted with the data.

Secondly, data editors (assistants, researchers and others) can use the notes to add data that does not fit elsewhere in the data structure, but that they for one reason or another deem important to add anyway. In other words, they voluntarily enrich the data. If the data structure is in the early stages of its use or life cycle, this is something to keep track of, by regularly checking what kind of data the editors have added. If certain kinds of additions occur often, this may point to an omission in the data structure.

Thirdly, the notes can be used to write down qualifications about data elsewhere in the data structure. Take, for example, a start year that has to be entered for an event in a historical data set. Next, suppose that the archival sources are unclear about when a particular event started. If the data structure does not allow the data editors to express such uncertainty, they may look for another place to record it, and the notes could be the ideal location. They may enter ‘Start year uncertain. Could be earlier than 1655’. Again, if data editors often use the notes for such qualifications of data, this may be a reason to adjust the data structure so that these qualifications can be formalized and integrated, which in my view will lead to an improvement of the data quality.

Fourthly, notes can be used by data editors to communicate with each other, for example about the status of a piece of data: a transcription of an interview, a row in a spreadsheet, or a record in a relational database. Here too, regularly reviewing the notes may reveal patterns in such communications that can then be used to improve the data collection’s structure or interface.

Fifthly, there is the ‘other’ use of the notes: the unforeseen use. Above, a number of possible uses are identified, but the list is not comprehensive. In fact, one could claim that it cannot be comprehensive, which is where the power of notes really shows itself.

Reviewing the notes to unleash their power

From the above, it is clear that an important part of data management is to review the notes on a regular basis, in order to see whether patterns in their use emerge. If so, one should consider taking action by adjusting the data structure, the interface, the instructions for the data editors, the work processes and so on.
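Such a review can start very simply. The following is a minimal sketch in Python, not a prescribed method: it counts short phrases that recur across the notes of different records. The record layout, the field name `notes`, and the function `recurring_phrases` are all assumptions made for the illustration.

```python
import re
from collections import Counter

# Hypothetical records; the field names are assumptions for this sketch.
records = [
    {"id": 1, "start_year": 1655,
     "notes": "Start year uncertain. Could be earlier than 1655"},
    {"id": 2, "start_year": 1701,
     "notes": "Start year uncertain; source damaged"},
    {"id": 3, "start_year": 1688, "notes": ""},
    {"id": 4, "start_year": 1640, "notes": "Check spelling of place name"},
]

def recurring_phrases(records, n=2, min_count=2):
    """Return (phrase, count) pairs for n-word phrases that occur
    in the notes of at least min_count different records."""
    counts = Counter()
    for record in records:
        words = re.findall(r"[a-z0-9]+", record["notes"].lower())
        # A set makes each phrase count at most once per record.
        phrases = {" ".join(words[i:i + n])
                   for i in range(len(words) - n + 1)}
        counts.update(phrases)
    return [(p, c) for p, c in counts.most_common() if c >= min_count]

for phrase, count in recurring_phrases(records):
    print(f"{count} records mention: {phrase}")
```

With the sample data above, the phrases ‘start year’ and ‘year uncertain’ each turn up in two records, which is the kind of signal that may justify a closer look at how the start year is recorded. A count like this only suggests where to look; reading the notes themselves remains necessary.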

Depending on the adjustments, the data concerned should be reviewed and changed so that the notes that triggered the adjustments can be deleted. After all, the new structure and work processes have incorporated that particular use of the notes. For example, the start year for the event in the historical database may have received an additional field that makes it possible to indicate that the start could have been earlier than the year entered.
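As a rough illustration of this clean-up step, the Python sketch below moves the start-year qualification out of the notes and into a new field. It assumes that editors used exactly the wording of the example note, and that the year in the note equals the year entered; anything else is left untouched for manual review. The field name `start_may_be_earlier` and the function name are hypothetical.

```python
import re

# Assumed wording of the note that the new field replaces.
UNCERTAIN_NOTE = re.compile(
    r"Start year uncertain\.\s*Could be earlier than (\d{4})\.?",
    re.IGNORECASE,
)

def migrate_start_year_notes(records):
    """Return copies of the records with the qualification moved
    from the free-text notes into a boolean field."""
    migrated = []
    for record in records:
        new_record = dict(record)  # leave the original untouched
        new_record["start_may_be_earlier"] = False
        notes = record.get("notes", "")
        match = UNCERTAIN_NOTE.search(notes)
        # Migrate only when the noted year equals the entered year.
        if match and int(match.group(1)) == record["start_year"]:
            new_record["start_may_be_earlier"] = True
            # Delete only the matched note; keep any other text.
            rest = notes[:match.start()] + notes[match.end():]
            new_record["notes"] = rest.strip()
        migrated.append(new_record)
    return migrated

# Hypothetical sample data.
sample = [
    {"id": 1, "start_year": 1655,
     "notes": "Start year uncertain. Could be earlier than 1655"},
    {"id": 2, "start_year": 1700, "notes": "Check place name."},
]

for row in migrate_start_year_notes(sample):
    print(row)
```

Working on copies keeps the original notes available until the result has been checked by hand, which seems prudent given that notes are free text and rarely follow one exact wording.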

