An Open Knowledge Toolkit – DHSI 2022 Workshop
- Created: 22 Apr 2022
Module 2: Using Open Data and Open Tools – Part 1
Analyzing open data and sharing findings online requires digital tools, and a wide variety of tools are freely available for just about any application you can think of. Open scholarship can also include developing open source tools for other researchers to use.
Many of these open tools depend on open source code, another critical element of the Open movement. According to the Open Source Initiative, open source code is similar to Open Data in that it can be reused and redistributed for free, must be available in a form that is easy to use and modify, and cannot restrict who can use it or for what purpose. Open source code should also not restrict the type of technology it can be used with.
The video in this module discusses a few widely used open tools and shows an example of how they can be used in a research context. It describes how to parse, or analyze, XML documents created in Atom (an open source text editor) using Python and a Python library called Beautiful Soup, all within a Jupyter Notebook.
Python
Python is one of the most popular programming languages in the world and is used for everything from web development to game development to research and education. It is widely used among humanities researchers because it is open source, readable, and flexible, and has a strong community of practice. Another advantage of Python is that it has many libraries and packages (collections of functions for completing various tasks), such as the Natural Language Toolkit for linguistic and textual analysis.
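To give a sense of what a library offers, here is a minimal sketch that counts word frequencies in a sentence. It assumes nothing beyond Python itself: it uses the `re` and `collections` modules from the standard library rather than the Natural Language Toolkit, and the sample sentence and the `word_frequencies` function name are invented for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercase word to how often it occurs."""
    # Lowercase the text, then pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, not taken from the workshop files.
sample = "Open data and open tools support open scholarship."
frequencies = word_frequencies(sample)

# Print the three most frequent words with their counts.
for word, count in frequencies.most_common(3):
    print(word, count)
```

A dedicated library such as the Natural Language Toolkit goes much further than this, but the pattern is the same: import ready-made functions and apply them to your own text.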
Jupyter Notebook
Jupyter Notebook is an open source platform that allows you to keep different types of information together, including code, text, notes and documentation, data, and visualizations. These notebooks are shareable and interactive, allowing researchers to tell stories about their work.
XML
XML (eXtensible Markup Language) is a tool for marking the structural features of a text. It is called a descriptive markup language because it describes the text it marks up rather than indicating how that text should be displayed, as HTML does, for example.
Because XML is extensible and stores data as plain text, it is highly interoperable. This means that it works across computer systems with different hardware and software, and that it is readable by both humans and machines.
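The short sketch below shows what descriptive markup looks like and how a program can read it. It is only an illustration under stated assumptions: the XML snippet, its tag names, and the variable names are invented, and it uses Python's built-in `xml.etree.ElementTree` module so that it runs without installing anything, whereas the video in this module uses Beautiful Soup.

```python
import xml.etree.ElementTree as ET

# A small, invented XML document. The tags describe what each piece
# of text is (a title, an author, a numbered line), not how it looks.
xml_text = """<poem>
  <title>An Example Poem</title>
  <author>Unknown</author>
  <line n="1">The first line of the poem</line>
  <line n="2">The second line of the poem</line>
</poem>"""

# Parse the plain-text XML into a tree of elements.
root = ET.fromstring(xml_text)

# Read the text inside the <title> element.
title = root.findtext("title")

# Collect each <line> element's "n" attribute together with its text.
lines = [(line.get("n"), line.text) for line in root.findall("line")]

print(title)
for number, text in lines:
    print(number, text)
```

Because the markup records structure rather than appearance, the same file could be used to count lines, list authors, or feed another tool without any changes.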
Video: Parsing XML with Python
By Luis Meneses
The files Luis refers to in this video are available to download and use in your own Jupyter Notebook.
Activity
Think about (and discuss, if you are completing this module with others) the tools you use in your research. Are any of them open source? What are some advantages and disadvantages of using open source tools in your research?
Think about some potential applications of the tools discussed in the video. What kind of research questions could you answer using these tools?
Resources
The Python Software Foundation
What are Jupyter Notebooks? Why Would I Want to Use Them?
What is XML and Why Should Humanists Care? An Even Gentler Introduction to XML
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.