Tutorial: Build your own data collection app in R

Author
Affiliations

Sebastian van Baalen

Department of Peace and Conflict Research

Uppsala University

Published

January 7, 2025

Are you embarking on a data collection project that aims to create a new dataset? Exciting! This tutorial describes how to make an online app that allows you and your team of coders to input information and save it to your dataset. I’ve developed several data collection apps for building datasets on electoral violence, wartime protest events, and election boycotts, and I’ve found it both rewarding and very useful.


The main user interface

Editing function
Figure 1: A complete data collection app

Building a working app is not terribly difficult, but because there are no full tutorials available it still took me quite some time to arrive at an app that works, is easy to customize, and includes all the tools I envisioned. With this tutorial, albeit lengthy, I hope to make it easier for you to develop your own data collection app so that you can focus your attention on the data collection itself.

The tutorial should be helpful for anyone, regardless of the subject of their data collection, as long as the end goal is to produce a dataset consisting of variables (columns) and observations (rows). However, since I am a conflict researcher, the examples here will focus on variables that are typical for collecting violent event data.

Try out the tutorial example app here. All code for the tutorial app is available from my Github. Part of the code is based on three existing (but slightly different and less extensive) tutorials shared by Dean Attali, Niels van der Velden, and Dan Turner.

Why use a data collection app?

Building a data collection app takes some time and involves some minor costs, at least if you want to make it available to other members of your team through a server at Shinyapps.io. Many data collection projects work perfectly fine by simply recording information in an Excel spreadsheet. So why bother building an app?

I have used custom-built data collection apps for three different data collection projects and believe there are a number of advantages (see also Dan Turner’s tutorial):

  1. An app ensures that all data is recorded uniformly, reducing the data cleaning burden. Inputation in Excel has a tendency to generate a lot of minor formatting errors, especially when many people work on the same spreadsheet.

  2. An app ascertains that all coders stick to the codebook, limiting errors. The app template involves specifying available options and ranges for the dataset variables, meaning that coders cannot input unavailable options without first asking the project leader to update the app source code. Changes can still be made, but the app forces the team to agree on any changes and subsequently update the app and codebook.

  3. An app makes the coding more intuitive, minimizing error. Many data collection efforts involve categorical variables expressed as integers, increasing the risk of human error. A Shiny app makes it possible to work with character values instead and transform values to numbers automatically, which makes it easier for coders to select the correct value. And an app makes it possible to add information from the codebook directly in the user interface, meaning that coders can be reminded of the coding rules without having to flip between the spreadsheet and the codebook.

  4. An app prevents accidental deletion or edits of existing rows. Large spreadsheets with thousands of rows increases the risk that coders accidentally edit the wrong row or overwrite existing rows. With an app, coders have no access to the spreadsheet and never need to enter it directly.

  5. An app can draw on existing data, reducing the work burden. Variable values are sometimes sourced from other dataset to ensure compatibility, a process that can be automated. In my own projects, coders often need to enter the corresponding UCDP conflict ID, an ID that is automatically sourced in the app.

  6. An app can integrate automatic quality checks, improving data reliability. Beyond the coding interface, a Shiny app can make use of data visualization and processing tools to automatically check that the data is coded correctly, for example, by listing all observations with missing values or with certain unwanted features.

Of course, there are also a few drawbacks:

  1. Building an app can seem daunting, especially if you have limited experience working in R or Shiny. My hope is that this tutorial flattens the learning curve, making this useful tool available to more people. The payoff is likely greater if you plan to have many people involved in the data collection over a longer period of time, and if you intend to collect data on many different variables.

  2. Depending on how many hours the coding team will spend working with the app active, you may need to pay for additional server time. However, this cost is relatively minor, and negligible when compared to the cost of hiring a programmer.

Suggestions on workflow

You can of course build the app in whatever order you prefer. However, my experience is that its useful to at least start doing things in a particular order to make the process easier.

  1. Ensure that you have a sufficiently developed codebook for your data collection project. Otherwise you won’t know what input controls to add to your app, what variable values to enable, and in what order to organize your variables. You don’t need a fully developed codebook (developing the app is a good way to further polish it), but you do need a basic overview.

  2. Develop a basic user interface. Start adding a few input controls, ideally at least one for each type of variable so that you can test that they work.

  3. Specify the server logic and link the app to a MySQL database. Maintain a relatively small number of variables to allow yourself to test the app properly before making it more complex.

  4. Add all remaining variables and test again. Have multiple users test all functions and pilot code some observations to ensure there are no serious bugs.

Getting started

The data collection app we will build uses R and Shiny, so the tutorial requires some familiarity with both. If you have never worked with Shiny before, take some time to familiarize yourself with the basic logic and functions. I’ve found Shiny to be quite easy and intuitive, and the Shiny webpage has excellent documentation.

The basic app structure and idea is relatively straightforward. Using Shiny, we will create a user interface (UI) with input controls for the variables in our dataset, and buttons to save, edit, and delete dataset rows. Because Shiny does not store user inputs, we will then link our app to a remote MySQL database that stores our dataset. Finally, we will deploy our app to Shinyapps.io so that it can be used by an entire team of project leaders and coders.

To get started, open R Studio and create a new R project and select Shiny application. Create a basic app structure that consists of the code for the UI (ui), the server logic (server), and the code to launch the app (shinyApp). Below, I save all code in a single app.R file, but you can also split them up into a ui.R and server.R file. An advanced data collection app will require quite a lengthy script, so I prefer to also include headers to help keep track of and navigate the code. Throughout the tutorial, I will indicate whether code chunks should be added to one of the three main sections: preamble, UI, or server logic.

#### Preamble ####

# This is where we will load libraries, define variable options, and define
# useful functions

##### Load packages ####

library(shiny)

#### Define UI ####

ui <- fluidPage(
  
  # This is where we will add the code for creating the UI
  
)

#### Define server logic ####

server <- function(input, output, session) {
  
  # This is where we will add the code for saving, editing, and deleting dataset rows,
  # as well as for other functionalities
  
}

#### Run the application ####

shinyApp(ui = ui, server = server)

Later on, we will need several additional packages to add functionalities to our app, so you can go ahead and install and load the following packages in your script as well:

--> Add to preamble

library(shiny)
library(shinythemes) # To select a theme
library(DT) # To display an interactive dataset table
library(tidyverse) # To perform data manipulation tasks
library(RMariaDB) # To communicate with the MySQL database
library(pool) # To manage connections with the MySQL database
library(shinyalert) # To display informative dialogue boxes
library(shinyjs) # To add Javascript to the app
library(ggplot2) # To visualize your data
library(leaflet) # To create interactive maps
library(uuid) # To create unique identifiers for each entry
library(shinyhelper) # To add helper notes
library(lubridate) # To work with dates

Step 1: Design the app UI

My existing data collection apps use the navbarPage layout, with a menu on the top, a data inputation form on the left, and an overview of the full dataset on the right. You can explore and experiment with other Shiny layouts, but I’ve found that this layout works well for our purposes. The UI will include two main components: code that creates the overall layout of the app, and code that determines the input controls that we will use to record the variables in our dataset. The below code creates a very basic UI with three input controls that allow the user to select a coder name, a reviewer name, and define the status of their entry. Running the code should yield the app displayed in Figure 2.

ui <- fluidPage(
  navbarPage(
    title = "A Basic Data Collection App",
    theme = shinytheme("sandstone"),
    tabPanel(
      "Let's start coding",
      sidebarLayout(
        sidebarPanel(
          h3("Welcome to our coding platform"),
          "Use the form below to start coding. Use the help buttons 
            to consult the code book for further details on specific variables and coding rules.
            Once you have filled in all relevant fields, click the ADD RECORD button 
            at the bottom of the form to save the record.",
          br(),
    
          ##### Coding form ####
          
          h3("Start coding"),
          br(),
          br(),
          fluidRow(
            column(4, selectInput("coder_input", label = "Select coder name", choices = c("Coder 1", "Coder 2"), selected = "")),
            column(4, selectInput("reviewer_input", label = "Select reviewer", choices = c("Coder 1", "Coder 2"), selected = "")),
            column(4, selectInput("status_input", label = "Coding status", choices = c("Not finished", "Finished"), selected = "Not finished"))
          )
        ),
        mainPanel()
      )
    )
  )
)

The key argument here is selectInput, which creates an input control for the variables coder, reviewer, and status (three meta variables I use in one of my data collection projects). selectInput is ideal for categorical values, and forces the user to pick one of several pre-defined variable values, in this case the user names specified in choices = c("Coder 1", "Coder 2"). We use the suffix _input to distinguish the between variables that temporarily record user input and the variables in the final dataset (more on that later).

There are several additional things to note about the code. First, note that we’ve selected a theme using theme = shinytheme("sandstone"). You can find your own favorite theme in the shinythemes repository. Second, note the use of tidy indentation to clearly distinguish the code hierarchy. Using such formatting is optional, but I find it very helpful when you write a long app script that may need to be edited as the data collection project develops. Third, note that we’ve enclosed the input controls in a fluidRow, which allows me to put three input controls side by side instead of on top of one another.


Defining possible variable values

Before we proceed to look at the different input controls that may be useful in a data collection app, let’s revise the code slightly to make it easier to keep track of our variable values and update them in the future. The code above specifies the possible values for coder directly in the input control. While this works well when you only have a few values that are unlikely to change, it is less practical when you have many values or want to add values as you inductively discover more possible values that you want to record in the dataset.

I recommend storing possible variable values in character vectors and then calling the vectors in the input control. The updated code below adds a subsection to the preamble that defines the options for all categorical variables in our dataset (using the _options suffix). Since both the coder and reviewer variables should have the same options, we call the vector users_options. We can then refer to this vector in the input controls.

The benefit of this solution is that you can now edit the available options for both variables simply by editing the users_options vector, a functionality that will be even more useful once we start building the editing function.

--> Add to preamble

##### Define variable options ####

# User options

users_options <- c(
  "",
  "Coder 1",
  "Coder 2",
  "Coder 3"
)

--> Revise the input controls in the UI

fluidRow(
  column(4, selectInput("coder_input", "Select coder name", choices = users_options, selected = "")),
  column(4, selectInput("reviewer_input", label = "Select reviewer", choices = users_options, selected = ""))
)

An alternative for ordinal variables

For ordinal scale variables, such as different Lickert scale items, we can use the radioButtons control. Below, we define the options for the reliability variable, and the choiceValues argument to make the app transform the verbatim user input into integer values from 0 to 4.

--> Add to preamble

#### Define variable options ####

# Reliability options

reliability_options <- c(
  "Very poorly documented",
  "Poorly documented",
  "Moderately documented",
  "Well documented",
  "Very well documented"
)

--> Add to the UI

radioButtons("reliability_input", "Indicate how well-documented the event is", choiceNames = reliability_options, choiceValues = c(0, 1, 2, 3, 4), selected = 2)

Add input control for binary variable

For binary variables, we can use the checkboxInput control and set the default value to FALSE.

checkboxInput("violence_input", "Was the event violent?", value = F)

Add input control for numerical variable

For numerical variables, we can use the numericInput control. Below, we use the min and max options to limit what values the coders can select, thereby ensuring that the resulting data cannot include any incorrect values. Note that we have set the default value to 1946.

numericInput("year_input", "Select a year", 1946, min = 1946, max = 2022)

Add input control for date variable

For date variables, we can use the dateInput control. Below we use the min and max options to limit what dates the coders can select, thereby ensuring that the resulting data cannot include dates outside the dataset’s temporal scope. Note that we have set the default value to 1 January 2023.

dateInput("event_date_input", "Select the event date", value = "2023-01-01", min = "1946-01-01", max = "2023-12-31")

Add input control for free text variable

For free text variables, we can use the textAreaInput control. Below we use the placeholder option to exemplify what text the coders could enter into the field. We can also use the width and height options to determine how large the text box should be.

textAreaInput(
  "source_input",
  "Enter the full source text and reference(s) used to code the event",
  placeholder = "Sebastian van Baalen & Abel Gbala (2024) Patterns of electoral violence during Côte d'Ivoire's third-term crisis. African Affairs 122(488).",
  width = "100%", height = "100px"
)

Add functions to auto-generate variables

Sometimes, it can also be useful to write functions that auto-generate some variables. Here, we add two functions to our preamblethat we can later call on to automatically assign a unique value to the id variable and a date to the coding_date variable. While coding_date is optional, the id variable is important, as the app will later use this variable to identify what rows to edit or delete.

--> Add to preamble

# Function to generate unique ID using UUID
generateUniqueID <- function() {
  uuid::UUIDgenerate()
}

# Function to add the coding date
generateDate <- function() {
  today <- format(Sys.Date(), "%Y%m%d")
  as.numeric(today)
}

Action buttons

So far, we have added key input controls to our app. Now it’s time to add some action buttons that will later be used to save, edit, and delete dataset rows, and download the entire dataset as a csv file. To add these buttons, we add the following to the beginning of the UI code (after h3("Start coding")):

--> Add to UI

h3("Start coding"),
  br(),
  fluidRow(
  column(
    width = 4,
    downloadButton("download_button", "Download as CSV", class = "btn-info")
  ),
  column(
    width = 4,
    actionButton("edit_button", icon = icon("pen-to-square"), "Edit record", class = "btn-warning")
  ),
  column(
    width = 4,
    actionButton("delete_button", icon = icon("trash"), "Delete record", class = "btn-danger")
  )

Moreover, we add the following to the end of the UI code (after the last input control in the sidebarPanel):

actionButton("add_button", icon = icon("plus"), "Add record", class = "btn-success")

Step 2: Connect the app to a MySQL database

Creating the UI takes some time, but is relatively easy and straightforward. The tricky part (which I myself have learned the hard way) is getting our app to communicate with an external MySQL database. Let’s break down the different steps.

Creating a MySQL database

The first thing we need to do is to create a MySQL database. There are several different hosting options. I use freesqldatabase.com, which provides free hosting for smaller databases and cheap options for somewhat larger ones. Follow their instructions and create a database. Once you are done, click on your database and select “New” to create your dataset (Figure 3).


Creating the dataset

Once we have created the database, its time to add the variables (columns). Click on STRUCTURE and follow the instructions to add variables (Figure 4). Ensure that you select the correct variable type for each variable: TEXT for variables produced by textAreaInput or selectInput; INT for variables produced by numericInput and radioButtons; and DATE for variables produced by dateInput. For binary variables created using checkboxInput, we use the INT type, but set the default to NULL. Don’t forget to give your dataset a name (tutorial_dataset), and to define the collation (utf8_unicode_ci).

Linking the app to the MySQL database

Once we have a MySQL database with the corresponding variables, it’s time to connect our app to the database using the RMariaDB package. To connect it, we create a new subsection in the preamble of our app script and include the below code. This code establishes a SQL connection, creates an empty dataframe, and creates a function for fetching the data already recorded in the MySQL database. Note that we need to fill in our information in both the con and getDataFromMySQL functions.

--> Add to preamble

##### MySQL-related commands ####

# Define the SQL connection
con <- dbPool(
  drv = RMariaDB::MariaDB(),
  dbname = "your_database_name",
  host = "your_host_name",
  user = "your_user_name",
  password = "your_password",
  encoding = "utf8"
)

# Define the reactiveValues object
values <- reactiveValues(dataframe = NULL, selected_rows = NULL)

# Function to retrieve data from MySQL table
getDataFromMySQL <- function() {
  query <- "SELECT * FROM my_dataset"
  dbGetQuery(con, query)
}

# Create initial dataframe from MySQL table
values$dataframe <- getDataFromMySQL()

Next, we create the function addRowToMySQL which will update the MySQL database with a new row from the app when we press the submit button. This step is where I’ve most frequently experienced problems, as it is easy to misspell variable names or forget a question mark. Note that each variable (don’t forget the auto-generated id and coding_date variables) must appear in the INSERT INTO call and in the params vector.

# Function to update the MySQL table with new row
addRowToMySQL <- function(new_row) {
  query <- "INSERT INTO my_dataset (id, coding_date, coder, reviewer, status, reliability, violence, year, event_date, source) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
  params <- c(new_row$id, new_row$coding_date, new_row$coder, new_row$reviewer, new_row$status, new_row$reliability, new_row$violence, new_row$year, as.character(new_row$event_date))

  poolWithTransaction(con, function(conn) {
    dbExecute(conn, query, params = params)
  })
} 

Finally, we add the below code at the end of the server logic to ensure that the app closes the connection to the MySQL database when we close the app:

# Function to close MySQL connection ####
on.exit(function() {
  dbPoolClose(con)
})

Step 3: Specifying the server logic

Once we have built a user interface and connected the app to our MySQL database, it is time to get our app to do things by specifying the server logic.

Display the dataset

To add a tabPanel that displays the dataset, we add the below code to the UI, after the sidebarPanel.

--> Add to UI

mainPanel(
  tabsetPanel(
    tabPanel(
      "Our dataset",
      h3("Show our dataset"),
      br(),
      tags$hr(),
      br(),
      DTOutput("data_table")
    )
  )
)

Moreover, to make our app load and display the dataset, we add the below code somewhere in our server logic. df_ordered <- values$dataframe creates a dataframe that can be displayed using the datatable call. Launch the app to make sure that it displays the dataframe (even if the dataframe is still empty).

--> Add to server logic

output$data_table <- renderDT({
  req(values$dataframe)
  
  df_ordered <- values$dataframe
  
  datatable(
    df_ordered,
    options = list(
      dom = 'tfip',
      pageLength = 30,
      autoWidth = TRUE,
      searching = TRUE,
      order = list(list(2, 'desc')),
      scrollX = TRUE,
      columnDefs = list(
        list(
          targets = "_all",
          render = JS(
            "function(data, type, row, meta) {
              if (type === 'display' && typeof data === 'string') {
                if (data.length > 20) {
                  return data.substr(0, 20) + '...';  // Truncate long strings
                }
                return data;  // Return short strings as-is
              }
              return data;  // Return other types (e.g., numbers, dates) as-is
            }"
          )
        )
      )
    ),
    rownames = FALSE
  )
})

Activate the submit button

Above, we specified the addRowToMySQL function to make it possible to add rows to the MySQL database table. Now its time to activate the submit button to tell the app what information to push to the dataset.

Here, we can use the Shiny command observeEvent(input$add_button,{}) to define what should happen when the user clicks the submit button. The upper part of the code chunk adds the inputs from the input controls to the new_row vector, after which we use our addRowToMySQL function to append the new row to the dataset.

Note that while most variables fetch their information from the input controls, id and coding_date use information from the functions specified in our preamble. Also note that variables recorded through the radioButtons input control need to be wrapped in the as.integer() call to ensure they are recorded as integers. Moreover, we use the different update commands to specify what values should be displayed in the input controls after we have submitted the row. Finally, as a finishing touch we can make use of shinyalert to open up a dialogue box that declares to the user that the row was successfully added to the dataset.

Congratulations, you should now have a minimally working app! I usually pause here for a moment to celebrate partial success, and to try adding a few rows to make sure everything works.

--> Add to server logic

#### Add a new row to the data table ####
  
observeEvent(input$add_button, {
    
  # Create the new_row object from user inputs
  new_row <- data.frame(
    id = generateUniqueID(),
    coding_date = generateDate(),
    coder = input$coder_input,
    reviewer = input$reviewer_input,
    status = input$status_input,
    reliability = as.integer(input$reliability_input),
    violence = input$violence_input,
    year = input$year_input,
    event_date = input$event_date_input,
    source = input$source_input
  )
    
  # Add row to dataset
  addRowToMySQL(new_row)
  
  # Refresh the dataframe by re-reading data from the MySQL database
  values$dataframe <- getDataFromMySQL()
  
  # Clear input values
  updateSelectInput(session, "coder_input", selected = NULL)
  updateSelectInput(session, "reviewer_input", selected = NULL)
  updateSelectInput(session, "status_input", selected = "Not finished")
  updateRadioButtons(session, "reliability_input", selected = 2)
  updateCheckboxInput(session, "violence_input", value = FALSE)
  updateNumericInput(session, "year_input", value = 1946)
  updateDateInput(session, "event_date_input", value = "2023-01-01")
  updateTextInput(session, "source_input", value = "")
    
  # Open alert box
  shinyalert(title = "Boycott record successfully added!", type = "success")
})
Add debugging aids to log all entries

If you run the app from R, you will notice that the app we have created produces no output log. This lack of a printed record makes it hard to trouble-shoot issues. It also means there is no record in case the app suffers from bugs that don’t break the app.

To avoid this scenario, it is worth adding debugging aids to all functions in the app. Simply use print() in the functions to record user input. For example, to print the information entered through the input controls, add print(new_row) to the addRowToMySQL function.

The full app code on Github includes examples of debugging aids that I find useful for maintaining the app.

--> Add to addRowToMySQL
print(new_row)

# Example output
                                    id coding_date   coder reviewer   status reliability violence year event_date source
1 6254394d-a88d-4df8-9a48-0305555264ac    20241218 Coder 1  Coder 2 Finished           2    FALSE 1946 2023-01-01   Test

Activate the delete button

To activate the delete button, we first add the below function to the preamble of our app script. The function tells the app to delete the row currently selected in the dataset viewer.

--> Add to preamble

# Function to delete a row from MySQL table
deleteRowFromMySQL <- function(row_id) {
  query <- "DELETE FROM my_dataset WHERE id = ?"
  poolWithTransaction(con, function(conn) {
    dbExecute(conn, query, params = list(row_id))
  })
}

We then add the below code to the server logic of our app. The call executes the deleteRowFromMySQL function, but only after double-checking that the user has only selected a single row and is certain that they want to delete the selected row. Pretty neat, huh?

--> Add to server logic

#### Delete a row from the data table ####
observeEvent(input$delete_button, {
  selected_rows <- input$data_table_rows_selected
  
  if (length(selected_rows) != 1) {
    showModal(
      modalDialog(
        title = "Warning",
        "Please select only one record to delete.",
        easyClose = TRUE,
        footer = NULL
      )
    )
  } else {
    showModal(
      modalDialog(
        title = "Delete confirmation",
        "Are you sure you want to delete the selected record?",
        footer = tagList(
          modalButton("Cancel"),
          actionButton("confirm_delete_button", "Delete", class = "btn-danger")
        )
      )
    )
  }
})

Enable editing

The last step before we are ready to deploy our app is to enable users to edit existing rows. Building the editing function is a little delicate. But I promise that its worth it; building datasets involves a lot of revisions as the codebook develops, especially in the beginning of a project. So for my data collection projects, an editing function has been an absolute must!

We begin by creating the updateRowInMySQL function in the preamble, which will take values from the updated_row vector and use the id variable to update the selected row in the dataset. An important quirk here is that updated_row$id must be the last entry in the params vector (don’t ask me why).

--> Add to preamble

# Function to update a row in the MySQL table
updateRowInMySQL <- function(updated_row) {
  query <- "UPDATE tutorial_dataset SET coding_date = ?, coder = ?, reviewer = ?,
  status = ?, reliability = ?, violence = ?, year = ?, event_date = ?, source = ? 
  WHERE id = ?"
  
  params <- c(updated_row$coding_date, updated_row$coder, updated_row$reviewer,
              updated_row$status, updated_row$reliability, updated_row$violence, 
              updated_row$year, as.character(updated_row$event_date), 
              updated_row$source, updated_row$id)
  
  # Error handling
  result <- try({
    rows_affected <- poolWithTransaction(con, function(conn) {
      dbExecute(conn, query, params = params)
    })
    print(paste("Rows affected: ", rows_affected))
  }, silent = TRUE)
  
  if (class(result) == "try-error") {
    shinyalert::shinyalert("An error occurred. Go talk to your project leader!", type = "error")
  }
  
  poolWithTransaction(con, function(conn) {
    dbExecute(conn, query, params = params)
  })
}

The server side code consists of two code chunks: one call to open an editing dialogue box and populate the input controls with the recorded values (observeEvent(input$edit_button, {})), and one call to save the edits (observeEvent(input$save_edit_button, {})). Note that we now switch to the input_edit suffix for our variables to ensure that all code refers to the input controls in the editing dialogue box and not in the main user interface.

--> Add to server logic

#### Edit a row in the data table ####
  
observeEvent(input$edit_button, {
  selected_rows <- input$data_table_rows_selected
  
  # Debugging aid
  print(paste("Selected row when clicking edit button: ", values$dataframe[selected_rows, "id"]))
  
  # Error handling to prevent selection of multiple rows
  if (length(selected_rows) != 1) {
    showModal(
      modalDialog(
        title = "Warning",
        "Please select only one row to edit.",
        easyClose = TRUE,
        footer = NULL
      )
    )
  } else {
    row_id <- values$dataframe[selected_rows, "id"]
    values$selected_rows <- selected_rows
    selected_row <- values$dataframe[selected_rows, ]
    
    # Pre-fill input controls with data from the selected row
    updateSelectInput(session, "coder_input_edit", selected = selected_row$coder)
    updateSelectInput(session, "reviewer_input_edit", selected = selected_row$reviewer)
    updateSelectInput(session, "status_input_edit", selected = selected_row$status)
    updateRadioButtons(session, "reliability_input_edit", selected = selected_row$reliability)
    updateCheckboxInput(session, "violence_input_edit", value = selected_row$violence)
    updateNumericInput(session, "year_input_edit", value = selected_row$year)
    updateDateInput(session, "event_date_input_edit", value = as.Date(selected_row$event_date))
    updateTextAreaInput(session, "source_input_edit", value = selected_row$source)
 
    # Show modal box for editing
    showModal(
      modalDialog(
        title = "Edit record",
        size = "l",
        fluidPage(
          fluidRow(
            column(6, selectInput("coder_input_edit", "Select coder name", choices = users_options)),
            column(6, selectInput("reviewer_input_edit", "Select reviewer", choices = users_options)),
            column(6, selectInput("status_input_edit", label = "Coding status", choices = c("Not finished", "Finished"))),
            column(6, radioButtons("reliability_input_edit", "Indicate how well-documented the event is", choiceNames = reliability_options, choiceValues = c(0, 1, 2, 3, 4), selected = 2)),              column(6, checkboxInput("violence_input_edit", "Was the event violent?", value = FALSE)),
            column(6, numericInput("year_input_edit", "Select a year", 1946, min = 1946, max = 2022)),
            column(12, dateInput("event_date_input_edit", "Select the event date", value = "2023-01-01", min = "1946-01-01", max = "2023-12-31")),
            column(12, textAreaInput(
              "source_input_edit",
              "Enter the source",
              placeholder = "Sebastian van Baalen & Abel Gbala (2024)...",
              width = "100%", height = "100px"
            )),
            column(12, actionButton("save_edit_button", "Save edits", class = "btn-success"))
          )
        ),
        footer = modalButton("Dismiss")
      )
    )
  }
})

#### Save  edits ####
  
observeEvent(input$save_edit_button, {
    
  # Retrieve the row ID of the record being edited
  selected_rows <- values$selected_rows
  row_id <- values$dataframe[selected_rows, "id"]
  
  # Debugging aid
  print(paste("Selected row when saving edit: ", row_id))
  
  if (!is.null(row_id)) {
    # Create an updated_row object with the edited data
    updated_row <- data.frame(
      id = row_id,
      coding_date = values$dataframe[values$dataframe$id == row_id, "coding_date"],
      coder = input$coder_input_edit,
      reviewer = input$reviewer_input_edit,
      status = input$status_input_edit,
      reliability = as.integer(input$reliability_input_edit),
      violence = as.integer(input$violence_input_edit),
      year = input$year_input_edit,
      event_date = as.Date(input$event_date_input_edit),
      source = input$source_input
    )
    
    # Debugging aid
    print("Variable values recorded by the user in the edit modal:")
    print(updated_row)
    
    # Update the database with the edited row
    updateRowInMySQL(updated_row)
    
    # Debugging aid
    print(paste("Updated row id: ", updated_row$id))
    
    # Update only the modified row in the dataframe
    values$dataframe[values$dataframe$id == row_id, ] <- updated_row
    
    # Debugging aid
    print("Dataframe updated with new values.")
    
    # Remove the modal and show success notification
    removeModal()
    shinyalert(title = "Record successfully updated!", type = "success")
    
    # Refresh the table output
    output$data_table <- renderDT({
      req(values$dataframe)
      datatable(
        values$dataframe,
        options = list(
          dom = 'tfip',
          pageLength = 30,
          autoWidth = TRUE,
          searching = TRUE,
          order = list(list(2, 'desc')),
          scrollX = TRUE,
          columnDefs = list(
            list(
              targets = "_all",
              render = JS(
                "function(data, type, row, meta) {
                  if (type === 'display' && typeof data === 'string') {
                    if (data.length > 20) {
                      return data.substr(0, 20) + '...';  // Truncate long strings
                    }
                    return data;  // Return short strings as-is
                  }
                  return data;  // Return other types as-is
                }"
              )
            )
          )
        ),
        rownames = FALSE
      )
    })
  } else {
    shinyalert(title = "Error: No row selected!", text = "Please try again.", type = "error")
  }
})

Activate the download button

Before we take the final step and deploy our app, let’s activate the download button so that we can easily download a CSV file that contains our dataset. In my previous projects, I have made it a practice to download a copy every 2-3 days to ensure there are backups. So let’s add the below code to our server logic. Note that you can tweak the filename argument; its currently set to provide the dataset name and today’s date.

--> Add to server logic
output$download_button <- downloadHandler(
  filename = function() {
    paste("Tutorial dataset ", Sys.Date(), ".csv", sep = "")
  },
  content = function(file) {
    write.csv(values$dataframe, file, row.names = FALSE)
  }
)

Done!

That’s it, we should now have a working data collection app. Try it out locally by launching the app in R and testing that all the functionalities work as expected. The full app code is available on my Github.

Step 4: Deploy to Shinyapps.io (optional)

If you are the only one working on the app, or if all your collaborators have access to the app source scripts (e.g. through a cloud sharing account), you can run the app locally on your computer through RStudio. But the really neat thing about our data collection app is that we can deploy it to a Shinyapps.io server, making it available to all our team members without having to use Rstudio.

Password protection

Before deploying your app to Shinyapps.io, I strongly urge you to read the subsection on password protection. Unless you password protect the app, anyone with the URL will be able to add, edit, and delete observations in your app.

To deploy your app, first go to Shinyapps.io and create a personal account. The basic plan is free, but if your coders will spend a lot of time working in the app, it is worth getting one of the paid plans to ensure that there is enough run time.

Once you have an account, add your account in RStudio by clicking “Publish”. Once you have added your account, you should be able to determine what files to push to the server. Click “Publish” and go to the URL to see if your app is up and working. And that’s it!

Questions and recommendations?

Do you have any questions, issues with the code, requests for more advanced functionalities, or just inspiring examples of how you’ve used this tutorial? Send me an email and let me know so that I can update the tutorial accordingly! And if you end up making a data collection app using this tutorial, send me a screenshot of your final app and I will display it in this tutorial as inspiration for others.

Advanced functionalities

Once we have a fully working app that includes input controls for all variables that we want to collect data on, we can start customizing the app further and add more advanced functionalities. Some of these functionalities are very useful, such as password protection and automatic quality checks, whereas others are primarily fun but not super necessary additions.

The examples below help demonstrate the basic logic of customizing your data collection app. If you are comfortable in R, the sky is the limit and you can add whatever advanced functionalities that you are capable of coding.

Password protection

Some of the more expensive Shinyapps.io plans include the option to password protect your app. However, although its probably less safe, you can also build your own login panel and password protection using the below code:

--> Add to preamble

##### Define user credentials ####

# Define a list of valid usernames and passwords

valid_users <- c("user1", "user2", "user3", "user4")
valid_passwords <- c("password1", "password2", "password3", "password4")

--> Add to UI (before navbarPage)

useShinyjs(),
  hidden(textInput("row_id_hidden", "")),
  
  # Login panel
  div(
    id = "login_panel",
    textInput("username", "Username"),
    passwordInput("password", "Password"),
    actionButton("login_button", "Login")
  ),
  
  # Main app content
  shinyjs::hidden(
    div(
      id = "main_panel",
      div(id = "navbar-container",
          
--> Add to server logic

#### Login panel ####
  
# Create a reactive value to track login status
logged_in <- reactiveVal(FALSE)
  
# Event handler for login button
observeEvent(input$login_button, {
  # Check if entered username and password match the valid credentials
  if (input$username %in% valid_users && input$password %in% valid_passwords) {
    # Set logged_in to TRUE if credentials are valid
    logged_in(TRUE)
    
    # Hide the login panel and show the main panel
    shinyjs::hide("login_panel")
    shinyjs::show("main_panel")
  } else {
    # Show an error message if credentials are invalid
    showModal(
      modalDialog(
        title = "Login Error",
        "Invalid username or password. Please try again.",
        easyClose = TRUE
      )
    )
  }
})

Automatic quality checks

All coding efforts invariably involve some human error. One of the benefits of using a data collection app is that you can program automatic quality checks that help identify common errors. Automatic quality checks are easy to define, and consist of one code chunk that determines how to report the check in the UI, and one code chunk that determines what to check in the server logic. Below is an example of a simple quality check that verifies that there are no observations with the same coder and reviewer, a procedure that I use in my own data collection efforts to ensure that all observations are vetted by a project manager. The UI side code then tells the app to display a new tab with verification tools, and to print a table that lists all observations for which coder == reviewer.

This simple logic can be used to create a range of automatic quality checks. I typically use dplyr to manipulate the data to identify potentially erroneous dataset entries, and either tables produced by tableOutput or graphs produced by plotOutput to communicate what rows are in need to manual revision.

--> Add to server logic

#### Automatic quality checks ####
  
##### Non-unique reviewer ####

output$nonunique_reviewer <- renderTable({
  req(values$dataframe)
  
  df <- values$dataframe %>%
    filter(coder == reviewer) %>% 
    select(id, coder, reviewer)
  
  return(df)
  
})

--> Add to UI

##### Verification tools ####
tabPanel(
  "Verification tools",
  h3("Verification tools"),
  br(),
  fluidRow(
    column(4, h4("Non-unique reviewer"), br(), tableOutput("nonunique_reviewer"))
  )
)

Helper notes

Another great feature are so called helper notes, question mark icons littered around your app that open a dialogue box with additional information (Figure 6). I use these helper notes to integrate information from the codebook into the app so that coders can easily remind themselves of the coding rules.

To use helper notes, we first create a subdirectory called helpfiles in the main directory. We then create a markdown document called example.md, which we can edit to include specific information.

--> Add to example.md

### Example helper note

------------------------------------------------------------------------

This is an example helper note.

Thereafter, we can add a link to our helper note in the UI using the pipe %>% operator.

--> Add to UI

column(
  4,
  selectInput(
    "coder_input", "Select coder name", choices = users_options, selected = "") %>% 
    helper(
      type = "markdown",
      content = "example",
      colour = "black",
      buttonLabel = "Got it!"
  )
)

Finally, we need to add a pointer to our helpfiles directory to the server logic.

--> Add to server logic

#### Set help file directory ####
  
observe_helpers(help_dir = "helpfiles")