Abstract¶
Every academic needs to stay on top of trends and advancements in their field and to identify the work and researchers that can advance their own research, and vice versa. One important path to that end is scanning the posters at conference poster sessions efficiently and effectively. By applying eye tracking technology to various academic poster designs, this exploratory experimental study aims to reveal how design choices foster or hinder information collection during poster sessions.
The eye tracking experiment was conducted using an EyeLink 1000 Pro. Eight academic posters were presented on screen for 20 seconds each, en bloc but in randomized order. The participants were thirteen academics. Time to search for the desired information, findability of relevant messages, and cognitive load during the information collection process were examined.
Putting all limitations aside (e.g. n = 13, arbitrary stimuli selection and AOI definition, missing baseline corrections, large individual differences), the results show that the reduction of content inherent to #betterposter design drew the gaze to the main parts of the poster. However, only the landscape layout of the #betterposter v2 design attracted all participants to look at all relevant parts, and it was the “Presenter Mode” of the #betterposter v2 design that best matched the gaze sequence established in Western culture (top left as starting point). Both findings are supported by correspondingly good ratings in the questionnaire. The #betterposter v2 “Presenter Mode” is therefore recommended for use.
This pilot study expanded the knowledge on information collection and cognitive load, which is of growing importance given the increasing amount of research produced. It also enabled the author to learn dos and don’ts, compile recommendations, and collect points to consider when conducting the next eye tracking experiments on academic poster design.
Introduction¶
As Mike Morrison has pointed out (Morrison, 2020), the way academic posters are presented has not changed during the last 30 years, although research has advanced. To tackle our global problems, help research advance, and prevent research from being ineffective and ignored[^avgvisits], we need new ways to communicate research results and ideas (Morrison, 2020; Morrison, Merlo, and Woessner, 2020; Oronje et al., 2022; Rowe & Ilic, 2015; Ilic & Rowe, 2013). That was and is the motivation of Mike Morrison, who in 2019 created and presented the first #betterposter design and sparked a movement (Morrison, 2019).
By applying digital tools (e.g. QR codes pointing to a full paper that documents the research process and/or links to all resources necessary to reproduce the study, as required e.g. by the Guidelines for Good Scientific Practice of the Austrian Agency for Research Integrity (2015)), the academic posters’ content may be reduced to the most important messages (Morrison et al., 2020). By further applying findings of UI/UX design and of psychological research on communication and information foraging theory (Pirolli & Card, 1999; Mayer & Moreno, 2003), academic posters can be designed that are faster and easier to understand and thus better serve their purpose of spreading knowledge.
By means of eye tracking technology, this exploratory experimental study aims to reveal how design choices foster or hinder information collection. Various academic poster designs are examined regarding the time and effort needed to search for the desired information, as well as the cognitive load expressed during the information collection process. The findings shall inform the next #betterposter design, supporting research and academics with user experience (UX) tested communication and sharing facilities better adjusted to the increasing amount of research produced.
[^avgvisits]: “Traditional scientific posters received an average of 6.4 visitors, according to presenters’ own subjective count” (Morrison, Merlo, and Woessner, 2020)
Methods¶
Eye tracking technology is used to locate and follow a person’s gaze while they look at objects. Since it is difficult to look at one thing and think about another, the focus of the eye is often equated with the focus of the mind (Krasich et al., 2020). This can be observed in the changing focus and altered gaze traces when the tasks given to participants change (Buswell, 1935, p. 136; Yarbus, 1967, p. 174).
In user experience testing, the think-aloud approach[1], observation, questionnaires, eye tracking technology, and combinations thereof are used (Bojko, 2013, p. 106). Eye tracking is used to a lesser degree, since it requires expensive equipment and trained personnel and delivers high-precision measurements that are not necessary for most applications in UX testing (Bojko, 2013, p. 44). The current study nevertheless relies on eye tracking technology because cognitive load[2] can be reliably measured via the participants’ pupil size and average fixation duration, and this was key in the context of this study.
Study Design¶
Following Aga Bojko’s UX procedure (Bojko, 2013, p. 124ff), mental workload and cognitive processes (providing insights into how easy or difficult the message was to convey) were measured by pupil diameter and average fixation duration; effective target identification (disclosing how well topic and results are presented) was determined by the percentage of participants who fixated on the AOIs; efficiency was determined by the number of fixations and the timespan before the first fixation on any of the targets (AOIs) took place.
As proposed by Aga Bojko (Bojko, 2013, p. 80), a within-subjects[3] approach was chosen; carryover effects were controlled by presenting the stimuli in randomized order.
Experiment Setup¶
The experiment took place at the MediaLab of the University of Vienna on November 23rd, 2022, in the afternoon. It was conducted in a separate, quiet room with two persons present: the experimenter, an experienced member of the MediaLab who supervised the whole procedure including the calibration, and the participant.
Each of them sat in front of a computer; the two desks were separated, with no visual contact between the persons present.
The participants’ heads were not fixed (no chinrest) but freely movable. Three different blocks of stimuli were presented to every participant, each stimulus for 20 seconds. The data collection process (including introduction and calibration) took about 10 minutes per participant. Afterwards, the participants filled in a questionnaire in a separate room, with the stimuli presented as printouts, recording their ratings of understandability and of how much they liked the overall appearance of the various posters (Likert scale 1-5).
The eye tracking experiment was conducted using an EyeLink 1000 Pro with a sampling rate of ~1000 data points per second. In total, more than two million data points were collected and analyzed.
The data sets derived from the EyeLink 1000 Pro eye tracking system were compiled, analyzed, and visualized, together with the documentation of the research process, in R (version 4.2.2 (2022-10-31 ucrt), “Innocent and Trusting”; R Core Team), using the RStudio/Posit IDE (RStudio Team, 2020; Posit team, 2022) and R Markdown.
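The analysis code itself is not reproduced here. As a minimal sketch, assuming a fixation report exported from SR Research Data Viewer as a tab-delimited file (the file name, the `poster_id` column, and the exact column set are assumptions, not taken from this study), loading and reshaping such data in R could look as follows; the later sketches reuse this `fix` data frame:

```r
## Minimal sketch, not the study's actual pipeline: load a fixation report
## (one row per fixation) and keep only the columns used in later sketches.
raw <- read.delim("fixation_report.txt", stringsAsFactors = FALSE)

fix <- data.frame(
  participant = raw$RECORDING_SESSION_LABEL,        # participant label
  poster      = raw$poster_id,                      # hypothetical trial variable
  fix_dur     = raw$CURRENT_FIX_DURATION,           # fixation duration, ms
  pupil       = raw$CURRENT_FIX_PUPIL,              # pupil size, arbitrary units
  x           = raw$CURRENT_FIX_X,                  # horizontal position, px
  y           = raw$CURRENT_FIX_Y,                  # vertical position, px
  aoi         = raw$CURRENT_FIX_INTEREST_AREA_LABEL # assumed NA outside any AOI
)

## Rows are assumed to be in temporal order within participant and poster.
str(fix)
```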
Measurements¶
In this experiment, academic posters were investigated using seven measurements associated with four categories: cognitive load, efficiency, effectivity, and subjective ratings/opinions.
Signs of cognitive load were measured by average fixation duration and pupil size (Bojko, 2013, pp. 36, 96, 135; Holmqvist et al., 2011, p. 381ff). Targeting relevant information (effectivity) was measured as the percentage of participants who fixated all relevant parts of the posters (defined as areas of interest, AOIs) at least once (Bojko, 2013, p. 127ff). Efficiency was measured by counting the fixation steps to target (relevant messages predefined as AOIs) and the time to target (Bojko, 2013, p. 126f). Opinions and ratings were collected via a questionnaire.
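To illustrate how these definitions map onto fixation-level data, the following sketch derives the measures per participant and poster from the hypothetical `fix` data frame introduced above; this is not the study’s actual code, and it assumes temporal ordering and at least one AOI fixation per trial:

```r
## Sketch: derive the per-trial measures from fixation-level data.
library(dplyr)

measures <- fix |>
  group_by(poster, participant) |>
  summarise(
    avg_fix_dur  = mean(fix_dur),                        # cognitive load, ms
    avg_pupil    = mean(pupil),                          # cognitive load, a.u.
    steps2target = match(TRUE, !is.na(aoi)) - 1,         # fixations before first AOI hit
    time2target  = sum(fix_dur[seq_len(steps2target)]),  # ms before first AOI hit
    n_aois_seen  = n_distinct(aoi, na.rm = TRUE),        # distinct AOIs fixated
    .groups = "drop"
  )

## percOnAOI, read here as the average share of a poster's AOIs that were
## fixated; `n_aois_total` (columns poster, n_aois) is an assumed lookup table.
effectivity <- measures |>
  left_join(n_aois_total, by = "poster") |>
  group_by(poster) |>
  summarise(percOnAOI = 100 * mean(n_aois_seen / n_aois))
```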
Participants¶
Participants were four members of the MediaLab and nine participants of the university course “Introduction to Eye Tracking”, 13 in total, all academics. They were introduced to the experiment setting, completed a calibration run, and watched the stimuli with no task assigned.
Their experience with the #betterposter initiative and with posters and poster sessions at conferences differed greatly: 11 participants had never heard of the #betterposter design, 1 was unsure, and 1 (the author) was familiar with it; 7 had already created poster(s) themselves; 4 participants had never been to an academic conference, 7 had attended 1-5 conferences, and 2 had attended more than 5.
Stimuli¶
Eight academic posters were used as stimuli (Figure 2): two in the traditional poster layout, two in #betterposter version 1 design, two in text-only #betterposter version 1 design, and two in #betterposter version 2 design, all in landscape format. They stem from various fields of research.
Results¶
The averages of the measures of the 13 participants are displayed in tabular form (Table 1) and shown in overview as a parallel coordinates chart in Figure 3. The data in this chart are normalized, a procedure that allows comparing data of disparate magnitudes and units, as produced by eye tracking. The results back the assumption that the most recent version of the #betterposter design (posters 1 and 2, Figure 2) works best in comparison.
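The exact normalization procedure is not detailed here; a common choice for parallel coordinates, shown below as a sketch, is min-max scaling of each measure to [0, 1] (`table1` is an assumed data frame holding the per-poster averages of Table 1):

```r
## Sketch: min-max scale every measure to [0, 1] so that milliseconds,
## unitless pupil values, and Likert points can share one axis.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

table1_norm <- table1            # postID in column 1, numeric measures after
table1_norm[-1] <- lapply(table1[-1], rescale01)
```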
Cognitive Load¶
Larger pupil diameter and longer fixation duration are associated with higher cognitive load (Bojko, 2013, pp. 36, 96, 135; Galley et al., 2015; Holmqvist et al., 2011, p. 381ff). Low values on both measures for #betterposter version 2 indicate information that is easy to grasp and easy to process.
Regarding fixation duration, the #betterposter version 2 designs are ranked first and third (Table 1, Figure 3); regarding pupil size, they are ranked first and fourth (Table 1, Figure 3).
Efficiency¶
Efficiency, measured as steps2target (number of fixations) and time2target (in milliseconds) until the first fixation within any area of interest takes place, ranks the #betterposter version 2 designs first and fourth (the two measurements are highly correlated). Overall, five out of eight posters score quite low in this aspect (Table 1, Figure 3), indicating that at least one area of interest is looked at quite quickly (Figure 4 suggests this to be the title text).
Effectivity¶
Regarding effectivity, only the #betterposter version 2 designs attracted all participants to look at all relevant parts of the posters (Table 1, Figure 3). Although every participant looked at each poster undisturbed for 20 seconds, relevant parts of the other posters were not looked at even once (Figure 4).
Questionnaire¶
The questionnaire results likewise show high scores for the #betterposter version 2 designs: regarding understandability of the results they were ranked first and fourth, and regarding overall appearance first and third (Table 1, Figure 3).
Discussion¶
As Aga Bojko (Bojko, 2013, p. 123) reminds us, “interpretation depends on goals and stimuli”. Therefore, we have to mention that one (of the many) limitations of this pilot study[4] is associated with poor experiment design, e.g. not providing tasks (what to watch for) to the participants, something that significantly changes the way an image is looked at, as already pointed out by Alfred Yarbus (Yarbus, 1967, pp. 171 ff.). As the raincloud plots (Figure 5 and Figure 6) for average fixation duration and pupil size show, the measures of the 13 participants are widespread; this holds true for the other measures, including the questionnaire, as well. This is an additional hint that we may not generalize from this small sample to a larger population.
A non-exhaustive list of things to consider for the next study is given in the section Conclusion and Next Steps.
Regarding the treatment and interpretation of measures derived from the EyeLink 1000 Pro, some criticism follows:
The averages of fixation duration might not be very meaningful, as they are (almost all) situated within the normal range for reading (200-250 ms according to Bojko, 2013, p. 135; 200 ms for light fiction and 260 ms for texts on biology and physics according to Holmqvist et al., 2011, p. 382). But looking at the gaze signature (Figure 7), which depicts the number of fixations and their duration as timelines, we see spikes up to 500-700 ms (with much higher spikes for other posters). There is also evidence (Holmqvist et al., 2011, p. 383) that shorter fixation durations may occur under high stress levels; an additional measure, e.g. the NASA Task Load Index (TLX), needs to be applied to make the distinction.
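A gaze signature of this kind can be sketched from the hypothetical `fix` table used above; the participant and poster labels are placeholders:

```r
## Sketch: fixation durations in temporal order for one participant and
## poster, making spikes above the typical reading range visible.
library(ggplot2)

one_trial <- subset(fix, participant == "p01" & poster == "bpv2_p02")
one_trial$fix_index <- seq_len(nrow(one_trial))

ggplot(one_trial, aes(fix_index, fix_dur)) +
  geom_step() +
  geom_hline(yintercept = c(200, 250), linetype = "dashed") +  # reading range, ms
  labs(x = "Fixation number", y = "Fixation duration (ms)")
```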
Pupil size, a highly idiosyncratic measure (Bojko, 2013, p. 131; Holmqvist et al., 2011, p. 393), needs a special experiment set-up (scrambled pictures for brightness adjustment) and a different treatment of the resulting data (not merging all participants’ averages into one average, but calculating individual baselines, measuring differences, and building rankings).
Steps2target and time2target likewise show widespread data; it is recommended to measure these for all AOIs of a poster separately (not just for the one that is looked at first).
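As a sketch of that per-AOI measurement (again on the hypothetical `fix` table), the time to first fixation can be computed separately for every AOI:

```r
## Sketch: time to first fixation per AOI, not just for the AOI hit first.
library(dplyr)

time2first_per_aoi <- fix |>
  group_by(poster, participant) |>
  mutate(t_onset = cumsum(lag(fix_dur, default = 0))) |>  # ms elapsed before each fixation
  filter(!is.na(aoi)) |>
  group_by(poster, participant, aoi) |>
  summarise(time2first = min(t_onset), .groups = "drop")
```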
Effectivity is the only category for which this study provides a clear and unquestionable outcome: only the #betterposter version 2 designs attracted all participants to look at all areas of interest (Figure 4). This result is most probably caused by the radical de-cluttering of the version 2 posters, an effect that can be observed by comparing the heatmaps of all posters (Figure 8). But although the #betterposter version 2 designs are extremely reduced, there are still parts outside the AOIs that are looked at, as shown by the participants’ gaze signatures for poster n°2 (Figure 7). Effectivity measures shall include dwell time in future experiments.
The questionnaire data again show large individual differences, providing no basis for generalization.
Conclusion and Next Steps¶
This exploratory experimental pilot study shows promising ways to answer pressing questions regarding better research communication with academic posters, and it has detected and explicated many shortcomings in experiment design, analysis, and interpretation of results.
Subsequent experiments shall consider the following (non-exhaustive list):
- Stimuli shall be tailored so that the same topic, the same research, is presented in all poster design versions studied. This will help to rule out topic-related interests. The number of posters needs to be increased significantly.
- AOIs need to be defined based on careful consideration (the size of the AOIs influences the results; additional interviews with experts: what is necessary for the decision whether research presented on a poster is important and worth downloading and reading the paper?).
- Dwell time on AOIs needs to be studied (the current study only determined whether or not an AOI was looked at, but did not analyze how long participants looked at it).
- Participants need to represent experts (who have attended many conferences and created many posters), intermediates, and newcomers to the field. This will help to determine whether experience compensates for bad design and favors traditional design in general and/or tends to reject unfamiliar poster designs as inconvenient. In addition, an item in the questionnaire shall inquire about the attitude towards poster design. Perhaps an introductory note on the purpose of posters should precede the experiment? Shall participants be able to propose their own work as stimuli and prepare posters in all designs studied, or will this cause additional biases?
- Think about confounding factors (e.g. Holmqvist et al., 2011).
- The questionnaire should include questions already used in academic poster studies (Oronje, Morrison, Suharlim, et al., 2022) for comparability.
- The number of participants needs to be increased. 30 participants for every group (experts, intermediates, newcomers)? How to determine the necessary sample size is explained by Bojko (2013, p. 156 ff); see the power-analysis sketch after this list.
- Prevent participants from getting tired by watching too many posters in a single set. Allow for breaks and carry out fatigue assessments. Set up measures to identify stress-induced short fixation durations. Add subjective assessments of workload for comparison, e.g. the NASA Task Load Index (TLX) as proposed by Bojko (2013, p. 35).
- Include questions on self-reported mind-wandering in the questionnaire (Krasich et al., 2020).
- To simulate conference conditions, set up an accompanying study to analyze attention levels and declarative memory function.
- Compare screen-based eye tracking with ‘real world’ eye tracking (using glasses and a real or simulated ‘poster session’ environment).
- A task description (“what to watch”) needs to be added, simulating the intended behavior at conferences (scanning posters for trends, relevant topics, and research to build upon; exchanging ideas and knowledge).
- Perhaps some kind of memory test should be included?
- To measure pupil dilation correctly, each and every poster presented on screen needs to be preceded by a scrambled representation (of equal brightness) of the following picture to give the pupil time to adjust (on latency see Holmqvist et al., 2011, p. 434ff; for the procedure see Bojko, 2013, p. 131).
- Because of large individual variability and idiosyncrasies (Bojko, 2013, p. 131; Holmqvist et al., 2011, p. 393), pupil size in particular may not be calculated as an average of averages but needs to be assessed as a rank for each participant in a within-subjects design, where “each participant serves as her own baseline” (Bojko, 2013, p. 80).
- Calculate baselines for each measurement, as proposed for pupil size by Bojko (2013, p. 131) and Holmqvist et al. (2011, p. 393): “Since most eye tracking measures are highly idiosyncratic, the change between baseline value (looking at the scrambled picture for pupil diameter) and the value derived from looking at the stimulus itself, shall be used as measurement” (Bojko, 2013, p. 131). See the baseline-correction sketch after this list.
- Combine pupil size with measures of blink rate, blink duration, fixation duration, saccadic extent, fixation rate, and dwell time, as proposed by Holmqvist et al. (2011, p. 393), to better estimate the cognitive requirements of a task.
- Efficiency should be defined as “how fast are all AOIs visited” and/or the measure should be calculated for each AOI separately to obtain more meaningful data.
- Present heatmaps inverted: transform red areas (watched intensely) into clearly visible spots and make lesser watched to totally neglected areas invisible with a grey-to-black layer. This will clarify the results by showing only those parts that were looked at and recognized, while the rest disappears in darkness (see the inverted-heatmap sketch after this list).
- Data should be prepared in such a way that everything “good” has either a consistently high or a consistently low value (no mix; mixing leads to confusion, especially in visualizations).
- Add concurrent think-aloud verbal protocols (CVP) (caveat: adds extra cognitive workload!) or retrospective verbal protocols (RVP), as described by Bojko (2013, p. 108), for comparison and/or to gain additional insights into information processing. Review Holmqvist et al. (2011, p. 295) on triangulating eye-movement data with verbal data.
- Research whether studies have already shown that negative words and words indicating conflict and dissent (negative, no, none, not, in-, un-, dis-, etc.) and subclauses with but, although, except, etc. evoke higher attention (emotional arousal, alert?). The same for graphs and diagrams with crossing lines (including segmented bar charts), with and without alarming colors (red, orange, yellow).
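For the sample size item above, a rough power calculation can be sketched with base R’s power.t.test(); the effect size and standard deviation below are placeholders, not estimates from this pilot:

```r
## Sketch: required n for a paired (within-subjects) comparison of two
## designs; delta and sd are illustrative assumptions.
power.t.test(delta     = 15,    # assumed mean difference in fixation duration, ms
             sd        = 25,    # assumed SD of within-subject differences
             sig.level = 0.05,
             power     = 0.80,
             type      = "paired")
```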
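The baseline logic for pupil size described above could be sketched as follows, assuming a per-trial table `trials` with hypothetical columns participant, poster, pupil_stimulus, and pupil_baseline (the average pupil size while viewing the scrambled picture):

```r
## Sketch: express pupil size as the change from each participant's own
## baseline, then rank posters within participants instead of averaging
## raw values across people.
library(dplyr)

pupil_ranked <- trials |>
  mutate(pupil_change = pupil_stimulus - pupil_baseline) |>  # per-trial difference
  group_by(participant) |>
  mutate(rank_within_subject = rank(pupil_change)) |>        # 1 = smallest change
  ungroup()
```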
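The inverted-heatmap idea could be prototyped in ggplot2 roughly as shown below; png::readPNG, the file name, and the pixel coordinates are assumptions, and the smoothing is left to stat_density_2d’s defaults:

```r
## Sketch: a black layer whose opacity decreases where fixation density is
## high, so that only attended regions of the poster remain visible.
library(ggplot2)
library(png)

img  <- readPNG("poster_bpv2_p02.png")      # assumed poster screenshot
gaze <- subset(fix, poster == "bpv2_p02")   # fixations with x/y in px

ggplot(gaze, aes(x, y)) +
  annotation_raster(img, -Inf, Inf, -Inf, Inf) +
  stat_density_2d(geom = "raster", contour = FALSE,
                  aes(alpha = 1 - after_stat(ndensity)),  # dark where unattended
                  fill = "black") +
  scale_alpha_identity() +
  scale_y_reverse() +   # eye tracker origin is the top-left corner
  theme_void()
```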
Figures & Tables¶
Table 1: Measurements
| postID | avgFixDur (ms) | avgPupilSize (a.u.) | steps2target (fixations) | time2target (ms) | percOnAOI (%) | quest_results (1-5) | quest_appear (1-5) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bpv1_p03 | 233.63 | 399.84 | 5.31 | 1095.00 | 69.25 | 2.92 | 2.77 |
| bpv1_p05 | 232.86 | 425.64 | 0.31 | 32.08 | 96.00 | 1.92 | 2.08 |
| bpv2_p01 | 217.25 | 369.19 | 0.62 | 85.62 | 100.00 | 3.54 | 2.85 |
| bpv2_p02 | 208.64 | 384.18 | 0.31 | 12.08 | 100.00 | 4.46 | 3.69 |
| bv1t_p06 | 213.91 | 414.33 | 0.46 | 59.15 | 92.50 | 4.23 | 2.85 |
| bv1t_p07 | 253.60 | 407.40 | 0.69 | 92.15 | 66.67 | 3.69 | 2.77 |
| trad_p04 | 231.48 | 376.38 | 2.54 | 495.15 | 74.33 | 1.85 | 2.38 |
| trad_p08 | 225.06 | 380.21 | 2.15 | 359.62 | 88.50 | 2.77 | 2.38 |
The table of measurements shows that the differences are sometimes quite small and of questionable significance. Note: Average fixation duration (avgFixDur) is the average of all fixation timespans in milliseconds (not to be confused with dwell time!). The EyeLink 1000 Pro provides the average pupil size (avgPupilSize) as a unitless value. Steps2target is derived by counting the number of fixations until the first AOI (any) is reached; time2target is the average of all participants’ summed-up fixation durations until an AOI is reached, in milliseconds. Percentage on AOI (percOnAOI) denotes the percentage of participants who looked at each AOI of the poster (100% = all relevant parts were seen). quest_results is an assessment (given in a separate questionnaire) of the understandability of the results on a Likert scale (1 = difficult, 5 = very easy); quest_appear rates the overall appearance of the poster (how much the participants liked it) on a Likert scale (1 = dislike, 5 = like very much).
Raincloud Plot Depicting the Average Fixation Duration of Participants as Dots, Centrality Measures, and Density Distributions. Note: The mean value is marked with an “x”. The participants’ values are spread out (except those of poster N°2), and there is an outlier present that perhaps should have been removed.
Author’s Note¶
[1] “In a concurrent verbal protocol (CVP, also known as the ‘think-aloud protocol’), participants articulate their thoughts in real time during the execution of a task.” (Bojko, 2013, p. 106ff)
[2] Cognitive load can be assessed by performance measures (e.g. time spent on task and scores achieved), subjective ratings of task difficulty, and psychophysical tools such as eye tracking, with longer fixation duration and increased pupil dilation as signs of higher cognitive load (Katona, 2022; Holmqvist et al., 2011, p. 393ff).
[3] In within-subjects study designs all participants are exposed to all tested stimuli (Bojko, 2013, p. 79).
[4] For a non-exhaustive list of shortcomings of this study and necessary improvements for subsequent studies see the sections “Discussion” and “Conclusion and Next Steps”.
- Morrison, M., Merlo, K., & Woessner, Z. (2020). How to Boost the Impact of Scientific Conferences. Cell, 182(5), 1067–1071. 10.1016/j.cell.2020.07.029
- Oronje, B., Morrison, M., Suharlim, C., Folkman, K., Glaude-Hosh, A., & Jeisy-Scott, V. (2022). A step in the right direction: Billboard-style posters preferred overall at two conferences, but should include more methods and limitations. 10.32388/p7n5bo
- Rowe, N., & Ilic, D. (2015). Rethinking poster presentations at large‐scale scientific meetings – is it time for the format to evolve? The FEBS Journal, 282(19), 3661–3668. 10.1111/febs.13383
- Ilic, D., & Rowe, N. (2013). What is the evidence that poster presentations are effective in promoting knowledge transfer? A state of the art review. Health Information & Libraries Journal, 30(1), 4–12. 10.1111/hir.12015
- Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review, 106(4), 643–675. 10.1037/0033-295x.106.4.643