Introduction
For no particular reason, I decided to get into photography in the fall of 2022. Since then, I've taken over 20,000 pictures and videos across nearly 3 years. Recently, I realized that I could access the EXIF data (metadata created by the camera when each picture or video is taken) and use my Data Science experience to see what I could learn from it.
As a Canon camera enthusiast, I use Canon's lenses with my EOS R8, which uses the Canon RF lens mount. The lens I used most over the past several years was the RF 24-105 F4L. It has great sharpness and let me get both wide-angle and zoomed-in pictures in a wide variety of circumstances. In September 2024, Canon released the RF 28-70 F2.8. Although it has a smaller zoom range, this lens performs better in low-light situations and gives more subject separation, which means the background blurs out more, isolating the subject. Both are desirable aspects of photography.
Because I found myself shooting video in low-light situations, wanted to transition more into portrait photography, and am a budget-limited college student, trading out the RF 24-105 F4 for the RF 28-70 F2.8 seemed like an appealing idea. I would get one more stop of light (f/4 → f/2.8) at the cost of the 24-27mm and 71-105mm ranges. I realized this data could help me answer one question:
Was it in my best interest to trade out my 24-105 F4L for the 28-70 F2.8?
After my analysis, I sold my 24-105 and purchased a 28-70. I found that the trade-off wasn't worth it, but not for the reasons I initially thought. Let's take a look at the data. I'll show you my reasoning, explain my situation, and walk you through my experience with the lenses.
Extraction
Data was extracted using a PowerShell script that calls ExifTool, a free tool for extracting image metadata. I initially tried running ExifTool over the whole library without a wrapper script, but it would stall randomly, apparently because it hit a memory cap while parsing the EXIF data. So I used a PowerShell script to run ExifTool on each subdirectory, writing the results to a CSV before moving on. The workaround was successful. Here's the PowerShell script:
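A minimal sketch of that per-subdirectory approach might look like the following; the paths, output layout, and tag list here are placeholder assumptions, not the original script:

```powershell
# Sketch: run ExifTool per subdirectory so it never has to hold the
# whole library's metadata in memory at once.
# $root, $outDir, and the tag list are assumptions for illustration.
$root   = "D:\Photos"
$outDir = "D:\exif_out"
New-Item -ItemType Directory -Force -Path $outDir | Out-Null

Get-ChildItem -Path $root -Directory -Recurse | ForEach-Object {
    $csv = Join-Path $outDir ($_.Name + ".csv")
    # -csv makes ExifTool emit one CSV row per file in the directory
    exiftool -csv -Model -LensModel -FocalLength -ISO -Aperture `
        -ShutterSpeed -FileSize -FileType $_.FullName > $csv
}
```

Each subdirectory becomes its own CSV, so a stall in one folder can't take down the whole extraction, and the per-folder CSVs can be concatenated afterwards.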
With data extracted, I was ready to investigate.
Wrangling
I decided to use R for this project. There were several goals with wrangling the data:
- Isolate the shots to just my R8. I had previously shot with the M50 Mark II, so those files were intermixed.
- Clean up information problems related to manual lenses. Manual lenses are vintage lenses that don't have electronics to communicate to the camera. This can introduce non-numeric or unexpected inputs where there otherwise wouldn't be any.
- Standardize certain parameters, like converting file sizes to just MB instead of a mix of kB and MB.
- Remove 'garbage' shots, for example, pictures with ISO past 25600, which would only be taken for testing purposes.
- Reformat alphanumeric fields to just numeric to make them easier to work with.
Here's the R code for that:
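A condensed sketch of those steps, assuming a dplyr/readr workflow and column names like the ones ExifTool emits; the column names and thresholds are assumptions, not the original code:

```r
library(dplyr)
library(readr)
library(stringr)

shots <- read_csv("exif_data.csv") |>
  # Isolate shots to the R8 (drop the older M50 Mark II files)
  filter(Model == "Canon EOS R8") |>
  # Manual lenses report no electronic data, which leaves non-numeric
  # junk in some fields; coerce those to numeric (NA on failure)
  mutate(
    FocalLength = parse_number(FocalLength),                # "50.0 mm" -> 50
    Aperture    = suppressWarnings(as.numeric(Aperture)),
    ISO         = suppressWarnings(as.numeric(ISO))
  ) |>
  # Standardize file size to MB whether reported in kB or MB
  mutate(
    FileSizeMB = case_when(
      str_detect(FileSize, "kB") ~ parse_number(FileSize) / 1024,
      str_detect(FileSize, "MB") ~ parse_number(FileSize),
      TRUE ~ NA_real_
    )
  ) |>
  # Remove 'garbage' test shots at extreme ISO
  filter(is.na(ISO) | ISO <= 25600)
```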
This next chunk filters out videos and large file sizes. It's helpful for one chart we'll see shortly, but I didn't apply it when producing the other graphs. In hindsight, I'm not sure why I filtered out videos at all: I wanted to focus on video as much as, if not more than, photos, so dropping videos could have skewed my findings. Comparing video plus pictures against pictures only, the charts looked nearly identical, so in the end it didn't make a difference, but it had the potential to.
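A sketch of what that photo-only filter might look like; the file-type values and the size cutoff are assumptions, not the original chunk:

```r
library(dplyr)

# Keep only still pictures: drop video files and unusually large files
pics <- shots |>
  filter(!FileType %in% c("MP4", "MOV"),
         FileSizeMB < 100)
```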
Let's see how many pictures and videos we took:
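The count itself is presumably just a row count on the cleaned data frame (the name `shots` is an assumption):

```r
# Count every remaining picture and video after cleaning
nrow(shots)
```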
[1] 23126
That's a lot of data. With the data cleaned and formatted, we can now create some visuals.
Analysis
Data was graphed with ggplot2, then converted to Plotly using ggplotly() so the charts could be exported in a standalone format.
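The export pipeline might look like this sketch, assuming htmlwidgets for the standalone HTML step; the plot and column names are placeholders:

```r
library(ggplot2)
library(plotly)
library(htmlwidgets)

# Build a ggplot, convert it to an interactive Plotly widget,
# and save it as a self-contained HTML file
p <- ggplot(shots, aes(x = FileSizeMB)) +
  geom_histogram(bins = 60) +
  labs(x = "File size (MB)", y = "Count")

saveWidget(ggplotly(p), "file_sizes.html", selfcontained = TRUE)
```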
File Sizes
Let's take a look at file sizes:
The massive spike we see on the left is all the pictures, and everything to the right is essentially videos, except for the one or two 90MB RAW picture files I found (I still don't know why they took up that much data). This is where that earlier code chunk came in handy. We can use it to look at just the file sizes of the pictures:
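With the photo-only filter applied, the picture-only distribution might be plotted like this; `pics` is the filtered subset from the earlier chunk, and the names are assumptions:

```r
library(ggplot2)

# Histogram of picture-only file sizes, with videos and the
# oversized RAW outliers already filtered out
ggplot(pics, aes(x = FileSizeMB)) +
  geom_histogram(binwidth = 2) +
  labs(x = "File size (MB)", y = "Count")
```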
What the Data Doesn't Tell You
There were several assumptions I had made when choosing the 28-70. I had assumed the lens would be just as sharp. In my testing, it seemed it wasn't. Various tests online showed that the 28-70 F2.8 had inconsistencies in manufacturing quality that made some copies less sharp than others. Although I didn't have access to high-quality testing tools, or other copies for comparison, I had a gut feeling that my lens wasn't as sharp as it could be, or as online tests had shown.
There were also problems with the size and the zoom range. The extra stop at f/2.8 turned out not to be that big of a difference. Bokeh quality mattered more to me than subject separation, and the 28-70's bokeh was 'busy,' which meant both 'lively' and 'distracting.' Finally, I had removed the bigger file sizes to focus on just pictures, but the 24-105 was what I had used primarily for videos, so that choice skewed my analysis.