Time Restricted Eating ExperimenTS API

Process collected data from the myCircadianClock app.

Utils

These functions primarily serve as building blocks for other functions, but are provided here for standalone use.


source

file_loader

 file_loader (data_source:Union[str,pandas.core.frame.DataFrame])

Flexible file loader that reads a single file path or a folder path. Accepts the .csv and .json file formats.

Type Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame.
Existing dataframes are read as is.
Returns pd.DataFrame A single dataframe consisting of all data matching the provided file or folder path.

Providing the file loader with a specific file path outputs a single Pandas dataframe generated from that data source.

file_loader("data/col_test_data/toy_data_2000.csv").head(2)
original_logtime desc_text food_type PID
0 2021-05-12 02:30:00 +0000 milk b yrt1999
1 2021-05-12 02:45:00 +0000 some medication m yrt1999

The file loader can also accept string patterns to read in multiple files at once. Providing a patterned path such as yrt*_food_data*.csv would load all data matching this pattern.

file_loader('data/col_test_data/yrt*_food_data*.csv').head(2)
original_logtime desc_text food_type PID
0 2021-05-12 02:30:00 +0000 Milk b yrt1999
1 2021-05-12 02:45:00 +0000 Some Medication m yrt1999

It can also handle reading mixed file types. The below dataframe consists of data read from all .json and .csv files in the data/output/ folder.

file_loader('data/output/*').head(2)
ID unique_code research_info_id desc_text food_type original_logtime date local_time time week_from_start year cleaned day_count
0 7572733.0 alqt14018795225 150.0 Water w 2017-12-08 17:30:00+00:00 2017-12-08 17.500000 17:30:00 1.0 2017.0 NaN NaN
1 411111.0 alqt14018795225 150.0 Coffee White b 2017-12-09 00:01:00+00:00 2017-12-08 24.016667 00:01:00 1.0 2017.0 NaN NaN

source

find_date

 find_date (data_source:Union[str,pandas.core.frame.DataFrame], h:int=4,
            date_col:int=5)

Extracts the date from a datetime column after shifting the datetime by ‘h’ hours. A day starts ‘h’ hours earlier if ‘h’ is negative, or ‘h’ hours later if ‘h’ is positive.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
h int 4 Number of hours to shift the definition for ‘date’ by. h = 4 would shift days so that time membership
to each date starts at 4:00 AM and ends at 3:59:59 AM the next calendar day.
date_col int 5 Column number for existing datetime column in provided data source. Data exported from mCC typically
has datetime as its 5th column (with indexing starting from 0).
Returns pd.Series Series of dates in ISO 8601 format.

By default, find_date expects log dates for studies to begin at 4:00 AM. To use regular calendar dates, remember to set h = 0.

df = file_loader('data/test_food_details.csv')
df['original_logtime'] = pd.to_datetime(df['original_logtime'])
df['date'] = find_date(df, h = 0)
df[['original_logtime', 'date']].head(3)
original_logtime date
0 2017-12-08 17:30:00+00:00 2017-12-08
1 2017-12-09 00:01:00+00:00 2017-12-09
2 2017-12-09 00:58:00+00:00 2017-12-09

In this example, with log dates starting at the default value of 4 (4:00 AM), we see that two logs from very early morning on 2017-12-09 are counted as being logged on 2017-12-08 instead.

df['date'] = find_date(df, h = 4)
df[['original_logtime', 'date']].head(3)
original_logtime date
0 2017-12-08 17:30:00+00:00 2017-12-08
1 2017-12-09 00:01:00+00:00 2017-12-08
2 2017-12-09 00:58:00+00:00 2017-12-08

Conversely, when log days start four hours earlier (h = -4), the dates of the last two rows are shifted so that their log date is one day later than their calendar date.

df['date'] = find_date(df, h = -4)
df[['original_logtime', 'date']].head(5)
original_logtime date
0 2017-12-08 17:30:00+00:00 2017-12-08
1 2017-12-09 00:01:00+00:00 2017-12-09
2 2017-12-09 00:58:00+00:00 2017-12-09
3 2018-02-22 21:52:00+00:00 2018-02-23
4 2018-02-22 22:53:00+00:00 2018-02-23
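The date shift above can be reproduced with plain pandas. This is a minimal sketch of the arithmetic only, not the library's implementation: subtract h hours from each timestamp, then take the calendar date of the shifted value.

```python
import pandas as pd

# Sketch of the date-shift arithmetic (illustrative, not the library code):
# subtract h hours, then take the calendar date of the shifted timestamp.
ts = pd.to_datetime(pd.Series(["2017-12-09 00:01:00", "2018-02-22 21:52:00"]))

shifted_pos = (ts - pd.Timedelta(hours=4)).dt.date   # day starts at 4:00 AM
shifted_neg = (ts - pd.Timedelta(hours=-4)).dt.date  # day starts at 8:00 PM the prior day

print(shifted_pos.tolist())  # [datetime.date(2017, 12, 8), datetime.date(2018, 2, 22)]
print(shifted_neg.tolist())  # [datetime.date(2017, 12, 9), datetime.date(2018, 2, 23)]
```

A single subtraction handles both signs of h, which is why positive h pulls early-morning logs back one date while negative h pushes late-evening logs forward one date.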

source

find_float_time

 find_float_time (data_source:Union[str,pandas.core.frame.DataFrame],
                  h:int=4, date_col:int=5)

Extracts the time from a datetime column after shifting the datetime by ‘h’ hours. A day starts ‘h’ hours earlier if ‘h’ is negative, or ‘h’ hours later if ‘h’ is positive.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
h int 4 Number of hours to shift the definition for ‘time’ by. h = 4 would allow float representations of time
between 4 (inclusive) and 28 (exclusive), representing time that goes from 4:00 AM to 3:59:59 AM the next
calendar day. NOTE: h value for this function should match the h value used for generating dates.
date_col int 5 Column number for existing datetime column in provided data source. Data exported from mCC typically
has datetime as its 5th column (with indexing starting from 0).
Returns pd.Series Series of times in float format (e.g. 4:36 AM -> 4.6).
df = file_loader('data/test_food_details.csv')
df['original_logtime'] = pd.to_datetime(df['original_logtime'])

By default, find_float_time expects studies to begin at 4:00 AM. To preserve regular calendar dates use h = 0.

df['float_time'] = find_float_time(df, h = 0)
df[['original_logtime', 'float_time']].head(3)
original_logtime float_time
0 2017-12-08 17:30:00+00:00 17.500000
1 2017-12-09 00:01:00+00:00 0.016667
2 2017-12-09 00:58:00+00:00 0.966667

Using the same positive h value for both the date and float-time functions changes a row’s date ownership based on its original logtime. Float time should be shifted by the same h value as date membership so that times belonging to a different calendar date remain distinguishable when rows are grouped under the same logging date (e.g. 2:00 AM –> 2.0, whereas 2:00 AM on the next calendar day –> 26.0).

df['float_time'] = find_float_time(df, h = 4)
df['date'] = find_date(df, h = 4)
df[['original_logtime','date', 'float_time']].head(3)
original_logtime date float_time
0 2017-12-08 17:30:00+00:00 2017-12-08 17.500000
1 2017-12-09 00:01:00+00:00 2017-12-08 24.016667
2 2017-12-09 00:58:00+00:00 2017-12-08 24.966667
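The shifted float-time values above can be sketched in plain pandas. This is an illustration of the arithmetic only, not the library's implementation: times earlier than h o'clock roll forward by 24 hours so they stay grouped with the previous log date.

```python
import pandas as pd

# Sketch of the shifted float-time arithmetic (illustrative, not the library
# code): times earlier than h o'clock gain 24 hours so they remain attached
# to the previous log date (e.g. 00:01 -> 24.016...).
h = 4
ts = pd.to_datetime(pd.Series(["2017-12-08 17:30:00", "2017-12-09 00:01:00"]))
float_time = ts.dt.hour + ts.dt.minute / 60 + ts.dt.second / 3600
float_time = float_time.where(float_time >= h, float_time + 24)
print(float_time.round(6).tolist())  # [17.5, 24.016667]
```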

In rare cases, it may be valuable to shift date and time by negative values. In this example, where a log day starts at 8:00 PM on the previous calendar day and ends at 8:00 PM on the current calendar day, note that the last two rows have negative float times and their date membership is shifted one day later than their calendar date.

df['float_time'] = find_float_time(df, h = -4)
df['date'] = find_date(df, h = -4)
df[['original_logtime','date', 'float_time']].head(5)
original_logtime date float_time
0 2017-12-08 17:30:00+00:00 2017-12-08 17.500000
1 2017-12-09 00:01:00+00:00 2017-12-09 0.016667
2 2017-12-09 00:58:00+00:00 2017-12-09 0.966667
3 2018-02-22 21:52:00+00:00 2018-02-23 -2.133333
4 2018-02-22 22:53:00+00:00 2018-02-23 -1.116667

source

week_from_start

 week_from_start (data_source:Union[str,pandas.core.frame.DataFrame],
                  identifier:int=1)

Calculates the number of weeks between each logging entry and the first logging entry for each participant. A ‘date’ column must exist in the provided data source. Using the provided find_date function is recommended.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
Returns np.array Array of weeks elapsed between each log date and the participant’s earliest log date.
df = file_loader('data/test_food_details.csv')
df['original_logtime'] = pd.to_datetime(df['original_logtime'])

A column labeled ‘date’ is required by this function; using find_date to generate it is recommended.

df['date'] = find_date(df)
df['week_from_start'] = week_from_start(df)
df[['unique_code','original_logtime','week_from_start']][2:4]
unique_code original_logtime week_from_start
2 alqt14018795225 2017-12-09 00:58:00+00:00 1
3 alqt14018795225 2018-02-22 21:52:00+00:00 11
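The week computation can be sketched in plain pandas on toy data (illustrative, not the library's implementation): take the days elapsed since each participant's earliest log date, floor-divide by 7, and add one so that the first week is week 1.

```python
import pandas as pd

# Sketch of the week-from-start computation (illustrative): days elapsed
# since each participant's first log date, floor-divided by 7, plus one.
df = pd.DataFrame({
    "unique_code": ["a", "a", "a"],
    "date": pd.to_datetime(["2017-12-08", "2017-12-09", "2018-02-22"]),
})
first = df.groupby("unique_code")["date"].transform("min")
df["week_from_start"] = ((df["date"] - first).dt.days // 7) + 1
print(df["week_from_start"].tolist())  # [1, 1, 11]
```

Using a groupby transform keeps the result aligned with the original rows, so mixed-participant data is handled in one pass.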

source

find_phase_duration

 find_phase_duration (df:pandas.core.frame.DataFrame)

Calculates the duration (in days) of the study phase for each row.

Type Details
df pd.DataFrame Participant information dataframe with columns for start and ending date for that row’s study phase.
The expected column numbers for starting and ending dates are outlined in the HOWTO document that accompanies TREETS.
Returns pd.DataFrame Dataframe with an additional column describing study phase duration.
find_phase_duration(pd.read_excel('data/col_test_data/toy_data_17May2021.xlsx'))[['phase_duration']]
phase_duration
0 3 days
1 4 days
2 3 days
3 4 days
4 NaT
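The duration itself is a simple datetime subtraction. The sketch below is illustrative only; the column names `phase_start` and `phase_end` are hypothetical stand-ins for the start and end date columns described in the HOWTO document.

```python
import pandas as pd

# Sketch of the phase-duration computation (illustrative; `phase_start` and
# `phase_end` are hypothetical column names). Missing dates yield NaT, as in
# the last row of the example output above.
info = pd.DataFrame({
    "phase_start": pd.to_datetime(["2021-05-10", "2021-05-13", None]),
    "phase_end":   pd.to_datetime(["2021-05-13", "2021-05-17", "2021-05-20"]),
})
info["phase_duration"] = info["phase_end"] - info["phase_start"]
print(info["phase_duration"].tolist())  # 3 days, 4 days, NaT
```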

source

load_food_data

 load_food_data (data_source:Union[str,pandas.core.frame.DataFrame],
                 h:int, identifier:int=1, datetime_col:int=5)

Loads and processes existing logging data, adding specific datetime information in formats more suitable for TREETS functions.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
h int Number of hours to shift the definition of ‘date’ by. h = 4 would indicate that a log date begins at
4:00 AM and ends the following calendar day at 3:59:59 AM. Float representations of time would therefore
go from 4.0 (inclusive) to 28.0 (exclusive) to represent ‘date’ membership for days shifted from their
original calendar date.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
datetime_col int 5 Column number for an existing datetime column in provided data source. Data exported from mCC typically
has datetime as its 5th column (with indexing starting from 0).
Returns pd.DataFrame Dataframe with additional date, float time, and week from start columns.
load_food_data('data/test_food_details.csv', h = 4).head(2)
ID unique_code research_info_id desc_text food_type original_logtime date float_time time week_from_start year
0 7572733 alqt14018795225 150 Water w 2017-12-08 17:30:00+00:00 2017-12-08 17.500000 17:30:00 1 2017
1 411111 alqt14018795225 150 Coffee White b 2017-12-09 00:01:00+00:00 2017-12-08 24.016667 00:01:00 1 2017

source

in_good_logging_day

 in_good_logging_day (data_source:Union[str,pandas.core.frame.DataFrame],
                      min_log_num:int=2, min_separation:int=5,
                      identifier:int=1, date_col:int=6, time_col:int=7)

Calculates if each log is considered to be within a ‘good logging day’. A log day is considered ‘good’ if there are at least the minimum number of required logs, with a minimum specified hour separation between the first and last log for that log date. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 5 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns np.array Boolean array describing whether each log is a ‘good’ logging day.
df = load_food_data('data/test_food_details.csv', h = 4)
df['in_good_logging_day'] = in_good_logging_day(df)
df.head(2)
ID unique_code research_info_id desc_text food_type original_logtime date float_time time week_from_start year in_good_logging_day
0 7572733 alqt14018795225 150 Water w 2017-12-08 17:30:00+00:00 2017-12-08 17.500000 17:30:00 1 2017 True
1 411111 alqt14018795225 150 Coffee White b 2017-12-09 00:01:00+00:00 2017-12-08 24.016667 00:01:00 1 2017 True
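The 'good logging day' rule can be sketched in plain pandas on toy data (illustrative, not the library's implementation): a (participant, date) group qualifies if it has at least min_log_num logs and its first and last logs are at least min_separation hours apart.

```python
import pandas as pd

# Sketch of the 'good logging day' rule (illustrative, not the library code).
min_log_num, min_separation = 2, 5

df = pd.DataFrame({
    "unique_code": ["a", "a", "a", "b"],
    "date": ["2017-12-08"] * 4,
    "float_time": [17.5, 24.02, 24.97, 9.0],
})
grouped = df.groupby(["unique_code", "date"])["float_time"]
good = (grouped.transform("count") >= min_log_num) & \
       (grouped.transform("max") - grouped.transform("min") >= min_separation)
print(good.tolist())  # [True, True, True, False]: participant b has only one log
```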

source

FoodParser

 FoodParser ()

FoodParser takes unprocessed food log entries and annotates them using a pre-made dictionary: it matches raw terms to their most likely known items and adds food type and other identifying information.


source

clean_loggings

 clean_loggings (data_source:Union[str,pandas.core.frame.DataFrame],
                 identifier:int=1)

Cleans and attempts typo correction for all logging text entries.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
Returns pd.DataFrame Dataframe with an additional column containing cleaned and typo corrected item entries.

Text descriptions of food items are cleaned using a built-in dictionary of common typos and corrections for each phrase. Phrases are then matched using a dictionary of known n-gram item names. The resulting item(s) are provided as a list.

clean_loggings('data/output/public.json').head(3)
unique_code desc_text cleaned
0 alqt14018795225 Water [water]
1 alqt14018795225 Coffee White [coffee, white]
2 alqt14018795225 Salad [salad]

source

get_types

 get_types (data_source:Union[str,pandas.core.frame.DataFrame],
            food_type:Union[str,list])

Filters the data, keeping only logs of the specified type(s).

Type Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame. Folder paths
with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. A column ‘food_type’ is required to be within the data.
food_type str | list A single food type, or list of food types. Valid types are ‘f’: food, ‘b’: beverage, ‘w’: water,
and ‘m’: medication.
Returns pd.DataFrame Dataframe filtered for only logs of specific type(s).

Type selection accepts multiple types at once as a list of entry types. All types chosen must be valid.

Available food types include:

  1. ‘f’: Food

  2. ‘b’: Beverage

  3. ‘w’: Water

  4. ‘m’: Medication

Flavored water beverages such as La Croix are counted as ‘water’ and not as ‘beverage’.

get_types('data/output/baseline.json',['w', 'f'])[['unique_code','desc_text','food_type']].head(3)
unique_code desc_text food_type
0 alqt14018795225 Water w
2 alqt14018795225 Salad f
3 alqt78896444285 Water w

Filtering for a single type is also possible.

df = load_food_data('data/test_food_details.csv', h = 4)
get_types(df, 'm')[['unique_code','desc_text','food_type']].head(3)
unique_code desc_text food_type
323 alqt14018795225 Caffeine m
361 alqt14018795225 Caffeine m
420 alqt14018795225 Caffeine m

source

count_caloric_entries

 count_caloric_entries (df:pandas.core.frame.DataFrame)

Counts the number of food (‘f’) and beverage (‘b’) loggings.

Type Details
df pd.DataFrame Dataframe of food logging data.
Returns int Number of caloric (food or beverage) entries found.
df = load_food_data('data/test_food_details.csv', h = 4)
count_caloric_entries(df)
4603
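The underlying count is a simple type filter. The sketch below is illustrative, not the library's implementation: only ‘f’ (food) and ‘b’ (beverage) logs are caloric, while water and medication are excluded.

```python
import pandas as pd

# Sketch of the caloric-entry count (illustrative): only 'f' and 'b' logs
# count as caloric; 'w' (water) and 'm' (medication) are excluded.
df = pd.DataFrame({"food_type": ["w", "b", "f", "m", "f"]})
n_caloric = int(df["food_type"].isin(["f", "b"]).sum())
print(n_caloric)  # 3
```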

source

mean_daily_eating_duration

 mean_daily_eating_duration (df:pandas.core.frame.DataFrame,
                             date_col:int=6, time_col:int=7)

Calculates mean daily eating window by taking the average of each day’s eating window. An eating window is defined as the duration of time between first and last caloric (food or beverage) intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of average daily eating window duration.
df = load_food_data('data/test_food_details.csv', h = 4)
mean_daily_eating_duration(df)
14.038679245283017
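The statistic can be sketched in plain pandas on toy data (illustrative, not the library's implementation): per date, the eating window is the span between the first and last caloric (‘f’ or ‘b’) log in float hours, and the statistic is the mean of those spans over days.

```python
import pandas as pd

# Sketch of the mean daily eating window (illustrative, not the library code):
# non-caloric logs ('w', 'm') are excluded before taking each day's span.
df = pd.DataFrame({
    "date": ["d1", "d1", "d1", "d2", "d2"],
    "food_type": ["b", "w", "f", "f", "f"],
    "float_time": [8.0, 12.0, 20.0, 9.0, 19.0],
})
caloric = df[df["food_type"].isin(["f", "b"])]
windows = caloric.groupby("date")["float_time"].agg(lambda t: t.max() - t.min())
print(windows.mean())  # (12.0 + 10.0) / 2 = 11.0
```

The companion std_daily_eating_duration statistic is the standard deviation of the same per-day spans.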

source

std_daily_eating_duration

 std_daily_eating_duration (df:pandas.core.frame.DataFrame,
                            date_col:int=6, time_col:int=7)

Calculates the standard deviation of the daily eating window. An eating window is defined as the duration of time between first and last caloric (food or beverage) intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the standard deviation of daily eating window duration.
df = load_food_data('data/test_food_details.csv', h = 4)
std_daily_eating_duration(df)
7.018679942775867

source

earliest_entry

 earliest_entry (df:pandas.core.frame.DataFrame, time_col:int=7)

Calculates the earliest recorded caloric (food or beverage) entry. It is recommended that you use find_float_time to generate the necessary time column for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the earliest logtime on any date.
df = load_food_data('data/test_food_details.csv', h = 4)
earliest_entry(df)
4.0

source

mean_first_cal

 mean_first_cal (df:pandas.core.frame.DataFrame, date_col:int=6,
                 time_col:int=7)

Calculates the average time of first caloric intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of average first caloric entry time.
df = load_food_data('data/test_food_details.csv', h = 4)
mean_first_cal(df)
9.22680817610063
# find the average mean first cal time for each participant
df.groupby(['unique_code']).agg(mean_first_cal, date_col = 6, time_col = 7).iloc[:,0]
unique_code
alqt1148284857      7.315278
alqt14018795225     7.635938
alqt16675467779     6.153904
alqt21525720972    13.211957
alqt45631586569    15.056295
alqt5833085442     12.551515
alqt62359040167     7.252137
alqt6695047873      7.573077
alqt78896444285     6.347510
alqt8668165687      9.702555
Name: ID, dtype: float64

source

std_first_cal

 std_first_cal (df:pandas.core.frame.DataFrame, date_col:int=6,
                time_col:int=7)

Calculates the standard deviation for time of first caloric intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the standard deviation of first caloric entry time.
df = load_food_data('data/test_food_details.csv', h = 4)
std_first_cal(df)
4.591417471559444

source

mean_last_cal

 mean_last_cal (df:pandas.core.frame.DataFrame, date_col:int=6,
                time_col:int=7)

Calculates the average time of last caloric intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of average last caloric entry time.
df = load_food_data('data/test_food_details.csv', h = 4)
mean_last_cal(df)
23.265487421383646

source

std_last_cal

 std_last_cal (df:pandas.core.frame.DataFrame, date_col:int=6,
               time_col:int=7)

Calculates the standard deviation for time of last caloric intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the standard deviation of last caloric entry time.
df = load_food_data('data/test_food_details.csv', h = 4)
std_last_cal(df, 'date', 'float_time')
4.359435007580498

source

mean_daily_eating_occasions

 mean_daily_eating_occasions (df:pandas.core.frame.DataFrame,
                              date_col:int=6, time_col:int=7)

Calculates the average number of daily eating occasions. An eating occasion is a single caloric (food or beverage) log. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Average number of daily eating occasions.
df = load_food_data('data/test_food_details.csv', h = 4)
mean_daily_eating_occasions(df, 'date', 'float_time')
6.8915094339622645

source

std_daily_eating_occasions

 std_daily_eating_occasions (df:pandas.core.frame.DataFrame,
                             date_col:int=6, time_col:int=7)

Calculates the standard deviation of the number of daily eating occasions. An eating occasion is a single caloric (food or beverage) log. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Standard deviation of the number of daily eating occasions.
df = load_food_data('data/test_food_details.csv', h = 4)
std_daily_eating_occasions(df, 'date', 'float_time')
4.44839423402741

source

mean_daily_eating_midpoint

 mean_daily_eating_midpoint (df:pandas.core.frame.DataFrame,
                             date_col:int=6, time_col:int=7)

Calculates the average daily midpoint eating occasion time. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the average daily midpoint eating occasion time.
df = load_food_data('data/test_food_details.csv', h = 4)
mean_daily_eating_midpoint(df, 'date', 'float_time')
16.536425576519914

source

std_daily_eating_midpoint

 std_daily_eating_midpoint (df:pandas.core.frame.DataFrame,
                            date_col:int=6, time_col:int=7)

Calculates the standard deviation of the daily midpoint eating occasion time. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the standard deviation of the daily midpoint eating occasion time.
df = load_food_data('data/test_food_details.csv', h = 4)
std_daily_eating_midpoint(df, 'date', 'float_time')
4.107072970435106

source

logging_day_counts

 logging_day_counts (df:pandas.core.frame.DataFrame)

Calculates the number of days that contain any logs. It is recommended that you use find_date to generate the necessary date column for this function.

Type Details
df pd.DataFrame Dataframe of food logging data. A column for ‘date’ must exist within the data.
Returns int Number of days with at least one log on that day.
df = load_food_data('data/test_food_details.csv', h = 4)
logging_day_counts(df)
636

source

find_missing_logging_days

 find_missing_logging_days (df:pandas.core.frame.DataFrame,
                            start_date:datetime.date='not_defined',
                            end_date:datetime.date='not_defined')

Finds days that have no log entries between a start (inclusive) and end date (inclusive). It is recommended that you use find_date to generate the necessary date column for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data.
start_date datetime.date not_defined Starting date for missing day evaluation. By default the earliest date in the data will be used.
end_date datetime.date not_defined Ending date for missing day evaluation. By default the latest date in the data will be used.
Returns list List of days within the given timeframe that have no log entries.

The string ‘not_defined’ is the intended default value for the start and end dates, signifying that the earliest and/or latest date within the data should be used. If a participant is missing a valid start or end date, null is returned.

df = load_food_data('data/test_food_details.csv', h = 4)
find_missing_logging_days(df, datetime.date(2017, 12, 7), datetime.date(2017, 12, 10))
[datetime.date(2017, 12, 7),
 datetime.date(2017, 12, 9),
 datetime.date(2017, 12, 10)]
df[df['date'].astype(str).str.contains("2017-12")]
ID unique_code research_info_id desc_text food_type original_logtime date float_time time week_from_start year
0 7572733 alqt14018795225 150 Water w 2017-12-08 17:30:00+00:00 2017-12-08 17.500000 17:30:00 1 2017
1 411111 alqt14018795225 150 Coffee White b 2017-12-09 00:01:00+00:00 2017-12-08 24.016667 00:01:00 1 2017
2 8409118 alqt14018795225 150 Salad f 2017-12-09 00:58:00+00:00 2017-12-08 24.966667 00:58:00 1 2017
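The search can be sketched with the standard library and pandas (illustrative, not the library's implementation): build the inclusive start-to-end range and keep the days that never appear in the ‘date’ column.

```python
import datetime
import pandas as pd

# Sketch of the missing-day search (illustrative): the days in the inclusive
# range that never appear among the logged dates.
logged = {datetime.date(2017, 12, 8)}  # dates present in the data
start, end = datetime.date(2017, 12, 7), datetime.date(2017, 12, 10)
all_days = pd.date_range(start, end).date
missing = [d for d in all_days if d not in logged]
print(missing)
# [datetime.date(2017, 12, 7), datetime.date(2017, 12, 9), datetime.date(2017, 12, 10)]
```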

source

good_lwa_day_counts

 good_lwa_day_counts (df:pandas.core.frame.DataFrame,
                      window_start:datetime.time,
                      window_end:datetime.time, min_log_num:int=2,
                      min_separation:int=5, buffer_time:str='15 minutes',
                      h:int=4, start_date:datetime.date='not_defined',
                      end_date:datetime.date='not_defined',
                      time_col:int=7)

Calculates the number of ‘good’ logging days, ‘good’ window days, ‘outside’ window days and adherent days.

Type Default Details
df pd.DataFrame Dataframe of food logging data.
window_start datetime.time Starting time for a time restriction window.
window_end datetime.time Ending time for a time restriction window.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 5 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
buffer_time str 15 minutes pd.Timedelta parsable string, representing ‘wiggle room’ for adherence.
h int 4 Number of hours to shift the definition of ‘date’ by. h = 4 would indicate that a log date begins at
4:00 AM and ends the following calendar day at 3:59:59 AM. Float representations of time would therefore
go from 4.0 (inclusive) to 28.0 (exclusive) to represent ‘date’ membership for days shifted from their
original calendar date.
start_date datetime.date not_defined Starting date for missing day evaluation. By default the earliest date in the data will be used.
end_date datetime.date not_defined Ending date for missing day evaluation. By default the latest date in the data will be used.
time_col int 7 Column number for an existing time column in provided data source.
Returns tuple[list, list] List containing number of ‘good’ logging days, ‘good’ window days, ‘outside’ window days, and adherent days.
List of three lists, containing the dates that are not considered ‘good’ logging days, ‘good’ window days,
or adherent days (in that order).

The main use of this function is to calculate window and logging adherence. These are represented as ‘good’ (valid) logging days, ‘good’ window days, ‘outside’ (invalid) window days, and adherent days.

The definition of each is:

  1. ‘Good’ Logging Day

    • A day with at least a specified minimum number of caloric (food or beverage) logs with a minimum specified number of hours between the first and last log for that day.
  2. ‘Good’ Window Day

    • A day where all food loggings are within the participant’s assigned eating restriction window plus any wiggle room, if allowed.
  3. Adherent Day

    • A day that is both a ‘good’ logging day and a ‘good’ window day.
df = load_food_data('data/test_food_details.csv', h = 4)
dates, bad_dates = good_lwa_day_counts(df, datetime.time(8,0,0), datetime.time(23,59,59))
dates

The second output of this function is three lists that identify which days fail one of the definitions above. The first list (index 0) contains dates that are not ‘good’ logging days, and the second contains dates that are not ‘good’ window days. The final list contains dates that are not adherent (neither ‘good’ window nor ‘good’ logging dates).

bad_dates[0][:5]
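The ‘good window day’ part of the rule can be sketched on toy values (illustrative, not the library's implementation): a day qualifies when every caloric log falls inside the assigned window, widened on both sides by the buffer time.

```python
import datetime
import pandas as pd

# Sketch of the 'good window day' check (illustrative, not the library code).
def to_float_hours(t: datetime.time) -> float:
    return t.hour + t.minute / 60 + t.second / 3600

window_start = datetime.time(8, 0, 0)
window_end = datetime.time(23, 59, 59)
buffer_hours = pd.Timedelta("15 minutes").total_seconds() / 3600  # 0.25 hours

lo = to_float_hours(window_start) - buffer_hours  # widened lower bound
hi = to_float_hours(window_end) + buffer_hours    # widened upper bound

log_times = pd.Series([7.9, 12.0, 22.5])  # float-hour caloric logs for one day
print(bool(log_times.between(lo, hi).all()))  # True: 7.9 falls inside the buffer
```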

Experiment Design

This group of functions provides methods for filtering participant data.


source

filtering_usable_data

 filtering_usable_data (df:pandas.core.frame.DataFrame, num_items:int,
                        num_days:int, identifier:int=1, date_col:int=6)

Filters data to include only participants whose data satisfy the minimum numbers of days and logs. It is recommended that you use find_date to generate the necessary date column for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column ‘desc_text’, typically found in mCC data
is required.
num_items int Minimum number of logs required to pass filter criteria.
num_days int Minimum number of unique logging days required to pass filter criteria.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
Returns tuple[pd.DataFrame, set] Data filtered to only include data from participants that have passed filtering criteria.
Set of participants that passed filtering criteria.
df = file_loader('data/output/public.json')
filtering_usable_data(df, num_items = 1000, num_days = 14)[0].shape
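The filtering criterion amounts to a groupby over participants, keeping those with enough logs and enough unique logging days. A minimal sketch on a toy dataframe (the column names 'ID' and 'date' are assumptions for illustration, not the library's defaults):

```python
import pandas as pd

def filter_usable(df, num_items, num_days, id_col='ID', date_col='date'):
    """Keep participants with at least num_items logs AND at least
    num_days unique logging dates. A sketch of the filter criterion."""
    stats = df.groupby(id_col).agg(items=(date_col, 'size'),
                                   days=(date_col, 'nunique'))
    passed = set(stats[(stats['items'] >= num_items)
                       & (stats['days'] >= num_days)].index)
    return df[df[id_col].isin(passed)], passed

toy = pd.DataFrame({'ID': ['a'] * 3 + ['b'] * 2,
                    'date': ['d1', 'd2', 'd2', 'd1', 'd1']})
# 'a' has 3 logs over 2 days and passes; 'b' has 2 logs over 1 day.
filtered, who = filter_usable(toy, num_items=3, num_days=2)
```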

source

prepare_baseline_and_intervention_usable_data

 prepare_baseline_and_intervention_usable_data
                                                (data_source:Union[str,pan
                                                das.core.frame.DataFrame],
                                                baseline_num_items:int,
                                                baseline_num_days:int, int
                                                ervention_num_items:int,
                                                intervention_num_days:int,
                                                identifier:int=1,
                                                date_col:int=6)

Filters for ‘usable’ data within the baseline period and the last two weeks of intervention (weeks 13 and 14). It is recommended that you use the week_from_start function to generate the necessary week column for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
baseline_num_items int Number of logs for a participant’s baseline data to pass filter criteria.
baseline_num_days int Number of unique logging days for a participant’s baseline data to pass filter criteria.
intervention_num_items int Number of logs for a participant’s intervention data to pass filter criteria.
intervention_num_days int Number of unique logging days for a participant’s intervention data to pass filter criteria.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
Returns list List of two dataframes: usable baseline data, usable intervention data.
df = prepare_baseline_and_intervention_usable_data('data/output/public.json', 20, 10, 40, 12)[0]
df.head(2)
df.shape

Analysis and Data Summaries

Data analysis and summary functions, including summary functions for specific statistics.


source

users_sorted_by_logging

 users_sorted_by_logging
                          (data_source:Union[str,pandas.core.frame.DataFra
                          me], food_type:list=['f', 'b', 'm', 'w'],
                          min_log_num:int=2, min_separation:int=4,
                          identifier:int=1, date_col:int=6,
                          time_col:int=7)

Reports the number of ‘good’ logging days for each user, in descending order based on number of ‘good’ logging days.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
food_type list [‘f’, ‘b’, ‘m’, ‘w’] A single food type, or list of food types. Valid types are ‘f’: food, ‘b’: beverage,
‘w’: water, and ‘m’: medication.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns pd.DataFrame Dataframe containing the number of good logging days for each user.
users_sorted_by_logging('data/output/public.json', ['f','b']).head(2)

source

eating_intervals_percentile

 eating_intervals_percentile
                              (data_source:Union[str,pandas.core.frame.Dat
                              aFrame], identifier:int=1, time_col:int=7)

Calculates the 2.5, 5, 10, 12.5, 25, 50, 75, 87.5, 90, 95, and 97.5 percentile eating times for each participant. It also calculates the middle 95, 90, 80, 75, and 50 percentile eating windows for each participant. It is recommended that you use find_float_time to generate the necessary time column for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
time_col int 7 Column number for an existing time column in provided data source.
Returns pd.DataFrame Dataframe with count, mean, std, min, quantiles and mid XX%tile eating window durations for all participants.
df = load_food_data('data/test_food_details.csv', h = 4)
eating_intervals_percentile(df).iloc[:2]
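The ‘middle XX percentile’ windows are the spreads between symmetric quantiles of a participant’s float log times (hours). A small sketch using pandas’ default linear quantile interpolation, on made-up times:

```python
import pandas as pd

# Float log times (hours) for one hypothetical participant.
times = pd.Series([7.5, 8.0, 12.25, 13.0, 18.5, 20.0, 21.75])

# Middle 95% eating window: spread between the 2.5th and 97.5th percentiles.
mid95 = times.quantile(0.975) - times.quantile(0.025)
# Middle 50% eating window: spread between the 25th and 75th percentiles.
mid50 = times.quantile(0.75) - times.quantile(0.25)
```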

source

first_cal_analysis_summary

 first_cal_analysis_summary
                             (data_source:Union[str,pandas.core.frame.Data
                             Frame], min_log_num:int=2,
                             min_separation:int=4, identifier:int=1,
                             date_col:int=6, time_col:int=7)

Calculates the 5, 10, 25, 50, 75, 90, and 95 percentiles of first caloric entry time for each participant on ‘good’ logging days. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns pd.DataFrame Dataframe with 5, 10, 25, 50, 75, 90, 95 percentile of first caloric entry time for all participants.
first_cal_analysis_summary('data/output/baseline.json').head(3)

source

last_cal_analysis_summary

 last_cal_analysis_summary
                            (data_source:Union[str,pandas.core.frame.DataF
                            rame], min_log_num:int=2,
                            min_separation:int=4, identifier:int=1,
                            date_col:int=6, time_col:int=7)

Calculates the 5, 10, 25, 50, 75, 90, and 95 percentiles of last caloric entry time for each participant on ‘good’ logging days. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns pd.DataFrame Dataframe with 5, 10, 25, 50, 75, 90, 95 percentile of last caloric entry time for all participants.
last_cal_analysis_summary('data/output/baseline.json').head(3)

source

summarize_data

 summarize_data (data_source:Union[str,pandas.core.frame.DataFrame],
                 min_log_num:int=2, min_separation:int=4,
                 identifier:int=1, date_col:int=6, time_col:int=7)

Summarizes participant data, including number of days, total number of logs, number of food/beverage logs, number of medication logs, number of water logs, eating window duration information, first and last caloric log information, and adherence.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns pd.DataFrame Summary dataframe.

This function provides summary data for an entire study, without separating for study phases. Summaries include statistics for first and last caloric log, eating window, and relevant calculations for middle 95 percentile eating window.

df = load_food_data('data/test_food_details.csv', h = 4)
summarize_data(df)

source

summarize_data_with_experiment_phases

 summarize_data_with_experiment_phases
                                        (food_data:pandas.core.frame.DataF
                                        rame, ref_tbl:pandas.core.frame.Da
                                        taFrame, min_log_num:int=2,
                                        min_separation:int=5,
                                        buffer_time:str='15 minutes',
                                        h:int=4, report_level:int=2,
                                        txt:bool=False)

Summarizes participant data for each experiment phase and eating window assignment. Summary includes number of days, total number of logs, number of food/beverage logs, number of medication logs, number of water logs, eating window duration information, first and last caloric log information, and adherence.

Type Default Details
food_data pd.DataFrame Dataframe of food logging data. A column for “original_logtime” must exist within the data. mCC output style
data is expected.
ref_tbl pd.DataFrame Participant data reference table. See the accompanying HOWTO document for required column positions and
formatting.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 5 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
buffer_time str 15 minutes pd.Timedelta parsable string, representing ‘wiggle room’ for adherence.
h int 4 Number of hours to shift the definition of ‘date’ by. h = 4 would indicate that a log date begins at
4:00 AM and ends the following calendar day at 3:59:59. Float representations of time would therefore
go from 4.0 (inclusive) to 28.0 (exclusive) to represent ‘date’ membership for days shifted from their
original calendar date.
report_level int 2 Additional printed info detail level. 0 = No Report. 1 = Report ‘No Logging Days’.
2 = Report ‘No Logging Days’, ‘Bad Logging Days’, ‘Bad Window Days’, and ‘Non-Adherent Days’.
txt bool False If True, a text format (.txt) report will be saved in the current directory, with the name
‘treets_warning_dates.txt’
Returns pd.DataFrame Summary dataframe, where each row represents the summary for a participant during a particular
study phase. Participants can have multiple rows for a single study phase if, during that study phase,
their assigned eating window is altered.
df = summarize_data_with_experiment_phases(pd.read_csv('data/col_test_data/toy_data_2000.csv'),
                                           pd.read_excel('data/col_test_data/toy_data_17May2021.xlsx'),
                                           report_level = 2)
df.T
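The h-hour day shift described for the h parameter can be sketched as follows (the helper name is ours; find_date and find_float_time generate the corresponding columns in the library):

```python
import pandas as pd

def shifted_date_and_float_time(ts, h=4):
    """Sketch of the h-hour day shift: logs before h:00 AM count toward
    the previous calendar date, and their float time runs past 24
    (e.g. with h = 4, 2:30 AM becomes 26.5). Helper name is ours."""
    float_time = ts.hour + ts.minute / 60 + ts.second / 3600
    if float_time < h:
        float_time += 24                          # map into [h, h + 24)
        date = (ts - pd.Timedelta(days=1)).date() # previous calendar day
    else:
        date = ts.date()
    return date, float_time

shifted_date_and_float_time(pd.Timestamp('2021-05-12 02:30:00'), h=4)
```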

Plots

Plotting functions.


source

first_cal_mean_with_error_bar

 first_cal_mean_with_error_bar
                                (data_source:Union[str,pandas.core.frame.D
                                ataFrame], min_log_num:int=2,
                                min_separation:int=4, identifier:int=1,
                                date_col:int=6, time_col:int=7)

Represents mean and standard deviation of first caloric intake time for each participant as a scatter plot, with participants as the x-axis and time as the y-axis. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
first_cal_mean_fig = first_cal_mean_with_error_bar('data/output/baseline.json')
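The underlying plot is a per-participant mean with standard-deviation error bars. A minimal sketch on a made-up dataframe (column names are ours, for illustration only):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Made-up first caloric log times (hours) for two participants.
first_cal = pd.DataFrame({'PID': ['p1', 'p1', 'p2', 'p2'],
                          'first_cal_hour': [8.0, 9.0, 7.0, 11.0]})

# Per-participant mean and standard deviation of first-cal time.
stats = first_cal.groupby('PID')['first_cal_hour'].agg(['mean', 'std'])

fig, ax = plt.subplots()
ax.errorbar(stats.index, stats['mean'], yerr=stats['std'], fmt='o')
ax.set_xlabel('Participant')
ax.set_ylabel('First caloric log (hour)')
```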

source

last_cal_mean_with_error_bar

 last_cal_mean_with_error_bar
                               (data_source:Union[str,pandas.core.frame.Da
                               taFrame], min_log_num:int=2,
                               min_separation:int=4, identifier:int=1,
                               date_col:int=6, time_col:int=7)

Represents the mean and standard deviation of last caloric intake time for each participant as a scatter plot, with participants as the x-axis and time as the y-axis. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
last_cal_mean_fig = last_cal_mean_with_error_bar('data/output/baseline.json')

source

first_cal_analysis_variability_plot

 first_cal_analysis_variability_plot
                                      (data_source:Union[str,pandas.core.f
                                      rame.DataFrame], min_log_num:int=2,
                                      min_separation:int=4,
                                      identifier:int=1, date_col:int=6,
                                      time_col:int=7)

Calculates first caloric log time variability on ‘good’ logging days by subtracting the 5, 10, 25, 50, 75, 90, and 95 percentiles of first caloric intake time from the 50th percentile of first caloric intake time. It also produces a histogram of the 90%–10% interval across all participants. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
first_cal_var_plot = first_cal_analysis_variability_plot('data/output/baseline.json')

source

last_cal_analysis_variability_plot

 last_cal_analysis_variability_plot
                                     (data_source:Union[str,pandas.core.fr
                                     ame.DataFrame], min_log_num:int=2,
                                     min_separation:int=4,
                                     identifier:int=1, date_col:int=6,
                                     time_col:int=7)

Calculates last caloric log time variability on ‘good’ logging days by subtracting the 5, 10, 25, 50, 75, 90, and 95 percentiles of last caloric intake time from the 50th percentile of last caloric intake time. It also produces a histogram of the 90%–10% interval across all participants. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
last_cal_var_plot = last_cal_analysis_variability_plot('data/output/baseline.json')

source

first_cal_avg_histplot

 first_cal_avg_histplot
                         (data_source:Union[str,pandas.core.frame.DataFram
                         e], identifier:int=1, date_col:int=6,
                         time_col:int=7)

Plots a histogram of average first caloric intake for all participants. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
first_cal_avg_plot = first_cal_avg_histplot('data/output/baseline.json')

source

first_cal_sample_distplot

 first_cal_sample_distplot
                            (data_source:Union[str,pandas.core.frame.DataF
                            rame], n:int, replace:bool=False,
                            identifier:int=1, date_col:int=6,
                            time_col:int=7)

Creates a distplot of first caloric intake times for a random selection of n participants. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
n int Number of participants to randomly select and plot. Sampling is without replacement unless replace is True.
replace bool False If true, samples with replacement. Samples without replacement by default.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
first_cal_distplot = first_cal_sample_distplot('data/output/intervention.json', n = 5, replace = False)

source

last_cal_avg_histplot

 last_cal_avg_histplot
                        (data_source:Union[str,pandas.core.frame.DataFrame
                        ], identifier:int=1, date_col:int=6,
                        time_col:int=7)

Plots a histogram of average last caloric intake for all participants. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
last_cal_avg_hist = last_cal_avg_histplot('data/output/baseline.json')

source

last_cal_sample_distplot

 last_cal_sample_distplot
                           (data_source:Union[str,pandas.core.frame.DataFr
                           ame], n:int, replace:bool=False,
                           identifier:int=1, date_col:int=6,
                           time_col:int=7)

Creates a distplot of last caloric intake times for a random selection of n participants. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
n int Number of participants to randomly select and plot. Sampling is without replacement unless replace is True.
replace bool False If true, samples with replacement. Samples without replacement by default.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
last_cal_distplot = last_cal_sample_distplot('data/output/intervention.json', n = 5, replace=False)

source

swarmplot

 swarmplot (data_source:Union[str,pandas.core.frame.DataFrame],
            max_loggings:int, identifier:int=1, date_col:int=6,
            time_col:int=7)

Creates a swarmplot of participants’ logging data. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
max_loggings int Maximum number of randomly selected logs to be plotted for each participant.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
swarm = swarmplot('data/output/public.json', max_loggings = 20)