Time Restricted Eating ExperimenTS API

Process collected data from the myCircadianClock app.

Utils

These functions primarily serve as building blocks for other functions, but are provided here for standalone use.


source

file_loader

 file_loader (data_source:Union[str,pandas.core.frame.DataFrame])

Flexible file loader that reads a single file path or a folder path. Accepts the .csv and .json file formats.

Type Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame.
Existing dataframes are read as is.
Returns pd.DataFrame A single dataframe consisting of all data matching the provided file or folder path.

Providing the file loader with a specific file path outputs a single Pandas dataframe generated from that data source.

file_loader("data/col_test_data/toy_data_2000.csv").head(2)
original_logtime desc_text food_type PID
0 2021-05-12 02:30:00 +0000 milk b yrt1999
1 2021-05-12 02:45:00 +0000 some medication m yrt1999

The file loader can also accept string patterns to read in multiple files at once. Providing a patterned path such as yrt*_food_data*.csv would load all data matching this pattern.

file_loader('data/col_test_data/yrt*_food_data*.csv').head(2)
original_logtime desc_text food_type PID
0 2021-05-12 02:30:00 +0000 Milk b yrt1999
1 2021-05-12 02:45:00 +0000 Some Medication m yrt1999

It can also handle reading mixed file types. The below dataframe consists of data read from all .json and .csv files in the data/output/ folder.

file_loader('data/output/*').head(2)
ID unique_code research_info_id desc_text food_type original_logtime date local_time time week_from_start year cleaned day_count
0 7572733.0 alqt14018795225 150.0 Water w 2017-12-08 17:30:00+00:00 2017-12-08 17.500000 17:30:00 1.0 2017.0 NaN NaN
1 411111.0 alqt14018795225 150.0 Coffee White b 2017-12-09 00:01:00+00:00 2017-12-08 24.016667 00:01:00 1.0 2017.0 NaN NaN

source

find_date

 find_date (data_source:Union[str,pandas.core.frame.DataFrame], h:int=4,
            date_col:int=5)

Extracts the date from a datetime column after shifting the datetime by ‘h’ hours. A day starts ‘h’ hours earlier if ‘h’ is negative, or ‘h’ hours later if ‘h’ is positive.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
h int 4 Number of hours to shift the definition for ‘date’ by. h = 4 would shift days so that time membership
to each date starts at 4:00 AM and ends at 3:59:59 AM the next calendar day.
date_col int 5 Column number for existing datetime column in provided data source. Data exported from mCC typically
has datetime as its 5th column (with indexing starting from 0).
Returns pd.Series Series of dates in ISO 8601 format.

By default, find_date expects log dates for studies to begin at 4:00 AM. To use regular calendar dates, remember to set h = 0.

df = file_loader('data/test_food_details.csv')
df['original_logtime'] = pd.to_datetime(df['original_logtime'])
df['date'] = find_date(df, h = 0)
df[['original_logtime', 'date']].head(3)
original_logtime date
0 2017-12-08 17:30:00+00:00 2017-12-08
1 2017-12-09 00:01:00+00:00 2017-12-09
2 2017-12-09 00:58:00+00:00 2017-12-09

In this example, with log dates starting at the default value of 4 (4:00 AM), we see that two logs from very early morning on 2017-12-09 are counted as being logged on 2017-12-08 instead.

df['date'] = find_date(df, h = 4)
df[['original_logtime', 'date']].head(3)
original_logtime date
0 2017-12-08 17:30:00+00:00 2017-12-08
1 2017-12-09 00:01:00+00:00 2017-12-08
2 2017-12-09 00:58:00+00:00 2017-12-08

Conversely, when log days start four hours earlier (h = -4), the dates of the last two rows are shifted so that their log date is one day later than their calendar date.

df['date'] = find_date(df, h = -4)
df[['original_logtime', 'date']].head(5)
original_logtime date
0 2017-12-08 17:30:00+00:00 2017-12-08
1 2017-12-09 00:01:00+00:00 2017-12-09
2 2017-12-09 00:58:00+00:00 2017-12-09
3 2018-02-22 21:52:00+00:00 2018-02-23
4 2018-02-22 22:53:00+00:00 2018-02-23
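The date shift above can be reproduced with plain pandas. This is a minimal sketch of the arithmetic only, not the library's implementation: subtract h hours from each timestamp, then take the calendar date of the shifted value.

```python
import pandas as pd

# Sketch of the date-shift arithmetic (illustrative, not the library code):
# subtract h hours, then take the calendar date of the shifted timestamp.
ts = pd.to_datetime(pd.Series(["2017-12-09 00:01:00", "2018-02-22 21:52:00"]))

shifted_pos = (ts - pd.Timedelta(hours=4)).dt.date   # day starts at 4:00 AM
shifted_neg = (ts - pd.Timedelta(hours=-4)).dt.date  # day starts at 8:00 PM the prior day

print(shifted_pos.tolist())  # [datetime.date(2017, 12, 8), datetime.date(2018, 2, 22)]
print(shifted_neg.tolist())  # [datetime.date(2017, 12, 9), datetime.date(2018, 2, 23)]
```

A single subtraction handles both signs of h, which is why positive h pulls early-morning logs back one date while negative h pushes late-evening logs forward one date.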

source

find_float_time

 find_float_time (data_source:Union[str,pandas.core.frame.DataFrame],
                  h:int=4, date_col:int=5)

Extracts the time from a datetime column after shifting the datetime by ‘h’ hours. A day starts ‘h’ hours earlier if ‘h’ is negative, or ‘h’ hours later if ‘h’ is positive.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
h int 4 Number of hours to shift the definition for ‘time’ by. h = 4 would allow float representations of time
between 4 (inclusive) and 28 (exclusive), representing time that goes from 4:00 AM to 3:59:59 AM the next
calendar day. NOTE: h value for this function should match the h value used for generating dates.
date_col int 5 Column number for existing datetime column in provided data source. Data exported from mCC typically
has datetime as its 5th column (with indexing starting from 0).
Returns pd.Series Series of times in float format (e.g. 4:36 AM -> 4.6).
df = file_loader('data/test_food_details.csv')
df['original_logtime'] = pd.to_datetime(df['original_logtime'])

By default, find_float_time expects studies to begin at 4:00 AM. To preserve regular calendar dates use h = 0.

df['float_time'] = find_float_time(df, h = 0)
df[['original_logtime', 'float_time']].head(3)
original_logtime float_time
0 2017-12-08 17:30:00+00:00 17.500000
1 2017-12-09 00:01:00+00:00 0.016667
2 2017-12-09 00:58:00+00:00 0.966667

Using the same positive h value for both the date and float-time functions changes a row’s date ownership based on its original logtime. Float time should be shifted by the same h value as date membership so that times belonging to a different calendar date remain distinguishable when rows are grouped under the same logging date (e.g. 2:00 AM –> 2.0, whereas 2:00 AM on the next calendar day –> 26.0).

df['float_time'] = find_float_time(df, h = 4)
df['date'] = find_date(df, h = 4)
df[['original_logtime','date', 'float_time']].head(3)
original_logtime date float_time
0 2017-12-08 17:30:00+00:00 2017-12-08 17.500000
1 2017-12-09 00:01:00+00:00 2017-12-08 24.016667
2 2017-12-09 00:58:00+00:00 2017-12-08 24.966667
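The shifted float-time values above can be sketched in plain pandas. This is an illustration of the arithmetic only, not the library's implementation: times earlier than h o'clock roll forward by 24 hours so they stay grouped with the previous log date.

```python
import pandas as pd

# Sketch of the shifted float-time arithmetic (illustrative, not the library
# code): times earlier than h o'clock gain 24 hours so they remain attached
# to the previous log date (e.g. 00:01 -> 24.016...).
h = 4
ts = pd.to_datetime(pd.Series(["2017-12-08 17:30:00", "2017-12-09 00:01:00"]))
float_time = ts.dt.hour + ts.dt.minute / 60 + ts.dt.second / 3600
float_time = float_time.where(float_time >= h, float_time + 24)
print(float_time.round(6).tolist())  # [17.5, 24.016667]
```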

In rare cases, it may be valuable to shift date and time by negative values. In this example, where a log day starts at 8:00 PM on the previous calendar day and ends at 8:00 PM on the current calendar day, note that the last two rows have negative float times and their date membership is shifted one day later than their calendar date.

df['float_time'] = find_float_time(df, h = -4)
df['date'] = find_date(df, h = -4)
df[['original_logtime','date', 'float_time']].head(5)
original_logtime date float_time
0 2017-12-08 17:30:00+00:00 2017-12-08 17.500000
1 2017-12-09 00:01:00+00:00 2017-12-09 0.016667
2 2017-12-09 00:58:00+00:00 2017-12-09 0.966667
3 2018-02-22 21:52:00+00:00 2018-02-23 -2.133333
4 2018-02-22 22:53:00+00:00 2018-02-23 -1.116667

source

week_from_start

 week_from_start (data_source:Union[str,pandas.core.frame.DataFrame],
                  identifier:int=1)

Calculates the number of weeks between each logging entry and the first logging entry for each participant. A ‘date’ column must exist in the provided data source. Using the provided find_date function is recommended.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
Returns np.array Array of weeks elapsed between each log date and the participant’s earliest log date.
df = file_loader('data/test_food_details.csv')
df['original_logtime'] = pd.to_datetime(df['original_logtime'])

A column labeled ‘date’ is required by this function; using find_date to generate it is recommended.

df['date'] = find_date(df)
df['week_from_start'] = week_from_start(df)
df[['unique_code','original_logtime','week_from_start']][2:4]
unique_code original_logtime week_from_start
2 alqt14018795225 2017-12-09 00:58:00+00:00 1
3 alqt14018795225 2018-02-22 21:52:00+00:00 11
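The week computation can be sketched in plain pandas on toy data (illustrative, not the library's implementation): take the days elapsed since each participant's earliest log date, floor-divide by 7, and add one so that the first week is week 1.

```python
import pandas as pd

# Sketch of the week-from-start computation (illustrative): days elapsed
# since each participant's first log date, floor-divided by 7, plus one.
df = pd.DataFrame({
    "unique_code": ["a", "a", "a"],
    "date": pd.to_datetime(["2017-12-08", "2017-12-09", "2018-02-22"]),
})
first = df.groupby("unique_code")["date"].transform("min")
df["week_from_start"] = ((df["date"] - first).dt.days // 7) + 1
print(df["week_from_start"].tolist())  # [1, 1, 11]
```

Using a groupby transform keeps the result aligned with the original rows, so mixed-participant data is handled in one pass.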

source

find_phase_duration

 find_phase_duration (df:pandas.core.frame.DataFrame)

Calculates the duration (in days) of the study phase for each row.

Type Details
df pd.DataFrame Participant information dataframe with columns for start and ending date for that row’s study phase.
The expected column numbers for starting and ending dates are outlined in the HOWTO document that accompanies TREETS.
Returns pd.DataFrame Dataframe with an additional column describing study phase duration.
find_phase_duration(pd.read_excel('data/col_test_data/toy_data_17May2021.xlsx'))[['phase_duration']]
phase_duration
0 3 days
1 4 days
2 3 days
3 4 days
4 NaT
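The duration itself is a simple datetime subtraction. The sketch below is illustrative only; the column names `phase_start` and `phase_end` are hypothetical stand-ins for the start and end date columns described in the HOWTO document.

```python
import pandas as pd

# Sketch of the phase-duration computation (illustrative; `phase_start` and
# `phase_end` are hypothetical column names). Missing dates yield NaT, as in
# the last row of the example output above.
info = pd.DataFrame({
    "phase_start": pd.to_datetime(["2021-05-10", "2021-05-13", None]),
    "phase_end":   pd.to_datetime(["2021-05-13", "2021-05-17", "2021-05-20"]),
})
info["phase_duration"] = info["phase_end"] - info["phase_start"]
print(info["phase_duration"].tolist())  # 3 days, 4 days, NaT
```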

source

load_food_data

 load_food_data (data_source:Union[str,pandas.core.frame.DataFrame],
                 h:int, identifier:int=1, datetime_col:int=5)

Loads and processes existing logging data, adding specific datetime information in formats more suitable for TREETS functions.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
h int Number of hours to shift the definition of ‘date’ by. h = 4 would indicate that a log date begins at
4:00 AM and ends the following calendar day at 3:59:59 AM. Float representations of time would therefore
go from 4.0 (inclusive) to 28.0 (exclusive) to represent ‘date’ membership for days shifted from their
original calendar date.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
datetime_col int 5 Column number for an existing datetime column in provided data source. Data exported from mCC typically
has datetime as its 5th column (with indexing starting from 0).
Returns pd.DataFrame Dataframe with additional date, float time, and week from start columns.
load_food_data('data/test_food_details.csv', h = 4).head(2)
ID unique_code research_info_id desc_text food_type original_logtime date float_time time week_from_start year
0 7572733 alqt14018795225 150 Water w 2017-12-08 17:30:00+00:00 2017-12-08 17.500000 17:30:00 1 2017
1 411111 alqt14018795225 150 Coffee White b 2017-12-09 00:01:00+00:00 2017-12-08 24.016667 00:01:00 1 2017

source

in_good_logging_day

 in_good_logging_day (data_source:Union[str,pandas.core.frame.DataFrame],
                      min_log_num:int=2, min_separation:int=5,
                      identifier:int=1, date_col:int=6, time_col:int=7)

Calculates if each log is considered to be within a ‘good logging day’. A log day is considered ‘good’ if there are at least the minimum number of required logs, with a minimum specified hour separation between the first and last log for that log date. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 5 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns np.array Boolean array describing whether each log is a ‘good’ logging day.
df = load_food_data('data/test_food_details.csv', h = 4)
df['in_good_logging_day'] = in_good_logging_day(df)
df.head(2)
ID unique_code research_info_id desc_text food_type original_logtime date float_time time week_from_start year in_good_logging_day
0 7572733 alqt14018795225 150 Water w 2017-12-08 17:30:00+00:00 2017-12-08 17.500000 17:30:00 1 2017 True
1 411111 alqt14018795225 150 Coffee White b 2017-12-09 00:01:00+00:00 2017-12-08 24.016667 00:01:00 1 2017 True
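The 'good logging day' rule can be sketched in plain pandas on toy data (illustrative, not the library's implementation): a (participant, date) group qualifies if it has at least min_log_num logs and its first and last logs are at least min_separation hours apart.

```python
import pandas as pd

# Sketch of the 'good logging day' rule (illustrative, not the library code).
min_log_num, min_separation = 2, 5

df = pd.DataFrame({
    "unique_code": ["a", "a", "a", "b"],
    "date": ["2017-12-08"] * 4,
    "float_time": [17.5, 24.02, 24.97, 9.0],
})
grouped = df.groupby(["unique_code", "date"])["float_time"]
good = (grouped.transform("count") >= min_log_num) & \
       (grouped.transform("max") - grouped.transform("min") >= min_separation)
print(good.tolist())  # [True, True, True, False]: participant b has only one log
```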

source

FoodParser

 FoodParser ()

FoodParser takes unprocessed food log entries and annotates them using a pre-made dictionary: it matches raw terms to their most likely known items and adds food type and other identifying information.


source

clean_loggings

 clean_loggings (data_source:Union[str,pandas.core.frame.DataFrame],
                 identifier:int=1)

Cleans and attempts typo correction for all logging text entries.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
Returns pd.DataFrame Dataframe with an additional column containing cleaned and typo corrected item entries.

Text descriptions of food items are cleaned using a built-in dictionary of common typos and corrections for each phrase. Phrases are then matched using a dictionary of known n-gram item names. The resulting item(s) are provided as a list.

clean_loggings('data/output/public.json').head(3)
unique_code desc_text cleaned
0 alqt14018795225 Water [water]
1 alqt14018795225 Coffee White [coffee, white]
2 alqt14018795225 Salad [salad]

source

get_types

 get_types (data_source:Union[str,pandas.core.frame.DataFrame],
            food_type:Union[str,list])

Filters the data, keeping only logs of the specified type(s).

Type Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame. Folder paths
with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. A column ‘food_type’ is required to be within the data.
food_type str | list A single food type, or list of food types. Valid types are ‘f’: food, ‘b’: beverage, ‘w’: water,
and ‘m’: medication.
Returns pd.DataFrame Dataframe filtered for only logs of specific type(s).

Type selection accepts multiple types at once as a list of entry types. All types chosen must be valid.

Available food types include:

  1. ‘f’: Food

  2. ‘b’: Beverage

  3. ‘w’: Water

  4. ‘m’: Medication

Flavored water beverages such as La Croix are counted as ‘water’ and not as ‘beverage’.

get_types('data/output/baseline.json',['w', 'f'])[['unique_code','desc_text','food_type']].head(3)
unique_code desc_text food_type
0 alqt14018795225 Water w
2 alqt14018795225 Salad f
3 alqt78896444285 Water w

Filtering for a single type is also possible.

df = load_food_data('data/test_food_details.csv', h = 4)
get_types(df, 'm')[['unique_code','desc_text','food_type']].head(3)
unique_code desc_text food_type
323 alqt14018795225 Caffeine m
361 alqt14018795225 Caffeine m
420 alqt14018795225 Caffeine m

source

count_caloric_entries

 count_caloric_entries (df:pandas.core.frame.DataFrame)

Counts the number of food (‘f’) and beverage (‘b’) loggings.

Type Details
df pd.DataFrame Dataframe of food logging data.
Returns int Number of caloric (food or beverage) entries found.
df = load_food_data('data/test_food_details.csv', h = 4)
count_caloric_entries(df)
4603
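The underlying count is a simple type filter. The sketch below is illustrative, not the library's implementation: only ‘f’ (food) and ‘b’ (beverage) logs are caloric, while water and medication are excluded.

```python
import pandas as pd

# Sketch of the caloric-entry count (illustrative): only 'f' and 'b' logs
# count as caloric; 'w' (water) and 'm' (medication) are excluded.
df = pd.DataFrame({"food_type": ["w", "b", "f", "m", "f"]})
n_caloric = int(df["food_type"].isin(["f", "b"]).sum())
print(n_caloric)  # 3
```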

source

mean_daily_eating_duration

 mean_daily_eating_duration (df:pandas.core.frame.DataFrame,
                             date_col:int=6, time_col:int=7)

Calculates mean daily eating window by taking the average of each day’s eating window. An eating window is defined as the duration of time between first and last caloric (food or beverage) intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of average daily eating window duration.
df = load_food_data('data/test_food_details.csv', h = 4)
mean_daily_eating_duration(df)
14.038679245283017
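The statistic can be sketched in plain pandas on toy data (illustrative, not the library's implementation): per date, the eating window is the span between the first and last caloric (‘f’ or ‘b’) log in float hours, and the statistic is the mean of those spans over days.

```python
import pandas as pd

# Sketch of the mean daily eating window (illustrative, not the library code):
# non-caloric logs ('w', 'm') are excluded before taking each day's span.
df = pd.DataFrame({
    "date": ["d1", "d1", "d1", "d2", "d2"],
    "food_type": ["b", "w", "f", "f", "f"],
    "float_time": [8.0, 12.0, 20.0, 9.0, 19.0],
})
caloric = df[df["food_type"].isin(["f", "b"])]
windows = caloric.groupby("date")["float_time"].agg(lambda t: t.max() - t.min())
print(windows.mean())  # (12.0 + 10.0) / 2 = 11.0
```

The companion std_daily_eating_duration statistic is the standard deviation of the same per-day spans.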

source

std_daily_eating_duration

 std_daily_eating_duration (df:pandas.core.frame.DataFrame,
                            date_col:int=6, time_col:int=7)

Calculates the standard deviation of the daily eating window. An eating window is defined as the duration of time between first and last caloric (food or beverage) intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the standard deviation of daily eating window duration.
df = load_food_data('data/test_food_details.csv', h = 4)
std_daily_eating_duration(df)
7.018679942775867

source

earliest_entry

 earliest_entry (df:pandas.core.frame.DataFrame, time_col:int=7)

Calculates the earliest recorded caloric (food or beverage) entry. It is recommended that you use find_float_time to generate the necessary time column for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the earliest logtime on any date.
df = load_food_data('data/test_food_details.csv', h = 4)
earliest_entry(df)
4.0

source

mean_first_cal

 mean_first_cal (df:pandas.core.frame.DataFrame, date_col:int=6,
                 time_col:int=7)

Calculates the average time of first caloric intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of average first caloric entry time.
df = load_food_data('data/test_food_details.csv', h = 4)
mean_first_cal(df)
9.22680817610063
# find the average mean first cal time for each participant
df.groupby(['unique_code']).agg(mean_first_cal, date_col = 6, time_col = 7).iloc[:,0]
unique_code
alqt1148284857      7.315278
alqt14018795225     7.635938
alqt16675467779     6.153904
alqt21525720972    13.211957
alqt45631586569    15.056295
alqt5833085442     12.551515
alqt62359040167     7.252137
alqt6695047873      7.573077
alqt78896444285     6.347510
alqt8668165687      9.702555
Name: ID, dtype: float64

source

std_first_cal

 std_first_cal (df:pandas.core.frame.DataFrame, date_col:int=6,
                time_col:int=7)

Calculates the standard deviation for time of first caloric intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the standard deviation of first caloric entry time.
df = load_food_data('data/test_food_details.csv', h = 4)
std_first_cal(df)
4.591417471559444

source

mean_last_cal

 mean_last_cal (df:pandas.core.frame.DataFrame, date_col:int=6,
                time_col:int=7)

Calculates the average time of last caloric intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of average last caloric entry time.
df = load_food_data('data/test_food_details.csv', h = 4)
mean_last_cal(df)
23.265487421383646

source

std_last_cal

 std_last_cal (df:pandas.core.frame.DataFrame, date_col:int=6,
               time_col:int=7)

Calculates the standard deviation for time of last caloric intake. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the standard deviation of last caloric entry time.
df = load_food_data('data/test_food_details.csv', h = 4)
std_last_cal(df, 'date', 'float_time')
4.359435007580498

source

mean_daily_eating_occasions

 mean_daily_eating_occasions (df:pandas.core.frame.DataFrame,
                              date_col:int=6, time_col:int=7)

Calculates the average number of daily eating occasions. An eating occasion is a single caloric (food or beverage) log. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Average number of daily eating occasions.
df = load_food_data('data/test_food_details.csv', h = 4)
mean_daily_eating_occasions(df, 'date', 'float_time')
6.8915094339622645

source

std_daily_eating_occasions

 std_daily_eating_occasions (df:pandas.core.frame.DataFrame,
                             date_col:int=6, time_col:int=7)

Calculates the standard deviation of the number of daily eating occasions. An eating occasion is a single caloric (food or beverage) log. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Standard deviation of the number of daily eating occasions.
df = load_food_data('data/test_food_details.csv', h = 4)
std_daily_eating_occasions(df, 'date', 'float_time')
4.44839423402741

source

mean_daily_eating_midpoint

 mean_daily_eating_midpoint (df:pandas.core.frame.DataFrame,
                             date_col:int=6, time_col:int=7)

Calculates the average daily midpoint eating occasion time. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the average daily midpoint eating occasion time.
df = load_food_data('data/test_food_details.csv', h = 4)
mean_daily_eating_midpoint(df, 'date', 'float_time')
16.536425576519914

source

std_daily_eating_midpoint

 std_daily_eating_midpoint (df:pandas.core.frame.DataFrame,
                            date_col:int=6, time_col:int=7)

Calculates the standard deviation of the daily midpoint eating occasion time. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column for ‘food_type’ must exist within the data.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns float Float representation of the standard deviation of the daily midpoint eating occasion time.
df = load_food_data('data/test_food_details.csv', h = 4)
std_daily_eating_midpoint(df, 'date', 'float_time')
4.107072970435106

source

logging_day_counts

 logging_day_counts (df:pandas.core.frame.DataFrame)

Calculates the number of days that contain any logs. It is recommended that you use find_date to generate the necessary date column for this function.

Type Details
df pd.DataFrame Dataframe of food logging data. A column for ‘date’ must exist within the data.
Returns int Number of days with at least one log on that day.
df = load_food_data('data/test_food_details.csv', h = 4)
logging_day_counts(df)
636

source

find_missing_logging_days

 find_missing_logging_days (df:pandas.core.frame.DataFrame,
                            start_date:datetime.date='not_defined',
                            end_date:datetime.date='not_defined')

Finds days that have no log entries between a start (inclusive) and end date (inclusive). It is recommended that you use find_date to generate the necessary date column for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data.
start_date datetime.date not_defined Starting date for missing day evaluation. By default the earliest date in the data will be used.
end_date datetime.date not_defined Ending date for missing day evaluation. By default the latest date in the data will be used.
Returns list List of days within the given timeframe that have no log entries.

The string ‘not_defined’ is the intended default value for the start and end dates, signifying that the earliest and/or latest date within the data should be used. If a participant is missing a valid start or end date, null is returned.

df = load_food_data('data/test_food_details.csv', h = 4)
find_missing_logging_days(df, datetime.date(2017, 12, 7), datetime.date(2017, 12, 10))
[datetime.date(2017, 12, 7),
 datetime.date(2017, 12, 9),
 datetime.date(2017, 12, 10)]
df[df['date'].astype(str).str.contains("2017-12")]
ID unique_code research_info_id desc_text food_type original_logtime date float_time time week_from_start year
0 7572733 alqt14018795225 150 Water w 2017-12-08 17:30:00+00:00 2017-12-08 17.500000 17:30:00 1 2017
1 411111 alqt14018795225 150 Coffee White b 2017-12-09 00:01:00+00:00 2017-12-08 24.016667 00:01:00 1 2017
2 8409118 alqt14018795225 150 Salad f 2017-12-09 00:58:00+00:00 2017-12-08 24.966667 00:58:00 1 2017
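The search can be sketched with the standard library and pandas (illustrative, not the library's implementation): build the inclusive start-to-end range and keep the days that never appear in the ‘date’ column.

```python
import datetime
import pandas as pd

# Sketch of the missing-day search (illustrative): the days in the inclusive
# range that never appear among the logged dates.
logged = {datetime.date(2017, 12, 8)}  # dates present in the data
start, end = datetime.date(2017, 12, 7), datetime.date(2017, 12, 10)
all_days = pd.date_range(start, end).date
missing = [d for d in all_days if d not in logged]
print(missing)
# [datetime.date(2017, 12, 7), datetime.date(2017, 12, 9), datetime.date(2017, 12, 10)]
```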

source

good_lwa_day_counts

 good_lwa_day_counts (df:pandas.core.frame.DataFrame,
                      window_start:datetime.time,
                      window_end:datetime.time, min_log_num:int=2,
                      min_separation:int=5, buffer_time:str='15 minutes',
                      h:int=4, start_date:datetime.date='not_defined',
                      end_date:datetime.date='not_defined',
                      time_col:int=7)

Calculates the number of ‘good’ logging days, ‘good’ window days, ‘outside’ window days and adherent days.

Type Default Details
df pd.DataFrame Dataframe of food logging data.
window_start datetime.time Starting time for a time restriction window.
window_end datetime.time Ending time for a time restriction window.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 5 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
buffer_time str 15 minutes pd.Timedelta parsable string, representing ‘wiggle room’ for adherence.
h int 4 Number of hours to shift the definition of ‘date’ by. h = 4 would indicate that a log date begins at
4:00 AM and ends the following calendar day at 3:59:59 AM. Float representations of time would therefore
go from 4.0 (inclusive) to 28.0 (exclusive) to represent ‘date’ membership for days shifted from their
original calendar date.
start_date datetime.date not_defined Starting date for missing day evaluation. By default the earliest date in the data will be used.
end_date datetime.date not_defined Ending date for missing day evaluation. By default the latest date in the data will be used.
time_col int 7 Column number for an existing time column in provided data source.
Returns tuple[list, list] List containing number of ‘good’ logging days, ‘good’ window days, ‘outside’ window days, and adherent days.
List of three lists, containing the dates that are not considered ‘good’ logging days, ‘good’ window days,
or adherent days (in that order).

The main use of this function is to calculate window and logging adherence. These are represented as ‘good’ (valid) logging days, ‘good’ window days, ‘outside’ (invalid) window days, and adherent days.

The definition of each is:

  1. ‘Good’ Logging Day

    • A day with at least a specified minimum number of caloric (food or beverage) logs with a minimum specified number of hours between the first and last log for that day.
  2. ‘Good’ Window Day

    • A day where all food loggings are within the participant’s assigned eating restriction window plus any wiggle room, if allowed.
  3. Adherent Day

    • A day that is both a ‘good’ logging day and a ‘good’ window day.
df = load_food_data('data/test_food_details.csv', h = 4)
dates, bad_dates = good_lwa_day_counts(df, datetime.time(8,0,0), datetime.time(23,59,59))
dates

The second output of this function is three lists that identify which days fail one of the definitions above. The first list (index 0) contains dates that are not ‘good’ logging days, and the second contains dates that are not ‘good’ window days. The final list contains dates that are not adherent (neither ‘good’ window nor ‘good’ logging dates).

bad_dates[0][:5]
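The ‘good window day’ part of the rule can be sketched on toy values (illustrative, not the library's implementation): a day qualifies when every caloric log falls inside the assigned window, widened on both sides by the buffer time.

```python
import datetime
import pandas as pd

# Sketch of the 'good window day' check (illustrative, not the library code).
def to_float_hours(t: datetime.time) -> float:
    return t.hour + t.minute / 60 + t.second / 3600

window_start = datetime.time(8, 0, 0)
window_end = datetime.time(23, 59, 59)
buffer_hours = pd.Timedelta("15 minutes").total_seconds() / 3600  # 0.25 hours

lo = to_float_hours(window_start) - buffer_hours  # widened lower bound
hi = to_float_hours(window_end) + buffer_hours    # widened upper bound

log_times = pd.Series([7.9, 12.0, 22.5])  # float-hour caloric logs for one day
print(bool(log_times.between(lo, hi).all()))  # True: 7.9 falls inside the buffer
```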

Experiment Design

This group of functions provides methods for filtering participant data.


source

filtering_usable_data

 filtering_usable_data (df:pandas.core.frame.DataFrame, num_items:int,
                        num_days:int, identifier:int=1, date_col:int=6)

Filters data to include only participants whose data satisfy the minimum numbers of days and logs. It is recommended that you use find_date to generate the necessary date column for this function.

Type Default Details
df pd.DataFrame Dataframe of food logging data. A column ‘desc_text’, typically found in mCC data
is required.
num_items int Minimum number of logs required to pass filter criteria.
num_days int Minimum number of unique logging days required to pass filter criteria.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
Returns tuple[pd.DataFrame, set] Data filtered to only include data from participants that have passed filtering criteria.
Set of participants that passed filtering criteria.
df = file_loader('data/output/public.json')
filtering_usable_data(df, num_items = 1000, num_days = 14)[0].shape
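The filtering criterion amounts to a groupby over participants, keeping those with enough logs and enough unique logging days. A minimal sketch on a toy dataframe (the column names 'ID' and 'date' are assumptions for illustration, not the library's defaults):

```python
import pandas as pd

def filter_usable(df, num_items, num_days, id_col='ID', date_col='date'):
    """Keep participants with at least num_items logs AND at least
    num_days unique logging dates. A sketch of the filter criterion."""
    stats = df.groupby(id_col).agg(items=(date_col, 'size'),
                                   days=(date_col, 'nunique'))
    passed = set(stats[(stats['items'] >= num_items)
                       & (stats['days'] >= num_days)].index)
    return df[df[id_col].isin(passed)], passed

toy = pd.DataFrame({'ID': ['a'] * 3 + ['b'] * 2,
                    'date': ['d1', 'd2', 'd2', 'd1', 'd1']})
# 'a' has 3 logs over 2 days and passes; 'b' has 2 logs over 1 day.
filtered, who = filter_usable(toy, num_items=3, num_days=2)
```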

source

prepare_baseline_and_intervention_usable_data

 prepare_baseline_and_intervention_usable_data
                                                (data_source:Union[str,pan
                                                das.core.frame.DataFrame],
                                                baseline_num_items:int,
                                                baseline_num_days:int, int
                                                ervention_num_items:int,
                                                intervention_num_days:int,
                                                identifier:int=1,
                                                date_col:int=6)

Filters for ‘usable’ data within the baseline period and the last two weeks of intervention (weeks 13 and 14). It is recommended that you use the week_from_start function to generate the necessary week column for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
baseline_num_items int Number of logs for a participant’s baseline data to pass filter criteria.
baseline_num_days int Number of unique logging days for a participant’s baseline data to pass filter criteria.
intervention_num_items int Number of logs for a participant’s intervention data to pass filter criteria.
intervention_num_days int Number of unique logging days for a participant’s intervention data to pass filter criteria.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
Returns list List of two dataframes: usable baseline data, usable intervention data.
df = prepare_baseline_and_intervention_usable_data('data/output/public.json', 20, 10, 40, 12)[0]
df.head(2)
df.shape

Analysis and Data Summaries

Data analysis and summary functions, including summary functions for specific statistics.


source

users_sorted_by_logging

 users_sorted_by_logging
                          (data_source:Union[str,pandas.core.frame.DataFra
                          me], food_type:list=['f', 'b', 'm', 'w'],
                          min_log_num:int=2, min_separation:int=4,
                          identifier:int=1, date_col:int=6,
                          time_col:int=7)

Reports the number of ‘good’ logging days for each user, in descending order based on number of ‘good’ logging days.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
food_type list [‘f’, ‘b’, ‘m’, ‘w’] A single food type, or list of food types. Valid types are ‘f’: food, ‘b’: beverage,
‘w’: water, and ‘m’: medication.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns pd.DataFrame Dataframe containing the number of good logging days for each user.
users_sorted_by_logging('data/output/public.json', ['f','b']).head(2)

source

eating_intervals_percentile

 eating_intervals_percentile
                              (data_source:Union[str,pandas.core.frame.Dat
                              aFrame], identifier:int=1, time_col:int=7)

Calculates the 2.5, 5, 10, 12.5, 25, 50, 75, 87.5, 90, 95, and 97.5 percentile eating times for each participant. It also calculates the middle 95, 90, 80, 75, and 50 percentile eating windows for each participant. It is recommended that you use find_float_time to generate the necessary time column for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
time_col int 7 Column number for an existing time column in provided data source.
Returns pd.DataFrame Dataframe with count, mean, std, min, quantiles and mid XX%tile eating window durations for all participants.
df = load_food_data('data/test_food_details.csv', h = 4)
eating_intervals_percentile(df).iloc[:2]
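The ‘middle XX percentile’ windows are the spreads between symmetric quantiles of a participant’s float log times (hours). A small sketch using pandas’ default linear quantile interpolation, on made-up times:

```python
import pandas as pd

# Float log times (hours) for one hypothetical participant.
times = pd.Series([7.5, 8.0, 12.25, 13.0, 18.5, 20.0, 21.75])

# Middle 95% eating window: spread between the 2.5th and 97.5th percentiles.
mid95 = times.quantile(0.975) - times.quantile(0.025)
# Middle 50% eating window: spread between the 25th and 75th percentiles.
mid50 = times.quantile(0.75) - times.quantile(0.25)
```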

source

first_cal_analysis_summary

 first_cal_analysis_summary
                             (data_source:Union[str,pandas.core.frame.Data
                             Frame], min_log_num:int=2,
                             min_separation:int=4, identifier:int=1,
                             date_col:int=6, time_col:int=7)

Calculates the 5, 10, 25, 50, 75, 90, and 95 percentiles of first caloric entry time for each participant on ‘good’ logging days. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns pd.DataFrame Dataframe with 5, 10, 25, 50, 75, 90, 95 percentile of first caloric entry time for all participants.
first_cal_analysis_summary('data/output/baseline.json').head(3)

source

last_cal_analysis_summary

 last_cal_analysis_summary
                            (data_source:Union[str,pandas.core.frame.DataF
                            rame], min_log_num:int=2,
                            min_separation:int=4, identifier:int=1,
                            date_col:int=6, time_col:int=7)

Calculates the 5, 10, 25, 50, 75, 90, and 95 percentiles of last caloric entry time for each participant on ‘good’ logging days. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns pd.DataFrame Dataframe with 5, 10, 25, 50, 75, 90, 95 percentile of last caloric entry time for all participants.
last_cal_analysis_summary('data/output/baseline.json').head(3)

source

summarize_data

 summarize_data (data_source:Union[str,pandas.core.frame.DataFrame],
                 min_log_num:int=2, min_separation:int=4,
                 identifier:int=1, date_col:int=6, time_col:int=7)

Summarizes participant data, including number of days, total number of logs, number of food/beverage logs, number of medication logs, number of water logs, eating window duration information, first and last caloric log information, and adherence.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column (with indexing starting from 0).
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns pd.DataFrame Summary dataframe.

This function provides summary data for an entire study, without separating for study phases. Summaries include statistics for first and last caloric log, eating window, and relevant calculations for middle 95 percentile eating window.

df = load_food_data('data/test_food_details.csv', h = 4)
summarize_data(df)

source

summarize_data_with_experiment_phases

 summarize_data_with_experiment_phases
                                        (food_data:pandas.core.frame.DataF
                                        rame, ref_tbl:pandas.core.frame.Da
                                        taFrame, min_log_num:int=2,
                                        min_separation:int=5,
                                        buffer_time:str='15 minutes',
                                        h:int=4, report_level:int=2,
                                        txt:bool=False)

Summarizes participant data for each experiment phase and eating window assignment. Summary includes number of days, total number of logs, number of food/beverage logs, number of medication logs, number of water logs, eating window duration information, first and last caloric log information, and adherence.

Type Default Details
food_data pd.DataFrame Dataframe of food logging data. A column for “original_logtime” must exist within the data. mCC output style
data is expected.
ref_tbl pd.DataFrame Participant data reference table. See the accompanying HOWTO document for required column positions and
formatting.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 5 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
buffer_time str 15 minutes pd.Timedelta parsable string, representing ‘wiggle room’ for adherence.
h int 4 Number of hours to shift the definition of ‘date’ by. h = 4 would indicate that a log date begins at
4:00 AM and ends the following calendar day at 3:59:59. Float representations of time would therefore
go from 4.0 (inclusive) to 28.0 (exclusive) to represent ‘date’ membership for days shifted from their
original calendar date.
report_level int 2 Additional printed info detail level. 0 = No Report. 1 = Report ‘No Logging Days’.
2 = Report ‘No Logging Days’, ‘Bad Logging Days’, ‘Bad Window Days’, and ‘Non-Adherent Days’.
txt bool False If True, a text format (.txt) report will be saved in the current directory, with the name
‘treets_warning_dates.txt’
Returns pd.DataFrame Summary dataframe, where each row represents the summary for a participant during a particular
study phase. Participants can have multiple rows for a single study phase if, during that study phase,
their assigned eating window is altered.
df = summarize_data_with_experiment_phases(pd.read_csv('data/col_test_data/toy_data_2000.csv'),
                                           pd.read_excel('data/col_test_data/toy_data_17May2021.xlsx'),
                                           report_level = 2)
df.T
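The h-hour day shift described for the h parameter can be sketched as follows (the helper name is ours; find_date and find_float_time generate the corresponding columns in the library):

```python
import pandas as pd

def shifted_date_and_float_time(ts, h=4):
    """Sketch of the h-hour day shift: logs before h:00 AM count toward
    the previous calendar date, and their float time runs past 24
    (e.g. with h = 4, 2:30 AM becomes 26.5). Helper name is ours."""
    float_time = ts.hour + ts.minute / 60 + ts.second / 3600
    if float_time < h:
        float_time += 24                          # map into [h, h + 24)
        date = (ts - pd.Timedelta(days=1)).date() # previous calendar day
    else:
        date = ts.date()
    return date, float_time

shifted_date_and_float_time(pd.Timestamp('2021-05-12 02:30:00'), h=4)
```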

Plots

Plotting functions.


source

first_cal_mean_with_error_bar

 first_cal_mean_with_error_bar
                                (data_source:Union[str,pandas.core.frame.D
                                ataFrame], min_log_num:int=2,
                                min_separation:int=4, identifier:int=1,
                                date_col:int=6, time_col:int=7)

Represents mean and standard deviation of first caloric intake time for each participant as a scatter plot, with participants as the x-axis and time as the y-axis. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
first_cal_mean_fig = first_cal_mean_with_error_bar('data/output/baseline.json')
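The underlying plot is a per-participant mean with standard-deviation error bars. A minimal sketch on a made-up dataframe (column names are ours, for illustration only):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Made-up first caloric log times (hours) for two participants.
first_cal = pd.DataFrame({'PID': ['p1', 'p1', 'p2', 'p2'],
                          'first_cal_hour': [8.0, 9.0, 7.0, 11.0]})

# Per-participant mean and standard deviation of first-cal time.
stats = first_cal.groupby('PID')['first_cal_hour'].agg(['mean', 'std'])

fig, ax = plt.subplots()
ax.errorbar(stats.index, stats['mean'], yerr=stats['std'], fmt='o')
ax.set_xlabel('Participant')
ax.set_ylabel('First caloric log (hour)')
```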

source

last_cal_mean_with_error_bar

 last_cal_mean_with_error_bar
                               (data_source:Union[str,pandas.core.frame.Da
                               taFrame], min_log_num:int=2,
                               min_separation:int=4, identifier:int=1,
                               date_col:int=6, time_col:int=7)

Represents the mean and standard deviation of last caloric intake time for each participant as a scatter plot, with participants as the x-axis and time as the y-axis. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
last_cal_mean_fig = last_cal_mean_with_error_bar('data/output/baseline.json')

source

first_cal_analysis_variability_plot

 first_cal_analysis_variability_plot
                                      (data_source:Union[str,pandas.core.f
                                      rame.DataFrame], min_log_num:int=2,
                                      min_separation:int=4,
                                      identifier:int=1, date_col:int=6,
                                      time_col:int=7)

Calculates first caloric log time variability on ‘good’ logging days by subtracting the 5, 10, 25, 50, 75, 90, and 95 percentiles of first caloric intake time from the 50th percentile of first caloric intake time. It also produces a histogram of the 90%–10% interval across all participants. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
first_cal_var_plot = first_cal_analysis_variability_plot('data/output/baseline.json')

source

last_cal_analysis_variability_plot

 last_cal_analysis_variability_plot
                                     (data_source:Union[str,pandas.core.fr
                                     ame.DataFrame], min_log_num:int=2,
                                     min_separation:int=4,
                                     identifier:int=1, date_col:int=6,
                                     time_col:int=7)

Calculates last caloric log time variability on ‘good’ logging days by subtracting the 5, 10, 25, 50, 75, 90, and 95 percentiles of last caloric intake time from the 50th percentile of last caloric intake time. It also produces a histogram of the 90%–10% interval across all participants. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
min_log_num int 2 Minimum number of logs required for a day to be considered a ‘good’ logging day.
min_separation int 4 Minimum number of hours between first and last log on a log day for it to be considered a ‘good’ logging day.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
last_cal_var_plot = last_cal_analysis_variability_plot('data/output/baseline.json')

source

first_cal_avg_histplot

 first_cal_avg_histplot
                         (data_source:Union[str,pandas.core.frame.DataFram
                         e], identifier:int=1, date_col:int=6,
                         time_col:int=7)

Plots a histogram of average first caloric intake for all participants. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
first_cal_avg_plot = first_cal_avg_histplot('data/output/baseline.json')

source

first_cal_sample_distplot

 first_cal_sample_distplot
                            (data_source:Union[str,pandas.core.frame.DataF
                            rame], n:int, replace:bool=False,
                            identifier:int=1, date_col:int=6,
                            time_col:int=7)

Creates a distplot of first caloric intake times for a random selection of n participants. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
n int Number of participants to randomly select and plot. Sampling is without replacement unless replace is True.
replace bool False If true, samples with replacement. Samples without replacement by default.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
first_cal_distplot = first_cal_sample_distplot('data/output/intervention.json', n = 5, replace = False)

source

last_cal_avg_histplot

 last_cal_avg_histplot
                        (data_source:Union[str,pandas.core.frame.DataFrame
                        ], identifier:int=1, date_col:int=6,
                        time_col:int=7)

Plots a histogram of average last caloric intake for all participants. It is recommended that you use find_date and find_float_time to generate necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
last_cal_avg_hist = last_cal_avg_histplot('data/output/baseline.json')

source

last_cal_sample_distplot

 last_cal_sample_distplot
                           (data_source:Union[str,pandas.core.frame.DataFr
                           ame], n:int, replace:bool=False,
                           identifier:int=1, date_col:int=6,
                           time_col:int=7)

Creates a distplot of last caloric intake times for a random selection of n participants. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
n int Number of participants to randomly select and plot. Sampling is without replacement unless replace is True.
replace bool False If true, samples with replacement. Samples without replacement by default.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
last_cal_distplot = last_cal_sample_distplot('data/output/intervention.json', n = 5, replace=False)

source

swarmplot

 swarmplot (data_source:Union[str,pandas.core.frame.DataFrame],
            max_loggings:int, identifier:int=1, date_col:int=6,
            time_col:int=7)

Creates a swarmplot of participants’ logging data. It is recommended that you use find_date and find_float_time to generate the necessary date and time columns for this function.

Type Default Details
data_source str | pd.DataFrame String file or folder path. Single .json or .csv paths create a pd.DataFrame.
Folder paths with files matching the input pattern are read together into a single pd.DataFrame. Existing
dataframes are read as is. Must have a column for ‘food_type’ within the data.
max_loggings int Maximum number of randomly selected logs to be plotted for each participant.
identifier int 1 Column number for an existing unique identifier column in provided data source. Data exported from mCC typically
has a unique identifier as its 1st column.
date_col int 6 Column number for an existing date column in provided data source.
time_col int 7 Column number for an existing time column in provided data source.
Returns matplotlib.figure.Figure Matplotlib figure object.
swarm = swarmplot('data/output/public.json', max_loggings = 20)