Date/time examples¶
How can I filter by date?¶
Given input like
cat dates.csv
date,event 2018-02-03,initialization 2018-03-07,discovery 2018-02-03,allocation
we can use strptime to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the strptime format-string. For example:
mlr --csv filter ' strptime($date, "%Y-%m-%d") > strptime("2018-03-03", "%Y-%m-%d") ' dates.csv
date,event 2018-03-07,discovery
Caveat: localtime-handling in timezones with DST is still a work in progress; see https://github.com/johnkerl/miller/issues/170 . See also https://github.com/johnkerl/miller/issues/208 -- thanks @aborruso!
Finding missing dates¶
Suppose you have some date-stamped data which may (or may not) be missing entries for one or more dates:
head -n 10 data/miss-date.csv
date,qoh 2012-03-05,10055 2012-03-06,10486 2012-03-07,10430 2012-03-08,10674 2012-03-09,10880 2012-03-10,10718 2012-03-11,10795 2012-03-12,11043 2012-03-13,11177
wc -l data/miss-date.csv
1372 data/miss-date.csv
Since there are 1372 lines in the data file, some automation is called for. To find the missing dates, you can convert the dates to seconds since the epoch using strptime
, then compute adjacent differences (the cat -n
simply inserts record-counters):
mlr --from data/miss-date.csv --icsv \ cat -n \ then put '$datestamp = strptime($date, "%Y-%m-%d")' \ then step -a delta -f datestamp \ | head
n=1,date=2012-03-05,qoh=10055,datestamp=1330905600,datestamp_delta=0 n=2,date=2012-03-06,qoh=10486,datestamp=1330992000,datestamp_delta=86400 n=3,date=2012-03-07,qoh=10430,datestamp=1331078400,datestamp_delta=86400 n=4,date=2012-03-08,qoh=10674,datestamp=1331164800,datestamp_delta=86400 n=5,date=2012-03-09,qoh=10880,datestamp=1331251200,datestamp_delta=86400 n=6,date=2012-03-10,qoh=10718,datestamp=1331337600,datestamp_delta=86400 n=7,date=2012-03-11,qoh=10795,datestamp=1331424000,datestamp_delta=86400 n=8,date=2012-03-12,qoh=11043,datestamp=1331510400,datestamp_delta=86400 n=9,date=2012-03-13,qoh=11177,datestamp=1331596800,datestamp_delta=86400 n=10,date=2012-03-14,qoh=11498,datestamp=1331683200,datestamp_delta=86400
Then, filter for adjacent difference not being 86400 (the number of seconds in a day):
mlr --from data/miss-date.csv --icsv \ cat -n \ then put '$datestamp = strptime($date, "%Y-%m-%d")' \ then step -a delta -f datestamp \ then filter '$datestamp_delta != 86400 && $n != 1'
n=774,date=2014-04-19,qoh=130140,datestamp=1397865600,datestamp_delta=259200 n=1119,date=2015-03-31,qoh=181625,datestamp=1427760000,datestamp_delta=172800
Given this, it's now easy to see where the gaps are:
mlr cat -n then filter '$n >= 770 && $n <= 780' data/miss-date.csv
n=770,1=2014-04-12,2=129435 n=771,1=2014-04-13,2=129868 n=772,1=2014-04-14,2=129797 n=773,1=2014-04-15,2=129919 n=774,1=2014-04-16,2=130181 n=775,1=2014-04-19,2=130140 n=776,1=2014-04-20,2=130271 n=777,1=2014-04-21,2=130368 n=778,1=2014-04-22,2=130368 n=779,1=2014-04-23,2=130849 n=780,1=2014-04-24,2=131026
mlr cat -n then filter '$n >= 1115 && $n <= 1125' data/miss-date.csv
n=1115,1=2015-03-25,2=181006 n=1116,1=2015-03-26,2=180995 n=1117,1=2015-03-27,2=181043 n=1118,1=2015-03-28,2=181112 n=1119,1=2015-03-29,2=181306 n=1120,1=2015-03-31,2=181625 n=1121,1=2015-04-01,2=181494 n=1122,1=2015-04-02,2=181718 n=1123,1=2015-04-03,2=181835 n=1124,1=2015-04-04,2=182104 n=1125,1=2015-04-05,2=182528