{"id":82,"date":"2022-03-05T15:37:57","date_gmt":"2022-03-05T15:37:57","guid":{"rendered":"https:\/\/aida.unicas.it\/icprchallenge2022\/?page_id=82"},"modified":"2022-04-25T09:49:46","modified_gmt":"2022-04-25T09:49:46","slug":"dataset","status":"publish","type":"page","link":"https:\/\/aida.unicas.it\/icprchallenge2022\/dataset\/","title":{"rendered":"Dataset &#038; Evaluation"},"content":{"rendered":"<p lang=\"en-US\">All acquisitions were made between 2019 and 2021 with different sensors and in two different laboratories (Polish, Italian).<\/p>\n<p align=\"justify\"><span lang=\"en-US\">The dataset contains measures <\/span><span lang=\"en-US\">obtained on <\/span><span lang=\"en-US\">1<\/span><span lang=\"en-US\">0<\/span><span lang=\"en-US\"> substances plus the background (the <\/span><span lang=\"en-US\">w<\/span><span lang=\"en-US\">aste<\/span><span lang=\"en-US\">w<\/span><span lang=\"en-US\">ater \u2013 WW):<\/span><\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"373\" src=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table2-1-1024x373.png\" alt=\"\" class=\"wp-image-153\" srcset=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table2-1-1024x373.png 1024w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table2-1-300x109.png 300w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table2-1-768x279.png 768w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table2-1.png 1330w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n<p lang=\"en-US\" align=\"justify\">The measurement protocol has been divided into two steps:<\/p>\n<ul>\n<li lang=\"en-US\"><b>6<\/b><b>00<\/b> samples are acquired in warm-up mode, in this period of time sensors are exposed to WW only;<\/li>\n<li lang=\"en-US\"><b>1000<\/b> samples are acquired after 
analyte injection.<\/li>\n<\/ul>\n<p lang=\"en-US\" align=\"justify\">In this way, each acquisition contains 1600 samples: the first 600 samples are measured in WW and the remaining 1000 samples are measured with the analyte mixed into the WW. For the 10 acquisitions of WW, no substance is injected during the entire acquisition (1600 samples).<\/p>\n<p lang=\"en-US\" align=\"justify\">The acquisition of each sample requires 1.6 seconds; consequently, each cycle is about 40 minutes long.<\/p>\n<p lang=\"en-US\" align=\"justify\">The dataset consists of 10 data acquisitions for each of the <i>11<\/i> substances (10<i> <\/i>pollutants plus WW) carried out with the measurement protocol previously described.<\/p>\n<p lang=\"en-US\" align=\"justify\">The following figure shows an example of the signals measured for different substances (ACETONE, HYDROCHLORICACID &#8211; HCL, etc.) at different frequencies (200 Hz, 78 kHz) in the case of PLATINUM IDE.<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"564\" src=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/figure3-1024x564.png\" alt=\"\" class=\"wp-image-152\" srcset=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/figure3-1024x564.png 1024w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/figure3-300x165.png 300w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/figure3-768x423.png 768w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/figure3.png 1367w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n<p lang=\"en-US\">The step in the middle of each acquisition corresponds to the injection of the substance. 
Let&#8217;s make some observations about the temporal trend of the measurements:<\/p>\n<ul>\n<li>\n<p lang=\"en-US\">often, the value of the baseline (the starting point of the curve) changes between different acquisitions: this phenomenon is partially related to sensor poisoning, aging, and differences between sensors;<\/p>\n<\/li>\n<li>\n<p lang=\"en-US\">the shape of the curves after the injection time can differ greatly between experiments;<\/p>\n<\/li>\n<li>\n<p lang=\"en-US\">the curve should not be interpreted as a \u201ctime series\u201d, because the slope after the injection, and its shape in general, depends strongly on the speed of injection of the substance, which in a real scenario could vary widely.<\/p>\n<\/li>\n<\/ul>\n\n\n<h2 class=\"wp-block-heading\">Training and test set<\/h2>\n\n\n<p lang=\"en-US\">For each substance, 9 acquisitions have been added to the training set and 1 has been added to the test set. The following table reports the number of samples for each substance in the training and the test set:<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"371\" src=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table3-1-1024x371.png\" alt=\"\" class=\"wp-image-151\" srcset=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table3-1-1024x371.png 1024w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table3-1-300x109.png 300w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table3-1-768x279.png 768w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table3-1.png 1329w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n<p lang=\"en-US\">In particular, for wastewater, in the training set there are:<\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>\n<p><span lang=\"en-US\">600 samples 
<\/span><span lang=\"en-US\">coming from warm-up for each of the <\/span><span lang=\"en-US\">10 substances and for each of the 9 acquisitions; <\/span><\/p>\n<\/li>\n<li>\n<p><span lang=\"en-US\">1<\/span><span lang=\"en-US\">600 samples coming from 9 acquisitions made only in wastewater<\/span><\/p>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p lang=\"en-US\">while in the test set there are:<\/p>\n<ul>\n<li>\n<p><span lang=\"en-US\">600 samples coming from warm-up for each of the 10 substances <\/span><span lang=\"en-US\">and from <\/span><span lang=\"en-US\">1 acquisition.<\/span><\/p>\n<\/li>\n<li>\n<p><span lang=\"en-US\">1<\/span><span lang=\"en-US\">600 samples coming from <\/span><span lang=\"en-US\">1<\/span><span lang=\"en-US\"> acquisition made only in wastewater.<\/span><\/p>\n<\/li>\n<\/ul>\n\n\n<h2 class=\"wp-block-heading\">Data structure<\/h2>\n\n\n<p lang=\"en-US\">Each acquisition is contained in a .csv file where the name contains:<\/p>\n<p lang=\"en-US\">1_Experiment_19-11-2019_16-15_GLOBAL_L1_7N6_SWW_ACETICACID_1_ADC.csv<\/p>\n<ul>\n<li>\n<p lang=\"en-US\">The number of the experiment: 1_Experiment<\/p>\n<\/li>\n<li>\n<p lang=\"en-US\">Date and hour of the acquisition: 19-11-2019_16-15<\/p>\n<\/li>\n<li>\n<p lang=\"en-US\">Id of the adopted sensor: L1_7N6<\/p>\n<\/li>\n<li>\n<p lang=\"en-US\">Substance: ACETICACID (this name will not be present in the test set!)<\/p>\n<\/li>\n<li>\n<p lang=\"en-US\">Data type: ADC =&gt; integer values (Analog to Digital Converter values)<\/p>\n<\/li>\n<\/ul>\n<p lang=\"en-US\">and the data column follows the structure reported in the following table:<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"377\" src=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table1-1024x377.png\" alt=\"\" class=\"wp-image-77\" srcset=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table1-1024x377.png 1024w, 
https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table1-300x111.png 300w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table1-768x283.png 768w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table1-1536x566.png 1536w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/table1.png 1568w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Table 2<\/figcaption><\/figure>\n\n\n<p><span lang=\"en-US\">The training data are contained <\/span><span lang=\"en-US\">in a folder that <\/span><span lang=\"en-US\">contain<\/span><span lang=\"en-US\">s<\/span><span lang=\"en-US\"> a sub-folder for each substance. The folder of a single substance contains 9 files, one for each acquisition.<\/span><\/p>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"885\" height=\"767\" src=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/filesystem1.png\" alt=\"\" class=\"wp-image-87\" srcset=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/filesystem1.png 885w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/filesystem1-300x260.png 300w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/filesystem1-768x666.png 768w\" sizes=\"auto, (max-width: 885px) 100vw, 885px\" \/><\/figure>\n\n\n<p lang=\"en-US\">The test data (which will be hidden from the challengers) are contained in a folder that contains a sub-folder for each substance. 
The folder of a single substance contains 1 file (a single acquisition).<\/p>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"972\" height=\"783\" src=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/testFS.png\" alt=\"\" class=\"wp-image-143\" srcset=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/testFS.png 972w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/testFS-300x242.png 300w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/testFS-768x619.png 768w\" sizes=\"auto, (max-width: 972px) 100vw, 972px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Example of a csv file for training<\/h2>\n\n\n\n<p>This is an example of a CSV file for training:<\/p>\n\n\n\n<div class=\"wp-block-file\"><a id=\"wp-block-file--media-1158c66f-787e-4c61-81c5-428e17338362\" href=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/04\/1_Experiment_19-11-2019_16-15_GLOBAL_L1_7N6_SWW_ACETICACID_1_ADC.csv\">1_Experiment_19-11-2019_16-15_GLOBAL_L1_7N6_SWW_ACETICACID_1_ADC<\/a><a href=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/04\/1_Experiment_19-11-2019_16-15_GLOBAL_L1_7N6_SWW_ACETICACID_1_ADC.csv\" class=\"wp-block-file__button\" download aria-describedby=\"wp-block-file--media-1158c66f-787e-4c61-81c5-428e17338362\">Download<\/a><\/div>\n\n\n<p>Each row represents a sample containing a value for all the features (as reported in Table 2). Each column represents the trend over time of a single feature. Please remember that the first 600 rows are measured in wastewater, while the remaining 1000 rows are measured in the current substance (ACETICACID in this example, as reported in the filename). 
The total number of rows for each file is equal to 1600 and the substance has been injected at the timestamp 600.<\/p>\n\n\n<h2 class=\"wp-block-heading\">Example of a CSV file for test<\/h2>\n\n\n<p>This is an example of a CSV file for test. In this case, the file contains only the features, and the filename gives no hint about the measured substance.<\/p>\n\n\n<div class=\"wp-block-file\"><a id=\"wp-block-file--media-4cc57a55-8d47-4c18-a91e-18eab9a18d3c\" href=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/04\/10_Experiment.csv\">10_Experiment<\/a><a href=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/04\/10_Experiment.csv\" class=\"wp-block-file__button\" download aria-describedby=\"wp-block-file--media-4cc57a55-8d47-4c18-a91e-18eab9a18d3c\">Download<\/a><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Downloads of the entire training set<\/h2>\n\n\n\n<div class=\"wp-block-file\"><a id=\"wp-block-file--media-3e8818bf-d252-4efd-94d0-a0c3309bcdd9\" href=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/04\/Train.zip\">Training set<\/a><a href=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/04\/Train.zip\" class=\"wp-block-file__button\" download aria-describedby=\"wp-block-file--media-3e8818bf-d252-4efd-94d0-a0c3309bcdd9\">Download<\/a><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Downloads of the entire test set<\/h2>\n\n\n\n<div class=\"wp-block-file\"><a id=\"wp-block-file--media-3e8818bf-d252-4efd-94d0-a0c3309bcdd9\" href=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/04\/Test.zip\">Test set<\/a><a href=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/04\/Test.zip\" class=\"wp-block-file__button\" download aria-describedby=\"wp-block-file--media-3e8818bf-d252-4efd-94d0-a0c3309bcdd9\">Download<\/a><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Evaluation metrics<\/h2>\n\n\n<p 
lang=\"en-US\" align=\"justify\">For the detection task, the Matthews Correlation Coefficient <b>MCC<\/b> for multi-class classification (also called Rk statistic) evaluated on the test set will be the <b>metric used to rank the submissions<\/b>.<\/p>\n<p lang=\"en-US\" align=\"justify\">Starting from these definitions:<\/p>\n<p lang=\"en-US\" align=\"justify\"><b>True positive (TP) &#8211; <\/b>The number of correctly identified samples. The number of samples measured in the presence of one of the 10 substances of interest.<\/p>\n<p lang=\"en-US\" align=\"justify\"><b>True negative (TN) &#8211; <\/b>The number of correctly identified negative samples, i.e., samples measured in wastewater.<\/p>\n<p lang=\"en-US\" align=\"justify\"><b>False positive (FP) &#8211; <\/b>The number of wrongly identified samples, commonly called a &#8220;false alarm&#8221;. The number of samples classified as one of the 10 substances of interest but measured in wastewater.<\/p>\n<p lang=\"en-US\" align=\"justify\"><b>False negative (FN) &#8211; <\/b>The number of wrongly identified negative samples. 
The number of samples classified as wastewater but measured in the presence of one of the 10 substances of interest.<\/p>\n<p lang=\"en-US\" align=\"justify\">The <b>MCC <\/b>is<b>:<\/b><\/p>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"951\" height=\"122\" src=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/mcc.png\" alt=\"\" class=\"wp-image-116\" srcset=\"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/mcc.png 951w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/mcc-300x38.png 300w, https:\/\/aida.unicas.it\/icprchallenge2022\/wp-content\/uploads\/2022\/03\/mcc-768x99.png 768w\" sizes=\"auto, (max-width: 951px) 100vw, 951px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>All acquisitions were made between 2019 and 2021 with different sensors and in two different laboratories (Polish, Italian). The dataset contains measures obtained on 10 substances plus the background (the wastewater \u2013 WW): The measurement protocol has been divided into two steps: 600 samples are acquired in warm-up mode, in this period of time sensors&hellip;&nbsp;<a href=\"https:\/\/aida.unicas.it\/icprchallenge2022\/dataset\/\" rel=\"bookmark\">Leggi tutto &raquo;<span class=\"screen-reader-text\">Dataset &#038; 
Evaluation<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","footnotes":""},"class_list":["post-82","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-json\/wp\/v2\/pages\/82","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-json\/wp\/v2\/comments?post=82"}],"version-history":[{"count":19,"href":"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-json\/wp\/v2\/pages\/82\/revisions"}],"predecessor-version":[{"id":197,"href":"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-json\/wp\/v2\/pages\/82\/revisions\/197"}],"wp:attachment":[{"href":"https:\/\/aida.unicas.it\/icprchallenge2022\/wp-json\/wp\/v2\/media?parent=82"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}