Difference: Real-BogusClassifications (12 vs. 13)

Revision 13, 2019-02-19 - UmaaRebbapragada

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Overview

Line: 25 to 25
 
t7_f4_c3 08 May 2018
t8_f5_c3 29 May 2018
t12_f5_c3 07 Aug 2018
Added:
>
>
t15_f5_c3 10 Jan 2019
 

To Do List

Deleted:
<
<

Issues

  • Why are candids from IPAC DB not found in alerts db?
  • Why is cross validation performance decreasing steadily since t8?
 

Pipeline Development

Changed:
<
<
  • Separate Galactic vs. Extra-Galactic
  • Get light curve observations from reals into training [ POSTPONED - no way to get all confirmed spectroscopic reals ]
  • Get an override label module (to override the majority vote)
>
>
  • Get light curve observations from reals into training vetted
 
  • Add cross matches from catalogs as a source (Nadia's from TNS)
  • Flag boguses with light curve observations greater than 2.
  • Catch variable stars with isdiffpos=True and false in the light curve
Deleted:
<
<
  • Kill very old objects (from before Feb 5, 2018)
 
  • Implement proper grid search over RF parameter space
Changed:
<
<
  • Get the testing framework into a Jupyter notebook
  • Get features from Kowalski vs. IPAC? Do same query against Kowalski and see if we recoup candidateIds
  • Pipeline Analysis:
    • Labeled Data Test Set * RB score improvements
>
>
  • Pipeline Analysis to automate:
    • Score improvement on known false positives and false negatives (Ragnhild's List)
    • KL divergence between training set and test set features (to find major divergence)
    • Plot feature distributions on reals vs. boguses (Tiara's code)
Changed:
<
<
* what do low RB reals look like, what do high RB bogus look like?
>
>
    • What do low RB reals look like, what do high RB bogus look like?
    • Unlabeled Data
      • Score bias per feature
      • Report feature importance by correlated feature groups
Line: 62 to 50
 
  • Active Learning to Improve Training Data Selection [ Sara ]
    • Use active learning to discover potential batches of boguses (and reals, alternatively)
  • Other Data Collection Sources (preferably automated)
Changed:
<
<
    • Find variables (bogus objects that have multiple alert packets, objects >= n_obs, objects with both positive and negative subtractions?)
>
>
    • Find variables (bogus objects that have multiple alert packets, objects >= n_obs, objects with both positive and negative subtractions?)
 
    • Automate cross matches from relevant catalogs (e.g., TNS)
  • Improving Quality of GROWTH Marshal Feed
    • ZTF objects provided in the GROWTH Marshal are not necessarily spectroscopically confirmed; they are only saved. Is my list spectroscopically confirmed? Email Ashot and Mani
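
The KL divergence item under Pipeline Analysis could be automated roughly as follows. This is a minimal sketch, not the pipeline's actual code: it assumes each feature is available as a plain list of numeric values per data set, bins both sets on shared equal-width bins, and applies add-one smoothing so empty bins do not blow up the logarithm. The function names, the bin count, and the feature name feature_x in the usage note are all hypothetical.

```python
import math


def histogram(values, edges):
    """Bin values on the given sorted edges and return smoothed probabilities."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # The last bin is closed on the right so the maximum value is counted.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    # Add-one smoothing (an assumption of this sketch) avoids log(0) and division by zero.
    total = sum(counts) + len(counts)
    return [(c + 1) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def feature_kl(train_values, test_values, n_bins=10):
    """KL divergence between training and test histograms of one feature."""
    lo = min(min(train_values), min(test_values))
    hi = max(max(train_values), max(test_values))
    if hi == lo:
        return 0.0  # a constant feature cannot diverge
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    return kl_divergence(histogram(train_values, edges),
                         histogram(test_values, edges))


def rank_features(train, test, n_bins=10):
    """Rank the features shared by both sets, most divergent first.

    train and test map a feature name to a list of values.
    """
    scores = {name: feature_kl(train[name], test[name], n_bins)
              for name in train if name in test}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, rank_features({"nbad": [0, 0, 1, 1, 2], "feature_x": [1.0, 1.1, 1.2, 1.3, 1.4]}, {"nbad": [0, 0, 1, 1, 2], "feature_x": [3.0, 3.1, 3.2, 3.3, 3.4]}) lists feature_x first, since its test values are shifted away from the training values while nbad is unchanged.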
Line: 71 to 59
 

Open Issues / Experiments

  • Kowalski / IPAC candidate discrepancies
Added:
>
>
  • Separate Galactic vs. Extra-Galactic classifiers
 
  • Pixel Clump issues on certain x,y positions (see Ashish's email to Umaa on 6/28/18)
  • Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)
  • Deep Learning
  • Known boguses: ZTF18aabtvch, ZTF18aaiafnn, ZTF18aaizvmy
Deleted:
<
<

Papers:

  • ML Overview (response to Pub Board comments)
  • RB Paper
 

DONE

  • 2018-08 Correlation between nbad and boguses Report Here
Changed:
<
<
>
>
  • 2018-10 combine_labels.py has an override switch, to override the majority vote for examples that have been revetted
  • 2018-11 Filter out old sources (from before Feb 5, 2018)
  • 2018-11 Get features from Kowalski vs. IPAC? Kowalski packets are NOT a superset of the IPAC DB features! An experiment limited to the intersection of Kowalski and IPAC DB features showed that classifier performance decreased. However, a workaround was found for the DB performance issues (Frank gave me a way to get nid, rcid from a candid)
  • 2018-12 Cross validation performance has been decreasing steadily since t8 due to persistent contamination within the GROWTH Marshal feed
  • 2019-01 Get the testing framework into a Jupyter notebook
  -- UmaaRebbapragada - 08 Aug 2018
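
The override switch noted in the 2018-10 entry can be illustrated with a small sketch. This is not the contents of combine_labels.py, which is not shown on this page. It assumes votes arrive as a list of "real"/"bogus" labels per candidate, that revetted examples are supplied as a separate mapping, and that ties are left unlabeled. All names here are hypothetical.

```python
from collections import Counter


def combine_votes(votes, overrides=None, use_override=True):
    """Combine per-candidate label votes into one label each.

    votes:      dict mapping a candidate id to a list of labels.
    overrides:  dict mapping a candidate id to a revetted label.
    use_override: when True, a revetted label replaces the majority vote.
    """
    overrides = overrides or {}
    combined = {}
    for candid, labels in votes.items():
        if use_override and candid in overrides:
            combined[candid] = overrides[candid]
            continue
        if not labels:
            combined[candid] = None  # nothing to vote on
            continue
        ranked = Counter(labels).most_common()
        # A tie between the top two labels is left unlabeled (an assumption
        # of this sketch) rather than broken arbitrarily.
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            combined[candid] = None
        else:
            combined[candid] = ranked[0][0]
    return combined
```

With the switch on, a candidate voted bogus two to one but revetted as real comes out as real; with use_override=False the same candidate keeps its majority label of bogus.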
Line: 93 to 80
 
META FILEATTACHMENT attachment="2018-08-07-ZTF-Team-Meeting-RB.pptx" attr="" comment="Umaa Rebbapragada's Presentation on RB at the ZTF Team Meeting, Stockholm" date="1533732627" name="2018-08-07-ZTF-Team-Meeting-RB.pptx" path="2018-08-07-ZTF-Team-Meeting-RB.pptx" size="641834" stream="2018-08-07-ZTF-Team-Meeting-RB.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT attachment="Nbad_analysis.pdf" attr="" comment="Analysis of nbad feature" date="1534290349" name="Nbad_analysis.pdf" path="Nbad analysis.pdf" size="704934" stream="Nbad analysis.pdf" user="Main.CharlotteWard" version="1"
META FILEATTACHMENT attachment="2018-08-30-t13_f5_c3.pptx" attr="" comment="Analysis of RB version: t13_f5_c3" date="1535748841" name="2018-08-30-t13_f5_c3.pptx" path="2018-08-30-t13_f5_c3.pptx" size="243456" stream="2018-08-30-t13_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
Added:
>
>
META FILEATTACHMENT attachment="2019-01-t15_f5_c3.pptx" attr="" comment="Analysis of RB version: t15_f5_c3" date="1550599262" name="2019-01-t15_f5_c3.pptx" path="2019-01-t15_f5_c3.pptx" size="269747" user="UmaaRebbapragada" version="1"
 