Adding a new corruption to our code takes two steps: write a function that implements it, and add a dictionary entry that points to that function.
For meaning-altering and meaning-preserving corruptions, add the function to gather_corruptions.py. It should take an entry (a dictionary containing a SICK dataset entry) and return 0 if no corruption is found, 1 if sentence A is corrupted, or 2 if sentence B is corrupted. Then add the function to that file's global dictionary corruptions using this syntax:
'corr_name' : (function, 'description'),
Additionally, for meaning-preserving corruptions, the corruption name must be added to the global set m_p in metrics.py.
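As a rough illustration of the steps above, here is a minimal sketch of a toy meaning-preserving detector. The key names `sentence_A` and `sentence_B`, the function name `det_swap`, the corruption name `toy_det_swap`, and the rule that the sentence using "the" counts as the corrupted one are all assumptions made for this example; they are not taken from the repository. The `corruptions` and `m_p` objects below are stand-ins for the globals that already exist in gather_corruptions.py and metrics.py.

```python
DETERMINERS = {"a", "an", "the"}


def det_swap(entry):
    """Return 0 (no corruption), 1 (sentence A corrupted) or 2 (sentence B corrupted)."""
    # Assumed key names; adapt them to the keys your SICK entries actually use.
    a = entry["sentence_A"].lower().split()
    b = entry["sentence_B"].lower().split()

    # The sentences must have the same length and differ somewhere.
    if len(a) != len(b) or a == b:
        return 0

    # Every differing position must be a determiner in both sentences.
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if not all(x in DETERMINERS and y in DETERMINERS for x, y in diffs):
        return 0

    # Toy convention (an assumption): the sentence that uses "the" is the corrupted one.
    if all(x == "the" for x, _ in diffs):
        return 1
    if all(y == "the" for _, y in diffs):
        return 2
    return 0


# Stand-in for the global dictionary in gather_corruptions.py.
corruptions = {}
corruptions["toy_det_swap"] = (det_swap, "determiner replaced by 'the' (toy example)")

# Stand-in for the global set in metrics.py; needed only for meaning-preserving corruptions.
m_p = set()
m_p.add("toy_det_swap")
```

In the real files you would add the entry and the name to the existing `corruptions` dictionary and `m_p` set rather than creating new ones.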
For fluency-disruption corruptions, add the new function to generate_corruptions.py. It should take a sentence and return a corrupted sentence. Then add the function to the global corruptions dictionary using the following syntax:
'corr_name':function,
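A fluency-disruption function can be sketched in the same spirit. The function name `duplicate_last_word` and the corruption name `dup_last_word` are hypothetical, and the `corruptions` dictionary below is a stand-in for the global one that already exists in generate_corruptions.py.

```python
def duplicate_last_word(sentence):
    """Take a sentence and return a corrupted copy with its last word repeated."""
    words = sentence.split()
    if not words:
        # Nothing to corrupt in an empty sentence.
        return sentence
    return " ".join(words + [words[-1]])


# Stand-in for the global dictionary in generate_corruptions.py.
corruptions = {}
corruptions["dup_last_word"] = duplicate_last_word
```

For example, `duplicate_last_word("A jet is flying")` returns `"A jet is flying flying"`.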
If you wish to share a new corruption, submit a pull request on GitHub.
Name | Before | After |
--- | --- | --- |
negated subject | "A man is playing a harp" | "There is no man playing a harp" |
negated action | "A jet is flying" | "A jet is not flying" |
antonym replacement | "A dog with short hair" | "A dog with long hair" |
active to passive | "A man is cutting a potato" | "A potato is being cut by a man" |
synonymous phrases | "A dog is eating a doll" | "A dog is biting a doll" |
determiner substitution | "A cat is eating food" | "The cat is eating food" |
double PP | "A boy walks at night" | "A boy walks at night at night" |
remove head from PP | "A man danced in costume" | "A man danced costume" |
re-order chunked phrases | "A woman is slicing garlics" | "Is slicing garlics a woman" |