
Publications 

Publishing papers in scientific journals and at research-focused conferences and workshops helps ensure that our work remains aligned with the state of the art in our fields.

Multi-Modal News Understanding with Professionally Labelled Videos (ReutersViLNews)

Chou, Shih-Han, Matthew Kowal, Yasmin Niknam, Diana Moyano, Shayaan Mehdi, Richard Pito, Cheng Zhang, et al. “Multi-Modal News Understanding with Professionally Labelled Videos (ReutersViLNews).” In Canadian Conference on Artificial Intelligence, 2024. https://arxiv.org/abs/2401.12419.

“While progress has been made in the domain of video-language understanding, current state-of-the-art algorithms are still limited in their ability to understand videos at high levels of abstraction, such as news-oriented videos. Alternatively, humans easily amalgamate information from video and language to infer information beyond what is visually observable in the pixels. An example of this is watching a news story, where the context of the event can play as big of a role in understanding the story as the event itself. Towards a solution for designing this ability in algorithms, we present a large-scale analysis on an in-house dataset collected by the Reuters News Agency, called Reuters Video-Language News (ReutersViLNews) dataset which focuses on high-level video-language understanding with an emphasis on long-form news. The ReutersViLNews Dataset consists of long-form news videos collected and labeled by news industry professionals over several years and contains prominent news reporting from around the world. Each video involves a single story and contains action shots of the actual event, interviews with people associated with the event, footage from nearby areas, and more. ReutersViLNews dataset contains videos from seven subject categories: disaster, finance, entertainment, health, politics, sports, and miscellaneous with annotations from high-level to low-level, title caption, visual video description, high-level story description, keywords, and location. We first present an analysis of the dataset statistics of ReutersViLNews compared to previous datasets. Then we benchmark state-of-the-art approaches for four different video-language tasks. The results suggest that news-oriented videos are a substantial challenge for current video-language understanding algorithms and we conclude by providing future directions in designing approaches to solve the ReutersViLNews dataset.”
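
To make the annotation structure described above concrete, here is a minimal sketch of what a single dataset record could look like; the field names are inferred from the annotation types listed in the abstract (title caption, visual video description, high-level story description, keywords, location, subject category), not taken from the released schema.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative only: field names are inferred from the abstract, not the actual schema.
@dataclass
class ReutersViLNewsRecord:
    video_id: str
    category: str              # one of: disaster, finance, entertainment, health, politics, sports, miscellaneous
    title_caption: str         # short headline-style caption
    visual_description: str    # low-level description of what is visible in the footage
    story_description: str     # high-level summary of the underlying news story
    keywords: List[str] = field(default_factory=list)
    location: str = ""
```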

Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence

Yang, Hsiu-Wei, Abhinav Agrawal, Pavlos Fragkogiannis, and Shubham Nitin Mulay. “Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence.” In Workshop on Psychology-Informed Information Access Systems (PsyIAS), 2024. https://arxiv.org/abs/2403.18183.

“A well-designed document communicates not only through its words but also through its visual eloquence. Authors utilize aesthetic elements such as colors, fonts, graphics, and layouts to shape the perception of information. Thoughtful document design, informed by psychological insights, enhances both the visual appeal and the comprehension of the content. While state-of-the-art document AI models demonstrate the benefits of incorporating layout and image data, it remains unclear whether the nuances of document aesthetics are effectively captured. To bridge the gap between human cognition and AI interpretation of aesthetic elements, we formulated hypotheses concerning AI behavior in document understanding tasks, specifically anchored in document design principles. With a focus on legibility and layout quality, we tested four aspects of aesthetic effects: noise, font-size contrast, alignment, and complexity, on model confidence using correlational analysis. The results and observations highlight the value of model analysis rooted in document design theories. Our work serves as a trailhead for further studies and we advocate for continued research in this topic to deepen our understanding of how AI interprets document aesthetics.”
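
As a rough illustration of the correlational analysis described in the abstract, the sketch below measures rank correlation between an aesthetic perturbation level and model confidence; the arrays, the perturbation, and the choice of Spearman correlation are illustrative assumptions rather than the paper's exact protocol.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical inputs: one aesthetic perturbation level per document (e.g. amount of
# added noise or degree of misalignment) and the model's prediction confidence on the
# same documents. Both arrays are placeholders, not data from the paper.
perturbation_level = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
model_confidence   = np.array([0.97, 0.95, 0.90, 0.88, 0.81, 0.76])

# Rank correlation between aesthetic degradation and prediction confidence.
rho, p_value = spearmanr(perturbation_level, model_confidence)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")
```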

Evaluating Interactive Topic Models in Applied Settings

Gao, Sally, Milda Norkute, and Abhinav Agrawal. “Evaluating Interactive Topic Models in Applied Settings.” In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA’24), 2024. https://doi.org/10.1145/3613905.3637133.

“Topic modeling is a text analysis technique for automatically discovering common themes in a collection of documents. “Human-in-the-loop” topic modeling (HLTM) allows domain experts to steer and adjust the creation of topic models. In this case study, we use a custom-built HLTM interface to assess the impact of human refinement on model interpretability and predictive performance in collaboration with an analytics team within our organization. Using a small dataset (≈ 12k documents) of responses drawn from an organizational employee satisfaction survey, we compare the pre- and post-refinement models using both human judgments and automated metrics. We find that human refinement can enhance interpretability and predictive performance in some cases, but may lead to overfitting on the training data, which negatively impacts model quality. Furthermore, we observe that existing evaluation methods don’t sufficiently and clearly capture topic model quality in applied settings, and propose guidance for further HLTM tool development.” 

The Right Model for the Job: An Evaluation of Legal Multi-Label Classification Baselines

Forster, Martina, Claudia Schulz, Prudhvi Nokku, Melicaalsadat Mirsafian, Jaykumar Kasundra, and Stavroula Skylaki. “The Right Model for the Job: An Evaluation of Legal Multi-Label Classification Baselines,” 2024. https://arxiv.org/abs/2401.11852.

“Multi-Label Classification (MLC) is a common task in the legal domain, where more than one label may be assigned to a legal document. A wide range of methods can be applied, ranging from traditional ML approaches to the latest Transformer-based architectures. In this work, we perform an evaluation of different MLC methods using two public legal datasets, POSTURE50K and EURLEX57K. By varying the amount of training data and the number of labels, we explore the comparative advantage offered by different approaches in relation to the dataset properties. Our findings highlight DistilRoBERTa and LegalBERT as performing consistently well in legal MLC with reasonable computational demands. T5 also demonstrates comparable performance while offering advantages as a generative model in the presence of changing label sets. Finally, we show that the CrossEncoder exhibits potential for notable macro-F1 score improvements, albeit with increased computational costs.”
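
For readers unfamiliar with the setup, the sketch below shows a minimal multi-label inference pass with a DistilRoBERTa-style encoder via Hugging Face Transformers; the checkpoint, label count, decision threshold, and example text are placeholders, and in practice the classification head would first be fine-tuned on a dataset such as POSTURE50K or EURLEX57K.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative setup: "distilroberta-base", 50 labels, and the 0.5 threshold are
# placeholder choices, not the exact configuration evaluated in the paper.
num_labels = 50
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base",
    num_labels=num_labels,
    problem_type="multi_label_classification",  # sigmoid + BCE instead of softmax
)
# Note: the classification head is randomly initialized here; fine-tuning is required
# before the predictions below become meaningful.

text = "Order granting defendant's motion to dismiss for lack of jurisdiction."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# In multi-label classification each label is scored independently.
probs = torch.sigmoid(logits)[0]
predicted_labels = (probs > 0.5).nonzero(as_tuple=True)[0].tolist()
print(predicted_labels)
```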

ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale

Frohmann, Markus, Carolin Holtermann, Shahed Masoudian, Anne Lauscher, and Navid Rekabsaz. “ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale.” In Findings of the Association for Computational Linguistics ACL 2024, edited by Lun-Wei Ku, Andre Martins, and Vivek Srikumar, 11743–76. Bangkok, Thailand and virtual meeting: Association for Computational Linguistics, 2024. https://aclanthology.org/2024.findings-acl.699.

Multi-task learning (MTL) has shown considerable practical benefits, particularly when using language models (LMs). While this is commonly achieved by learning tasks under a joint optimization procedure, some methods, such as AdapterFusion, divide the problem into two stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is leveraged for a target task. This separation of concerns provides numerous benefits (e.g., promoting reusability). However, current two-stage MTL methods introduce a substantial number of additional parameters. We address this issue by leveraging the usefulness of linearly scaling the output representations of source adapters for transfer learning. We introduce ScaLearn, a simple and highly parameter-efficient two-stage MTL method that capitalizes on the knowledge of the source tasks by learning a minimal set of scaling parameters that enable effective transfer to a target task. Our experiments on three benchmarks (GLUE, SuperGLUE, and HumSet) and two encoder LMs show that ScaLearn consistently outperforms strong baselines with a small number of transfer parameters (~0.35% of those of AdapterFusion). Remarkably, we observe that ScaLearn maintains its strong abilities even when further reducing parameters, achieving competitive results with only 8 transfer parameters per target task. Our proposed approach thus demonstrates the power of simple scaling as a promise for more efficient task transfer. Our code is available at https://github.com/CPJKU/ScaLearn.
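
The transfer step can be pictured as a learned weighted combination of frozen source-adapter outputs. The sketch below shows the simplest scalar variant of that idea, assuming the source adapters have already been trained; it is a minimal illustration, and the actual ScaLearn variants differ in detail (see the released code for the real implementation).

```python
import torch
import torch.nn as nn

class ScaledAdapterCombination(nn.Module):
    """Combine frozen source-adapter outputs with one learned scalar per source task.

    A minimal sketch of the scaling idea described in the abstract, not the
    published implementation.
    """
    def __init__(self, num_source_tasks: int):
        super().__init__()
        # The only trainable transfer parameters: one weight per source task.
        self.omega = nn.Parameter(torch.full((num_source_tasks,), 1.0 / num_source_tasks))

    def forward(self, adapter_outputs: torch.Tensor) -> torch.Tensor:
        # adapter_outputs: (num_source_tasks, batch, seq_len, hidden)
        weights = self.omega.view(-1, 1, 1, 1)
        return (weights * adapter_outputs).sum(dim=0)

combiner = ScaledAdapterCombination(num_source_tasks=8)
outputs = torch.randn(8, 2, 16, 768)   # dummy source-adapter activations
combined = combiner(outputs)           # (2, 16, 768) representation passed onward
```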

Measuring the Groundedness of Legal Question-Answering Systems

Trautmann, Dietrich, Natalia Ostapuk, Quentin Grail, Adrian Alan Pol, Guglielmo Bonifazi, Shang Gao, and Martin Gajek. “Measuring the Groundedness of Legal Question-Answering Systems.” In Proceedings of the Natural Legal Language Processing Workshop 2024, 2024.

In high-stakes domains like legal question-answering, the accuracy and trustworthiness of generative AI systems are of paramount importance. This work presents a comprehensive benchmark of various methods to assess the groundedness of AI-generated responses, aiming to significantly enhance their reliability. Our experiments include similarity-based metrics and natural language inference models to evaluate whether responses are well-founded in the given contexts. We also explore different prompting strategies for large language models to improve the detection of ungrounded responses. We validated the effectiveness of these methods using a newly created grounding classification corpus, designed specifically for legal queries and corresponding responses from retrieval-augmented prompting, focusing on their alignment with source material. Our results indicate potential in groundedness classification of generated responses, with the best method achieving a macro-F1 score of 0.8. Additionally, we evaluated the methods in terms of their latency to determine their suitability for real-world applications, as this step typically follows the generation process. This capability is essential for processes that may trigger additional manual verification or automated response regeneration. In summary, this study demonstrates the potential of various detection methods to improve the trustworthiness of generative AI in legal settings.
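
Among the methods benchmarked are similarity-based metrics. As a minimal, hedged sketch of that family, the function below scores a response by the fraction of its sentences whose closest context chunk exceeds a cosine-similarity threshold; the sentence-transformers checkpoint and the threshold are arbitrary illustrative choices, not the paper's configuration or results.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative choices: the encoder checkpoint and the 0.6 threshold are placeholders.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def grounded_fraction(response_sentences, context_chunks, threshold=0.6):
    """Fraction of response sentences whose best-matching context chunk exceeds a
    cosine-similarity threshold (a simple groundedness proxy)."""
    resp_emb = encoder.encode(response_sentences, convert_to_tensor=True)
    ctx_emb = encoder.encode(context_chunks, convert_to_tensor=True)
    sims = util.cos_sim(resp_emb, ctx_emb)   # (num_sentences, num_chunks)
    best = sims.max(dim=1).values            # best supporting chunk per sentence
    return (best > threshold).float().mean().item()

score = grounded_fraction(
    ["The notice period is 30 days."],
    ["Either party may terminate this agreement with thirty (30) days' written notice."],
)
```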

LLM-Based Robust Product Classification in Commerce and Compliance

Gholamian, Sina, Gianfranco Romani, Bartosz Rudnikowicz, and Stavroula Skylaki. “LLM-Based Robust Product Classification in Commerce and Compliance.” In Proceedings of the EMNLP Workshop on Customizable NLP 2024, 2024.

Product classification is a crucial task in international trade, as compliance regulations are verified and taxes and duties are applied based on product categories. Manual classification of products is time-consuming and error-prone, and the sheer volume of products imported and exported renders the manual process infeasible. Consequently, e-commerce platforms and enterprises involved in international trade have turned to automatic product classification using machine learning. However, current approaches do not consider the real-world challenges associated with product classification, such as very abbreviated and incomplete product descriptions. In addition, recent advancements in generative Large Language Models (LLMs) and their reasoning capabilities are mainly untapped in product classification and e-commerce. In this research, we explore the real-life challenges of industrial classification and propose data perturbations that allow for realistic data simulation. Furthermore, we employ LLM-based product classification to improve the robustness of the prediction in the presence of incomplete data. Our research shows that LLMs with in-context learning outperform the supervised approaches in the clean-data scenario. Additionally, we illustrate that LLMs are significantly more robust than the supervised approaches when data attacks are present.
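
To illustrate the kind of perturbation and prompting involved, the sketch below abbreviates a clean product description and builds a simple in-context classification prompt; the perturbation rules, category names, and prompt format are invented for illustration and are not the paper's.

```python
import random

def abbreviate_description(description: str, keep_ratio: float = 0.5, seed: int = 0) -> str:
    """Toy perturbation: randomly drop words and shorten the survivors, roughly
    simulating the terse, abbreviated descriptions seen in trade documents.
    The specific rules are illustrative, not the paper's perturbations."""
    rng = random.Random(seed)
    words = description.split()
    kept = [w for w in words if rng.random() < keep_ratio]
    return " ".join(w[:4] for w in kept) or words[0][:4]

clean = "stainless steel kitchen knife set with wooden block"
noisy = abbreviate_description(clean)   # e.g. "stai stee knif set"

# Illustrative in-context classification prompt for an LLM.
prompt = (
    "Classify the product into one category.\n"
    "Description: cotton t-shirt short sleeve -> Category: Apparel\n"
    f"Description: {noisy} -> Category:"
)
```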

Towards an Automated Pointwise Evaluation Metric for Generated Long-Form Legal Summaries

Tan, Shao Min, Quentin Grail, and Lee Quartey. “Towards an Automated Pointwise Evaluation Metric for Generated Long-Form Legal Summaries.” In Proceedings of the EMNLP Workshop on Natural Legal Language Processing (NLLP) 2024. Miami, FL, USA, 2024.

Long-form abstractive summarization is a task that has particular importance in the legal domain. Automated evaluation metrics are important for the development of text generation models, but existing research on the evaluation of generated summaries has focused mainly on short summaries. We introduce an automated evaluation methodology for generated long-form legal summaries, which involves breaking each summary into individual points, comparing the points in a human-written and machine-generated summary, and calculating a recall and precision score for the latter. The method is designed to be particularly suited for the complexities of legal text and is also fully interpretable. We also created and released a small meta-dataset for the benchmarking of evaluation methods, focusing on long-form legal summarization. Our evaluation metric corresponds better with human evaluation compared to existing metrics which were not developed for legal data.
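
A toy version of the pointwise idea is sketched below: split both summaries into points, match generated points against reference points, and report precision and recall for the generated summary. The word-overlap matcher is a naive placeholder standing in for the more careful point extraction and matching the paper describes.

```python
from typing import Callable, List

def pointwise_scores(
    reference_points: List[str],
    generated_points: List[str],
    is_match: Callable[[str, str], bool],
):
    """Precision: fraction of generated points supported by some reference point.
    Recall: fraction of reference points covered by some generated point."""
    supported = sum(any(is_match(g, r) for r in reference_points) for g in generated_points)
    covered = sum(any(is_match(g, r) for g in generated_points) for r in reference_points)
    precision = supported / len(generated_points) if generated_points else 0.0
    recall = covered / len(reference_points) if reference_points else 0.0
    return precision, recall

# Placeholder matcher; in practice this would be a similarity or entailment model.
naive_match = lambda a, b: len(set(a.lower().split()) & set(b.lower().split())) >= 4

p, r = pointwise_scores(
    ["The court granted the motion to dismiss.", "Costs were awarded to the defendant."],
    ["The motion to dismiss was granted by the court."],
    naive_match,
)
```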

CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-Training

Brandfonbrener, David, Hanlin Zhang, Andreas Kirsch, Jonathan Richard Schwarz, and Sham Kakade. “CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-Training.” In Proceedings of Neural Information Processing Systems (NeurIPS) 2024. Vancouver, Canada, 2024.

Selecting high-quality data for pre-training is crucial in shaping the downstream task performance of language models. A major challenge lies in identifying this optimal subset, a problem generally considered intractable, thus necessitating scalable and effective heuristics. In this work, we propose a data selection method, CoLoR-Filter (Conditional Loss Reduction Filtering), which leverages an empirical Bayes-inspired approach to derive a simple and computationally efficient selection criterion based on the relative loss values of two auxiliary models.
In addition to the modeling rationale, we evaluate CoLoR-Filter empirically on two language modeling tasks: (1) selecting data from C4 for domain adaptation to evaluation on Books and (2) selecting data from C4 for a suite of downstream multiple-choice question answering tasks. We demonstrate favorable scaling both as we subselect more aggressively and using small auxiliary models to select data for large target models. As one headline result, CoLoR-Filter data selected using a pair of 150m parameter auxiliary models can train a 1.2b parameter target model to match a 1.2b parameter model trained on 25b randomly selected tokens with 25x less data for Books and 11x less data for the downstream tasks.
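
The selection criterion itself is simple enough to sketch: score each candidate sequence by how much its loss drops when moving from a "prior" auxiliary model to a "conditional" auxiliary model tuned toward the target distribution, then keep the highest-scoring sequences. The checkpoints and candidate texts below are placeholders, not the models or data used in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder checkpoints: in the paper these are small auxiliary models, and the
# "conditional" model has additionally been fine-tuned toward the target data.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
prior_model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
conditional_model = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # stand-in for the tuned model

@torch.no_grad()
def sequence_loss(model, text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss.item()   # mean next-token loss

def color_filter_score(text: str) -> float:
    # Larger score = larger loss reduction under the conditional model,
    # i.e. the sequence looks more like the targeted distribution.
    return sequence_loss(prior_model, text) - sequence_loss(conditional_model, text)

candidates = ["Chapter one of a novel ...", "Boilerplate cookie banner text ..."]
selected = sorted(candidates, key=color_filter_score, reverse=True)[:1]
```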

Composing Knowledge and Compression Interventions for Language Models

Kolbeinsson, Arinbjorn, Kyle O’Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, and Tianlong Chen. “Composing Knowledge and Compression Interventions for Language Models.” In Proceedings of CLR 2024 Workshop on Reliable and Responsible Foundation Models. Vienna, Austria, 2024.

Test-time interventions for language models aim to enhance factual accuracy, reduce harmful outputs, and improve model efficiency while avoiding excessive training costs. However, existing interventions are developed independently, even though in practice multiple interventions must often be applied to the same model sequentially. We introduce composable interventions, a framework for studying the impact of repeatedly intervening on the same language model. To showcase our framework, we compose two burgeoning classes of interventions: knowledge editing and model compression. We find that compression undoes knowledge edits faster than it decays general model performance. We also find that compressing models makes them harder to edit, and show that composing interventions affects the predicted logits.
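
Schematically, the framework boils down to applying interventions in sequence and measuring what survives. The sketch below is purely illustrative scaffolding: the edit, compression, and evaluation functions are placeholders, not implementations of any particular editing or compression method.

```python
# Schematic only: apply_knowledge_edits, quantize_weights, and answer_fn are
# placeholders standing in for real editing, compression, and evaluation code.
def compose(model, interventions):
    """Apply a sequence of interventions (e.g. [edit_facts, quantize]) in order."""
    for intervention in interventions:
        model = intervention(model)
    return model

def edit_retention(answer_fn, edit_probes):
    """Fraction of edited facts still answered correctly after further interventions."""
    return sum(answer_fn(q) == a for q, a in edit_probes) / len(edit_probes)

# The framework compares orderings, e.g.:
#   m1 = compose(base_model, [apply_knowledge_edits, quantize_weights])
#   m2 = compose(base_model, [quantize_weights, apply_knowledge_edits])
# and checks how much of the edit survives in each case.
```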

Online Adaptation of Language Models with a Memory of Amortized Contexts

Tack, Jihoon, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, Yee Whye Teh, and Jonathan Richard Schwarz. “Online Adaptation of Language Models with a Memory of Amortized Contexts.” In Proceedings of Neural Information Processing Systems (NeurIPS) 2024. Vancouver, Canada, 2024.

Due to the rapid generation and dissemination of information, large language models (LLMs) quickly run out of date despite enormous development costs. Due to this crucial need to keep models updated, online learning has emerged as a critical necessity when utilizing LLMs for real-world applications. However, given the ever-expanding corpus of unseen documents and the large parameter space of modern LLMs, efficient adaptation is essential. To address these challenges, we propose Memory of Amortized Contexts (MAC), an efficient and effective online adaptation framework for LLMs with strong knowledge retention. We propose an amortized feature extraction and memory-augmentation approach to compress and extract information from new documents into compact modulations stored in a memory bank. When answering questions, our model attends to and extracts relevant knowledge from this memory bank. To learn informative modulations in an efficient manner, we utilize amortization-based meta-learning, which substitutes the optimization process with a single forward pass of the encoder. Subsequently, we learn to choose from and aggregate selected documents into a single modulation by conditioning on the question, allowing us to adapt a frozen language model during test time without requiring further gradient updates. Our experiment demonstrates the superiority of MAC in multiple aspects, including online adaptation performance, time, and memory efficiency.
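
The retrieval step at question time can be pictured as attention over a bank of compact document modulations. The sketch below uses random tensors and illustrative dimensions; in the actual method the modulations are produced by an amortized encoder and the aggregated result conditions a frozen language model.

```python
import math
import torch
import torch.nn.functional as F

# Illustrative shapes: a memory bank of compact per-document modulations and a
# question embedding; random tensors stand in for learned representations.
num_docs, dim = 100, 256
memory_bank = torch.randn(num_docs, dim)   # one compact modulation per document
question_embedding = torch.randn(dim)

# Attend over the memory bank with the question as the query, then aggregate the
# selected modulations into a single conditioning vector for the frozen LM.
scores = memory_bank @ question_embedding / math.sqrt(dim)
weights = F.softmax(scores, dim=0)               # (num_docs,)
aggregated_modulation = weights @ memory_bank    # (dim,) -- no gradient updates to the LM
```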

Unleashing the Power of Meta-Tuning for Few-Shot Generalization Through Sparse Interpolated Experts

Chen, Shengzhuang, Jihoon Tack, Yunqiao Yang, Yee Whye Teh, Jonathan Richard Schwarz, and Ying Wei. “Unleashing the Power of Meta-Tuning for Few-Shot Generalization Through Sparse Interpolated Experts.” In Proceedings of Forty-First International Conference on Machine Learning, ICML 2024. Vienna, Austria, 2024. https://arxiv.org/abs/2403.08477.

Conventional wisdom suggests parameter-efficient fine-tuning of foundation models as the state-of-the-art method for transfer learning in vision, replacing the rich literature of alternatives such as meta-learning. In trying to harness the best of both worlds, meta-tuning introduces a subsequent optimization stage of foundation models but has so far only shown limited success and crucially tends to underperform on out-of-domain (OOD) tasks. In this paper, we introduce Sparse MetA-Tuning (SMAT), a method inspired by sparse mixture-of-experts approaches and trained to isolate subsets of pre-trained parameters automatically for meta-tuning on each task. SMAT successfully overcomes OOD sensitivity and delivers on the promise of enhancing the transfer abilities of vision foundation models beyond parameter-efficient finetuning. We establish new state-of-the-art results on a challenging combination of Meta-Dataset augmented with additional OOD tasks in both zero-shot and gradient-based adaptation settings. In addition, we provide a thorough analysis of the superiority of learned over hand-designed sparsity patterns for sparse expert methods and the pivotal importance of the sparsity level in balancing between in-domain and out-of-domain generalization. Our code is publicly available.
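
As a rough picture of sparse interpolation, the sketch below applies a meta-tuning update to a frozen weight matrix only through a sparse mask; the fixed top-k magnitude mask is an illustrative stand-in for the learned, per-task sparsity patterns described in the abstract.

```python
import torch

# Illustrative only: SMAT learns which parameter subsets to tune; a fixed
# top-k magnitude mask stands in for those learned sparsity patterns.
pretrained = torch.randn(4, 4)              # a frozen pre-trained weight matrix
dense_update = 0.01 * torch.randn(4, 4)     # update proposed by meta-tuning

sparsity = 0.75                             # keep only the largest 25% of update entries
k = int((1 - sparsity) * dense_update.numel())
threshold = dense_update.abs().flatten().topk(k).values.min()
mask = (dense_update.abs() >= threshold).float()

# Sparse interpolation: only the masked subset of parameters is actually changed.
tuned = pretrained + mask * dense_update
```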
