Overview
Ushur Language Intelligence (LI) requires representative data to train its AI and ML models effectively, particularly for the Ushur SmartMail solution. The process involves preparing email classification data from enterprise and production emails. Your team and the Ushur team carry out this work together, and it should begin at the project's inception.
Steps Involved
Data Collation and Anonymization
How It Works
- Data Collation: Collect emails for the specified topics and indexed categories. Include emails from a range of dates and times so that the data set reflects the natural variation in content.
- Anonymization: Mask or remove Personally Identifiable Information (PII) to maintain data privacy. Use the Ushur Anonymizer tool to remove all PII before sending the data to Ushur.
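The masking step can be illustrated with a minimal Python sketch. This is not the Ushur Anonymizer and does not reflect its interface; the patterns, placeholder tokens, and the `mask_pii` function name are hypothetical and cover only a few obvious PII formats. Use the Ushur Anonymizer for real data.

```python
import re

# Hypothetical patterns for illustration only; real PII detection
# needs far broader coverage (names, addresses, account numbers, ...).
PII_PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
]

def mask_pii(text: str) -> str:
    """Replace each matched PII span with a fixed placeholder token."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

masked = mask_pii(
    "Contact jane.doe@example.com or 555-123-4567 about SSN 123-45-6789."
)
print(masked)
```

Masking with a placeholder, rather than deleting the text, keeps the sentence structure intact, which may help a classifier that relies on surrounding context.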
Importance
Ensures that Ushur Language Intelligence can learn from the data without exposing sensitive information.
Tools and Techniques
Ushur Anonymizer (a Python-based tool).
Responsibilities
Your Role: Download, anonymize, and send the data.
Ushur Team: Provide guidance on data hygiene and best practices for achieving business accuracy goals.
Data Preprocessing
How It Works
The Ushur team analyzes the data quality and performs preprocessing steps to remove extraneous noise (HTML tags, symbols, unwanted data). They identify gaps in the email content and remove duplicates.
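As a rough illustration of this kind of cleanup, the sketch below strips HTML tags, drops stray symbols, and removes duplicate emails. It is an assumption about what such preprocessing might look like, not the Ushur team's actual pipeline; the function names and the set of characters kept are hypothetical.

```python
import html
import re

def clean_email_body(raw: str) -> str:
    """Strip HTML tags and stray symbols, then collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)            # drop HTML tags
    text = html.unescape(text)                     # decode entities such as &amp;
    text = re.sub(r"[^\w\s.,?!'@&-]", " ", text)   # drop symbols outside a small allowed set
    return re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace

def deduplicate(bodies):
    """Keep the first occurrence of each body, compared case-insensitively."""
    seen = set()
    unique = []
    for body in bodies:
        key = body.lower()
        if key not in seen:
            seen.add(key)
            unique.append(body)
    return unique

raw_emails = [
    "<p>Please update my address &amp; phone.</p>",
    "<div>Please update my address &amp; phone.</div>",
    "<p>*** Where is my claim status? ***</p>",
]
cleaned = deduplicate([clean_email_body(e) for e in raw_emails])
print(cleaned)
```

Cleaning before deduplicating matters here: the first two emails differ only in their markup, so they are recognized as duplicates only after the tags are removed.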
Importance
Reduces noise, enhancing the accuracy of email classification.
Responsibilities
Your Role: Send the data to Ushur via SFTP or another secure method.
Ushur Team: Perform data preprocessing.
Data Analysis
How It Works
The Ushur team analyzes the data file to ensure it is correctly formatted and contains the necessary information for training. The data file should be in .csv format with two columns: topic and phrase.
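Before sending the file, a quick local check can catch formatting problems early. The sketch below assumes the file has a header row naming the two columns exactly `topic` and `phrase`, in that order; that header convention, the function name, and the sample rows are assumptions for illustration, so confirm the exact layout with the Ushur team.

```python
import csv
import io

REQUIRED_COLUMNS = ["topic", "phrase"]  # assumed header names and order

def validate_training_csv(handle):
    """Return a list of human-readable problems found in the CSV."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != REQUIRED_COLUMNS:
        return [f"expected header {REQUIRED_COLUMNS}, found {reader.fieldnames}"]
    problems = []
    for row in reader:
        for column in REQUIRED_COLUMNS:
            # A short row yields None for missing fields, so guard with `or ""`.
            if not (row[column] or "").strip():
                problems.append(f"line {reader.line_num}: empty {column}")
    return problems

# In practice, pass open("training.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "topic,phrase\n"
    "address_change,Please update my mailing address.\n"
    "claim_status,\n"
)
problems = validate_training_csv(sample)
print(problems)
```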
Importance
Proper formatting is essential for effective training of the Ushur AI.
Responsibilities
Your Role: Ensure the .csv file is correctly formatted.
Ushur Team: Analyze the data file.
Data Labeling
How It Works
Label each email with the appropriate topic (the work type or classification topic).
Importance
Accurate labeling improves the precision of Ushur's AI in classifying topics.
Responsibilities
Your Role and Ushur Team: Ensure data categories are well-separated and avoid overlaps to improve AI training accuracy.
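One simple overlap check is to look for the same phrase labeled with more than one topic. The sketch below is a hypothetical helper, not an Ushur tool, and it only catches exact matches after normalizing case and whitespace; categories that overlap in meaning still need human review.

```python
from collections import defaultdict

def find_conflicting_labels(rows):
    """Map each normalized phrase to its topics when it has more than one."""
    topics_by_phrase = defaultdict(set)
    for topic, phrase in rows:
        normalized = " ".join(phrase.lower().split())
        topics_by_phrase[normalized].add(topic)
    return {
        phrase: sorted(topics)
        for phrase, topics in topics_by_phrase.items()
        if len(topics) > 1
    }

labeled = [
    ("address_change", "Please update my address"),
    ("billing", "please update my  address"),
    ("claim_status", "Where is my claim?"),
]
conflicts = find_conflicting_labels(labeled)
print(conflicts)
```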
Note
Ushur engineers will provide recommendations to improve data quality and ensure business accuracy goals are met.
The collaboration between your team and Ushur is crucial for the successful implementation of the Ushur SmartMail solution.