Kuldeep Singh Sidhu's picture
6 3

Kuldeep Singh Sidhu

singhsidhukuldeep

AI & ML interests

😃 TOP 3 on HuggingFace for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io

Recent Activity

posted an update about 20 hours ago
Exciting Research Alert: Remining Hard Negatives for Domain Adaptation in Dense Retrieval Researchers from the University of Amsterdam have introduced R-GPL, an innovative approach to improve domain adaptation in dense retrievers. The technique enhances the existing GPL (Generative Pseudo Labeling) framework by continuously remining hard negatives during the training process. Key Technical Insights: - The method leverages domain-adapted models to mine higher quality hard negatives incrementally every 30,000 steps during training - Uses MarginMSE loss for training with data triplets (Query, Relevant Doc, Hard Negative Doc) - Implements mean pooling over hidden states for dense representations with 350 token sequence length - Combines query generation with pseudo-labels from cross-encoder models Performance Highlights: - Outperforms baseline GPL in 13/14 BEIR datasets - Shows significant improvements in 9/12 LoTTE datasets - Achieves remarkable 4.4 point gain on TREC-COVID dataset Under the Hood: The system continuously refreshes hard negatives using the model undergoing domain adaptation. This creates a feedback loop where the model gets better at identifying relevant documents in the target domain, leading to higher quality training signals. Analysis reveals that domain-adapted models retrieve documents with higher relevancy scores in top-100 hard negatives compared to baseline approaches. This confirms the model's enhanced capability to identify challenging but informative training examples. This research opens new possibilities for efficient dense retrieval systems that can adapt to different domains without requiring labeled training data.
posted an update 2 days ago
Exciting breakthrough in Streaming Recommendation Systems! @BytedanceTalk researchers have developed "Long-Term Interest Clock" (LIC), a revolutionary approach to understand user preferences throughout the day. >> Technical Innovation The system introduces two groundbreaking modules: - Clock-based General Search Unit (Clock-GSU): Intelligently retrieves relevant user behaviors by analyzing time patterns and content similarity - Clock-based Exact Search Unit (Clock-ESU): Employs time-gap-aware attention mechanism to precisely model user interests >> Key Advantages LIC addresses critical limitations of existing systems by: - Providing fine-grained time perception instead of discrete hour-based recommendations - Analyzing long-term user behavior patterns rather than just short-term interactions - Operating at item-level granularity versus broad category-level interests >> Real-World Impact Already deployed in Douyin Music App, the system has demonstrated remarkable results: - 0.122% improvement in user active days - Significant boost in engagement metrics including likes and play rates - Enhanced user satisfaction with reduced dislike rates >> Under the Hood The system processes user behavior sequences spanning an entire year, utilizing multi-head attention mechanisms and sophisticated time-gap calculations to understand user preferences. It pre-computes embeddings stored in parameter servers for real-time performance, making it highly scalable for production environments. This innovation marks a significant step forward in personalized content delivery, especially for streaming platforms where user preferences vary throughout the day. The research has been accepted for presentation at WWW '25, Sydney.
posted an update 3 days ago
Exciting Research Alert: Revolutionizing Complex Information Retrieval! A groundbreaking paper from researchers at MIT, AWS AI, and UPenn introduces ARM (Alignment-Oriented LLM-based Retrieval Method), a novel approach to tackle complex information retrieval challenges. >> Key Innovations Information Alignment The method first decomposes queries into keywords and aligns them with available data using both BM25 and embedding similarity, ensuring comprehensive coverage of information needs. Structure Alignment ARM employs a sophisticated mixed-integer programming solver to identify connections between data objects, exploring relationships beyond simple semantic matching. Self-Verification The system includes a unique self-verification mechanism where the LLM evaluates and aggregates results from multiple retrieval paths, ensuring accuracy and completeness. >> Performance Highlights The results are impressive: - Outperforms standard RAG by up to 5.2 points in execution accuracy on Bird dataset - Achieves 19.3 points higher F1 scores compared to existing approaches on OTT-QA - Reduces the number of required LLM calls while maintaining superior retrieval quality >> Technical Implementation The system uses a three-step process: 1. N-gram indexing and embedding computation for all data objects 2. Constrained beam decoding for information alignment 3. Mixed-integer programming optimization for structure exploration This research represents a significant step forward in making complex information retrieval more efficient and accurate. The team's work demonstrates how combining traditional optimization techniques with modern LLM capabilities can solve challenging retrieval problems.
View all activity

Organizations

MLX Community's profile picture Social Post Explorers's profile picture C4AI Community's profile picture