ETF Screening Process and Key Points Overview
Retrieve ETF list: Use get_all_securities(['etf']) to get all listed ETFs, then keep only those established before January 1, 2023 (start_date < 2023-01-01) to ensure sufficient historical data.
Exclude low-liquidity ETFs: Manually remove specific ETFs with very low average trading volume (e.g., 159003.XSHE China Merchants Fast Track ETF, 159005.XSHE Huatai Wealth Quick Money ETF, etc., with average volume ≤ 2.92kw, i.e. about 29.2 million).
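The two initial filtering rules above can be sketched with pandas. This is only an illustration under stated assumptions: the listing table is made up (the .DEMO codes and all dates are hypothetical), it merely imitates a frame indexed by ETF code with a start_date column, and initial_filter is a helper name introduced here, not part of any platform API.

```python
import pandas as pd

# Made-up listing table: indexed by ETF code, with a start_date column.
# All dates and the .DEMO codes are illustrative, not real listing data.
listing = pd.DataFrame(
    {
        "start_date": pd.to_datetime(
            ["2012-05-28", "2014-10-20", "2015-01-13", "2019-03-15", "2023-06-01"]
        )
    },
    index=["AAA.DEMO", "159003.XSHE", "159005.XSHE", "CCC.DEMO", "BBB.DEMO"],
)

CUTOFF = pd.Timestamp("2023-01-01")
# Codes named in the text as low-liquidity examples; the full manual list is longer.
LOW_LIQUIDITY = ["159003.XSHE", "159005.XSHE"]


def initial_filter(listing, cutoff=CUTOFF, excluded=LOW_LIQUIDITY):
    """Keep ETFs listed before the cutoff, then drop the manually excluded codes."""
    kept = listing[listing["start_date"] < cutoff]
    return kept.drop(index=excluded, errors="ignore")


pool = initial_filter(listing)
print(pool.index.tolist())
```

Passing errors="ignore" lets the exclusion list contain codes that were already removed by the date filter.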
Data Range: Obtain closing prices for the past 240 trading days up to the current date (today).
Return Processing: Calculate daily returns (pchg = close.pct_change()), forming an ETF return matrix (prices, rows=trading days, columns=ETF codes).
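A minimal sketch of the return calculation follows, using synthetic closing prices in place of real market data. The codes, the random-walk prices, and the step that drops the empty first row are assumptions made for illustration; the name prices is kept for the return matrix only to match the text.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices for 240 business days (a stand-in for real data).
rng = np.random.default_rng(0)
days = pd.bdate_range("2024-01-01", periods=240)
codes = ["AAA.DEMO", "BBB.DEMO", "CCC.DEMO"]  # hypothetical ETF codes
log_steps = rng.normal(0.0, 0.01, size=(len(days), len(codes)))
close = pd.DataFrame(
    100 * np.exp(log_steps.cumsum(axis=0)), index=days, columns=codes
)

# Daily percentage change; the first row has no previous close and is NaN.
pchg = close.pct_change()

# Return matrix: rows are trading days, columns are ETF codes.
prices = pchg.dropna(how="all")
print(prices.shape)
```

Dropping the first row matters for the later steps, since clustering algorithms generally reject NaN inputs.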
Clustering Goal: Group ETFs with similar trends to reduce duplicates.
Parameters: Set number of clusters n_clusters=30 (to avoid too few clusters that may merge dissimilar ETFs), use KMeans algorithm with random_state=42.
Within-Cluster Selection: Keep only the earliest established ETF in each cluster, since it has the longest price history.
Calculate silhouette score: approximately 0.4512 (moderate level, indicating decent compactness and separation, but room for improvement).
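The clustering step might look roughly like the sketch below. It runs on synthetic returns built from three hidden factors, so n_clusters is 3 here rather than the 30 used in the text, and the codes and start dates are invented. Note that the return matrix is transposed so that each ETF, not each trading day, is one sample.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic returns: 3 hidden factors, 4 noisy ETFs tracking each factor.
rng = np.random.default_rng(0)
n_days, n_groups, per_group = 240, 3, 4
factors = rng.normal(0.0, 0.01, size=(n_days, n_groups))
codes = [f"ETF{g}{i}.DEMO" for g in range(n_groups) for i in range(per_group)]
noise = rng.normal(0.0, 0.002, size=(n_days, n_groups * per_group))
returns = pd.DataFrame(np.repeat(factors, per_group, axis=1) + noise, columns=codes)

# Hypothetical establishment dates, one per ETF.
start_date = pd.Series(
    pd.date_range("2012-01-01", periods=len(codes), freq="90D"), index=codes
)

# Each ETF is one sample; its daily returns are the features.
X = returns.T.to_numpy()
model = KMeans(n_clusters=n_groups, random_state=42, n_init=10)
labels = model.fit_predict(X)
score = silhouette_score(X, labels)

# Within each cluster, keep the ETF with the earliest establishment date.
info = pd.DataFrame({"cluster": labels, "start_date": start_date}, index=codes)
kept = info.groupby("cluster")["start_date"].idxmin().tolist()
print(round(score, 4), kept)
```

The silhouette score printed here reflects the synthetic data only and is not comparable to the 0.4512 reported for the real ETF universe.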
Correlation matrix: Compute correlation matrix of ETF returns (corr = prices[df.code].corr()).
High-correlation pairs: For pairs with correlation > 0.85, keep only the ETF with earlier establishment date, remove the others (e.g., remove 159922.XSHE, 512100.XSHG, etc.).
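One possible reading of the pairwise rule is a greedy pass from the oldest ETF to the newest, dropping any ETF whose correlation with an already kept ETF exceeds the threshold. The sketch below follows that reading on synthetic data. The helper name drop_correlated, the codes, and the dates are assumptions, and the original procedure may resolve chains of correlated ETFs differently.

```python
import numpy as np
import pandas as pd


def drop_correlated(returns, start_date, threshold=0.85):
    """Greedy filter: visit ETFs oldest first, keep one only if its
    correlation with every ETF kept so far is at or below the threshold."""
    corr = returns.corr()
    order = start_date.loc[returns.columns].sort_values().index
    kept = []
    for code in order:
        if all(corr.loc[code, other] <= threshold for other in kept):
            kept.append(code)
    return kept


# Synthetic returns: B closely tracks A, while C is independent of both.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 0.01, size=240)
returns = pd.DataFrame(
    {
        "A.DEMO": a,
        "B.DEMO": a + rng.normal(0.0, 0.002, size=240),
        "C.DEMO": rng.normal(0.0, 0.01, size=240),
    }
)
# Hypothetical establishment dates: A is oldest, B is newest.
start_date = pd.Series(
    pd.to_datetime(["2012-01-01", "2015-01-01", "2013-01-01"]),
    index=["A.DEMO", "B.DEMO", "C.DEMO"],
)

kept = drop_correlated(returns, start_date)
print(kept)
```

Because ETFs are visited in order of establishment date, the older member of any highly correlated pair is always the one that survives.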
Threshold: Remove ETFs established after 2020 (e.g., 513060.XSHG Hang Seng Healthcare, 515790.XSHG Photovoltaic ETF, etc.), to ensure remaining ETFs have richer historical data (useful for model training).
Special Handling for Treasury Bond ETFs: If the pool is used for model training, exclude 511010.XSHG Treasury Bond ETF. Its trend is nearly linear (similar to Yu’ebao) with minimal volatility, which can interfere with the model’s learning of volatility features, and it adds little to prediction.
Handling Downward-Trending ETFs: The results may include long-term declining ETFs (e.g., healthcare ETF, real estate ETF). Whether to exclude them depends on strategy goals.
Visualization for validation: Plot remaining ETFs’ price charts (e.g., since 2017) to manually verify if correlations and distributions meet expectations (low correlation, reasonable spread).
Final Filtering Summary:
Four steps (initial filtering → clustering deduplication → secondary correlation filtering → optional establishment date filtering) yield a pool of ETFs with good liquidity, low trend correlation, and sufficient historical data. The core goal is to provide diverse, high-quality underlying assets for strategies or models.