Claude
Using Edit & Bash tools, newest Claude 3.5 Sonnet hits 49% on SWE-bench's GitHub issue tests, despite challenges like hidden tests & high token costs.
"Claude 3.5 Sonnet scored 93.7% on HumanEval, tops SWE-bench Verified, offers real-time debugging and test generation via Amazon Bedrock inference"
"New code sandbox allows Claude to run JavaScript, clean data, create visualisations, and perform real-time analysis on uploaded CSV files."
"Plug and play AI now moves cursors, clicks, and types through virtual keyboards, moving beyond traditional AI-specific interface requirements."
Anthropic implements measures to prevent AI misuse in 2024 US elections, including policy updates, detection systems, and redirects to voting information.
Anthropic launches Message Batches API, allowing processing of up to 10,000 queries within 24 hours at half the cost of standard API calls.