Anthropic
New prompt improver uses chain-of-thought, XML standardisation, and prefill features to enhance prompts, boosting accuracy by 30% in classification tests.
"AI systems went from solving 2% of real coding problems to 49% in one year. Anthropic warns the window for safe regulation is closing fast."
Using Edit & Bash tools, newest Claude 3.5 Sonnet hits 49% on SWE-bench's GitHub issue tests, despite challenges like hidden tests & high token costs.
"Claude 3.5 Sonnet scores 93.7% on HumanEval, tops SWE-bench Verified, and offers real-time debugging and test generation via Amazon Bedrock inference."
"New code sandbox allows Claude to run JavaScript, clean data, create visualisations, and perform real-time analysis on uploaded CSV files."
"Plug-and-play AI now moves cursors, clicks, and types through virtual keyboards, moving beyond traditional AI-specific interface requirements."